[jira] [Resolved] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types
[ https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31229. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27990 [https://github.com/apache/spark/pull/27990] > Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> > complex types > --- > > Key: SPARK-31229 > URL: https://issues.apache.org/jira/browse/SPARK-31229 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.0.0 > > > This JIRA tracks adding the unit tests. This is rather a follow-up to > SPARK-31166; however, this JIRA targets including tests with struct and > array types against null types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
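A minimal sketch of the kind of assertion described (hedged; the actual tests are in pull request 27990 — {{Cast.canCast}} is the public companion-object method, while {{TypeCoercion.findTypeForComplex}} is internal to the analyzer):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types._

// NullType can be cast to any type, so null <> complex combinations
// should be castable as long as the target side accepts nulls.
val nullArray = ArrayType(NullType, containsNull = false)
val intArray = ArrayType(IntegerType, containsNull = true)
assert(Cast.canCast(nullArray, intArray))

// Likewise for structs whose fields are null-typed.
val nullStruct = StructType(Seq(StructField("a", NullType)))
val intStruct = StructType(Seq(StructField("a", IntegerType)))
assert(Cast.canCast(nullStruct, intStruct))
{code}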
[jira] [Assigned] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types
[ https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31229: Assignee: Hyukjin Kwon > Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> > complex types > --- > > Key: SPARK-31229 > URL: https://issues.apache.org/jira/browse/SPARK-31229 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > This JIRA tracks adding the unit tests. This is rather a follow-up to > SPARK-31166; however, this JIRA targets including tests with struct and > array types against null types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31205) support string literal as the second argument of date_add/date_sub functions
[ https://issues.apache.org/jira/browse/SPARK-31205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31205. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27965 [https://github.com/apache/spark/pull/27965] > support string literal as the second argument of date_add/date_sub functions > > > Key: SPARK-31205 > URL: https://issues.apache.org/jira/browse/SPARK-31205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
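An illustrative sketch of the improvement (hedged; the exact semantics are defined in pull request 27965): the second argument may now be a string literal, provided it can be interpreted as an integer number of days.

{code:scala}
// Runs in spark-shell against Spark 3.0; previously only integer
// literals were accepted as the second argument.
spark.sql("SELECT date_add(date'2020-03-22', '1')").show()
spark.sql("SELECT date_sub(date'2020-03-22', '1')").show()
{code}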
[jira] [Commented] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)
[ https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065283#comment-17065283 ] Hyukjin Kwon commented on SPARK-31209: -- Please go ahead and upgrade if you're interested. > Not compatible with new version of scalatest (3.1.0 and above) > -- > > Key: SPARK-31209 > URL: https://issues.apache.org/jira/browse/SPARK-31209 > Project: Spark > Issue Type: Dependency upgrade > Components: Tests >Affects Versions: 3.1.0 >Reporter: Timothy Zhang >Priority: Major > > Since ScalaTest's style traits and classes were moved and renamed > ([http://www.scalatest.org/release_notes/3.1.0]), there are errors such as > FunSpec not being found when I add the new version of scalatest as a library > dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
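For reference, ScalaTest 3.1.0 moved each style trait into its own package and renamed it; a minimal migration sketch:

{code:scala}
// ScalaTest < 3.1.0:
//   import org.scalatest.FunSuite
//   class MySuite extends FunSuite { ... }

// ScalaTest >= 3.1.0: the style traits were moved and renamed.
import org.scalatest.funsuite.AnyFunSuite

class MySuite extends AnyFunSuite {
  test("addition") {
    assert(1 + 1 === 2)
  }
}
{code}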
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065239#comment-17065239 ] jiaan.geng commented on SPARK-31210: Yes, Spark 3.0.0-preview2 does not contain this fix. > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but I find it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
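A hedged sketch of the expected behavior once the SPARK-30254 fix is in (assuming a table like the one above): the escaped {{%}} must be matched literally instead of being folded into a {{StartsWith}}.

{code:scala}
// With the fix, '^' escapes the following '%', so the pattern matches
// the literal string "100%" and returns exactly that row.
spark.sql(
  "SELECT * FROM test_table_like WHERE subject LIKE '100^%' ESCAPE '^' ORDER BY 1"
).show()
{code}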
[jira] [Updated] (SPARK-31223) Update py code to generate data in testsuites
[ https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-31223: --- Summary: Update py code to generate data in testsuites (was: Update py code to generate dates in testsuites) > Update py code to generate data in testsuites > - > > Key: SPARK-31223 > URL: https://issues.apache.org/jira/browse/SPARK-31223 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Priority: Trivial > > In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...: > > the test datasets cannot be regenerated with the given py code (like > {color:#676773}X = np.random.rand(20, 6){color}), so either: > 1. directly create X, like X = np.array(...); > 2. or set a seed first. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27963) Allow dynamic allocation without an external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27963: -- Labels: release-notes (was: ) > Allow dynamic allocation without an external shuffle service > > > Key: SPARK-27963 > URL: https://issues.apache.org/jira/browse/SPARK-27963 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Marcelo Masiero Vanzin >Assignee: Marcelo Masiero Vanzin >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > It would be useful for users to be able to enable dynamic allocation without > the need to provision an external shuffle service. One immediate use case is > the ability to use dynamic allocation on Kubernetes, which doesn't yet have > that service. > This has been suggested before (e.g. > https://github.com/apache/spark/pull/24083, which was attached to the > k8s-specific SPARK-24432), and can actually be done without affecting the > internals of the Spark scheduler (aside from the dynamic allocation code). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
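A minimal sketch of how the feature is enabled (config names as documented for Spark 3.0, shown here as an assumption rather than a definitive recipe):

{code:scala}
import org.apache.spark.sql.SparkSession

// Dynamic allocation without spark.shuffle.service.enabled: shuffle
// tracking keeps executors alive while their shuffle data is still needed.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-without-ess")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()
{code}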
[jira] [Resolved] (SPARK-31203) Upgrade derby to 10.14.2.0 from 10.12.1.1
[ https://issues.apache.org/jira/browse/SPARK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31203. --- Resolution: Later > Upgrade derby to 10.14.2.0 from 10.12.1.1 > - > > Key: SPARK-31203 > URL: https://issues.apache.org/jira/browse/SPARK-31203 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.5, 3.0.0, 3.1.0 > Environment: This jira is to upgrade the derby version to 10.14.2.0 from > 10.12.1.1. > The upgrade is due to an already disclosed vulnerability (CVE-2018-1313). >Reporter: Udbhav Agrawal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064942#comment-17064942 ] Mingli Rui edited comment on SPARK-31210 at 3/23/20, 4:53 PM: -- Thanks for your investigation! I agree this issue is duplicated to SPARK-30254. The fix for SPARK-30254 is done on Dec 13 2019. I am using spark 3.0.0-preview2 which is released on Dec 17 2019. Could you please confirm whether this fix is included in spark 3.0.0-preview2 or not? Thanks a lot! was (Author: minglirui): Thanks for you investigation! I agree this issue is duplicated to SPARK-30254. The fix for SPARK-30254 is Dec 13 2019. I am using spark 3.0.0-preview2 which is released on Dec 17 2019. Could you please confirm whether this fix is included in spark 3.0.0-preview2 or not? Thanks a lot! > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but I find it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause
[ https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064942#comment-17064942 ] Mingli Rui commented on SPARK-31210: Thanks for you investigation! I agree this issue is duplicated to SPARK-30254. The fix for SPARK-30254 is Dec 13 2019. I am using spark 3.0.0-preview2 which is released on Dec 17 2019. Could you please confirm whether this fix is included in spark 3.0.0-preview2 or not? Thanks a lot! > An issue for Spark SQL LIKE-with-ESCAPE clause > -- > > Key: SPARK-31210 > URL: https://issues.apache.org/jira/browse/SPARK-31210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mingli Rui >Priority: Major > > I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but I find it > doesn't work in the cases below. > The database table > == > create or replace table test_table_like ( subject string) > insert into $test_table_like values ('100 times'), ('1000 times'), ('100%') > > Repro > > val result2 = sparkSession.sql( > s"select * from test_table_like where subject like '100^%' escape '^' order > by 1") > "100%" is expected to be returned, but it isn't. I debugged into the code to > check the logical plan. > In the logical plan, the LIKE is transformed into "StartsWith(subject#130, > 100^)", which looks incorrect. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808 ] leishuiyu edited comment on SPARK-31191 at 3/23/20, 1:41 PM: - hello [~yumwang], when I edit conf/spark-defaults.conf, this error still happens {code:java} // # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # #http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ## Default system properties included when running spark-submit. # This is useful for setting default environmental settings. spark.sql.hive.metastore.version 2.3.0 # Example: # spark.master spark://master:7077 # spark.eventLog.enabled true # spark.eventLog.dir hdfs://namenode:8021/directory # spark.serializer org.apache.spark.serializer.KryoSerializer # spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" {code} was (Author: leishuiyu): when I edit conf/spark-defaults.conf, this error still happens {code:java} // # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # #http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ## Default system properties included when running spark-submit. # This is useful for setting default environmental settings. spark.sql.hive.metastore.version 2.3.0 # Example: # spark.master spark://master:7077 # spark.eventLog.enabled true # spark.eventLog.dir hdfs://namenode:8021/directory # spark.serializer org.apache.spark.serializer.KryoSerializer # spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" {code} > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 more > Caused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
[jira] [Comment Edited] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808 ] leishuiyu edited comment on SPARK-31191 at 3/23/20, 1:40 PM: - when I edit conf/spark-defaults.conf, this error still happens {code:java} // # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # #http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ## Default system properties included when running spark-submit. # This is useful for setting default environmental settings. spark.sql.hive.metastore.version 2.3.0 # Example: # spark.master spark://master:7077 # spark.eventLog.enabled true # spark.eventLog.dir hdfs://namenode:8021/directory # spark.serializer org.apache.spark.serializer.KryoSerializer # spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" {code} was (Author: leishuiyu): when I edit conf/spark-defaults.conf, this error still happens !image-2020-03-23-21-37-17-663.png! > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 more > Caused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 18 more > Caused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at >
[jira] [Commented] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808 ] leishuiyu commented on SPARK-31191: --- when I edit conf/spark-defaults.conf, this error still happens !image-2020-03-23-21-37-17-663.png! > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 more > Caused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 18 more > Caused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) > ... 23 more > {code} > h3. 2.Find the reason > Checking the source code: the spark jars directory contains > hive-metastore-1.2.1.spark2.jar, and version 1.2.1 is mapped to 1.2.0, which > generates the exception. > > > {code:java} > // code placeholder > private static final Map<String, String> EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3.Is there any solution to this problem > One can edit hive-site.xml to set hive.metastore.schema.verification to true, > but new problems may arise. > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leishuiyu updated SPARK-31191: -- Attachment: image-2020-03-23-21-37-17-663.png > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > Caused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 more > Caused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 18 more > Caused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) > ... 23 more > {code} > h3. 2.Find the reason > Checking the source code: the spark jars directory contains > hive-metastore-1.2.1.spark2.jar, and version 1.2.1 is mapped to 1.2.0, which > generates the exception. > > > {code:java} > // code placeholder > private static final Map<String, String> EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3.Is there any solution to this problem > One can edit hive-site.xml to set hive.metastore.schema.verification to true, > but new problems may arise. > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
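A hedged sketch of the usual way to point Spark at a different metastore client (both settings generally go together; whether a given version is supported depends on the Spark release in use):

{code:scala}
import org.apache.spark.sql.SparkSession

// "maven" downloads Hive client jars matching the requested version;
// alternatively a classpath of pre-downloaded jars can be given.
val spark = SparkSession.builder()
  .config("spark.sql.hive.metastore.version", "2.3.0")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()
{code}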
[jira] [Created] (SPARK-31230) use statement plans in DataFrameWriter(V2)
Wenchen Fan created SPARK-31230: --- Summary: use statement plans in DataFrameWriter(V2) Key: SPARK-31230 URL: https://issues.apache.org/jira/browse/SPARK-31230 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types
Hyukjin Kwon created SPARK-31229: Summary: Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types Key: SPARK-31229 URL: https://issues.apache.org/jira/browse/SPARK-31229 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Hyukjin Kwon This JIRA tracks adding the unit tests. This is rather a follow-up to SPARK-31166; however, this JIRA targets including tests with struct and array types against null types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types
[ https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31229: - Priority: Minor (was: Major) > Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> > complex types > --- > > Key: SPARK-31229 > URL: https://issues.apache.org/jira/browse/SPARK-31229 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > This JIRA tracks adding the unit tests. This is rather a follow-up to > SPARK-31166; however, this JIRA targets including tests with struct and > array types against null types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31228) Add version information to the configuration of Kafka
jiaan.geng created SPARK-31228: -- Summary: Add version information to the configuration of Kafka Key: SPARK-31228 URL: https://issues.apache.org/jira/browse/SPARK-31228 Project: Spark Issue Type: Sub-task Components: DStreams Affects Versions: 3.1.0 Reporter: jiaan.geng external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
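A hedged sketch of the pattern this sub-task applies in that file (the entry name and default below are assumptions for illustration; the real entries live in the Kafka package object in {{package.scala}}):

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Tag each ConfigBuilder entry with the release that introduced it.
private[kafka010] val CONSUMER_CACHE_CAPACITY =
  ConfigBuilder("spark.kafka.consumer.cache.capacity")
    .doc("The maximum number of consumers cached.")
    .version("3.0.0") // the version information this sub-task adds
    .intConf
    .createWithDefault(64)
{code}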
[jira] [Created] (SPARK-31227) Non-nullable null type should not coerce to nullable type
Hyukjin Kwon created SPARK-31227: Summary: Non-nullable null type should not coerce to nullable type Key: SPARK-31227 URL: https://issues.apache.org/jira/browse/SPARK-31227 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Hyukjin Kwon {code} scala> spark.range(10).selectExpr("array()").printSchema() root |-- array(): array (nullable = false) ||-- element: null (containsNull = false) scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() root |-- arr: array (nullable = false) ||-- element: null (containsNull = false) scala> spark.range(10).selectExpr("concat(array(), array(1)) as arr").printSchema() root |-- arr: array (nullable = false) ||-- element: integer (containsNull = true) {code} The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31227) Non-nullable null type should not coerce to nullable type
[ https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31227: - Affects Version/s: (was: 3.0.0) 3.1.0 > Non-nullable null type should not coerce to nullable type > - > > Key: SPARK-31227 > URL: https://issues.apache.org/jira/browse/SPARK-31227 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.range(10).selectExpr("array()").printSchema() > root > |-- array(): array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array(), array(1)) as > arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: integer (containsNull = true) > {code} > The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-31194) spark sql runs successfully with query not specifying condition next to where
[ https://issues.apache.org/jira/browse/SPARK-31194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayoub Omari closed SPARK-31194. --- WHERE works as an alias of the table name in this case > spark sql runs successfully with query not specifying condition next to where > -- > > Key: SPARK-31194 > URL: https://issues.apache.org/jira/browse/SPARK-31194 > Project: Spark > Issue Type: Story > Components: SQL >Affects Versions: 2.4.5 >Reporter: Ayoub Omari >Priority: Major > > When having a sql query as follows: > {color:#00875a}_SELECT *_{color} > {color:#00875a}_FROM people_{color} > {color:#00875a}_WHERE_{color} > shouldn't we throw a parsing exception because of the unspecified condition? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31194) spark sql runs successfully with query not specifying condition next to where
[ https://issues.apache.org/jira/browse/SPARK-31194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064692#comment-17064692 ] Ayoub Omari edited comment on SPARK-31194 at 3/23/20, 10:28 AM: Oh ok! Closing, as WHERE works as an alias of the table name in this case was (Author: sarmon): WHERE works as an alias of the table name in this case > spark sql runs successfully with query not specifying condition next to where > -- > > Key: SPARK-31194 > URL: https://issues.apache.org/jira/browse/SPARK-31194 > Project: Spark > Issue Type: Story > Components: SQL >Affects Versions: 2.4.5 >Reporter: Ayoub Omari >Priority: Major > > When having a sql query as follows: > {color:#00875a}_SELECT *_{color} > {color:#00875a}_FROM people_{color} > {color:#00875a}_WHERE_{color} > shouldn't we throw a parsing exception because of the unspecified condition? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31219) YarnShuffleService doesn't close idle netty channel
[ https://issues.apache.org/jira/browse/SPARK-31219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-31219: --- Description: Recently, we found our YarnShuffleService has a lot of [half-open connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html] where the shuffle server's connections are active while the clients have already closed. For example, from the server's `ss -nt sport = :7337` output we have {code:java} ESTAB 0 0 server:7337 client:port {code} However, on the client `ss -nt dport = :7337 | grep server` would return nothing. Looking at the code, `YarnShuffleService` creates a `TransportContext` with `closeIdleConnections` set to false. {code:java} public class YarnShuffleService extends AuxiliaryService { ... @Override protected void serviceInit(Configuration conf) throws Exception { ... transportContext = new TransportContext(transportConf, blockHandler); ... } ... } public class TransportContext implements Closeable { ... public TransportContext(TransportConf conf, RpcHandler rpcHandler) { this(conf, rpcHandler, false, false); } public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean closeIdleConnections) { this(conf, rpcHandler, closeIdleConnections, false); } ... }{code} Hence, it's possible the channel may never get closed on the server side if the server misses the event that the client has closed it. I find that parameter is true for `ExternalShuffleService`. Is there any reason for the difference here? Can we enable closeIdleConnections in YarnShuffleService or at least add a configuration to enable it? was: Recently, we found our YarnShuffleService has a lot of [half-open connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html] where the shuffle server's connections are active while the clients have already closed. For example, from the server's `ss -nt sport = :7337` output we have {code:java} ESTAB 0 0 server:7337 client:port {code} However, on the client `ss -nt dport = :7337 | grep server` would return nothing. Looking at the code, `YarnShuffleService` creates a `TransportContext` with `closeIdleConnections` set to false. {code:java} public class YarnShuffleService extends AuxiliaryService { ... @Override protected void serviceInit(Configuration conf) throws Exception { ... transportContext = new TransportContext(transportConf, blockHandler); ... } ... } public class TransportContext implements Closeable { ... public TransportContext(TransportConf conf, RpcHandler rpcHandler) { this(conf, rpcHandler, false, false); } public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean closeIdleConnections) { this(conf, rpcHandler, closeIdleConnections, false); } ... }{code} Hence, it's possible the channel may never get closed on the server side if the server misses the event that the client has closed it. I find that parameter is true for `ExternalShuffleService`. Is there any reason for the difference here? Will it be valuable to add a configuration to allow enabling closeIdleConnections?
> YarnShuffleService doesn't close idle netty channel > --- > > Key: SPARK-31219 > URL: https://issues.apache.org/jira/browse/SPARK-31219 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.5, 3.0.0 >Reporter: Manu Zhang >Priority: Major > > Recently, we found our YarnShuffleService has a lot of [half-open > connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html] > where the shuffle server's connections are active while the clients have already > closed. > For example, from the server's `ss -nt sport = :7337` output we have > {code:java} > ESTAB 0 0 server:7337 client:port > {code} > However, on the client `ss -nt dport = :7337 | grep server` would return nothing. > Looking at the code, `YarnShuffleService` creates a `TransportContext` with > `closeIdleConnections` set to false. > {code:java} > public class YarnShuffleService extends AuxiliaryService { > ... > @Override protected void serviceInit(Configuration conf) throws Exception > { > ... > transportContext = new TransportContext(transportConf, blockHandler); > ... > } > ... > } > public class TransportContext implements Closeable { > ... > public TransportContext(TransportConf conf, RpcHandler rpcHandler) { > this(conf, rpcHandler, false, false); > } > public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean > closeIdleConnections) { > this(conf, rpcHandler, closeIdleConnections, false); > } > ... > }{code} > Hence, it's possible the channel may never get closed at
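For comparison, a hedged sketch of what enabling the behavior looks like, mirroring how the Scala ExternalShuffleService constructs its context (names taken from the quoted sources above):

{code:scala}
// Passing closeIdleConnections = true lets the server reap connections
// that have been idle longer than the configured connection timeout,
// instead of keeping half-open channels around indefinitely.
val transportContext = new TransportContext(transportConf, blockHandler, true)
{code}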
[jira] [Created] (SPARK-31226) SizeBasedCoalesce logic error
angerszhu created SPARK-31226: - Summary: SizeBasedCoalesce logic error Key: SPARK-31226 URL: https://issues.apache.org/jira/browse/SPARK-31226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31226) SizeBasedCoalesce logic error
[ https://issues.apache.org/jira/browse/SPARK-31226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31226: -- Description: In the Spark unit tests, SizeBasedCoalesce's logic is wrong > SizeBasedCoalesce logic error > - > > Key: SPARK-31226 > URL: https://issues.apache.org/jira/browse/SPARK-31226 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: angerszhu >Priority: Minor > > In the Spark unit tests, > SizeBasedCoalesce's logic is wrong -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31190) ScalaReflection should not erasure user defined AnyVal type
[ https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31190: Summary: ScalaReflection should not erasure user defined AnyVal type (was: ScalaReflection should erasure non user defined AnyVal type) > ScalaReflection should not erasure user defined AnyVal type > --- > > Key: SPARK-31190 > URL: https://issues.apache.org/jira/browse/SPARK-31190 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > We should skip erasure only for user-defined AnyVal types, but still do > erasure for other types, e.g. Any, which could give a better error message > for the end user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31190) ScalaReflection should erasure non user defined AnyVal type
[ https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31190. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27959 [https://github.com/apache/spark/pull/27959] > ScalaReflection should erasure non user defined AnyVal type > --- > > Key: SPARK-31190 > URL: https://issues.apache.org/jira/browse/SPARK-31190 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > We should skip erasure only for user-defined AnyVal types, but still do > erasure for other types, e.g. Any, which could give a better error message > for the end user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31190) ScalaReflection should erasure non user defined AnyVal type
[ https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31190: --- Assignee: wuyi > ScalaReflection should erasure non user defined AnyVal type > --- > > Key: SPARK-31190 > URL: https://issues.apache.org/jira/browse/SPARK-31190 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > We should skip erasure only for user-defined AnyVal types, but still do > erasure for other types, e.g. Any, which could give a better error message > for the end user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31220) repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled
[ https://issues.apache.org/jira/browse/SPARK-31220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-31220: Summary: repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled (was: distribute by obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled) > repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum > when spark.sql.adaptive.enabled > --- > > Key: SPARK-31220 > URL: https://issues.apache.org/jira/browse/SPARK-31220 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > spark.sql("CREATE TABLE spark_31220(id int)") > spark.sql("set > spark.sql.adaptive.coalescePartitions.initialPartitionNum=1000") > spark.sql("set spark.sql.adaptive.enabled=true") > {code} > {noformat} > scala> spark.sql("SELECT id from spark_31220 GROUP BY id").explain > == Physical Plan == > AdaptiveSparkPlan(isFinalPlan=false) > +- HashAggregate(keys=[id#5], functions=[]) >+- Exchange hashpartitioning(id#5, 1000), true, [id=#171] > +- HashAggregate(keys=[id#5], functions=[]) > +- FileScan parquet default.spark_31220[id#5] Batched: true, > DataFilters: [], Format: Parquet, Location: > InMemoryFileIndex[file:/root/opensource/apache-spark/spark-warehouse/spark_31220], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int> > scala> spark.sql("SELECT id from spark_31220 DISTRIBUTE BY id").explain > == Physical Plan == > AdaptiveSparkPlan(isFinalPlan=false) > +- Exchange hashpartitioning(id#5, 200), false, [id=#179] >+- FileScan parquet default.spark_31220[id#5] Batched: true, DataFilters: > [], Format: Parquet, Location: > InMemoryFileIndex[file:/root/opensource/apache-spark/spark-warehouse/spark_31220], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int> > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31225) Override `sql` method for OuterReference
Kent Yao created SPARK-31225: Summary: Override `sql` method for OuterReference Key: SPARK-31225 URL: https://issues.apache.org/jira/browse/SPARK-31225 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Kent Yao OuterReference is a LeafExpression, so its children are Nil, which makes its SQL representation always be outer(). This makes our EXPLAIN output and error messages unclear when an OuterReference exists -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
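A hedged sketch of the kind of override the ticket proposes (illustrative only; the real expression lives in catalyst and the actual change is in the corresponding pull request):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{LeafExpression, NamedExpression, Unevaluable}
import org.apache.spark.sql.types.DataType

// Delegate to the wrapped named expression so that plan strings show
// outer(a) instead of the uninformative outer().
case class OuterReference(e: NamedExpression)
  extends LeafExpression with Unevaluable {
  override def dataType: DataType = e.dataType
  override def nullable: Boolean = e.nullable
  override def prettyName: String = "outer"
  override def sql: String = s"$prettyName(${e.sql})"
}
{code}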
[jira] [Resolved] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31211. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27974 [https://github.com/apache/spark/pull/27974] > Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5 > --- > > Key: SPARK-31211 > URL: https://issues.apache.org/jira/browse/SPARK-31211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Save a valid Julian-calendar date with Spark 2.4.5 in a leap year, for > instance 1000-02-29: > {code} > $ export TZ="America/Los_Angeles" > {code} > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> val df = > Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) > df: org.apache.spark.sql.DataFrame = [date: date] > scala> df.show > +--+ > | date| > +--+ > |1000-02-29| > +--+ > scala> > df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap") > scala> > df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap") > {code} > Load the parquet files back by Spark 3.1.0-SNAPSHOT: > {code:scala} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_231) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show > +--+ > | date| > +--+ > |1000-03-06| > +--+ > scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true) > scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show > 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a > leap year > at java.time.LocalDate.create(LocalDate.java:429) > at java.time.LocalDate.of(LocalDate.java:269) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31211: --- Assignee: Maxim Gekk > Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5 > --- > > Key: SPARK-31211 > URL: https://issues.apache.org/jira/browse/SPARK-31211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Save a valid Julian-calendar date with Spark 2.4.5 in a leap year, for > instance 1000-02-29: > {code} > $ export TZ="America/Los_Angeles" > {code} > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> val df = > Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date")) > df: org.apache.spark.sql.DataFrame = [date: date] > scala> df.show > +--+ > | date| > +--+ > |1000-02-29| > +--+ > scala> > df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap") > scala> > df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap") > {code} > Load the parquet files back by Spark 3.1.0-SNAPSHOT: > {code:scala} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_231) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show > +--+ > | date| > +--+ > |1000-03-06| > +--+ > scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true) > scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show > 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a > leap year > at java.time.LocalDate.create(LocalDate.java:429) > at java.time.LocalDate.of(LocalDate.java:269) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31224) Support views in both SHOW CREATE TABLE and SHOW CREATE TABLE AS SERDE
L. C. Hsieh created SPARK-31224: --- Summary: Support views in both SHOW CREATE TABLE and SHOW CREATE TABLE AS SERDE Key: SPARK-31224 URL: https://issues.apache.org/jira/browse/SPARK-31224 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh For now the {{SHOW CREATE TABLE}} command doesn't support views, but {{SHOW CREATE TABLE AS SERDE}} does. Since view syntax is the same between Hive DDL and Spark DDL, we should be able to support views in both commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
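A hedged sketch of the intended behavior once the ticket lands (view name and query are illustrative):

{code:scala}
// Both commands should then print a CREATE VIEW statement for a view.
spark.sql("CREATE VIEW v AS SELECT 1 AS a")
spark.sql("SHOW CREATE TABLE v").show(truncate = false)
// Hive-compatible output, including for views once supported:
spark.sql("SHOW CREATE TABLE v AS SERDE").show(truncate = false)
{code}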
[jira] [Commented] (SPARK-31173) Spark Kubernetes add tolerations and nodeName support
[ https://issues.apache.org/jira/browse/SPARK-31173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064563#comment-17064563 ] zhongwei liu commented on SPARK-31173: -- [~seedjeffwan] The first one is the key point. > Spark Kubernetes add tolerations and nodeName support > - > > Key: SPARK-31173 > URL: https://issues.apache.org/jira/browse/SPARK-31173 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0, 2.4.6 > Environment: Alibaba Cloud ACK with spark > operator(v1beta2-1.1.0-2.4.5) and spark(2.4.5) >Reporter: zhongwei liu >Priority: Trivial > Labels: features > Original Estimate: 72h > Remaining Estimate: 72h > > When you run Spark on a serverless Kubernetes cluster (virtual-kubelet), you > need to specify nodeSelectors, tolerations, and even nodeName to > gain better scheduling performance. Currently Spark doesn't support > tolerations. If you want to use this feature, you must use an admission > controller webhook to decorate the pod, but the performance is extremely bad. > Here is the benchmark. > With webhook: > Batch Size: 500, Pod creation: about 7 Pods/s, All Pods running: 5min > Without webhook: > Batch Size: 500, Pod creation: more than 500 Pods/s, All Pods running: 45s > Adding tolerations and nodeName to Spark will greatly help when you want > to run a large-scale job on a serverless Kubernetes cluster. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)
[ https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064561#comment-17064561 ]

Timothy Zhang commented on SPARK-31209:
---------------------------------------

Yes, I think so. It would be better to upgrade all classes that extend ScalaTest's FunSuite, FlatSpec, etc.

> Not compatible with new version of scalatest (3.1.0 and above)
> ---------------------------------------------------------------
>
>                 Key: SPARK-31209
>                 URL: https://issues.apache.org/jira/browse/SPARK-31209
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Tests
>    Affects Versions: 3.1.0
>            Reporter: Timothy Zhang
>            Priority: Major
>
> Since ScalaTest's style traits and classes were moved and renamed ([http://www.scalatest.org/release_notes/3.1.0]), there are "not found" errors for classes such as FunSpec when I add the new version of scalatest as a library dependency.
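For illustration, the 3.1.0 rename is a package move plus an {{Any}} prefix; a minimal sketch (the suite and test names are examples):

{code:scala}
// ScalaTest 3.0.x style (no longer compiles against 3.1+):
//   import org.scalatest.FunSuite
//   class MySuite extends FunSuite { ... }

// ScalaTest 3.1+ moves each style trait into its own package and renames it:
import org.scalatest.funsuite.AnyFunSuite

class MySuite extends AnyFunSuite {
  test("addition") {
    assert(1 + 1 == 2)
  }
}
{code}

The same pattern applies to the other styles, e.g. org.scalatest.flatspec.AnyFlatSpec and org.scalatest.funspec.AnyFunSpec.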
[jira] [Resolved] (SPARK-31132) Optimized Logical Plan cast('' as timestamp) is null
[ https://issues.apache.org/jira/browse/SPARK-31132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-31132.
----------------------------------
    Resolution: Incomplete

2.3.x is EOL. Please try this on a newer version, and also provide reproducible steps.

> Optimized Logical Plan cast('' as timestamp) is null
> -----------------------------------------------------
>
>                 Key: SPARK-31132
>                 URL: https://issues.apache.org/jira/browse/SPARK-31132
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: duyf
>            Priority: Major
>         Attachments: image-2020-03-12-18-34-22-571.png
>
> The type of receive_time is timestamp; its value is '2020-02-10'.
> SQL:
> select case when receive_time <> '' then 1 else 0 end as tt
> from xxx.xxx
> where order_id = 1234;
> The output is 0, which is wrong. Because cast('' as timestamp) is null, receive_time <> '' evaluates to false.
> !image-2020-03-12-18-34-22-571.png!
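A minimal sketch of the semantics at play (runs in spark-shell; the literal date stands in for receive_time):

{code:scala}
// '' cannot be parsed as a timestamp, so the cast yields NULL; any
// comparison against NULL is NULL, which CASE WHEN treats as not satisfied.
spark.sql("SELECT CAST('' AS TIMESTAMP) AS ts").show()
// -> null

spark.sql(
  "SELECT CASE WHEN CAST('2020-02-10' AS TIMESTAMP) <> '' THEN 1 ELSE 0 END AS tt"
).show()
// -> 0: the empty string on the right is implicitly cast to a NULL timestamp
{code}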
[jira] [Commented] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064557#comment-17064557 ]

Hyukjin Kwon commented on SPARK-31140:
--------------------------------------

Seems like it's very simple to work around. Also, given that the RDD API is virtually frozen now, I don't think it's worthwhile to add this.

> Support Quick sample in RDD
> ----------------------------
>
>                 Key: SPARK-31140
>                 URL: https://issues.apache.org/jira/browse/SPARK-31140
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: deshanxiao
>            Priority: Minor
>
> RDD.sample uses a *filter* over every element to pick up the data we need. This means that if the raw data is very large, we spend too much time reading all of it. We could instead filter out raw partitions up front to speed up sampling.
> {code:java}
> override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = {
>   val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
>   val thisSampler = sampler.clone
>   thisSampler.setSeed(split.seed)
>   thisSampler.sample(firstParent[T].iterator(split.prev, context))
> }
> {code}
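A hedged sketch of the kind of workaround implied above: drop whole partitions first so most of the input is never read, then row-sample the survivors. The helper name and the 10% ratios are illustrative, not a Spark API:

{code:scala}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Keep ~1 in 10 partitions (cheap: skipped partitions are never scanned),
// then sample ~10% of the rows inside the kept partitions.
def quickSample[T: ClassTag](rdd: RDD[T]): RDD[T] =
  rdd
    .mapPartitionsWithIndex(
      (i, it) => if (i % 10 == 0) it else Iterator.empty,
      preservesPartitioning = true)
    .sample(withReplacement = false, fraction = 0.1)
{code}

Note the trade-off: the result is only statistically representative if rows are spread across partitions roughly at random.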
[jira] [Resolved] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-31140.
----------------------------------
    Resolution: Won't Fix

> Support Quick sample in RDD
> ----------------------------
>
>                 Key: SPARK-31140
>                 URL: https://issues.apache.org/jira/browse/SPARK-31140
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: deshanxiao
>            Priority: Minor
>
> RDD.sample uses a *filter* over every element to pick up the data we need. This means that if the raw data is very large, we spend too much time reading all of it. We could instead filter out raw partitions up front to speed up sampling.
> {code:java}
> override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = {
>   val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
>   val thisSampler = sampler.clone
>   thisSampler.setSeed(split.seed)
>   thisSampler.sample(firstParent[T].iterator(split.prev, context))
> }
> {code}
[jira] [Commented] (SPARK-31152) Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()
[ https://issues.apache.org/jira/browse/SPARK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064556#comment-17064556 ]

Hyukjin Kwon commented on SPARK-31152:
--------------------------------------

That error isn't from Spark; nothing here suggests it's an issue in Spark.

> Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-31152
>                 URL: https://issues.apache.org/jira/browse/SPARK-31152
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5, 3.0.0
>        Environment: OS: Windows 10
> Spark: spark-3.0.0-preview2-bin-hadoop2.7
>
> Environment variables:
> SPARK_HOME: C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> HADOOP_HOME: C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> JAVA_HOME: C:\Program Files\Java\jdk1.8.0_191; C:\Program Files\Java\jre1.8.0_241\bin
> PYTHON VERSION: Python 3.7.1
> ANACONDA VERSION: conda 4.8.2
>
> I am not running this pyspark code locally (no Hadoop setup); it is for developing an NLP script.
>
>            Reporter: Rohith Bhattaram
>            Priority: Major
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Issue:
> I am trying to initialize the Spark context using the code below:
>
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> import pandas as pd
> sc = spark.sparkContext
>
> At the getOrCreate() step, the code goes into an infinite loop and never responds, with neither an exception nor a timeout.
>
> On execution, if I check the VS terminal or the Anaconda command prompt, the following statement is displayed:
> [I 19:13:47.973 NotebookApp] Saving file at /LearnPython/Assignment3/NLP.ipynb
> *The filename, directory name, or volume label syntax is incorrect.*
>
> Note: this used to work fine until a month ago; not sure what changed, but it stopped working yesterday.
[jira] [Resolved] (SPARK-31152) Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()
[ https://issues.apache.org/jira/browse/SPARK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-31152.
----------------------------------
    Resolution: Invalid

> Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-31152
>                 URL: https://issues.apache.org/jira/browse/SPARK-31152
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5, 3.0.0
>        Environment: OS: Windows 10
> Spark: spark-3.0.0-preview2-bin-hadoop2.7
>
> Environment variables:
> SPARK_HOME: C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> HADOOP_HOME: C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> JAVA_HOME: C:\Program Files\Java\jdk1.8.0_191; C:\Program Files\Java\jre1.8.0_241\bin
> PYTHON VERSION: Python 3.7.1
> ANACONDA VERSION: conda 4.8.2
>
> I am not running this pyspark code locally (no Hadoop setup); it is for developing an NLP script.
>
>            Reporter: Rohith Bhattaram
>            Priority: Major
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Issue:
> I am trying to initialize the Spark context using the code below:
>
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> import pandas as pd
> sc = spark.sparkContext
>
> At the getOrCreate() step, the code goes into an infinite loop and never responds, with neither an exception nor a timeout.
>
> On execution, if I check the VS terminal or the Anaconda command prompt, the following statement is displayed:
> [I 19:13:47.973 NotebookApp] Saving file at /LearnPython/Assignment3/NLP.ipynb
> *The filename, directory name, or volume label syntax is incorrect.*
>
> Note: this used to work fine until a month ago; not sure what changed, but it stopped working yesterday.