[jira] [Updated] (SPARK-29861) Reduce downtime in Spark standalone HA master switch
[ https://issues.apache.org/jira/browse/SPARK-29861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-29861:
----------------------------------
    Affects Version/s:     (was: 2.2.1)
                           3.0.0

> Reduce downtime in Spark standalone HA master switch
> ----------------------------------------------------
>
>                 Key: SPARK-29861
>                 URL: https://issues.apache.org/jira/browse/SPARK-29861
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Robin Wolters
>            Priority: Minor
>
> As officially stated in the Spark [HA documentation|https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper], the recovery process of the Spark (standalone) master in HA mode with ZooKeeper takes about 1-2 minutes. During this time no Spark master is active, which makes interaction with Spark essentially impossible.
> After looking for a way to reduce this downtime, it seems that it is mainly caused by the leader election, which waits for open ZooKeeper connections to be closed. This is an unnecessary downtime, for example in the case of a planned VM update.
> I have fixed this in my setup by:
> # Closing open ZooKeeper connections during Spark shutdown
> # Bumping the Curator version and implementing a custom error policy that is tolerant of a ZooKeeper connection suspension.
> I am preparing a pull request for review / further discussion on this issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
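The "custom error policy" in item 2 can be illustrated with a self-contained sketch. It is modeled loosely on Curator's ErrorPolicy/ConnectionState idea, but the types below are simplified hypothetical stand-ins, not Apache Curator's actual classes; the point is only that a transient SUSPENDED state should be tolerated, while a definitive session LOST should still count as an error that forces re-election:

```java
// Hypothetical sketch of a suspension-tolerant error policy.
// ConnectionState and ErrorPolicy here are illustrative stand-ins,
// not Apache Curator's real API.
public class LeaderErrorPolicySketch {

    public enum ConnectionState { CONNECTED, SUSPENDED, LOST }

    public interface ErrorPolicy {
        boolean isErrorState(ConnectionState state);
    }

    // Only a definitive session loss triggers leader re-election;
    // a transient SUSPENDED state (e.g. during a planned VM update)
    // is tolerated instead of immediately giving up leadership.
    public static final ErrorPolicy SUSPENSION_TOLERANT =
            state -> state == ConnectionState.LOST;
}
```

Whether this behavior is safe depends on the ZooKeeper session timeout: tolerating SUSPENDED trades faster failover for the risk of briefly acting as leader while disconnected.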
[jira] [Updated] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25392:
----------------------------------
    Affects Version/s:     (was: 2.3.1)
                           3.0.0

> [Spark Job History] Inconsistent behaviour for pool details in Spark web UI and history server page
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25392
>                 URL: https://issues.apache.org/jira/browse/SPARK-25392
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: OS: SUSE 11
>                      Spark Version: 2.3
>            Reporter: ABHISHEK KUMAR GUPTA
>            Priority: Minor
>
> Steps:
> 1. Enable spark.scheduler.mode = FAIR
> 2. Submit beeline jobs:
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> select * from one12;
> 3. Click on the JDBC incomplete application ID in the Job History page
> 4. Go to the Jobs tab in the staged web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL: http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the error below:
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/.
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But under the YARN resource page it displays the summary under Fair Scheduler Pool: default
> URL: https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name | Minimum Share | Pool Weight | Active Stages | Running Tasks | SchedulingMode
> default   | 0             | 1           | 0             | 0             | FIFO
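Step 1 of the repro (enabling the FAIR scheduler) is a plain configuration switch. A minimal sketch, assuming the standard spark-defaults.conf location:

```
# conf/spark-defaults.conf (illustrative path)
spark.scheduler.mode    FAIR
```

The same setting can be passed per job with `--conf spark.scheduler.mode=FAIR` on spark-submit.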
[jira] [Updated] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25392:
----------------------------------
    Component/s:     (was: SQL)
                     Spark Core

> [Spark Job History] Inconsistent behaviour for pool details in Spark web UI and history server page
> [issue description unchanged; quoted in full earlier in this digest]
[jira] [Comment Edited] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613 ]

Dongjoon Hyun edited comment on SPARK-30262 at 12/15/19 3:33 AM:
-----------------------------------------------------------------
Hi, [~southernriver]. Thank you for filing the Jira issue and making a PR. BTW, `Fix Version` and `Target Version` are set when your PR is merged, so please don't fill them in at the beginning.

was (Author: dongjoon):
Hi, @southernriver. Thank you for filing the Jira issue and making a PR. BTW, `Fix Version` and `Target Version` are set when your PR is merged, so please don't fill them in at the beginning.

> Fix NumberFormatException when totalSize is empty
> -------------------------------------------------
>
>                 Key: SPARK-30262
>                 URL: https://issues.apache.org/jira/browse/SPARK-30262
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: chenliang
>            Priority: Major
>         Attachments: screenshot-1.png
>
> We can get the partition statistics info, but in some special cases the info such as totalSize, rawDataSize and rowCount may be empty.
> When we run DDL like
> {code:java}
> desc formatted partition{code}
> the NumberFormatException is shown as below:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, rawDataSize or rowCount, it is unreasonable for a normal SQL statement to throw NumberFormatException for an empty totalSize. We should fix the empty case in readHiveStats.
> Here is the empty case:
> !screenshot-1.png!
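The fix the issue asks for amounts to guarding readHiveStats against empty statistics strings before they reach BigInt. A minimal self-contained sketch of that guard (a hypothetical helper, not Spark's actual code; in Spark, readHiveStats lives in HiveClientImpl and works on Hive table properties):

```java
import java.math.BigInteger;
import java.util.Optional;

// Hypothetical sketch of the guard: treat a missing or empty Hive statistic
// property (e.g. totalSize, rawDataSize, rowCount) as "no statistic" instead
// of passing the empty string to BigInteger/BigInt, which throws
// "NumberFormatException: Zero length BigInteger".
public class HiveStatGuard {

    // Returns the parsed statistic, or Optional.empty() when the raw
    // property is null, blank, or not a valid integer.
    public static Optional<BigInteger> parseStat(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new BigInteger(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

With this shape, an unanalyzed partition simply reports no statistics, and `desc formatted` can render the row without failing.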
[jira] [Comment Edited] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613 ]

Dongjoon Hyun edited comment on SPARK-30262 at 12/15/19 3:33 AM:
-----------------------------------------------------------------
Hi, @southernriver. Thank you for filing the Jira issue and making a PR. BTW, `Fix Version` and `Target Version` are set when your PR is merged, so please don't fill them in at the beginning.

was (Author: dongjoon):
Hi, @chenliang. Thank you for filing the Jira issue and making a PR. BTW, `Fix Version` and `Target Version` are set when your PR is merged, so please don't fill them in at the beginning.

> Fix NumberFormatException when totalSize is empty
> [issue description and stack trace unchanged; quoted in full earlier in this digest]
[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30262:
----------------------------------
    Description: the opening qualifier "For Spark2.3.0+," was removed; the description is otherwise unchanged [full text and stack trace quoted earlier in this digest]
[jira] [Commented] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613 ]

Dongjoon Hyun commented on SPARK-30262:
---------------------------------------
Hi, @chenliang. Thank you for filing the Jira issue and making a PR. BTW, `Fix Version` and `Target Version` are set when your PR is merged, so please don't fill them in at the beginning.

> Fix NumberFormatException when totalSize is empty
> [issue description and stack trace unchanged; quoted in full earlier in this digest]
[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30262:
----------------------------------
    Fix Version/s:     (was: 2.4.3)
                       (was: 2.3.2)

> Fix NumberFormatException when totalSize is empty
> [issue description and stack trace unchanged; quoted in full earlier in this digest]
[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30262:
----------------------------------
    Target Version/s:     (was: 2.3.2, 2.4.3)

> Fix NumberFormatException when totalSize is empty
> [issue description and stack trace unchanged; quoted in full earlier in this digest]
[jira] [Updated] (SPARK-30212) COUNT(DISTINCT) window function should be supported
[ https://issues.apache.org/jira/browse/SPARK-30212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kernel Force updated SPARK-30212: - Description: Suppose we have a typical table in Hive like below: {code:sql} CREATE TABLE DEMO_COUNT_DISTINCT ( demo_date string, demo_id string ); {code} {noformat} ++--+ | demo_count_distinct.demo_date | demo_count_distinct.demo_id | ++--+ | 20180301 | 101 | | 20180301 | 102 | | 20180301 | 103 | | 20180401 | 201 | | 20180401 | 202 | ++--+ {noformat} Now I want to count distinct number of DEMO_DATE but also reserve every columns' data in each row. So I use COUNT(DISTINCT) window function (which is also common in other mainstream databases like Oracle) in Hive beeline and it work: {code:sql} SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES FROM DEMO_COUNT_DISTINCT T; {code} {noformat} +--++-+ | t.demo_date | t.demo_id | uniq_dates | +--++-+ | 20180401 | 202 | 2 | | 20180401 | 201 | 2 | | 20180301 | 103 | 2 | | 20180301 | 102 | 2 | | 20180301 | 101 | 2 | +--++-+ {noformat} But when I came to SparkSQL, it threw exception even if I run the same SQL. 
{code:sql} spark.sql(""" SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES FROM DEMO_COUNT_DISTINCT T """).show {code} {noformat} org.apache.spark.sql.AnalysisException: Distinct window functions are not supported: count(distinct DEMO_DATE#1) windowspecdefinition(null, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()));; Project [demo_date#1, demo_id#2, UNIQ_DATES#0L] +- Project [demo_date#1, demo_id#2, UNIQ_DATES#0L, UNIQ_DATES#0L] +- Window [count(distinct DEMO_DATE#1) windowspecdefinition(null, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS UNIQ_DATES#0L], [null] +- Project [demo_date#1, demo_id#2] +- SubqueryAlias `T` +- SubqueryAlias `default`.`demo_count_distinct` +- HiveTableRelation `default`.`demo_count_distinct`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [demo_date#1, demo_id#2] {noformat} Then I try to use countDistinct function but also got exceptions. {code:sql} spark.sql(""" SELECT T.*, countDistinct(T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES FROM DEMO_COUNT_DISTINCT T """).show {code} {noformat} org.apache.spark.sql.AnalysisException: Undefined function: 'countDistinct'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 12 at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279) at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279) at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53) .. 
{noformat}
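Until distinct window aggregates are supported, a commonly used workaround (not from this ticket, so treat it as a sketch) is to collect the distinct values into a set with a window aggregate and take its size; {{collect_set}} and {{size}} are both standard Spark SQL functions:

{code:sql}
-- Equivalent to COUNT(DISTINCT demo_date) OVER (PARTITION BY NULL):
-- collect_set gathers the distinct demo_date values per partition,
-- and size counts them.
SELECT T.*,
       SIZE(COLLECT_SET(T.DEMO_DATE) OVER (PARTITION BY NULL)) AS UNIQ_DATES
FROM DEMO_COUNT_DISTINCT T;
{code}

This avoids the unsupported distinct-count window while producing the same UNIQ_DATES column.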
[jira] [Resolved] (SPARK-30240) Spark UI redirects do not always work behind (dumb) proxies
[ https://issues.apache.org/jira/browse/SPARK-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30240. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26873 [https://github.com/apache/spark/pull/26873] > Spark UI redirects do not always work behind (dumb) proxies > --- > > Key: SPARK-30240 > URL: https://issues.apache.org/jira/browse/SPARK-30240 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Marcelo Masiero Vanzin >Assignee: Marcelo Masiero Vanzin >Priority: Minor > Fix For: 3.0.0 > > > Spark's support for proxy servers allows the code to prepend a prefix to URIs > generated by Spark pages. But if Spark sends a redirect to the client, then > Spark's own full URL is exposed. If the client cannot access that URL, or > it's incorrect for whatever reason, then things do not work. > For example, if you set up an stunnel HTTPS proxy on port 4443, and get the > root of the Spark UI, you get this back (with all the TLS stuff stripped): > {noformat} > $ curl -v -k https://vanzin-t460p:4443/ > * Trying 127.0.1.1... > * Connected to vanzin-t460p (127.0.1.1) port 4443 (#0) > > GET / HTTP/1.1 > > Host: vanzin-t460p:4443 > > User-Agent: curl/7.58.0 > > Accept: */* > > > < HTTP/1.1 302 Found > < Date: Thu, 12 Dec 2019 22:09:52 GMT > < Cache-Control: no-cache, no-store, must-revalidate > < X-Frame-Options: SAMEORIGIN > < X-XSS-Protection: 1; mode=block > < X-Content-Type-Options: nosniff > < Location: http://vanzin-t460p:4443/jobs/ > < Content-Length: 0 > < Server: Jetty(9.4.18.v20190429) > {noformat} > So you can see that Jetty respects the "Host" header, but that has no > information about the protocol, and Spark has no idea that the proxy is using > HTTPS. So the returned URL does not work. > > Some proxies are smart enough to rewrite responses, but it would be nice (and > pretty easy) for Spark to support this simple use case. 
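As the report notes, some proxies can rewrite responses themselves. As an interim mitigation (this depends on the deployment and is an assumption, not part of the Spark fix), a reverse proxy that terminates TLS can rewrite the {{Location}} header it receives from Jetty, for example with nginx:

{code}
# Hypothetical nginx front-end for a Spark UI on vanzin-t460p:4040.
# proxy_redirect rewrites Location headers that point back at the
# proxy with the wrong scheme, so the client keeps using HTTPS.
server {
    listen 4443 ssl;
    server_name vanzin-t460p;

    location / {
        proxy_pass http://vanzin-t460p:4040/;
        proxy_set_header Host $host:$server_port;
        proxy_redirect http://vanzin-t460p:4443/ https://vanzin-t460p:4443/;
    }
}
{code}

A plain TCP tunnel like stunnel cannot rewrite headers, which is why handling this inside Spark is useful.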
[jira] [Assigned] (SPARK-30240) Spark UI redirects do not always work behind (dumb) proxies
[ https://issues.apache.org/jira/browse/SPARK-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30240: - Assignee: Marcelo Masiero Vanzin > Spark UI redirects do not always work behind (dumb) proxies > --- > > Key: SPARK-30240 > URL: https://issues.apache.org/jira/browse/SPARK-30240 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Marcelo Masiero Vanzin >Assignee: Marcelo Masiero Vanzin >Priority: Minor > > Spark's support for proxy servers allows the code to prepend a prefix to URIs > generated by Spark pages. But if Spark sends a redirect to the client, then > Spark's own full URL is exposed. If the client cannot access that URL, or > it's incorrect for whatever reason, then things do not work. > For example, if you set up an stunnel HTTPS proxy on port 4443, and get the > root of the Spark UI, you get this back (with all the TLS stuff stripped): > {noformat} > $ curl -v -k https://vanzin-t460p:4443/ > * Trying 127.0.1.1... > * Connected to vanzin-t460p (127.0.1.1) port 4443 (#0) > > GET / HTTP/1.1 > > Host: vanzin-t460p:4443 > > User-Agent: curl/7.58.0 > > Accept: */* > > > < HTTP/1.1 302 Found > < Date: Thu, 12 Dec 2019 22:09:52 GMT > < Cache-Control: no-cache, no-store, must-revalidate > < X-Frame-Options: SAMEORIGIN > < X-XSS-Protection: 1; mode=block > < X-Content-Type-Options: nosniff > < Location: http://vanzin-t460p:4443/jobs/ > < Content-Length: 0 > < Server: Jetty(9.4.18.v20190429) > {noformat} > So you can see that Jetty respects the "Host" header, but that has no > information about the protocol, and Spark has no idea that the proxy is using > HTTPS. So the returned URL does not work. > > Some proxies are smart enough to rewrite responses, but it would be nice (and > pretty easy) for Spark to support this simple use case. 
[jira] [Resolved] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead job failed
[ https://issues.apache.org/jira/browse/SPARK-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25100. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26714 [https://github.com/apache/spark/pull/26714] > Using KryoSerializer and setting registrationRequired true can lead job failed > -- > > Key: SPARK-25100 > URL: https://issues.apache.org/jira/browse/SPARK-25100 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: deshanxiao >Assignee: deshanxiao >Priority: Major > Fix For: 3.0.0 > > > When spark.serializer is `org.apache.spark.serializer.KryoSerializer` and > `spark.kryo.registrationRequired` is true in SparkConf. I invoked > saveAsNewAPIHadoopDataset to store data in hdfs. The job will fail because > the class TaskCommitMessage hasn't be registered. > > {code:java} > java.lang.IllegalArgumentException: Class is not registered: > org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage > Note: To register this class use: > kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class); > at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488) > at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at > org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead job failed
[ https://issues.apache.org/jira/browse/SPARK-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-25100: - Assignee: deshanxiao > Using KryoSerializer and setting registrationRequired true can lead job failed > -- > > Key: SPARK-25100 > URL: https://issues.apache.org/jira/browse/SPARK-25100 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: deshanxiao >Assignee: deshanxiao >Priority: Major > > When spark.serializer is `org.apache.spark.serializer.KryoSerializer` and > `spark.kryo.registrationRequired` is true in SparkConf. I invoked > saveAsNewAPIHadoopDataset to store data in hdfs. The job will fail because > the class TaskCommitMessage hasn't be registered. > > {code:java} > java.lang.IllegalArgumentException: Class is not registered: > org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage > Note: To register this class use: > kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class); > at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488) > at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at > org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
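Before the fix, a user-side workaround is to register the missing class explicitly via {{spark.kryo.classesToRegister}} (a sketch assuming you control the submitting configuration; the fix itself registers such internal classes inside Spark so users do not have to):

{code}
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrationRequired=true \
  --conf spark.kryo.classesToRegister=org.apache.spark.internal.io.FileCommitProtocol\$TaskCommitMessage \
  ...
{code}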
[jira] [Assigned] (SPARK-30259) CREATE TABLE throw error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30259: - Assignee: Hu Fuwang > CREATE TABLE throw error when session catalog specified > --- > > Key: SPARK-30259 > URL: https://issues.apache.org/jira/browse/SPARK-30259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Assignee: Hu Fuwang >Priority: Major > > Spark throw error when the session catalog is specified explicitly in "CREATE > TABLE" and "CREATE TABLE AS SELECT" command, eg. > {code:java} > CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i; > {code} > the error message is like below: > {noformat} > 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl > 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table > : db=spark_catalog tbl=tbl > 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog > 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr > cmd=get_database: spark_catalog > 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, > returning NoSuchObjectException > Error in query: Database 'spark_catalog' not found;{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
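Until this is fixed, a workaround (a sketch based on the report, not on the patch) is to omit the explicit session-catalog qualifier, which resolves to the same catalog implicitly:

{code:sql}
-- Works: the session catalog is used implicitly instead of being
-- misinterpreted as a database named 'spark_catalog'.
CREATE TABLE tbl USING json AS SELECT 1 AS i;
{code}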
[jira] [Resolved] (SPARK-30259) CREATE TABLE throw error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30259. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26887 [https://github.com/apache/spark/pull/26887] > CREATE TABLE throw error when session catalog specified > --- > > Key: SPARK-30259 > URL: https://issues.apache.org/jira/browse/SPARK-30259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Assignee: Hu Fuwang >Priority: Major > Fix For: 3.0.0 > > > Spark throw error when the session catalog is specified explicitly in "CREATE > TABLE" and "CREATE TABLE AS SELECT" command, eg. > {code:java} > CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i; > {code} > the error message is like below: > {noformat} > 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl > 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table > : db=spark_catalog tbl=tbl > 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog > 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr > cmd=get_database: spark_catalog > 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, > returning NoSuchObjectException > Error in query: Database 'spark_catalog' not found;{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests
[ https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996571#comment-16996571 ] Attila Zsolt Piros commented on SPARK-27021: [~roncenzhao] it seems to me you bumped into https://issues.apache.org/jira/browse/SPARK-26418 > Leaking Netty event loop group for shuffle chunk fetch requests > --- > > Key: SPARK-27021 > URL: https://issues.apache.org/jira/browse/SPARK-27021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-12-14-23-23-50-384.png > > > The extra event loop group created for handling shuffle chunk fetch requests > are never closed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30263) Don't log values of ignored non-Spark properties
[ https://issues.apache.org/jira/browse/SPARK-30263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30263. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26893 [https://github.com/apache/spark/pull/26893] > Don't log values of ignored non-Spark properties > > > Key: SPARK-30263 > URL: https://issues.apache.org/jira/browse/SPARK-30263 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > Comment per Aaron Steers: > Is it expected that this error would print aws security keys to log files? > Seems like a serious security concern. > {code} > Warning: Ignoring non-spark config property: > fs.s3a.access.key={full-access-key} > Warning: Ignoring non-spark config property: > fs.s3a.secret.key={full-secret-key} > {code} > Could we not accomplish the same thing by printing the name of the key > without the key's value? > I think we can also redact these, but, also no big reason to log the value of > ignored properties here anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
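The agreed behaviour (log the property name, never its value) can be sketched in a few lines. This is an illustration of the logging policy, not the actual Spark code; the function name is made up for the example:

```python
def warn_ignored_non_spark_props(props):
    """Return warning lines for ignored non-Spark properties.

    Only the property *name* is logged, never its value, so secrets
    such as fs.s3a.secret.key cannot leak into log files.
    """
    return [
        "Warning: Ignoring non-spark config property: %s" % key
        for key in props
        if not key.startswith("spark.")
    ]
```

With the keys from the report, this emits only "Warning: Ignoring non-spark config property: fs.s3a.access.key" and the like, with no value attached.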
[jira] [Assigned] (SPARK-30066) Columnar execution support for interval types
[ https://issues.apache.org/jira/browse/SPARK-30066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30066: - Assignee: Kent Yao > Columnar execution support for interval types > - > > Key: SPARK-30066 > URL: https://issues.apache.org/jira/browse/SPARK-30066 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Columnar execution support for interval types -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30066) Columnar execution support for interval types
[ https://issues.apache.org/jira/browse/SPARK-30066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30066. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26699 [https://github.com/apache/spark/pull/26699] > Columnar execution support for interval types > - > > Key: SPARK-30066 > URL: https://issues.apache.org/jira/browse/SPARK-30066 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Columnar execution support for interval types -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30236) Clarify date and time patterns supported by date_format
[ https://issues.apache.org/jira/browse/SPARK-30236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30236: - Assignee: John Ayad > Clarify date and time patterns supported by date_format > --- > > Key: SPARK-30236 > URL: https://issues.apache.org/jira/browse/SPARK-30236 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: John Ayad >Assignee: John Ayad >Priority: Major > > Docs for {{date_format}} do not specify which date & time format patterns are > supported, leading to such problems as the one reported in this StackOverflow > [question|https://stackoverflow.com/questions/54496878/date-format-conversion-is-adding-1-year-to-the-border-dates]. > Would appreciate linking in {{date_format}}'s docs to the Java class we're > following in the date/time patterns. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30236) Clarify date and time patterns supported by date_format
[ https://issues.apache.org/jira/browse/SPARK-30236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30236. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26864 [https://github.com/apache/spark/pull/26864] > Clarify date and time patterns supported by date_format > --- > > Key: SPARK-30236 > URL: https://issues.apache.org/jira/browse/SPARK-30236 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: John Ayad >Assignee: John Ayad >Priority: Major > Fix For: 3.0.0 > > > Docs for {{date_format}} do not specify which date & time format patterns are > supported, leading to such problems as the one reported in this StackOverflow > [question|https://stackoverflow.com/questions/54496878/date-format-conversion-is-adding-1-year-to-the-border-dates]. > Would appreciate linking in {{date_format}}'s docs to the Java class we're > following in the date/time patterns. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
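The StackOverflow problem referenced above is the classic week-year pitfall in the Java pattern letters that {{date_format}} follows: uppercase 'Y' is the week-based year, not the calendar year 'y', so dates at the year border shift by one. The same semantics can be illustrated with Python's ISO calendar (an illustration of the pattern-letter semantics, not Spark code):

```python
from datetime import date

def week_based_year(d):
    """ISO week-based year for a date.

    Java's 'Y' pattern letter (week year) behaves like this at year
    borders, while 'y' is the plain calendar year.
    """
    return d.isocalendar()[0]

# 2018-12-31 falls in week 1 of 2019, so formatting it with 'YYYY'
# yields 2019 while 'yyyy' yields 2018 - the "+1 year" border surprise.
```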
[jira] [Updated] (SPARK-29245) CCE during creating HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-29245: Target Version/s: 3.0.0 > CCE during creating HiveMetaStoreClient > > > Key: SPARK-29245 > URL: https://issues.apache.org/jira/browse/SPARK-29245 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > > From `master` branch build, when I try to connect to an external HMS, I hit > the following. > {code} > 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException > class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; > ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader > 'bootstrap') > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to > class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module > java.base of loader 'bootstrap') > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > {code} > With HIVE-21508, I can get the following. > {code} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("show databases").show > ++ > |databaseName| > ++ > | . | > ... > {code} > With 2.3.7-SNAPSHOT, the following basic tests are tested. > - SHOW DATABASES / TABLES > - DESC DATABASE / TABLE > - CREATE / DROP / USE DATABASE > - CREATE / DROP / INSERT / LOAD / SELECT TABLE -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests
[ https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996445#comment-16996445 ] roncenzhao commented on SPARK-27021: [~attilapiros] Thank you. We have one problem about the memory leak of `StreamState` in `OneForOneStreamManager` which cause the `NodeManager` OOM. Most of the memory in NM is used by `StreamState`, like this: !image-2019-12-14-23-23-50-384.png! This may be caused by the shuffle service because we find the `StreamState` include some application which were already finished. Would you have any idea about this problem? Thanks~ > Leaking Netty event loop group for shuffle chunk fetch requests > --- > > Key: SPARK-27021 > URL: https://issues.apache.org/jira/browse/SPARK-27021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-12-14-23-23-50-384.png > > > The extra event loop group created for handling shuffle chunk fetch requests > are never closed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests
[ https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] roncenzhao updated SPARK-27021: --- Attachment: image-2019-12-14-23-23-50-384.png > Leaking Netty event loop group for shuffle chunk fetch requests > --- > > Key: SPARK-27021 > URL: https://issues.apache.org/jira/browse/SPARK-27021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-12-14-23-23-50-384.png > > > The extra event loop group created for handling shuffle chunk fetch requests > are never closed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30263) Don't log values of ignored non-Spark properties
Sean R. Owen created SPARK-30263: Summary: Don't log values of ignored non-Spark properties Key: SPARK-30263 URL: https://issues.apache.org/jira/browse/SPARK-30263 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 3.0.0 Reporter: Sean R. Owen Assignee: Sean R. Owen Comment per Aaron Steers: Is it expected that this error would print aws security keys to log files? Seems like a serious security concern. {code} Warning: Ignoring non-spark config property: fs.s3a.access.key={full-access-key} Warning: Ignoring non-spark config property: fs.s3a.secret.key={full-secret-key} {code} Could we not accomplish the same thing by printing the name of the key without the key's value? I think we can also redact these, but, also no big reason to log the value of ignored properties here anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenliang updated SPARK-30262: -- Attachment: screenshot-1.png > Fix NumberFormatException when totalSize is empty > -- > > Key: SPARK-30262 > URL: https://issues.apache.org/jira/browse/SPARK-30262 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: chenliang >Priority: Major > Fix For: 2.3.2, 2.4.3 > > Attachments: screenshot-1.png > > > For Spark2.3.0+, we could get the Partitions Statistics Info.But in some > specail case, The Info like totalSize,rawDataSize,rowCount maybe empty. > When we do some ddls like > {code:java} > desc formatted partition{code} > ,the NumberFormatException is showed as below: > {code:java} > spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', > hour='23'); > 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 > partition(year='2019', month='10', day='17', hour='23')] > java.lang.NumberFormatException: Zero length BigInteger > at java.math.BigInteger.(BigInteger.java:411) > at java.math.BigInteger.(BigInteger.java:597) > at scala.math.BigInt$.apply(BigInt.scala:77) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056) > at > org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) > at 
scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656) > at > org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84) > at > org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124) > {code} > Although we can use 'Analyze table partition ' to update the > totalSize,rawDataSize or rowCount, it's unresonable for normal SQL to throw > NumberFormatException for Empty totalSize.We should fix the empty case when > readHiveStats. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
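The proposed fix is to treat an empty statistic as absent instead of feeding it to BigInt. The intended contract of readHiveStats can be sketched as follows (a Python sketch of the behaviour, not the Scala patch itself):

```python
def read_hive_stat(value):
    """Parse a Hive table/partition statistic such as totalSize.

    Hive stores statistics as strings; an unset statistic may be
    missing, empty, or non-positive, and should be treated as absent
    rather than raising NumberFormatException as in the report.
    Only positive values are kept, mirroring how Spark discards
    non-positive statistics.
    """
    if value is None or value.strip() == "":
        return None
    n = int(value)
    return n if n > 0 else None
```

With this contract, an empty totalSize simply yields no statistic, and "desc formatted ... partition(...)" succeeds instead of failing with "Zero length BigInteger".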
[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty
[ https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenliang updated SPARK-30262: -- Description: For Spark 2.3.0+, we can get the partition statistics info. But in some special cases, info such as totalSize, rawDataSize, or rowCount may be empty. When we run DDL like {code:java} desc formatted partition{code} , the NumberFormatException is shown as below: {code:java} spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23'); 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')] java.lang.NumberFormatException: Zero length BigInteger at java.math.BigInteger.<init>(BigInteger.java:411) at java.math.BigInteger.<init>(BigInteger.java:597) at scala.math.BigInt$.apply(BigInt.scala:77) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056) at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281) at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656) at org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84) at org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124) {code} Although we can use 'Analyze table partition ' to update the totalSize,rawDataSize or rowCount, it's unresonable for normal SQL to throw NumberFormatException for Empty totalSize.We should fix the empty case when readHiveStats. Here is the empty case: !screenshot-1.png! was: For Spark2.3.0+, we could get the Partitions Statistics Info.But in some specail case, The Info like totalSize,rawDataSize,rowCount maybe empty. 
When we do some ddls like {code:java} desc formatted partition{code} ,the NumberFormatException is showed as below: {code:java} spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23'); 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')] java.lang.NumberFormatException: Zero length BigInteger at java.math.BigInteger.(BigInteger.java:411) at java.math.BigInteger.(BigInteger.java:597) at scala.math.BigInt$.apply(BigInt.scala:77) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056) at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656) at
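The root cause visible in the trace is that java.math.BigInteger rejects an empty string ("Zero length BigInteger"), so converting an empty totalSize value blows up. Below is a minimal sketch of the kind of defensive parsing the fix calls for in readHiveStats; the helper name `parseStat`, the `Optional` return type, and the standalone class are illustrative assumptions, not Spark's actual code:

```java
import java.math.BigInteger;
import java.util.Optional;

public class SafeStats {
    // Hypothetical helper: treat a missing, empty, or malformed statistic
    // value as absent instead of letting BigInteger throw.
    static Optional<BigInteger> parseStat(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();  // empty totalSize -> no statistic
        }
        try {
            return Optional.of(new BigInteger(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();  // non-numeric garbage -> no statistic
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStat("1024"));  // present: 1024
        System.out.println(parseStat(""));      // absent, no exception thrown
    }
}
```

With a guard like this, `desc formatted ... partition(...)` on a partition whose totalSize was never populated would simply report no statistics instead of failing the whole query.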