[jira] [Updated] (SPARK-29861) Reduce downtime in Spark standalone HA master switch

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29861:
--
Affects Version/s: (was: 2.2.1)
   3.0.0

> Reduce downtime in Spark standalone HA master switch
> 
>
> Key: SPARK-29861
> URL: https://issues.apache.org/jira/browse/SPARK-29861
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Robin Wolters
>Priority: Minor
>
> As officially stated in the Spark [HA 
> documentation|https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper],
>  the recovery process of a Spark (standalone) master in HA mode with 
> ZooKeeper takes about 1-2 minutes. During this time no Spark master is 
> active, which makes interaction with Spark essentially impossible. 
> After looking for a way to reduce this downtime, it seems to be caused mainly 
> by the leader election, which waits for open ZooKeeper connections to be 
> closed. This downtime seems unnecessary, for example in the case of a 
> planned VM update.
> I have fixed this in my setup by:
>  # Closing open ZooKeeper connections during Spark shutdown
>  # Bumping the Curator version and implementing a custom error policy that is 
> tolerant of a ZooKeeper connection suspension.
> I am preparing a pull request for review / further discussion on this issue.
>  
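Fix (1) above can be illustrated with a small sketch. This is a hypothetical stand-in, not Spark's or Curator's actual code: the point is that an explicitly closed ZooKeeper session releases its ephemeral leader node immediately, while an abandoned session blocks the election until the session timeout expires. `ZkSession` and `failoverDelaySeconds` are invented names for illustration.

```java
// Hypothetical sketch (not Spark/Curator code): why closing the ZooKeeper
// session during shutdown shortens failover. An explicitly closed session
// drops its ephemeral leader node at once; an abandoned one holds up the
// election until ZooKeeper expires the dead session.
public class FailoverSketch {
    static class ZkSession {
        boolean closed = false;
        void close() { closed = true; }  // ephemeral nodes removed immediately
    }

    /** Seconds a standby master must wait before it can win the election. */
    static int failoverDelaySeconds(ZkSession oldLeader, int sessionTimeoutSec) {
        return oldLeader.closed ? 0 : sessionTimeoutSec;
    }

    public static void main(String[] args) {
        ZkSession leader = new ZkSession();
        System.out.println(failoverDelaySeconds(leader, 60)); // crash: wait 60s
        leader.close();                                       // graceful stop
        System.out.println(failoverDelaySeconds(leader, 60)); // prints 0
    }
}
```

Under this model, a planned shutdown that closes the session skips the timeout wait entirely, which is the downtime the reporter is targeting.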



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25392:
--
Affects Version/s: (was: 2.3.1)
   3.0.0

> [Spark Job History]Inconsistent behaviour for pool details in spark web UI 
> and history server page 
> ---
>
> Key: SPARK-25392
> URL: https://issues.apache.org/jira/browse/SPARK-25392
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: OS: SUSE 11
> Spark Version: 2.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> Steps:
> 1. Enable spark.scheduler.mode = FAIR
> 2. Submit beeline jobs:
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> Select * from one12;
> 3. Click on the incomplete JDBC application ID on the Job History page
> 4. Go to the Jobs tab of the staged web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the error below:
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/. 
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But the YARN resource page displays the summary under Fair Scheduler Pool: 
> default
> URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name | Minimum Share | Pool Weight | Active Stages | Running Tasks | SchedulingMode
> default   | 0             | 1           | 0             | 0             | FIFO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25392:
--
Component/s: (was: SQL)
 Spark Core

> [Spark Job History]Inconsistent behaviour for pool details in spark web UI 
> and history server page 
> ---
>
> Key: SPARK-25392
> URL: https://issues.apache.org/jira/browse/SPARK-25392
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: OS: SUSE 11
> Spark Version: 2.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> Steps:
> 1. Enable spark.scheduler.mode = FAIR
> 2. Submit beeline jobs:
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> Select * from one12;
> 3. Click on the incomplete JDBC application ID on the Job History page
> 4. Go to the Jobs tab of the staged web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the error below:
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/. 
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But the YARN resource page displays the summary under Fair Scheduler Pool: 
> default
> URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name | Minimum Share | Pool Weight | Active Stages | Running Tasks | SchedulingMode
> default   | 0             | 1           | 0             | 0             | FIFO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613
 ] 

Dongjoon Hyun edited comment on SPARK-30262 at 12/15/19 3:33 AM:
-

Hi, [~southernriver]. Thank you for filing the Jira issue and making a PR.

BTW, `Fix Version` and `Target Version` are set when your PR is merged, so 
please don't fill them in at the beginning.


was (Author: dongjoon):
Hi, @southernriver. Thank you for filing the Jira issue and making a PR.

BTW, `Fix Version` and `Target Version` are set when your PR is merged, so 
please don't fill them in at the beginning.

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> We can get the partition statistics info, but in some special cases the 
> info like totalSize, rawDataSize, or rowCount may be empty. When we run a 
> DDL such as
> {code:java}
> desc formatted partition{code}
> the NumberFormatException below is thrown:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
> NumberFormatException for an empty totalSize. We should fix the empty case in 
> readHiveStats.
> Here is the empty case:
>  !screenshot-1.png!
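The fix proposed in the last paragraph can be sketched as a guard in the stats-reading path. A minimal, hypothetical illustration (the names below are mine, not Spark's actual `readHiveStats` signature): treat an absent value and an empty value the same way, instead of handing "" to BigInteger.

```java
// Hypothetical sketch (not Spark's actual readHiveStats): an absent OR empty
// totalSize/rawDataSize/rowCount becomes "no statistic" instead of producing
// "NumberFormatException: Zero length BigInteger" via new BigInteger("").
import java.math.BigInteger;
import java.util.Map;
import java.util.Optional;

public class HiveStatsSketch {
    static Optional<BigInteger> readStat(Map<String, String> params, String key) {
        String raw = params.get(key);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();   // empty totalSize: report "unknown"
        }
        return Optional.of(new BigInteger(raw.trim()));
    }

    public static void main(String[] args) {
        Map<String, String> p = Map.of("totalSize", "", "rowCount", "42");
        System.out.println(readStat(p, "totalSize"));  // prints Optional.empty
        System.out.println(readStat(p, "rowCount"));   // prints Optional[42]
    }
}
```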



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613
 ] 

Dongjoon Hyun edited comment on SPARK-30262 at 12/15/19 3:33 AM:
-

Hi, @southernriver. Thank you for filing the Jira issue and making a PR.

BTW, `Fix Version` and `Target Version` are set when your PR is merged, so 
please don't fill them in at the beginning.


was (Author: dongjoon):
Hi, @chenliang. Thank you for filing the Jira issue and making a PR.

BTW, `Fix Version` and `Target Version` are set when your PR is merged, so 
please don't fill them in at the beginning.

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> We can get the partition statistics info, but in some special cases the 
> info like totalSize, rawDataSize, or rowCount may be empty. When we run a 
> DDL such as
> {code:java}
> desc formatted partition{code}
> the NumberFormatException below is thrown:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
> NumberFormatException for an empty totalSize. We should fix the empty case in 
> readHiveStats.
> Here is the empty case:
>  !screenshot-1.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30262:
--
Description: 
We can get the partition statistics info, but in some special cases the info 
like totalSize, rawDataSize, or rowCount may be empty. When we run a DDL such as
{code:java}
desc formatted partition{code}
the NumberFormatException below is thrown:
{code:java}
spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
hour='23');
19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
partition(year='2019', month='10', day='17', hour='23')]
java.lang.NumberFormatException: Zero length BigInteger
at java.math.BigInteger.<init>(BigInteger.java:411)
at java.math.BigInteger.<init>(BigInteger.java:597)
at scala.math.BigInt$.apply(BigInt.scala:77)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
{code}
Although we can use 'Analyze table partition' to update totalSize, 
rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
NumberFormatException for an empty totalSize. We should fix the empty case in 
readHiveStats.

Here is the empty case:
 !screenshot-1.png!

  was:
For Spark 2.3.0+, we can get the partition statistics info, but in some 
special cases the info like totalSize, rawDataSize, or rowCount may be empty. 
When we run a DDL such as
{code:java}
desc formatted partition{code}
the NumberFormatException below is thrown:
{code:java}
spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
hour='23');
19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
partition(year='2019', month='10', day='17', hour='23')]
java.lang.NumberFormatException: Zero length BigInteger
at java.math.BigInteger.<init>(BigInteger.java:411)
at java.math.BigInteger.<init>(BigInteger.java:597)
at scala.math.BigInt$.apply(BigInt.scala:77)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
at 

[jira] [Commented] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996613#comment-16996613
 ] 

Dongjoon Hyun commented on SPARK-30262:
---

Hi, @chenliang. Thank you for filing the Jira issue and making a PR.

BTW, `Fix Version` and `Target Version` are set when your PR is merged, so 
please don't fill them in at the beginning.

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> For Spark 2.3.0+, we can get the partition statistics info, but in some 
> special cases the info like totalSize, rawDataSize, or rowCount may be 
> empty. When we run a DDL such as
> {code:java}
> desc formatted partition{code}
> the NumberFormatException below is thrown:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
> NumberFormatException for an empty totalSize. We should fix the empty case in 
> readHiveStats.
> Here is the empty case:
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30262:
--
Fix Version/s: (was: 2.4.3)
   (was: 2.3.2)

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> For Spark 2.3.0+, we can get the partition statistics info, but in some 
> special cases the info like totalSize, rawDataSize, or rowCount may be 
> empty. When we run a DDL such as
> {code:java}
> desc formatted partition{code}
> the NumberFormatException below is thrown:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
> NumberFormatException for an empty totalSize. We should fix the empty case in 
> readHiveStats.
> Here is the empty case:
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30262:
--
Target Version/s:   (was: 2.3.2, 2.4.3)

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> For Spark 2.3.0+, we can get the partition statistics info, but in some 
> special cases the info like totalSize, rawDataSize, or rowCount may be 
> empty. When we run a DDL such as
> {code:java}
> desc formatted partition{code}
> the NumberFormatException below is thrown:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:411)
> at java.math.BigInteger.<init>(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for normal SQL to throw a 
> NumberFormatException for an empty totalSize. We should fix the empty case in 
> readHiveStats.
> Here is the empty case:
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30212) COUNT(DISTINCT) window function should be supported

2019-12-14 Thread Kernel Force (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kernel Force updated SPARK-30212:
-
Description: 
Suppose we have a typical table in Hive like below:

{code:sql}
CREATE TABLE DEMO_COUNT_DISTINCT (
demo_date string,
demo_id string
);
{code}

{noformat}
++--+
| demo_count_distinct.demo_date | demo_count_distinct.demo_id |
++--+
| 20180301 | 101 |
| 20180301 | 102 |
| 20180301 | 103 |
| 20180401 | 201 |
| 20180401 | 202 |
++--+
{noformat}


Now I want to count the distinct number of DEMO_DATE values but also preserve 
every column's data in each row.
So I use the COUNT(DISTINCT) window function (which is also common in other 
mainstream databases like Oracle) in Hive beeline, and it works:

{code:sql}
SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES
 FROM DEMO_COUNT_DISTINCT T;
{code}

{noformat}
+--++-+
| t.demo_date | t.demo_id | uniq_dates |
+--++-+
| 20180401 | 202 | 2 |
| 20180401 | 201 | 2 |
| 20180301 | 103 | 2 |
| 20180301 | 102 | 2 |
| 20180301 | 101 | 2 |
+--++-+
{noformat}


But when I came to Spark SQL, it threw an exception even though I ran the same SQL.

{code:sql}
spark.sql("""
SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES
 FROM DEMO_COUNT_DISTINCT T
""").show
{code}

{noformat}
org.apache.spark.sql.AnalysisException: Distinct window functions are not 
supported: count(distinct DEMO_DATE#1) windowspecdefinition(null, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()));;
Project [demo_date#1, demo_id#2, UNIQ_DATES#0L]
+- Project [demo_date#1, demo_id#2, UNIQ_DATES#0L, UNIQ_DATES#0L]
 +- Window [count(distinct DEMO_DATE#1) windowspecdefinition(null, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
AS UNIQ_DATES#0L], [null]
 +- Project [demo_date#1, demo_id#2]
 +- SubqueryAlias `T`
 +- SubqueryAlias `default`.`demo_count_distinct`
 +- HiveTableRelation `default`.`demo_count_distinct`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [demo_date#1, demo_id#2]
{noformat}


Then I tried the countDistinct function but also got an exception.

{code:sql}
spark.sql("""
SELECT T.*, countDistinct(T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES
 FROM DEMO_COUNT_DISTINCT T
""").show
{code}

{noformat}
org.apache.spark.sql.AnalysisException: Undefined function: 'countDistinct'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database 'default'.; line 2 pos 12
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279)
 at 
org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
 ..
{noformat}
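As a side note, a common workaround (a sketch, not part of the original report) is to express the distinct count as the size of a collected set over the same window, which Spark SQL does support:

{code:sql}
-- Workaround sketch (not from the reporter): Spark supports collect_set as a
-- window function, so the distinct count can be computed as the size of the
-- collected set over the same window.
SELECT T.*, SIZE(COLLECT_SET(T.DEMO_DATE) OVER (PARTITION BY NULL)) UNIQ_DATES
  FROM DEMO_COUNT_DISTINCT T;
{code}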



[jira] [Resolved] (SPARK-30240) Spark UI redirects do not always work behind (dumb) proxies

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30240.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26873
[https://github.com/apache/spark/pull/26873]

> Spark UI redirects do not always work behind (dumb) proxies
> ---
>
> Key: SPARK-30240
> URL: https://issues.apache.org/jira/browse/SPARK-30240
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> Spark's support for proxy servers allows the code to prepend a prefix to URIs 
> generated by Spark pages. But if Spark sends a redirect to the client, then 
> Spark's own full URL is exposed. If the client cannot access that URL, or 
> it's incorrect for whatever reason, then things do not work.
> For example, if you set up a stunnel HTTPS proxy on port 4443 and get the 
> root of the Spark UI, you get this back (with all the TLS stuff stripped):
> {noformat}
> $ curl -v -k https://vanzin-t460p:4443/
> *   Trying 127.0.1.1...
> * Connected to vanzin-t460p (127.0.1.1) port 4443 (#0)
> > GET / HTTP/1.1
> > Host: vanzin-t460p:4443
> > User-Agent: curl/7.58.0
> > Accept: */*
> > 
> < HTTP/1.1 302 Found
> < Date: Thu, 12 Dec 2019 22:09:52 GMT
> < Cache-Control: no-cache, no-store, must-revalidate
> < X-Frame-Options: SAMEORIGIN
> < X-XSS-Protection: 1; mode=block
> < X-Content-Type-Options: nosniff
> < Location: http://vanzin-t460p:4443/jobs/
> < Content-Length: 0
> < Server: Jetty(9.4.18.v20190429)
> {noformat}
> So you can see that Jetty respects the "Host" header, but that has no 
> information about the protocol, and Spark has no idea that the proxy is using 
> HTTPS. So the returned URL does not work.
>  
> Some proxies are smart enough to rewrite responses, but it would be nice (and 
> pretty easy) for Spark to support this simple use case.
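For context, a proxy that rewrites responses works around this today; a hedged sketch of such a front end in nginx (hostnames, ports, and the Spark UI port 4040 are illustrative assumptions, not from the report):

{code}
server {
    # Illustrative only: TLS certificate directives omitted for brevity.
    listen 4443 ssl;
    server_name vanzin-t460p;

    location / {
        # Spark UI assumed on its default port 4040.
        proxy_pass http://vanzin-t460p:4040;
        # Rewrite redirects like "Location: http://vanzin-t460p:4040/jobs/"
        # into the proxy's own HTTPS URL before they reach the client.
        proxy_redirect http://vanzin-t460p:4040/ https://vanzin-t460p:4443/;
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Forwarded-Proto https;
    }
}
{code}

A "dumb" proxy such as stunnel cannot do this rewriting, which is why having Spark itself honor the forwarded scheme is the nicer fix.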






[jira] [Assigned] (SPARK-30240) Spark UI redirects do not always work behind (dumb) proxies

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30240:
-

Assignee: Marcelo Masiero Vanzin

> Spark UI redirects do not always work behind (dumb) proxies
> ---
>
> Key: SPARK-30240
> URL: https://issues.apache.org/jira/browse/SPARK-30240
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>
> Spark's support for proxy servers allows the code to prepend a prefix to URIs 
> generated by Spark pages. But if Spark sends a redirect to the client, then 
> Spark's own full URL is exposed. If the client cannot access that URL, or 
> it's incorrect for whatever reason, then things do not work.
> For example, if you set up a stunnel HTTPS proxy on port 4443 and get the 
> root of the Spark UI, you get this back (with all the TLS stuff stripped):
> {noformat}
> $ curl -v -k https://vanzin-t460p:4443/
> *   Trying 127.0.1.1...
> * Connected to vanzin-t460p (127.0.1.1) port 4443 (#0)
> > GET / HTTP/1.1
> > Host: vanzin-t460p:4443
> > User-Agent: curl/7.58.0
> > Accept: */*
> > 
> < HTTP/1.1 302 Found
> < Date: Thu, 12 Dec 2019 22:09:52 GMT
> < Cache-Control: no-cache, no-store, must-revalidate
> < X-Frame-Options: SAMEORIGIN
> < X-XSS-Protection: 1; mode=block
> < X-Content-Type-Options: nosniff
> < Location: http://vanzin-t460p:4443/jobs/
> < Content-Length: 0
> < Server: Jetty(9.4.18.v20190429)
> {noformat}
> So you can see that Jetty respects the "Host" header, but that has no 
> information about the protocol, and Spark has no idea that the proxy is using 
> HTTPS. So the returned URL does not work.
>  
> Some proxies are smart enough to rewrite responses, but it would be nice (and 
> pretty easy) for Spark to support this simple use case.






[jira] [Resolved] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead job failed

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25100.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26714
[https://github.com/apache/spark/pull/26714]

> Using KryoSerializer and setting registrationRequired true can lead job failed
> --
>
> Key: SPARK-25100
> URL: https://issues.apache.org/jira/browse/SPARK-25100
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: deshanxiao
>Assignee: deshanxiao
>Priority: Major
> Fix For: 3.0.0
>
>
> When spark.serializer is `org.apache.spark.serializer.KryoSerializer` and 
> `spark.kryo.registrationRequired` is true in SparkConf, invoking 
> saveAsNewAPIHadoopDataset to store data in HDFS fails because the class 
> TaskCommitMessage hasn't been registered.
>  
> {code:java}
> java.lang.IllegalArgumentException: Class is not registered: 
> org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
> Note: To register this class use: 
> kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
> at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  
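Until the fix, a workaround sketch (an assumption on my part, not the merged patch) is to register the class from the exception message up front; since TaskCommitMessage is private to Spark, it is referenced by name:

{code:scala}
// Workaround sketch: register the class named in the exception so that
// spark.kryo.registrationRequired can stay enabled. The class name is taken
// from the error message above.
val conf = new org.apache.spark.SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    Class.forName(
      "org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage")))
{code}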






[jira] [Assigned] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead job failed

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25100:
-

Assignee: deshanxiao

> Using KryoSerializer and setting registrationRequired true can lead job failed
> --
>
> Key: SPARK-25100
> URL: https://issues.apache.org/jira/browse/SPARK-25100
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: deshanxiao
>Assignee: deshanxiao
>Priority: Major
>
> When spark.serializer is `org.apache.spark.serializer.KryoSerializer` and 
> `spark.kryo.registrationRequired` is true in SparkConf, invoking 
> saveAsNewAPIHadoopDataset to store data in HDFS fails because the class 
> TaskCommitMessage hasn't been registered.
>  
> {code:java}
> java.lang.IllegalArgumentException: Class is not registered: 
> org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
> Note: To register this class use: 
> kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
> at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  






[jira] [Assigned] (SPARK-30259) CREATE TABLE throw error when session catalog specified

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30259:
-

Assignee: Hu Fuwang

> CREATE TABLE throw error when session catalog specified
> ---
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hu Fuwang
>Assignee: Hu Fuwang
>Priority: Major
>
> Spark throws an error when the session catalog is specified explicitly in the 
> "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g. 
> {code:java}
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> the error message is shown below: 
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr  cmd=get_table 
> : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr  
> cmd=get_database: spark_catalog 
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, 
> returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;{noformat}






[jira] [Resolved] (SPARK-30259) CREATE TABLE throw error when session catalog specified

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30259.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26887
[https://github.com/apache/spark/pull/26887]

> CREATE TABLE throw error when session catalog specified
> ---
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hu Fuwang
>Assignee: Hu Fuwang
>Priority: Major
> Fix For: 3.0.0
>
>
> Spark throws an error when the session catalog is specified explicitly in the 
> "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g. 
> {code:java}
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> the error message is shown below: 
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr  cmd=get_table 
> : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr  
> cmd=get_database: spark_catalog 
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, 
> returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;{noformat}






[jira] [Commented] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests

2019-12-14 Thread Attila Zsolt Piros (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996571#comment-16996571
 ] 

Attila Zsolt Piros commented on SPARK-27021:


[~roncenzhao] it seems to me you bumped into 
https://issues.apache.org/jira/browse/SPARK-26418

> Leaking Netty event loop group for shuffle chunk fetch requests
> ---
>
> Key: SPARK-27021
> URL: https://issues.apache.org/jira/browse/SPARK-27021
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.1, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2019-12-14-23-23-50-384.png
>
>
> The extra event loop group created for handling shuffle chunk fetch requests 
> are never closed.






[jira] [Resolved] (SPARK-30263) Don't log values of ignored non-Spark properties

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30263.
---
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26893
[https://github.com/apache/spark/pull/26893]

> Don't log values of ignored non-Spark properties
> 
>
> Key: SPARK-30263
> URL: https://issues.apache.org/jira/browse/SPARK-30263
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Comment per Aaron Steers:
> Is it expected that this error would print aws security keys to log files? 
> Seems like a serious security concern.
> {code}
> Warning: Ignoring non-spark config property: 
> fs.s3a.access.key={full-access-key}
> Warning: Ignoring non-spark config property: 
> fs.s3a.secret.key={full-secret-key}
> {code}
> Could we not accomplish the same thing by printing the name of the key 
> without the key's value?
> I think we can also redact these, but there is also no big reason to log the 
> values of ignored properties here anyway.






[jira] [Assigned] (SPARK-30066) Columnar execution support for interval types

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30066:
-

Assignee: Kent Yao

> Columnar execution support for interval types
> -
>
> Key: SPARK-30066
> URL: https://issues.apache.org/jira/browse/SPARK-30066
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Columnar execution support for interval types






[jira] [Resolved] (SPARK-30066) Columnar execution support for interval types

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30066.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26699
[https://github.com/apache/spark/pull/26699]

> Columnar execution support for interval types
> -
>
> Key: SPARK-30066
> URL: https://issues.apache.org/jira/browse/SPARK-30066
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Columnar execution support for interval types






[jira] [Assigned] (SPARK-30236) Clarify date and time patterns supported by date_format

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30236:
-

Assignee: John Ayad

> Clarify date and time patterns supported by date_format
> ---
>
> Key: SPARK-30236
> URL: https://issues.apache.org/jira/browse/SPARK-30236
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Assignee: John Ayad
>Priority: Major
>
> Docs for {{date_format}} do not specify which date & time format patterns are 
> supported, leading to such problems as the one reported in this StackOverflow 
> [question|https://stackoverflow.com/questions/54496878/date-format-conversion-is-adding-1-year-to-the-border-dates].
> It would be appreciated if {{date_format}}'s docs linked to the Java class 
> whose date/time patterns we follow.
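For what it's worth, the StackOverflow report linked above is the classic week-based-year pitfall; a brief illustration (per {{java.text.SimpleDateFormat}} pattern semantics, which Spark 2.4's {{date_format}} follows):

{code:sql}
-- 'YYYY' is the week-based year, not the calendar year, so dates near the
-- year boundary can shift by one year; 'yyyy' is the calendar year.
SELECT date_format('2018-12-31', 'YYYY');  -- may return 2019 (week-based year)
SELECT date_format('2018-12-31', 'yyyy');  -- returns 2018
{code}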






[jira] [Resolved] (SPARK-30236) Clarify date and time patterns supported by date_format

2019-12-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30236.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26864
[https://github.com/apache/spark/pull/26864]

> Clarify date and time patterns supported by date_format
> ---
>
> Key: SPARK-30236
> URL: https://issues.apache.org/jira/browse/SPARK-30236
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Assignee: John Ayad
>Priority: Major
> Fix For: 3.0.0
>
>
> Docs for {{date_format}} do not specify which date & time format patterns are 
> supported, leading to such problems as the one reported in this StackOverflow 
> [question|https://stackoverflow.com/questions/54496878/date-format-conversion-is-adding-1-year-to-the-border-dates].
> It would be appreciated if {{date_format}}'s docs linked to the Java class 
> whose date/time patterns we follow.






[jira] [Updated] (SPARK-29245) CCE during creating HiveMetaStoreClient

2019-12-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29245:

Target Version/s: 3.0.0

> CCE during creating HiveMetaStoreClient 
> 
>
> Key: SPARK-29245
> URL: https://issues.apache.org/jira/browse/SPARK-29245
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> With a `master` branch build, when I try to connect to an external HMS, I hit 
> the following.
> {code}
> 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException 
> class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; 
> ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 
> 'bootstrap')
> java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to 
> class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module 
> java.base of loader 'bootstrap')
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70)
> {code}
> With HIVE-21508, I can get the following.
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>       /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("show databases").show
> ++
> |databaseName|
> ++
> |  .  |
> ...
> {code}
> With 2.3.7-SNAPSHOT, the following basic tests are tested.
> - SHOW DATABASES / TABLES
> - DESC DATABASE / TABLE
> - CREATE / DROP / USE DATABASE
> - CREATE / DROP / INSERT / LOAD / SELECT TABLE






[jira] [Commented] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests

2019-12-14 Thread roncenzhao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996445#comment-16996445
 ] 

roncenzhao commented on SPARK-27021:


[~attilapiros] Thank you.

We have a problem with a memory leak of `StreamState` in 
`OneForOneStreamManager`, which causes the `NodeManager` to OOM. Most of the 
memory in the NM is used by `StreamState`, like this:

!image-2019-12-14-23-23-50-384.png!

This may be caused by the shuffle service, because we found that the 
`StreamState` includes some applications which were already finished. Do you 
have any idea about this problem? Thanks~

> Leaking Netty event loop group for shuffle chunk fetch requests
> ---
>
> Key: SPARK-27021
> URL: https://issues.apache.org/jira/browse/SPARK-27021
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.1, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2019-12-14-23-23-50-384.png
>
>
> The extra event loop group created for handling shuffle chunk fetch requests 
> are never closed.






[jira] [Updated] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests

2019-12-14 Thread roncenzhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roncenzhao updated SPARK-27021:
---
Attachment: image-2019-12-14-23-23-50-384.png

> Leaking Netty event loop group for shuffle chunk fetch requests
> ---
>
> Key: SPARK-27021
> URL: https://issues.apache.org/jira/browse/SPARK-27021
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.1, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2019-12-14-23-23-50-384.png
>
>
> The extra event loop group created for handling shuffle chunk fetch requests 
> are never closed.






[jira] [Created] (SPARK-30263) Don't log values of ignored non-Spark properties

2019-12-14 Thread Sean R. Owen (Jira)
Sean R. Owen created SPARK-30263:


 Summary: Don't log values of ignored non-Spark properties
 Key: SPARK-30263
 URL: https://issues.apache.org/jira/browse/SPARK-30263
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4, 3.0.0
Reporter: Sean R. Owen
Assignee: Sean R. Owen


Comment per Aaron Steers:

Is it expected that this error would print aws security keys to log files? 
Seems like a serious security concern.

{code}
Warning: Ignoring non-spark config property: fs.s3a.access.key={full-access-key}
Warning: Ignoring non-spark config property: fs.s3a.secret.key={full-secret-key}
{code}

Could we not accomplish the same thing by printing the name of the key without 
the key's value?


I think we can also redact these, but there is also no big reason to log the 
values of ignored properties here anyway.






[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread chenliang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenliang updated SPARK-30262:
--
Attachment: screenshot-1.png

>  Fix NumberFormatException when totalSize is empty
> --
>
> Key: SPARK-30262
> URL: https://issues.apache.org/jira/browse/SPARK-30262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: chenliang
>Priority: Major
> Fix For: 2.3.2, 2.4.3
>
> Attachments: screenshot-1.png
>
>
> For Spark 2.3.0+, we can get the partition statistics info. But in some 
> special cases, the info like totalSize, rawDataSize, or rowCount may be empty. 
> When we run DDLs like
> {code:java}
> desc formatted partition{code}
> the NumberFormatException is shown as below:
> {code:java}
> spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
> hour='23');
> 19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
> partition(year='2019', month='10', day='17', hour='23')]
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.(BigInteger.java:411)
> at java.math.BigInteger.(BigInteger.java:597)
> at scala.math.BigInt$.apply(BigInt.scala:77)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
> {code}
> Although we can use 'Analyze table partition' to update totalSize, 
> rawDataSize, or rowCount, it is unreasonable for an ordinary SQL statement to 
> throw a NumberFormatException for an empty totalSize. We should handle the 
> empty case in readHiveStats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30262) Fix NumberFormatException when totalSize is empty

2019-12-14 Thread chenliang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenliang updated SPARK-30262:
--
Description: 
For Spark 2.3.0+, we can get the partition statistics info, but in some 
special cases the info such as totalSize, rawDataSize, or rowCount may be 
empty. When we run a DDL statement like
{code:java}
desc formatted partition{code}
the NumberFormatException below is thrown:
{code:java}
spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', 
hour='23');
19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 
partition(year='2019', month='10', day='17', hour='23')]
java.lang.NumberFormatException: Zero length BigInteger
at java.math.BigInteger.(BigInteger.java:411)
at java.math.BigInteger.(BigInteger.java:597)
at scala.math.BigInt$.apply(BigInt.scala:77)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
{code}

Although we can use 'Analyze table partition' to update totalSize, 
rawDataSize, or rowCount, it is unreasonable for an ordinary SQL statement to 
throw a NumberFormatException for an empty totalSize. We should handle the 
empty case in readHiveStats.

Here is the empty case:
 !screenshot-1.png! 
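The defensive handling suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual readHiveStats code; the class and method names (HiveStatsSketch, parseStat) are made up for the example. The idea is to treat an empty or invalid statistics string as absent, instead of passing it to BigInteger, whose constructor throws "Zero length BigInteger" on empty input:

```java
import java.math.BigInteger;
import java.util.Optional;

public class HiveStatsSketch {
    // Parse a raw Hive statistics parameter (e.g. totalSize) defensively:
    // return Optional.empty() for null, empty, negative, or non-numeric
    // values rather than letting new BigInteger("") throw
    // java.lang.NumberFormatException: Zero length BigInteger.
    static Optional<BigInteger> parseStat(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            // The failing case from the stack trace above.
            return Optional.empty();
        }
        try {
            BigInteger value = new BigInteger(trimmed);
            // Negative values mean the statistic is unknown; treat as absent.
            return value.signum() >= 0 ? Optional.of(value) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStat("1024"));  // Optional[1024]
        System.out.println(parseStat(""));      // Optional.empty
        System.out.println(parseStat(null));    // Optional.empty
    }
}
```

With this kind of guard, `desc formatted ... partition` on a partition whose totalSize is empty would simply report no statistics instead of failing.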


