[jira] [Assigned] (SPARK-30119) Support pagination for spark streaming tab

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30119:


Assignee: (was: Apache Spark)

> Support pagination for  spark streaming tab
> ---
>
> Key: SPARK-30119
> URL: https://issues.apache.org/jira/browse/SPARK-30119
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Priority: Minor
>
> Support pagination for spark streaming tab






[jira] [Assigned] (SPARK-30119) Support pagination for spark streaming tab

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30119:


Assignee: Apache Spark

> Support pagination for  spark streaming tab
> ---
>
> Key: SPARK-30119
> URL: https://issues.apache.org/jira/browse/SPARK-30119
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Assignee: Apache Spark
>Priority: Minor
>
> Support pagination for spark streaming tab






[jira] [Commented] (SPARK-30119) Support pagination for spark streaming tab

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098314#comment-17098314
 ] 

Apache Spark commented on SPARK-30119:
--

User 'iRakson' has created a pull request for this issue:
https://github.com/apache/spark/pull/28439

> Support pagination for  spark streaming tab
> ---
>
> Key: SPARK-30119
> URL: https://issues.apache.org/jira/browse/SPARK-30119
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Priority: Minor
>
> Support pagination for spark streaming tab






[jira] [Commented] (SPARK-30119) Support pagination for spark streaming tab

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098315#comment-17098315
 ] 

Apache Spark commented on SPARK-30119:
--

User 'iRakson' has created a pull request for this issue:
https://github.com/apache/spark/pull/28439

> Support pagination for  spark streaming tab
> ---
>
> Key: SPARK-30119
> URL: https://issues.apache.org/jira/browse/SPARK-30119
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: jobit mathew
>Priority: Minor
>
> Support pagination for spark streaming tab






[jira] [Commented] (SPARK-31527) date add/subtract interval only allow those day precision in ansi mode

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098319#comment-17098319
 ] 

Apache Spark commented on SPARK-31527:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28440

> date add/subtract interval only allow those day precision in ansi mode
> --
>
> Key: SPARK-31527
> URL: https://issues.apache.org/jira/browse/SPARK-31527
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Under ANSI mode, we should not allow adding or subtracting an interval that carries hours,
> minutes, ..., microseconds fields to/from a date.
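For context, a minimal sketch of the behaviour requested above (assuming the ANSI flag is spark.sql.ansi.enabled, as in Spark 3.0; the exact error raised for the second query is not reproduced here):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ansi-date-interval-sketch")
  .getOrCreate()

spark.conf.set("spark.sql.ansi.enabled", "true")

// Day precision is fine under ANSI mode:
spark.sql("SELECT DATE '2020-01-01' + INTERVAL 1 DAY").show()

// Per this ticket, an interval carrying sub-day fields (hours, minutes, ...,
// microseconds) should be rejected when added to a date under ANSI mode:
spark.sql("SELECT DATE '2020-01-01' + INTERVAL 1 HOUR").show()
{code}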






[jira] [Resolved] (SPARK-31033) Support 'SHOW views' command

2020-05-03 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-31033.
-
Resolution: Duplicate

> Support 'SHOW views' command
> 
>
> Key: SPARK-31033
> URL: https://issues.apache.org/jira/browse/SPARK-31033
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: PowerBI.png
>
>
> {noformat}
> spark-sql> show views;
> Error in query:
> missing 'FUNCTIONS' at ''(line 1, pos 10)
> == SQL ==
> show views
> --^^^
> spark-sql> show views in default;
> Error in query:
> missing 'FUNCTIONS' at 'in'(line 1, pos 11)
> == SQL ==
> show views in default
> ---^^^
> {noformat}
> Hive 3.1 supports this command:
> {noformat}
> hive> select version();
> OK
> 3.1.2 r8190d2be7b7165effa62bd21b7d60ef81fb0e4af
> Time taken: 4.874 seconds, Fetched: 1 row(s)
> hive> show views in default;
> OK
> Time taken: 0.01 seconds
> hive> show views;
> OK
> Time taken: 0.013 seconds
> hive> show views in default;
> OK
> Time taken: 0.013 seconds
> {noformat}
> *PowerBI* also needs this command:






[jira] [Commented] (SPARK-31627) Font style of Spark SQL DAG-viz is broken in Chrome on macOS

2020-05-03 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098389#comment-17098389
 ] 

Kousuke Saruta commented on SPARK-31627:


Thanks, [~hyukjin.kwon]. I'll close this.

> Font style of Spark SQL DAG-viz is broken in Chrome on macOS
> 
>
> Key: SPARK-31627
> URL: https://issues.apache.org/jira/browse/SPARK-31627
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.0
> Environment: * macOS
> * Chrome 81
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: font-weight-does-not-work.png
>
>
> If all of the following conditions are true, the font style of the Spark SQL DAG-viz can
> be broken.
>  More specifically, the plan name is not rendered in bold if the plan is
> WholeStageCodegen.
>  * macOS
>  * Chrome (version 81)
> The current master uses Bootstrap 4, which defines the default font family as
> follows.
> {code:java}
> -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica 
> Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI 
> Emoji","Segoe UI Symbol","Noto Color Emoji"
> {code}
> If we use Chrome, BlinkMacSystemFont is used, but the font-weight property doesn't
> work with that font when it is used in SVG tags.
>  This issue is reported
> [here|https://bugs.chromium.org/p/chromium/issues/detail?id=1057654]






[jira] [Resolved] (SPARK-31627) Font style of Spark SQL DAG-viz is broken in Chrome on macOS

2020-05-03 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-31627.

Resolution: Won't Fix

> Font style of Spark SQL DAG-viz is broken in Chrome on macOS
> 
>
> Key: SPARK-31627
> URL: https://issues.apache.org/jira/browse/SPARK-31627
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.0
> Environment: * macOS
> * Chrome 81
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: font-weight-does-not-work.png
>
>
> If all of the following conditions are true, the font style of the Spark SQL DAG-viz can
> be broken.
>  More specifically, the plan name is not rendered in bold if the plan is
> WholeStageCodegen.
>  * macOS
>  * Chrome (version 81)
> The current master uses Bootstrap 4, which defines the default font family as
> follows.
> {code:java}
> -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica 
> Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI 
> Emoji","Segoe UI Symbol","Noto Color Emoji"
> {code}
> If we use Chrome, BlinkMacSystemFont is used, but the font-weight property doesn't
> work with that font when it is used in SVG tags.
>  This issue is reported
> [here|https://bugs.chromium.org/p/chromium/issues/detail?id=1057654]






[jira] [Created] (SPARK-31629) "py4j.protocol.Py4JJavaError: An error occurred while calling o90.save" in pyspark 2.3.1

2020-05-03 Thread appleyuchi (Jira)
appleyuchi created SPARK-31629:
--

 Summary: "py4j.protocol.Py4JJavaError: An error occurred while 
calling o90.save" in pyspark 2.3.1
 Key: SPARK-31629
 URL: https://issues.apache.org/jira/browse/SPARK-31629
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.3.1
 Environment: Ubuntu19.10

anaconda3-python3.6.10

scala 2.11.8

apache-hive-3.0.0-bin

hadoop-2.7.7

spark-2.3.1-bin-hadoop2.7

java version "1.8.0_131"

Mysql Server version: 8.0.19-0ubuntu0.19.10.3 (Ubuntu)

driver:mysql-connector-java-8.0.20.jar

[Driver link|https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20]

 
Reporter: appleyuchi
 Fix For: 1.5.0, 1.4.1


I have searched the forum:

[SPARK-8365|https://issues.apache.org/jira/browse/SPARK-8365]

mentioned the same issue in Spark 1.4.0, and

[SPARK-8368|https://issues.apache.org/jira/browse/SPARK-8368]

fixed it in Spark 1.4.1 and 1.5.0.

However, in Spark 2.3.1 this bug occurs again.

Please help me, thanks!

#--

test.py

[https://paste.ubuntu.com/p/HJfbcQ2zq3/]

 

The command used to run it is:

spark-submit --master yarn --deploy-mode cluster test.py

Then I got:

Traceback (most recent call last):
 File "test.py", line 45, in <module>
 password="appleyuchi").mode('append').save()
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/readwriter.py",
 line 703, in save
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 1257, in __call__
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/utils.py",
 line 63, in deco
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/protocol.py",
 line 328, in get_return_value
*py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.*
: java.sql.SQLSyntaxErrorException: Unknown database 'leaf'
 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
 at 
com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
 at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
 at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
 at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
 at 
com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:197)
 at 
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.DataFrameWriter

[jira] [Updated] (SPARK-31629) "py4j.protocol.Py4JJavaError: An error occurred while calling o90.save" in pyspark 2.3.1

2020-05-03 Thread appleyuchi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

appleyuchi updated SPARK-31629:
---
Environment: 
Ubuntu19.10

 

anaconda3-python3.6.10

 

scala 2.11.8

 

apache-hive-3.0.0-bin

 

hadoop-2.7.7

 

spark-2.3.1-bin-hadoop2.7

 

java version "1.8.0_131"

 

Mysql Server version: 8.0.19-0ubuntu0.19.10.3 (Ubuntu)

 

driver:mysql-connector-java-8.0.20.jar

 

[Driver link|https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20]

 

  was:
Ubuntu19.10

anaconda3-python3.6.10

scala 2.11.8

apache-hive-3.0.0-bin

hadoop-2.7.7

spark-2.3.1-bin-hadoop2.7

java version "1.8.0_131"

Mysql Server version: 8.0.19-0ubuntu0.19.10.3 (Ubuntu)

driver:mysql-connector-java-8.0.20.jar

[Driver link|https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20]

 


> "py4j.protocol.Py4JJavaError: An error occurred while calling o90.save" in 
> pyspark 2.3.1
> 
>
> Key: SPARK-31629
> URL: https://issues.apache.org/jira/browse/SPARK-31629
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.1
> Environment: Ubuntu19.10
>  
> anaconda3-python3.6.10
>  
> scala 2.11.8
>  
> apache-hive-3.0.0-bin
>  
> hadoop-2.7.7
>  
> spark-2.3.1-bin-hadoop2.7
>  
> java version "1.8.0_131"
>  
> Mysql Server version: 8.0.19-0ubuntu0.19.10.3 (Ubuntu)
>  
> driver:mysql-connector-java-8.0.20.jar
>  
> [Driver link|https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20]
>  
>Reporter: appleyuchi
>Priority: Critical
> Fix For: 1.4.1, 1.5.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I have searched the forum:
> [SPARK-8365|https://issues.apache.org/jira/browse/SPARK-8365]
> mentioned the same issue in Spark 1.4.0, and
> [SPARK-8368|https://issues.apache.org/jira/browse/SPARK-8368]
> fixed it in Spark 1.4.1 and 1.5.0.
>  
> However, in Spark 2.3.1 this bug occurs again.
> Please help me, thanks!
> #--
> test.py
> [https://paste.ubuntu.com/p/HJfbcQ2zq3/]
>  
> The command used to run it is:
> spark-submit --master yarn --deploy-mode cluster test.py
> Then I got:
> Traceback (most recent call last):
>  File "test.py", line 45, in <module>
>  password="appleyuchi").mode('append').save()
>  File 
> "/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/readwriter.py",
>  line 703, in save
>  File 
> "/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/java_gateway.py",
>  line 1257, in __call__
>  File 
> "/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/utils.py",
>  line 63, in deco
>  File 
> "/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
> *py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.*
> : java.sql.SQLSyntaxErrorException: Unknown database 'leaf'
>  at 
> com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
>  at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
>  at 
> com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
>  at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
>  at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
>  at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
>  at 
> com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:197)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.executi

[jira] [Created] (SPARK-31630) Skip timestamp rebasing after 1900-01-01

2020-05-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-31630:
--

 Summary: Skip timestamp rebasing after 1900-01-01
 Key: SPARK-31630
 URL: https://issues.apache.org/jira/browse/SPARK-31630
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0, 3.1.0
Reporter: Maxim Gekk


The conversions of Catalyst's DATE/TIMESTAMP values to/from the Java types
java.sql.Date/java.sql.Timestamp have almost the same implementation except for
the additional rebasing op. If we look at the switch and diff arrays of all
available time zones, we can see that there is a point in time after which all
diffs are 0: 1900-01-01 00:00:00Z. So we can compare the input micros with that
point and skip the conversion for modern timestamps.
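A rough sketch of the proposed fast path (names and constants are spelled out here for illustration; they are not the actual Spark internals):

{code:scala}
import java.time.Instant
import java.util.concurrent.TimeUnit

object RebaseSketch {
  // Per the description above: from 1900-01-01T00:00:00Z onwards the rebasing diff
  // is 0 for every available time zone, so modern timestamps can skip rebasing.
  val switchPointMicros: Long =
    TimeUnit.SECONDS.toMicros(Instant.parse("1900-01-01T00:00:00Z").getEpochSecond)

  // Placeholder for the existing per-time-zone rebasing logic.
  def slowRebaseJulianToGregorianMicros(micros: Long): Long = ???

  def rebaseJulianToGregorianMicros(micros: Long): Long =
    if (micros >= switchPointMicros) micros        // modern timestamp: no-op
    else slowRebaseJulianToGregorianMicros(micros) // pre-1900: full rebasing
}
{code}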






[jira] [Updated] (SPARK-31629) "py4j.protocol.Py4JJavaError: An error occurred while calling o90.save" in pyspark 2.3.1

2020-05-03 Thread appleyuchi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

appleyuchi updated SPARK-31629:
---
Description: 
I have searched the forum:

SPARK-8365

mentioned the same issue in Spark 1.4.0, and

SPARK-8368

fixed it in Spark 1.4.1 and 1.5.0.

However, in Spark 2.3.1 this bug occurs again.

Please help me, thanks!

#--

test.py

[https://paste.ubuntu.com/p/HJfbcQ2zq3/]

 

The methods used to run it are:

① spark-submit --master yarn --deploy-mode cluster test.py

② pyspark --master yarn (and then paste the code above)

Each method can replicate this error.

Then I got:

Traceback (most recent call last):
 File "test.py", line 45, in <module>
 password="appleyuchi").mode('append').save()
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/readwriter.py",
 line 703, in save
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 1257, in __call__
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/pyspark.zip/pyspark/sql/utils.py",
 line 63, in deco
 File 
"/home/appleyuchi/bigdata/hadoop_tmp/nm-local-dir/usercache/appleyuchi/appcache/application_1588504345289_0003/container_1588504345289_0003_01_01/py4j-0.10.7-src.zip/py4j/protocol.py",
 line 328, in get_return_value
 {color:#FF}*py4j.protocol.Py4JJavaError: An error occurred while calling 
o90.save.*{color}
 : java.sql.SQLSyntaxErrorException: Unknown database 
'leaf'*{color:#FF}(I'm sure this database exists){color}*
 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
 at 
com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
 at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
 at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
 at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
 at 
com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:197)
 at 
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngin

[jira] [Commented] (SPARK-31630) Skip timestamp rebasing after 1900-01-01

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098414#comment-17098414
 ] 

Apache Spark commented on SPARK-31630:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28441

> Skip timestamp rebasing after 1900-01-01
> 
>
> Key: SPARK-31630
> URL: https://issues.apache.org/jira/browse/SPARK-31630
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The conversions of Catalyst's DATE/TIMESTAMP values to/from the Java types
> java.sql.Date/java.sql.Timestamp have almost the same implementation except for
> the additional rebasing op. If we look at the switch and diff arrays of all
> available time zones, we can see that there is a point in time after which all
> diffs are 0: 1900-01-01 00:00:00Z. So we can compare the input micros with that
> point and skip the conversion for modern timestamps.






[jira] [Assigned] (SPARK-31630) Skip timestamp rebasing after 1900-01-01

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31630:


Assignee: Apache Spark

> Skip timestamp rebasing after 1900-01-01
> 
>
> Key: SPARK-31630
> URL: https://issues.apache.org/jira/browse/SPARK-31630
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The conversions of Catalyst's DATE/TIMESTAMP values to/from the Java types
> java.sql.Date/java.sql.Timestamp have almost the same implementation except for
> the additional rebasing op. If we look at the switch and diff arrays of all
> available time zones, we can see that there is a point in time after which all
> diffs are 0: 1900-01-01 00:00:00Z. So we can compare the input micros with that
> point and skip the conversion for modern timestamps.






[jira] [Assigned] (SPARK-31630) Skip timestamp rebasing after 1900-01-01

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31630:


Assignee: (was: Apache Spark)

> Skip timestamp rebasing after 1900-01-01
> 
>
> Key: SPARK-31630
> URL: https://issues.apache.org/jira/browse/SPARK-31630
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The conversions of Catalyst's DATE/TIMESTAMP values to/from the Java types
> java.sql.Date/java.sql.Timestamp have almost the same implementation except for
> the additional rebasing op. If we look at the switch and diff arrays of all
> available time zones, we can see that there is a point in time after which all
> diffs are 0: 1900-01-01 00:00:00Z. So we can compare the input micros with that
> point and skip the conversion for modern timestamps.






[jira] [Commented] (SPARK-31527) date add/subtract interval only allow those day precision in ansi mode

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098436#comment-17098436
 ] 

Apache Spark commented on SPARK-31527:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28369

> date add/subtract interval only allow those day precision in ansi mode
> --
>
> Key: SPARK-31527
> URL: https://issues.apache.org/jira/browse/SPARK-31527
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Under ANSI mode, we should not allow adding or subtracting an interval that carries hours,
> minutes, ..., microseconds fields to/from a date.






[jira] [Commented] (SPARK-29292) Fix internal usages of mutable collection for Seq in 2.13

2020-05-03 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098448#comment-17098448
 ] 

Sean R. Owen commented on SPARK-29292:
--

I think the issue is that if you import scala.collection.Seq, then yes you keep 
the type, but unless callers do the same thing, you end up not being able to 
write simple things in 2.13 like {{val mySeq: Seq[...] = 
spark.somethingThatReturnsSeq()}} because Spark is returning a collection.Seq, 
not an immutable Seq. It's viable; I just think it looked like a lot more change
for callers. There was more discussion at 
https://issues.apache.org/jira/browse/SPARK-27681
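A small illustration of the caller-side problem (the method name is hypothetical; it stands in for any Spark API that would expose scala.collection.Seq):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// Hypothetical API that keeps the cross-version scala.collection.Seq return type
// instead of calling .toSeq internally.
def somethingThatReturnsSeq(): scala.collection.Seq[String] = ArrayBuffer("a", "b")

// On 2.12 the default Seq is scala.collection.Seq, so this compiles.
// On 2.13 the default Seq is scala.collection.immutable.Seq, so it does not:
// val mySeq: Seq[String] = somethingThatReturnsSeq()

// Callers on 2.13 would have to convert explicitly instead:
val mySeq: Seq[String] = somethingThatReturnsSeq().toSeq
{code}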


> Fix internal usages of mutable collection for Seq in 2.13
> -
>
> Key: SPARK-29292
> URL: https://issues.apache.org/jira/browse/SPARK-29292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
>
> Kind of related to https://issues.apache.org/jira/browse/SPARK-27681, but a 
> simpler subset. 
> In 2.13, a mutable collection can't be returned where a {{Seq}} (now
> {{scala.collection.immutable.Seq}}) is expected. It's easy enough to call .toSeq
> on these, as that still works on 2.12.
> {code}
> [ERROR] [Error] 
> /Users/seanowen/Documents/spark_2.13/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:467:
>  type mismatch;
>  found   : Seq[String] (in scala.collection) 
>  required: Seq[String] (in scala.collection.immutable) 
> {code}






[jira] [Commented] (SPARK-29292) Fix internal usages of mutable collection for Seq in 2.13

2020-05-03 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098459#comment-17098459
 ] 

Guillaume Martres commented on SPARK-29292:
---

Right, so ideally you'd change Seq to collection.Seq for parameter types but 
not for result type, but that might be tricky to manage and it looks like that 
was already considered and dismissed in the discussion you linked to.

Going back to the issue with copying when using toSeq everywhere, one way to 
reduce that would be to use builders when possible: instead of creating an 
ArrayBuffer, filling it, then copying using toSeq, one can create a 
`Seq.newBuilder`, fill it, then call `result` on it to get back an immutable
Seq without copying. (This might be worse than using ArrayBuffer in 2.12,
because the default builder will construct a List; ideally one would use
immutable.ArraySeq.newBuilder to get something backed by a plain Array, but that
one doesn't exist in 2.12.) If most of the collections being copied are small,
then this might not make a significant difference anyway.
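For example, a quick sketch of the two patterns being compared (illustrative only):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// Current pattern: fill a mutable buffer, then copy it with .toSeq.
// On 2.12 toSeq is essentially free; on 2.13 it copies into an immutable Seq.
val buf = new ArrayBuffer[String]()
buf += "a"
buf += "b"
val copied: Seq[String] = buf.toSeq

// Builder-based pattern: fill a builder, then call result() to get an immutable
// Seq without a final copy (the default builder produces a List).
val builder = Seq.newBuilder[String]
builder += "a"
builder += "b"
val built: Seq[String] = builder.result()
{code}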

> Fix internal usages of mutable collection for Seq in 2.13
> -
>
> Key: SPARK-29292
> URL: https://issues.apache.org/jira/browse/SPARK-29292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
>
> Kind of related to https://issues.apache.org/jira/browse/SPARK-27681, but a 
> simpler subset. 
> In 2.13, a mutable collection can't be returned where a {{Seq}} (now
> {{scala.collection.immutable.Seq}}) is expected. It's easy enough to call .toSeq
> on these, as that still works on 2.12.
> {code}
> [ERROR] [Error] 
> /Users/seanowen/Documents/spark_2.13/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:467:
>  type mismatch;
>  found   : Seq[String] (in scala.collection) 
>  required: Seq[String] (in scala.collection.immutable) 
> {code}






[jira] [Created] (SPARK-31631) Fix test flakiness caused by MiniKdc which throws "address in use" BindException

2020-05-03 Thread Kent Yao (Jira)
Kent Yao created SPARK-31631:


 Summary: Fix test flakiness caused by MiniKdc which throws 
"address in use" BindException
 Key: SPARK-31631
 URL: https://issues.apache.org/jira/browse/SPARK-31631
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0, 3.1.0
Reporter: Kent Yao



{code:java}
[info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** 
(15 seconds, 426 milliseconds)
[info]   java.net.BindException: Address already in use
[info]   at sun.nio.ch.Net.bind0(Native Method)
[info]   at sun.nio.ch.Net.bind(Net.java:433)
[info]   at sun.nio.ch.Net.bind(Net.java:425)
[info]   at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
[info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
[info]   at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
[info]   at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
[info]   at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
{code}


This is an issue fixed in Hadoop 2.8.0:
https://issues.apache.org/jira/browse/HADOOP-12656

We may need to apply the approach from HBase first, before we drop Hadoop 2.7.x:

https://issues.apache.org/jira/browse/HBASE-14734
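A rough sketch of that kind of workaround (retry with a fresh MiniKdc, and hence a fresh random port, when the bind fails; the retry count and the exact exception handling are assumptions, not the final fix):

{code:scala}
import java.io.File
import java.net.BindException
import org.apache.hadoop.minikdc.MiniKdc

def startMiniKdcWithRetry(workDir: File, maxRetries: Int = 3): MiniKdc = {
  var attempt = 0
  while (true) {
    val kdc = new MiniKdc(MiniKdc.createConf(), workDir)
    try {
      kdc.start()
      return kdc
    } catch {
      case _: BindException if attempt < maxRetries =>
        // The randomly chosen KDC port was already taken; drop this instance
        // (best-effort cleanup) and build a new one, which picks another port.
        kdc.stop()
        attempt += 1
    }
  }
  throw new IllegalStateException("unreachable")
}
{code}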











[jira] [Commented] (SPARK-31631) Fix test flakiness caused by MiniKdc which throws "address in use" BindException

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098488#comment-17098488
 ] 

Apache Spark commented on SPARK-31631:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28442

> Fix test flakiness caused by MiniKdc which throws "address in use" 
> BindException
> 
>
> Key: SPARK-31631
> URL: https://issues.apache.org/jira/browse/SPARK-31631
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED 
> *** (15 seconds, 426 milliseconds)
> [info]   java.net.BindException: Address already in use
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:433)
> [info]   at sun.nio.ch.Net.bind(Net.java:425)
> [info]   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> [info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
> [info]   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748)
> {code}
> This is an issue fixed in Hadoop 2.8.0:
> https://issues.apache.org/jira/browse/HADOOP-12656
> We may need to apply the approach from HBase first, before we drop Hadoop 2.7.x:
> https://issues.apache.org/jira/browse/HBASE-14734






[jira] [Assigned] (SPARK-31631) Fix test flakiness caused by MiniKdc which throws "address in use" BindException

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31631:


Assignee: Apache Spark

> Fix test flakiness caused by MiniKdc which throws "address in use" 
> BindException
> 
>
> Key: SPARK-31631
> URL: https://issues.apache.org/jira/browse/SPARK-31631
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED 
> *** (15 seconds, 426 milliseconds)
> [info]   java.net.BindException: Address already in use
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:433)
> [info]   at sun.nio.ch.Net.bind(Net.java:425)
> [info]   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> [info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
> [info]   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748)
> {code}
> This is an issue fixed in Hadoop 2.8.0:
> https://issues.apache.org/jira/browse/HADOOP-12656
> We may need to apply the approach from HBase first, before we drop Hadoop 2.7.x:
> https://issues.apache.org/jira/browse/HBASE-14734






[jira] [Assigned] (SPARK-31631) Fix test flakiness caused by MiniKdc which throws "address in use" BindException

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31631:


Assignee: (was: Apache Spark)

> Fix test flakiness caused by MiniKdc which throws "address in use" 
> BindException
> 
>
> Key: SPARK-31631
> URL: https://issues.apache.org/jira/browse/SPARK-31631
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED 
> *** (15 seconds, 426 milliseconds)
> [info]   java.net.BindException: Address already in use
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:433)
> [info]   at sun.nio.ch.Net.bind(Net.java:425)
> [info]   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> [info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
> [info]   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748)
> {code}
> This is an issue fixed in Hadoop 2.8.0:
> https://issues.apache.org/jira/browse/HADOOP-12656
> We may need to apply the approach from HBase first, before we drop Hadoop 2.7.x:
> https://issues.apache.org/jira/browse/HBASE-14734






[jira] [Commented] (SPARK-31527) date add/subtract interval only allow those day precision in ansi mode

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098530#comment-17098530
 ] 

Apache Spark commented on SPARK-31527:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28310

> date add/subtract interval only allow those day precision in ansi mode
> --
>
> Key: SPARK-31527
> URL: https://issues.apache.org/jira/browse/SPARK-31527
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Under ANSI mode, we should not allow adding or subtracting an interval that carries hours,
> minutes, ..., microseconds fields to/from a date.






[jira] [Commented] (SPARK-31527) date add/subtract interval only allow those day precision in ansi mode

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098531#comment-17098531
 ] 

Apache Spark commented on SPARK-31527:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28310

> date add/subtract interval only allow those day precision in ansi mode
> --
>
> Key: SPARK-31527
> URL: https://issues.apache.org/jira/browse/SPARK-31527
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Under ANSI mode, we should not allow adding or subtracting an interval that carries hours,
> minutes, ..., microseconds fields to/from a date.






[jira] [Commented] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098547#comment-17098547
 ] 

Apache Spark commented on SPARK-31212:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/28443

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Priority: Major
>
> The string '1000-02-29' is a valid date in the Julian calendar, which Spark 2.4.5 uses
> for dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> ++
> |date|
> ++
> |null|
> ++
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> {code}






[jira] [Assigned] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31212:


Assignee: (was: Apache Spark)

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Priority: Major
>
> The string '1000-02-29' is a valid date in the Julian calendar, which Spark 2.4.5 uses
> for dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> ++
> |date|
> ++
> |null|
> ++
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> {code}






[jira] [Assigned] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31212:


Assignee: Apache Spark

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The string '1000-02-29' is a valid date in the Julian calendar, which Spark 2.4.5 uses
> for dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> ++
> |date|
> ++
> |null|
> ++
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> {code}






[jira] [Commented] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098548#comment-17098548
 ] 

Apache Spark commented on SPARK-31212:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/28443

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Priority: Major
>
> The string '1000-02-29' is a valid date in the Julian calendar, which Spark 2.4.5 uses
> for dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> ++
> |date|
> ++
> |null|
> ++
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> {code}






[jira] [Created] (SPARK-31632) The ApplicationInfo in KVStore may be accessed before it's prepared

2020-05-03 Thread Xingcan Cui (Jira)
Xingcan Cui created SPARK-31632:
---

 Summary: The ApplicationInfo in KVStore may be accessed before 
it's prepared
 Key: SPARK-31632
 URL: https://issues.apache.org/jira/browse/SPARK-31632
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Web UI
Affects Versions: 3.0.0
Reporter: Xingcan Cui


While running some local tests, I occasionally encountered the following
exception from the Web UI.
{noformat}
23:00:29.845 WARN org.eclipse.jetty.server.HttpChannel: /jobs/
 java.util.NoSuchElementException
 at java.util.Collections$EmptyIterator.next(Collections.java:4191)
 at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:467)
 at 
org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:39)
 at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:266)
 at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:89)
 at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
 at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
 at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
 at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.Server.handle(Server.java:505)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
 at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
 at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
 at java.lang.Thread.run(Thread.java:748){noformat}
*Reason*
 That is because {{AppStatusStore.applicationInfo()}} accesses an empty view 
(iterator) returned by {{InMemoryStore}}.

AppStatusStore
{code:java}
def applicationInfo(): v1.ApplicationInfo = {
store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
}
{code}
InMemoryStore
{code:java}
public <T> KVStoreView<T> view(Class<T> type) {
  InstanceList list = inMemoryLists.get(type);
  return list != null ? list.view() : emptyView();
}
{code}
During the initialization of {{SparkContext}}, it first starts the Web UI 
(SparkContext: L475 _ui.foreach(_.bind())) and then sets up the 
{{LiveListenerBus}} thread (SparkContext: L608 {{setupAndStartListenerBus()}}) 
for dispatching the {{SparkListenerApplicationStart}} event (which will trigger 
writing the requested {{ApplicationInfo}} to {{InMemoryStore}}).

*Solution*
 Since the {{applicationInfo()}} method is expected to always return a valid 
{{ApplicationInfo}}, maybe we can add a while-loop check here to guarantee the 
availability of {{ApplicationInfo}}.
{code:java}
def applicationInfo(): v1.ApplicationInfo = {
  var iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator()
  while (!iterator.hasNext) {
    Thread.sleep(20)
    iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator()
  }
  iterator.next().info
}
{code}






[jira] [Assigned] (SPARK-31632) The ApplicationInfo in KVStore may be accessed before it's prepared

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31632:


Assignee: (was: Apache Spark)

> The ApplicationInfo in KVStore may be accessed before it's prepared
> ---
>
> Key: SPARK-31632
> URL: https://issues.apache.org/jira/browse/SPARK-31632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Xingcan Cui
>Priority: Minor
>
> While starting some local tests, I occasionally encountered the following 
> exceptions for Web UI.
> {noformat}
> 23:00:29.845 WARN org.eclipse.jetty.server.HttpChannel: /jobs/
>  java.util.NoSuchElementException
>  at java.util.Collections$EmptyIterator.next(Collections.java:4191)
>  at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:467)
>  at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:39)
>  at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:266)
>  at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:89)
>  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.eclipse.jetty.server.Server.handle(Server.java:505)
>  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
>  at java.lang.Thread.run(Thread.java:748){noformat}
> *Reason*
>  That is because {{AppStatusStore.applicationInfo()}} accesses an empty view 
> (iterator) returned by {{InMemoryStore}}.
> AppStatusStore
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
> store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
> }
> {code}
> InMemoryStore
> {code:java}
> public  KVStoreView view(Class type){
> InstanceList list = inMemoryLists.get(type);
> return list != null ? list.view() : emptyView();
>  }
> {code}
> During the initialization of {{SparkContext}}, it first starts the Web UI 
> (SparkContext: L475 _ui.foreach(_.bind())) and then setup the 
> {{LiveListenerBus}} thread (SparkContext: L608 
> {{setupAndStartListenerBus()}}) for dispatching the 
> {{SparkListenerApplicationStart}} event (which will trigger writing the 
> requested {{ApplicationInfo}} to {{InMemoryStore}}).
> *Solution*
>  Since the {{applicationInfo()}} method is expected to always return a valid 
> {{ApplicationInfo}}, maybe we can add a while-loop-check here to guarantee 
> the availability of {{ApplicationInfo}}.
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
>  var iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator()
>  while (!iterator.hasNext){ 
> Thread.sleep(20) 
> iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator() 
>   }
>   iterator.next().info
> }
> {code}



--
This message was sent by

[jira] [Commented] (SPARK-31632) The ApplicationInfo in KVStore may be accessed before it's prepared

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098590#comment-17098590
 ] 

Apache Spark commented on SPARK-31632:
--

User 'xccui' has created a pull request for this issue:
https://github.com/apache/spark/pull/28444

> The ApplicationInfo in KVStore may be accessed before it's prepared
> ---
>
> Key: SPARK-31632
> URL: https://issues.apache.org/jira/browse/SPARK-31632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Xingcan Cui
>Priority: Minor
>
> While starting some local tests, I occasionally encountered the following 
> exceptions for Web UI.
> {noformat}
> 23:00:29.845 WARN org.eclipse.jetty.server.HttpChannel: /jobs/
>  java.util.NoSuchElementException
>  at java.util.Collections$EmptyIterator.next(Collections.java:4191)
>  at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:467)
>  at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:39)
>  at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:266)
>  at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:89)
>  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.eclipse.jetty.server.Server.handle(Server.java:505)
>  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
>  at java.lang.Thread.run(Thread.java:748){noformat}
> *Reason*
>  That is because {{AppStatusStore.applicationInfo()}} accesses an empty view 
> (iterator) returned by {{InMemoryStore}}.
> AppStatusStore
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
> store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
> }
> {code}
> InMemoryStore
> {code:java}
> public  KVStoreView view(Class type){
> InstanceList list = inMemoryLists.get(type);
> return list != null ? list.view() : emptyView();
>  }
> {code}
> During the initialization of {{SparkContext}}, it first starts the Web UI 
> (SparkContext: L475 _ui.foreach(_.bind())) and then setup the 
> {{LiveListenerBus}} thread (SparkContext: L608 
> {{setupAndStartListenerBus()}}) for dispatching the 
> {{SparkListenerApplicationStart}} event (which will trigger writing the 
> requested {{ApplicationInfo}} to {{InMemoryStore}}).
> *Solution*
>  Since the {{applicationInfo()}} method is expected to always return a valid 
> {{ApplicationInfo}}, maybe we can add a while-loop-check here to guarantee 
> the availability of {{ApplicationInfo}}.
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
>  var iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator()
>  while (!iterator.hasNext){ 
> Thread.sleep(20) 
> iterator = store.view(classOf[Applic

[jira] [Assigned] (SPARK-31632) The ApplicationInfo in KVStore may be accessed before it's prepared

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31632:


Assignee: Apache Spark

> The ApplicationInfo in KVStore may be accessed before it's prepared
> ---
>
> Key: SPARK-31632
> URL: https://issues.apache.org/jira/browse/SPARK-31632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Xingcan Cui
>Assignee: Apache Spark
>Priority: Minor
>
> While starting some local tests, I occasionally encountered the following 
> exceptions for Web UI.
> {noformat}
> 23:00:29.845 WARN org.eclipse.jetty.server.HttpChannel: /jobs/
>  java.util.NoSuchElementException
>  at java.util.Collections$EmptyIterator.next(Collections.java:4191)
>  at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:467)
>  at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:39)
>  at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:266)
>  at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:89)
>  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>  at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>  at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.eclipse.jetty.server.Server.handle(Server.java:505)
>  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>  at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>  at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
>  at java.lang.Thread.run(Thread.java:748){noformat}
> *Reason*
>  That is because {{AppStatusStore.applicationInfo()}} accesses an empty view 
> (iterator) returned by {{InMemoryStore}}.
> AppStatusStore
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
> store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
> }
> {code}
> InMemoryStore
> {code:java}
> public  KVStoreView view(Class type){
> InstanceList list = inMemoryLists.get(type);
> return list != null ? list.view() : emptyView();
>  }
> {code}
> During the initialization of {{SparkContext}}, it first starts the Web UI 
> (SparkContext: L475 _ui.foreach(_.bind())) and then setup the 
> {{LiveListenerBus}} thread (SparkContext: L608 
> {{setupAndStartListenerBus()}}) for dispatching the 
> {{SparkListenerApplicationStart}} event (which will trigger writing the 
> requested {{ApplicationInfo}} to {{InMemoryStore}}).
> *Solution*
>  Since the {{applicationInfo()}} method is expected to always return a valid 
> {{ApplicationInfo}}, maybe we can add a while-loop-check here to guarantee 
> the availability of {{ApplicationInfo}}.
> {code:java}
> def applicationInfo(): v1.ApplicationInfo = {
>  var iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator()
>  while (!iterator.hasNext){ 
> Thread.sleep(20) 
> iterator = store.view(classOf[ApplicationInfoWrapper]).max(1).iterator() 
>   }
>   iterator.next().info
> }
> {code}



--

[jira] [Commented] (SPARK-31374) Returning complex types in Pandas UDF

2020-05-03 Thread Oleksii Kachaiev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098609#comment-17098609
 ] 

Oleksii Kachaiev commented on SPARK-31374:
--

[~hoeze] do you have an example of the code of the {{spark_window_overlap}} 
function?

> Returning complex types in Pandas UDF
> -
>
> Key: SPARK-31374
> URL: https://issues.apache.org/jira/browse/SPARK-31374
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: F. H.
>Priority: Major
>  Labels: features
>
> I would like to return a complex type in a GROUPED_AGG operation:
> {code:python}
> window_overlap_schema = t.StructType([
>  t.StructField("counts", t.ArrayType(t.LongType())),
>  t.StructField("starts", t.ArrayType(t.LongType())),
>  t.StructField("ends", t.ArrayType(t.LongType())),
> ])
> @f.pandas_udf(window_overlap_schema, f.PandasUDFType.GROUPED_AGG)
> def spark_window_overlap([...]):
> [...]
> {code}
> However, I get the following error when trying to run this:
> {code:python}
> NotImplementedError: Invalid returnType with grouped aggregate Pandas UDFs: 
> StructType(List(StructField(counts,ArrayType(LongType,true),true),StructField(starts,ArrayType(LongType,true),true),StructField(ends,ArrayType(LongType,true),true)))
>  is not supported
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098644#comment-17098644
 ] 

Apache Spark commented on SPARK-31212:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/28445

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Priority: Major
>
> '1000-02-29' is a valid date in the Julian calendar, which Spark 2.4.5 uses for 
> dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> ++
> |date|
> ++
> |null|
> ++
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-05-03 Thread Rajeev Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098651#comment-17098651
 ] 

Rajeev Kumar commented on SPARK-26385:
--

Sorry for coming late to this thread. I was doing some testing and found one 
issue in HadoopFSDelegationTokenProvider. I might be wrong, so please validate 
the analysis below.

Spark does not renew the token; rather, it creates a new token at the scheduled 
interval.

Spark needs two HDFS_DELEGATION_TOKENs: one for the resource manager and one for 
the application user.

 
{code:java}
val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)

// Get the token renewal interval if it is not set. It will only be called once.
if (tokenRenewalInterval == null) {
  tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
}
{code}
On the first call to obtainDelegationTokens, it creates the two tokens correctly.

The token for the resource manager is created by the fetchDelegationTokens method.

The token for the application user is created inside the getTokenRenewalInterval 
method.
{code:java}
// Code snippet
private var tokenRenewalInterval: Option[Long] = null
{code}
{code:java}
sparkConf.get(PRINCIPAL).flatMap { renewer => 
  val creds = new Credentials() 
  fetchDelegationTokens(renewer, filesystems, creds) 
{code}
But after 18 hours (or whatever the renewal period is), when the scheduled thread 
of AMCredentialRenewer tries to create the HDFS_DELEGATION_TOKENs, it creates only 
one token (for the resource manager, as a result of the call to the 
fetchDelegationTokens method).

It does not create the HDFS_DELEGATION_TOKEN for the application user, because 
tokenRenewalInterval is NOT null this time. Hence, after the HDFS_DELEGATION_TOKEN 
expires (typically 24 hours), Spark fails to update the checkpointing directory and 
the job dies.

As part of my testing, I simply called getTokenRenewalInterval in the else block as 
well, and the job is running fine; it did not die after 24 hours.
{code:java}
if (tokenRenewalInterval == null) {
  // I put this custom log
  logInfo("Token Renewal interval is null. Calling getTokenRenewalInterval "
    + getTokenRenewer(hadoopConf))
  tokenRenewalInterval =
    getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
} else {
  // I put this custom log
  logInfo("Token Renewal interval is NOT null. Calling getTokenRenewalInterval "
    + getTokenRenewer(hadoopConf))
  getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
}
{code}
The relevant logs are:
{code:java}
20/05/01 14:36:19 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is null. Calling getTokenRenewalInterval rm/host:port
20/05/02 08:36:42 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 02:37:00 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 20:37:18 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
{code}
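To make the pattern concrete, here is a small self-contained sketch (hypothetical 
names, not the actual Spark code or an official patch) of the decoupling described 
above: the per-run side effect (fetching the application user's token) runs on 
every scheduled invocation, while the renewal-interval computation stays cached 
after the first call.
{code:java}
object TokenFetchSketch {
  // Mirrors the role of `tokenRenewalInterval` in the description above.
  private var cachedRenewalInterval: Option[Long] = null

  private def fetchUserToken(): Unit =
    println("fetched HDFS_DELEGATION_TOKEN for the application user")

  private def computeRenewalInterval(): Option[Long] =
    Some(18L * 60 * 60 * 1000) // e.g. 18 hours, purely illustrative

  def obtainTokens(): Unit = {
    // The side effect is performed unconditionally, on every scheduled run ...
    fetchUserToken()
    // ... while only the interval computation is guarded and cached.
    if (cachedRenewalInterval == null) {
      cachedRenewalInterval = computeRenewalInterval()
    }
  }

  def main(args: Array[String]): Unit = {
    obtainTokens() // first scheduled run
    obtainTokens() // later runs still fetch the user token
  }
}
{code}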
 

 

> YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in 
> cache
> ---
>
> Key: SPARK-26385
> URL: https://issues.apache.org/jira/browse/SPARK-26385
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
> Environment: Hadoop 2.6.0, Spark 2.4.0
>Reporter: T M
>Priority: Major
>
>  
> Hello,
>  
> I have a Spark Structured Streaming job which is running on YARN (Hadoop 2.6.0, 
> Spark 2.4.0). After 25-26 hours, my job stops working with the following error:
> {code:java}
> 2018-12-16 22:35:17 ERROR 
> org.apache.spark.internal.Logging$class.logError(Logging.scala:91): Query 
> TestQuery[id = a61ce197-1d1b-4e82-a7af-60162953488b, runId = 
> a56878cf-dfc7-4f6a-ad48-02cf738ccc2f] terminated with error 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for REMOVED: HDFS_DELEGATION_TOKEN owner=REMOVED, renewer=yarn, 
> realUser=, issueDate=1544903057122, maxDate=1545507857122, 
> sequenceNumber=10314, masterKeyId=344) can't be found in cache at 
> org.apache.hadoop.ipc.Client.call(Client.java:1470) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1401) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>  at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>  at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) a

[jira] [Created] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31633:
-

 Summary: Upgrade SLF4J from 1.7.16 to 1.7.30
 Key: SPARK-31633
 URL: https://issues.apache.org/jira/browse/SPARK-31633
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-05-03 Thread Rajeev Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098651#comment-17098651
 ] 

Rajeev Kumar edited comment on SPARK-26385 at 5/4/20, 4:44 AM:
---

Sorry for coming late on this thread. I was doing some testing. I found one 
issue in HadoopFSDelegationTokenProvider. I might be wrong also. Please 
validate below.

Spark does not renew the token rather it creates the token at the scheduled 
interval.

Spark needs two HDFS_DELEGATION_TOKEN. One for resource manager and second for 
application user.

 
{code:java}
val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), 
fsToGetTokens, creds) 
// Get the token renewal interval if it is not set. It will only be called 
once. 
if (tokenRenewalInterval == null) { 
 tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, 
fsToGetTokens) 
}
{code}
At the first call to the obtainDelegationTokens it creates TWO tokens correctly.

Token for resource manager is getting created by method fetchDelegationTokens.

Token for application user is getting created inside getTokenRenewalInterval 
method.
{code:java}
// Code snippet
private var tokenRenewalInterval: Option[Long] = null
{code}
{code:java}
sparkConf.get(PRINCIPAL).flatMap { renewer => 
  val creds = new Credentials() 
  fetchDelegationTokens(renewer, filesystems, creds) 
{code}
But after 18 hours or whatever the renewal period when scheduled thread of 
AMCredentialRenewer tries to create HDFS_DELEFATION_TOKEN, it creates only one 
token (for resource manager as result of call to fetchDelegationTokens method ).

But it does not create HDFS_DELEFATION_TOKEN for application user because 
tokenRenewalInterval is NOT NULL this time. Hence after expiration of 
HDFS_DELEFATION_TOKEN (typically 24 hrs) spark fails to update the spark 
checkpointing directory and job dies.

As part of my testing, I just called getTokenRenewalInterval in else block and 
job is running fine. It did not die after 24 hrs.
{code:java}
if (tokenRenewalInterval == null) {
// I put this custom log
  logInfo("Token Renewal interval is null. Calling getTokenRenewalInterval "
+ getTokenRenewer(hadoopConf))
  tokenRenewalInterval =
getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
} else {
// I put this custom log
  logInfo("Token Renewal interval is NOT null. Calling getTokenRenewalInterval "
+ getTokenRenewer(hadoopConf))
  getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
}
{code}
 Logs are -
{code:java}
20/05/01 14:36:19 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is null. Calling getTokenRenewalInterval rm/host:port
20/05/02 08:36:42 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 02:37:00 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 20:37:18 INFO HadoopFSDelegationTokenProvider: Token Renewal interval 
is NOT null. Calling getTokenRenewalInterval rm/host:port
{code}
 

 [~kabhwan] Let me know if I need to create a new ticket.


was (Author: rajeevkumar):
Sorry for coming late on this thread. I was doing some testing. I found one 
issue in HadoopFSDelegationTokenProvider. I might be wrong also. Please 
validate below.

Spark does not renew the token rather it creates the token at the scheduled 
interval.

Spark needs two HDFS_DELEGATION_TOKEN. One for resource manager and second for 
application user.

 
{code:java}
val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), 
fsToGetTokens, creds) 
// Get the token renewal interval if it is not set. It will only be called 
once. 
if (tokenRenewalInterval == null) { 
 tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, 
fsToGetTokens) 
}
{code}
At the first call to the obtainDelegationTokens it creates TWO tokens correctly.

Token for resource manager is getting created by method fetchDelegationTokens.

Token for application user is getting created inside getTokenRenewalInterval 
method.
{code:java}
// Code snippet
private var tokenRenewalInterval: Option[Long] = null
{code}
{code:java}
sparkConf.get(PRINCIPAL).flatMap { renewer => 
  val creds = new Credentials() 
  fetchDelegationTokens(renewer, filesystems, creds) 
{code}
But after 18 hours or whatever the renewal period when scheduled thread of 
AMCredentialRenewer tries to create HDFS_DELEFATION_TOKEN, it creates only one 
token (for resource manager as result of call to fetchDelegationTokens method ).

But it does not create HDFS_DELEFATION_TOKEN for application user because 
tokenRenewalInterval is NOT NULL this time. Hence after expiration of 
HDFS_DELEFATION_TOKEN (typically 24 hrs) spark fails to update the spark 
checkpointing directory and job dies.

As part of my testing, I just called getTokenRenewalInterval i

[jira] [Commented] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098654#comment-17098654
 ] 

Apache Spark commented on SPARK-31633:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28446

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31633:


Assignee: Apache Spark

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31633:


Assignee: (was: Apache Spark)

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098653#comment-17098653
 ] 

Apache Spark commented on SPARK-31633:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28446

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31633:
--
Description: 
SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
under Java 9. Also, this will bring all latest bug fixes.
- http://www.slf4j.org/news.html

{quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
parse the "java.version" system property. Assuming an inccorect Java version, 
it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
shipping in this release fixes the issue by tweaking MDC internals by 
reflection, allowing log4j to run under Java 9. See also SLF4J-393.
Fixed issue EventRecodingLogger not saving marker data in the event. This issue 
was reported in SLF4J-379 by Manish Soni with Jonas Neukomm providing the 
relevant PR.
The slf4j-simple module now uses the latest reference to System.out or 
System.err. In previous releases the reference was set at the beginning and 
re-used. This change fixes SLF4J-389 reported by Igor Polevoy.{quote}

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
> under Java 9. Also, this will bring all latest bug fixes.
> - http://www.slf4j.org/news.html
> {quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
> parse the "java.version" system property. Assuming an inccorect Java version, 
> it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
> shipping in this release fixes the issue by tweaking MDC internals by 
> reflection, allowing log4j to run under Java 9. See also SLF4J-393.
> Fixed issue EventRecodingLogger not saving marker data in the event. This 
> issue was reported in SLF4J-379 by Manish Soni with Jonas Neukomm providing 
> the relevant PR.
> The slf4j-simple module now uses the latest reference to System.out or 
> System.err. In previous releases the reference was set at the beginning and 
> re-used. This change fixes SLF4J-389 reported by Igor Polevoy.{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31633:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28446)

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
> under Java 9. Also, this will bring all latest bug fixes.
> - http://www.slf4j.org/news.html
> {quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
> parse the "java.version" system property. Assuming an inccorect Java version, 
> it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
> shipping in this release fixes the issue by tweaking MDC internals by 
> reflection, allowing log4j to run under Java 9. See also SLF4J-393.
> Fixed issue EventRecodingLogger not saving marker data in the event. This 
> issue was reported in SLF4J-379 by Manish Soni with Jonas Neukomm providing 
> the relevant PR.
> The slf4j-simple module now uses the latest reference to System.out or 
> System.err. In previous releases the reference was set at the beginning and 
> re-used. This change fixes SLF4J-389 reported by Igor Polevoy.{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31634) "show tables like" support for SQL wildcard characters (% and _)

2020-05-03 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-31634:
---

 Summary: "show tables like" support for SQL wildcard characters (% 
and _)
 Key: SPARK-31634
 URL: https://issues.apache.org/jira/browse/SPARK-31634
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang


https://docs.snowflake.com/en/sql-reference/sql/show-tables.html
https://clickhouse.tech/docs/en/sql-reference/statements/show/
https://www.mysqltutorial.org/mysql-show-tables/
https://issues.apache.org/jira/browse/HIVE-23359
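
For illustration, a minimal self-contained sketch (an assumption about the requested 
semantics based on the links above, not Spark code) of how SQL LIKE wildcards could 
be matched against table names: '%' matches any sequence of characters and '_' 
matches exactly one.
{code:java}
object SqlLikeToRegex {
  // Translate a SQL LIKE pattern into a regular expression string.
  def toRegex(pattern: String): String =
    pattern.flatMap {
      case '%' => ".*"
      case '_' => "."
      case c if "\\.[]{}()*+?^$|".contains(c) => "\\" + c // escape regex metacharacters
      case c => c.toString
    }

  def main(args: Array[String]): Unit = {
    val tables = Seq("sales_2019", "sales_2020", "customers")
    val regex = toRegex("sales%").r
    // Prints: List(sales_2019, sales_2020)
    println(tables.filter(t => regex.pattern.matcher(t).matches()))
  }
}
{code}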



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31633:
--
Description: 
SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
under Java 9. Also, this will bring all latest bug fixes.
 - [http://www.slf4j.org/news.html]

{quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
parse the "java.version" system property. Assuming an inccorect Java version, 
it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
shipping in this release fixes the issue by tweaking MDC internals by 
reflection, allowing log4j to run under Java 9. See also SLF4J-393.
{quote}

  was:
SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
under Java 9. Also, this will bring all latest bug fixes.
- http://www.slf4j.org/news.html

{quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
parse the "java.version" system property. Assuming an inccorect Java version, 
it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
shipping in this release fixes the issue by tweaking MDC internals by 
reflection, allowing log4j to run under Java 9. See also SLF4J-393.
Fixed issue EventRecodingLogger not saving marker data in the event. This issue 
was reported in SLF4J-379 by Manish Soni with Jonas Neukomm providing the 
relevant PR.
The slf4j-simple module now uses the latest reference to System.out or 
System.err. In previous releases the reference was set at the beginning and 
re-used. This change fixes SLF4J-389 reported by Igor Polevoy.{quote}


> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
> under Java 9. Also, this will bring all latest bug fixes.
>  - [http://www.slf4j.org/news.html]
> {quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
> parse the "java.version" system property. Assuming an inccorect Java version, 
> it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
> shipping in this release fixes the issue by tweaking MDC internals by 
> reflection, allowing log4j to run under Java 9. See also SLF4J-393.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31633) Upgrade SLF4J from 1.7.16 to 1.7.30

2020-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31633:
--
Parent: SPARK-29194
Issue Type: Sub-task  (was: Bug)

> Upgrade SLF4J from 1.7.16 to 1.7.30
> ---
>
> Key: SPARK-31633
> URL: https://issues.apache.org/jira/browse/SPARK-31633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run 
> under Java 9. Also, this will bring all latest bug fixes.
>  - [http://www.slf4j.org/news.html]
> {quote}When running under Java 9, log4j version 1.2.x is unable to correctly 
> parse the "java.version" system property. Assuming an inccorect Java version, 
> it proceeded to disable its MDC functionality. The slf4j-log4j12 module 
> shipping in this release fixes the issue by tweaking MDC internals by 
> reflection, allowing log4j to run under Java 9. See also SLF4J-393.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31267) Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should not embed platform-specific constant

2020-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31267.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/28438

> Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should 
> not embed platform-specific constant
> --
>
> Key: SPARK-31267
> URL: https://issues.apache.org/jira/browse/SPARK-31267
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120363/testReport/
> {code}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> failAfter did not complete within 1 minute.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> failAfter did not complete within 1 minute.
>   at java.lang.Thread.getStackTrace(Thread.java:1559)
>   at 
> org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:234)
>   at 
> org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233)
>   at 
> org.apache.spark.deploy.SparkSubmitSuite$.failAfterImpl(SparkSubmitSuite.scala:1416)
>   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230)
>   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229)
>   at 
> org.apache.spark.deploy.SparkSubmitSuite$.failAfter(SparkSubmitSuite.scala:1416)
>   at 
> org.apache.spark.deploy.SparkSubmitSuite$.runSparkSubmit(SparkSubmitSuite.scala:1435)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.$anonfun$new$1(WholeStageCodegenSparkSubmitSuite.scala:53)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58)
>   at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
>   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
>   at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite.run(Suite.scala:1124)
>   at org.scalatest.Suite.run$(Suite.scala:1106)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
>   at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
>   at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.too

[jira] [Assigned] (SPARK-31626) Port HIVE-10415: hive.start.cleanup.scratchdir configuration is not taking effect

2020-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-31626:


Assignee: Yuming Wang

> Port HIVE-10415: hive.start.cleanup.scratchdir configuration is not taking 
> effect
> -
>
> Key: SPARK-31626
> URL: https://issues.apache.org/jira/browse/SPARK-31626
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-10415



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31626) Port HIVE-10415: hive.start.cleanup.scratchdir configuration is not taking effect

2020-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31626.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28436
[https://github.com/apache/spark/pull/28436]

> Port HIVE-10415: hive.start.cleanup.scratchdir configuration is not taking 
> effect
> -
>
> Key: SPARK-31626
> URL: https://issues.apache.org/jira/browse/SPARK-31626
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> https://issues.apache.org/jira/browse/HIVE-10415



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31606) reduce the perf regression of vectorized parquet reader caused by datetime rebase

2020-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31606.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28406
[https://github.com/apache/spark/pull/28406]

> reduce the perf regression of vectorized parquet reader caused by datetime 
> rebase
> -
>
> Key: SPARK-31606
> URL: https://issues.apache.org/jira/browse/SPARK-31606
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-05-03 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098723#comment-17098723
 ] 

Jungtaek Lim commented on SPARK-26385:
--

[~rajeevkumar]
Yes please raise a separate JIRA issue. Please make sure the new JIRA issue 
contains the information I described above.

https://issues.apache.org/jira/browse/SPARK-26385?focusedCommentId=17091075&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17091075

Please state explicitly where the log messages come from (driver, AM, or executor) 
whenever you refer to a log message. Otherwise it's non-trivial to track down.

> YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in 
> cache
> ---
>
> Key: SPARK-26385
> URL: https://issues.apache.org/jira/browse/SPARK-26385
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
> Environment: Hadoop 2.6.0, Spark 2.4.0
>Reporter: T M
>Priority: Major
>
>  
> Hello,
>  
> I have a Spark Structured Streaming job which is running on YARN (Hadoop 2.6.0, 
> Spark 2.4.0). After 25-26 hours, my job stops working with the following error:
> {code:java}
> 2018-12-16 22:35:17 ERROR 
> org.apache.spark.internal.Logging$class.logError(Logging.scala:91): Query 
> TestQuery[id = a61ce197-1d1b-4e82-a7af-60162953488b, runId = 
> a56878cf-dfc7-4f6a-ad48-02cf738ccc2f] terminated with error 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for REMOVED: HDFS_DELEGATION_TOKEN owner=REMOVED, renewer=yarn, 
> realUser=, issueDate=1544903057122, maxDate=1545507857122, 
> sequenceNumber=10314, masterKeyId=344) can't be found in cache at 
> org.apache.hadoop.ipc.Client.call(Client.java:1470) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1401) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>  at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>  at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>  at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) at 
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977) at 
> org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:133) at 
> org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1120) at 
> org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1116) at 
> org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at 
> org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1116) at 
> org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1581) at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:544)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:542)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:542)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:554)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:542)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.