[jira] [Comment Edited] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-14 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928674#comment-16928674
 ] 

feiwang edited comment on SPARK-29037 at 9/15/19 4:41 AM:
--

[~advancedxy]
Thanks for your reply.

I will learn more about dynamic partition. Thanks for your suggestion.


was (Author: hzfeiwang):
[~advancedxy]
Thanks for your reply.
I just checked the code, as shown below.
https://github.com/apache/spark/blob/c56a012bc839cd2f92c2be41faea91d1acfba4eb/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L105-L106

{code:java}
val dynamicPartitionOverwrite = enableDynamicOverwrite && mode == SaveMode.Overwrite &&
  staticPartitions.size < partitionColumns.length
{code}

When there is a single partition column (partitionColumns.length == 1) and an INSERT OVERWRITE ... PARTITION statement specifies that partition statically, dynamicPartitionOverwrite is always false even when DynamicOverwrite is enabled.
I will learn more about dynamic partition. Thanks for your suggestion.
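
For illustration, a minimal sketch of the case described above (the table and column names are hypothetical): with a single partition column that is specified statically, staticPartitions.size equals partitionColumns.length, so the condition above evaluates to false.

{code:java}
// Hypothetical example: table `t` is partitioned by a single column `dt`.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// The partition is given statically, so staticPartitions.size == partitionColumns.length
// and dynamicPartitionOverwrite evaluates to false despite the dynamic mode.
spark.sql("INSERT OVERWRITE TABLE t PARTITION (dt = '2019-09-14') SELECT value FROM src")
{code}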

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a table partition, each task of the committing stage 
> first writes its output to a staging dir; when the task completes, its output is 
> saved to its committedTaskPath, and when all tasks of the stage succeed, all task 
> output under the committedTaskPath is moved to the destination dir.
> However, if we kill an application while it is committing tasks' output, part of 
> the tasks' results stays under the committedTaskPath and is not cleaned up 
> gracefully.
> When we rerun the application, the new application reuses the same 
> committedTaskPath dir.
> When the task commit stage of the new application succeeds, all task output under 
> the committedTaskPath, including the leftover output of the old application, is 
> moved to the destination dir, so the result is duplicated.






[jira] [Updated] (SPARK-29080) Support R file extension case-insensitively

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29080:
--
Summary: Support R file extension case-insensitively  (was: Make r file 
extension check case insensitive)

> Support R file extension case-insensitively
> ---
>
> Key: SPARK-29080
> URL: https://issues.apache.org/jira/browse/SPARK-29080
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
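
As a hedged illustration of the renamed summary (this is only a sketch of a case-insensitive extension check, not the actual spark-submit code):

{code:java}
// Sketch: treat ".R" and ".r" the same by lower-casing before the suffix check.
def isRFile(path: String): Boolean =
  path.toLowerCase(java.util.Locale.ROOT).endsWith(".r")

isRFile("job.R")  // true
isRFile("job.r")  // true
{code}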







[jira] [Updated] (SPARK-29087) Use DelegatingServletContextHandler to avoid CCE

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29087:
--
Component/s: Spark Core

> Use DelegatingServletContextHandler to avoid CCE
> 
>
> Key: SPARK-29087
> URL: https://issues.apache.org/jira/browse/SPARK-29087
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Structured Streaming
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-27122 fixes a `ClassCastException` in the `yarn` module by using 
> `DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, 
> but the class path issue affects JDK8, too. This issue aims to fix the 
> `streaming` module in the same way.
> {code}
> $ build/mvn test -pl streaming
> ...
> UISeleniumSuite:
> - attaching and detaching a Streaming tab *** FAILED ***
>   java.lang.ClassCastException: 
> org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
> ...
> Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0
> *** 1 TEST FAILED ***
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> {code}






[jira] [Updated] (SPARK-29087) Use DelegatingServletContextHandler to avoid CCE

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29087:
--
Component/s: (was: Structured Streaming)
 DStreams

> Use DelegatingServletContextHandler to avoid CCE
> 
>
> Key: SPARK-29087
> URL: https://issues.apache.org/jira/browse/SPARK-29087
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-27122 fixes a `ClassCastException` in the `yarn` module by using 
> `DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, 
> but the class path issue affects JDK8, too. This issue aims to fix the 
> `streaming` module in the same way.
> {code}
> $ build/mvn test -pl streaming
> ...
> UISeleniumSuite:
> - attaching and detaching a Streaming tab *** FAILED ***
>   java.lang.ClassCastException: 
> org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
> ...
> Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0
> *** 1 TEST FAILED ***
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> {code}






[jira] [Created] (SPARK-29087) Use DelegatingServletContextHandler to avoid CCE

2019-09-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-29087:
-

 Summary: Use DelegatingServletContextHandler to avoid CCE
 Key: SPARK-29087
 URL: https://issues.apache.org/jira/browse/SPARK-29087
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.4.0, 3.0.0
Reporter: Dongjoon Hyun


SPARK-27122 fixes a `ClassCastException` in the `yarn` module by using 
`DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, 
but the class path issue affects JDK8, too. This issue aims to fix the 
`streaming` module in the same way.

{code}
$ build/mvn test -pl streaming
...
UISeleniumSuite:
- attaching and detaching a Streaming tab *** FAILED ***
  java.lang.ClassCastException: 
org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to 
org.eclipse.jetty.servlet.ServletContextHandler
...
Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
{code}






[jira] [Updated] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26989:
--
Affects Version/s: 2.4.0
   2.4.1
   2.4.2
   2.4.3
   2.4.4

> Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage 
> attempt don't trigger multiple stage retries
> ---
>
> Key: SPARK-26989
> URL: https://issues.apache.org/jira/browse/SPARK-26989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/
> {noformat}
> org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the 
> same stage attempt don't trigger multiple stage retries
> Error Message
> org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal 
> List(0)
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> ArrayBuffer() did not equal List(0)
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122)
> {noformat}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull
> {code}
> - Barrier task failures from the same stage attempt don't trigger multiple 
> stage retries *** FAILED ***
>   ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
> {code}






[jira] [Updated] (SPARK-29045) Test failed due to table already exists in SQLMetricsSuite

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29045:
--
Affects Version/s: 2.3.0
   2.4.0

> Test failed due to table already exists in SQLMetricsSuite
> --
>
> Key: SPARK-29045
> URL: https://issues.apache.org/jira/browse/SPARK-29045
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Fix For: 3.0.0
>
>
> In the method {{SQLMetricsTestUtils.testMetricsDynamicPartition()}}, there is a 
> CREATE TABLE statement that is not wrapped in a {{withTable}} block. This causes 
> a test failure if another unit test uses the same table name.
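
As a hedged sketch of the fix pattern (the table name and schema below are illustrative, not the actual suite code): inside a test suite that mixes in Spark's SQLTestUtils, wrapping the statement in {{withTable}} drops the table after the test body finishes, so another test can safely reuse the same name.

{code:java}
// Illustrative only: `withTable` drops the named table after the body runs,
// even if an assertion inside the body fails.
withTable("metrics_dyn_part") {
  sql("CREATE TABLE metrics_dyn_part(i INT, part INT) USING parquet PARTITIONED BY (part)")
  // ... insert with dynamic partitions and check the SQL metrics ...
}
{code}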






[jira] [Commented] (SPARK-29045) Test failed due to table already exists in SQLMetricsSuite

2019-09-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929829#comment-16929829
 ] 

Dongjoon Hyun commented on SPARK-29045:
---

This is backported to `branch-2.4` via 
https://github.com/apache/spark/commit/339b0f2a0c4043fca9cca52797936c8654910fc9

> Test failed due to table already exists in SQLMetricsSuite
> --
>
> Key: SPARK-29045
> URL: https://issues.apache.org/jira/browse/SPARK-29045
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> In the method {{SQLMetricsTestUtils.testMetricsDynamicPartition()}}, there is a 
> CREATE TABLE statement that is not wrapped in a {{withTable}} block. This causes 
> a test failure if another unit test uses the same table name.






[jira] [Updated] (SPARK-29045) Test failed due to table already exists in SQLMetricsSuite

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29045:
--
Fix Version/s: 2.4.5

> Test failed due to table already exists in SQLMetricsSuite
> --
>
> Key: SPARK-29045
> URL: https://issues.apache.org/jira/browse/SPARK-29045
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> In the method {{SQLMetricsTestUtils.testMetricsDynamicPartition()}}, there is a 
> CREATE TABLE statement that is not wrapped in a {{withTable}} block. This causes 
> a test failure if another unit test uses the same table name.






[jira] [Updated] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24663:
--
Fix Version/s: 2.4.5
Affects Version/s: 2.4.1
   2.4.2
   2.4.3
   2.4.4

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.






[jira] [Commented] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929824#comment-16929824
 ] 

Dongjoon Hyun commented on SPARK-24663:
---

This is backported to branch-2.4 via 
https://github.com/apache/spark/commit/637a6c2750be8d4f42b1fd11c4cca8d0067e80d8

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.






[jira] [Resolved] (SPARK-28372) Document Spark WEB UI

2019-09-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28372.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Document Spark WEB UI
> -
>
> Key: SPARK-28372
> URL: https://issues.apache.org/jira/browse/SPARK-28372
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
> Fix For: 3.0.0
>
>
> Spark web UIs are used to monitor the status and resource consumption of Spark 
> applications and clusters. However, we do not have corresponding documentation, 
> so it is hard for end users to use and understand them.






[jira] [Resolved] (SPARK-28373) Document JDBC/ODBC Server page

2019-09-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28373.
-
Fix Version/s: 3.0.0
 Assignee: Pablo Langa Blanco
   Resolution: Fixed

> Document JDBC/ODBC Server page
> --
>
> Key: SPARK-28373
> URL: https://issues.apache.org/jira/browse/SPARK-28373
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Pablo Langa Blanco
>Priority: Major
> Fix For: 3.0.0
>
>
> !https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png|width=1720,height=503!
>  
> [https://github.com/apache/spark/pull/25062] added the new columns CLOSE TIME 
> and EXECUTION TIME. It is hard to understand the difference between them, so we 
> need to document them; otherwise, end users will find them hard to understand.
>  






[jira] [Assigned] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-14 Thread Liang-Chi Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh reassigned SPARK-28927:
---

Assignee: Liang-Chi Hsieh

> ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets 
> with 12 billion instances
> ---
>
> Key: SPARK-28927
> URL: https://issues.apache.org/jira/browse/SPARK-28927
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.2.1
>Reporter: Qiang Wang
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Attachments: image-2019-09-02-11-55-33-596.png
>
>
> The stack trace is below:
> {quote}19/08/28 07:00:40 WARN Executor task launch worker for task 325074 
> BlockManager: Block rdd_10916_493 could not be removed as it was not found on 
> disk or in memory 19/08/28 07:00:41 ERROR Executor task launch worker for 
> task 325074 Executor: Exception in task 3.0 in stage 347.1 (TID 325074) 
> java.lang.ArrayIndexOutOfBoundsException: 6741 at 
> org.apache.spark.dpshade.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1460)
>  at 
> org.apache.spark.dpshade.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1440)
>  at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$40$$anonfun$apply$41.apply(PairRDDFunctions.scala:760)
>  at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$40$$anonfun$apply$41.apply(PairRDDFunctions.scala:760)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1041)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1032)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:972) at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1032) 
> at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:763) 
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:285) at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:141)
>  at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:137)
>  at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>  at scala.collection.immutable.List.foreach(List.scala:381) at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>  at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:137) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at 
> org.apache.spark.scheduler.Task.run(Task.scala:108) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:358) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {quote}
> This exception happens sometimes. We also found that the AUC metric was not 
> stable when evaluating the inner product of the user factors and the item 
> factors with the same dataset and configuration: AUC varied from 0.60 to 0.67, 
> which is not stable enough for a production environment.
> Dataset capacity: ~12 billion ratings
> Here is our code:
> val trainData = predataUser.flatMap(x => x._1._2.map(y => (x._2.toInt, y._1, y._2.toFloat)))
>   .setName(trainDataName).persist(StorageLevel.MEMORY_AND_DISK_SER)
>
> case class ALSData(user: Int, item: Int, rating: Float) extends Serializable
>
> val ratingData = trainData.map(x => ALSData(x._1, x._2, x._3)).toDF()
> val als = new ALS
> val paramMap = 

[jira] [Updated] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29046:
--
Fix Version/s: (was: 3.0.0)

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---
>
> Key: SPARK-29046
> URL: https://issues.apache.org/jira/browse/SPARK-29046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
>
> We encountered an NPE in listener code that deals with the query plan. According 
> to the stack trace below, the only possible cause of the NPE is 
> SparkContext._dagScheduler being null, which can only happen while the 
> SparkContext is stopping (unless null is set from outside).
>  
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> ui.SparkUI: Stopped Spark web UI at http://:3277019/09/11 00:22:24 INFO 
> cluster.YarnClusterSchedulerBackend: Shutting down all executors19/09/11 
> 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down19/09/11 00:22:24 INFO 
> cluster.SchedulerExtensionServices: Stopping 
> SchedulerExtensionServices(serviceOption=None, services=List(), 
> started=false)19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught 
> exception during parsing eventjava.lang.NullPointerException at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> scala.Option.map(Option.scala:146) at 
> org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at 
> org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at 
> com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>  at 
> com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
>  at 
> 

[jira] [Reopened] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread

2019-09-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-29046:
---

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---
>
> Key: SPARK-29046
> URL: https://issues.apache.org/jira/browse/SPARK-29046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> We encountered an NPE in listener code that deals with the query plan. According 
> to the stack trace below, the only possible cause of the NPE is 
> SparkContext._dagScheduler being null, which can only happen while the 
> SparkContext is stopping (unless null is set from outside).
>  
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> ui.SparkUI: Stopped Spark web UI at http://:3277019/09/11 00:22:24 INFO 
> cluster.YarnClusterSchedulerBackend: Shutting down all executors19/09/11 
> 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down19/09/11 00:22:24 INFO 
> cluster.SchedulerExtensionServices: Stopping 
> SchedulerExtensionServices(serviceOption=None, services=List(), 
> started=false)19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught 
> exception during parsing eventjava.lang.NullPointerException at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> scala.Option.map(Option.scala:146) at 
> org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at 
> org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at 
> com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>  at 
> com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
>  at 
> 

[jira] [Commented] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread

2019-09-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929765#comment-16929765
 ] 

Dongjoon Hyun commented on SPARK-29046:
---

This is reverted in order to recover Jenkins jobs.

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---
>
> Key: SPARK-29046
> URL: https://issues.apache.org/jira/browse/SPARK-29046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> We encountered an NPE in listener code that deals with the query plan. According 
> to the stack trace below, the only possible cause of the NPE is 
> SparkContext._dagScheduler being null, which can only happen while the 
> SparkContext is stopping (unless null is set from outside).
>  
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> server.AbstractConnector: Stopped 
> Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO 
> ui.SparkUI: Stopped Spark web UI at http://:3277019/09/11 00:22:24 INFO 
> cluster.YarnClusterSchedulerBackend: Shutting down all executors19/09/11 
> 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each 
> executor to shut down19/09/11 00:22:24 INFO 
> cluster.SchedulerExtensionServices: Stopping 
> SchedulerExtensionServices(serviceOption=None, services=List(), 
> started=false)19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught 
> exception during parsing eventjava.lang.NullPointerException at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> scala.Option.map(Option.scala:146) at 
> org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at 
> org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at 
> com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>  at 
> com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> 

[jira] [Updated] (SPARK-29086) Use added jar's class as Serde class, SparkGetColumnsOperation return empty columns

2019-09-14 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-29086:
--
Summary: Use added jar's class as Serde class, SparkGetColumnsOperation 
return empty columns  (was: In jdk11, SparkGetColumnsOperation return empty 
columns)

> Use added jar's class as Serde class, SparkGetColumnsOperation return empty 
> columns
> ---
>
> Key: SPARK-29086
> URL: https://issues.apache.org/jira/browse/SPARK-29086
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> In JDK 11, we create a table whose SerDe class comes from a jar added with the 
> 'ADD JAR' SQL command. After the Thrift server is restarted, !columns table_name 
> returns an empty list of columns:
> {code:java}
> 0: jdbc:hive2://localhost:1/default> add jar 
> /Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar;
> INFO  : Added 
> [/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
>  to class path
> INFO  : Added resources: 
> [/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
> +-+
> | result  |
> +-+
> +-+
> No rows selected (0.268 seconds)
> 0: jdbc:hive2://localhost:1/default> CREATE TABLE addJar18(key string) 
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
> +-+
> | Result  |
> +-+
> +-+
> No rows selected (0.444 seconds)
> 0: jdbc:hive2://localhost:1/default> !columns addJar18
> ++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
> | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
> TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX 
>  | NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
> CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
> SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
> ++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
> | NULL   | default  | addjar18| key  | 12 | 
> STRING | NULL | NULL   | NULL| NULL   
>  | 1 |  | NULL| NULL   | NULL  | 
> NULL   | NULL  | YES  | NULL   | NULL 
>  | NULL | NULL  | NO |
> ++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
> 0: jdbc:hive2://localhost:1/default> exit
> {code}
> Then we restart the Spark Thrift server and reconnect to it:
> {code:java}
> 0: jdbc:hive2://localhost:1/default> select * from addJar18;
> Error: Error running query: java.lang.RuntimeException: 
> java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe 
> (state=,code=0)
> 0: jdbc:hive2://localhost:1/default> !columns addJar18
> ++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
> | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
> TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX 
>  | NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
> CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
> SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
> 

[jira] [Updated] (SPARK-29086) In jdk11, SparkGetColumnsOperation return empty columns

2019-09-14 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-29086:
--
Description: 
In JDK 11, we create a table whose SerDe class comes from a jar added with the 
'ADD JAR' SQL command. After the Thrift server is restarted, !columns table_name 
returns an empty list of columns:
{code:java}
0: jdbc:hive2://localhost:1/default> add jar 
/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar;
INFO  : Added 
[/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
 to class path
INFO  : Added resources: 
[/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
+-+
| result  |
+-+
+-+
No rows selected (0.268 seconds)
0: jdbc:hive2://localhost:1/default> CREATE TABLE addJar18(key string) ROW 
FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
+-+
| Result  |
+-+
+-+
No rows selected (0.444 seconds)
0: jdbc:hive2://localhost:1/default> !columns addJar18
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  
| NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| NULL   | default  | addjar18| key  | 12 | STRING  
   | NULL | NULL   | NULL| NULL| 1  
   |  | NULL| NULL   | NULL  | NULL 
  | NULL  | YES  | NULL   | NULL  | 
NULL | NULL  | NO |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
0: jdbc:hive2://localhost:1/default> exit
{code}
Then we restart the Spark Thrift server and reconnect to it:
{code:java}
0: jdbc:hive2://localhost:1/default> select * from addJar18;
Error: Error running query: java.lang.RuntimeException: 
java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe 
(state=,code=0)
0: jdbc:hive2://localhost:1/default> !columns addJar18
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  
| NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
0: jdbc:hive2://localhost:1/default> add jar 

[jira] [Updated] (SPARK-29086) In jdk11, SparkGetColumnsOperation return empty columns

2019-09-14 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-29086:
--
Description: 
In JDK 11, we create a table whose SerDe class comes from a jar added with the 
'ADD JAR' SQL command. After the Thrift server is restarted, !columns table_name 
returns an empty list of columns:
{code:java}
0: jdbc:hive2://localhost:1/default> add jar 
/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar;
INFO  : Added 
[/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
 to class path
INFO  : Added resources: 
[/Users/angerszhu/.m2/repository/org/apache/hive/hcatalog/hive-hcatalog-core/2.3.6/hive-hcatalog-core-2.3.6.jar]
+-+
| result  |
+-+
+-+
No rows selected (0.268 seconds)
0: jdbc:hive2://localhost:1/default> CREATE TABLE addJar18(key string) ROW 
FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
+-+
| Result  |
+-+
+-+
No rows selected (0.444 seconds)
0: jdbc:hive2://localhost:1/default> !columns addJar18
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  
| NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| NULL   | default  | addjar18| key  | 12 | STRING  
   | NULL | NULL   | NULL| NULL| 1  
   |  | NULL| NULL   | NULL  | NULL 
  | NULL  | YES  | NULL   | NULL  | 
NULL | NULL  | NO |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
0: jdbc:hive2://localhost:1/default> exit
{code}
Then we restart the Spark Thrift server and reconnect to it:
{code:java}
0: jdbc:hive2://localhost:1/default> select * from addJar18;
Error: Error running query: java.lang.RuntimeException: 
java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe 
(state=,code=0)
0: jdbc:hive2://localhost:1/default> !columns addJar18
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | 
TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  
| NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | 
CHAR_OCTET_LENGTH  | ORDINAL_POSITION  | IS_NULLABLE  | SCOPE_CATALOG  | 
SCOPE_SCHEMA  | SCOPE_TABLE  | SOURCE_DATA_TYPE  | IS_AUTO_INCREMENT  |
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
++--+-+--+++--++-+-+---+--+-++---++---+--++---+--+---++
0: jdbc:hive2://localhost:1/default> add jar 

[jira] [Created] (SPARK-29086) In jdk11, SparkGetColumnsOperation return empty columns

2019-09-14 Thread angerszhu (Jira)
angerszhu created SPARK-29086:
-

 Summary: In jdk11, SparkGetColumnsOperation return empty columns
 Key: SPARK-29086
 URL: https://issues.apache.org/jira/browse/SPARK-29086
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: angerszhu









[jira] [Created] (SPARK-29085) spark jdbc query to oracle "__SPARK_GEN_JDBC_SUBQUERY_NAME_" "__" not suport by ORACLE

2019-09-14 Thread leookok (Jira)
leookok created SPARK-29085:
---

 Summary: spark jdbc query to oracle 
"__SPARK_GEN_JDBC_SUBQUERY_NAME_" "__" not suport by ORACLE
 Key: SPARK-29085
 URL: https://issues.apache.org/jira/browse/SPARK-29085
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: leookok


When I use Spark SQL with a JDBC connection to query data from Oracle,

 
{code:java}
Dataset<Row> jdbcDF = sqlContext.read()
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@192.168.2.3/orcltest11g")
    .option("query", "select * from tdb.user u")
    .option("user", "tdb")
    .option("password", "tdb")
    .load();
{code}
the following exception is thrown:

 

 
{code:java}
Exception in thread "main" java.sql.SQLSyntaxErrorException: ORA-00911: invalid character
{code}
 

In debug mode, the generated query is:

 
{code:java}
(select * from tdb.user u) __SPARK_GEN_JDBC_SUBQUERY_NAME_0
{code}
Copying this SQL into an Oracle client shell returns the same error: the generated alias __SPARK_GEN_JDBC_SUBQUERY_NAME_0 starts with "__", which Oracle does not accept as an identifier.
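
A possible workaround, shown here only as a hedged sketch (untested, written in Scala although the snippet above is Java, and the alias "sparkq" is made up): the "dbtable" option lets the caller supply the subquery alias, so Spark does not generate the __SPARK_GEN_JDBC_SUBQUERY_NAME_0 alias that Oracle rejects.

{code:java}
// Hedged sketch: pass the subquery plus an explicit alias via "dbtable"
// instead of using the "query" option, so no __-prefixed alias is generated.
val jdbcDF = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@192.168.2.3/orcltest11g")
  .option("dbtable", "(select * from tdb.user u) sparkq")
  .option("user", "tdb")
  .option("password", "tdb")
  .load()
{code}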

 

 

 

 


