[jira] [Updated] (SPARK-24352) Flaky test: StandaloneDynamicAllocationSuite

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-24352:

Issue Type: Test  (was: Bug)

> Flaky test: StandaloneDynamicAllocationSuite
> 
>
> Key: SPARK-24352
> URL: https://issues.apache.org/jira/browse/SPARK-24352
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> From jenkins:
> [https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/job/spark-branch-2.3-test-maven-hadoop-2.6/384/testReport/junit/org.apache.spark.deploy/StandaloneDynamicAllocationSuite/executor_registration_on_a_blacklisted_host_must_fail/]
>  
> {noformat}
> Error Message
> There is already an RpcEndpoint called CoarseGrainedScheduler
> Stacktrace
>   java.lang.IllegalArgumentException: There is already an RpcEndpoint 
> called CoarseGrainedScheduler
>   at 
> org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:71)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:130)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.createDriverEndpointRef(CoarseGrainedSchedulerBackend.scala:396)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:391)
>   at 
> org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:61)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:512)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495)
>   at 
> org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> {noformat}
> This actually looks like a previous test leaving some state running and 
> causing this one to fail.
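A minimal sketch (assumptions: not the suite's actual fixture, names are illustrative) of the kind of per-test cleanup that avoids the duplicate-endpoint error, since stopping the SparkContext shuts down the scheduler backend that registered the CoarseGrainedScheduler endpoint:

{code:lang=scala}
import org.apache.spark.{SparkConf, SparkContext}

// Ensure each test stops its SparkContext so the driver's CoarseGrainedScheduler
// RpcEndpoint does not leak into the next test.
def withFreshContext(conf: SparkConf)(body: SparkContext => Unit): Unit = {
  val sc = new SparkContext(conf)
  try {
    body(sc)
  } finally {
    sc.stop()  // shuts down the scheduler backend and unregisters its RpcEndpoint
  }
}
{code}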






[jira] [Updated] (SPARK-28535) Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28535:

Issue Type: Test  (was: Bug)

> Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"
> ---
>
> Key: SPARK-28535
> URL: https://issues.apache.org/jira/browse/SPARK-28535
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.3, 3.0.0, 2.4.3
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> This is the same flakiness as in SPARK-23881, except the fix there didn't 
> really take, at least on our build machines.
> {noformat}
> org.scalatest.exceptions.TestFailedException: 1 was not less than 1
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
> {noformat}
> Since that bug is short on explanations, the issue is that there's a race 
> between the thread posting the "stage completed" event to the listener (which 
> unblocks the test) and the thread killing the task in the executor. If the 
> event arrives first, it will unblock task execution, and there's a chance that 
> all elements will actually be processed before the executor has a chance to 
> stop the task.
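A rough sketch of the kind of synchronization that removes such a race (variable names and structure are illustrative, not the actual JobCancellationSuite code): the task blocks on a semaphore that the test releases only after the cancellation has been issued, so the task cannot consume all elements first. Keeping the semaphores in a static object is what makes them shared in local mode, where tasks run in the same JVM.

{code:lang=scala}
import java.util.concurrent.Semaphore
import org.apache.spark.SparkContext

// Kept in an object so local-mode tasks (same JVM) see the same instances;
// a closure-captured semaphore would be a deserialized copy.
object TestSync {
  val taskStarted = new Semaphore(0)
  val taskCanProceed = new Semaphore(0)
}

def interruptibleIteratorSketch(sc: SparkContext): Unit = {
  val job = sc.parallelize(1 to 1000, 1).mapPartitions { iter =>
    TestSync.taskStarted.release()
    TestSync.taskCanProceed.acquire()   // hold the task until cancellation was issued
    iter
  }.countAsync()

  TestSync.taskStarted.acquire()        // wait until the task is really running
  job.cancel()                          // issue the cancellation first ...
  TestSync.taskCanProceed.release()     // ... only then let the task proceed
}
{code}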






[jira] [Updated] (SPARK-28418) Flaky Test: pyspark.sql.tests.test_dataframe: test_query_execution_listener_on_collect

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28418:

Issue Type: Test  (was: Bug)

> Flaky Test: pyspark.sql.tests.test_dataframe: 
> test_query_execution_listener_on_collect
> --
>
> Key: SPARK-28418
> URL: https://issues.apache.org/jira/browse/SPARK-28418
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> {code}
> ERROR [0.164s]: test_query_execution_listener_on_collect 
> (pyspark.sql.tests.test_dataframe.QueryExecutionListenerTests)
> --
> Traceback (most recent call last):
>   File "/home/jenkins/python/pyspark/sql/tests/test_dataframe.py", line 758, 
> in test_query_execution_listener_on_collect
> "The callback from the query execution listener should be called after 
> 'collect'")
> AssertionError: The callback from the query execution listener should be 
> called after 'collect'
> {code}
> It seems this can fail because the test does not wait for the events to be processed.
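A hedged sketch (in Scala rather than the PySpark test, and not the actual fix) of making the assertion wait for the listener callback instead of checking immediately after collect():

{code:lang=scala}
import java.util.concurrent.{CountDownLatch, TimeUnit}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().master("local[2]").appName("qel-wait").getOrCreate()

// Count down when the callback fires (success or failure), then wait with a
// timeout instead of asserting right after collect().
val called = new CountDownLatch(1)
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    called.countDown()
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    called.countDown()
})

spark.range(1).collect()
assert(called.await(10, TimeUnit.SECONDS),
  "The callback from the query execution listener should be called after 'collect'")
{code}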






[jira] [Updated] (SPARK-28335) Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28335:

Issue Type: Test  (was: Bug)

> Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset 
> recovery from kafka
> -
>
> Key: SPARK-28335
> URL: https://issues.apache.org/jira/browse/SPARK-28335
> Project: Spark
>  Issue Type: Test
>  Components: DStreams, Tests
>Affects Versions: 2.1.3, 2.2.3, 2.3.3, 3.0.0, 2.4.3
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
> Attachments: bad.log
>
>
> {code:java}
> org.scalatest.exceptions.TestFailedException: {} was empty
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply$mcV$sp(DirectKafkaStreamSuite.scala:466)
>   at 
> org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply(DirectKafkaStreamSuite.scala:416)
>   at 
> org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply(DirectKafkaStreamSuite.scala:416)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at or
> {code}






[jira] [Updated] (SPARK-28357) Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling compressed

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28357:

Issue Type: Test  (was: Bug)

> Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling 
> compressed
> 
>
> Key: SPARK-28357
> URL: https://issues.apache.org/jira/browse/SPARK-28357
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107553/testReport/org.apache.spark.util/FileAppenderSuite/rolling_file_appender___size_based_rolling__compressed_/






[jira] [Updated] (SPARK-24898) Adding spark.checkpoint.compress to the docs

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-24898:

Issue Type: Improvement  (was: Task)

> Adding spark.checkpoint.compress to the docs
> 
>
> Key: SPARK-24898
> URL: https://issues.apache.org/jira/browse/SPARK-24898
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.2.0
>Reporter: Riccardo Corbella
>Assignee: Sandeep
>Priority: Trivial
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> Parameter *spark.checkpoint.compress* is not listed under configuration 
> properties.
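For reference, a usage sketch of the property being documented (app name and checkpoint path are placeholders):

{code:lang=scala}
import org.apache.spark.sql.SparkSession

// spark.checkpoint.compress controls whether RDD checkpoint data is compressed
// using the codec configured by spark.io.compression.codec.
val spark = SparkSession.builder()
  .appName("checkpoint-compress-example")        // placeholder name
  .master("local[2]")
  .config("spark.checkpoint.compress", "true")
  .getOrCreate()

spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  // placeholder path
val rdd = spark.sparkContext.parallelize(1 to 100)
rdd.checkpoint()
rdd.count()   // materializes the (compressed) checkpoint
{code}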






[jira] [Updated] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28261:

Issue Type: Test  (was: Bug)

> Flaky test: 
> org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
> ---
>
> Key: SPARK-28261
> URL: https://issues.apache.org/jira/browse/SPARK-28261
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 3.0.0, 2.4.3
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> Error message:
> {noformat}
> java.lang.AssertionError: expected:<3> but was:<4>
> ...{noformat}






[jira] [Updated] (SPARK-28247) Flaky test: "query without test harness" in ContinuousSuite

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28247:

Issue Type: Test  (was: Bug)

> Flaky test: "query without test harness" in ContinuousSuite
> ---
>
> Key: SPARK-28247
> URL: https://issues.apache.org/jira/browse/SPARK-28247
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.4, 3.0.0
>
>
> This test has failed a few times in some PRs, and it is also easy to reproduce 
> locally. Example of a failure:
> {noformat}
>  [info] - query without test harness *** FAILED *** (2 seconds, 931 
> milliseconds)
> [info]   scala.Predef.Set.apply[Int](0, 1, 2, 
> 3).map[org.apache.spark.sql.Row, 
> scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => 
> org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row])
>  was false
> (ContinuousSuite.scala:226){noformat}






[jira] [Updated] (SPARK-28713) Bump checkstyle from 8.14 to 8.23

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28713:

Issue Type: Improvement  (was: Task)

> Bump checkstyle from 8.14 to 8.23
> -
>
> Key: SPARK-28713
> URL: https://issues.apache.org/jira/browse/SPARK-28713
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.4.4, 3.0.0
>
>
> From the GitHub Security Advisory Database:
> Moderate severity vulnerability that affects com.puppycrawl.tools:checkstyle
> Checkstyle prior to 8.18 loads external DTDs by default, which can 
> potentially lead to denial of service attacks or the leaking of confidential 
> information.
> Affected versions: < 8.18






[jira] [Updated] (SPARK-27596) The JDBC 'query' option doesn't work for Oracle database

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-27596:

Issue Type: Bug  (was: Improvement)

> The JDBC 'query' option doesn't work for Oracle database
> 
>
> Key: SPARK-27596
> URL: https://issues.apache.org/jira/browse/SPARK-27596
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: Xiao Li
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 2.4.4, 3.0.0
>
>
> For the JDBC option `query`, we generate a subquery alias that starts with an 
> underscore: s"(${subquery}) 
> __SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}". This is not 
> supported by Oracle. 
> Oracle does not appear to allow identifiers that start with a non-alphabetic 
> character (unless they are quoted), and it also imposes length restrictions.
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements008.htm
> {code:java}
> Nonquoted identifiers must begin with an alphabetic character from your 
> database character set. Quoted identifiers can begin with any character as 
> per below documentation - 
> Nonquoted identifiers can contain only alphanumeric characters from your 
> database character set and the underscore (_), dollar sign ($), and pound 
> sign (#). Database links can also contain periods (.) and "at" signs (@). 
> Oracle strongly discourages you from using $ and # in nonquoted identifiers.
> {code}
> To make this work for Oracle, the generated alias should be fixed to drop the 
> "__" prefix (or be quoted, though it is not clear whether quoting would affect 
> other sources). The alias length should also be limited, since even after 
> removing the prefix the query hits the error below.
> {code:java}
> java.sql.SQLSyntaxErrorException: ORA-00972: identifier is too long 
> {code}
> This can be verified using the sqlfiddle link below.
> http://www.sqlfiddle.com/#!4/9bbe9a/10050
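For context, a sketch of how the `query` option is used and roughly what gets sent to the database (connection details and table names below are placeholders; the generated alias is the one quoted above):

{code:lang=scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("jdbc-query").getOrCreate()

// Roughly, Spark wraps the user query in a derived table with a generated
// alias such as __SPARK_GEN_JDBC_SUBQUERY_NAME_0, e.g.
//   SELECT * FROM (SELECT col1 FROM some_table) __SPARK_GEN_JDBC_SUBQUERY_NAME_0
// which Oracle rejects because nonquoted identifiers must start with an
// alphabetic character (and are length-limited).
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")   // placeholder
  .option("query", "SELECT col1 FROM some_table")             // placeholder
  .option("user", "scott")                                    // placeholder
  .option("password", "tiger")                                // placeholder
  .load()
{code}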






[jira] [Updated] (SPARK-28642) Hide credentials in show create table

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28642:

Issue Type: Bug  (was: Improvement)

> Hide credentials in show create table
> -
>
> Key: SPARK-28642
> URL: https://issues.apache.org/jira/browse/SPARK-28642
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.4.4, 3.0.0
>
>
> {code:sql}
> spark-sql> show create table mysql_federated_sample;
> CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, 
> `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, 
> `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` 
> STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN)
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> `url` 'jdbc:mysql://localhost/hive?user=root&password=mypasswd',
> `driver` 'com.mysql.jdbc.Driver',
> `dbtable` 'TBLS'
> )
> {code}






[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-23519:

Component/s: (was: Spark Core)

> Create View Commands Fails with  The view output (col1,col1) contains 
> duplicate column name
> ---
>
> Key: SPARK-23519
> URL: https://issues.apache.org/jira/browse/SPARK-23519
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Franck Tago
>Assignee: hemanth meka
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2018-05-10-10-48-57-259.png
>
>
> 1. Create and populate a Hive table. [I did this in a Hive CLI session, not 
> that this matters.]
> create table atable (col1 int);
> insert into atable values (10), (100);
> 2. Create a view from the table. [These actions were performed from a spark 
> shell.]
> spark.sql("create view default.aview (int1, int2) as select col1, col1 
> from atable")
>  java.lang.AssertionError: assertion failed: The view output (col1,col1) 
> contains duplicate column name.
>  at scala.Predef$.assert(Predef.scala:170)
>  at 
> org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:183)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
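A possible workaround on the affected versions (not from the original report, and not verified here) is to give the duplicated column distinct aliases in the SELECT, so the view's query output no longer contains duplicate names:

{code:lang=scala}
// Sketch only: aliasing the repeated column avoids the duplicate-name assertion
// in generateViewProperties, while the explicit view column list still renames
// the output to int1/int2.
spark.sql(
  """create view default.aview (int1, int2) as
    |select col1 as c1, col1 as c2 from atable""".stripMargin)
{code}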






[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920302#comment-16920302
 ] 

Liang-Chi Hsieh commented on SPARK-23519:
-

This was closed, then reopened and fixed. The label 
[bulk-closed|https://issues.apache.org/jira/issues/?jql=labels+%3D+bulk-closed] 
looks incorrect, so I removed it. Feel free to add it back if I have misunderstood.

 

> Create View Commands Fails with  The view output (col1,col1) contains 
> duplicate column name
> ---
>
> Key: SPARK-23519
> URL: https://issues.apache.org/jira/browse/SPARK-23519
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.2.1
>Reporter: Franck Tago
>Assignee: hemanth meka
>Priority: Major
>  Labels: bulk-closed
> Fix For: 3.0.0
>
> Attachments: image-2018-05-10-10-48-57-259.png
>
>
> 1. Create and populate a Hive table. [I did this in a Hive CLI session, not 
> that this matters.]
> create table atable (col1 int);
> insert into atable values (10), (100);
> 2. Create a view from the table. [These actions were performed from a spark 
> shell.]
> spark.sql("create view default.aview (int1, int2) as select col1, col1 
> from atable")
>  java.lang.AssertionError: assertion failed: The view output (col1,col1) 
> contains duplicate column name.
>  at scala.Predef$.assert(Predef.scala:170)
>  at 
> org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:183)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)






[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-23519:

Labels:   (was: bulk-closed)

> Create View Commands Fails with  The view output (col1,col1) contains 
> duplicate column name
> ---
>
> Key: SPARK-23519
> URL: https://issues.apache.org/jira/browse/SPARK-23519
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.2.1
>Reporter: Franck Tago
>Assignee: hemanth meka
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2018-05-10-10-48-57-259.png
>
>
> 1. Create and populate a Hive table. [I did this in a Hive CLI session, not 
> that this matters.]
> create table atable (col1 int);
> insert into atable values (10), (100);
> 2. Create a view from the table. [These actions were performed from a spark 
> shell.]
> spark.sql("create view default.aview (int1, int2) as select col1, col1 
> from atable")
>  java.lang.AssertionError: assertion failed: The view output (col1,col1) 
> contains duplicate column name.
>  at scala.Predef$.assert(Predef.scala:170)
>  at 
> org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236)
>  at 
> org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:183)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)






[jira] [Resolved] (SPARK-28612) DataSourceV2: Add new DataFrameWriter API for v2

2019-08-31 Thread Burak Yavuz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-28612.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Resolved by [https://github.com/apache/spark/pull/25354]

> DataSourceV2: Add new DataFrameWriter API for v2
> 
>
> Key: SPARK-28612
> URL: https://issues.apache.org/jira/browse/SPARK-28612
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> This tracks adding an API like the one proposed in SPARK-23521:
> {code:lang=scala}
> df.writeTo("catalog.db.table").append() // AppendData
> df.writeTo("catalog.db.table").overwriteDynamic() // 
> OverwritePartitionsDynamic
> df.writeTo("catalog.db.table").overwrite($"date" === '2019-01-01') // 
> OverwriteByExpression
> df.writeTo("catalog.db.table").partitionBy($"type", $"date").create() // CTAS
> df.writeTo("catalog.db.table").replace() // RTAS
> {code}






[jira] [Assigned] (SPARK-28612) DataSourceV2: Add new DataFrameWriter API for v2

2019-08-31 Thread Burak Yavuz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz reassigned SPARK-28612:
---

Assignee: Ryan Blue

> DataSourceV2: Add new DataFrameWriter API for v2
> 
>
> Key: SPARK-28612
> URL: https://issues.apache.org/jira/browse/SPARK-28612
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
>
> This tracks adding an API like the one proposed in SPARK-23521:
> {code:lang=scala}
> df.writeTo("catalog.db.table").append() // AppendData
> df.writeTo("catalog.db.table").overwriteDynamic() // 
> OverwritePartitionsDynamic
> df.writeTo("catalog.db.table").overwrite($"date" === '2019-01-01') // 
> OverwriteByExpression
> df.writeTo("catalog.db.table").partitionBy($"type", $"date").create() // CTAS
> df.writeTo("catalog.db.table").replace() // RTAS
> {code}






[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-31 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920234#comment-16920234
 ] 

Wing Yew Poon commented on SPARK-28770:
---

I looked into the issue further. In EventLoggingListener, almost all calls to 
logEvent (to write serialized JSON to the event log) are a direct result of 
an onXXX method being called. The exception is that within onStageCompleted, 
before calling logEvent with the SparkListenerStageCompleted event, if we are 
logging stage executor metrics, there is a bulk call to logEvent with 
SparkListenerStageExecutorMetrics events via a Map.foreach. This Map.foreach 
bulk operation may not log the events in the same order. This is also the 
only place where SparkListenerStageExecutorMetrics events get logged.
For this reason, I think the affected tests ("End-to-end replay" and 
"End-to-end replay with compression", both implemented by calling 
testApplicationReplay) should not compare the SparkListenerStageExecutorMetrics 
events. That should eliminate the indeterminacy of the tests.
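A minimal sketch (names assumed, not the actual test change) of what dropping those events from the comparison could look like:

{code:lang=scala}
import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerStageExecutorMetrics}

// Drop the indeterminately ordered executor-metrics events from both sides
// before comparing the original and replayed event streams.
def comparableEvents(events: Seq[SparkListenerEvent]): Seq[SparkListenerEvent] =
  events.filterNot(_.isInstanceOf[SparkListenerStageExecutorMetrics])

// e.g. assert(comparableEvents(originalEvents) === comparableEvents(replayedEvents))
{code}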

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> The test also fails on our ARM instance. I sent an email to spark-dev 
> before, and we suspect it is related to the commit 
> [https://github.com/apache/spark/pull/23767]; when we reverted it, the 
> tests passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.






[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)

2019-08-31 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920227#comment-16920227
 ] 

Dongjoon Hyun commented on SPARK-28921:
---

BTW, [~andygrove], I tried to add your PR to this issue, but it seems to be 
there already, doesn't it? 25640?

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.






[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)

2019-08-31 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920226#comment-16920226
 ] 

Dongjoon Hyun commented on SPARK-28921:
---

[~psschwei] and [~andygrove], 
BTW, do you know how many production clusters are exposed to those versions? 
At least in EKS/AKS/GKE, since they are popular managed services.

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.






[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)

2019-08-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28921:
--
Priority: Major  (was: Critical)

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.






[jira] [Created] (SPARK-28938) Kubernetes using unsupported docker image

2019-08-31 Thread Rodney Aaron Stainback (Jira)
Rodney Aaron Stainback created SPARK-28938:
--

 Summary: Kubernetes using unsupported docker image
 Key: SPARK-28938
 URL: https://issues.apache.org/jira/browse/SPARK-28938
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 2.4.3, 3.0.0
 Environment: Kubernetes
Reporter: Rodney Aaron Stainback


The current docker image used by Kubernetes
{code:java}
openjdk:8-alpine{code}
is not supported 

[https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links]

It was removed with this commit

[https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099]

Quote from commit "4. no more OpenJDK 8 Alpine images (Alpine/musl is not 
officially supported by the OpenJDK project, so this reflects that -- see 
"Project Portola" for the Alpine porting efforts which I understand are still 
in need of help)"

 

Please move to a supported image for Kubernetes






[jira] [Resolved] (SPARK-28803) Document DESCRIBE TABLE in SQL Reference.

2019-08-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28803.
-
Fix Version/s: 3.0.0
 Assignee: Dilip Biswal
   Resolution: Fixed

> Document DESCRIBE TABLE in SQL Reference.
> -
>
> Key: SPARK-28803
> URL: https://issues.apache.org/jira/browse/SPARK-28803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms

2019-08-31 Thread Ruben Berenguel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920213#comment-16920213
 ] 

Ruben Berenguel commented on SPARK-25994:
-

Hi [~mju], I’ve had a series of unforeseen increases in “stuff” that are 
preventing me from doing much open source work. For now I’ll stay an 
interested bystander; if I manage to find time, I’ll step in. 

> SPIP: Property Graphs, Cypher Queries, and Algorithms
> -
>
> Key: SPARK-25994
> URL: https://issues.apache.org/jira/browse/SPARK-25994
> Project: Spark
>  Issue Type: Epic
>  Components: Graph
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Martin Junghanns
>Priority: Major
>  Labels: SPIP
>
> Copied from the SPIP doc:
> {quote}
> GraphX was one of the foundational pillars of the Spark project, and is the 
> current graph component. This reflects the importance of the graphs data 
> model, which naturally pairs with an important class of analytic function, 
> the network or graph algorithm. 
> However, GraphX is not actively maintained. It is based on RDDs, and cannot 
> exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala 
> users.
> GraphFrames is a Spark package, which implements DataFrame-based graph 
> algorithms, and also incorporates simple graph pattern matching with fixed 
> length patterns (called “motifs”). GraphFrames is based on DataFrames, but 
> has a semantically weak graph data model (based on untyped edges and 
> vertices). The motif pattern matching facility is very limited by comparison 
> with the well-established Cypher language. 
> The Property Graph data model has become quite widespread in recent years, 
> and is the primary focus of commercial graph data management and of graph 
> data research, both for on-premises and cloud data management. Many users of 
> transactional graph databases also wish to work with immutable graphs in 
> Spark.
> The idea is to define a Cypher-compatible Property Graph type based on 
> DataFrames; to replace GraphFrames querying with Cypher; to reimplement 
> GraphX/GraphFrames algos on the PropertyGraph type. 
> To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), 
> reusing existing proven designs and code, will be employed in Spark 3.0. This 
> graph query processor, like CAPS, will overlay and drive the SparkSQL 
> Catalyst query engine, using the CAPS graph query planner.
> {quote}






[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-31 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920212#comment-16920212
 ] 

Wing Yew Poon commented on SPARK-28770:
---

On my branch from which [https://github.com/apache/spark/pull/23767] was merged 
into master, I modified ReplayListenerSuite following 
[https://gist.github.com/dwickern/6ba9c5c505d2325d3737ace059302922], and ran 
"End-to-end replay with compression" 100 times. I encountered no failures. I 
ran this on my MacBook Pro.
 The instance of failure that Jungtaek cited appears to be due to a comparison 
of two SparkListenerStageExecutorMetrics events (one from the original, the 
other from the replay) failing. One event came from the driver and the other 
came from executor "1". SparkListenerStageExecutorMetrics events are logged at 
stage completion if spark.eventLog.logStageExecutorMetrics.enabled is set to 
true. The failure could be due to these events being in a different order in 
the replay than in the original. 
 In the commit that first introduced these events, in ReplayListenerSuite, 
there was some code to filter out these events in the testApplicationReplay 
method of ReplayListenerSuite. (The code was to filter out the events from the 
original, not from the replay, which I didn't understand.) Maybe we could 
filter out the SparkListenerStageExecutorMetrics events (from both original and 
replay) in testApplicationReplay (which is called by "End-to-end replay" and 
"End-to-end replay with compression"), to avoid this flakiness.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> The test also fails on our ARM instance. I sent an email to spark-dev 
> before, and we suspect it is related to the commit 
> [https://github.com/apache/spark/pull/23767]; when we reverted it, the 
> tests passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.






[jira] [Updated] (SPARK-27907) HiveUDAF should return NULL in case of 0 rows

2019-08-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27907:
--
Labels: correctness  (was: )

> HiveUDAF should return NULL in case of 0 rows
> -
>
> Key: SPARK-27907
> URL: https://issues.apache.org/jira/browse/SPARK-27907
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> When a query returns zero rows, HiveUDAFFunction throws an NPE.
> CASE 1:
> create table abc(a int)
> select histogram_numeric(a,2) from abc // NPE
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost, executor 
> driver): java.lang.NullPointerException
>   at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:471)
>   at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:315)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.eval(interfaces.scala:543)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$5(AggregationIterator.scala:231)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:122)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1350)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> CASE 2:
> create table abc(a int)
> insert into abc values (1)
> select histogram_numeric(a,2) from abc where a=3 //NPE
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor 
> driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:477)
>   at 
> org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:315)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:570)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$6(AggregationIterator.scala:254)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at

[jira] [Updated] (SPARK-28871) Some codes in 'Policy for handling multiple watermarks' does not show friendly

2019-08-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28871:
--
Labels:   (was: documentation)

> Some codes in 'Policy for handling multiple watermarks' does not show 
> friendly 
> ---
>
> Key: SPARK-28871
> URL: https://issues.apache.org/jira/browse/SPARK-28871
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: chaiyongqiang
>Assignee: chaiyongqiang
>Priority: Minor
> Fix For: 2.4.4, 3.0.0
>
> Attachments: Policy_for_handling_multiple_watermarks.png
>
>
> The code examples in the 'Policy for handling multiple watermarks' section of 
> the structured-streaming-programming-guide do not display well.






[jira] [Updated] (SPARK-28542) Document Stages page

2019-08-31 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28542:
--
Fix Version/s: 3.0.0

> Document Stages page
> 
>
> Key: SPARK-28542
> URL: https://issues.apache.org/jira/browse/SPARK-28542
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Minor
> Fix For: 3.0.0
>
>







[jira] [Assigned] (SPARK-28542) Document Stages page

2019-08-31 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28542:
-

Fix Version/s: (was: 3.0.0)
 Assignee: (was: Pablo Langa Blanco)
 Priority: Minor  (was: Major)

> Document Stages page
> 
>
> Key: SPARK-28542
> URL: https://issues.apache.org/jira/browse/SPARK-28542
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Minor
>







[jira] [Resolved] (SPARK-28542) Document Stages page

2019-08-31 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28542.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25598
[https://github.com/apache/spark/pull/25598]

> Document Stages page
> 
>
> Key: SPARK-28542
> URL: https://issues.apache.org/jira/browse/SPARK-28542
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Pablo Langa Blanco
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Assigned] (SPARK-28542) Document Stages page

2019-08-31 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28542:
-

Assignee: Pablo Langa Blanco

> Document Stages page
> 
>
> Key: SPARK-28542
> URL: https://issues.apache.org/jira/browse/SPARK-28542
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Pablo Langa Blanco
>Priority: Major
>







[jira] [Resolved] (SPARK-28932) Maven install fails on JDK11

2019-08-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28932.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25638
[https://github.com/apache/spark/pull/25638]

> Maven install fails on JDK11
> 
>
> Key: SPARK-28932
> URL: https://issues.apache.org/jira/browse/SPARK-28932
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> {code}
> mvn clean install -pl common/network-common -DskipTests
> error: fatal error: object scala in compiler mirror not found.
> one error found
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> {code}






[jira] [Commented] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when use SparkSQL

2019-08-31 Thread Marco Gaido (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920153#comment-16920153
 ] 

Marco Gaido commented on SPARK-28916:
-

I think the problem is related to subexpression elimination. I've not been able 
to confirm this because, for some reason, I am not able to disable it: even 
though I set the config to false, it is performed anyway. Maybe I am missing 
something there. Anyway, you may try setting 
{{spark.sql.subexpressionElimination.enabled}} to {{false}}. Meanwhile I am 
working on a fix. Thanks.
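For example, a sketch of the suggested experiment (whether it actually avoids the 64 KB limit is not confirmed above):

{code:lang=scala}
// Re-run the describe() with subexpression elimination disabled, per the
// suggestion above. Config name is taken from the comment; effect unverified.
spark.conf.set("spark.sql.subexpressionElimination.enabled", "false")
val data = spark.sql("select * from spark64kb limit 10")
data.describe()
{code}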

> Generated SpecificSafeProjection.apply method grows beyond 64 KB when use  
> SparkSQL
> ---
>
> Key: SPARK-28916
> URL: https://issues.apache.org/jira/browse/SPARK-28916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.3
>Reporter: MOBIN
>Priority: Major
>
> This can be reproduced with the following steps:
> 1. Create a table with 5000 fields
> 2. val data=spark.sql("select * from spark64kb limit 10");
> 3. data.describe()
> Then, the following error occurred:
> {code:java}
> WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, 
> executor 1): org.codehaus.janino.InternalCompilerException: failed to 
> compile: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Code of method 
> "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection"
>  grows beyond 64 KB
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44)
>   at 
> org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199)
>   at 
> org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(T

[jira] [Updated] (SPARK-28864) Add spark source connector for Aliyun Log Service

2019-08-31 Thread Ke Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Li updated SPARK-28864:
--
Description: 
Alibaba Log Service is a big data service that is widely used within Alibaba 
Group and by thousands of Alibaba Cloud customers. Its core storage engine, 
named Loghub, is a large-scale distributed storage system that provides 
producer and consumer APIs to push and pull data, much like Kafka, AWS Kinesis, 
and Azure Event Hubs do. 

Many users of Log Service use Spark Streaming, Spark SQL, and Spark Structured 
Streaming to analyze data collected from both on-premise and cloud data 
sources.

Happy to hear any comments.

  was:
Aliyun Log Service is a big data service which has been widely used in Alibaba 
Group and thousands of customers of Alibaba Cloud. The core storage engine of 
Log Service is named Loghub which is a large scale distributed storage system 
which provides producer and consumer to push and pull data like Kafka, AWS 
Kinesis and Azure Eventhub does. 

There are a lot of users of Log Service are using Spark Streaming, Spark SQL 
and Spark Structured Streaming to analysis data collected from both on premise 
and cloud data sources.

Happy to hear any comments.


> Add spark source connector for Aliyun Log Service
> -
>
> Key: SPARK-28864
> URL: https://issues.apache.org/jira/browse/SPARK-28864
> Project: Spark
>  Issue Type: New Feature
>  Components: Input/Output
>Affects Versions: 3.0.0
>Reporter: Ke Li
>Priority: Major
>
> Alibaba Log Service is a big data service that is widely used within Alibaba 
> Group and by thousands of Alibaba Cloud customers. Its core storage engine, 
> named Loghub, is a large-scale distributed storage system that provides 
> producer and consumer APIs to push and pull data, much like Kafka, AWS 
> Kinesis, and Azure Event Hubs do. 
> Many users of Log Service use Spark Streaming, Spark SQL, and Spark 
> Structured Streaming to analyze data collected from both on-premise and 
> cloud data sources.
> Happy to hear any comments.
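
Purely as an illustration of what the proposed source could look like from the 
user side, here is a hypothetical sketch; the {{loghub}} format name and every 
option key below are invented placeholders, not an existing Spark API:

{code:scala}
// Hypothetical usage sketch for the proposed connector; the format name and
// option keys are placeholders and do not exist in Spark today.
import org.apache.spark.sql.SparkSession

object LogServiceReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Aliyun Log Service source sketch")
      .master("local[*]")
      .getOrCreate()

    // A Structured Streaming read from the proposed source might look like this.
    val stream = spark.readStream
      .format("loghub")                    // placeholder short name
      .option("endpoint", "<endpoint>")    // placeholder connection settings
      .option("project", "<project>")
      .option("logstore", "<logstore>")
      .load()

    stream.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
{code}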



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28903) Fix AWS JDK version conflict that breaks Pyspark Kinesis tests

2019-08-31 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28903.
---
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 25559
[https://github.com/apache/spark/pull/25559]

> Fix AWS JDK version conflict that breaks Pyspark Kinesis tests
> --
>
> Key: SPARK-28903
> URL: https://issues.apache.org/jira/browse/SPARK-28903
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> The Pyspark Kinesis tests are failing, at least in master:
> {code}
> ==
> ERROR: test_kinesis_stream 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py",
>  line 44, in test_kinesis_stream
> kinesisTestUtils = 
> self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py",
>  line 1554, in __call__
> answer, self._gateway_client, None, self._fqn)
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
> format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
> : java.lang.NoSuchMethodError: 
> com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
>   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
>   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
>   at scala.collection.Iterator.find(Iterator.scala:993)
>   at scala.collection.Iterator.find$(Iterator.scala:990)
>   at scala.collection.AbstractIterator.find(Iterator.scala:1429)
>   at scala.collection.IterableLike.find(IterableLike.scala:81)
>   at scala.collection.IterableLike.find$(IterableLike.scala:80)
>   at scala.collection.AbstractIterable.find(Iterable.scala:56)
>   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
>   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.(KinesisTestUtils.scala:46)
> ...
> {code}
> The non-Python Kinesis tests are fine though. It turns out that this is 
> because Pyspark tests use the output of the Spark assembly, and it pulls in 
> hadoop-cloud, which in turn pulls in an old AWS Java SDK.
> Per [~ste...@apache.org], it seems like we can just resolve this by excluding 
> the aws-java-sdk dependency. See the attached PR for some more detail about 
> the debugging and other options.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10)

2019-08-31 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920133#comment-16920133
 ] 

Andy Grove commented on SPARK-28921:


Here's a PR with the fix against the master branch, since it didn't automatically 
link to this JIRA: https://github.com/apache/spark/pull/25640

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Critical
>
> Spark jobs are failing on the latest versions of Kubernetes when they attempt 
> to provision executor pods (jobs like Spark-Pi that do not launch executors 
> run without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: HTTP 403, 
> Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> It looks like the issue is caused by the fix for a recent CVE:
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> It looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when using SparkSQL

2019-08-31 Thread Marco Gaido (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920074#comment-16920074
 ] 

Marco Gaido commented on SPARK-28916:
-

Thanks for reporting this. I am checking it.

> Generated SpecificSafeProjection.apply method grows beyond 64 KB when using 
> SparkSQL
> ---
>
> Key: SPARK-28916
> URL: https://issues.apache.org/jira/browse/SPARK-28916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.3
>Reporter: MOBIN
>Priority: Major
>
> Can be reproduced by the following steps:
> 1. Create a table with 5000 fields
> 2. val data=spark.sql("select * from spark64kb limit 10");
> 3. data.describe()
> Then the following error occurs:
> {code:java}
> WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, 
> executor 1): org.codehaus.janino.InternalCompilerException: failed to 
> compile: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Code of method 
> "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection"
>  grows beyond 64 KB
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44)
>   at 
> org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199)
>   at 
> org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86)
>   at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Code of method 
> "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class 
> "

[jira] [Commented] (SPARK-28934) Add `spark.sql.compatiblity.mode`

2019-08-31 Thread Marco Gaido (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920072#comment-16920072
 ] 

Marco Gaido commented on SPARK-28934:
-

Hi [~smilegator]! Thanks for opening this. I am wondering whether it may be 
worth reopening SPARK-28610 and enabling that option for the pgSQL compatibility 
mode. [~cloud_fan], what do you think?
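
For discussion purposes, a minimal sketch of how the proposed setting might be 
used; note that {{spark.sql.compatiblity.mode}} (spelled as in this ticket) is 
only a proposal and is not an existing configuration, while the ANSI parser flag 
is the one mentioned in the description quoted below:

{code:scala}
// Sketch only: the compatibility-mode key is proposed in this ticket and does
// not exist yet.
import org.apache.spark.sql.SparkSession

object CompatibilityModeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compatibility mode sketch")
      .master("local[*]")
      .config("spark.sql.parser.ansi.enabled", "false")
      .config("spark.sql.compatiblity.mode", "spark") // proposed default
      .getOrCreate()

    // Switching to the proposed PostgreSQL-compatible behavior at runtime
    // (the value is meant to be case-insensitive).
    spark.conf.set("spark.sql.compatiblity.mode", "pgSQL")

    spark.stop()
  }
}
{code}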

> Add `spark.sql.compatiblity.mode`
> -
>
> Key: SPARK-28934
> URL: https://issues.apache.org/jira/browse/SPARK-28934
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> This issue aims to add `spark.sql.compatiblity.mode`, whose value is either 
> `spark` or `pgSQL` (case-insensitive), to control PostgreSQL compatibility 
> features.
>  
> Apache Spark 3.0.0 can start with `spark.sql.parser.ansi.enabled=false` and 
> `spark.sql.compatiblity.mode=spark`.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28908) Structured Streaming Kafka sink support Exactly-Once semantics

2019-08-31 Thread wenxuanguan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wenxuanguan updated SPARK-28908:

Attachment: (was: Kafka Sink Exactly-Once Semantics Design Sketch.pdf)

> Structured Streaming Kafka sink support Exactly-Once semantics
> --
>
> Key: SPARK-28908
> URL: https://issues.apache.org/jira/browse/SPARK-28908
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: wenxuanguan
>Priority: Major
> Attachments: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf
>
>
> Since Apache Kafka has supported transactions since 0.11.0.0, we can implement 
> exactly-once semantics for the Kafka sink with a transactional Kafka producer
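
As background for the proposal above, a minimal sketch of the 
transactional-producer pattern it builds on, using the plain kafka-clients API. 
The broker address, topic, and transactional id are placeholders, and this is 
not the Structured Streaming sink itself:

{code:scala}
// Sketch of Kafka's transactional producer (available since 0.11.0.0); an
// exactly-once sink would tie commitTransaction() to the batch/epoch commit.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object TransactionalProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")           // placeholder
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)
    props.put("enable.idempotence", "true")
    // A stable transactional.id lets the broker fence stale ("zombie") producers.
    props.put("transactional.id", "spark-kafka-sink-sketch")   // placeholder

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions()
    try {
      producer.beginTransaction()
      producer.send(new ProducerRecord("output-topic", "key", "value"))
      producer.commitTransaction()  // all records in the batch become visible atomically
    } catch {
      case e: Exception =>
        producer.abortTransaction() // discard the in-flight batch on failure
        throw e
    } finally {
      producer.close()
    }
  }
}
{code}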



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28908) Structured Streaming Kafka sink support Exactly-Once semantics

2019-08-31 Thread wenxuanguan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wenxuanguan updated SPARK-28908:

Attachment: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf

> Structured Streaming Kafka sink support Exactly-Once semantics
> --
>
> Key: SPARK-28908
> URL: https://issues.apache.org/jira/browse/SPARK-28908
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: wenxuanguan
>Priority: Major
> Attachments: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf, 
> Kafka Sink Exactly-Once Semantics Design Sketch.pdf
>
>
> Since Apache Kafka has supported transactions since 0.11.0.0, we can implement 
> exactly-once semantics for the Kafka sink with a transactional Kafka producer



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org