[jira] [Commented] (SPARK-24035) SQL syntax for Pivot

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457316#comment-16457316
 ] 

Apache Spark commented on SPARK-24035:
--

User 'maryannxue' has created a pull request for this issue:
https://github.com/apache/spark/pull/21187

> SQL syntax for Pivot
> 
>
> Key: SPARK-24035
> URL: https://issues.apache.org/jira/browse/SPARK-24035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Maryann Xue
>Priority: Major
>
> Some users are SQL experts but don’t know an ounce of Scala, Python, or R.
> Thus, we would prefer to support the SQL syntax for Pivot too.
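For readers who haven't followed the PR: the change above adds a PIVOT clause to Spark's SQL dialect, so the existing DataFrame pivot can be expressed in pure SQL. Below is a rough sketch of the kind of query this enables; the table and column names (courseSales, year, course, earnings) are made up for illustration, and the exact grammar is whatever the PR lands, not this example.

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: illustrates the Oracle/T-SQL-style PIVOT clause the PR proposes.
// It will only parse once the new grammar is in; names here are hypothetical.
val spark = SparkSession.builder().appName("PivotSketch").master("local[*]").getOrCreate()
import spark.implicits._

Seq((2012, "dotNET", 10000), (2012, "Java", 20000), (2013, "dotNET", 48000))
  .toDF("year", "course", "earnings")
  .createOrReplaceTempView("courseSales")

// Pivot distinct course values into columns of summed earnings, one row per year.
spark.sql("""
  SELECT * FROM (SELECT year, course, earnings FROM courseSales)
  PIVOT (
    SUM(earnings)
    FOR course IN ('dotNET', 'Java')
  )
""").show()
{code}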






[jira] [Assigned] (SPARK-24035) SQL syntax for Pivot

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24035:


Assignee: Maryann Xue  (was: Apache Spark)

> SQL syntax for Pivot
> 
>
> Key: SPARK-24035
> URL: https://issues.apache.org/jira/browse/SPARK-24035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Maryann Xue
>Priority: Major
>
> Some users are SQL experts but don’t know an ounce of Scala, Python, or R.
> Thus, we would prefer to support the SQL syntax for Pivot too.






[jira] [Assigned] (SPARK-24035) SQL syntax for Pivot

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24035:


Assignee: Apache Spark  (was: Maryann Xue)

> SQL syntax for Pivot
> 
>
> Key: SPARK-24035
> URL: https://issues.apache.org/jira/browse/SPARK-24035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Some users are SQL experts but don’t know an ounce of Scala, Python, or R.
> Thus, we would prefer to support the SQL syntax for Pivot too.






[jira] [Created] (SPARK-24116) SparkSQL inserting overwrite table has inconsistent behavior regarding HDFS trash

2018-04-27 Thread Rui Li (JIRA)
Rui Li created SPARK-24116:
--

 Summary: SparkSQL inserting overwrite table has inconsistent 
behavior regarding HDFS trash
 Key: SPARK-24116
 URL: https://issues.apache.org/jira/browse/SPARK-24116
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Rui Li


When doing INSERT OVERWRITE on a table, the old data may or may not go to the 
HDFS trash depending on (see the sketch below):
 # Data format. E.g. a text table may go to trash while a Parquet table doesn't.
 # Whether the table is partitioned. E.g. a partitioned text table doesn't go to 
trash while a non-partitioned one does.
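Not part of the report, but a minimal sketch of the scenario described above, to make the comparison concrete. The table names are hypothetical, it assumes a Hive-enabled session on a cluster where HDFS trash is turned on (fs.trash.interval > 0), and which variant ends up in .Trash is exactly what the report says differs:

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: run against a Hive metastore + HDFS deployment with trash enabled.
val spark = SparkSession.builder()
  .appName("OverwriteTrashSketch")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE t_text (id INT) STORED AS TEXTFILE")
spark.sql("CREATE TABLE t_parquet (id INT) STORED AS PARQUET")

// Overwrite twice; the second overwrite replaces the files written by the first.
spark.sql("INSERT OVERWRITE TABLE t_text SELECT 1")
spark.sql("INSERT OVERWRITE TABLE t_text SELECT 2")      // old files reportedly moved to .Trash

spark.sql("INSERT OVERWRITE TABLE t_parquet SELECT 1")
spark.sql("INSERT OVERWRITE TABLE t_parquet SELECT 2")   // old files reportedly deleted outright
{code}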






[jira] [Assigned] (SPARK-23688) Refactor tests away from rate source

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23688:
---

Assignee: Jungtaek Lim

> Refactor tests away from rate source
> 
>
> Key: SPARK-23688
> URL: https://issues.apache.org/jira/browse/SPARK-23688
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.0
>
>
> Most continuous processing tests currently use a rate source, since that was 
> what was available at the time of implementation. This forces us to do a lot 
> of awkward things to work around the fact that the data in the sink is not 
> perfectly predictable. We should refactor to use a memory stream once it's 
> implemented.
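As a side note (not from the ticket), here is a rough sketch of the direction described above: drive a test from a MemoryStream, where the input is fully under the test's control, instead of a rate source. This uses the existing micro-batch MemoryStream; the ticket itself is about doing the equivalent for continuous processing once a memory source exists there.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MemoryStream

// Sketch only: deterministic streaming input, so the sink contents are exactly
// the rows the test added (no timing-dependent rate-source data).
val spark = SparkSession.builder().appName("MemoryStreamSketch").master("local[*]").getOrCreate()
import spark.implicits._
implicit val sqlContext: org.apache.spark.sql.SQLContext = spark.sqlContext

val input = MemoryStream[Int]
input.addData(1, 2, 3)

val query = input.toDF()
  .writeStream
  .format("memory")        // in-memory sink that is easy to assert against
  .queryName("out")
  .outputMode("append")
  .start()

query.processAllAvailable()
spark.table("out").show()  // exactly the rows added above: 1, 2, 3
{code}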






[jira] [Resolved] (SPARK-23688) Refactor tests away from rate source

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23688.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21152
[https://github.com/apache/spark/pull/21152]

> Refactor tests away from rate source
> 
>
> Key: SPARK-23688
> URL: https://issues.apache.org/jira/browse/SPARK-23688
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
> Fix For: 2.4.0
>
>
> Most continuous processing tests currently use a rate source, since that was 
> what was available at the time of implementation. This forces us to do a lot 
> of awkward things to work around the fact that the data in the sink is not 
> perfectly predictable. We should refactor to use a memory stream once it's 
> implemented.






[jira] [Commented] (SPARK-24115) improve instrumentation for spark.ml.tuning

2018-04-27 Thread yogesh garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457113#comment-16457113
 ] 

yogesh garg commented on SPARK-24115:
-

I would like to work on this.

> improve instrumentation for spark.ml.tuning
> ---
>
> Key: SPARK-24115
> URL: https://issues.apache.org/jira/browse/SPARK-24115
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Major
>







[jira] [Created] (SPARK-24115) improve instrumentation for spark.ml.tuning

2018-04-27 Thread yogesh garg (JIRA)
yogesh garg created SPARK-24115:
---

 Summary: improve instrumentation for spark.ml.tuning
 Key: SPARK-24115
 URL: https://issues.apache.org/jira/browse/SPARK-24115
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 2.3.0
Reporter: yogesh garg









[jira] [Created] (SPARK-24114) improve instrumentation for spark.ml.recommendation

2018-04-27 Thread yogesh garg (JIRA)
yogesh garg created SPARK-24114:
---

 Summary: improve instrumentation for spark.ml.recommendation
 Key: SPARK-24114
 URL: https://issues.apache.org/jira/browse/SPARK-24114
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 2.3.0
Reporter: yogesh garg









[jira] [Commented] (SPARK-24114) improve instrumentation for spark.ml.recommendation

2018-04-27 Thread yogesh garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457110#comment-16457110
 ] 

yogesh garg commented on SPARK-24114:
-

I would like to work on this.

> improve instrumentation for spark.ml.recommendation
> ---
>
> Key: SPARK-24114
> URL: https://issues.apache.org/jira/browse/SPARK-24114
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Major
>







[jira] [Commented] (SPARK-24112) Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457080#comment-16457080
 ] 

Apache Spark commented on SPARK-24112:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/21186

> Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility
> --
>
> Key: SPARK-24112
> URL: https://issues.apache.org/jira/browse/SPARK-24112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to avoid surprising existing Parquet Hive table users with 
> behavior changes. They have Hive Parquet tables, and all of them have been 
> converted by default, without table properties, since Spark 2.0.






[jira] [Commented] (SPARK-22279) Turn on spark.sql.hive.convertMetastoreOrc by default

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457079#comment-16457079
 ] 

Apache Spark commented on SPARK-22279:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/21186

> Turn on spark.sql.hive.convertMetastoreOrc by default
> -
>
> Key: SPARK-22279
> URL: https://issues.apache.org/jira/browse/SPARK-22279
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Like Parquet, this issue aims to turn on `spark.sql.hive.convertMetastoreOrc` 
> by default.
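For context (my wording, not the ticket's): `spark.sql.hive.convertMetastoreOrc` makes Spark read Hive ORC tables through its native ORC path instead of the Hive SerDe, mirroring what `spark.sql.hive.convertMetastoreParquet` already does by default. A small sketch of turning it on explicitly, which is what the ticket wants to make unnecessary; the table name is hypothetical and an existing Hive-enabled SparkSession named `spark` (e.g. spark-shell) is assumed:

{code:scala}
// Sketch only: `spark` is assumed to be an existing Hive-enabled SparkSession,
// and `my_hive_orc_table` is a hypothetical Hive table stored as ORC.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")  // the ticket proposes true as the default
spark.sql("SELECT COUNT(*) FROM my_hive_orc_table").show()
{code}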






[jira] [Assigned] (SPARK-24112) Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24112:


Assignee: (was: Apache Spark)

> Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility
> --
>
> Key: SPARK-24112
> URL: https://issues.apache.org/jira/browse/SPARK-24112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to avoid surprising existing Parquet Hive table users with 
> behavior changes. They have Hive Parquet tables, and all of them have been 
> converted by default, without table properties, since Spark 2.0.






[jira] [Assigned] (SPARK-24112) Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24112:


Assignee: Apache Spark

> Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility
> --
>
> Key: SPARK-24112
> URL: https://issues.apache.org/jira/browse/SPARK-24112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> This issue aims to avoid surprising existing Parquet Hive table users with 
> behavior changes. They have Hive Parquet tables, and all of them have been 
> converted by default, without table properties, since Spark 2.0.






[jira] [Assigned] (SPARK-24104) SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them

2018-04-27 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-24104:
--

Assignee: Juliusz Sompolski

> SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of 
> updating them
> -
>
> Key: SPARK-24104
> URL: https://issues.apache.org/jira/browse/SPARK-24104
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> SQLAppStatusListener does 
> {code}
> exec.driverAccumUpdates = accumUpdates.toMap
> update(exec)
> {code}
> in onDriverAccumUpdates.
> But postDriverMetricUpdates is called multiple times per query, e.g. from each 
> FileSourceScanExec and BroadcastExchangeExec.
> If the update does not actually write it to the KV store (depending on 
> liveUpdatePeriodNs), the previously posted metrics are lost.
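Not the actual patch (that is PR 21171), but a sketch of the kind of change the description implies: merge each batch of driver accumulator updates into what was already reported instead of replacing the whole map, so metrics posted earlier by another operator survive.

{code:scala}
// Illustrative only, not SQLAppStatusListener itself: keep previously posted
// driver accumulator values and overlay only the ids present in the new batch.
def mergeDriverAccumUpdates(
    existing: Map[Long, Long],
    incoming: Seq[(Long, Long)]): Map[Long, Long] = {
  existing ++ incoming.toMap
}

// E.g. an update from FileSourceScanExec followed by one from BroadcastExchangeExec:
val afterScan      = mergeDriverAccumUpdates(Map.empty, Seq(1L -> 10L))
val afterBroadcast = mergeDriverAccumUpdates(afterScan, Seq(2L -> 5L))
// afterBroadcast == Map(1L -> 10L, 2L -> 5L): the earlier metric is not lost.
{code}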






[jira] [Resolved] (SPARK-24104) SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them

2018-04-27 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-24104.

   Resolution: Fixed
Fix Version/s: 2.3.1
   2.4.0

Issue resolved by pull request 21171
[https://github.com/apache/spark/pull/21171]

> SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of 
> updating them
> -
>
> Key: SPARK-24104
> URL: https://issues.apache.org/jira/browse/SPARK-24104
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> SQLAppStatusListener does 
> {code}
> exec.driverAccumUpdates = accumUpdates.toMap
> update(exec)
> {code}
> in onDriverAccumUpdates.
> But postDriverMetricUpdates is called multiple times per query, e.g. from each 
> FileSourceScanExec and BroadcastExchangeExec.
> If the update does not actually write it to the KV store (depending on 
> liveUpdatePeriodNs), the previously posted metrics are lost.






[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457003#comment-16457003
 ] 

Imran Rashid commented on SPARK-23894:
--

I believe this issue has existed since SPARK-10810 / 
https://github.com/apache/spark/commit/3390b400d04e40f767d8a51f1078fcccb4e64abd, 
though originally it was the SQLContext that sat in the InheritableThreadLocal.

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.






[jira] [Assigned] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23894:


Assignee: Apache Spark

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Assignee: Apache Spark
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.






[jira] [Assigned] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23894:


Assignee: (was: Apache Spark)

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.






[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456984#comment-16456984
 ] 

Apache Spark commented on SPARK-23894:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/21185

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.






[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456969#comment-16456969
 ] 

Imran Rashid commented on SPARK-23894:
--

I think I understand what is happening here, but I don't know how to fix it.

Normally, there is no active spark session for the executor threads.  I added 
some debugging code to where an executor might call {{SQLConf.get}} to show the 
active session, and under my test runs, there isn't an active session:

{noformat}
12:49:35.801 dispatcher-event-loop-0 INFO Executor: Creating task runner thread 
with activeSession = None
...
getting conf, activeSession = None in Executor task launch worker for task 24
java.lang.Exception: getting conf in thread Executor task launch worker for 
task 23
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.conf(QueryPlan.scala:35)
at 
org.apache.spark.sql.execution.columnar.InMemoryTableScanExec.org$apache$spark$sql$execution$columnar$InMemoryTableScanExec$$createAndDecompressColumn(InMemoryTableScanExe
c.scala:84)
...
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
{noformat}

So how come it's sometimes defined? Note that activeSession is an *Inheritable* 
thread local. Normally the executor threads are created before activeSession 
is defined, so they don't inherit anything. But a thread pool is free to create 
more threads at any time, and when it does, the new executor threads suddenly 
inherit the active session from their parent: a thread in the driver with 
activeSession defined.

I'll submit a PR to defensively always clear the active session in the executor 
thread.
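
To make the inheritance behaviour concrete, here is a small standalone Scala sketch (no Spark involved, names are mine): a cached thread pool only hands the InheritableThreadLocal value to worker threads it creates after the submitting thread has set it, which matches the "sometimes defined" pattern above.

{code:scala}
import java.util.concurrent.Executors

object InheritableLeakDemo {
  // Stands in for SparkSession.activeSession, which is also an InheritableThreadLocal.
  val active = new InheritableThreadLocal[String]()

  def main(args: Array[String]): Unit = {
    val pool = Executors.newCachedThreadPool()

    // Worker thread created before the value is set: sees None, like a normal executor thread.
    pool.submit(new Runnable {
      def run(): Unit = println(s"early worker sees ${Option(active.get())}")
    })
    Thread.sleep(200)

    // The "driver" thread now sets its session-like value.
    active.set("driver session")

    // Overlapping tasks force the pool to grow; only threads created from here on
    // inherit the value, while the pre-existing idle worker still sees None.
    (1 to 4).foreach { i =>
      pool.submit(new Runnable {
        def run(): Unit = {
          Thread.sleep(300)
          println(s"worker $i sees ${Option(active.get())}")
        }
      })
    }
    Thread.sleep(1000)
    pool.shutdown()
  }
}
{code}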

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.




[jira] [Commented] (SPARK-24112) Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility

2018-04-27 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456927#comment-16456927
 ] 

Dongjoon Hyun commented on SPARK-24112:
---

I'll make a PR for this and SPARK-22279 together.

> Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility
> --
>
> Key: SPARK-24112
> URL: https://issues.apache.org/jira/browse/SPARK-24112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to avoid surprising existing Parquet Hive table users with 
> behavior changes. They have Hive Parquet tables, and all of them have been 
> converted by default, without table properties, since Spark 2.0.






[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite

2018-04-27 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456919#comment-16456919
 ] 

Imran Rashid commented on SPARK-23894:
--

One thing I've noticed from looking at more instances of this: normally we 
don't see any log lines from {{SharedState}} on the executor threads. In the 
normal case we see this:

{noformat}
09:37:38.203 pool-1-thread-1-ScalaTest-running-ParquetQuerySuite INFO 
SharedState: Warehouse path is 
'file:/Users/irashid/github/pub/spark/sql/core/spark-warehouse/'.
{noformat}

but in failures, we see

{noformat}
23:37:56.728 Executor task launch worker for task 48 INFO SharedState: 
Warehouse path is 
'file:/home/jenkins/workspace/spark-branch-2.3-test-sbt-hadoop-2.6/sql/core/spark-warehouse'.
{noformat}

(notice the thread). I don't understand why this happens yet, nor can I 
reproduce it locally.

> Flaky Test:  BucketedWriteWithoutHiveSupportSuite
> -
>
> Key: SPARK-23894
> URL: https://issues.apache.org/jira/browse/SPARK-23894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Minor
> Attachments: unit-tests.log
>
>
> Flaky test observed here: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-test logs for this suite and the 
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: 
> Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
> at 
> org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
> at 
> org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
> at 
> org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
> at 
> org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
> at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
> at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
> at 
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I 
> think it has more to do with {{SparkSession}}'s lazy evaluation of 
> {{SharedState}} doing something funny with the way we set up the test Spark 
> context, etc. ... though I don't really understand it yet.






[jira] [Commented] (SPARK-24113) --archives hdfs://some/path.zip#newname renaming no longer works

2018-04-27 Thread Peter Parente (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456897#comment-16456897
 ] 

Peter Parente commented on SPARK-24113:
---

Thanks for the crosslink [~vanzin]. I searched but couldn't find the right 
keywords to turn up that issue.

> --archives hdfs://some/path.zip#newname renaming no longer works
> 
>
> Key: SPARK-24113
> URL: https://issues.apache.org/jira/browse/SPARK-24113
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Peter Parente
>Priority: Major
>
> In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
> named NAME in executor YARN containers pointing to the extracted archive. In 
> Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
> basename of the archive file instead.
> For instance:
> {code:java}
> org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
> spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
> spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
> --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python 
> --conf spark.driver.extraClassPath=./resources/conf --conf 
> spark.executor.cores=5 --conf spark.dynamicAllocation.maxExecutors=10 --conf 
> spark.sql.shuffle.partitions=2000 --conf 
> spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
> spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
> spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
> spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
> ./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
> pare...@somewhere.com --archives hdfs:///some-path/my-custom-env.zip#CONDA 
> --executor-memory 8G --executor-cores 5 pyspark-shell{code}
> results in the following in executor containers in Spark 2.2.1 (which is 
> correct)
> {code:java}
> lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
> and results in the following in executor containers in Spark 2.3.0 (which 
> appears to be a regression)
> {code:java}
> lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
> {code}






[jira] [Updated] (SPARK-24085) Scalar subquery error

2018-04-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-24085:

Fix Version/s: 2.4.0

> Scalar subquery error
> -
>
> Key: SPARK-24085
> URL: https://issues.apache.org/jira/browse/SPARK-24085
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Alexey Baturin
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Error
> {noformat}
> SQL Error: java.lang.UnsupportedOperationException: Cannot evaluate 
> expression: scalar-subquery{noformat}
> It occurs when querying a partitioned table backed by Parquet files and 
> filtering on a partition column with a scalar subquery.
> Query to reproduce:
> {code:sql}
> CREATE TABLE test_prc_bug (
> id_value string
> )
> partitioned by (id_type string)
> location '/tmp/test_prc_bug'
> stored as parquet;
> insert into test_prc_bug values ('1','a');
> insert into test_prc_bug values ('2','a');
> insert into test_prc_bug values ('3','b');
> insert into test_prc_bug values ('4','b');
> select * from test_prc_bug
> where id_type = (select 'b');
> {code}
> If the table is in ORC format, it works fine.






[jira] [Resolved] (SPARK-24085) Scalar subquery error

2018-04-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-24085.
-
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.3.1

> Scalar subquery error
> -
>
> Key: SPARK-24085
> URL: https://issues.apache.org/jira/browse/SPARK-24085
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Alexey Baturin
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 2.3.1
>
>
> Error
> {noformat}
> SQL Error: java.lang.UnsupportedOperationException: Cannot evaluate 
> expression: scalar-subquery{noformat}
> It occurs when querying a partitioned table backed by Parquet files and 
> filtering on a partition column with a scalar subquery.
> Query to reproduce:
> {code:sql}
> CREATE TABLE test_prc_bug (
> id_value string
> )
> partitioned by (id_type string)
> location '/tmp/test_prc_bug'
> stored as parquet;
> insert into test_prc_bug values ('1','a');
> insert into test_prc_bug values ('2','a');
> insert into test_prc_bug values ('3','b');
> insert into test_prc_bug values ('4','b');
> select * from test_prc_bug
> where id_type = (select 'b');
> {code}
> If the table is in ORC format, it works fine.






[jira] [Resolved] (SPARK-24113) --archives hdfs://some/path.zip#newname renaming no longer works

2018-04-27 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-24113.

Resolution: Duplicate

> --archives hdfs://some/path.zip#newname renaming no longer works
> 
>
> Key: SPARK-24113
> URL: https://issues.apache.org/jira/browse/SPARK-24113
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Peter Parente
>Priority: Major
>
> In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
> named NAME in executor YARN containers pointing to the extracted archive. In 
> Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
> basename of the archive file instead.
> For instance:
> {code:java}
> org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
> spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
> spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
> --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python 
> --conf spark.driver.extraClassPath=./resources/conf --conf 
> spark.executor.cores=5 --conf spark.dynamicAllocation.maxExecutors=10 --conf 
> spark.sql.shuffle.partitions=2000 --conf 
> spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
> spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
> spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
> spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
> ./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
> pare...@somewhere.com --archives hdfs:///some-path/my-custom-env.zip#CONDA 
> --executor-memory 8G --executor-cores 5 pyspark-shell{code}
> results in the following in executor containers in Spark 2.2.1 (which is 
> correct)
> {code:java}
> lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
> and results in the following in executor containers in Spark 2.3.0 (which 
> appears to be a regression)
> {code:java}
> lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
> {code}






[jira] [Updated] (SPARK-24113) --archives hdfs://some/path.zip#newname renaming no longer works

2018-04-27 Thread Peter Parente (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Parente updated SPARK-24113:
--
Description: 
In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
named NAME in executor YARN containers pointing to the extracted archive. In 
Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
--conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python --conf 
spark.driver.extraClassPath=./resources/conf --conf spark.executor.cores=5 
--conf spark.dynamicAllocation.maxExecutors=10 --conf 
spark.sql.shuffle.partitions=2000 --conf 
spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
pare...@somewhere.com --archives hdfs:///some-path/my-custom-env.zip#CONDA 
--executor-memory 8G --executor-cores 5 pyspark-shell{code}
results in the following in executor containers in Spark 2.2.1 (which is 
correct)
{code:java}
lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
/mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
and results in the following in executor containers in Spark 2.3.0 (which 
appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
/mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}

  was:
In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
named NAME in executor YARN containers pointing to the extracted archive. In 
Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
--conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python --conf 
spark.driver.extraClassPath=./resources/conf --conf spark.executor.cores=5 
--conf spark.dynamicAllocation.maxExecutors=10 --conf 
spark.sql.shuffle.partitions=2000 --conf 
spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
p-ppare...@prod.maxpoint.mgt --archives 
hdfs:///some-path/my-custom-env.zip#CONDA --executor-memory 8G --executor-cores 
5 pyspark-shell{code}
results in the following in executor containers in Spark 2.2.1 (which is 
correct)
{code:java}
lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
/mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
and results in the following in executor containers in Spark 2.3.0 (which 
appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
/mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}


> --archives hdfs://some/path.zip#newname renaming no longer works
> 
>
> Key: SPARK-24113
> URL: https://issues.apache.org/jira/browse/SPARK-24113
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Peter Parente
>Priority: Major
>
> In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
> named NAME in executor YARN containers pointing to the extracted archive. In 
> Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
> basename of the archive file instead.
> For instance:
> {code:java}
> org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
> spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
> spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
> --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python 
> --conf spark.driver.extraClassPath=./resources/conf --conf 
> spark.executor.cores=5 --conf spark.dynamicAllocation.maxExecutors=10 --conf 
> spark.sql.shuffle.partitions=2000 --conf 
> spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
> spark.shuffle.service.enabled=True --conf 

[jira] [Created] (SPARK-24113) --archives hdfs://some/path.zip#newname renaming no longer works

2018-04-27 Thread Peter Parente (JIRA)
Peter Parente created SPARK-24113:
-

 Summary: --archives hdfs://some/path.zip#newname renaming no 
longer works
 Key: SPARK-24113
 URL: https://issues.apache.org/jira/browse/SPARK-24113
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.3.0
Reporter: Peter Parente


In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
named NAME in executor YARN containers pointing to the extracted archive. In 
Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the 
basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
--conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python --conf 
spark.driver.extraClassPath=./resources/conf --conf spark.executor.cores=5 
--conf spark.dynamicAllocation.maxExecutors=10 --conf 
spark.sql.shuffle.partitions=2000 --conf 
spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
p-ppare...@prod.maxpoint.mgt --archives 
hdfs:///some-path/my-custom-env.zip#CONDA --executor-memory 8G --executor-cores 
5 pyspark-shell{code}
results in the following in executor containers in Spark 2.2.1 (which is 
correct)
{code:java}
lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
/mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
and results in the following in executor containers in Spark 2.3.0 (which 
appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
/mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}






[jira] [Assigned] (SPARK-24051) Incorrect results for certain queries using Java and Python APIs on Spark 2.3.0

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24051:


Assignee: Apache Spark

> Incorrect results for certain queries using Java and Python APIs on Spark 
> 2.3.0
> ---
>
> Key: SPARK-24051
> URL: https://issues.apache.org/jira/browse/SPARK-24051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Emlyn Corrin
>Assignee: Apache Spark
>Priority: Major
>
> I'm seeing Spark 2.3.0 return incorrect results for a certain (very specific) 
> query, demonstrated by the Java program below. It was simplified from a much 
> more complex query, but I'm having trouble simplifying it further without 
> removing the erroneous behaviour.
> {code:java}
> package sparktest;
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.expressions.Window;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import java.util.Arrays;
> public class Main {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf()
> .setAppName("SparkTest")
> .setMaster("local[*]");
> SparkSession session = 
> SparkSession.builder().config(conf).getOrCreate();
> Row[] arr1 = new Row[]{
> RowFactory.create(1, 42),
> RowFactory.create(2, 99)};
> StructType sch1 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty()),
> new StructField("b", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds1 = session.createDataFrame(Arrays.asList(arr1), sch1);
> ds1.show();
> Row[] arr2 = new Row[]{
> RowFactory.create(3)};
> StructType sch2 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds2 = session.createDataFrame(Arrays.asList(arr2), sch2)
> .withColumn("b", functions.lit(0));
> ds2.show();
> Column[] cols = new Column[]{
> new Column("a"),
> new Column("b").as("b"),
> functions.count(functions.lit(1))
> .over(Window.partitionBy())
> .as("n")};
> Dataset<Row> ds = ds1
> .select(cols)
> .union(ds2.select(cols))
> .where(new Column("n").geq(1))
> .drop("n");
> ds.show();
> //ds.explain(true);
> }
> }
> {code}
> It just calculates the union of 2 datasets,
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> +---+---+
> {code}
> with
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  3|  0|
> +---+---+
> {code}
> The expected result is:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> |  3|  0|
> +---+---+
> {code}
> but instead it prints:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  0|
> |  2|  0|
> |  3|  0|
> +---+---+
> {code}
> Notice how the value in column b is always zero, overriding the original 
> values in rows 1 and 2.
>  Making seemingly trivial changes, such as replacing {{new 
> Column("b").as("b"),}} with just {{new Column("b"),}} or removing the 
> {{where}} clause after the union, makes it behave correctly again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24051) Incorrect results for certain queries using Java and Python APIs on Spark 2.3.0

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24051:


Assignee: (was: Apache Spark)

> Incorrect results for certain queries using Java and Python APIs on Spark 
> 2.3.0
> ---
>
> Key: SPARK-24051
> URL: https://issues.apache.org/jira/browse/SPARK-24051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Emlyn Corrin
>Priority: Major
>
> I'm seeing Spark 2.3.0 return incorrect results for a certain (very specific) 
> query, demonstrated by the Java program below. It was simplified from a much 
> more complex query, but I'm having trouble simplifying it further without 
> removing the erroneous behaviour.
> {code:java}
> package sparktest;
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.expressions.Window;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import java.util.Arrays;
> public class Main {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf()
> .setAppName("SparkTest")
> .setMaster("local[*]");
> SparkSession session = 
> SparkSession.builder().config(conf).getOrCreate();
> Row[] arr1 = new Row[]{
> RowFactory.create(1, 42),
> RowFactory.create(2, 99)};
> StructType sch1 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty()),
> new StructField("b", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds1 = session.createDataFrame(Arrays.asList(arr1), sch1);
> ds1.show();
> Row[] arr2 = new Row[]{
> RowFactory.create(3)};
> StructType sch2 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds2 = session.createDataFrame(Arrays.asList(arr2), sch2)
> .withColumn("b", functions.lit(0));
> ds2.show();
> Column[] cols = new Column[]{
> new Column("a"),
> new Column("b").as("b"),
> functions.count(functions.lit(1))
> .over(Window.partitionBy())
> .as("n")};
> Dataset<Row> ds = ds1
> .select(cols)
> .union(ds2.select(cols))
> .where(new Column("n").geq(1))
> .drop("n");
> ds.show();
> //ds.explain(true);
> }
> }
> {code}
> It just calculates the union of 2 datasets,
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> +---+---+
> {code}
> with
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  3|  0|
> +---+---+
> {code}
> The expected result is:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> |  3|  0|
> +---+---+
> {code}
> but instead it prints:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  0|
> |  2|  0|
> |  3|  0|
> +---+---+
> {code}
> Notice how the value in column b is always zero, overriding the original 
> values in rows 1 and 2.
>  Making seemingly trivial changes, such as replacing {{new 
> Column("b").as("b"),}} with just {{new Column("b"),}} or removing the 
> {{where}} clause after the union, makes it behave correctly again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24051) Incorrect results for certain queries using Java and Python APIs on Spark 2.3.0

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456862#comment-16456862
 ] 

Apache Spark commented on SPARK-24051:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/21184

> Incorrect results for certain queries using Java and Python APIs on Spark 
> 2.3.0
> ---
>
> Key: SPARK-24051
> URL: https://issues.apache.org/jira/browse/SPARK-24051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Emlyn Corrin
>Priority: Major
>
> I'm seeing Spark 2.3.0 return incorrect results for a certain (very specific) 
> query, demonstrated by the Java program below. It was simplified from a much 
> more complex query, but I'm having trouble simplifying it further without 
> removing the erroneous behaviour.
> {code:java}
> package sparktest;
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.expressions.Window;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import java.util.Arrays;
> public class Main {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf()
> .setAppName("SparkTest")
> .setMaster("local[*]");
> SparkSession session = 
> SparkSession.builder().config(conf).getOrCreate();
> Row[] arr1 = new Row[]{
> RowFactory.create(1, 42),
> RowFactory.create(2, 99)};
> StructType sch1 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty()),
> new StructField("b", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds1 = session.createDataFrame(Arrays.asList(arr1), sch1);
> ds1.show();
> Row[] arr2 = new Row[]{
> RowFactory.create(3)};
> StructType sch2 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds2 = session.createDataFrame(Arrays.asList(arr2), sch2)
> .withColumn("b", functions.lit(0));
> ds2.show();
> Column[] cols = new Column[]{
> new Column("a"),
> new Column("b").as("b"),
> functions.count(functions.lit(1))
> .over(Window.partitionBy())
> .as("n")};
> Dataset<Row> ds = ds1
> .select(cols)
> .union(ds2.select(cols))
> .where(new Column("n").geq(1))
> .drop("n");
> ds.show();
> //ds.explain(true);
> }
> }
> {code}
> It just calculates the union of 2 datasets,
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> +---+---+
> {code}
> with
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  3|  0|
> +---+---+
> {code}
> The expected result is:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> |  3|  0|
> +---+---+
> {code}
> but instead it prints:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  0|
> |  2|  0|
> |  3|  0|
> +---+---+
> {code}
> Notice how the value in column b is always zero, overriding the original 
> values in rows 1 and 2.
>  Making seemingly trivial changes, such as replacing {{new 
> Column("b").as("b"),}} with just {{new Column("b"),}} or removing the 
> {{where}} clause after the union, makes it behave correctly again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24112) Add `spark.sql.hive.convertMetastoreTableProperty` for backward compatibility

2018-04-27 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-24112:
-

 Summary: Add `spark.sql.hive.convertMetastoreTableProperty` for 
backward compatibility
 Key: SPARK-24112
 URL: https://issues.apache.org/jira/browse/SPARK-24112
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Dongjoon Hyun


This issue aims to avoid surprising previous Parquet Hive table users with 
behavior changes. They had Hive Parquet tables, and since Spark 2.0 all of them 
have been converted by default without table properties.
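
As a usage sketch only (the flag name comes from this ticket's title; its 
default value and exact semantics are assumptions until the change lands), the 
knob would be toggled like any other SQL conf:
{code:scala}
// Hypothetical usage; the table name is illustrative.
spark.conf.set("spark.sql.hive.convertMetastoreTableProperty", "true")
spark.sql("SELECT * FROM my_hive_parquet_table").show()
{code}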



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22210) Online LDA variationalTopicInference should use random seed to have stable behavior

2018-04-27 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley reassigned SPARK-22210:
-

Assignee: Lu Wang

> Online LDA variationalTopicInference  should use random seed to have stable 
> behavior
> 
>
> Key: SPARK-22210
> URL: https://issues.apache.org/jira/browse/SPARK-22210
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Assignee: Lu Wang
>Priority: Minor
>
> https://github.com/apache/spark/blob/16fab6b0ef3dcb33f92df30e17680922ad5fb672/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L582
> Gamma distribution should use random seed to have consistent behavior.
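
A minimal sketch of the idea, not the actual LDAOptimizer change (it assumes 
Breeze 0.13+, where {{RandBasis.withSeed}} is available): seed the basis the 
Gamma sampler draws from, so repeated runs produce identical draws.
{code:scala}
import breeze.stats.distributions.{Gamma, RandBasis}

// Sketch only: shape 100 and scale 1/100 mirror the gammaShape used for topic
// initialization; the helper name is illustrative.
def seededGammaDraws(seed: Int, n: Int): IndexedSeq[Double] = {
  val basis = RandBasis.withSeed(seed)          // deterministic randomness source
  Gamma(100.0, 1.0 / 100.0)(basis).sample(n)    // same seed => same draws
}

// The stability this ticket asks for: two runs with the same seed agree.
assert(seededGammaDraws(42, 5) == seededGammaDraws(42, 5))
{code}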



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22210) Online LDA variationalTopicInference should use random seed to have stable behavior

2018-04-27 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-22210:
--
Shepherd: Joseph K. Bradley

> Online LDA variationalTopicInference  should use random seed to have stable 
> behavior
> 
>
> Key: SPARK-22210
> URL: https://issues.apache.org/jira/browse/SPARK-22210
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Priority: Minor
>
> https://github.com/apache/spark/blob/16fab6b0ef3dcb33f92df30e17680922ad5fb672/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L582
> Gamma distribution should use random seed to have consistent behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22210) Online LDA variationalTopicInference should use random seed to have stable behavior

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456756#comment-16456756
 ] 

Apache Spark commented on SPARK-22210:
--

User 'ludatabricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/21183

> Online LDA variationalTopicInference  should use random seed to have stable 
> behavior
> 
>
> Key: SPARK-22210
> URL: https://issues.apache.org/jira/browse/SPARK-22210
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Priority: Minor
>
> https://github.com/apache/spark/blob/16fab6b0ef3dcb33f92df30e17680922ad5fb672/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L582
> Gamma distribution should use random seed to have consistent behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22210) Online LDA variationalTopicInference should use random seed to have stable behavior

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22210:


Assignee: (was: Apache Spark)

> Online LDA variationalTopicInference  should use random seed to have stable 
> behavior
> 
>
> Key: SPARK-22210
> URL: https://issues.apache.org/jira/browse/SPARK-22210
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Priority: Minor
>
> https://github.com/apache/spark/blob/16fab6b0ef3dcb33f92df30e17680922ad5fb672/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L582
> Gamma distribution should use random seed to have consistent behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22210) Online LDA variationalTopicInference should use random seed to have stable behavior

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22210:


Assignee: Apache Spark

> Online LDA variationalTopicInference  should use random seed to have stable 
> behavior
> 
>
> Key: SPARK-22210
> URL: https://issues.apache.org/jira/browse/SPARK-22210
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/apache/spark/blob/16fab6b0ef3dcb33f92df30e17680922ad5fb672/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L582
> Gamma distribution should use random seed to have consistent behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24109) Remove class SnappyOutputStreamWrapper

2018-04-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-24109:
--
Target Version/s: 3.0.0
   Fix Version/s: (was: 2.4.0)

> Remove class SnappyOutputStreamWrapper
> --
>
> Key: SPARK-24109
> URL: https://issues.apache.org/jira/browse/SPARK-24109
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Input/Output
>Affects Versions: 2.2.0, 2.2.1, 2.3.0
>Reporter: wangjinhai
>Priority: Minor
>
> A wrapper over `SnappyOutputStream` that guards against write-after-close and 
> double-close issues; see SPARK-7660 for more details.
> This wrapping can be removed once we upgrade to a version of snappy-java 
> that contains the fix for 
> [https://github.com/xerial/snappy-java/issues/107].
> snappy-java 1.1.2+ fixed that bug, so the wrapper class can now be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24068) CSV schema inferring doesn't work for compressed files

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24068:


Assignee: (was: Apache Spark)

> CSV schema inferring doesn't work for compressed files
> --
>
> Key: SPARK-24068
> URL: https://issues.apache.org/jira/browse/SPARK-24068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Here is a simple csv file compressed by lzo
> {code}
> $ cat ./test.csv
> col1,col2
> a,1
> $ lzop ./test.csv
> $ ls
> test.csv test.csv.lzo
> {code}
> Reading test.csv.lzo with LZO codec (see 
> https://github.com/twitter/hadoop-lzo, for example):
> {code:scala}
> scala> val ds = spark.read.option("header", true).option("inferSchema", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("/Users/maximgekk/tmp/issue/test.csv.lzo")
> ds: org.apache.spark.sql.DataFrame = [�LZO?: string]
> scala> ds.printSchema
> root
>  |-- �LZO: string (nullable = true)
> scala> ds.show
> +-+
> |�LZO|
> +-+
> |a|
> +-+
> {code}
> but the file can be read if the schema is specified:
> {code}
> scala> import org.apache.spark.sql.types._
> scala> val schema = new StructType().add("col1", StringType).add("col2", 
> IntegerType)
> scala> val ds = spark.read.schema(schema).option("header", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("test.csv.lzo")
> scala> ds.show
> +++
> |col1|col2|
> +++
> |   a|   1|
> +++
> {code}
> Just in case, schema inferring works for the original uncompressed file:
> {code:scala}
> scala> spark.read.option("header", true).option("inferSchema", 
> true).csv("test.csv").printSchema
> root
>  |-- col1: string (nullable = true)
>  |-- col2: integer (nullable = true)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24068) CSV schema inferring doesn't work for compressed files

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456654#comment-16456654
 ] 

Apache Spark commented on SPARK-24068:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/21182

> CSV schema inferring doesn't work for compressed files
> --
>
> Key: SPARK-24068
> URL: https://issues.apache.org/jira/browse/SPARK-24068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Here is a simple csv file compressed by lzo
> {code}
> $ cat ./test.csv
> col1,col2
> a,1
> $ lzop ./test.csv
> $ ls
> test.csv test.csv.lzo
> {code}
> Reading test.csv.lzo with LZO codec (see 
> https://github.com/twitter/hadoop-lzo, for example):
> {code:scala}
> scala> val ds = spark.read.option("header", true).option("inferSchema", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("/Users/maximgekk/tmp/issue/test.csv.lzo")
> ds: org.apache.spark.sql.DataFrame = [�LZO?: string]
> scala> ds.printSchema
> root
>  |-- �LZO: string (nullable = true)
> scala> ds.show
> +-+
> |�LZO|
> +-+
> |a|
> +-+
> {code}
> but the file can be read if the schema is specified:
> {code}
> scala> import org.apache.spark.sql.types._
> scala> val schema = new StructType().add("col1", StringType).add("col2", 
> IntegerType)
> scala> val ds = spark.read.schema(schema).option("header", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("test.csv.lzo")
> scala> ds.show
> +++
> |col1|col2|
> +++
> |   a|   1|
> +++
> {code}
> Just in case, schema inferring works for the original uncompressed file:
> {code:scala}
> scala> spark.read.option("header", true).option("inferSchema", 
> true).csv("test.csv").printSchema
> root
>  |-- col1: string (nullable = true)
>  |-- col2: integer (nullable = true)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24068) CSV schema inferring doesn't work for compressed files

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24068:


Assignee: Apache Spark

> CSV schema inferring doesn't work for compressed files
> --
>
> Key: SPARK-24068
> URL: https://issues.apache.org/jira/browse/SPARK-24068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Here is a simple csv file compressed by lzo
> {code}
> $ cat ./test.csv
> col1,col2
> a,1
> $ lzop ./test.csv
> $ ls
> test.csv test.csv.lzo
> {code}
> Reading test.csv.lzo with LZO codec (see 
> https://github.com/twitter/hadoop-lzo, for example):
> {code:scala}
> scala> val ds = spark.read.option("header", true).option("inferSchema", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("/Users/maximgekk/tmp/issue/test.csv.lzo")
> ds: org.apache.spark.sql.DataFrame = [�LZO?: string]
> scala> ds.printSchema
> root
>  |-- �LZO: string (nullable = true)
> scala> ds.show
> +-+
> |�LZO|
> +-+
> |a|
> +-+
> {code}
> but the file can be read if the schema is specified:
> {code}
> scala> import org.apache.spark.sql.types._
> scala> val schema = new StructType().add("col1", StringType).add("col2", 
> IntegerType)
> scala> val ds = spark.read.schema(schema).option("header", 
> true).option("io.compression.codecs", 
> "com.hadoop.compression.lzo.LzopCodec").csv("test.csv.lzo")
> scala> ds.show
> +++
> |col1|col2|
> +++
> |   a|   1|
> +++
> {code}
> Just in case, schema inferring works for the original uncompressed file:
> {code:scala}
> scala> spark.read.option("header", true).option("inferSchema", 
> true).csv("test.csv").printSchema
> root
>  |-- col1: string (nullable = true)
>  |-- col2: integer (nullable = true)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23736) High-order function: concat(array1, array2, ..., arrayN) → array

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456642#comment-16456642
 ] 

Apache Spark commented on SPARK-23736:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/21181

> High-order function: concat(array1, array2, ..., arrayN) → array
> 
>
> Key: SPARK-23736
> URL: https://issues.apache.org/jira/browse/SPARK-23736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Marek Novotny
>Assignee: Marek Novotny
>Priority: Major
> Fix For: 2.4.0
>
>
> Extend the _concat_ function to also support array columns.
> Example:
> {{concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3, 10, 
> 20, 30, 100, 200] }}
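
A quick usage sketch, assuming the extension lands in 2.4.0 as described (the 
column alias is illustrative):
{code:scala}
// Array concat in SQL, once the extension is available.
spark.sql("SELECT concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) AS arr").show(false)
// expected: [1, 2, 3, 10, 20, 30, 100, 200]
{code}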



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23565) Improved error message for when the number of sources for a query changes

2018-04-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu reassigned SPARK-23565:


Assignee: Patrick McGloin

> Improved error message for when the number of sources for a query changes
> -
>
> Key: SPARK-23565
> URL: https://issues.apache.org/jira/browse/SPARK-23565
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Patrick McGloin
>Assignee: Patrick McGloin
>Priority: Minor
> Fix For: 2.4.0
>
>
> If you change the number of sources for a Structured Streaming query then you 
> will get an assertion error as the number of sources in the checkpoint does 
> not match the number of sources in the query that is starting.  This can 
> happen if, for example, you add a union to the input of the query.  This is 
> of course correct but the error is a bit cryptic and requires investigation.
> Suggestion for a more informative error message =>
> The number of sources for this query has changed.  There are [x] sources in 
> the checkpoint offsets and now there are [y] sources requested by the query.  
> Cannot continue.
> This is the current message.
> 02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to 
> Kafka [id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = 
> d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error 
> java.lang.AssertionError: assertion failed at 
> scala.Predef$.assert(Predef.scala:156) at 
> org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
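
As a sketch only (variable names here are illustrative, not the merged patch), 
the suggested wording could be raised at the assertion site like this:
{code:scala}
// Replace the bare assert with a descriptive message when the source counts diverge.
assert(checkpointedOffsets.size == querySources.size,
  s"The number of sources for this query has changed. There are ${checkpointedOffsets.size} " +
  s"sources in the checkpoint offsets and now there are ${querySources.size} sources " +
  "requested by the query. Cannot continue.")
{code}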



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23565) Improved error message for when the number of sources for a query changes

2018-04-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-23565.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 20946
[https://github.com/apache/spark/pull/20946]

> Improved error message for when the number of sources for a query changes
> -
>
> Key: SPARK-23565
> URL: https://issues.apache.org/jira/browse/SPARK-23565
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Patrick McGloin
>Priority: Minor
> Fix For: 2.4.0
>
>
> If you change the number of sources for a Structured Streaming query then you 
> will get an assertion error as the number of sources in the checkpoint does 
> not match the number of sources in the query that is starting.  This can 
> happen if, for example, you add a union to the input of the query.  This is 
> of course correct but the error is a bit cryptic and requires investigation.
> Suggestion for a more informative error message =>
> The number of sources for this query has changed.  There are [x] sources in 
> the checkpoint offsets and now there are [y] sources requested by the query.  
> Cannot continue.
> This is the current message.
> 02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to 
> Kafka [id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = 
> d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error 
> java.lang.AssertionError: assertion failed at 
> scala.Predef$.assert(Predef.scala:156) at 
> org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456551#comment-16456551
 ] 

Apache Spark commented on SPARK-22674:
--

User 'superbobry' has created a pull request for this issue:
https://github.com/apache/spark/pull/21180

> PySpark breaks serialization of namedtuple subclasses
> -
>
> Key: SPARK-22674
> URL: https://issues.apache.org/jira/browse/SPARK-22674
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Jonas Amrich
>Priority: Major
>
> PySpark monkey-patches the namedtuple class to make it serializable; however, 
> this breaks serialization of its subclasses. With the current implementation, any 
> subclass will be serialized (and deserialized) as its parent namedtuple. 
> Consider this code, which will fail with {{AttributeError: 'Point' object has 
> no attribute 'sum'}}:
> {code}
> from collections import namedtuple
> Point = namedtuple("Point", "x y")
> class PointSubclass(Point):
> def sum(self):
> return self.x + self.y
> rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]])
> rdd.collect()[0][0].sum()
> {code}
> Moreover, as PySpark hijacks all namedtuples in the main module, importing 
> pyspark breaks serialization of namedtuple subclasses even in code which is 
> not related to Spark or distributed execution. I don't see any clean solution 
> to this; a possible workaround may be to limit the serialization hack to 
> direct namedtuple subclasses, as in 
> https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24092) spark.python.worker.reuse does not work?

2018-04-27 Thread David Figueroa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456464#comment-16456464
 ] 

David Figueroa commented on SPARK-24092:


Still no answer from the community: 

[https://stackoverflow.com/questions/50043684/spark-python-worker-reuse-not-working-as-expected]

[http://apache-spark-user-list.1001560.n3.nabble.com/spark-python-worker-reuse-not-working-as-expected-td31976.html]

Can anyone look at this problem? It seems like a bug.

> spark.python.worker.reuse does not work?
> 
>
> Key: SPARK-24092
> URL: https://issues.apache.org/jira/browse/SPARK-24092
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: David Figueroa
>Priority: Minor
>
> {{spark.python.worker.reuse}} is true by default, but even after explicitly 
> setting it to true, the code below does not print the same Python worker process 
> IDs.
> {code:java|title=procid.py|borderStyle=solid}
> def return_pid(_): yield os.getpid()
> spark = SparkSession.builder.getOrCreate()
> pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
> print(pids)
> pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
> print(pids){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23935) High-order function: map_entries(map<K, V>) → array<row<K,V>>

2018-04-27 Thread Marek Novotny (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456274#comment-16456274
 ] 

Marek Novotny commented on SPARK-23935:
---

I will work on this one. Thanks.

> High-order function: map_entries(map<K, V>) → array<row<K,V>>
> -
>
> Key: SPARK-23935
> URL: https://issues.apache.org/jira/browse/SPARK-23935
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns an array of all entries in the given map.
> {noformat}
> SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), 
> ROW(2, 'y')]
> {noformat}
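
For reference, a sketch of what the Spark SQL call could look like once the 
function is added (not available in 2.3.0; semantics assumed to match the 
Presto reference above):
{code:scala}
spark.sql("SELECT map_entries(map(1, 'x', 2, 'y')) AS entries").show(false)
// expected, per the Presto semantics: [[1, x], [2, y]]
{code}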



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23934) High-order function: map_from_entries(array<row<K, V>>) → map<K,V>

2018-04-27 Thread Marek Novotny (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456272#comment-16456272
 ] 

Marek Novotny commented on SPARK-23934:
---

I will work on this one. Thanks.

> High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
> --
>
> Key: SPARK-23934
> URL: https://issues.apache.org/jira/browse/SPARK-23934
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns a map created from the given array of entries.
> {noformat}
> SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); -- {1 -> 'x', 2 -> 'y'}
> {noformat}
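
For reference, a sketch of what the Spark SQL call could look like once the 
function is added (not available in 2.3.0; semantics assumed to match the 
Presto reference above):
{code:scala}
spark.sql("SELECT map_from_entries(array(struct(1, 'x'), struct(2, 'y'))) AS m").show(false)
// expected, per the Presto semantics: [1 -> x, 2 -> y]
{code}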



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24089) DataFrame.write().mode(SaveMode.Append).insertInto(TABLE)

2018-04-27 Thread Marco Gaido (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456230#comment-16456230
 ] 

Marco Gaido edited comment on SPARK-24089 at 4/27/18 11:01 AM:
---

[~rkrgarlapati] the problem is that you are not inserting into an already 
existing table; you are inserting into a temp view. This operation is not 
allowed. This is not a bug but a misuse. Please close this as "Works as 
designed". Thanks.


was (Author: mgaido):
[~rkrgarlapati] the problem is that you are not inserting into an already 
existing table; you are inserting into a temp view. This operation is not 
allowed.
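
For readers hitting the same error, a minimal sketch of the allowed pattern 
(Scala for brevity; the paths and table name are illustrative, and an active 
SparkSession {{spark}} is assumed): create a real catalog table on the first 
pass with {{saveAsTable}}, then append later batches with {{insertInto}}.
{code:scala}
// First pass: create a real (managed) table in the catalog, not a temp view.
val first = spark.read.text("log1.txt")
first.write.saveAsTable("mylogs")

// Later passes: appending into an existing catalog table is allowed.
val second = spark.read.text("log2.txt")
second.write.mode("append").insertInto("mylogs")
{code}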

> DataFrame.write().mode(SaveMode.Append).insertInto(TABLE) 
> --
>
> Key: SPARK-24089
> URL: https://issues.apache.org/jira/browse/SPARK-24089
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: kumar
>Priority: Major
>  Labels: bug
>
> I am completely stuck with this issue and unable to progress further. For more 
> info please refer to this post: 
> [https://stackoverflow.com/questions/49994085/spark-sql-2-3-dataframe-savemode-append-issue]
> I want to load multiple files one by one rather than all at once. To achieve 
> this I used SaveMode.Append, so that the second file's data is appended to the 
> first file's data in the table, but it throws an exception.
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved 
> operator 'InsertIntoTable LogicalRDD [a1#4, b1#5, c1#6, d1#7], false, false, 
> false;;
> 'InsertIntoTable LogicalRDD [a1#4, b1#5, c1#6, d1#7], false, false, false
> +- LogicalRDD [a1#22, b1#23, c1#24, d1#25], false
> {code}
> Code:
> {code:java}
> package com.log;
> import com.log.common.RegexMatch;
> import com.log.spark.SparkProcessor;
> import org.apache.spark.SparkContext;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.function.Function;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import org.apache.spark.storage.StorageLevel;
> import java.util.ArrayList;
> import java.util.List;
> public class TestApp {
> private SparkSession spark;
> private SparkContext sparkContext;
> private SQLContext sqlContext;
> public TestApp() {
> SparkSession spark = SparkSession.builder().appName("Simple 
> Application")
> .config("spark.master", "local").getOrCreate();
> SparkContext sc = spark.sparkContext();
> this.spark = spark;
> this.sparkContext = sc;
> }
> public static void main(String[] args) {
> TestApp app = new TestApp();
> String[] afiles = {"C:\\Users\\test\\Desktop\\logs\\log1.txt",
> "C:\\Users\\test\\Desktop\\logs\\log2.txt"};
> for (String file : afiles) {
> app.writeFileToSchema(file);
> }
> }
> public void writeFileToSchema(String filePath) {
> StructType schema = getSchema();
> JavaRDD<Row> rowRDD = getRowRDD(filePath);
> if (spark.catalog().tableExists("mylogs")) {
> logDataFrame = spark.createDataFrame(rowRDD, schema);
> logDataFrame.createOrReplaceTempView("temptable");
> 
> logDataFrame.write().mode(SaveMode.Append).insertInto("mylogs");//exception
> } else {
> logDataFrame = spark.createDataFrame(rowRDD, schema);
> logDataFrame.createOrReplaceTempView("mylogs");
> }
> Dataset<Row> results = spark.sql("SELECT count(b1) FROM mylogs");
> List<Row> allrows = results.collectAsList();
> System.out.println("Count:"+allrows);
> sqlContext = logDataFrame.sqlContext();
> }
> Dataset<Row> logDataFrame;
> public List<Row> getTagList() {
> Dataset<Row> results = sqlContext.sql("SELECT distinct(b1) FROM 
> mylogs");
> List<Row> allrows = results.collectAsList();
> return allrows;
> }
> public StructType getSchema() {
> String schemaString = "a1 b1 c1 d1";
> List<StructField> fields = new ArrayList<>();
> for (String fieldName : schemaString.split(" ")) {
> StructField field = DataTypes.createStructField(fieldName, 
> DataTypes.StringType, true);
> fields.add(field);
> }
> StructType schema = DataTypes.createStructType(fields);
> return schema;
> }
> public JavaRDD<Row> getRowRDD(String filePath) {
> JavaRDD<String> logRDD = sparkContext.textFile(filePath, 
> 1).toJavaRDD();
> RegexMatch reg = new RegexMatch();
> JavaRDD<Row> rowRDD = logRDD
> .map((Function<String, Row>) line -> {
> String[] 

[jira] [Commented] (SPARK-24089) DataFrame.write().mode(SaveMode.Append).insertInto(TABLE)

2018-04-27 Thread Marco Gaido (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456230#comment-16456230
 ] 

Marco Gaido commented on SPARK-24089:
-

[~rkrgarlapati] the problem is that you are not inserting into an already 
existing table; you are inserting into a temp view. This operation is not 
allowed.

> DataFrame.write().mode(SaveMode.Append).insertInto(TABLE) 
> --
>
> Key: SPARK-24089
> URL: https://issues.apache.org/jira/browse/SPARK-24089
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: kumar
>Priority: Major
>  Labels: bug
>
> I am completely stuck with this issue and unable to progress further. For more 
> info please refer to this post: 
> [https://stackoverflow.com/questions/49994085/spark-sql-2-3-dataframe-savemode-append-issue]
> I want to load multiple files one by one rather than all at once. To achieve 
> this I used SaveMode.Append, so that the second file's data is appended to the 
> first file's data in the table, but it throws an exception.
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved 
> operator 'InsertIntoTable LogicalRDD [a1#4, b1#5, c1#6, d1#7], false, false, 
> false;;
> 'InsertIntoTable LogicalRDD [a1#4, b1#5, c1#6, d1#7], false, false, false
> +- LogicalRDD [a1#22, b1#23, c1#24, d1#25], false
> {code}
> Code:
> {code:java}
> package com.log;
> import com.log.common.RegexMatch;
> import com.log.spark.SparkProcessor;
> import org.apache.spark.SparkContext;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.function.Function;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import org.apache.spark.storage.StorageLevel;
> import java.util.ArrayList;
> import java.util.List;
> public class TestApp {
> private SparkSession spark;
> private SparkContext sparkContext;
> private SQLContext sqlContext;
> public TestApp() {
> SparkSession spark = SparkSession.builder().appName("Simple 
> Application")
> .config("spark.master", "local").getOrCreate();
> SparkContext sc = spark.sparkContext();
> this.spark = spark;
> this.sparkContext = sc;
> }
> public static void main(String[] args) {
> TestApp app = new TestApp();
> String[] afiles = {"C:\\Users\\test\\Desktop\\logs\\log1.txt",
> "C:\\Users\\test\\Desktop\\logs\\log2.txt"};
> for (String file : afiles) {
> app.writeFileToSchema(file);
> }
> }
> public void writeFileToSchema(String filePath) {
> StructType schema = getSchema();
> JavaRDD<Row> rowRDD = getRowRDD(filePath);
> if (spark.catalog().tableExists("mylogs")) {
> logDataFrame = spark.createDataFrame(rowRDD, schema);
> logDataFrame.createOrReplaceTempView("temptable");
> 
> logDataFrame.write().mode(SaveMode.Append).insertInto("mylogs");//exception
> } else {
> logDataFrame = spark.createDataFrame(rowRDD, schema);
> logDataFrame.createOrReplaceTempView("mylogs");
> }
> Dataset<Row> results = spark.sql("SELECT count(b1) FROM mylogs");
> List<Row> allrows = results.collectAsList();
> System.out.println("Count:"+allrows);
> sqlContext = logDataFrame.sqlContext();
> }
> Dataset<Row> logDataFrame;
> public List<Row> getTagList() {
> Dataset<Row> results = sqlContext.sql("SELECT distinct(b1) FROM 
> mylogs");
> List<Row> allrows = results.collectAsList();
> return allrows;
> }
> public StructType getSchema() {
> String schemaString = "a1 b1 c1 d1";
> List<StructField> fields = new ArrayList<>();
> for (String fieldName : schemaString.split(" ")) {
> StructField field = DataTypes.createStructField(fieldName, 
> DataTypes.StringType, true);
> fields.add(field);
> }
> StructType schema = DataTypes.createStructType(fields);
> return schema;
> }
> public JavaRDD<Row> getRowRDD(String filePath) {
> JavaRDD<String> logRDD = sparkContext.textFile(filePath, 
> 1).toJavaRDD();
> RegexMatch reg = new RegexMatch();
> JavaRDD<Row> rowRDD = logRDD
> .map((Function<String, Row>) line -> {
> String[] st = line.split(" ");
> return RowFactory.create(st[0], st[1], st[2], st[3]);
> });
> rowRDD.persist(StorageLevel.MEMORY_ONLY());
> return rowRDD;
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-23897) Guava version

2018-04-27 Thread aze (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456143#comment-16456143
 ] 

aze commented on SPARK-23897:
-

So how are you going to deal with this: 
[https://www.cvedetails.com/cve/CVE-2018-10237/] ?

> Guava version
> -
>
> Key: SPARK-23897
> URL: https://issues.apache.org/jira/browse/SPARK-23897
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sercan Karaoglu
>Priority: Minor
>
> The Guava dependency version 14 is pretty old and needs to be updated to at least 
> 16. The Google Cloud Storage connector uses a newer one, which causes a fairly 
> common Guava error, "java.lang.NoSuchMethodError: 
> com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;",
>  and crashes the app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Chandra Hasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandra Hasan closed SPARK-24086.
-

After adding the necessary dependencies, it's working fine.

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 - Started 
> ServerConnector@6813a331{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 'SparkUI' on 
> port 4040.
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4f7c0be3{/jobs,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4cfbaf4{/jobs/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@58faa93b{/jobs/job,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@127d7908{/jobs/job/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6b9c69a9{/stages,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6622a690{/stages/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@30b9eadd{/stages/stage,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@3249a1ce{/stages/stage/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4dd94a58{/stages/pool,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@2f4919b0{/stages/pool/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@a8a8b75{/storage,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@75b21c3b{/storage/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@72be135f{/storage/rdd,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@155d1021{/storage/rdd/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> 

[jira] [Commented] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456127#comment-16456127
 ] 

Hyukjin Kwon commented on SPARK-24086:
--

Yup, providing details is helpful to the community. If someone faces the same 
issue, the details should help. Thank you.
Not sure yet. I usually encourage people to open a JIRA only when it looks quite 
clear it's an issue, to let the contributors focus on issues rather than questions 
here.

Otherwise, I would debug it myself or ask on the mailing list first and 
see if it's really an issue. In my experience, the mailing list is pretty 
responsive when the question or issue is well described.

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 - Started 
> ServerConnector@6813a331{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 'SparkUI' on 
> port 4040.
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4f7c0be3{/jobs,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4cfbaf4{/jobs/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@58faa93b{/jobs/job,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@127d7908{/jobs/job/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6b9c69a9{/stages,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6622a690{/stages/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@30b9eadd{/stages/stage,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@3249a1ce{/stages/stage/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4dd94a58{/stages/pool,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@2f4919b0{/stages/pool/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@a8a8b75{/storage,null,AVAILABLE,@Spark}
> 

[jira] [Assigned] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24110:


Assignee: Apache Spark

> Avoid calling UGI loginUserFromKeytab in ThriftServer
> -
>
> Key: SPARK-24110
> URL: https://issues.apache.org/jira/browse/SPARK-24110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Apache Spark
>Priority: Major
>
> Spark ThriftServer will call UGI.loginUserFromKeytab twice during initialization. 
> This is unnecessary and can cause various problems, such as Hadoop 
> IPC failures after 7 days or RM failover issues.
> So we need to remove the unnecessary login logic and make sure the UGI 
> in the context is never created again.
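
A hypothetical sketch of the idea, not the actual patch: perform the keytab 
login once and reuse the resulting UGI instead of calling 
{{loginUserFromKeytab}} from multiple initialization paths.
{code:scala}
import org.apache.hadoop.security.UserGroupInformation

// Illustrative guard object; the names are not from the Spark codebase.
object KerberosLogin {
  @volatile private var loggedIn = false

  def ensureLoggedIn(principal: String, keytab: String): Unit = synchronized {
    if (!loggedIn) {
      // Hadoop API call: logs in from the keytab and sets the static login user.
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
      loggedIn = true
    }
  }
}
{code}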



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456076#comment-16456076
 ] 

Apache Spark commented on SPARK-24110:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/21178

> Avoid calling UGI loginUserFromKeytab in ThriftServer
> -
>
> Key: SPARK-24110
> URL: https://issues.apache.org/jira/browse/SPARK-24110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Major
>
> Spark ThriftServer will call UGI.loginUserFromKeytab twice during initialization. 
> This is unnecessary and can cause various problems, such as Hadoop 
> IPC failures after 7 days or RM failover issues.
> So we need to remove the unnecessary login logic and make sure the UGI 
> in the context is never created again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24110:


Assignee: (was: Apache Spark)

> Avoid calling UGI loginUserFromKeytab in ThriftServer
> -
>
> Key: SPARK-24110
> URL: https://issues.apache.org/jira/browse/SPARK-24110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Major
>
> Spark ThriftServer calls UGI.loginUserFromKeytab twice during initialization. 
> This is unnecessary and can cause various problems, such as Hadoop IPC 
> failures after 7 days or ResourceManager (RM) failover issues.
> So we need to remove the redundant login logic and make sure the UGI in the 
> context is never created again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23830:
---

Assignee: Eric Maynard

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Assignee: Eric Maynard
>Priority: Trivial
> Fix For: 2.4.0
>
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is caused by [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> Because the entry point is a class rather than an object, {{main}} is an instance 
> method, so {{mainMethod}} is not static, and invoking it with a {{null}} receiver in 
> [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  simply throws an NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check that {{mainMethod}} is static, plus 
> a message telling the user what the likely cause is.
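A minimal sketch of the suggested check (this is not the exact code of the eventual patch; it reuses the {{userClassLoader}}, {{args}}, and {{userArgs}} names from the ApplicationMaster snippets quoted above):
{code}
import java.lang.reflect.Modifier

val mainMethod = userClassLoader.loadClass(args.userClass)
  .getMethod("main", classOf[Array[String]])

// Fail fast with a descriptive message instead of an opaque NPE when the
// user's entry point is declared on a class instead of an object.
if (!Modifier.isStatic(mainMethod.getModifiers)) {
  throw new IllegalStateException(
    s"${args.userClass} declares main() on a class; the entry point must be " +
      "a Scala object (or a Java class with a static main method).")
}
mainMethod.invoke(null, userArgs.toArray)
{code}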



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23830.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21168
[https://github.com/apache/spark/pull/21168]

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Priority: Trivial
> Fix For: 2.4.0
>
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is caused by [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> Because the entry point is a class rather than an object, {{main}} is an instance 
> method, so {{mainMethod}} is not static, and invoking it with a {{null}} receiver in 
> [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  simply throws an NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check that {{mainMethod}} is static, plus 
> a message telling the user what the likely cause is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2018-04-27 Thread fengchaoge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengchaoge updated SPARK-21337:
---
Attachment: (was: t1.zip)

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: test.JPG, test1.JPG, test2.JPG
>
>
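For readers unfamiliar with the failure mode named in the title, a hedged illustration (not taken from the attached test case; the table, column, and branch counts below are made up) of the kind of query that can push the generated Java code for a single method past the JVM's 64 KB limit:
{code}
import org.apache.spark.sql.SparkSession

// A projection with hundreds of CASE WHEN branches; on affected versions the
// generated code for this expression can exceed 64 KB per method.
val spark = SparkSession.builder().appName("large-case-when-demo").getOrCreate()
spark.range(1000).createOrReplaceTempView("t")

val branches = (1 to 500).map(i => s"WHEN id = $i THEN 'v$i'").mkString(" ")
val query = s"SELECT CASE $branches ELSE 'other' END AS label FROM t"
spark.sql(query).show()  // may fail with a "grows beyond 64 KB" error on affected versions
{code}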




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2018-04-27 Thread fengchaoge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengchaoge updated SPARK-21337:
---
Attachment: t1.zip

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: test.JPG, test1.JPG, test2.JPG
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24111) Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24111:


Assignee: Apache Spark

> Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
> --
>
> Key: SPARK-24111
> URL: https://issues.apache.org/jira/browse/SPARK-24111
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24111) Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark

2018-04-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456033#comment-16456033
 ] 

Apache Spark commented on SPARK-24111:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21177

> Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
> --
>
> Key: SPARK-24111
> URL: https://issues.apache.org/jira/browse/SPARK-24111
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24111) Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark

2018-04-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24111:


Assignee: (was: Apache Spark)

> Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
> --
>
> Key: SPARK-24111
> URL: https://issues.apache.org/jira/browse/SPARK-24111
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24111) Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark

2018-04-27 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-24111:


 Summary: Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
 Key: SPARK-24111
 URL: https://issues.apache.org/jira/browse/SPARK-24111
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Takeshi Yamamuro






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-04-27 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24110:
---

 Summary: Avoid calling UGI loginUserFromKeytab in ThriftServer
 Key: SPARK-24110
 URL: https://issues.apache.org/jira/browse/SPARK-24110
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Saisai Shao


Spark ThriftServer calls UGI.loginUserFromKeytab twice during initialization. 
This is unnecessary and can cause various problems, such as Hadoop IPC failures 
after 7 days or ResourceManager (RM) failover issues.

So we need to remove the redundant login logic and make sure the UGI in the 
context is never created again.
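A minimal sketch of the idea (this is not the actual patch; the methods used are standard Hadoop UserGroupInformation APIs, and the guard condition is an assumption about the intended behaviour):
{code}
import org.apache.hadoop.security.UserGroupInformation

// Log in from the keytab only if there is no usable Kerberos login yet;
// otherwise reuse the existing UGI instead of creating a second one.
def ensureKerberosLogin(principal: String, keytab: String): UserGroupInformation = {
  if (UserGroupInformation.isSecurityEnabled &&
      !UserGroupInformation.getCurrentUser.hasKerberosCredentials) {
    UserGroupInformation.loginUserFromKeytab(principal, keytab)
  }
  UserGroupInformation.getLoginUser
}
{code}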



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Chandra Hasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456003#comment-16456003
 ] 

Chandra Hasan edited comment on SPARK-24086 at 4/27/18 6:47 AM:


[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 


was (Author: hasan4791):
[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 - 

[jira] [Comment Edited] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Chandra Hasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456003#comment-16456003
 ] 

Chandra Hasan edited comment on SPARK-24086 at 4/27/18 6:47 AM:


[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 


was (Author: hasan4791):
[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 - 

[jira] [Comment Edited] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Chandra Hasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456003#comment-16456003
 ] 

Chandra Hasan edited comment on SPARK-24086 at 4/27/18 6:46 AM:


[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 


was (Author: hasan4791):
[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers); 
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer"); 
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer"); 
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?

 

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 

[jira] [Commented] (SPARK-24109) Remove class SnappyOutputStreamWrapper

2018-04-27 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456000#comment-16456000
 ] 

Takeshi Yamamuro commented on SPARK-24109:
--

IMO it'd be better to keep this ticket open because the wrapper can be removed 
in the future, but not yet. For related discussion, see: 
https://github.com/apache/spark/pull/18949#issuecomment-323354674

> Remove class SnappyOutputStreamWrapper
> --
>
> Key: SPARK-24109
> URL: https://issues.apache.org/jira/browse/SPARK-24109
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Input/Output
>Affects Versions: 2.2.0, 2.2.1, 2.3.0
>Reporter: wangjinhai
>Priority: Minor
> Fix For: 2.4.0
>
>
> Wrapper over `SnappyOutputStream` which guards against write-after-close and 
> double-close
> issues. See SPARK-7660 for more details.
> This wrapping can be removed if we upgrade to a version of snappy-java that 
> contains the fix for [https://github.com/xerial/snappy-java/issues/107]; 
> snappy-java 1.1.2+ fixed that bug.
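For context, a minimal sketch of the guard pattern the wrapper provides (this is not the actual SnappyOutputStreamWrapper source, just an illustration of write-after-close and double-close protection around any OutputStream):
{code}
import java.io.{IOException, OutputStream}

class CloseGuardedOutputStream(out: OutputStream) extends OutputStream {
  private var closed = false

  private def ensureOpen(): Unit = {
    if (closed) throw new IOException("Stream is already closed")
  }

  override def write(b: Int): Unit = { ensureOpen(); out.write(b) }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    ensureOpen(); out.write(b, off, len)
  }

  override def flush(): Unit = { ensureOpen(); out.flush() }

  // close() is idempotent, so a double-close never reaches the underlying
  // (potentially native) stream twice.
  override def close(): Unit = {
    if (!closed) {
      closed = true
      out.close()
    }
  }
}
{code}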



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24086) Exception while executing spark streaming examples

2018-04-27 Thread Chandra Hasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456003#comment-16456003
 ] 

Chandra Hasan commented on SPARK-24086:
---

[~hyukjin.kwon] Thanks mate, I included the necessary dependencies when executing 
and it's working now.
If someone is facing the same issue, here is the solution:
{code:java}
spark-submit --jars 
kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar
 --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount 
target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar  
{code}
 

Also [~hyukjin.kwon], I would like to point out that the consumer properties in the 
JavaDirectKafkaWordCount example are not up to date, which leads to a 
missing-configuration error, so I had to rewrite that part of the code as below:
{code:java}
kafkaParams.put("bootstrap.servers", brokers); 
kafkaParams.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer"); 
kafkaParams.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer"); 
kafkaParams.put("group.id", "");{code}
 

What do you say, is this fine as-is or should I open a bug for it?
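For completeness, a minimal sketch of how these consumer properties plug into the 0-10 direct stream API (written against the Scala API rather than the Java example; the broker address and topic match the run-example arguments above, while the group id is a placeholder):
{code}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "192.168.0.4:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group")

val ssc = new StreamingContext(new SparkConf().setAppName("DirectKafkaWordCount"), Seconds(2))
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("msu"), kafkaParams))

// Classic word count over the record values.
stream.map(_.value).flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _).print()
ssc.start()
ssc.awaitTermination()
{code}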

 

> Exception while executing spark streaming examples
> --
>
> Key: SPARK-24086
> URL: https://issues.apache.org/jira/browse/SPARK-24086
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 2.3.0
>Reporter: Chandra Hasan
>Priority: Major
>
> After running mvn clean package, I tried to execute one of the Spark example 
> programs, JavaDirectKafkaWordCount.java, but it throws the following exception.
> {code:java}
> [cloud-user@server-2 examples]$ run-example 
> streaming.JavaDirectKafkaWordCount 192.168.0.4:9092 msu
> 2018-04-25 09:39:22 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-04-25 09:39:22 INFO SparkContext:54 - Running Spark version 2.3.0
> 2018-04-25 09:39:22 INFO SparkContext:54 - Submitted application: 
> JavaDirectKafkaWordCount
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls to: 
> cloud-user
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-04-25 09:39:22 INFO SecurityManager:54 - SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(cloud-user); 
> groups with view permissions: Set(); users with modify permissions: 
> Set(cloud-user); groups with modify permissions: Set()
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 
> 'sparkDriver' on port 59333.
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering MapOutputTracker
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering BlockManagerMaster
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 2018-04-25 09:39:23 INFO BlockManagerMasterEndpoint:54 - 
> BlockManagerMasterEndpoint up
> 2018-04-25 09:39:23 INFO DiskBlockManager:54 - Created local directory at 
> /tmp/blockmgr-6fc11fc1-f638-42ea-a9df-dc01fb81b7b6
> 2018-04-25 09:39:23 INFO MemoryStore:54 - MemoryStore started with capacity 
> 366.3 MB
> 2018-04-25 09:39:23 INFO SparkEnv:54 - Registering OutputCommitCoordinator
> 2018-04-25 09:39:23 INFO log:192 - Logging initialized @1825ms
> 2018-04-25 09:39:23 INFO Server:346 - jetty-9.3.z-SNAPSHOT
> 2018-04-25 09:39:23 INFO Server:414 - Started @1900ms
> 2018-04-25 09:39:23 INFO AbstractConnector:278 - Started 
> ServerConnector@6813a331{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
> 2018-04-25 09:39:23 INFO Utils:54 - Successfully started service 'SparkUI' on 
> port 4040.
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4f7c0be3{/jobs,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@4cfbaf4{/jobs/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@58faa93b{/jobs/job,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@127d7908{/jobs/job/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6b9c69a9{/stages,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@6622a690{/stages/json,null,AVAILABLE,@Spark}
> 2018-04-25 09:39:23 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@30b9eadd{/stages/stage,null,AVAILABLE,@Spark}
> 2018-04-25