[jira] [Assigned] (SPARK-13413) Remove SparkContext.metricsSystem

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reassigned SPARK-13413:
---

Assignee: Reynold Xin

> Remove SparkContext.metricsSystem
> -
>
> Key: SPARK-13413
> URL: https://issues.apache.org/jira/browse/SPARK-13413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkContext.metricsSystem returns MetricsSystem, which is a private class. I 
> think it was added by accident. We should remove it in Spark 2.0.






[jira] [Created] (SPARK-13413) Remove SparkContext.metricsSystem

2016-02-20 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13413:
---

 Summary: Remove SparkContext.metricsSystem
 Key: SPARK-13413
 URL: https://issues.apache.org/jira/browse/SPARK-13413
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin


SparkContext.metricsSystem returns MetricsSystem, which is a private class. I 
think it was added by accident. We should remove it in Spark 2.0.








[jira] [Assigned] (SPARK-13413) Remove SparkContext.metricsSystem

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13413:


Assignee: Apache Spark  (was: Reynold Xin)

> Remove SparkContext.metricsSystem
> -
>
> Key: SPARK-13413
> URL: https://issues.apache.org/jira/browse/SPARK-13413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> SparkContext.metricsSystem returns MetricsSystem, which is a private class. I 
> think it was added by accident. We should remove it in Spark 2.0.






[jira] [Assigned] (SPARK-13413) Remove SparkContext.metricsSystem

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13413:


Assignee: Reynold Xin  (was: Apache Spark)

> Remove SparkContext.metricsSystem
> -
>
> Key: SPARK-13413
> URL: https://issues.apache.org/jira/browse/SPARK-13413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkContext.metricsSystem returns MetricsSystem, which is a private class. I 
> think it was added by accident. We should remove it in Spark 2.0.






[jira] [Commented] (SPARK-13413) Remove SparkContext.metricsSystem

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155475#comment-15155475
 ] 

Apache Spark commented on SPARK-13413:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11282

> Remove SparkContext.metricsSystem
> -
>
> Key: SPARK-13413
> URL: https://issues.apache.org/jira/browse/SPARK-13413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkContext.metricsSystem returns MetricsSystem, which is a private class. I 
> think it was added by accident. We should remove it in Spark 2.0.






[jira] [Resolved] (SPARK-13412) Spark Shell Ctrl-C behaviour suggestion

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13412.
---
Resolution: Duplicate

This is just a comment on the other JIRA; you shouldn't open a new one.

> Spark Shell Ctrl-C behaviour suggestion
> ---
>
> Key: SPARK-13412
> URL: https://issues.apache.org/jira/browse/SPARK-13412
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.0
>Reporter: Jon Maurer
>Priority: Minor
>
> It would be useful to catch the interrupt from a ctrl-c and prompt for 
> confirmation prior to closing spark shell. This is currently an issue when 
> sitting at an idle prompt. For example, if a user accidentally enters ctrl-c 
> then all previous progress is lost and must be run again. Instead, the 
> desired behavior would be to prompt the user to enter 'yes' or another 
> ctrl-c to exit the shell, thus preventing rework. 
> There is related discussion about this sort of feature on the Scala issue 
> tracker: https://issues.scala-lang.org/browse/SI-6302






[jira] [Commented] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job

2016-02-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155479#comment-15155479
 ] 

Sean Owen commented on SPARK-10001:
---

(Let's keep the discussion here; forking a discussion into a new JIRA doesn't 
help anything)

> Allow Ctrl-C in spark-shell to kill running job
> ---
>
> Key: SPARK-10001
> URL: https://issues.apache.org/jira/browse/SPARK-10001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.4.1
>Reporter: Cheolsoo Park
>Priority: Minor
>
> Hitting Ctrl-C in spark-sql (and other tools like presto) cancels any running 
> job and starts a new input line on the prompt. It would be nice if 
> spark-shell could also do that. Otherwise, if a user submits a job by mistake 
> and wants to cancel it, he needs to exit the shell and re-login to continue 
> his work. Re-login can be a pain, especially in Spark on YARN, since it takes 
> a while to allocate the AM container and initial executors.
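A rough sketch of the kind of handler spark-shell could install to get this behavior 
(not the actual implementation in any Spark version): catch SIGINT and cancel running 
jobs instead of exiting. It assumes `sc` is the active SparkContext, as it is inside 
spark-shell.

{code}
import sun.misc.{Signal, SignalHandler}

// On Ctrl-C, cancel all running Spark jobs rather than terminating the REPL.
Signal.handle(new Signal("INT"), new SignalHandler {
  override def handle(sig: Signal): Unit = {
    println("Cancelling all running Spark jobs...")
    sc.cancelAllJobs()
  }
})
{code}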






[jira] [Created] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Timothy Chen (JIRA)
Timothy Chen created SPARK-13414:


 Summary: Add support for launching multiple Mesos dispatchers
 Key: SPARK-13414
 URL: https://issues.apache.org/jira/browse/SPARK-13414
 Project: Spark
  Issue Type: Improvement
Reporter: Timothy Chen


Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
dispatcher is launched, but users who run multi-tenant dispatchers might want to 
launch several. The ability to launch multiple dispatchers also helps local 
development.






[jira] [Assigned] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13414:


Assignee: Apache Spark

> Add support for launching multiple Mesos dispatchers
> 
>
> Key: SPARK-13414
> URL: https://issues.apache.org/jira/browse/SPARK-13414
> Project: Spark
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Apache Spark
>
> Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
> dispatcher is launched, but users who run multi-tenant dispatchers might want to 
> launch several. The ability to launch multiple dispatchers also helps local 
> development.






[jira] [Commented] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155485#comment-15155485
 ] 

Apache Spark commented on SPARK-13414:
--

User 'tnachen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11281

> Add support for launching multiple Mesos dispatchers
> 
>
> Key: SPARK-13414
> URL: https://issues.apache.org/jira/browse/SPARK-13414
> Project: Spark
>  Issue Type: Improvement
>Reporter: Timothy Chen
>
> Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
> dispatcher is launched, but users who run multi-tenant dispatchers might want to 
> launch several. The ability to launch multiple dispatchers also helps local 
> development.






[jira] [Assigned] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13414:


Assignee: (was: Apache Spark)

> Add support for launching multiple Mesos dispatchers
> 
>
> Key: SPARK-13414
> URL: https://issues.apache.org/jira/browse/SPARK-13414
> Project: Spark
>  Issue Type: Improvement
>Reporter: Timothy Chen
>
> Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
> dispatcher is launched, but users who run multi-tenant dispatchers might want to 
> launch several. The ability to launch multiple dispatchers also helps local 
> development.






[jira] [Created] (SPARK-13415) Visualize subquery on SQL tab

2016-02-20 Thread Davies Liu (JIRA)
Davies Liu created SPARK-13415:
--

 Summary: Visualize subquery on SQL tab
 Key: SPARK-13415
 URL: https://issues.apache.org/jira/browse/SPARK-13415
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Davies Liu


Right now, uncorrelated scalar subqueries are not shown in the SQL tab.
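For context, a hypothetical example of an uncorrelated scalar subquery (table and 
column names are made up): the inner SELECT does not reference the outer query and 
returns a single value, and today it does not show up in the SQL tab's plan graph.

{code}
// Runs in spark-shell; `sqlContext` is the shell's SQLContext.
sqlContext.sql(
  "SELECT name, salary FROM employees " +
  "WHERE salary > (SELECT avg(salary) FROM employees)")
{code}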






[jira] [Updated] (SPARK-13415) Visualize subquery in SQL web UI

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13415:

Summary: Visualize subquery in SQL web UI  (was: Visualize subquery on SQL 
tab)

> Visualize subquery in SQL web UI
> 
>
> Key: SPARK-13415
> URL: https://issues.apache.org/jira/browse/SPARK-13415
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> Right now, uncorrelated scalar subqueries are not shown in the SQL tab.






[jira] [Updated] (SPARK-12540) Support all TPCDS queries

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-12540:

Description: As of Spark SQL 1.6, Spark can run 55 out of 99 TPCDS queries. 
The goal of this epic is to support running all the TPC-DS queries.  (was: 
Spark SQL 1.6 can run 55 out of 99 TPCDS queries, the goal is to support all of 
them)

> Support all TPCDS queries
> -
>
> Key: SPARK-12540
> URL: https://issues.apache.org/jira/browse/SPARK-12540
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> As of Spark SQL 1.6, Spark can run 55 out of 99 TPCDS queries. The goal of 
> this epic is to support running all the TPC-DS queries.






[jira] [Updated] (SPARK-13302) Cleanup persistence Docstests in ml

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13302:
--
Assignee: holdenk

> Cleanup persistence Docstests in ml
> ---
>
> Key: SPARK-13302
> URL: https://issues.apache.org/jira/browse/SPARK-13302
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Some of the new doctests in ml/clustering.py have a lot of setup code; move 
> the setup code to the general test init to keep the doctests more 
> example-style.
> This is a follow up to https://github.com/apache/spark/pull/10999 






[jira] [Resolved] (SPARK-13302) Cleanup persistence Docstests in ml

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13302.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11197
[https://github.com/apache/spark/pull/11197]

> Cleanup persistence Docstests in ml
> ---
>
> Key: SPARK-13302
> URL: https://issues.apache.org/jira/browse/SPARK-13302
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Reporter: holdenk
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Some of the new doctests in ml/clustering.py have a lot of setup code; move 
> the setup code to the general test init to keep the doctests more 
> example-style.
> This is a follow up to https://github.com/apache/spark/pull/10999 






[jira] [Updated] (SPARK-13375) PySpark API Utils missing item: kFold

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13375:
--
Target Version/s:   (was: 1.6.0)

> PySpark API Utils missing item: kFold
> -
>
> Key: SPARK-13375
> URL: https://issues.apache.org/jira/browse/SPARK-13375
> Project: Spark
>  Issue Type: Task
>  Components: MLlib, PySpark
>Affects Versions: 1.5.0
>Reporter: Bruno Wu
>Priority: Minor
>
> The kFold function has not been implemented in MLUtils in the Python API for 
> MLlib (pyspark.mllib.util as of 1.6.0).
> This JIRA ticket is opened to add this function to pyspark.mllib.util.
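For reference, a sketch of the existing Scala API that this ticket asks to mirror in 
PySpark (shown from memory; check MLUtils.kFold for the exact signature). It assumes 
a spark-shell session where `sc` is available.

{code}
import org.apache.spark.mllib.util.MLUtils

val data = sc.parallelize(1 to 100)
// Returns an Array of (training, validation) RDD pairs, one per fold:
// kFold(rdd, numFolds, seed)
val folds = MLUtils.kFold(data, 3, 11)
{code}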






[jira] [Updated] (SPARK-12567) Add aes_encrypt and aes_decrypt UDFs

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12567:
--
Fix Version/s: (was: 2.0.0)

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: SPARK-12567
> URL: https://issues.apache.org/jira/browse/SPARK-12567
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> AES (Advanced Encryption Standard) algorithm.
> Add aes_encrypt and aes_decrypt UDFs.
> Ref:
> [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions]
> [MySQL|https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-decrypt]
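A sketch of the intended usage, modeled on the Hive UDFs referenced above. This is 
illustrative only: these functions are the subject of this ticket and are not 
available in Spark SQL unless the proposed work is merged; the column name enc_col 
and table t are made up.

{code}
// Encrypt a literal with a 16-byte key, then decrypt a (hypothetical) binary column.
sqlContext.sql("SELECT base64(aes_encrypt('ABC', '1234567890123456')) AS enc").show()
sqlContext.sql("SELECT aes_decrypt(enc_col, '1234567890123456') AS dec FROM t").show()
{code}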






[jira] [Updated] (SPARK-13385) Enable AssociationRules to generate consequents with user-defined lengths

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13385:
--
Target Version/s:   (was: 1.6.0)
Priority: Minor  (was: Major)

> Enable AssociationRules to generate consequents with user-defined lengths
> -
>
> Key: SPARK-13385
> URL: https://issues.apache.org/jira/browse/SPARK-13385
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.6.0
>Reporter: zhengruifeng
>Priority: Minor
> Attachments: rule-generation.pdf
>
>
> AssociationRules should generate all association rules with user-defined 
> iterations, not just rules which have a single item as the consequent.
> Such as:
> 39 804 ==> 413 743 819 #SUP: 1023 #CONF: 0.70117
> 39 743 ==> 413 804 819 #SUP: 1023 #CONF: 0.93939
> 39 413 ==> 743 804 819 #SUP: 1023 #CONF: 0.6007
> 819 ==> 39 413 743 804 #SUP: 1023 #CONF: 0.15418
> 804 ==> 39 413 743 819 #SUP: 1023 #CONF: 0.12997
> 743 ==> 39 413 804 819 #SUP: 1023 #CONF: 0.7276
> 39 ==> 413 743 804 819 #SUP: 1023 #CONF: 0.12874
> ...
> I have implemented it based on Apriori's Rule-Generation Algorithm:
> https://github.com/zhengruifeng/spark-rules
> It's compatible with fpm's APIs.
> import org.apache.spark.mllib.fpm._
> val data = sc.textFile("hdfs://ns1/whale/T40I10D100K.dat")
> val transactions = data.map(s => s.trim.split(' ')).persist()
> val fpg = new FPGrowth().setMinSupport(0.01)
> val model = fpg.run(transactions)
> val ar = new AprioriRules().setMinConfidence(0.1).setMaxConsequent(15)
> val results = ar.run(model.freqItemsets)
> and it outputs rule-generation information like this:
> 15/11/04 11:28:46 INFO AprioriRules: Candidates for 1-consequent rules : 
> 312917
> 15/11/04 11:28:58 INFO AprioriRules: Generated 1-consequent rules : 306703
> 15/11/04 11:29:10 INFO AprioriRules: Candidates for 2-consequent rules : 
> 707747
> 15/11/04 11:29:35 INFO AprioriRules: Generated 2-consequent rules : 704000
> 15/11/04 11:29:55 INFO AprioriRules: Candidates for 3-consequent rules : 
> 1020253
> 15/11/04 11:30:38 INFO AprioriRules: Generated 3-consequent rules : 1014002
> 15/11/04 11:31:14 INFO AprioriRules: Candidates for 4-consequent rules : 
> 972225
> 15/11/04 11:32:00 INFO AprioriRules: Generated 4-consequent rules : 956483
> 15/11/04 11:32:44 INFO AprioriRules: Candidates for 5-consequent rules : 
> 653749
> 15/11/04 11:33:32 INFO AprioriRules: Generated 5-consequent rules : 626993
> 15/11/04 11:34:07 INFO AprioriRules: Candidates for 6-consequent rules : 
> 331038
> 15/11/04 11:34:50 INFO AprioriRules: Generated 6-consequent rules : 314455
> 15/11/04 11:35:10 INFO AprioriRules: Candidates for 7-consequent rules : 
> 138490
> 15/11/04 11:35:43 INFO AprioriRules: Generated 7-consequent rules : 136260
> 15/11/04 11:35:57 INFO AprioriRules: Candidates for 8-consequent rules : 48567
> 15/11/04 11:36:14 INFO AprioriRules: Generated 8-consequent rules : 47331
> 15/11/04 11:36:24 INFO AprioriRules: Candidates for 9-consequent rules : 12430
> 15/11/04 11:36:33 INFO AprioriRules: Generated 9-consequent rules : 11925
> 15/11/04 11:36:37 INFO AprioriRules: Candidates for 10-consequent rules : 2211
> 15/11/04 11:36:47 INFO AprioriRules: Generated 10-consequent rules : 2064
> 15/11/04 11:36:55 INFO AprioriRules: Candidates for 11-consequent rules : 246
> 15/11/04 11:36:58 INFO AprioriRules: Generated 11-consequent rules : 219
> 15/11/04 11:37:00 INFO AprioriRules: Candidates for 12-consequent rules : 13
> 15/11/04 11:37:03 INFO AprioriRules: Generated 12-consequent rules : 11
> 15/11/04 11:37:03 INFO AprioriRules: Candidates for 13-consequent rules : 0






[jira] [Updated] (SPARK-13410) unionAll AnalysisException with DataFrames containing UDT columns.

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13410:
--
Target Version/s:   (was: 1.6.0)

> unionAll AnalysisException with DataFrames containing UDT columns.
> --
>
> Key: SPARK-13410
> URL: https://issues.apache.org/jira/browse/SPARK-13410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Franklyn Dsouza
>  Labels: patch
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Unioning two DataFrames that contain UDTs fails with 
> {quote}
> AnalysisException: u"unresolved operator 'Union;"
> {quote}
> I tracked this down to this line 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala#L202
> That line compares the datatypes of the output attributes of both logical plans. 
> However for UDTs this will be a new instance of the UserDefinedType or 
> PythonUserDefinedType 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L158
>  
> So this equality check tests whether the two instances are the same, and since 
> they aren't references to a singleton, the check fails. 
> *Note: this will work fine if you are unioning the dataframe with itself.*
> I have a proposed patch for this which overrides the equality operator on the 
> two classes here: https://github.com/apache/spark/pull/11279
> Reproduction steps
> {code}
> from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
> from pyspark.sql import types
> schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])
> #note they need to be two separate dataframes
> a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
> b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
> c = a.unionAll(b)
> {code}
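A rough, self-contained sketch of the idea behind the proposed fix, using a stand-in 
class rather than Spark's actual UserDefinedType (so this is not the code in the PR 
above): define equality over the underlying SQL type so that two fresh instances 
describing the same type compare equal, which is what Union's output-attribute check 
needs.

{code}
// Stand-in for a user-defined type; only the equality idea is illustrated here.
class MyUDT(val sqlType: String) {
  override def equals(other: Any): Boolean = other match {
    case that: MyUDT => this.sqlType == that.sqlType
    case _ => false
  }
  override def hashCode(): Int = sqlType.hashCode
}

val a = new MyUDT("array<double>")
val b = new MyUDT("array<double>")
assert(a == b)  // holds because equals compares the underlying type, not the instance
{code}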






[jira] [Updated] (SPARK-13399) Investigate type erasure warnings in CheckpointSuite

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13399:
--
Component/s: Tests

> Investigate type erasure warnings in CheckpointSuite
> 
>
> Key: SPARK-13399
> URL: https://issues.apache.org/jira/browse/SPARK-13399
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Reporter: holdenk
>Priority: Trivial
>
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:154:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn] dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn] ^
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:911:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn]   dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn]   ^






[jira] [Closed] (SPARK-13349) adding a split and union to a streaming application cause big performance hit

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen closed SPARK-13349.
-

> adding a split and union to a streaming application cause big performance hit
> -
>
> Key: SPARK-13349
> URL: https://issues.apache.org/jira/browse/SPARK-13349
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.1
>Reporter: krishna ramachandran
>Priority: Critical
>
> We have a streaming application containing approximately 12 jobs every batch, 
> running in streaming mode (4-second batches). Each job writes output to Cassandra, 
> and each job can contain several stages.
> Job 1:
> ---> receive Stream A --> map --> filter -> (union with another stream B) --> 
> map --> groupbykey --> transform --> reducebykey --> map
> We go through a few more jobs of transforms and save to the database. 
> Around stage 5, we union the output DStream from job 1 (in red) with 
> another stream (generated by a split during job 2) and save that state.
> It appears the whole execution thus far is repeated, which is redundant (I can 
> see this in the execution graph and also in performance -> processing time). 
> Processing time per batch nearly doubles or triples.
> This additional and redundant processing causes each batch to run as much as 2.5 
> times slower compared to runs without the union - the union for most batches does 
> not alter the original DStream (union with an empty set). If I cache the 
> DStream from job 1 (red block output), performance improves substantially, but we 
> hit out-of-memory errors within a few hours.
> What is the recommended way to cache/unpersist in such a scenario? There is 
> no DStream-level "unpersist".
> Setting "spark.streaming.unpersist" to true and 
> streamingContext.remember("duration") did not help. Still seeing out-of-memory 
> errors.
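One option for the caching question above, as a sketch only (not a recommendation 
from this ticket): persist the reused DStream in serialized form so the cached blocks 
are smaller. `reusedStream` is a stand-in name for the DStream produced by job 1.

{code}
import org.apache.spark.storage.StorageLevel

// Serialized caching trades CPU for a smaller memory footprint of the cached blocks.
reusedStream.persist(StorageLevel.MEMORY_AND_DISK_SER)
{code}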






[jira] [Updated] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13414:
--
Component/s: Mesos

> Add support for launching multiple Mesos dispatchers
> 
>
> Key: SPARK-13414
> URL: https://issues.apache.org/jira/browse/SPARK-13414
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Timothy Chen
>
> Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
> dispatcher is launched, but users who run multi-tenant dispatchers might want to 
> launch several. The ability to launch multiple dispatchers also helps local 
> development.






[jira] [Updated] (SPARK-13392) KafkaSink for Metrics

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13392:
--
Component/s: Streaming

> KafkaSink for Metrics
> -
>
> Key: SPARK-13392
> URL: https://issues.apache.org/jira/browse/SPARK-13392
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: UTKARSH BHATNAGAR
>Priority: Minor
>
> I would like to push metrics from Spark jobs directly into a Kafka topic 
> via a KafkaSink. Will write the KafkaSink asap and submit a PR.






[jira] [Resolved] (SPARK-13349) adding a split and union to a streaming application cause big performance hit

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13349.
---
Resolution: Invalid

No, this is not a bug.

> adding a split and union to a streaming application cause big performance hit
> -
>
> Key: SPARK-13349
> URL: https://issues.apache.org/jira/browse/SPARK-13349
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.1
>Reporter: krishna ramachandran
>Priority: Critical
>
> We have a streaming application containing approximately 12 jobs every batch, 
> running in streaming mode (4-second batches). Each job writes output to Cassandra, 
> and each job can contain several stages.
> Job 1:
> ---> receive Stream A --> map --> filter -> (union with another stream B) --> 
> map --> groupbykey --> transform --> reducebykey --> map
> We go through a few more jobs of transforms and save to the database. 
> Around stage 5, we union the output DStream from job 1 (in red) with 
> another stream (generated by a split during job 2) and save that state.
> It appears the whole execution thus far is repeated, which is redundant (I can 
> see this in the execution graph and also in performance -> processing time). 
> Processing time per batch nearly doubles or triples.
> This additional and redundant processing causes each batch to run as much as 2.5 
> times slower compared to runs without the union - the union for most batches does 
> not alter the original DStream (union with an empty set). If I cache the 
> DStream from job 1 (red block output), performance improves substantially, but we 
> hit out-of-memory errors within a few hours.
> What is the recommended way to cache/unpersist in such a scenario? There is 
> no DStream-level "unpersist".
> Setting "spark.streaming.unpersist" to true and 
> streamingContext.remember("duration") did not help. Still seeing out-of-memory 
> errors.






[jira] [Updated] (SPARK-12594) Outer Join Elimination by Filter Condition

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12594:
--
Assignee: Xiao Li

> Outer Join Elimination by Filter Condition
> --
>
> Key: SPARK-12594
> URL: https://issues.apache.org/jira/browse/SPARK-12594
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.0.0
>
>
> Eliminate outer joins when the predicates in the filter condition can 
> restrict the result sets so that all null-supplying rows are eliminated: 
> - full outer -> inner if both sides have such predicates
> - left outer -> inner if the right side has such predicates
> - right outer -> inner if the left side has such predicates
> - full outer -> left outer if only the left side has such predicates
> - full outer -> right outer if only the right side has such predicates
> If applicable, this can greatly improve the performance. 
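An illustrative example of the "left outer -> inner" case (table and column names are 
made up): the WHERE clause discards every row where b.value is null, so the 
null-supplying rows from the outer join can never survive and the join can be 
rewritten as an inner join.

{code}
// The optimizer can turn this LEFT OUTER JOIN into an INNER JOIN,
// because the filter on b.value rejects all null-supplying rows.
sqlContext.sql("""
  SELECT a.id, b.value
  FROM a LEFT OUTER JOIN b ON a.id = b.id
  WHERE b.value > 10
""")
{code}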






[jira] [Updated] (SPARK-12966) Postgres JDBC ArrayType(DecimalType) 'Unable to find server array type'

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12966:
--
Assignee: Brandon Bradley

> Postgres JDBC ArrayType(DecimalType) 'Unable to find server array type'
> ---
>
> Key: SPARK-12966
> URL: https://issues.apache.org/jira/browse/SPARK-12966
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Brandon Bradley
>Assignee: Brandon Bradley
> Fix For: 2.0.0
>
>
> Similar to SPARK-12747 but for DecimalType.
> Do we need to handle precision and scale?
> I've already started working on this. I cannot tell whether the Postgres JDBC 
> driver handles precision and scale or just converts to the default BigDecimal 
> precision and scale.






[jira] [Updated] (SPARK-13261) Expose maxCharactersPerColumn as a user configurable option

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13261:
--
Assignee: Hossein Falaki

> Expose maxCharactersPerColumn as a user configurable option
> ---
>
> Key: SPARK-13261
> URL: https://issues.apache.org/jira/browse/SPARK-13261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Hossein Falaki
>Assignee: Hossein Falaki
> Fix For: 2.0.0
>
>
> We are using the Univocity parser in the CSV data source in Spark. The parser has 
> a fairly small limit on the maximum number of characters per column. Spark's CSV 
> data source adjusts it, but the limit is not exposed to the user. There are still 
> use cases where the limit is too small. I think we should just expose it as an 
> option. I suggest "maxCharsPerColumn" for the option name.
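A sketch of how the option might be used once exposed, with the name suggested in 
this ticket; the exact reader API shown here and the chosen value are assumptions.

{code}
val df = sqlContext.read
  .format("csv")
  .option("maxCharsPerColumn", "1000000")  // raise the per-column character limit
  .load("/path/to/wide_columns.csv")
{code}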






[jira] [Closed] (SPARK-12969) Exception while casting a spark supported date formatted "string" to "date" data type.

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen closed SPARK-12969.
-

> Exception while  casting a spark supported date formatted "string" to "date" 
> data type.
> ---
>
> Key: SPARK-12969
> URL: https://issues.apache.org/jira/browse/SPARK-12969
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Spark Java 
>Reporter: Jais Sebastian
>
> Getting an exception while converting a string column (the column has the 
> Spark-supported date format yyyy-MM-dd) to the date data type. Below is the code 
> snippet: 
> List<String> jsonData = Arrays.asList( 
> "{\"d\":\"2015-02-01\",\"n\":1}");
> JavaRDD<String> dataRDD = 
> this.getSparkContext().parallelize(jsonData);
> DataFrame data = this.getSqlContext().read().json(dataRDD);
> DataFrame newData = data.select(data.col("d").cast("date"));
> newData.show();
> The above code gives the error:
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> generated.java, Line 95, Column 28: Expression "scala.Option < Long > 
> longOpt16" is not an lvalue
> This happens only if we execute the program in client mode; it works if we 
> execute through spark-submit. Here is the sample project: 
> https://github.com/uhonnavarkar/spark_test






[jira] [Resolved] (SPARK-12969) Exception while casting a spark supported date formatted "string" to "date" data type.

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12969.
---
Resolution: Not A Problem

I'm not clear what is being reported here, but it sounds like this occurs when 
somehow not using spark-submit. The error is not from a Scala compiler or Spark 
process. I don't think this is a Spark issue.

> Exception while  casting a spark supported date formatted "string" to "date" 
> data type.
> ---
>
> Key: SPARK-12969
> URL: https://issues.apache.org/jira/browse/SPARK-12969
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Spark Java 
>Reporter: Jais Sebastian
>
> Getting an exception while converting a string column (the column has the 
> Spark-supported date format yyyy-MM-dd) to the date data type. Below is the code 
> snippet: 
> List<String> jsonData = Arrays.asList( 
> "{\"d\":\"2015-02-01\",\"n\":1}");
> JavaRDD<String> dataRDD = 
> this.getSparkContext().parallelize(jsonData);
> DataFrame data = this.getSqlContext().read().json(dataRDD);
> DataFrame newData = data.select(data.col("d").cast("date"));
> newData.show();
> The above code gives the error:
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> generated.java, Line 95, Column 28: Expression "scala.Option < Long > 
> longOpt16" is not an lvalue
> This happens only if we execute the program in client mode; it works if we 
> execute through spark-submit. Here is the sample project: 
> https://github.com/uhonnavarkar/spark_test






[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame

2016-02-20 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155636#comment-15155636
 ] 

Xiao Li commented on SPARK-13393:
-

Can you output the plans? {{errorDF.explain(true)}} and 
{{correctDF.explain(true)}} Thanks!

> Column mismatch issue in left_outer join using Spark DataFrame
> --
>
> Key: SPARK-13393
> URL: https://issues.apache.org/jira/browse/SPARK-13393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Varadharajan
>
> Consider the below snippet:
> {code:title=test.scala|borderStyle=solid}
> case class Person(id: Int, name: String)
> val df = sc.parallelize(List(
>   Person(1, "varadha"),
>   Person(2, "nagaraj")
> )).toDF
> val varadha = df.filter("id = 1")
> val errorDF = df.join(varadha, df("id") === varadha("id"), 
> "left_outer").select(df("id"), varadha("id") as "varadha_id")
> val nagaraj = df.filter("id = 2").select(df("id") as "n_id")
> val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"), 
> "left_outer").select(df("id"), nagaraj("n_id") as "nagaraj_id")
> {code}
> The `errorDF` dataframe, after the left join, is messed up and shows as below:
> | id|varadha_id|
> |  1| 1|
> |  2| 2 (*This should've been null*)| 
> whereas correctDF has the correct output after the left join:
> | id|nagaraj_id|
> |  1|  null|
> |  2| 2|






[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame

2016-02-20 Thread Varadharajan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155645#comment-15155645
 ] 

Varadharajan commented on SPARK-13393:
--

Hi, 

Here is the plan for errorDF:

{noformat}
scala> errorDF.explain(true)
== Parsed Logical Plan ==
Project [id#0,id#0 AS varadha_id#4]
+- Join LeftOuter, Some((id#0 = id#2))
   :- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder at 
:29
   +- Filter (id#2 = 1)
  +- LogicalRDD [id#2,name#3], MapPartitionsRDD[1] at rddToDataFrameHolder 
at :29

== Analyzed Logical Plan ==
id: int, varadha_id: int
Project [id#0,id#0 AS varadha_id#4]
+- Join LeftOuter, Some((id#0 = id#2))
   :- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder at 
:29
   +- Filter (id#2 = 1)
  +- LogicalRDD [id#2,name#3], MapPartitionsRDD[1] at rddToDataFrameHolder 
at :29

== Optimized Logical Plan ==
Project [id#0,id#0 AS varadha_id#4]
+- Join LeftOuter, Some((id#0 = id#2))
   :- Project [id#0]
   :  +- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder 
at :29
   +- Project [id#2]
  +- Filter (id#2 = 1)
 +- LogicalRDD [id#2,name#3], MapPartitionsRDD[1] at 
rddToDataFrameHolder at :29

== Physical Plan ==
Project [id#0,id#0 AS varadha_id#4]
+- SortMergeOuterJoin [id#0], [id#2], LeftOuter, None
   :- Sort [id#0 ASC], false, 0
   :  +- TungstenExchange hashpartitioning(id#0,200), None
   : +- Project [id#0]
   :+- Scan ExistingRDD[id#0,name#1]
   +- Sort [id#2 ASC], false, 0
  +- TungstenExchange hashpartitioning(id#2,200), None
 +- Project [id#2]
+- Filter (id#2 = 1)
   +- Scan ExistingRDD[id#2,name#3]
{noformat}

And here is for correctDF

{noformat}
scala> correctDF.explain(true)
== Parsed Logical Plan ==
Project [id#0,n_id#5 AS nagaraj_id#6]
+- Join LeftOuter, Some((id#0 = n_id#5))
   :- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder at 
:29
   +- Project [id#0 AS n_id#5]
  +- Filter (id#0 = 2)
 +- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at 
rddToDataFrameHolder at :29

== Analyzed Logical Plan ==
id: int, nagaraj_id: int
Project [id#0,n_id#5 AS nagaraj_id#6]
+- Join LeftOuter, Some((id#0 = n_id#5))
   :- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder at 
:29
   +- Project [id#0 AS n_id#5]
  +- Filter (id#0 = 2)
 +- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at 
rddToDataFrameHolder at :29

== Optimized Logical Plan ==
Project [id#0,n_id#5 AS nagaraj_id#6]
+- Join LeftOuter, Some((id#0 = n_id#5))
   :- Project [id#0]
   :  +- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at rddToDataFrameHolder 
at :29
   +- Project [id#0 AS n_id#5]
  +- Filter (id#0 = 2)
 +- LogicalRDD [id#0,name#1], MapPartitionsRDD[1] at 
rddToDataFrameHolder at :29

== Physical Plan ==
Project [id#0,n_id#5 AS nagaraj_id#6]
+- SortMergeOuterJoin [id#0], [n_id#5], LeftOuter, None
   :- Sort [id#0 ASC], false, 0
   :  +- TungstenExchange hashpartitioning(id#0,200), None
   : +- Project [id#0]
   :+- Scan ExistingRDD[id#0,name#1]
   +- Sort [n_id#5 ASC], false, 0
  +- TungstenExchange hashpartitioning(n_id#5,200), None
 +- Project [id#0 AS n_id#5]
+- Filter (id#0 = 2)
   +- Scan ExistingRDD[id#0,name#1]
{noformat}

> Column mismatch issue in left_outer join using Spark DataFrame
> --
>
> Key: SPARK-13393
> URL: https://issues.apache.org/jira/browse/SPARK-13393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Varadharajan
>
> Consider the below snippet:
> {code:title=test.scala|borderStyle=solid}
> case class Person(id: Int, name: String)
> val df = sc.parallelize(List(
>   Person(1, "varadha"),
>   Person(2, "nagaraj")
> )).toDF
> val varadha = df.filter("id = 1")
> val errorDF = df.join(varadha, df("id") === varadha("id"), 
> "left_outer").select(df("id"), varadha("id") as "varadha_id")
> val nagaraj = df.filter("id = 2").select(df("id") as "n_id")
> val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"), 
> "left_outer").select(df("id"), nagaraj("n_id") as "nagaraj_id")
> {code}
> The `errorDF` dataframe, after the left join, is messed up and shows as below:
> | id|varadha_id|
> |  1| 1|
> |  2| 2 (*This should've been null*)| 
> whereas correctDF has the correct output after the left join:
> | id|nagaraj_id|
> |  1|  null|
> |  2| 2|






[jira] [Commented] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155666#comment-15155666
 ] 

Apache Spark commented on SPARK-12720:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/11283

> SQL generation support for cube, rollup, and grouping set
> -
>
> Key: SPARK-12720
> URL: https://issues.apache.org/jira/browse/SPARK-12720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Xiao Li
>
> {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. 
> Please refer to SPARK-11012 for more details.






[jira] [Assigned] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12720:


Assignee: Xiao Li  (was: Apache Spark)

> SQL generation support for cube, rollup, and grouping set
> -
>
> Key: SPARK-12720
> URL: https://issues.apache.org/jira/browse/SPARK-12720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Xiao Li
>
> {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. 
> Please refer to SPARK-11012 for more details.






[jira] [Assigned] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12720:


Assignee: Apache Spark  (was: Xiao Li)

> SQL generation support for cube, rollup, and grouping set
> -
>
> Key: SPARK-12720
> URL: https://issues.apache.org/jira/browse/SPARK-12720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>
> {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. 
> Please refer to SPARK-11012 for more details.






[jira] [Resolved] (SPARK-13386) ConnectedComponents should support maxIteration option

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13386.
-
   Resolution: Fixed
 Assignee: zhengruifeng
Fix Version/s: 2.0.0

> ConnectedComponents should support maxIteration option
> --
>
> Key: SPARK-13386
> URL: https://issues.apache.org/jira/browse/SPARK-13386
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 2.0.0
>
>
> Running ConnectedComponents is time-consuming on big and complex graphs.
> I use it on a graph with 1.7B vertices and 11B edges, and the exact result is 
> not a must. So I think the user should be able to directly control the 
> maxIteration of this algorithm.






[jira] [Resolved] (SPARK-13414) Add support for launching multiple Mesos dispatchers

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13414.
-
   Resolution: Fixed
 Assignee: Timothy Chen
Fix Version/s: 2.0.0

> Add support for launching multiple Mesos dispatchers
> 
>
> Key: SPARK-13414
> URL: https://issues.apache.org/jira/browse/SPARK-13414
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Timothy Chen
>Assignee: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently the sbin/[start|stop]-mesos-dispatcher scripts assume only one Mesos 
> dispatcher is launched, but users who run multi-tenant dispatchers might want to 
> launch several. The ability to launch multiple dispatchers also helps local 
> development.






[jira] [Commented] (SPARK-5506) java.lang.ClassCastException using lambda expressions in combination of spark and Servlet

2016-02-20 Thread Eugene Morozov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155820#comment-15155820
 ] 

Eugene Morozov commented on SPARK-5506:
---

The symptom means there are no classes available to deserialize the function on the 
Spark worker.

A similar issue I've faced, with a web service under test using the Maven failsafe 
plugin and with running tests from IntelliJ IDEA, is that IntelliJ IDEA runs 
everything from classes or test-classes - not from the built jar files. This way no 
jar files can be added to the classpath, even though the compiled jar files are in 
the target directory. 

The issue happens with the Maven failsafe plugin version 18.x, but not with version 
19.x. 
The reason is that version 19.x uses the built jar file instead of test-classes, so 
SparkConf().setJars successfully adds the jar files to the classpath.

Hope this helps.

> java.lang.ClassCastException using lambda expressions in combination of spark 
> and Servlet
> -
>
> Key: SPARK-5506
> URL: https://issues.apache.org/jira/browse/SPARK-5506
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 1.2.0
> Environment: spark server: Ubuntu 14.04 amd64
> $ java -version
> java version "1.8.0_25"
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>Reporter: Milad Khajavi
>
> I'm trying to build a web API for my Apache spark jobs using sparkjava.com 
> framework. My code is:
> @Override
> public void init() {
> get("/hello",
> (req, res) -> {
> String sourcePath = "hdfs://spark:54310/input/*";
> SparkConf conf = new SparkConf().setAppName("LineCount");
> conf.setJars(new String[] { 
> "/home/sam/resin-4.0.42/webapps/test.war" });
> File configFile = new File("config.properties");
> String sparkURI = "spark://hamrah:7077";
> conf.setMaster(sparkURI);
> conf.set("spark.driver.allowMultipleContexts", "true");
> JavaSparkContext sc = new JavaSparkContext(conf);
> @SuppressWarnings("resource")
> JavaRDD<String> log = sc.textFile(sourcePath);
> JavaRDD<String> lines = log.filter(x -> {
> return true;
> });
> return lines.count();
> });
> }
> If I remove the lambda expression or put it inside a simple jar rather than a 
> web service (somehow a Servlet) it will run without any error. But using a 
> lambda expression inside a Servlet will result in this exception:
> 15/01/28 10:36:33 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
> hamrah): java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type 
> org.apache.spark.api.java.function.Function in instance of 
> org.apache.spark.api.java.JavaRDD$$anonfun$filter$1
> at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
> at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java

[jira] [Resolved] (SPARK-13310) Missing Sorting Columns in Generate

2016-02-20 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-13310.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11198
[https://github.com/apache/spark/pull/11198]

> Missing Sorting Columns in Generate
> ---
>
> Key: SPARK-13310
> URL: https://issues.apache.org/jira/browse/SPARK-13310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
> Fix For: 2.0.0
>
>
> {code}
>   // case 1: missing sort columns are resolvable if join is true
>   sql("SELECT explode(a) AS val, b FROM data WHERE b < 2 order by val, c")
>   // case 2: missing sort columns are not resolvable if join is false. 
> Thus, issue a message in this case
>   sql("SELECT explode(a) AS val FROM data order by val, c")
> {code}






[jira] [Commented] (SPARK-12459) Add ExpressionDescription to string functions

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155856#comment-15155856
 ] 

Apache Spark commented on SPARK-12459:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/10460

> Add ExpressionDescription to string functions
> -
>
> Key: SPARK-12459
> URL: https://issues.apache.org/jira/browse/SPARK-12459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>







[jira] [Assigned] (SPARK-12459) Add ExpressionDescription to string functions

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12459:


Assignee: Apache Spark

> Add ExpressionDescription to string functions
> -
>
> Key: SPARK-12459
> URL: https://issues.apache.org/jira/browse/SPARK-12459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12459) Add ExpressionDescription to string functions

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12459:


Assignee: (was: Apache Spark)

> Add ExpressionDescription to string functions
> -
>
> Key: SPARK-12459
> URL: https://issues.apache.org/jira/browse/SPARK-12459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13416) Add positive check for option 'numIter' in StronglyConnectedComponents

2016-02-20 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-13416:


 Summary: Add positive check for option 'numIter' in 
StronglyConnectedComponents 
 Key: SPARK-13416
 URL: https://issues.apache.org/jira/browse/SPARK-13416
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Reporter: zhengruifeng
Priority: Minor


The output of StronglyConnectedComponents with numIter no greater than 1 may 
make no sense, so I just add a require check for it.
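
For context, a minimal sketch of the kind of precondition being proposed. The 
entry point below is illustrative only (the real GraphX method also takes a 
graph); the point is simply a require on numIter:

{code}
object SCCArgumentCheckSketch {
  // Reject non-positive iteration counts up front instead of silently
  // producing a result that may make no sense.
  def run(numIter: Int): Unit = {
    require(numIter > 0,
      s"Number of iterations must be greater than 0, but got $numIter")
    // ... the strongly connected components computation would follow here ...
  }
}
{code}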



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13416) Add positive check for option 'numIter' in StronglyConnectedComponents

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155866#comment-15155866
 ] 

Apache Spark commented on SPARK-13416:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/11284

> Add positive check for option 'numIter' in StronglyConnectedComponents 
> ---
>
> Key: SPARK-13416
> URL: https://issues.apache.org/jira/browse/SPARK-13416
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Reporter: zhengruifeng
>Priority: Minor
>
> The output of StronglyConnectedComponents with numIter no greater than 1 may 
> make no sense, so I just add a require check for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13416) Add positive check for option 'numIter' in StronglyConnectedComponents

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13416:


Assignee: (was: Apache Spark)

> Add positive check for option 'numIter' in StronglyConnectedComponents 
> ---
>
> Key: SPARK-13416
> URL: https://issues.apache.org/jira/browse/SPARK-13416
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Reporter: zhengruifeng
>Priority: Minor
>
> The output of StronglyConnectedComponents with numIter no greater than 1 may 
> make no sense, so I just add a require check for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13416) Add positive check for option 'numIter' in StronglyConnectedComponents

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13416:


Assignee: Apache Spark

> Add positive check for option 'numIter' in StronglyConnectedComponents 
> ---
>
> Key: SPARK-13416
> URL: https://issues.apache.org/jira/browse/SPARK-13416
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>
> The output of StronglyConnectedComponents with numIter no greater than 1 may 
> make no sense, so I just add a require check for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13306) Uncorrelated scalar subquery

2016-02-20 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-13306.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11190
[https://github.com/apache/spark/pull/11190]

> Uncorrelated scalar subquery
> 
>
> Key: SPARK-13306
> URL: https://issues.apache.org/jira/browse/SPARK-13306
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> A scalar subquery is a subquery that generates only a single row and a 
> single column, and it can be used as part of an expression.
> An uncorrelated scalar subquery is one that does not reference an external 
> (outer-query) table.
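
For illustration, a minimal example of an uncorrelated scalar subquery, in the 
same sql(...) style used elsewhere in these issues. The tables t1(a) and t2(b) 
are hypothetical:

{code}
// The subquery returns exactly one row and one column and does not reference
// t1, so it is uncorrelated and can be used like any other expression.
sql("SELECT a, (SELECT max(b) FROM t2) AS max_b FROM t1")
{code}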



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13306) Uncorrelated scalar subquery

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155916#comment-15155916
 ] 

Apache Spark commented on SPARK-13306:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11285

> Uncorrelated scalar subquery
> 
>
> Key: SPARK-13306
> URL: https://issues.apache.org/jira/browse/SPARK-13306
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> A scalar subquery is a subquery that generates only a single row and a 
> single column, and it can be used as part of an expression.
> An uncorrelated scalar subquery is one that does not reference an external 
> (outer-query) table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13399) Investigate type erasure warnings in CheckpointSuite

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13399:


Assignee: (was: Apache Spark)

> Investigate type erasure warnings in CheckpointSuite
> 
>
> Key: SPARK-13399
> URL: https://issues.apache.org/jira/browse/SPARK-13399
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Reporter: holdenk
>Priority: Trivial
>
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:154:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn] dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn] ^
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:911:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn]   dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn]   ^
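
One common way to address this class of warning is to stop checking the erased 
type parameter at all. A hedged, self-contained analogue of the pattern, not 
necessarily the fix the suite will adopt:

{code}
import scala.collection.mutable.Buffer

val xs: Any = Buffer(1, 2, 3)
// Checking the type argument warns, because it is eliminated by erasure:
//   xs.isInstanceOf[Buffer[String]]
// Checking only the raw class via a wildcard expresses the same runtime test
// without the unchecked warning.
val isBuffer = xs.isInstanceOf[Buffer[_]]
{code}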



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13399) Investigate type erasure warnings in CheckpointSuite

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155918#comment-15155918
 ] 

Apache Spark commented on SPARK-13399:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/11286

> Investigate type erasure warnings in CheckpointSuite
> 
>
> Key: SPARK-13399
> URL: https://issues.apache.org/jira/browse/SPARK-13399
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Reporter: holdenk
>Priority: Trivial
>
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:154:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn] dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn] ^
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:911:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn]   dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn]   ^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13399) Investigate type erasure warnings in CheckpointSuite

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13399:


Assignee: Apache Spark

> Investigate type erasure warnings in CheckpointSuite
> 
>
> Key: SPARK-13399
> URL: https://issues.apache.org/jira/browse/SPARK-13399
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:154:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn] dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn] ^
> [warn] 
> /home/holden/repos/spark/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala:911:
>  abstract type V in type 
> org.apache.spark.streaming.TestOutputStreamWithPartitions[V] is unchecked 
> since it is eliminated by erasure
> [warn]   dstream.isInstanceOf[TestOutputStreamWithPartitions[V]]
> [warn]   ^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13417) SQL subquery support

2016-02-20 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13417:
---

 Summary: SQL subquery support
 Key: SPARK-13417
 URL: https://issues.apache.org/jira/browse/SPARK-13417
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Reynold Xin


This is an umbrella JIRA ticket to track various issues related to subqueries.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13306) Initial implementation for uncorrelated scalar subquery

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13306:

Summary: Initial implementation for uncorrelated scalar subquery  (was: 
Uncorrelated scalar subquery)

> Initial implementation for uncorrelated scalar subquery
> ---
>
> Key: SPARK-13306
> URL: https://issues.apache.org/jira/browse/SPARK-13306
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> A scalar subquery is a subquery that generates only a single row and a 
> single column, and it can be used as part of an expression.
> An uncorrelated scalar subquery is one that does not reference an external 
> (outer-query) table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13306) Initial implementation for uncorrelated scalar subquery

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13306:

Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-13417

> Initial implementation for uncorrelated scalar subquery
> ---
>
> Key: SPARK-13306
> URL: https://issues.apache.org/jira/browse/SPARK-13306
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> A scalar subquery is a subquery that generates only a single row and a 
> single column, and it can be used as part of an expression.
> An uncorrelated scalar subquery is one that does not reference an external 
> (outer-query) table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13415) Visualize subquery in SQL web UI

2016-02-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13415:

Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-13417

> Visualize subquery in SQL web UI
> 
>
> Key: SPARK-13415
> URL: https://issues.apache.org/jira/browse/SPARK-13415
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>
> Right now, uncorrelated scalar subqueries are not shown in the SQL tab.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13418) SQL generation for uncorrelated scalar subqueries

2016-02-20 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13418:
---

 Summary: SQL generation for uncorrelated scalar subqueries
 Key: SPARK-13418
 URL: https://issues.apache.org/jira/browse/SPARK-13418
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin


This is pretty difficult right now because SQLBuilder is in the hive package, 
whereas the sql function for ScalarSubquery is defined in the catalyst package.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13420) Rename Subquery logical plan to SubqueryAlias

2016-02-20 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13420:
---

 Summary: Rename Subquery logical plan to SubqueryAlias
 Key: SPARK-13420
 URL: https://issues.apache.org/jira/browse/SPARK-13420
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin
Assignee: Reynold Xin


logical.Subquery is pretty confusing now that we have various subquery expressions.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13419) SubquerySuite should use checkAnswer rather than ScalaTest's assertResult

2016-02-20 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13419:
---

 Summary: SubquerySuite should use checkAnswer rather than 
ScalaTest's assertResult
 Key: SPARK-13419
 URL: https://issues.apache.org/jira/browse/SPARK-13419
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin


This is blocked on being able to generate SQL for subqueries.
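
As a rough sketch of the intended test style. The query and expected rows are 
illustrative only, and it assumes the suite extends QueryTest with a shared 
SQLContext so that sql and checkAnswer are in scope:

{code}
import org.apache.spark.sql.Row

// checkAnswer compares a DataFrame's result against expected Rows and reports
// a readable diff on mismatch, unlike a plain assertResult on collected data.
test("uncorrelated scalar subquery") {
  checkAnswer(
    sql("SELECT (SELECT 1) AS col"),
    Row(1) :: Nil)
}
{code}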




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13398) Move away from deprecated ThreadPoolTaskSupport

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13398:


Assignee: Apache Spark

> Move away from deprecated ThreadPoolTaskSupport
> ---
>
> Key: SPARK-13398
> URL: https://issues.apache.org/jira/browse/SPARK-13398
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> ThreadPoolTaskSupport has been replaced by ForkJoinTaskSupport
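
For reference, a minimal sketch of the replacement pattern on a parallel 
collection (assuming Scala 2.11, where the pool class lives in 
scala.concurrent.forkjoin; on newer Scala versions it comes from 
java.util.concurrent). Illustrative only, not the actual patch:

{code}
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

val items = (1 to 100).par
// Previously this would have been a ThreadPoolTaskSupport backed by a
// ThreadPoolExecutor; ForkJoinTaskSupport plays the same role here.
items.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(8))
val doubled = items.map(_ * 2)
{code}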



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13418) SQL generation for uncorrelated scalar subqueries

2016-02-20 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155928#comment-15155928
 ] 

Reynold Xin commented on SPARK-13418:
-

cc [~liancheng] it seems to me this would be easier if we had .sql on logical 
plans, rather than having it in the hive package ...



> SQL generation for uncorrelated scalar subqueries
> -
>
> Key: SPARK-13418
> URL: https://issues.apache.org/jira/browse/SPARK-13418
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> This is pretty difficult right now because SQLBuilder is in the hive package, 
> whereas the sql function for ScalarSubquery is defined in the catalyst package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13398) Move away from deprecated ThreadPoolTaskSupport

2016-02-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13398:


Assignee: (was: Apache Spark)

> Move away from deprecated ThreadPoolTaskSupport
> ---
>
> Key: SPARK-13398
> URL: https://issues.apache.org/jira/browse/SPARK-13398
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: holdenk
>Priority: Trivial
>
> ThreadPoolTaskSupport has been replaced by ForkJoinTaskSupport



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13398) Move away from deprecated ThreadPoolTaskSupport

2016-02-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155929#comment-15155929
 ] 

Apache Spark commented on SPARK-13398:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/11287

> Move away from deprecated ThreadPoolTaskSupport
> ---
>
> Key: SPARK-13398
> URL: https://issues.apache.org/jira/browse/SPARK-13398
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: holdenk
>Priority: Trivial
>
> ThreadPoolTaskSupport has been replaced by ForkJoinTaskSupport



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org