[jira] [Commented] (SPARK-12957) Derive and propagate data constraints in logical plan

2016-03-09 Thread Sameer Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188856#comment-15188856
 ] 

Sameer Agarwal commented on SPARK-12957:


[~ksunitha] I've attached a copy of the design document to the JIRA: 
https://issues.apache.org/jira/secure/attachment/12792466/ConstraintPropagationinSparkSQL.pdf.
 Thanks!

> Derive and propagate data constraints in logical plan 
> -
>
> Key: SPARK-12957
> URL: https://issues.apache.org/jira/browse/SPARK-12957
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Sameer Agarwal
> Attachments: ConstraintPropagationinSparkSQL.pdf
>
>
> Based on the semantics of a query plan, we can derive data constraints (e.g. if 
> a filter defines {{a > 10}}, we know that the output data of this filter 
> satisfies the constraints {{a > 10}} and {{a is not null}}). We should build a 
> framework to derive and propagate constraints in the logical plan, which can 
> help us build more advanced optimizations.
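To make the idea concrete, here is a minimal Scala sketch (illustrative only, not the attached design document's implementation) of how constraints could be derived for a {{Filter}}: take everything that holds on the child, add each conjunct of the filter condition, and add {{IsNotNull}} for every attribute the condition references. The helper names {{splitConjuncts}} and {{filterConstraints}} are made up for this example.

{code}
import org.apache.spark.sql.catalyst.expressions.{And, Expression, IsNotNull}
import org.apache.spark.sql.catalyst.plans.logical.Filter

object ConstraintSketch {
  // Split a conjunctive predicate such as (a > 10 AND b = 1) into its conjuncts.
  def splitConjuncts(condition: Expression): Seq[Expression] = condition match {
    case And(left, right) => splitConjuncts(left) ++ splitConjuncts(right)
    case other => Seq(other)
  }

  // Constraints satisfied by the output of a Filter: the child's constraints,
  // plus each conjunct of the condition, plus IsNotNull for every attribute the
  // condition references (a null attribute cannot satisfy a > 10).
  def filterConstraints(filter: Filter, childConstraints: Set[Expression]): Set[Expression] = {
    val conjuncts = splitConjuncts(filter.condition)
    val notNull = conjuncts.flatMap(_.references.toSeq).map(IsNotNull(_))
    childConstraints ++ conjuncts ++ notNull
  }
}
{code}

Per-operator constraints derived this way can then feed optimizations such as removing redundant filters or inferring {{IsNotNull}} predicates on join keys.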






[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13798:

Description: 
If the Aggregate does not have any aggregate expression, it is useless. We can 
replace Aggregate by Project.

Then, Project can be collapsed or pushed down further.

This only makes sense when the grouping and aggregate expressions are identical. Thus, we will do it later.

  was:
If the Aggregate does not have any aggregate expression, it is useless. We can 
replace Aggregate by Project.

Then, Project can be collapsed or pushed down further.


> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.
> Then, Project can be collapsed or pushed down further.
> This only makes sense when the grouping and aggregate expressions are identical. Thus, we will do it later.
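For illustration only, a Catalyst rule doing this rewrite could look roughly like the sketch below. It assumes the Spark 2.0 {{Aggregate}} and {{Project}} logical operators, and the rule name is made up for this example. Note that it is not semantics-preserving in general: an Aggregate without aggregate functions still de-duplicates rows per grouping key, which a Project does not, hence the caveat above.

{code}
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule; shown only to illustrate the shape of the rewrite.
object ReplaceAggregateWithProject extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Grouping and output expressions are identical and no aggregate function
    // appears anywhere in the output: the Aggregate only selects columns.
    case Aggregate(grouping, aggExprs, child)
        if grouping == aggExprs &&
           aggExprs.forall(_.collect { case ae: AggregateExpression => ae }.isEmpty) =>
      Project(aggExprs, child)
  }
}
{code}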






[jira] [Closed] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-13798.
---
Resolution: Later

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.
> Then, Project can be collapsed or pushed down further.
> This only makes sense when the grouping and aggregate expressions are identical. Thus, we will do it later.






[jira] [Updated] (SPARK-12957) Derive and propagate data constraints in logical plan

2016-03-09 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal updated SPARK-12957:
---
Attachment: ConstraintPropagationinSparkSQL.pdf

Design Document

> Derive and propagate data constraints in logical plan 
> -
>
> Key: SPARK-12957
> URL: https://issues.apache.org/jira/browse/SPARK-12957
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Sameer Agarwal
> Attachments: ConstraintPropagationinSparkSQL.pdf
>
>
> Based on the semantics of a query plan, we can derive data constraints (e.g. if 
> a filter defines {{a > 10}}, we know that the output data of this filter 
> satisfies the constraints {{a > 10}} and {{a is not null}}). We should build a 
> framework to derive and propagate constraints in the logical plan, which can 
> help us build more advanced optimizations.






[jira] [Reopened] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-13798:
-

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.
> Then, Project can be collapsed or pushed down further.
> This only makes sense when the grouping and aggregate expressions are identical. Thus, we will do it later.






[jira] [Closed] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-13798.
---
Resolution: Invalid

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.
> Then, Project can be collapsed or pushed down further.






[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13798:

Description: 
If the Aggregate does not have any aggregate expression, it is useless. We can 
replace Aggregate by Project.

Then, Project can be collapsed or pushed down further.

  was:If the Aggregate does not have any aggregate expression, it is useless. 
We can replace Aggregate by Project.


> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.
> Then, Project can be collapsed or pushed down further.






[jira] [Commented] (SPARK-13134) add 'spark.streaming.kafka.partition.multiplier' into SparkConf

2016-03-09 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188834#comment-15188834
 ] 

Sean Owen commented on SPARK-13134:
---

I don't understand why there are two JIRAs here, why you attached a patch, or 
why you did that after closing both.

> add 'spark.streaming.kafka.partition.multiplier' into SparkConf
> ---
>
> Key: SPARK-13134
> URL: https://issues.apache.org/jira/browse/SPARK-13134
> Project: Spark
>  Issue Type: Sub-task
>  Components: Input/Output
>Affects Versions: 1.6.1
>Reporter: zhengcanbin
> Attachments: 13134.patch
>
>







[jira] [Resolved] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath resolved SPARK-13706.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11547
[https://github.com/apache/spark/pull/11547]

> Python Example for Train Validation Split Missing
> -
>
> Key: SPARK-13706
> URL: https://issues.apache.org/jira/browse/SPARK-13706
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Reporter: Jeremy
>Assignee: Jeremy
>Priority: Minor
> Fix For: 2.0.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> An example of how to use TrainValidationSplit in pyspark needs to be added. 
> Should be consistent with the current examples. I'll submit a PR.






[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs

2016-03-09 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13796:
--
Priority: Minor  (was: Major)

Do you have more detail on how to reproduce it? It's good that you think you 
see what introduced the problem, but how about linking to the commit or leaving 
some analysis of how it was introduced? Is there any actual impact, or just log 
noise?

> Lock release errors occur frequently in executor logs
> -
>
> Key: SPARK-13796
> URL: https://issues.apache.org/jira/browse/SPARK-13796
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Nishkam Ravi
>Priority: Minor
>
> Executor logs contain a lot of these error messages (irrespective of the 
> workload):
> 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by 
> TID = 1119






[jira] [Commented] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188819#comment-15188819
 ] 

Apache Spark commented on SPARK-13798:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/11565

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.






[jira] [Assigned] (SPARK-13797) Eliminate Unnecessary Window

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13797:


Assignee: (was: Apache Spark)

> Eliminate Unnecessary Window
> 
>
> Key: SPARK-13797
> URL: https://issues.apache.org/jira/browse/SPARK-13797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Window does not have any window expression, it is useless. It might 
> happen after column pruning
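A minimal sketch of what such a rule could look like, assuming the Spark 2.0 Catalyst {{Window}} operator (the rule name is made up for this example):

{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Window}
import org.apache.spark.sql.catalyst.rules.Rule

object EliminateEmptyWindow extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // A Window whose window expression list is empty (e.g. after column
    // pruning) computes nothing extra, so it can be replaced by its child.
    case w: Window if w.windowExpressions.isEmpty => w.child
  }
}
{code}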






[jira] [Assigned] (SPARK-13797) Eliminate Unnecessary Window

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13797:


Assignee: Apache Spark

> Eliminate Unnecessary Window
> 
>
> Key: SPARK-13797
> URL: https://issues.apache.org/jira/browse/SPARK-13797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> If the Window does not have any window expression, it is useless. It might 
> happen after column pruning






[jira] [Commented] (SPARK-13797) Eliminate Unnecessary Window

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188818#comment-15188818
 ] 

Apache Spark commented on SPARK-13797:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/11565

> Eliminate Unnecessary Window
> 
>
> Key: SPARK-13797
> URL: https://issues.apache.org/jira/browse/SPARK-13797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Window does not have any window expression, it is useless. It might 
> happen after column pruning






[jira] [Assigned] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13798:


Assignee: Apache Spark

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.






[jira] [Assigned] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13798:


Assignee: (was: Apache Spark)

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.






[jira] [Updated] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13798:

Description: If the Aggregate does not have any aggregate expression, it is 
useless. We can replace Aggregate by Project.

> Replace Aggregate by Project if no Aggregate Function exists
> 
>
> Key: SPARK-13798
> URL: https://issues.apache.org/jira/browse/SPARK-13798
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Aggregate does not have any aggregate expression, it is useless. We 
> can replace Aggregate by Project.






[jira] [Created] (SPARK-13798) Replace Aggregate by Project if no Aggregate Function exists

2016-03-09 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13798:
---

 Summary: Replace Aggregate by Project if no Aggregate Function 
exists
 Key: SPARK-13798
 URL: https://issues.apache.org/jira/browse/SPARK-13798
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li









[jira] [Created] (SPARK-13797) Eliminate Unnecessary Window

2016-03-09 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13797:
---

 Summary: Eliminate Unnecessary Window
 Key: SPARK-13797
 URL: https://issues.apache.org/jira/browse/SPARK-13797
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li









[jira] [Updated] (SPARK-13797) Eliminate Unnecessary Window

2016-03-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-13797:

Description: If the Window does not have any window expression, it is 
useless. It might happen after column pruning

> Eliminate Unnecessary Window
> 
>
> Key: SPARK-13797
> URL: https://issues.apache.org/jira/browse/SPARK-13797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> If the Window does not have any window expression, it is useless. It might 
> happen after column pruning






[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Nick Pentreath (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath updated SPARK-13430:
---
Assignee: Bryan Cutler

> Expose ml summary function in PySpark for classification and regression models
> --
>
> Key: SPARK-13430
> URL: https://issues.apache.org/jira/browse/SPARK-13430
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Reporter: Shubhanshu Mishra
>Assignee: Bryan Cutler
>  Labels: classification, java, ml, mllib, pyspark, regression, 
> scala, sparkr
>
> I think the model summary interface, which is available in Spark's Scala, Java, and 
> R interfaces, should also be available in the Python interface. 
> Similar to SPARK-11494:
> https://issues.apache.org/jira/browse/SPARK-11494






[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188774#comment-15188774
 ] 

Nick Pentreath commented on SPARK-12626:


[~dbtsai] ok thanks - would like to take a look when it's ready.

> MLlib 2.0 Roadmap
> -
>
> Key: SPARK-12626
> URL: https://issues.apache.org/jira/browse/SPARK-12626
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Xiangrui Meng
>Priority: Blocker
>  Labels: roadmap
>
> This is a master list for MLlib improvements we plan to have in Spark 2.0. 
> Please view this list as a wish list rather than a concrete plan, because we 
> don't have an accurate estimate of available resources. Due to limited review 
> bandwidth, features appearing on this list will get higher priority during 
> code review. But feel free to suggest new items to the list in comments. We 
> are experimenting with this process. Your feedback would be greatly 
> appreciated.
> h1. Instructions
> h2. For contributors:
> * Please read 
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark 
> carefully. Code style, documentation, and unit tests are important.
> * If you are a first-time Spark contributor, please always start with a 
> [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
> than a medium/big feature. Based on our experience, mixing the development 
> process with a big feature usually causes long delays in code review.
> * Never work silently. Let everyone know on the corresponding JIRA page when 
> you start working on some features. This is to avoid duplicate work. For 
> small features, you don't need to wait to get JIRA assigned.
> * For medium/big features or features with dependencies, please get assigned 
> first before coding and keep the ETA updated on the JIRA. If there is no 
> activity on the JIRA page for a certain amount of time, the JIRA should be 
> released for other contributors.
> * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one 
> after another.
> * Remember to add the `@Since("2.0.0")` annotation to new public APIs.
> * Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
> review greatly helps to improve others' code as well as yours.
> h2. For committers:
> * Try to break down big features into small and specific JIRA tasks and link 
> them properly.
> * Add a "starter" label to starter tasks.
> * Put a rough estimate for medium/big features and track the progress.
> * If you start reviewing a PR, please add yourself to the Shepherd field on 
> JIRA.
> * If the code looks good to you, please comment "LGTM". For non-trivial PRs, 
> please ping a maintainer to make a final pass.
> * After merging a PR, create and link JIRAs for Python, example code, and 
> documentation if applicable.
> h1. Roadmap (*WIP*)
> This is NOT [a complete list of MLlib JIRAs for 
> 2.0|https://issues.apache.org/jira/issues/?filter=12334385]. We only include 
> umbrella JIRAs and high-level tasks.
> Major efforts in this release:
> * `spark.ml`: Achieve feature parity for the `spark.ml` API, relative to the 
> `spark.mllib` API.  This includes the Python API.
> * Linear algebra: Separate out the linear algebra library as a standalone 
> project without a Spark dependency to simplify production deployment.
> * Pipelines API: Complete critical improvements to the Pipelines API
> * New features: As usual, we expect to expand the feature set of MLlib.  
> However, we will prioritize API parity over new features.  _New algorithms 
> should be written for `spark.ml`, not `spark.mllib`._
> h2. Algorithms and performance
> * iteratively re-weighted least squares (IRLS) for GLMs (SPARK-9835)
> * estimator interface for GLMs (SPARK-12811)
> * extended support for GLM model families and link functions in SparkR 
> (SPARK-12566)
> * improved model summaries and stats via IRLS (SPARK-9837)
> Additional (maybe lower priority):
> * robust linear regression with Huber loss (SPARK-3181)
> * vector-free L-BFGS (SPARK-10078)
> * tree partition by features (SPARK-3717)
> * local linear algebra (SPARK-6442)
> * weighted instance support (SPARK-9610)
> ** random forest (SPARK-9478)
> ** GBT (SPARK-9612)
> * locality sensitive hashing (LSH) (SPARK-5992)
> * deep learning (SPARK-5575)
> ** autoencoder (SPARK-10408)
> ** restricted Boltzmann machine (RBM) (SPARK-4251)
> ** convolutional neural network (stretch)
> * factorization machine (SPARK-7008)
> * distributed LU decomposition (SPARK-8514)
> h2. Statistics
> * bivariate statistics as UDAFs (SPARK-10385)
> * R-like statistics for GLMs (SPARK-9835)
> * sketch algorithms (cross listed) : approximate quantiles (SPARK-6761), 
> count-min sketch (SPARK-6763), Bloom filter (SPARK-1281

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-09 Thread Praveen Devarao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188761#comment-15188761
 ] 

Praveen Devarao commented on SPARK-12177:
-

Hi Cody,

The last time, when I got the TopicPartition and OffsetAndMetadata classes made 
serializable, the argument was that these classes are used by end users and are 
metadata classes which would be needed for checkpointing.

As for ConsumerRecord, this class is meant to hold the actual data and would 
usually not be needed for checkpointing: if we need the data we can always go 
to the respective offset in the respective topic and partition. Also, the 
ConsumerRecord class has members of generic types (K and V), so the 
serialization really depends on what type of object is passed in by the user 
and whether that is serializable.

Given this, from a Kafka perspective I am not sure how we can justify marking 
this class as serializable.
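For illustration only (this is not part of the proposed change, and the {{refetch}} helper below is hypothetical), re-reading a record from its saved coordinates with the 0.9 consumer could look roughly like this:

{code}
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

object OffsetRefetchSketch {
  // Instead of serializing a ConsumerRecord, persist only (topic, partition, offset)
  // and seek back to that position when the data is actually needed again.
  // `props` is assumed to configure bootstrap servers and byte-array deserializers.
  def refetch(props: Properties, topic: String, partition: Int,
              offset: Long): Option[ConsumerRecord[Array[Byte], Array[Byte]]] = {
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    try {
      val tp = new TopicPartition(topic, partition)
      consumer.assign(java.util.Arrays.asList(tp)) // manual assignment, no group rebalance
      consumer.seek(tp, offset)                    // jump straight to the saved offset
      consumer.poll(1000L).records(tp).asScala.headOption
    } finally {
      consumer.close()
    }
  }
}
{code}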

Thanks

Praveen

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that is 
> not compatible with the old one. So I added the new consumer API in separate 
> classes in the package org.apache.spark.streaming.kafka.v09 with the changed API. I 
> didn't remove the old classes, for backward compatibility: users will not need 
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.






[jira] [Commented] (SPARK-13796) Lock release errors occur frequently in executor logs

2016-03-09 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188762#comment-15188762
 ] 

Nishkam Ravi commented on SPARK-13796:
--

Running with master from March 7th (e52e597db48d069b98c1d404b221d3365f38fbb8)

Error introduced by 633d63a48ad98754dc7c56f9ac150fc2aa4e42c5

> Lock release errors occur frequently in executor logs
> -
>
> Key: SPARK-13796
> URL: https://issues.apache.org/jira/browse/SPARK-13796
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Nishkam Ravi
>
> Executor logs contain a lot of these error messages (irrespective of the 
> workload):
> 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by 
> TID = 1119






[jira] [Created] (SPARK-13796) Lock release errors occur frequently in executor logs

2016-03-09 Thread Nishkam Ravi (JIRA)
Nishkam Ravi created SPARK-13796:


 Summary: Lock release errors occur frequently in executor logs
 Key: SPARK-13796
 URL: https://issues.apache.org/jira/browse/SPARK-13796
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
Reporter: Nishkam Ravi


Executor logs contain a lot of these error messages (irrespective of the 
workload):

16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by 
TID = 1119






[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run

2016-03-09 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188710#comment-15188710
 ] 

Dongjoon Hyun commented on SPARK-9289:
--

Oh, it wasn't fast enough. I see.
Thank you anyway.

> OrcPartitionDiscoverySuite is slow to run
> -
>
> Key: SPARK-9289
> URL: https://issues.apache.org/jira/browse/SPARK-9289
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Reporter: Reynold Xin
>
> {code}
> [info] - read partitioned table - normal case (18 seconds, 557 milliseconds)
> [info] - read partitioned table - partition key included in orc file (5 
> seconds, 160 milliseconds)
> [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds)
> [info] - read partitioned table - with nulls and partition keys are included 
> in Orc file (3 seconds, 218 milliseconds)
> {code}
> Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs?






[jira] [Assigned] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13793:


Assignee: (was: Apache Spark)

> PipeRDD doesn't propagate exceptions while reading parent RDD
> -
>
> Key: SPARK-13793
> URL: https://issues.apache.org/jira/browse/SPARK-13793
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Tejas Patil
>Priority: Minor
>
> PipedRDD creates a process to run the command and spawns a thread to feed the 
> input data to the process's stdin. If there is any exception in the child 
> thread which gets the input data from the parent RDD, the child thread does 
> not propagate that exception to the main thread. For example, in the event of 
> fetch failures, since the exception is not propagated, the entire stage fails. 
> The correct behaviour would be to recompute the parent(s) and then relaunch 
> the stage.
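One common way to surface such a failure (a generic sketch of the pattern, not PipedRDD's actual code; the names below are made up) is to stash the writer thread's exception and rethrow it on the task thread while the process output is being consumed:

{code}
import java.util.concurrent.atomic.AtomicReference

object PipeExceptionSketch {
  // Holds the first failure seen by the stdin-writer thread, if any.
  private val childFailure = new AtomicReference[Throwable](null)

  def startWriter(writeInput: () => Unit): Thread = {
    val t = new Thread("stdin-writer") {
      override def run(): Unit =
        try writeInput()
        catch { case e: Throwable => childFailure.compareAndSet(null, e) }
    }
    t.start()
    t
  }

  // Called on the task thread while reading the process output: rethrows the
  // original cause (e.g. a fetch failure) instead of failing with a broken pipe.
  def propagateChildException(): Unit = {
    val e = childFailure.get()
    if (e != null) throw e
  }
}
{code}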






[jira] [Commented] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188660#comment-15188660
 ] 

Apache Spark commented on SPARK-13793:
--

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/11628

> PipeRDD doesn't propagate exceptions while reading parent RDD
> -
>
> Key: SPARK-13793
> URL: https://issues.apache.org/jira/browse/SPARK-13793
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Tejas Patil
>Priority: Minor
>
> PipedRDD creates a process to run the command and spawns a thread to feed the 
> input data to the process's stdin. If there is any exception in the child 
> thread which gets the input data from the parent RDD, the child thread does 
> not propagate that exception to the main thread. For example, in the event of 
> fetch failures, since the exception is not propagated, the entire stage fails. 
> The correct behaviour would be to recompute the parent(s) and then relaunch 
> the stage.






[jira] [Updated] (SPARK-13793) PipedRDD doesn't propagate exceptions while reading parent RDD

2016-03-09 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated SPARK-13793:

Summary: PipedRDD doesn't propagate exceptions while reading parent RDD  
(was: PipeRDD doesn't propagate exceptions while reading parent RDD)

> PipedRDD doesn't propagate exceptions while reading parent RDD
> ---
>
> Key: SPARK-13793
> URL: https://issues.apache.org/jira/browse/SPARK-13793
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Tejas Patil
>Priority: Minor
>
> PipedRDD creates a process to run the command and spawns a thread to feed the 
> input data to the process's stdin. If there is any exception in the child 
> thread which gets the input data from the parent RDD, the child thread does 
> not propagate that exception to the main thread. For example, in the event of 
> fetch failures, since the exception is not propagated, the entire stage fails. 
> The correct behaviour would be to recompute the parent(s) and then relaunch 
> the stage.






[jira] [Assigned] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13793:


Assignee: Apache Spark

> PipeRDD doesn't propagate exceptions while reading parent RDD
> -
>
> Key: SPARK-13793
> URL: https://issues.apache.org/jira/browse/SPARK-13793
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Tejas Patil
>Assignee: Apache Spark
>Priority: Minor
>
> PipedRDD creates a process to run the command and spawns a thread to feed the 
> input data to the process's stdin. If there is any exception in the child 
> thread which gets the input data from the parent RDD, the child thread does 
> not propagate that exception to the main thread. For example, in the event of 
> fetch failures, since the exception is not propagated, the entire stage fails. 
> The correct behaviour would be to recompute the parent(s) and then relaunch 
> the stage.






[jira] [Commented] (SPARK-13795) ClassCast Exception while attempting to show() a DataFrame

2016-03-09 Thread Ganesh Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188636#comment-15188636
 ] 

Ganesh Krishnan commented on SPARK-13795:
-

This is similar to this Scala bug: https://issues.scala-lang.org/browse/SI-6337

> ClassCast Exception while attempting to show() a DataFrame
> --
>
> Key: SPARK-13795
> URL: https://issues.apache.org/jira/browse/SPARK-13795
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
> Environment: Linux 14.04 LTS
>Reporter: Ganesh Krishnan
>
> DataFrame Schema (by printSchema() ) is as follows
> allDataJoined.printSchema() 
>  |-- eventType: string (nullable = true)
>  |-- itemId: string (nullable = true)
>  |-- productId: string (nullable = true)
>  |-- productVersion: string (nullable = true)
>  |-- servicedBy: string (nullable = true)
>  |-- ACCOUNT_NAME: string (nullable = true)
>  |-- CONTENTGROUPID: string (nullable = true)
>  |-- PRODUCT_ID: string (nullable = true)
>  |-- PROFILE_ID: string (nullable = true)
>  |-- SALESADVISEREMAIL: string (nullable = true)
>  |-- businessName: string (nullable = true)
>  |-- contentGroupId: string (nullable = true)
>  |-- salesAdviserName: string (nullable = true)
>  |-- salesAdviserPhone: string (nullable = true)
> There is NO column that has any datatype except String. There used to be 
> an inferred column of type long, which was dropped.
>  
> DataFrame allDataJoined = whiteEventJoinedWithReference.
>drop(rliDataFrame.col("occurredAtDate"));
> allDataJoined.printSchema() : output above ^^
> Now 
> allDataJoined.show() throws the following exception vv
> java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at scala.math.Ordering$Int$.compare(Ordering.scala:256)
>   at scala.math.Ordering$class.gt(Ordering.scala:97)
>   at scala.math.Ordering$Int$.gt(Ordering.scala:256)
>   at 
> org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:457)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:383)
>   at 
> org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:238)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
>   at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
>   at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
>   at 
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
>   at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.prunePartitions(DataSourceStrategy.scala:257)
>   at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:82)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.makeBroadcastHashJoin(SparkStrategies.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:97)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrateg

[jira] [Created] (SPARK-13795) ClassCast Exception while attempting to show() a DataFrame

2016-03-09 Thread Ganesh Krishnan (JIRA)
Ganesh Krishnan created SPARK-13795:
---

 Summary: ClassCast Exception while attempting to show() a DataFrame
 Key: SPARK-13795
 URL: https://issues.apache.org/jira/browse/SPARK-13795
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.0
 Environment: Linux 14.04 LTS
Reporter: Ganesh Krishnan


DataFrame Schema (by printSchema() ) is as follows

allDataJoined.printSchema() 

 |-- eventType: string (nullable = true)
 |-- itemId: string (nullable = true)
 |-- productId: string (nullable = true)
 |-- productVersion: string (nullable = true)
 |-- servicedBy: string (nullable = true)
 |-- ACCOUNT_NAME: string (nullable = true)
 |-- CONTENTGROUPID: string (nullable = true)
 |-- PRODUCT_ID: string (nullable = true)
 |-- PROFILE_ID: string (nullable = true)
 |-- SALESADVISEREMAIL: string (nullable = true)
 |-- businessName: string (nullable = true)
 |-- contentGroupId: string (nullable = true)
 |-- salesAdviserName: string (nullable = true)
 |-- salesAdviserPhone: string (nullable = true)

There is NO column that has any datatype except String. There used to be 
an inferred column of type long, which was dropped.
 
DataFrame allDataJoined = whiteEventJoinedWithReference.
   drop(rliDataFrame.col("occurredAtDate"));
allDataJoined.printSchema() : output above ^^
Now 
allDataJoined.show() throws the following exception vv

java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
at scala.math.Ordering$Int$.compare(Ordering.scala:256)
at scala.math.Ordering$class.gt(Ordering.scala:97)
at scala.math.Ordering$Int$.gt(Ordering.scala:256)
at 
org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:457)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:383)
at 
org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:238)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
at 
scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.prunePartitions(DataSourceStrategy.scala:257)
at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:82)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at 
org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.makeBroadcastHashJoin(SparkStrategies.scala:88)
at 
org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:97)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at 
org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at 
org.apach

[jira] [Deleted] (SPARK-10813) API design: high level class structuring regarding windowed and non-windowed streams

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin deleted SPARK-10813:



> API design: high level class structuring regarding windowed and non-windowed 
> streams
> 
>
> Key: SPARK-10813
> URL: https://issues.apache.org/jira/browse/SPARK-10813
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> I can think of 3 high level alternatives for streaming data frames. See
> https://github.com/rxin/spark/pull/17






[jira] [Resolved] (SPARK-13146) API for managing streaming dataframes

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13146.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> API for managing streaming dataframes
> -
>
> Key: SPARK-13146
> URL: https://issues.apache.org/jira/browse/SPARK-13146
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
> Fix For: 2.0.0
>
>







[jira] [Deleted] (SPARK-10819) Logical plan: determine logical operators needed

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin deleted SPARK-10819:



> Logical plan: determine logical operators needed
> 
>
> Key: SPARK-10819
> URL: https://issues.apache.org/jira/browse/SPARK-10819
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Reynold Xin
>
> Again, it would be great if we can just reuse Spark SQL's existing logical 
> plan. We might need to introduce new logical plans (e.g. windowing which is 
> different from Spark SQL's).






[jira] [Deleted] (SPARK-10818) Query optimization: investigate whether we need a separate optimizer from Spark SQL's

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin deleted SPARK-10818:



> Query optimization: investigate whether we need a separate optimizer from 
> Spark SQL's
> -
>
> Key: SPARK-10818
> URL: https://issues.apache.org/jira/browse/SPARK-10818
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Reynold Xin
>
> It would be great if we can just reuse Spark SQL's query optimizer. 






[jira] [Commented] (SPARK-13794) Rename DataFrameWriter.stream DataFrameWriter.startStream

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188598#comment-15188598
 ] 

Apache Spark commented on SPARK-13794:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11627

> Rename DataFrameWriter.stream DataFrameWriter.startStream
> -
>
> Key: SPARK-13794
> URL: https://issues.apache.org/jira/browse/SPARK-13794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> This makes it more obvious with the verb "start" that we are actually 
> starting some execution.






[jira] [Assigned] (SPARK-13794) Rename DataFrameWriter.stream DataFrameWriter.startStream

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13794:


Assignee: Apache Spark  (was: Reynold Xin)

> Rename DataFrameWriter.stream DataFrameWriter.startStream
> -
>
> Key: SPARK-13794
> URL: https://issues.apache.org/jira/browse/SPARK-13794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> This makes it more obvious with the verb "start" that we are actually 
> starting some execution.






[jira] [Assigned] (SPARK-13794) Rename DataFrameWriter.stream DataFrameWriter.startStream

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13794:


Assignee: Reynold Xin  (was: Apache Spark)

> Rename DataFrameWriter.stream DataFrameWriter.startStream
> -
>
> Key: SPARK-13794
> URL: https://issues.apache.org/jira/browse/SPARK-13794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> This makes it more obvious with the verb "start" that we are actually 
> starting some execution.






[jira] [Created] (SPARK-13794) Rename DataFrameWriter.stream DataFrameWriter.startStream

2016-03-09 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13794:
---

 Summary: Rename DataFrameWriter.stream DataFrameWriter.startStream
 Key: SPARK-13794
 URL: https://issues.apache.org/jira/browse/SPARK-13794
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


This makes it more obvious with the verb "start" that we are actually starting 
some execution.







[jira] [Commented] (SPARK-12117) Column Aliases are Ignored in callUDF while using struct()

2016-03-09 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188593#comment-15188593
 ] 

Liang-Chi Hsieh commented on SPARK-12117:
-

I revisited this PR and found that this bug is already fixed in the current 
codebase. I think we can close this now.

> Column Aliases are Ignored in callUDF while using struct()
> --
>
> Key: SPARK-12117
> URL: https://issues.apache.org/jira/browse/SPARK-12117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Sachin Aggarwal
>
> case where this works:
>   val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), 
> ("Rishabh", "2"))).toDF("myText", "id")
>   
> TestDoc1.select(callUDF("mydef",struct($"myText".as("Text"),$"id".as("label"))).as("col1")).show
> steps to reproduce error case:
> 1)create a file copy following text--filename(a.json)
> { "myText": "Sachin Aggarwal","id": "1"}
> { "myText": "Rishabh","id": "2"}
> 2)define a simple UDF
> def mydef(r:Row)={println(r.schema); r.getAs("Text").asInstanceOf[String]}
> 3)register the udf 
>  sqlContext.udf.register("mydef" ,mydef _)
> 4)read the input file 
> val TestDoc2=sqlContext.read.json("/tmp/a.json")
> 5)make a call to UDF
> TestDoc2.select(callUDF("mydef",struct($"myText".as("Text"),$"id".as("label"))).as("col1")).show
> ERROR received:
> java.lang.IllegalArgumentException: Field "Text" does not exist.
>  at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:234)
>  at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:234)
>  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>  at scala.collection.AbstractMap.getOrElse(Map.scala:58)
>  at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:233)
>  at 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.fieldIndex(rows.scala:212)
>  at org.apache.spark.sql.Row$class.getAs(Row.scala:325)
>  at org.apache.spark.sql.catalyst.expressions.GenericRow.getAs(rows.scala:191)
>  at 
> $line414.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.mydef(:107)
>  at 
> $line419.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:110)
>  at 
> $line419.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$c57ec8bf9b0d5f6161b97741d596ff0wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:110)
>  at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75)
>  at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74)
>  at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:964)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:55)
>  at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:53)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>  at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>  at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>  at org.apache.spark.schedul
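A possible workaround sketch for the failing case, not part of the original report: read the struct field by position instead of relying on the "Text" alias, which does not appear to survive into the Row seen by the UDF.

{code}
// Hedged workaround sketch; assumes the struct is built as
// struct(myText, id), so index 0 is the text column.
def mydefByPosition(r: org.apache.spark.sql.Row): String = r.getString(0)

sqlContext.udf.register("mydefByPosition", mydefByPosition _)

TestDoc2.select(
  callUDF("mydefByPosition",
    struct($"myText".as("Text"), $"id".as("label"))).as("col1")).show()
{code}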

[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken

2016-03-09 Thread Eran Withana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589
 ] 

Eran Withana edited comment on SPARK-12345 at 3/10/16 3:20 AM:
---

Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

{code}
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
{code}

To provide more context, here is my spark-submit script

{code}
$SPARK_HOME/bin/spark-submit \
 --class com.mycompany.SparkStarter \
 --master mesos://mesos-dispatcher:7077 \
 --name SparkStarterJob \
--driver-memory 1G \
 --executor-memory 4G \
--deploy-mode cluster \
 --total-executor-cores 1 \
 --conf 
spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \
 http://abc.com/spark-starter.jar
{code}
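
One thing worth checking here (a sketch, not a confirmed fix): the path in the error message is where the driver expects Spark to live inside the executor container, so if the Docker image ships Spark somewhere else, pointing spark.mesos.executor.home at that location may help. The /opt/spark path below is only a placeholder.

{code}
# Hedged sketch: /opt/spark stands in for wherever the image actually ships Spark.
$SPARK_HOME/bin/spark-submit \
 --class com.mycompany.SparkStarter \
 --master mesos://mesos-dispatcher:7077 \
 --deploy-mode cluster \
 --conf spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \
 --conf spark.mesos.executor.home=/opt/spark \
 http://abc.com/spark-starter.jar
{code}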


was (Author: eran.chinth...@gmail.com):
Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

{code}
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
{code}

To provide more context, here is my spark-submit script

{code}
$SPARK_HOME/bin/spark-submit \
 `# main class to be run` \
 --class com.mycompany.SparkStarter \
 --master mesos://mesos-dispatcher:7077 \
 --name SparkStarterJob \
--driver-memory 1G \
 --executor-memory 4G \
--deploy-mode cluster \
 --total-executor-cores 1 \
 --conf 
spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \
 http://abc.com/spark-starter.jar
{code}

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Timothy Chen
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken

2016-03-09 Thread Eran Withana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589
 ] 

Eran Withana edited comment on SPARK-12345 at 3/10/16 3:19 AM:
---

Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

{code}
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
{code}

To provide more context, here is my spark-submit script

{code}
$SPARK_HOME/bin/spark-submit \
 `# main class to be run` \
 --class com.mycompany.SparkStarter \
 --master mesos://mesos-dispatcher:7077 \
 --name SparkStarterJob \
--driver-memory 1G \
 --executor-memory 4G \
--deploy-mode cluster \
 --total-executor-cores 1 \
 --conf 
spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \
 http://abc.com/spark-starter.jar
{code}


was (Author: eran.chinth...@gmail.com):
Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

{code}
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
{code}

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Timothy Chen
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12345) Mesos cluster mode is broken

2016-03-09 Thread Eran Withana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589
 ] 

Eran Withana edited comment on SPARK-12345 at 3/10/16 3:16 AM:
---

Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

{code}
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
{code}


was (Author: eran.chinth...@gmail.com):
Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

```
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
```

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Timothy Chen
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12345) Mesos cluster mode is broken

2016-03-09 Thread Eran Withana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188589#comment-15188589
 ] 

Eran Withana commented on SPARK-12345:
--

Is the resolution to this issue available in the Spark 1.6.0 release?

I just used Spark 1.6.0 and got the following error in the Mesos logs when it 
tried to run the task:

```
I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
20160223-000314-3439362570-5050-631-S0
sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
```

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Timothy Chen
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13766) Inconsistent file extensions and omitted file extensions written by CSV, TEXT and JSON data sources

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13766.
-
   Resolution: Fixed
 Assignee: Hyukjin Kwon
Fix Version/s: 2.0.0

> Inconsistent file extensions and omitted file extensions written by CSV, TEXT 
> and JSON data sources
> ---
>
> Key: SPARK-13766
> URL: https://issues.apache.org/jira/browse/SPARK-13766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, the output (part-files) from CSV, TEXT and JSON data sources do 
> not have file extensions such as .csv, .txt and .json (except for compression 
> extensions such as .gz, .deflate and .bz4).
> In addition, it looks like Parquet part-files have extensions such as 
> .gz.parquet or .snappy.parquet depending on the compression codec, whereas ORC 
> does not include the codec and just uses .orc.
> So, in a simple view, currently the extensions are set as below:
> {code}
> TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
> Parquet -  [.COMPRESSION_CODEC_NAME].parquet
> ORC - .orc
> {code}
> It would be great if we had a consistent naming scheme for them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run

2016-03-09 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-9289.

Resolution: Not A Problem

> OrcPartitionDiscoverySuite is slow to run
> -
>
> Key: SPARK-9289
> URL: https://issues.apache.org/jira/browse/SPARK-9289
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Reporter: Reynold Xin
>
> {code}
> [info] - read partitioned table - normal case (18 seconds, 557 milliseconds)
> [info] - read partitioned table - partition key included in orc file (5 
> seconds, 160 milliseconds)
> [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds)
> [info] - read partitioned table - with nulls and partition keys are included 
> in Orc file (3 seconds, 218 milliseconds)
> {code}
> Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run

2016-03-09 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188583#comment-15188583
 ] 

Reynold Xin commented on SPARK-9289:


Still pretty long but let me close this.


> OrcPartitionDiscoverySuite is slow to run
> -
>
> Key: SPARK-9289
> URL: https://issues.apache.org/jira/browse/SPARK-9289
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Reporter: Reynold Xin
>
> {code}
> [info] - read partitioned table - normal case (18 seconds, 557 milliseconds)
> [info] - read partitioned table - partition key included in orc file (5 
> seconds, 160 milliseconds)
> [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds)
> [info] - read partitioned table - with nulls and partition keys are included 
> in Orc file (3 seconds, 218 milliseconds)
> {code}
> Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13793) PipeRDD doesn't propagate exceptions while reading parent RDD

2016-03-09 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-13793:
---

 Summary: PipeRDD doesn't propagate exceptions while reading parent 
RDD
 Key: SPARK-13793
 URL: https://issues.apache.org/jira/browse/SPARK-13793
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Tejas Patil
Priority: Minor


PipeRDD creates a process to run the command and spawns a thread to feed the 
input data to the process's stdin. If there is an exception in the child thread 
that reads the input data from the parent RDD, the child thread does not 
propagate that exception to the main thread. For example, in the event of a 
fetch failure, since the exception is not propagated, the entire stage fails; 
the correct behaviour would be to recompute the parent(s) and then relaunch the stage.
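
For illustration, a minimal sketch of the general pattern, not PipeRDD's actual code: remember the child thread's failure and rethrow it from the thread that consumes the process output, so that the original cause (for example a fetch failure) is what surfaces.

{code}
// Hedged sketch of the propagation pattern; not Spark's implementation.
import java.util.concurrent.atomic.AtomicReference

class StdinFeeder(feed: () => Unit) {
  // Failure observed by the feeding thread, if any.
  private val failure = new AtomicReference[Throwable]()

  private val thread = new Thread("stdin-writer") {
    override def run(): Unit =
      try feed() catch { case t: Throwable => failure.set(t) }
  }
  thread.setDaemon(true)
  thread.start()

  /** Call from the main thread while (or after) reading the process output. */
  def propagateChildException(): Unit = {
    val t = failure.get()
    if (t != null) throw t  // rethrow the original cause, e.g. a fetch failure
  }
}
{code}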



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block data too soon"

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188574#comment-15188574
 ] 

Apache Spark commented on SPARK-7420:
-

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11626

> Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block 
> data too soon"
> -
>
> Key: SPARK-7420
> URL: https://issues.apache.org/jira/browse/SPARK-7420
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Andrew Or
>Assignee: Tathagata Das
>Priority: Critical
>  Labels: flaky-test
>
> {code}
> The code passed to eventually never returned normally. Attempted 18 times 
> over 10.13803606001 seconds. Last failure message: 
> receiverTracker.hasUnallocatedBlocks was false.
> {code}
> It seems to be failing only in maven.
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/458/
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/459/
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2173/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-13760:
--

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.
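
For context, a small Scala illustration of the inconsistency the description refers to (my own sketch, not from the issue):

{code}
// Hedged sketch: the constructor path widens the Float to a Double and keeps the
// binary artifacts, while BigDecimal.decimal goes through the Float's decimal form.
object BigDecimalFloatDemo extends App {
  println(BigDecimal(0.1f))          // close to, but not exactly, 0.1 (e.g. 0.10000000149011612)
  println(BigDecimal.decimal(0.1f))  // exactly 0.1 (BigDecimal.decimal needs Scala 2.11+)
}
{code}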



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9289) OrcPartitionDiscoverySuite is slow to run

2016-03-09 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188567#comment-15188567
 ] 

Dongjoon Hyun commented on SPARK-9289:
--

Hi, [~rxin].

As of today, this issue seems to be resolved. If so, could you close this issue?

* Notebook
{code}
$ build/sbt "project hive" "test-only *OrcPartitionDiscoverySuite -- -z 
partitioned"
...
[info] OrcPartitionDiscoverySuite:
[info] - read partitioned table - normal case (4 seconds, 427 milliseconds)
[info] - read partitioned table - partition key included in orc file (1 second, 
419 milliseconds)
[info] - read partitioned table - with nulls (911 milliseconds)
[info] - read partitioned table - with nulls and partition keys are included in 
Orc file (747 milliseconds)
{code}

* Jenkins
{code}
[info] OrcPartitionDiscoverySuite:
[info] - read partitioned table - normal case (1 second, 745 milliseconds)
[info] - read partitioned table - partition key included in orc file (1 second, 
961 milliseconds)
[info] - read partitioned table - with nulls (1 second, 243 milliseconds)
[info] - read partitioned table - with nulls and partition keys are included in 
Orc file (1 second, 1 millisecond)
{code}

> OrcPartitionDiscoverySuite is slow to run
> -
>
> Key: SPARK-9289
> URL: https://issues.apache.org/jira/browse/SPARK-9289
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Reporter: Reynold Xin
>
> {code}
> [info] - read partitioned table - normal case (18 seconds, 557 milliseconds)
> [info] - read partitioned table - partition key included in orc file (5 
> seconds, 160 milliseconds)
> [info] - read partitioned table - with nulls (4 seconds, 69 milliseconds)
> [info] - read partitioned table - with nulls and partition keys are included 
> in Orc file (3 seconds, 218 milliseconds)
> {code}
> Does the unit test really need to run for 18 secs, 5 secs, 4 secs, and 3 secs?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13760.
--
Resolution: Later

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188568#comment-15188568
 ] 

Yin Huai commented on SPARK-13760:
--

Set the resolution to later. Maybe we want to revisit it after we drop the 
Scala 2.10 support.

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal closed SPARK-13760.
--
Resolution: Won't Fix

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13760:


Assignee: Apache Spark  (was: Sameer Agarwal)

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Apache Spark
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13760:


Assignee: Sameer Agarwal  (was: Apache Spark)

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-03-09 Thread Zhongshuai Pei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188556#comment-15188556
 ] 

Zhongshuai Pei commented on SPARK-4105:
---

I had this happen in Spark 1.5.2 as well, [~joshrosen] [~daniel.siegmann.aol].

> FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
> shuffle
> -
>
> Key: SPARK-4105
> URL: https://issues.apache.org/jira/browse/SPARK-4105
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
> Attachments: JavaObjectToSerialize.java, 
> SparkFailedToUncompressGenerator.scala
>
>
> We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
> shuffle read.  Here's a sample stacktrace from an executor:
> {code}
> 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
> 33053)
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)

[jira] [Reopened] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-13760:
--

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188555#comment-15188555
 ] 

Yin Huai commented on SPARK-13760:
--

It seems https://github.com/apache/spark/pull/11597 broke the Scala 2.10 build, 
so I have reverted it.

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-13760:
-
Fix Version/s: (was: 1.6.2)
   (was: 2.0.0)

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13792) Limit logging of bad records

2016-03-09 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13792:
--

 Summary: Limit logging of bad records
 Key: SPARK-13792
 URL: https://issues.apache.org/jira/browse/SPARK-13792
 Project: Spark
  Issue Type: Sub-task
Reporter: Hossein Falaki


Currently in PERMISSIVE and DROPMALFORMED modes we log any record that is going 
to be ignored. This can generate a lot of logs with large datasets. A better 
idea is to log the first record and the number of subsequent records for each 
partition.
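
A rough sketch of what that per-partition behaviour could look like (illustrative only; the helper name and the println-based logging are made up):

{code}
// Hedged sketch: log the first malformed record in a partition, count the rest,
// and emit a single summary once the partition has been fully consumed.
def dropMalformed[T](lines: Iterator[String])(parse: String => Option[T]): Iterator[T] = {
  var badCount = 0L
  var summarized = false

  val parsed = lines.flatMap { line =>
    val r = parse(line)
    if (r.isEmpty) {
      if (badCount == 0) println(s"Dropping malformed record (first in partition): $line")
      badCount += 1
    }
    r
  }

  new Iterator[T] {
    def hasNext: Boolean = {
      val more = parsed.hasNext
      if (!more && badCount > 0 && !summarized) {
        summarized = true
        println(s"Dropped $badCount malformed record(s) in this partition")
      }
      more
    }
    def next(): T = parsed.next()
  }
}

// Example: dropMalformed(Iterator("1", "oops", "2"))(s => scala.util.Try(s.toInt).toOption).toList
{code}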



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-13760:
-
Assignee: Sameer Agarwal

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
>Priority: Trivial
> Fix For: 1.6.2, 2.0.0
>
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-13760:
-
Fix Version/s: (was: 1.6.1)
   1.6.2

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Priority: Trivial
> Fix For: 1.6.2, 2.0.0
>
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13760) Fix BigDecimal constructor for FloatType

2016-03-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13760.
--
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 11597
[https://github.com/apache/spark/pull/11597]

> Fix BigDecimal constructor for FloatType
> 
>
> Key: SPARK-13760
> URL: https://issues.apache.org/jira/browse/SPARK-13760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Priority: Trivial
> Fix For: 2.0.0, 1.6.1
>
>
> Use `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The 
> latter is deprecated and can result in inconsistencies due to an implicit 
> conversion to `Double`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework

2016-03-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13492:
--
Assignee: Sergiusz Urbaniak  (was: Andrew Or)

> Configure a custom webui_url for the Spark Mesos Framework
> --
>
> Key: SPARK-13492
> URL: https://issues.apache.org/jira/browse/SPARK-13492
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sergiusz Urbaniak
>Assignee: Sergiusz Urbaniak
>Priority: Minor
> Fix For: 2.0.0
>
>
> Previously the Mesos framework webui URL was being derived only from the 
> Spark UI address leaving no possibility to configure it. This issue proposes 
> to make it configurable. If unset it falls back to the previous behavior.
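
For readers looking for the resulting setting: the property name below is my assumption of what this change introduces (verify against the merged pull request and the Mesos documentation); it is shown only as a sketch of how such a setting would be passed.

{code}
# Hedged sketch; the spark.mesos.driver.webui.url property name is assumed, not confirmed here.
bin/spark-submit \
  --master mesos://dispatcher.example.com:7077 \
  --deploy-mode cluster \
  --conf spark.mesos.driver.webui.url=http://driver-host.example.com:4040 \
  --class com.example.MyApp \
  http://example.com/my-app.jar
{code}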



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework

2016-03-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-13492:
-

Assignee: Andrew Or

> Configure a custom webui_url for the Spark Mesos Framework
> --
>
> Key: SPARK-13492
> URL: https://issues.apache.org/jira/browse/SPARK-13492
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sergiusz Urbaniak
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>
> Previously the Mesos framework webui URL was being derived only from the 
> Spark UI address leaving no possibility to configure it. This issue proposes 
> to make it configurable. If unset it falls back to the previous behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13492) Configure a custom webui_url for the Spark Mesos Framework

2016-03-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13492.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Configure a custom webui_url for the Spark Mesos Framework
> --
>
> Key: SPARK-13492
> URL: https://issues.apache.org/jira/browse/SPARK-13492
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sergiusz Urbaniak
>Priority: Minor
> Fix For: 2.0.0
>
>
> Previously the Mesos framework webui URL was being derived only from the 
> Spark UI address leaving no possibility to configure it. This issue proposes 
> to make it configurable. If unset it falls back to the previous behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13775) history server sort by completed time by default

2016-03-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13775.
---
  Resolution: Fixed
Assignee: Zhuo Liu
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> history server sort by completed time by default
> 
>
> Key: SPARK-13775
> URL: https://issues.apache.org/jira/browse/SPARK-13775
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The new history server UI using DataTables sorts by application ID. Let's 
> change it to sort by completed time, as the old table format did.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13778) Master's ApplicationPage displays wrong application executor state when a worker is lost

2016-03-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13778.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Master's ApplicationPage displays wrong application executor state when a 
> worker is lost
> 
>
> Key: SPARK-13778
> URL: https://issues.apache.org/jira/browse/SPARK-13778
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> When a worker is lost, the executors on this worker are also lost. But 
> Master's ApplicationPage still displays their states as running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13791) Add MetadataLog and HDFSMetadataLog

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188467#comment-15188467
 ] 

Apache Spark commented on SPARK-13791:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/11625

> Add MetadataLog and HDFSMetadataLog
> ---
>
> Key: SPARK-13791
> URL: https://issues.apache.org/jira/browse/SPARK-13791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> - Add a MetadataLog interface for reliable metadata storage.
> - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. 
> - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata 
> by itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13791) Add MetadataLog and HDFSMetadataLog

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13791:


Assignee: Shixiong Zhu  (was: Apache Spark)

> Add MetadataLog and HDFSMetadataLog
> ---
>
> Key: SPARK-13791
> URL: https://issues.apache.org/jira/browse/SPARK-13791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> - Add a MetadataLog interface for reliable metadata storage.
> - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. 
> - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata 
> by itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13791) Add MetadataLog and HDFSMetadataLog

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13791:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Add MetadataLog and HDFSMetadataLog
> ---
>
> Key: SPARK-13791
> URL: https://issues.apache.org/jira/browse/SPARK-13791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> - Add a MetadataLog interface for reliable metadata storage.
> - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. 
> - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata 
> by itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13791) Add MetadataLog and HDFSMetadataLog

2016-03-09 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-13791:


 Summary: Add MetadataLog and HDFSMetadataLog
 Key: SPARK-13791
 URL: https://issues.apache.org/jira/browse/SPARK-13791
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu



- Add a MetadataLog interface for reliable metadata storage (sketched below).
- Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. 
- Update FileStreamSource to use HDFSMetadataLog instead of managing metadata 
by itself.
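
A minimal sketch of what such an interface could look like (my own illustration; the actual API may differ):

{code}
// Hedged sketch only; not the real Spark interface.
trait MetadataLog[T] {
  /** Atomically record metadata for a batch; returns false if the batch already exists. */
  def add(batchId: Long, metadata: T): Boolean

  /** Metadata previously recorded for a batch, if any. */
  def get(batchId: Long): Option[T]

  /** The most recent batch id and its metadata, if anything has been recorded. */
  def getLatest(): Option[(Long, T)]
}
{code}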



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13782) Model export/import for spark.ml: BisectingKMeans

2016-03-09 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-13782:
--
Comment: was deleted

(was: Hi, [~josephkb]. 
May I work on this issue?)

> Model export/import for spark.ml: BisectingKMeans
> -
>
> Key: SPARK-13782
> URL: https://issues.apache.org/jira/browse/SPARK-13782
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2016-03-09 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-13747.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> Concurrent execution in SQL doesn't work with Scala ForkJoinPool
> 
>
> Key: SPARK-13747
> URL: https://issues.apache.org/jira/browse/SPARK-13747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Shixiong Zhu
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> Run the following codes may fail
> {code}
> (1 to 100).par.foreach { _ =>
>   println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
> }
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> This is because SparkContext.runJob can be suspended when using a 
> ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global), as it 
> calls Await.ready (introduced by https://github.com/apache/spark/pull/9264).
> So when SparkContext.runJob is suspended, ForkJoinPool will run another task on 
> the same thread; that task then sees thread-local properties that have already 
> been polluted (e.g. spark.sql.execution.id is already set).
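
A workaround sketch for anyone hitting this before the fix (my own suggestion, not from the issue): drive the concurrent queries from a plain fixed-size thread pool instead of the global ForkJoinPool, so a blocked runJob never has another query scheduled onto its thread.

{code}
// Hedged workaround sketch; assumes an existing SparkContext `sc` and the usual
// sqlContext.implicits._ import for toDF, as in the reproduction above.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

val counts = (1 to 100).map { _ =>
  Future { sc.parallelize(1 to 5).map(i => (i, i)).toDF("a", "b").count() }
}
counts.foreach(f => println(Await.result(f, Duration.Inf)))
{code}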



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13790) Speed up ColumnVector's getDecimal

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13790:


Assignee: Apache Spark

> Speed up ColumnVector's getDecimal
> --
>
> Key: SPARK-13790
> URL: https://issues.apache.org/jira/browse/SPARK-13790
> Project: Spark
>  Issue Type: Improvement
>Reporter: Nong Li
>Assignee: Apache Spark
>Priority: Minor
>
> This should reuse a decimal object for the simple case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13790) Speed up ColumnVector's getDecimal

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188451#comment-15188451
 ] 

Apache Spark commented on SPARK-13790:
--

User 'nongli' has created a pull request for this issue:
https://github.com/apache/spark/pull/11624

> Speed up ColumnVector's getDecimal
> --
>
> Key: SPARK-13790
> URL: https://issues.apache.org/jira/browse/SPARK-13790
> Project: Spark
>  Issue Type: Improvement
>Reporter: Nong Li
>Priority: Minor
>
> This should reuse a decimal object for the simple case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13790) Speed up ColumnVector's getDecimal

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13790:


Assignee: (was: Apache Spark)

> Speed up ColumnVector's getDecimal
> --
>
> Key: SPARK-13790
> URL: https://issues.apache.org/jira/browse/SPARK-13790
> Project: Spark
>  Issue Type: Improvement
>Reporter: Nong Li
>Priority: Minor
>
> This should reuse a decimal object for the simple case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12555) Datasets: data is corrupted when input data is reordered

2016-03-09 Thread Luciano Resende (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188447#comment-15188447
 ] 

Luciano Resende commented on SPARK-12555:
-

This issue is still reproducible in Spark 1.6.x but seems resolved in 2.x. I 
have added a test case in trunk (PR #11623)  to avoid future regression, but 
please let us know if there is a need to backport fixes.

> Datasets: data is corrupted when input data is reordered
> 
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: ALL platforms on 1.6
>Reporter: Tim Preece
>  Labels: big-endian
>
> Testcase
> ---
> {code}
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.Dataset
> case class people(age: Int, name: String)
> object nameAgg extends Aggregator[people, String, String] {
>   def zero: String = ""
>   def reduce(b: String, a: people): String = a.name + b
>   def merge(b1: String, b2: String): String = b1 + b2
>   def finish(r: String): String = r
> }
> object DataSetAgg {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("DataSetAgg")
> val spark = new SparkContext(conf)
> val sqlContext = new SQLContext(spark)
> import sqlContext.implicits._
> val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS 
> name, 1279869254 AS age").as[people]
> peopleds.groupBy(_.age).agg(nameAgg.toColumn).show()
>   }
> }
> {code}
> Result ( on a Little Endian Platform )
> 
> {noformat}
> +--+--+
> |_1|_2|
> +--+--+
> |1279869254|FAILTi|
> +--+--+
> {noformat}
> Explanation
> ---
> Internally the String variable in the unsafe row is not updated after an 
> unsafe row join operation.
> The displayed string is corrupted and shows part of the integer (interpreted 
> as a string) along with "Ti".
> The column names also look different on a Little Endian platform.
> Result ( on a Big Endian Platform )
> {noformat}
> +--+--+
> | value|nameAgg$(name,age)|
> +--+--+
> |1279869254|LIAFTi|
> +--+--+
> {noformat}
> The following unit test also fails (but only explicitly on a Big Endian 
> platform):
> {code}
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
>   Results do not match for query:
>   == Parsed Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Analyzed Logical Plan ==
>   value: string, ClassInputAgg$(b,a): int
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Optimized Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Physical Plan ==
>   TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], 
> output=[value#748,ClassInputAgg$(b,a)#762])
>   +- TungstenExchange hashpartitioning(value#748,5), None
>  +- TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], 
> output=[value#748,value#758])
> +- !AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>+- Project [one AS b#650,1 AS a#651]
>   +- Scan OneRowRelation[]
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>   ![one,1][one,9] (QueryTest.scala:127)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13790) Speed up ColumnVector's getDecimal

2016-03-09 Thread Nong Li (JIRA)
Nong Li created SPARK-13790:
---

 Summary: Speed up ColumnVector's getDecimal
 Key: SPARK-13790
 URL: https://issues.apache.org/jira/browse/SPARK-13790
 Project: Spark
  Issue Type: Improvement
Reporter: Nong Li
Priority: Minor


This should reuse a decimal object for the simple case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12555) Datasets: data is corrupted when input data is reordered

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12555:


Assignee: (was: Apache Spark)

> Datasets: data is corrupted when input data is reordered
> 
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: ALL platforms on 1.6
>Reporter: Tim Preece
>  Labels: big-endian
>
> Testcase
> ---
> {code}
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.Dataset
> case class people(age: Int, name: String)
> object nameAgg extends Aggregator[people, String, String] {
>   def zero: String = ""
>   def reduce(b: String, a: people): String = a.name + b
>   def merge(b1: String, b2: String): String = b1 + b2
>   def finish(r: String): String = r
> }
> object DataSetAgg {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("DataSetAgg")
> val spark = new SparkContext(conf)
> val sqlContext = new SQLContext(spark)
> import sqlContext.implicits._
> val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS 
> name, 1279869254 AS age").as[people]
> peopleds.groupBy(_.age).agg(nameAgg.toColumn).show()
>   }
> }
> {code}
> Result ( on a Little Endian Platform )
> 
> {noformat}
> +--+--+
> |_1|_2|
> +--+--+
> |1279869254|FAILTi|
> +--+--+
> {noformat}
> Explanation
> ---
> Internally, the string field in the unsafe row is not updated after an 
> unsafe-row join operation.
> The displayed string is therefore corrupted and shows part of the integer 
> (interpreted as a string) along with "Ti"; a simplified sketch of this 
> failure mode appears after the quoted test output below.
> The column names also look different on a Little Endian platform.
> Result ( on a Big Endian Platform )
> {noformat}
> +--+--+
> | value|nameAgg$(name,age)|
> +--+--+
> |1279869254|LIAFTi|
> +--+--+
> {noformat}
> The following unit test also fails (but only explicitly on a Big Endian 
> platform)
> {code}
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
>   Results do not match for query:
>   == Parsed Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Analyzed Logical Plan ==
>   value: string, ClassInputAgg$(b,a): int
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Optimized Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Physical Plan ==
>   TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], 
> output=[value#748,ClassInputAgg$(b,a)#762])
>   +- TungstenExchange hashpartitioning(value#748,5), None
>  +- TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], 
> output=[value#748,value#758])
> +- !AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>+- Project [one AS b#650,1 AS a#651]
>   +- Scan OneRowRelation[]
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>   ![one,1][one,9] (QueryTest.scala:127)
> {code}
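To make the explanation quoted above concrete, here is a simplified, self-contained sketch of the failure mode. It is not Spark's real UnsafeRow code; the layout and names are illustrative. The idea is that a variable-length string field is stored as an (offset, length) pair pointing into the row's own byte buffer, so concatenating rows without rewriting the offset makes the reader decode the wrong bytes.

{code}
// Illustrative sketch only -- not the real UnsafeRow implementation.
import java.nio.charset.StandardCharsets.UTF_8

final case class SketchRow(buffer: Array[Byte], strOffset: Int, strLen: Int) {
  def readString: String = new String(buffer, strOffset, strLen, UTF_8)
}

object StaleOffsetDemo extends App {
  val name     = "Tim Preece".getBytes(UTF_8)
  val intBytes = java.nio.ByteBuffer.allocate(4).putInt(1279869254).array()

  // Row layout: 4 bytes of fixed-width data (the int) followed by the string bytes.
  val single = SketchRow(intBytes ++ name, strOffset = 4, strLen = name.length)
  println(single.readString)   // "Tim Preece" -- the offset correctly skips the int

  // "Join" the row behind a 4-byte prefix (e.g. a grouping key) but keep the
  // stale offset instead of shifting it by the prefix length: the reader now
  // decodes the integer bytes as text ("LIAF" here, since ByteBuffer is
  // big-endian by default) followed by only part of the name -- the same
  // flavour of corruption shown in the expected/actual results above.
  val prefix = Array[Byte](0, 0, 0, 0)
  val joined = SketchRow(prefix ++ single.buffer, strOffset = 4, strLen = name.length)
  println(joined.readString)
}
{code}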



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12555) Datasets: data is corrupted when input data is reordered

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188438#comment-15188438
 ] 

Apache Spark commented on SPARK-12555:
--

User 'lresende' has created a pull request for this issue:
https://github.com/apache/spark/pull/11623

> Datasets: data is corrupted when input data is reordered
> 
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: ALL platforms on 1.6
>Reporter: Tim Preece
>  Labels: big-endian
>
> Testcase
> ---
> {code}
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.Dataset
> case class people(age: Int, name: String)
> object nameAgg extends Aggregator[people, String, String] {
>   def zero: String = ""
>   def reduce(b: String, a: people): String = a.name + b
>   def merge(b1: String, b2: String): String = b1 + b2
>   def finish(r: String): String = r
> }
> object DataSetAgg {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("DataSetAgg")
> val spark = new SparkContext(conf)
> val sqlContext = new SQLContext(spark)
> import sqlContext.implicits._
> val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS 
> name, 1279869254 AS age").as[people]
> peopleds.groupBy(_.age).agg(nameAgg.toColumn).show()
>   }
> }
> {code}
> Result ( on a Little Endian Platform )
> 
> {noformat}
> +--+--+
> |_1|_2|
> +--+--+
> |1279869254|FAILTi|
> +--+--+
> {noformat}
> Explanation
> ---
> Internally, the string field in the unsafe row is not updated after an 
> unsafe-row join operation.
> The displayed string is therefore corrupted and shows part of the integer 
> (interpreted as a string) along with "Ti".
> The column names also look different on a Little Endian platform.
> Result ( on a Big Endian Platform )
> {noformat}
> +--+--+
> | value|nameAgg$(name,age)|
> +--+--+
> |1279869254|LIAFTi|
> +--+--+
> {noformat}
> The following unit test also fails (but only explicitly on a Big Endian 
> platform)
> {code}
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
>   Results do not match for query:
>   == Parsed Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Analyzed Logical Plan ==
>   value: string, ClassInputAgg$(b,a): int
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Optimized Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Physical Plan ==
>   TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], 
> output=[value#748,ClassInputAgg$(b,a)#762])
>   +- TungstenExchange hashpartitioning(value#748,5), None
>  +- TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], 
> output=[value#748,value#758])
> +- !AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>+- Project [one AS b#650,1 AS a#651]
>   +- Scan OneRowRelation[]
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>   ![one,1][one,9] (QueryTest.scala:127)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12555) Datasets: data is corrupted when input data is reordered

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12555:


Assignee: Apache Spark

> Datasets: data is corrupted when input data is reordered
> 
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: ALL platforms on 1.6
>Reporter: Tim Preece
>Assignee: Apache Spark
>  Labels: big-endian
>
> Testcase
> ---
> {code}
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.Dataset
> case class people(age: Int, name: String)
> object nameAgg extends Aggregator[people, String, String] {
>   def zero: String = ""
>   def reduce(b: String, a: people): String = a.name + b
>   def merge(b1: String, b2: String): String = b1 + b2
>   def finish(r: String): String = r
> }
> object DataSetAgg {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("DataSetAgg")
> val spark = new SparkContext(conf)
> val sqlContext = new SQLContext(spark)
> import sqlContext.implicits._
> val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS 
> name, 1279869254 AS age").as[people]
> peopleds.groupBy(_.age).agg(nameAgg.toColumn).show()
>   }
> }
> {code}
> Result ( on a Little Endian Platform )
> 
> {noformat}
> +--+--+
> |_1|_2|
> +--+--+
> |1279869254|FAILTi|
> +--+--+
> {noformat}
> Explanation
> ---
> Internally, the string field in the unsafe row is not updated after an 
> unsafe-row join operation.
> The displayed string is therefore corrupted and shows part of the integer 
> (interpreted as a string) along with "Ti".
> The column names also look different on a Little Endian platform.
> Result ( on a Big Endian Platform )
> {noformat}
> +--+--+
> | value|nameAgg$(name,age)|
> +--+--+
> |1279869254|LIAFTi|
> +--+--+
> {noformat}
> The following unit test also fails (but only explicitly on a Big Endian 
> platform)
> {code}
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
>   Results do not match for query:
>   == Parsed Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Analyzed Logical Plan ==
>   value: string, ClassInputAgg$(b,a): int
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Optimized Logical Plan ==
>   Aggregate [value#748], 
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS 
> ClassInputAgg$(b,a)#762]
>   +- AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>  +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>   
>   == Physical Plan ==
>   TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], 
> output=[value#748,ClassInputAgg$(b,a)#762])
>   +- TungstenExchange hashpartitioning(value#748,5), None
>  +- TungstenAggregate(key=[value#748], 
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], 
> output=[value#748,value#758])
> +- !AppendColumns , class[a[0]: int, b[0]: string], 
> class[value[0]: string], [value#748]
>+- Project [one AS b#650,1 AS a#651]
>   +- Scan OneRowRelation[]
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
>   ![one,1][one,9] (QueryTest.scala:127)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13787) Feature importances for decision trees in Python

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13787:


Assignee: (was: Apache Spark)

> Feature importances for decision trees in Python
> 
>
> Key: SPARK-13787
> URL: https://issues.apache.org/jira/browse/SPARK-13787
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> Expose feature importances for pyspark.ml DecisionTreeClassifier, 
> DecisionTreeRegressor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13787) Feature importances for decision trees in Python

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188414#comment-15188414
 ] 

Apache Spark commented on SPARK-13787:
--

User 'sethah' has created a pull request for this issue:
https://github.com/apache/spark/pull/11622

> Feature importances for decision trees in Python
> 
>
> Key: SPARK-13787
> URL: https://issues.apache.org/jira/browse/SPARK-13787
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> Expose feature importances for pyspark.ml DecisionTreeClassifier, 
> DecisionTreeRegressor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13787) Feature importances for decision trees in Python

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13787:


Assignee: Apache Spark

> Feature importances for decision trees in Python
> 
>
> Key: SPARK-13787
> URL: https://issues.apache.org/jira/browse/SPARK-13787
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> Expose feature importances for pyspark.ml DecisionTreeClassifier, 
> DecisionTreeRegressor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13311) prettyString of IN is not good

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188412#comment-15188412
 ] 

Apache Spark commented on SPARK-13311:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/11514

> prettyString of IN is not good
> --
>
> Key: SPARK-13311
> URL: https://issues.apache.org/jira/browse/SPARK-13311
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Priority: Minor
>
> In(i_class,[Ljava.lang.Object;@1a575883))
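A short, hedged sketch of what is going wrong and what a readable rendering could look like (the column name and values below are made up): JVM arrays do not override toString, so interpolating the values array directly produces the identity string shown in the description, whereas rendering the elements explicitly gives a conventional IN predicate.

{code}
object PrettyInExample extends App {
  val column = "i_class"
  val values: Array[Any] = Array("Books", "Music", "Home")

  // Arrays use the default Object.toString, so this prints something like
  // In(i_class,[Ljava.lang.Object;@1a575883) -- the problem described above.
  val bad = s"In($column,$values)"

  // Rendering the elements explicitly gives a readable predicate instead.
  val good = s"$column IN (${values.mkString(", ")})"   // i_class IN (Books, Music, Home)

  println(bad)
  println(good)
}
{code}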



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.

2016-03-09 Thread zhengcanbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengcanbin updated SPARK-13125:

Attachment: 13134.patch

> makes the ratio of KafkaRDD partition to kafka topic partition  configurable.
> -
>
> Key: SPARK-13125
> URL: https://issues.apache.org/jira/browse/SPARK-13125
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 1.6.1
>Reporter: zhengcanbin
>  Labels: features
> Attachments: 13134.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Currently each Kafka topic partition corresponds to exactly one RDD 
> partition. In some cases it is necessary to make this configurable, i.e. to 
> expose the ratio of RDD partitions to Kafka topic partitions as a setting.
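A minimal sketch of the idea, assuming the ratio is exposed as a simple multiplier (the helper names are illustrative, not the attached patch): each Kafka topic-partition's offset range is split into consecutive sub-ranges, so one Kafka partition can feed several RDD partitions.

{code}
// Illustrative sketch only -- not the attached patch.
final case class OffsetRange(topic: String, partition: Int, from: Long, until: Long)

object SplitOffsetRanges extends App {
  // Split one Kafka partition's offset range into `multiplier` sub-ranges.
  def split(range: OffsetRange, multiplier: Int): Seq[OffsetRange] = {
    val total = range.until - range.from
    val step  = math.max(1L, math.ceil(total.toDouble / multiplier).toLong)
    (range.from until range.until by step).map { start =>
      range.copy(from = start, until = math.min(start + step, range.until))
    }
  }

  // With a multiplier of 3, offsets [0, 10) become [0,4), [4,8), [8,10),
  // i.e. three RDD partitions backed by the same Kafka topic-partition.
  println(split(OffsetRange("events", partition = 0, from = 0L, until = 10L), multiplier = 3))
}
{code}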



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13134) add 'spark.streaming.kafka.partition.multiplier' into SparkConf

2016-03-09 Thread zhengcanbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengcanbin updated SPARK-13134:

Attachment: 13134.patch

> add 'spark.streaming.kafka.partition.multiplier' into SparkConf
> ---
>
> Key: SPARK-13134
> URL: https://issues.apache.org/jira/browse/SPARK-13134
> Project: Spark
>  Issue Type: Sub-task
>  Components: Input/Output
>Affects Versions: 1.6.1
>Reporter: zhengcanbin
> Attachments: 13134.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-09 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188358#comment-15188358
 ] 

yuhao yang commented on SPARK-13783:


I'm interested. 

> Model export/import for spark.ml: GBTs
> --
>
> Key: SPARK-13783
> URL: https://issues.apache.org/jira/browse/SPARK-13783
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor.  The implementation 
> should reuse the one for DecisionTree*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13430:


Assignee: Apache Spark

> Expose ml summary function in PySpark for classification and regression models
> --
>
> Key: SPARK-13430
> URL: https://issues.apache.org/jira/browse/SPARK-13430
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Reporter: Shubhanshu Mishra
>Assignee: Apache Spark
>  Labels: classification, java, ml, mllib, pyspark, regression, 
> scala, sparkr
>
> I think the model summary interface, which is available in Spark's Scala, 
> Java, and R APIs, should also be available in the Python API.
> Similar to SPARK-11494:
> https://issues.apache.org/jira/browse/SPARK-11494



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188336#comment-15188336
 ] 

Apache Spark commented on SPARK-13430:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/11621

> Expose ml summary function in PySpark for classification and regression models
> --
>
> Key: SPARK-13430
> URL: https://issues.apache.org/jira/browse/SPARK-13430
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Reporter: Shubhanshu Mishra
>  Labels: classification, java, ml, mllib, pyspark, regression, 
> scala, sparkr
>
> I think the model summary interface, which is available in Spark's Scala, 
> Java, and R APIs, should also be available in the Python API.
> Similar to SPARK-11494:
> https://issues.apache.org/jira/browse/SPARK-11494



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13430:


Assignee: (was: Apache Spark)

> Expose ml summary function in PySpark for classification and regression models
> --
>
> Key: SPARK-13430
> URL: https://issues.apache.org/jira/browse/SPARK-13430
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Reporter: Shubhanshu Mishra
>  Labels: classification, java, ml, mllib, pyspark, regression, 
> scala, sparkr
>
> I think the model summary interface, which is available in Spark's Scala, 
> Java, and R APIs, should also be available in the Python API.
> Similar to SPARK-11494:
> https://issues.apache.org/jira/browse/SPARK-11494



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13761) Deprecate validateParams

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13761:


Assignee: Apache Spark

> Deprecate validateParams
> 
>
> Key: SPARK-13761
> URL: https://issues.apache.org/jira/browse/SPARK-13761
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> Deprecate validateParams() method here: 
> [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553]
> Move all functionality in overridden methods to transformSchema().
> Check docs to make sure they indicate complex Param interaction checks should 
> be done in transformSchema.
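A simplified, self-contained sketch of the refactor being requested (these are not the real spark.ml traits; the names are illustrative): a cross-Param check that previously lived in a validateParams() override moves into transformSchema(), which already runs before any fit/transform work.

{code}
// Illustrative sketch only -- not the real spark.ml Params/PipelineStage API.
final case class Schema(fieldNames: Seq[String])

trait ParamChecks {
  def inputCol: String
  def outputCol: String

  // Before: complex Param interaction checks lived here; now deprecated.
  @deprecated("move checks into transformSchema", "2.0")
  def validateParams(): Unit = ()

  // After: the same check runs as part of schema validation.
  def transformSchema(schema: Schema): Schema = {
    require(inputCol != outputCol, "inputCol and outputCol must differ")
    require(schema.fieldNames.contains(inputCol), s"missing column: $inputCol")
    Schema(schema.fieldNames :+ outputCol)
  }
}

object ParamChecksDemo extends App with ParamChecks {
  val inputCol  = "features"
  val outputCol = "prediction"
  println(transformSchema(Schema(Seq("features", "label"))))   // Schema(List(features, label, prediction))
}
{code}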



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13761) Deprecate validateParams

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188329#comment-15188329
 ] 

Apache Spark commented on SPARK-13761:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/11620

> Deprecate validateParams
> 
>
> Key: SPARK-13761
> URL: https://issues.apache.org/jira/browse/SPARK-13761
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Deprecate validateParams() method here: 
> [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553]
> Move all functionality in overridden methods to transformSchema().
> Check docs to make sure they indicate complex Param interaction checks should 
> be done in transformSchema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13761) Deprecate validateParams

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13761:


Assignee: (was: Apache Spark)

> Deprecate validateParams
> 
>
> Key: SPARK-13761
> URL: https://issues.apache.org/jira/browse/SPARK-13761
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Deprecate validateParams() method here: 
> [https://github.com/apache/spark/blob/035d3acdf3c1be5b309a861d5c5beb803b946b5e/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L553]
> Move all functionality in overridden methods to transformSchema().
> Check docs to make sure they indicate complex Param interaction checks should 
> be done in transformSchema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13068) Extend pyspark ml paramtype conversion to support lists

2016-03-09 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188319#comment-15188319
 ] 

Joseph K. Bradley commented on SPARK-13068:
---

You're right that the current implementation would not support nested types 
well. But I don't think we need full-blown ParamValidators; we really need a 
separate concept in Python from the one in Scala: Python needs conversion, 
whereas Scala can handle validation.

What if, instead of expectedType, we used a new field, "typeConverter"? It 
could be given as an argument to Param and used where expectedType is 
currently used.

We could deprecate expectedType for 2.0 and remove it in 2.1.

How does that sound?
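To illustrate the proposal (sketched here in Scala for consistency with the other examples, although the change targets the Python Param class; all names are illustrative): a Param carries a typeConverter function that is applied when a value is set, which naturally handles nested types such as lists of numbers, something a single expectedType cannot express.

{code}
// Illustrative sketch of the typeConverter idea -- not the PySpark Param API.
final case class Param[T](name: String, typeConverter: Any => T)

object TypeConverterSketch extends App {
  // A converter normalizes flexible user input, e.g. any numeric sequence
  // into a List[Double] -- the nested-type case that expectedType cannot cover.
  val toDoubleList: Any => List[Double] = {
    case xs: Seq[_] => xs.map {
      case n: Number => n.doubleValue()
      case other     => throw new IllegalArgumentException(s"not numeric: $other")
    }.toList
    case other => throw new IllegalArgumentException(s"not a sequence: $other")
  }

  val thresholds = Param("thresholds", toDoubleList)
  println(thresholds.typeConverter(Seq(1, 2.5f, 3L)))   // List(1.0, 2.5, 3.0)
}
{code}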

> Extend pyspark ml paramtype conversion to support lists
> ---
>
> Key: SPARK-13068
> URL: https://issues.apache.org/jira/browse/SPARK-13068
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> In SPARK-7675 we added type conversion for PySpark ML params. We should 
> follow up and support param type conversion for lists and nested structures 
> as required. This currently blocks giving all PySpark ML params type information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13789) Infer additional constraints from attribute equality

2016-03-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188307#comment-15188307
 ] 

Apache Spark commented on SPARK-13789:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/11618

> Infer additional constraints from attribute equality
> 
>
> Key: SPARK-13789
> URL: https://issues.apache.org/jira/browse/SPARK-13789
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Sameer Agarwal
>
> We should be able to infer an additional set of data constraints based on 
> attribute equality. For example, if an operator has constraints of the form 
> (`a = 5`, `a = b`), we should be able to infer the additional constraint 
> `b = 5`.
> cc [~nongli]
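A minimal sketch of the inference rule described above (this is not the Catalyst implementation; the expression classes are stand-ins): for every equality between two attributes, rewrite each other known constraint with one attribute substituted for the other and keep any new predicates produced.

{code}
// Illustrative sketch only -- not Catalyst's Expression/QueryPlan code.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int)    extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object InferConstraints extends App {
  // Replace every occurrence of `from` with `to` inside an expression tree.
  def substitute(e: Expr, from: Attr, to: Attr): Expr = e match {
    case `from`        => to
    case EqualTo(l, r) => EqualTo(substitute(l, from, to), substitute(r, from, to))
    case other         => other
  }

  // For each attribute equality, rewrite all other constraints in both directions.
  def inferFromEquality(constraints: Set[Expr]): Set[Expr] = {
    val inferred = for {
      EqualTo(a: Attr, b: Attr) <- constraints
      c         <- constraints
      rewritten <- Seq(substitute(c, a, b), substitute(c, b, a))
    } yield rewritten
    constraints ++ inferred
  }

  // From (a = 5, a = b) we obtain b = 5 (trivial results like b = b would be
  // pruned by a real implementation).
  val input = Set[Expr](EqualTo(Attr("a"), Lit(5)), EqualTo(Attr("a"), Attr("b")))
  println(inferFromEquality(input))
}
{code}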



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13789) Infer additional constraints from attribute equality

2016-03-09 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13789:


Assignee: (was: Apache Spark)

> Infer additional constraints from attribute equality
> 
>
> Key: SPARK-13789
> URL: https://issues.apache.org/jira/browse/SPARK-13789
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Sameer Agarwal
>
> We should be able to infer an additional set of data constraints based on 
> attribute equality. For example, if an operator has constraints of the form 
> (`a = 5`, `a = b`), we should be able to infer the additional constraint 
> `b = 5`.
> cc [~nongli]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


