[jira] [Resolved] (SPARK-14595) Add inputMetrics to FileScanRDD

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14595.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Add inputMetrics to FileScanRDD
> ---
>
> Key: SPARK-14595
> URL: https://issues.apache.org/jira/browse/SPARK-14595
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Commented] (SPARK-12457) Add ExpressionDescription to collection functions

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247221#comment-15247221
 ] 

Apache Spark commented on SPARK-12457:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/12492

> Add ExpressionDescription to collection functions
> -
>
> Key: SPARK-12457
> URL: https://issues.apache.org/jira/browse/SPARK-12457
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>







[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table

2016-04-18 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247161#comment-15247161
 ] 

Adrian Wang commented on SPARK-14126:
-

Yes, still working on it.

> [Table related commands] Truncate table
> ---
>
> Key: SPARK-14126
> URL: https://issues.apache.org/jira/browse/SPARK-14126
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>
> TOK_TRUNCATETABLE
> We also need to check the behavior of Hive when we call truncate table on a 
> partitioned table.






[jira] [Assigned] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14712:


Assignee: Apache Spark

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.
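> A minimal sketch of what the Scala side might look like (illustrative only;
> it assumes the model's existing numClasses and numFeatures members):
> {code}
> // illustrative sketch, not the merged implementation
> override def toString: String = {
>   s"${super.toString}, numClasses = $numClasses, numFeatures = $numFeatures"
> }
> {code}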






[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247159#comment-15247159
 ] 

Apache Spark commented on SPARK-14712:
--

User 'hujy' has created a pull request for this issue:
https://github.com/apache/spark/pull/12491

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.






[jira] [Assigned] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14712:


Assignee: (was: Apache Spark)

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.






[jira] [Comment Edited] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247141#comment-15247141
 ] 

hujiayin edited comment on SPARK-14712 at 4/19/16 3:59 AM:
---

Hi Gayathri, I think self already has numFeatures and numClasses defined, and 
I can submit code for this issue.


was (Author: hujiayin):
Hi Murali, I think self already has numFeatures and numClasses defined, and 
I can submit code for this issue.

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.






[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247141#comment-15247141
 ] 

hujiayin commented on SPARK-14712:
--

Hi Murali, I think self already has numFeatures and numClasses defined, and 
I can submit code for this issue.

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.






[jira] [Commented] (SPARK-14687) Call path.getFileSystem(conf) instead of call FileSystem.get(conf)

2016-04-18 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247128#comment-15247128
 ] 

Liwei Lin commented on SPARK-14687:
---

Updated with problem details. Thanks for the reminder! :-)

> Call path.getFileSystem(conf) instead of call FileSystem.get(conf)
> --
>
> Key: SPARK-14687
> URL: https://issues.apache.org/jira/browse/SPARK-14687
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core, SQL
>Affects Versions: 2.0.0
>Reporter: Liwei Lin
>Priority: Minor
>
> Generally we should call path.getFileSystem(conf) instead of calling 
> FileSystem.get(conf), because the latter is actually resolved against the 
> DEFAULT_URI (fs.defaultFS), leading to problems in certain situations (see 
> the sketch after this list):
> - if {{fs.defaultFS}} is {{hdfs://clusterA/...}} but the path is 
> {{hdfs://clusterB/...}}, then we'll encounter 
> {{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., 
> expected: hdfs://clusterA/...)}}
> - if {{fs.defaultFS}} is not specified, the scheme will default to 
> {{file:///}}, and we'll encounter {{java.lang.IllegalArgumentException 
> (Wrong FS: hdfs://..., expected: file:///)}}
> - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} 
> (which is used for federated HDFS), then we'll encounter 
> {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: 
> viewfs:///)}}
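> A minimal sketch of the difference (Hadoop FileSystem API; the path and 
> cluster names are illustrative):
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> val conf = new Configuration()
> val path = new Path("hdfs://clusterB/user/data")
>
> // resolved from the path's own scheme and authority; always matches the path
> val fsForPath = path.getFileSystem(conf)
>
> // resolved from fs.defaultFS; may point at an entirely different filesystem
> val fsDefault = FileSystem.get(conf)
> {code}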






[jira] [Updated] (SPARK-14687) Call path.getFileSystem(conf) instead of call FileSystem.get(conf)

2016-04-18 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated SPARK-14687:
--
Description: 
Generally we should call path.getFileSystem(conf) instead of calling 
FileSystem.get(conf), because the latter is actually resolved against the 
DEFAULT_URI (fs.defaultFS), leading to problems in certain situations:
- if {{fs.defaultFS}} is {{hdfs://clusterA/...}} but the path is 
{{hdfs://clusterB/...}}, then we'll encounter 
{{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., expected: 
hdfs://clusterA/...)}}
- if {{fs.defaultFS}} is not specified, the scheme will default to 
{{file:///}}, and we'll encounter {{java.lang.IllegalArgumentException (Wrong 
FS: hdfs://..., expected: file:///)}}
- if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which 
is used for federated HDFS), then we'll encounter 
{{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: 
viewfs:///)}}


  was:
Generally we should call path.getFileSystem(conf) instead of calling 
FileSystem.get(conf), because the latter is actually resolved against the 
DEFAULT_URI (fs.defaultFS), leading to problems in certain situations:
- if {{fs.defaultFS}} is not specified, the scheme will default to {{file:///}}
- if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which 
is used for federated HDFS)
- if {{fs.defaultFS}} is {{hdfs://A/...}}, but the path is {{hdfs://B/...}}


> Call path.getFileSystem(conf) instead of call FileSystem.get(conf)
> --
>
> Key: SPARK-14687
> URL: https://issues.apache.org/jira/browse/SPARK-14687
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core, SQL
>Affects Versions: 2.0.0
>Reporter: Liwei Lin
>Priority: Minor
>
> Generally we should call path.getFileSystem(conf) instead of calling 
> FileSystem.get(conf), because the latter is actually resolved against the 
> DEFAULT_URI (fs.defaultFS), leading to problems in certain situations:
> - if {{fs.defaultFS}} is {{hdfs://clusterA/...}} but the path is 
> {{hdfs://clusterB/...}}, then we'll encounter 
> {{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., 
> expected: hdfs://clusterA/...)}}
> - if {{fs.defaultFS}} is not specified, the scheme will default to 
> {{file:///}}, and we'll encounter {{java.lang.IllegalArgumentException 
> (Wrong FS: hdfs://..., expected: file:///)}}
> - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} 
> (which is used for federated HDFS), then we'll encounter 
> {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: 
> viewfs:///)}}






[jira] [Assigned] (SPARK-14724) Improve performance of sorting by using radix sort when possible

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14724:


Assignee: (was: Apache Spark)

> Improve performance of sorting by using radix sort when possible
> 
>
> Key: SPARK-14724
> URL: https://issues.apache.org/jira/browse/SPARK-14724
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Eric Liang
>
> Spark currently uses TimSort for all in-memory sorts, including sorts done 
> for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. 
> sorting by integer keys).






[jira] [Commented] (SPARK-14724) Improve performance of sorting by using radix sort when possible

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247111#comment-15247111
 ] 

Apache Spark commented on SPARK-14724:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/12490

> Improve performance of sorting by using radix sort when possible
> 
>
> Key: SPARK-14724
> URL: https://issues.apache.org/jira/browse/SPARK-14724
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Eric Liang
>
> Spark currently uses TimSort for all in-memory sorts, including sorts done 
> for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. 
> sorting by integer keys).






[jira] [Assigned] (SPARK-14724) Improve performance of sorting by using radix sort when possible

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14724:


Assignee: Apache Spark

> Improve performance of sorting by using radix sort when possible
> 
>
> Key: SPARK-14724
> URL: https://issues.apache.org/jira/browse/SPARK-14724
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Eric Liang
>Assignee: Apache Spark
>
> Spark currently uses TimSort for all in-memory sorts, including sorts done 
> for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. 
> sorting by integer keys).






[jira] [Resolved] (SPARK-13904) Add support for pluggable cluster manager

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13904.
-
   Resolution: Fixed
 Assignee: Hemant Bhanawat
Fix Version/s: 2.0.0

> Add support for pluggable cluster manager
> -
>
> Key: SPARK-13904
> URL: https://issues.apache.org/jira/browse/SPARK-13904
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Hemant Bhanawat
>Assignee: Hemant Bhanawat
> Fix For: 2.0.0
>
>
> Currently Spark allows only a few cluster managers, viz. YARN, Mesos and 
> Standalone. But as Spark is now being used in newer and different use cases, 
> there is a need to allow other cluster managers to manage Spark components. 
> One such use case is embedding Spark components like the executor and driver 
> inside another process, which may be a datastore. This allows colocation of 
> data and processing. Another requirement that stems from such a use case is 
> that the executors/driver should not take the parent process down when they 
> go down, and that the components can be relaunched inside the same process 
> again. 
> So, this JIRA requests two functionalities (a rough interface sketch follows 
> this list):
> 1. Support for external cluster managers.
> 2. Allow a cluster manager to clean up its tasks without taking the parent 
> process down. 
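> One possible shape for such a plugin interface (names and signatures are 
> illustrative, not necessarily the final API):
> {code}
> // illustrative sketch of a pluggable cluster manager SPI
> trait ExternalClusterManager {
>   // whether this manager can handle the given master URL
>   def canCreate(masterURL: String): Boolean
>   def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler
>   def createSchedulerBackend(sc: SparkContext, masterURL: String,
>       scheduler: TaskScheduler): SchedulerBackend
>   // wire the scheduler and backend together after creation
>   def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit
> }
> {code}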






[jira] [Commented] (SPARK-13904) Add support for pluggable cluster manager

2016-04-18 Thread Hemant Bhanawat (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247096#comment-15247096
 ] 

Hemant Bhanawat commented on SPARK-13904:
-

[~kiszk] Since the builds are passing now, can I assume that it was some 
sporadic issue and close this JIRA?

> Add support for pluggable cluster manager
> -
>
> Key: SPARK-13904
> URL: https://issues.apache.org/jira/browse/SPARK-13904
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Hemant Bhanawat
>
> Currently Spark allows only a few cluster managers, viz. YARN, Mesos and 
> Standalone. But as Spark is now being used in newer and different use cases, 
> there is a need to allow other cluster managers to manage Spark components. 
> One such use case is embedding Spark components like the executor and driver 
> inside another process, which may be a datastore. This allows colocation of 
> data and processing. Another requirement that stems from such a use case is 
> that the executors/driver should not take the parent process down when they 
> go down, and that the components can be relaunched inside the same process 
> again. 
> So, this JIRA requests two functionalities:
> 1. Support for external cluster managers.
> 2. Allow a cluster manager to clean up its tasks without taking the parent 
> process down. 






[jira] [Created] (SPARK-14724) Improve performance of sorting by using radix sort when possible

2016-04-18 Thread Eric Liang (JIRA)
Eric Liang created SPARK-14724:
--

 Summary: Improve performance of sorting by using radix sort when 
possible
 Key: SPARK-14724
 URL: https://issues.apache.org/jira/browse/SPARK-14724
 Project: Spark
  Issue Type: Improvement
Reporter: Eric Liang


Spark currently uses TimSort for all in-memory sorts, including sorts done for 
shuffle. One low-hanging fruit is to use radix sort when possible (e.g. sorting 
by integer keys).
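
As an illustration of the idea (a minimal LSD radix sort over non-negative Int 
keys; this is a sketch, not Spark's implementation):

{code}
// Sorts non-negative Int keys one byte at a time, least significant first.
def radixSort(a: Array[Int]): Array[Int] = {
  var src = a.clone()
  var dst = new Array[Int](a.length)
  var shift = 0
  while (shift < 32) {
    val counts = new Array[Int](257)
    var i = 0
    while (i < src.length) {        // histogram of the current byte
      counts(((src(i) >>> shift) & 0xFF) + 1) += 1
      i += 1
    }
    i = 1
    while (i < counts.length) {     // prefix sums give output offsets
      counts(i) += counts(i - 1)
      i += 1
    }
    i = 0
    while (i < src.length) {        // stable scatter into the other buffer
      val b = (src(i) >>> shift) & 0xFF
      dst(counts(b)) = src(i)
      counts(b) += 1
      i += 1
    }
    val tmp = src; src = dst; dst = tmp
    shift += 8
  }
  src                               // after 4 passes this holds the result
}
{code}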






[jira] [Updated] (SPARK-14724) Improve performance of sorting by using radix sort when possible

2016-04-18 Thread Eric Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Liang updated SPARK-14724:
---
Component/s: Spark Core

> Improve performance of sorting by using radix sort when possible
> 
>
> Key: SPARK-14724
> URL: https://issues.apache.org/jira/browse/SPARK-14724
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Eric Liang
>
> Spark currently uses TimSort for all in-memory sorts, including sorts done 
> for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. 
> sorting by integer keys).






[jira] [Resolved] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14722.
-
   Resolution: Fixed
 Assignee: Sameer Agarwal
Fix Version/s: 2.0.0

> Rename upstreams() -> inputRDDs() in WholeStageCodegen
> --
>
> Key: SPARK-14722
> URL: https://issues.apache.org/jira/browse/SPARK-14722
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-14718) Avoid mutating ExprCode in doGenCode

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14718.
-
   Resolution: Fixed
 Assignee: Sameer Agarwal
Fix Version/s: 2.0.0

> Avoid mutating ExprCode in doGenCode
> 
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.0.0
>
>
> The `doGenCode` method currently takes in an ExprCode, mutates it, and 
> returns the Java code to evaluate the given expression. It should instead 
> just return a new ExprCode to avoid passing around mutable objects.
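> A minimal sketch of the signature change (the placeholder types stand in 
> for Spark's codegen classes; the merged change may differ in detail):
> {code}
> // illustrative placeholders only, to show the change in isolation
> class CodegenContext
> case class ExprCode(code: String, isNull: String, value: String)
>
> trait Before { // old style: mutates `ev`, returns generated Java source
>   def doGenCode(ctx: CodegenContext, ev: ExprCode): String
> }
> trait After {  // new style: returns a fresh ExprCode, mutates nothing
>   def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
> }
> {code}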






[jira] [Comment Edited] (SPARK-14709) spark.ml API for linear SVM

2016-04-18 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882
 ] 

yuhao yang edited comment on SPARK-14709 at 4/19/16 3:23 AM:
-

I'll start on this to give a quick prototype first. If time allows, I'm also 
thinking we should try with SMO.


was (Author: yuhaoyan):
I'll start on this to give a quick prototype first.

> spark.ml API for linear SVM
> ---
>
> Key: SPARK-14709
> URL: https://issues.apache.org/jira/browse/SPARK-14709
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>
> Provide an API for the SVM algorithm for DataFrames. I would recommend using 
> OWL-QN, rather than wrapping spark.mllib's SGD-based implementation.
> The API should mimic the existing spark.ml.classification APIs.






[jira] [Assigned] (SPARK-14701) checkpointWriter is stopped before eventLoop. Hence rejectedExecution exception is coming in StreamingContext.stop

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14701:


Assignee: (was: Apache Spark)

> checkpointWriter is stopped before eventLoop. Hence rejectedExecution 
> exception is coming in StreamingContext.stop
> --
>
> Key: SPARK-14701
> URL: https://issues.apache.org/jira/browse/SPARK-14701
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.1
> Environment: Windows, local[*] mode as well as  Redhat Linux , Yarn 
> Cluster
>Reporter: Sreelal S L
>Priority: Minor
>
> In org.apache.spark.streaming.scheduler.JobGenerator.stop(), 
> checkpointWriter.stop is called before eventLoop.stop.
> If I call streamingContext.stop when a batch is about to complete (I'm 
> invoking it from a StreamingListener.onBatchCompleted callback), a 
> RejectedExecutionException may get thrown from checkpointWriter.executor, 
> since the eventLoop will try to process DoCheckpoint events even after 
> checkpointWriter.executor was stopped.
> 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to 
> the thread pool executor
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 
> rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I think the order of stopping should be changed.
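> A minimal sketch of the reordered shutdown (illustrative only; eventLoop 
> and checkpointWriter are the JobGenerator's existing members):
> {code}
> // stop the event loop first so no further DoCheckpoint events can reach
> // the checkpoint writer's executor after it has been shut down
> def stop(): Unit = synchronized {
>   eventLoop.stop()
>   checkpointWriter.stop()
> }
> {code}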






[jira] [Assigned] (SPARK-14701) checkpointWriter is stopped before eventLoop. Hence rejectedExecution exception is coming in StreamingContext.stop

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14701:


Assignee: Apache Spark

> checkpointWriter is stopped before eventLoop. Hence rejectedExecution 
> exception is coming in StreamingContext.stop
> --
>
> Key: SPARK-14701
> URL: https://issues.apache.org/jira/browse/SPARK-14701
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.1
> Environment: Windows, local[*] mode as well as  Redhat Linux , Yarn 
> Cluster
>Reporter: Sreelal S L
>Assignee: Apache Spark
>Priority: Minor
>
> In org.apache.spark.streaming.scheduler.JobGenerator.stop(), 
> checkpointWriter.stop is called before eventLoop.stop.
> If I call streamingContext.stop when a batch is about to complete (I'm 
> invoking it from a StreamingListener.onBatchCompleted callback), a 
> RejectedExecutionException may get thrown from checkpointWriter.executor, 
> since the eventLoop will try to process DoCheckpoint events even after 
> checkpointWriter.executor was stopped.
> 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to 
> the thread pool executor
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 
> rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I think the order of stopping should be changed.






[jira] [Commented] (SPARK-14701) checkpointWriter is stopped before eventLoop. Hence rejectedExecution exception is coming in StreamingContext.stop

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247080#comment-15247080
 ] 

Apache Spark commented on SPARK-14701:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12489

> checkpointWriter is stopped before eventLoop. Hence rejectedExecution 
> exception is coming in StreamingContext.stop
> --
>
> Key: SPARK-14701
> URL: https://issues.apache.org/jira/browse/SPARK-14701
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.1
> Environment: Windows, local[*] mode as well as  Redhat Linux , Yarn 
> Cluster
>Reporter: Sreelal S L
>Priority: Minor
>
> In org.apache.spark.streaming.scheduler.JobGenerator.stop(), 
> checkpointWriter.stop is called before eventLoop.stop.
> If I call streamingContext.stop when a batch is about to complete (I'm 
> invoking it from a StreamingListener.onBatchCompleted callback), a 
> RejectedExecutionException may get thrown from checkpointWriter.executor, 
> since the eventLoop will try to process DoCheckpoint events even after 
> checkpointWriter.executor was stopped.
> 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to 
> the thread pool executor
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 
> rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
>   at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> I think the order of stopping should be changed.






[jira] [Updated] (SPARK-14723) A new way to support dynamic allocation in Spark Streaming

2016-04-18 Thread WilliamZhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WilliamZhu updated SPARK-14723:
---
Attachment: spark-streaming-dynamic-allocation-desigh.pdf

> A new way to support dynamic allocation in Spark Streaming
> --
>
> Key: SPARK-14723
> URL: https://issues.apache.org/jira/browse/SPARK-14723
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Streaming
>Reporter: WilliamZhu
>  Labels: features
> Fix For: 2.1.0
>
> Attachments: spark-streaming-dynamic-allocation-desigh.pdf
>
>
> Provide a more powerful algorithm to support dynamic allocation in Spark 
> Streaming.
> More details: http://www.jianshu.com/p/ae7fdd4746f6






[jira] [Created] (SPARK-14723) A new way to support dynamic allocation in Spark Streaming

2016-04-18 Thread WilliamZhu (JIRA)
WilliamZhu created SPARK-14723:
--

 Summary: A new way to support dynamic allocation in Spark Streaming
 Key: SPARK-14723
 URL: https://issues.apache.org/jira/browse/SPARK-14723
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Streaming
Reporter: WilliamZhu
 Fix For: 2.1.0


Provide a more powerful algorithm to support dynamic allocation in Spark 
Streaming.

More details: http://www.jianshu.com/p/ae7fdd4746f6






[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR

2016-04-18 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247047#comment-15247047
 ] 

Sun Rui commented on SPARK-12922:
-

[~Narine],
1. Typically users don't care about the number of partitions in Spark SQL. If 
they do care, they can tune it by setting “spark.sql.shuffle.partitions”. It 
seems unrelated to the implementation of gapply?
2. I think we need to support groupBy instead of groupByKey for DataFrame. For 
groupBy, users can specify multiple key columns at once, so a list should be 
used to hold the key columns.

FYI, I have basically implemented dapply() and am debugging it.

> Implement gapply() on DataFrame in SparkR
> -
>
> Key: SPARK-12922
> URL: https://issues.apache.org/jira/browse/SPARK-12922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Sun Rui
>
> gapply() applies an R function on groups grouped by one or more columns of a 
> DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() 
> in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
> {code}
> R function input: grouping keys value, a local data.frame of this grouped 
> data 
> R function output: local data.frame
> Schema specifies the Row format of the output of the R function. It must 
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported, user 
> could do map-side combination via dapply().






[jira] [Resolved] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-14719.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
> 
>
> Key: SPARK-14719
> URL: https://issues.apache.org/jira/browse/SPARK-14719
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if 
> BlockManager puts fail, even though those puts are only performed as a 
> performance optimization. Instead, it should log and ignore exceptions 
> originating from the block manager put.
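> A minimal sketch of the intended behavior (illustrative; blockManager, 
> blockId, iterator and level stand in for the handler's actual fields):
> {code}
> import scala.util.control.NonFatal
>
> // The WAL write already guarantees durability, so a failed BlockManager
> // put is only a lost optimization: log it and move on.
> try {
>   blockManager.putIterator(blockId, iterator, level)
> } catch {
>   case NonFatal(e) =>
>     logWarning(s"Could not store $blockId into the BlockManager", e)
> }
> {code}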






[jira] [Resolved] (SPARK-14667) Remove HashShuffleManager

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14667.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Remove HashShuffleManager
> -
>
> Key: SPARK-14667
> URL: https://issues.apache.org/jira/browse/SPARK-14667
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> The sort shuffle manager has been the default since Spark 1.2. It is time to 
> remove the old hash shuffle manager.






[jira] [Commented] (SPARK-12072) python dataframe ._jdf.schema().json() breaks on large metadata dataframes

2016-04-18 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247017#comment-15247017
 ] 

holdenk commented on SPARK-12072:
-

Any results yet?

> python dataframe ._jdf.schema().json() breaks on large metadata dataframes
> --
>
> Key: SPARK-12072
> URL: https://issues.apache.org/jira/browse/SPARK-12072
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.2
>Reporter: Rares Mirica
>
> When a dataframe contains a column with a large number of values in ml_attr, 
> schema evaluation will routinely fail on getting the schema as JSON; this 
> will, in turn, cause a bunch of problems with, e.g., calling UDFs, because 
> accessing columns relies on 
> _parse_datatype_json_string(self._jdf.schema().json())






[jira] [Resolved] (SPARK-13227) Risky apply() in OpenHashMap

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13227.
-
   Resolution: Fixed
 Assignee: Nan Zhu
Fix Version/s: 2.0.0

> Risky apply() in OpenHashMap
> 
>
> Key: SPARK-13227
> URL: https://issues.apache.org/jira/browse/SPARK-13227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Nan Zhu
>Assignee: Nan Zhu
>Priority: Minor
> Fix For: 2.0.0
>
>
> It might confuse future developers when they use OpenHashMap.apply() with a 
> numeric value type:
> null.asInstanceOf[Int], null.asInstanceOf[Long], null.asInstanceOf[Float] and 
> null.asInstanceOf[Double] all return the zero value (0, 0L, 0.0f, 0.0), which 
> might confuse the developer if the value set contains such a zero for an 
> existing key.
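> For illustration (plain Scala semantics, runnable in a REPL):
> {code}
> // casting null to a primitive type yields that type's zero value
> val i = null.asInstanceOf[Int]     // 0
> val l = null.asInstanceOf[Long]    // 0L
> val f = null.asInstanceOf[Float]   // 0.0f
> val d = null.asInstanceOf[Double]  // 0.0
> // so apply() alone cannot distinguish "key absent" from "key mapped to 0"
> {code}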






[jira] [Updated] (SPARK-13227) Risky apply() in OpenHashMap

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13227:

Fix Version/s: 1.6.2

> Risky apply() in OpenHashMap
> 
>
> Key: SPARK-13227
> URL: https://issues.apache.org/jira/browse/SPARK-13227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Nan Zhu
>Assignee: Nan Zhu
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> It might confuse future developers when they use OpenHashMap.apply() with a 
> numeric value type:
> null.asInstanceOf[Int], null.asInstanceOf[Long], null.asInstanceOf[Float] and 
> null.asInstanceOf[Double] all return the zero value (0, 0L, 0.0f, 0.0), which 
> might confuse the developer if the value set contains such a zero for an 
> existing key.






[jira] [Commented] (SPARK-14706) Python ML persistence integration test

2016-04-18 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246950#comment-15246950
 ] 

Xusen Yin commented on SPARK-14706:
---

I am starting to write it.

> Python ML persistence integration test
> --
>
> Key: SPARK-14706
> URL: https://issues.apache.org/jira/browse/SPARK-14706
> Project: Spark
>  Issue Type: Test
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> Goal: extend the integration test in {{ml/tests.py}}.
> In the {{PersistenceTest}} suite, there is a method {{_compare_pipelines}}.  
> This issue includes:
> * Extending {{_compare_pipelines}} to handle CrossValidator, 
> TrainValidationSplit, and OneVsRest
> * Adding an integration test in PersistenceTest which includes nested 
> meta-algorithms.  E.g.: {{Pipeline[ CrossValidator[ TrainValidationSplit[ 
> OneVsRest[ LogisticRegression ] ] ] ]}}.






[jira] [Assigned] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14602:


Assignee: (was: Apache Spark)

> [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length 
> limit on Windows
> --
>
> Key: SPARK-14602
> URL: https://issues.apache.org/jira/browse/SPARK-14602
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit, Windows, YARN
>Affects Versions: 2.0.0
> Environment: YARN on Windows
>Reporter: Sebastian Kochman
>
> After the change in https://issues.apache.org/jira/browse/SPARK-11157, which 
> removed the single large Spark assembly in favor of multiple small jars, when 
> you try to submit a Spark app to YARN on Windows (using spark-submit.cmd), 
> the app fails with the following error:
> Diagnostics: The command line has a length of 12046 exceeds maximum allowed 
> length of 8191. Command starts with: @set 
> SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this 
> attempt. Failing the application.
> So basically, the large number of jars needed for staging in YARN causes the 
> Windows command-line length limit to be exceeded.
> Please see more details in the discussion here:
> https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151






[jira] [Assigned] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14602:


Assignee: Apache Spark

> [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length 
> limit on Windows
> --
>
> Key: SPARK-14602
> URL: https://issues.apache.org/jira/browse/SPARK-14602
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit, Windows, YARN
>Affects Versions: 2.0.0
> Environment: YARN on Windows
>Reporter: Sebastian Kochman
>Assignee: Apache Spark
>
> After the change in https://issues.apache.org/jira/browse/SPARK-11157, which 
> removed the single large Spark assembly in favor of multiple small jars, when 
> you try to submit a Spark app to YARN on Windows (using spark-submit.cmd), 
> the app fails with the following error:
> Diagnostics: The command line has a length of 12046 exceeds maximum allowed 
> length of 8191. Command starts with: @set 
> SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this 
> attempt. Failing the application.
> So basically, the large number of jars needed for staging in YARN causes the 
> Windows command-line length limit to be exceeded.
> Please see more details in the discussion here:
> https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151






[jira] [Commented] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246942#comment-15246942
 ] 

Apache Spark commented on SPARK-14602:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12487

> [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length 
> limit on Windows
> --
>
> Key: SPARK-14602
> URL: https://issues.apache.org/jira/browse/SPARK-14602
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit, Windows, YARN
>Affects Versions: 2.0.0
> Environment: YARN on Windows
>Reporter: Sebastian Kochman
>
> After the change in https://issues.apache.org/jira/browse/SPARK-11157, which 
> removed the single large Spark assembly in favor of multiple small jars, when 
> you try to submit a Spark app to YARN on Windows (using spark-submit.cmd), 
> the app fails with the following error:
> Diagnostics: The command line has a length of 12046 exceeds maximum allowed 
> length of 8191. Command starts with: @set 
> SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this 
> attempt. Failing the application.
> So basically, the large number of jars needed for staging in YARN causes the 
> Windows command-line length limit to be exceeded.
> Please see more details in the discussion here:
> https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151






[jira] [Assigned] (SPARK-13643) Create SparkSession interface

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13643:


Assignee: (was: Apache Spark)

> Create SparkSession interface
> -
>
> Key: SPARK-13643
> URL: https://issues.apache.org/jira/browse/SPARK-13643
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>







[jira] [Commented] (SPARK-13643) Create SparkSession interface

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246935#comment-15246935
 ] 

Apache Spark commented on SPARK-13643:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/12485

> Create SparkSession interface
> -
>
> Key: SPARK-13643
> URL: https://issues.apache.org/jira/browse/SPARK-13643
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>







[jira] [Assigned] (SPARK-13643) Create SparkSession interface

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13643:


Assignee: Apache Spark

> Create SparkSession interface
> -
>
> Key: SPARK-13643
> URL: https://issues.apache.org/jira/browse/SPARK-13643
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>







[jira] [Assigned] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14722:


Assignee: (was: Apache Spark)

> Rename upstreams() -> inputRDDs() in WholeStageCodegen
> --
>
> Key: SPARK-14722
> URL: https://issues.apache.org/jira/browse/SPARK-14722
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>







[jira] [Assigned] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14722:


Assignee: Apache Spark

> Rename upstreams() -> inputRDDs() in WholeStageCodegen
> --
>
> Key: SPARK-14722
> URL: https://issues.apache.org/jira/browse/SPARK-14722
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246903#comment-15246903
 ] 

Apache Spark commented on SPARK-14722:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/12486

> Rename upstreams() -> inputRDDs() in WholeStageCodegen
> --
>
> Key: SPARK-14722
> URL: https://issues.apache.org/jira/browse/SPARK-14722
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>







[jira] [Comment Edited] (SPARK-14709) spark.ml API for linear SVM

2016-04-18 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882
 ] 

yuhao yang edited comment on SPARK-14709 at 4/19/16 1:14 AM:
-

I'll start on this to give a quick prototype first.


was (Author: yuhaoyan):
I'll start on this.

> spark.ml API for linear SVM
> ---
>
> Key: SPARK-14709
> URL: https://issues.apache.org/jira/browse/SPARK-14709
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>
> Provide an API for the SVM algorithm for DataFrames. I would recommend using 
> OWL-QN, rather than wrapping spark.mllib's SGD-based implementation.
> The API should mimic the existing spark.ml.classification APIs.






[jira] [Created] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen

2016-04-18 Thread Sameer Agarwal (JIRA)
Sameer Agarwal created SPARK-14722:
--

 Summary: Rename upstreams() -> inputRDDs() in WholeStageCodegen
 Key: SPARK-14722
 URL: https://issues.apache.org/jira/browse/SPARK-14722
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Sameer Agarwal









[jira] [Assigned] (SPARK-14720) Move the rest of HiveContext to HiveSessionState

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14720:


Assignee: Apache Spark  (was: Andrew Or)

> Move the rest of HiveContext to HiveSessionState
> 
>
> Key: SPARK-14720
> URL: https://issues.apache.org/jira/browse/SPARK-14720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>
> This will be a major cleanup task. Unfortunately part of the state will leak 
> to SessionState, which shouldn't know anything about Hive. Part of the effort 
> here is to create a new SparkSession interface (SPARK-13643) and do 
> reflection there to decide which SessionState to use.






[jira] [Assigned] (SPARK-14720) Move the rest of HiveContext to HiveSessionState

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14720:


Assignee: Andrew Or  (was: Apache Spark)

> Move the rest of HiveContext to HiveSessionState
> 
>
> Key: SPARK-14720
> URL: https://issues.apache.org/jira/browse/SPARK-14720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> This will be a major cleanup task. Unfortunately part of the state will leak 
> to SessionState, which shouldn't know anything about Hive. Part of the effort 
> here is to create a new SparkSession interface (SPARK-13643) and do 
> reflection there to decide which SessionState to use.






[jira] [Commented] (SPARK-14720) Move the rest of HiveContext to HiveSessionState

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246897#comment-15246897
 ] 

Apache Spark commented on SPARK-14720:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/12485

> Move the rest of HiveContext to HiveSessionState
> 
>
> Key: SPARK-14720
> URL: https://issues.apache.org/jira/browse/SPARK-14720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> This will be a major cleanup task. Unfortunately part of the state will leak 
> to SessionState, which shouldn't know anything about Hive. Part of the effort 
> here is to create a new SparkSession interface (SPARK-13643) and do 
> reflection there to decide which SessionState to use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14721) Actually remove the HiveContext file itself

2016-04-18 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14721:
-

 Summary: Actually remove the HiveContext file itself
 Key: SPARK-14721
 URL: https://issues.apache.org/jira/browse/SPARK-14721
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14720) Move the rest of HiveContext to HiveSessionState

2016-04-18 Thread Andrew Or (JIRA)
Andrew Or created SPARK-14720:
-

 Summary: Move the rest of HiveContext to HiveSessionState
 Key: SPARK-14720
 URL: https://issues.apache.org/jira/browse/SPARK-14720
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


This will be a major cleanup task. Unfortunately part of the state will leak to 
SessionState, which shouldn't know anything about Hive. Part of the effort here 
is to create a new SparkSession interface (SPARK-13643) and do reflection there 
to decide which SessionState to use.
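
For context, a minimal sketch of the reflection approach described above, assuming the target class has a single one-argument constructor (the class name and factory shape are illustrative, not Spark's actual code):

{code}
object SessionStateLoader {
  // Load a SessionState implementation by name so the caller needs no
  // compile-time dependency on Hive classes. Assumes the target class has
  // exactly one constructor taking the session object.
  def instantiate[T](className: String, session: AnyRef): T = {
    val clazz = Class.forName(className)
    val ctor = clazz.getConstructors.head
    ctor.newInstance(session).asInstanceOf[T]
  }
}

// Hypothetical usage:
// val state = SessionStateLoader.instantiate[AnyRef](
//   "org.apache.spark.sql.hive.HiveSessionState", sparkSession)
{code}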



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14709) spark.ml API for linear SVM

2016-04-18 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882
 ] 

yuhao yang commented on SPARK-14709:


I'll start on this.

> spark.ml API for linear SVM
> ---
>
> Key: SPARK-14709
> URL: https://issues.apache.org/jira/browse/SPARK-14709
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>
> Provide API for SVM algorithm for DataFrames.  I would recommend using 
> OWL-QN, rather than wrapping spark.mllib's SGD-based implementation.
> The API should mimic existing spark.ml.classification APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14711) Examples jar not a part of distribution

2016-04-18 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-14711.

   Resolution: Fixed
 Assignee: Mark Grover
Fix Version/s: 2.0.0

> Examples jar not a part of distribution
> ---
>
> Key: SPARK-14711
> URL: https://issues.apache.org/jira/browse/SPARK-14711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Mark Grover
>Assignee: Mark Grover
> Fix For: 2.0.0
>
>
> While mucking around with some examples, it seems like spark-examples jar is 
> not being included in the distribution tarball. Also, it's not on the 
> spark-submit classpath, which means commands like 
> {{run-example}} fail to work, whether a "distribution" tarball is used or a 
> regular {{mvn package}} build.
> The root cause of this may be due to the fact that the spark-examples jar is 
> under {{$SPARK_HOME/examples/target}} while all its dependencies are at 
> {{$SPARK_HOME/examples/target/scala-2.11/jars}}. And, we only seem to be 
> including the jars directory in the classpath. See 
> [here|https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L354]
>  for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases

2016-04-18 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-14714.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12480
[https://github.com/apache/spark/pull/12480]

> PySpark Param TypeConverter arg is not passed by name in some cases
> ---
>
> Key: SPARK-14714
> URL: https://issues.apache.org/jira/browse/SPARK-14714
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
> Fix For: 2.0.0
>
>
> PySpark Param constructors need to pass the TypeConverter argument by name, 
> partly to make sure it is not mistaken for the expectedType arg and partly 
> because we will remove the expectedType arg in 2.1.  In several places, this 
> is not being done correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14515) Add python example for ChiSqSelector

2016-04-18 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-14515.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12283
[https://github.com/apache/spark/pull/12283]

> Add python example for ChiSqSelector
> 
>
> Key: SPARK-14515
> URL: https://issues.apache.org/jira/browse/SPARK-14515
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML, PySpark
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 2.0.0
>
>
> Add the missing python example for ChiSqSelector



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14719:


Assignee: Apache Spark  (was: Josh Rosen)

> WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
> 
>
> Key: SPARK-14719
> URL: https://issues.apache.org/jira/browse/SPARK-14719
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if 
> BlockManager puts fail, even though those puts are only performed as a 
> performance optimization. Instead, it should log and ignore exceptions 
> originating from the block manager put.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246811#comment-15246811
 ] 

Apache Spark commented on SPARK-14719:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/12484

> WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
> 
>
> Key: SPARK-14719
> URL: https://issues.apache.org/jira/browse/SPARK-14719
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if 
> BlockManager puts fail, even though those puts are only performed as a 
> performance optimization. Instead, it should log and ignore exceptions 
> originating from the block manager put.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14719:


Assignee: Josh Rosen  (was: Apache Spark)

> WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
> 
>
> Key: SPARK-14719
> URL: https://issues.apache.org/jira/browse/SPARK-14719
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if 
> BlockManager puts fail, even though those puts are only performed as a 
> performance optimization. Instead, it should log and ignore exceptions 
> originating from the block manager put.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-14719:
--

 Summary: WriteAheadLogBasedBlockHandler should ignore BlockManager 
put errors
 Key: SPARK-14719
 URL: https://issues.apache.org/jira/browse/SPARK-14719
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Josh Rosen
Assignee: Josh Rosen


{{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if 
BlockManager puts fail, even though those puts are only performed as a 
performance optimization. Instead, it should log and ignore exceptions 
originating from the block manager put.
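
A minimal sketch of the proposed behavior, with method names assumed rather than taken from the actual handler: the write-ahead log stays the source of truth, so a failed BlockManager put is logged and swallowed instead of propagated.

{code}
import scala.util.control.NonFatal

def storeBlock(blockId: String)(putInBlockManager: => Unit)(writeToWal: => Unit): Unit = {
  try {
    putInBlockManager // best-effort cache only, for faster reads
  } catch {
    case NonFatal(e) =>
      // Spark would use logWarning; println keeps the sketch self-contained
      println(s"Ignoring BlockManager put failure for block $blockId: $e")
  }
  writeToWal // a failure here still fails the store, as it should
}
{code}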



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13698) Fix Analysis Exceptions when Using Backticks in Generate

2016-04-18 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-13698.
-
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.0.0

> Fix Analysis Exceptions when Using Backticks in Generate
> 
>
> Key: SPARK-13698
> URL: https://issues.apache.org/jira/browse/SPARK-13698
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
> Fix For: 2.0.0
>
>
> Analysis exception occurs while running the following query.
> {code}
> SELECT ints FROM nestedArray LATERAL VIEW explode(a.b) `a` AS `ints`
> {code}
> {code}
> Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot 
> resolve '`ints`' given input columns: [a, `ints`]; line 1 pos 7
> 'Project ['ints]
> +- Generate explode(a#0.b), true, false, Some(a), [`ints`#8]
>+- SubqueryAlias nestedarray
>   +- LocalRelation [a#0], 1,2,3
> {code}
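
The essence of the fix is presumably to strip the backticks from the generator output alias before attribute resolution; a simplified helper illustrating the idea (not the actual patch):

{code}
def cleanIdentifier(ident: String): String =
  if (ident.length > 1 && ident.startsWith("`") && ident.endsWith("`"))
    ident.substring(1, ident.length - 1) // "`ints`" -> "ints"
  else ident
{code}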



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13698) Fix Analysis Exceptions when Using Backticks in Generate

2016-04-18 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246796#comment-15246796
 ] 

Dilip Biswal commented on SPARK-13698:
--

[~cloud_fan] Hi Wenchen, can you please fix the assignee field for this 
JIRA? Thanks in advance!

> Fix Analysis Exceptions when Using Backticks in Generate
> 
>
> Key: SPARK-13698
> URL: https://issues.apache.org/jira/browse/SPARK-13698
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dilip Biswal
>
> Analysis exception occurs while running the following query.
> {code}
> SELECT ints FROM nestedArray LATERAL VIEW explode(a.b) `a` AS `ints`
> {code}
> {code}
> Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot 
> resolve '`ints`' given input columns: [a, `ints`]; line 1 pos 7
> 'Project ['ints]
> +- Generate explode(a#0.b), true, false, Some(a), [`ints`#8]
>+- SubqueryAlias nestedarray
>   +- LocalRelation [a#0], 1,2,3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14718) Avoid mutating ExprCode in doGenCode

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14718:


Assignee: Apache Spark

> Avoid mutating ExprCode in doGenCode
> 
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>Assignee: Apache Spark
>
> The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
> the java code to evaluate the given expression. It should instead just return 
> a new ExprCode to avoid passing around mutable objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14718) Avoid mutating ExprCode in doGenCode

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246792#comment-15246792
 ] 

Apache Spark commented on SPARK-14718:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/12483

> Avoid mutating ExprCode in doGenCode
> 
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>
> The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
> the java code to evaluate the given expression. It should instead just return 
> a new ExprCode to avoid passing around mutable objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14718) Avoid mutating ExprCode in doGenCode

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14718:


Assignee: (was: Apache Spark)

> Avoid mutating ExprCode in doGenCode
> 
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>
> The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
> the java code to evaluate the given expression. It should instead just return 
> a new ExprCode to avoid passing around mutable objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1239) Improve fetching of map output statuses

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-1239:
---
Target Version/s: 2.0.0

> Improve fetching of map output statuses
> ---
>
> Key: SPARK-1239
> URL: https://issues.apache.org/jira/browse/SPARK-1239
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Patrick Wendell
>Assignee: Thomas Graves
>
> Instead we should modify the way we fetch map output statuses to take both a 
> mapper and a reducer - or we should just piggyback the statuses on each task. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14718) Avoid mutating ExprCode in doGenCode

2016-04-18 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal updated SPARK-14718:
---
Summary: Avoid mutating ExprCode in doGenCode  (was: doGenCode should 
return a new ExprCode and not mutate existing one)

> Avoid mutating ExprCode in doGenCode
> 
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>
> The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
> the java code to evaluate the given expression. It should instead just return 
> a new ExprCode to avoid passing around mutable objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14718) doGenCode should return a new ExprCode and not mutate existing one

2016-04-18 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal updated SPARK-14718:
---
Summary: doGenCode should return a new ExprCode and not mutate existing one 
 (was: doGenCode should return a new ExprCode and note mutate existing one)

> doGenCode should return a new ExprCode and not mutate existing one
> --
>
> Key: SPARK-14718
> URL: https://issues.apache.org/jira/browse/SPARK-14718
> Project: Spark
>  Issue Type: Improvement
>Reporter: Sameer Agarwal
>
> The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
> the java code to evaluate the given expression. It should instead just return 
> a new ExprCode to avoid passing around mutable objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14718) doGenCode should return a new ExprCode and note mutate existing one

2016-04-18 Thread Sameer Agarwal (JIRA)
Sameer Agarwal created SPARK-14718:
--

 Summary: doGenCode should return a new ExprCode and note mutate 
existing one
 Key: SPARK-14718
 URL: https://issues.apache.org/jira/browse/SPARK-14718
 Project: Spark
  Issue Type: Improvement
Reporter: Sameer Agarwal


The `doGenCode` method currently takes in an ExprCode, mutates it and returns 
the java code to evaluate the given expression. It should instead just return a 
new ExprCode to avoid passing around mutable objects.
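
A sketch of the refactoring with heavily simplified types (Spark's real ExprCode and doGenCode signatures differ): instead of mutating the ExprCode passed in, doGenCode builds and returns a fresh one.

{code}
case class ExprCode(code: String, isNull: String, value: String)

// Assumed "after" shape: callers never observe partially-mutated state
// because the result is a new immutable value.
def doGenCode(input: ExprCode): ExprCode = {
  val v = "result"
  val code =
    s"""boolean ${v}IsNull = ${input.isNull};
       |int $v = ${input.value} + 1;""".stripMargin
  ExprCode(input.code + "\n" + code, s"${v}IsNull", v)
}
{code}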



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value

2016-04-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758
 ] 

Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:19 PM:


I can take this [~davies]


was (Author: felixcheung):
I can take this @davies

> Scala, Python APIs for Dataset.unpersist differ in default blocking value
> -
>
> Key: SPARK-14717
> URL: https://issues.apache.org/jira/browse/SPARK-14717
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but 
> in Python, it is set to True by default.  We should presumably make them 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value

2016-04-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758
 ] 

Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:18 PM:


I can take this @davies


was (Author: felixcheung):
I can take this @davies

> Scala, Python APIs for Dataset.unpersist differ in default blocking value
> -
>
> Key: SPARK-14717
> URL: https://issues.apache.org/jira/browse/SPARK-14717
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but 
> in Python, it is set to True by default.  We should presumably make them 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector

2016-04-18 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14515:
--
Shepherd: Joseph K. Bradley
Assignee: zhengruifeng
Target Version/s: 2.0.0
 Component/s: PySpark
  ML

> Add python example for ChiSqSelector
> 
>
> Key: SPARK-14515
> URL: https://issues.apache.org/jira/browse/SPARK-14515
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML, PySpark
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>
> Add the missing python example for ChiSqSelector



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value

2016-04-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758
 ] 

Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:17 PM:


I can take this @davies


was (Author: felixcheung):
I can take this [~davis]


> Scala, Python APIs for Dataset.unpersist differ in default blocking value
> -
>
> Key: SPARK-14717
> URL: https://issues.apache.org/jira/browse/SPARK-14717
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but 
> in Python, it is set to True by default.  We should presumably make them 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value

2016-04-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758
 ] 

Felix Cheung commented on SPARK-14717:
--

I can take this [~davis]


> Scala, Python APIs for Dataset.unpersist differ in default blocking value
> -
>
> Key: SPARK-14717
> URL: https://issues.apache.org/jira/browse/SPARK-14717
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but 
> in Python, it is set to True by default.  We should presumably make them 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector

2016-04-18 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14515:
--
Priority: Minor  (was: Major)

> Add python example for ChiSqSelector
> 
>
> Key: SPARK-14515
> URL: https://issues.apache.org/jira/browse/SPARK-14515
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: zhengruifeng
>Priority: Minor
>
> Add the missing python example for ChiSqSelector



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector

2016-04-18 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14515:
--
Issue Type: Documentation  (was: Improvement)

> Add python example for ChiSqSelector
> 
>
> Key: SPARK-14515
> URL: https://issues.apache.org/jira/browse/SPARK-14515
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: zhengruifeng
>
> Add the missing python example for ChiSqSelector



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value

2016-04-18 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-14717:
-

 Summary: Scala, Python APIs for Dataset.unpersist differ in 
default blocking value
 Key: SPARK-14717
 URL: https://issues.apache.org/jira/browse/SPARK-14717
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Joseph K. Bradley
Priority: Minor


In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but in 
Python, it is set to True by default.  We should presumably make them 
consistent.
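
An illustrative sketch of the mismatch, heavily simplified and not the real Dataset class:

{code}
class CachedDataset {
  def unpersist(blocking: Boolean): this.type = { /* release cached blocks */ this }
  def unpersist(): this.type = unpersist(blocking = false) // Scala/Java default
}
// The PySpark wrapper effectively calls unpersist(blocking=True) by default;
// making the APIs consistent presumably means one shared default, likely false.
{code}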



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14716) Add partitioned parquet support file stream sink

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246658#comment-15246658
 ] 

Apache Spark commented on SPARK-14716:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/12409

> Add partitioned parquet support file stream sink
> 
>
> Key: SPARK-14716
> URL: https://issues.apache.org/jira/browse/SPARK-14716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14716) Add partitioned parquet support file stream sink

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14716:


Assignee: (was: Apache Spark)

> Add partitioned parquet support file stream sink
> 
>
> Key: SPARK-14716
> URL: https://issues.apache.org/jira/browse/SPARK-14716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14716) Add partitioned parquet support file stream sink

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14716:


Assignee: Apache Spark

> Add partitioned parquet support file stream sink
> 
>
> Key: SPARK-14716
> URL: https://issues.apache.org/jira/browse/SPARK-14716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14716) Add partitioned parquet support file stream sink

2016-04-18 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-14716:
--
Summary: Add partitioned parquet support file stream sink  (was: Added 
partitioned parquet support file stream sink)

> Add partitioned parquet support file stream sink
> 
>
> Key: SPARK-14716
> URL: https://issues.apache.org/jira/browse/SPARK-14716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14716) Added partitioned parquet support file stream sink

2016-04-18 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-14716:
-

 Summary: Added partitioned parquet support file stream sink
 Key: SPARK-14716
 URL: https://issues.apache.org/jira/browse/SPARK-14716
 Project: Spark
  Issue Type: Sub-task
Reporter: Tathagata Das






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14713) Fix flaky test: o.a.s.network.netty.NettyBlockTransferServiceSuite.can bind to a specific port twice and the second increments

2016-04-18 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-14713.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix flaky test: o.a.s.network.netty.NettyBlockTransferServiceSuite.can bind 
> to a specific port twice and the second increments
> --
>
> Key: SPARK-14713
> URL: https://issues.apache.org/jira/browse/SPARK-14713
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> When there are multiple tests running, "NettyBlockTransferServiceSuite.can 
> bind to a specific port twice and the second increments" may be flaky. See:  
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.2/786/testReport/junit/org.apache.spark.network.netty/NettyBlockTransferServiceSuite/can_bind_to_a_specific_port_twice_and_the_second_increments/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14699) Driver is marked as failed even it runs successfully

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246609#comment-15246609
 ] 

Apache Spark commented on SPARK-14699:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/12481

> Driver is marked as failed even it runs successfully
> 
>
> Key: SPARK-14699
> URL: https://issues.apache.org/jira/browse/SPARK-14699
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
> Environment: Standalone deployment
>Reporter: Huiqiang Liu
>
> We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs 
> are marked as failed.
> To address this issue, we wrote a simple test application which just sums up 
> from 1 to 1, and it is marked as failed even though its result was correct.
> Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: 
> Lost connection to worker rpc" when driver exits.
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at 
> SparkBatchTest.scala:19) finished in 0.052 s
> 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, 
> whose tasks have all completed, from pool
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at 
> SparkBatchTest.scala:19, took 0.061177 s
> 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc 
> endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. 
> Exiting.
> 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs,null}
> 1

[jira] [Assigned] (SPARK-14699) Driver is marked as failed even it runs successfully

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14699:


Assignee: (was: Apache Spark)

> Driver is marked as failed even it runs successfully
> 
>
> Key: SPARK-14699
> URL: https://issues.apache.org/jira/browse/SPARK-14699
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
> Environment: Standalone deployment
>Reporter: Huiqiang Liu
>
> We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs 
> are marked as failed.
> To address this issue, we wrote a simple test application which just sums up 
> from 1 to 1, and it is marked as failed even though its result was correct.
> Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: 
> Lost connection to worker rpc" when driver exits.
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at 
> SparkBatchTest.scala:19) finished in 0.052 s
> 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, 
> whose tasks have all completed, from pool
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at 
> SparkBatchTest.scala:19, took 0.061177 s
> 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc 
> endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. 
> Exiting.
> 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs,null}
> 16/04/14 06:20:41 INFO spark.ContextCleaner: Cleaned accumulator 2
> 16/04/14 06:20:41 INFO storage.BlockManagerInf

[jira] [Assigned] (SPARK-14699) Driver is marked as failed even it runs successfully

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14699:


Assignee: Apache Spark

> Driver is marked as failed even it runs successfully
> 
>
> Key: SPARK-14699
> URL: https://issues.apache.org/jira/browse/SPARK-14699
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
> Environment: Standalone deployment
>Reporter: Huiqiang Liu
>Assignee: Apache Spark
>
> We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs 
> are marked as failed.
> To address this issue, we wrote a simple test application which just sums up 
> from 1 to 1, and it is marked as failed even though its result was correct.
> Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: 
> Lost connection to worker rpc" when driver exits.
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at 
> SparkBatchTest.scala:19) finished in 0.052 s
> 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, 
> whose tasks have all completed, from pool
> 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at 
> SparkBatchTest.scala:19, took 0.061177 s
> 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc 
> endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. 
> Exiting.
> 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 
> on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 
> MB)
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/json,null}
> 16/04/14 06:20:41 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs,null}
> 16/04/14 06:20:41 INFO spark.ContextCleaner: Cleaned accumulator 2
> 16/04/14 06:20:41 INF

[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model

2016-04-18 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246600#comment-15246600
 ] 

Gayathri Murali commented on SPARK-14712:
-

{{__repr__}} is defined for LabeledPoint and LinearModel in mllib.regression, not for 
LogisticRegressionModel. Would you like to add it for LogisticRegressionModel 
in both ml and mllib? 

> spark.ml LogisticRegressionModel.toString should summarize model
> 
>
> Key: SPARK-14712
> URL: https://issues.apache.org/jira/browse/SPARK-14712
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> spark.mllib LogisticRegressionModel overrides toString to print a little 
> model info.  We should do the same in spark.ml.  I'd recommend:
> * super.toString
> * numClasses
> * numFeatures
> We should also override {{__repr__}} in pyspark to do the same.
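
A minimal sketch of the suggested Scala override, with member names taken from the issue description (the actual spark.ml fields may differ):

{code}
class LogisticRegressionModelSketch(val numClasses: Int, val numFeatures: Int) {
  override def toString: String =
    s"${super.toString}: numClasses = $numClasses, numFeatures = $numFeatures"
}

// new LogisticRegressionModelSketch(2, 10).toString
// -> "LogisticRegressionModelSketch@<hash>: numClasses = 2, numFeatures = 10"
{code}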



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14504) Enable Oracle docker integration tests

2016-04-18 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-14504.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12270
[https://github.com/apache/spark/pull/12270]

> Enable Oracle docker integration tests
> --
>
> Key: SPARK-14504
> URL: https://issues.apache.org/jira/browse/SPARK-14504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>Priority: Minor
> Fix For: 2.0.0
>
>
> Enable Oracle docker integration tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14504) Enable Oracle docker integration tests

2016-04-18 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14504:
---
Assignee: Luciano Resende

> Enable Oracle docker integration tests
> --
>
> Key: SPARK-14504
> URL: https://issues.apache.org/jira/browse/SPARK-14504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>Assignee: Luciano Resende
>Priority: Minor
> Fix For: 2.0.0
>
>
> Enable Oracle docker integration tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14674) Move HiveContext.hiveconf to HiveSessionState

2016-04-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14674.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12449
[https://github.com/apache/spark/pull/12449]

> Move HiveContext.hiveconf to HiveSessionState
> -
>
> Key: SPARK-14674
> URL: https://issues.apache.org/jira/browse/SPARK-14674
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> Just a minor cleanup. This allows us to remove HiveContext later without 
> inflating the diff too much.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14710) Rename gen/genCode to genCode/doGenCode to better reflect the semantics

2016-04-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14710.
-
   Resolution: Fixed
 Assignee: Sameer Agarwal
Fix Version/s: 2.0.0

> Rename gen/genCode to genCode/doGenCode to better reflect the semantics
> ---
>
> Key: SPARK-14710
> URL: https://issues.apache.org/jira/browse/SPARK-14710
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14489) RegressionEvaluator returns NaN for ALS in Spark ml

2016-04-18 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246538#comment-15246538
 ] 

Joseph K. Bradley commented on SPARK-14489:
---

I agree that it's unclear what to do with a new item.  I don't think there are 
any good options and would support either not tolerating or ignoring new items.

> RegressionEvaluator returns NaN for ALS in Spark ml
> ---
>
> Key: SPARK-14489
> URL: https://issues.apache.org/jira/browse/SPARK-14489
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
> Environment: AWS EMR
>Reporter: Boris Clémençon 
>  Labels: patch
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When building a Spark ML pipeline containing an ALS estimator, the metrics 
> "rmse", "mse", "r2" and "mae" all return NaN. 
> The reason is in CrossValidator.scala line 109. The K-folds are randomly 
> generated. For large and sparse datasets, there is a significant probability 
> that at least one user of the validation set is missing from the training set, 
> hence generating a few NaN estimates with the transform method and NaN 
> RegressionEvaluator metrics too. 
> Suggestion to fix the bug: remove the NaN values while computing the rmse or 
> other metrics (i.e., removing users or items in the validation set that are 
> missing from the training set), and log a warning when this happens.
> Issue SPARK-14153 seems to describe the same problem.
> {code:title=CrossValidator.scala|borderStyle=solid}
> val splits = MLUtils.kFold(dataset.rdd, $(numFolds), 0)
> splits.zipWithIndex.foreach { case ((training, validation), splitIndex) =>
>   val trainingDataset = sqlCtx.createDataFrame(training, schema).cache()
>   val validationDataset = sqlCtx.createDataFrame(validation, 
> schema).cache()
>   // multi-model training
>   logDebug(s"Train split $splitIndex with multiple sets of parameters.")
>   val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
>   trainingDataset.unpersist()
>   var i = 0
>   while (i < numModels) {
> // TODO: duplicate evaluator to take extra params from input
> val metric = eval.evaluate(models(i).transform(validationDataset, 
> epm(i)))
> logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
> metrics(i) += metric
> i += 1
>   }
>   validationDataset.unpersist()
> }
> {code}
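
A hedged sketch of the suggested fix, assuming ALS's default "prediction" column name: drop rows whose prediction is NaN (users or items in the validation fold that were unseen during training) and log how many were dropped before evaluating.

{code}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, isnan}

def evaluateIgnoringNaN(eval: RegressionEvaluator, predictions: DataFrame): Double = {
  val clean = predictions.filter(!isnan(col("prediction")))
  val dropped = predictions.count() - clean.count()
  if (dropped > 0) {
    // Spark code would use logWarning; println keeps the sketch standalone
    println(s"Dropped $dropped rows with NaN predictions before evaluation")
  }
  eval.evaluate(clean)
}
{code}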



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7264) SparkR API for parallel functions

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-7264:
-
Target Version/s: 2.0.0

> SparkR API for parallel functions
> -
>
> Key: SPARK-7264
> URL: https://issues.apache.org/jira/browse/SPARK-7264
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Timothy Hunter
>
> This is a JIRA to discuss design proposals for enabling parallel R 
> computation in SparkR without exposing the entire RDD API. 
> The rationale for this is that the RDD API has a number of low level 
> functions and we would like to expose a more light-weight API that is both 
> friendly to R users and easy to maintain.
> http://goo.gl/GLHKZI has a first cut design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7264) SparkR API for parallel functions

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-7264:
-
Assignee: Timothy Hunter

> SparkR API for parallel functions
> -
>
> Key: SPARK-7264
> URL: https://issues.apache.org/jira/browse/SPARK-7264
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Timothy Hunter
>
> This is a JIRA to discuss design proposals for enabling parallel R 
> computation in SparkR without exposing the entire RDD API. 
> The rationale for this is that the RDD API has a number of low-level 
> functions and we would like to expose a more lightweight API that is both 
> friendly to R users and easy to maintain.
> http://goo.gl/GLHKZI has a first cut design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14604) Modify design of ML model summaries

2016-04-18 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246489#comment-15246489
 ] 

Gayathri Murali commented on SPARK-14604:
-

[~josephkb] I see that LogisticRegression has an evaluate method. Would you 
like to add a similar one to LinearRegressionModel and GLM? Also, the 
LogisticRegression summary does not store the model, while the Linear and GLM 
summaries do. 

> Modify design of ML model summaries
> ---
>
> Key: SPARK-14604
> URL: https://issues.apache.org/jira/browse/SPARK-14604
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> Several spark.ml models now have summaries containing evaluation metrics and 
> training info:
> * LinearRegressionModel
> * LogisticRegressionModel
> * GeneralizedLinearRegressionModel
> These summaries have unfortunately been added in an inconsistent way.  I 
> propose to reorganize them to have:
> * For each model, 1 summary (without training info) and 1 training summary 
> (with info from training).  The non-training summary can be produced for a 
> new dataset via {{evaluate}}.
> * A summary should not store the model itself.
> * A summary should provide a transient reference to the dataset used to 
> produce the summary.
> This task will involve reorganizing the GLM summary (which lacks a 
> training/non-training distinction) and deprecating the model method in the 
> LinearRegressionSummary.
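A rough sketch of the proposed shape, with hypothetical names (this is not the final API, just an illustration of the summary/training-summary split):

{code:title=SummarySketch.scala|borderStyle=solid}
import org.apache.spark.sql.DataFrame

// Illustrative only: a plain summary (no training info, no model reference)
// and a training summary extending it with training-time extras.
class LinearRegressionSummary(
    @transient val predictions: DataFrame,  // transient ref to the dataset
    val predictionCol: String,
    val labelCol: String) extends Serializable

class LinearRegressionTrainingSummary(
    predictions: DataFrame,
    predictionCol: String,
    labelCol: String,
    val objectiveHistory: Array[Double])    // info only available at training
  extends LinearRegressionSummary(predictions, predictionCol, labelCol)

// The model's evaluate(dataset) would return the plain, non-training summary.
{code}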



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14299) Scala ML examples code merge and clean up

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-14299:
--
Assignee: Xusen Yin

> Scala ML examples code merge and clean up
> -
>
> Key: SPARK-14299
> URL: https://issues.apache.org/jira/browse/SPARK-14299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Xusen Yin
>Assignee: Xusen Yin
>Priority: Minor
>  Labels: starter
> Fix For: 2.0.0
>
>
> Duplicated code that I found in scala/examples/ml:
> * scala/ml
> ** CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample
> ** TrainValidationSplitExample.scala --> 
> ModelSelectionViaTrainValidationSplitExample
> ** DeveloperApiExample.scala --> I delete it for now because it's only about 
> how to create your own classifier, etc., which can be learned easily from 
> other examples and the ml code.
> ** SimpleParamsExample.scala --> merge with 
> LogisticRegressionSummaryExample.scala
> ** SimpleTextClassificationPipeline.scala --> 
> ModelSelectionViaCrossValidationExample
> ** DataFrameExample.scala --> merge with 
> LogisticRegressionSummaryExample.scala
> * Intended to keep, with command-line support:
> ** DecisionTreeExample.scala --> DecisionTreeRegressionExample, 
> DecisionTreeClassificationExample
> ** GBTExample.scala --> GradientBoostedTreeClassifierExample, 
> GradientBoostedTreeRegressorExample
> ** LinearRegressionExample.scala --> LinearRegressionWithElasticNetExample
> ** LogisticRegressionExample.scala --> 
> LogisticRegressionWithElasticNetExample, LogisticRegressionSummaryExample
> ** RandomForestExample.scala --> RandomForestRegressorExample, 
> RandomForestClassifierExample
> When merging and cleaning that code, be sure not to disturb the existing 
> example on/off blocks.
> I'll take this one as an example. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14299) Scala ML examples code merge and clean up

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-14299.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12366
[https://github.com/apache/spark/pull/12366]

> Scala ML examples code merge and clean up
> -
>
> Key: SPARK-14299
> URL: https://issues.apache.org/jira/browse/SPARK-14299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
> Fix For: 2.0.0
>
>
> Duplicated code that I found in scala/examples/ml:
> * scala/ml
> ** CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample
> ** TrainValidationSplitExample.scala --> 
> ModelSelectionViaTrainValidationSplitExample
> ** DeveloperApiExample.scala --> I delete it for now because it's only about 
> how to create your own classifier, etc., which can be learned easily from 
> other examples and the ml code.
> ** SimpleParamsExample.scala --> merge with 
> LogisticRegressionSummaryExample.scala
> ** SimpleTextClassificationPipeline.scala --> 
> ModelSelectionViaCrossValidationExample
> ** DataFrameExample.scala --> merge with 
> LogisticRegressionSummaryExample.scala
> * Intended to keep, with command-line support:
> ** DecisionTreeExample.scala --> DecisionTreeRegressionExample, 
> DecisionTreeClassificationExample
> ** GBTExample.scala --> GradientBoostedTreeClassifierExample, 
> GradientBoostedTreeRegressorExample
> ** LinearRegressionExample.scala --> LinearRegressionWithElasticNetExample
> ** LogisticRegressionExample.scala --> 
> LogisticRegressionWithElasticNetExample, LogisticRegressionSummaryExample
> ** RandomForestExample.scala --> RandomForestRegressorExample, 
> RandomForestClassifierExample
> When merging and cleaning that code, be sure not to disturb the existing 
> example on/off blocks.
> I'll take this one as an example. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-14440:
--
Assignee: Xusen Yin

> Remove PySpark ml.pipeline's specific Reader and Writer
> ---
>
> Key: SPARK-14440
> URL: https://issues.apache.org/jira/browse/SPARK-14440
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Xusen Yin
>Assignee: Xusen Yin
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Since 
> PipelineMLWriter/PipelineMLReader/PipelineModelMLWriter/PipelineModelMLReader 
> merely extend JavaMLWriter and JavaMLReader without modifying any attributes 
> or methods, there is no need to keep them, just as with save/load in 
> ml/tuning.py.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-18 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-14440.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12216
[https://github.com/apache/spark/pull/12216]

> Remove PySpark ml.pipeline's specific Reader and Writer
> ---
>
> Key: SPARK-14440
> URL: https://issues.apache.org/jira/browse/SPARK-14440
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Xusen Yin
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Since 
> PipelineMLWriter/PipelineMLReader/PipelineModelMLWriter/PipelineModelMLReader 
> merely extend JavaMLWriter and JavaMLReader without modifying any attributes 
> or methods, there is no need to keep them, just as with save/load in 
> ml/tuning.py.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14715) Provide a way to mask partitions of a Dataset/Dataframe

2016-04-18 Thread Anderson de Andrade (JIRA)
Anderson de Andrade created SPARK-14715:
---

 Summary: Provide a way to mask partitions of a Dataset/Dataframe
 Key: SPARK-14715
 URL: https://issues.apache.org/jira/browse/SPARK-14715
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: Anderson de Andrade


If a Dataset/DataFrame has a custom partitioning by key(s), it would be very 
efficient to simply mask partitions when filtering by the same key(s). This 
feature is already provided by PartitionPruningRDD for RDDs; we need something 
similar in the Dataset/DataFrame space, as sketched below.
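
For reference, a minimal sketch of the existing RDD-side mechanism the issue points to, assuming a hash-partitioned pair RDD (the data and key here are made up):

{code:title=PartitionPruningSketch.scala|borderStyle=solid}
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.rdd.PartitionPruningRDD

// Sketch: with a known partitioner, a lookup by key only needs to scan the
// single partition that can contain that key; all other partitions are masked.
def prunedLookup(sc: SparkContext): Unit = {
  val partitioner = new HashPartitioner(8)
  val data = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
    .partitionBy(partitioner)
  val wantedKey = 2
  val wantedPartition = partitioner.getPartition(wantedKey)
  val pruned = PartitionPruningRDD.create(data, _ == wantedPartition)
  pruned.filter(_._1 == wantedKey).collect().foreach(println)
}
{code}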



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14647) Group SQLContext/HiveContext state into PersistentState

2016-04-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14647.
--
Resolution: Fixed

Issue resolved by pull request 12463
[https://github.com/apache/spark/pull/12463]

> Group SQLContext/HiveContext state into PersistentState
> ---
>
> Key: SPARK-14647
> URL: https://issues.apache.org/jira/browse/SPARK-14647
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> This is analogous to SPARK-13526, which moved some things into 
> `SessionState`. After this issue we'll have an analogous `PersistentState` 
> that groups things to be shared across sessions. This will simplify the 
> constructors of the contexts significantly by allowing us to pass fewer 
> things into the contexts.
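A rough sketch of the intended split (names and fields hypothetical, not the actual classes):

{code:title=StateSketch.scala|borderStyle=solid}
import org.apache.spark.SparkContext

// Illustrative only: state shared across sessions lives in one object that
// every session receives, so context constructors take a single argument.
class PersistentState(val sparkContext: SparkContext) {
  // e.g. external catalog, cached tables, shared listener bus, ...
}

class SessionState(val persistent: PersistentState) {
  // e.g. SQL conf, temporary views, registered UDFs, ...
}
{code}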



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14714:


Assignee: Apache Spark  (was: Joseph K. Bradley)

> PySpark Param TypeConverter arg is not passed by name in some cases
> ---
>
> Key: SPARK-14714
> URL: https://issues.apache.org/jira/browse/SPARK-14714
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> PySpark Param constructors need to pass the TypeConverter argument by name, 
> partly to make sure it is not mistaken for the expectedType arg and partly 
> because we will remove the expectedType arg in 2.1.  In several places, this 
> is not being done correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases

2016-04-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14714:


Assignee: Joseph K. Bradley  (was: Apache Spark)

> PySpark Param TypeConverter arg is not passed by name in some cases
> ---
>
> Key: SPARK-14714
> URL: https://issues.apache.org/jira/browse/SPARK-14714
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> PySpark Param constructors need to pass the TypeConverter argument by name, 
> partly to make sure it is not mistaken for the expectedType arg and partly 
> because we will remove the expectedType arg in 2.1.  In several places, this 
> is not being done correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases

2016-04-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246443#comment-15246443
 ] 

Apache Spark commented on SPARK-14714:
--

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/12480

> PySpark Param TypeConverter arg is not passed by name in some cases
> ---
>
> Key: SPARK-14714
> URL: https://issues.apache.org/jira/browse/SPARK-14714
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> PySpark Param constructors need to pass the TypeConverter argument by name, 
> partly to make sure it is not mistaken for the expectedType arg and partly 
> because we will remove the expectedType arg in 2.1.  In several places, this 
> is not being done correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


