[jira] [Updated] (SPARK-7880) Silent failure if assembly jar is corrupted
[ https://issues.apache.org/jira/browse/SPARK-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-7880:
    Target Version/s: 1.3.2, 1.4.2, 1.5.0  (was: 1.3.2, 1.4.1, 1.5.0)

> Silent failure if assembly jar is corrupted
>
> Key: SPARK-7880
> URL: https://issues.apache.org/jira/browse/SPARK-7880
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 1.3.0
> Reporter: Andrew Or
>
> If you try to run `bin/spark-submit` with a corrupted jar, you get no output
> and your application does not run. We should have an informative message that
> indicates the failure to open the jar instead of silently swallowing it.
> This is caused by this line:
> https://github.com/apache/spark/blob/61664732b25b35f94be35a42cde651cbfd0e02b7/bin/spark-class#L75
[jira] [Updated] (SPARK-5905) Improve RowMatrix user guide and doc.
[ https://issues.apache.org/jira/browse/SPARK-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-5905:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Improve RowMatrix user guide and doc.
>
> Key: SPARK-5905
> URL: https://issues.apache.org/jira/browse/SPARK-5905
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, MLlib
> Affects Versions: 1.3.0
> Reporter: Xiangrui Meng
> Priority: Minor
>
> From mbofb's comment in PR https://github.com/apache/spark/pull/4680:
> {code}
> The description of RowMatrix.computeSVD and mllib-dimensionality-reduction.html
> should be more precise/explicit regarding the m x n matrix. From the current
> description I would conclude that n refers to the rows. According to
> http://math.stackexchange.com/questions/191711/how-many-rows-and-columns-are-in-an-m-x-n-matrix
> this way of describing a matrix is only used in particular domains. As a reader
> interested in applying SVD, I would prefer the more common m x n convention of
> rows x columns (e.g. http://en.wikipedia.org/wiki/Matrix_%28mathematics%29 ),
> which is also used in http://en.wikipedia.org/wiki/Latent_semantic_analysis
> (and within the ARPACK manual:
> "
> N Integer. (INPUT) - Dimension of the eigenproblem.
> NEV Integer. (INPUT) - Number of eigenvalues of OP to be computed. 0 < NEV < N.
> NCV Integer. (INPUT) - Number of columns of the matrix V (less than or equal to N).
> "
> ).
>
> Description of RowMatrix.computeSVD and mllib-dimensionality-reduction.html:
> "We assume n is smaller than m." Is this just a recommendation or a hard
> requirement? The condition does not seem to be checked and does not cause an
> IllegalArgumentException; the processing finishes even though the vectors have
> a higher dimension than the number of vectors.
>
> Description of RowMatrix.computePrincipalComponents, or RowMatrix in general:
> I got an exception:
> java.lang.IllegalArgumentException: Argument with more than 65535 cols: 7949273
>   at org.apache.spark.mllib.linalg.distributed.RowMatrix.checkNumColumns(RowMatrix.scala:131)
>   at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeCovariance(RowMatrix.scala:318)
>   at org.apache.spark.mllib.linalg.distributed.RowMatrix.computePrincipalComponents(RowMatrix.scala:373)
> It would be nice to document this 65535-column restriction (if it still applies in 1.3).
> {code}
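For reference, a minimal Scala sketch of the row/column convention the doc should spell out (assuming an existing SparkContext `sc`; the data values are made up). Each RDD element is one row of the matrix, so m is the number of vectors and n is their dimension:

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// 4 vectors of dimension 3 form a 4 x 3 (m x n, rows x columns) matrix.
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0, 0.0),
  Vectors.dense(0.0, 2.0, 0.0),
  Vectors.dense(0.0, 0.0, 3.0),
  Vectors.dense(1.0, 1.0, 1.0)))
val mat = new RowMatrix(rows)   // m = mat.numRows() = 4, n = mat.numCols() = 3

// computeSVD(k, ...) requires k <= n; the guide's "n is smaller than m"
// assumption refers to the number of columns, not the number of rows.
val svd = mat.computeSVD(2, computeU = true)
println(s"U: ${svd.U.numRows()} x ${svd.U.numCols()}, " +
  s"s: ${svd.s.size}, V: ${svd.V.numRows} x ${svd.V.numCols}")
{code}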
[jira] [Updated] (SPARK-6174) Improve doc: Python ALS, MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6174:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Improve doc: Python ALS, MatrixFactorizationModel
>
> Key: SPARK-6174
> URL: https://issues.apache.org/jira/browse/SPARK-6174
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> The Python docs for recommendation have almost no content except an example.
> Add class, method & attribute descriptions.
[jira] [Updated] (SPARK-6129) Add a section in user guide for model evaluation
[ https://issues.apache.org/jira/browse/SPARK-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6129:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Add a section in user guide for model evaluation
>
> Key: SPARK-6129
> URL: https://issues.apache.org/jira/browse/SPARK-6129
> Project: Spark
> Issue Type: New Feature
> Components: Documentation, MLlib
> Reporter: Xiangrui Meng
>
> We now have evaluation metrics for binary, multiclass, ranking, and multilabel
> in MLlib. It would be nice to have a section in the user guide to summarize them.
[jira] [Updated] (SPARK-6266) PySpark SparseVector missing doc for size, indices, values
[ https://issues.apache.org/jira/browse/SPARK-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6266:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> PySpark SparseVector missing doc for size, indices, values
>
> Key: SPARK-6266
> URL: https://issues.apache.org/jira/browse/SPARK-6266
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Need to add doc for the size, indices, and values attributes.
[jira] [Updated] (SPARK-8016) YARN cluster / client modes have different app names for python
[ https://issues.apache.org/jira/browse/SPARK-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8016:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> YARN cluster / client modes have different app names for python
>
> Key: SPARK-8016
> URL: https://issues.apache.org/jira/browse/SPARK-8016
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Reporter: Andrew Or
> Priority: Minor
> Attachments: python.png
>
> See screenshot.
[jira] [Updated] (SPARK-8050) Make Savable and Loader Java-friendly.
[ https://issues.apache.org/jira/browse/SPARK-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8050:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Make Savable and Loader Java-friendly.
>
> Key: SPARK-8050
> URL: https://issues.apache.org/jira/browse/SPARK-8050
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Minor
>
> Should overload save/load to accept JavaSparkContext.
[jira] [Commented] (SPARK-8807) Add between operator in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614688#comment-14614688 ]

Venkata Vineel commented on SPARK-8807:

[~yu_ishikawa] Can you please add more details on this? I would like to work on this issue. Please consider.

> Add between operator in SparkR
>
> Key: SPARK-8807
> URL: https://issues.apache.org/jira/browse/SPARK-8807
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Yu Ishikawa
>
> Add a between operator in SparkR:
> ```
> df$age between c(1, 2)
> ```
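For context, the Scala DataFrame API already exposes {{Column.between}} (since 1.4), which the requested SparkR operator could mirror. A minimal Scala sketch, assuming a DataFrame `df` with an "age" column:

{code}
import org.apache.spark.sql.functions.col

// Scala analogue of the requested SparkR operator: keep rows with 1 <= age <= 2.
val filtered = df.filter(col("age").between(1, 2))
{code}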
[jira] [Updated] (SPARK-8400) ml.ALS doesn't handle -1 block size
[ https://issues.apache.org/jira/browse/SPARK-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8400:
    Target Version/s: 1.3.2, 1.4.2, 1.5.0  (was: 1.3.2, 1.4.1, 1.5.0)

> ml.ALS doesn't handle -1 block size
>
> Key: SPARK-8400
> URL: https://issues.apache.org/jira/browse/SPARK-8400
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 1.3.1
> Reporter: Xiangrui Meng
>
> Under spark.mllib, if the number of blocks is set to -1, we set the block size
> automatically based on the input partition size. However, this behavior is not
> preserved in the spark.ml API. If a user sets -1 in Spark 1.3, it will not
> work, but no error message is shown.
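A sketch of the discrepancy, assuming an RDD[Rating] named `ratings` for spark.mllib and a DataFrame `df` with user/item/rating columns for spark.ml (variable names are made up for illustration):

{code}
import org.apache.spark.mllib.recommendation.{ALS => MLlibALS, Rating}
import org.apache.spark.ml.recommendation.ALS

// spark.mllib: blocks = -1 means "auto-configure from the input partitioning".
val mllibModel = MLlibALS.train(ratings, /* rank */ 10, /* iterations */ 10,
  /* lambda */ 0.01, /* blocks */ -1)

// spark.ml: the same -1 is passed straight through, so it is neither
// auto-configured nor rejected with a clear error.
val estimator = new ALS()
  .setNumBlocks(-1)   // should be translated like spark.mllib, or fail loudly
  .setRank(10)
  .setMaxIter(10)
val mlModel = estimator.fit(df)
{code}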
[jira] [Updated] (SPARK-8390) Update DirectKafkaWordCount examples to show how offset ranges can be used
[ https://issues.apache.org/jira/browse/SPARK-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8390:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Update DirectKafkaWordCount examples to show how offset ranges can be used
>
> Key: SPARK-8390
> URL: https://issues.apache.org/jira/browse/SPARK-8390
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: Tathagata Das
> Assignee: Cody Koeninger
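A minimal Scala sketch of the kind of addition the example could show (assuming a StreamingContext `ssc`; the broker and topic values are placeholders):

{code}
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val brokers = "broker1:9092"   // placeholder
val topic = "wordcount"        // placeholder

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> brokers), Set(topic))

stream.foreachRDD { rdd =>
  // Each RDD produced by the direct stream carries the Kafka offset ranges it read.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    println(s"${o.topic} partition ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
  }
}
{code}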
[jira] [Updated] (SPARK-8593) History Server doesn't show complete application when one attempt inprogress
[ https://issues.apache.org/jira/browse/SPARK-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8593:
    Target Version/s:   (was: 1.4.1)

> History Server doesn't show complete application when one attempt inprogress
>
> Key: SPARK-8593
> URL: https://issues.apache.org/jira/browse/SPARK-8593
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.4.0
> Reporter: Thomas Graves
>
> The Spark history server doesn't show an application if the first attempt of
> the application is still in progress.
> Here are the files in HDFS:
> -rwxrwx---   3 tgraves hdfs        234 2015-06-24 15:49 sparkhistory/application_1433751980223_18926_1.inprogress
> -rwxrwx---   3 tgraves hdfs    9609450 2015-06-24 15:51 sparkhistory/application_1433751980223_18926_2
> The UI shows them if I set showIncomplete=true.
> Removing the inprogress file allows it to show up when showIncomplete is false.
> It should be smart enough to at least show the second, successful attempt.
[jira] [Updated] (SPARK-8414) Ensure ClosureCleaner actually triggers clean ups
[ https://issues.apache.org/jira/browse/SPARK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8414:
    Target Version/s: 1.5.0  (was: 1.4.1, 1.5.0)

> Ensure ClosureCleaner actually triggers clean ups
>
> Key: SPARK-8414
> URL: https://issues.apache.org/jira/browse/SPARK-8414
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Right now it cleans up old references only through natural GCs, which may not
> occur if the driver has infinite RAM. We should do a periodic GC to make sure
> that we actually do clean things up. Something like once per 30 minutes seems
> relatively inexpensive.
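A minimal sketch of the idea (the interval and the thread handling here are assumptions for illustration, not the actual Spark implementation): schedule a periodic GC on the driver so that weakly-referenced cleanup tasks run even when the heap never fills up.

{code}
import java.util.concurrent.{Executors, TimeUnit}

// Trigger a full GC every 30 minutes so reference-queue-based cleanup fires
// even on a driver with a very large, mostly idle heap.
val gcExecutor = Executors.newSingleThreadScheduledExecutor()
gcExecutor.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = System.gc()
}, 30, 30, TimeUnit.MINUTES)
{code}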
[jira] [Updated] (SPARK-8828) Revert the change of SPARK-5680
[ https://issues.apache.org/jira/browse/SPARK-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8828:
    Component/s: SQL

> Revert the change of SPARK-5680
>
> Key: SPARK-8828
> URL: https://issues.apache.org/jira/browse/SPARK-8828
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.1, 1.4.0
> Reporter: Yin Huai
> Priority: Critical
>
> SPARK-5680 introduced a bug in the sum function. After this change, when all
> input values are null, it returns 0.0 instead of null, which is wrong.
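An illustration of the regression, assuming an existing SparkContext `sc` and SQLContext `sqlContext` (the table and column names are made up):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val schema = StructType(Seq(StructField("x", DoubleType, nullable = true)))
val allNulls = sqlContext.createDataFrame(
  sc.parallelize(Seq(Row(null), Row(null))), schema)
allNulls.registerTempTable("t")

// Standard SQL semantics: SUM over only-NULL input is NULL,
// not 0.0 as returned after SPARK-5680.
sqlContext.sql("SELECT SUM(x) FROM t").show()
{code}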
[jira] [Updated] (SPARK-8747) fix EqualNullSafe for binary type
[ https://issues.apache.org/jira/browse/SPARK-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8747:
    Assignee: Wenchen Fan

> fix EqualNullSafe for binary type
>
> Key: SPARK-8747
> URL: https://issues.apache.org/jira/browse/SPARK-8747
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Minor
> Fix For: 1.5.0
[jira] [Updated] (SPARK-7401) Dot product and squared_distances should be vectorized in Vectors
[ https://issues.apache.org/jira/browse/SPARK-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-7401:
    Assignee: Manoj Kumar

> Dot product and squared_distances should be vectorized in Vectors
>
> Key: SPARK-7401
> URL: https://issues.apache.org/jira/browse/SPARK-7401
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Manoj Kumar
> Assignee: Manoj Kumar
> Fix For: 1.5.0
[jira] [Commented] (SPARK-5133) Feature Importance for Decision Tree (Ensembles)
[ https://issues.apache.org/jira/browse/SPARK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614682#comment-14614682 ]

Peter Prettenhofer commented on SPARK-5133:

[~yalamart] I'm already working on it; I haven't published a PR yet.

> Feature Importance for Decision Tree (Ensembles)
>
> Key: SPARK-5133
> URL: https://issues.apache.org/jira/browse/SPARK-5133
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Reporter: Peter Prettenhofer
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Add feature importance to the decision tree model and tree ensemble models.
> If people are interested in this feature I could implement it given a mentor
> (API decisions, etc.). Please find a description of the feature below:
> Decision trees intrinsically perform feature selection by selecting appropriate
> split points. This information can be used to assess the relative importance of
> a feature.
> Relative feature importance gives valuable insight into a decision tree or tree
> ensemble and can even be used for feature selection.
> More information on feature importance (via decrease in impurity) can be found
> in ESLII (10.13.1) or here [1].
> R's randomForest package uses a different technique for assessing variable
> importance that is based on permutation tests.
> All necessary information to create relative importance scores should be
> available in the tree representation (class Node; split, impurity gain,
> (weighted) number of samples?).
> [1] http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation
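A simplified Scala sketch of the impurity-decrease idea against the spark.mllib tree model (a design sketch only: it sums the information gain per split feature without weighting by the number of samples reaching each node, which, as the description notes, is not currently stored on Node):

{code}
import scala.collection.mutable
import org.apache.spark.mllib.tree.model.{DecisionTreeModel, Node}

// Traverse the tree, accumulate each split's information gain by feature index,
// then normalize so the importances sum to 1.
def featureImportances(model: DecisionTreeModel): Map[Int, Double] = {
  val importances = mutable.Map.empty[Int, Double].withDefaultValue(0.0)
  def visit(node: Node): Unit = {
    if (!node.isLeaf) {
      for (split <- node.split; stats <- node.stats) {
        importances(split.feature) += stats.gain
      }
      node.leftNode.foreach(visit)
      node.rightNode.foreach(visit)
    }
  }
  visit(model.topNode)
  val total = importances.values.sum
  if (total > 0) importances.mapValues(_ / total).toMap else importances.toMap
}
{code}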
[jira] [Commented] (SPARK-8743) Deregister Codahale metrics for streaming when StreamingContext is closed
[ https://issues.apache.org/jira/browse/SPARK-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614681#comment-14614681 ]

Tathagata Das commented on SPARK-8743:

[~neelesh77] Any ETA on this?

> Deregister Codahale metrics for streaming when StreamingContext is closed
>
> Key: SPARK-8743
> URL: https://issues.apache.org/jira/browse/SPARK-8743
> Project: Spark
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 1.4.1
> Reporter: Tathagata Das
> Assignee: Neelesh Srinivas Salian
> Labels: starter
>
> Currently, when the StreamingContext is closed, the registered metrics are not
> deregistered. If another streaming context is started, it throws a warning
> saying that the metrics are already registered.
> The solution is to deregister the metrics when the StreamingContext is stopped.
[jira] [Updated] (SPARK-8788) Java unit test for PCA transformer
[ https://issues.apache.org/jira/browse/SPARK-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-8788:
    Affects Version/s:   (was: 1.5.0)
    Priority: Minor  (was: Major)

[~yanboliang] please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and set JIRA fields more carefully. This can't affect version 1.5, which doesn't exist, and is not Major.

> Java unit test for PCA transformer
>
> Key: SPARK-8788
> URL: https://issues.apache.org/jira/browse/SPARK-8788
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Yanbo Liang
> Priority: Minor
>
> Add Java unit test for PCA transformer.
[jira] [Assigned] (SPARK-8833) Kafka Direct API support offset in zookeeper
[ https://issues.apache.org/jira/browse/SPARK-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8833:
    Assignee:   (was: Apache Spark)

> Kafka Direct API support offset in zookeeper
>
> Key: SPARK-8833
> URL: https://issues.apache.org/jira/browse/SPARK-8833
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: guowei
>
> The Kafka Direct API only supports consuming a topic from the latest or
> earliest offset, but users usually need to resume from the last consumed
> offset when restarting a streaming app.
[jira] [Assigned] (SPARK-8833) Kafka Direct API support offset in zookeeper
[ https://issues.apache.org/jira/browse/SPARK-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8833:
    Assignee: Apache Spark

> Kafka Direct API support offset in zookeeper
>
> Key: SPARK-8833
> URL: https://issues.apache.org/jira/browse/SPARK-8833
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: guowei
> Assignee: Apache Spark
>
> The Kafka Direct API only supports consuming a topic from the latest or
> earliest offset, but users usually need to resume from the last consumed
> offset when restarting a streaming app.
[jira] [Commented] (SPARK-8833) Kafka Direct API support offset in zookeeper
[ https://issues.apache.org/jira/browse/SPARK-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614677#comment-14614677 ]

Apache Spark commented on SPARK-8833:

User 'guowei2' has created a pull request for this issue:
https://github.com/apache/spark/pull/7235

> Kafka Direct API support offset in zookeeper
>
> Key: SPARK-8833
> URL: https://issues.apache.org/jira/browse/SPARK-8833
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: guowei
>
> The Kafka Direct API only supports consuming a topic from the latest or
> earliest offset, but users usually need to resume from the last consumed
> offset when restarting a streaming app.
[jira] [Commented] (SPARK-8646) PySpark does not run on YARN
[ https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614673#comment-14614673 ]

Sean Owen commented on SPARK-8646:

[~j_houg] is the resolution here just that pandas has to be installed if pandas is used?

> PySpark does not run on YARN
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in my own code.
> Reporter: Juliet Hougland
> Attachments: pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, spark1.4-SPARK_HOME-set.log
>
> Running pyspark jobs results in a "no module named pyspark" error when run in
> yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.| https://issues.apache.org/jira/browse/SPARK-6869 ]
> This does not represent a binary compatible change to Spark. Scripts that
> worked on previous Spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.
[jira] [Commented] (SPARK-8833) Kafka Direct API support offset in zookeeper
[ https://issues.apache.org/jira/browse/SPARK-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614670#comment-14614670 ]

Sean Owen commented on SPARK-8833:

No, you actually pass the offsets you want to begin consuming at. Are you looking at {{createDirectStream}}? It's {{fromOffsets}}.

> Kafka Direct API support offset in zookeeper
>
> Key: SPARK-8833
> URL: https://issues.apache.org/jira/browse/SPARK-8833
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: guowei
>
> The Kafka Direct API only supports consuming a topic from the latest or
> earliest offset, but users usually need to resume from the last consumed
> offset when restarting a streaming app.
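A minimal Scala sketch of the {{fromOffsets}} variant Sean refers to (assuming a StreamingContext `ssc`; the broker, topic, and stored offset values are placeholders, and persisting the offsets, e.g. in ZooKeeper or a database, is left to the application):

{code}
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
// Offsets previously saved by the application: topic "events", partition 0, offset 12345.
val fromOffsets = Map(TopicAndPartition("events", 0) -> 12345L)

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
    (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
{code}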
[jira] [Created] (SPARK-8833) Kafka Direct API support offset in zookeeper
guowei created SPARK-8833:

    Summary: Kafka Direct API support offset in zookeeper
    Key: SPARK-8833
    URL: https://issues.apache.org/jira/browse/SPARK-8833
    Project: Spark
    Issue Type: Bug
    Components: Streaming
    Affects Versions: 1.4.0
    Reporter: guowei

The Kafka Direct API only supports consuming a topic from the latest or earliest
offset, but users usually need to resume from the last consumed offset when
restarting a streaming app.
[jira] [Commented] (SPARK-6981) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
[ https://issues.apache.org/jira/browse/SPARK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614645#comment-14614645 ]

Santiago M. Mola commented on SPARK-6981:

Any progress on this?

> [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
>
> Key: SPARK-6981
> URL: https://issues.apache.org/jira/browse/SPARK-6981
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Edoardo Vacchi
> Priority: Minor
>
> In order to simplify extensibility with new strategies from third parties, it
> would be better to factor SparkPlanner and QueryExecution out into their own
> classes. Dependent types add additional, unnecessary complexity; besides,
> HiveContext would benefit from this change as well.
[jira] [Created] (SPARK-8832) insertInto() throws error in sparkR
Amar Gondaliya created SPARK-8832:

    Summary: insertInto() throws error in sparkR
    Key: SPARK-8832
    URL: https://issues.apache.org/jira/browse/SPARK-8832
    Project: Spark
    Issue Type: Bug
    Components: R
    Affects Versions: 1.4.0
    Reporter: Amar Gondaliya

insertInto() is not working; it throws an AssertionError when trying to insert
records from one DataFrame into another.

df1 <- generated from another DataFrame after applying a group-by aggregation (column names: "item", "frequency")
registerTempTable(df1, "df")
df2 <- generated from another DataFrame after applying a group-by aggregation (column names: "item", "frequency")
insertInto(df2, "df", overwrite = T)

This throws an AssertionError.
[jira] [Comment Edited] (SPARK-8818) In should not take Any not Column
[ https://issues.apache.org/jira/browse/SPARK-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614635#comment-14614635 ]

Yu Ishikawa edited comment on SPARK-8818 at 7/6/15 7:08 AM:

[~marmbrus] Is what you want to do like this? https://issues.apache.org/jira/browse/SPARK-8348

was (Author: yuu.ishik...@gmail.com):
[~marmbrus] What you want to do is like that? https://issues.apache.org/jira/browse/SPARK-8348

> In should not take Any not Column
>
> Key: SPARK-8818
> URL: https://issues.apache.org/jira/browse/SPARK-8818
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
>
> This is pretty verbose having to write {{lit(...)}}:
> {code}
> .where('timestamp in (lit(1435897619640L), lit(1435924856812L)))
> {code}
> I think in most cases people using {{in}} will be listing static values, not columns.
[jira] [Commented] (SPARK-8818) In should not take Any not Column
[ https://issues.apache.org/jira/browse/SPARK-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614635#comment-14614635 ]

Yu Ishikawa commented on SPARK-8818:

[~marmbrus] What you want to do is like that? https://issues.apache.org/jira/browse/SPARK-8348

> In should not take Any not Column
>
> Key: SPARK-8818
> URL: https://issues.apache.org/jira/browse/SPARK-8818
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
>
> This is pretty verbose having to write {{lit(...)}}:
> {code}
> .where('timestamp in (lit(1435897619640L), lit(1435924856812L)))
> {code}
> I think in most cases people using {{in}} will be listing static values, not columns.
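A Scala sketch contrasting the current form with the requested shape (assuming a DataFrame `df` with a "timestamp" column; the second call is the proposed overload, not an existing method):

{code}
import org.apache.spark.sql.functions.{col, lit}

// Current form: every literal must be wrapped in lit(...).
df.where(col("timestamp").in(lit(1435897619640L), lit(1435924856812L)))

// Proposed form (does not exist yet): an overload taking Any would wrap
// plain values in lit() internally.
// df.where(col("timestamp").in(1435897619640L, 1435924856812L))
{code}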
[jira] [Commented] (SPARK-8540) KMeans-based outlier detection
[ https://issues.apache.org/jira/browse/SPARK-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614630#comment-14614630 ]

Venkata Vineel commented on SPARK-8540:

[~josephkb] Can I please work on this (if you can mentor me with the design, etc.)?

> KMeans-based outlier detection
>
> Key: SPARK-8540
> URL: https://issues.apache.org/jira/browse/SPARK-8540
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Proposal for K-Means-based outlier detection:
> * Cluster data using K-Means
> * Provide prediction/filtering functionality which returns outliers/anomalies
> ** This can take some threshold parameter which specifies either (a) how far
>    off a point needs to be to be considered an outlier or (b) how many outliers
>    should be returned.
> Note this will require a bit of API design, which should probably be posted
> and discussed on this JIRA before implementation.
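A minimal Scala sketch of the proposal under option (a), a distance threshold (the threshold value and API shape are assumptions; assumes an RDD[Vector] named `data`):

{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Step 1: cluster the data.
val model = KMeans.train(data, /* k */ 5, /* maxIterations */ 20)

// Step 2: score each point by its distance to the nearest cluster center.
def distanceToNearestCenter(point: Vector): Double = {
  val center = model.clusterCenters(model.predict(point))
  math.sqrt(Vectors.sqdist(point, center))
}

// Step 3: flag points farther than a chosen threshold as outliers.
val threshold = 3.0
val outliers: RDD[Vector] = data.filter(distanceToNearestCenter(_) > threshold)
{code}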
[jira] [Commented] (SPARK-6885) Decision trees: predict class probabilities
[ https://issues.apache.org/jira/browse/SPARK-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614629#comment-14614629 ]

Venkata Vineel commented on SPARK-6885:

[~josephkb] Can I work on this? Can you please assign it to me?

> Decision trees: predict class probabilities
>
> Key: SPARK-6885
> URL: https://issues.apache.org/jira/browse/SPARK-6885
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> Under spark.ml, have DecisionTreeClassifier (currently being added) extend
> ProbabilisticClassifier.
[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling
[ https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614627#comment-14614627 ]

Santiago M. Mola commented on SPARK-8636:

[~davies] NULL values are grouped together when using a GROUP BY clause. See
https://en.wikipedia.org/wiki/Null_%28SQL%29#When_two_nulls_are_equal:_grouping.2C_sorting.2C_and_some_set_operations

{quote}
Because SQL:2003 defines all Null markers as being unequal to one another, a
special definition was required in order to group Nulls together when performing
certain operations. SQL defines "any two values that are equal to one another, or
any two Nulls", as "not distinct". This definition of not distinct allows SQL to
group and sort Nulls when the GROUP BY clause (and other keywords that perform
grouping) are used.
{quote}

> CaseKeyWhen has incorrect NULL handling
>
> Key: SPARK-8636
> URL: https://issues.apache.org/jira/browse/SPARK-8636
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Santiago M. Mola
> Labels: starter
>
> The CaseKeyWhen implementation in Spark uses the following equals implementation:
> {code}
> private def equalNullSafe(l: Any, r: Any) = {
>   if (l == null && r == null) {
>     true
>   } else if (l == null || r == null) {
>     false
>   } else {
>     l == r
>   }
> }
> {code}
> This is not correct, since in SQL, NULL is never equal to NULL (actually, it is
> not unequal either). In this case, a NULL value in a CASE WHEN expression should
> never match.
> For example, you can execute this in MySQL:
> {code}
> SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END FROM DUAL;
> {code}
> And the result will be "NULL DOES NOT MATCH".
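A Scala sketch of the comparison semantics the report asks for (this is an illustration of the requested behavior, not the actual Spark patch): NULL on either side never matches, so a NULL CASE key falls through to the ELSE branch.

{code}
// A NULL key or NULL WHEN value is simply "no match"; only non-null,
// equal values match.
def matchesWhenKey(key: Any, whenValue: Any): Boolean =
  if (key == null || whenValue == null) false else key == whenValue
{code}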