[jira] [Updated] (SPARK-8265) Add LinearDataGenerator to pyspark.mllib.utils

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8265:
-
Assignee: Manoj Kumar

> Add LinearDataGenerator to pyspark.mllib.utils
> --
>
> Key: SPARK-8265
> URL: https://issues.apache.org/jira/browse/SPARK-8265
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Assignee: Manoj Kumar
>Priority: Minor
> Fix For: 1.5.0
>
>
> This is useful for testing various linear models in PySpark.
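For reference, a minimal sketch (assuming the existing Scala helper in org.apache.spark.mllib.util is what gets exposed to Python; exact signatures may differ across versions) of the kind of test data this generator provides:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.LinearDataGenerator

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lindata"))

// 1000 examples, 10 features, noise eps = 0.1, 2 partitions: handy input for
// exercising linear models in unit tests.
val data = LinearDataGenerator.generateLinearRDD(sc, 1000, 10, 0.1, 2)
println(data.take(1).mkString)
{code}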



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8764) StringIndexer should take option to handle unseen values

2015-07-01 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611056#comment-14611056
 ] 

holdenk commented on SPARK-8764:


I could do this; I've got another PR with the StringIndexerModel anyway.

> StringIndexer should take option to handle unseen values
> 
>
> Key: SPARK-8764
> URL: https://issues.apache.org/jira/browse/SPARK-8764
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The option should be a Param, probably set to false by default (throwing 
> exception when encountering unseen values).
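A rough sketch of the boolean Param described above (names and placement are purely illustrative, not a merged API):

{code}
import org.apache.spark.ml.param.{BooleanParam, Params}

// Hypothetical mix-in that StringIndexer/StringIndexerModel could share;
// defaults to false, i.e. keep throwing on labels unseen during fitting.
trait HandlesUnseenLabels extends Params {
  final val skipUnseen: BooleanParam = new BooleanParam(this, "skipUnseen",
    "if true, silently skip rows whose label was not seen during fitting")
  setDefault(skipUnseen -> false)
  def getSkipUnseen: Boolean = $(skipUnseen)
}
{code}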



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor

2015-07-01 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611057#comment-14611057
 ] 

holdenk commented on SPARK-8744:


I could do this; I've got another PR with the StringIndexerModel anyway.

> StringIndexerModel should have public constructor
> -
>
> Key: SPARK-8744
> URL: https://issues.apache.org/jira/browse/SPARK-8744
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be helpful to allow users to pass a pre-computed index to create an 
> indexer, rather than always going through StringIndexer to create the model.
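Sketch of the usage this would enable (the public constructor shown is what is being requested here, not something that exists yet at the time of this ticket):

{code}
import org.apache.spark.ml.feature.StringIndexerModel

// Build a model directly from a pre-computed index: the position in the array
// is the numeric index assigned to each label.
val labels = Array("a", "b", "c")
val indexer = new StringIndexerModel(labels)   // requested public constructor
  .setInputCol("category")
  .setOutputCol("categoryIndex")
{code}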



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread holdenk (JIRA)
holdenk created SPARK-8769:
--

 Summary: toLocalIterator should mention it results in many jobs
 Key: SPARK-8769
 URL: https://issues.apache.org/jira/browse/SPARK-8769
 Project: Spark
  Issue Type: Documentation
Reporter: holdenk
Priority: Trivial


toLocalIterator on RDDs should mention that it results in multiple jobs, and 
that if the input was the result of a wide transformation, it should be cached 
to avoid re-computation.
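A minimal sketch of that guidance (the input path is illustrative): toLocalIterator runs one job per partition, so caching the result of a wide transformation avoids recomputing its lineage for every partition fetched.

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("toLocalIterator"))

val grouped = sc.textFile("/path/to/input")   // illustrative path
  .map(line => (line.length, line))
  .groupByKey()                               // wide transformation
  .cache()                                    // avoid recomputation across the per-partition jobs

grouped.toLocalIterator.foreach { case (len, lines) =>
  println(s"$len -> ${lines.size}")
}
{code}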



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-1635) Java API docs do not show annotation.

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-1635.

Resolution: Duplicate

> Java API docs do not show annotation.
> -
>
> Key: SPARK-1635
> URL: https://issues.apache.org/jira/browse/SPARK-1635
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> The generated Java API docs do not render the Developer/Experimental 
> annotations; instead, the raw :: Developer/Experimental :: tag text appears in 
> the generated doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-1564:
---

Assignee: Apache Spark  (was: Andrew Or)

> Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
> -
>
> Key: SPARK-1564
> URL: https://issues.apache.org/jira/browse/SPARK-1564
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matei Zaharia
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-1564:
---

Assignee: Andrew Or  (was: Apache Spark)

> Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
> -
>
> Key: SPARK-1564
> URL: https://issues.apache.org/jira/browse/SPARK-1564
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matei Zaharia
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611024#comment-14611024
 ] 

Apache Spark commented on SPARK-1564:
-

User 'deroneriksson' has created a pull request for this issue:
https://github.com/apache/spark/pull/7169

> Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
> -
>
> Key: SPARK-1564
> URL: https://issues.apache.org/jira/browse/SPARK-1564
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matei Zaharia
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8695:
---

Assignee: Apache Spark  (was: Xiangrui Meng)

> TreeAggregation shouldn't be triggered for 5 partitions
> ---
>
> Key: SPARK-8695
> URL: https://issues.apache.org/jira/browse/SPARK-8695
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Apache Spark
>Priority: Minor
>
> If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock 
> time. Instead, it introduces scheduling and shuffling overhead. We should 
> update the condition to use tree aggregation (code attached):
> {code}
>   while (numPartitions > scale + numPartitions / scale) {
> {code}
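A small, self-contained illustration of the condition quoted above (scale = 2 assumed; this only evaluates the inequality, it is not the MLlib code path):

{code}
val scale = 2
for (numPartitions <- Seq(3, 4, 5, 8, 16)) {
  // Integer division, as in the quoted condition.
  val addsTreeLevel = numPartitions > scale + numPartitions / scale
  println(s"numPartitions = $numPartitions -> extra aggregation level: $addsTreeLevel")
}
// With 5 partitions and scale = 2 the inequality holds (5 > 2 + 5 / 2 = 4), so
// an extra aggregation level would be triggered -- the situation this ticket
// says should be avoided.
{code}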



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8695:
---

Assignee: Xiangrui Meng  (was: Apache Spark)

> TreeAggregation shouldn't be triggered for 5 partitions
> ---
>
> Key: SPARK-8695
> URL: https://issues.apache.org/jira/browse/SPARK-8695
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>
> If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock 
> time. Instead, it introduces scheduling and shuffling overhead. We should 
> update the condition to use tree aggregation (code attached):
> {code}
>   while (numPartitions > scale + numPartitions / scale) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610997#comment-14610997
 ] 

Apache Spark commented on SPARK-8695:
-

User 'piganesh' has created a pull request for this issue:
https://github.com/apache/spark/pull/7168

> TreeAggregation shouldn't be triggered for 5 partitions
> ---
>
> Key: SPARK-8695
> URL: https://issues.apache.org/jira/browse/SPARK-8695
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>
> If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock 
> time. Instead, it introduces scheduling and shuffling overhead. We should 
> update the condition to use tree aggregation (code attached):
> {code}
>   while (numPartitions > scale + numPartitions / scale) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf

2015-07-01 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610988#comment-14610988
 ] 

Josh Rosen commented on SPARK-8768:
---

We didn't notice this earlier because the Master Maven Pre-YARN build was 
misconfigured and was building against the wrong Hadoop versions.

> SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in 
> Akka Protobuf
> -
>
> Key: SPARK-8768
> URL: https://issues.apache.org/jira/browse/SPARK-8768
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.5.0
>Reporter: Josh Rosen
>Priority: Blocker
>
> The end-to-end SparkSubmitSuite tests ("launch simple application with 
> spark-submit", "include jars passed in through --jars", and "include jars 
> passed in through --packages") are currently failing for the pre-YARN Hadoop 
> builds.
> I managed to reproduce one of the Jenkins failures locally:
> {code}
> build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver 
> -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite 
> -Dtest=none
> {code}
> Here's the output from unit-tests.log:
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple 
> application with spark-submit' =
> 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
> Utils: SLF4J: Class path contains multiple SLF4J bindings.
> 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
> Utils: SLF4J: Found binding in 
> [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
> Utils: SLF4J: Found binding in 
> [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
> Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
> Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 
> 1.5.0-SNAPSHOT
> 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: 
> joshrosen
> 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: 
> joshrosen
> 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: 
> authentication disabled; ui acls disabled; users with view permissions: 
> Set(joshrosen); users with modify permissions: Set(joshrosen)
> 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started
> 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting
> 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
> Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from 
> thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down 
> ActorSystem [sparkDriver]
> 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
> Utils: java.lang.VerifyError: class 
> akka.remote.WireFormats$AkkaControlMessage overrides final method 
> getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
> 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at java.lang.ClassLoader.defineClass1(Native Method)
> 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
> Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> 15/07/01 13:40:00.010 r

[jira] [Created] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf

2015-07-01 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-8768:
-

 Summary: SparkSubmitSuite fails on Hadoop 1.x builds due to 
java.lang.VerifyError in Akka Protobuf
 Key: SPARK-8768
 URL: https://issues.apache.org/jira/browse/SPARK-8768
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
Reporter: Josh Rosen
Priority: Blocker


The end-to-end SparkSubmitSuite tests ("launch simple application with 
spark-submit", "include jars passed in through --jars", and "include jars 
passed in through --packages") are currently failing for the pre-YARN Hadoop 
builds.

I managed to reproduce one of the Jenkins failures locally:

{code}
build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver 
-Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite 
-Dtest=none
{code}

Here's the output from unit-tests.log:

{code}
= TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application 
with spark-submit' =

15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
Utils: SLF4J: Class path contains multiple SLF4J bindings.
15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
Utils: SLF4J: Found binding in 
[jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
Utils: SLF4J: Found binding in 
[jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
explanation.
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT
15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen
15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: 
joshrosen
15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(joshrosen); users 
with modify permissions: Set(joshrosen)
15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started
15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting
15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from 
thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down 
ActorSystem [sparkDriver]
15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage 
overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.lang.ClassLoader.defineClass1(Native Method)
15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
Utils:at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.security.AccessController.doPrivileged(Native Method)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
Utils:at java.lang.ClassLoader.loadClass(Clas

[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges

2015-07-01 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610979#comment-14610979
 ] 

Deron Eriksson commented on SPARK-1564:
---

I'm working on this one. I believe this is a duplicate of 
https://issues.apache.org/jira/browse/SPARK-1635

> Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
> -
>
> Key: SPARK-1564
> URL: https://issues.apache.org/jira/browse/SPARK-1564
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Matei Zaharia
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8660) Update comments that contain R statements in ml.logisticRegressionSuite

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610972#comment-14610972
 ] 

Apache Spark commented on SPARK-8660:
-

User 'Rosstin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7167

> Update comments that contain R statements in ml.logisticRegressionSuite
> ---
>
> Key: SPARK-8660
> URL: https://issues.apache.org/jira/browse/SPARK-8660
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: somil deshmukh
>Priority: Trivial
>  Labels: starter
> Fix For: 1.5.0
>
>   Original Estimate: 20m
>  Remaining Estimate: 20m
>
> We put R statements as comments in unit tests. However, there are two issues:
> 1. JavaDoc style "/** ... */" is used instead of a normal multiline comment 
> "/* ... */".
> 2. We put a leading "*" on each line, which makes it hard to copy & paste the 
> commands to/from R and verify the result.
> For example, in 
> https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L504
> {code}
> /**
>  * Using the following R code to load the data and train the model using 
> glmnet package.
>  *
>  * > library("glmnet")
>  * > data <- read.csv("path", header=FALSE)
>  * > label = factor(data$V1)
>  * > features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5))
>  * > weights = coef(glmnet(features,label, family="binomial", alpha = 
> 1.0, lambda = 6.0))
>  * > weights
>  * 5 x 1 sparse Matrix of class "dgCMatrix"
>  *  s0
>  * (Intercept) -0.2480643
>  * data.V2  0.000
>  * data.V3   .
>  * data.V4   .
>  * data.V5   .
>  */
> {code}
> should change to
> {code}
> /*
>   Using the following R code to load the data and train the model using 
> glmnet package.
>  
>   library("glmnet")
>   data <- read.csv("path", header=FALSE)
>   label = factor(data$V1)
>   features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5))
>   weights = coef(glmnet(features,label, family="binomial", alpha = 1.0, 
> lambda = 6.0))
>   weights
>   5 x 1 sparse Matrix of class "dgCMatrix"
>s0
>   (Intercept) -0.2480643
>   data.V2  0.000
>   data.V3   .
>   data.V4   .
>   data.V5   .
> */
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector

2015-07-01 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610970#comment-14610970
 ] 

Feynman Liang commented on SPARK-8703:
--

Took a second pass over the code and I agree with @josephkb; extending 
HashingTF seems pointless, since nothing is reused while the `HashingTF` 
attributes now pollute `CountVectorizer`.

> Add CountVectorizer as a ml transformer to convert document to words count 
> vector
> -
>
> Key: SPARK-8703
> URL: https://issues.apache.org/jira/browse/SPARK-8703
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: yuhao yang
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Converts a text document to a sparse vector of token counts. Similar to 
> http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
> I can further add an estimator that extracts the vocabulary from a corpus if 
> that's appropriate.
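For intuition, a tiny framework-free sketch of the transformation being proposed (no Spark API implied): map a document to token counts over a fixed vocabulary.

{code}
val vocabulary = Seq("spark", "fast", "ml").zipWithIndex.toMap   // term -> index

val doc = "spark is fast and spark is fun"
val counts: Map[Int, Int] = doc.split("\\s+")
  .flatMap(vocabulary.get)                        // keep only in-vocabulary tokens
  .groupBy(identity)
  .map { case (idx, occurrences) => idx -> occurrences.length }

println(counts)   // contains 0 -> 2 and 1 -> 1: "spark" twice, "fast" once, "ml" absent
{code}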



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5016) GaussianMixtureEM should distribute matrix inverse for large numFeatures, k

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610961#comment-14610961
 ] 

Apache Spark commented on SPARK-5016:
-

User 'feynmanliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7166

> GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
> ---
>
> Key: SPARK-5016
> URL: https://issues.apache.org/jira/browse/SPARK-5016
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.2.0
>Reporter: Joseph K. Bradley
>  Labels: clustering
>
> If numFeatures or k are large, GMM EM should distribute the matrix inverse 
> computation for Gaussian initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610942#comment-14610942
 ] 

Apache Spark commented on SPARK-8766:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7165

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8766:
---

Assignee: Davies Liu  (was: Apache Spark)

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8766:
---

Assignee: Apache Spark  (was: Davies Liu)

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7820) Java8-tests suite compile error under SBT

2015-07-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-7820:
--
Assignee: Saisai Shao

> Java8-tests suite compile error under SBT
> -
>
> Key: SPARK-7820
> URL: https://issues.apache.org/jira/browse/SPARK-7820
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Critical
> Fix For: 1.5.0, 1.4.2
>
>
> Lots of compilation errors are shown when the Java 8 test suite is enabled in SBT:
> {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 
> -Dhadoop.version=2.6.0 -Pjava8-tests}}
> {code}
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43:
>  error: cannot find symbol
> [error] public class Java8APISuite extends LocalJavaStreamingContext 
> implements Serializable {
> [error]^
> [error]   symbol: class LocalJavaStreamingContext
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57:
>  error: cannot find symbol
> [error] JavaTestUtils.attachTestOutputStream(letterCount);
> [error] ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
>  error: cannot find symbol
> [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2);
> [error]   ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
>  error: cannot find symbol
> [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2);
> [error]  ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> {code}
> The class {{JavaAPISuite}} relies on {{LocalJavaStreamingContext}}, which 
> exists in the streaming test jar. This is fine for the Maven compile, since 
> Maven generates the test jar, but it fails in the sbt test compile because sbt 
> does not generate a test jar by default.
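For context, the usual sbt-side remedy is a {{test->test}} project dependency, roughly as sketched below (project names are illustrative, not the exact Spark build definition):

{code}
// build.sbt sketch: give java8-tests access to streaming's test classes, the
// sbt analogue of depending on Maven's streaming test-jar.
lazy val streaming = project.in(file("streaming"))

lazy val java8Tests = project.in(file("extras/java8-tests"))
  .dependsOn(streaming % "test->test")
{code}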



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7820) Java8-tests suite compile error under SBT

2015-07-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-7820.
---
   Resolution: Fixed
Fix Version/s: 1.5.0
   1.4.2

Issue resolved by pull request 7120
[https://github.com/apache/spark/pull/7120]

> Java8-tests suite compile error under SBT
> -
>
> Key: SPARK-7820
> URL: https://issues.apache.org/jira/browse/SPARK-7820
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>Priority: Critical
> Fix For: 1.4.2, 1.5.0
>
>
> Lots of compilation errors are shown when the Java 8 test suite is enabled in SBT:
> {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 
> -Dhadoop.version=2.6.0 -Pjava8-tests}}
> {code}
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43:
>  error: cannot find symbol
> [error] public class Java8APISuite extends LocalJavaStreamingContext 
> implements Serializable {
> [error]^
> [error]   symbol: class LocalJavaStreamingContext
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57:
>  error: cannot find symbol
> [error] JavaTestUtils.attachTestOutputStream(letterCount);
> [error] ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
>  error: cannot find symbol
> [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2);
> [error]   ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
>  error: cannot find symbol
> [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2);
> [error]  ^
> [error]   symbol:   variable JavaTestUtils
> [error]   location: class Java8APISuite
> [error] 
> /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73:
>  error: cannot find symbol
> [error] JavaDStream stream = 
> JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
> [error]  ^
> [error]   symbol:   variable ssc
> [error]   location: class Java8APISuite
> {code}
> The class {{JavaAPISuite}} relies on {{LocalJavaStreamingContext}}, which 
> exists in the streaming test jar. This is fine for the Maven compile, since 
> Maven generates the test jar, but it fails in the sbt test compile because sbt 
> does not generate a test jar by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8677) Decimal divide operation throws ArithmeticException

2015-07-01 Thread Jihong MA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610865#comment-14610865
 ] 

Jihong MA commented on SPARK-8677:
--

I am not sure if there is a guideline for DecimalType.Unlimited; can we go for 
accuracy at least equivalent to Double?

> Decimal divide operation throws ArithmeticException
> ---
>
> Key: SPARK-8677
> URL: https://issues.apache.org/jira/browse/SPARK-8677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 1.5.0
>
>
> Please refer to [BigDecimal 
> doc|http://docs.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html]:
> {quote}
> ... the rounding mode setting of a MathContext object with a precision 
> setting of 0 is not used and thus irrelevant. In the case of divide, the 
> exact quotient could have an infinitely long decimal expansion; for example, 
> 1 divided by 3.
> {quote}
> Because we provide a MathContext.UNLIMITED in toBigDecimal, Decimal divide 
> operation will throw the following exception:
> {code}
> val decimal = Decimal(1.0, 10, 3) / Decimal(3.0, 10, 3)
> [info]   java.lang.ArithmeticException: Non-terminating decimal expansion; no 
> exact representable decimal result.
> [info]   at java.math.BigDecimal.divide(BigDecimal.java:1690)
> [info]   at java.math.BigDecimal.divide(BigDecimal.java:1723)
> [info]   at scala.math.BigDecimal.$div(BigDecimal.scala:256)
> [info]   at org.apache.spark.sql.types.Decimal.$div(Decimal.scala:272)
> {code}
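A plain {{java.math}} illustration of the behaviour described above (independent of Spark's Decimal class): dividing under MathContext.UNLIMITED throws for non-terminating quotients, while a bounded-precision context rounds instead.

{code}
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

val one = new JBigDecimal(1)
val three = new JBigDecimal(3)

// Throws java.lang.ArithmeticException: Non-terminating decimal expansion
// one.divide(three, MathContext.UNLIMITED)

// Succeeds by rounding to 38 significant digits.
println(one.divide(three, new MathContext(38, RoundingMode.HALF_UP)))
{code}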



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8677) Decimal divide operation throws ArithmeticException

2015-07-01 Thread Jihong MA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610831#comment-14610831
 ] 

Jihong MA commented on SPARK-8677:
--

Thanks for fixing the division problem, but this fix introduces one more issue 
w.r.t. the accuracy of Decimal computation.

scala> val aa = Decimal(2) / Decimal(3);
aa: org.apache.spark.sql.types.Decimal = 1

When a Decimal is defined as Decimal.Unlimited, we do not expect the division 
result's scale to be inherited from its parent; this causes a big accuracy 
issue once we go a couple of rounds of division over decimal data vs. double 
data. Below is a sample output from my run.

10:27:46.042 WARN 
org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE 
STDDEV DOUBLE---4.0 , 0.8VALUE

10:27:46.137 WARN 
org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE 
STDDEV DECIMAL---4.29000 , 0.858VALUE


> Decimal divide operation throws ArithmeticException
> ---
>
> Key: SPARK-8677
> URL: https://issues.apache.org/jira/browse/SPARK-8677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 1.5.0
>
>
> Please refer to [BigDecimal 
> doc|http://docs.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html]:
> {quote}
> ... the rounding mode setting of a MathContext object with a precision 
> setting of 0 is not used and thus irrelevant. In the case of divide, the 
> exact quotient could have an infinitely long decimal expansion; for example, 
> 1 divided by 3.
> {quote}
> Because we provide a MathContext.UNLIMITED in toBigDecimal, Decimal divide 
> operation will throw the following exception:
> {code}
> val decimal = Decimal(1.0, 10, 3) / Decimal(3.0, 10, 3)
> [info]   java.lang.ArithmeticException: Non-terminating decimal expansion; no 
> exact representable decimal result.
> [info]   at java.math.BigDecimal.divide(BigDecimal.java:1690)
> [info]   at java.math.BigDecimal.divide(BigDecimal.java:1723)
> [info]   at scala.math.BigDecimal.$div(BigDecimal.scala:256)
> [info]   at org.apache.spark.sql.types.Decimal.$div(Decimal.scala:272)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8767) Abstractions for InputColParam, OutputColParam

2015-07-01 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-8767:


 Summary: Abstractions for InputColParam, OutputColParam
 Key: SPARK-8767
 URL: https://issues.apache.org/jira/browse/SPARK-8767
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley


I'd like to create Param subclasses for output and input columns.  These will 
provide easier schema checking, which could even be done automatically in an 
abstraction rather than in each class.  That should simplify things for 
developers.
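A purely hypothetical sketch of what such an abstraction could look like (all names invented for illustration; nothing like this exists at the time of this ticket):

{code}
import org.apache.spark.ml.param.{Param, Params}
import org.apache.spark.sql.types.{DataType, StructType}

// A Param subclass that knows how to validate its column against a schema,
// so each transformer no longer repeats the same check.
class InputColParam(parent: Params, name: String, doc: String, expectedType: DataType)
  extends Param[String](parent, name, doc) {

  def validate(schema: StructType, colName: String): Unit = {
    require(schema.fieldNames.contains(colName), s"Input column '$colName' not found")
    require(schema(colName).dataType == expectedType,
      s"Input column '$colName' must be of type $expectedType")
  }
}
{code}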



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8378) Add Spark Flume Python API

2015-07-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-8378.
--
   Resolution: Fixed
 Assignee: Shixiong Zhu
Fix Version/s: 1.5.0

> Add Spark Flume Python API
> --
>
> Key: SPARK-8378
> URL: https://issues.apache.org/jira/browse/SPARK-8378
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8765:
-
Labels: flaky-test  (was: )

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>  Labels: flaky-test
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8765:
-
Shepherd: Xiangrui Meng

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8765:
-
Assignee: Yanbo Liang

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8647) Potential issues with the constant hashCode

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8647:
-
Assignee: Alok Singh
Target Version/s: 1.5.0

> Potential issues with the constant hashCode 
> 
>
> Key: SPARK-8647
> URL: https://issues.apache.org/jira/browse/SPARK-8647
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Alok Singh
>Assignee: Alok Singh
>Priority: Minor
>  Labels: performance
>
> Hi,
> This may be a potential bug, a performance issue, or just a matter of code docs.
> The issue concerns the MatrixUDT class and matters if we decide to put 
> instances of MatrixUDT into a hash-based collection.
> The hashCode function returns a constant, and even though the equals method is 
> consistent with hashCode, I don't see the reason why hashCode() = 1994 (i.e. a 
> constant) has been used.
> I was expecting it to be similar to the other matrix classes or the vector 
> class.
> If there is a reason for this code, we should document it properly in the code 
> so that it is clear to others reading it.
> regards,
> Alok
> Details
> =
> a)
> In reference to the file 
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
> lines 188-197, i.e.:
> {code}
> override def equals(o: Any): Boolean = {
>   o match {
>     case v: MatrixUDT => true
>     case _ => false
>   }
> }
>
> override def hashCode(): Int = 1994
> {code}
> b) the commit is 
> https://github.com/apache/spark/commit/11e025956be3818c00effef0d650734f8feeb436
> on March 20.
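For illustration only (not necessarily the change eventually adopted upstream), one conventional alternative to a magic constant is deriving the value from the class name, which is stable, self-documenting, and distinct per UDT, while equals stays a class check:

{code}
// Stand-in for MatrixUDT: all instances are interchangeable, so equals is a
// class check and hashCode is derived from the class name instead of 1994.
class ExampleUDT {
  override def equals(o: Any): Boolean = o.isInstanceOf[ExampleUDT]
  override def hashCode(): Int = getClass.getName.hashCode
}
{code}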



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7938) Use errorprone in Spark

2015-07-01 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-7938.
---
Resolution: Won't Fix

> Use errorprone in Spark
> ---
>
> Key: SPARK-7938
> URL: https://issues.apache.org/jira/browse/SPARK-7938
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Reynold Xin
>  Labels: starter
>
> We have quite a bit of low level code written in Java (e.g. unsafe module). 
> One nice thing about Java is that we can use better tools for finding common 
> errors, e.g. Google's error-prone.
> This is a ticket to integrate error-prone into our Maven build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8072) Better AnalysisException for writing DataFrame with identically named columns

2015-07-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-8072:

Shepherd: Michael Armbrust

> Better AnalysisException for writing DataFrame with identically named columns
> -
>
> Key: SPARK-8072
> URL: https://issues.apache.org/jira/browse/SPARK-8072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should check if there are duplicate columns, and if yes, throw an explicit 
> error message saying there are duplicate columns. See current error message 
> below. 
> {code}
> In [3]: df.withColumn('age', df.age)
> Out[3]: DataFrame[age: bigint, name: string, age: bigint]
> In [4]: df.withColumn('age', df.age).write.parquet('test-parquet.out')
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df.withColumn('age', df.age).write.parquet('test-parquet.out')
> /scratch/rxin/spark/python/pyspark/sql/readwriter.py in parquet(self, path, 
> mode)
> 350 >>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))
> 351 """
> --> 352 self._jwrite.mode(mode).parquet(path)
> 353 
> 354 @since(1.4)
> /Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc
>  in __call__(self, *args)
> 535 answer = self.gateway_client.send_command(command)
> 536 return_value = get_return_value(answer, self.gateway_client,
> --> 537 self.target_id, self.name)
> 538 
> 539 for temp_arg in temp_args:
> /Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc
>  in get_return_value(answer, gateway_client, target_id, name)
> 298 raise Py4JJavaError(
> 299 'An error occurred while calling {0}{1}{2}.\n'.
> --> 300 format(target_id, '.', name), value)
> 301 else:
> 302 raise Py4JError(
> Py4JJavaError: An error occurred while calling o35.parquet.
> : org.apache.spark.sql.AnalysisException: Reference 'age' is ambiguous, could 
> be: age#0L, age#3L.;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:279)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:350)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:341)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at 
>

[jira] [Created] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Davies Liu (JIRA)
Davies Liu created SPARK-8766:
-

 Summary: DataFrame Python API should work with column which has 
non-ascii character in it
 Key: SPARK-8766
 URL: https://issues.apache.org/jira/browse/SPARK-8766
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.0, 1.3.1
Reporter: Davies Liu
Assignee: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB

2015-07-01 Thread Murtaza Kanchwala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610799#comment-14610799
 ] 

Murtaza Kanchwala commented on SPARK-6101:
--

No, it is not a map function. For now you can use Amazon's DynamoDB library to 
implement your own data access layer and use Spark transformations and actions 
to add them; I'd prefer you do batch saves and batch loads for more efficiency.
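A rough sketch of that approach (only the Spark side is meant literally; {{DynamoDao}} is a hypothetical data-access layer you would write yourself on top of Amazon's SDK):

{code}
import org.apache.spark.rdd.RDD

// Hypothetical DAO wrapping Amazon's DynamoDB client.
trait DynamoDao extends Serializable {
  def batchSave(items: Seq[Map[String, String]]): Unit
}

def saveToDynamo(rdd: RDD[Map[String, String]], newDao: () => DynamoDao): Unit = {
  rdd.foreachPartition { partition =>
    val dao = newDao()                             // one client per partition, not per record
    partition.grouped(25).foreach(dao.batchSave)   // DynamoDB BatchWriteItem accepts at most 25 items
  }
}
{code}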

> Create a SparkSQL DataSource API implementation for DynamoDB
> 
>
> Key: SPARK-6101
> URL: https://issues.apache.org/jira/browse/SPARK-6101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.5.0
>
>
> similar to https://github.com/databricks/spark-avro  and 
> https://github.com/databricks/spark-csv



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8765:
---

Assignee: Apache Spark

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Critical
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8765:
---

Assignee: (was: Apache Spark)

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610783#comment-14610783
 ] 

Apache Spark commented on SPARK-8765:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/7164

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-8765:


 Summary: Flaky PySpark PowerIterationClustering test
 Key: SPARK-8765
 URL: https://issues.apache.org/jira/browse/SPARK-8765
 Project: Spark
  Issue Type: Test
  Components: MLlib, PySpark
Reporter: Joseph K. Bradley
Priority: Critical


See failure: 
[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]

{code}
**
File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
 line 291, in __main__.PowerIterationClusteringModel
Failed example:
sorted(model.assignments().collect())
Expected:
[Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
Got:
[Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), Assignment(id=2, 
cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, cluster=0)]
**
File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
 line 299, in __main__.PowerIterationClusteringModel
Failed example:
sorted(sameModel.assignments().collect())
Expected:
[Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
Got:
[Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), Assignment(id=2, 
cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, cluster=0)]
**
   2 of  13 in __main__.PowerIterationClusteringModel
***Test Failed*** 2 failures.

Had test failures in pyspark.mllib.clustering with python2.6; see logs.
{code}

CC: [~mengxr] [~yanboliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8308) add missing save load for python doc example and tune down MatrixFactorization iterations

2015-07-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-8308.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 6760
[https://github.com/apache/spark/pull/6760]

> add missing save load for python doc example and tune down 
> MatrixFactorization iterations
> -
>
> Key: SPARK-8308
> URL: https://issues.apache.org/jira/browse/SPARK-8308
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
> Fix For: 1.5.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> 1. add the missing save/load calls to the Python examples for LogisticRegression, 
> LinearRegression, and NaiveBayes
> 2. tune down the number of iterations for MatrixFactorization, since the current 
> number triggers a StackOverflowError.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8308) add missing save load for python doc example and tune down MatrixFactorization iterations

2015-07-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-8308:
-
Assignee: yuhao yang

> add missing save load for python doc example and tune down 
> MatrixFactorization iterations
> -
>
> Key: SPARK-8308
> URL: https://issues.apache.org/jira/browse/SPARK-8308
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> 1. add the missing save/load calls to the Python examples for LogisticRegression, 
> LinearRegression, and NaiveBayes
> 2. tune down the number of iterations for MatrixFactorization, since the current 
> number triggers a StackOverflowError.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6263) Python MLlib API missing items: Utils

2015-07-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-6263.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 5707
[https://github.com/apache/spark/pull/5707]

> Python MLlib API missing items: Utils
> -
>
> Key: SPARK-6263
> URL: https://issues.apache.org/jira/browse/SPARK-6263
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Kai Sasaki
> Fix For: 1.5.0
>
>
> This JIRA lists items missing in the Python API for this sub-package of MLlib.
> This list may be incomplete, so please check again when sending a PR to add 
> these features to the Python API.
> Also, please check for major disparities between documentation; some parts of 
> the Python API are less well-documented than their Scala counterparts.  Some 
> items may be listed in the umbrella JIRA linked to this task.
> MLUtils
> * appendBias
> * kFold
> * loadVectors
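
For reference, a quick sketch of the Scala counterparts that the Python wrappers 
would mirror (the input path is just an example):

{code:scala}
import org.apache.spark.mllib.util.MLUtils

val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
// kFold returns an Array of (training, validation) RDD pairs.
val folds = MLUtils.kFold(data, numFolds = 3, seed = 11)
// appendBias adds a constant 1.0 feature to the end of each vector.
val withBias = data.map(p => MLUtils.appendBias(p.features))
{code}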



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor

2015-07-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610722#comment-14610722
 ] 

Joseph K. Bradley commented on SPARK-8744:
--

Good point, I'll link a JIRA for that.

Also, there needs to be a constructor which does not require a UID, but 
generates one automatically.
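
A minimal sketch of such a constructor, assuming the usual 
{{Identifiable.randomUID}} helper (only the constructor is shown; the real model 
would also extend Model and implement transform/copy):

{code:scala}
import org.apache.spark.ml.util.Identifiable

// Sketch only: a model-like class that takes a pre-computed label index.
class StringIndexerModelSketch(val uid: String, val labels: Array[String]) {
  // Convenience constructor: generate the UID automatically so callers
  // only need to supply the labels.
  def this(labels: Array[String]) = this(Identifiable.randomUID("strIdx"), labels)
}
{code}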

> StringIndexerModel should have public constructor
> -
>
> Key: SPARK-8744
> URL: https://issues.apache.org/jira/browse/SPARK-8744
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be helpful to allow users to pass a pre-computed index to create an 
> indexer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8764) StringIndexer should take option to handle unseen values

2015-07-01 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-8764:


 Summary: StringIndexer should take option to handle unseen values
 Key: SPARK-8764
 URL: https://issues.apache.org/jira/browse/SPARK-8764
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Minor


The option should be a Param, probably set to false by default (i.e., throwing an 
exception when encountering unseen values).
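
A rough sketch of what the Param could look like (names are illustrative only, not 
the final API):

{code:scala}
import org.apache.spark.ml.param.{BooleanParam, Params}

// Sketch only: defaults to false, i.e. keep today's behaviour of throwing on
// unseen labels unless the user explicitly opts in to skipping them.
trait HandleUnseenParamSketch extends Params {
  final val skipUnseenLabels = new BooleanParam(this, "skipUnseenLabels",
    "whether to skip rows whose label was not seen when the indexer was fit")
  setDefault(skipUnseenLabels -> false)
  def getSkipUnseenLabels: Boolean = $(skipUnseenLabels)
}
{code}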



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8744) StringIndexerModel should have public constructor

2015-07-01 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-8744:
-
Description: It would be helpful to allow users to pass a pre-computed 
index to create an indexer, rather than always going through StringIndexer to 
create the model.  (was: It would be helpful to allow users to pass a 
pre-computed index to create an indexer.)

> StringIndexerModel should have public constructor
> -
>
> Key: SPARK-8744
> URL: https://issues.apache.org/jira/browse/SPARK-8744
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be helpful to allow users to pass a pre-computed index to create an 
> indexer, rather than always going through StringIndexer to create the model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method

2015-07-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610720#comment-14610720
 ] 

Joseph K. Bradley commented on SPARK-1503:
--

I appreciate it!

> Implement Nesterov's accelerated first-order method
> ---
>
> Key: SPARK-1503
> URL: https://issues.apache.org/jira/browse/SPARK-1503
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Xiangrui Meng
>Assignee: Aaron Staple
> Attachments: linear.png, linear_l1.png, logistic.png, logistic_l2.png
>
>
> Nesterov's accelerated first-order method is a drop-in replacement for 
> steepest descent but it converges much faster. We should implement this 
> method and compare its performance with existing algorithms, including SGD 
> and L-BFGS.
> TFOCS (http://cvxr.com/tfocs/) is a reference implementation of Nesterov's 
> method and its variants on composite objectives.
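
For readers who want the gist, one common form of the accelerated update for a 
smooth objective f with step size t is (sketch only; the TFOCS variants differ in 
details):

{noformat}
x_k = y_{k-1} - t \nabla f(y_{k-1})
y_k = x_k + \frac{k-1}{k+2} (x_k - x_{k-1})
{noformat}

Plain gradient descent keeps only the first line; the extra momentum term is what 
improves the convergence rate from O(1/k) to O(1/k^2) on smooth convex problems.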



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3071) Increase default driver memory

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-3071:
-
Assignee: Ilya Ganelin

> Increase default driver memory
> --
>
> Key: SPARK-3071
> URL: https://issues.apache.org/jira/browse/SPARK-3071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Xiangrui Meng
>Assignee: Ilya Ganelin
>
> The current default is 512 MB, which is usually too small because the user also 
> uses the driver to do some computation. In local mode the executor memory 
> setting is ignored and only driver memory is used, which is an additional 
> incentive to increase the default driver memory.
> I suggest:
> 1. 2 GB in local mode, and warn users if executor memory is set to a bigger value
> 2. the same as the worker memory on an EC2 standalone server
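
For illustration, a small sketch of the knobs involved today (values are examples 
only). Since the driver JVM's heap must be fixed before it starts, 
spark.driver.memory is normally supplied on the command line, e.g. 
spark-submit --driver-memory 2g, rather than set programmatically:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// In local[*] mode everything runs inside the driver JVM, so spark.executor.memory
// is effectively ignored and the driver memory setting is the one that matters.
val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("mem-check"))
println(sc.getConf.getOption("spark.driver.memory").getOrElse("unset (512m default)"))
{code}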



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3071) Increase default driver memory

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-3071:
-
Affects Version/s: 1.4.2
 Target Version/s: 1.5.0

> Increase default driver memory
> --
>
> Key: SPARK-3071
> URL: https://issues.apache.org/jira/browse/SPARK-3071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.4.2
>Reporter: Xiangrui Meng
>Assignee: Ilya Ganelin
>
> The current default is 512 MB, which is usually too small because the user also 
> uses the driver to do some computation. In local mode the executor memory 
> setting is ignored and only driver memory is used, which is an additional 
> incentive to increase the default driver memory.
> I suggest:
> 1. 2 GB in local mode, and warn users if executor memory is set to a bigger value
> 2. the same as the worker memory on an EC2 standalone server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB

2015-07-01 Thread venu k tangirala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610712#comment-14610712
 ] 

venu k tangirala commented on SPARK-6101:
-

So doing amazon DynamoDB mapper is a spark map function at the end of all my 
transformations would work ?

> Create a SparkSQL DataSource API implementation for DynamoDB
> 
>
> Key: SPARK-6101
> URL: https://issues.apache.org/jira/browse/SPARK-6101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.5.0
>
>
> similar to https://github.com/databricks/spark-avro  and 
> https://github.com/databricks/spark-csv



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6284) Support framework authentication and role in Mesos framework

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6284:
-
Assignee: Timothy Chen
Target Version/s: 1.5.0

> Support framework authentication and role in Mesos framework
> 
>
> Key: SPARK-6284
> URL: https://issues.apache.org/jira/browse/SPARK-6284
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>
> Support framework authentication and role in both coarse-grained and 
> fine-grained mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610701#comment-14610701
 ] 

Apache Spark commented on SPARK-4485:
-

User 'kai-zeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7162

> Add broadcast outer join to  optimize left outer join and right outer join
> --
>
> Key: SPARK-4485
> URL: https://issues.apache.org/jira/browse/SPARK-4485
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: XiaoJing wang
>Assignee: Kai Zeng
>Priority: Critical
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> For now, Spark uses a broadcast join instead of a hash join to optimize an 
> {{inner join}} when the size of one side's data is below the 
> {{AUTO_BROADCASTJOIN_THRESHOLD}}.
> However, Spark SQL still performs shuffle operations on both child relations 
> while executing a {{left outer join}} or {{right outer join}}, even though 
> {{outer join}} is just as suitable for optimization with a broadcast join.
> We are planning to create a {{BroadcastHashOuterJoin}} to implement the 
> broadcast join for {{left outer join}} and {{right outer join}}.
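
For context, once this lands the usage should look roughly like the sketch below 
(the DataFrames and key column are hypothetical, and the {{broadcast}} hint is 
assumed to be available in {{org.apache.spark.sql.functions}}):

{code:scala}
import org.apache.spark.sql.functions.broadcast

// Hint that the small dimension table fits in memory, so the left outer join can
// be executed as a broadcast join instead of shuffling both sides.
val joined = factsDF.join(broadcast(smallDimDF),
  factsDF("key") === smallDimDF("key"), "left_outer")
{code}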



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-4485:
---

Assignee: Apache Spark  (was: Kai Zeng)

> Add broadcast outer join to  optimize left outer join and right outer join
> --
>
> Key: SPARK-4485
> URL: https://issues.apache.org/jira/browse/SPARK-4485
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: XiaoJing wang
>Assignee: Apache Spark
>Priority: Critical
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> For now, Spark uses a broadcast join instead of a hash join to optimize an 
> {{inner join}} when the size of one side's data is below the 
> {{AUTO_BROADCASTJOIN_THRESHOLD}}.
> However, Spark SQL still performs shuffle operations on both child relations 
> while executing a {{left outer join}} or {{right outer join}}, even though 
> {{outer join}} is just as suitable for optimization with a broadcast join.
> We are planning to create a {{BroadcastHashOuterJoin}} to implement the 
> broadcast join for {{left outer join}} and {{right outer join}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-4485:
---

Assignee: Kai Zeng  (was: Apache Spark)

> Add broadcast outer join to  optimize left outer join and right outer join
> --
>
> Key: SPARK-4485
> URL: https://issues.apache.org/jira/browse/SPARK-4485
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: XiaoJing wang
>Assignee: Kai Zeng
>Priority: Critical
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> For now, Spark uses a broadcast join instead of a hash join to optimize an 
> {{inner join}} when the size of one side's data is below the 
> {{AUTO_BROADCASTJOIN_THRESHOLD}}.
> However, Spark SQL still performs shuffle operations on both child relations 
> while executing a {{left outer join}} or {{right outer join}}, even though 
> {{outer join}} is just as suitable for optimization with a broadcast join.
> We are planning to create a {{BroadcastHashOuterJoin}} to implement the 
> broadcast join for {{left outer join}} and {{right outer join}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8621) crosstab exception when one of the value is empty

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-8621.

   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 1.4.2
   1.5.0

> crosstab exception when one of the value is empty
> -
>
> Key: SPARK-8621
> URL: https://issues.apache.org/jira/browse/SPARK-8621
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Wenchen Fan
>Priority: Critical
> Fix For: 1.5.0, 1.4.2
>
>
> I think this happened because some value is empty.
> {code}
> scala> df1.stat.crosstab("role", "lang")
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: ;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.parseAttributeName(LogicalPlan.scala:145)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:135)
>   at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:157)
>   at org.apache.spark.sql.DataFrame.col(DataFrame.scala:603)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:394)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:160)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:157)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:157)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:147)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:132)
>   at 
> org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:132)
>   at 
> org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:91)
> {code}
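
A minimal way to reproduce the failure described above (the data is made up; the 
empty value in the second column is what ends up as an unparsable column name in 
the crosstab output):

{code:scala}
val df1 = sqlContext.createDataFrame(Seq(
  ("dev", ""),          // empty "lang" value becomes an empty column name
  ("mgr", "scala")
)).toDF("role", "lang")

df1.stat.crosstab("role", "lang")   // throws AnalysisException before the fix
{code}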



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8752) Add ExpectsInputTypes trait for defining expected input types.

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-8752.

   Resolution: Fixed
Fix Version/s: 1.5.0

> Add ExpectsInputTypes trait for defining expected input types.
> --
>
> Key: SPARK-8752
> URL: https://issues.apache.org/jira/browse/SPARK-8752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7714) SparkR tests should use more specific expectations than expect_true

2015-07-01 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-7714:
-
Assignee: Sun Rui

> SparkR tests should use more specific expectations than expect_true
> ---
>
> Key: SPARK-7714
> URL: https://issues.apache.org/jira/browse/SPARK-7714
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Josh Rosen
>Assignee: Sun Rui
> Fix For: 1.5.0
>
>
> SparkR's tests use testthat's {{expect_true(foo == bar)}}, but expectations 
> like {{expect_equal(foo, bar)}} give more informative error messages when an 
> assertion fails.  We should update the existing tests to use the more specific 
> matchers, such as expect_equal, expect_is, expect_identical, expect_error, etc.
> See http://r-pkgs.had.co.nz/tests.html for more documentation on testthat 
> expectation functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7714) SparkR tests should use more specific expectations than expect_true

2015-07-01 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-7714.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7152
[https://github.com/apache/spark/pull/7152]

> SparkR tests should use more specific expectations than expect_true
> ---
>
> Key: SPARK-7714
> URL: https://issues.apache.org/jira/browse/SPARK-7714
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Josh Rosen
> Fix For: 1.5.0
>
>
> SparkR's tests use testthat's {{expect_true(foo == bar)}}, but expectations 
> like {{expect_equal(foo, bar)}} give more informative error messages when an 
> assertion fails.  We should update the existing tests to use the more specific 
> matchers, such as expect_equal, expect_is, expect_identical, expect_error, etc.
> See http://r-pkgs.had.co.nz/tests.html for more documentation on testthat 
> expectation functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8763.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7161
[https://github.com/apache/spark/pull/7161]

> executing run-tests.py with Python 2.6 fails with absence of 
> subprocess.check_output function
> -
>
> Key: SPARK-8763
> URL: https://issues.apache.org/jira/browse/SPARK-8763
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 
>Reporter: Tomohiko K.
>  Labels: pyspark, testing
> Fix For: 1.5.0
>
>
> Running run-tests.py with Python 2.6 cause following error:
> {noformat}
> Running PySpark tests. Output is in 
> python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
> Will test against the following Python executables: ['python2.6', 
> 'python3.4', 'pypy']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Traceback (most recent call last):
>   File "./python/run-tests.py", line 196, in 
> main()
>   File "./python/run-tests.py", line 159, in main
> python_implementation = subprocess.check_output(
> AttributeError: 'module' object has no attribute 'check_output'
> ...
> {noformat}
> The cause of this error is using subprocess.check_output function, which 
> exists since Python 2.7.
> (ref. 
> https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-07-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+

2015-07-01 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610589#comment-14610589
 ] 

Shivaram Venkataraman commented on SPARK-8724:
--

Sure. We could add that to the examples. I also think we could add some more 
details on how to launch SparkR if you are not using the bin/sparkR script 
(i.e. from RStudio or from plain R).

> Need documentation on how to deploy or use SparkR in Spark 1.4.0+
> -
>
> Key: SPARK-8724
> URL: https://issues.apache.org/jira/browse/SPARK-8724
> Project: Spark
>  Issue Type: Bug
>  Components: R
>Affects Versions: 1.4.0
>Reporter: Felix Cheung
>Priority: Minor
>
> As of now there doesn't seem to be any official documentation on how to 
> deploy SparkR with Spark 1.4.0+.
> Also, the cluster-manager-specific documentation (like 
> http://spark.apache.org/docs/latest/spark-standalone.html) does not call out 
> which modes are supported for SparkR or give details on the deployment steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6833) Extend `addPackage` so that any given R file can be sourced in the worker before functions are run.

2015-07-01 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-6833.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Thanks [~sunrui] for checking this. We should add documentation for this, but we 
can use another JIRA for that, I guess.

> Extend `addPackage` so that any given R file can be sourced in the worker 
> before functions are run.
> ---
>
> Key: SPARK-6833
> URL: https://issues.apache.org/jira/browse/SPARK-6833
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Priority: Minor
> Fix For: 1.5.0
>
>
> Similar to how extra Python files or packages can be specified (in zip / egg 
> formats), it would be good to support the ability to add extra R files to the 
> executors' working directory.
> One thing that needs to be investigated is whether this will just work out of 
> the box using the spark-submit flag --files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8744) StringIndexerModel should have public constructor

2015-07-01 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610558#comment-14610558
 ] 

yuhao yang edited comment on SPARK-8744 at 7/1/15 4:10 PM:
---

There seems to be more work involved than simply changing the access modifiers, 
since passed-in labels have a larger chance of triggering the "unseen label" 
exception. Perhaps we should address that exception first.


was (Author: yuhaoyan):
Just a reminder:
There seems to be more jobs to do than simply change the access modifiers. 
Since a passed-in labels will have a larger chance to trigger the "unseen 
label" exception. Perhaps we should address the exception first.

> StringIndexerModel should have public constructor
> -
>
> Key: SPARK-8744
> URL: https://issues.apache.org/jira/browse/SPARK-8744
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be helpful to allow users to pass a pre-computed index to create an 
> indexer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor

2015-07-01 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610558#comment-14610558
 ] 

yuhao yang commented on SPARK-8744:
---

Just a reminder:
There seems to be more work to do than simply changing the access modifiers, 
since passed-in labels have a larger chance of triggering the "unseen label" 
exception. Perhaps we should address that exception first.

> StringIndexerModel should have public constructor
> -
>
> Key: SPARK-8744
> URL: https://issues.apache.org/jira/browse/SPARK-8744
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be helpful to allow users to pass a pre-computed index to create an 
> indexer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-01 Thread Mark Stephenson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610555#comment-14610555
 ] 

Mark Stephenson commented on SPARK-8596:


[~cantdutchthis]: we have been getting the same error and it's definitely a 
user-permissions issue.  Even when giving the new RStudio user ownership rights 
to the ./spark folder, there are additional classpath errors. 

We are working on a solution today that logs in to RStudio as the 'hadoop' user 
to start with, just to make sure that the proof of concept works; we will then 
work out a longer-term solution with some potential bootstrap code.  
Will advise once we have it solved.

> Install and configure RStudio server on Spark EC2
> -
>
> Key: SPARK-8596
> URL: https://issues.apache.org/jira/browse/SPARK-8596
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2, SparkR
>Reporter: Shivaram Venkataraman
>
> This will make it convenient for R users to use SparkR from their browsers 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610530#comment-14610530
 ] 

Apache Spark commented on SPARK-8763:
-

User 'cocoatomo' has created a pull request for this issue:
https://github.com/apache/spark/pull/7161

> executing run-tests.py with Python 2.6 fails with absence of 
> subprocess.check_output function
> -
>
> Key: SPARK-8763
> URL: https://issues.apache.org/jira/browse/SPARK-8763
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 
>Reporter: Tomohiko K.
>  Labels: pyspark, testing
>
> Running run-tests.py with Python 2.6 cause following error:
> {noformat}
> Running PySpark tests. Output is in 
> python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
> Will test against the following Python executables: ['python2.6', 
> 'python3.4', 'pypy']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Traceback (most recent call last):
>   File "./python/run-tests.py", line 196, in 
> main()
>   File "./python/run-tests.py", line 159, in main
> python_implementation = subprocess.check_output(
> AttributeError: 'module' object has no attribute 'check_output'
> ...
> {noformat}
> The cause of this error is using subprocess.check_output function, which 
> exists since Python 2.7.
> (ref. 
> https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8763:
---

Assignee: Apache Spark

> executing run-tests.py with Python 2.6 fails with absence of 
> subprocess.check_output function
> -
>
> Key: SPARK-8763
> URL: https://issues.apache.org/jira/browse/SPARK-8763
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 
>Reporter: Tomohiko K.
>Assignee: Apache Spark
>  Labels: pyspark, testing
>
> Running run-tests.py with Python 2.6 cause following error:
> {noformat}
> Running PySpark tests. Output is in 
> python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
> Will test against the following Python executables: ['python2.6', 
> 'python3.4', 'pypy']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Traceback (most recent call last):
>   File "./python/run-tests.py", line 196, in 
> main()
>   File "./python/run-tests.py", line 159, in main
> python_implementation = subprocess.check_output(
> AttributeError: 'module' object has no attribute 'check_output'
> ...
> {noformat}
> The cause of this error is using subprocess.check_output function, which 
> exists since Python 2.7.
> (ref. 
> https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8763:
---

Assignee: (was: Apache Spark)

> executing run-tests.py with Python 2.6 fails with absence of 
> subprocess.check_output function
> -
>
> Key: SPARK-8763
> URL: https://issues.apache.org/jira/browse/SPARK-8763
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 
>Reporter: Tomohiko K.
>  Labels: pyspark, testing
>
> Running run-tests.py with Python 2.6 cause following error:
> {noformat}
> Running PySpark tests. Output is in 
> python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
> Will test against the following Python executables: ['python2.6', 
> 'python3.4', 'pypy']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Traceback (most recent call last):
>   File "./python/run-tests.py", line 196, in 
> main()
>   File "./python/run-tests.py", line 159, in main
> python_implementation = subprocess.check_output(
> AttributeError: 'module' object has no attribute 'check_output'
> ...
> {noformat}
> The cause of this error is using subprocess.check_output function, which 
> exists since Python 2.7.
> (ref. 
> https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8265) Add LinearDataGenerator to pyspark.mllib.utils

2015-07-01 Thread Manoj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Kumar resolved SPARK-8265.

   Resolution: Fixed
Fix Version/s: 1.5.0

> Add LinearDataGenerator to pyspark.mllib.utils
> --
>
> Key: SPARK-8265
> URL: https://issues.apache.org/jira/browse/SPARK-8265
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
> Fix For: 1.5.0
>
>
> This is useful in testing various linear models in pyspark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>

2015-07-01 Thread Alexis Seigneurin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610456#comment-14610456
 ] 

Alexis Seigneurin commented on SPARK-4557:
--

Here: 
https://github.com/aseigneurin/spark/commit/9a4019caf8a3de956635a0030a43f0a5a9f4edbc

And see how I've used it in Java: 
https://gist.github.com/aseigneurin/a200155c89cd0035d0e8

> Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a 
> Function<..., Void>
> ---
>
> Key: SPARK-4557
> URL: https://issues.apache.org/jira/browse/SPARK-4557
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Alexis Seigneurin
>Priority: Minor
>  Labels: starter
>
> In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You 
> have to write:
> {code:java}
> .foreachRDD(items -> {
> ...;
> return null;
> });
> {code}
> Instead of:
> {code:java}
> .foreachRDD(items -> ...);
> {code}
> This is because the foreachRDD method accepts a Function<..., Void> 
> instead of a VoidFunction<...>. It would make sense to change it to a 
> VoidFunction since, in Spark's API, the foreach method already accepts a 
> VoidFunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Tomohiko K. (JIRA)
Tomohiko K. created SPARK-8763:
--

 Summary: executing run-tests.py with Python 2.6 fails with absence 
of subprocess.check_output function
 Key: SPARK-8763
 URL: https://issues.apache.org/jira/browse/SPARK-8763
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.5.0
 Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 
Reporter: Tomohiko K.


Running run-tests.py with Python 2.6 causes the following error:

{noformat}
Running PySpark tests. Output is in 
python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
Will test against the following Python executables: ['python2.6', 'python3.4', 
'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Traceback (most recent call last):
  File "./python/run-tests.py", line 196, in 
main()
  File "./python/run-tests.py", line 159, in main
python_implementation = subprocess.check_output(
AttributeError: 'module' object has no attribute 'check_output'
...
{noformat}

The cause of this error is the use of the subprocess.check_output function, which 
only exists as of Python 2.7.
(ref. 
https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>

2015-07-01 Thread somil deshmukh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610414#comment-14610414
 ] 

somil deshmukh commented on SPARK-4557:
---

Can you provide me with an example use of JavaDStreamLike that I can run to 
check whether it breaks or not?

> Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a 
> Function<..., Void>
> ---
>
> Key: SPARK-4557
> URL: https://issues.apache.org/jira/browse/SPARK-4557
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Alexis Seigneurin
>Priority: Minor
>  Labels: starter
>
> In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You 
> have to write:
> {code:java}
> .foreachRDD(items -> {
> ...;
> return null;
> });
> {code}
> Instead of:
> {code:java}
> .foreachRDD(items -> ...);
> {code}
> This is because the foreachRDD method accepts a Function<..., Void> 
> instead of a VoidFunction<...>. It would make sense to change it to a 
> VoidFunction since, in Spark's API, the foreach method already accepts a 
> VoidFunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8733) ML RDD.unpersist calls should use blocking = false

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8733:
---

Assignee: (was: Apache Spark)

> ML RDD.unpersist calls should use blocking = false
> --
>
> Key: SPARK-8733
> URL: https://issues.apache.org/jira/browse/SPARK-8733
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Joseph K. Bradley
> Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MLlib uses unpersist in many places, but is not consistent about blocking vs 
> not.  We should check through all of MLlib and change calls to use blocking = 
> false, unless there is a real need to block.  I have run into issues with 
> futures timing out because of unpersist() calls, when there was no real need 
> for the ML method to fail.
> See attached screenshot.  Training succeeded, but the final unpersist during 
> cleanup failed.
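
The change itself is a one-liner at each call site; {{trainingData}} below stands 
in for whichever cached RDD the algorithm is releasing:

{code:scala}
// Non-blocking unpersist: returns immediately instead of waiting for every executor
// to confirm removal, so a slow or lost executor cannot fail the ML job during cleanup.
trainingData.unpersist(blocking = false)
{code}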



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8733) ML RDD.unpersist calls should use blocking = false

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610408#comment-14610408
 ] 

Apache Spark commented on SPARK-8733:
-

User 'ilganeli' has created a pull request for this issue:
https://github.com/apache/spark/pull/7160

> ML RDD.unpersist calls should use blocking = false
> --
>
> Key: SPARK-8733
> URL: https://issues.apache.org/jira/browse/SPARK-8733
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Joseph K. Bradley
> Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MLlib uses unpersist in many places, but is not consistent about blocking vs 
> not.  We should check through all of MLlib and change calls to use blocking = 
> false, unless there is a real need to block.  I have run into issues with 
> futures timing out because of unpersist() calls, when there was no real need 
> for the ML method to fail.
> See attached screenshot.  Training succeeded, but the final unpersist during 
> cleanup failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8733) ML RDD.unpersist calls should use blocking = false

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8733:
---

Assignee: Apache Spark

> ML RDD.unpersist calls should use blocking = false
> --
>
> Key: SPARK-8733
> URL: https://issues.apache.org/jira/browse/SPARK-8733
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
> Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MLlib uses unpersist in many places, but is not consistent about blocking vs 
> not.  We should check through all of MLlib and change calls to use blocking = 
> false, unless there is a real need to block.  I have run into issues with 
> futures timing out because of unpersist() calls, when there was no real need 
> for the ML method to fail.
> See attached screenshot.  Training succeeded, but the final unpersist during 
> cleanup failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>

2015-07-01 Thread Alexis Seigneurin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610376#comment-14610376
 ] 

Alexis Seigneurin commented on SPARK-4557:
--

Yes, but the problem is not compiling Spark's code with this API change.

The problem is compiling Java code that uses the updated JavaDStreamLike 
interface: either the Java code does not compile, or you break the API. 
Neither is ideal.

> Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a 
> Function<..., Void>
> ---
>
> Key: SPARK-4557
> URL: https://issues.apache.org/jira/browse/SPARK-4557
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Alexis Seigneurin
>Priority: Minor
>  Labels: starter
>
> In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You 
> have to write:
> {code:java}
> .foreachRDD(items -> {
> ...;
> return null;
> });
> {code}
> Instead of:
> {code:java}
> .foreachRDD(items -> ...);
> {code}
> This is because the foreachRDD method accepts a Function<..., Void> 
> instead of a VoidFunction<...>. It would make sense to change it to a 
> VoidFunction since, in Spark's API, the foreach method already accepts a 
> VoidFunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4557) Spark Streaming's foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>

2015-07-01 Thread somil deshmukh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610334#comment-14610334
 ] 

somil deshmukh commented on SPARK-4557:
---

In JavaDStreamLike.scala, I have replaced JFunction(R, Void) with 
JVoidFunction(R), and compiled the code with no errors using Java 7.

> Spark Streaming's foreachRDD method should accept a VoidFunction<...>, not a 
> Function<..., Void>
> ---
>
> Key: SPARK-4557
> URL: https://issues.apache.org/jira/browse/SPARK-4557
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Alexis Seigneurin
>Priority: Minor
>  Labels: starter
>
> In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You 
> have to write:
> {code:java}
> .foreachRDD(items -> {
> ...;
> return null;
> });
> {code}
> Instead of:
> {code:java}
> .foreachRDD(items -> ...);
> {code}
> This is because the foreachRDD method accepts a Function<..., Void> instead of 
> a VoidFunction<...>. It would make sense to change it to a VoidFunction since, 
> in Spark's API, the foreach method already accepts a VoidFunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8762) Maven build fails if the project is in a symlinked folder

2015-07-01 Thread Roman Zenka (JIRA)
Roman Zenka created SPARK-8762:
--

 Summary: Maven build fails if the project is in a symlinked folder
 Key: SPARK-8762
 URL: https://issues.apache.org/jira/browse/SPARK-8762
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: CentOS, Java 1.7, Maven 3.3.3
Reporter: Roman Zenka
Priority: Minor


The build was failing mysteriously in the spark-core module with the following error:

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on 
project spark-core_2.10: Compilation failure: Compilation failure:
[ERROR] 
/mnt/jenkins/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/DeveloperApi.java:[35,8]
 error: duplicate class: org.apache.spark.annotation.DeveloperApi
[ERROR] 
/mnt/jenkins/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/Experimental.java:[36,8]
 error: duplicate class: org.apache.spark.annotation.Experimental
[ERROR] 
/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/AlphaComponent.java:[33,8]
 error: duplicate class: org.apache.spark.annotation.AlphaComponent
[ERROR] 
/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/Private.java:[41,8]
 error: duplicate class: org.apache.spark.annotation.Private
[ERROR] -> [Help 1]

The /var/lib/jenkins folder is actually a symlink to 
/mnt/jenkins/var/lib/jenkins. This confuses the compiler, which seems to resolve 
some paths and leave others intact, leading to the same class appearing 
"twice" during compilation.

The workaround is to always point the build at the physical folder and never 
build through a symlink. I have not determined the precise source of the error, 
but it is likely inside Maven.

The fix could be as simple as mentioning this issue in the FAQ so that others 
running into it can resolve it quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8734) Expose all Mesos DockerInfo options to Spark

2015-07-01 Thread Chris Heller (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610289#comment-14610289
 ] 

Chris Heller commented on SPARK-8734:
-

I've started work on this at 
https://github.com/hellertime/spark/tree/feature/SPARK-8734

Once I have all the fields in place I'll submit a PR, but those eager to try it 
out should feel free to fetch and merge.

> Expose all Mesos DockerInfo options to Spark
> 
>
> Key: SPARK-8734
> URL: https://issues.apache.org/jira/browse/SPARK-8734
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Chris Heller
>Priority: Minor
>
> SPARK-2691 only exposed a few options from the DockerInfo message. It would 
> be reasonable to expose them all, especially given that one can now specify 
> arbitrary parameters to Docker.
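
For reference, a minimal SparkConf sketch of the options I believe SPARK-2691 already exposes (image, volumes, port mappings); the image name and paths are placeholder values:

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.mesos.executor.docker.image", "example/spark-executor:1.4.0")    // placeholder image
  .set("spark.mesos.executor.docker.volumes", "/host/data:/container/data:ro") // host:container:mode
  .set("spark.mesos.executor.docker.portmaps", "8080:80:tcp")                  // host:container:proto
{code}

The remaining DockerInfo fields (for example, the arbitrary docker parameters mentioned above) would presumably be exposed through additional properties in the same namespace.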



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-8291) Add parse functionality to LabeledPoint in PySpark

2015-07-01 Thread Manoj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Kumar closed SPARK-8291.
--
Resolution: Won't Fix

> Add parse functionality to LabeledPoint in PySpark
> --
>
> Key: SPARK-8291
> URL: https://issues.apache.org/jira/browse/SPARK-8291
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> It is useful to have functionality that can parse a string into a 
> LabeledPoint while loading files, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6602) Replace direct use of Akka with Spark RPC interface

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610274#comment-14610274
 ] 

Apache Spark commented on SPARK-6602:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/7159

> Replace direct use of Akka with Spark RPC interface
> ---
>
> Key: SPARK-6602
> URL: https://issues.apache.org/jira/browse/SPARK-6602
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Shixiong Zhu
>Priority: Critical
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8755:
---

Assignee: (was: Apache Spark)

> Streaming application from checkpoint will fail to load in security mode.
> -
>
> Key: SPARK-8755
> URL: https://issues.apache.org/jira/browse/SPARK-8755
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>
> If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not 
> need to *kinit* on the client machine.
> But when the application is recovered from a checkpoint file, it has to 
> *kinit*, because:
> the checkpoint recovery does not apply these configurations before it uses a 
> DFSClient to fetch the checkpoint file.
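
For illustration, a minimal sketch of the configuration involved; the principal, keytab path, and checkpoint directory are placeholders. The point is that getOrCreate must read the checkpoint from HDFS before the settings below can take effect, so that read currently requires a kinit'd client:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("checkpointed-app")
  .set("spark.yarn.principal", "user@EXAMPLE.COM")   // placeholder principal
  .set("spark.yarn.keytab", "/path/to/user.keytab")  // placeholder keytab path

// getOrCreate first fetches the checkpoint via a DFSClient; per this issue that
// client does not pick up the principal/keytab configured above.
val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", () => {
  val newSsc = new StreamingContext(conf, Seconds(10))
  newSsc.checkpoint("hdfs:///checkpoints/app")
  newSsc
})
{code}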



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610065#comment-14610065
 ] 

Apache Spark commented on SPARK-8755:
-

User 'SaintBacchus' has created a pull request for this issue:
https://github.com/apache/spark/pull/7158

> Streaming application from checkpoint will fail to load in security mode.
> -
>
> Key: SPARK-8755
> URL: https://issues.apache.org/jira/browse/SPARK-8755
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>
> If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not 
> need to *kinit* on the client machine.
> But when the application is recovered from a checkpoint file, it has to 
> *kinit*, because:
> the checkpoint recovery does not apply these configurations before it uses a 
> DFSClient to fetch the checkpoint file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8755:
---

Assignee: Apache Spark

> Streaming application from checkpoint will fail to load in security mode.
> -
>
> Key: SPARK-8755
> URL: https://issues.apache.org/jira/browse/SPARK-8755
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>Assignee: Apache Spark
>
> If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not 
> need to *kinit* on the client machine.
> But when the application is recovered from a checkpoint file, it has to 
> *kinit*, because:
> the checkpoint recovery does not apply these configurations before it uses a 
> DFSClient to fetch the checkpoint file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.

2015-07-01 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-8755:

Description: 
If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not 
need to *kinit* on the client machine.
But when the application is recovered from a checkpoint file, it has to *kinit*, 
because:
the checkpoint recovery does not apply these configurations before it uses a 
DFSClient to fetch the checkpoint file.

  was:
If the user set *spark.yarn.principal* and *spark.yarn.keytab* , he does not 
need *kinit* in the client machine.
But the application was recorved from checkpoint file, it had to *kinit*, 
because:
 the checkpoint did not use this configurations before it use a DFSClient to 
fetch the ckeckpoint file.


> Streaming application from checkpoint will fail to load in security mode.
> -
>
> Key: SPARK-8755
> URL: https://issues.apache.org/jira/browse/SPARK-8755
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>
> If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not 
> need to *kinit* on the client machine.
> But when the application is recovered from a checkpoint file, it has to 
> *kinit*, because:
> the checkpoint recovery does not apply these configurations before it uses a 
> DFSClient to fetch the checkpoint file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method

2015-07-01 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610018#comment-14610018
 ] 

Kai Sasaki commented on SPARK-1503:
---

[~staple] [~josephkb] Thank you for the ping and the inspiring information! I'll 
rewrite the current patch based on your logic and code. Thanks a lot.


> Implement Nesterov's accelerated first-order method
> ---
>
> Key: SPARK-1503
> URL: https://issues.apache.org/jira/browse/SPARK-1503
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Xiangrui Meng
>Assignee: Aaron Staple
> Attachments: linear.png, linear_l1.png, logistic.png, logistic_l2.png
>
>
> Nesterov's accelerated first-order method is a drop-in replacement for 
> steepest descent but it converges much faster. We should implement this 
> method and compare its performance with existing algorithms, including SGD 
> and L-BFGS.
> TFOCS (http://cvxr.com/tfocs/) is a reference implementation of Nesterov's 
> method and its variants on composite objectives.
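
For readers unfamiliar with the method, one standard constant-step formulation of Nesterov's acceleration for a smooth convex objective f (not necessarily the exact variant TFOCS implements) is:

{noformat}
y_k     = x_k + \frac{k - 1}{k + 2} (x_k - x_{k-1})
x_{k+1} = y_k - t \nabla f(y_k)
{noformat}

That is, the gradient step is taken from an extrapolated "momentum" point y_k rather than from x_k itself, which improves the worst-case convergence rate from O(1/k) for plain gradient descent to O(1/k^2).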



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8761) Master.removeApplication is not thread-safe but is called from multiple threads

2015-07-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-8761:
---

 Summary: Master.removeApplication is not thread-safe but is called 
from multiple threads
 Key: SPARK-8761
 URL: https://issues.apache.org/jira/browse/SPARK-8761
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Reporter: Shixiong Zhu


Master.removeApplication is not thread-safe, but it is called both from the 
Master's message loop and from MasterPage.handleAppKillRequest, which runs in 
the web server's threads.
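
Not Spark's actual fix, just a generic sketch of one way to make such mutations safe: funnel them through a single-threaded executor (a stand-in for the Master's message loop) so that only one thread ever touches the shared application state. All names below are hypothetical:

{code:scala}
import java.util.concurrent.Executors
import scala.collection.mutable

object SingleThreadedRemovalSketch {
  private val apps = mutable.Map("app-20150701-0001" -> "RUNNING")
  private val messageLoop = Executors.newSingleThreadExecutor()

  // Called from web-server threads: submit the mutation instead of performing it here.
  def removeApplication(appId: String): Unit =
    messageLoop.submit(new Runnable {
      override def run(): Unit = { apps.remove(appId) }
    })

  def main(args: Array[String]): Unit = {
    removeApplication("app-20150701-0001")
    messageLoop.shutdown()
  }
}
{code}

The alternative is simply to synchronize removeApplication, at the cost of contention between the message loop and the web-server threads.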



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8760) allow moving and symlinking binaries

2015-07-01 Thread Philipp Angerer (JIRA)
Philipp Angerer created SPARK-8760:
--

 Summary: allow moving and symlinking binaries
 Key: SPARK-8760
 URL: https://issues.apache.org/jira/browse/SPARK-8760
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Spark Shell, Spark Submit, SparkR
Affects Versions: 1.4.0
Reporter: Philipp Angerer
Priority: Minor


The following line is used to determine {{$SPARK_HOME}} in all binaries:

{code:none}
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
{code}

However, users should be able to override this, and symlinks should be followed:

{code:none}
if [[ -z "$SPARK_HOME" ]]; then
export SPARK_HOME="$(dirname "$(readlink -f "$0")")"
fi
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8723) improve code gen for divide and remainder

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8723:
-
Assignee: Wenchen Fan

> improve code gen for divide and remainder
> -
>
> Key: SPARK-8723
> URL: https://issues.apache.org/jira/browse/SPARK-8723
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8727) Add missing python api

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8727:
-
Assignee: Tarek Auel

> Add missing python api
> --
>
> Key: SPARK-8727
> URL: https://issues.apache.org/jira/browse/SPARK-8727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Tarek Auel
>Assignee: Tarek Auel
> Fix For: 1.5.0
>
>
> Add the Python API that is missing for
> https://issues.apache.org/jira/browse/SPARK-8248
> https://issues.apache.org/jira/browse/SPARK-8234
> https://issues.apache.org/jira/browse/SPARK-8217
> https://issues.apache.org/jira/browse/SPARK-8215
> https://issues.apache.org/jira/browse/SPARK-8212



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8692) re-order the case statements that handling catalyst data types

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8692:
-
Assignee: Wenchen Fan

> re-order the case statements that handling catalyst data types 
> ---
>
> Key: SPARK-8692
> URL: https://issues.apache.org/jira/browse/SPARK-8692
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8615) sql programming guide recommends deprecated code

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8615:
-
Assignee: Tijo Thomas

> sql programming guide recommends deprecated code
> 
>
> Key: SPARK-8615
> URL: https://issues.apache.org/jira/browse/SPARK-8615
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: Gergely Svigruha
>Assignee: Tijo Thomas
>Priority: Minor
> Fix For: 1.5.0
>
>
> The Spark 1.4 SQL programming guide has example code showing how to use JDBC 
> tables:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> sqlContext.load("jdbc", Map(...))
> However, this code compiles with a deprecation warning that recommends doing 
> this instead:
>  sqlContext.read.format("jdbc").options(Map(...)).load()
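
For concreteness, the non-deprecated form with the options spelled out, assuming an existing sqlContext as in the guide; the URL, table, and driver values are placeholders:

{code:scala}
val jdbcDF = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql://dbhost:5432/dbname",  // placeholder connection URL
    "dbtable" -> "schema.tablename",                   // placeholder table name
    "driver" -> "org.postgresql.Driver"))              // placeholder JDBC driver class
  .load()
{code}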



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8590) add code gen for ExtractValue

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8590:
-
Assignee: Wenchen Fan

> add code gen for ExtractValue
> -
>
> Key: SPARK-8590
> URL: https://issues.apache.org/jira/browse/SPARK-8590
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8589) cleanup DateTimeUtils

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8589:
-
Assignee: Wenchen Fan

> cleanup DateTimeUtils
> -
>
> Key: SPARK-8589
> URL: https://issues.apache.org/jira/browse/SPARK-8589
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8535) PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8535:
-
Assignee: Yuri Saito

> PySpark : Can't create DataFrame from Pandas dataframe with no explicit 
> column name
> ---
>
> Key: SPARK-8535
> URL: https://issues.apache.org/jira/browse/SPARK-8535
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.4.0
>Reporter: Christophe Bourguignat
>Assignee: Yuri Saito
> Fix For: 1.5.0
>
>
> Trying to create a Spark DataFrame from a pandas dataframe with no explicit 
> column names:
> pandasDF = pd.DataFrame([[1, 2], [5, 6]])
> sparkDF = sqlContext.createDataFrame(pandasDF)
> ***
> > 1 sparkDF = sqlContext.createDataFrame(pandasDF)
> /usr/local/Cellar/apache-spark/1.4.0/libexec/python/pyspark/sql/context.pyc 
> in createDataFrame(self, data, schema, samplingRatio)
> 344 
> 345 jrdd = 
> self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
> --> 346 df = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), 
> schema.json())
> 347 return DataFrame(df, self)
> 348 
> /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 536 answer = self.gateway_client.send_command(command)
> 537 return_value = get_return_value(answer, self.gateway_client,
> --> 538 self.target_id, self.name)
> 539 
> 540 for temp_arg in temp_args:
> /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)
> 298 raise Py4JJavaError(
> 299 'An error occurred while calling {0}{1}{2}.\n'.
> --> 300 format(target_id, '.', name), value)
> 301 else:
> 302 raise Py4JError(
> Py4JJavaError: An error occurred while calling o87.applySchemaToPythonRDD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8236) misc function: crc32

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8236:
-
Assignee: Tarek Auel

> misc function: crc32
> 
>
> Key: SPARK-8236
> URL: https://issues.apache.org/jira/browse/SPARK-8236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Tarek Auel
> Fix For: 1.5.0
>
>
> crc32(string/binary): bigint
> Computes a cyclic redundancy check value for the string or binary argument and 
> returns a bigint value (as of Hive 1.3.0). Example: crc32('ABC') = 2743272264.
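
A minimal sketch of the expected semantics using the JVM's built-in CRC-32 implementation (assuming UTF-8 encoding of the string argument):

{code:scala}
import java.util.zip.CRC32

val crc = new CRC32()
crc.update("ABC".getBytes("UTF-8"))
println(crc.getValue)  // should print 2743272264, matching the Hive example above
{code}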



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8235) misc function: sha1 / sha

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8235:
-
Assignee: Tarek Auel

> misc function: sha1 / sha
> -
>
> Key: SPARK-8235
> URL: https://issues.apache.org/jira/browse/SPARK-8235
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Tarek Auel
> Fix For: 1.5.0
>
>
> sha1(string/binary): string
> sha(string/binary): string
> Calculates the SHA-1 digest for the string or binary argument and returns the 
> value as a hex string (as of Hive 1.3.0). Example: sha1('ABC') = 
> '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8031) Version number written to Hive metastore is "0.13.1aa" instead of "0.13.1a"

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8031:
-
Assignee: Cheng Lian

> Version number written to Hive metastore is "0.13.1aa" instead of "0.13.1a"
> ---
>
> Key: SPARK-8031
> URL: https://issues.apache.org/jira/browse/SPARK-8031
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1, 1.4.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Trivial
> Fix For: 1.5.0
>
>
> While debugging {{CliSuite}} for 1.4.0-SNAPSHOT, I noticed the following WARN 
> log line:
> {noformat}
> 15/06/02 13:40:29 WARN ObjectStore: Version information not found in 
> metastore. hive.metastore.schema.verification is not enabled so recording the 
> schema version 0.13.1aa
> {noformat}
> The problem is that the version of the Hive dependencies 1.4.0-SNAPSHOT uses 
> is {{0.13.1a}} (the one shaded by [~pwendell]), but the version shown in this 
> line is {{0.13.1aa}} (one more {{a}}). The WARN log itself is OK since 
> {{CliSuite}} initializes a brand new temporary Derby metastore.
> While initializing Hive metastore, Hive calls {{ObjectStore.checkSchema()}} 
> and may write the "short" version string to metastore. This short version 
> string is defined by {{hive.version.shortname}} in the POM. However, [it was 
> defined as 
> {{0.13.1aa}}|https://github.com/pwendell/hive/commit/32e515907f0005c7a28ee388eadd1c94cf99b2d4#diff-600376dffeb79835ede4a0b285078036R62].
>  Confirmed with [~pwendell] that it should be a typo.
> This doesn't cause any trouble for now, but we probably want to fix this in 
> the future if we ever need to release another shaded version of Hive 0.13.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3258) Python API for streaming MLlib algorithms

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3258:
-
Assignee: Manoj Kumar

> Python API for streaming MLlib algorithms
> -
>
> Key: SPARK-3258
> URL: https://issues.apache.org/jira/browse/SPARK-3258
> Project: Spark
>  Issue Type: Umbrella
>  Components: MLlib, PySpark, Streaming
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
> Fix For: 1.5.0
>
>
> This is an umbrella JIRA to track Python port of streaming MLlib algorithms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7810) rdd.py "_load_from_socket" cannot load data from jvm socket if ipv6 is used

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-7810:
-
Assignee: Ai He

> rdd.py "_load_from_socket" cannot load data from jvm socket if ipv6 is used
> ---
>
> Key: SPARK-7810
> URL: https://issues.apache.org/jira/browse/SPARK-7810
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.3.1
>Reporter: Ai He
>Assignee: Ai He
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
>
> The "_load_from_socket" method in rdd.py cannot load data from the JVM socket 
> if IPv6 is used. The current method only works with IPv4; the modification 
> should make it work with both protocols.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8759) add default eval to binary and unary expression according to default behavior of nullable

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8759:
---

Assignee: (was: Apache Spark)

> add default eval to binary and unary expression according to default behavior 
> of nullable
> -
>
> Key: SPARK-8759
> URL: https://issues.apache.org/jira/browse/SPARK-8759
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8731) Beeline doesn't work with -e option when started in background

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8731:
-
Component/s: SQL

> Beeline doesn't work with -e option when started in background
> --
>
> Key: SPARK-8731
> URL: https://issues.apache.org/jira/browse/SPARK-8731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Wang Yiguang
>Priority: Minor
>
> Beeline stops when run in the background like this:
> beeline -e "some query" &
> It doesn't work even with the -f switch.
> For example, this works:
> beeline -u "jdbc:hive2://0.0.0.0:8000" -e "show databases;" 
> but this does not:
> beeline -u "jdbc:hive2://0.0.0.0:8000" -e "show databases;"  &



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


