[jira] [Updated] (SPARK-8265) Add LinearDataGenerator to pyspark.mllib.utils
[ https://issues.apache.org/jira/browse/SPARK-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8265: - Assignee: Manoj Kumar > Add LinearDataGenerator to pyspark.mllib.utils > -- > > Key: SPARK-8265 > URL: https://issues.apache.org/jira/browse/SPARK-8265 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Assignee: Manoj Kumar >Priority: Minor > Fix For: 1.5.0 > > > This is useful in testing various linear models in pyspark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
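For context, Scala MLlib already ships org.apache.spark.mllib.util.LinearDataGenerator; this ticket asks to expose the equivalent in PySpark. A minimal Scala sketch of the existing generator the Python wrapper would mirror (the generateLinearInput arguments are intercept, weights, number of points, and seed as in the Scala 1.4 API; sc is assumed to be an existing SparkContext):
{code}
import org.apache.spark.mllib.util.LinearDataGenerator

// Generate 100 labeled points with y = 0.5 + 1.0*x1 + 2.0*x2 + Gaussian noise:
// arguments are (intercept, weights, nPoints, seed).
val points = LinearDataGenerator.generateLinearInput(0.5, Array(1.0, 2.0), 100, 42)

// Parallelize into an RDD[LabeledPoint] for testing a linear model.
val data = sc.parallelize(points, 2)
{code}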
[jira] [Commented] (SPARK-8764) StringIndexer should take option to handle unseen values
[ https://issues.apache.org/jira/browse/SPARK-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611056#comment-14611056 ] holdenk commented on SPARK-8764: I could do this, I've got another PR with the StringIndexerModel anyway. > StringIndexer should take option to handle unseen values > > > Key: SPARK-8764 > URL: https://issues.apache.org/jira/browse/SPARK-8764 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > Original Estimate: 72h > Remaining Estimate: 72h > > The option should be a Param, probably set to false by default (throwing > an exception when encountering unseen values). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
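A rough sketch of the behavior being proposed, with a made-up flag name (skipUnseenLabels) used purely for illustration; the eventual Param name and semantics are up to the PR. By default an unseen label raises an error, otherwise the row is dropped:
{code}
import org.apache.spark.SparkException

// Hypothetical helper only, not the actual StringIndexer implementation.
val labelToIndex: Map[String, Double] = Map("a" -> 0.0, "b" -> 1.0)

def indexLabel(label: String, skipUnseenLabels: Boolean): Option[Double] =
  labelToIndex.get(label) match {
    case Some(idx) => Some(idx)
    case None if skipUnseenLabels => None // silently drop rows with unseen labels
    case None => throw new SparkException(s"Unseen label: $label")
  }
{code}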
[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor
[ https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611057#comment-14611057 ] holdenk commented on SPARK-8744: I could do this, I've got another PR with the StringIndexerModel anyway. > StringIndexerModel should have public constructor > - > > Key: SPARK-8744 > URL: https://issues.apache.org/jira/browse/SPARK-8744 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > Original Estimate: 48h > Remaining Estimate: 48h > > It would be helpful to allow users to pass a pre-computed index to create an > indexer, rather than always going through StringIndexer to create the model. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
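A sketch of how the requested constructor could be used once it is public; the constructor shown here is what the ticket proposes, not an existing API at the time of writing:
{code}
// Build the model directly from a pre-computed label ordering,
// instead of fitting a StringIndexer on a DataFrame first.
val labels = Array("US", "UK", "DE") // indices 0, 1, 2

val model = new org.apache.spark.ml.feature.StringIndexerModel(labels) // proposed constructor
  .setInputCol("country")
  .setOutputCol("countryIndex")
{code}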
[jira] [Created] (SPARK-8769) toLocalIterator should mention it results in many jobs
holdenk created SPARK-8769: -- Summary: toLocalIterator should mention it results in many jobs Key: SPARK-8769 URL: https://issues.apache.org/jira/browse/SPARK-8769 Project: Spark Issue Type: Documentation Reporter: holdenk Priority: Trivial toLocalIterator on RDDs should mention that it results in multiple jobs, and that if the input was the result of a wide transformation, it should be cached to avoid re-computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
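A small sketch of the caveat this documentation should describe (sc is an existing SparkContext and the path is illustrative only):
{code}
// toLocalIterator launches one job per partition, so cache the input first
// when it comes from an expensive/wide transformation to avoid re-computation.
val counts = sc.textFile("hdfs:///logs/*")
  .map(line => (line.take(10), 1))
  .reduceByKey(_ + _)   // wide transformation
  .cache()              // computed once, reused by each per-partition job

for (record <- counts.toLocalIterator) {
  println(record)
}
{code}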
[jira] [Closed] (SPARK-1635) Java API docs do not show annotation.
[ https://issues.apache.org/jira/browse/SPARK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-1635. Resolution: Duplicate > Java API docs do not show annotation. > - > > Key: SPARK-1635 > URL: https://issues.apache.org/jira/browse/SPARK-1635 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Xiangrui Meng >Priority: Minor > > The generated Java API docs do not contain Developer/Experimental > annotations. The :: Developer/Experimental :: tag is in the generated doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1564: --- Assignee: Apache Spark (was: Andrew Or) > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1564: --- Assignee: Andrew Or (was: Apache Spark) > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611024#comment-14611024 ] Apache Spark commented on SPARK-1564: - User 'deroneriksson' has created a pull request for this issue: https://github.com/apache/spark/pull/7169 > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions
[ https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8695: --- Assignee: Apache Spark (was: Xiangrui Meng) > TreeAggregation shouldn't be triggered for 5 partitions > --- > > Key: SPARK-8695 > URL: https://issues.apache.org/jira/browse/SPARK-8695 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Apache Spark >Priority: Minor > > If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock > time. Instead, it introduces scheduling and shuffling overhead. We should > update the condition to use tree aggregation (code attached): > {code} > while (numPartitions > scale + numPartitions / scale) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions
[ https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8695: --- Assignee: Xiangrui Meng (was: Apache Spark) > TreeAggregation shouldn't be triggered for 5 partitions > --- > > Key: SPARK-8695 > URL: https://issues.apache.org/jira/browse/SPARK-8695 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > > If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock > time. Instead, it introduces scheduling and shuffling overhead. We should > update the condition to use tree aggregation (code attached): > {code} > while (numPartitions > scale + numPartitions / scale) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8695) TreeAggregation shouldn't be triggered for 5 partitions
[ https://issues.apache.org/jira/browse/SPARK-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610997#comment-14610997 ] Apache Spark commented on SPARK-8695: - User 'piganesh' has created a pull request for this issue: https://github.com/apache/spark/pull/7168 > TreeAggregation shouldn't be triggered for 5 partitions > --- > > Key: SPARK-8695 > URL: https://issues.apache.org/jira/browse/SPARK-8695 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > > If an RDD has 5 partitions, tree aggregation doesn't reduce the wall-clock > time. Instead, it introduces scheduling and shuffling overhead. We should > update the condition to use tree aggregation (code attached): > {code} > while (numPartitions > scale + numPartitions / scale) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
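The arithmetic behind the report, worked out for 5 partitions and the default depth of 2. The scale formula is the one treeAggregate already uses; the ceiling variant at the end is only one way to illustrate a possible fix, not necessarily the merged change:
{code}
val numPartitions = 5
val depth = 2
val scale = math.max(math.ceil(math.pow(numPartitions, 1.0 / depth)).toInt, 2) // = 3

// Current condition: 5 > 3 + 5/3 = 3 + 1 = 4 is true, so an extra tree level is
// added even though it cannot reduce wall-clock time for 5 partitions.
val addsLevelNow = numPartitions > scale + numPartitions / scale // true

// With a ceiling instead of integer division: 5 > 3 + 2 = 5 is false, no extra level.
val addsLevelWithCeil =
  numPartitions > scale + math.ceil(numPartitions.toDouble / scale).toInt // false
{code}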
[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610988#comment-14610988 ] Josh Rosen commented on SPARK-8768: --- We didn't notice this earlier because the Master Maven Pre-YARN build was misconfigured and was building against the wrong Hadoop versions. > SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in > Akka Protobuf > - > > Key: SPARK-8768 > URL: https://issues.apache.org/jira/browse/SPARK-8768 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.5.0 >Reporter: Josh Rosen >Priority: Blocker > > The end-to-end SparkSubmitSuite tests ("launch simple application with > spark-submit", "include jars passed in through --jars", and "include jars > passed in through --packages") are currently failing for the pre-YARN Hadoop > builds. > I managed to reproduce one of the Jenkins failures locally: > {code} > build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver > -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite > -Dtest=none > {code} > Here's the output from unit-tests.log: > {code} > = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple > application with spark-submit' = > 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO > Utils: SLF4J: Class path contains multiple SLF4J bindings. > 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO > Utils: SLF4J: Found binding in > [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] > 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO > Utils: SLF4J: Found binding in > [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] > 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO > Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO > Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version > 1.5.0-SNAPSHOT > 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: > joshrosen > 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: > joshrosen > 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: > authentication disabled; ui acls disabled; users with view permissions: > Set(joshrosen); users with modify permissions: Set(joshrosen) > 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started > 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting > 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO > Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from > thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down > ActorSystem [sparkDriver] > 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO > Utils: java.lang.VerifyError: class > akka.remote.WireFormats$AkkaControlMessage overrides final method > getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; > 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO > Utils:at java.lang.ClassLoader.defineClass1(Native Method) > 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO > Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) > 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO > Utils:at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO > Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) > 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO > Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO > Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > 15/07/01 13:40:00.010 r
[jira] [Created] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
Josh Rosen created SPARK-8768: - Summary: SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests ("launch simple application with spark-submit", "include jars passed in through --jars", and "include jars passed in through --packages") are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command 
./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.AccessController.doPrivileged(Native Method) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.loadClass(Clas
[jira] [Commented] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610979#comment-14610979 ] Deron Eriksson commented on SPARK-1564: --- I'm working on this one. I believe this is a duplicate of https://issues.apache.org/jira/browse/SPARK-1635 > Add JavaScript into Javadoc to turn ::Experimental:: and such into badges > - > > Key: SPARK-1564 > URL: https://issues.apache.org/jira/browse/SPARK-1564 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Matei Zaharia >Assignee: Andrew Or >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8660) Update comments that contain R statements in ml.logisticRegressionSuite
[ https://issues.apache.org/jira/browse/SPARK-8660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610972#comment-14610972 ] Apache Spark commented on SPARK-8660: - User 'Rosstin' has created a pull request for this issue: https://github.com/apache/spark/pull/7167 > Update comments that contain R statements in ml.logisticRegressionSuite > --- > > Key: SPARK-8660 > URL: https://issues.apache.org/jira/browse/SPARK-8660 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: somil deshmukh >Priority: Trivial > Labels: starter > Fix For: 1.5.0 > > Original Estimate: 20m > Remaining Estimate: 20m > > We put R statements as comments in unit test. However, there are two issues: > 1. JavaDoc style "/** ... */" is used instead of normal multiline comment "/* > ... */". > 2. We put a leading "*" on each line. It is hard to copy & paste the commands > to/from R and verify the result. > For example, in > https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L504 > {code} > /** > * Using the following R code to load the data and train the model using > glmnet package. > * > * > library("glmnet") > * > data <- read.csv("path", header=FALSE) > * > label = factor(data$V1) > * > features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5)) > * > weights = coef(glmnet(features,label, family="binomial", alpha = > 1.0, lambda = 6.0)) > * > weights > * 5 x 1 sparse Matrix of class "dgCMatrix" > * s0 > * (Intercept) -0.2480643 > * data.V2 0.000 > * data.V3 . > * data.V4 . > * data.V5 . > */ > {code} > should change to > {code} > /* > Using the following R code to load the data and train the model using > glmnet package. > > library("glmnet") > data <- read.csv("path", header=FALSE) > label = factor(data$V1) > features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5)) > weights = coef(glmnet(features,label, family="binomial", alpha = 1.0, > lambda = 6.0)) > weights > 5 x 1 sparse Matrix of class "dgCMatrix" >s0 > (Intercept) -0.2480643 > data.V2 0.000 > data.V3 . > data.V4 . > data.V5 . > */ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector
[ https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610970#comment-14610970 ] Feynman Liang commented on SPARK-8703: -- Took a second pass over the code and I agree with @josephkb; extending HashingTF seems pointless given that nothing is reused but the `HashingTF` attributes are now polluting `CountVectorizer`. > Add CountVectorizer as a ml transformer to convert document to words count > vector > - > > Key: SPARK-8703 > URL: https://issues.apache.org/jira/browse/SPARK-8703 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: yuhao yang > Original Estimate: 24h > Remaining Estimate: 24h > > Converts a text document to a sparse vector of token counts. Similar to > http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html > I can further add an estimator to extract vocabulary from corpus if that's > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
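For reference, the core counting step such a transformer would wrap, sketched with a fixed vocabulary and MLlib's sparse vectors; the surrounding ml.feature API is intentionally left out:
{code}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val vocabulary = Array("spark", "ml", "vector")
val termIndex: Map[String, Int] = vocabulary.zipWithIndex.toMap

// Count the tokens of one document against the vocabulary and emit a sparse count vector.
def countTokens(tokens: Seq[String]): Vector = {
  val counts = scala.collection.mutable.Map.empty[Int, Double]
  tokens.foreach { t =>
    termIndex.get(t).foreach(i => counts(i) = counts.getOrElse(i, 0.0) + 1.0)
  }
  Vectors.sparse(vocabulary.length, counts.toSeq)
}

countTokens(Seq("spark", "spark", "vector")) // (3,[0,2],[2.0,1.0])
{code}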
[jira] [Commented] (SPARK-5016) GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
[ https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610961#comment-14610961 ] Apache Spark commented on SPARK-5016: - User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/7166 > GaussianMixtureEM should distribute matrix inverse for large numFeatures, k > --- > > Key: SPARK-5016 > URL: https://issues.apache.org/jira/browse/SPARK-5016 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.2.0 >Reporter: Joseph K. Bradley > Labels: clustering > > If numFeatures or k are large, GMM EM should distribute the matrix inverse > computation for Gaussian initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
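A sketch of the general idea rather than the GaussianMixtureEM internals: ship the k covariance matrices to the cluster and invert them in parallel with Breeze. Here sc is an existing SparkContext and the identity matrices are placeholders for real covariances:
{code}
import breeze.linalg.{inv, DenseMatrix => BDM}

// Placeholder covariance matrices: k = 10 Gaussians over 500 features.
val covs: Array[BDM[Double]] = Array.fill(10)(BDM.eye[Double](500))

// Distribute the inversions instead of computing them serially on the driver.
val inverses: Array[BDM[Double]] =
  sc.parallelize(covs, covs.length).map(c => inv(c)).collect()
{code}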
[jira] [Commented] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it
[ https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610942#comment-14610942 ] Apache Spark commented on SPARK-8766: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7165 > DataFrame Python API should work with column which has non-ascii character in > it > > > Key: SPARK-8766 > URL: https://issues.apache.org/jira/browse/SPARK-8766 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.0 >Reporter: Davies Liu >Assignee: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it
[ https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8766: --- Assignee: Davies Liu (was: Apache Spark) > DataFrame Python API should work with column which has non-ascii character in > it > > > Key: SPARK-8766 > URL: https://issues.apache.org/jira/browse/SPARK-8766 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.0 >Reporter: Davies Liu >Assignee: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it
[ https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8766: --- Assignee: Apache Spark (was: Davies Liu) > DataFrame Python API should work with column which has non-ascii character in > it > > > Key: SPARK-8766 > URL: https://issues.apache.org/jira/browse/SPARK-8766 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.0 >Reporter: Davies Liu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7820) Java8-tests suite compile error under SBT
[ https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-7820: -- Assignee: Saisai Shao > Java8-tests suite compile error under SBT > - > > Key: SPARK-7820 > URL: https://issues.apache.org/jira/browse/SPARK-7820 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Affects Versions: 1.4.0 >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Critical > Fix For: 1.5.0, 1.4.2 > > > Lots of compilation error is shown when java 8 test suite is enabled in SBT: > {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 > -Dhadoop.version=2.6.0 -Pjava8-tests}} > {code} > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43: > error: cannot find symbol > [error] public class Java8APISuite extends LocalJavaStreamingContext > implements Serializable { > [error]^ > [error] symbol: class LocalJavaStreamingContext > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57: > error: cannot find symbol > [error] JavaTestUtils.attachTestOutputStream(letterCount); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: > error: cannot find symbol > [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: > error: cannot find symbol > [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > {code} > The class {{JavaAPISuite}} relies on {{LocalJavaStreamingContext}} which > exists in streaming test jar. It is OK for maven compile, since it will > generate test jar, but will be failed in sbt test compile, sbt do not > generate test jar by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7820) Java8-tests suite compile error under SBT
[ https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-7820. --- Resolution: Fixed Fix Version/s: 1.5.0 1.4.2 Issue resolved by pull request 7120 [https://github.com/apache/spark/pull/7120] > Java8-tests suite compile error under SBT > - > > Key: SPARK-7820 > URL: https://issues.apache.org/jira/browse/SPARK-7820 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Affects Versions: 1.4.0 >Reporter: Saisai Shao >Priority: Critical > Fix For: 1.4.2, 1.5.0 > > > Lots of compilation error is shown when java 8 test suite is enabled in SBT: > {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 > -Dhadoop.version=2.6.0 -Pjava8-tests}} > {code} > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43: > error: cannot find symbol > [error] public class Java8APISuite extends LocalJavaStreamingContext > implements Serializable { > [error]^ > [error] symbol: class LocalJavaStreamingContext > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57: > error: cannot find symbol > [error] JavaTestUtils.attachTestOutputStream(letterCount); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: > error: cannot find symbol > [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: > error: cannot find symbol > [error] List> result = JavaTestUtils.runStreams(ssc, 2, 2); > [error] ^ > [error] symbol: variable JavaTestUtils > [error] location: class Java8APISuite > [error] > /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73: > error: cannot find symbol > [error] JavaDStream stream = > JavaTestUtils.attachTestInputStream(ssc, inputData, 1); > [error] ^ > [error] symbol: variable ssc > [error] location: class Java8APISuite > {code} > The class {{JavaAPISuite}} relies on {{LocalJavaStreamingContext}} which > exists in streaming test jar. It is OK for maven compile, since it will > generate test jar, but will be failed in sbt test compile, sbt do not > generate test jar by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
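For the record, the usual SBT remedy for "test classes from another module are not visible" is a test->test inter-project dependency. A hedged sketch follows; the project names and layout are illustrative, not Spark's actual build definition:
{code}
// Build definition sketch: let java8-tests compile against streaming's test classes.
lazy val streaming = project in file("streaming")

lazy val java8Tests = (project in file("extras/java8-tests"))
  .dependsOn(streaming % "compile->compile;test->test")
{code}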
[jira] [Commented] (SPARK-8677) Decimal divide operation throws ArithmeticException
[ https://issues.apache.org/jira/browse/SPARK-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610865#comment-14610865 ] Jihong MA commented on SPARK-8677: -- I am not sure if there is guideline for DecimalType.Unlimited, can we go for an accuracy at least equivalent to Double? > Decimal divide operation throws ArithmeticException > --- > > Key: SPARK-8677 > URL: https://issues.apache.org/jira/browse/SPARK-8677 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 1.5.0 > > > Please refer to [BigDecimal > doc|http://docs.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html]: > {quote} > ... the rounding mode setting of a MathContext object with a precision > setting of 0 is not used and thus irrelevant. In the case of divide, the > exact quotient could have an infinitely long decimal expansion; for example, > 1 divided by 3. > {quote} > Because we provide a MathContext.UNLIMITED in toBigDecimal, Decimal divide > operation will throw the following exception: > {code} > val decimal = Decimal(1.0, 10, 3) / Decimal(3.0, 10, 3) > [info] java.lang.ArithmeticException: Non-terminating decimal expansion; no > exact representable decimal result. > [info] at java.math.BigDecimal.divide(BigDecimal.java:1690) > [info] at java.math.BigDecimal.divide(BigDecimal.java:1723) > [info] at scala.math.BigDecimal.$div(BigDecimal.scala:256) > [info] at org.apache.spark.sql.types.Decimal.$div(Decimal.scala:272) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8677) Decimal divide operation throws ArithmeticException
[ https://issues.apache.org/jira/browse/SPARK-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610831#comment-14610831 ] Jihong MA commented on SPARK-8677: -- Thanks for fixing the division problem, but this fix introduces one more issue w.r.t. the accuracy of Decimal computation. scala> val aa = Decimal(2) / Decimal(3); aa: org.apache.spark.sql.types.Decimal = 1 When a Decimal is defined as Decimal.Unlimited, we do not expect the division result's scale value to be inherited from its parents; this causes a big accuracy issue once we go a couple of rounds of division over decimal data vs. double data. Below is a sample output from my run. 10:27:46.042 WARN org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE STDDEV DOUBLE---4.0 , 0.8VALUE 10:27:46.137 WARN org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE STDDEV DECIMAL---4.29000 , 0.858VALUE > Decimal divide operation throws ArithmeticException > --- > > Key: SPARK-8677 > URL: https://issues.apache.org/jira/browse/SPARK-8677 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 1.5.0 > > > Please refer to [BigDecimal > doc|http://docs.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html]: > {quote} > ... the rounding mode setting of a MathContext object with a precision > setting of 0 is not used and thus irrelevant. In the case of divide, the > exact quotient could have an infinitely long decimal expansion; for example, > 1 divided by 3. > {quote} > Because we provide a MathContext.UNLIMITED in toBigDecimal, Decimal divide > operation will throw the following exception: > {code} > val decimal = Decimal(1.0, 10, 3) / Decimal(3.0, 10, 3) > [info] java.lang.ArithmeticException: Non-terminating decimal expansion; no > exact representable decimal result. > [info] at java.math.BigDecimal.divide(BigDecimal.java:1690) > [info] at java.math.BigDecimal.divide(BigDecimal.java:1723) > [info] at scala.math.BigDecimal.$div(BigDecimal.scala:256) > [info] at org.apache.spark.sql.types.Decimal.$div(Decimal.scala:272) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
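The underlying JDK behavior both comments refer to, shown with plain java.math.BigDecimal; DECIMAL128 is used here only to illustrate that a bounded MathContext makes the division terminate, not as the precision Spark should pick:
{code}
import java.math.{MathContext, BigDecimal => JBigDecimal}

val one = new JBigDecimal(1)
val three = new JBigDecimal(3)

// one.divide(three, MathContext.UNLIMITED) // ArithmeticException: Non-terminating decimal expansion
val q = one.divide(three, MathContext.DECIMAL128) // 0.3333333333333333333333333333333333 (34 digits)
{code}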
[jira] [Created] (SPARK-8767) Abstractions for InputColParam, OutputColParam
Joseph K. Bradley created SPARK-8767: Summary: Abstractions for InputColParam, OutputColParam Key: SPARK-8767 URL: https://issues.apache.org/jira/browse/SPARK-8767 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley I'd like to create Param subclasses for output and input columns. These will provide easier schema checking, which could even be done automatically in an abstraction rather than in each class. That should simplify things for developers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8378) Add Spark Flume Python API
[ https://issues.apache.org/jira/browse/SPARK-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-8378. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 1.5.0 > Add Spark Flume Python API > -- > > Key: SPARK-8378 > URL: https://issues.apache.org/jira/browse/SPARK-8378 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8765: - Labels: flaky-test (was: ) > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang >Priority: Critical > Labels: flaky-test > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8765: - Shepherd: Xiangrui Meng > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang >Priority: Critical > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8765: - Assignee: Yanbo Liang > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Yanbo Liang >Priority: Critical > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8647) Potential issues with the constant hashCode
[ https://issues.apache.org/jira/browse/SPARK-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8647: - Assignee: Alok Singh Target Version/s: 1.5.0 > Potential issues with the constant hashCode > > > Key: SPARK-8647 > URL: https://issues.apache.org/jira/browse/SPARK-8647 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.4.0 >Reporter: Alok Singh >Assignee: Alok Singh >Priority: Minor > Labels: performance > > Hi, > This may be a potential bug, a performance issue, or just a code-documentation issue. > The issue is w.r.t. the MatrixUDT class. > Suppose we decide to put instances of MatrixUDT into a hash-based collection. > The hashCode function returns a constant, and even though the equals method is > consistent with hashCode, I don't see why hashCode() = 1994 (i.e. a > constant) has been used. > I was expecting it to be similar to the other matrix classes or the vector > class. > If there is a reason for this code, we should document it properly > in the code so that others reading it understand. > regards, > Alok > Details > = > a) > In reference to the file > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala > lines 188-197, i.e. > override def equals(o: Any): Boolean = { > o match { > case v: MatrixUDT => true > case _ => false > } > } > override def hashCode(): Int = 1994 > b) the commit is > https://github.com/apache/spark/commit/11e025956be3818c00effef0d650734f8feeb436 > on March 20. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
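For what it's worth, any constant satisfies the hashCode contract here, because equals treats all MatrixUDT instances as equal; a less surprising convention than a magic number is hashing the class name. This is illustrated on a stand-in class and is not a claim about what Spark does:
{code}
// Stand-in for MatrixUDT, for illustration only.
class AllEqualUDT {
  override def equals(o: Any): Boolean = o.isInstanceOf[AllEqualUDT]
  override def hashCode(): Int = classOf[AllEqualUDT].getName.hashCode()
}

// Equal objects -> equal hash codes, with no unexplained 1994.
assert(new AllEqualUDT() == new AllEqualUDT())
assert(new AllEqualUDT().hashCode == new AllEqualUDT().hashCode)
{code}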
[jira] [Resolved] (SPARK-7938) Use errorprone in Spark
[ https://issues.apache.org/jira/browse/SPARK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-7938. --- Resolution: Won't Fix > Use errorprone in Spark > --- > > Key: SPARK-7938 > URL: https://issues.apache.org/jira/browse/SPARK-7938 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin > Labels: starter > > We have quite a bit of low level code written in Java (e.g. unsafe module). > One nice thing about Java is that we can use better tools for finding common > errors, e.g. Google's error prone. > This is a ticket to integrate error pone into our Maven build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8072) Better AnalysisException for writing DataFrame with identically named columns
[ https://issues.apache.org/jira/browse/SPARK-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-8072: Shepherd: Michael Armbrust > Better AnalysisException for writing DataFrame with identically named columns > - > > Key: SPARK-8072 > URL: https://issues.apache.org/jira/browse/SPARK-8072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Priority: Blocker > > We should check if there are duplicate columns, and if yes, throw an explicit > error message saying there are duplicate columns. See current error message > below. > {code} > In [3]: df.withColumn('age', df.age) > Out[3]: DataFrame[age: bigint, name: string, age: bigint] > In [4]: df.withColumn('age', df.age).write.parquet('test-parquet.out') > --- > Py4JJavaError Traceback (most recent call last) > in () > > 1 df.withColumn('age', df.age).write.parquet('test-parquet.out') > /scratch/rxin/spark/python/pyspark/sql/readwriter.py in parquet(self, path, > mode) > 350 >>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data')) > 351 """ > --> 352 self._jwrite.mode(mode).parquet(path) > 353 > 354 @since(1.4) > /Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc > in __call__(self, *args) > 535 answer = self.gateway_client.send_command(command) > 536 return_value = get_return_value(answer, self.gateway_client, > --> 537 self.target_id, self.name) > 538 > 539 for temp_arg in temp_args: > /Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc > in get_return_value(answer, gateway_client, target_id, name) > 298 raise Py4JJavaError( > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > 302 raise Py4JError( > Py4JJavaError: An error occurred while calling o35.parquet. 
> : org.apache.spark.sql.AnalysisException: Reference 'age' is ambiguous, could > be: age#0L, age#3L.; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:279) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:116) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:350) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:341) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at >
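A sketch of the kind of up-front check the ticket asks for; inside Spark SQL the failure would surface as an AnalysisException, but a plain exception is thrown here to keep the snippet self-contained:
{code}
// Fail fast with an explicit message when a DataFrame has identically named columns.
def checkDuplicateColumns(columnNames: Seq[String]): Unit = {
  val duplicates = columnNames.groupBy(identity).collect { case (name, occs) if occs.size > 1 => name }
  if (duplicates.nonEmpty) {
    throw new IllegalArgumentException(
      s"Duplicate column(s) ${duplicates.mkString(", ")} found; cannot save the DataFrame.")
  }
}

checkDuplicateColumns(Seq("age", "name", "age")) // throws: Duplicate column(s) age found; ...
{code}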
[jira] [Created] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it
Davies Liu created SPARK-8766: - Summary: DataFrame Python API should work with column which has non-ascii character in it Key: SPARK-8766 URL: https://issues.apache.org/jira/browse/SPARK-8766 Project: Spark Issue Type: Bug Affects Versions: 1.4.0, 1.3.1 Reporter: Davies Liu Assignee: Davies Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610799#comment-14610799 ] Murtaza Kanchwala commented on SPARK-6101: -- No, it is not a map function. For now you can use Amazon's DynamoDB library to implement your own data access layer and use Spark transformations and actions to add the items; I'd prefer you to do batch saves and batch loads for more efficiency. > Create a SparkSQL DataSource API implementation for DynamoDB > > > Key: SPARK-6101 > URL: https://issues.apache.org/jira/browse/SPARK-6101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.2.0 >Reporter: Chris Fregly >Assignee: Chris Fregly > Fix For: 1.5.0 > > > similar to https://github.com/databricks/spark-avro and > https://github.com/databricks/spark-csv -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
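A sketch of the "batch saves" advice from the comment above, using foreachPartition; rows is assumed to be an existing RDD, and DynamoDbWriter / putBatch are hypothetical stand-ins for a data access layer built on the AWS SDK, not real Spark or AWS APIs:
{code}
// Open one client per partition and write in chunks of 25 (DynamoDB's batch-write item limit).
rows.foreachPartition { partition =>
  val writer = DynamoDbWriter.open("my-table")   // hypothetical helper
  partition.grouped(25).foreach { batch =>
    writer.putBatch(batch)                       // hypothetical helper
  }
  writer.close()
}
{code}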
[jira] [Assigned] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8765: --- Assignee: Apache Spark > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Critical > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8765: --- Assignee: (was: Apache Spark) > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Critical > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8765) Flaky PySpark PowerIterationClustering test
[ https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610783#comment-14610783 ] Apache Spark commented on SPARK-8765: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/7164 > Flaky PySpark PowerIterationClustering test > --- > > Key: SPARK-8765 > URL: https://issues.apache.org/jira/browse/SPARK-8765 > Project: Spark > Issue Type: Test > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Critical > > See failure: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] > {code} > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 291, in __main__.PowerIterationClusteringModel > Failed example: > sorted(model.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", > line 299, in __main__.PowerIterationClusteringModel > Failed example: > sorted(sameModel.assignments().collect()) > Expected: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... > Got: > [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), > Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, > cluster=0)] > ** >2 of 13 in __main__.PowerIterationClusteringModel > ***Test Failed*** 2 failures. > Had test failures in pyspark.mllib.clustering with python2.6; see logs. > {code} > CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8765) Flaky PySpark PowerIterationClustering test
Joseph K. Bradley created SPARK-8765: Summary: Flaky PySpark PowerIterationClustering test Key: SPARK-8765 URL: https://issues.apache.org/jira/browse/SPARK-8765 Project: Spark Issue Type: Test Components: MLlib, PySpark Reporter: Joseph K. Bradley Priority: Critical See failure: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console] {code} ** File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", line 291, in __main__.PowerIterationClusteringModel Failed example: sorted(model.assignments().collect()) Expected: [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... Got: [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, cluster=0)] ** File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py", line 299, in __main__.PowerIterationClusteringModel Failed example: sorted(sameModel.assignments().collect()) Expected: [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ... Got: [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, cluster=0)] ** 2 of 13 in __main__.PowerIterationClusteringModel ***Test Failed*** 2 failures. Had test failures in pyspark.mllib.clustering with python2.6; see logs. {code} CC: [~mengxr] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
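For readers unfamiliar with the failing doctest, here is a small sketch (not the actual doctest from clustering.py) of the PowerIterationClustering usage involved; because the algorithm starts from a random initialization, exact cluster ids can flip between runs, which is what makes a doctest comparing literal assignments flaky:

{code}
# Minimal sketch of PySpark PowerIterationClustering; the affinity graph
# below is invented for illustration, not the data used by the real doctest.
from pyspark import SparkContext
from pyspark.mllib.clustering import PowerIterationClustering

sc = SparkContext("local", "pic-sketch")

# (src, dst, similarity) triples describing a tiny affinity graph
similarities = sc.parallelize([
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),   # one tightly connected cluster
    (3, 4, 1.0),                             # a second, separate pair
    (2, 3, 0.01),                            # weak link between the two groups
])

model = PowerIterationClustering.train(similarities, k=2, maxIterations=40)
# Exact cluster labels depend on initialization; only the grouping is stable.
print(sorted((a.id, a.cluster) for a in model.assignments().collect()))
{code}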
[jira] [Resolved] (SPARK-8308) add missing save load for python doc example and tune down MatrixFactorization iterations
[ https://issues.apache.org/jira/browse/SPARK-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-8308. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6760 [https://github.com/apache/spark/pull/6760] > add missing save load for python doc example and tune down > MatrixFactorization iterations > - > > Key: SPARK-8308 > URL: https://issues.apache.org/jira/browse/SPARK-8308 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Fix For: 1.5.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > 1. add some missing save/load in python examples, LogisticRegression, > LinearRegression, NaiveBayes > 2. tune down iterations for MatrixFactorization, since current number will > trigger StackOverflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8308) add missing save load for python doc example and tune down MatrixFactorization iterations
[ https://issues.apache.org/jira/browse/SPARK-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-8308: - Assignee: yuhao yang > add missing save load for python doc example and tune down > MatrixFactorization iterations > - > > Key: SPARK-8308 > URL: https://issues.apache.org/jira/browse/SPARK-8308 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > 1. add some missing save/load in python examples, LogisticRegression, > LinearRegression, NaiveBayes > 2. tune down iterations for MatrixFactorization, since current number will > trigger StackOverflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6263) Python MLlib API missing items: Utils
[ https://issues.apache.org/jira/browse/SPARK-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-6263. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 5707 [https://github.com/apache/spark/pull/5707] > Python MLlib API missing items: Utils > - > > Key: SPARK-6263 > URL: https://issues.apache.org/jira/browse/SPARK-6263 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Kai Sasaki > Fix For: 1.5.0 > > > This JIRA lists items missing in the Python API for this sub-package of MLlib. > This list may be incomplete, so please check again when sending a PR to add > these features to the Python API. > Also, please check for major disparities between documentation; some parts of > the Python API are less well-documented than their Scala counterparts. Some > items may be listed in the umbrella JIRA linked to this task. > MLUtils > * appendBias > * kFold > * loadVectors -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor
[ https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610722#comment-14610722 ] Joseph K. Bradley commented on SPARK-8744: -- Good point, I'll link a JIRA for that. Also, there needs to be a constructor which does not require a UID, but generates one automatically. > StringIndexerModel should have public constructor > - > > Key: SPARK-8744 > URL: https://issues.apache.org/jira/browse/SPARK-8744 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > Original Estimate: 48h > Remaining Estimate: 48h > > It would be helpful to allow users to pass a pre-computed index to create an > indexer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8764) StringIndexer should take option to handle unseen values
Joseph K. Bradley created SPARK-8764: Summary: StringIndexer should take option to handle unseen values Key: SPARK-8764 URL: https://issues.apache.org/jira/browse/SPARK-8764 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Minor The option should be a Param, probably set to false by default (throwing exception when encountering unseen values). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8744) StringIndexerModel should have public constructor
[ https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-8744: - Description: It would be helpful to allow users to pass a pre-computed index to create an indexer, rather than always going through StringIndexer to create the model. (was: It would be helpful to allow users to pass a pre-computed index to create an indexer.) > StringIndexerModel should have public constructor > - > > Key: SPARK-8744 > URL: https://issues.apache.org/jira/browse/SPARK-8744 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > Original Estimate: 48h > Remaining Estimate: 48h > > It would be helpful to allow users to pass a pre-computed index to create an > indexer, rather than always going through StringIndexer to create the model. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method
[ https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610720#comment-14610720 ] Joseph K. Bradley commented on SPARK-1503: -- I appreciate it! > Implement Nesterov's accelerated first-order method > --- > > Key: SPARK-1503 > URL: https://issues.apache.org/jira/browse/SPARK-1503 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Aaron Staple > Attachments: linear.png, linear_l1.png, logistic.png, logistic_l2.png > > > Nesterov's accelerated first-order method is a drop-in replacement for > steepest descent but it converges much faster. We should implement this > method and compare its performance with existing algorithms, including SGD > and L-BFGS. > TFOCS (http://cvxr.com/tfocs/) is a reference implementation of Nesterov's > method and its variants on composite objectives. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
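For context, a standard textbook form of Nesterov's accelerated gradient step (a general sketch, not necessarily the exact TFOCS variant referenced in the issue) is:

{code}
y_k     = x_k + \frac{k-1}{k+2} (x_k - x_{k-1})
x_{k+1} = y_k - \alpha \nabla f(y_k)
{code}

For a smooth convex objective f and a suitable step size \alpha, this achieves an O(1/k^2) suboptimality bound versus O(1/k) for plain steepest descent, which is why it is described above as a drop-in replacement that converges much faster.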
[jira] [Updated] (SPARK-3071) Increase default driver memory
[ https://issues.apache.org/jira/browse/SPARK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3071: - Assignee: Ilya Ganelin > Increase default driver memory > -- > > Key: SPARK-3071 > URL: https://issues.apache.org/jira/browse/SPARK-3071 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Xiangrui Meng >Assignee: Ilya Ganelin > > The current default is 512M, which is usually too small because user also > uses driver to do some computation. In local mode, executor memory setting is > ignored while only driver memory is used, which provides more incentive to > increase the default driver memory. > I suggest > 1. 2GB in local mode and warn users if executor memory is set a bigger value > 2. same as worker memory on an EC2 standalone server -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3071) Increase default driver memory
[ https://issues.apache.org/jira/browse/SPARK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3071: - Affects Version/s: 1.4.2 Target Version/s: 1.5.0 > Increase default driver memory > -- > > Key: SPARK-3071 > URL: https://issues.apache.org/jira/browse/SPARK-3071 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.4.2 >Reporter: Xiangrui Meng >Assignee: Ilya Ganelin > > The current default is 512M, which is usually too small because user also > uses driver to do some computation. In local mode, executor memory setting is > ignored while only driver memory is used, which provides more incentive to > increase the default driver memory. > I suggest > 1. 2GB in local mode and warn users if executor memory is set a bigger value > 2. same as worker memory on an EC2 standalone server -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610712#comment-14610712 ] venu k tangirala commented on SPARK-6101: - So would calling the Amazon DynamoDB mapper from a Spark map function at the end of all my transformations work? > Create a SparkSQL DataSource API implementation for DynamoDB > > > Key: SPARK-6101 > URL: https://issues.apache.org/jira/browse/SPARK-6101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.2.0 >Reporter: Chris Fregly >Assignee: Chris Fregly > Fix For: 1.5.0 > > > similar to https://github.com/databricks/spark-avro and > https://github.com/databricks/spark-csv -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6284) Support framework authentication and role in Mesos framework
[ https://issues.apache.org/jira/browse/SPARK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6284: - Assignee: Timothy Chen Target Version/s: 1.5.0 > Support framework authentication and role in Mesos framework > > > Key: SPARK-6284 > URL: https://issues.apache.org/jira/browse/SPARK-6284 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Timothy Chen >Assignee: Timothy Chen > > Support framework authentication and role in both Coarse grain and fine grain > mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join
[ https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610701#comment-14610701 ] Apache Spark commented on SPARK-4485: - User 'kai-zeng' has created a pull request for this issue: https://github.com/apache/spark/pull/7162 > Add broadcast outer join to optimize left outer join and right outer join > -- > > Key: SPARK-4485 > URL: https://issues.apache.org/jira/browse/SPARK-4485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: XiaoJing wang >Assignee: Kai Zeng >Priority: Critical > Original Estimate: 0.05h > Remaining Estimate: 0.05h > > For now, Spark uses broadcast join instead of hash join to optimize {{inner > join}} when the size of one side's data does not exceed the > {{AUTO_BROADCASTJOIN_THRESHOLD}}. > However, Spark SQL will perform shuffle operations on each child relation > while executing > {{left outer join}} and {{right outer join}}. {{outer join}} is more > suitable for optimization with broadcast join. > We are planning to create a {{BroadcastHashOuterJoin}} to implement the > broadcast join for {{left outer join}} and {{right outer join}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join
[ https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4485: --- Assignee: Apache Spark (was: Kai Zeng) > Add broadcast outer join to optimize left outer join and right outer join > -- > > Key: SPARK-4485 > URL: https://issues.apache.org/jira/browse/SPARK-4485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: XiaoJing wang >Assignee: Apache Spark >Priority: Critical > Original Estimate: 0.05h > Remaining Estimate: 0.05h > > For now, Spark uses broadcast join instead of hash join to optimize {{inner > join}} when the size of one side's data does not exceed the > {{AUTO_BROADCASTJOIN_THRESHOLD}}. > However, Spark SQL will perform shuffle operations on each child relation > while executing > {{left outer join}} and {{right outer join}}. {{outer join}} is more > suitable for optimization with broadcast join. > We are planning to create a {{BroadcastHashOuterJoin}} to implement the > broadcast join for {{left outer join}} and {{right outer join}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-4485) Add broadcast outer join to optimize left outer join and right outer join
[ https://issues.apache.org/jira/browse/SPARK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4485: --- Assignee: Kai Zeng (was: Apache Spark) > Add broadcast outer join to optimize left outer join and right outer join > -- > > Key: SPARK-4485 > URL: https://issues.apache.org/jira/browse/SPARK-4485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: XiaoJing wang >Assignee: Kai Zeng >Priority: Critical > Original Estimate: 0.05h > Remaining Estimate: 0.05h > > For now, Spark uses broadcast join instead of hash join to optimize {{inner > join}} when the size of one side's data does not exceed the > {{AUTO_BROADCASTJOIN_THRESHOLD}}. > However, Spark SQL will perform shuffle operations on each child relation > while executing > {{left outer join}} and {{right outer join}}. {{outer join}} is more > suitable for optimization with broadcast join. > We are planning to create a {{BroadcastHashOuterJoin}} to implement the > broadcast join for {{left outer join}} and {{right outer join}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8621) crosstab exception when one of the value is empty
[ https://issues.apache.org/jira/browse/SPARK-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-8621. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.4.2 1.5.0 > crosstab exception when one of the value is empty > - > > Key: SPARK-8621 > URL: https://issues.apache.org/jira/browse/SPARK-8621 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Wenchen Fan >Priority: Critical > Fix For: 1.5.0, 1.4.2 > > > I think this happened because some value is empty. > {code} > scala> df1.stat.crosstab("role", "lang") > org.apache.spark.sql.AnalysisException: syntax error in attribute name: ; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.parseAttributeName(LogicalPlan.scala:145) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:135) > at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:157) > at org.apache.spark.sql.DataFrame.col(DataFrame.scala:603) > at > org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:394) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:160) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:157) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:157) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:147) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:132) > at > org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:132) > at > org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:91) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
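The report does not include the data that triggered the failure, so the following is only a sketch of the kind of input that can hit it: when one of the values is an empty string, the value-derived column name becomes empty and cannot be parsed as an attribute name.

{code}
# Hypothetical reproduction sketch of the crosstab failure with an empty value.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "crosstab-empty-value")
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame(
    [("dev", "scala"), ("dev", ""), ("ops", "python")],
    ["role", "lang"])

# On affected versions this raised AnalysisException ("syntax error in
# attribute name") because the empty value turned into an unparseable column.
df.stat.crosstab("role", "lang").show()
{code}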
[jira] [Resolved] (SPARK-8752) Add ExpectsInputTypes trait for defining expected input types.
[ https://issues.apache.org/jira/browse/SPARK-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-8752. Resolution: Fixed Fix Version/s: 1.5.0 > Add ExpectsInputTypes trait for defining expected input types. > -- > > Key: SPARK-8752 > URL: https://issues.apache.org/jira/browse/SPARK-8752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7714) SparkR tests should use more specific expectations than expect_true
[ https://issues.apache.org/jira/browse/SPARK-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-7714: - Assignee: Sun Rui > SparkR tests should use more specific expectations than expect_true > --- > > Key: SPARK-7714 > URL: https://issues.apache.org/jira/browse/SPARK-7714 > Project: Spark > Issue Type: Improvement > Components: SparkR >Reporter: Josh Rosen >Assignee: Sun Rui > Fix For: 1.5.0 > > > SparkR's tests use testthat's {{expect_true(foo == bar)}}, but using > expectations like {{expect_equal(foo, bar)}} will give informative error > messages if the assertion fails. We should update the existing tests to use > the more specific matchers, such as expect_equal, expect_is, > expect_identical, expect_error, etc. > See http://r-pkgs.had.co.nz/tests.html for more documentation on testthat > expectation functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7714) SparkR tests should use more specific expectations than expect_true
[ https://issues.apache.org/jira/browse/SPARK-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-7714. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7152 [https://github.com/apache/spark/pull/7152] > SparkR tests should use more specific expectations than expect_true > --- > > Key: SPARK-7714 > URL: https://issues.apache.org/jira/browse/SPARK-7714 > Project: Spark > Issue Type: Improvement > Components: SparkR >Reporter: Josh Rosen > Fix For: 1.5.0 > > > SparkR's tests use testthat's {{expect_true(foo == bar)}}, but using > expectations like {{expect_equal(foo, bar)}} will give informative error > messages if the assertion fails. We should update the existing tests to use > the more specific matchers, such as expect_equal, expect_is, > expect_identical, expect_error, etc. > See http://r-pkgs.had.co.nz/tests.html for more documentation on testthat > expectation functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
[ https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-8763. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7161 [https://github.com/apache/spark/pull/7161] > executing run-tests.py with Python 2.6 fails with absence of > subprocess.check_output function > - > > Key: SPARK-8763 > URL: https://issues.apache.org/jira/browse/SPARK-8763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 > Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 >Reporter: Tomohiko K. > Labels: pyspark, testing > Fix For: 1.5.0 > > > Running run-tests.py with Python 2.6 cause following error: > {noformat} > Running PySpark tests. Output is in > python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log > Will test against the following Python executables: ['python2.6', > 'python3.4', 'pypy'] > Will test the following Python modules: ['pyspark-core', 'pyspark-ml', > 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] > Traceback (most recent call last): > File "./python/run-tests.py", line 196, in > main() > File "./python/run-tests.py", line 159, in main > python_implementation = subprocess.check_output( > AttributeError: 'module' object has no attribute 'check_output' > ... > {noformat} > The cause of this error is using subprocess.check_output function, which > exists since Python 2.7. > (ref. > https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
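For reference, a minimal sketch of the well-known Python 2.6-compatible fallback for subprocess.check_output (which only exists from Python 2.7 onward); the helper below is illustrative and not necessarily what pull request 7161 actually did:

{code}
# Backport-style fallback for subprocess.check_output on Python 2.6.
import subprocess

def check_output(*popenargs, **kwargs):
    # Mimics subprocess.check_output: run the command, capture stdout,
    # and raise CalledProcessError on a non-zero exit code.
    process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
    output, _ = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args", popenargs[0])
        raise subprocess.CalledProcessError(retcode, cmd)
    return output

# Only patch when running on an interpreter that lacks the function.
if not hasattr(subprocess, "check_output"):
    subprocess.check_output = check_output
{code}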
[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5427: -- Description: floor() function is supported in Hive SQL. This issue is to add floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 was: floor() function is supported in Hive SQL. This issue is to add floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 > Add support for floor function in Spark SQL > --- > > Key: SPARK-5427 > URL: https://issues.apache.org/jira/browse/SPARK-5427 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu > Labels: math > > floor() function is supported in Hive SQL. > This issue is to add floor() function to Spark SQL. > Related thread: http://search-hadoop.com/m/JW1q563fc22 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
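A short sketch of the usage this issue asks for, assuming a Spark build in which floor() has been wired into Spark SQL (the ticket itself predates that support, so treat the snippet as the intended behaviour rather than something guaranteed to work on the affected versions):

{code}
# Intended usage of floor() from Spark SQL, mirroring Hive.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "floor-sketch")
sqlContext = SQLContext(sc)

sqlContext.sql("SELECT floor(3.7)").show()          # expected result: 3
df = sqlContext.createDataFrame([(3.7,), (1.2,)], ["x"])
df.selectExpr("floor(x)").show()                     # column-wise floor
{code}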
[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+
[ https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610589#comment-14610589 ] Shivaram Venkataraman commented on SPARK-8724: -- Sure. We could add that to the examples. I also think we could also add some more details on how to launch SparkR if you are not using the script bin/sparkR (i.e. from RStudio or from plain R). > Need documentation on how to deploy or use SparkR in Spark 1.4.0+ > - > > Key: SPARK-8724 > URL: https://issues.apache.org/jira/browse/SPARK-8724 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 1.4.0 >Reporter: Felix Cheung >Priority: Minor > > As of now there doesn't seem to be any official documentation on how to > deploy SparkR with Spark 1.4.0+ > Also, cluster manager specific documentation (like > http://spark.apache.org/docs/latest/spark-standalone.html) does not call out > what mode is supported for SparkR and details on deployment steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6833) Extend `addPackage` so that any given R file can be sourced in the worker before functions are run.
[ https://issues.apache.org/jira/browse/SPARK-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-6833. -- Resolution: Fixed Fix Version/s: 1.5.0 Thanks [~sunrui] for checking this. We should add documentation for this but we can use another JIRA for this I guess > Extend `addPackage` so that any given R file can be sourced in the worker > before functions are run. > --- > > Key: SPARK-6833 > URL: https://issues.apache.org/jira/browse/SPARK-6833 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Shivaram Venkataraman >Priority: Minor > Fix For: 1.5.0 > > > Similar to how extra python files or packages can be specified (in zip / egg > formats), it will be good to support the ability to add extra R files to the > executors working directory. > One thing that needs to be investigated is if this will just work out of the > box using the spark-submit flag --files ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8744) StringIndexerModel should have public constructor
[ https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610558#comment-14610558 ] yuhao yang edited comment on SPARK-8744 at 7/1/15 4:10 PM: --- There seems to be more jobs than simply changing the access modifiers. Since a passed-in labels will have a larger chance to trigger the "unseen label" exception. Perhaps we should address the exception first. was (Author: yuhaoyan): Just a reminder: There seems to be more jobs to do than simply change the access modifiers. Since a passed-in labels will have a larger chance to trigger the "unseen label" exception. Perhaps we should address the exception first. > StringIndexerModel should have public constructor > - > > Key: SPARK-8744 > URL: https://issues.apache.org/jira/browse/SPARK-8744 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > Original Estimate: 48h > Remaining Estimate: 48h > > It would be helpful to allow users to pass a pre-computed index to create an > indexer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8744) StringIndexerModel should have public constructor
[ https://issues.apache.org/jira/browse/SPARK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610558#comment-14610558 ] yuhao yang commented on SPARK-8744: --- Just a reminder: There seems to be more jobs to do than simply change the access modifiers. Since a passed-in labels will have a larger chance to trigger the "unseen label" exception. Perhaps we should address the exception first. > StringIndexerModel should have public constructor > - > > Key: SPARK-8744 > URL: https://issues.apache.org/jira/browse/SPARK-8744 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > Original Estimate: 48h > Remaining Estimate: 48h > > It would be helpful to allow users to pass a pre-computed index to create an > indexer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610555#comment-14610555 ] Mark Stephenson commented on SPARK-8596: [~cantdutchthis]: we have been getting the same error and it's definitely a user permissions issue. Even when giving the new RStudio user ownership rights to the ./spark folder, there are additional classpath errors. We are working on a solution today to utilize and login to RStudio as the 'hadoop' user to start with, just to make sure that the proof of concept works, and then expound a longer term solution with some potential bootstrap code. Will advise once we have it solved. > Install and configure RStudio server on Spark EC2 > - > > Key: SPARK-8596 > URL: https://issues.apache.org/jira/browse/SPARK-8596 > Project: Spark > Issue Type: Improvement > Components: EC2, SparkR >Reporter: Shivaram Venkataraman > > This will make it convenient for R users to use SparkR from their browsers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
[ https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610530#comment-14610530 ] Apache Spark commented on SPARK-8763: - User 'cocoatomo' has created a pull request for this issue: https://github.com/apache/spark/pull/7161 > executing run-tests.py with Python 2.6 fails with absence of > subprocess.check_output function > - > > Key: SPARK-8763 > URL: https://issues.apache.org/jira/browse/SPARK-8763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 > Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 >Reporter: Tomohiko K. > Labels: pyspark, testing > > Running run-tests.py with Python 2.6 cause following error: > {noformat} > Running PySpark tests. Output is in > python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log > Will test against the following Python executables: ['python2.6', > 'python3.4', 'pypy'] > Will test the following Python modules: ['pyspark-core', 'pyspark-ml', > 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] > Traceback (most recent call last): > File "./python/run-tests.py", line 196, in > main() > File "./python/run-tests.py", line 159, in main > python_implementation = subprocess.check_output( > AttributeError: 'module' object has no attribute 'check_output' > ... > {noformat} > The cause of this error is using subprocess.check_output function, which > exists since Python 2.7. > (ref. > https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
[ https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8763: --- Assignee: Apache Spark > executing run-tests.py with Python 2.6 fails with absence of > subprocess.check_output function > - > > Key: SPARK-8763 > URL: https://issues.apache.org/jira/browse/SPARK-8763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 > Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 >Reporter: Tomohiko K. >Assignee: Apache Spark > Labels: pyspark, testing > > Running run-tests.py with Python 2.6 cause following error: > {noformat} > Running PySpark tests. Output is in > python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log > Will test against the following Python executables: ['python2.6', > 'python3.4', 'pypy'] > Will test the following Python modules: ['pyspark-core', 'pyspark-ml', > 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] > Traceback (most recent call last): > File "./python/run-tests.py", line 196, in > main() > File "./python/run-tests.py", line 159, in main > python_implementation = subprocess.check_output( > AttributeError: 'module' object has no attribute 'check_output' > ... > {noformat} > The cause of this error is using subprocess.check_output function, which > exists since Python 2.7. > (ref. > https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
[ https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8763: --- Assignee: (was: Apache Spark) > executing run-tests.py with Python 2.6 fails with absence of > subprocess.check_output function > - > > Key: SPARK-8763 > URL: https://issues.apache.org/jira/browse/SPARK-8763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 > Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 >Reporter: Tomohiko K. > Labels: pyspark, testing > > Running run-tests.py with Python 2.6 cause following error: > {noformat} > Running PySpark tests. Output is in > python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log > Will test against the following Python executables: ['python2.6', > 'python3.4', 'pypy'] > Will test the following Python modules: ['pyspark-core', 'pyspark-ml', > 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] > Traceback (most recent call last): > File "./python/run-tests.py", line 196, in > main() > File "./python/run-tests.py", line 159, in main > python_implementation = subprocess.check_output( > AttributeError: 'module' object has no attribute 'check_output' > ... > {noformat} > The cause of this error is using subprocess.check_output function, which > exists since Python 2.7. > (ref. > https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8265) Add LinearDataGenerator to pyspark.mllib.utils
[ https://issues.apache.org/jira/browse/SPARK-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Kumar resolved SPARK-8265. Resolution: Fixed Fix Version/s: 1.5.0 > Add LinearDataGenerator to pyspark.mllib.utils > -- > > Key: SPARK-8265 > URL: https://issues.apache.org/jira/browse/SPARK-8265 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Priority: Minor > Fix For: 1.5.0 > > > This is useful in testing various linear models in pyspark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>
[ https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610456#comment-14610456 ] Alexis Seigneurin commented on SPARK-4557: -- Here: https://github.com/aseigneurin/spark/commit/9a4019caf8a3de956635a0030a43f0a5a9f4edbc And see how I've used it in Java: https://gist.github.com/aseigneurin/a200155c89cd0035d0e8 > Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a > Function<..., Void> > --- > > Key: SPARK-4557 > URL: https://issues.apache.org/jira/browse/SPARK-4557 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.1.0 >Reporter: Alexis Seigneurin >Priority: Minor > Labels: starter > > In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You > have to write: > {code:java} > .foreachRDD(items -> { > ...; > return null; > }); > {code} > Instead of: > {code:java} > .foreachRDD(items -> ...); > {code} > This is because the foreachRDD method accepts a Function<JavaRDD<T>, Void> > instead of a VoidFunction<JavaRDD<T>>. This would make sense to change it > to a VoidFunction as, in Spark's API, the foreach method already accepts a > VoidFunction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
Tomohiko K. created SPARK-8763: -- Summary: executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function Key: SPARK-8763 URL: https://issues.apache.org/jira/browse/SPARK-8763 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.5.0 Environment: Mac OS X 10.10.3 Python 2.6.9 Java 1.8.0 Reporter: Tomohiko K. Running run-tests.py with Python 2.6 cause following error: {noformat} Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy'] Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] Traceback (most recent call last): File "./python/run-tests.py", line 196, in main() File "./python/run-tests.py", line 159, in main python_implementation = subprocess.check_output( AttributeError: 'module' object has no attribute 'check_output' ... {noformat} The cause of this error is using subprocess.check_output function, which exists since Python 2.7. (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>
[ https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610414#comment-14610414 ] somil deshmukh commented on SPARK-4557: --- Can you provide me an example of JavaDStreamLike which I can run to check whether it is breaking or not? > Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a > Function<..., Void> > --- > > Key: SPARK-4557 > URL: https://issues.apache.org/jira/browse/SPARK-4557 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.1.0 >Reporter: Alexis Seigneurin >Priority: Minor > Labels: starter > > In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You > have to write: > {code:java} > .foreachRDD(items -> { > ...; > return null; > }); > {code} > Instead of: > {code:java} > .foreachRDD(items -> ...); > {code} > This is because the foreachRDD method accepts a Function<JavaRDD<T>, Void> > instead of a VoidFunction<JavaRDD<T>>. This would make sense to change it > to a VoidFunction as, in Spark's API, the foreach method already accepts a > VoidFunction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8733) ML RDD.unpersist calls should use blocking = false
[ https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8733: --- Assignee: (was: Apache Spark) > ML RDD.unpersist calls should use blocking = false > -- > > Key: SPARK-8733 > URL: https://issues.apache.org/jira/browse/SPARK-8733 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Joseph K. Bradley > Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png > > Original Estimate: 72h > Remaining Estimate: 72h > > MLlib uses unpersist in many places, but is not consistent about blocking vs > not. We should check through all of MLlib and change calls to use blocking = > false, unless there is a real need to block. I have run into issues with > futures timing out because of unpersist() calls, when there was no real need > for the ML method to fail. > See attached screenshot. Training succeeded, but the final unpersist during > cleanup failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8733) ML RDD.unpersist calls should use blocking = false
[ https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610408#comment-14610408 ] Apache Spark commented on SPARK-8733: - User 'ilganeli' has created a pull request for this issue: https://github.com/apache/spark/pull/7160 > ML RDD.unpersist calls should use blocking = false > -- > > Key: SPARK-8733 > URL: https://issues.apache.org/jira/browse/SPARK-8733 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Joseph K. Bradley > Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png > > Original Estimate: 72h > Remaining Estimate: 72h > > MLlib uses unpersist in many places, but is not consistent about blocking vs > not. We should check through all of MLlib and change calls to use blocking = > false, unless there is a real need to block. I have run into issues with > futures timing out because of unpersist() calls, when there was no real need > for the ML method to fail. > See attached screenshot. Training succeeded, but the final unpersist during > cleanup failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8733) ML RDD.unpersist calls should use blocking = false
[ https://issues.apache.org/jira/browse/SPARK-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8733: --- Assignee: Apache Spark > ML RDD.unpersist calls should use blocking = false > -- > > Key: SPARK-8733 > URL: https://issues.apache.org/jira/browse/SPARK-8733 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Apache Spark > Attachments: Screen Shot 2015-06-30 at 10.51.44 AM.png > > Original Estimate: 72h > Remaining Estimate: 72h > > MLlib uses unpersist in many places, but is not consistent about blocking vs > not. We should check through all of MLlib and change calls to use blocking = > false, unless there is a real need to block. I have run into issues with > futures timing out because of unpersist() calls, when there was no real need > for the ML method to fail. > See attached screenshot. Training succeeded, but the final unpersist during > cleanup failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>
[ https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610376#comment-14610376 ] Alexis Seigneurin commented on SPARK-4557: -- Yes, but the problem is not compiling Spark's code with this API change. The problem is about compiling Java code that is using the updated JavaDStreamLike interface: either the Java code does not compile, or you break the API. Neither is ideal. > Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a > Function<..., Void> > --- > > Key: SPARK-4557 > URL: https://issues.apache.org/jira/browse/SPARK-4557 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.1.0 >Reporter: Alexis Seigneurin >Priority: Minor > Labels: starter > > In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You > have to write: > {code:java} > .foreachRDD(items -> { > ...; > return null; > }); > {code} > Instead of: > {code:java} > .foreachRDD(items -> ...); > {code} > This is because the foreachRDD method accepts a Function<JavaRDD<T>, Void> > instead of a VoidFunction<JavaRDD<T>>. This would make sense to change it > to a VoidFunction as, in Spark's API, the foreach method already accepts a > VoidFunction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4557) Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a Function<..., Void>
[ https://issues.apache.org/jira/browse/SPARK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610334#comment-14610334 ] somil deshmukh commented on SPARK-4557: --- In JavaDStreamLike.scala, I have replaced JFunction(R, Void) with JVoidFunction(R), and I have compiled the code with no errors using Java 7. > Spark Streaming' foreachRDD method should accept a VoidFunction<...>, not a > Function<..., Void> > --- > > Key: SPARK-4557 > URL: https://issues.apache.org/jira/browse/SPARK-4557 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.1.0 >Reporter: Alexis Seigneurin >Priority: Minor > Labels: starter > > In *Java*, using Spark Streaming's foreachRDD function is quite verbose. You > have to write: > {code:java} > .foreachRDD(items -> { > ...; > return null; > }); > {code} > Instead of: > {code:java} > .foreachRDD(items -> ...); > {code} > This is because the foreachRDD method accepts a Function<JavaRDD<T>, Void> > instead of a VoidFunction<JavaRDD<T>>. This would make sense to change it > to a VoidFunction as, in Spark's API, the foreach method already accepts a > VoidFunction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8762) Maven build fails if the project is in a symlinked folder
Roman Zenka created SPARK-8762: -- Summary: Maven build fails if the project is in a symlinked folder Key: SPARK-8762 URL: https://issues.apache.org/jira/browse/SPARK-8762 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Environment: CentOS, Java 1.7, Maven 3.3.3 Reporter: Roman Zenka Priority: Minor Build was failing mysteriously in spark-core module with following error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project spark-core_2.10: Compilation failure: Compilation failure: [ERROR] /mnt/jenkins/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/DeveloperApi.java:[35,8] error: duplicate class: org.apache.spark.annotation.DeveloperApi [ERROR] /mnt/jenkins/var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/Experimental.java:[36,8] error: duplicate class: org.apache.spark.annotation.Experimental [ERROR] /var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/AlphaComponent.java:[33,8] error: duplicate class: org.apache.spark.annotation.AlphaComponent [ERROR] /var/lib/jenkins/jobs/apache.spark/workspace/core/src/main/scala/org/apache/spark/annotation/Private.java:[41,8] error: duplicate class: org.apache.spark.annotation.Private [ERROR] -> [Help 1] The /var/lib/jenkins folder is actually a symlink to /mnt/jenkins/var/lib/jenkins. This confuses the compiler that seems to resolve some paths and keep others intact, which leads to the same class appearing "twice" during the compilation. The workaround is to always point the build to the physical folder, never build through a symlink. I have not determined the precise source of the error, but it is likely inside Maven. The fix could be as easy as mentioning that this issue exists in the FAQ so others running into it can fix it instantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8734) Expose all Mesos DockerInfo options to Spark
[ https://issues.apache.org/jira/browse/SPARK-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610289#comment-14610289 ] Chris Heller commented on SPARK-8734: - I've started work on this @ https://github.com/hellertime/spark/tree/feature/SPARK-8734 Once I have all the fields I'll submit a PR, but for those eager to try it out feel free to fetch and merge. > Expose all Mesos DockerInfo options to Spark > > > Key: SPARK-8734 > URL: https://issues.apache.org/jira/browse/SPARK-8734 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Chris Heller >Priority: Minor > > SPARK-2691 only exposed a few options from the DockerInfo message. It would > be reasonable to expose them all, especially given one can now specify > arbitrary parameters to docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-8291) Add parse functionality to LabeledPoint in PySpark
[ https://issues.apache.org/jira/browse/SPARK-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Kumar closed SPARK-8291. -- Resolution: Won't Fix > Add parse functionality to LabeledPoint in PySpark > -- > > Key: SPARK-8291 > URL: https://issues.apache.org/jira/browse/SPARK-8291 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Reporter: Manoj Kumar >Priority: Minor > > It is useful to have functionality that can parse a string into a > LabeledPoint while loading files, etc -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6602) Replace direct use of Akka with Spark RPC interface
[ https://issues.apache.org/jira/browse/SPARK-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610274#comment-14610274 ] Apache Spark commented on SPARK-6602: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/7159 > Replace direct use of Akka with Spark RPC interface > --- > > Key: SPARK-6602 > URL: https://issues.apache.org/jira/browse/SPARK-6602 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.
[ https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8755: --- Assignee: (was: Apache Spark) > Streaming application from checkpoint will fail to load in security mode. > - > > Key: SPARK-8755 > URL: https://issues.apache.org/jira/browse/SPARK-8755 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.0 >Reporter: SaintBacchus > > If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not > need to *kinit* on the client machine. > But when the application is recovered from a checkpoint file, he has to > *kinit*, because the checkpoint recovery does not apply these configurations before it uses a DFSClient to > fetch the checkpoint file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.
[ https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610065#comment-14610065 ] Apache Spark commented on SPARK-8755: - User 'SaintBacchus' has created a pull request for this issue: https://github.com/apache/spark/pull/7158 > Streaming application from checkpoint will fail to load in security mode. > - > > Key: SPARK-8755 > URL: https://issues.apache.org/jira/browse/SPARK-8755 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.0 >Reporter: SaintBacchus > > If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not > need to *kinit* on the client machine. > But when the application is recovered from a checkpoint file, he has to > *kinit*, because the checkpoint recovery does not apply these configurations before it uses a DFSClient to > fetch the checkpoint file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.
[ https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8755: --- Assignee: Apache Spark > Streaming application from checkpoint will fail to load in security mode. > - > > Key: SPARK-8755 > URL: https://issues.apache.org/jira/browse/SPARK-8755 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.0 >Reporter: SaintBacchus >Assignee: Apache Spark > > If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not > need to *kinit* on the client machine. > But when the application is recovered from a checkpoint file, he has to > *kinit*, because the checkpoint recovery does not apply these configurations before it uses a DFSClient to > fetch the checkpoint file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8755) Streaming application from checkpoint will fail to load in security mode.
[ https://issues.apache.org/jira/browse/SPARK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-8755: Description: If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not need to *kinit* on the client machine. But when the application is recovered from a checkpoint file, he has to *kinit*, because the checkpoint recovery does not apply these configurations before it uses a DFSClient to fetch the checkpoint file. was: If the user set *spark.yarn.principal* and *spark.yarn.keytab* , he does not need *kinit* in the client machine. But the application was recorved from checkpoint file, it had to *kinit*, because: the checkpoint did not use this configurations before it use a DFSClient to fetch the ckeckpoint file. > Streaming application from checkpoint will fail to load in security mode. > - > > Key: SPARK-8755 > URL: https://issues.apache.org/jira/browse/SPARK-8755 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.0 >Reporter: SaintBacchus > > If the user sets *spark.yarn.principal* and *spark.yarn.keytab*, he does not > need to *kinit* on the client machine. > But when the application is recovered from a checkpoint file, he has to > *kinit*, because the checkpoint recovery does not apply these configurations before it uses a DFSClient to > fetch the checkpoint file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
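To make the reported scenario concrete, here is a hedged, spark-shell-style Scala sketch. The application name, principal, keytab path, checkpoint directory, and socket source are hypothetical placeholders; per the report above, the checkpoint fetched on restart does not yet honor the keytab settings, which is why a manual kinit is still needed.
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// All values below are placeholders for illustration only.
val checkpointDir = "hdfs:///checkpoints/secure-streaming-app"

val conf = new SparkConf()
  .setAppName("secure-streaming-app")
  .set("spark.yarn.principal", "user@EXAMPLE.COM")   // placeholder principal
  .set("spark.yarn.keytab", "/path/to/user.keytab")  // placeholder keytab path

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  ssc.socketTextStream("localhost", 9999).print()    // placeholder input source
  ssc
}

// On restart, the checkpoint is read from HDFS *before* the job runs; the report
// says this read does not pick up the principal/keytab configuration yet.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
{code}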
[jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method
[ https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610018#comment-14610018 ] Kai Sasaki commented on SPARK-1503: --- [~staple] [~josephkb] Thank you for the ping and for the inspiring information! I'll rewrite the current patch based on your logic and code. Thanks a lot. > Implement Nesterov's accelerated first-order method > --- > > Key: SPARK-1503 > URL: https://issues.apache.org/jira/browse/SPARK-1503 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Aaron Staple > Attachments: linear.png, linear_l1.png, logistic.png, logistic_l2.png > > > Nesterov's accelerated first-order method is a drop-in replacement for > steepest descent, but it converges much faster. We should implement this > method and compare its performance with existing algorithms, including SGD > and L-BFGS. > TFOCS (http://cvxr.com/tfocs/) is a reference implementation of Nesterov's > method and its variants on composite objectives. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
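For readers unfamiliar with the method, the standard textbook (FISTA-style) iteration for a convex objective f with an L-Lipschitz gradient is sketched below in LaTeX. This is the generic recipe only, not the formulation used in the attached patch or in TFOCS.
{code:none}
% Plain gradient (steepest) descent step:
x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k)

% Nesterov / FISTA-style accelerated step (initialize t_1 = 1, y_1 = x_0):
x_k     = y_k - \tfrac{1}{L}\nabla f(y_k)
t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}
y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}} \, (x_k - x_{k-1})
{code}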
[jira] [Created] (SPARK-8761) Master.removeApplication is not thread-safe but is called from multiple threads
Shixiong Zhu created SPARK-8761: --- Summary: Master.removeApplication is not thread-safe but is called from multiple threads Key: SPARK-8761 URL: https://issues.apache.org/jira/browse/SPARK-8761 Project: Spark Issue Type: Bug Components: Deploy Reporter: Shixiong Zhu Master.removeApplication is not thread-safe, yet it is called both from the Master's message loop and from MasterPage.handleAppKillRequest, which runs on the Web server's threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
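The race described above can be illustrated with a generic pattern for funneling all state changes through one thread. This is not the actual Spark fix; MessageLoopSketch, registerApplication, requestRemoveApplication, and the apps set are hypothetical names used only to show the idea of queueing a web-thread request onto the single message-handling thread instead of mutating shared state directly.
{code:scala}
import java.util.concurrent.Executors

// Generic sketch, not Spark source: a single-threaded executor stands in for the
// Master's message loop, so every mutation of `apps` happens on one thread.
object MessageLoopSketch {
  private val loop = Executors.newSingleThreadExecutor()
  private var apps = Set.empty[String] // only ever touched from `loop`

  def registerApplication(id: String): Unit =
    loop.execute(new Runnable { def run(): Unit = apps += id })

  // safe to call from any thread, e.g. a web UI "kill application" handler
  def requestRemoveApplication(id: String): Unit =
    loop.execute(new Runnable { def run(): Unit = if (apps.contains(id)) apps -= id })

  def main(args: Array[String]): Unit = {
    registerApplication("app-20150701-0001")
    requestRemoveApplication("app-20150701-0001")
    loop.shutdown()
  }
}
{code}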
[jira] [Created] (SPARK-8760) allow moving and symlinking binaries
Philipp Angerer created SPARK-8760: -- Summary: allow moving and symlinking binaries Key: SPARK-8760 URL: https://issues.apache.org/jira/browse/SPARK-8760 Project: Spark Issue Type: Improvement Components: PySpark, Spark Shell, Spark Submit, SparkR Affects Versions: 1.4.0 Reporter: Philipp Angerer Priority: Minor You use the following line to determine {{$SPARK_HOME}} in all binaries: {code:none} export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)" {code} However, users should be able to override this, and symlinks should be followed: {code:none} if [[ -z "$SPARK_HOME" ]]; then export SPARK_HOME="$(dirname "$(readlink -f "$0")")" fi {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8723) improve code gen for divide and remainder
[ https://issues.apache.org/jira/browse/SPARK-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8723: - Assignee: Wenchen Fan > improve code gen for divide and remainder > - > > Key: SPARK-8723 > URL: https://issues.apache.org/jira/browse/SPARK-8723 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8727) Add missing python api
[ https://issues.apache.org/jira/browse/SPARK-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8727: - Assignee: Tarek Auel > Add missing python api > -- > > Key: SPARK-8727 > URL: https://issues.apache.org/jira/browse/SPARK-8727 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tarek Auel >Assignee: Tarek Auel > Fix For: 1.5.0 > > > Add the python api that is missing for > https://issues.apache.org/jira/browse/SPARK-8248 > https://issues.apache.org/jira/browse/SPARK-8234 > https://issues.apache.org/jira/browse/SPARK-8217 > https://issues.apache.org/jira/browse/SPARK-8215 > https://issues.apache.org/jira/browse/SPARK-8212 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8692) re-order the case statements that handling catalyst data types
[ https://issues.apache.org/jira/browse/SPARK-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8692: - Assignee: Wenchen Fan > re-order the case statements that handling catalyst data types > --- > > Key: SPARK-8692 > URL: https://issues.apache.org/jira/browse/SPARK-8692 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Minor > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8615) sql programming guide recommends deprecated code
[ https://issues.apache.org/jira/browse/SPARK-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8615: - Assignee: Tijo Thomas > sql programming guide recommends deprecated code > > > Key: SPARK-8615 > URL: https://issues.apache.org/jira/browse/SPARK-8615 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.4.0 >Reporter: Gergely Svigruha >Assignee: Tijo Thomas >Priority: Minor > Fix For: 1.5.0 > > > The Spark 1.4 SQL programming guide has example code showing how to use JDBC > tables: > https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases > sqlContext.load("jdbc", Map(...)) > However, this code compiles with a deprecation warning that recommends doing this instead: > sqlContext.read.format("jdbc").options(Map(...)).load() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
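A minimal sketch of the non-deprecated DataFrameReader form mentioned above: sqlContext is assumed to be an existing SQLContext (as in the guide), and the connection URL and table name are hypothetical placeholders.
{code:scala}
// Placeholder connection details, shown only to make the recommended form concrete.
val jdbcDF = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url" -> "jdbc:postgresql://db-host:5432/mydb",  // placeholder URL
    "dbtable" -> "schema.tablename"))                // placeholder table
  .load()
{code}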
[jira] [Updated] (SPARK-8590) add code gen for ExtractValue
[ https://issues.apache.org/jira/browse/SPARK-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8590: - Assignee: Wenchen Fan > add code gen for ExtractValue > - > > Key: SPARK-8590 > URL: https://issues.apache.org/jira/browse/SPARK-8590 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8589) cleanup DateTimeUtils
[ https://issues.apache.org/jira/browse/SPARK-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8589: - Assignee: Wenchen Fan > cleanup DateTimeUtils > - > > Key: SPARK-8589 > URL: https://issues.apache.org/jira/browse/SPARK-8589 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8535) PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name
[ https://issues.apache.org/jira/browse/SPARK-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8535: - Assignee: Yuri Saito > PySpark : Can't create DataFrame from Pandas dataframe with no explicit > column name > --- > > Key: SPARK-8535 > URL: https://issues.apache.org/jira/browse/SPARK-8535 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.4.0 >Reporter: Christophe Bourguignat >Assignee: Yuri Saito > Fix For: 1.5.0 > > > Trying to create a Spark DataFrame from a pandas dataframe with no explicit > column name : > pandasDF = pd.DataFrame([[1, 2], [5, 6]]) > sparkDF = sqlContext.createDataFrame(pandasDF) > *** > > 1 sparkDF = sqlContext.createDataFrame(pandasDF) > /usr/local/Cellar/apache-spark/1.4.0/libexec/python/pyspark/sql/context.pyc > in createDataFrame(self, data, schema, samplingRatio) > 344 > 345 jrdd = > self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd()) > --> 346 df = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), > schema.json()) > 347 return DataFrame(df, self) > 348 > /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py > in __call__(self, *args) > 536 answer = self.gateway_client.send_command(command) > 537 return_value = get_return_value(answer, self.gateway_client, > --> 538 self.target_id, self.name) > 539 > 540 for temp_arg in temp_args: > /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 298 raise Py4JJavaError( > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > 302 raise Py4JError( > Py4JJavaError: An error occurred while calling o87.applySchemaToPythonRDD. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8236) misc function: crc32
[ https://issues.apache.org/jira/browse/SPARK-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8236: - Assignee: Tarek Auel > misc function: crc32 > > > Key: SPARK-8236 > URL: https://issues.apache.org/jira/browse/SPARK-8236 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Tarek Auel > Fix For: 1.5.0 > > > crc32(string/binary): bigint > Computes a cyclic redundancy check value for string or binary argument and > returns bigint value (as of Hive 1.3.0). Example: crc32('ABC') = 2743272264. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
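The example value quoted above can be reproduced with the JVM's built-in CRC32. The sketch below only illustrates the expected semantics of the proposed function, not the Spark implementation; Crc32Example is a hypothetical name.
{code:scala}
import java.util.zip.CRC32

object Crc32Example {
  def main(args: Array[String]): Unit = {
    val crc = new CRC32()
    crc.update("ABC".getBytes("UTF-8"))
    println(crc.getValue) // prints 2743272264, matching the example above
  }
}
{code}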
[jira] [Updated] (SPARK-8235) misc function: sha1 / sha
[ https://issues.apache.org/jira/browse/SPARK-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8235: - Assignee: Tarek Auel > misc function: sha1 / sha > - > > Key: SPARK-8235 > URL: https://issues.apache.org/jira/browse/SPARK-8235 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Tarek Auel > Fix For: 1.5.0 > > > sha1(string/binary): string > sha(string/binary): string > Calculates the SHA-1 digest for string or binary and returns the value as a > hex string (as of Hive 1.3.0). Example: sha1('ABC') = > '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
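Likewise, the SHA-1 example above corresponds to the digest of the UTF-8 bytes rendered as lowercase hex. The sketch below only illustrates the expected output, not the Spark implementation; Sha1Example is a hypothetical name.
{code:scala}
import java.security.MessageDigest

object Sha1Example {
  def main(args: Array[String]): Unit = {
    val bytes = MessageDigest.getInstance("SHA-1").digest("ABC".getBytes("UTF-8"))
    println(bytes.map("%02x".format(_)).mkString) // should print 3c01bdbb26f358bab27f267924aa2c9a03fcfdb8 per the example above
  }
}
{code}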
[jira] [Updated] (SPARK-8031) Version number written to Hive metastore is "0.13.1aa" instead of "0.13.1a"
[ https://issues.apache.org/jira/browse/SPARK-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8031: - Assignee: Cheng Lian > Version number written to Hive metastore is "0.13.1aa" instead of "0.13.1a" > --- > > Key: SPARK-8031 > URL: https://issues.apache.org/jira/browse/SPARK-8031 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1, 1.4.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Trivial > Fix For: 1.5.0 > > > While debugging {{CliSuite}} for 1.4.0-SNAPSHOT, noticed the following WARN > log line: > {noformat} > 15/06/02 13:40:29 WARN ObjectStore: Version information not found in > metastore. hive.metastore.schema.verification is not enabled so recording the > schema version 0.13.1aa > {noformat} > The problem is that, the version of Hive dependencies 1.4.0-SNAPSHOT uses is > {{0.13.1a}} (the one shaded by [~pwendell]), but the version showed in this > line is {{0.13.1aa}} (one more {{a}}). The WARN log itself is OK since > {{CliSuite}} initializes a brand new temporary Derby metastore. > While initializing Hive metastore, Hive calls {{ObjectStore.checkSchema()}} > and may write the "short" version string to metastore. This short version > string is defined by {{hive.version.shortname}} in the POM. However, [it was > defined as > {{0.13.1aa}}|https://github.com/pwendell/hive/commit/32e515907f0005c7a28ee388eadd1c94cf99b2d4#diff-600376dffeb79835ede4a0b285078036R62]. > Confirmed with [~pwendell] that it should be a typo. > This doesn't cause any trouble for now, but we probably want to fix this in > the future if we ever need to release another shaded version of Hive 0.13.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3258) Python API for streaming MLlib algorithms
[ https://issues.apache.org/jira/browse/SPARK-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3258: - Assignee: Manoj Kumar > Python API for streaming MLlib algorithms > - > > Key: SPARK-3258 > URL: https://issues.apache.org/jira/browse/SPARK-3258 > Project: Spark > Issue Type: Umbrella > Components: MLlib, PySpark, Streaming >Reporter: Xiangrui Meng >Assignee: Manoj Kumar > Fix For: 1.5.0 > > > This is an umbrella JIRA to track Python port of streaming MLlib algorithms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7810) rdd.py "_load_from_socket" cannot load data from jvm socket if ipv6 is used
[ https://issues.apache.org/jira/browse/SPARK-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7810: - Assignee: Ai He > rdd.py "_load_from_socket" cannot load data from jvm socket if ipv6 is used > --- > > Key: SPARK-7810 > URL: https://issues.apache.org/jira/browse/SPARK-7810 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.3.1 >Reporter: Ai He >Assignee: Ai He > Fix For: 1.3.2, 1.5.0, 1.4.2 > > > The "_load_from_socket" method in rdd.py cannot load data from a JVM socket if IPv6 > is used. The current method only works with IPv4. The new modification > should work with both protocols. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8759) add default eval to binary and unary expression according to default behavior of nullable
[ https://issues.apache.org/jira/browse/SPARK-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8759: --- Assignee: (was: Apache Spark) > add default eval to binary and unary expression according to default behavior > of nullable > - > > Key: SPARK-8759 > URL: https://issues.apache.org/jira/browse/SPARK-8759 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8731) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/SPARK-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8731: - Component/s: SQL > Beeline doesn't work with -e option when started in background > -- > > Key: SPARK-8731 > URL: https://issues.apache.org/jira/browse/SPARK-8731 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Wang Yiguang >Priority: Minor > > Beeline stops when run in the background like this: > beeline -e "some query" & > It doesn't work even with the -f switch. > For example, this works: > beeline -u "jdbc:hive2://0.0.0.0:8000" -e "show databases;" > However, this does not: > beeline -u "jdbc:hive2://0.0.0.0:8000" -e "show databases;" & -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org