[jira] [Created] (SPARK-5506) java.lang.ClassCastException using lambda expressions in combination of spark and Servlet
Milad Khajavi created SPARK-5506: Summary: java.lang.ClassCastException using lambda expressions in combination of spark and Servlet Key: SPARK-5506 URL: https://issues.apache.org/jira/browse/SPARK-5506 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 1.2.0 Environment: spark server: Ubuntu 14.04 amd64 $ java -version java version "1.8.0_25" Java(TM) SE Runtime Environment (build 1.8.0_25-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) Reporter: Milad Khajavi Priority: Blocker

I'm trying to build a web API for my Apache Spark jobs using the sparkjava.com framework. My code is:

{code}
@Override
public void init() {
    get("/hello", (req, res) -> {
        String sourcePath = "hdfs://spark:54310/input/*";
        SparkConf conf = new SparkConf().setAppName("LineCount");
        conf.setJars(new String[] { "/home/sam/resin-4.0.42/webapps/test.war" });
        File configFile = new File("config.properties");
        String sparkURI = "spark://hamrah:7077";
        conf.setMaster(sparkURI);
        conf.set("spark.driver.allowMultipleContexts", "true");
        JavaSparkContext sc = new JavaSparkContext(conf);
        @SuppressWarnings("resource")
        JavaRDD<String> log = sc.textFile(sourcePath);
        JavaRDD<String> lines = log.filter(x -> {
            return true;
        });
        return lines.count();
    });
}
{code}

If I remove the lambda expression, or put the code inside a simple jar rather than a web service (in effect a Servlet), it runs without any error. But using a lambda expression inside a Servlet results in this exception:

{code}
15/01/28 10:36:33 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hamrah): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

P.S.: I tried combinations of Jersey and sparkjava with Jetty, Tomcat, and Resin, and all of them led me to the same result. Here is the same issue: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-java-lang-ClassCastException-SerializedLambda-to-org-apache-spark-api-java-function-Fu1-tt21261.html This is my colleague's question on Stack Overflow: http://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
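For readers hitting the same trace: a workaround that is often suggested for this kind of SerializedLambda failure is to replace the Java 8 lambda passed to filter with a named class implementing org.apache.spark.api.java.function.Function, so executors deserialize an ordinary class from the jar/war registered with setJars instead of a SerializedLambda. The sketch below is illustrative only (class names are made up) and is not a confirmed resolution of this ticket.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LineCountJob {

  // A named, serializable function class: executors load it from the jar/war
  // registered via conf.setJars(...) instead of deserializing a SerializedLambda.
  public static class KeepAll implements Function<String, Boolean> {
    @Override
    public Boolean call(String line) {
      return Boolean.TRUE;
    }
  }

  public static long countLines(String sourcePath) {
    SparkConf conf = new SparkConf()
        .setAppName("LineCount")
        .setMaster("spark://hamrah:7077")
        .setJars(new String[] { "/home/sam/resin-4.0.42/webapps/test.war" });
    JavaSparkContext sc = new JavaSparkContext(conf);
    try {
      JavaRDD<String> log = sc.textFile(sourcePath);
      // Same filter as in the report, but with a concrete class rather than a lambda.
      JavaRDD<String> lines = log.filter(new KeepAll());
      return lines.count();
    } finally {
      sc.stop();
    }
  }
}
{code}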
[jira] [Resolved] (SPARK-5307) Add utility to help with NotSerializableException debugging
[ https://issues.apache.org/jira/browse/SPARK-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5307. Resolution: Fixed Fix Version/s: 1.3.0 > Add utility to help with NotSerializableException debugging > --- > > Key: SPARK-5307 > URL: https://issues.apache.org/jira/browse/SPARK-5307 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.3.0 > > > Scala closures can easily capture objects unintentionally, especially with > implicit arguments. I think we can do more than just relying on the users > being smart about using sun.io.serialization.extendedDebugInfo to find more > debug information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
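For reference, the JVM flag mentioned in the description can already be enabled per job; a hedged sketch is below. Closure serialization usually fails on the driver, whose JVM flags must be set when it is launched (for example through spark-submit's driver options) rather than through SparkConf, so the programmatic setting shown here only reliably reaches executors.

{code}
import org.apache.spark.SparkConf;

public class ExtendedSerializationDebug {
  public static SparkConf debugConf() {
    // Executors launched with this conf will include the chain of fields that led to
    // the offending object when a java.io.NotSerializableException is thrown.
    return new SparkConf()
        .setAppName("serialization-debugging")
        .set("spark.executor.extraJavaOptions",
             "-Dsun.io.serialization.extendedDebugInfo=true");
    // For the driver JVM, pass the same -D flag at launch time,
    // e.g. via spark-submit --driver-java-options.
  }
}
{code}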
[jira] [Updated] (SPARK-3694) Allow printing object graph of tasks/RDD's with a debug flag
[ https://issues.apache.org/jira/browse/SPARK-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-3694: --- Fix Version/s: 1.3.0 > Allow printing object graph of tasks/RDD's with a debug flag > > > Key: SPARK-3694 > URL: https://issues.apache.org/jira/browse/SPARK-3694 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Ilya Ganelin > Labels: starter > Fix For: 1.3.0 > > > This would be useful for debugging extra references inside of RDD's > Here is an example for inspiration: > http://ehcache.org/xref/net/sf/ehcache/pool/sizeof/ObjectGraphWalker.html > We'd want to print this trace for both the RDD serialization inside of the > DAGScheduler and the task serialization in the TaskSetManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
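A hypothetical single-JVM sketch of the kind of object-graph walk the linked ObjectGraphWalker performs, to show what "printing the object graph" of a task or RDD could look like. The class name and stopping rules are invented; the actual feature would hook into the DAGScheduler and TaskSetManager serialization paths mentioned in the description.

{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;

public final class ObjectGraphPrinter {
  // Identity-based visited set so cyclic graphs terminate.
  private final IdentityHashMap<Object, Boolean> visited = new IdentityHashMap<Object, Boolean>();

  public void walk(Object root) {
    walk(root, "");
  }

  private void walk(Object obj, String indent) {
    if (obj == null || visited.containsKey(obj)) {
      return;
    }
    visited.put(obj, Boolean.TRUE);
    Class<?> clazz = obj.getClass();
    System.out.println(indent + clazz.getName());

    // Stop at boxed primitives, Strings, and friends; they cannot hide extra references.
    if (clazz.getName().startsWith("java.lang")) {
      return;
    }
    if (clazz.isArray()) {
      if (!clazz.getComponentType().isPrimitive()) {
        for (Object element : (Object[]) obj) {
          walk(element, indent + "  ");
        }
      }
      return;
    }
    // Walk instance fields up the class hierarchy; statics are not serialized.
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      for (Field field : c.getDeclaredFields()) {
        if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
          continue;
        }
        field.setAccessible(true);
        try {
          walk(field.get(obj), indent + "  ");
        } catch (IllegalAccessException e) {
          System.out.println(indent + "  <inaccessible " + field.getName() + ">");
        }
      }
    }
  }
}
{code}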
[jira] [Updated] (SPARK-3694) Allow printing object graph of tasks/RDD's with a debug flag
[ https://issues.apache.org/jira/browse/SPARK-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-3694: --- Target Version/s: 1.3.0 (was: 1.2.0) > Allow printing object graph of tasks/RDD's with a debug flag > > > Key: SPARK-3694 > URL: https://issues.apache.org/jira/browse/SPARK-3694 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Ilya Ganelin > Labels: starter > Fix For: 1.3.0 > > > This would be useful for debugging extra references inside of RDD's > Here is an example for inspiration: > http://ehcache.org/xref/net/sf/ehcache/pool/sizeof/ObjectGraphWalker.html > We'd want to print this trace for both the RDD serialization inside of the > DAGScheduler and the task serialization in the TaskSetManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5307) Add utility to help with NotSerializableException debugging
[ https://issues.apache.org/jira/browse/SPARK-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299696#comment-14299696 ] Apache Spark commented on SPARK-5307: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/4297 > Add utility to help with NotSerializableException debugging > --- > > Key: SPARK-5307 > URL: https://issues.apache.org/jira/browse/SPARK-5307 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin > > Scala closures can easily capture objects unintentionally, especially with > implicit arguments. I think we can do more than just relying on the users > being smart about using sun.io.serialization.extendedDebugInfo to find more > debug information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1010) Update all unit tests to use SparkConf instead of system properties
[ https://issues.apache.org/jira/browse/SPARK-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299683#comment-14299683 ] Josh Rosen commented on SPARK-1010: --- The pull request for SPARK-5425 fixes a bug in the ResetSystemProperties trait added here. My original implementation of that trait didn't properly reset the system properties because it didn't perform a proper clone: https://github.com/apache/spark/pull/4220#issuecomment-71992373. > Update all unit tests to use SparkConf instead of system properties > --- > > Key: SPARK-1010 > URL: https://issues.apache.org/jira/browse/SPARK-1010 > Project: Spark > Issue Type: New Feature >Affects Versions: 0.9.0 >Reporter: Patrick Wendell >Assignee: Josh Rosen >Priority: Minor > Fix For: 1.0.3, 1.3.0, 1.1.2, 1.2.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
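To illustrate the kind of clone pitfall the comment refers to (details are in the linked PR comment): constructing new Properties(old) only installs old as a fallback for lookups, it does not copy its entries, so "restoring" that object later does not actually undo changes made by a test. Below is a minimal sketch of an explicit entry-by-entry snapshot and restore; it assumes nothing about Spark's eventual ResetSystemProperties implementation.

{code}
import java.util.Properties;

public final class SystemPropertiesSnapshot {
  // Take a real copy of the current entries, not a view backed by the live object.
  public static Properties snapshot() {
    Properties copy = new Properties();
    copy.putAll(System.getProperties());
    return copy;
  }

  // Swap in a fresh Properties built from the snapshot, discarding test-time changes.
  public static void restore(Properties snapshot) {
    Properties restored = new Properties();
    restored.putAll(snapshot);
    System.setProperties(restored);
  }
}
{code}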
[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition
[ https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299682#comment-14299682 ] Tathagata Das commented on SPARK-4981: -- +1 This will be awesome :P > Add a streaming singular value decomposition > > > Key: SPARK-4981 > URL: https://issues.apache.org/jira/browse/SPARK-4981 > Project: Spark > Issue Type: New Feature > Components: MLlib, Streaming >Reporter: Jeremy Freeman > > This is for tracking WIP on a streaming singular value decomposition > implementation. This will likely be more complex than the existing streaming > algorithms (k-means, regression), but should be possible using the family of > sequential update rule outlined in this paper: > "Fast low-rank modifications of the thin singular value decomposition" > by Matthew Brand > http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2614) Add the spark-examples-xxx-.jar to the Debian packages created with mvn ... -Pdeb (using assembly/pom.xml)
[ https://issues.apache.org/jira/browse/SPARK-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299668#comment-14299668 ] Christian Tzolov commented on SPARK-2614: - Not sure whether this issue is still alive. In case someone is interested, I've rebased and realigned the #1611 pull request against upstream/master (e643de42a7). > Add the spark-examples-xxx-.jar to the Debian packages created with mvn ... > -Pdeb (using assembly/pom.xml) > -- > > Key: SPARK-2614 > URL: https://issues.apache.org/jira/browse/SPARK-2614 > Project: Spark > Issue Type: Improvement > Components: Build, Deploy >Reporter: Christian Tzolov > > The tar.gz distribution includes already the spark-examples.jar in the > bundle. It is a common practice for installers to run SparkPi as a smoke test > to verify that the installation is OK > /usr/share/spark/bin/spark-submit \ > --num-executors 10 --master yarn-cluster \ > --class org.apache.spark.examples.SparkPi \ > /usr/share/spark/jars/spark-examples-1.0.1-hadoop2.2.0.jar 10 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition
[ https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299666#comment-14299666 ] Reza Zadeh commented on SPARK-4981: --- To be model parallel, we can simply warm-start the current ALS implementation in org.apache.spark.mllib.recommendation The work involved would be to expose a warm-start option in ALS, and then redo training with say 2 iterations instead of 10, with each batch of RDDs. The stream would be over batches of Ratings. This should be the simplest option. > Add a streaming singular value decomposition > > > Key: SPARK-4981 > URL: https://issues.apache.org/jira/browse/SPARK-4981 > Project: Spark > Issue Type: New Feature > Components: MLlib, Streaming >Reporter: Jeremy Freeman > > This is for tracking WIP on a streaming singular value decomposition > implementation. This will likely be more complex than the existing streaming > algorithms (k-means, regression), but should be possible using the family of > sequential update rule outlined in this paper: > "Fast low-rank modifications of the thin singular value decomposition" > by Matthew Brand > http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5472) Add support for reading from and writing to a JDBC database
[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tor Myklebust updated SPARK-5472: - Description: It would be nice to be able to make a table in a JDBC database appear as a table in Spark SQL. This would let users, for instance, perform a JOIN between a DataFrame in Spark SQL with a table in a Postgres database. It might also be nice to be able to go the other direction -- save a DataFrame to a database -- for instance in an ETL job. Edited to clarify: Both of these tasks are certainly possible to accomplish at the moment with a little bit of ad-hoc glue code. However, there is no fundamental reason why the user should need to supply the table schema and some code for pulling data out of a ResultSet row into a Catalyst Row structure when this information can be derived from the schema of the database table itself. was: It would be nice to be able to make a table in a JDBC database appear as a table in Spark SQL. This would let users, for instance, perform a JOIN between a DataFrame in Spark SQL with a table in a Postgres database. It might also be nice to be able to go the other direction -- save a DataFrame to a database -- for instance in an ETL job. > Add support for reading from and writing to a JDBC database > --- > > Key: SPARK-5472 > URL: https://issues.apache.org/jira/browse/SPARK-5472 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tor Myklebust >Assignee: Tor Myklebust >Priority: Blocker > > It would be nice to be able to make a table in a JDBC database appear as a > table in Spark SQL. This would let users, for instance, perform a JOIN > between a DataFrame in Spark SQL with a table in a Postgres database. > It might also be nice to be able to go the other direction -- save a > DataFrame to a database -- for instance in an ETL job. > Edited to clarify: Both of these tasks are certainly possible to accomplish > at the moment with a little bit of ad-hoc glue code. However, there is no > fundamental reason why the user should need to supply the table schema and > some code for pulling data out of a ResultSet row into a Catalyst Row > structure when this information can be derived from the schema of the > database table itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5472) Add support for reading from and writing to a JDBC database
[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299657#comment-14299657 ] Tor Myklebust commented on SPARK-5472: -- You appear to understand the issue perfectly. You have to write the case class mapper, work out the schema, and register the thing as a temporary table. Once you've done all that for one table, you have to do something rather similar for the next table you want to load. And all this work requires Scala coding rather than a short SQL query. This is more complexity for the user than the problem really deserves. > Add support for reading from and writing to a JDBC database > --- > > Key: SPARK-5472 > URL: https://issues.apache.org/jira/browse/SPARK-5472 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tor Myklebust >Assignee: Tor Myklebust >Priority: Blocker > > It would be nice to be able to make a table in a JDBC database appear as a > table in Spark SQL. This would let users, for instance, perform a JOIN > between a DataFrame in Spark SQL with a table in a Postgres database. > It might also be nice to be able to go the other direction -- save a > DataFrame to a database -- for instance in an ETL job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5472) Add support for reading from and writing to a JDBC database
[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299657#comment-14299657 ] Tor Myklebust edited comment on SPARK-5472 at 1/31/15 4:12 AM: --- You appear to understand the issue perfectly. You have to write the case class mapper, work out the schema, and register the thing as a temporary table. Once you've done all that for one table, you have to do something rather similar for the next table you want to load. And all this work requires Scala coding rather than a short SQL query. This is more complexity for the user than the problem really deserves, and it appears to be easy to automate in a reasonably transparent way. was (Author: tmyklebu): You appear to understand the issue perfectly. You have to write the case class mapper, work out the schema, and register the thing as a temporary table. Once you've done all that for one table, you have to do something rather similar for the next table you want to load. And all this work requires Scala coding rather than a short SQL query. This is more complexity for the user than the problem really deserves. > Add support for reading from and writing to a JDBC database > --- > > Key: SPARK-5472 > URL: https://issues.apache.org/jira/browse/SPARK-5472 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tor Myklebust >Assignee: Tor Myklebust >Priority: Blocker > > It would be nice to be able to make a table in a JDBC database appear as a > table in Spark SQL. This would let users, for instance, perform a JOIN > between a DataFrame in Spark SQL with a table in a Postgres database. > It might also be nice to be able to go the other direction -- save a > DataFrame to a database -- for instance in an ETL job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5472) Add support for reading from and writing to a JDBC database
[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299642#comment-14299642 ] Reynold Xin commented on SPARK-5472: [~chinnitv], thanks for commenting. The existing JdbcRDD doesn't really solve the use case of loading in. In particular, it cannot: 1. Be used in pure SQL 2. Be used without tons of glue code (converting resultSet to case classes is still more code than necessary to write) 3. It does not support filter pushdown, unless users manually write the pushdown. > Add support for reading from and writing to a JDBC database > --- > > Key: SPARK-5472 > URL: https://issues.apache.org/jira/browse/SPARK-5472 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tor Myklebust >Assignee: Tor Myklebust >Priority: Blocker > > It would be nice to be able to make a table in a JDBC database appear as a > table in Spark SQL. This would let users, for instance, perform a JOIN > between a DataFrame in Spark SQL with a table in a Postgres database. > It might also be nice to be able to go the other direction -- save a > DataFrame to a database -- for instance in an ETL job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5472) Add support for reading from and writing to a JDBC database
[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299633#comment-14299633 ] Anand Mohan Tumuluri commented on SPARK-5472: - Pardon my ignorance, but I think JdbcRDD can be given a ResultSet-to-case-class mapper, which will yield an RDD[case class]. Any RDD[case class] (RDD[Product]) can be converted into a SchemaRDD using the createSchemaRDD method of SQL/HiveContext. This SchemaRDD can then be registered as a temp table within Spark SQL through registerTempTable and then joined to other Spark SQL tables. Doesn't this solve the use case of loading data from a JDBC data source, or am I missing something? Of course this requires Scala and the Spark shell, meaning it can't be done from spark-sql or thriftserver2. However, there is currently no easy way of saving an RDD into a JDBC data sink (DBOutputFormat is way too rigid). This PR, providing a generic mechanism for saving a SchemaRDD into an RDBMS table, will be very valuable for us. > Add support for reading from and writing to a JDBC database > --- > > Key: SPARK-5472 > URL: https://issues.apache.org/jira/browse/SPARK-5472 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Tor Myklebust >Assignee: Tor Myklebust >Priority: Blocker > > It would be nice to be able to make a table in a JDBC database appear as a > table in Spark SQL. This would let users, for instance, perform a JOIN > between a DataFrame in Spark SQL with a table in a Postgres database. > It might also be nice to be able to go the other direction -- save a > DataFrame to a database -- for instance in an ETL job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
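To make the amount of glue code being discussed concrete, here is a rough, hypothetical Java sketch of the pre-1.3 workflow the commenters describe: a hand-written row-to-bean mapping, a schema derived from the bean class, and temp-table registration. The table, columns, and connection details are invented, and for brevity the rows are read on the driver instead of going through JdbcRDD; it is an illustration of the boilerplate, not a recommended pattern.

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

public class JdbcGlue {

  // Hypothetical bean for a made-up "users" table; every table needs its own.
  public static class User implements java.io.Serializable {
    private int id;
    private String name;
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
  }

  public static void main(String[] args) throws Exception {
    JavaSparkContext sc = new JavaSparkContext("local[2]", "jdbc-glue");
    JavaSQLContext sqlContext = new JavaSQLContext(sc);

    // Step 1: hand-written mapping from ResultSet rows to the bean.
    List<User> rows = new ArrayList<User>();
    Connection conn = DriverManager.getConnection("jdbc:postgresql://db:5432/app", "user", "secret");
    try {
      Statement stmt = conn.createStatement();
      ResultSet rs = stmt.executeQuery("SELECT id, name FROM users");
      while (rs.next()) {
        User u = new User();
        u.setId(rs.getInt("id"));
        u.setName(rs.getString("name"));
        rows.add(u);
      }
    } finally {
      conn.close();
    }

    // Step 2: turn the beans into an RDD and derive a schema from the bean class.
    JavaRDD<User> userRdd = sc.parallelize(rows);
    JavaSchemaRDD users = sqlContext.applySchema(userRdd, User.class);

    // Step 3: register a temp table so the data can finally be queried or joined in SQL.
    users.registerTempTable("users");
    sqlContext.sql("SELECT name FROM users WHERE id > 100").collect();
  }
}
{code}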
[jira] [Created] (SPARK-5505) ConsumerRebalanceFailedException from Kafka consumer
Greg Temchenko created SPARK-5505: - Summary: ConsumerRebalanceFailedException from Kafka consumer Key: SPARK-5505 URL: https://issues.apache.org/jira/browse/SPARK-5505 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.0 Environment: CentOS6 / Linux 2.6.32-358.2.1.el6.x86_64 java version "1.7.0_21" Scala compiler version 2.9.3 2 cores Intel(R) Xeon(R) CPU E5620 @ 2.40GHz / 16G RAM VMWare VM. Reporter: Greg Temchenko Priority: Critical

From time to time Spark Streaming produces a ConsumerRebalanceFailedException and stops receiving messages. After that, all subsequent RDDs are empty.

{code}
15/01/30 18:18:36 ERROR consumer.ZookeeperConsumerConnector: [terran_vmname-1422670149779-243b4e10], error during syncedRebalance
kafka.common.ConsumerRebalanceFailedException: terran_vmname-1422670149779-243b4e10 can't rebalance after 4 retries
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:432)
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:355)
{code}

The problem is also described in the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-Spark-streaming-consumes-from-Kafka-td19570.html As I understand it, this is a critical blocker for Kafka-Spark streaming production use.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
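A hedged sketch of one mitigation commonly suggested for this failure: pass the Kafka high-level consumer's rebalance settings through the kafkaParams overload of KafkaUtils.createStream so the consumer retries longer before giving up. The ZooKeeper address, group, and topic below are placeholders, and raising these settings works around the symptom (losing the rebalance race) rather than fixing the underlying ownership conflict.

{code}
import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaRebalanceTuning {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("kafka-rebalance-tuning");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("zookeeper.connect", "zk1:2181");   // placeholder
    kafkaParams.put("group.id", "terran");               // placeholder
    // The log above shows the consumer giving up after the default 4 retries with a
    // 2000 ms backoff; give it more room and a longer ZooKeeper session.
    kafkaParams.put("rebalance.max.retries", "10");
    kafkaParams.put("rebalance.backoff.ms", "5000");
    kafkaParams.put("zookeeper.session.timeout.ms", "10000");

    Map<String, Integer> topics = new HashMap<String, Integer>();
    topics.put("events", 1);                              // placeholder topic and thread count

    JavaPairReceiverInputDStream<String, String> stream = KafkaUtils.createStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics, StorageLevel.MEMORY_AND_DISK_SER());

    stream.print();
    jssc.start();
    jssc.awaitTermination();
  }
}
{code}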
[jira] [Commented] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299571#comment-14299571 ] DeepakVohra commented on SPARK-5483: To test: 1. Create a Maven project in Eclipse IDE. 2. Add the Spark MLLib dependency 2.10. org.apache.spark spark-mllib_2.10 1.2.0 3. Add a Java class to the Maven project. public class Test{} 4. Add the following import statements to the Java class Test. import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at 
breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:11
[jira] [Closed] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?
[ https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Shaw closed SPARK-5299. - Verified that key used to sign release is now present. Thanks. > Is http://www.apache.org/dist/spark/KEYS out of date? > - > > Key: SPARK-5299 > URL: https://issues.apache.org/jira/browse/SPARK-5299 > Project: Spark > Issue Type: Question > Components: Deploy >Reporter: David Shaw >Assignee: Patrick Wendell > > The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to > match the keys used to sign the releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5504) ScalaReflection.convertToCatalyst should support nested arrays
[ https://issues.apache.org/jira/browse/SPARK-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5504. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4295 [https://github.com/apache/spark/pull/4295] > ScalaReflection.convertToCatalyst should support nested arrays > -- > > Key: SPARK-5504 > URL: https://issues.apache.org/jira/browse/SPARK-5504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > Fix For: 1.3.0 > > > After the recent refactoring, convertToCatalyst in ScalaReflection does not > recurse on Arrays. It should. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5400) Rename GaussianMixtureEM to GaussianMixture
[ https://issues.apache.org/jira/browse/SPARK-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5400. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4290 [https://github.com/apache/spark/pull/4290] > Rename GaussianMixtureEM to GaussianMixture > --- > > Key: SPARK-5400 > URL: https://issues.apache.org/jira/browse/SPARK-5400 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Travis Galoppo >Priority: Minor > Fix For: 1.3.0 > > > GaussianMixtureEM is following the old naming convention of including the > optimization algorithm name in the class title. We should probably rename it > to GaussianMixture so that it can use other optimization algorithms in the > future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5504) ScalaReflection.convertToCatalyst should support nested arrays
[ https://issues.apache.org/jira/browse/SPARK-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299291#comment-14299291 ] Apache Spark commented on SPARK-5504: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/4295 > ScalaReflection.convertToCatalyst should support nested arrays > -- > > Key: SPARK-5504 > URL: https://issues.apache.org/jira/browse/SPARK-5504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > > After the recent refactoring, convertToCatalyst in ScalaReflection does not > recurse on Arrays. It should. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
[ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4259. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4254 [https://github.com/apache/spark/pull/4254] > Add Power Iteration Clustering Algorithm with Gaussian Similarity Function > -- > > Key: SPARK-4259 > URL: https://issues.apache.org/jira/browse/SPARK-4259 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Fan Jiang >Assignee: Fan Jiang > Labels: features > Fix For: 1.3.0 > > > In recent years, power Iteration clustering has become one of the most > popular modern clustering algorithms. It is simple to implement, can be > solved efficiently by standard linear algebra software, and very often > outperforms traditional clustering algorithms such as the k-means algorithm. > Power iteration clustering is a scalable and efficient algorithm for > clustering points given pointwise mutual affinity values. Internally the > algorithm: > computes the Gaussian distance between all pairs of points and represents > these distances in an Affinity Matrix > calculates a Normalized Affinity Matrix > calculates the principal eigenvalue and eigenvector > Clusters each of the input points according to their principal eigenvector > component value > Details of this algorithm are found within [Power Iteration Clustering, Lin > and Cohen]{www.icml2010.org/papers/387.pdf} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
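As a concrete illustration of the steps listed in the description (row-normalize the affinity matrix, then take the dominant eigenvector by power iteration), here is a small single-machine Java sketch. It is not the MLlib implementation, which applies the same idea to a distributed graph, and the final k-means step over the eigenvector components is omitted.

{code}
import java.util.Arrays;

public final class PowerIterationSketch {

  // affinity[i][j] holds the (e.g. Gaussian) similarity between points i and j.
  public static double[] dominantEigenvector(double[][] affinity, int iterations) {
    int n = affinity.length;

    // Step 1: row-normalize the affinity matrix: w[i][j] = a[i][j] / sum_j a[i][j].
    double[][] w = new double[n][n];
    for (int i = 0; i < n; i++) {
      double rowSum = 0.0;
      for (int j = 0; j < n; j++) rowSum += affinity[i][j];
      for (int j = 0; j < n; j++) w[i][j] = affinity[i][j] / rowSum;
    }

    // Step 2: start from a uniform vector and repeatedly apply W, renormalizing each time.
    double[] v = new double[n];
    Arrays.fill(v, 1.0 / n);
    for (int iter = 0; iter < iterations; iter++) {
      double[] next = new double[n];
      double norm = 0.0;
      for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) next[i] += w[i][j] * v[j];
        norm += Math.abs(next[i]);
      }
      for (int i = 0; i < n; i++) next[i] /= norm;
      v = next;
    }

    // Step 3 (omitted): cluster the points by running k-means on their components of v.
    return v;
  }
}
{code}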
[jira] [Updated] (SPARK-5503) Example code for Power Iteration Clustering
[ https://issues.apache.org/jira/browse/SPARK-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5503: - Description: There are two places we need to put examples: 1. In the user guide, we should be a small example (as in the unit test). 2. Under examples/, we can have something fancy but still need to keep it minimal. 3. The user guide contains some out-of-date links, which needs to be updated as well. was: There are two places we need to put examples: 1. In the user guide, we should be a small example (as in the unit test). 2. Under examples/, we can have something fancy. > Example code for Power Iteration Clustering > --- > > Key: SPARK-5503 > URL: https://issues.apache.org/jira/browse/SPARK-5503 > Project: Spark > Issue Type: Documentation > Components: Documentation, Examples, MLlib >Reporter: Xiangrui Meng >Assignee: Stephen Boesch > > There are two places we need to put examples: > 1. In the user guide, we should be a small example (as in the unit test). > 2. Under examples/, we can have something fancy but still need to keep it > minimal. > 3. The user guide contains some out-of-date links, which needs to be updated > as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5486) Add validate function for BlockMatrix
[ https://issues.apache.org/jira/browse/SPARK-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5486. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4279 [https://github.com/apache/spark/pull/4279] > Add validate function for BlockMatrix > - > > Key: SPARK-5486 > URL: https://issues.apache.org/jira/browse/SPARK-5486 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Burak Yavuz > Fix For: 1.3.0 > > > BlockMatrix needs a validate method to make debugging easy for users. > It will be an expensive method to perform, but it would be useful for users > to know why `multiply` or `add` didn't work properly. > Things to validate: > - MatrixBlocks that are not on the edges should have the dimensions > `rowsPerBlock` and `colsPerBlock`. > - There should be at most one block for each index -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5486) Add validate function for BlockMatrix
[ https://issues.apache.org/jira/browse/SPARK-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5486: - Assignee: Burak Yavuz > Add validate function for BlockMatrix > - > > Key: SPARK-5486 > URL: https://issues.apache.org/jira/browse/SPARK-5486 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 1.3.0 > > > BlockMatrix needs a validate method to make debugging easy for users. > It will be an expensive method to perform, but it would be useful for users > to know why `multiply` or `add` didn't work properly. > Things to validate: > - MatrixBlocks that are not on the edges should have the dimensions > `rowsPerBlock` and `colsPerBlock`. > - There should be at most one block for each index -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5504) ScalaReflection.convertToCatalyst should support nested arrays
Joseph K. Bradley created SPARK-5504: Summary: ScalaReflection.convertToCatalyst should support nested arrays Key: SPARK-5504 URL: https://issues.apache.org/jira/browse/SPARK-5504 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley Priority: Minor After the recent refactoring, convertToCatalyst in ScalaReflection does not recurse on Arrays. It should. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5503) Example code for Power Iteration Clustering
Xiangrui Meng created SPARK-5503: Summary: Example code for Power Iteration Clustering Key: SPARK-5503 URL: https://issues.apache.org/jira/browse/SPARK-5503 Project: Spark Issue Type: Documentation Components: Documentation, Examples, MLlib Reporter: Xiangrui Meng Assignee: Stephen Boesch There are two places we need to put examples: 1. In the user guide, we should be a small example (as in the unit test). 2. Under examples/, we can have something fancy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5502) User guide for isotonic regression
Xiangrui Meng created SPARK-5502: Summary: User guide for isotonic regression Key: SPARK-5502 URL: https://issues.apache.org/jira/browse/SPARK-5502 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Martin Zapletal Add user guide to docs/mllib-regression.md with code examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5501) Write support for the data source API
[ https://issues.apache.org/jira/browse/SPARK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299207#comment-14299207 ] Apache Spark commented on SPARK-5501: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/4294 > Write support for the data source API > - > > Key: SPARK-5501 > URL: https://issues.apache.org/jira/browse/SPARK-5501 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5501) Write support for the data source API
[ https://issues.apache.org/jira/browse/SPARK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-5501: Summary: Write support for the data source API (was: Initial Write support) > Write support for the data source API > - > > Key: SPARK-5501 > URL: https://issues.apache.org/jira/browse/SPARK-5501 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5500) Document that feeding hadoopFile into a shuffle operation will cause problems
[ https://issues.apache.org/jira/browse/SPARK-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299199#comment-14299199 ] Apache Spark commented on SPARK-5500: - User 'sryza' has created a pull request for this issue: https://github.com/apache/spark/pull/4293 > Document that feeding hadoopFile into a shuffle operation will cause problems > - > > Key: SPARK-5500 > URL: https://issues.apache.org/jira/browse/SPARK-5500 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5501) Initial Write support
Yin Huai created SPARK-5501: --- Summary: Initial Write support Key: SPARK-5501 URL: https://issues.apache.org/jira/browse/SPARK-5501 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5500) Document that feeding hadoopFile into a shuffle operation will cause problems
Sandy Ryza created SPARK-5500: - Summary: Document that feeding hadoopFile into a shuffle operation will cause problems Key: SPARK-5500 URL: https://issues.apache.org/jira/browse/SPARK-5500 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.3.0 Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5197: --- Fix Version/s: (was: 1.3.0) > Support external shuffle service in fine-grained mode on mesos cluster > -- > > Key: SPARK-5197 > URL: https://issues.apache.org/jira/browse/SPARK-5197 > Project: Spark > Issue Type: Improvement > Components: Deploy, Mesos, Shuffle >Reporter: Jongyoul Lee > > I think dynamic allocation is almost satisfied on mesos' fine-grained mode, > which already offers resources dynamically, and returns automatically when a > task is finished. It, however, doesn't have a mechanism on support external > shuffle service like yarn's way, which is AuxiliaryService. Because mesos > doesn't support AusiliaryService, we think a different way to do this. > - Launching a shuffle service like a spark job on same cluster > -- Pros > --- Support multi-tenant environment > --- Almost same way like yarn > -- Cons > --- Control long running 'background' job - service - when mesos runs > --- Satisfy all slave - or host - to have one shuffle service all the time > - Launching jobs within shuffle service > -- Pros > --- Easy to implement > --- Don't consider whether shuffle service exists or not. > -- Cons > --- exists multiple shuffle services under multi-tenant environment > --- Control shuffle service port dynamically on multi-user environment > In my opinion, the first one is better idea to support external shuffle > service. Please leave comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2335) k-Nearest Neighbor classification and regression for MLLib
[ https://issues.apache.org/jira/browse/SPARK-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299093#comment-14299093 ] Brian Gawalt commented on SPARK-2335: - I'm inclined to wonder the same thing; generalizing beyond integers would be nice > k-Nearest Neighbor classification and regression for MLLib > -- > > Key: SPARK-2335 > URL: https://issues.apache.org/jira/browse/SPARK-2335 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Brian Gawalt >Priority: Minor > Labels: features > > The k-Nearest Neighbor model for classification and regression problems is a > simple and intuitive approach, offering a straightforward path to creating > non-linear decision/estimation contours. It's downsides -- high variance > (sensitivity to the known training data set) and computational intensity for > estimating new point labels -- both play to Spark's big data strengths: lots > of data mitigates data concerns; lots of workers mitigate computational > latency. > We should include kNN models as options in MLLib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
[ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299092#comment-14299092 ] Stephen Boesch commented on SPARK-4259: --- Yes the PR has a working version . However Xiangrui has additional significant changes that will affect the API - so the recommendation here would be to wait until early next week for the dust to settle. > Add Power Iteration Clustering Algorithm with Gaussian Similarity Function > -- > > Key: SPARK-4259 > URL: https://issues.apache.org/jira/browse/SPARK-4259 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Fan Jiang >Assignee: Fan Jiang > Labels: features > > In recent years, power Iteration clustering has become one of the most > popular modern clustering algorithms. It is simple to implement, can be > solved efficiently by standard linear algebra software, and very often > outperforms traditional clustering algorithms such as the k-means algorithm. > Power iteration clustering is a scalable and efficient algorithm for > clustering points given pointwise mutual affinity values. Internally the > algorithm: > computes the Gaussian distance between all pairs of points and represents > these distances in an Affinity Matrix > calculates a Normalized Affinity Matrix > calculates the principal eigenvalue and eigenvector > Clusters each of the input points according to their principal eigenvector > component value > Details of this algorithm are found within [Power Iteration Clustering, Lin > and Cohen]{www.icml2010.org/papers/387.pdf} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299009#comment-14299009 ] DeepakVohra edited comment on SPARK-5483 at 1/30/15 7:09 PM: - Thanks for the clarification about why the master URL is not set. Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue and the org.apache.spark.mllib.* packages get found, but introduces the Scala version issue. was (Author: dvohra): Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue and the org.apache.spark.mllib.* packages get found, but introduces the Scala version issue. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.
[jira] [Commented] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
[ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299065#comment-14299065 ] Andrew Musselman commented on SPARK-4259: - Makes sense; does that pull request contain a working version? > Add Power Iteration Clustering Algorithm with Gaussian Similarity Function > -- > > Key: SPARK-4259 > URL: https://issues.apache.org/jira/browse/SPARK-4259 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Fan Jiang >Assignee: Fan Jiang > Labels: features > > In recent years, power Iteration clustering has become one of the most > popular modern clustering algorithms. It is simple to implement, can be > solved efficiently by standard linear algebra software, and very often > outperforms traditional clustering algorithms such as the k-means algorithm. > Power iteration clustering is a scalable and efficient algorithm for > clustering points given pointwise mutual affinity values. Internally the > algorithm: > computes the Gaussian distance between all pairs of points and represents > these distances in an Affinity Matrix > calculates a Normalized Affinity Matrix > calculates the principal eigenvalue and eigenvector > Clusters each of the input points according to their principal eigenvector > component value > Details of this algorithm are found within [Power Iteration Clustering, Lin > and Cohen]{www.icml2010.org/papers/387.pdf} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1517: --- Target Version/s: 1.4.0 > Publish nightly snapshots of documentation, maven artifacts, and binary builds > -- > > Key: SPARK-1517 > URL: https://issues.apache.org/jira/browse/SPARK-1517 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Patrick Wendell >Priority: Blocker > > Should be pretty easy to do with Jenkins. The only thing I can think of that > would be tricky is to set up credentials so that jenkins can publish this > stuff somewhere on apache infra. > Ideally we don't want to have to put a private key on every jenkins box > (since they are otherwise pretty stateless). One idea is to encrypt these > credentials with a passphrase and post them somewhere publicly visible. Then > the jenkins build can download the credentials provided we set a passphrase > in an environment variable in jenkins. There may be simpler solutions as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1517: --- Target Version/s: (was: 1.3.0) > Publish nightly snapshots of documentation, maven artifacts, and binary builds > -- > > Key: SPARK-1517 > URL: https://issues.apache.org/jira/browse/SPARK-1517 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Patrick Wendell >Priority: Blocker > > Should be pretty easy to do with Jenkins. The only thing I can think of that > would be tricky is to set up credentials so that jenkins can publish this > stuff somewhere on apache infra. > Ideally we don't want to have to put a private key on every jenkins box > (since they are otherwise pretty stateless). One idea is to encrypt these > credentials with a passphrase and post them somewhere publicly visible. Then > the jenkins build can download the credentials provided we set a passphrase > in an environment variable in jenkins. There may be simpler solutions as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299009#comment-14299009 ] DeepakVohra edited comment on SPARK-5483 at 1/30/15 6:31 PM: - Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue and the org.apache.spark.mllib.* packages get found, but introduces the Scala version issue. was (Author: dvohra): Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue of the org.apache.spark.mllib.* packages being found. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: 
Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) >
[jira] [Comment Edited] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299009#comment-14299009 ] DeepakVohra edited comment on SPARK-5483 at 1/30/15 6:30 PM: - Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue of the org.apache.spark.mllib.* packages being found. was (Author: dvohra): Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > 
java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$
[jira] [Commented] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299009#comment-14299009 ] DeepakVohra commented on SPARK-5483: Is the Maven dependency >spark-mllib_2.10 not a Spark issue? org.apache.spark spark-mllib_2.10 1.2.0 There must be some issue in how you are adding the classes to the classpath. Modifying "2.10" to "2.11" fixes the issue. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > 
breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter
[jira] [Commented] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299006#comment-14299006 ] Sean Owen commented on SPARK-5489: -- But that *is* the artifact, which you see has the class. It must be an issue in your classpath, right? > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299001#comment-14299001 ] DeepakVohra commented on SPARK-5489: Already did before posting the previous message and the jar does have the classes, but are indicated as not found with the Maven dependency. Gets fixed with MLLib 2.11. The Maven dependency MLlib 2.10 has some issue. > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4846) When the vocabulary size is large, Word2Vec may yield "OutOfMemoryError: Requested array size exceeds VM limit"
[ https://issues.apache.org/jira/browse/SPARK-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4846. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4247 [https://github.com/apache/spark/pull/4247] > When the vocabulary size is large, Word2Vec may yield "OutOfMemoryError: > Requested array size exceeds VM limit" > --- > > Key: SPARK-4846 > URL: https://issues.apache.org/jira/browse/SPARK-4846 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.1.1, 1.2.0 > Environment: Use Word2Vec to process a corpus(sized 3.5G) with one > partition. > The corpus contains about 300 million words and its vocabulary size is about > 10 million. >Reporter: Joseph Tang >Assignee: Joseph Tang >Priority: Minor > Fix For: 1.3.0 > > > Exception in thread "Driver" java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162) > Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit > at java.util.Arrays.copyOf(Arrays.java:2271) > at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113) > at > java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) > at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140) > at > java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1870) > at > java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1779) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1186) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73) > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) > at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) > at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:610) > at > org.apache.spark.mllib.feature.Word2Vec$$anonfun$fit$1.apply$mcVI$sp(Word2Vec.scala:291) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) > at org.apache.spark.mllib.feature.Word2Vec.fit(Word2Vec.scala:290) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
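A back-of-the-envelope check makes the failure mode concrete. Assuming a vector size of 100 floats (Word2Vec's default; the report does not state the value used), the flattened weight array for a ~10 million word vocabulary cannot be serialized into a single Java byte array:
{noformat}
V \cdot d \approx 10^{7} \times 100 = 10^{9} \ \text{floats} \approx 4 \times 10^{9} \ \text{bytes}
4 \times 10^{9} > 2^{31} - 1 \approx 2.1 \times 10^{9} \ \text{bytes (maximum Java array length, hence "Requested array size exceeds VM limit")}
{noformat}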
[jira] [Resolved] (SPARK-5496) Allow both "classification" and "Classification" in Algo for trees
[ https://issues.apache.org/jira/browse/SPARK-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5496. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4287 [https://github.com/apache/spark/pull/4287] > Allow both "classification" and "Classification" in Algo for trees > -- > > Key: SPARK-5496 > URL: https://issues.apache.org/jira/browse/SPARK-5496 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.3.0 > > > We use "classification" in tree but "Classification" in boosting. We switched > to "classification" in both cases, but still need to accept "Classification" > to be backward compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
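A minimal sketch of the backward-compatible parsing this asks for, written as a hypothetical helper rather than the actual MLlib change:
{code:java}
import java.util.Locale;

public final class AlgoParser {
    // Accept both "classification" and "Classification" (and likewise for
    // regression) by normalizing case before matching.
    public static String parse(String algo) {
        switch (algo.toLowerCase(Locale.ROOT)) {
            case "classification": return "Classification";
            case "regression":     return "Regression";
            default:
                throw new IllegalArgumentException("Unrecognized Algo: " + algo);
        }
    }
}
{code}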
[jira] [Resolved] (SPARK-5393) Flood of util.RackResolver log messages after SPARK-1714
[ https://issues.apache.org/jira/browse/SPARK-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-5393. -- Resolution: Fixed Fix Version/s: 1.3.0 > Flood of util.RackResolver log messages after SPARK-1714 > > > Key: SPARK-5393 > URL: https://issues.apache.org/jira/browse/SPARK-5393 > Project: Spark > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza >Priority: Critical > Fix For: 1.3.0 > > > I thought I fixed this while working on the patch, but [~laserson] seems to > have encountered it when running on master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
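For anyone hitting this on an affected build, one local workaround is to raise the log level for the offending class programmatically. This is a sketch assuming the flood comes from org.apache.hadoop.yarn.util.RackResolver at INFO and that WARN is acceptable; the committed fix may do something different:
{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietRackResolver {
    public static void quiet() {
        // Suppress the per-lookup INFO lines emitted for every rack resolution.
        Logger.getLogger("org.apache.hadoop.yarn.util.RackResolver").setLevel(Level.WARN);
    }
}
{code}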
[jira] [Commented] (SPARK-3778) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
[ https://issues.apache.org/jira/browse/SPARK-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298872#comment-14298872 ] Apache Spark commented on SPARK-3778: - User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/4292 > newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn > - > > Key: SPARK-3778 > URL: https://issues.apache.org/jira/browse/SPARK-3778 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > The newAPIHadoopRDD routine doesn't properly add the credentials to the conf > to be able to access secure hdfs. > Note that newAPIHadoopFile does handle these because the > org.apache.hadoop.mapreduce.Job automatically adds it for you. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
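Based on the note above that org.apache.hadoop.mapreduce.Job attaches the credentials automatically, a hedged workaround sketch is to route the Configuration through a Job before handing it to newAPIHadoopRDD. The path and the record types below are placeholders for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SecureHdfsRead {
    public static JavaPairRDD<LongWritable, Text> read(JavaSparkContext sc, String path)
            throws Exception {
        Configuration conf = new Configuration();
        // Wrapping the conf in a Job lets Hadoop attach the caller's HDFS
        // delegation tokens, which is what newAPIHadoopFile relies on.
        Job job = Job.getInstance(conf);
        FileInputFormat.addInputPath(job, new Path(path));
        return sc.newAPIHadoopRDD(job.getConfiguration(),
                TextInputFormat.class, LongWritable.class, Text.class);
    }
}
{code}
This mirrors what newAPIHadoopFile gets for free; the real fix is for newAPIHadoopRDD itself to add the credentials to the conf.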
[jira] [Resolved] (SPARK-5485) typo in spark streaming configuration parameter
[ https://issues.apache.org/jira/browse/SPARK-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5485. -- Resolution: Duplicate > typo in spark streaming configuration parameter > --- > > Key: SPARK-5485 > URL: https://issues.apache.org/jira/browse/SPARK-5485 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.2.0 >Reporter: Wing Yew Poon > > In > https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#deploying-applications, > under "Requirements", the bullet point on "Configuring write ahead logs" says > "This can be enabled by setting the configuration parameter > spark.streaming.receiver.writeAheadLogs.enable to true." > There is an unfortunate typo in the name of the parameter, which I > copied-and-pasted into my deployment where I was testing it out and seeing > data loss as a result. > The same typo occurs in > https://spark.apache.org/docs/1.2.0/configuration.html, which is even more > unfortunate. > Documentation should not have typos like this for configuration parameters. I > later found the correct parameter on > http://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
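For reference, the working parameter can also be set programmatically. A minimal sketch, assuming the corrected name is spark.streaming.receiver.writeAheadLog.enable (singular "writeAheadLog", without the extra "s" from the 1.2.0 docs):
{code:java}
import org.apache.spark.SparkConf;

public class WalConfig {
    public static SparkConf withWriteAheadLog(SparkConf conf) {
        // Correct key uses "writeAheadLog", not "writeAheadLogs"; checkpointing
        // must also be enabled for the log to be useful on recovery.
        return conf.set("spark.streaming.receiver.writeAheadLog.enable", "true");
    }
}
{code}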
[jira] [Commented] (SPARK-5491) Chi-square feature selection
[ https://issues.apache.org/jira/browse/SPARK-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298855#comment-14298855 ] Apache Spark commented on SPARK-5491: - User 'avulanov' has created a pull request for this issue: https://github.com/apache/spark/pull/1484 > Chi-square feature selection > > > Key: SPARK-5491 > URL: https://issues.apache.org/jira/browse/SPARK-5491 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Alexander Ulanov > > Implement chi-square feature selection. PR: > https://github.com/apache/spark/pull/1484 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
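For context, the per-feature statistic is the usual Pearson chi-square over the feature/label contingency table; this is the standard formulation, not text taken from the pull request:
{noformat}
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{\left(\sum_{j'} O_{ij'}\right)\left(\sum_{i'} O_{i'j}\right)}{N}
{noformat}
Here O_{ij} is the observed count of feature value i co-occurring with label j, E_{ij} is the expected count under independence, and N is the number of samples; features are then ranked by their chi-square value.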
[jira] [Comment Edited] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298823#comment-14298823 ] DeepakVohra edited comment on SPARK-5483 at 1/30/15 4:41 PM: - The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.* packages, which it should according to http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.classification.NaiveBayes; import org.apache.spark.mllib.classification.NaiveBayesModel; import org.apache.spark.mllib.linalg.Vectors; import org.apache.spark.mllib.regression.LabeledPoint; import org.apache.spark.mllib.util.MLUtils; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. was (Author: dvohra): The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.classification.NaiveBayes; import org.apache.spark.mllib.classification.NaiveBayesModel; import org.apache.spark.mllib.linalg.Vectors; import org.apache.spark.mllib.regression.LabeledPoint; import org.apache.spark.mllib.util.MLUtils; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. 
> ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExce
[jira] [Resolved] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5483. -- Resolution: Not a Problem These too are clearly in the 2.10 artifact. Download it and grep for them. https://repo1.maven.org/maven2/org/apache/spark/spark-mllib_2.10/1.2.0/spark-mllib_2.10-1.2.0.jar There must be some issue in how you are adding the classes to the classpath. You definitely can't mix Scala versions, and the classes are where they should be, so this should be resolved. I think further questions should go to the mailing list on this one, until it's clear there is a Spark issue. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at 
breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.
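A quick way to check the classpath point made above from inside the failing application is to probe for one of the "missing" classes and for the Scala library actually in use at runtime. This is a hypothetical diagnostic, not part of any Spark API:
{code:java}
public class ClasspathProbe {
    public static void main(String[] args) throws Exception {
        // Succeeds only if spark-mllib_2.10 is really on the runtime classpath,
        // and prints which jar the class was loaded from.
        Class<?> kmeans = Class.forName("org.apache.spark.mllib.clustering.KMeans");
        System.out.println("KMeans from: "
                + kmeans.getProtectionDomain().getCodeSource().getLocation());

        // Prints which scala-library jar is in use; mixing a 2.11 scala-library
        // with _2.10 Spark artifacts produces NoSuchMethodError like the one above.
        System.out.println("scala-library from: "
                + scala.Option.class.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}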
[jira] [Commented] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298840#comment-14298840 ] Sean Owen commented on SPARK-5489: -- No, the 2.10 artifact plainly has these classes. Download it ( https://repo1.maven.org/maven2/org/apache/spark/spark-mllib_2.10/1.2.0/spark-mllib_2.10-1.2.0.jar ) and grep for them. > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298824#comment-14298824 ] DeepakVohra edited comment on SPARK-5489 at 1/30/15 4:40 PM: - The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.* packages, which it should according to http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. was (Author: dvohra): The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.* packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298824#comment-14298824 ] DeepakVohra edited comment on SPARK-5489 at 1/30/15 4:40 PM: - The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.* packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. was (Author: dvohra): The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298823#comment-14298823 ] DeepakVohra edited comment on SPARK-5483 at 1/30/15 4:39 PM: - The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.classification.NaiveBayes; import org.apache.spark.mllib.classification.NaiveBayesModel; import org.apache.spark.mllib.linalg.Vectors; import org.apache.spark.mllib.regression.LabeledPoint; import org.apache.spark.mllib.util.MLUtils; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. was (Author: dvohra): The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. 
> ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread
[jira] [Commented] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298824#comment-14298824 ] DeepakVohra commented on SPARK-5489: The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298823#comment-14298823 ] DeepakVohra commented on SPARK-5483: The issue is with the Maven dependency org.apache.spark spark-mllib_2.10 1.2.0 spark-mllib_2.10 does not include the org.apache.spark.mllib.clustering packages, which it should according to the Maven http://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.2.0 Generates error at import statements: import org.apache.spark.mllib.clustering.KMeans; import org.apache.spark.mllib.clustering.KMeansModel; import org.apache.spark.mllib.linalg.Vector; import org.apache.spark.mllib.linalg.Vectors; "The import org.apache.spark.mllib cannot be resolved". The 2.11 version spark-mllib_2.11 fixes the error but seems to be referring Scala 2.11. > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR 
SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(De
[jira] [Resolved] (SPARK-5267) Add a streaming module to ingest Apache Camel Messages from a configured endpoints
[ https://issues.apache.org/jira/browse/SPARK-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Brewin resolved SPARK-5267. - Resolution: Done Code submitted to Spark Packages @ http://spark-packages.org/package/29, homepage https://github.com/synsys/spark > Add a streaming module to ingest Apache Camel Messages from a configured > endpoints > -- > > Key: SPARK-5267 > URL: https://issues.apache.org/jira/browse/SPARK-5267 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.2.0 >Reporter: Steve Brewin > Labels: features > Original Estimate: 120h > Remaining Estimate: 120h > > The number of input stream protocols supported by Spark Streaming is quite > limited, which constrains the number of systems with which it can be > integrated. > This proposal solves the problem by adding an optional module that integrates > Apache Camel, which supports many additional input protocols. Our tried and > tested implementation of this proposal is "spark-streaming-camel". > An Apache Camel service is run on a separate Thread, consuming each > http://camel.apache.org/maven/current/camel-core/apidocs/org/apache/camel/Message.html > and storing it into Spark's memory. The provider of the Message is specified > by any consuming component URI documented at > http://camel.apache.org/components.html, making all of these protocols > available to Spark Streaming. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
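A rough idea of what such a module looks like, sketched against Spark's public Receiver API and Camel's ConsumerTemplate. This is an illustrative approximation, not the spark-streaming-camel code referenced above, and the endpoint URI is a placeholder:
{code:java}
import org.apache.camel.CamelContext;
import org.apache.camel.ConsumerTemplate;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class CamelReceiver extends Receiver<String> {
    private final String endpointUri;  // any Camel consumer endpoint, e.g. "jms:queue:events"

    public CamelReceiver(String endpointUri, StorageLevel storageLevel) {
        super(storageLevel);
        this.endpointUri = endpointUri;
    }

    @Override
    public void onStart() {
        // Consume on a separate thread, as the issue describes.
        new Thread(this::consume, "camel-receiver").start();
    }

    private void consume() {
        try {
            CamelContext camel = new DefaultCamelContext();
            camel.start();
            ConsumerTemplate consumer = camel.createConsumerTemplate();
            while (!isStopped()) {
                String body = consumer.receiveBody(endpointUri, 1000, String.class);
                if (body != null) {
                    store(body);  // push the Message body into Spark's memory
                }
            }
            camel.stop();
        } catch (Exception e) {
            restart("Camel consumer failed", e);
        }
    }

    @Override
    public void onStop() {
        // The consuming loop checks isStopped(), so nothing else to do here.
    }
}
{code}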
[jira] [Commented] (SPARK-5400) Rename GaussianMixtureEM to GaussianMixture
[ https://issues.apache.org/jira/browse/SPARK-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298762#comment-14298762 ] Apache Spark commented on SPARK-5400: - User 'tgaloppo' has created a pull request for this issue: https://github.com/apache/spark/pull/4290 > Rename GaussianMixtureEM to GaussianMixture > --- > > Key: SPARK-5400 > URL: https://issues.apache.org/jira/browse/SPARK-5400 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Travis Galoppo >Priority: Minor > > GaussianMixtureEM is following the old naming convention of including the > optimization algorithm name in the class title. We should probably rename it > to GaussianMixture so that it can use other optimization algorithms in the > future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3454) Expose JSON representation of data shown in WebUI
[ https://issues.apache.org/jira/browse/SPARK-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298747#comment-14298747 ] Imran Rashid commented on SPARK-3454: - design doc attached, would love any feedback > Expose JSON representation of data shown in WebUI > - > > Key: SPARK-3454 > URL: https://issues.apache.org/jira/browse/SPARK-3454 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta > Attachments: sparkmonitoringjsondesign.pdf > > > If the WebUI supported extracting its data as JSON, that would be helpful for users who want to > analyse stage / task / executor information. > Fortunately, WebUI has a renderJson method, so we can implement that method in > each subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3454) Expose JSON representation of data shown in WebUI
[ https://issues.apache.org/jira/browse/SPARK-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-3454: Attachment: sparkmonitoringjsondesign.pdf > Expose JSON representation of data shown in WebUI > - > > Key: SPARK-3454 > URL: https://issues.apache.org/jira/browse/SPARK-3454 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta > Attachments: sparkmonitoringjsondesign.pdf > > > If the WebUI supported extracting data in JSON format, it would be helpful for users who want to > analyse stage / task / executor information. > Fortunately, the WebUI already has a renderJson method, so we can implement the method in > each subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298648#comment-14298648 ] Apache Spark commented on SPARK-5498: - User 'jeanlyn' has created a pull request for this issue: https://github.com/apache/spark/pull/4289 > [SPARK-SQL]when the partition schema does not match table schema,it throws > java.lang.ClassCastException and so on > - > > Key: SPARK-5498 > URL: https://issues.apache.org/jira/browse/SPARK-5498 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: jeanlyn > > when the partition schema does not match table schema,it will thows exception > when the task is running.For example,we modify the type of column from int to > bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key > BIGINT* ,then we query the patition data which was stored before the > changing,we would get the exception: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 > (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) > at > org.ap
[jira] [Comment Edited] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298622#comment-14298622 ] Tien-Dung LE edited comment on SPARK-5499 at 1/30/15 1:47 PM: -- I tried with checkpoint() but had the same error. Here is the code {code} for (i <- 1 to 1000) { newPair = pair.map(_.swap).persist() pair = newPair println("" + i + ": count = " + pair.count()) if( i % 100 == 0) { pair.checkpoint() } } {code} was (Author: tien-dung.le): I tried with checkpoint() but same had the same error. Here is the code {code} for (i <- 1 to 1000) { newPair = pair.map(_.swap).persist() pair = newPair println("" + i + ": count = " + pair.count()) if( i % 100 == 0) { pair.checkpoint() } } {code} > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > {code} > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298622#comment-14298622 ] Tien-Dung LE commented on SPARK-5499: - I tried with checkpoint() but same had the same error. Here is the code {code} for (i <- 1 to 1000) { newPair = pair.map(_.swap).persist() pair = newPair println("" + i + ": count = " + pair.count()) if( i % 100 == 0) { pair.checkpoint() } } {code} > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > {code} > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5428) Declare the 'assembly' module at the bottom of the element in the parent POM
[ https://issues.apache.org/jira/browse/SPARK-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5428. -- Resolution: Won't Fix > Declare the 'assembly' module at the bottom of the element in the > parent POM > -- > > Key: SPARK-5428 > URL: https://issues.apache.org/jira/browse/SPARK-5428 > Project: Spark > Issue Type: Improvement > Components: Build, Deploy >Reporter: Christian Tzolov >Priority: Trivial > Labels: assembly, maven, pom > > For multiple-modules projects, Maven follows those execution order rules: > http://maven.apache.org/guides/mini/guide-multiple-modules.html > If no explicit dependencies are declared Maven will follow the order declared > in the element. > Because the 'assembly' module is responsible to aggregate build artifacts > from other modules/project it make sense to be run last in the execution > chain. > At the moment the 'assembly' stays before modules like 'examples' which makes > it impossible to generate DEP package that contains the examples jar. > IMHO the 'assembly' needs to be kept at the bottom of the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298612#comment-14298612 ] Sean Owen commented on SPARK-5499: -- Ah, that may be right. persist() should also break the lineage, but here you'd still be computing the whole lineage all at once from the start before anything can persist. Yes, how about checkpoint()? > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > {code} > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
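A minimal sketch of the checkpoint() pattern being suggested in this thread (the checkpoint directory path is illustrative); note the action right after checkpoint(), which forces a job so the checkpoint is actually written and the lineage truncated:
{code}
import org.apache.spark.rdd.RDD

sc.setCheckpointDir("/tmp/spark-checkpoints")   // any reliable (e.g. HDFS) path

var pair: RDD[(Long, Long)] = sc.parallelize(Array((1L, 2L)))
for (i <- 1 to 1000) {
  pair = pair.map(_.swap)
  if (i % 100 == 0) {
    pair.checkpoint()   // mark this RDD for checkpointing
    pair.count()        // run a job so the checkpoint is materialized and the lineage is cut
  }
}
println("Count = " + pair.count())
{code}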
[jira] [Commented] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298608#comment-14298608 ] Tien-Dung LE commented on SPARK-5499: - Thanks Sean Owen for your comment. Calling persist() or cache() does not help. Did you mean to call checkpoint() ? > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > {code} > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien-Dung LE updated SPARK-5499: Description: I got an error "org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError" when executing an action with 1000 transformations. Here is a code snippet to re-produce the error: {code} import org.apache.spark.rdd.RDD var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) var newPair: RDD[(Long,Long)] = null for (i <- 1 to 1000) { newPair = pair.map(_.swap) pair = newPair } println("Count = " + pair.count()) {code} was: I got an error "org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError" when executing an action with 1000 transformations. Here is a code snippet to re-produce the error: import org.apache.spark.rdd.RDD var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) var newPair: RDD[(Long,Long)] = null for (i <- 1 to 1000) { newPair = pair.map(_.swap) pair = newPair } println("Count = " + pair.count()) > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > {code} > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298602#comment-14298602 ] Sean Owen commented on SPARK-5499: -- I think it's expected behavior, in the sense that you have created a lineage of 1000 RDDs. You would want to break the lineage at some point with a call to persist(). > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien-Dung LE updated SPARK-5499: Description: I got an error "org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError" when executing an action with 1000 transformations. Here is a code snippet to re-produce the error: import org.apache.spark.rdd.RDD var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) var newPair: RDD[(Long,Long)] = null for (i <- 1 to 1000) { newPair = pair.map(_.swap) pair = newPair } println("Count = " + pair.count()) was: I got an error "org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError" when executing an action with 1000 transformations cause. Here is a code snippet to re-produce the error: import org.apache.spark.rdd.RDD var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) var newPair: RDD[(Long,Long)] = null for (i <- 1 to 1000) { newPair = pair.map(_.swap) pair = newPair } println("Count = " + pair.count()) > iterative computing with 1000 iterations causes stage failure > - > > Key: SPARK-5499 > URL: https://issues.apache.org/jira/browse/SPARK-5499 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Tien-Dung LE > > I got an error "org.apache.spark.SparkException: Job aborted due to stage > failure: Task serialization failed: java.lang.StackOverflowError" when > executing an action with 1000 transformations. > Here is a code snippet to re-produce the error: > import org.apache.spark.rdd.RDD > var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) > var newPair: RDD[(Long,Long)] = null > for (i <- 1 to 1000) { > newPair = pair.map(_.swap) > pair = newPair > } > println("Count = " + pair.count()) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5499) iterative computing with 1000 iterations causes stage failure
Tien-Dung LE created SPARK-5499: --- Summary: iterative computing with 1000 iterations causes stage failure Key: SPARK-5499 URL: https://issues.apache.org/jira/browse/SPARK-5499 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Tien-Dung LE I got an error "org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError" when executing an action with 1000 chained transformations. Here is a code snippet to reproduce the error: import org.apache.spark.rdd.RDD var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L))) var newPair: RDD[(Long,Long)] = null for (i <- 1 to 1000) { newPair = pair.map(_.swap) pair = newPair } println("Count = " + pair.count()) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5489) KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create (I)Lscala/runtime/IntRef;
[ https://issues.apache.org/jira/browse/SPARK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5489. -- Resolution: Duplicate Although it seems to be clearly a problem with mixing artifacts for different versions of Scala, this is at least the same problem in SPARK-5483 > KMeans clustering java.lang.NoSuchMethodError: scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > - > > Key: SPARK-5489 > URL: https://issues.apache.org/jira/browse/SPARK-5489 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Spark 1.2 > Maven >Reporter: DeepakVohra > > The KMeans clustering generates following error, which also seems to be due > version mismatch between Scala used for compiling Spark and Scala in Spark > 1.2 Maven dependency. > Exception in thread "main" java.lang.NoSuchMethodError: > scala.runtime.IntRef.create > (I)Lscala/runtime/IntRef; > at > org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:282) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:155) > at > org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:132) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:352) > at > org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:362) > at > org.apache.spark.mllib.clustering.KMeans.train(KMeans.scala) > at > clusterer.kmeans.KMeansClusterer.main(KMeansClusterer.java:35) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5483) java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
[ https://issues.apache.org/jira/browse/SPARK-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298600#comment-14298600 ] Sean Owen commented on SPARK-5483: -- The examples don't set a master on purpose, as I understand. Like other Spark apps, they're supposed to be run with spark-submit, which sets the master. Your declaration should set the Spark dependencies as "provided". However, more importantly, you're mixing MLlib for Scala 2.11 with Scala 2.10 and Core for 2.10. This has to be the problem, right? > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > --- > > Key: SPARK-5483 > URL: https://issues.apache.org/jira/browse/SPARK-5483 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: Maven > Spark 1.2 >Reporter: DeepakVohra > > Naive Bayes classifier generates following error. > ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:200) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:199) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:142) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:58) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/01/28 21:50:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.NoSuchMethodError: > scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; > at breeze.generic.MMRegistry2$class.register(Multimethod.scala:188) > at > 
breeze.linalg.VectorOps$$anon$1.breeze$linalg$operators$BinaryRegistry$$super$register(Vector.scala:303) > at > breeze.linalg.operators.BinaryRegistry$class.register(BinaryOp.scala:87) > at breeze.linalg.VectorOps$$anon$1.register(Vector.scala:303) > at > breeze.linalg.operators.DenseVectorOps$$anon$1.(DenseVectorOps.scala:38) > at > breeze.linalg.operators.DenseVectorOps$class.$init$(DenseVectorOps.scala:22) > at breeze.linalg.DenseVector$.(DenseVector.scala:225) > at breeze.linalg.DenseVector$.(DenseVector.scala) > at breeze.linalg.DenseVector.(DenseVector.scala:63) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:50) > at breeze.linalg.DenseVector$mcD$sp.(DenseVector.scala:55) > at org.apache.spark.mllib.linalg.DenseVector.toBreeze(Vectors.scala:329) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:112) > at > org.apache.spark.mllib.classification.NaiveBayes$$anonfun$3.apply(NaiveBayes.scala:110) > at > org.apache.s
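On the Scala-version point raised above, a hedged build-definition sketch (shown as sbt rather than the reporter's Maven setup; versions are illustrative) of how to keep every Spark artifact on the same Scala binary version and mark them provided:
{code}
// build.sbt -- %% appends the Scala binary suffix (_2.10) to every artifact,
// so spark-core and spark-mllib cannot end up on different Scala versions.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.2.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.2.0" % "provided"
)
{code}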
[jira] [Commented] (SPARK-5185) pyspark --jars does not add classes to driver class path
[ https://issues.apache.org/jira/browse/SPARK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298576#comment-14298576 ] Cristian Opris commented on SPARK-5185: --- I have a similar possible issue. I need to modify the pyspark/iphyton notebook *driver* classpath at runtime. While it's possible to modify the application classpath with addJars() it's not possible to modify the driver's classpath from within pyspark/notebook. The main use case for this is to allow users to share an ipython server process and set the classpath dynamically from within running notebooks. A possible solution is to load the py4j.GatewayServer into a dynamic classloader who's classpath can be modified at runtime. Clojure shell uses a solution like this, please see https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/DynamicClassLoader.java > pyspark --jars does not add classes to driver class path > > > Key: SPARK-5185 > URL: https://issues.apache.org/jira/browse/SPARK-5185 > Project: Spark > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Uri Laserson > > I have some random class I want access to from an Spark shell, say > {{com.cloudera.science.throwaway.ThrowAway}}. You can find the specific > example I used here: > https://gist.github.com/laserson/e9e3bd265e1c7a896652 > I packaged it as {{throwaway.jar}}. > If I then run {{bin/spark-shell}} like so: > {code} > bin/spark-shell --master local[1] --jars throwaway.jar > {code} > I can execute > {code} > val a = new com.cloudera.science.throwaway.ThrowAway() > {code} > Successfully. > I now run PySpark like so: > {code} > PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars > throwaway.jar > {code} > which gives me an error when I try to instantiate the class through Py4J: > {code} > In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway() > --- > Py4JError Traceback (most recent call last) > in () > > 1 sc._jvm.com.cloudera.science.throwaway.ThrowAway() > /Users/laserson/repos/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py > in __getattr__(self, name) > 724 def __getattr__(self, name): > 725 if name == '__call__': > --> 726 raise Py4JError('Trying to call a package.') > 727 new_fqn = self._fqn + '.' + name > 728 command = REFLECTION_COMMAND_NAME +\ > Py4JError: Trying to call a package. > {code} > However, if I explicitly add the {{--driver-class-path}} to add the same jar > {code} > PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars > throwaway.jar --driver-class-path throwaway.jar > {code} > it works > {code} > In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway() > Out[1]: JavaObject id=o18 > {code} > However, the docs state that {{--jars}} should also set the driver class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
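A rough sketch of the dynamic-classloader idea mentioned in the comment above (the class and method names are hypothetical, not an existing Spark or Py4J API): a URLClassLoader whose URL list can grow at runtime, which the Py4J gateway could be loaded through:
{code}
import java.net.{URL, URLClassLoader}

// Exposes the protected addURL so jars can be appended after the JVM has started.
class GrowableClassLoader(initial: Array[URL], parent: ClassLoader)
    extends URLClassLoader(initial, parent) {
  def addJar(path: String): Unit = super.addURL(new URL("file:" + path))
}
{code}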
[jira] [Created] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
jeanlyn created SPARK-5498: -- Summary: [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on Key: SPARK-5498 URL: https://issues.apache.org/jira/browse/SPARK-5498 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: jeanlyn when the partition schema does not match table schema,it will thows exception when the task is running.For example,we modify the type of column from int to bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT* ,then we query the patition data which was stored before the changing,we would get the exception: {noformat} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375) at a
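A hedged repro sketch of the scenario described in SPARK-5498, driven from the shell via HiveContext (the source table name is illustrative, not from the original report):
{code}
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
hc.sql("CREATE TABLE table_with_partition (key INT, value STRING) PARTITIONED BY (dt STRING)")
// write a partition while `key` is still INT
hc.sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (dt = '1') " +
  "SELECT key, value FROM some_existing_table")
// widen the table-level column type; the already-written partition keeps its INT schema
hc.sql("ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT")
// reading the old partition now fails with the ClassCastException reported above
hc.sql("SELECT key FROM table_with_partition WHERE dt = '1'").collect()
{code}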
[jira] [Commented] (SPARK-5495) Offer user the ability to kill application in master web UI for standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298415#comment-14298415 ] Apache Spark commented on SPARK-5495: - User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/4288 > Offer user the ability to kill application in master web UI for standalone > mode > --- > > Key: SPARK-5495 > URL: https://issues.apache.org/jira/browse/SPARK-5495 > Project: Spark > Issue Type: New Feature > Components: Web UI >Reporter: Saisai Shao > > For cluster admins or users who manage the whole cluster need to have the > ability to kill the dangling or long-running applications through simple > ways. > For examples, if user started with a spark-shell for a long time but actually > is pending without any job running. In this scenario, it is better for the > admins to kill that apps to free the resources. > Currently Spark user can kill the stage in driver UI, but not application. So > here I'd propose to add a function to kill the application in master web UI > for standalone mode. > The snapshot of function shows as below: > !https://dl.dropboxusercontent.com/u/19230832/master_ui.png! > Add a kill action for each active application, kill action here is to simply > stop the specific application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5428) Declare the 'assembly' module at the bottom of the element in the parent POM
[ https://issues.apache.org/jira/browse/SPARK-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298412#comment-14298412 ] Christian Tzolov commented on SPARK-5428: - To verify this change i've tried to bundle also the spark-yarn-shuffle inside the DEB package. It didn't work! Although put at the end of the modules' list the assembly module is executed before the Spark Yarn Shuffle project and therefore fails to bundle the yarn-shuffle jar. The only reliable and clean solution is to declare the required dependency in the assembly's pom. Having the assembly module at the end of the list will not guarantee that it is executed last. Unless there are some other suggestions i think we should close this issue > Declare the 'assembly' module at the bottom of the element in the > parent POM > -- > > Key: SPARK-5428 > URL: https://issues.apache.org/jira/browse/SPARK-5428 > Project: Spark > Issue Type: Improvement > Components: Build, Deploy >Reporter: Christian Tzolov >Priority: Trivial > Labels: assembly, maven, pom > > For multiple-modules projects, Maven follows those execution order rules: > http://maven.apache.org/guides/mini/guide-multiple-modules.html > If no explicit dependencies are declared Maven will follow the order declared > in the element. > Because the 'assembly' module is responsible to aggregate build artifacts > from other modules/project it make sense to be run last in the execution > chain. > At the moment the 'assembly' stays before modules like 'examples' which makes > it impossible to generate DEP package that contains the examples jar. > IMHO the 'assembly' needs to be kept at the bottom of the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5457) Add missing DSL for ApproxCountDistinct.
[ https://issues.apache.org/jira/browse/SPARK-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5457. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Takuya Ueshin > Add missing DSL for ApproxCountDistinct. > > > Key: SPARK-5457 > URL: https://issues.apache.org/jira/browse/SPARK-5457 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.3.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5479) PySpark on yarn mode need to support non-local python files
[ https://issues.apache.org/jira/browse/SPARK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298400#comment-14298400 ] Vladimir Grigor commented on SPARK-5479: https://github.com/apache/spark/pull/3976 potentially closes this issue > PySpark on yarn mode need to support non-local python files > --- > > Key: SPARK-5479 > URL: https://issues.apache.org/jira/browse/SPARK-5479 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Lianhui Wang > > In SPARK-5162 [~vgrigor] reports this: > Now following code cannot work: > aws emr add-steps --cluster-id "j-XYWIXMD234" \ > --steps > Name=SparkPi,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://mybucketat.amazonaws.com/tasks/main.py,main.py,param1],ActionOnFailure=CONTINUE > so we need to support non-local python files on yarn client and cluster mode. > before submitting application to Yarn, we need to download non-local files to > local or hdfs path. > or spark.yarn.dist.files need to support other non-local files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5497) start-all script not working properly on Standalone HA cluster (with Zookeeper)
Roque Vassal'lo created SPARK-5497: -- Summary: start-all script not working properly on Standalone HA cluster (with Zookeeper) Key: SPARK-5497 URL: https://issues.apache.org/jira/browse/SPARK-5497 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.2.0 Reporter: Roque Vassal'lo I have configured a Standalone HA cluster with Zookeeper with: - 3 Zookeeper nodes - 2 Spark master nodes (1 alive and 1 in standby mode) - 2 Spark slave nodes While executing start-all.sh on each master, it will start the master and start a worker on each configured slave. If alive master goes down, those worker are supposed to reconfigure themselves to use the new active master automatically. I have noticed that the spark-env property SPARK_MASTER_IP is used in both called scripts, start-master and start-slaves. The problem is that if you configure SPARK_MASTER_IP with the active master ip, when it goes down, workers don't reassign themselves to the new active master. And if you configure SPARK_MASTER_IP with the masters cluster route (well, an approximation, because you have to write master's port in all-but-last ips, that is "master1:7077,master2", in order to make it work), slaves start properly but master doesn't. So, the start-master script needs SPARK_MASTER_IP property to contain its ip in order to start master properly; and start-slaves script needs SPARK_MASTER_IP property to contain the masters cluster ips (that is "master1:7077,master2") To test that idea, I have modified start-slaves and spark-env scripts on master nodes. On spark-env.sh, I have set SPARK_MASTER_IP property to master's own ip on each master node (that is, on master node 1, SPARK_MASTER_IP=master1; and on master node 2, SPARK_MASTER_IP=master2) On spark-env.sh, I have added a new property SPARK_MASTER_CLUSTER_IP with the pseudo-masters-cluster-ips (SPARK_MASTER_CLUSTER_IP=master1:7077,master2) on both masters. On start-slaves.sh, I have modified all references to SPARK_MASTER_IP to SPARK_MASTER_CLUSTER_IP. I have tried that and it works great! When active master node goes down, all workers reassign themselves to the new active node. Maybe there is a better fix for this issue. Hope this quick-fix idea can help. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
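As a side note on the URL format discussed in SPARK-5497, a hedged sketch (host names are illustrative) of how an application already points at multiple standalone masters; the same comma-separated form, with a port on every host, is what the start scripts would ideally accept:
{code}
import org.apache.spark.SparkConf

// Standalone HA: list every master with its port; the driver fails over automatically.
val conf = new SparkConf()
  .setAppName("ha-example")
  .setMaster("spark://master1:7077,master2:7077")
{code}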
[jira] [Resolved] (SPARK-5378) There is no return results with 'select' operating on View
[ https://issues.apache.org/jira/browse/SPARK-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-5378. Resolution: Cannot Reproduce As we discussed offline, this likely came from a bug elsewhere, and now it works fine. > There is no return results with 'select' operating on View > -- > > Key: SPARK-5378 > URL: https://issues.apache.org/jira/browse/SPARK-5378 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Yi Zhou > > There is a 'q04_spark_RUN_QUERY_0_temp_cart_abandon' view with some data in > the system. Nothing is returned when executing the below SQL in SparkSQL. > SELECT * FROM q04_spark_RUN_QUERY_0_temp_cart_abandon; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5496) Allow both "classification" and "Classification" in Algo for trees
[ https://issues.apache.org/jira/browse/SPARK-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298380#comment-14298380 ] Apache Spark commented on SPARK-5496: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/4287 > Allow both "classification" and "Classification" in Algo for trees > -- > > Key: SPARK-5496 > URL: https://issues.apache.org/jira/browse/SPARK-5496 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > We use "classification" in tree but "Classification" in boosting. We switched > to "classification" in both cases, but still need to accept "Classification" > to be backward compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5496) Allow both "classification" and "Classification" in Algo for trees
Xiangrui Meng created SPARK-5496: Summary: Allow both "classification" and "Classification" in Algo for trees Key: SPARK-5496 URL: https://issues.apache.org/jira/browse/SPARK-5496 Project: Spark Issue Type: Bug Components: MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng We use "classification" in tree but "Classification" in boosting. We switched to "classification" in both cases, but still need to accept "Classification" to be backward compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
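One possible shape of the backward-compatible parsing for SPARK-5496 (a hedged sketch; the helper name is hypothetical and not necessarily what the pull request does):
{code}
import org.apache.spark.mllib.tree.configuration.Algo

// Accept "classification"/"Classification" (and likewise for regression) case-insensitively.
def algoFromString(name: String): Algo.Algo = name.toLowerCase match {
  case "classification" => Algo.Classification
  case "regression"     => Algo.Regression
  case other => throw new IllegalArgumentException("Did not recognize Algo name: " + other)
}
{code}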
[jira] [Resolved] (SPARK-5094) Python API for gradient-boosted trees
[ https://issues.apache.org/jira/browse/SPARK-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5094. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 3951 [https://github.com/apache/spark/pull/3951] > Python API for gradient-boosted trees > - > > Key: SPARK-5094 > URL: https://issues.apache.org/jira/browse/SPARK-5094 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Reporter: Xiangrui Meng >Assignee: Kazuki Taniguchi >Priority: Critical > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5094) Python API for gradient-boosted trees
[ https://issues.apache.org/jira/browse/SPARK-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5094: - Fix Version/s: (was: 1.4.0) 1.3.0 > Python API for gradient-boosted trees > - > > Key: SPARK-5094 > URL: https://issues.apache.org/jira/browse/SPARK-5094 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Reporter: Xiangrui Meng >Assignee: Kazuki Taniguchi >Priority: Critical > Fix For: 1.3.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298351#comment-14298351 ] Guoqiang Li edited comment on SPARK-1405 at 1/30/15 8:34 AM: - Here is a sampling faster branch(work in progress): https://github.com/witgo/spark/tree/lda_MH [It's|https://github.com/witgo/spark/tree/lda_MH] computational complexity is O(log(K)) K is the number of topic [#2388|https://github.com/apache/spark/pull/2388]'s computational complexity is O(log(K)+ Nkd) , K is the number of topic and Ndk is the number of tokens in document d that are assigned to topic k was (Author: gq): Here is a sampling faster branch(work in progress): https://github.com/witgo/spark/tree/lda_MH [It's|https://github.com/witgo/spark/tree/lda_MH] computational complexity is O(log(K)) K is the number of topic [#2388|https://github.com/apache/spark/pull/2388]'s computational complexity is O(log(K)) + Nkd, K is the number of topic and Ndk is the number of tokens in document d that are assigned to topic k > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xusen Yin >Assignee: Joseph K. Bradley >Priority: Critical > Labels: features > Attachments: performance_comparison.png > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298351#comment-14298351 ] Guoqiang Li edited comment on SPARK-1405 at 1/30/15 8:33 AM: - Here is a sampling faster branch(work in progress): https://github.com/witgo/spark/tree/lda_MH [It's|https://github.com/witgo/spark/tree/lda_MH] computational complexity is O(log(K)) K is the number of topic [#2388|https://github.com/apache/spark/pull/2388]'s computational complexity is O(log(K)) + Nkd, K is the number of topic and Ndk is the number of tokens in document d that are assigned to topic k was (Author: gq): Here is a sampling faster branch(work in progress): https://github.com/witgo/spark/tree/lda_MH [It's|https://github.com/witgo/spark/tree/lda_MH] computational complexity is O(log(K)) K is the number of topic [#2388|https://github.com/apache/spark/pull/2388]'s computational complexity is O(log(K)) + Nkd, Nkd is the number of topic and Ndk is the number of tokens in document d that are assigned to topic k > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xusen Yin >Assignee: Joseph K. Bradley >Priority: Critical > Labels: features > Attachments: performance_comparison.png > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298351#comment-14298351 ] Guoqiang Li commented on SPARK-1405: Here is a sampling faster branch(work in progress): https://github.com/witgo/spark/tree/lda_MH [It's|https://github.com/witgo/spark/tree/lda_MH] computational complexity is O(log(K)) K is the number of topic [#2388|https://github.com/apache/spark/pull/2388]'s computational complexity is O(log(K)) + Nkd, Nkd is the number of topic and Ndk is the number of tokens in document d that are assigned to topic k > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Xusen Yin >Assignee: Joseph K. Bradley >Priority: Critical > Labels: features > Attachments: performance_comparison.png > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5452) We are migrating Tera Data SQL to Spark SQL. Query is taking long time. Please have a look on this issue
[ https://issues.apache.org/jira/browse/SPARK-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-5452. Resolution: Not a Problem > We are migrating Tera Data SQL to Spark SQL. Query is taking long time. > Please have a look on this issue > > > Key: SPARK-5452 > URL: https://issues.apache.org/jira/browse/SPARK-5452 > Project: Spark > Issue Type: Test > Components: Spark Shell >Affects Versions: 1.2.0 >Reporter: irfan > Labels: SparkSql > > Hi Team, > we are migrating TeraData SQL to Spark SQL because of complexity we have > spilted into below 4 sub-quries > and we are running through hive context > > val HIVETMP1 = hc.sql("SELECT PARTY_ACCOUNT_ID AS > PARTY_ACCOUNT_ID,LMS_ACCOUNT_ID AS LMS_ACCOUNT_ID FROM VW_PARTY_ACCOUNT WHERE > PARTY_ACCOUNT_TYPE_CODE IN('04') AND LMS_ACCOUNT_ID IS NOT NULL") > HIVETMP1.registerTempTable("VW_HIVETMP1") > val HIVETMP2 = hc.sql("SELECT PACCNT.LMS_ACCOUNT_ID AS LMS_ACCOUNT_ID, > 'NULL' AS RANDOM_PARTY_ACCOUNT_ID ,'NULL' AS MOST_RECENT_SPEND_LA > ,STXN.PARTY_ACCOUNT_ID AS MAX_SPEND_12WKS_LA ,STXN.MAX_SPEND_12WKS_LADATE > AS MAX_SPEND_12WKS_LADATE FROM VW_HIVETMP1 AS PACCNT INNER JOIN (SELECT > STXTMP.PARTY_ACCOUNT_ID AS PARTY_ACCOUNT_ID, SUM(CASE WHEN > (CAST(STXTMP.TRANSACTION_DATE AS DATE ) > > DATE_SUB(CAST(CONCAT(SUBSTRING(SYSTMP.OPTION_VAL,1,4),'-',SUBSTRING(SYSTMP.OPTION_VAL,5,2),'-',SUBSTRING(SYSTMP.OPTION_VAL,7,2)) > AS DATE),84)) THEN STXTMP.TRANSACTION_VALUE ELSE 0.00 END) AS > MAX_SPEND_12WKS_LADATE FROM VW_SHOPPING_TRANSACTION_TABLE AS STXTMP INNER > JOIN SYSTEM_OPTION_TABLE AS SYSTMP ON STXTMP.FLAG == SYSTMP.FLAG AND > SYSTMP.OPTION_NAME = 'RID' AND STXTMP.PARTY_ACCOUNT_TYPE_CODE IN('04') GROUP > BY STXTMP.PARTY_ACCOUNT_ID) AS STXN ON PACCNT.PARTY_ACCOUNT_ID = > STXN.PARTY_ACCOUNT_ID WHERE STXN.MAX_SPEND_12WKS_LADATE IS NOT NULL") > HIVETMP2.registerTempTable("VW_HIVETMP2") > val HIVETMP3 = hc.sql("SELECT LMS_ACCOUNT_ID,MAX(MAX_SPEND_12WKS_LA) AS > MAX_SPEND_12WKS_LA, 1 AS RANK FROM VW_HIVETMP2 GROUP BY LMS_ACCOUNT_ID") > HIVETMP3.registerTempTable("VW_HIVETMP3") > val HIVETMP4 = hc.sql(" SELECT PACCNT.LMS_ACCOUNT_ID,'NULL' AS > RANDOM_PARTY_ACCOUNT_ID ,'NULL' AS > MOST_RECENT_SPEND_LA,STXN.MAX_SPEND_12WKS_LA AS MAX_SPEND_12WKS_LA,1 AS RANK1 > FROM VW_HIVETMP2 AS PACCNT INNER JOIN VW_HIVETMP3 AS STXN ON > PACCNT.LMS_ACCOUNT_ID = STXN.LMS_ACCOUNT_ID AND PACCNT.MAX_SPEND_12WKS_LA = > STXN.MAX_SPEND_12WKS_LA") > HIVETMP4.registerTempTable("WT03_ACCOUNT_BHVR3") > HIVETMP4.saveAsTextFile("hdfs:/file/") > == > This query has two Group By clauses which are running on huge files(19.5GB). > And the query took 40min to get the final result. Is there any changes > required in run time environment or Configuration Setting in Spark which can > improve the query performance. > below are our Environment and configuration details: > Environment details: > No of nodes:4 > capacity on each node:62 GB RAM on each node. > Storage capacity :9TB on each node > total cores :48 > Spark Configuration: > > .set("spark.default.parallelism","64") > .set("spark.driver.maxResultSize","2G") > .set("spark.driver.memory","10g") > .set("spark.rdd.compress","true") > .set("spark.shuffle.spill.compress","true") > .set("spark.shuffle.compress","true") > .set("spark.shuffle.consolidateFiles","true/false") > .set("spark.shuffle.spill","true/false") > > Data file size : > SHOPPING_TRANSACTION 19.5GB > PARTY_ACCOUNT1.4GB > SYSTEM_OPTIONS 11.6K > please help us to resolve above issue. 
> Thanks, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5495) Offer user the ability to kill application in master web UI for standalone mode
Saisai Shao created SPARK-5495: -- Summary: Offer user the ability to kill application in master web UI for standalone mode Key: SPARK-5495 URL: https://issues.apache.org/jira/browse/SPARK-5495 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Saisai Shao For cluster admins or users who manage the whole cluster need to have the ability to kill the dangling or long-running applications through simple ways. For examples, if user started with a spark-shell for a long time but actually is pending without any job running. In this scenario, it is better for the admins to kill that apps to free the resources. Currently Spark user can kill the stage in driver UI, but not application. So here I'd propose to add a function to kill the application in master web UI for standalone mode. The snapshot of function shows as below: !https://dl.dropboxusercontent.com/u/19230832/master_ui.png! Add a kill action for each active application, kill here to simply stop the specific application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5495) Offer user the ability to kill application in master web UI for standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-5495: --- Description: For cluster admins or users who manage the whole cluster need to have the ability to kill the dangling or long-running applications through simple ways. For examples, if user started with a spark-shell for a long time but actually is pending without any job running. In this scenario, it is better for the admins to kill that apps to free the resources. Currently Spark user can kill the stage in driver UI, but not application. So here I'd propose to add a function to kill the application in master web UI for standalone mode. The snapshot of function shows as below: !https://dl.dropboxusercontent.com/u/19230832/master_ui.png! Add a kill action for each active application, kill action here is to simply stop the specific application. was: For cluster admins or users who manage the whole cluster need to have the ability to kill the dangling or long-running applications through simple ways. For examples, if user started with a spark-shell for a long time but actually is pending without any job running. In this scenario, it is better for the admins to kill that apps to free the resources. Currently Spark user can kill the stage in driver UI, but not application. So here I'd propose to add a function to kill the application in master web UI for standalone mode. The snapshot of function shows as below: !https://dl.dropboxusercontent.com/u/19230832/master_ui.png! Add a kill action for each active application, kill here to simply stop the specific application. > Offer user the ability to kill application in master web UI for standalone > mode > --- > > Key: SPARK-5495 > URL: https://issues.apache.org/jira/browse/SPARK-5495 > Project: Spark > Issue Type: New Feature > Components: Web UI >Reporter: Saisai Shao > > For cluster admins or users who manage the whole cluster need to have the > ability to kill the dangling or long-running applications through simple > ways. > For examples, if user started with a spark-shell for a long time but actually > is pending without any job running. In this scenario, it is better for the > admins to kill that apps to free the resources. > Currently Spark user can kill the stage in driver UI, but not application. So > here I'd propose to add a function to kill the application in master web UI > for standalone mode. > The snapshot of function shows as below: > !https://dl.dropboxusercontent.com/u/19230832/master_ui.png! > Add a kill action for each active application, kill action here is to simply > stop the specific application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org