Re: Welcoming two new committers
Thanks everyone. I look forward to continuing to work with all of you!

2014-08-08 3:23 GMT-07:00 Prashant Sharma scrapco...@gmail.com:

Congratulations Andrew and Joey.

Prashant Sharma

On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng men...@gmail.com wrote:

Congrats, Joey & Andrew!!

-Xiangrui

On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com wrote:

+1 Joey & Andrew :)

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
http://adatao.com [ah-'DAY-tao]
linkedin.com/in/ctnguyen

On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez jegon...@eecs.berkeley.edu wrote:

Hi Everyone,

Thank you for inviting me to be a committer. I look forward to working with everyone to ensure the continued success of the Spark project.

Thanks!
Joey

On Thu, Aug 7, 2014 at 9:57 PM, Matei Zaharia ma...@databricks.com wrote:

Hi everyone,

The PMC recently voted to add two new committers and PMC members: Joey Gonzalez and Andrew Or. Both have been huge contributors in the past year -- Joey on much of GraphX as well as quite a bit of the initial work in MLlib, and Andrew on Spark Core. Join me in welcoming them as committers!

Matei
Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1
Hi Xiangrui,

Based on your suggestion I moved core and mllib both to 1.1.0-SNAPSHOT...I am still getting the class cast exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 249 in stage 52.0 failed 4 times, most recent failure: Lost task 249.3 in stage 52.0 (TID 10002, tblpmidn06adv-hdp.tdc.vzwcorp.com): java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Product2

I am running ALS.scala merged with my changes. I will try the mllib jar without my changes next... Can this be due to the fact that my jars are compiled with Java 1.7.0_55 but the cluster JRE is at 1.7.0_45?

Thanks.
Deb

On Wed, Aug 6, 2014 at 12:01 PM, Debasish Das debasish.da...@gmail.com wrote:

I did not play with the Hadoop settings...everything is compiled with 2.3.0-cdh5.0.2 for me... I did try to bump the HBase version from 0.94 to 0.96 or 0.98, but there was no profile for CDH in the pom...but that's unrelated to this!

On Wed, Aug 6, 2014 at 9:45 AM, DB Tsai dbt...@dbtsai.com wrote:

One related question: is the mllib jar independent of the Hadoop version (i.e., does it not use the Hadoop API directly)? Can I use an mllib jar compiled for one version of Hadoop with another version of Hadoop?

Sent from my Google Nexus 5

On Aug 6, 2014 8:29 AM, Debasish Das debasish.da...@gmail.com wrote:

Hi Xiangrui,

Maintaining another file will be a pain later, so I deployed Spark 1.0.1 without mllib, and my application jar bundles mllib 1.1.0-SNAPSHOT along with the code changes for quadratic optimization... Later the plan is to patch the snapshot mllib with the deployed stable mllib...

There are 5 variants that I am experimenting with around 400M ratings (daily data; monthly data I will update in a few days):

1. LS
2. NNLS
3. Quadratic with bounds
4. Quadratic with L1
5. Quadratic with equality and positivity

Now the ALS 1.1.0 snapshot runs fine, but after completing this step (ALS.scala:311):

// Materialize usersOut and productsOut.
usersOut.count()

I am getting from one of the executors:

java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Product2

I am debugging it further, but I was wondering if this is due to RDD compatibility between 1.0.1 and 1.1.0-SNAPSHOT? I have built the jars on my Mac, which has Java 1.7.0_55, but the deployed cluster has Java 1.7.0_45. The flow runs fine on my localhost Spark 1.0.1 with 1 worker. Can that Java version mismatch cause this? Stack traces are below.

Thanks.
Deb

Executor stacktrace:
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:126)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:123)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:123)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
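For reference, the layout Deb describes above (a stock Spark 1.0.1 assembly deployed without mllib, with mllib 1.1.0-SNAPSHOT plus the local changes bundled into the application fat jar) corresponds roughly to a build like the following minimal sbt sketch. The use of sbt-assembly and the exact coordinates are assumptions, not taken from the thread:

// build.sbt (sketch): keep the core that is already deployed on the cluster
// out of the application assembly via "provided", and bundle the locally
// built snapshot MLlib (with the quadratic-optimization changes) instead.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.0.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.1.0-SNAPSHOT"  // locally built snapshot
)

Note that this is exactly the kind of mixed-version classpath where a ClassCastException like the one above can surface if core-internal classes such as CoGroupedRDD changed shape between 1.0.1 and the snapshot.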
RE: Welcoming two new committers
Congrats Joey and Andrew!

Sent from my Windows Phone

From: Andrew Or <and...@databricks.com>
Sent: 8/9/2014 2:43 AM
To: Prashant Sharma <scrapco...@gmail.com>
Cc: Xiangrui Meng <men...@gmail.com>; Christopher Nguyen <c...@adatao.com>; Joseph Gonzalez <jegon...@eecs.berkeley.edu>; Matei Zaharia <ma...@databricks.com>; d...@spark.incubator.apache.org
Subject: Re: Welcoming two new committers
Re: Unit tests in 5 minutes
Issue with supporting this, imo, is the fact that ScalaTest uses the same JVM for all the tests (the surefire plugin supports forking, but ScalaTest ignores it, iirc). So different tests would initialize different SparkContexts and can potentially step on each other's toes.

Regards,
Mridul

On Fri, Aug 8, 2014 at 9:31 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Howdy,

Do we think it's both feasible and worthwhile to invest in getting our unit tests to finish in under 5 minutes (or something similarly brief) when run by Jenkins?

Unit tests currently seem to take anywhere from 30 min to 2 hours. As people add more tests, I imagine this time will only grow. I think it would be better for both contributors and reviewers if they didn't have to wait so long for test results; PR reviews would be shorter, if nothing else.

I don't know how this is normally done, but maybe it wouldn't be too much work to get a test cycle to feel lighter. Most unit tests are independent and can be run concurrently, right? Would it make sense to build a given patch on many servers at once and send disjoint sets of unit tests to each?

I'd be interested in working on something like that if possible (and sensible).

Nick
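For background on the collision Mridul describes: since all suites share one JVM, a suite that leaves a SparkContext (or its driver port) behind can break the next suite. A common mitigation, sketched below assuming ScalaTest's BeforeAndAfterAll (the trait name LocalSparkContext is illustrative, not a claim about the project's actual test helpers), is to scope one local context per suite and clean up after it:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Illustrative sketch: one SparkContext per suite, torn down in afterAll,
// so that suites sharing a JVM do not step on each other's contexts.
trait LocalSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName(suiteName))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
    // Clear the driver port so the next suite in this JVM can bind cleanly.
    System.clearProperty("spark.driver.port")
    super.afterAll()
  }
}

Even with per-suite isolation like this, suites still run sequentially in one JVM, which is why splitting disjoint sets of suites across machines, as Nick suggests, is the more direct route to a 5-minute cycle.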
Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1
I was having this same problem earlier this week and had to include my changes in the assembly.

On Sat, Aug 9, 2014 at 9:59 AM, Debasish Das debasish.da...@gmail.com wrote:

I validated that I can reproduce this problem with master as well (without adding any of my mllib changes)... I separated the mllib jar from the assembly, deployed the assembly, and then supplied the mllib jar via the --jars option to spark-submit... I get this error:

14/08/09 12:49:32 INFO DAGScheduler: Failed to run count at ALS.scala:299
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 238 in stage 40.0 failed 4 times, most recent failure: Lost task 238.3 in stage 40.0 (TID 10002, tblpmidn05adv-hdp.tdc.vzwcorp.com): java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Product2
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5$$anonfun$apply$4.apply(CoGroupedRDD.scala:159)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:129)
org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:126)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:126)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1141)
at
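A hedged reading of both traces: every failure goes through CoGroupedRDD, which is consistent with a binary mismatch between the deployed 1.0.1 core and code built against 1.1.0-SNAPSHOT, rather than with the 1.7.0_55 vs 1.7.0_45 JRE difference. One way to check for a mixed classpath (an illustrative sketch, not from the thread) is to print which jar supplies CoGroupedRDD on the driver and on an executor:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.CoGroupedRDD

object ClasspathCheck {
  // Returns the jar (or directory) a class was loaded from.
  def source(c: Class[_]): String =
    c.getProtectionDomain.getCodeSource.getLocation.toString

  def main(args: Array[String]): Unit = {
    // local[2] is just so the sketch runs standalone; on the cluster, run it
    // through spark-submit against the deployed assembly instead.
    val sc = new SparkContext("local[2]", "classpath-check")
    println("driver:   " + source(classOf[CoGroupedRDD[_]]))
    // Resolve the same class inside a task, i.e. on an executor JVM.
    val onExecutor = sc
      .parallelize(Seq(1), 1)
      .map(_ => classOf[CoGroupedRDD[_]]
        .getProtectionDomain.getCodeSource.getLocation.toString)
      .first()
    println("executor: " + onExecutor)
    sc.stop()
  }
}

If the two locations point at different jars (say, the 1.0.1 assembly on executors but the bundled snapshot on the driver), the Tuple1-vs-Product2 cast failure has a straightforward explanation.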