Re: Welcoming two new committers

2014-08-09 Thread Andrew Or
Thanks everyone. I look forward to continuing to work with all of you!


2014-08-08 3:23 GMT-07:00 Prashant Sharma <scrapco...@gmail.com>:

 Congratulations Andrew and Joey.

 Prashant Sharma




 On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng <men...@gmail.com> wrote:

 Congrats, Joey & Andrew!!

 -Xiangrui

 On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen <c...@adatao.com>
 wrote:
  +1 Joey & Andrew :)
 
  --
  Christopher T. Nguyen
  Co-founder & CEO, Adatao http://adatao.com [ah-'DAY-tao]
  linkedin.com/in/ctnguyen
 
 
 
  On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez <jegon...@eecs.berkeley.edu>
  wrote:
 
  Hi Everyone,
 
  Thank you for inviting me to be a committer.  I look forward to working
  with everyone to ensure the continued success of the Spark project.
 
  Thanks!
  Joey
 
 
 
 
  On Thu, Aug 7, 2014 at 9:57 PM, Matei Zaharia <ma...@databricks.com>
  wrote:
 
   Hi everyone,
  
   The PMC recently voted to add two new committers and PMC members: Joey
   Gonzalez and Andrew Or. Both have been huge contributors in the past year
   -- Joey on much of GraphX as well as quite a bit of the initial work in
   MLlib, and Andrew on Spark Core. Join me in welcoming them as committers!
  
   Matei
  
  
  
  
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-09 Thread Debasish Das
Hi Xiangrui,

Based on your suggestion I moved both core and mllib to 1.1.0-SNAPSHOT... I
am still getting a class cast exception:

Exception in thread main org.apache.spark.SparkException: Job aborted due
to stage failure: Task 249 in stage 52.0 failed 4 times, most recent
failure: Lost task 249.3 in stage 52.0 (TID 10002,
tblpmidn06adv-hdp.tdc.vzwcorp.com): java.lang.ClassCastException:
scala.Tuple1 cannot be cast to scala.Product2

I am running ALS.scala merged with my changes. I will try the mllib jar
without my changes next...

Could this be due to the fact that my jars are compiled with Java 1.7.0_55
but the cluster JRE is at 1.7.0_45?

Thanks.

Deb
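[Editor's note: the cast failure itself is easy to reproduce outside Spark. A minimal sketch in plain Scala (not the actual ALS code): Tuple1 implements Product1, not Product2, so any record arriving as a Tuple1 where a key-value pair is expected fails the runtime cast with exactly the message seen in the executor logs.]

```scala
// Minimal sketch, not Spark code: demonstrates why a Tuple1 record cannot
// be cast to Product2 at runtime, matching the reported exception.
object Tuple1CastDemo {
  // Returns true when the record cannot be viewed as a (key, value) pair.
  def failsAsProduct2(rec: Product): Boolean =
    try {
      val pair = rec.asInstanceOf[Product2[Int, Int]] // cast checked at runtime
      pair._1 // never reached for a Tuple1
      false
    } catch {
      case _: ClassCastException => true // "scala.Tuple1 cannot be cast to scala.Product2"
    }

  def main(args: Array[String]): Unit = {
    println(failsAsProduct2(Tuple1(42))) // a Tuple1 record fails the cast
    println(failsAsProduct2((1, 2)))     // a genuine pair (Tuple2) casts fine
  }
}
```

This is consistent with a binary-compatibility mismatch: if the shuffle path of one Spark version hands records to code compiled against another version, a record that one side built as a Tuple1 can reach a consumer expecting a Product2.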




On Wed, Aug 6, 2014 at 12:01 PM, Debasish Das <debasish.da...@gmail.com>
wrote:

 I did not play with Hadoop settings...everything is compiled with
 2.3.0-cdh5.0.2 for me...

 I did try to bump the HBase version from 0.94 to 0.96 or 0.98, but there
 was no profile for CDH in the pom...but that's unrelated to this!


 On Wed, Aug 6, 2014 at 9:45 AM, DB Tsai <dbt...@dbtsai.com> wrote:

 One related question: is the mllib jar independent of the Hadoop version
 (i.e., it doesn't use the Hadoop API directly)? Can I use an mllib jar
 compiled for one version of Hadoop with another version of Hadoop?

 Sent from my Google Nexus 5
 On Aug 6, 2014 8:29 AM, Debasish Das <debasish.da...@gmail.com> wrote:

 Hi Xiangrui,

 Maintaining another file will be a pain later so I deployed spark 1.0.1
 without mllib and then my application jar bundles mllib 1.1.0-SNAPSHOT
 along with the code changes for quadratic optimization...

 Later the plan is to patch the snapshot mllib with the deployed stable
 mllib...

 There are 5 variants that I am experimenting with on around 400M ratings
 (daily data; I will update with monthly data in a few days)...

 1. LS
 2. NNLS
 3. Quadratic with bounds
 4. Quadratic with L1
 5. Quadratic with equality and positivity

 Now the ALS 1.1.0 snapshot runs fine, but after completing this step at
 ALS.scala:311

 // Materialize usersOut and productsOut.
 usersOut.count()

 I am getting from one of the executors: java.lang.ClassCastException:
 scala.Tuple1 cannot be cast to scala.Product2

 I am debugging it further, but I was wondering if this is due to RDD
 incompatibility between 1.0.1 and 1.1.0-SNAPSHOT?

 I have built the jars on my Mac which has Java 1.7.0_55 but the deployed
 cluster has Java 1.7.0_45.

 The flow runs fine on my localhost Spark 1.0.1 with 1 worker. Can that
 Java version mismatch cause this?

 Stack traces are below

 Thanks.
 Deb


 Executor stacktrace:

 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:126)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:123)
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:123)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)

 

RE: Welcoming two new committers

2014-08-09 Thread Guru Medasani
Congrats Joey and Andrew!

Sent from my Windows Phone

From: Andrew Or <and...@databricks.com>
Sent: 8/9/2014 2:43 AM
To: Prashant Sharma <scrapco...@gmail.com>
Cc: Xiangrui Meng <men...@gmail.com>; Christopher Nguyen <c...@adatao.com>;
Joseph Gonzalez <jegon...@eecs.berkeley.edu>; Matei Zaharia
<ma...@databricks.com>; d...@spark.incubator.apache.org
Subject: Re: Welcoming two new committers

Thanks everyone. I look forward to continuing to work with all of you!




Re: Unit tests in 5 minutes

2014-08-09 Thread Mridul Muralidharan
The issue with supporting this, IMO, is that scalatest uses the same VM
for all the tests (the surefire plugin supports forking, but scalatest
ignores it, IIRC). So different tests would initialize different
SparkContexts and could potentially step on each other's toes.

Regards,
Mridul


On Fri, Aug 8, 2014 at 9:31 PM, Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
 Howdy,

 Do we think it's both feasible and worthwhile to invest in getting our unit
 tests to finish in under 5 minutes (or something similarly brief) when run
 by Jenkins?

 Unit tests currently seem to take anywhere from 30 min to 2 hours. As
 people add more tests, I imagine this time will only grow. I think it would
 be better for both contributors and reviewers if they didn't have to wait
 so long for test results; PR reviews would be shorter, if nothing else.

 I don't know how this is normally done, but maybe it wouldn't be too much
 work to make a test cycle feel lighter.

 Most unit tests are independent and can be run concurrently, right? Would
 it make sense to build a given patch on many servers at once and send
 disjoint sets of unit tests to each?

 I'd be interested in working on something like that if possible (and
 sensible).

 Nick
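[Editor's note: the disjoint-sets idea above can be sketched in a few lines. This is a hypothetical illustration -- `TestSharder` and the suite names are invented, not actual Spark build tooling: deal the suite list into disjoint buckets, one per build server, so each suite runs exactly once and no two workers overlap.]

```scala
// Hypothetical sketch of sharding test suites across build workers.
// Round-robin assignment keeps the buckets balanced and disjoint.
object TestSharder {
  def shard(suites: Seq[String], workers: Int): Map[Int, Seq[String]] =
    suites.zipWithIndex
      .groupBy { case (_, i) => i % workers }            // bucket = index mod workers
      .map { case (bucket, pairs) => bucket -> pairs.map(_._1) }
}
```

A real setup would also need to merge the per-worker results back into one pass/fail report, and weight the buckets by historical suite runtime rather than by count.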

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-09 Thread Matt Forbes
I was having this same problem earlier this week and had to include my
changes in the assembly.


On Sat, Aug 9, 2014 at 9:59 AM, Debasish Das <debasish.da...@gmail.com>
wrote:

 I validated that I can reproduce this problem with master as well (without
 adding any of my mllib changes)...

 I separated the mllib jar from the assembly, deployed the assembly, and
 then supplied the mllib jar via the --jars option to spark-submit...

 I get this error:

 14/08/09 12:49:32 INFO DAGScheduler: Failed to run count at ALS.scala:299

 Exception in thread main org.apache.spark.SparkException: Job aborted due
 to stage failure: Task 238 in stage 40.0 failed 4 times, most recent
 failure: Lost task 238.3 in stage 40.0 (TID 10002,
 tblpmidn05adv-hdp.tdc.vzwcorp.com): java.lang.ClassCastException:
 scala.Tuple1 cannot be cast to scala.Product2



 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5$$anonfun$apply$4.apply(CoGroupedRDD.scala:159)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:129)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:126)
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:126)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)

 Driver stacktrace:

 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1141)
 at