Re: ISpark class not found

2014-11-12 Thread Laird, Benjamin
referer=http://localhost:/notebooks/Scala/Untitled0.ipynb How did you start the notebook? Thanks Regards, Meethu M On Wednesday, 12 November 2014 6:50 AM, Laird, Benjamin benjamin.la...@capitalone.com wrote: I've been experimenting with the ISpark

ISpark class not found

2014-11-11 Thread Laird, Benjamin
I've been experimenting with the ISpark extension to IScala (https://github.com/tribbloid/ISpark). Objects created in the REPL are not being loaded correctly on worker nodes, leading to a ClassNotFoundException. This does work correctly in spark-shell. I was curious if anyone has used ISpark

Re: AVRO specific records

2014-11-05 Thread Laird, Benjamin
Something like this works and is how I create an RDD of specific records. val avroRdd = sc.newAPIHadoopFile("twitter.avro", classOf[AvroKeyInputFormat[twitter_schema]], classOf[AvroKey[twitter_schema]], classOf[NullWritable], conf) (From
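The one-liner in the snippet above can be sketched in full as follows. This is a minimal, hedged reconstruction: it assumes `sc` is an existing SparkContext, that `twitter_schema` is the Avro-generated specific record class named in the message, and that the file `twitter.avro` sits in the working directory.

```scala
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable

// Hadoop configuration passed through to the input format.
val conf = new Configuration()

// Read an Avro container file as an RDD of (AvroKey[specific record], NullWritable).
val avroRdd = sc.newAPIHadoopFile(
  "twitter.avro",
  classOf[AvroKeyInputFormat[twitter_schema]], // InputFormat class
  classOf[AvroKey[twitter_schema]],            // key class
  classOf[NullWritable],                       // value class (unused by Avro)
  conf)

// Unwrap the AvroKey to get the typed specific records themselves.
val records = avroRdd.map(_._1.datum())
```

Note the argument order of `newAPIHadoopFile`: path first, then the InputFormat class, then key and value classes, then the Hadoop `Configuration`.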

Executor Memory, Task hangs

2014-08-19 Thread Laird, Benjamin
Hi all, I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records) with a cluster of 3 nodes. Simple calculations like count take approximately 5s when using the default value of spark.executor.memory (512MB). When I scale this up to 2GB, several tasks take 1m or more (while most
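The setting discussed above is applied when the SparkContext is created. A minimal sketch of the test described in the message (the app name and input path are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Raise executor memory from the 512 MB default to the 2 GB value tried above.
val conf = new SparkConf()
  .setAppName("count-test")              // illustrative name
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

// Time a simple count, as in the test described in the message.
val t0 = System.nanoTime()
val n = sc.textFile("/data/sample").count() // hypothetical input path
println(s"count=$n took ${(System.nanoTime() - t0) / 1e9}s")
```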

Re: Executor Memory, Task hangs

2014-08-19 Thread Laird, Benjamin
Thanks Akhil and Sean. All three workers are doing the work and tasks stall simultaneously on all three. I think Sean hit on my issue. I've been under the impression that each application has one executor process per worker machine (not per core per machine). Is that incorrect? If an executor
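On a standalone cluster of that era the impression stated above is correct: each application gets at most one executor per worker process. One common workaround is to launch several worker processes per machine via conf/spark-env.sh. The values below are illustrative, not from the thread:

```shell
# conf/spark-env.sh -- illustrative sizing for one physical node
# Run 4 worker processes per machine; each can host one executor per app.
export SPARK_WORKER_INSTANCES=4
# Cap each worker so the 4 instances fit within the node's memory.
export SPARK_WORKER_MEMORY=5g
export SPARK_WORKER_CORES=2
```

With smaller per-executor heaps, long GC pauses on a single large heap are less likely to stall whole tasks.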

Re: Avro Schema + GenericRecord to HadoopRDD

2014-07-30 Thread Laird, Benjamin
From: Laird, Benjamin [benjamin.la...@capitalone.com] Sent: Tuesday, July 29, 2014 8:00 AM To: user@spark.apache.org; u...@spark.incubator.apache.org Subject: Avro Schema + GenericRecord to HadoopRDD Hi all, I can read in Avro files to Spark with HadoopRDD and submit the schema

Avro Schema + GenericRecord to HadoopRDD

2014-07-29 Thread Laird, Benjamin
Hi all, I can read Avro files into Spark with HadoopRDD and submit the schema in the jobConf, but with the guidance I've seen so far, I'm left with an Avro GenericRecord of Java objects without type information. How do I actually use the schema to have the types inferred? Example: scala
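The thread doesn't include the resolution, but the untyped-fields situation described above usually comes down to casting per the schema: `GenericRecord.get` returns `Object`, and Avro strings arrive as `Utf8` rather than `String`. A hedged sketch with hypothetical field names ("id", "text" are illustrative, not from the message):

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.util.Utf8

// Hypothetical helpers: pull typed values out of an untyped GenericRecord.
def userId(rec: GenericRecord): Long =
  rec.get("id").asInstanceOf[Long]

def text(rec: GenericRecord): String =
  rec.get("text") match {
    case u: Utf8   => u.toString // Avro strings are Utf8, not java.lang.String
    case s: String => s
    case other     => other.toString
  }
```

Using Avro's generated specific record classes instead of GenericRecord (as in the later "AVRO specific records" thread above) avoids the casting entirely.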

RE: help

2014-04-28 Thread Laird, Benjamin
Joe, Do you have your SPARK_HOME variable set correctly in the spark-env.sh script? I was getting that error when I was first setting up my cluster, turned out I had to make some changes in the spark-env script to get things working correctly. Ben -Original Message- From: Joe L
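The spark-env.sh changes mentioned above typically look like the following. All paths and hostnames are illustrative; they must match the actual install locations on every node:

```shell
# conf/spark-env.sh -- illustrative paths, adjust for your installation
export SPARK_HOME=/opt/spark               # must be the same on every node
export JAVA_HOME=/usr/lib/jvm/default-java
export SPARK_MASTER_IP=master-node         # hostname workers connect to
```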

Running large join in ALS example through PySpark

2014-04-22 Thread Laird, Benjamin
Hello all - I'm running the ALS/Collaborative Filtering code through PySpark on Spark 0.9.0. (http://spark.apache.org/docs/0.9.0/mllib-guide.html#using-mllib-in-python) My data file has about 27M tuples (User, Item, Rating). ALS.train(ratings, 1, 30) runs on my 3 node cluster (24 cores, 60GB RAM)
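For reference, the same `ALS.train(ratings, 1, 30)` call in the Scala MLlib API of that era looks like the sketch below. It assumes `sc` is an existing SparkContext and a hypothetical "ratings.csv" holding user,item,rating lines; neither is from the original message:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Parse "user,item,rating" lines into MLlib Rating objects.
val ratings = sc.textFile("ratings.csv").map { line =>
  val Array(user, item, rating) = line.split(',')
  Rating(user.toInt, item.toInt, rating.toDouble)
}

// rank = 1, iterations = 30, matching the PySpark call described above.
val model = ALS.train(ratings, 1, 30)
```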