referer=http://localhost:/notebooks/Scala/Untitled0.ipynb
How did you start the notebook?
Thanks & Regards,
Meethu M
On Wednesday, 12 November 2014 6:50 AM, Laird, Benjamin
benjamin.la...@capitalone.com wrote:
I've been experimenting with the ISpark extension to IScala
(https://github.com/tribbloid/ISpark).
Objects created in the REPL are not being loaded correctly on worker nodes,
leading to a ClassNotFoundException. The same code works correctly in
spark-shell. I was curious if anyone has used ISpark and has run into this
issue.
Something like this works and is how I create an RDD of specific records.
val avroRdd = sc.newAPIHadoopFile("twitter.avro",
  classOf[AvroKeyInputFormat[twitter_schema]],
  classOf[AvroKey[twitter_schema]],
  classOf[NullWritable], conf)
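For illustration, this is the kind of pattern that triggers the failure for me
(the class and the data here are made up):

// Hypothetical reproduction: a class defined in the notebook REPL
case class Tweet(id: Long, text: String)

// Shipping REPL-defined instances to the cluster works in spark-shell,
// but under ISpark the workers fail to load the generated class:
val tweets = sc.parallelize(Seq(Tweet(1L, "hello"), Tweet(2L, "world")))
tweets.map(t => t.text.length).collect()  // throws ClassNotFoundException on workers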
Hi all,
I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records) with
a cluster of 3 nodes.
Simple calculations like count take approximately 5s when using the default
value of spark.executor.memory (512MB). When I scale this up to 2GB, several
tasks take 1m or more (while most complete quickly).
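For reference, a minimal sketch of how I scale the setting (the app name and
input path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup: bump executor memory from the 512MB default to 2GB
val conf = new SparkConf()
  .setAppName("count-test")                  // placeholder app name
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

sc.textFile("hdfs:///data/records").count()  // placeholder path; simple count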
Thanks Akhil and Sean.
All three workers are doing the work, and tasks stall simultaneously on all
three. I think Sean hit on my issue. I've been under the impression that each
application has one executor process per worker machine (not one per core per
machine). Is that incorrect? If an executor is shared by all of a machine's
cores, that would explain the simultaneous stalls.
From: Laird, Benjamin [benjamin.la...@capitalone.com]
Sent: Tuesday, July 29, 2014 8:00 AM
To: user@spark.apache.org; u...@spark.incubator.apache.org
Subject: Avro Schema + GenericRecord to HadoopRDD
Hi all,
I can read Avro files into Spark with HadoopRDD and submit the schema in the
jobConf, but with the guidance I've seen so far, I'm left with an Avro
GenericRecord of untyped Java objects. How do I actually use the schema so
that the types are inferred?
Example:
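A sketch of the pattern in question; the schema file name and the "username"
field below are stand-ins:

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroJob, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.JobConf

val jobConf = new JobConf(sc.hadoopConfiguration)
val schema = new Schema.Parser().parse(new java.io.File("twitter.avsc")) // stand-in schema file
AvroJob.setInputSchema(jobConf, schema)

val rdd = sc.hadoopRDD(jobConf,
  classOf[AvroInputFormat[GenericRecord]],
  classOf[AvroWrapper[GenericRecord]],
  classOf[NullWritable])

// Everything comes back as Object; each field has to be pulled out and
// converted by hand rather than having its type inferred from the schema:
val names = rdd.map { case (wrapper, _) => wrapper.datum().get("username").toString }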
Joe,
Do you have your SPARK_HOME variable set correctly in the spark-env.sh script?
I was getting that error when I was first setting up my cluster; it turned out
I had to make some changes in the spark-env.sh script to get things working
correctly.
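For what it's worth, the changes were along these lines (the paths and
hostname below are placeholders):

# in conf/spark-env.sh
export SPARK_HOME=/opt/spark           # placeholder install path
export SPARK_MASTER_IP=master-host     # placeholder master hostname
export JAVA_HOME=/usr/lib/jvm/java-7   # placeholder JVM path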
Ben
-----Original Message-----
From: Joe L
Hello all -
I'm running the ALS/Collaborative Filtering code through pySpark on Spark 0.9.0
(http://spark.apache.org/docs/0.9.0/mllib-guide.html#using-mllib-in-python).
My data file has about 27M tuples (User, Item, Rating).
ALS.train(ratings,1,30) runs on my 3-node cluster (24 cores, 60GB RAM).