Hello Mark,
I am no expert but I can answer some of your questions.
On Oct 2, 2014 2:15 AM, Mark Mandel mark.man...@gmail.com wrote:
Hi,
So I'm super confused about how to take my Spark code and actually deploy
and run it on a cluster.
Let's assume I'm writing in Java, and we'll take a
If you are using textFile() to read data in, it also takes a parameter for the minimum number of partitions to create. Would that not work for you?
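A minimal sketch of what I mean (assuming a hypothetical HDFS input path; requires the Spark jars on the classpath, so adjust to your build):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadWithPartitions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("read-example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Second argument is the minimum number of partitions to create.
        // For splittable input you should see at least this many partitions.
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input", 64);
        System.out.println("partitions: " + lines.partitions().size());

        sc.stop();
    }
}
```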
On Oct 2, 2014 7:00 AM, jamborta jambo...@gmail.com wrote:
Hi all,
I have been testing repartitioning to ensure that my algorithms get similar
Hello Sanjay,
This can be done, and is a very effective way to debug.
1) Compile and package your project to get a fat jar
2) In your SparkConf, use setJars and give the location of this jar. Also set your master to local in the SparkConf.
3) Use this SparkConf when creating JavaSparkContext
4) Debug
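The steps above can be sketched as follows (the jar path is a hypothetical example from a typical fat-jar build; requires Spark on the classpath):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalDebug {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("debug-run")
                // Step 2: run locally so breakpoints hit in one JVM
                .setMaster("local[*]")
                // Step 2: point setJars at your packaged fat jar
                .setJars(new String[] {"target/myapp-assembly.jar"});

        // Step 3: create the context from this conf
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 4: set breakpoints in your driver/transformation code and debug
        sc.stop();
    }
}
```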
Hello,
I have written a standalone Spark job which I run through the Ooyala Job Server. The program is working correctly; now I'm looking into how to optimize it.
My program took 4 hours to run without optimization. The first optimizations were enabling the KryoSerializer and compiling the regex pattern once and reusing
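On the regex point: compiling a Pattern once (for example in a static field) and reusing it avoids re-parsing the regex for every record, which adds up fast inside a tight map(). A plain-Java sketch with a made-up log format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParser {
    // Compiled once per JVM, not once per record.
    // Hypothetical pattern: pull a 3-digit status code out of a log line.
    private static final Pattern STATUS = Pattern.compile("\\s(\\d{3})\\s");

    public static String statusCode(String line) {
        Matcher m = STATUS.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

Calling Pattern.compile inside the per-record function instead would redo the compilation millions of times.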
I solved this issue by putting the hbase-protocol jar on the Hadoop classpath, not on the Spark classpath.
export HADOOP_CLASSPATH=/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar
On Tue, Aug 26, 2014 at 5:42 PM, Ashish Jain ashish@gmail.com wrote:
Hello,
I'm using the following version of Spark - 1.0.0+cdh5.1.0+41
(1.cdh5.1.0.p0.27).
I've tried to specify the libraries Spark uses in the following ways -
1) Adding them to the Spark context
2) Specifying the jar path in
a) spark.executor.extraClassPath
b) spark.executor.extraLibraryPath
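For reference, options 2a/2b would look something like this in spark-defaults.conf (the jar path below reuses the hbase-protocol path from earlier in the thread; the native library directory is a hypothetical example):

spark.executor.extraClassPath    /path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar
spark.executor.extraLibraryPath  /opt/native/lib

and option 1 is the programmatic equivalent, e.g. sc.addJar("/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar") on the context.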