Re: Confusion over how to deploy/run JAR files to a Spark Cluster

2014-10-02 Thread Ashish Jain
Hello Mark, I am no expert but I can answer some of your questions. On Oct 2, 2014 2:15 AM, Mark Mandel mark.man...@gmail.com wrote: Hi, so I'm super confused about how to take my Spark code and actually deploy and run it on a cluster. Let's assume I'm writing in Java, and we'll take a …
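
For context, a minimal sketch of the kind of self-contained Java job the thread is asking about; the class name and argument handling are illustrative. Packaged as a fat jar, this is the artifact that gets deployed to the cluster, with the master URL supplied at submit time rather than hard-coded:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineCountJob {
    public static void main(String[] args) {
        // No master URL is hard-coded here; the cluster supplies it
        // when the packaged jar is submitted.
        SparkConf conf = new SparkConf().setAppName("LineCountJob");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // args[0] is the input path, passed in at submit time.
        JavaRDD<String> lines = sc.textFile(args[0]);
        System.out.println("Line count: " + lines.count());

        sc.stop();
    }
}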

Re: partition size for initial read

2014-10-02 Thread Ashish Jain
If you are using textFile() to read data in, it also takes a parameter for the minimum number of partitions to create. Would that not work for you? On Oct 2, 2014 7:00 AM, jamborta jambo...@gmail.com wrote: Hi all, I have been testing repartitioning to ensure that my algorithms get similar …
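
A minimal sketch of that overload; the path and partition count are illustrative:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MinPartitionsExample {
    public static void main(String[] args) {
        // Local master just so the sketch runs standalone.
        SparkConf conf = new SparkConf()
                .setAppName("MinPartitionsExample")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Ask Spark to create at least 64 partitions when reading the
        // file; the actual number can be higher, but not lower.
        JavaRDD<String> lines = sc.textFile("data/input.txt", 64);
        System.out.println("Partitions: " + lines.rdd().partitions().length);

        sc.stop();
    }
}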

Re: Spark inside Eclipse

2014-10-01 Thread Ashish Jain
Hello Sanjay, this can be done, and is a very effective way to debug (see the sketch below).
1) Compile and package your project to get a fat jar.
2) In your SparkConf, use setJars and give the location of this jar. Also set your master here as local in the SparkConf.
3) Use this SparkConf when creating the JavaSparkContext.
4) Debug …
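
A minimal sketch of steps 2 and 3, assuming an illustrative jar path:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class EclipseDebugExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("EclipseDebugExample")
                // Run everything inside this JVM so Eclipse breakpoints
                // are hit (step 2).
                .setMaster("local")
                // Point Spark at the fat jar built in step 1.
                .setJars(new String[] { "/path/to/my-app-fat.jar" });

        // Step 3: create the context from this conf, then step through
        // the job in the debugger.
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}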

When to start optimizing for GC?

2014-09-29 Thread Ashish Jain
Hello, I have written a standalone Spark job which I run through the Ooyala Job Server. The program is working correctly; now I'm looking into how to optimize it. Without optimization, my program took 4 hours to run. The first optimization, using KryoSerializer and compiling the regex pattern once and reusing …
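
A minimal sketch of the two optimizations mentioned; the regex and sample input are illustrative:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;

public class GcTuningSketch {
    // Compile the regex once and reuse it across records, instead of
    // recompiling it for every call in a hot loop.
    private static final Pattern FIELD = Pattern.compile("^(\\S+)\\s+(\\S+)$");

    public static void main(String[] args) {
        // Switch from the default Java serialization to Kryo.
        SparkConf conf = new SparkConf()
                .setAppName("GcTuningSketch")
                .set("spark.serializer",
                     "org.apache.spark.serializer.KryoSerializer");
        System.out.println(conf.get("spark.serializer"));

        Matcher m = FIELD.matcher("key value");
        if (m.matches()) {
            System.out.println(m.group(1) + " -> " + m.group(2));
        }
    }
}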

Re: Specifying classpath

2014-08-27 Thread Ashish Jain
I solved this issue by putting the hbase-protocol jar on the Hadoop classpath, not on the Spark classpath: export HADOOP_CLASSPATH=/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar On Tue, Aug 26, 2014 at 5:42 PM, Ashish Jain ashish@gmail.com wrote: Hello, I'm using the following version …

Specifying classpath

2014-08-26 Thread Ashish Jain
Hello, I'm using the following version of Spark - 1.0.0+cdh5.1.0+41 (1.cdh5.1.0.p0.27). I've tried to specify the libraries Spark uses in the following ways (see the sketch below):
1) Adding it to the Spark context
2) Specifying the jar path in a) spark.executor.extraClassPath b) spark.executor.extraLibraryPath …
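
For reference, a minimal sketch of option 2a in code; the jar path is taken from the reply above, and the master URL would be supplied at submit time since executor settings only matter on a cluster:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ExtraClassPathExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("ExtraClassPathExample")
                // Prepend the jar to every executor's classpath; the
                // path must exist on each worker node.
                .set("spark.executor.extraClassPath",
                     "/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar");

        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}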