Re: Logging in Spark through YARN.

2014-09-24 Thread Vipul Pandey
Archit, Are you able to get it to work with 1.0.0? I tried the --files suggestion from Marcelo and it just changed logging for my client and the appmaster and executors were still the same. ~Vipul On Jul 30, 2014, at 9:59 PM, Archit Thakur archit279tha...@gmail.com wrote: Hi Marcelo,

Re: LZO support in Spark 1.0.0 - nothing seems to work

2014-09-17 Thread Vipul Pandey
It works for me:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export

GraphX : AssertionError

2014-09-10 Thread Vipul Pandey
Hi, I have a small graph with about 3.3M vertices and close to 7.5M edges. It's a pretty innocent graph with a max degree of 8. Unfortunately, graph.triangleCount is failing on me with the exception below. I'm running a spark-shell on CDH5.1 with the following params: SPARK_DRIVER_MEM=10g
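A minimal sketch of what triangleCount expects in GraphX for Spark 1.x: the algorithm assumes edges in canonical direction (srcId < dstId) and a graph that has been partitioned with partitionBy, and violating either assumption can surface as an AssertionError. The function and variable names below are illustrative, not from the thread.

```scala
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}
import org.apache.spark.rdd.RDD

// Canonicalize edge direction, then partition, before counting triangles.
def countTriangles(edges: RDD[Edge[Int]]): Long = {
  val canonical = edges.map { e =>
    if (e.srcId < e.dstId) e else Edge(e.dstId, e.srcId, e.attr)
  }
  val graph = Graph.fromEdges(canonical, defaultValue = 0)
    .partitionBy(PartitionStrategy.RandomVertexCut)
  // triangleCount returns a per-vertex count; each triangle is seen
  // from all three of its vertices, hence the division by 3.
  graph.triangleCount().vertices.map(_._2.toLong).sum().toLong / 3
}
```

Requires a running SparkContext and the GraphX jars on the classpath.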

Re: AppMaster OOME on YARN

2014-08-22 Thread Vipul Pandey
This is all that I see related to spark.MapOutputTrackerMaster in the master logs after OOME 14/08/21 13:24:45 ERROR ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-27] shutting down ActorSystem [spark] java.lang.OutOfMemoryError: Java heap space

AppMaster OOME on YARN

2014-08-21 Thread Vipul Pandey
Hi, I'm running Spark on YARN, carrying out a simple reduceByKey followed by another reduceByKey after some transformations. After completing the first stage, my AppMaster runs out of memory. I have 20G assigned to the master, 145 executors (12G each + 4G overhead), around 90k input files, 10+TB
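A hedged sketch of one mitigation for this kind of driver/AppMaster OOM: with ~90k input files, each file typically becomes its own partition, and the driver-side MapOutputTracker must track a map-output status for every map/reduce pair, which grows quadratically with partition counts. Coalescing early shrinks that bookkeeping. The path and partition count below are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("reduceByKey-job")
  // Map-status messages can exceed the default Akka frame size (MB) in 1.x
  .set("spark.akka.frameSize", "100")

val sc = new SparkContext(conf)

// ~90k small files would otherwise mean ~90k map partitions; cutting the
// partition count before the shuffle reduces driver-side tracking state.
val input = sc.textFile("hdfs:///path/to/input").coalesce(2000)
```

Whether 2000 is the right target depends on data size per partition; it is only a placeholder.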

Re: An attempt to implement dbscan algorithm on top of Spark

2014-06-12 Thread Vipul Pandey
Great! I was going to implement one of my own - but I may not need to do that any more :) I haven't had a chance to look deep into your code, but I would recommend accepting an RDD[(Double, Double)] as well, instead of just a file. val data = IOHelper.readDataset(sc, "/path/to/my/data.csv") And other
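The suggestion above can be sketched as a two-entry-point API: the file-based method parses CSV and delegates to an RDD-based method, so callers who already hold an RDD skip the file step entirely. `IOHelper` and `readDataset` follow the names quoted in the thread; `fromPoints` and the CSV layout are assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object IOHelper {
  // File-based entry point: parse "x,y" lines, then delegate.
  def readDataset(sc: SparkContext, path: String): RDD[(Double, Double)] = {
    val points = sc.textFile(path).map { line =>
      val Array(x, y) = line.split(",").map(_.trim.toDouble)
      (x, y)
    }
    fromPoints(points)
  }

  // RDD-based entry point: no parsing needed, the caller already has points.
  def fromPoints(points: RDD[(Double, Double)]): RDD[(Double, Double)] =
    points.cache()
}
```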

Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
And I thought I sent it to the right list! Here you go again - question below: On May 14, 2014, at 3:06 PM, Vipul Pandey vipan...@gmail.com wrote: So here's a followup question: What's the preferred mode? We have a new cluster coming up with petabytes of data and we intend to take Spark

Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
manager for Spark that supports security and Kerberized clusters. Some advantages of using standalone: * It has been around for longer, so it is likely a little more stable. * Many report faster startup times for apps. -Sandy On Wed, May 14, 2014 at 3:06 PM, Vipul Pandey vipan

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-03 Thread Vipul Pandey
Any word on this one? On Apr 2, 2014, at 12:26 AM, Vipul Pandey vipan...@gmail.com wrote: I downloaded 0.9.0 fresh and ran the mvn command - the assembly jar thus generated also has both shaded and real versions of the protobuf classes Vipuls-MacBook-Pro-3:spark-0.9.0-incubating vipul$ jar -ftv

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-02 Thread Vipul Pandey
upon running mvn -Dhadoop.version=2.0.0-cdh4.2.1 -DskipTests clean assembly:assembly On Apr 1, 2014, at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote: Do you get the same problem if you build with maven? On Tue, Apr 1, 2014 at 12:23 PM, Vipul Pandey vipan...@gmail.com wrote

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Vipul Pandey
- Patrick On Sun, Mar 30, 2014 at 10:03 PM, Vipul Pandey vipan...@gmail.com wrote: I'm using ScalaBuff (which depends on protobuf 2.5) and facing the same issue. Any word on this one? On Mar 27, 2014, at 6:41 PM, Kanwaldeep kanwal...@gmail.com wrote: We are using Protocol Buffer 2.5 to send

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Vipul Pandey
Vipul Pandey vipan...@gmail.com wrote: Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be getting pulled in unless you are directly using akka yourself. Are you? No, I'm not. Although I see that protobuf libraries are directly pulled into the 0.9.0 assembly jar - I do

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-03-30 Thread Vipul Pandey
I'm using ScalaBuff (which depends on protobuf 2.5) and facing the same issue. Any word on this one? On Mar 27, 2014, at 6:41 PM, Kanwaldeep kanwal...@gmail.com wrote: We are using Protocol Buffer 2.5 to send messages to Spark Streaming 0.9 with a Kafka stream setup. I have Protocol Buffer 2.5

batching the output

2014-03-30 Thread Vipul Pandey
Hi, I need to batch the values in my final RDD before writing out to HDFS. The idea is to batch multiple rows into a protobuf and write those batches out - mostly to save some space, as a lot of the metadata is the same. e.g. given 1,2,3,4,5,6, just batch them as (1,2), (3,4), (5,6) and save three records
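One way to sketch the batching described above: group consecutive values within each partition into fixed-size chunks with mapPartitions and Iterator.grouped, so shared metadata can be written once per batch. This is an assumption about the intended behavior; note that batches never span partition boundaries here, so trailing groups in each partition may be short.

```scala
import org.apache.spark.rdd.RDD

// Batch consecutive elements of each partition into groups of `size`.
// A partition of 1,2,3,4,5,6 with size = 2 yields Seq(1,2), Seq(3,4), Seq(5,6).
def batch[T](rdd: RDD[T], size: Int): RDD[Seq[T]] =
  rdd.mapPartitions(_.grouped(size))
```

Each Seq[T] can then be packed into one protobuf record before saving.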

Re: Lzo + Protobuf

2014-03-12 Thread Vipul Pandey
to write out the original myRDD as block-compressed LZO? Thanks, Vipul On Jan 29, 2014, at 9:40 AM, Issac Buenrostro buenros...@ooyala.com wrote: Good! I'll keep your experience in mind in case we have problems in the future :) On Tue, Jan 28, 2014 at 5:55 PM, Vipul Pandey vipan
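A hedged sketch of LZO output from an RDD: saveAsTextFile accepts a compression codec class directly, and the LzopCodec here comes from the hadoop-lzo package (the same parcel referenced in the env-var thread above); the codec class name and output path are assumptions about the setup, and native LZO libraries must be on the library path.

```scala
import org.apache.spark.rdd.RDD
import com.hadoop.compression.lzo.LzopCodec // from hadoop-lzo; an assumption

// Write each line of the RDD LZO-compressed; for true block compression of
// key/value data, a SequenceFile with BLOCK compression type is the usual route.
def saveLzo(rdd: RDD[String], path: String): Unit =
  rdd.saveAsTextFile(path, classOf[LzopCodec])
```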