Spark ResourceLeak?

2016-07-19 Thread saurabh guru
I am running a Spark cluster on Mesos. The module reads data from Kafka as a DirectStream and pushes it into Elasticsearch after querying Redis for names against IDs. I have been getting this message in my worker logs: 16/07/19 11:17:44 ERROR ResourceLeakDetector: LEAK: You are

Spark not handling Null

2016-04-11 Thread saurabh guru
Trying to run the following causes a NullPointerException. While I thought Spark should have been able to handle null, apparently it is not able to. What could I return in place of null? What other ways could I approach this? There are times I would want to just skip parsing and proceed to the next
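A common answer to this question is to never return null from a Spark transformation at all: use flatMap and return an empty collection for records you want to skip. The sketch below shows this pattern in plain Java streams (the `parse` helper is hypothetical, standing in for the poster's parsing logic); the same idea applies to `JavaDStream.flatMap`.

```java
import java.util.*;
import java.util.stream.*;

public class SkipNulls {
    // Hypothetical parser: returns an empty list instead of null when a
    // record cannot be parsed, so downstream stages never see a null.
    static List<String> parse(String record) {
        if (record == null || record.isEmpty()) {
            return Collections.emptyList(); // skip instead of returning null
        }
        return Collections.singletonList(record.toUpperCase());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", null, "", "b");
        // flatMap silently drops the empty lists, mirroring how
        // JavaDStream.flatMap would skip unparseable records.
        List<String> out = input.stream()
                .flatMap(r -> parse(r).stream())
                .collect(Collectors.toList());
        System.out.println(out); // [A, B]
    }
}
```

With flatMap the "skip and proceed to the next record" case becomes an empty result rather than a null, so no NullPointerException can propagate.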

Spark Sampling

2016-04-06 Thread saurabh guru
I have a steady stream of data coming in from my Kafka agent to my Spark system. How do I sample this data in Spark so that it doesn't get heavily loaded? I have an object of type JavaDStream lines. How do I achieve 1% sampling on the above? I was doing rdd.sample(false, 1.0) before inserting the

Re: Streaming app consume multiple kafka topics

2016-03-15 Thread saurabh guru
I am doing the same thing this way: // Iterate over HashSet of topics Iterator iterator = topicsSet.iterator(); JavaPairInputDStream messages; JavaDStream lines; String topic = ""; // get messages stream for each topic while
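The per-topic loop in the snippet above can be sketched in plain Java as follows. The `fetch` helper is hypothetical, standing in for a per-topic `KafkaUtils.createDirectStream` call; in the real job the merged result would come from unioning the per-topic DStreams rather than concatenating lists.

```java
import java.util.*;

public class MultiTopic {
    // Hypothetical stand-in for a per-topic createDirectStream call:
    // returns the records of one topic as a list.
    static List<String> fetch(String topic) {
        return Arrays.asList(topic + "-msg1", topic + "-msg2");
    }

    public static void main(String[] args) {
        // TreeSet gives a deterministic iteration order for the demo.
        Set<String> topicsSet = new TreeSet<>(Arrays.asList("logs", "metrics"));
        List<String> lines = new ArrayList<>();
        // Iterate over the topic set, creating one stream per topic and
        // merging them, as the while loop in the snippet does.
        for (String topic : topicsSet) {
            lines.addAll(fetch(topic));
        }
        System.out.println(lines);
        // [logs-msg1, logs-msg2, metrics-msg1, metrics-msg2]
    }
}
```

An alternative worth noting: the direct-stream API also accepts a whole topic set in one call, which avoids the per-topic loop entirely when one stream for all topics is acceptable.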

Re: NullPointerException

2016-03-12 Thread saurabh guru
>>> Looking at ExternalSorter.scala line 192 >>> >>> 189 >>> while (records.hasNext) { addElementsRead() kv = records.next() >>> map.changeValue((getPartition(kv._1), kv._1), update) >>> maybeSpillCollection(usingMap = true) } >>> >>> On Sat,

Re: NullPointerException

2016-03-11 Thread Saurabh Guru
Which Spark release do you use? > > I wonder if the following may have fixed the problem: > SPARK-8029 Robust shuffle writer > > JIRA is down, cannot check now. > > On Fri, Mar 11, 2016 at 11:01 PM, Saurabh Guru <saurabh.g...@gmail.com > <mailto:saurabh.g...@g

NullPointerException

2016-03-11 Thread Saurabh Guru
I am seeing the following exception in my Spark cluster every few days in production. 2016-03-12 05:30:00,541 - WARN TaskSetManager - Lost task 0.0 in stage 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal): java.lang.NullPointerException at