Re: How to kill spark applications submitted using spark-submit reliably?

2015-11-20 Thread varun sharma
> ... In both cases, I would expect the remote Spark app to be killed and my local process to be killed. Why is this happening? And how can I kill a Spark app from the terminal, launched via a shell script, without going to the Spark master UI? I want to launch the Spark app via a script and log the PID so I can monitor it remotely. Thanks for the help.

Re: In Spark application, how to get the passed in configuration?

2015-11-12 Thread varun sharma
> ... It looks like I cannot access the dynamically passed-in value from the command line this way. In Hadoop, the Configuration object includes all the passed-in key/value pairs in the application. How to achieve that in Spark? Thanks, Yong
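A common way to get Hadoop-Configuration-like behaviour is to pass values as spark.-prefixed properties with --conf and read them back from SparkConf inside the driver. A minimal sketch, not necessarily what was suggested later in this thread; the key name spark.myapp.input is made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    // Launched e.g. with:
    //   spark-submit --conf spark.myapp.input=/data/in --class MyApp myapp.jar
    // spark-submit only propagates properties whose names start with "spark.",
    // so application-specific keys need that prefix.
    object MyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("MyApp")
        // getOption avoids an exception when the key was not supplied
        val input = conf.getOption("spark.myapp.input").getOrElse("/tmp/default-input")
        val sc = new SparkContext(conf)
        println(s"Reading from $input")
        sc.stop()
      }
    }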

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread varun sharma
> ... in the ability to scale out processing beyond your number of partitions. We're doing this to scale up from 36 partitions per topic to 140 partitions (20 cores * 7 nodes) and it works great. -adrian
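What Adrian describes maps to a plain repartition() on the direct stream. A minimal sketch, assuming the Spark 1.x Kafka 0.8 direct API with a placeholder broker and topic; note that the repartition adds a shuffle to every batch and gives up the one-task-per-Kafka-partition mapping:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("widen-kafka-stream")
    val ssc = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // The direct stream creates one RDD partition per Kafka partition...
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // ...repartition() then spreads each batch over more tasks,
    // e.g. 140 = 20 cores * 7 nodes as in the message above.
    val widened = stream.repartition(140)
    widened.foreachRDD(rdd => println(s"partitions after repartition: ${rdd.partitions.size}"))

    ssc.start()
    ssc.awaitTermination()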

Need more tasks in KafkaDirectStream

2015-10-29 Thread varun sharma
... the number of tasks created was independent of the Kafka partitions; I need something like that. Is any config available if I don't need one-to-one semantics? Is there any way I can repartition without incurring any additional cost? Thanks

Re: correct and fast way to stop streaming application

2015-10-27 Thread varun sharma
> ... a job for offsets 20-30 is created and sent to an executor. The executor does all the steps (if there is only one stage) and saves offset 30 to ZooKeeper. This way I lose data in offsets 10-20. How should this be handled correctly?
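A commonly suggested mitigation, sketched below, is to stop the StreamingContext gracefully so that batches that have already been generated are allowed to finish before shutdown. It does not by itself fix the offset-commit ordering discussed above, and the spark.streaming.stopGracefullyOnShutdown property is assumed to be available in your Spark version (it appeared around 1.4):

    import org.apache.spark.streaming.StreamingContext

    // Option 1: let Spark stop gracefully on JVM shutdown
    //   --conf spark.streaming.stopGracefullyOnShutdown=true
    //
    // Option 2: stop explicitly from a control/monitoring thread
    def shutdown(ssc: StreamingContext): Unit = {
      // stopGracefully = true waits for queued/ongoing batches to complete
      // before tearing down the streaming context and the SparkContext.
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }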

Re: correct and fast way to stop streaming application

2015-10-26 Thread varun sharma
> ... the next batch. That is not desirable for me. It works like this because JobScheduler is an actor: after it reports an error, it goes on to the next message, which starts the next batch job, while ssc.awaitTermination() runs in another thread and only happens after the next batch starts.

Re: Kafka Streaming and Filtering > 3000 partitions

2015-10-22 Thread varun sharma
> Folks, I have a very large number of Kafka topics (many thousands of partitions) that I want to consume, filter based on topic-specific filters, then produce back to filtered topics in Kafka. Using the receiver-less approach with Spark 1.4.1 (described here: https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md) I am able to use either KafkaUtils.createDirectStream or KafkaUtils.createRDD, consume from many topics, and filter them with the same filters, but I can't seem to wrap my head around how to apply topic-specific filters, or how to finally produce to topic-specific destination topics. Another point is that I will need to checkpoint the metadata after each successful batch and set starting offsets per partition back to ZK. I expect I can do that on the final RDDs after casting them accordingly, but if anyone has any expertise/guidance doing that and is willing to share, I'd be pretty grateful.
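One way to apply topic-specific filters with the direct approach is to read each batch's offset ranges and use the fact that RDD partition i corresponds to offsetRanges(i). A sketch only, assuming the Spark 1.x Kafka 0.8 API; the topic names, predicates and broker below are made up for illustration:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

    // ssc is an existing StreamingContext
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("clicks", "logs"))

    // Hypothetical per-topic predicates
    val filters: Map[String, String => Boolean] = Map(
      "clicks" -> ((v: String) => v.contains("buy")),
      "logs"   -> ((v: String) => v.startsWith("ERROR")))

    stream.foreachRDD { rdd =>
      // Must be read from the RDD produced by the direct stream, before any transformation
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      val filtered = rdd.mapPartitionsWithIndex { (i, iter) =>
        val topic = ranges(i).topic
        val keep = filters.getOrElse(topic, (_: String) => true)
        iter.collect { case (_, value) if keep(value) => (topic, value) }
      }

      // Producing back to topic-specific Kafka destinations and committing
      // ranges(i).untilOffset to ZK would go here; the sketch just counts.
      filtered.map(_._1).countByValue().foreach { case (topic, n) =>
        println(s"$topic: $n records passed the filter in this batch")
      }
    }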

Re: Issue in spark batches

2015-10-21 Thread varun sharma
... does not update correctly. On Tue, Oct 20, 2015 at 8:38 AM, varun sharma <varunsharman...@gmail.com> wrote: > Also, as you can see from the timestamps in the attached image, batches coming after the Cassandra server comes up (21:04) are processed, and batches which ...

Re: Issue in spark batches

2015-10-20 Thread varun sharma
... TD. On Mon, Oct 19, 2015 at 5:48 AM, varun sharma <varunsharman...@gmail.com> wrote: > Hi, I am facing this issue consistently in a Spark-Cassandra-Kafka streaming job. Spark 1.4.0, Cassandra connector 1.4.0-M3, ...

Re: Kafka Direct Stream

2015-10-04 Thread varun sharma
... that would improve your particular use case, and vote for it if so :-) -kr, Gerard. On Sat, Oct 3, 2015 at 3:53 PM, varun sharma <varunsharman...@gmail.com> wrote: > Thanks Gerard, the code snippet you shared worked, but can you please ...

Re: Kafka Direct Stream

2015-10-03 Thread varun sharma
... } I'm wondering: would something like this (https://datastax-oss.atlassian.net/browse/SPARKC-257) better fit your purposes? -kr, Gerard. On Fri, Oct 2, 2015 at 8:12 PM, varun sharma <varunsharman...@gmail.com> wrote: > Hi Adrian, ...

Re: Kafka Direct Stream

2015-10-02 Thread varun sharma
> ... split the RDD consisting of N topics into N RDDs, each having 1 topic. Any help would be appreciated. Thanks in advance, Udit

Re: Kafka Direct Stream

2015-10-02 Thread varun sharma
> ... each topic, since I need to process data differently depending on the topic. I basically want to split the RDD consisting of N topics into N RDDs, each having 1 topic. Any help would be appreciated. Thanks in advance, Udit
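One pattern for this split, sketched under the same assumptions as the earlier snippets (Spark 1.x Kafka 0.8 direct API, hypothetical broker and topic names), is to build one RDD per topic by keeping only the partitions whose OffsetRange belongs to that topic:

    import kafka.serializer.StringDecoder
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // ssc is an existing StreamingContext
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("topicA", "topicB"))

    stream.foreachRDD { rdd =>
      // One RDD partition per Kafka (topic, partition): index i matches offsetRanges(i)
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      val byTopic: Map[String, RDD[(String, String)]] =
        ranges.map(_.topic).distinct.map { topic =>
          val keep = ranges.zipWithIndex.collect { case (r, i) if r.topic == topic => i }.toSet
          val topicRdd = rdd.mapPartitionsWithIndex(
            (i, iter) => if (keep(i)) iter else Iterator.empty,
            preservesPartitioning = true)
          topic -> topicRdd
        }.toMap

      byTopic.foreach { case (topic, topicRdd) =>
        // topic-specific processing goes here
        println(s"$topic: ${topicRdd.count()} records in this batch")
      }
    }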

OOM error in Spark worker

2015-10-01 Thread varun sharma
... lose that data which is not yet scheduled?

OOM error in Spark worker

2015-09-29 Thread varun sharma
My workers are going OOM over time. I am running a streaming job in Spark 1.4.0. Here is the heap dump of the workers: 16,802 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xdff94088", occupy 488,249,688 bytes (95.80%). These instances are ...

Re: Spark Installation Maven PermGen OutOfMemoryException

2014-12-27 Thread varun sharma
This works for me:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -DskipTests clean package

Re: Debian package for spark?

2014-12-25 Thread varun sharma
Hi Kevin, were you able to build Spark with the command

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pdeb -DskipTests clean package

? I am getting the below error for all versions of Spark (even 1.2.0): Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on ...