Stop a Dstream computation

2015-09-24 Thread Samya
Hi Team, I have a code piece as follows: try { someDstream.someaction(...) // Step1 } catch { case ex: Exception => someDstream.someaction(...) // Step2 } When I get an exception for the current batch, Step2 executes as
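A minimal sketch of why that pattern misbehaves, assuming a standard StreamingContext setup (the source, names, and batch interval below are placeholders, not the poster's actual code): DStream operations are lazy, so a try/catch around the declaration only guards the wiring, not per-batch execution. Catching per-batch failures and stopping the computation looks more like:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StopOnFailure extends App {
      val conf  = new SparkConf().setAppName("StopOnFailure")
      val ssc   = new StreamingContext(conf, Seconds(10))
      val lines = ssc.socketTextStream("localhost", 9999) // stand-in source

      lines.foreachRDD { rdd =>
        try {
          rdd.foreach(println) // the per-batch action that may throw
        } catch {
          case ex: Exception =>
            // Stop the whole computation instead of re-running an action in
            // the catch block; stopping from inside foreachRDD is abrupt, so
            // a separate monitoring thread is often preferred.
            ssc.stop(stopSparkContext = true, stopGracefully = false)
        }
      }

      ssc.start()
      ssc.awaitTermination()
    }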

RE: Getting parent RDD

2015-09-16 Thread Samya MAITI
From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Wednesday, September 16, 2015 12:24 PM To: Samya MAITI <samya.ma...@amadeus.com> Cc: user@spark.apache.org Subject: Re: Getting parent RDD How many RDDs are you having in that stream? If it's a single RDD then you could do a .f

Getting parent RDD

2015-09-15 Thread Samya
Hi Team, I have the below situation. val ssc = ... val msgStream = ... //SparkKafkaDirectAPI val wordCountPair = TransformStream.transform(msgStream) wordCountPair.foreachRDD(rdd => try { // Some action that causes exception } catch { case ex1: Exception => {
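Filling in the elided pieces with placeholder logic (the real TransformStream.transform and source differ), the shape of the code in question is roughly:

    // Sketch only: a direct Kafka stream yields (key, value) pairs; the
    // word count here stands in for TransformStream.transform.
    val wordCountPair = msgStream.transform { parentRdd =>
      // The pre-transform "parent" RDD of each batch is accessible here,
      // inside transform, rather than in foreachRDD's catch block.
      parentRdd.flatMap { case (_, line) => line.split(" ") }
        .map(word => (word, 1L))
        .reduceByKey(_ + _)
    }
    wordCountPair.foreachRDD { rdd =>
      try {
        rdd.foreach(println) // some action that may throw
      } catch {
        case ex1: Exception =>
          println(s"Batch failed: ${ex1.getMessage}")
      }
    }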

Exception Handling : Spark Streaming

2015-09-11 Thread Samya
Hi Team, I am facing an issue wherein I can't figure out why the exception is handled the first time it is thrown in the stream-processing action, but is ignored the second time. PFB my code base. object Boot extends App { //Load the configuration val config =

RE: Maintaining Kafka Direct API Offsets

2015-09-10 Thread Samya
Thanks Ameya. From: ameya [via Apache Spark User List] [mailto:ml-node+s1001560n24650...@n3.nabble.com] Sent: Friday, September 11, 2015 4:12 AM To: Samya MAITI <samya.ma...@amadeus.com> Subject: Re: Maintaining Kafka Direct API Offsets So I added something like this: Runtime.getR
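The truncated suggestion appears to be a JVM shutdown hook; a hedged reconstruction (the exact details of Ameya's code are not preserved in the archive):

    // Stop the streaming context gracefully on JVM shutdown so the
    // in-flight batch completes and its Kafka offsets are not replayed.
    Runtime.getRuntime.addShutdownHook(new Thread() {
      override def run(): Unit = {
        ssc.stop(stopSparkContext = true, stopGracefully = true)
      }
    })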

Re: Maintaining Kafka Direct API Offsets

2015-09-10 Thread Samya
Hi Ameya, Please suggest: when you say graceful shutdown, what exactly did you handle? Thanks, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246p24636.html

RE: Spark streaming -> cassandra : Fault Tolerance

2015-09-09 Thread Samya MAITI
From: Cody Koeninger [mailto:c...@koeninger.org] Sent: Thursday, September 10, 2015 1:13 AM To: Samya MAITI <samya.ma...@amadeus.com> Cc: user@spark.apache.org Subject: Re: Spark streaming -> cassandra : Fault Tolerance It's been a while since I've looked at the cassandra connector, s

Spark streaming -> cassandra : Fault Tolerance

2015-09-09 Thread Samya
Hi Team, I have a sample Spark application which reads from Kafka using the direct API, does some transformation, and stores to Cassandra (using saveToCassandra()). If Cassandra goes down, the application logs a NoHostAvailable exception (as expected). But in the meantime the new incoming
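For reference, a stripped-down sketch of the pipeline described (Spark 1.x Kafka direct API plus the DataStax connector; the broker, topic, keyspace, table, and column names are all made-up):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val msgStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // If Cassandra is unreachable, each batch's write fails with
    // NoHostAvailableException while new batches keep queuing up.
    msgStream.map { case (_, value) => (value, 1L) }
      .reduceByKey(_ + _)
      .saveToCassandra("ks", "word_counts", SomeColumns("word", "count"))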

RE: Relation between threads and executor core

2015-08-26 Thread Samya MAITI
? Is it in user control? Regards, Sam From: Jem Tucker [mailto:jem.tuc...@gmail.com] Sent: Wednesday, August 26, 2015 2:26 PM To: Samya MAITI samya.ma...@amadeus.com; user@spark.apache.org Subject: Re: Relation between threads and executor core Hi Samya, When submitting an application with spark-submit

Relation between threads and executor core

2015-08-26 Thread Samya
Hi All, a few basic queries: 1. Is there a way we can control the number of threads per executor core? 2. Does the parameter “executor-cores” also have a say in deciding how many threads are run? Regards, Sam
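As far as the flag goes, each executor core corresponds to one concurrent task slot (one task thread); there is no separate knob for threads per core. A typical submission, with made-up resource numbers:

    spark-submit \
      --master yarn \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 4g \
      my-app.jar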

Spark-Cassandra-connector

2015-08-20 Thread Samya
) connection object per executor that is shared between tasks? 2. If the above answer is YES, is there a way to create a connection pool for each executor, so that multiple tasks can dump data to Cassandra in parallel? Regards, Samya
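A sketch of the sharing the question asks about, using the connector's CassandraConnector (per the 1.x connector, sessions are cached per executor JVM, so tasks on one executor share the underlying connection pool; keyspace/table names are placeholders):

    import com.datastax.spark.connector.cql.CassandraConnector

    val connector = CassandraConnector(sc.getConf)
    rdd.foreachPartition { rows =>
      // withSessionDo reuses the session cached in this executor's JVM,
      // so concurrent tasks write through the same connection pool.
      connector.withSessionDo { session =>
        rows.foreach { case (k, v) =>
          session.execute("INSERT INTO ks.tbl (k, v) VALUES (?, ?)", k, v)
        }
      }
    }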

Re: Spark Streaming - Design considerations/Knobs

2015-05-24 Thread Maiti, Samya
Really good list to brush up on the basics. Just one input, regarding * An RDD's processing is scheduled by the driver's JobScheduler as a job. At a given point in time only one job is active. So, if one job is executing, the other jobs are queued. We can have multiple jobs running in a given
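The truncated sentence presumably refers to the concurrency knob; as an assumption, the setting in question is spark.streaming.concurrentJobs:

    // Assumption: the remark refers to this experimental, then-undocumented
    // setting, which lets jobs from different batches run concurrently
    // instead of queuing behind one another.
    val conf = new SparkConf().set("spark.streaming.concurrentJobs", "2")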

Re: Writing to a single file from multiple executors

2015-03-12 Thread Maiti, Samya
Hi TD, I want to append my records to an Avro file which will be later used for querying. Having a single file is not mandatory for us, but then how can we make the executors append the Avro data to multiple files? Thanks, Sam On Mar 12, 2015, at 4:09 AM, Tathagata Das
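A hedged sketch of the usual answer (plain text output stands in for Avro here, and the path is a made-up example): let each task write its own part file under a per-batch directory rather than appending to one file.

    wordCounts.foreachRDD { (rdd, batchTime) =>
      // Each partition writes its own part-NNNNN file under this directory;
      // per-batch directories sidestep appending to a single file.
      rdd.saveAsTextFile(s"/data/events/batch-${batchTime.milliseconds}")
    }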

Re: Kafka + Spark streaming

2014-12-31 Thread Samya Maiti
Thanks TD. On Wed, Dec 31, 2014 at 7:19 AM, Tathagata Das tathagata.das1...@gmail.com wrote: 1. Of course, a single block / partition has many Kafka messages, and from different Kafka topics interleaved together. The message count is not related to the block count. Any message received within
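For context, the block/message decoupling TD describes is governed by the block interval on receiver-based streams; a hedged illustration (200 ms is the documented default of that era):

    // Received data is chunked into blocks every blockInterval (default
    // 200 ms); each block becomes one partition of the batch's RDD, no
    // matter how many Kafka messages landed in it.
    val conf = new SparkConf().set("spark.streaming.blockInterval", "200")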

Re: Can we say 1 RDD is generated every batch interval?

2014-12-30 Thread Maiti, Samya
Thanks, Sean. That was helpful. Regards, Sam On Dec 30, 2014, at 4:12 PM, Sean Owen so...@cloudera.com wrote: The DStream model is one RDD of data per interval, yes. foreachRDD performs an operation on each RDD in the stream, which means it is executed once* for the one RDD in each interval.
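To make the point concrete (dstream is a generic placeholder): the body of foreachRDD runs once per batch interval, on the driver, against that interval's single RDD.

    dstream.foreachRDD { rdd =>
      // Executed once per batch interval, for the one RDD of that interval.
      println(s"This batch has ${rdd.count()} records")
    }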