good http sync client to be used with spark

2017-05-31 Thread vimal dinakaran
Hi, in our application pipeline we need to push data from Spark Streaming to an HTTP server. I would like an HTTP client with the following requirements: 1. synchronous calls 2. HTTP connection pool support 3. lightweight and easy to use. spray and akka-http are mostly suited for async calls.
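One dependency-free option that meets all three requirements is the JDK's own `HttpURLConnection`, which keeps a small per-host keep-alive pool under the hood. A minimal sketch (the `/ingest` endpoint and local test server are hypothetical, just to make the example self-contained):

```scala
import java.net.{HttpURLConnection, InetSocketAddress, URL}
import java.nio.charset.StandardCharsets.UTF_8
import com.sun.net.httpserver.{HttpExchange, HttpServer}

// Tiny local server standing in for the real HTTP endpoint.
val server = HttpServer.create(new InetSocketAddress(0), 0)
server.createContext("/ingest", (ex: HttpExchange) => {
  val body  = new String(ex.getRequestBody.readAllBytes(), UTF_8)
  val reply = s"got:$body".getBytes(UTF_8)
  ex.sendResponseHeaders(200, reply.length)
  ex.getResponseBody.write(reply)
  ex.close()
})
server.start()

// Synchronous POST. Closing the streams (rather than calling disconnect())
// returns the socket to the JVM's keep-alive pool for reuse by the next call.
def post(url: String, payload: String): String = {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setDoOutput(true)
  conn.getOutputStream.write(payload.getBytes(UTF_8))
  val resp = new String(conn.getInputStream.readAllBytes(), UTF_8)
  conn.getInputStream.close()
  resp
}

val resp = post(s"http://localhost:${server.getAddress.getPort}/ingest", "hello")
server.stop(0)
```

For heavier production use, a pooling client library would add retries and explicit pool sizing, but for a blocking call per partition this stdlib approach is often enough.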

Re: Spark Launch programmatically - Basics!

2017-05-23 Thread vimal dinakaran
We are using the code below for an integration test. You need to wait for the process state. .startApplication( new Listener { override def infoChanged(handle: SparkAppHandle): Unit = { println("*** info changed * ", handle.getAppId, handle.getState)
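The pattern behind "wait for the process state" is to block on a latch that the listener releases when a final state is reached, instead of only printing from the callback. A sketch with hypothetical stand-ins for `SparkAppHandle` and its `Listener` (so it runs without spark-launcher on the classpath — the real final states are FINISHED/FAILED/KILLED):

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-ins for SparkAppHandle and SparkAppHandle.Listener.
sealed trait AppState { def isFinal: Boolean }
case object Submitted extends AppState { val isFinal = false }
case object Finished  extends AppState { val isFinal = true }

class AppHandle {
  @volatile private var listeners = List.empty[AppState => Unit]
  def addListener(f: AppState => Unit): Unit = listeners ::= f
  def stateChanged(s: AppState): Unit = listeners.foreach(_(s))
}

// Register first, then let state changes arrive; the test thread blocks on the
// latch until a final state (or the timeout) instead of polling.
val done   = new CountDownLatch(1)
val handle = new AppHandle
handle.addListener(s => if (s.isFinal) done.countDown())

handle.stateChanged(Submitted) // would normally arrive from the launcher thread
handle.stateChanged(Finished)
val reachedFinal = done.await(10, TimeUnit.SECONDS)
```

With the real API, the same latch would be counted down inside `stateChanged` when `handle.getState.isFinal` is true.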

Restart if driver gets insufficient resources

2017-03-02 Thread vimal dinakaran
Hi All, We are running Spark on Kubernetes. There is a scenario in which the Spark driver (pod) was not able to communicate properly with the master and got stuck reporting insufficient resources. On restarting the Spark driver (pod) manually, it ran properly. Is there a way to just

spark logging best practices

2016-07-08 Thread vimal dinakaran
Hi, http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala What is the best way to capture Spark logs without getting a "task not serializable" error? The above link has various workarounds. Also, is there a way to dynamically set the log level while the application is running
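The usual workaround for the serialization error is to keep the logger out of the closure's serialized state with `@transient lazy val`, so each executor re-creates it locally; changing the level at runtime is then just a `setLevel` on the named logger. A sketch (java.util.logging is used here only so it runs without extra jars — Spark 1.x ships log4j, where the analogous call is `org.apache.log4j.LogManager.getLogger(...).setLevel(...)`; the logger name below is illustrative):

```scala
import java.util.logging.{Level, Logger}

// @transient keeps the Logger out of serialized task closures; lazy val
// re-initializes it on each executor after deserialization.
object AppLogging extends Serializable {
  @transient lazy val log: Logger = Logger.getLogger("org.apache.spark.myapp")
}

AppLogging.log.setLevel(Level.WARNING)
val before = AppLogging.log.getLevel
// Flip the level while running, e.g. from an admin endpoint or signal handler.
AppLogging.log.setLevel(Level.FINE)
val after = AppLogging.log.getLevel
```

Closures that call `AppLogging.log.info(...)` then serialize cleanly, because only the object reference (not the Logger) crosses the wire.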

Re: Spark master shuts down when one of zookeeper dies

2016-06-30 Thread vimal dinakaran
questions/13022244/zookeeper-reliability-three-versus-five-nodes > > http://www.ibm.com/developerworks/library/bd-zookeeper/ > the paragraph starting with 'A quorum is represented by a strict > majority of nodes' > > FYI > > On Tue, Jun 28, 2016 at 5:52 AM, vimal dinakaran <
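The quorum rule quoted above ("a strict majority of nodes") is the whole answer in miniature: a majority of n voters is n/2 + 1, so the ensemble tolerates n minus that many failures, and an even-sized ensemble buys nothing. A quick sketch of the arithmetic:

```scala
// Quorum is a strict majority of the ensemble: n/2 + 1 voters must be up.
def quorum(n: Int): Int    = n / 2 + 1
// Failures the ensemble survives while still holding quorum.
def tolerated(n: Int): Int = n - quorum(n)

val twoNode   = tolerated(2) // 0: losing one of two ZK nodes loses quorum
val threeNode = tolerated(3) // 1
val fiveNode  = tolerated(5) // 2
```

This is why a two-node ZooKeeper ensemble takes the Spark masters down with it when a single node dies, and why three is the minimum useful size.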

Re: Restart App and consume from checkpoint using direct kafka API

2016-06-28 Thread vimal dinakaran
every time you consume the > > message. Then read the offset from the database every time you want to > start > > reading the message. > > > > nb: This approach is also explained by Cody in his blog post. > > > > Thanks > > > > On Thu, Mar 31, 2016 at 2
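The store-offsets-in-your-database approach described above can be sketched as a small offset store keyed by (topic, partition). The in-memory map here is a hypothetical stand-in for the database table; in production the commit would be a transactional write, ideally in the same transaction as the results:

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an offsets table in the database.
class OffsetStore {
  private val table = mutable.Map.empty[(String, Int), Long] // -> next offset to read
  def commit(topic: String, partition: Int, nextOffset: Long): Unit =
    table((topic, partition)) = nextOffset
  def restore(topic: String, partition: Int): Long =
    table.getOrElse((topic, partition), 0L) // unseen partition: start from 0
}

val store = new OffsetStore
store.commit("events", 0, 42L)            // after processing a batch
val resumeAt = store.restore("events", 0) // on restart, feed into fromOffsets
val fresh    = store.restore("events", 1) // partition never committed
```

On restart, the restored offsets seed the direct stream's starting positions, so an upgraded jar can resume where the old one stopped without relying on checkpoint files.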

Spark master shuts down when one of zookeeper dies

2016-06-28 Thread vimal dinakaran
I am using ZooKeeper to provide HA for the Spark cluster. We have a two-node ZooKeeper cluster. When one of the ZooKeeper nodes dies, the entire Spark cluster goes down. Is this expected behaviour? Am I missing something in the config? Spark version - 1.6.1. ZooKeeper version - 3.4.6

spark streaming application - deployment best practices

2016-06-15 Thread vimal dinakaran
Hi All, I am using spark-submit cluster-mode deployment to run my application in production. But this requires having the jars at the same path on all the nodes, and also the config file that is passed as an argument at the same path. I am running Spark in standalone mode and I

Restart App and consume from checkpoint using direct kafka API

2016-03-31 Thread vimal dinakaran
Hi, in the blog https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md it is mentioned that enabling checkpointing works as long as the app jar is unchanged. If I want to upgrade the jar with the latest code and consume from Kafka from where it was stopped, how do I do that? Is there a

Re: spark streaming web ui not showing the events - direct kafka api

2016-02-03 Thread vimal dinakaran
No, I am using DSE 4.8, which has Spark 1.4. Is this a known issue? On Wed, Jan 27, 2016 at 11:52 PM, Cody Koeninger <c...@koeninger.org> wrote: > Have you tried Spark 1.5? > > On Wed, Jan 27, 2016 at 11:14 AM, vimal dinakaran <vimal3...@gmail.com> > wrote: > >&

spark streaming web ui not showing the events - direct kafka api

2016-01-27 Thread vimal dinakaran
Hi, I am using Spark 1.4 with the direct Kafka API. In my streaming UI, I am able to see the events listed only if I add stream.print() statements; otherwise the event rate and input events remain at 0 even though the events get processed. Without print statements, I have the action

Re: Cluster mode dependent jars not working

2015-12-17 Thread vimal dinakaran
Dec 15, 2015 at 3:57 AM, vimal dinakaran <vimal3...@gmail.com> > wrote: > >> I am running spark using cluster mode for deployment . Below is the >> command >> >> >> JARS=$JARS_HOME/amqp-client-3.5.3.jar,$JARS_HOME/nscala-time_2.10-2.0.0.jar,\ >> $JAR

Cluster mode dependent jars not working

2015-12-15 Thread vimal dinakaran
I am running Spark using cluster mode for deployment. Below is the command: JARS=$JARS_HOME/amqp-client-3.5.3.jar,$JARS_HOME/nscala-time_2.10-2.0.0.jar,\ $JARS_HOME/kafka_2.10-0.8.2.1.jar,$JARS_HOME/kafka-clients-0.8.2.1.jar,\ $JARS_HOME/spark-streaming-kafka_2.10-1.4.1.jar,\

Re: Checkpoint not working after driver restart

2015-11-07 Thread vimal dinakaran
; (x._1, DateTime.now, formatterFunc(x._2))).saveToCassandra(keyspace, hourlyStatsTable) HourlyResult.map(x => (x, "hourly")).print() } } On Wed, Nov 4, 2015 at 12:27 PM, vimal dinakaran <vimal3...@gmail.com> wrote: > I have a simple spark stream

Checkpoint not working after driver restart

2015-11-03 Thread vimal dinakaran
I have a simple Spark Streaming application that reads data from RabbitMQ and does some aggregation over window intervals of 1 min and 1 hour, with a batch interval of 30s. I have a three-node setup. And to enable checkpointing, I have mounted the same directory using sshfs to all worker node