restrict my spark app to run on specific machines
Hi,

I have a cluster of 4 machines for Spark. I want my Spark app to run on 2 machines only, and leave the other 2 machines for other Spark apps. So my question is: can I restrict my app to those 2 machines only, by passing their IPs when setting up SparkConf or through some other setting?

Thanks,
Shams
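For reference, the standalone scheduler (as of 1.6) does not expose a per-application host whitelist through SparkConf. One workaround, sketched below as an assumption rather than anything confirmed in this thread, is to run a second standalone master whose only registered workers are the two reserved machines and point the app at that master; the host name and app name here are placeholders:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // "dedicated-master" is a hypothetical host running a second standalone
    // master whose only registered workers are the two machines reserved for
    // this app; the other two machines stay on the original master.
    SparkConf conf = new SparkConf()
            .setAppName("restricted-app")                 // placeholder name
            .setMaster("spark://dedicated-master:7077")
            // Extra guard: cap total cores so the app cannot grow beyond the
            // reserved capacity if more workers ever join this master.
            .set("spark.cores.max", "4");

    JavaSparkContext sc = new JavaSparkContext(conf);

Note that spark.cores.max alone only caps how many cores the app takes; it does not choose which hosts they come from. On YARN the equivalent lever would be node labels or separate queues.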
Re: Spark streaming app starts processing when I kill that app
Hey Hareesh,

Thanks for the help; they were starving. I increased the cores + memory on that machine and now it is working fine.

Thanks again

On Tue, May 3, 2016 at 12:57 PM, Shams ul Haque <sham...@cashcare.in> wrote:
> No, I made a cluster of 2 machines, and after submission to the master the app moves to the slave machine for execution.
> I am going to give your suggestion a try by running both on the same machine.
>
> Thanks
> Shams
>
> On Tue, May 3, 2016 at 12:53 PM, hareesh makam <makamhare...@gmail.com> wrote:
>> If you are running your master on a single core, it might be an issue of starvation.
>> Assuming you are running it locally, try setting master to local[2] or higher.
>>
>> Check the first example at https://spark.apache.org/docs/latest/streaming-programming-guide.html
>>
>> - Hareesh
>>
>> On 3 May 2016 at 12:35, Shams ul Haque <sham...@cashcare.in> wrote:
>>> Hi all,
>>>
>>> I am facing a strange issue when running a Spark Streaming app.
>>>
>>> When I submit my app with *spark-submit* it works fine and is also visible in the Spark UI, but it doesn't process any data coming from Kafka. When I kill that app by pressing Ctrl + C in the terminal, it starts processing all the data received from Kafka and then shuts down.
>>>
>>> I am trying to figure out why this is happening. Please help me if you know anything.
>>>
>>> Thanks and regards
>>> Shams ul Haque
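For reference, a minimal sketch of the point Hareesh made (the app name is a placeholder, not the original app): if the Kafka stream is receiver-based, a Spark Streaming app needs at least two cores or threads, because the receiver permanently occupies one, so with a single core the batches queue up and only run once the receiver is stopped.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    // local[2]: one thread for the Kafka receiver, one for batch processing.
    // With local[1] (or a 1-core executor) the receiver takes the only slot,
    // so batches never run until the app is killed and the slot is released,
    // which matches the "processes only after Ctrl+C" symptom in this thread.
    SparkConf conf = new SparkConf()
            .setAppName("streaming-starvation-demo")   // hypothetical app name
            .setMaster("local[2]");

    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));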
Re: Spark streaming app starts processing when I kill that app
No, I made a cluster of 2 machines, and after submission to the master the app moves to the slave machine for execution.
I am going to give your suggestion a try by running both on the same machine.

Thanks
Shams

On Tue, May 3, 2016 at 12:53 PM, hareesh makam <makamhare...@gmail.com> wrote:
> If you are running your master on a single core, it might be an issue of starvation.
> Assuming you are running it locally, try setting master to local[2] or higher.
>
> Check the first example at https://spark.apache.org/docs/latest/streaming-programming-guide.html
>
> - Hareesh
>
> On 3 May 2016 at 12:35, Shams ul Haque <sham...@cashcare.in> wrote:
>> Hi all,
>>
>> I am facing a strange issue when running a Spark Streaming app.
>>
>> When I submit my app with *spark-submit* it works fine and is also visible in the Spark UI, but it doesn't process any data coming from Kafka. When I kill that app by pressing Ctrl + C in the terminal, it starts processing all the data received from Kafka and then shuts down.
>>
>> I am trying to figure out why this is happening. Please help me if you know anything.
>>
>> Thanks and regards
>> Shams ul Haque
Spark streaming app starts processing when I kill that app
Hi all,

I am facing a strange issue when running a Spark Streaming app.

When I submit my app with *spark-submit* it works fine and is also visible in the Spark UI, but it doesn't process any data coming from Kafka. When I kill that app by pressing Ctrl + C in the terminal, it starts processing all the data received from Kafka and then shuts down.

I am trying to figure out why this is happening. Please help me if you know anything.

Thanks and regards
Shams ul Haque
Re: kill Spark Streaming job gracefully
Anyone have any idea? Or should I raise a bug for this?

Thanks,
Shams

On Fri, Mar 11, 2016 at 3:40 PM, Shams ul Haque <sham...@cashcare.in> wrote:
> Hi,
>
> I want to kill a Spark Streaming job gracefully, so that whatever Spark has picked up from Kafka gets processed. My Spark version is 1.6.0.
>
> Killing the Spark Streaming job from the Spark UI doesn't stop the app completely. In the Spark UI the job moves to the COMPLETED section, but the log continuously shows errors: http://pastebin.com/TbGrdzA2
> and the process is still visible with the *ps* command.
>
> I also tried to stop it with the command below:
> *bin/spark-submit --master spark://shams-cashcare:7077 --kill app-20160311121141-0002*
> but it gives me the error:
> Unable to connect to server spark://shams-cashcare:7077
>
> I have confirmed the Spark master host:port and they are OK. I also added a ShutdownHook in the code.
> What am I missing? If I am doing something wrong, please guide me.
kill Spark Streaming job gracefully
Hi,

I want to kill a Spark Streaming job gracefully, so that whatever Spark has picked up from Kafka gets processed. My Spark version is 1.6.0.

Killing the Spark Streaming job from the Spark UI doesn't stop the app completely. In the Spark UI the job moves to the COMPLETED section, but the log continuously shows errors: http://pastebin.com/TbGrdzA2
and the process is still visible with the *ps* command.

I also tried to stop it with the command below:
*bin/spark-submit --master spark://shams-cashcare:7077 --kill app-20160311121141-0002*
but it gives me the error:
Unable to connect to server spark://shams-cashcare:7077

I have confirmed the Spark master host:port and they are OK. I also added a ShutdownHook in the code.
What am I missing? If I am doing something wrong, please guide me.
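For what it's worth, a minimal sketch of the usual graceful-shutdown setup on the 1.6 line (the app name and batch interval below are placeholders): either let Spark's own shutdown hook do the graceful stop and terminate the driver with a plain SIGTERM rather than the UI kill or kill -9, or call stop(...) yourself from a controlled code path.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf conf = new SparkConf()
            .setAppName("graceful-stop-demo")            // hypothetical name
            // On SIGTERM, Spark's shutdown hook stops the StreamingContext
            // gracefully, letting batches already received from Kafka finish.
            .set("spark.streaming.stopGracefullyOnShutdown", "true");

    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
    // set up the Kafka DStream and transformations here
    jssc.start();
    jssc.awaitTermination();

    // Alternative, explicit form (e.g. triggered from your own watcher thread):
    // jssc.stop(true /* stop SparkContext */, true /* stopGracefully */);

With this in place, stopping the driver with a plain kill <pid> lets the in-flight batches complete before the process exits.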
Re: does spark need dedicated machines to run on
Hi,

*Release of Spark:* 1.6.0; I downloaded it and made a build using 'sbt/sbt assembly'
*Command for submitting your app:* bin/spark-submit --master spark://shams-machine:7077 --executor-cores 2 --class in.myapp.email.combiner.CombinerRealtime /opt/dev/workspace-luna/combiner_spark/target/combiner-0.0.1-SNAPSHOT.jar 2>&1 &
*Code snippet of your app:* I developed a lot of chained transformations connected with Kafka, MongoDB and Cassandra, but I tested all of them using the *local[2]* setting in the *conf.setMaster* method, and everything works there.
*Pastebin of log:* http://pastebin.com/0LjTWLfm

Thanks
Shams

On Thu, Mar 10, 2016 at 8:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Can you provide a bit more information?
>
> Release of Spark
> command for submitting your app
> code snippet of your app
> pastebin of log
>
> Thanks
>
> On Thu, Mar 10, 2016 at 6:32 AM, Shams ul Haque <sham...@cashcare.in> wrote:
>> Hi,
>>
>> I have developed a Spark realtime app and started Spark standalone on my laptop. But when I try to submit that app to Spark it is always in WAITING state and Cores is always zero.
>>
>> I have set:
>> export SPARK_WORKER_CORES="2"
>> export SPARK_EXECUTOR_CORES="1"
>> in spark-env.sh, but still nothing happened, and the same log entry appears:
>> *TaskSchedulerImpl:70 - Initial job has not accepted any resources*
>>
>> So, do I need a separate machine for all this?
>>
>> Please help me sort this out.
>>
>> Thanks
>> Shams
does spark need dedicated machines to run on
Hi,

I have developed a Spark realtime app and started Spark standalone on my laptop. But when I try to submit that app to Spark it is always in WAITING state and Cores is always zero.

I have set:
export SPARK_WORKER_CORES="2"
export SPARK_EXECUTOR_CORES="1"
in spark-env.sh, but still nothing happened, and the same log entry appears:
*TaskSchedulerImpl:70 - Initial job has not accepted any resources*

So, do I need a separate machine for all this?

Please help me sort this out.

Thanks
Shams
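As general context (an assumption, not a diagnosis from this thread): "Initial job has not accepted any resources" usually means either no worker is registered with the master, another application is already holding the worker's cores, or the app is asking for more cores or memory per executor than the single worker offers. No dedicated machine is required; a sketch of capping the request so a laptop-sized worker can satisfy it:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Ask for no more than the single local worker registered with the master;
    // the defaults (1g executor memory, all available cores) can exceed what a
    // small worker offers, leaving the app stuck in WAITING with 0 cores.
    SparkConf conf = new SparkConf()
            .setAppName("combiner-realtime")             // name is a placeholder
            .setMaster("spark://shams-machine:7077")     // master URL from the thread
            .set("spark.cores.max", "2")
            .set("spark.executor.memory", "512m");

    JavaSparkContext sc = new JavaSparkContext(conf);

The master web UI (port 8080 by default) shows whether a worker is registered and how many cores and how much memory it actually offers, which is the quickest way to see what the app can ask for.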
using MongoDB Tailable Cursor in Spark Streaming
Hi,

I want to implement streaming using a MongoDB tailable cursor. Please give me a hint on how I can do this. I think I have to extend some class and use its methods to do the work.

Thanks and regards
Shams ul Haque
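The class to extend for a source Spark does not ship with is org.apache.spark.streaming.receiver.Receiver. Below is a rough sketch only, assuming the MongoDB 3.x Java driver and a capped collection (tailable cursors only work on capped collections); the class and field names are hypothetical:

    import com.mongodb.CursorType;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.client.MongoCursor;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;
    import org.bson.Document;

    // Hypothetical receiver that tails a capped MongoDB collection and pushes
    // each new document into a Spark Streaming DStream.
    public class MongoTailableReceiver extends Receiver<Document> {

        private final String uri;         // e.g. "mongodb://localhost:27017"
        private final String database;
        private final String collection;  // must be a capped collection

        public MongoTailableReceiver(String uri, String database, String collection) {
            super(StorageLevel.MEMORY_AND_DISK_2());
            this.uri = uri;
            this.database = database;
            this.collection = collection;
        }

        @Override
        public void onStart() {
            // Receive on a separate thread so onStart() returns immediately.
            new Thread(this::receive, "mongo-tailable-receiver").start();
        }

        @Override
        public void onStop() {
            // Nothing to do here: receive() checks isStopped() on every iteration.
        }

        private void receive() {
            MongoClient client = new MongoClient(new MongoClientURI(uri));
            try {
                MongoCursor<Document> cursor = client.getDatabase(database)
                        .getCollection(collection)
                        .find()
                        .cursorType(CursorType.TailableAwait)  // block waiting for new docs
                        .iterator();
                while (!isStopped() && cursor.hasNext()) {
                    store(cursor.next());  // hand each new document to Spark
                }
            } catch (Exception e) {
                restart("Error while tailing " + collection, e);  // let Spark retry
            } finally {
                client.close();
            }
        }
    }

It would then be hooked into a streaming app with something like jssc.receiverStream(new MongoTailableReceiver(uri, db, coll)), which returns a JavaReceiverInputDStream<Document>.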
Re: merge 3 different types of RDDs into one
Hi Jacek,

Thanks for the suggestion, I am going to try union. And what is your opinion on the 2nd question?

Thanks
Shams

On Tue, Dec 1, 2015 at 3:23 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Never done it before, but just yesterday I found out about the SparkContext.union method that could help in your case.
>
> def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
> On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in> wrote:
>> Hi All,
>>
>> I have made 3 RDDs from 3 different datasets, all grouped by customerId, in which 2 RDDs have values of Iterable type and one has a single bean. All RDDs have an id of Long type as the customerId. Below are the models for the 3 RDDs:
>> JavaPairRDD<Long, Iterable>
>> JavaPairRDD<Long, Iterable>
>> JavaPairRDD<Long, TransactionAggr>
>>
>> Now I have to merge all these 3 RDDs into a single one so that I can generate an Excel report per customer using the data in the 3 RDDs.
>> I tried using the join transformation, but it needs RDDs of the same type and only works on two RDDs.
>> So my questions are:
>> 1. Is there any way to do this merging efficiently, so that I can get all 3 datasets by customerId?
>> 2. If I merge the first two using a join, do I need to run groupByKey() before the join so that all data related to a single customer will be on one node?
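On the merge itself, a sketch under assumptions (the Invoice and Payment bean names are placeholders for whatever the two Iterable-valued RDDs actually hold; TransactionAggr is from the thread): JavaPairRDD.cogroup combines all three RDDs by key in a single shuffle, so a separate groupByKey() beforehand is not needed.

    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple3;

    // Combine the three customer-keyed RDDs in one pass. cogroup co-locates all
    // values for each customerId, so the per-customer Excel report can be built
    // from the resulting Tuple3 without running groupByKey() first.
    static JavaPairRDD<Long, Tuple3<Iterable<Iterable<Invoice>>,
                                    Iterable<Iterable<Payment>>,
                                    Iterable<TransactionAggr>>>
    mergeByCustomer(JavaPairRDD<Long, Iterable<Invoice>> invoices,
                    JavaPairRDD<Long, Iterable<Payment>> payments,
                    JavaPairRDD<Long, TransactionAggr> aggregates) {
        return invoices.cogroup(payments, aggregates);
    }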
Separate all values from Iterable
Hi,

I have grouped all my customers in a JavaPairRDD<Long, Iterable<ProductBean>> by their customerId (of Long type), meaning every customerId has a list of ProductBean. Now I want to save all ProductBean objects to the DB irrespective of customerId.

I got all the values by using the method:
JavaRDD<Iterable<ProductBean>> values = custGroupRDD.values();

Now I want to convert JavaRDD<Iterable<ProductBean>> to JavaRDD<ProductBean>.
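A minimal sketch of the flattening step, assuming the Spark 1.x Java API (where FlatMapFunction returns an Iterable; on 2.x it must return an Iterator instead) and the ProductBean type from the post:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    // values() drops the customerId, then flatMap unrolls each Iterable so the
    // result holds one ProductBean per element, ready to be saved to the DB.
    static JavaRDD<ProductBean> flattenProducts(
            JavaPairRDD<Long, Iterable<ProductBean>> custGroupRDD) {
        return custGroupRDD.values().flatMap(beans -> beans);  // Spark 1.x signature
    }

From there, the DB save is typically done with foreachPartition, so one connection is opened per partition rather than per record.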