restrict my spark app to run on specific machines

2016-05-04 Thread Shams ul Haque
Hi,

I have a cluster of 4 machines for Spark. I want my Spark app to run on 2
machines only, and leave the other 2 machines for other Spark apps.
So my question is: can I restrict my app to those 2 machines only, by passing
some IPs when setting up SparkConf, or by any other setting?


Thanks,
Shams
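
For what it's worth, a quick sketch (not from the thread, and as far as I know
standalone Spark in this era has no setting that pins an application to
specific worker IPs): the closest SparkConf-level control is capping how many
cores the app may take, so the remaining cores stay free for the other apps.
The master URL and the numbers below are placeholders.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RestrictedApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("restricted-app")
                .setMaster("spark://master-host:7077")
                // e.g. with 4 workers of 2 cores each, capping at 4 cores
                // leaves the other 4 cores for the remaining Spark apps
                .set("spark.cores.max", "4")
                .set("spark.executor.memory", "1g");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... application logic ...
        sc.stop();
    }
}

If the two machines really must be dedicated, another option is to run a
second, separate standalone master with its own two workers and point this
app at that master URL.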


Re: Spark streaming app starts processing only when I kill that app

2016-05-03 Thread Shams ul Haque
Hey Hareesh,

Thanks for the help; they were indeed starving. I increased the cores and
memory on that machine, and now it is working fine.

Thanks again
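
For anyone who hits the same symptom, a minimal sketch of what Hareesh
describes below (the app name and batch interval are placeholders; the Kafka
wiring is omitted): with a receiver-based Kafka input the app needs at least
two cores, one for the receiver and one for the batch processing, so local[1]
or a single-core executor receives data but never processes it.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        // local[2]: one core for the Kafka receiver, one for processing.
        SparkConf conf = new SparkConf()
                .setAppName("streaming-sketch")
                .setMaster("local[2]");

        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        // ... create the Kafka DStream and the transformations here ...

        jssc.start();
        jssc.awaitTermination();
    }
}

On a cluster the same rule applies to the cores actually granted to the
app's executors.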

On Tue, May 3, 2016 at 12:57 PM, Shams ul Haque <sham...@cashcare.in> wrote:

> No, I made a cluster of 2 machines, and after submission to the master, the
> app moves to the slave machine for execution.
> I am going to give your suggestion a try by running both on the same
> machine.
>
> Thanks
> Shams
>
> On Tue, May 3, 2016 at 12:53 PM, hareesh makam <makamhare...@gmail.com>
> wrote:
>
>> If you are running your master on a single core, it might be an issue of
>> starvation. Assuming you are running it locally, try setting the master to
>> local[2] or higher.
>>
>> Check the first example at
>> https://spark.apache.org/docs/latest/streaming-programming-guide.html
>>
>> - Hareesh
>>
>> On 3 May 2016 at 12:35, Shams ul Haque <sham...@cashcare.in> wrote:
>>
>>> Hi all,
>>>
>>> I am facing a strange issue when running a Spark Streaming app.
>>>
>>> When I submit my app with *spark-submit*, it works fine and is also
>>> visible in the Spark UI, but it doesn't process any data coming from
>>> Kafka. When I kill that app by pressing Ctrl + C on the terminal, it
>>> starts processing all the data received from Kafka and then shuts down.
>>>
>>> I am trying to figure out why this is happening. Please help me if you
>>> know anything.
>>>
>>> Thanks and regards
>>> Shams ul Haque
>>>
>>
>>
>


Re: Spark streaming app starts processing only when I kill that app

2016-05-03 Thread Shams ul Haque
No, I made a cluster of 2 machines, and after submission to the master, the
app moves to the slave machine for execution.
I am going to give your suggestion a try by running both on the same
machine.

Thanks
Shams

On Tue, May 3, 2016 at 12:53 PM, hareesh makam <makamhare...@gmail.com>
wrote:

> If you are running your master on a single core, it might be an issue of
> starvation. Assuming you are running it locally, try setting the master to
> local[2] or higher.
>
> Check the first example at
> https://spark.apache.org/docs/latest/streaming-programming-guide.html
>
> - Hareesh
>
> On 3 May 2016 at 12:35, Shams ul Haque <sham...@cashcare.in> wrote:
>
>> Hi all,
>>
>> I am facing a strange issue when running a Spark Streaming app.
>>
>> When I submit my app with *spark-submit*, it works fine and is also
>> visible in the Spark UI, but it doesn't process any data coming from
>> Kafka. When I kill that app by pressing Ctrl + C on the terminal, it
>> starts processing all the data received from Kafka and then shuts down.
>>
>> I am trying to figure out why this is happening. Please help me if you
>> know anything.
>>
>> Thanks and regards
>> Shams ul Haque
>>
>
>


Spark streaming app starts processing only when I kill that app

2016-05-03 Thread Shams ul Haque
Hi all,

I am facing a strange issue when running a Spark Streaming app.

When I submit my app with *spark-submit*, it works fine and is also visible
in the Spark UI, but it doesn't process any data coming from Kafka. When I
kill that app by pressing Ctrl + C on the terminal, it starts processing all
the data received from Kafka and then shuts down.

I am trying to figure out why this is happening. Please help me if you know
anything.

Thanks and regards
Shams ul Haque


Re: kill Spark Streaming job gracefully

2016-03-14 Thread Shams ul Haque
Does anyone have any idea, or should I raise a bug for this?

Thanks,
Shams

On Fri, Mar 11, 2016 at 3:40 PM, Shams ul Haque <sham...@cashcare.in> wrote:

> Hi,
>
> I want to kill a Spark Streaming job gracefully, so that whatever Spark
> has picked up from Kafka gets processed. My Spark version is 1.6.0.
>
> When I tried killing the Spark Streaming job from the Spark UI, it didn't
> stop the app completely. In the Spark UI the job is moved to the COMPLETED
> section, but the log continuously shows this error: http://pastebin.com/TbGrdzA2
> and the process is still visible with the *ps* command.
>
>
> I also tried to stop it using the command below:
> *bin/spark-submit --master spark://shams-cashcare:7077 --kill
> app-20160311121141-0002*
> but it gives me this error:
> Unable to connect to server spark://shams-cashcare:7077
>
> I have confirmed the Spark master host:port and they are OK. I also added
> a shutdown hook in the code.
> What am I missing? If I am doing something wrong, please guide me.
>


kill Spark Streaming job gracefully

2016-03-11 Thread Shams ul Haque
Hi,

I want to kill a Spark Streaming job gracefully, so that whatever Spark has
picked up from Kafka gets processed. My Spark version is 1.6.0.

When I tried killing the Spark Streaming job from the Spark UI, it didn't
stop the app completely. In the Spark UI the job is moved to the COMPLETED
section, but the log continuously shows this error: http://pastebin.com/TbGrdzA2
and the process is still visible with the *ps* command.


I also tried to stop it using the command below:
*bin/spark-submit --master spark://shams-cashcare:7077 --kill
app-20160311121141-0002*
but it gives me this error:
Unable to connect to server spark://shams-cashcare:7077

I have confirmed the Spark master host:port and they are OK. I also added a
shutdown hook in the code.
What am I missing? If I am doing something wrong, please guide me.
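
For reference, a rough sketch of two approaches commonly used with Spark 1.x
streaming here (Kafka wiring omitted, names are placeholders): set
spark.streaming.stopGracefullyOnShutdown so that a plain kill <driver-pid>
(SIGTERM, not kill -9) lets in-flight batches finish via the shutdown hook,
or call stop(..., stopGracefully = true) yourself from a controlled code path.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class GracefulStopSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("graceful-stop-sketch")
                // On a normal shutdown signal, finish the batches already
                // received before tearing the streaming context down.
                .set("spark.streaming.stopGracefullyOnShutdown", "true");

        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        // ... Kafka DStream and processing here ...

        jssc.start();
        jssc.awaitTermination();

        // Explicit alternative, e.g. triggered by an external marker file:
        // jssc.stop(true /* stop SparkContext */, true /* stop gracefully */);
    }
}

Also, if I remember right, spark-submit --kill expects the driver ID of a job
submitted with --deploy-mode cluster (something like driver-2016...-0001),
not an application ID like app-20160311121141-0002, so that command would
need adjusting as well.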


Re: does spark need dedicated machines to run on

2016-03-10 Thread Shams ul Haque
Hi,

*Release of Spark:* 1.6.0; I downloaded it and made a build using 'sbt/sbt
assembly'

*command for submitting your app:* bin/spark-submit --master
spark://shams-machine:7077 --executor-cores 2 --class
in.myapp.email.combiner.CombinerRealtime
/opt/dev/workspace-luna/combiner_spark/target/combiner-0.0.1-SNAPSHOT.jar
2>&1 &

*code snippet of your app:* I developed a lot of chained transformations
connected with Kafka, MongoDB, and Cassandra, but I tested all of them using
the *local[2]* setting in the *conf.setMaster* method, and everything works
there.

*pastebin of log:* http://pastebin.com/0LjTWLfm


Thanks
Shams

On Thu, Mar 10, 2016 at 8:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you provide a bit more information ?
>
> Release of Spark
> command for submitting your app
> code snippet of your app
> pastebin of log
>
> Thanks
>
> On Thu, Mar 10, 2016 at 6:32 AM, Shams ul Haque <sham...@cashcare.in>
> wrote:
>
>> Hi,
>>
>> I have developed a Spark realtime app and started Spark standalone on my
>> laptop, but when I try to submit the app to Spark it is always
>> in WAITING state and Cores is always zero.
>>
>> I have set:
>> export SPARK_WORKER_CORES="2"
>> export SPARK_EXECUTOR_CORES="1"
>>
>> in spark-env.sh, but still nothing happened, and the log keeps showing the
>> same entry:
>> *TaskSchedulerImpl:70 - Initial job has not accepted any resources*
>>
>> So, do I need a separate machine for all this?
>>
>> Please help me to sort that out.
>>
>> Thanks
>> Shams
>>
>
>


does spark need dedicated machines to run on

2016-03-10 Thread Shams ul Haque
Hi,

I have developed a Spark realtime app and started Spark standalone on my
laptop, but when I try to submit the app to Spark it is always
in WAITING state and Cores is always zero.

I have set:
export SPARK_WORKER_CORES="2"
export SPARK_EXECUTOR_CORES="1"

in spark-env.sh, but still nothing happened, and the log keeps showing the
same entry:
*TaskSchedulerImpl:70 - Initial job has not accepted any resources*

So, do I need a separate machine for all this?

Please help me to sort that out.

Thanks
Shams
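
A sketch of a configuration that tends to work for this kind of single-laptop
standalone setup (the numbers are placeholders and assume roughly a 4-core
machine): the cores and memory the app asks for must fit inside what the one
worker advertises, otherwise the app sits in WAITING with zero cores. It is
also worth checking in the master web UI (port 8080) that a worker is
actually registered and showing free cores.

# conf/spark-env.sh
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g

# submit so the request fits inside that offer
bin/spark-submit --master spark://shams-machine:7077 \
  --total-executor-cores 2 --executor-cores 1 --executor-memory 1g \
  --class in.myapp.email.combiner.CombinerRealtime \
  /opt/dev/workspace-luna/combiner_spark/target/combiner-0.0.1-SNAPSHOT.jar

As far as I recall, SPARK_EXECUTOR_CORES in spark-env.sh is only read in YARN
mode; in standalone mode the per-app limits come from the submit flags above
(or spark.cores.max), so that export is not doing anything here.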


using MongoDB Tailable Cursor in Spark Streaming

2016-03-07 Thread Shams ul Haque
Hi,

I want to implement streaming using a MongoDB tailable cursor. I think I have
to extend some class and use its methods to do the work. Please give me a
hint on how I can do this.


Thanks and regards
Shams ul Haque
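
One common way to do this is a custom receiver: extend
org.apache.spark.streaming.receiver.Receiver, start a background thread in
onStart() that tails a capped collection, and hand every document to Spark
with store(). A rough, untested Java sketch follows; the MongoDB 3.x driver
calls and the host/collection names are assumptions on my part.

import com.mongodb.CursorType;
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCursor;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;
import org.bson.Document;

public class MongoTailableReceiver extends Receiver<Document> {

    private final String host;
    private final String db;
    private final String cappedCollection; // tailable cursors need a capped collection

    public MongoTailableReceiver(String host, String db, String cappedCollection) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.host = host;
        this.db = db;
        this.cappedCollection = cappedCollection;
    }

    @Override
    public void onStart() {
        // Receive on a separate thread so onStart() returns immediately.
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // Nothing to do: the loop below checks isStopped() on every iteration.
    }

    private void receive() {
        MongoClient client = null;
        try {
            client = new MongoClient(host);
            MongoCursor<Document> cursor = client.getDatabase(db)
                    .getCollection(cappedCollection)
                    .find()
                    .cursorType(CursorType.TailableAwait) // block waiting for new docs
                    .iterator();
            while (!isStopped() && cursor.hasNext()) {
                store(cursor.next()); // push each new document into Spark
            }
        } catch (Exception e) {
            restart("Error tailing MongoDB collection", e);
        } finally {
            if (client != null) {
                client.close();
            }
        }
    }
}

The stream would then be created with something like
jssc.receiverStream(new MongoTailableReceiver("localhost", "mydb", "events")),
which gives a JavaReceiverInputDStream<Document> to transform as usual.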


Re: merge 3 different types of RDDs in one

2015-12-01 Thread Shams ul Haque
Hi Jacek,

Thanks for the suggestion; I am going to try union.
And what is your opinion on the 2nd question?


Thanks
Shams

On Tue, Dec 1, 2015 at 3:23 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Never done it before, but just yesterday I found out about the
> SparkContext.union method, which could help in your case.
>
> def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
>
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque <sham...@cashcare.in>
> wrote:
> > Hi All,
> >
> > I have made 3 RDDs from 3 different datasets, all grouped by CustomerID;
> > 2 of the RDDs have values of Iterable type and one has a single bean. All
> > RDDs have an id of Long type as the CustomerId. Below are the models for
> > the 3 RDDs:
> > JavaPairRDD<Long, Iterable>
> > JavaPairRDD<Long, Iterable>
> > JavaPairRDD<Long, TransactionAggr>
> >
> > Now I have to merge all 3 RDDs into a single one so that I can generate
> > an Excel report per customer using the data from the 3 RDDs.
> > I tried using the join transformation, but it needs RDDs of the same type
> > and it works for only two RDDs.
> > So my questions are:
> > 1. Is there any way to do my merging task efficiently, so that I can get
> > all 3 datasets by CustomerId?
> > 2. If I merge the first two using the join transformation, do I need to
> > run groupByKey() before the join so that all data related to a single
> > customer will be on one node?
> >
> >
> > Thanks
> > Shams
>
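
Not something suggested in the thread, but worth mentioning: cogroup lines
all three pair RDDs up by key in a single shuffle, which addresses both
questions at once; no prior groupByKey is needed, because cogroup already
brings everything for a CustomerId together. A generic Java sketch (the
concrete value types are left as type parameters since they are not fully
spelled out above):

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple3;

public class MergeByCustomer {

    // first/second could be the two Iterable-valued RDDs and third the
    // JavaPairRDD<Long, TransactionAggr>; cogroup wraps each side in an
    // Iterable because a CustomerId may be missing from (or repeated in)
    // any of the inputs.
    public static <A, B, C>
            JavaPairRDD<Long, Tuple3<Iterable<A>, Iterable<B>, Iterable<C>>>
            mergeByCustomer(JavaPairRDD<Long, A> first,
                            JavaPairRDD<Long, B> second,
                            JavaPairRDD<Long, C> third) {
        return first.cogroup(second, third);
    }
}

Each record of the result holds everything known about one CustomerId, which
can then be mapped into a row of the per-customer Excel report.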


Separate all values from Iterable

2015-10-27 Thread Shams ul Haque
Hi,

I have grouped all my customers in a JavaPairRDD<Long, Iterable<ProductBean>>
by their customerId (of Long type), meaning every customerId has a List of
ProductBean.

Now I want to save every ProductBean to the DB irrespective of customerId. I
got all the values by using the method
JavaRDD<Iterable<ProductBean>> values = custGroupRDD.values();

Now I want to convert this JavaRDD<Iterable<ProductBean>> to a
JavaRDD<BSONObject> so that I can save it to Mongo. Remember, every
BSONObject is made from a single ProductBean.

I am not getting any idea of how to do this in Spark, i.e. which Spark
transformation is used to do that job. I think this task is some kind of
"separate all values from the Iterable". Please let me know how this is
possible. Any hint in Scala or Python is also OK.


Thanks

Shams
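
For the record, flatMap is the transformation usually reached for here:
flatten each Iterable<ProductBean> into individual beans, then map each bean
to a BSONObject. A rough Java sketch against the Spark 1.x API; the
ProductBean getters and BSON field names are made up for illustration.

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;

public class FlattenProducts {

    public static JavaRDD<BSONObject> toBsonRecords(
            JavaPairRDD<Long, Iterable<ProductBean>> custGroupRDD) {

        // Drop the customerId keys, then flatten each Iterable<ProductBean>
        // into individual beans. (In Spark 1.x the Java FlatMapFunction
        // returns an Iterable; in 2.x it returns an Iterator.)
        JavaRDD<ProductBean> products = custGroupRDD.values().flatMap(
                new FlatMapFunction<Iterable<ProductBean>, ProductBean>() {
                    @Override
                    public Iterable<ProductBean> call(Iterable<ProductBean> beans) {
                        return beans;
                    }
                });

        // One BSONObject per ProductBean; the field names and getters below
        // are hypothetical.
        return products.map(new Function<ProductBean, BSONObject>() {
            @Override
            public BSONObject call(ProductBean bean) {
                BasicBSONObject doc = new BasicBSONObject();
                doc.put("productId", bean.getProductId());
                doc.put("name", bean.getName());
                return doc;
            }
        });
    }
}

In Scala the same thing is custGroupRDD.values.flatMap(identity).map(toBson);
from the resulting RDD of BSONObject, the usual route to Mongo at the time
was the mongo-hadoop connector (e.g. saveAsNewAPIHadoopFile with
MongoOutputFormat).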