Re: processing 50 gb data using just one machine

2016-06-15 Mich Talebzadeh
50 GB of data is not much. Besides --master local[4], what other parameters are you passing? For example:

    ${SPARK_HOME}/bin/spark-submit \
      --driver-memory 4G \
      --num-executors 1 \
      --executor-memory 4G \
      --master local[4] \

Try running it
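A minimal Python sketch of launching a similar job from a script. It assumes `spark-submit` lives under `$SPARK_HOME` (the `/opt/spark` fallback is an assumption) and uses a hypothetical script name, `process_50gb.py`, because the original command is cut off before the application argument. As far as I know, in local mode the executor runs inside the driver JVM, so `--driver-memory` is the setting that sizes the heap, while `--num-executors` is meant for cluster managers such as YARN. For that reason the sketch only sets the master and the driver memory.

```python
import os
import shlex


def build_spark_submit(app, master="local[4]", driver_memory="4G", extra_args=None):
    """Assemble a spark-submit command line as an argument list.

    `app` is a hypothetical path to your PySpark script; it is not
    part of the original message.
    """
    # Fallback location is an assumption; set SPARK_HOME to override it.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    cmd = [
        os.path.join(spark_home, "bin", "spark-submit"),
        "--master", master,
        # In local mode the executor shares the driver JVM,
        # so the driver's heap is the one that matters.
        "--driver-memory", driver_memory,
    ]
    cmd += list(extra_args or [])
    cmd.append(app)  # the application script always goes last
    return cmd


cmd = build_spark_submit("process_50gb.py")
print(shlex.join(cmd))  # shell-quoted form, handy for copy-pasting
```

To actually launch the job, pass the list to `subprocess.run(cmd, check=True)`; using a list instead of a single string avoids shell-quoting problems with arguments like `local[4]`.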

Re: processing 50 gb data using just one machine

2016-06-15 spR
Thanks! Got that. I was worried about the time itself. (Replying to Sergio Fernández, Wed, Jun 15, 2016 at 10:10 AM.)

Re: processing 50 gb data using just one machine

2016-06-15 spR
I meant that local mode is generally for testing purposes. But I have to use the entire 50 GB of data. (Replying to Deepak Goel, Wed, Jun 15, 2016 at 10:14 AM.)

Re: processing 50 gb data using just one machine

2016-06-15 Deepak Goel
If it is just for testing purposes, why not test on a smaller subset of the data on your notebook? When you move to a cluster, you can run the full 50 GB. (I am a newbie, so my thinking may be naive.)
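Deepak's suggestion of testing on a smaller subset can be done without loading the whole file. Below is a minimal Python sketch, assuming line-oriented input; the function name, paths, and sampling fraction are hypothetical. It streams the file and keeps each line independently with a fixed probability (Bernoulli sampling), with a fixed seed so the sample is reproducible.

```python
import random


def sample_lines(src_path, dst_path, fraction=0.01, seed=42):
    """Copy roughly `fraction` of the lines of src_path into dst_path.

    The file is read one line at a time, so memory use stays small
    no matter how large the input is. Returns the number of lines written.
    """
    rng = random.Random(seed)  # fixed seed -> same sample on every run
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Keep each line independently with probability `fraction`.
            if rng.random() < fraction:
                dst.write(line)
                kept += 1
    return kept
```

Inside Spark itself, `df.sample(False, 0.01, seed=42)` does the equivalent per-row sampling on a DataFrame, which lets you develop against the sample and switch to the full input later without changing the rest of the job.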

Re: processing 50 gb data using just one machine

2016-06-15 Sergio Fernández
In theory, yes. Common sense says that volume / resources = time, so more volume on the same processing resources just takes more time. (Replying to spR, Jun 15, 2016 6:43 PM.)
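Sergio's rule can be turned into a back-of-envelope estimate: time a run on a small sample, then scale it up. Below is a minimal sketch, assuming runtime grows linearly with data volume and shrinks linearly with cores. Real jobs usually scale worse than this, especially once shuffle data spills to disk. The function name and the 1 GB / 100 s sample run are hypothetical.

```python
def extrapolate_runtime(sample_gb, sample_seconds, full_gb, sample_cores=4, full_cores=4):
    """Scale a measured sample runtime up to the full data set.

    Assumes perfectly linear scaling in both volume and cores,
    i.e. "volume / resources = time".
    """
    if sample_gb <= 0 or full_cores <= 0:
        raise ValueError("sample size and core count must be positive")
    volume_ratio = full_gb / sample_gb         # more data -> proportionally longer
    resource_ratio = sample_cores / full_cores  # more cores -> proportionally shorter
    return sample_seconds * volume_ratio * resource_ratio


# Hypothetical measurement: 1 GB took 100 s on local[4].
print(extrapolate_runtime(1, 100, 50))  # 5000.0
```

Treat the result as a lower bound. If the full run takes much longer than the extrapolation, that usually points to memory pressure or spilling rather than raw volume.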

Re: processing 50 gb data using just one machine

2016-06-15 spR
I have 16 GB of RAM and an i7. Will this configuration be able to handle the processing without my IPython notebook dying? Local mode is meant for testing, but I do not have any cluster at my disposal. So can I make this work with the configuration that I have? Thank you. (Replying to Deepak, Jun 15, 2016 9:40 AM.)
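On the memory question: the 50 GB does not have to fit in 16 GB of RAM. Spark reads and processes the input one partition at a time and can spill shuffle and sort data to disk. The notebook is more likely to die if you pull large results back to the driver (for example with `collect()` or `toPandas()`), or if you give the JVM a heap larger than the machine can spare. The sketch below estimates how much of a given heap Spark can use for execution and caching. It assumes the unified-memory defaults documented for Spark 2.x (300 MB reserved, `spark.memory.fraction` = 0.6), so check the docs for the version you actually run. The 10G heap is a hypothetical value, not a recommendation from the thread.

```python
# Assumed Spark 2.x defaults for unified memory management.
RESERVED_MB = 300       # memory Spark sets aside before splitting the heap
MEMORY_FRACTION = 0.6   # default spark.memory.fraction


def unified_memory_mb(heap_mb, fraction=MEMORY_FRACTION):
    """Memory shared by execution (shuffles, joins, sorts) and storage (cache)."""
    return (heap_mb - RESERVED_MB) * fraction


def max_execution_per_task_mb(heap_mb, concurrent_tasks):
    """Rough upper bound on execution memory per task when all slots are busy.

    Each of N active tasks can claim up to 1/N of the execution pool.
    This treats the whole unified region as free for execution
    (nothing cached), which is optimistic.
    """
    return unified_memory_mb(heap_mb) / concurrent_tasks


# 4G comes from the suggested command; 10G is a hypothetical choice on a 16 GB laptop.
for heap_gb in (4, 10):
    heap_mb = heap_gb * 1024
    print(heap_gb, round(unified_memory_mb(heap_mb)), round(max_execution_per_task_mb(heap_mb, 4)))
```

The numbers show why a 4G driver heap is tight with `local[4]`: each task gets only a few hundred MB of working memory before Spark starts spilling.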

processing 50 gb data using just one machine

2016-06-15 spR
Hi, can I use Spark in local mode with 4 cores to process 50 GB of data efficiently? Thank you, misha
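To see why four cores can get through 50 GB with a modest heap, here is a rough Python sketch of how a file-based DataFrame scan is split. It assumes the default `spark.sql.files.maxPartitionBytes` of 128 MB and ignores per-file boundaries and `spark.sql.files.openCostInBytes`. The function name is hypothetical, and RDD text input is split differently. With `local[4]`, only four partitions are processed at a time, so the input in flight is on the order of 512 MB rather than 50 GB. The job simply runs in many sequential waves.

```python
import math


def scan_plan(data_gb, cores=4, partition_mb=128):
    """Rough split of an input scan across local[cores].

    Returns (partitions, waves, in_flight_mb), where a "wave" is one batch
    of partitions running concurrently, one per core.
    """
    partitions = math.ceil(data_gb * 1024 / partition_mb)
    waves = math.ceil(partitions / cores)
    # Only `cores` partitions are read at once, never the whole data set.
    in_flight_mb = min(partitions, cores) * partition_mb
    return partitions, waves, in_flight_mb


print(scan_plan(50))  # (400, 100, 512)
```

Efficiency then depends mostly on what happens per partition. Narrow operations such as filters and maps stream through, while wide operations such as joins, `groupBy`, and sorts shuffle data and are where the heap and local disk get stressed.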