Spark cluster tuning recommendation

2016-07-11 Thread Kartik Mathur
I am trying to run terasort in Spark on a 7-node cluster with only 10g of data, and executors get lost with a "GC overhead limit exceeded" error. This is what my cluster looks like:
- *Alive Workers:* 7
- *Cores in use:* 28 Total, 2 Used
- *Memory in use:* 56.0 GB Total, 1024.0 MB Used
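The numbers above suggest the executors are running with Spark's 1 GB default heap (1024 MB in use out of 56 GB available), which would explain the GC pressure. A hedged sketch of a submission that raises the per-executor heap; the hostname, sizes, main class, and paths are illustrative, not from the thread:

    spark-submit \
      --master spark://master-host:7077 \
      --executor-memory 6g \
      --total-executor-cores 28 \
      --class com.example.TeraSort \
      terasort.jar hdfs:///teragen/in hdfs:///terasort/out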

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
> Thanks,
> Prabhu Joseph
>
> On Mon, Feb 15, 2016 at 12:34 PM, Kartik Mathur <kar...@bluedata.com> wrote:
>
>> Thanks Prabhu,
>>
>> I had wrongly configured spark_master_ip on the worker nodes to `hostname -f`,
>> which resolves to the worker itself and not the master,
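A sketch of the fix described above, for conf/spark-env.sh on each worker node (the hostname is illustrative): SPARK_MASTER_IP must name the master, so evaluating `hostname -f` on a worker points the worker at itself.

    # conf/spark-env.sh on every worker node
    export SPARK_MASTER_IP=master-host
    # equivalently, start each worker against an explicit master URL:
    # ./sbin/start-slave.sh spark://master-host:7077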

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
> Thanks,
> Prabhu Joseph
>
> On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur <kar...@bluedata.com> wrote:
>
>> on spark 1.5.2
>> I have a spark standalone cluster with 6 workers. I left the cluster
>> idle for 3 days, and after 3 days I saw only 4 workers on the

Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
On spark 1.5.2: I have a spark standalone cluster with 6 workers. I left the cluster idle for 3 days, and after 3 days I saw only 4 workers on the spark master UI; 2 workers died with the same exception. The strange part is that the cluster was running stable for 2 days, but on the third day 2 workers abruptly

Re: Huge shuffle data size

2015-10-23 Thread Kartik Mathur
Don't use groupBy, use reduceByKey instead; groupBy should always be avoided as it leads to a lot of shuffle reads/writes.

On Fri, Oct 23, 2015 at 11:39 AM, pratik khadloya wrote:
> Sorry i sent the wrong join code snippet, the actual snippet is
>
> ggImpsDf.join(
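A minimal spark-shell sketch of the contrast (names illustrative; groupByKey here stands in for the groupBy in the thread): reduceByKey combines values map-side before the shuffle, while groupByKey ships every individual value across the network.

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))
    // ships every (key, value) record across the network, then sums
    val slow = pairs.groupByKey().mapValues(_.sum)
    // pre-aggregates within each partition, shuffling one partial sum per key
    val fast = pairs.reduceByKey(_ + _)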

Re: How does shuffle work in spark ?

2015-10-20 Thread Kartik Mathur
That will depend on what your transformation is; your code snippet might help.

On Tue, Oct 20, 2015 at 1:53 AM, shahid ashraf <sha...@trialx.com> wrote:
> Hi
>
> Any idea why there is 50 GB shuffle read and write for 3.3 gb data
>
> On Mon, Oct 19, 2015 at 11:58 P

Spark Master Dying saying TimeoutException

2015-10-14 Thread Kartik Mathur
Hi, I have some nightly jobs which run every night but sometimes die because of an unresponsive master. The spark master logs say the following, and I am not seeing much else there; what could possibly cause an exception like this? *Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out
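For context, Spark 1.x uses Akka-based RPC whose futures carry fixed timeouts; raising them in conf/spark-defaults.conf is a common mitigation, though it treats the symptom rather than whatever is making the master unresponsive. Values are illustrative:

    spark.network.timeout   300s
    spark.akka.timeout      300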

Re: Spark Master Dying saying TimeoutException

2015-10-14 Thread Kartik Mathur
Retrying what? I want to know why it died, and what can I do to prevent it?

On Wed, Oct 14, 2015 at 5:20 PM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
> I fixed these timeout errors by retrying...
> On Oct 15, 2015 3:41 AM, "Kartik Mathur" <kar...@blu

Re: DEBUG level log in receivers and executors

2015-10-12 Thread Kartik Mathur
You can create log4j.properties under your SPARK_HOME/conf and set up these properties:

    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
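Since the thread asks for DEBUG level specifically, a sketch of the two usual tweaks (the logger name is illustrative): raise the root level, or scope DEBUG to one logger to keep the noise down.

    # everything at DEBUG (very verbose)
    log4j.rootCategory=DEBUG, console
    # or only one subsystem at DEBUG
    log4j.logger.org.apache.spark.storage=DEBUG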

Re: Problem understanding spark word count execution

2015-10-02 Thread Kartik Mathur
e, convert to whatever the InputFormat dictates.*
>
> The shuffle can only be the part when a node opens an HDFS file, for
> instance, but the node does not have a local replica of the blocks which it
> needs to read (those pertaining to its assigned partitions). So it needs to
> pick them

Re: Problem understanding spark word count execution

2015-10-02 Thread Kartik Mathur
> needs to read (those pertaining to its assigned partitions). So it needs to
> pick them up from remote nodes which do have replicas of that data.
>
> After blocks are read into memory, flatMap and map are local computations
> generating new RDDs, and in the end the result is sent to the driver

Re: Problem understanding spark word count execution

2015-10-01 Thread Kartik Mathur
flatMap and map are narrow dependencies, meaning
> they can usually happen on the local node; I bet the shuffle is just sending
> out the textFile to a few nodes to distribute the partitions.
>
> --
> *From:* Kartik Mathur <kar...@bluedata.com>
> *Sent:

Shuffle Write v/s Shuffle Read

2015-10-01 Thread Kartik Mathur
Hi, I am trying to better understand shuffle in spark. Based on my understanding thus far, *Shuffle Write*: writes stage output for an intermediate stage to local disk if memory is not sufficient. Example: if each worker has 200 MB memory for intermediate results and the results are 300MB, then
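For reference, a minimal spark-shell sketch that produces both metrics (numbers illustrative): the map-side stage reports Shuffle Write, the bytes its tasks wrote to local disk, and the following stage reports Shuffle Read, the bytes it fetched back, locally or from remote executors.

    val data = sc.parallelize(1 to 1000000).map(i => (i % 100, i.toLong))
    data.reduceByKey(_ + _).count()  // two stages; compare both columns in the UI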

Re: Problem understanding spark word count execution

2015-10-01 Thread Kartik Mathur
You can share more of your context if it is still unclear.
> I just made assumptions to give clarity on a similar thing.
>
> Nicu
> --
> *From:* Kartik Mathur <kar...@bluedata.com>
> *Sent:* Thursday, October 1, 2015 10:25 PM
> *To:* Nicolae Marasoiu
>

Problem understanding spark word count execution

2015-09-30 Thread Kartik Mathur
Hi All, I tried running spark word count and I have a couple of questions. I am analyzing stage 0, i.e. *sc.textFile -> flatMap -> Map (Word count example)*.
1) In the *Stage logs* under Application UI details, for every task I am seeing Shuffle write as 2.7 KB. *question - how can I know where
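The pipeline under discussion, annotated as a sketch (paths illustrative): the two narrow transformations run entirely inside stage 0, and the shuffle write at the end of the stage is the map output for whatever wide operation follows.

    val file   = sc.textFile("hdfs:///input/words.txt")  // stage 0 begins
    val words  = file.flatMap(_.split(" "))              // narrow: no shuffle
    val pairs  = words.map(word => (word, 1))            // narrow: no shuffle
    val counts = pairs.reduceByKey(_ + _)                // wide: stage boundary;
                                                         // stage 0 writes its shuffle data here
    counts.saveAsTextFile("hdfs:///output/wordcount")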

Re: SQL queries in Spark / YARN

2015-09-28 Thread Kartik Mathur
Hey Robert, you could use Zeppelin instead if you don't want to use beeline.

On Monday, September 28, 2015, Robert Grandl wrote:
> Thanks Mark. Do you know how? In Spark standalone mode I use beeline to
> submit SQL scripts.
>
> In Spark/YARN, the only way I can see

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
Hey Rick, not sure on this, but a similar situation happened with me: when starting spark-shell, it was starting a new cluster instead of using the existing cluster, and this new cluster was a single-node cluster. That's why jobs were taking forever to complete from spark-shell and were running much
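A sketch of the usual fix (URL and core count illustrative): pass the existing standalone master explicitly so spark-shell attaches to it rather than spinning up its own.

    spark-shell --master spark://master-host:7077 --total-executor-cores 8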

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
e at least.
>
> Best,
>
> Rick
>
> On Mon, Sep 28, 2015 at 8:24 PM, Kartik Mathur <kar...@bluedata.com> wrote:
>
>> Hey Rick,
>> Not sure on this, but a similar situation happened with me: when starting
>> spark-shell it was starting a new clust