Hi All,
Please check this JIRA ticket regarding the issue. I was having the same issue
with shuffling; it seems the maximum size of a single shuffle block is 2 GB.
https://issues.apache.org/jira/browse/SPARK-5928
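As a rough illustration of that 2 GB limit (a plain-Python back-of-the-envelope sketch; the even-spread assumption and the helper name are mine, not from the thread), the limit implies a minimum partition count for a given shuffle volume:

```python
# Rough sketch: minimum partitions needed to keep each shuffle block
# under the ~2 GB limit described in SPARK-5928, assuming values are
# spread evenly across partitions (real jobs skew, so use far more).
import math

BLOCK_LIMIT_BYTES = 2 * 1024**3  # ~2 GB per shuffle block

def min_partitions(shuffle_bytes: int, limit: int = BLOCK_LIMIT_BYTES) -> int:
    """Smallest partition count so no single block exceeds the limit."""
    return math.ceil(shuffle_bytes / limit)

# A 100 GB shuffle needs at least 50 partitions under this model.
print(min_partitions(100 * 1024**3))
```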
> On Feb 11, 2016, at 9:08 AM, arun.bong...@cognizant.com wrote:
> 
> Hi All,
> 
> I am having the same issue.
> 
> My EMR setup is a 3-node cluster of m3.2xlarge instances.
> 
> I'm trying to read a 100 GB file in spark-sql.
> 
> I have set the following on Spark:
> 
> export SPARK_EXECUTOR_MEMORY=4G
> export SPARK_DRIVER_MEMORY=12G
> 
> export SPARK_EXECUTOR_INSTANCES=16
> export SPARK_EXECUTOR_CORES=16
> 
> spark.kryoserializer.buffer.max 2000m
> spark.driver.maxResultSize 0
> 
>  -XX:MaxPermSize=1024M
> 
> 
> PFB the error:
> 
> 16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)
> 
> Kindly help me understand the conf.
> 
> 
> Thanks in advance.
> 
> Regards
> Arun.
> 
> From: Kuchekar [kuchekar.nil...@gmail.com]
> Sent: 11 February 2016 09:42
> To: Nirav Patel
> Cc: spark users
> Subject: Re: Spark execuotr Memory profiling
> 
> Hi Nirav,
> 
> I faced a similar issue with YARN on EMR (Spark 1.5.2), and the following Spark conf helped me. You can adjust the values accordingly:
> 
> conf = (SparkConf()
>         .set("spark.master", "yarn-client")
>         .setAppName("HalfWay")
>         .set("spark.driver.memory", "15G")
>         .set("spark.yarn.am.memory", "15G"))
> 
> conf = (conf.set("spark.driver.maxResultSize", "10G")
>             .set("spark.storage.memoryFraction", "0.6")
>             .set("spark.shuffle.memoryFraction", "0.6")
>             .set("spark.yarn.executor.memoryOverhead", "4000"))
> 
> conf = (conf.set("spark.executor.cores", "4")
>             .set("spark.executor.memory", "15G")
>             .set("spark.executor.instances", "6"))
> 
> 
> It may also be possible to use reduceByKey in place of groupByKey, which could help with the shuffling too.
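To illustrate that suggestion with a toy sketch (plain Python with made-up data, not PySpark; the helper names are mine): reduceByKey performs a map-side combine before the shuffle, so each partition ships one record per distinct key, whereas groupByKey ships every raw pair.

```python
# Toy model of why reduceByKey shuffles less than groupByKey.
# Each "map partition" is a list of (key, value) pairs; the shuffle
# cost is the number of records each partition must send.

def shuffle_records_groupby(partition):
    # groupByKey ships every raw (key, value) pair across the network.
    return len(partition)

def shuffle_records_reduceby(partition, combine=lambda a, b: a + b):
    # reduceByKey first combines values per key locally (map-side
    # combine), then ships one record per distinct key.
    combined = {}
    for k, v in partition:
        combined[k] = combine(combined[k], v) if k in combined else v
    return len(combined)

partition = [("a", 1)] * 1000 + [("b", 1)] * 1000  # 2000 pairs, 2 keys
print(shuffle_records_groupby(partition))   # 2000 records shuffled
print(shuffle_records_reduceby(partition))  # 2 records shuffled
```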
> 
> 
> Kuchekar, Nilesh
> 
> On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
> We have been trying to solve a memory issue with a Spark job that processes 150 GB of data (on disk). It does a groupBy operation; some of the executors receive somewhere around 2-4 million Scala case objects to work with. We are using the following Spark config:
> 
> "executorInstances": "15",
> "executorCores": "1",  (we reduce it to one so a single task gets all the executorMemory! at least that's the assumption here)
> "executorMemory": "15000m",
> "minPartitions": "2000",
> "taskCpus": "1",
> "executorMemoryOverhead": "1300",
> "shuffleManager": "tungsten-sort",
> "storageFraction": "0.4"
> 
> 
> 
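As a sanity check on those numbers (my own arithmetic, not from the thread): under YARN, each executor container requests roughly spark.executor.memory plus the memory overhead:

```python
# Sketch of the YARN container request implied by the config above:
# container size ~= spark.executor.memory + executor memoryOverhead.
executor_memory_mb = 15000   # "executorMemory": "15000m"
overhead_mb = 1300           # "executorMemoryOverhead": "1300"
instances = 15               # "executorInstances": "15"

container_mb = executor_memory_mb + overhead_mb
total_mb = container_mb * instances
print(container_mb)  # 16300 MB requested per executor container
print(total_mb)      # 244500 MB across all 15 executors
```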
> This is a snippet of what we see in the Spark UI for a job that fails; this is the stage of that job that fails:
> 
> Stage Id: 5 (retry 15)
> Pool Name: prod
> Description: map at SparkDataJobs.scala:210
> Submitted: 2016/02/09 21:30:06
> Duration: 13 min
> Tasks (Succeeded/Total): 130/389 (16 failed)
> Shuffle Read: 1982.6 MB
> Shuffle Write: 818.7 MB
> Failure Reason: org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data, offset=11421300, length=2353}
> This is one of the task attempts from the above stage that threw the OOM:
> 
> Index: 2, Task ID: 22361, Attempt: 0, Status: FAILED, Locality: PROCESS_LOCAL, Executor: 38 / nd1.mycom.local, Launched: 2016/02/09 22:10:42, Duration: 5.2 min, GC Time: 1.6 min, Shuffle Read: 7.4 MB / 375509 records
> java.lang.OutOfMemoryError: Java heap space
>       at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
>       at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
>       at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
>       at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
>       at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
>       at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
>       at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
>       at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>       at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>       at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
> 
> 
> None of the above suggests that it ran out of the 15 GB of memory I initially allocated. So what am I missing here? What's eating my memory?
> 
> We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:
> 
> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p' -XX:HeapDumpPath=/opt/cores/spark
> 
> I don't see any core files being generated, nor can I find a heap dump anywhere in the logs.
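One observation on those options (mine, not from the thread): the leading minus in -XX:-HeapDumpOnOutOfMemoryError disables the heap dump (a plus enables it), and kill -3 produces a thread dump on stderr, not a core file. Assuming the standard spark.executor.extraJavaOptions property and a path that exists and is writable on every node, an enabled variant would look like:

```
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark
```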
> 
> Also, how do I find the YARN container ID for a given Spark executor ID, so that I can investigate the YARN NodeManager and ResourceManager logs for that particular container?
> 
> PS - The job does not cache any intermediate RDDs, as each RDD is used only once by the subsequent step. We use Spark 1.5.2 on YARN in yarn-client mode.
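One hedged approach to the container question (an assumption on my part, not from the thread): with YARN log aggregation enabled, `yarn logs` dumps every container's logs for the application, with each section headed by its container ID, so you can search the output for the executor ID shown in the Spark UI. For example, using the application ID visible in the shuffle path above:

```
yarn logs -applicationId application_1454975800192_0447
```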
> 
> 
> 
> Thanks
> 
> 
> 
> 
