Hi All,
Please see this JIRA ticket regarding the issue. I was having the same issue
with shuffling; it seems a single shuffle block cannot exceed 2 GB.
https://issues.apache.org/jira/browse/SPARK-5928
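To make the limit concrete, here is a rough sketch (illustrative arithmetic with a hypothetical helper, not a Spark API): since a shuffle block over 2 GB fails to fetch, the shuffle output needs to be spread over enough partitions that every block stays well under that cap.

```python
# Illustrative arithmetic around SPARK-5928: a single shuffle block over
# 2 GB cannot be fetched, so spread the shuffle over enough partitions.
TWO_GB = 2 * 1024**3            # per-block hard limit
TARGET = 128 * 1024**2          # a commonly used target partition size (~128 MB)

def min_partitions(total_shuffle_bytes, target_bytes=TARGET):
    """Smallest partition count keeping each partition near target_bytes."""
    assert target_bytes < TWO_GB
    return -(-total_shuffle_bytes // target_bytes)   # ceiling division

# e.g. ~100 GB of shuffle data -> 800 partitions of ~128 MB each
parts = min_partitions(100 * 1024**3)
```

With numbers like these, raising spark.sql.shuffle.partitions (or minPartitions) is usually a safer lever than raising executor memory.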
> On Feb 11, 2016, at 9:08 AM, arun.bong...@cognizant.com wrote:
>
> Hi All,
>
> I am having the same issue.
>
> EMR conf is 3 node cluster with m3.2xlarge.
>
> I'm trying to read a 100 GB file in Spark SQL.
>
> I have set the following in Spark:
>
> export SPARK_EXECUTOR_MEMORY=4G
> export SPARK_DRIVER_MEMORY=12G
>
> export SPARK_EXECUTOR_INSTANCES=16
> export SPARK_EXECUTOR_CORES=16
>
> spark.kryoserializer.buffer.max 2000m
> spark.driver.maxResultSize 0
>
> -XX:MaxPermSize=1024M
>
>
> PFB the error:
>
> 16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception
> for block BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
> java.io.EOFException: Premature EOF: no length prefix available
> at
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)
>
> Kindly help me understand the conf.
>
>
> Thanks in advance.
>
> Regards
> Arun.
>
> From: Kuchekar [kuchekar.nil...@gmail.com]
> Sent: 11 February 2016 09:42
> To: Nirav Patel
> Cc: spark users
> Subject: Re: Spark executor Memory profiling
>
> Hi Nirav,
>
> I faced a similar issue with YARN, EMR 1.5.2, and the following
> Spark conf helped me. You can set the values accordingly:
> conf = (SparkConf()
>         .set("spark.master", "yarn-client")
>         .setAppName("HalfWay")
>         .set("spark.driver.memory", "15G")
>         .set("spark.yarn.am.memory", "15G"))
>
> conf = (conf
>         .set("spark.driver.maxResultSize", "10G")
>         .set("spark.storage.memoryFraction", "0.6")
>         .set("spark.shuffle.memoryFraction", "0.6")
>         .set("spark.yarn.executor.memoryOverhead", "4000"))
>
> conf = (conf
>         .set("spark.executor.cores", "4")
>         .set("spark.executor.memory", "15G")
>         .set("spark.executor.instances", "6"))
>
>
> It may also be possible to use reduceByKey in place of groupByKey, which
> could reduce the shuffle volume too.
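To illustrate why that helps, here is a plain-Python sketch (a hypothetical helper, not the Spark API): reduceByKey combines values per key inside each partition before the shuffle, so only one partial result per key per partition crosses the network, whereas groupByKey ships every raw record.

```python
# Plain-Python sketch of reduceByKey's map-side combine: values are
# pre-aggregated by key within a partition *before* the shuffle.
def map_side_combine(partition, reduce_fn):
    """Pre-aggregate one partition's (key, value) pairs by key."""
    combined = {}
    for key, value in partition:
        combined[key] = reduce_fn(combined[key], value) if key in combined else value
    return sorted(combined.items())

# One partition with repeated keys: groupByKey would ship all 5 raw records;
# map-side combine ships only 2 (one partial sum per key).
partition = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
combined = map_side_combine(partition, lambda x, y: x + y)
```

With millions of case objects per key, that difference in shuffled records is exactly what blows up (or saves) executor memory.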
>
>
> Kuchekar, Nilesh
>
> On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com>
> wrote:
> We have been trying to solve a memory issue with a Spark job that processes
> 150 GB of data (on disk). It does a groupBy operation; some of the executors
> will receive somewhere around 2-4M Scala case objects to work with. We are
> using the following Spark config:
>
> "executorInstances": "15",
>
> "executorCores": "1", (we reduce it to one so single task gets all the
> executorMemory! at least that's the assumption here)
>
> "executorMemory": "15000m",
>
> "minPartitions": "2000",
>
> "taskCpus": "1",
>
> "executorMemoryOverhead": "1300",
>
> "shuffleManager": "tungsten-sort",
>
>
> "storageFraction": "0.4"
>
>
>
> This is a snippet of what we see in the Spark UI for a job that fails.
>
> This is the stage of this job that fails:
>
>
> Stage Id: 5 (retry 15)
> Pool Name: prod
> Description: map at SparkDataJobs.scala:210
> Submitted: 2016/02/09 21:30:06
> Duration: 13 min
> Tasks: Succeeded/Total: 130/389 (16 failed)
> Shuffle Read: 1982.6 MB
> Shuffle Write: 818.7 MB
> Failure Reason: org.apache.spark.shuffle.FetchFailedException:
> Error in opening
> FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data,
> offset=11421300, length=2353}
> This is one of the task attempts from the above stage that threw an OOM:
>
>
> Index: 2   Task ID: 22361   Attempt: 0   Status: FAILED
> Locality Level: PROCESS_LOCAL   Executor ID / Host: 38 / nd1.mycom.local
> Launch Time: 2016/02/09 22:10:42   Duration: 5.2 min   GC Time: 1.6 min
> Shuffle Read Size / Records: 7.4 MB / 375509
> java.lang.OutOfMemoryError: Java heap space
> at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
> at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
> at
> org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
> at
> org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
> at
> org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at
> org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
> at
> org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
> at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
> at
> org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
> at
> org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
> at
> org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
>
>
> None of the above suggests that it exceeded the 15 GB of memory I initially
> allocated. So what am I missing here? What's eating my memory?
>
> We tried executor Java opts to get a heap dump, but it doesn't seem to work:
>
>
> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
> -XX:HeapDumpPath=/opt/cores/spark
>
> I don't see any core files being generated, nor can I find a heap dump
> anywhere in the logs.
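One thing worth checking (an observation about standard HotSpot flag syntax, not something confirmed in this thread): `-XX:-HeapDumpOnOutOfMemoryError` uses a minus after the colon, which disables the option; enabling it requires a plus. A corrected sketch, keeping the same (original) dump path:

```shell
# '+' enables a boolean -XX option, '-' disables it. The opts above used
# '-XX:-HeapDumpOnOutOfMemoryError', which turns heap dumps OFF.
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/cores/spark
-XX:OnOutOfMemoryError='kill -3 %p'
```

The dump directory must also exist and be writable by the YARN container user on every node, or the JVM will silently fail to write the dump.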
>
> Also, how do I find the YARN container ID for a given Spark executor ID, so
> that I can investigate the YARN NodeManager and ResourceManager logs for that
> particular container?
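One approach (a sketch assuming the standard YARN CLI with log aggregation enabled; the application ID is taken from the shuffle file path above): aggregated logs print a `Container:` header for each container together with its host, which can be matched against the executor's host shown in the Spark UI.

```shell
# List each container and its host for the application; match the host
# (e.g. nd1.mycom.local above) against the failed executor's host.
yarn logs -applicationId application_1454975800192_0447 | grep "^Container:"
```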
> PS: The job does not cache any intermediate RDDs, as each RDD is used only
> once by the subsequent step. We use Spark 1.5.2 on YARN in yarn-client mode.
>
>
>
> Thanks
>