[ https://issues.apache.org/jira/browse/SPARK-32277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sathyaprakash Govindasamy updated SPARK-32277:
----------------------------------------------
Description:
We have a Spark application that uses JDBC to read a table from SQL Server and write it as Parquet files in S3. It simply reads data over JDBC and writes it out as Parquet. We are using numPartitions = 200, so the job runs 200 tasks and produces 200 Parquet files, each ranging from 500 to 700 MB in size. Since the maximum file size is 700 MB, we assigned more than enough executor memory (10 GB). Out of 200 tasks, roughly 2-3 fail on the first attempt with the error below and succeed on retry:
{noformat}
ExecutorLostFailure (executor 13 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 12.1 GB of 11.9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
{noformat}
I used the [JVM Profiler|https://github.com/uber-common/jvm-profiler] to see what is happening.

First run:
{noformat}
ConsoleOutputReporter - ProcessInfo: {"jvmInputArguments":"","role":"executor","jvmClassPath":"","epochMillis":1594423898037,"cmdline":"/usr/lib/jvm/java-openjdk/bin/java -server -Xmx10240m -Dlog4j.configuration=log4j_custom.properties -Droot.logging.level=INFO -Dapp.logging.level=DEBUG -javaagent:jvm-profiler-1.0.0.jar -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/masked/masked/tmp -Dspark.driver.port=40795 -Dspark.history.ui.port=18080 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/masked/masked -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@masked:40795 --executor-id 10 --hostname masked.us-west-2.compute.internal --cores 1 --app-id masked --user-class-path file:/mnt/yarn/usercache/hadoop/appcache/masked/masked/__app__.jar --user-class-path file:/mnt/yarn/usercache/hadoop/appcache/masked/masked/sqljdbc42.jar ","appId":"masked","name":"31286@masked","host":"masked","processUuid":"9a6b4f57-d5df-4dfe-87bd-3205392d2a6c","agentVersion":"1.0.0","appClass":null,"xmxBytes":10737418240,"appJar":null}
ConsoleOutputReporter - CpuAndMemory: {"heapMemoryMax":9.54466304E9,"role":"executor","nonHeapMemoryTotalUsed":1.06792856E8,"bufferPools":[{"totalCapacity":9069172456,"name":"direct","count":280,"memoryUsed":9069172457},{"totalCapacity":0,"name":"mapped","count":0,"memoryUsed":0}],"heapMemoryTotalUsed":2.5333354E9,"vmRSS":13090037760,"epochMillis":1594424049590,"nonHeapMemoryCommitted":1.1071488E8,"heapMemoryCommitted":4.758437888E9,"memoryPools":[{"peakUsageMax":251658240,"usageMax":251658240,"peakUsageUsed":24027264,"name":"Code Cache","peakUsageCommitted":24248320,"usageUsed":21922944,"type":"Non-heap memory","usageCommitted":24248320},{"peakUsageMax":-1,"usageMax":-1,"peakUsageUsed":75585800,"name":"Metaspace","peakUsageCommitted":76898304,"usageUsed":75585800,"type":"Non-heap memory","usageCommitted":76898304},{"peakUsageMax":1073741824,"usageMax":1073741824,"peakUsageUsed":9284112,"name":"Compressed Class Space","peakUsageCommitted":9568256,"usageUsed":9284112,"type":"Non-heap memory","usageCommitted":9568256},{"peakUsageMax":3468165120,"usageMax":3044016128,"peakUsageUsed":3054501888,"name":"PS Eden Space","peakUsageCommitted":3054501888,"usageUsed":2171684568,"type":"Heap memory","usageCommitted":3036151808},{"peakUsageMax":280494080,"usageMax":260571136,"peakUsageUsed":227304480,"name":"PS Survivor Space","peakUsageCommitted":280494080,"usageUsed":189281504,"type":"Heap memory","usageCommitted":260571136},{"peakUsageMax":7158628352,"usageMax":7158628352,"peakUsageUsed":172369328,"name":"PS Old Gen","peakUsageCommitted":1461714944,"usageUsed":172369328,"type":"Heap memory","usageCommitted":1461714944}],"processCpuLoad":0.025501861918418012,"systemCpuLoad":0.3168226139213957,"processCpuTime":208310000000,"vmHWM":13090111488,"appId":"masked","vmPeak":22408749056,"name":"31286@masked","host":"masked","processUuid":"9a6b4f57-d5df-4dfe-87bd-3205392d2a6c","nonHeapMemoryMax":-1.0,"vmSize":22408744960,"gc":[{"collectionTime":799,"name":"PS Scavenge","collectionCount":25},{"collectionTime":489,"name":"PS MarkSweep","collectionCount":3}]}
{noformat}
The important thing to note above is that the direct buffer pool is using about 9 GB:
{noformat}
"bufferPools":[{"totalCapacity":9069172456,"name":"direct","count":280,"memoryUsed":9069172457}
{noformat}
Second attempt:
{noformat}
ConsoleOutputReporter - CpuAndMemory: {"heapMemoryMax":9.54466304E9,"role":"executor","nonHeapMemoryTotalUsed":1.08760784E8,"bufferPools":[{"totalCapacity":186613945,"name":"direct","count":9,"memoryUsed":186613946},{"totalCapacity":0,"name":"mapped","count":0,"memoryUsed":0}],"heapMemoryTotalUsed":4.7919392E8,"vmRSS":4663619584,"epochMillis":1594424315035,"nonHeapMemoryCommitted":1.1300864E8,"heapMemoryCommitted":4.180148224E9,"memoryPools":[{"peakUsageMax":251658240,"usageMax":251658240,"peakUsageUsed":25152448,"name":"Code Cache","peakUsageCommitted":26279936,"usageUsed":23439488,"type":"Non-heap memory","usageCommitted":26279936},{"peakUsageMax":-1,"usageMax":-1,"peakUsageUsed":76023336,"name":"Metaspace","peakUsageCommitted":77160448,"usageUsed":76023336,"type":"Non-heap memory","usageCommitted":77160448},{"peakUsageMax":1073741824,"usageMax":1073741824,"peakUsageUsed":9297960,"name":"Compressed Class Space","peakUsageCommitted":9568256,"usageUsed":9297960,"type":"Non-heap memory","usageCommitted":9568256},{"peakUsageMax":3490185216,"usageMax":2050490368,"peakUsageUsed":3131047936,"name":"PS Eden Space","peakUsageCommitted":3131047936,"usageUsed":40669520,"type":"Heap memory","usageCommitted":1742209024},{"peakUsageMax":1192755200,"usageMax":755499008,"peakUsageUsed":955224704,"name":"PS Survivor Space","peakUsageCommitted":1192755200,"usageUsed":624832,"type":"Heap memory","usageCommitted":755499008},{"peakUsageMax":7158628352,"usageMax":7158628352,"peakUsageUsed":933385336,"name":"PS Old Gen","peakUsageCommitted":1682440192,"usageUsed":437899568,"type":"Heap memory","usageCommitted":1682440192}],"processCpuLoad":7.326443423830174E-5,"systemCpuLoad":0.06332881529782401,"processCpuTime":454170000000,"vmHWM":13918568448,"appId":"application_1591825945062_6066","vmPeak":22872002560,"name":"31285@ip-100-106-124-90","host":"ip-100-106-124-90","processUuid":"9d00d0c1-2be2-406e-981e-31d55cd2fcac","nonHeapMemoryMax":-1.0,"vmSize":13525504000,"gc":[{"collectionTime":2966,"name":"PS Scavenge","collectionCount":68},{"collectionTime":1316,"name":"PS MarkSweep","collectionCount":5}]}
{noformat}
Direct buffer pool usage is now only about 186 MB. It is exactly the same task, but interestingly, on the second attempt usage dropped from ~9 GB to only ~186 MB:
{noformat}
"bufferPools":[{"totalCapacity":186613945,"name":"direct","count":9,"memoryUsed":186613946}
{noformat}
I believe that by default the JVM's direct buffer pool limit is the same as the executor heap size (-Xmx). So I passed the config below to cap it at 512 MB:
{noformat}
--conf spark.executor.extraJavaOptions="-XX:MaxDirectMemorySize=512m"
{noformat}
After I set this config, there were no more task failures. I even reduced executor memory from 10 GB to 4 GB and it still worked fine without any issue. MaxDirectMemorySize could probably be raised higher, but I did not try that.

We would like to know whether we can control the usage of the direct buffer pool in Spark without explicitly setting its value.

> Memory limit exception on high usage of direct buffer pool
> ----------------------------------------------------------
>
>                 Key: SPARK-32277
>                 URL: https://issues.apache.org/jira/browse/SPARK-32277
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Sathyaprakash Govindasamy
>            Priority: Major
--
This message was sent by Atlassian Jira (v8.3.4#803005)
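For anyone who wants to check the same numbers without attaching jvm-profiler: the "direct" and "mapped" pools it reports under "bufferPools" are the JDK's standard BufferPoolMXBeans, readable from inside any executor JVM. Below is a minimal stdlib-Java sketch (the class name DirectPoolProbe is made up for illustration, and this is plain Java, not Spark code). It allocates one direct buffer, the kind of off-heap allocation that -XX:MaxDirectMemorySize caps, then prints each pool's stats in the same shape the profiler logs.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectPoolProbe {
    public static void main(String[] args) {
        // Allocate 16 MB off-heap; this counts against -XX:MaxDirectMemorySize,
        // not against the -Xmx heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        // These are the same pools jvm-profiler reports as "bufferPools"
        // in its CpuAndMemory records ("direct" and "mapped").
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName()
                    + " count=" + pool.getCount()
                    + " memoryUsed=" + pool.getMemoryUsed()
                    + " totalCapacity=" + pool.getTotalCapacity());
        }

        // Keep buf reachable so the numbers printed above include it.
        if (buf.capacity() != 16 * 1024 * 1024) throw new AssertionError();
    }
}
```

Running it with `java -XX:MaxDirectMemorySize=512m DirectPoolProbe` shows the capped pool; without the flag, the HotSpot default limit is derived from the maximum heap size, which matches the behavior described above.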