[jira] [Updated] (SPARK-37588) lot of strings get accumulated in the heap dump of spark thrift server

ramakrishna chilaka (Jira) Wed, 08 Dec 2021 18:40:08 -0800


     [ https://issues.apache.org/jira/browse/SPARK-37588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ramakrishna chilaka updated SPARK-37588:
----------------------------------------
    Description: 
I am starting the Spark Thrift Server with the following options:

```

/data/spark/sbin/start-thriftserver.sh --master spark://*****:7077 \
  --conf "spark.cores.max=320" \
  --conf "spark.executor.cores=3" \
  --conf "spark.driver.cores=15" \
  --executor-memory=10G \
  --driver-memory=50G \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.sql.adaptive.skewJoin.enabled=true \
  --conf spark.sql.cbo.enabled=true \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.rpc.io.serverThreads=64 \
  --conf "spark.driver.maxResultSize=4G" \
  --conf "spark.max.fetch.failures.per.stage=10" \
  --conf "spark.sql.thriftServer.incrementalCollect=false" \
  --conf "spark.ui.reverseProxy=true" \
  --conf "spark.ui.reverseProxyUrl=/spark_ui" \
  --conf "spark.sql.autoBroadcastJoinThreshold=1073741824" \
  --conf spark.sql.thriftServer.interruptOnCancel=true \
  --conf spark.sql.thriftServer.queryTimeout=0 \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.thrift.http.path=spark_sql \
  --hiveconf hive.server2.thrift.min.worker.threads=500 \
  --hiveconf hive.server2.thrift.max.worker.threads=2147483647 \
  --hiveconf hive.server2.thrift.http.cookie.is.secure=false \
  --hiveconf hive.server2.thrift.http.cookie.auth.enabled=false \
  --hiveconf hive.server2.authentication=NONE \
  --hiveconf hive.server2.enable.doAs=false \
  --hiveconf spark.sql.hive.thriftServer.singleSession=true \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.sql.cbo.joinReorder.enabled=true" \
  --conf "spark.sql.optimizer.dynamicPartitionPruning.enabled=true" \
  --conf "spark.worker.cleanup.enabled=true" \
  --conf "spark.worker.cleanup.appDataTtl=3600" \
  --hiveconf hive.exec.scratchdir=/data/spark_scratch/hive \
  --hiveconf hive.exec.local.scratchdir=/data/spark_scratch/local_scratch_dir \
  --hiveconf hive.download.resources.dir=/data/spark_scratch/hive.downloaded.resources.dir \
  --hiveconf hive.querylog.location=/data/spark_scratch/hive.querylog.location \
  --conf spark.executor.extraJavaOptions="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.driver.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/data/thrift_driver_gc.log -XX:+ExplicitGCInvokesConcurrent -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxRAM=55g" \
  --hiveconf "hive.server2.session.check.interval=60000" \
  --hiveconf "hive.server2.idle.session.timeout=900000" \
  --hiveconf "hive.server2.idle.session.check.operation=true" \
  --conf "spark.eventLog.enabled=false" \
  --conf "spark.cleaner.periodicGC.interval=5min" \
  --conf "spark.appStateStore.asyncTracking.enable=false" \
  --conf "spark.ui.retainedJobs=30" \
  --conf "spark.ui.retainedStages=100" \
  --conf "spark.ui.retainedTasks=500" \
  --conf "spark.sql.ui.retainedExecutions=10" \
  --conf "spark.ui.retainedDeadExecutors=10" \
  --conf "spark.worker.ui.retainedExecutors=10" \
  --conf "spark.worker.ui.retainedDrivers=10" \
  --conf spark.ui.enabled=false \
  --conf spark.stage.maxConsecutiveAttempts=10 \
  --conf spark.executor.memoryOverhead=1G \
  --conf "spark.io.compression.codec=snappy" \
  --conf "spark.default.parallelism=640" \
  --conf spark.memory.offHeap.enabled=true \
  --conf "spark.memory.offHeap.size=3g" \
  --conf "spark.memory.fraction=0.75" \
  --conf "spark.memory.storageFraction=0.75"

```

The Java heap dump (class histogram) after heavy usage is as follows:
```
   1:      50465861     9745837152  [C
   2:      23337896     1924089944  [Ljava.lang.Object;
   3:      72524905     1740597720  java.lang.Long
   4:      50463694     1614838208  java.lang.String
   5:      22718029      726976928  org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
   6:       2259416      343483328  [Lscala.collection.mutable.HashEntry;
   7:            16      141744616  [Lorg.apache.spark.sql.Row;
   8:        532529      123546728  org.apache.spark.sql.catalyst.expressions.Cast
   9:        535418       72816848  org.apache.spark.sql.catalyst.expressions.Literal
  10:       1105284       70738176  scala.collection.mutable.LinkedHashSet
  11:       1725833       70655016  [J
  12:       1154128       55398144  scala.collection.mutable.HashMap
  13:       1720740       55063680  org.apache.spark.util.collection.BitSet
  14:        572222       50355536  scala.collection.immutable.Vector
  15:       1602297       38455128  scala.Some
  16:       1154303       36937696  scala.collection.immutable.$colon$colon
  17:       1105284       26526816  org.apache.spark.sql.catalyst.expressions.AttributeSet
  18:       1066442       25594608  java.lang.Integer
  19:        735502       23536064  scala.collection.immutable.HashSet$HashSet1
  20:         10300       19511408  [B
  21:        543994       17407808  scala.Tuple2
  22:        530244       16967808  org.apache.spark.sql.catalyst.trees.Origin
  23:        274445       13173360  java.util.Hashtable$Entry
  24:        225826       13089920  [Lscala.collection.immutable.HashSet;
  25:        529922       12718128  org.apache.spark.sql.catalyst.expressions.CastBase$$Lambda$3882/1746188635
  26:        221866       10649568  java.util.concurrent.ConcurrentHashMap$Node
  27:        384729        9233496  java.lang.Double
  28:        225826        7226432  scala.collection.immutable.HashSet$HashTrieSet
  29:          1071        6659680  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  30:         25853        4348744  java.lang.Class
  31:          5760        3916800  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
  32:          1232        3710888  [Ljava.util.Hashtable$Entry;
  33:         17303        3460600  org.apache.spark.sql.catalyst.expressions.objects.Invoke
  34:         20409        3265440  java.lang.reflect.Method
  35:         47506        2280288  java.util.HashMap$Node
  36:          7489        1568600  [Ljava.util.HashMap$Node;
  37:           285        1539888  [Ljava.nio.ByteBuffer;
  38:           237        1452984  [[B
  39:          4138        1291056  org.apache.spark.status.TaskDataWrapper
```
About 9.7 GB of heap memory is occupied by char arrays (`[C`), and I am not sure why. Can someone help me understand how to reduce or free this 9.7 GB of memory?
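
To put the histogram in perspective, here is a rough Python sketch that computes the average size per instance and each class's share of the bytes, using only the top five rows copied from the listing above. The `ROWS` list and the `summarize` helper are illustrative names, not part of Spark or the JDK, and the shares are relative to these five rows only, not to the whole heap.

```python
# Top five rows copied from the histogram above: (class name, instances, bytes).
ROWS = [
    ("[C", 50465861, 9745837152),
    ("[Ljava.lang.Object;", 23337896, 1924089944),
    ("java.lang.Long", 72524905, 1740597720),
    ("java.lang.String", 50463694, 1614838208),
    ("org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema",
     22718029, 726976928),
]


def summarize(rows):
    """Map each class to (average bytes per instance, share of the listed bytes)."""
    total_bytes = sum(size for _, _, size in rows)
    return {
        name: (size / count, size / total_bytes)
        for name, count, size in rows
    }


summary = summarize(ROWS)
for name, (avg_bytes, share) in summary.items():
    print(f"{name:65s} avg={avg_bytes:8.1f} B  share={share:6.1%}")
```

The instance counts of `[C` and `java.lang.String` are almost equal, which is consistent with nearly every char array backing a String (on JDK 8 a `String` wraps a `char[]`). That hints the memory is held by a very large number of live String values rather than by a few huge buffers, but the histogram alone does not show what is retaining them.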

  was:
I am starting the Spark Thrift Server with the following options:

```

/data/spark/sbin/start-thriftserver.sh --master spark://*****:7077 \
  --conf "spark.cores.max=320" \
  --conf "spark.executor.cores=3" \
  --conf "spark.driver.cores=15" \
  --executor-memory=10G \
  --driver-memory=50G \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.sql.adaptive.skewJoin.enabled=true \
  --conf spark.sql.cbo.enabled=true \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.rpc.io.serverThreads=64 \
  --conf "spark.driver.maxResultSize=4G" \
  --conf "spark.max.fetch.failures.per.stage=10" \
  --conf "spark.sql.thriftServer.incrementalCollect=false" \
  --conf "spark.ui.reverseProxy=true" \
  --conf "spark.ui.reverseProxyUrl=/spark_ui" \
  --conf "spark.sql.autoBroadcastJoinThreshold=1073741824" \
  --conf spark.sql.thriftServer.interruptOnCancel=true \
  --conf spark.sql.thriftServer.queryTimeout=0 \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.thrift.http.path=spark_sql \
  --hiveconf hive.server2.thrift.min.worker.threads=500 \
  --hiveconf hive.server2.thrift.max.worker.threads=2147483647 \
  --hiveconf hive.server2.thrift.http.cookie.is.secure=false \
  --hiveconf hive.server2.thrift.http.cookie.auth.enabled=false \
  --hiveconf hive.server2.authentication=NONE \
  --hiveconf hive.server2.enable.doAs=false \
  --hiveconf spark.sql.hive.thriftServer.singleSession=true \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.sql.cbo.joinReorder.enabled=true" \
  --conf "spark.sql.optimizer.dynamicPartitionPruning.enabled=true" \
  --conf "spark.worker.cleanup.enabled=true" \
  --conf "spark.worker.cleanup.appDataTtl=3600" \
  --hiveconf hive.exec.scratchdir=/data/spark_scratch/hive \
  --hiveconf hive.exec.local.scratchdir=/data/spark_scratch/local_scratch_dir \
  --hiveconf hive.download.resources.dir=/data/spark_scratch/hive.downloaded.resources.dir \
  --hiveconf hive.querylog.location=/data/spark_scratch/hive.querylog.location \
  --conf spark.executor.extraJavaOptions="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.driver.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/data/thrift_driver_gc.log -XX:+ExplicitGCInvokesConcurrent -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxRAM=55g" \
  --hiveconf "hive.server2.session.check.interval=60000" \
  --hiveconf "hive.server2.idle.session.timeout=900000" \
  --hiveconf "hive.server2.idle.session.check.operation=true" \
  --conf "spark.eventLog.enabled=false" \
  --conf "spark.cleaner.periodicGC.interval=5min" \
  --conf "spark.appStateStore.asyncTracking.enable=false" \
  --conf "spark.ui.retainedJobs=30" \
  --conf "spark.ui.retainedStages=100" \
  --conf "spark.ui.retainedTasks=500" \
  --conf "spark.sql.ui.retainedExecutions=10" \
  --conf "spark.ui.retainedDeadExecutors=10" \
  --conf "spark.worker.ui.retainedExecutors=10" \
  --conf "spark.worker.ui.retainedDrivers=10" \
  --conf spark.ui.enabled=false \
  --conf spark.stage.maxConsecutiveAttempts=10 \
  --conf spark.executor.memoryOverhead=1G \
  --conf "spark.io.compression.codec=snappy" \
  --conf "spark.default.parallelism=640" \
  --conf spark.memory.offHeap.enabled=true \
  --conf "spark.memory.offHeap.size=3g" \
  --conf "spark.memory.fraction=0.75" \
  --conf "spark.memory.storageFraction=0.75"

```

The Java heap dump (class histogram) after heavy usage is as follows:
```
   1:      50465861     9745837152  [C
   2:      23337896     1924089944  [Ljava.lang.Object;
   3:      72524905     1740597720  java.lang.Long
   4:      50463694     1614838208  java.lang.String
   5:      22718029      726976928  org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
   6:       2259416      343483328  [Lscala.collection.mutable.HashEntry;
   7:            16      141744616  [Lorg.apache.spark.sql.Row;
   8:        532529      123546728  org.apache.spark.sql.catalyst.expressions.Cast
   9:        535418       72816848  org.apache.spark.sql.catalyst.expressions.Literal
  10:       1105284       70738176  scala.collection.mutable.LinkedHashSet
  11:       1725833       70655016  [J
  12:       1154128       55398144  scala.collection.mutable.HashMap
  13:       1720740       55063680  org.apache.spark.util.collection.BitSet
  14:        572222       50355536  scala.collection.immutable.Vector
  15:       1602297       38455128  scala.Some
  16:       1154303       36937696  scala.collection.immutable.$colon$colon
  17:       1105284       26526816  org.apache.spark.sql.catalyst.expressions.AttributeSet
  18:       1066442       25594608  java.lang.Integer
  19:        735502       23536064  scala.collection.immutable.HashSet$HashSet1
  20:         10300       19511408  [B
  21:        543994       17407808  scala.Tuple2
  22:        530244       16967808  org.apache.spark.sql.catalyst.trees.Origin
  23:        274445       13173360  java.util.Hashtable$Entry
  24:        225826       13089920  [Lscala.collection.immutable.HashSet;
  25:        529922       12718128  org.apache.spark.sql.catalyst.expressions.CastBase$$Lambda$3882/1746188635
  26:        221866       10649568  java.util.concurrent.ConcurrentHashMap$Node
  27:        384729        9233496  java.lang.Double
  28:        225826        7226432  scala.collection.immutable.HashSet$HashTrieSet
  29:          1071        6659680  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  30:         25853        4348744  java.lang.Class
  31:          5760        3916800  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
  32:          1232        3710888  [Ljava.util.Hashtable$Entry;
  33:         17303        3460600  org.apache.spark.sql.catalyst.expressions.objects.Invoke
  34:         20409        3265440  java.lang.reflect.Method
  35:         47506        2280288  java.util.HashMap$Node
  36:          7489        1568600  [Ljava.util.HashMap$Node;
  37:           285        1539888  [Ljava.nio.ByteBuffer;
  38:           237        1452984  [[B
  39:          4138        1291056  org.apache.spark.status.TaskDataWrapper
```
About 9.7 GB of heap memory is occupied.


> lot of strings get accumulated in the heap dump of spark thrift server
> ----------------------------------------------------------------------
>
>                 Key: SPARK-37588
>                 URL: https://issues.apache.org/jira/browse/SPARK-37588
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.2.0
>         Environment: Open JDK (8 build 1.8.0_312-b07) and scala 12.12
> OS: Red Hat Enterprise Linux 8.4 (Ootpa), platform:el8
>            Reporter: ramakrishna chilaka
>            Priority: Major



