[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13639:
-------------------------------
    Description: 
When testing the performance of HDFS short-circuit reads with YCSB, we found 
that the SlotReleaser of the ShortCircuitCache has a performance issue: the qps 
of slot releasing can only reach 1000+ while the qps of slot allocating is 
~3000. This means that the replica info on the datanode cannot be released in 
time, which causes a lot of GCs and eventually full GCs.

 

The flame graph shows that the SlotReleaser spends a lot of time connecting to 
the domain socket and throwing/catching exceptions when closing the domain 
socket and its streams. It doesn't make sense to connect and close on every 
release: each time we connect to the domain socket, the DataNode allocates a new 
thread to free the slot, and all of that initialization is costly. We need to 
reuse the domain socket, as sketched below.
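
For illustration, here is a minimal sketch of the idea, assuming a simple 
per-path socket cache. The names below (UnixSocket, UnixSocketFactory, 
CachedSlotReleaser) are hypothetical stand-ins, not the real ShortCircuitCache 
or DomainSocket classes:

{code:java}
// Hypothetical sketch only: UnixSocket, UnixSocketFactory and
// CachedSlotReleaser are illustrative stand-ins, not HDFS internals.
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

interface UnixSocket extends Closeable {
  // Stand-in for sending a slot-release request over the socket.
  void send(byte[] releaseSlotRequest) throws IOException;
}

interface UnixSocketFactory {
  UnixSocket connect(String domainSocketPath) throws IOException;
}

class CachedSlotReleaser {
  private final UnixSocketFactory factory;
  // One long-lived socket per DataNode domain socket path, instead of a
  // connect/close pair on every released slot.
  private final Map<String, UnixSocket> sockets = new HashMap<>();

  CachedSlotReleaser(UnixSocketFactory factory) {
    this.factory = factory;
  }

  synchronized void releaseSlot(String path, byte[] request) throws IOException {
    UnixSocket sock = sockets.get(path);
    if (sock == null) {
      sock = factory.connect(path);   // pay the connect cost only once
      sockets.put(path, sock);
    }
    try {
      sock.send(request);
    } catch (IOException e) {
      // Drop the broken socket so the next release reconnects cleanly.
      sockets.remove(path);
      sock.close();
      throw e;
    }
  }
}
{code}

The point is only that the connect/close pair moves out of the per-slot hot 
path; a broken socket is dropped and re-opened on the next release instead of 
reconnecting every time.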

 

After switching to reuse the domain socket (see the attached diff), we get a 
great improvement (see the perf screenshots):
 # Without reusing the domain socket, the YCSB get qps gets worse and worse, and 
after about 45 minutes full GC starts. With the domain socket reused, no full GC 
is observed, the stress test finishes smoothly, and the allocating and releasing 
qps match.
 # Due to the datanode young GC, the YCSB get qps without the improvement is 
lower than with it: ~3700 vs. ~4200.

 

  was:
When testing the performance of HDFS short-circuit reads with YCSB, we found 
that the SlotReleaser of the ShortCircuitCache has a performance issue: the qps 
of slot releasing can only reach 1000+ while the qps of slot allocating is 
~3000. This means that the replica info on the datanode cannot be released in 
time, which causes a lot of GCs and eventually full GCs.

 

The flame graph shows that the SlotReleaser spends a lot of time connecting to 
the domain socket and throwing/catching exceptions when closing the domain 
socket and its streams. It doesn't make sense to connect and close on every 
release: each time we connect to the domain socket, the DataNode allocates a new 
thread to free the slot, and all of that initialization is costly. We need to 
reuse the domain socket.

 

After switching to reuse the domain socket (see the attached diff), we get a 
great improvement (see the perf screenshots):
 # Without reusing the domain socket, the YCSB get qps gets worse and worse, and 
after about 45 minutes full GC starts. With the domain socket reused, no full GC 
is observed, the stress test finishes smoothly, and the allocating and releasing 
qps match.
 # Due to the datanode young GC, the YCSB get qps without the improvement is 
lower than with it: ~3700 vs. ~4200.

The diff is against 2.4, and I think this issue exists up to the latest version. 
I don't have a test environment with 2.7 or higher.


> SlotReleaser is not fast enough
> -------------------------------
>
>                 Key: HDFS-13639
>                 URL: https://issues.apache.org/jira/browse/HDFS-13639
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.6.0, 3.0.2
>         Environment: 1. YCSB:
>  recordcount=2000000000
>  fieldcount=1
>  fieldlength=1000
>  operationcount=10000000
>  
>  workload=com.yahoo.ycsb.workloads.CoreWorkload
>  
>  table=ycsb-test
>  columnfamily=C
>  readproportion=1
>  updateproportion=0
>  insertproportion=0
>  scanproportion=0
>  
>  maxscanlength=0
>  requestdistribution=zipfian
>  
>  # default 
>  readallfields=true
>  writeallfields=true
>  scanlengthdistribution=constant
> 2. datanode:
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m 
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log 
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled 
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=10000 
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure 
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> 3. regionserver:
> -Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g 
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc 
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 
> -XX:G1OldCSetRegionThresholdPercent=5
> block cache is disabled:
>  <property>
>  <name>hbase.bucketcache.size</name>
>  <value>0.9</value>
>  </property>
>  
>            Reporter: Gang Xie
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, 
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, 
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we found 
> that the SlotReleaser of the ShortCircuitCache has a performance issue: the 
> qps of slot releasing can only reach 1000+ while the qps of slot allocating is 
> ~3000. This means that the replica info on the datanode cannot be released in 
> time, which causes a lot of GCs and eventually full GCs.
>  
> The flame graph shows that the SlotReleaser spends a lot of time connecting to 
> the domain socket and throwing/catching exceptions when closing the domain 
> socket and its streams. It doesn't make sense to connect and close on every 
> release: each time we connect to the domain socket, the DataNode allocates a 
> new thread to free the slot, and all of that initialization is costly. We need 
> to reuse the domain socket.
>  
> After switching to reuse the domain socket (see the attached diff), we get a 
> great improvement (see the perf screenshots):
>  # Without reusing the domain socket, the YCSB get qps gets worse and worse, 
> and after about 45 minutes full GC starts. With the domain socket reused, no 
> full GC is observed, the stress test finishes smoothly, and the allocating and 
> releasing qps match.
>  # Due to the datanode young GC, the YCSB get qps without the improvement is 
> lower than with it: ~3700 vs. ~4200.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
