[ https://issues.apache.org/jira/browse/HDDS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rakesh Radhakrishnan updated HDDS-3600:
---------------------------------------
Affects Version/s: 0.6.0

> ManagedChannels leaked on ratis pipeline when there are many connection retries
> -------------------------------------------------------------------------------
>
> Key: HDDS-3600
> URL: https://issues.apache.org/jira/browse/HDDS-3600
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.6.0
> Reporter: Rakesh Radhakrishnan
> Priority: Major
> Attachments: HeapHistogram-Snapshot-ManagedChannel-Leaked-001.png, outloggenerator-ozonefs-003.log
>
>
> Observed that too many ManagedChannels were opened while running the Synthetic Hadoop load generator.
> Ran the benchmark with only one pipeline in the cluster, and again with only two pipelines in the cluster.
> Both runs failed with "too many open files" errors, and many TCP connections stayed open for a long time, so I suspect the channels are leaking.
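The suspected leak can be illustrated with a minimal, self-contained sketch. It assumes, as the issue title suggests, that every connection retry opens a new channel and that a failed attempt is never shut down. `FakeChannel`, `connectWithRetries` and `failuresBeforeSuccess` are hypothetical names for illustration only, not Ozone, Ratis or gRPC APIs; in gRPC-Java the real release call on a channel would be `ManagedChannel.shutdown()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Ozone/Ratis/gRPC code): opening a new channel on
 * every retry without closing the failed one leaks one channel per retry.
 */
public class ChannelRetryLeakSketch {

  /** Number of channels that were opened but not yet closed. */
  static final AtomicInteger OPEN_CHANNELS = new AtomicInteger();

  /** Stand-in for a network channel such as a gRPC ManagedChannel. */
  static final class FakeChannel implements AutoCloseable {
    private boolean closed;

    FakeChannel() {
      OPEN_CHANNELS.incrementAndGet(); // opening a channel holds a resource
    }

    @Override
    public void close() {
      if (!closed) { // make close() idempotent
        closed = true;
        OPEN_CHANNELS.decrementAndGet();
      }
    }
  }

  /**
   * Simulates a client whose first {@code failuresBeforeSuccess} connection
   * attempts fail. Returns the channel of the successful attempt; the caller
   * owns it and must close it.
   */
  static FakeChannel connectWithRetries(int failuresBeforeSuccess, boolean closeOnFailure) {
    for (int attempt = 0; ; attempt++) {
      FakeChannel channel = new FakeChannel(); // a new channel per attempt
      if (attempt >= failuresBeforeSuccess) {
        return channel; // simulated success
      }
      // Simulated connection failure.
      if (closeOnFailure) {
        channel.close(); // fix: release the failed attempt's channel
      }
      // Without the close above, the failed channel stays open (the leak).
    }
  }

  public static void main(String[] args) {
    FakeChannel leaky = connectWithRetries(100, false);
    System.out.println("leaky: open channels = " + OPEN_CHANNELS.get()); // prints 101
    leaky.close();
    OPEN_CHANNELS.set(0); // demo only: forget the 100 unreachable leaked channels

    FakeChannel fixed = connectWithRetries(100, true);
    System.out.println("fixed: open channels = " + OPEN_CHANNELS.get()); // prints 1
    fixed.close();
  }
}
```

With the close on failure, at most one channel per endpoint is open at a time no matter how many retries occur; without it, open channels (and their sockets) grow linearly with the retry count, which would match the "too many open files" failure.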
> More details below:
>
> *1)* Execute NNloadGenerator:
> {code:java}
> [rakeshr@ve1320 loadOutput]$ ps -ef | grep load
> hdfs 362822 1 19 05:24 pts/0 00:03:16 /usr/java/jdk1.8.0_232-cloudera/bin/java -Dproc_jar -Xmx825955249 -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/libexec/../../hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/lib/native -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/jars/hadoop-mapreduce-client-jobclient-3.1.1.7.2.0.0-141-tests.jar NNloadGenerator -root o3fs://bucket2.vol2/
> rakeshr 368739 354174 0 05:41 pts/0 00:00:00 grep --color=auto load
> {code}
>
> *2)* Active TCP connections to port 9858, the default Ratis pipeline port, during the run ({{wc}} reports 3229 matching lines):
> {code:java}
> [rakeshr@ve1320 loadOutput]$ sudo lsof -a -p 362822 | grep "9858" | wc
> 3229 32290 494080
> [rakeshr@ve1320 loadOutput]$ vi tcp_log
> ............
> java 440633 hdfs 4090u IPv4 271141987 0t0 TCP ve1320.halxg.cloudera.com:35190->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4091u IPv4 271127918 0t0 TCP ve1320.halxg.cloudera.com:35192->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4092u IPv4 271038583 0t0 TCP ve1320.halxg.cloudera.com:59116->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4093u IPv4 271038584 0t0 TCP ve1320.halxg.cloudera.com:59118->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4095u IPv4 271127920 0t0 TCP ve1320.halxg.cloudera.com:35196->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> [rakeshr@ve1320 loadOutput]$ ^C
> {code}
>
> *3)* The heap dump shows 9571 ManagedChannel objects. The heap dump is quite large, so a snapshot of the heap histogram is attached to this JIRA.
>
> *4)* Attached the output and thread dump of the SyntheticLoadGenerator benchmark client process, which show the exceptions printed to the console. FYI, this file was quite large, so I have trimmed a few repeated exception traces.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
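The count in step 2 comes from `grep "9858" | wc`, whose first column (3229) is the number of matching lines; that grep also matches any line where 9858 merely appears, for example in a PID, inode or local port. A minimal sketch of a stricter count follows, assuming lsof's default output format as in the excerpt above. `LsofPortCounter` and `countEstablished` are hypothetical names, not part of Ozone or lsof.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: counts ESTABLISHED TCP connections to a remote port in lsof output. */
public class LsofPortCounter {

  /**
   * Counts lines of the form "... TCP local:port->remote:PORT (ESTABLISHED)".
   * Requiring "->" and the suffix means only the remote side must use {@code port};
   * listeners ("*:9858 (LISTEN)") and inbound connections to a local 9858 are skipped.
   */
  static long countEstablished(List<String> lsofLines, int port) {
    String remoteSuffix = ":" + port + " (ESTABLISHED)";
    return lsofLines.stream()
        .map(String::trim)
        .filter(line -> line.contains(" TCP ") && line.contains("->"))
        .filter(line -> line.endsWith(remoteSuffix))
        .count();
  }

  public static void main(String[] args) throws IOException {
    // Port defaults to 9858, the Ratis pipeline port mentioned in this issue.
    int port = args.length > 0 ? Integer.parseInt(args[0]) : 9858;
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
      List<String> lines = in.lines().collect(Collectors.toList());
      System.out.println(countEstablished(lines, port));
    }
  }
}
```

After compiling, it could be fed as `sudo lsof -a -P -p <pid> | java LsofPortCounter 9858`; lsof's `-P` flag keeps port numbers numeric instead of mapping them to service names, so the suffix match stays reliable.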