[ https://issues.apache.org/jira/browse/HDFS-16867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696877#comment-17696877 ]
ASF GitHub Bot commented on HDFS-16867:
---------------------------------------

whbing commented on PR #5203:
URL: https://github.com/apache/hadoop/pull/5203#issuecomment-1455937257

   @Jing9 @Happy-shi Hello, is anyone following up on this issue? I hit the same problem with the Balancer.
   ```java
   2023-03-06 17:40:53,264 ERROR org.apache.hadoop.hdfs.server.balancer.Balancer: Exiting balancer due an exception
   org.apache.hadoop.metrics2.MetricsException: Metrics source Balancer-BP-332003681-10.196.164.22-1648632173322 already exists!
           at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:225)
           at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:198)
           at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
           at org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:55)
           at org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:344)
           at org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:809)
           at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:847)
           at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:952)
           at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
           at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1102)
   ```

> Exiting Mover due to an exception in MoverMetrics.create
> --------------------------------------------------------
>
>                 Key: HDFS-16867
>                 URL: https://issues.apache.org/jira/browse/HDFS-16867
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZhiWei Shi
>            Assignee: ZhiWei Shi
>            Priority: Major
>              Labels: pull-request-available
>
> After the Mover process has been running for some time, it exits unexpectedly and reports an error in the log:
> {code:java}
> [hdfs@${hostname} hadoop-3.3.2-nn]$ nohup bin/hdfs mover -p /test-mover-jira9534 > mover.log.jira9534.20221209.2 &
> [hdfs@${hostname} hadoop-3.3.2-nn]$ tail -f mover.log.jira9534.20221209.2
> ...
> 22/12/09 14:22:32 INFO balancer.Dispatcher: Start moving blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
> 22/12/09 14:22:32 INFO balancer.Dispatcher: Successfully moved blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Stopping Mover metrics system...
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system stopped.
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system shutdown complete.
> Dec 9, 2022, 2:22:42 PM  Mover took 13mins, 19sec
> 22/12/09 14:22:42 ERROR mover.Mover: Exiting Mover due to an exception
> org.apache.hadoop.metrics2.MetricsException: Metrics source Mover-${BlockpoolID} already exists!
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>         at org.apache.hadoop.hdfs.server.mover.MoverMetrics.create(MoverMetrics.java:49)
>         at org.apache.hadoop.hdfs.server.mover.Mover.<init>(Mover.java:162)
>         at org.apache.hadoop.hdfs.server.mover.Mover.run(Mover.java:684)
>         at org.apache.hadoop.hdfs.server.mover.Mover$Cli.run(Mover.java:826)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
>         at org.apache.hadoop.hdfs.server.mover.Mover.main(Mover.java:908)
> {code}
> 1. "final ExitStatus r = m.run()" returns after scheduling only one replica move.
> 2. When "r == ExitStatus.IN_PROGRESS", iter.remove() is not executed, so the nnc stays in the connectors list.
> 3. As a result, "new Mover" and "this.metrics = MoverMetrics.create(this)" are executed multiple times for the same nnc, which leads to the error.
> {code:java}
> //Mover.java
> for (final StorageType t : diff.existing) {
>   for (final MLocation ml : locations) {
>     final Source source = storages.getSource(ml);
>     if (ml.storageType == t && source != null) {
>       // try to schedule one replica move.
>       // 1. returns after scheduling only one replica move
>       if (scheduleMoveReplica(db, source, diff.expected)) {
>         return true;
>       }
>     }
>   }
> }
> while (connectors.size() > 0) {
>   Collections.shuffle(connectors);
>   Iterator<NameNodeConnector> iter = connectors.iterator();
>   while (iter.hasNext()) {
>     NameNodeConnector nnc = iter.next();
>     // 3. "new Mover" and "this.metrics = MoverMetrics.create(this)" run
>     //    multiple times for the same nnc, which leads to the error
>     final Mover m = new Mover(nnc, conf, retryCount, excludedPinnedBlocks);
>     final ExitStatus r = m.run();
>     // 2. when r == ExitStatus.IN_PROGRESS, iter.remove() is not executed
>     if (r == ExitStatus.SUCCESS) {
>       IOUtils.cleanupWithLogger(LOG, nnc);
>       iter.remove();
>     }
> {code}
> We should probably initialize MoverMetrics when we initialize the nnc.
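
The failure pattern described in the issue can be reproduced without Hadoop. The sketch below is only an illustration: `Registry` is a hypothetical stand-in for a metrics system that rejects duplicate source names, and `MetricsReuseSketch` and its methods are invented for this example. It is not the Hadoop metrics2 API and not necessarily the change made in PR #5203. It contrasts registering on every pass with registering once per block pool and reusing the result.

```java
import java.util.HashMap;
import java.util.Map;

public class MetricsReuseSketch {

  /** Hypothetical registry that, like the reported error, rejects duplicate names. */
  static final class Registry {
    private final Map<String, Object> sources = new HashMap<>();

    Object register(String name) {
      if (sources.containsKey(name)) {
        throw new IllegalStateException("Metrics source " + name + " already exists!");
      }
      Object source = new Object();
      sources.put(name, source);
      return source;
    }
  }

  /** Mirrors the reported pattern: each pass registers again under the same name. */
  public static boolean registerOnEveryPass(String blockPoolId, int passes) {
    Registry registry = new Registry();
    try {
      for (int i = 0; i < passes; i++) {
        registry.register("Mover-" + blockPoolId);
      }
      return true;
    } catch (IllegalStateException e) {
      return false; // the second pass for the same block pool fails
    }
  }

  /** One possible remedy (an assumption): register once per block pool, then reuse. */
  public static boolean registerOncePerBlockPool(String blockPoolId, int passes) {
    Registry registry = new Registry();
    Map<String, Object> metricsByBlockPool = new HashMap<>();
    try {
      for (int i = 0; i < passes; i++) {
        // Only the first pass reaches the registry; later passes reuse the cached entry.
        metricsByBlockPool.computeIfAbsent(
            blockPoolId, id -> registry.register("Mover-" + id));
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(registerOnEveryPass("BP-1", 2));      // prints "false"
    System.out.println(registerOncePerBlockPool("BP-1", 2)); // prints "true"
  }
}
```

Tying the metrics object to the lifetime of the connector rather than to each Mover instance is the same idea as the suggestion at the end of the issue description.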