[ https://issues.apache.org/jira/browse/HDFS-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingliang Liu updated HDFS-10986:
---------------------------------
    Attachment: HDFS-10986-branch-2.8.002.patch

> DFSAdmin should log detailed error message if any
> -------------------------------------------------
>
>                 Key: HDFS-10986
>                 URL: https://issues.apache.org/jira/browse/HDFS-10986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10986-branch-2.8.002.patch, HDFS-10986.000.patch,
>                      HDFS-10986.001.patch, HDFS-10986.002.patch
>
>
> Some subcommands in {{DFSAdmin}} swallow IOException and print a very
> limited error message, if any, to stderr.
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9866
> Datanode unreachable.
> $ hdfs dfsadmin -getDatanodeInfo localhost:9866
> Datanode unreachable.
> $ hdfs dfsadmin -evictWriters 127.0.0.1:9866
> $ echo $?
> -1
> {code}
> Users cannot get the exception stack trace even when the log level is DEBUG.
> This is not very user friendly. Fortunately, if the port is not accessible
> (say 9999), users can infer the detailed error from the IPC logs:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9999
> 2016-10-07 18:01:35,115 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-10-07 18:01:36,335 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> .....
> 2016-10-07 18:01:45,361 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2016-10-07 18:01:45,362 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:9999: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>       ...
>       at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2073)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>       at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2225)
> Datanode unreachable.
> {code}
> We should fix this by providing a detailed error message. In fact,
> {{DFSAdmin#run}} already handles exceptions carefully. It will:
> # set the exit return value to -1
> # print the error message
> # log the exception stack trace (at DEBUG level)
> All we need to do is to not swallow exceptions without good reason.
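
For illustration, here is a minimal Java sketch of the difference between swallowing the IOException in a subcommand and letting it reach a top-level handler that follows the three steps listed in the issue. It is a simplified model, not the actual DFSAdmin source: the class DfsAdminErrorSketch, the methods queryBandwidth, runSwallowing, getBalancerBandwidth and run, and the "getBalancerBandwidth: " message prefix are all hypothetical names chosen for this example, and a StringBuilder stands in for stderr and the DEBUG log.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.ConnectException;

/** Illustrative only: a simplified model, not the actual DFSAdmin source. */
class DfsAdminErrorSketch {

  /** Hypothetical stand-in for an RPC to a datanode; it always fails here. */
  static long queryBandwidth(String hostPort) throws IOException {
    throw new ConnectException("Connection refused: " + hostPort);
  }

  /** Anti-pattern: the IOException is caught and its cause is discarded. */
  static int runSwallowing(String hostPort, StringBuilder err) {
    try {
      queryBandwidth(hostPort);
      return 0;
    } catch (IOException ioe) {
      err.append("Datanode unreachable.\n"); // the real reason is lost
      return -1;
    }
  }

  /** Subcommand that lets the IOException propagate to its caller. */
  static int getBalancerBandwidth(String hostPort) throws IOException {
    queryBandwidth(hostPort);
    return 0;
  }

  /** Top-level handler modelled on the three steps listed in the issue. */
  static int run(String hostPort, StringBuilder err, boolean debug) {
    try {
      return getBalancerBandwidth(hostPort);
    } catch (IOException e) {
      // Step 2: print the error message (the prefix is an assumption).
      err.append("getBalancerBandwidth: ").append(e.getMessage()).append('\n');
      if (debug) {
        // Step 3: record the stack trace only when debugging is enabled.
        StringWriter trace = new StringWriter();
        PrintWriter writer = new PrintWriter(trace);
        e.printStackTrace(writer);
        writer.flush();
        err.append(trace.toString());
      }
      return -1; // Step 1: set the exit return value to -1.
    }
  }

  public static void main(String[] args) {
    StringBuilder err = new StringBuilder();
    int swallowed = runSwallowing("127.0.0.1:9866", err);
    int propagated = run("127.0.0.1:9866", err, false);
    System.err.print(err);
    System.err.println("exit codes: " + swallowed + ", " + propagated);
  }
}
```

Both paths return -1, but only the propagating path reports why the call failed. Under these assumptions, the fix amounts to removing the catch block in the subcommand rather than adding new handling.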