[ https://issues.apache.org/jira/browse/HDFS-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingliang Liu updated HDFS-10986:
---------------------------------
    Affects Version/s:     (was: 2.8.0)
     Target Version/s: 3.0.0-alpha2
          Description: updated; the previous text lacked the enumerated list of what {{DFSAdmin#run}} already does on error. The current description is quoted in full below.
              Summary: DFSAdmin should log detailed error message if any  (was: DFSAdmin should show detailed error message if any)

> DFSAdmin should log detailed error message if any
> -------------------------------------------------
>
>                 Key: HDFS-10986
>                 URL: https://issues.apache.org/jira/browse/HDFS-10986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>
> There are some subcommands in {{DFSAdmin}} that swallow IOException and print a very limited error message, if any, to stderr:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9866
> Datanode unreachable.
> $ hdfs dfsadmin -getDatanodeInfo localhost:9866
> Datanode unreachable.
> $ hdfs dfsadmin -evictWriters 127.0.0.1:9866
> $ echo $?
> -1
> {code}
> Fortunately, if the port is not accessible (say 9999), users can infer the detailed error message from the logs:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9999
> 2016-10-07 18:01:35,115 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-10-07 18:01:36,335 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9690. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> .....
> 2016-10-07 18:01:45,361 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9690. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2016-10-07 18:01:45,362 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:9690: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         ...
>         at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2073)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2225)
> Datanode unreachable.
> {code}
> We should fix this by providing a detailed error message. {{DFSAdmin#run}} already handles exceptions carefully:
> # it sets the exit value to -1
> # it prints the error message
> # it logs the exception stack trace (at DEBUG level)
> All we need to do is stop swallowing exceptions without good reason.
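For illustration, below is a minimal Java sketch of the pattern the issue describes; the class and the {{ClientDatanodeProtocolStub}} interface are hypothetical stand-ins, not the actual DFSAdmin source. The first method swallows the IOException the way the affected subcommands do; the second lets it propagate so the existing handling in {{DFSAdmin#run}} can report it.

{code}
import java.io.IOException;

// Hypothetical sketch -- not the actual DFSAdmin source.
public class GetBalancerBandwidthSketch {

  /** Stand-in for the datanode RPC proxy; hypothetical, for illustration only. */
  interface ClientDatanodeProtocolStub {
    long getBalancerBandwidth() throws IOException;
  }

  // Anti-pattern: catch and swallow, leaving only a one-line hint
  // on stderr; the root cause (e.g. ConnectException) is lost.
  public int runSwallowing(ClientDatanodeProtocolStub dnProxy) {
    try {
      System.out.println("Balancer bandwidth is " + dnProxy.getBalancerBandwidth());
      return 0;
    } catch (IOException e) {
      System.err.println("Datanode unreachable.");
      return -1;
    }
  }

  // Proposed shape: let the IOException propagate; per the issue,
  // DFSAdmin#run already sets the -1 exit value, prints the error
  // message, and logs the stack trace at DEBUG level.
  public int runPropagating(ClientDatanodeProtocolStub dnProxy) throws IOException {
    System.out.println("Balancer bandwidth is " + dnProxy.getBalancerBandwidth());
    return 0;
  }
}
{code}

With the propagating shape, an unreachable datanode should surface the underlying ConnectException text on stderr (and the full stack trace at DEBUG) rather than the bare "Datanode unreachable." line.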