[ https://issues.apache.org/jira/browse/HDFS-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingliang Liu updated HDFS-10986:
---------------------------------
    Affects Version/s:     (was: 2.8.0)
     Target Version/s: 3.0.0-alpha2
          Description: updated; the previous text lacked the enumerated list of what {{DFSAdmin#run}} already does on error. The current description is quoted in full below.
              Summary: DFSAdmin should log detailed error message if any  (was: DFSAdmin should show detailed error message if any)

> DFSAdmin should log detailed error message if any
> -------------------------------------------------
>
>                 Key: HDFS-10986
>                 URL: https://issues.apache.org/jira/browse/HDFS-10986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>
> There are some subcommands in {{DFSAdmin}} that swallow IOException and print a very limited error message, if any, to stderr:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9866
> Datanode unreachable.
> $ hdfs dfsadmin -getDatanodeInfo localhost:9866
> Datanode unreachable.
> $ hdfs dfsadmin -evictWriters 127.0.0.1:9866
> $ echo $?
> -1
> {code}
> Fortunately, if the port is not accessible (say 9999), users can infer the detailed error message from the logs:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9999
> 2016-10-07 18:01:35,115 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-10-07 18:01:36,335 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9690. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> .....
> 2016-10-07 18:01:45,361 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9690. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2016-10-07 18:01:45,362 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:9690: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         ...
>         at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2073)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2225)
> Datanode unreachable.
> {code}
> We should fix this by providing a detailed error message. {{DFSAdmin#run}} already handles exceptions carefully:
> # it sets the exit value to -1
> # it prints the error message
> # it logs the exception stack trace (at DEBUG level)
> All we need to do is stop swallowing exceptions without good reason.
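For illustration, below is a minimal Java sketch of the pattern the issue describes; the class and the {{ClientDatanodeProtocolStub}} interface are hypothetical stand-ins, not the actual DFSAdmin source. The first method swallows the IOException the way the affected subcommands do; the second lets it propagate so the existing handling in {{DFSAdmin#run}} can report it.

{code}
import java.io.IOException;

// Hypothetical sketch -- not the actual DFSAdmin source.
public class GetBalancerBandwidthSketch {

  /** Stand-in for the datanode RPC proxy; hypothetical, for illustration only. */
  interface ClientDatanodeProtocolStub {
    long getBalancerBandwidth() throws IOException;
  }

  // Anti-pattern: catch and swallow, leaving only a one-line hint
  // on stderr; the root cause (e.g. ConnectException) is lost.
  public int runSwallowing(ClientDatanodeProtocolStub dnProxy) {
    try {
      System.out.println("Balancer bandwidth is " + dnProxy.getBalancerBandwidth());
      return 0;
    } catch (IOException e) {
      System.err.println("Datanode unreachable.");
      return -1;
    }
  }

  // Proposed shape: let the IOException propagate; per the issue,
  // DFSAdmin#run already sets the -1 exit value, prints the error
  // message, and logs the stack trace at DEBUG level.
  public int runPropagating(ClientDatanodeProtocolStub dnProxy) throws IOException {
    System.out.println("Balancer bandwidth is " + dnProxy.getBalancerBandwidth());
    return 0;
  }
}
{code}

With the propagating shape, an unreachable datanode should surface the underlying ConnectException text on stderr (and the full stack trace at DEBUG) rather than the bare "Datanode unreachable." line.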