[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027905#comment-17027905 ]

Hudson commented on HDFS-7175:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17923 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17923/])
HDFS-7175. Client-side SocketTimeoutException during Fsck. Contributed (weichiu: rev 1e3a0b0d931676b191cb4813ed1a283ebb24d4eb)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md

> Client-side SocketTimeoutException during Fsck
> ----------------------------------------------
>
>                 Key: HDFS-7175
>                 URL: https://issues.apache.org/jira/browse/HDFS-7175
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Carl Steinbach
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-7157.004.patch, HDFS-7175.2.patch, HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option).
> We have observed that without status reporting the client will abort with read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck /
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
> 	at java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:152)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:122)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> 	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> 	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
> 	at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
> 	at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> 	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
> 	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read, it will abort if the time required to complete the fsck operation is longer than the client's read timeout setting.
> I can think of a couple ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger an HTTP response with a zero length payload. This may be enough to keep the client from hanging up.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
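[Editorial note: the failure mode described in the issue can be reproduced in miniature with nothing but the JDK. In this illustrative sketch, which is not Hadoop code, a client whose read timeout is shorter than the server's silence fails exactly as in the stack trace above; the `/fsck` path and the timeout values are arbitrary choices.]

```java
// Demonstrates a client-side read timeout against a silent HTTP server.
// The server stands in for a NameNode busy with a long fsck scan: it
// accepts the request but sends no bytes for 2 seconds, while the client
// only tolerates 500 ms of silence.
import com.sun.net.httpserver.HttpServer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URL;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/fsck", exchange -> {
            // Stay silent longer than the client's read timeout.
            try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
            byte[] body = "done".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/fsck");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(500); // shorter than the server's 2s silence
        try {
            conn.getInputStream().read();
            System.out.println("unexpected success");
        } catch (SocketTimeoutException e) {
            // Same "Read timed out" as in the fsck stack trace above.
            System.out.println("Read timed out");
        } finally {
            server.stop(0);
        }
    }
}
```

Any of the proposed fixes works by ensuring at least one byte reaches the client within each read-timeout window.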
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026142#comment-17026142 ]

Wei-Chiu Chuang commented on HDFS-7175:
---------------------------------------

Looks good to me, +1.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023275#comment-17023275 ]

Stephen O'Donnell commented on HDFS-7175:
-----------------------------------------

Yea, I ran it without the -showprogress switch, which gave this truncated output:

{code}
hdfs fsck /
2020-01-24 11:52:24,105 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:9870/fsck?ugi=sodonnell&path=%2F
FSCK started by sodonnell (auth:SIMPLE) from /127.0.0.1 for path / at Fri Jan 24 11:52:24 GMT 2020
. .. .. .. ..
 Missing block groups:          0
 Corrupt block groups:          0
 Missing internal blocks:       0
 Blocks queued for replication: 0
FSCK ended at Fri Jan 24 11:52:26 GMT 2020 in 1196 milliseconds
{code}

Note there are now 10 dots per line, while previously there would have been 100 per line. I also ran with -showprogress to ensure that still works, and it logs the expected deprecation warning:

{code}
hdfs fsck / -showprogress
2020-01-24 11:55:08,414 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:9870/fsck?ugi=sodonnell&showprogress=1&path=%2F
The fsck switch -showprogress is deprecated and no longer has any effect. Progress is now shown by default.
FSCK started by sodonnell (auth:SIMPLE) from /127.0.0.1 for path / at Fri Jan 24 11:55:09 GMT 2020
.
{code}
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023273#comment-17023273 ]

Wei-Chiu Chuang commented on HDFS-7175:
---------------------------------------

Makes sense to me [~sodonnell]. Have you verified that fsck prints dots as expected to keep the connection open?
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022997#comment-17022997 ]

Hadoop QA commented on HDFS-7175:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 23m 39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 35s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 10s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| | hadoop.hdfs.TestDeadNodeDetection |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-7175 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991733/HDFS-7157.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 82f9632c86e1 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 978c487 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28706/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28706/testReport/ |
| Max. process+thread count | 3595 (vs. ulimit of
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022879#comment-17022879 ]

Stephen O'Donnell commented on HDFS-7175:
-----------------------------------------

This issue has been dormant for a long time, but as I mentioned in HDFS-2538, we are starting to see a lot of fsck timeout issues caused by -showprogress being off by default. As we know fsck will fail on a large cluster without -showprogress, I would like to suggest we do the following:

1) Deprecate the -showprogress switch. For compatibility reasons, leave it in the code for now, but have it log a warning and have no effect if it is passed. Instead, progress will always be printed.

2) Change the logic to print a dot for every 100 files processed, rather than every file.

3) Flush the output buffer every 1000 items processed (including directories and symlinks as well as files) rather than every 100.

I did consider the merits of adding a -quiet switch, but as that would cause timeouts on medium and large clusters, it seems like a pointless addition.

With the above changes, we will cut down on the volume of progress output significantly, while avoiding the timeouts caused by zero progress reporting. I will attach a patch for this shortly.
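[Editorial note: the proposal above amounts to decoupling the dot cadence from the flush cadence. A minimal sketch of that idea follows; the class and method names, and the two interval constants, are hypothetical and not taken from the actual NamenodeFsck code.]

```java
// Sketch of a progress reporter that prints one dot per 100 files but
// flushes the response stream only every 1000 items, so the client sees
// periodic traffic without the server emitting a byte per file.
import java.io.PrintWriter;
import java.io.StringWriter;

public class ProgressKeepAlive {
    private final PrintWriter out;
    private long itemsProcessed = 0;
    // Assumed thresholds matching the suggestion above.
    private static final int DOT_INTERVAL = 100;    // one dot per 100 files
    private static final int FLUSH_INTERVAL = 1000; // flush every 1000 items

    public ProgressKeepAlive(PrintWriter out) { this.out = out; }

    public void itemDone() {
        itemsProcessed++;
        if (itemsProcessed % DOT_INTERVAL == 0) {
            out.print('.');
        }
        if (itemsProcessed % FLUSH_INTERVAL == 0) {
            // Pushes buffered bytes to the wire, resetting the client's read timer.
            out.flush();
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        ProgressKeepAlive p = new ProgressKeepAlive(out);
        for (int i = 0; i < 1000; i++) p.itemDone();
        out.flush();
        // 1000 items at one dot per 100 files -> 10 dots reach the sink.
        System.out.println(sink.toString().length());
    }
}
```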
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022604#comment-17022604 ]

Wei-Chiu Chuang commented on HDFS-7175:
---------------------------------------

[~sodonnell] it looks to me like HDFS-7175 wanted to do what you commented in HDFS-2538 as the middle-ground approach: reducing the frequency of dots. However, the patch posted didn't work (HDFS-2538.3.patch). I suspect this is the bug in the code:

{code}
+if ((showprogress) && res.totalFiles % 100 == 0) {
+  out.println();
+  out.flush();
+}
{code}

I think the if clause shouldn't need to check for showprogress. It should flush for every 100 files regardless.
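[Editorial note: a sketch of the condition as suggested, where -showprogress only controls whether dots are echoed while the periodic newline-and-flush happens unconditionally. The method and variable names here are illustrative, not the actual NamenodeFsck code.]

```java
// The flush keep-alive must not be gated on showprogress: even a client
// that asked for quiet output needs periodic bytes to avoid a read timeout.
import java.io.PrintWriter;
import java.io.StringWriter;

public class FlushEveryHundred {
    static void onFileProcessed(long totalFiles, boolean showprogress, PrintWriter out) {
        if (showprogress) {
            out.print('.');       // dots remain opt-in
        }
        if (totalFiles % 100 == 0) { // no showprogress guard here
            out.println();
            out.flush();
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        for (long n = 1; n <= 300; n++) onFileProcessed(n, false, out);
        out.flush();
        // With showprogress off we still emit a newline (and flush) every 100 files.
        System.out.println(sink.toString().chars().filter(c -> c == '\n').count());
    }
}
```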
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304666#comment-14304666 ]

Akira AJISAKA commented on HDFS-7175:
-------------------------------------

Tried tcpdump with JDK8. The channel was quiet without the -showprogress option.

bq. If this sounds fine, I can work on a patch to do this. I am also fine if Akira wants to work on the patch, or has alternative solutions.

Yeah, you can work on a patch :) One comment:

bq. Change the server to disregard the showprogress option, and send out dots every N (=10) seconds no matter what.

I want to reduce network load, so would you send a dot per 100 files if the -showprogress option is not specified? If you scan 1G files, the server will send an extra 1GB to the client.
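[Editorial note: the network-load arithmetic behind that suggestion, as a quick back-of-envelope check; the 1G-files figure is taken from the comment above.]

```java
// One progress dot is one byte, so dot frequency translates directly
// into extra bytes on the wire for a full-namespace fsck.
public class DotTraffic {
    public static void main(String[] args) {
        long files = 1_000_000_000L;    // "1G files" from the comment above
        long bytesPerFile = files;      // one dot per file: ~1 GB of dots
        long bytesPer100 = files / 100; // one dot per 100 files: ~10 MB
        System.out.println(bytesPerFile);
        System.out.println(bytesPer100);
    }
}
```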
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304425#comment-14304425 ]

Akira AJISAKA commented on HDFS-7175:

bq. I could see that the dots were sent out in the channel when -showprogress was specified, but the channel was quiet when it was not.

I tried tcpdump and confirmed this in JDK7. I'll try this with JDK8.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297190#comment-14297190 ]

Subbu commented on HDFS-7175:

The problem that we face is that if we turn on showprogress, then the fsck command takes much longer (about 50% longer), not to mention the gazillion dots printed out. If we disable the dots, the timeout problem happens. We did some quick performance analysis on what is causing the 50% extra time, and it turns out that it is actually printing dots to the tty. From my earlier experiment with tcpdump, it seems that we need to send something on the channel to keep it alive. So, here is a proposed solution:
* Change the server to disregard the showprogress option, and send out dots every N (=10) seconds no matter what.
* Change the client to filter out any line that has only dots in it, if the showprogress option is not specified.
* Maybe take N as an additional option (e.g. progressFrequencySec), or make it configurable in hdfs-site.xml, or leave it at 10 (for now at least).

If this sounds fine, I can work on a patch to do this. I am also fine if Akira wants to work on the patch, or has alternative solutions.
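The client-side half of the proposal above can be sketched as a small filter that drops lines consisting solely of progress dots unless -showprogress was requested. This is only an illustration of the idea; `ProgressFilter` and its method names are hypothetical, not actual DFSck code.

```java
/**
 * Hypothetical sketch of the proposed client-side change: the server
 * always emits keep-alive progress dots, and the client suppresses any
 * line made up only of dots when -showprogress was not specified.
 */
public class ProgressFilter {
    /** True if the line is keep-alive progress output (dots only). */
    static boolean isProgressOnly(String line) {
        return !line.isEmpty() && line.chars().allMatch(c -> c == '.');
    }

    /** Drop dot-only lines unless the user asked to see progress. */
    static String filterProgress(String output, boolean showProgress) {
        if (showProgress) {
            return output; // user wants the dots; pass through as-is
        }
        StringBuilder sb = new StringBuilder();
        for (String line : output.split("\n")) {
            if (!isProgressOnly(line)) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For example, `filterProgress("....\nStatus: HEALTHY\n", false)` would keep only the status line, while passing `true` preserves the dots for users who asked for them.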
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297207#comment-14297207 ]

Subbu commented on HDFS-7175:

I tried on JDK7. Note that the timeout happens only on large clusters (that take more than a minute to scan). [~ajisakaa] did you try out tcpdump?
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297199#comment-14297199 ]

Allen Wittenauer commented on HDFS-7175:

I'm talking specifically about the null not getting sent across the socket, since it sounds like a) it did work for [~ajisakaa] and b) I know that LI has mostly transitioned over to JDK8.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295782#comment-14295782 ]

Allen Wittenauer commented on HDFS-7175:

What are the chances this is a JDK7 vs. JDK8 change in behavior?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294591#comment-14294591 ]

Hadoop QA commented on HDFS-7175:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673576/HDFS-7175.3.patch against trunk revision 0a05ae1.
* {color:green}+1 @author{color}. The patch does not contain any @author tags.
* {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
* {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
* {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
* {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
* {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
* {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
* {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBlockScanner. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeDeath

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9351//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9351//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294338#comment-14294338 ]

Subbu commented on HDFS-7175:

I apologize for the delay in verification of this bug. I have now verified that no matter what the value is for the frequency of flush, the solution does NOT work. Basically, the flush() call has no effect since there are no bytes to flush. Here is what I did to verify this:
* Brought up a single-node cluster.
* Changed the frequency of flush to 1 (instead of 10k or 100k).
* Ran fsck on a small directory with 10 files, both with and without the -showprogress option.
* Ran tcpdump on the namenode port to capture packets during the session.

I could see that the dots were sent out in the channel when -showprogress was specified, but the channel was quiet when it was not. So, we need to think of another way to solve the problem.
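The tcpdump result above matches how buffered Java streams behave: flush() only pushes bytes that are already buffered, so flushing with nothing written produces no traffic at all. A minimal illustration (using an in-memory stream to stand in for the socket, an assumption for demonstration only):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/**
 * Demonstrates why calling flush() alone cannot act as a keep-alive:
 * flushing an empty buffer writes nothing to the underlying stream.
 */
public class FlushDemo {
    /** Bytes reaching the underlying stream after a bare flush(). */
    static int bytesAfterEmptyFlush() throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(wire);
        out.flush();              // nothing buffered -> nothing written
        return wire.size();
    }

    /** Bytes reaching the underlying stream after write + flush. */
    static int bytesAfterDotFlush() throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(wire);
        out.write('.');           // one progress dot buffered
        out.flush();              // now one byte actually goes out
        return wire.size();
    }
}
```

This is consistent with the fix direction discussed above: at least one real byte (a dot) has to be written before flushing for anything to reach the client.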
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294442#comment-14294442 ]

Subbu commented on HDFS-7175:

One way to fix this may be to put out the "." on the server even if -showprogress is not specified, and then filter it out in the client (if the option is not specified). Seems like a hacky solution, though.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209988#comment-14209988 ]

Subbu commented on HDFS-7175:

The number 1 does not work in our large cluster. (Sorry for the delay in verification; the problem is reproduced only in our large clusters, and we need to coordinate to schedule some time to test this.) We will try with 1000 or 100 and see if they work.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209989#comment-14209989 ] Subbu commented on HDFS-7175:
Let me clarify that we see the same timeout issue by flushing every 10k files.
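The failure mode under discussion can be sketched independently of DFSck. The following is an illustrative fragment (not the actual DFSck code; the class name and URL are hypothetical) showing the client-side knob involved: if the NameNode writes no bytes for longer than the configured read timeout, the next read throws the `java.net.SocketTimeoutException: Read timed out` seen in the stack trace.

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class FsckConnectionSketch {
    // Opens a connection to a (hypothetical) fsck servlet URL with an explicit
    // read timeout. A server that buffers output without flushing for longer
    // than readTimeoutMs causes read() to throw SocketTimeoutException.
    public static HttpURLConnection open(String fsckUrl, int readTimeoutMs) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(fsckUrl).openConnection();
        conn.setReadTimeout(readTimeoutMs); // 0 would mean an infinite timeout (option 1 below, "not a good idea")
        return conn;
    }
}
```

Note that `openConnection()` does not perform network I/O; the timeout only matters once the response is actually read.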
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165434#comment-14165434 ] Allen Wittenauer commented on HDFS-7175:
bq. could we at least consider making number of files a configurable option (with a reasonable default value of course) as a feature...
Probably better to handle that as a separate JIRA, given that there will likely be lots of discussion around options, etc. Plus that is a feature request, whereas the current code here is all bug fix.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163240#comment-14163240 ] Vinayakumar B commented on HDFS-7175:
Below changes could serve the purpose mentioned by [~aw], with one line duplication ;)
{code}
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
@@ -376,6 +376,9 @@ void check(String parent, HdfsFileStatus file, Result res) throws IOException {
     if ((showprogress) && res.totalFiles % 100 == 0) {
       out.println();
       out.flush();
+    } else if (res.totalFiles % 10000 == 0) {
+      // flush the buffer periodically to prevent SocketTimeoutException
+      out.flush();
     }
     int missing = 0;
     int corrupt = 0;
{code}
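The reason the periodic `out.flush()` in the patch above matters can be shown in isolation: a `PrintWriter` constructed over an `OutputStream` buffers its output, so nothing reaches the underlying stream (and hence the socket) until `flush()` is called. A minimal self-contained sketch, using a `ByteArrayOutputStream` in place of the HTTP response stream (the class and method names here are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushDemo {
    // Without flush(), the written character sits in the writer's buffer
    // and the underlying stream sees nothing.
    public static int visibleBytesWithoutFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(sink); // buffered, no autoflush
        out.print(".");
        return sink.size(); // 0: nothing has reached the sink yet
    }

    // With flush(), the byte is pushed through to the underlying stream,
    // which over a socket is what resets the client's read-timeout clock.
    public static int visibleBytesAfterFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(sink);
        out.print(".");
        out.flush();
        return sink.size(); // 1
    }
}
```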
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163323#comment-14163323 ] Akira AJISAKA commented on HDFS-7175:
bq. elapsed time since the last flush adds a whole new level of complexity.
I agree. In addition, calculating elapsed time seems costly.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163498#comment-14163498 ] Hadoop QA commented on HDFS-7175:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12673576/HDFS-7175.3.patch
against trunk revision 1efd9c9.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
org.apache.hadoop.hdfs.server.balancer.TestBalancer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8350//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8350//console
This message is automatically generated.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164137#comment-14164137 ] Allen Wittenauer commented on HDFS-7175:
It'd be good to hear from LinkedIn to see if the current patch fixes the issue for them. I'm +1 on the current patch and will commit after some confirmation.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164290#comment-14164290 ] Bob Liu commented on HDFS-7175:
I understand the complexity of adding the time-based flush(), but could we at least consider making the number of files a configurable option (with a reasonable default value, of course) as a feature...
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161786#comment-14161786 ] Akira AJISAKA commented on HDFS-7175:
Thanks [~mcvsubbu] for the comment. [~aw], I'm thinking there are two options:
# apply the v1 patch (i.e. flush every 100 files) and file a separate jira to change the flush frequency
# discuss what frequency is best and create a patch
Since this issue is to fix the SocketTimeoutException, the first option makes sense to me.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162219#comment-14162219 ] Bob Liu commented on HDFS-7175:
As a feature request, I am wondering if it's possible to make this a configurable option for the OPS folks (either based on the elapsed time since the last flush OR number of files)?
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162961#comment-14162961 ] Allen Wittenauer commented on HDFS-7175: bq. I would go back to pre- HDFS-2538 behavior (i.e. flush every 100 files). Any particular reason as to why? In any case, I think this could be handled in such a way that: if (showprogress) { every 100 print a period and flush } else { every 10k flush } ... which accomplishes both goals. I get the impression that [~ajisakaa] is trying to reduce code duplication, but I'm not that concerned about it given the size of the code here. :) bq. As a feature request, I am wondering if it's possible to make this a configurable option for the OPS folks (either based on the elapsed time since the last flush OR number of files)? We'd still have to have reasonable defaults. Also, elapsed time since the last flush adds a whole new level of complexity. Client-side SocketTimeoutException during Fsck -- Key: HDFS-7175 URL: https://issues.apache.org/jira/browse/HDFS-7175 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Carl Steinbach Assignee: Akira AJISAKA Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option). 
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161011#comment-14161011 ]

Subbu commented on HDFS-7175:

I would go back to pre-HDFS-2538 behavior (i.e. flush every 100 files).
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156224#comment-14156224 ]

Akira AJISAKA commented on HDFS-7175:

bq. I'll test the patch in my environment.

I've tested it on my VM and confirmed that flushing an empty buffer prevents the SocketTimeoutException.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156227#comment-14156227 ]

Hadoop QA commented on HDFS-7175:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672511/HDFS-7175.2.patch
against trunk revision 9e40de6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8297//console

This message is automatically generated.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154546#comment-14154546 ]

Akira AJISAKA commented on HDFS-7175:

Attached a patch to fix this via option 3 (flushing an empty buffer periodically). I'll test the patch in my environment.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154705#comment-14154705 ]

Hadoop QA commented on HDFS-7175:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672284/HDFS-7175.patch
against trunk revision 17d1202.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warning.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
* org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
* org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
* org.apache.hadoop.hdfs.TestRollingUpgradeRollback

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8288//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8288//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8288//console

This message is automatically generated.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155970#comment-14155970 ]

Mohammad Kamrul Islam commented on HDFS-7175:

Patch looks good to me. Can you please address the test case failure?
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155999#comment-14155999 ]

Akira AJISAKA commented on HDFS-7175:

These test failures look unrelated to the patch. Several jiras track them:
* TestEncryptionZonesWithKMS: BUILDS-17 (failed by "Too many open files")
* TestPipelinesFailover: HDFS-6694
* TestRollingUpgradeRollback: no jira
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156017#comment-14156017 ]

Hadoop QA commented on HDFS-7175:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672484/HDFS-7175.patch
against trunk revision 9e40de6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8294//console

This message is automatically generated.
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156051#comment-14156051 ]

Allen Wittenauer commented on HDFS-7175:

Doing this every 100 files is way too frequent. Every write to that socket blocks the fsck. For a large enough HDFS where this is a problem, that's easily 200k+ pauses!
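To put rough numbers on that concern (illustrative arithmetic only; the namespace size and per-flush latency are assumptions, not measurements from the thread): a namespace of 20 million files flushed every 100 files means 200,000 flushes, and even 1 ms of blocking per flush adds over three minutes to the fsck.

```java
/** Back-of-the-envelope illustration of the flush-frequency concern
 *  above. The file count and per-flush latency are assumed values. */
public class FlushCostSketch {
    public static void main(String[] args) {
        long files = 20_000_000L;      // assumed namespace size
        long flushInterval = 100;      // flush every 100 files
        double msPerFlush = 1.0;       // assumed blocking-write latency

        long flushes = files / flushInterval;              // 200,000 pauses
        double addedSeconds = flushes * msPerFlush / 1000;  // ~200 s of added wall time

        System.out.println(flushes + " flushes, ~" + addedSeconds + " s added");
    }
}
```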