[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication
[ https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373663#comment-15373663 ]

Elliott Clark commented on HDFS-10564:
--------------------------------------

It doesn't seem to be working. Multiple times per day the number of under-min-replicated blocks stays elevated for hours.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication
[ https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362879#comment-15362879 ]

Elliott Clark commented on HDFS-10564:
--------------------------------------

Yeah, sorry: "draining" here means decommissioning.
[jira] [Created] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication
Elliott Clark created HDFS-10564:
------------------------------------

             Summary: UNDER MIN REPL'D BLOCKS should be prioritized for replication
                 Key: HDFS-10564
                 URL: https://issues.apache.org/jira/browse/HDFS-10564
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Elliott Clark

When datanodes get drained it is probably because the hardware is bad or suspect. Blocks that have no other live replicas should be prioritized for replication; however, that appears not to be the case at all. Draining full nodes with lots of blocks but only a handful of under-min-replicated blocks takes about the full drain time before fsck reports clean again.
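The prioritization the report asks for can be sketched as a comparator that orders pending blocks by live-replica count, so anything below the minimum replication factor drains first. This is a standalone illustration under assumed names (`BlockInfo`, `MIN_REPLICATION`), not the NameNode's actual replication-queue code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a block's replication state.
class BlockInfo {
    final long blockId;
    final int liveReplicas;
    BlockInfo(long blockId, int liveReplicas) {
        this.blockId = blockId;
        this.liveReplicas = liveReplicas;
    }
}

public class ReplicationPriority {
    static final int MIN_REPLICATION = 1;

    // Blocks under the minimum replication factor sort before all others;
    // within each group, fewer live replicas means higher priority.
    static final Comparator<BlockInfo> PRIORITY =
        Comparator.<BlockInfo>comparingInt(b -> b.liveReplicas < MIN_REPLICATION ? 0 : 1)
                  .thenComparingInt(b -> b.liveReplicas);

    public static void main(String[] args) {
        List<BlockInfo> pending = new ArrayList<>();
        pending.add(new BlockInfo(1L, 2));  // healthy enough
        pending.add(new BlockInfo(2L, 0));  // no live replicas: replicate first
        pending.add(new BlockInfo(3L, 1));
        pending.sort(PRIORITY);
        System.out.println(pending.get(0).blockId);  // prints 2
    }
}
```

Ordering the work this way would let a handful of under-min-replicated blocks finish long before the bulk of an ordinary drain.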
[jira] [Commented] (HDFS-9859) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313207#comment-15313207 ]

Elliott Clark commented on HDFS-9859:
-------------------------------------

That would be great.
[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted
[ https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181894#comment-15181894 ]

Elliott Clark commented on HDFS-9906:
-------------------------------------

+1

> Remove spammy log spew when a datanode is restarted
> ---------------------------------------------------
>
>                 Key: HDFS-9906
>                 URL: https://issues.apache.org/jira/browse/HDFS-9906
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Brahma Reddy Battula
>         Attachments: HDFS-9906.patch
[jira] [Created] (HDFS-9906) Remove spammy log spew when a datanode is restarted
Elliott Clark created HDFS-9906:
-----------------------------------

             Summary: Remove spammy log spew when a datanode is restarted
                 Key: HDFS-9906
                 URL: https://issues.apache.org/jira/browse/HDFS-9906
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.7.2
            Reporter: Elliott Clark

{code}
WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 268435456
{code}

This happens way too much to add any useful information. We should either move this to a different level or only warn once per machine.
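The second option suggested above, warning only once per machine, can be sketched with a concurrent set keyed by the datanode address. A minimal illustration; the class name and the way the NameNode would wire this into its logger are assumptions, not the actual code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WarnOncePerNode {
    // Nodes we have already warned about. In a real daemon this would be
    // cleared when a node re-registers, so a fresh restart warns again.
    private static final Set<String> warnedNodes = ConcurrentHashMap.newKeySet();

    // add() returns true only the first time this node address is seen,
    // so the redundant-addStoredBlock warning fires once per machine.
    static boolean shouldWarn(String nodeAddress) {
        return warnedNodes.add(nodeAddress);
    }

    public static void main(String[] args) {
        System.out.println(shouldWarn("192.168.1.1:50010")); // prints true
        System.out.println(shouldWarn("192.168.1.1:50010")); // prints false
    }
}
```

Demoting the message to DEBUG is the simpler alternative; the set approach keeps one WARN per restarted node without losing the signal entirely.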
[jira] [Updated] (HDFS-9859) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9859:
--------------------------------
    Affects Version/s: 2.8.0
[jira] [Created] (HDFS-9859) Backport HDFS-6440 to branch-2
Elliott Clark created HDFS-9859:
-----------------------------------

             Summary: Backport HDFS-6440 to branch-2
                 Key: HDFS-9859
                 URL: https://issues.apache.org/jira/browse/HDFS-9859
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Elliott Clark
            Assignee: Elliott Clark

HDFS-6440 is a very interesting feature for people who want to run HDFS in an environment where machines have to join and leave a cluster. Until 3.0 is close, we should encourage that.
[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143279#comment-15143279 ]

Elliott Clark commented on HDFS-9669:
-------------------------------------

Nope, I just missed it on a pull. Sorry about the confusion. Thanks for the many, many backports [~cmccabe].

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>             Fix For: 2.7.3
>
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131403#comment-15131403 ]

Elliott Clark commented on HDFS-9669:
-------------------------------------

Thanks [~cmccabe]. Any chance of getting this on branch-2?
[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116012#comment-15116012 ]

Elliott Clark commented on HDFS-9669:
-------------------------------------

Ping? This is running in production and removes thousands of TCP resets.
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.1.patch

Checkstyle nit.
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Affects Version/s: 2.7.2
               Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.0.patch

Straightforward patch to make sure that all the places that bind use the listen-backlog setting.
[jira] [Assigned] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark reassigned HDFS-9669:
-----------------------------------

    Assignee: Elliott Clark
[jira] [Created] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
Elliott Clark created HDFS-9669:
-----------------------------------

             Summary: TcpPeerServer should respect ipc.server.listen.queue.size
                 Key: HDFS-9669
                 URL: https://issues.apache.org/jira/browse/HDFS-9669
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Elliott Clark

During periods of high traffic we are seeing:

{code}
16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
{code}

At the time this happens there are far fewer active xceivers than configured. On most JDKs the default bind makes 50 the total listen backlog at any time. This effectively means that any GC plus busy time will result in TCP resets.

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370
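The fix boils down to passing the configured backlog to every bind call. When no backlog is given, `ServerSocket` falls back to an implementation-specific default (50 on most JDKs), which is exactly what the report observes. A hedged sketch of the difference; in Hadoop the value would come from `ipc.server.listen.queue.size` via `Configuration`, but here it is read by hand so the snippet stays self-contained:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BacklogBind {
    public static void main(String[] args) throws IOException {
        // Default bind: the listen backlog silently falls back to the
        // JDK default (typically 50), so connection bursts get RSTs.
        ServerSocket defaultServer = new ServerSocket();
        defaultServer.bind(new InetSocketAddress(0));

        // Respecting the configured queue size: pass it as the backlog.
        int backlog = Integer.parseInt(
            System.getProperty("ipc.server.listen.queue.size", "128"));
        ServerSocket tunedServer = new ServerSocket();
        tunedServer.bind(new InetSocketAddress(0), backlog);

        System.out.println(tunedServer.isBound());  // prints true
        defaultServer.close();
        tunedServer.close();
    }
}
```

A deeper backlog lets the kernel queue connections through a GC pause instead of resetting them, which matches the "thousands of TCP resets" removed in production above.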
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086320#comment-15086320 ]

Elliott Clark commented on HDFS-6440:
-------------------------------------

+1 for branch-2, please.

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 3.0.0
>
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
> Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests.
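On the configuration-parsing side the description mentions, the feature amounts to accepting more than two NameNode IDs per nameservice. A sketch of what an hdfs-site.xml fragment for three NameNodes might look like; the nameservice ID, NameNode IDs, and hostname below are made-up examples, and the per-NN `rpc-address` entries for nn1 and nn2 are elided:

```xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- Three NameNode IDs instead of the former limit of two. -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>namenode3.example.com:8020</value>
  </property>
</configuration>
```

With more than one standby, any surviving standby can be promoted on failover, which is what makes the feature attractive for clusters where machines join and leave.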
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9087:
--------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

This was pushed a while ago.

> Add some jitter to DataNode.checkDiskErrorThread
> ------------------------------------------------
>
>                 Key: HDFS-9087
>                 URL: https://issues.apache.org/jira/browse/HDFS-9087
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, HDFS-9087-v2.patch, HDFS-9087-v3.patch
>
> If all datanodes are started across a cluster at the same time (or errors in the network cause IOExceptions), there can be storms where lots of datanodes check their disks at the exact same time.
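The jitter being added can be sketched as a randomized offset around the base check interval, so datanodes started together do not all hit their disks at the same instant. A standalone illustration only; the interval and jitter fraction are made-up values, not those of the committed patch:

```java
import java.util.concurrent.ThreadLocalRandom;

public class CheckIntervalJitter {
    static final long BASE_INTERVAL_MS = 5_000;
    static final double JITTER_FRACTION = 0.5;  // up to +/-50% of the base

    // Next delay = base interval plus a uniformly random jitter, so a
    // fleet of nodes started simultaneously spreads its disk checks out.
    static long nextDelayMs() {
        long jitterBound = (long) (BASE_INTERVAL_MS * JITTER_FRACTION);
        long jitter = ThreadLocalRandom.current().nextLong(-jitterBound, jitterBound + 1);
        return BASE_INTERVAL_MS + jitter;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long d = nextDelayMs();
            // Always inside [2500, 7500] for these constants.
            System.out.println(d >= 2_500 && d <= 7_500);  // prints true
        }
    }
}
```

The same trick applies to any periodic task that a whole fleet starts at once, not just disk checks.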
[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9266:
--------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Avoid unsafe split and append on fields that might be IPv6 literals
> -------------------------------------------------------------------
>
>                 Key: HDFS-9266
>                 URL: https://issues.apache.org/jira/browse/HDFS-9266
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Nemanja Matkovic
>            Assignee: Nemanja Matkovic
>              Labels: ipv6
>         Attachments: HDFS-9266-HADOOP-11890.1.patch, HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
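The unsafe pattern the title refers to is calling `split(":")` on a host:port field that may hold an IPv6 literal, which contains colons of its own. A hedged sketch of the failure and a bracket-aware alternative; the helper names are mine, not the patch's:

```java
public class HostPortParse {
    // Naive split: breaks for IPv6 literals like [2001:db8::1]:50010.
    static String naiveHost(String addr) {
        return addr.split(":")[0];
    }

    // Bracket-aware: IPv6 literals are bracketed, and for unbracketed
    // addresses the port follows the last colon.
    static String safeHost(String addr) {
        if (addr.startsWith("[")) {
            return addr.substring(1, addr.indexOf(']'));
        }
        int colon = addr.lastIndexOf(':');
        return colon < 0 ? addr : addr.substring(0, colon);
    }

    public static void main(String[] args) {
        String v6 = "[2001:db8::1]:50010";
        System.out.println(naiveHost(v6));  // prints "[2001" -- mangled
        System.out.println(safeHost(v6));   // prints "2001:db8::1"
        System.out.println(safeHost("10.138.178.47:50010")); // prints "10.138.178.47"
    }
}
```

The same reasoning applies to the append direction: building "host:port" from an IPv6 host needs brackets added around the literal first.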
[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9266:
--------------------------------
    Summary: Avoid unsafe split and append on fields that might be IPv6 literals  (was: hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 literals)
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971590#comment-14971590 ]

Elliott Clark commented on HDFS-9289:
-------------------------------------

It had all of the data and the same md5sums when I checked, so the only thing different was the genstamps. Not really sure why that happened, but I didn't mean to sidetrack this jira. The test looks nice.

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
> We have seen a case of a corrupt block caused by a file completing after a pipeline update, but completing with the old block genStamp. This caused the replicas on the two datanodes in the updated pipeline to be viewed as corrupt. Propose to check the genstamp when committing the block.
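The proposed check, rejecting a file-complete whose generation stamp no longer matches the block map, can be sketched as a guard at commit time. Purely illustrative: the real commit path in the NameNode is more involved, and the class and method names here are hypothetical. The message also carries both genstamps, per the suggestion further down this thread:

```java
public class GenStampCheck {
    static class StoredBlock {
        final long id;
        long genStamp;
        StoredBlock(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
    }

    // Reject a complete-file request carrying a stale generation stamp,
    // e.g. one from before a pipeline update bumped it.
    static void commitBlock(StoredBlock stored, long reportedGenStamp) {
        if (stored.genStamp != reportedGenStamp) {
            throw new IllegalStateException(
                "Commit of block " + stored.id + " rejected: expected genstamp "
                + stored.genStamp + " but client reported " + reportedGenStamp);
        }
    }

    public static void main(String[] args) {
        StoredBlock blk = new StoredBlock(1190230043L, 116737586L);
        try {
            commitBlock(blk, 116735085L);  // stale stamp from the old pipeline
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        commitBlock(blk, 116737586L);      // matching stamp: commit proceeds
        System.out.println("ok");
    }
}
```

Failing the complete() call early this way surfaces the mismatch to the writer instead of silently marking healthy replicas corrupt later.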
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970213#comment-14970213 ]

Elliott Clark commented on HDFS-9289:
-------------------------------------

{code}
15/10/22 09:37:36 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW replica with genstamp 116735085 does not match COMPLETE block's genstamp in block map 116737586
{code}

Block lengths on the "corrupt" replicas are the same as on the non-corrupt ones. The only difference is the genstamp.
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969933#comment-14969933 ] Elliott Clark commented on HDFS-9289: - Also, can we add the expected and encountered genStamps to the exception message? > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/browse/HDFS-9289 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Critical > Attachments: HDFS-9289.1.patch > > > We have seen a case of a corrupt block caused by a file completing after a > pipelineUpdate, but completing with the old block genStamp. This caused the > replicas on two datanodes in the updated pipeline to be viewed as > corrupt. Propose to check the genStamp when committing the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9289: Priority: Critical (was: Major) > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/browse/HDFS-9289 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Critical > Attachments: HDFS-9289.1.patch > > > We have seen a case of a corrupt block caused by a file completing after a > pipelineUpdate, but completing with the old block genStamp. This caused the > replicas on two datanodes in the updated pipeline to be viewed as > corrupt. Propose to check the genStamp when committing the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9266) hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969896#comment-14969896 ] Elliott Clark commented on HDFS-9266: - +1, let's get this in and then we can rebase on master. > hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 > literals > - > > Key: HDFS-9266 > URL: https://issues.apache.org/jira/browse/HDFS-9266 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Nemanja Matkovic >Assignee: Nemanja Matkovic > Labels: ipv6 > Attachments: HDFS-9266-HADOOP-11890.1.patch, > HDFS-9266-HADOOP-11890.2.patch > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969890#comment-14969890 ] Elliott Clark commented on HDFS-9289: - We just had something very similar happen on a prod cluster. Then the datanode holding the only complete block was shut off for repair. {code} 15/10/22 06:29:32 INFO hdfs.StateChange: BLOCK* allocateBlock: /TESTCLUSTER-HBASE/WALs/hbase4544.test.com,16020,1444266312515/hbase4544.test.com%2C16020%2C1444266312515.default.1445520572440. BP-1735829752-10.210.49.21-1437433901380 blk_1190230043_116735085{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW], ReplicaUnderConstruction[[DISK]DS-52d9a122-a46a-4129-ab3d-d9041de109f8:NORMAL:10.210.31.48:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c734b72e-27de-4dd4-a46c-7ae59f6ef792:NORMAL:10.210.31.38:50010|RBW]]} 15/10/22 06:32:48 INFO namenode.FSNamesystem: updatePipeline(block=BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085, newGenerationStamp=116737586, newLength=201675125, newNodes=[10.210.81.33:50010, 10.210.81.45:50010, 10.210.64.29:50010], clientName=DFSClient_NONMAPREDUCE_1976436475_1) 15/10/22 06:32:48 INFO namenode.FSNamesystem: updatePipeline(BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085) successfully to BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116737586 15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.64.29:50010 is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW], ReplicaUnderConstruction[[DISK]DS-d5f7fff9-005d-4804-a223-b6e6624d3af2:NORMAL:10.210.81.45:50010|RBW], 
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED]]} size 201681322 15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.81.45:50010 is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW], ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED]]} size 201681322 15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.210.81.33:50010 is added to blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-4d937567-7184-40b7-a822-c7e3b5d588d4:NORMAL:10.210.81.33:50010|FINALIZED]]} size 201681322 15/10/22 09:37:36 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW replica with genstamp 116735085 does not match COMPLETE block's genstamp in block map 116737586 15/10/22 09:37:36 INFO BlockStateChange: BLOCK* invalidateBlock: blk_1190230043_116735085(stored=blk_1190230043_116737586) on 10.210.31.38:50010 15/10/22 09:37:36 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1190230043_116735085 to 10.210.31.38:50010 15/10/22 09:37:39 INFO BlockStateChange: BLOCK* BlockManager: ask 10.210.31.38:50010 to delete [blk_1190230043_116735085] 15/10/22 12:45:03 INFO BlockStateChange: BLOCK* ask 10.210.64.29:50010 to replicate blk_1190230043_116737586 to 
datanode(s) 10.210.64.56:50010 15/10/22 12:45:07 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 10.210.64.29:50010 by hbase4496.test.com/10.210.64.56 because client machine reported it 15/10/22 12:50:49 INFO BlockStateChange: BLOCK* ask 10.210.81.45:50010 to replicate blk_1190230043_116737586 to datanode(s) 10.210.49.49:50010 15/10/22 12:50:55 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 10.210.81.45:50010 by hbase4478.test.com/10.210.49.49 because client machine reported it 15/10/22 12:56:01 WARN blockmanagement.BlockManager: PendingReplicationMonitor timed out blk_1190230043_116737586 {code} The patch will help but the issue will still be there. Is there some way to keep the genstamps from getting out of sync? > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/b
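The fix discussed in this thread amounts to rejecting a file complete/commit whose reported generation stamp is stale, and (per the review comment above) putting both the expected and encountered genStamps in the exception message. A minimal, hypothetical sketch of that check — the class and method names here are illustrative, not the actual BlockManager API from the patch:

```java
// Hedged sketch of the genStamp check proposed in HDFS-9289. Illustrative
// names only; the real patch lives in the NameNode's block-commit path.
public class GenStampCheck {

    /**
     * Rejects a commit whose reported genStamp does not match the stored one.
     * Both stamps are included in the message, as requested in review.
     */
    public static void checkGenStamp(long blockId, long reportedGenStamp,
                                     long storedGenStamp) {
        if (reportedGenStamp != storedGenStamp) {
            throw new IllegalStateException(
                "Commit of blk_" + blockId + " rejected: reported genStamp "
                + reportedGenStamp + " does not match stored genStamp "
                + storedGenStamp);
        }
    }
}
```

With the log above, the stale commit with genStamp 116735085 against the stored 116737586 would fail fast instead of completing the file with the old stamp.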
[jira] [Updated] (HDFS-8664) Allow wildcards in dfs.datanode.data.dir
[ https://issues.apache.org/jira/browse/HDFS-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-8664: Status: Patch Available (was: Open) > Allow wildcards in dfs.datanode.data.dir > > > Key: HDFS-8664 > URL: https://issues.apache.org/jira/browse/HDFS-8664 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, HDFS >Affects Versions: 3.0.0 >Reporter: Patrick White >Assignee: Patrick White > Attachments: HDFS-8664.001.patch, HDFS-8664.002.patch, > HDFS-8664.003.patch, TestBPOfferService-output.txt > > > We have many disks per machine (12+) that don't always have the same > numbering when they come back from provisioning, but they're always in the > same tree following the same pattern. > It would greatly reduce our config complexity to be able to specify a > wildcard for all the data directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8664) Allow wildcards in dfs.datanode.data.dir
[ https://issues.apache.org/jira/browse/HDFS-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-8664: Status: Open (was: Patch Available) > Allow wildcards in dfs.datanode.data.dir > > > Key: HDFS-8664 > URL: https://issues.apache.org/jira/browse/HDFS-8664 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, HDFS >Affects Versions: 3.0.0 >Reporter: Patrick White >Assignee: Patrick White > Attachments: HDFS-8664.001.patch, HDFS-8664.002.patch, > HDFS-8664.003.patch, TestBPOfferService-output.txt > > > We have many disks per machine (12+) that don't always have the same > numbering when they come back from provisioning, but they're always in the > same tree following the same pattern. > It would greatly reduce our config complexity to be able to specify a > wildcard for all the data directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9087: Attachment: HDFS-9087-v3.patch > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, > HDFS-9087-v2.patch, HDFS-9087-v3.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9087: Attachment: HDFS-9087-v2.patch > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, > HDFS-9087-v2.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905189#comment-14905189 ] Elliott Clark commented on HDFS-9087: - bq. Do we really need to add this configuration key? Nope, not really needed. I'll remove it. bq. It might be better to limit the jitter to 25% or 50% of the period. Sure. bq. Both code paths should set checkDiskErrorInterval the same way. Sure. > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, > HDFS-9087-v2.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
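The review exchange above settles on bounding the jitter at a fraction of the base period rather than a fixed configurable amount. A minimal sketch of that idea, assuming a hypothetical helper (the actual patch modifies DataNode.checkDiskErrorThread directly and its names differ):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hedged sketch of a jittered check interval: each cycle sleeps the base
// interval plus a random jitter capped at 50% of it, so datanodes that were
// started at the same moment drift apart instead of checking disks in sync.
public class CheckIntervalJitter {

    /** Returns the next sleep delay: base interval + up to 50% random jitter. */
    public static long nextDelayMs(long baseIntervalMs) {
        long maxJitterMs = baseIntervalMs / 2;  // cap jitter at half the period
        // nextLong(bound) is exclusive, so +1 makes maxJitterMs reachable
        return baseIntervalMs + ThreadLocalRandom.current().nextLong(maxJitterMs + 1);
    }
}
```

Because the jitter is proportional to the period, no separate configuration key is needed, which matches the first review comment.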
[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790746#comment-14790746 ] Elliott Clark commented on HDFS-9087: - On a large enough cluster, anything that can thundering-herd eventually will. In this case we're seeing it on disk IO, and before 2.7.1 we were seeing it on locking FsVolumeList. I suspect that we will now start to see this on block replication load. Anything that can de-sync these across the cluster is better. > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790740#comment-14790740 ] Elliott Clark commented on HDFS-9087: - Yeah, HDFS-8845 makes things better; however, it is still bad to have anything in a distributed system that can get multiple machines in sync. Lots of things happen when a disk is checked and then listed as bad. It's good to have a large cluster spread out so that nothing lines up. > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9087: Attachment: HDFS-9087-v1.patch > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9087: Status: Patch Available (was: Open) > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
[ https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9087: Attachment: HDFS-9087-v0.patch Adds 5 seconds of jitter. This has the added benefit of adding more time between disk checker runs; disks are almost never going to fail sequentially, one every five seconds. > Add some jitter to DataNode.checkDiskErrorThread > > > Key: HDFS-9087 > URL: https://issues.apache.org/jira/browse/HDFS-9087 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9087-v0.patch > > > If all datanodes are started across a cluster at the same time (or errors in > the network cause ioexceptions) there can be storms where lots of datanodes > check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread
Elliott Clark created HDFS-9087: --- Summary: Add some jitter to DataNode.checkDiskErrorThread Key: HDFS-9087 URL: https://issues.apache.org/jira/browse/HDFS-9087 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark If all datanodes are started across a cluster at the same time (or errors in the network cause ioexceptions) there can be storms where lots of datanodes check their disks at the exact same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads
[ https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HDFS-7492. - Resolution: Duplicate Fixed in HDFS-7531. Since there are no more locks on FsVolumeList, there is no longer any contention. > If multiple threads call FsVolumeList#checkDirs at the same time, we should > only do checkDirs once and give the results to all waiting threads > -- > > Key: HDFS-7492 > URL: https://issues.apache.org/jira/browse/HDFS-7492 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Colin Patrick McCabe >Assignee: Elliott Clark >Priority: Minor > > checkDirs is called when we encounter certain I/O errors. It's rare to get > just a single I/O error... normally you start getting many errors when a disk > is going bad. For this reason, we shouldn't start a new checkDirs scan for > each error. Instead, if multiple threads call FsVolumeList#checkDirs at > around the same time, we should only do checkDirs once and give the results > to all the waiting threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads
[ https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark reassigned HDFS-7492: --- Assignee: Elliott Clark > If multiple threads call FsVolumeList#checkDirs at the same time, we should > only do checkDirs once and give the results to all waiting threads > -- > > Key: HDFS-7492 > URL: https://issues.apache.org/jira/browse/HDFS-7492 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Colin Patrick McCabe >Assignee: Elliott Clark >Priority: Minor > > checkDirs is called when we encounter certain I/O errors. It's rare to get > just a single I/O error... normally you start getting many errors when a disk > is going bad. For this reason, we shouldn't start a new checkDirs scan for > each error. Instead, if multiple threads call FsVolumeList#checkDirs at > around the same time, we should only do checkDirs once and give the results > to all the waiting threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads
[ https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746491#comment-14746491 ] Elliott Clark commented on HDFS-7492: - I'm going to grab this one. We're seeing this in production. There's an un-related issue with one datanode locking up (still heart beating to NN but not able to make progress on anything that hits disks). So all datanodes talking to the bad node throw a bunch of IOExceptions. This causes a significant portion of the cluster to checkDiskError while the network issue is going on. FsDatasetImpl.checkDirs holds a lock so all new xceivers are blocked by the checkDiskError. This causes more time outs and basically serializes all reading and writing to blocks until everything on the cluster settles down. {code} "DataXceiver for client unix:/mnt/d2/hdfs-socket/dn.50010 [Passing file descriptors for block BP-1735829752-10.210.49.21-1437433901380:blk_1121816087_48310306]" #85474 daemon prio=5 os_prio=0 tid=0x7f10910b2800 nid=0x5d44f waiting for monitor entry [0x7f1072c06000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFileNoExistsCheck(FsDatasetImpl.java:606) - waiting to lock <0x0007015a3fe8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:618) at org.apache.hadoop.hdfs.server.datanode.DataNode.requestShortCircuitFdsForRead(DataNode.java:1524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:287) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:185) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:745) "DataXceiver 
for client DFSClient_NONMAPREDUCE_-1067692187_1 at /10.210.65.21:33560 [Receiving block BP-1735829752-10.210.49.21-1437433901380:blk_1121839247_48333595]" #85463 daemon prio=5 os_prio=0 tid=0x7f108933d800 nid=0x5d28e waiting for monitor entry [0x7f1072904000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:63) - waiting to lock <0x0007015a4030> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1084) - locked <0x0007015a3fe8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:745) "Thread-13149" #13302 daemon prio=5 os_prio=0 tid=0x7f10884a9000 nid=0xe9e7 runnable [0x7f1076e6] java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.createDirectory(Native Method) at java.io.File.mkdir(File.java:1316) at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:300) at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:307) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:183) - locked <0x0007015a4030> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1743) at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3002) at org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:240) at org.apach
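The coalescing this issue asks for — one checkDirs scan serving every thread that requests it concurrently — can be sketched with a FutureTask that the first caller runs and later callers wait on. This is a minimal, hypothetical helper, not the real FsVolumeList code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of call coalescing: the first thread to arrive runs the
// (expensive) scan; threads that arrive while it is in flight block on the
// same FutureTask and share its result instead of starting another scan.
public class CoalescingChecker<T> {

    private final AtomicReference<FutureTask<T>> inFlight = new AtomicReference<>();

    public T check(Callable<T> scan) {
        FutureTask<T> mine = new FutureTask<>(scan);
        FutureTask<T> current;
        if (inFlight.compareAndSet(null, mine)) {
            current = mine;
            try {
                mine.run();                // we won the race; do the scan
            } finally {
                inFlight.set(null);        // allow a fresh scan afterwards
            }
        } else {
            current = inFlight.get();      // piggyback on the in-flight scan
            if (current == null) {
                mine.run();                // it just finished; run our own
                current = mine;
            }
        }
        try {
            return current.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In the stack traces above, the threads blocked in FsDatasetImpl would instead wait on the single in-flight scan without holding the dataset lock, rather than each error kicking off its own checkDirs.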
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Resolution: Fixed Status: Resolved (was: Patch Available) > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Labels: BB2015-05-TBR > Attachments: 3node_get_200mb.png, 3node_put_200mb.png, > 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, > HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, > HDFS-5274-14.patch, HDFS-5274-15.patch, HDFS-5274-16.patch, > HDFS-5274-17.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, > HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, HDFS-5274-8.patch, > HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin Trace a06e941b0172ec73.png, > Zipkin Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628763#comment-14628763 ] Elliott Clark commented on HDFS-8078: - PING? > HDFS client gets errors trying to to connect to IPv6 DataNode > - > > Key: HDFS-8078 > URL: https://issues.apache.org/jira/browse/HDFS-8078 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.6.0 >Reporter: Nate Edel >Assignee: Nate Edel > Labels: BB2015-05-TBR, ipv6 > Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch > > > 1st exception, on put: > 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: 2401:db00:1010:70ba:face:0:8:0:50010 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) > at > org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) > Appears to actually stem from code in DataNodeID which assumes it's safe to > append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for > IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which > requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 > Currently using InetAddress.getByName() to validate IPv6 (guava > InetAddresses.forString has been flaky) but could also use our own parsing. 
> (From logging this, it seems like a low-enough frequency call that the extra > object creation shouldn't be problematic, and for me the slight risk of > passing in bad input that is not actually an IPv4 or IPv6 address and thus > calling an external DNS lookup is outweighed by getting the address > normalized and avoiding rewriting parsing.) > Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() > --- > 2nd exception (on datanode) > 15/04/13 13:18:07 ERROR datanode.DataNode: > dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown > operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: > /2401:db00:11:d010:face:0:2f:0:50010 > java.io.EOFException > at java.io.DataInputStream.readShort(DataInputStream.java:315) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) > at java.lang.Thread.run(Thread.java:745) > Which also comes as client error "-get: 2401 is not an IP string literal." > This one has existing parsing logic which needs to shift to the last colon > rather than the first. Should also be a tiny bit faster by using lastIndexOf > rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592426#comment-14592426 ] Elliott Clark commented on HDFS-8078: - +1 (non-binding) on the latest patch. I agree that more unit-testing and other things will be needed before we can declare full ipv6 support. However this is a pretty huge step in the right direction that we shouldn't let bit rot. It includes tests to keep regressions in this code from popping up and is tested on a real cluster. > HDFS client gets errors trying to to connect to IPv6 DataNode > - > > Key: HDFS-8078 > URL: https://issues.apache.org/jira/browse/HDFS-8078 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.6.0 >Reporter: Nate Edel >Assignee: Nate Edel > Labels: BB2015-05-TBR, ipv6 > Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch > > > 1st exception, on put: > 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: 2401:db00:1010:70ba:face:0:8:0:50010 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) > at > org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) > Appears to actually stem from code in DataNodeID which assumes it's safe to > append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for > IPv6. 
NetUtils.createSocketAddr( ) assembles a Java URI object, which > requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 > Currently using InetAddress.getByName() to validate IPv6 (guava > InetAddresses.forString has been flaky) but could also use our own parsing. > (From logging this, it seems like a low-enough frequency call that the extra > object creation shouldn't be problematic, and for me the slight risk of > passing in bad input that is not actually an IPv4 or IPv6 address and thus > calling an external DNS lookup is outweighed by getting the address > normalized and avoiding rewriting parsing.) > Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() > --- > 2nd exception (on datanode) > 15/04/13 13:18:07 ERROR datanode.DataNode: > dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown > operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: > /2401:db00:11:d010:face:0:2f:0:50010 > java.io.EOFException > at java.io.DataInputStream.readShort(DataInputStream.java:315) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) > at java.lang.Thread.run(Thread.java:745) > Which also comes as client error "-get: 2401 is not an IP string literal." > This one has existing parsing logic which needs to shift to the last colon > rather than the first. Should also be a tiny bit faster by using lastIndexOf > rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586351#comment-14586351 ] Elliott Clark commented on HDFS-8078: - I don't think that a feature branch is really needed here since each part gets things better. Lets just keep moving forward rather than one big bang integration. I'm +1 on the patch. I've seen the results on an ipv6 machine. > HDFS client gets errors trying to to connect to IPv6 DataNode > - > > Key: HDFS-8078 > URL: https://issues.apache.org/jira/browse/HDFS-8078 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.6.0 >Reporter: Nate Edel >Assignee: Nate Edel > Labels: BB2015-05-TBR, ipv6 > Attachments: HDFS-8078.9.patch > > > 1st exception, on put: > 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: 2401:db00:1010:70ba:face:0:8:0:50010 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) > at > org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) > Appears to actually stem from code in DataNodeID which assumes it's safe to > append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for > IPv6. 
NetUtils.createSocketAddr( ) assembles a Java URI object, which > requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 > Currently using InetAddress.getByName() to validate IPv6 (guava > InetAddresses.forString has been flaky) but could also use our own parsing. > (From logging this, it seems like a low-enough frequency call that the extra > object creation shouldn't be problematic, and for me the slight risk of > passing in bad input that is not actually an IPv4 or IPv6 address and thus > calling an external DNS lookup is outweighed by getting the address > normalized and avoiding rewriting parsing.) > Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() > --- > 2nd exception (on datanode) > 15/04/13 13:18:07 ERROR datanode.DataNode: > dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown > operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: > /2401:db00:11:d010:face:0:2f:0:50010 > java.io.EOFException > at java.io.DataInputStream.readShort(DataInputStream.java:315) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) > at java.lang.Thread.run(Thread.java:745) > Which also comes as client error "-get: 2401 is not an IP string literal." > This one has existing parsing logic which needs to shift to the last colon > rather than the first. Should also be a tiny bit faster by using lastIndexOf > rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-7834: Attachment: HDFS-7834-branch-2-0.patch Here's a patch for branch-2. > Allow HDFS to bind to ipv6 conditionally > > > Key: HDFS-7834 > URL: https://issues.apache.org/jira/browse/HDFS-7834 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-7834-branch-2-0.patch > > > Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. > While this was needed a while ago, IPv6 on Java now works much better, and > there should be a way to allow it to bind dual-stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-7834: Affects Version/s: 2.6.0 Fix Version/s: 2.7.0 > Allow HDFS to bind to ipv6 conditionally > > > Key: HDFS-7834 > URL: https://issues.apache.org/jira/browse/HDFS-7834 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.7.0 > > > Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. > While this was needed a while ago, IPv6 on Java now works much better, and > there should be a way to allow it to bind dual-stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
Elliott Clark created HDFS-7834: --- Summary: Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java now works much better, and there should be a way to allow it to bind dual-stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
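The conditional the issue asks for can be sketched as a small shell function: only force IPv4 when the operator has not opted into IPv6. Note that HADOOP_ALLOW_IPV6 is a hypothetical knob name for illustration, not necessarily what the attached patch uses.

```shell
#!/usr/bin/env bash
# Sketch of conditionally adding -Djava.net.preferIPv4Stack=true.
# HADOOP_ALLOW_IPV6 is a hypothetical variable name.
hadoop_java_net_opts() {
  if [ "${HADOOP_ALLOW_IPV6:-false}" = "true" ]; then
    # Dual-stack: let the JVM bind both v4 and v6.
    printf '%s' ""
  else
    # Historical default: v4 only.
    printf '%s' "-Djava.net.preferIPv4Stack=true"
  fi
}
```

The wrapper script would then append `$(hadoop_java_net_opts)` to HADOOP_OPTS instead of hard-coding the flag.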
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Resolution: Won't Fix Status: Resolved (was: Patch Available) > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch, HDFS-4670-1.patch, Hadoop > JournalNode.png, Hadoop NameNode.png, ha2.PNG, hdfs_browser.png > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971846#comment-13971846 ] Elliott Clark commented on HDFS-5274: - Please don't add htrace-zipkin as a dependency. The version of Thrift it pulls in is very old, and we don't want classpath issues because of a new feature. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: 3node_get_200mb.png, 3node_put_200mb.png, > 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, > HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, > HDFS-5274-14.patch, HDFS-5274-15.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, > HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, > HDFS-5274-8.patch, HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin Trace > a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, > ss-5274v8-put.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833293#comment-13833293 ] Elliott Clark commented on HDFS-5274: - Ping ? > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, > Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-6.patch Here's a patch that adds annotations for DFSInputStream.seek > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, > Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
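The pattern such an annotation follows can be sketched with a stand-in span class. This is illustrative only, using a made-up TraceSpan rather than the HTrace API: open a span around the operation and close it when the operation finishes, so the span's timing is recorded even if the operation throws.

```java
// Illustrative only: TraceSpan is a stand-in for a tracing span, showing
// the wrap-the-operation pattern applied to a call like
// DFSInputStream.seek. It is NOT the HTrace API.
import java.util.ArrayList;
import java.util.List;

public class TraceSketch {
    // Where the real library would ship spans to a collector (e.g. Zipkin),
    // this sketch just records the span description.
    static final List<String> finishedSpans = new ArrayList<>();

    static class TraceSpan implements AutoCloseable {
        private final String description;
        TraceSpan(String description) { this.description = description; }
        @Override public void close() { finishedSpans.add(description); }
    }

    // The traced operation: try-with-resources guarantees the span closes
    // even if the work inside throws.
    static long seek(long target) {
        try (TraceSpan span = new TraceSpan("DFSInputStream.seek")) {
            return target; // stand-in for the real seek work
        }
    }

    public static void main(String[] args) {
        seek(42L);
        System.out.println(finishedSpans);
    }
}
```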
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-5.patch > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, Zipkin Trace > a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: Zipkin Trace d0f0d66b8a258a69.png Another example > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, HDFS-5274-4.patch, Zipkin Trace a06e941b0172ec73.png, > Zipkin Trace d0f0d66b8a258a69.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-4.patch Here's a lot more rigorous testing. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, HDFS-5274-4.patch, Zipkin Trace a06e941b0172ec73.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-3.patch Fix for tests. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > HDFS-5274-3.patch, Zipkin Trace a06e941b0172ec73.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-2.patch Instrumented Sender and Receiver (though some of those code paths are still not hit). Better read-side instrumentation. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, > Zipkin Trace a06e941b0172ec73.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: Zipkin Trace a06e941b0172ec73.png Here's an example of what I have currently. I'm still trying to balance what should be instrumented. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, Zipkin Trace > a06e941b0172ec73.png > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-1.patch WIP patch. This one starts testing for the read and write paths. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Affects Version/s: 2.1.1-beta Status: Patch Available (was: Open) > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.1.1-beta >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-5274: Attachment: HDFS-5274-0.patch Here's an initial implementation of the tracing. Some more annotations and instrumentation could be added if needed. > Add Tracing to HDFS > --- > > Key: HDFS-5274 > URL: https://issues.apache.org/jira/browse/HDFS-5274 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-5274-0.patch > > > Since Google's Dapper paper has shown the benefits of tracing for a large > distributed system, it seems like a good time to add tracing to HDFS. HBase > has added tracing using HTrace. I propose that the same can be done within > HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5274) Add Tracing to HDFS
Elliott Clark created HDFS-5274: --- Summary: Add Tracing to HDFS Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Elliott Clark Assignee: Elliott Clark Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629829#comment-13629829 ] Elliott Clark commented on HDFS-4670: - bq.In that case, can you help me understand the inclusion of the js files in the patch? I included the whole bootstrap environment so that anyone who wanted to extend the webui and knew bootstrap would feel comfortable. I could go either way on this. For HBase I ended up using most of the javascript features before I was done. Here I'm not sure that they would be worth it. bq.has anyone reviewed this on android and apple phones and tablets I've tried it on an android phone. It worked well and collapsed down. The tables were a little large for a phone. I'll put trying it on a tablet on my todo for the next revision. > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: ha2.PNG, Hadoop JournalNode.png, Hadoop NameNode.png, > HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627012#comment-13627012 ] Elliott Clark commented on HDFS-4670: - [~azuryy] I'll have to check, but I thought the browse filesystem button is functional on the standby namenode, in which case displaying it isn't an error. [~cnauroth] # Browser support: This degrades all the way down to lynx, so browser support should be good. Things may not be pretty in IE6, but they should all be usable. # Yep, right now there's no reliance on javascript. I've tested in lynx and everything looked good and was pretty easy to read. # That's how HBase handled it. Good catch on the typo. [~tgraves] Bootstrap is absolutely the gold standard in base CSS frameworks. It's got the most momentum and some of the best community support. This is just a CSS change; if you want functionality such as paging and ajax single-page applications, this is a stepping stone for that (though the CSS constructs for displaying it all exist). > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: ha2.PNG, Hadoop JournalNode.png, Hadoop NameNode.png, > HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Attachment: Hadoop NameNode.png hdfs_browser.png Hadoop JournalNode.png Screenshots > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: Hadoop JournalNode.png, Hadoop NameNode.png, > HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Attachment: HDFS-4670-1.patch Updated patch to pass tests and findbugs. > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: Hadoop JournalNode.png, Hadoop NameNode.png, > HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626153#comment-13626153 ] Elliott Clark commented on HDFS-4670: - This patch has all of the HDFS web ui's (datanode, namenode, and qjm) styled. bq.Can you post some screenshots of what the new ui looks like? Sure, I'll post some screenshots with the next version of the patch that passes the failed tests. > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625913#comment-13625913 ] Elliott Clark commented on HDFS-4670: - Mostly the styling isn't current. The Web ui doesn't have good typography, and things are not pleasing to the eye. This leads the user to think the web pages haven't seen any developer love in quite a while. > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Assignee: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Status: Patch Available (was: Open) > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Description: A users' first experience of Apache Hadoop is often looking at the web ui. This should give the user confidence that the project is usable and relatively current. (was: A users' first experience of Apache Hadoop is often looking at the web ui. This should give the user confidence that the project is usable and recently current.) > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and > relatively current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-4670: Attachment: HDFS-4670-0.patch > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Priority: Minor > Attachments: HDFS-4670-0.patch > > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and recently > current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
[ https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625687#comment-13625687 ] Elliott Clark commented on HDFS-4670: - [~adityaacharya] [~andrew.wang] and I have worked on a patch that adds Bootstrap to the HDFS web UIs. A mapred version is planned in the future. > Style Hadoop HDFS web ui's with Twitter's bootstrap. > > > Key: HDFS-4670 > URL: https://issues.apache.org/jira/browse/HDFS-4670 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark >Priority: Minor > > A users' first experience of Apache Hadoop is often looking at the web ui. > This should give the user confidence that the project is usable and recently > current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.
Elliott Clark created HDFS-4670: --- Summary: Style Hadoop HDFS web ui's with Twitter's bootstrap. Key: HDFS-4670 URL: https://issues.apache.org/jira/browse/HDFS-4670 Project: Hadoop HDFS Issue Type: Improvement Reporter: Elliott Clark Priority: Minor A users' first experience of Apache Hadoop is often looking at the web ui. This should give the user confidence that the project is usable and recently current. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira