[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication

2016-07-12 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373663#comment-15373663
 ] 

Elliott Clark commented on HDFS-10564:
--

It doesn't seem to be working. Multiple times per day, the number of under 
min repl'd blocks stays elevated for hours.

> UNDER MIN REPL'D BLOCKS should be prioritized for replication
> -
>
> Key: HDFS-10564
> URL: https://issues.apache.org/jira/browse/HDFS-10564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>
> When datanodes get drained, it is probably because the hardware is bad or
> suspect. Blocks that have no replicas on live nodes should be prioritized.
> However, that appears not to be the case at all.
> Draining full nodes with lots of blocks but only a handful of under min
> replicated blocks takes about the full drain time before fsck reports clean
> again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication

2016-07-05 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362879#comment-15362879
 ] 

Elliott Clark commented on HDFS-10564:
--

Yeah, sorry. Draining means decommissioning.

> UNDER MIN REPL'D BLOCKS should be prioritized for replication
> -
>
> Key: HDFS-10564
> URL: https://issues.apache.org/jira/browse/HDFS-10564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>
> When datanodes get drained, it is probably because the hardware is bad or
> suspect. Blocks that have no replicas on live nodes should be prioritized.
> However, that appears not to be the case at all.
> Draining full nodes with lots of blocks but only a handful of under min
> replicated blocks takes about the full drain time before fsck reports clean
> again.






[jira] [Created] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication

2016-06-22 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-10564:


 Summary: UNDER MIN REPL'D BLOCKS should be prioritized for 
replication
 Key: HDFS-10564
 URL: https://issues.apache.org/jira/browse/HDFS-10564
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark


When datanodes get drained, it is probably because the hardware is bad or 
suspect. Blocks that have no replicas on live nodes should be prioritized. 
However, that appears not to be the case at all.

Draining full nodes with lots of blocks but only a handful of under min 
replicated blocks takes about the full drain time before fsck reports clean 
again.
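
To make the requested ordering concrete, here is a minimal Java sketch that re-replicates blocks with the fewest live replicas first. It is only an illustration of the idea in this issue, not how the NameNode actually schedules replication; the class name, the PendingBlock type and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ReplicationOrderSketch {
    /** Hypothetical record of a block that needs re-replication. */
    static final class PendingBlock {
        final String id;
        final int liveReplicas; // replicas on nodes that are not being drained

        PendingBlock(String id, int liveReplicas) {
            this.id = id;
            this.liveReplicas = liveReplicas;
        }
    }

    /** Returns block ids ordered so that the fewest live replicas come first. */
    static List<String> replicationOrder(List<PendingBlock> blocks) {
        // Primary key: live replica count. Secondary key: id, for a stable order.
        Comparator<PendingBlock> byUrgency =
            Comparator.comparingInt((PendingBlock b) -> b.liveReplicas)
                      .thenComparing(b -> b.id);
        PriorityQueue<PendingBlock> queue = new PriorityQueue<>(byUrgency);
        queue.addAll(blocks);

        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().id);
        }
        return order;
    }
}
```

With this ordering, a block whose only replicas sit on draining nodes (liveReplicas of 0) is handled before blocks that still have one or more live copies.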






[jira] [Commented] (HDFS-9859) Backport HDFS-6440 to branch-2

2016-06-02 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313207#comment-15313207
 ] 

Elliott Clark commented on HDFS-9859:
-

That would be great.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-9859
> URL: https://issues.apache.org/jira/browse/HDFS-9859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>
> HDFS-6440 is a very interesting feature for people who want to run HDFS in an 
> environment where machines have to join and leave a cluster. Until 3.0 is 
> close, we should encourage that.






[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-03-05 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181894#comment-15181894
 ] 

Elliott Clark commented on HDFS-9906:
-

+1

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens way too often to add any useful information. We should either 
> move this to a different log level or only warn once per machine.





[jira] [Created] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-03-04 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-9906:
---

 Summary: Remove spammy log spew when a datanode is restarted
 Key: HDFS-9906
 URL: https://issues.apache.org/jira/browse/HDFS-9906
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.2
Reporter: Elliott Clark


{code}
WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request 
received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 268435456
{code}

This happens way too often to add any useful information. We should either 
move this to a different log level or only warn once per machine.
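
As a rough illustration of the "only warn once per machine" option, the following Java sketch remembers which nodes have already triggered the warning. It is an assumption-laden sketch, not the attached patch: the class and method names are hypothetical, and the log level is returned as a string so the example stays free of any logging dependency.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class WarnOncePerNodeSketch {
    // Thread-safe set of nodes that have already produced a warning.
    private final Set<String> warnedNodes = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given node is seen. */
    boolean shouldWarn(String node) {
        return warnedNodes.add(node);
    }

    /** Picks WARN for the first report from a node and DEBUG for later ones. */
    String levelFor(String node) {
        return shouldWarn(node) ? "WARN" : "DEBUG";
    }
}
```

A long-running process would also need some way to clear entries, for example when a node re-registers, otherwise the set only grows.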





[jira] [Updated] (HDFS-9859) Backport HDFS-6440 to branch-2

2016-02-25 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9859:

Affects Version/s: 2.8.0

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-9859
> URL: https://issues.apache.org/jira/browse/HDFS-9859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>
> HDFS-6440 is a very interesting feature for people who want to run HDFS in an 
> environment where machines have to join and leave a cluster. Until 3.0 is 
> close, we should encourage that.





[jira] [Created] (HDFS-9859) Backport HDFS-6440 to branch-2

2016-02-25 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-9859:
---

 Summary: Backport HDFS-6440 to branch-2
 Key: HDFS-9859
 URL: https://issues.apache.org/jira/browse/HDFS-9859
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark


HDFS-6440 is a very interesting feature for people who want to run HDFS in an 
environment where machines have to join and leave a cluster. Until 3.0 is close, 
we should encourage that.





[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-02-11 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143279#comment-15143279
 ] 

Elliott Clark commented on HDFS-9669:
-

Nope, I just missed it on a pull. Sorry about the confusion. Thanks for the 
many, many backports, [~cmccabe].

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.7.3
>
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-02-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131403#comment-15131403
 ] 

Elliott Clark commented on HDFS-9669:
-

Thanks [~cmccabe]. Any chance of getting this on branch-2?

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.7.3
>
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-25 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116012#comment-15116012
 ] 

Elliott Clark commented on HDFS-9669:
-

Ping?

This is running in production and eliminates thousands of TCP resets.

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-20 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9669:

Attachment: HDFS-9669.1.patch

Checkstyle nit.

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-20 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9669:

Attachment: HDFS-9669.1.patch

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-20 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9669:

Affects Version/s: 2.7.2
   Status: Patch Available  (was: Open)

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-20 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9669:

Attachment: HDFS-9669.0.patch

Straightforward patch to make sure that all the places that bind use the 
listen backlog setting.

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch
>
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Assigned] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-20 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HDFS-9669:
---

Assignee: Elliott Clark

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>
> On periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs this will make 50 the total backlog at any time. This
> effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370





[jira] [Created] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-19 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-9669:
---

 Summary: TcpPeerServer should respect ipc.server.listen.queue.size
 Key: HDFS-9669
 URL: https://issues.apache.org/jira/browse/HDFS-9669
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Elliott Clark


During periods of high traffic we are seeing:

{code}
16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
  at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
  at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
  at sun.nio.ch.IOUtil.write(IOUtil.java:65)
  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
  at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
  at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
{code}

At the time that this happens there are far fewer xceivers than configured.

On most JDKs this will make 50 the total backlog at any time. This effectively 
means that any GC + busy time will result in TCP resets.

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370
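
For illustration, here is a minimal Java sketch of binding a listening socket with an explicit backlog. java.net.ServerSocket falls back to a backlog of 50 when none is given, which is the behaviour described above. The class name, the use of java.util.Properties as a stand-in for the Hadoop configuration, and the fallback value of 128 are assumptions made for this sketch; it is not the code of any attached patch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Properties;

/** Sketch: bind a listening socket with a configured accept backlog. */
final class BacklogBindSketch {
    // Key taken from the issue title; the fallback of 128 is an assumption.
    static final String LISTEN_QUEUE_KEY = "ipc.server.listen.queue.size";
    static final int ASSUMED_DEFAULT_BACKLOG = 128;

    /** Reads the backlog from the stand-in config, or uses the assumed fallback. */
    static int backlogFrom(Properties conf) {
        String raw = conf.getProperty(LISTEN_QUEUE_KEY);
        return raw == null ? ASSUMED_DEFAULT_BACKLOG : Integer.parseInt(raw.trim());
    }

    /** Creates an unbound socket, then binds it with an explicit backlog. */
    static ServerSocket bind(InetSocketAddress addr, Properties conf) throws IOException {
        ServerSocket server = new ServerSocket(); // unbound, so no implicit backlog of 50
        server.bind(addr, backlogFrom(conf));     // the backlog is a hint; the OS may cap it
        return server;
    }
}
```

The point of the two-step create-then-bind form is that the backlog is passed on every bind path instead of relying on the JDK default.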





[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2016-01-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086320#comment-15086320
 ] 

Elliott Clark commented on HDFS-6440:
-

+1 for branch-2 please.

> Support more than 2 NameNodes
> -
>
> Key: HDFS-6440
> URL: https://issues.apache.org/jira/browse/HDFS-6440
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: auto-failover, ha, namenode
>Affects Versions: 2.4.0
>Reporter: Jesse Yates
>Assignee: Jesse Yates
> Fix For: 3.0.0
>
> Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
> hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
> hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-10-28 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

This was pushed a while ago.

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, 
> HDFS-9087-v2.patch, HDFS-9087-v3.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause IOExceptions), there can be storms where lots of datanodes 
> check their disks at the exact same time.
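
The general idea of jitter can be sketched in a few lines of Java: each node adds a random offset to its base delay, so nodes that start together drift apart. This is a generic sketch under assumed names (JitteredDelaySketch, nextDelayMillis), not the change that was committed for this issue.

```java
import java.util.concurrent.ThreadLocalRandom;

final class JitteredDelaySketch {
    /**
     * Returns baseMillis plus a random offset in [0, maxJitterMillis],
     * so that nodes started together do not all check at the same instant.
     */
    static long nextDelayMillis(long baseMillis, long maxJitterMillis) {
        if (baseMillis < 0 || maxJitterMillis < 0) {
            throw new IllegalArgumentException("delays must be non-negative");
        }
        // nextLong(bound) is exclusive of bound, hence the + 1.
        long jitter = ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
        return baseMillis + jitter;
    }
}
```

For example, nextDelayMillis(5000, 1000) yields a delay between 5 and 6 seconds, different on each call.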





[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9266:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Avoid unsafe split and append on fields that might be IPv6 literals
> ---
>
> Key: HDFS-9266
> URL: https://issues.apache.org/jira/browse/HDFS-9266
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9266-HADOOP-11890.1.patch, 
> HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9266:

Summary: Avoid unsafe split and append on fields that might be IPv6 
literals  (was: hadoop-hdfs - Avoid unsafe split and append on fields that 
might be IPv6 literals)

> Avoid unsafe split and append on fields that might be IPv6 literals
> ---
>
> Key: HDFS-9266
> URL: https://issues.apache.org/jira/browse/HDFS-9266
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9266-HADOOP-11890.1.patch, 
> HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-23 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971590#comment-14971590
 ] 

Elliott Clark commented on HDFS-9289:
-

It had all of the data and the same md5sums when I checked, so the only thing 
different was the genstamps. I'm not really sure why that happened, but I 
didn't mean to sidetrack this JIRA.

Test looks nice.

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a 
> pipelineUpdate, but with the old block genStamp. This caused the replicas on 
> two datanodes in the updated pipeline to be viewed as corrupt. Proposal: check 
> the genstamp when committing the block.





[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970213#comment-14970213
 ] 

Elliott Clark commented on HDFS-9289:
-

{code}
15/10/22 09:37:36 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW 
replica with genstamp 116735085 does not match COMPLETE block's genstamp in 
block map 116737586
{code}

Block lengths on the "corrupt" replicas are the same as on the non-corrupt ones. 
The only difference is the genstamp.

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a 
> pipelineUpdate, but with the old block genStamp. This caused the replicas on 
> two datanodes in the updated pipeline to be viewed as corrupt. Proposal: check 
> the genstamp when committing the block.





[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969933#comment-14969933
 ] 

Elliott Clark commented on HDFS-9289:
-

Also, can we add the expected and encountered genstamps to the exception message?
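
To make the request concrete, here is a minimal sketch of a check whose 
exception message carries both genstamps. This is not the HDFS-9289 patch: the 
class GenStampCheck, its method names, and the message wording are all 
hypothetical and only illustrate the shape of the message being asked for.

```java
import java.io.IOException;

/** Hypothetical helper, not part of Hadoop or of the HDFS-9289 patch. */
final class GenStampCheck {
  private GenStampCheck() {}

  /** Builds a message that names both the expected and the encountered genstamp. */
  static String mismatchMessage(long blockId, long expected, long encountered) {
    return "Generation stamp mismatch for blk_" + blockId
        + ": expected " + expected + ", encountered " + encountered;
  }

  /** Throws if the genstamp seen at commit time differs from the expected one. */
  static void verify(long blockId, long expected, long encountered) throws IOException {
    if (expected != encountered) {
      throw new IOException(mismatchMessage(blockId, expected, encountered));
    }
  }
}
```

With both values in the message, a log line alone would be enough to tell 
which side of a pipeline update a stale commit came from.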

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a 
> pipelineUpdate, but with the old block genStamp. This caused the replicas on 
> two datanodes in the updated pipeline to be viewed as corrupt. Proposal: check 
> the genstamp when committing the block.





[jira] [Updated] (HDFS-9289) check genStamp when complete file

2015-10-22 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9289:

Priority: Critical  (was: Major)

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a 
> pipelineUpdate, but with the old block genStamp. This caused the replicas on 
> two datanodes in the updated pipeline to be viewed as corrupt. Proposal: check 
> the genstamp when committing the block.





[jira] [Commented] (HDFS-9266) hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 literals

2015-10-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969896#comment-14969896
 ] 

Elliott Clark commented on HDFS-9266:
-

+1, let's get this in and then we can rebase on master.

> hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 
> literals
> -
>
> Key: HDFS-9266
> URL: https://issues.apache.org/jira/browse/HDFS-9266
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9266-HADOOP-11890.1.patch, 
> HDFS-9266-HADOOP-11890.2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969890#comment-14969890
 ] 

Elliott Clark commented on HDFS-9289:
-

We just had something very similar happen on a prod cluster. Then the 
datanode holding the only complete block was shut off for repair.

{code}
15/10/22 06:29:32 INFO hdfs.StateChange: BLOCK* allocateBlock: 
/TESTCLUSTER-HBASE/WALs/hbase4544.test.com,16020,1444266312515/hbase4544.test.com%2C16020%2C1444266312515.default.1445520572440.
 BP-1735829752-10.210.49.21-1437433901380 
blk_1190230043_116735085{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-52d9a122-a46a-4129-ab3d-d9041de109f8:NORMAL:10.210.31.48:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-c734b72e-27de-4dd4-a46c-7ae59f6ef792:NORMAL:10.210.31.38:50010|RBW]]}
15/10/22 06:32:48 INFO namenode.FSNamesystem: 
updatePipeline(block=BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085,
 newGenerationStamp=116737586, newLength=201675125, 
newNodes=[10.210.81.33:50010, 10.210.81.45:50010, 10.210.64.29:50010], 
clientName=DFSClient_NONMAPREDUCE_1976436475_1)
15/10/22 06:32:48 INFO namenode.FSNamesystem: 
updatePipeline(BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085)
 successfully to 
BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116737586
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.64.29:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-d5f7fff9-005d-4804-a223-b6e6624d3af2:NORMAL:10.210.81.45:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED]]}
 size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.81.45:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED]]}
 size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.81.33:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-4d937567-7184-40b7-a822-c7e3b5d588d4:NORMAL:10.210.81.33:50010|FINALIZED]]}
 size 201681322
15/10/22 09:37:36 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW 
replica with genstamp 116735085 does not match COMPLETE block's genstamp in 
block map 116737586
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* invalidateBlock: 
blk_1190230043_116735085(stored=blk_1190230043_116737586) on 10.210.31.38:50010
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1190230043_116735085 to 10.210.31.38:50010
15/10/22 09:37:39 INFO BlockStateChange: BLOCK* BlockManager: ask 
10.210.31.38:50010 to delete [blk_1190230043_116735085]
15/10/22 12:45:03 INFO BlockStateChange: BLOCK* ask 10.210.64.29:50010 to 
replicate blk_1190230043_116737586 to datanode(s) 10.210.64.56:50010
15/10/22 12:45:07 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.64.29:50010 by hbase4496.test.com/10.210.64.56 because client machine 
reported it
15/10/22 12:50:49 INFO BlockStateChange: BLOCK* ask 10.210.81.45:50010 to 
replicate blk_1190230043_116737586 to datanode(s) 10.210.49.49:50010
15/10/22 12:50:55 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.81.45:50010 by hbase4478.test.com/10.210.49.49 because client machine 
reported it
15/10/22 12:56:01 WARN blockmanagement.BlockManager: PendingReplicationMonitor 
timed out blk_1190230043_116737586
{code}

The patch will help but the issue will still be there. Is there some way to 
keep the genstamps from getting out of sync?

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/b

[jira] [Updated] (HDFS-8664) Allow wildcards in dfs.datanode.data.dir

2015-10-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-8664:

Status: Patch Available  (was: Open)

> Allow wildcards in dfs.datanode.data.dir
> 
>
> Key: HDFS-8664
> URL: https://issues.apache.org/jira/browse/HDFS-8664
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, HDFS
>Affects Versions: 3.0.0
>Reporter: Patrick White
>Assignee: Patrick White
> Attachments: HDFS-8664.001.patch, HDFS-8664.002.patch, 
> HDFS-8664.003.patch, TestBPOfferService-output.txt
>
>
> We have many disks per machine (12+) that don't always have the same 
> numbering when they come back from provisioning, but they're always in the 
> same tree following the same pattern.
> It would greatly reduce our config complexity to be able to specify a 
> wildcard for all the data directories.





[jira] [Updated] (HDFS-8664) Allow wildcards in dfs.datanode.data.dir

2015-10-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-8664:

Status: Open  (was: Patch Available)

> Allow wildcards in dfs.datanode.data.dir
> 
>
> Key: HDFS-8664
> URL: https://issues.apache.org/jira/browse/HDFS-8664
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, HDFS
>Affects Versions: 3.0.0
>Reporter: Patrick White
>Assignee: Patrick White
> Attachments: HDFS-8664.001.patch, HDFS-8664.002.patch, 
> HDFS-8664.003.patch, TestBPOfferService-output.txt
>
>
> We have many disks per machine (12+) that don't always have the same 
> numbering when they come back from provisioning, but they're always in the 
> same tree following the same pattern.
> It would greatly reduce our config complexity to be able to specify a 
> wildcard for all the data directories.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Attachment: HDFS-9087-v3.patch

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, 
> HDFS-9087-v2.patch, HDFS-9087-v3.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-23 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Attachment: HDFS-9087-v2.patch

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, 
> HDFS-9087-v2.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-23 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905189#comment-14905189
 ] 

Elliott Clark commented on HDFS-9087:
-

bq. Do we really need to add this configuration key?
Nope, not really needed. I'll remove it.

bq. It might be better to limit the jitter to 25% or 50% of the period.
Sure.

bq. Both code paths should set checkDiskErrorInterval the same way.
Sure.

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch, 
> HDFS-9087-v2.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790746#comment-14790746
 ] 

Elliott Clark commented on HDFS-9087:
-

On a large enough cluster, anything that can turn into a thundering herd 
eventually will. In this case we're seeing it on disk I/O, and before 2.7.1 we 
were seeing it on locking FsVolumeList. I suspect that we will now start to see 
it on block replication load. Anything that can de-sync these checks across the 
cluster is an improvement.

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Commented] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790740#comment-14790740
 ] 

Elliott Clark commented on HDFS-9087:
-

Yeah, HDFS-8845 makes things better; however, it is still bad to have anything 
in a distributed system that can get multiple machines in sync. Lots of things 
happen when a disk is checked and then listed as bad. It's good to have a large 
cluster spread out so that nothing lines up.

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Attachment: HDFS-9087-v1.patch

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch, HDFS-9087-v1.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Status: Patch Available  (was: Open)

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Updated] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-9087:

Attachment: HDFS-9087-v0.patch

Add 5 seconds of jitter. This has the added benefit of adding more time in 
between disk checker runs. It is very unlikely that disks will fail 
sequentially, one every five seconds.
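
For readers who want to see the idea in code, here is a minimal sketch of 
adding random jitter to a periodic check interval. It is not the attached 
patch: the class JitteredInterval, the method next, and the parameter names 
are hypothetical, and the 5000 ms values in the usage note are only taken from 
the "5 seconds" mentioned above.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper, not part of Hadoop: adds random jitter to a base interval. */
final class JitteredInterval {
  private JitteredInterval() {}

  /**
   * Returns baseMs plus a uniformly random extra delay in [0, maxJitterMs],
   * so that nodes started at the same instant drift apart over time.
   */
  static long next(long baseMs, long maxJitterMs) {
    if (baseMs < 0 || maxJitterMs < 0) {
      throw new IllegalArgumentException("intervals must be non-negative");
    }
    // nextLong(bound) excludes the bound, so add 1 to make maxJitterMs reachable.
    return baseMs + ThreadLocalRandom.current().nextLong(maxJitterMs + 1);
  }
}
```

A checker loop would then sleep for JitteredInterval.next(5000, 5000) 
milliseconds between runs instead of a fixed period. Capping maxJitterMs at a 
fraction of the period is a one-line change on the caller's side.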

> Add some jitter to DataNode.checkDiskErrorThread
> 
>
> Key: HDFS-9087
> URL: https://issues.apache.org/jira/browse/HDFS-9087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9087-v0.patch
>
>
> If all datanodes are started across a cluster at the same time (or errors in 
> the network cause ioexceptions) there can be storms where lots of datanodes 
> check their disks at the exact same time.





[jira] [Created] (HDFS-9087) Add some jitter to DataNode.checkDiskErrorThread

2015-09-15 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-9087:
---

 Summary: Add some jitter to DataNode.checkDiskErrorThread
 Key: HDFS-9087
 URL: https://issues.apache.org/jira/browse/HDFS-9087
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark


If all datanodes are started across a cluster at the same time (or errors in 
the network cause ioexceptions) there can be storms where lots of datanodes 
check their disks at the exact same time.





[jira] [Resolved] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads

2015-09-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark resolved HDFS-7492.
-
Resolution: Duplicate

Fixed in HDFS-7531. Since there are no more locks on FsVolumeList, there is no 
contention.

> If multiple threads call FsVolumeList#checkDirs at the same time, we should 
> only do checkDirs once and give the results to all waiting threads
> --
>
> Key: HDFS-7492
> URL: https://issues.apache.org/jira/browse/HDFS-7492
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Colin Patrick McCabe
>Assignee: Elliott Clark
>Priority: Minor
>
> checkDirs is called when we encounter certain I/O errors.  It's rare to get 
> just a single I/O error... normally you start getting many errors when a disk 
> is going bad.  For this reason, we shouldn't start a new checkDirs scan for 
> each error.  Instead, if multiple threads call FsVolumeList#checkDirs at 
> around the same time, we should only do checkDirs once and give the results 
> to all the waiting threads.
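
The behavior described above (one scan, result shared with every concurrent 
caller) can be sketched as follows. This is only an illustration of the 
"single flight" idea, not how HDFS-7531 resolved the issue and not Hadoop 
code: the class SingleFlight and its run method are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical helper: concurrent callers share the result of one in-flight scan. */
final class SingleFlight<T> {
  private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();

  /** Runs scan unless one is already running, in which case waits for its result. */
  T run(Supplier<T> scan) {
    while (true) {
      CompletableFuture<T> existing = inFlight.get();
      if (existing != null) {
        // Another thread is already scanning: wait and reuse its result.
        return existing.join();
      }
      CompletableFuture<T> mine = new CompletableFuture<>();
      if (inFlight.compareAndSet(null, mine)) {
        try {
          T result = scan.get();
          mine.complete(result);
          return result;
        } catch (RuntimeException | Error e) {
          // Propagate the failure to waiters so they do not block forever.
          mine.completeExceptionally(e);
          throw e;
        } finally {
          // Clear the slot so that a later error triggers a fresh scan.
          inFlight.set(null);
        }
      }
      // Lost the race to install our future: loop and join the winner.
    }
  }
}
```

Callers that arrive after the scan has finished start a new scan, which 
matches the "at around the same time" wording above. Waiters see a failed 
scan as a CompletionException from join().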





[jira] [Assigned] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads

2015-09-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HDFS-7492:
---

Assignee: Elliott Clark

> If multiple threads call FsVolumeList#checkDirs at the same time, we should 
> only do checkDirs once and give the results to all waiting threads
> --
>
> Key: HDFS-7492
> URL: https://issues.apache.org/jira/browse/HDFS-7492
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Colin Patrick McCabe
>Assignee: Elliott Clark
>Priority: Minor
>
> checkDirs is called when we encounter certain I/O errors.  It's rare to get 
> just a single I/O error... normally you start getting many errors when a disk 
> is going bad.  For this reason, we shouldn't start a new checkDirs scan for 
> each error.  Instead, if multiple threads call FsVolumeList#checkDirs at 
> around the same time, we should only do checkDirs once and give the results 
> to all the waiting threads.





[jira] [Commented] (HDFS-7492) If multiple threads call FsVolumeList#checkDirs at the same time, we should only do checkDirs once and give the results to all waiting threads

2015-09-15 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746491#comment-14746491
 ] 

Elliott Clark commented on HDFS-7492:
-

I'm going to grab this one. We're seeing this in production.

There's an unrelated issue with one datanode locking up (still heartbeating 
to the NN but not able to make progress on anything that hits disks). So all 
datanodes talking to the bad node throw a bunch of IOExceptions. This causes a 
significant portion of the cluster to checkDiskError while the network issue is 
going on. FsDatasetImpl.checkDirs holds a lock, so all new xceivers are blocked 
by the checkDiskError. This causes more timeouts and basically serializes all 
reading and writing to blocks until everything on the cluster settles down.
{code}
"DataXceiver for client unix:/mnt/d2/hdfs-socket/dn.50010 [Passing file 
descriptors for block 
BP-1735829752-10.210.49.21-1437433901380:blk_1121816087_48310306]" #85474 
daemon prio=5 os_prio=0 tid=0x7f10910b2800 nid=0x5d44f waiting for monitor 
entry [0x7f1072c06000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFileNoExistsCheck(FsDatasetImpl.java:606)
- waiting to lock <0x0007015a3fe8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:618)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.requestShortCircuitFdsForRead(DataNode.java:1524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:287)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:185)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:745)

"DataXceiver for client DFSClient_NONMAPREDUCE_-1067692187_1 at 
/10.210.65.21:33560 [Receiving block 
BP-1735829752-10.210.49.21-1437433901380:blk_1121839247_48333595]" #85463 
daemon prio=5 os_prio=0 tid=0x7f108933d800 nid=0x5d28e waiting for monitor 
entry [0x7f1072904000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:63)
- waiting to lock <0x0007015a4030> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1084)
- locked <0x0007015a3fe8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:114)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:745)

"Thread-13149" #13302 daemon prio=5 os_prio=0 tid=0x7f10884a9000 nid=0xe9e7 
runnable [0x7f1076e6]
   java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.createDirectory(Native Method)
at java.io.File.mkdir(File.java:1316)
at 
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:300)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:307)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:183)
- locked <0x0007015a4030> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1743)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3002)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:240)
at 
org.apach

[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2015-08-11 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>  Labels: BB2015-05-TBR
> Attachments: 3node_get_200mb.png, 3node_put_200mb.png, 
> 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, 
> HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, 
> HDFS-5274-14.patch, HDFS-5274-15.patch, HDFS-5274-16.patch, 
> HDFS-5274-17.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, 
> HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, HDFS-5274-8.patch, 
> HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin   Trace a06e941b0172ec73.png, 
> Zipkin   Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode

2015-07-15 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628763#comment-14628763
 ] 

Elliott Clark commented on HDFS-8078:
-

PING?

> HDFS client gets errors trying to to connect to IPv6 DataNode
> -
>
> Key: HDFS-8078
> URL: https://issues.apache.org/jira/browse/HDFS-8078
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Nate Edel
>Assignee: Nate Edel
>  Labels: BB2015-05-TBR, ipv6
> Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch
>
>
> 1st exception, on put:
> 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 2401:db00:1010:70ba:face:0:8:0:50010
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
> Appears to actually stem from code in DataNodeID which assumes it's safe to 
> append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
> IPv6.  NetUtils.createSocketAddr( ) assembles a Java URI object, which 
> requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
> Currently using InetAddress.getByName() to validate IPv6 (guava 
> InetAddresses.forString has been flaky) but could also use our own parsing. 
> (From logging this, it seems like a low-enough frequency call that the extra 
> object creation shouldn't be problematic, and for me the slight risk of 
> passing in bad input that is not actually an IPv4 or IPv6 address and thus 
> calling an external DNS lookup is outweighed by getting the address 
> normalized and avoiding rewriting parsing.)
> Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
> ---
> 2nd exception (on datanode)
> 15/04/13 13:18:07 ERROR datanode.DataNode: 
> dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
> operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
> /2401:db00:11:d010:face:0:2f:0:50010
> java.io.EOFException
> at java.io.DataInputStream.readShort(DataInputStream.java:315)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)
> Which also comes as client error "-get: 2401 is not an IP string literal."
> This one has existing parsing logic which needs to shift to the last colon 
> rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
> rather than split.  Could alternatively use the techniques above.
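
The two fixes described in the issue above (bracketing an IPv6 literal before appending the port, and splitting at the last colon rather than the first) can be sketched roughly as follows. This is an illustrative sketch only, not the code in the attached patches; the class and method names (HostPortSketch, join, split) are hypothetical.

```java
import java.net.InetSocketAddress;

public class HostPortSketch {
    /** Joins host and port, bracketing the host when it looks like a bare IPv6 literal. */
    static String join(String host, int port) {
        boolean bareIpv6 = host.indexOf(':') >= 0 && !host.startsWith("[");
        return (bareIpv6 ? "[" + host + "]" : host) + ":" + port;
    }

    /** Splits at the LAST colon, so colons inside an IPv6 address are left alone. */
    static InetSocketAddress split(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("No port in: " + hostPort);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        // Accept the bracketed form as well and strip the brackets.
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);
        }
        // createUnresolved avoids any DNS lookup on the input.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // Prints [2401:db00:1010:70ba:face:0:8:0]:50010
        System.out.println(join("2401:db00:1010:70ba:face:0:8:0", 50010));
        InetSocketAddress addr = split("2401:db00:1010:70ba:face:0:8:0:50010");
        System.out.println(addr.getHostString() + " port " + addr.getPort());
    }
}
```

Note that split assumes a port is always present: a bare IPv6 literal with no port would have its last group misread as the port, which is one reason the bracketed form is the safer thing to pass around.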





[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode

2015-06-18 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592426#comment-14592426
 ] 

Elliott Clark commented on HDFS-8078:
-

+1 (non-binding) on the latest patch.

I agree that more unit testing and other work will be needed before we can 
declare full IPv6 support. However, this is a pretty huge step in the right 
direction that we shouldn't let bit-rot. It includes tests to keep regressions 
in this code from popping up, and it has been tested on a real cluster.


> HDFS client gets errors trying to to connect to IPv6 DataNode
> -
>
> Key: HDFS-8078
> URL: https://issues.apache.org/jira/browse/HDFS-8078
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Nate Edel
>Assignee: Nate Edel
>  Labels: BB2015-05-TBR, ipv6
> Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch
>
>
> 1st exception, on put:
> 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 2401:db00:1010:70ba:face:0:8:0:50010
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
> Appears to actually stem from code in DataNodeID which assumes it's safe to 
> append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
> IPv6.  NetUtils.createSocketAddr( ) assembles a Java URI object, which 
> requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
> Currently using InetAddress.getByName() to validate IPv6 (guava 
> InetAddresses.forString has been flaky) but could also use our own parsing. 
> (From logging this, it seems like a low-enough frequency call that the extra 
> object creation shouldn't be problematic, and for me the slight risk of 
> passing in bad input that is not actually an IPv4 or IPv6 address and thus 
> calling an external DNS lookup is outweighed by getting the address 
> normalized and avoiding rewriting parsing.)
> Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
> ---
> 2nd exception (on datanode)
> 15/04/13 13:18:07 ERROR datanode.DataNode: 
> dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
> operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
> /2401:db00:11:d010:face:0:2f:0:50010
> java.io.EOFException
> at java.io.DataInputStream.readShort(DataInputStream.java:315)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)
> Which also comes as client error "-get: 2401 is not an IP string literal."
> This one has existing parsing logic which needs to shift to the last colon 
> rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
> rather than split.  Could alternatively use the techniques above.





[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode

2015-06-15 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586351#comment-14586351
 ] 

Elliott Clark commented on HDFS-8078:
-

I don't think that a feature branch is really needed here, since each part makes 
things better. Let's just keep moving forward rather than doing one big-bang 
integration.

I'm +1 on the patch. I've seen the results on an ipv6 machine.

> HDFS client gets errors trying to to connect to IPv6 DataNode
> -
>
> Key: HDFS-8078
> URL: https://issues.apache.org/jira/browse/HDFS-8078
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Nate Edel
>Assignee: Nate Edel
>  Labels: BB2015-05-TBR, ipv6
> Attachments: HDFS-8078.9.patch
>
>
> 1st exception, on put:
> 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 2401:db00:1010:70ba:face:0:8:0:50010
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
> Appears to actually stem from code in DataNodeID which assumes it's safe to 
> append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
> IPv6.  NetUtils.createSocketAddr( ) assembles a Java URI object, which 
> requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
> Currently using InetAddress.getByName() to validate IPv6 (guava 
> InetAddresses.forString has been flaky) but could also use our own parsing. 
> (From logging this, it seems like a low-enough frequency call that the extra 
> object creation shouldn't be problematic, and for me the slight risk of 
> passing in bad input that is not actually an IPv4 or IPv6 address and thus 
> calling an external DNS lookup is outweighed by getting the address 
> normalized and avoiding rewriting parsing.)
> Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
> ---
> 2nd exception (on datanode)
> 15/04/13 13:18:07 ERROR datanode.DataNode: 
> dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
> operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
> /2401:db00:11:d010:face:0:2f:0:50010
> java.io.EOFException
> at java.io.DataInputStream.readShort(DataInputStream.java:315)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)
> Which also comes as client error "-get: 2401 is not an IP string literal."
> This one has existing parsing logic which needs to shift to the last colon 
> rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
> rather than split.  Could alternatively use the techniques above.





[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-7834:

Attachment: HDFS-7834-branch-2-0.patch

Here's a patch for branch-2.

> Allow HDFS to bind to ipv6 conditionally
> 
>
> Key: HDFS-7834
> URL: https://issues.apache.org/jira/browse/HDFS-7834
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-7834-branch-2-0.patch
>
>
> Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
> While this was needed a while ago, IPv6 on Java works much better now, and 
> there should be a way to allow it to bind dual stack if needed.





[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-7834:

Affects Version/s: 2.6.0
Fix Version/s: 2.7.0

> Allow HDFS to bind to ipv6 conditionally
> 
>
> Key: HDFS-7834
> URL: https://issues.apache.org/jira/browse/HDFS-7834
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.7.0
>
>
> Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
> While this was needed a while ago, IPv6 on Java works much better now, and 
> there should be a way to allow it to bind dual stack if needed.





[jira] [Created] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-7834:
---

 Summary: Allow HDFS to bind to ipv6 conditionally
 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark


Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.

While this was needed a while ago, IPv6 on Java works much better now, and there 
should be a way to allow it to bind dual stack if needed.
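
One way to make the flag conditional is sketched below in Java, for illustration only. The real change would live in the bash scripts, and the environment variable name EXAMPLE_ALLOW_IPV6 and the method jvmNetOpts are hypothetical, not actual Hadoop settings. The only real option used is the JVM system property java.net.preferIPv4Stack.

```java
import java.util.ArrayList;
import java.util.List;

public class Ipv4StackOpts {
    // Hypothetical opt-in switch; not a real Hadoop environment variable.
    static final String ALLOW_IPV6_ENV = "EXAMPLE_ALLOW_IPV6";

    /**
     * Returns the network-related JVM options. The IPv4-only flag is kept
     * as the default and dropped only when the operator opts in to IPv6.
     */
    static List<String> jvmNetOpts(String allowIpv6) {
        List<String> opts = new ArrayList<>();
        if (!"true".equalsIgnoreCase(allowIpv6)) {
            opts.add("-Djava.net.preferIPv4Stack=true");
        }
        return opts;
    }

    public static void main(String[] args) {
        // With the variable unset this prints [-Djava.net.preferIPv4Stack=true]
        System.out.println(jvmNetOpts(System.getenv(ALLOW_IPV6_ENV)));
    }
}
```

Keeping the IPv4-only flag as the default preserves existing behaviour, so only clusters that explicitly opt in would bind dual stack.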





[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2014-05-01 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch, HDFS-4670-1.patch, Hadoop 
> JournalNode.png, Hadoop NameNode.png, ha2.PNG, hdfs_browser.png
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.
> This should give the user confidence that the project is usable and 
> relatively current.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-04-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971846#comment-13971846
 ] 

Elliott Clark commented on HDFS-5274:
-

Please don't add htrace-zipkin as a dependency.  The version of Thrift it pulls 
in is very old, and we don't want to have classpath issues because of a new 
feature.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: 3node_get_200mb.png, 3node_put_200mb.png, 
> 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, 
> HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, 
> HDFS-5274-14.patch, HDFS-5274-15.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, 
> HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, 
> HDFS-5274-8.patch, HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin   Trace 
> a06e941b0172ec73.png, Zipkin   Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, 
> ss-5274v8-put.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2013-11-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833293#comment-13833293
 ] 

Elliott Clark commented on HDFS-5274:
-

Ping ?

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-10-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-6.patch

Here's a patch that adds annotations for DFSInputStream.seek

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-10-07 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-5.patch

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, Zipkin   Trace 
> a06e941b0172ec73.png, Zipkin   Trace d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-10-07 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: Zipkin   Trace d0f0d66b8a258a69.png

Another example

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, Zipkin   Trace a06e941b0172ec73.png, 
> Zipkin   Trace d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-10-07 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-4.patch

Here's a patch with a lot more rigorous testing.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, Zipkin   Trace a06e941b0172ec73.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-10-01 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-3.patch

Fix for tests.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, Zipkin   Trace a06e941b0172ec73.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-09-30 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-2.patch

Instrumented Sender and Receiver (though some of those code paths are not hit 
as well).
Better read-side instrumentation.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> Zipkin   Trace a06e941b0172ec73.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-09-30 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: Zipkin   Trace a06e941b0172ec73.png

Here's an example of what I have currently.  I'm still trying to balance what 
should be instrumented.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, Zipkin   Trace 
> a06e941b0172ec73.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-09-30 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-1.patch

WIP patch.

This one has testing for the read and write paths started. 

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-09-30 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Affects Version/s: 2.1.1-beta
   Status: Patch Available  (was: Open)

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2013-09-27 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-5274:


Attachment: HDFS-5274-0.patch

Here's an initial implementation of the tracing.  Some more annotations and 
instrumentation could be added if needed.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5274) Add Tracing to HDFS

2013-09-27 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-5274:
---

 Summary: Add Tracing to HDFS
 Key: HDFS-5274
 URL: https://issues.apache.org/jira/browse/HDFS-5274
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Elliott Clark
Assignee: Elliott Clark


Since Google's Dapper paper has shown the benefits of tracing for a large 
distributed system, it seems like a good time to add tracing to HDFS.  HBase 
has added tracing using HTrace.  I propose that the same can be done within 
HDFS.



[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-11 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629829#comment-13629829
 ] 

Elliott Clark commented on HDFS-4670:
-

bq.In that case, can you help me understand the inclusion of the js files in 
the patch? 
I included the whole bootstrap environment so that anyone who wanted to extend 
the webui and knew bootstrap would feel comfortable.  I could go either way on 
this.  For HBase I ended up using most of the javascript features before I was 
done.  Here I'm not sure that they would be worth it.

bq.has anyone reviewed this on android and apple phones and tablets
I've tried it on an android phone.  It worked well and collapsed down.  The 
tables were a little large for a phone.  I'll put trying it on a tablet on my 
todo for the next revision.

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: ha2.PNG, Hadoop JournalNode.png, Hadoop NameNode.png, 
> HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-09 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627012#comment-13627012
 ] 

Elliott Clark commented on HDFS-4670:
-

[~azuryy]
I'll have to check, but I thought that the browse filesystem button is functional 
on the standby namenode, in which case displaying it isn't an error.

[~cnauroth]
# Browser support: This degrades all the way down to lynx, so browser support 
should be good.  Things may not be pretty in IE6, but they should all be usable.
# Yep right now there's no reliance on javascript.  I've tested in lynx and 
everything looked good and was pretty easy to read.
# That's how HBase handled it.

Good catch on the typo.

[~tgraves]
Bootstrap is absolutely the gold standard in base css frameworks.  It's got the 
most momentum and some of the best community support.

This is just a CSS change. If you want functionality such as paging and AJAX 
single-page applications, this is just a stepping stone toward that (though the 
CSS constructs for displaying it all exist).


> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: ha2.PNG, Hadoop JournalNode.png, Hadoop NameNode.png, 
> HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Attachment: Hadoop NameNode.png
hdfs_browser.png
Hadoop JournalNode.png

Screenshots

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: Hadoop JournalNode.png, Hadoop NameNode.png, 
> HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Attachment: HDFS-4670-1.patch

Updated patch to pass tests and findbugs.

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: Hadoop JournalNode.png, Hadoop NameNode.png, 
> HDFS-4670-0.patch, HDFS-4670-1.patch, hdfs_browser.png
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626153#comment-13626153
 ] 

Elliott Clark commented on HDFS-4670:
-

This patch has all of the HDFS web ui's (datanode, namenode, and qjm) styled.

bq.Can you post some screenshots of what the new ui looks like?
Sure, I'll post some screenshots with the next version of the patch that passes 
the failed tests.

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625913#comment-13625913
 ] 

Elliott Clark commented on HDFS-4670:
-

Mostly the styling isn't current.  The Web ui doesn't have good typography, and 
things are not pleasing to the eye.  This leads the user to think the web pages 
haven't seen any developer love in quite a while.

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Status: Patch Available  (was: Open)

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Description: A user's first experience of Apache Hadoop is often looking at 
the web ui.  This should give the user confidence that the project is usable 
and relatively current.  (was: A user's first experience of Apache Hadoop is 
often looking at the web ui.  This should give the user confidence that the 
project is usable and recently current.)

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and 
> relatively current.



[jira] [Updated] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-4670:


Attachment: HDFS-4670-0.patch

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Priority: Minor
> Attachments: HDFS-4670-0.patch
>
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and recently 
> current.



[jira] [Commented] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625687#comment-13625687
 ] 

Elliott Clark commented on HDFS-4670:
-


[~adityaacharya], [~andrew.wang], and I have worked on a patch that adds 
bootstrap to the HDFS web ui's.  A mapred version is planned in the future.

> Style Hadoop HDFS web ui's with Twitter's bootstrap.
> 
>
> Key: HDFS-4670
> URL: https://issues.apache.org/jira/browse/HDFS-4670
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>Priority: Minor
>
> A user's first experience of Apache Hadoop is often looking at the web ui.  
> This should give the user confidence that the project is usable and recently 
> current.



[jira] [Created] (HDFS-4670) Style Hadoop HDFS web ui's with Twitter's bootstrap.

2013-04-08 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-4670:
---

 Summary: Style Hadoop HDFS web ui's with Twitter's bootstrap.
 Key: HDFS-4670
 URL: https://issues.apache.org/jira/browse/HDFS-4670
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark
Priority: Minor


A user's first experience of Apache Hadoop is often looking at the web ui.  
This should give the user confidence that the project is usable and recently 
current.
