[jira] [Comment Edited] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout

2018-01-09 Thread Manoj Govindassamy (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319031#comment-16319031 ]

Manoj Govindassamy edited comment on HDFS-12994 at 1/9/18 7:51 PM:
---

Got it. Patch v01 looks good to me. +1, thanks for working on this.


was (Author: manojg):
Got it. Patch v02 looks good to me. +1, thanks for working on this.

> TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket 
> timeout
> 
>
> Key: HDFS-12994
> URL: https://issues.apache.org/jira/browse/HDFS-12994
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12994.00.patch, HDFS-12994.01.patch
>
>
> Occasionally, {{testNNSendsErasureCodingTasks}} fails due to socket timeout
> {code}
> 2017-12-26 20:35:19,961 [StripedBlockReconstruction-0] INFO  datanode.DataNode (StripedBlockReader.java:createBlockReader(132)) - Exception while creating remote block reader, datanode 127.0.0.1:34145
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.newConnectedPeer(StripedBlockReader.java:148)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.createBlockReader(StripedBlockReader.java:123)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.<init>(StripedBlockReader.java:83)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.createReader(StripedReader.java:169)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.initReaders(StripedReader.java:150)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.init(StripedReader.java:133)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:56)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> while the target datanode is removed in the test:
> {code}
> 2017-12-26 20:35:18,710 [Thread-2393] INFO  net.NetworkTopology (NetworkTopology.java:remove(219)) - Removing a node: /default-rack/127.0.0.1:34145
> {code}
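
The two log excerpts above show a connection attempt to a datanode address roughly one second after that node was removed. As a rough illustration of that symptom only (not of the HDFS code path), the following plain-JDK sketch opens a listener on a loopback port, closes it, and then tries to connect to the now-dead address. The class and method names are hypothetical, and it assumes the operating system rejects connections to a closed local port instead of silently dropping them.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectionRefusedSketch {

    /** Opens a listener on an ephemeral loopback port, closes it, and returns the dead address. */
    public static InetSocketAddress stoppedListenerAddress() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Record the address before try-with-resources closes the listener.
            return new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
        }
    }

    /** Returns true if the connect attempt fails with java.net.ConnectException. */
    public static boolean connectIsRefused(InetSocketAddress addr, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return false; // something is still listening
        } catch (ConnectException e) {
            return true;  // the peer actively refused the connection
        } catch (IOException e) {
            return false; // some other failure, e.g. SocketTimeoutException
        }
    }

    public static void main(String[] args) throws IOException {
        InetSocketAddress dead = stoppedListenerAddress();
        System.out.println("refused=" + connectIsRefused(dead, 5000));
    }
}
```

A refused connection fails quickly, whereas an unresponsive peer makes the caller wait for the full timeout, which is where the timeout value becomes relevant to the test.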



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout

2018-01-09 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319027#comment-16319027
 ] 

Lei (Eddy) Xu edited comment on HDFS-12994 at 1/9/18 7:49 PM:
--

Yes, [~manojg]. This patch lets DN issues be discovered within the test 
timeout. Originally, the socket timeout was {{60s}}, and the unit test timeout 
was also {{60s}} for the two configurations combined. 

bq.  And, the problem should happen always when the DN is removed right? 

This happens only when the NN schedules the recovery tasks *between* the 
shutdowns of the two DNs. If both DNs have been shut down before the recovery 
task is scheduled, neither of them will be considered a valid source. 
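
The timing argument above can be written down as a small budget check. This is only a sketch of the arithmetic, not code from the patch: the class and method names are hypothetical, and the 5s socket timeout and 10s of remaining work are illustrative assumptions, since the thread does not state the values the patch actually uses.

```java
public class TimeoutBudgetSketch {

    /**
     * Returns true if a read that stalls on a dead datanode gives up early
     * enough for the remaining work to finish before the test deadline.
     * All arguments are in milliseconds.
     */
    public static boolean failureSurfacesInTime(long socketTimeoutMs,
                                                long testTimeoutMs,
                                                long otherWorkMs) {
        return socketTimeoutMs + otherWorkMs < testTimeoutMs;
    }

    public static void main(String[] args) {
        long testTimeoutMs = 60_000; // unit test timeout from the comment above
        long otherWorkMs = 10_000;   // hypothetical time for the rest of the test

        // Original setting: the socket timeout alone consumes the whole test budget.
        System.out.println(failureSurfacesInTime(60_000, testTimeoutMs, otherWorkMs));

        // A shorter, purely illustrative socket timeout leaves room to recover.
        System.out.println(failureSurfacesInTime(5_000, testTimeoutMs, otherWorkMs));
    }
}
```

With equal socket and test timeouts, a single stalled read is enough to exhaust the test budget, which matches the intermittent failure described in the issue.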


was (Author: eddyxu):
Yes, this patch lets DN issues be discovered within the test timeout. 
Originally, the socket timeout was {{60s}}, and the unit test timeout was also 
{{60s}} for the two configurations combined. 

bq.  And, the problem should happen always when the DN is removed right? 

This happens only when the NN schedules the recovery tasks *between* the 
shutdowns of the two DNs. If both DNs have been shut down before the recovery 
task is scheduled, neither of them will be considered a valid source. 
