[ https://issues.apache.org/jira/browse/HDFS-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhangBiao updated HDFS-12326:
-----------------------------
    Description: 
I'm using the Go HDFS client https://github.com/colinmarc/hdfs to write to HDFS, on Hadoop 2.7.3.
When the number of files open concurrently is large (for example, 200), I always get a 'broken pipe' error.

So I want to retry and continue writing. What is the correct way to retry? Because https://github.com/colinmarc/hdfs cannot recover the stream state when an error occurs during writing, I have to reopen the file to get a new stream. So I tried the following steps (sketched in Go below):
1. Close the current stream.
2. Append to the file to get a new stream.
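
For clarity, here is a minimal sketch of that two-step flow against the colinmarc/hdfs API (Client.Append, FileWriter.Close); reopenForAppend is my own illustrative helper name, not part of the library:

{code}
// Sketch of the retry flow above, assuming the colinmarc/hdfs client API.
// reopenForAppend is an illustrative helper name, not a library function.
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

// reopenForAppend performs the two steps: close the failed stream,
// then append to the same file to obtain a fresh stream.
func reopenForAppend(client *hdfs.Client, w *hdfs.FileWriter, path string) (*hdfs.FileWriter, error) {
	// Step 1: close the current stream. Close itself can fail (this is
	// where the updateBlockForPipeline error below shows up), so the
	// error is only logged.
	if err := w.Close(); err != nil {
		log.Printf("close after write error: %v", err)
	}
	// Step 2: append to the file to get a new stream. This is where the
	// AlreadyBeingCreatedException below shows up.
	return client.Append(path)
}
{code}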

But when I close the stream, I get the error "updateBlockForPipeline call failed with ERROR_APPLICATION (java.io.IOException)", and the namenode complains:

{code}
2017-08-20 03:22:55,598 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.updateBlockForPipeline from 192.168.0.39:46827 Call#50183 Retry#-1
java.io.IOException: BP-1152809458-192.168.0.39-1502261411064:blk_1073825071_111401 does not exist or is not under Constructionblk_1073825071_111401{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-d61914ba-df64-467b-bb75-272875e5e865:NORMAL:192.168.0.39:50010|RBW], ReplicaUC[[DISK]DS-1314debe-ab08-4001-ab9a-8e234f28f87c:NORMAL:192.168.0.38:50010|RBW]]}
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6241)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6309)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:806)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2017-08-20 03:22:56,333 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1073825071_111401{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-d61914ba-df64-467b-bb75-272875e5e865:NORMAL:192.168.0.39:50010|RBW], ReplicaUC[[DISK]DS-1314debe-ab08-4001-ab9a-8e234f28f87c:NORMAL:192.168.0.38:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 1) in file /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log
{code}

When I then append to get a new stream, I get the error 'append call failed with ERROR_APPLICATION (org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException)', and the corresponding error on the namenode is:

{code}
2017-08-20 03:22:56,335 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to APPEND_FILE /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log for go-hdfs-OAfvZiSUM2Eu894p on 192.168.0.39 because go-hdfs-OAfvZiSUM2Eu894p is already the current lease holder.
2017-08-20 03:22:56,335 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.append from 192.168.0.39:46827 Call#50186 Retry#-1: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to APPEND_FILE /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log for go-hdfs-OAfvZiSUM2Eu894p on 192.168.0.39 because go-hdfs-OAfvZiSUM2Eu894p is already the current lease holder.
{code}
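
Since the namenode reports that my own client name still holds the lease, my current workaround idea is to retry the append with backoff until lease recovery completes. This is only a hedged sketch under my own assumptions: appendWithRetry, the attempt count, and the backoff schedule are mine, and I'm assuming HDFS's soft lease limit (on the order of a minute) is what eventually lets the append succeed:

{code}
// Hedged sketch: retry Append with backoff until the NameNode no longer
// reports this client as the lease holder. appendWithRetry, the attempt
// count, and the backoff are my own assumptions, not library behavior.
package main

import (
	"time"

	"github.com/colinmarc/hdfs"
)

func appendWithRetry(client *hdfs.Client, path string) (*hdfs.FileWriter, error) {
	var lastErr error
	for attempt := 1; attempt <= 10; attempt++ {
		w, err := client.Append(path)
		if err == nil {
			return w, nil
		}
		lastErr = err
		// Lease recovery on the NameNode is asynchronous, and HDFS's
		// soft lease limit is on the order of a minute, so back off
		// before the next attempt.
		time.Sleep(time.Duration(attempt) * 10 * time.Second)
	}
	return nil, lastErr
}
{code}

Even with backoff, I'm not sure retrying under the same client name can ever succeed, since the namenode says that very name already holds the lease; that is the core of my question.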

Could you please suggest the correct way for the client to retry when a write fails?



> What is the correct way of retrying when failure occurs during writing
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12326
>                 URL: https://issues.apache.org/jira/browse/HDFS-12326
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: hdfs-client
>            Reporter: ZhangBiao
>



