[jira] [Resolved] (HDFS-12326) What is the correct way of retrying when failure occurs during writing

Andras Bokor (JIRA) Fri, 08 Sep 2017 01:43:02 -0700

     [ https://issues.apache.org/jira/browse/HDFS-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andras Bokor resolved HDFS-12326.
---------------------------------
    Resolution: Not A Problem

It seems like a question, not a bug.

> What is the correct way of retrying when failure occurs during writing
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12326
>                 URL: https://issues.apache.org/jira/browse/HDFS-12326
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: hdfs-client
>            Reporter: ZhangBiao
>
> I'm using the HDFS client for Go, https://github.com/colinmarc/hdfs, to write 
> to HDFS, and I'm running Hadoop 2.7.3.
> When the number of files open concurrently is large, for example 200, I 
> always get a 'broken pipe' error.
> So I want to retry and continue writing. What is the correct way of retrying? 
> Because https://github.com/colinmarc/hdfs cannot recover the stream state 
> when an error occurs during writing, I have to reopen the file and get a new 
> stream. So I tried the following steps:
> 1. Close the current stream.
> 2. Append to the file to get a new stream.
> But when I closed the stream, I got the error "updateBlockForPipeline call 
> failed with ERROR_APPLICATION (java.io.IOException"
> and it seems the namenode complains:
> {code:java}
> 2017-08-20 03:22:55,598 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.updateBlockForPipeline from 192.168.0.39:46827 Call#50183 Retry#-1
> java.io.IOException: BP-1152809458-192.168.0.39-1502261411064:blk_1073825071_111401 does not exist or is not under Constructionblk_1073825071_111401{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-d61914ba-df64-467b-bb75-272875e5e865:NORMAL:192.168.0.39:50010|RBW], ReplicaUC[[DISK]DS-1314debe-ab08-4001-ab9a-8e234f28f87c:NORMAL:192.168.0.38:50010|RBW]]}
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6241)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6309)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:806)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> 2017-08-20 03:22:56,333 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1073825071_111401{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-d61914ba-df64-467b-bb75-272875e5e865:NORMAL:192.168.0.39:50010|RBW], ReplicaUC[[DISK]DS-1314debe-ab08-4001-ab9a-8e234f28f87c:NORMAL:192.168.0.38:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 0 <  minimum = 1) in file /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log
> {code}
> When I then called append to get a new stream, I got the error 'append call 
> failed with ERROR_APPLICATION 
> (org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException)', and the 
> corresponding error in the namenode log is:
> {code:java}
> 2017-08-20 03:22:56,335 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to APPEND_FILE /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log for go-hdfs-OAfvZiSUM2Eu894p on 192.168.0.39 because go-hdfs-OAfvZiSUM2Eu894p is already the current lease holder.
> 2017-08-20 03:22:56,335 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.append from 192.168.0.39:46827 Call#50186 Retry#-1: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to APPEND_FILE /user/am/scan_task/2017-08-20/192.168.0.38_audience_f/user-bak010-20170820030804.log for go-hdfs-OAfvZiSUM2Eu894p on 192.168.0.39 because go-hdfs-OAfvZiSUM2Eu894p is already the current lease holder.
> {code}
> Could you please suggest the correct way to retry on the client side when a 
> write fails?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

