[jira] [Assigned] (HDFS-8973) NameNode exit without any exception log
[ https://issues.apache.org/jira/browse/HDFS-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kanaka Kumar Avvaru reassigned HDFS-8973: - Assignee: Kanaka Kumar Avvaru > NameNode exit without any exception log > --- > > Key: HDFS-8973 > URL: https://issues.apache.org/jira/browse/HDFS-8973 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1 >Reporter: He Xiaoqiao >Assignee: Kanaka Kumar Avvaru >Priority: Critical > > The NameNode process exited without any useful WARN/ERROR log; after the .log file > output stopped, the .out file continued to show about 5 minutes of GC log. When the .log file > was interrupted, the .out file printed the following ERROR, which may hint at the cause; it seems > to be caused by a log4j ERROR. > {code:title=namenode.out|borderStyle=solid} > log4j:ERROR Failed to flush writer, > java.io.IOException: Bad file descriptor (original locale message: 错误的文件描述符) > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:318) > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) > at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) > at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) > at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) > at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) > at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59) > at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324) > at > org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276) > at org.apache.log4j.WriterAppender.append(WriterAppender.java:162) > at > org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) > at > org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) > at org.apache.log4j.Category.callAppenders(Category.java:206) > at org.apache.log4j.Category.forcedLog(Category.java:391) > at org.apache.log4j.Category.log(Category.java:856) > at > 
org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
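In the trace above the flush failure surfaces only inside log4j's {{QuietWriter}}, which by design reports write errors quietly rather than propagating them, so the process can die with no useful .log entry. A minimal stand-alone sketch of a louder alternative (the class and method names below are hypothetical, not log4j or Hadoop API):

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical wrapper: unlike log4j 1.x QuietWriter, every flush failure is
// recorded and echoed to stderr (i.e. the .out file), so an operator can see
// why the .log file suddenly went silent.
public class LoudWriter {
    private final Writer delegate;
    private boolean failed = false;

    public LoudWriter(Writer delegate) {
        this.delegate = delegate;
    }

    public void flush() {
        try {
            delegate.flush();
        } catch (IOException e) {
            failed = true;
            // Fall back to stderr so the failure is visible somewhere.
            System.err.println("log flush failed: " + e);
        }
    }

    public boolean hasFailed() {
        return failed;
    }
}
```

In the reported scenario the underlying file descriptor was no longer valid, so every subsequent flush would have failed the same way.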
[jira] [Updated] (HDFS-8968) New benchmark throughput tool for striping erasure coding
[ https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HDFS-8968: - Attachment: HDFS-8968-HDFS-7285.1.patch > New benchmark throughput tool for striping erasure coding > - > > Key: HDFS-8968 > URL: https://issues.apache.org/jira/browse/HDFS-8968 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-8968-HDFS-7285.1.patch > > > We need a new benchmark tool to measure the throughput of client writing and > reading, considering the following cases and factors: > * 3-replica or striping; > * write or read, stateful read or positional read; > * which erasure coder; > * striping cell size; > * concurrent readers/writers using processes or threads. > The tool should be easy to use and should avoid unnecessary local > environment impact, such as local disk I/O. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
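The measurement core of such a tool is small; the sketch below is purely illustrative (names and structure are assumptions, not the attached HDFS-8968 patch), and a real run would point the stream at a DFS client stream and sweep the factors listed above:

```java
import java.io.IOException;
import java.io.OutputStream;

// Illustrative throughput measurement: time how long pushing totalBytes
// through a stream takes and report MB/s.
public class ThroughputBench {
    public static double measureWriteMBps(OutputStream out, long totalBytes,
                                          int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];   // zero-filled payload
        long written = 0;
        long start = System.nanoTime();
        while (written < totalBytes) {
            int n = (int) Math.min(bufSize, totalBytes - written);
            out.write(buf, 0, n);
            written += n;
        }
        out.flush();
        double seconds = (System.nanoTime() - start) / 1e9;
        return (totalBytes / (1024.0 * 1024.0)) / seconds;
    }
}
```

Feeding generated buffers rather than files also matches the stated goal of avoiding local-environment impact such as local disk.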
[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8833: Attachment: HDFS-8833-HDFS-7285.06.patch Thanks Rui for checking the latest patch. Attaching rebased patch for the latest feature branch. > Erasure coding: store EC schema and cell size in INodeFile and eliminate > notion of EC zones > --- > > Key: HDFS-8833 > URL: https://issues.apache.org/jira/browse/HDFS-8833 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-7285 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-8833-HDFS-7285-merge.00.patch, > HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch, > HDFS-8833-HDFS-7285.03.patch, HDFS-8833-HDFS-7285.04.patch, > HDFS-8833-HDFS-7285.05.patch, HDFS-8833-HDFS-7285.06.patch > > > We have [discussed | > https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] > storing EC schema with files instead of EC zones and recently revisited the > discussion under HDFS-8059. > As a recap, the _zone_ concept has severe limitations including renaming and > nested configuration. Those limitations are valid in encryption for security > reasons and it doesn't make sense to carry them over in EC. > This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For > simplicity, we should first implement it as an xattr and consider memory > optimizations (such as moving it to file header) as a follow-on. We should > also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
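Storing the policy per file is cheap because it reduces to a tiny fixed-size xattr value. The encoding below is purely illustrative (the xattr name and byte layout are assumptions, not the format in the attached patches):

```java
import java.nio.ByteBuffer;

// Hypothetical xattr value layout: 1 byte schema id + 4 bytes cell size.
public class ECPolicyXAttr {
    // Assumed attribute name; the real name is decided by the patch.
    public static final String XATTR_NAME = "system.hdfs.erasurecoding.policy";

    public static byte[] encode(byte schemaId, int cellSize) {
        return ByteBuffer.allocate(5).put(schemaId).putInt(cellSize).array();
    }

    public static byte decodeSchemaId(byte[] value) {
        return value[0];
    }

    public static int decodeCellSize(byte[] value) {
        return ByteBuffer.wrap(value).getInt(1); // skip the schema id byte
    }
}
```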
[jira] [Commented] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733250#comment-14733250 ] Harsh J commented on HDFS-7899: --- Thanks Jagadesh, that message change was just a small idea to make the message slightly more meaningful. Do you also have any ideas to improve the situation so that users can figure out on their own what's going on? I've seen this appear during socket disconnects/timeouts/etc. - but the message it prints is from the software layer instead, which causes confusion. > Improve EOF error message > - > > Key: HDFS-7899 > URL: https://issues.apache.org/jira/browse/HDFS-7899 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.6.0 >Reporter: Harsh J >Assignee: Jagadesh Kiran N >Priority: Minor > Attachments: HDFS-7899-00.patch > > > Currently, a DN disconnection for reasons other than connection timeout or > refused messages, such as an EOF message as a result of rejection or other > network fault, reports in this manner: > {code} > WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for > block, add to deadNodes and continue. 
java.io.EOFException: Premature EOF: no > length prefix available > java.io.EOFException: Premature EOF: no length prefix available > at > org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171) > > at > org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392) > > at > org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137) > > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103) > > at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750) > > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602) > {code} > This is not very clear to a user (it WARNs at the hdfs-client). It could likely > be improved with a more diagnosable message, or at least one stating the direct > cause rather than a bare EOF. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
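One concrete direction for the improvement discussed here, sketched with a hypothetical helper (the method name and message wording are illustrative, not the attached patch): attach the peer address and the likely network-layer causes to the bare EOF:

```java
import java.io.EOFException;
import java.net.InetSocketAddress;

// Hypothetical helper: rewrap a bare "Premature EOF" with enough context
// that the client-side WARN is self-explanatory.
public class EofDiagnostics {
    public static EOFException annotate(EOFException cause,
                                        InetSocketAddress peer) {
        EOFException e = new EOFException(
            "Premature EOF reading a response from " + peer
            + ": no length prefix available. The DataNode likely closed the"
            + " connection (socket timeout, restart, or network fault).");
        e.initCause(cause); // keep the original exception for debugging
        return e;
    }
}
```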
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733248#comment-14733248 ] Li Bo commented on HDFS-8704: - I have tried replacing the failed streamer with a new one. When replacing, the output stream has to stop sending packets to the old streamer and start sending packets to the new streamer only after all packets of the next block are moved from the old to the new streamer. That is much more difficult than restarting the failed streamer. The automatic restart of a failed streamer means the output stream does not need to care whether some streamer has failed. > Erasure Coding: client fails to write large file when one datanode fails > > > Key: HDFS-8704 > URL: https://issues.apache.org/jira/browse/HDFS-8704 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, > HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, > HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, > HDFS-8704-HDFS-7285-007.patch > > > I tested the current code on a 5-node cluster using RS(3,2). When a datanode is > corrupt, the client succeeds in writing a file smaller than a block group but fails > to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests > files smaller than a block group; this jira will add more test situations. > A streamer may encounter some bad datanodes when writing blocks allocated to > it. When it fails to connect to a datanode or to send a packet, the streamer needs to > prepare for the next block. First it removes the packets of the current block > from its data queue. If the first packet of the next block is already in > the data queue, the streamer will reset its state and start to wait for the > next block allocated for it; otherwise it will just wait for the first packet > of the next block. The streamer will check periodically whether it has been asked to > terminate during its waiting. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
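The recovery steps described in the issue (drop the failed block's queued packets, then either reset immediately or keep waiting for the next block's first packet) can be sketched as follows; the class and method names are illustrative, not the actual striped-output-stream internals:

```java
import java.util.Deque;

// Illustrative model of the streamer's data queue during block recovery.
public class StreamerRecovery {
    public static class Packet {
        public final long blockId;
        public Packet(long blockId) { this.blockId = blockId; }
    }

    /**
     * Drop the queued packets that belong to the failed block. Returns true
     * if a packet of the next block is already queued, meaning the streamer
     * can reset its state right away; false means it must keep waiting
     * (periodically checking whether it was asked to terminate).
     */
    public static boolean dropCurrentBlock(Deque<Packet> dataQueue,
                                           long failedBlockId) {
        while (!dataQueue.isEmpty()
                && dataQueue.peekFirst().blockId == failedBlockId) {
            dataQueue.pollFirst();
        }
        return !dataQueue.isEmpty();
    }
}
```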
[jira] [Created] (HDFS-9031) libhdfs should use doxygen plugin to generate mvn site output
Allen Wittenauer created HDFS-9031: -- Summary: libhdfs should use doxygen plugin to generate mvn site output Key: HDFS-9031 URL: https://issues.apache.org/jira/browse/HDFS-9031 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Priority: Blocker Rather than pointing people to the hdfs.h file, we should take advantage of the doxyfile and actually generate documentation for mvn site so it shows up on the website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733224#comment-14733224 ] Rui Li commented on HDFS-8833: -- Hey [~zhz], we'd like to try out your patch. But it seems the v5 patch doesn't apply to the latest HDFS-7285 branch. Would you mind rebasing your patch? Thanks. > Erasure coding: store EC schema and cell size in INodeFile and eliminate > notion of EC zones > --- > > Key: HDFS-8833 > URL: https://issues.apache.org/jira/browse/HDFS-8833 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-7285 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-8833-HDFS-7285-merge.00.patch, > HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch, > HDFS-8833-HDFS-7285.03.patch, HDFS-8833-HDFS-7285.04.patch, > HDFS-8833-HDFS-7285.05.patch > > > We have [discussed | > https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] > storing EC schema with files instead of EC zones and recently revisited the > discussion under HDFS-8059. > As a recap, the _zone_ concept has severe limitations including renaming and > nested configuration. Those limitations are valid in encryption for security > reasons and it doesn't make sense to carry them over in EC. > This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For > simplicity, we should first implement it as an xattr and consider memory > optimizations (such as moving it to file header) as a follow-on. We should > also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9030) libwebhdfs lacks headers and documentation
[ https://issues.apache.org/jira/browse/HDFS-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N reassigned HDFS-9030: -- Assignee: Jagadesh Kiran N > libwebhdfs lacks headers and documentation > -- > > Key: HDFS-9030 > URL: https://issues.apache.org/jira/browse/HDFS-9030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Jagadesh Kiran N >Priority: Blocker > > This library is useless without header files to include and documentation on > how to use it. Both appear to be missing from the mvn package and site > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9029) libwebhdfs is not in the mvn package and likely missing from all distributions
[ https://issues.apache.org/jira/browse/HDFS-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N reassigned HDFS-9029: -- Assignee: Jagadesh Kiran N > libwebhdfs is not in the mvn package and likely missing from all distributions > -- > > Key: HDFS-9029 > URL: https://issues.apache.org/jira/browse/HDFS-9029 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Jagadesh Kiran N >Priority: Blocker > > libwebhdfs is not in the tar.gz generated by maven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8647: --- Status: Patch Available (was: Open) > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > Attachments: HDFS-8647-001.patch > > > Sometimes we want to have namenode use alternative block placement policy > such as upgrade domains in HDFS-7541. > BlockManager has built-in assumption about rack policy in functions such as > useDelHint, blockHasEnoughRacks. That means when we have new block placement > policy, we need to modify BlockManager to account for the new policy. Ideally > BlockManager should ask BlockPlacementPolicy object instead. That will allow > us to provide new BlockPlacementPolicy without changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8647: --- Attachment: HDFS-8647-001.patch > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > Attachments: HDFS-8647-001.patch > > > Sometimes we want to have namenode use alternative block placement policy > such as upgrade domains in HDFS-7541. > BlockManager has built-in assumption about rack policy in functions such as > useDelHint, blockHasEnoughRacks. That means when we have new block placement > policy, we need to modify BlockManager to account for the new policy. Ideally > BlockManager should ask BlockPlacementPolicy object instead. That will allow > us to provide new BlockPlacementPolicy without changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8647: --- Attachment: (was: HDFS-8647-001.patch) > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > > Sometimes we want to have namenode use alternative block placement policy > such as upgrade domains in HDFS-7541. > BlockManager has built-in assumption about rack policy in functions such as > useDelHint, blockHasEnoughRacks. That means when we have new block placement > policy, we need to modify BlockManager to account for the new policy. Ideally > BlockManager should ask BlockPlacementPolicy object instead. That will allow > us to provide new BlockPlacementPolicy without changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-237) Better handling of dfsadmin command when namenode is slow
[ https://issues.apache.org/jira/browse/HDFS-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-237. -- Resolution: Later This older JIRA is a bit stale given the multiple changes that went into the RPC side. Follow HADOOP-9640 and related JIRAs instead for more recent work. bq. a separate rpc queue This is supported today via the servicerpc-address configs (typically set to 8022, and strongly recommended for HA modes). > Better handling of dfsadmin command when namenode is slow > - > > Key: HDFS-237 > URL: https://issues.apache.org/jira/browse/HDFS-237 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Koji Noguchi > > Probably when hitting HADOOP-3810, Namenode became unresponsive. Large time > spent in GC. > All dfs/dfsadmin command were timing out. > WebUI was coming up after waiting for a long time. > Maybe setting a long timeout would have made the dfsadmin command go through. > But it would be nice to have a separate queue/handler which doesn't compete > with regular rpc calls. > All I wanted to do was dfsadmin -safemode enter, dfsadmin -finalizeUpgrade ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
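For reference, the service RPC split mentioned above is enabled with a single config in hdfs-site.xml (the hostname below is a placeholder; 8022 is the conventional port):

```xml
<!-- hdfs-site.xml: give DataNode heartbeats/block reports and HA fencing
     their own RPC endpoint so they do not queue behind slow client calls. -->
<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>nn1.example.com:8022</value>
</property>
```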
[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8647: --- Attachment: HDFS-8647-001.patch Attached the initial patch for review; kindly review. Did the following: 1) Moved useDelHint. 2) Replaced shouldCheckForEnoughRacks with hasClusterEverBeenMultiRack. 3) Moved the excessReplicas logic (from racks). > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > Attachments: HDFS-8647-001.patch > > > Sometimes we want to have namenode use alternative block placement policy > such as upgrade domains in HDFS-7541. > BlockManager has built-in assumption about rack policy in functions such as > useDelHint, blockHasEnoughRacks. That means when we have new block placement > policy, we need to modify BlockManager to account for the new policy. Ideally > BlockManager should ask BlockPlacementPolicy object instead. That will allow > us to provide new BlockPlacementPolicy without changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
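The direction of the change (BlockManager asks the placement policy instead of hard-coding rack rules) can be illustrated with a simplified interface; the signatures and the default rule below are assumptions for illustration, not the real Hadoop API:

```java
// Simplified sketch: rack-policy questions live behind an interface, so a
// custom policy (e.g. upgrade domains) can answer them differently without
// touching BlockManager.
public class PlacementSketch {
    public interface BlockPlacementPolicy {
        boolean hasEnoughRacks(int rackCount, int replication);
    }

    public static class DefaultPolicy implements BlockPlacementPolicy {
        // Common default rule: multi-replica blocks should span >= 2 racks.
        public boolean hasEnoughRacks(int rackCount, int replication) {
            return replication <= 1 || rackCount >= 2;
        }
    }
}
```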
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733201#comment-14733201 ] Walter Su commented on HDFS-8704: - How about streamer replacement? > Erasure Coding: client fails to write large file when one datanode fails > > > Key: HDFS-8704 > URL: https://issues.apache.org/jira/browse/HDFS-8704 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, > HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, > HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, > HDFS-8704-HDFS-7285-007.patch > > > I test current code on a 5-node cluster using RS(3,2). When a datanode is > corrupt, client succeeds to write a file smaller than a block group but fails > to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests > files smaller than a block group, this jira will add more test situations. > A streamer may encounter some bad datanodes when writing blocks allocated to > it. When it fails to connect datanode or send a packet, the streamer needs to > prepare for the next block. First it removes the packets of current block > from its data queue. If the first packet of next block has already been in > the data queue, the streamer will reset its state and start to wait for the > next block allocated for it; otherwise it will just wait for the first packet > of next block. The streamer will check periodically if it is asked to > terminate during its waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9011) Support splitting BlockReport of a storage into multiple RPC
[ https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733192#comment-14733192 ] Yi Liu commented on HDFS-9011: -- Please also add a datanode restart in {{TestSplitBlockReport}}; then it can cover the tests for my comment #1. > Support splitting BlockReport of a storage into multiple RPC > > > Key: HDFS-9011 > URL: https://issues.apache.org/jira/browse/HDFS-9011 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, > HDFS-9011.002.patch > > > Currently if a DataNode has too many blocks (more than 1m by default), it > sends multiple RPCs to the NameNode for the block report, each RPC containing the > report for a single storage. However, in practice we've seen that sometimes even a > single storage can contain a large amount of blocks and the report can even > exceed the max RPC data length. It may be helpful to support sending > multiple RPCs for the block report of a storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9011) Support splitting BlockReport of a storage into multiple RPC
[ https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733188#comment-14733188 ] Yi Liu commented on HDFS-9011: -- Thanks [~jingzhao] for working on this. Besides Nicholas' comments: *1.* In BlockPoolSlice {code} + private void saveReplicas(List persistList) { +if (persistList == null || persistList.isEmpty()) { return; } File tmpFile = new File(currentDir, REPLICA_CACHE_FILE + ".tmp"); @@ -787,7 +787,9 @@ private void saveReplicas(BlockListAsLongs blocksListToPersist) { FileOutputStream out = null; try { out = new FileOutputStream(tmpFile); - blocksListToPersist.writeTo(out); + for (BlockListAsLongs blockLists : persistList) { +blockLists.writeTo(out); + } {code} Now we write a {{BlockListAsLongs}} *list* to {{REPLICA_CACHE_FILE}}, so we should also change the logic of {{readReplicasFromCache}}: {code} BlockListAsLongs blocksList = BlockListAsLongs.readFrom(inputStream); {code} It currently reads only the first {{BlockListAsLongs}}. Also in {{saveReplicas}}, if a BlockListAsLongs has zero blocks, it's better not to persist it; otherwise there is a NullPointerException while reading replicas from the cache file. *2.* We should also update the description of {{dfs.blockreport.split.threshold}} in hdfs-default.xml. Nits: some lines are longer than 80 characters in the patch. > Support splitting BlockReport of a storage into multiple RPC > > > Key: HDFS-9011 > URL: https://issues.apache.org/jira/browse/HDFS-9011 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, > HDFS-9011.002.patch > > > Currently if a DataNode has too many blocks (more than 1m by default), it > sends multiple RPCs to the NameNode for the block report, each RPC containing the > report for a single storage. 
However, in practice we've seen that sometimes even a > single storage can contain a large amount of blocks and the report can even > exceed the max RPC data length. It may be helpful to support sending > multiple RPCs for the block report of a storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
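The {{readReplicasFromCache}} change Yi Liu asks for is essentially "loop until EOF instead of deserializing one list". A generic stand-in for that loop (plain length-prefixed byte chunks here stand in for the real {{BlockListAsLongs}} encoding, whose format this sketch does not reproduce):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// The cache file now holds several serialized lists back-to-back, so the
// reader must keep deserializing until it hits a clean end-of-file.
public class ChunkedCacheReader {
    public static List<byte[]> readAll(DataInputStream in) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        while (true) {
            int len;
            try {
                len = in.readInt();   // next chunk's length prefix
            } catch (EOFException eof) {
                break;                // clean EOF: all chunks consumed
            }
            byte[] chunk = new byte[len];
            in.readFully(chunk);
            chunks.add(chunk);
        }
        return chunks;
    }
}
```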
[jira] [Resolved] (HDFS-8763) After file closed, a race condition between IBR of 3rd replica of lastBlock and ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su resolved HDFS-8763. - Resolution: Duplicate > After file closed, a race condition between IBR of 3rd replica of lastBlock > and ReplicationMonitor > -- > > Key: HDFS-8763 > URL: https://issues.apache.org/jira/browse/HDFS-8763 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.4.0 >Reporter: jiangyu >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-8763.01.patch, HDFS-8763.02.patch > > > -For our cluster, the NameNode is always very busy, so for every incremental > block report , the contention of lock is heavy.- > -The logic of incremental block report is as follow, client send block to dn1 > and dn1 mirror to dn2, dn2 mirror to dn3. After finish this block, all > datanode will report the newly received block to namenode. In NameNode side, > all will go to the method processIncrementalBlockReport in BlockManager > class. But the status of the block reported from dn2,dn3 is RECEIVING_BLOCK, > for dn1 is RECEIVED_BLOCK. It is okay if dn2, dn3 report before dn1(that is > common), but in some busy environment, it is easy to find dn1 report before > dn2 or dn3, let’s assume dn2 report first, dn1 report second, and dn3 report > third.- > -So dn1 will addStoredBlock and find the replica of this block has not reached > the original number (which is 3), and the block will be added to > neededReplications and soon some node in the pipeline (dn1 or > dn2) is asked to replicate it to dn4. After sometime, dn4 and dn3 all report this block, > then choose one node to invalidate.- > Here is one log I found in our cluster: > {noformat} > 2015-07-08 01:05:34,675 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: > /logs/***_bigdata_spam/logs/application_1435099124107_470749/xx.xx.4.62_45454.tmp. 
> BP-1386326728-xx.xx.2.131-1382089338395 > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-a7c0f8f6-2399-4980-9479-efa08487b7b3:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-c75145a0-ed63-4180-87ee-d48ccaa647c5:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW]]} > 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.7.75:50010 is added to > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} > size 0 > 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.4.62:50010 is added to > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} > size 0 > 2015-07-08 01:05:35,003 INFO BlockStateChange: BLOCK* ask xx.xx.4.62:50010 to > replicate blk_3194502674_2121080184 to datanode(s) xx.xx.4.65:50010 > 2015-07-08 01:05:35,403 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.7.73:50010 is added to blk_3194502674_2121080184 size > 67750 > 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.4.65:50010 is added to blk_3194502674_2121080184 size > 67750 > 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* InvalidateBlocks: add > blk_3194502674_2121080184 to xx.xx.7.75:50010 > 2015-07-08 01:05:35,833 INFO 
BlockStateChange: BLOCK* chooseExcessReplicates: > (xx.xx.7.75:50010, blk_3194502674_2121080184) is added to invalidated blocks > set > 2015-07-08 01:05:35,852 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > InvalidateBlocks: ask xx.xx.7.75:50010 to delete [blk_3194502674_2121080184, > blk_3194497594_2121075104] > {noformat} > On some days, this situation can occur as many as 40 times, which is not good for > performance and wastes network bandwidth. > Our base version is Hadoop 2.4, and I have checked Hadoop 2.7.1 and didn't find > any difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8763) After file closed, a race condition between IBR of 3rd replica of lastBlock and ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733173#comment-14733173 ] Konstantin Shvachko commented on HDFS-8763: --- I think we can close this as a duplicate of HDFS-1172. See the overview [in this comment|https://issues.apache.org/jira/browse/HDFS-8999?focusedCommentId=14733172&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14733172] > After file closed, a race condition between IBR of 3rd replica of lastBlock > and ReplicationMonitor > -- > > Key: HDFS-8763 > URL: https://issues.apache.org/jira/browse/HDFS-8763 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.4.0 >Reporter: jiangyu >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-8763.01.patch, HDFS-8763.02.patch > > > -For our cluster, the NameNode is always very busy, so for every incremental > block report , the contention of lock is heavy.- > -The logic of incremental block report is as follow, client send block to dn1 > and dn1 mirror to dn2, dn2 mirror to dn3. After finish this block, all > datanode will report the newly received block to namenode. In NameNode side, > all will go to the method processIncrementalBlockReport in BlockManager > class. But the status of the block reported from dn2,dn3 is RECEIVING_BLOCK, > for dn1 is RECEIVED_BLOCK. It is okay if dn2, dn3 report before dn1(that is > common), but in some busy environment, it is easy to find dn1 report before > dn2 or dn3, let’s assume dn2 report first, dn1 report second, and dn3 report > third.- > -So dn1 will addStoredBlock and find the replica of this block has not reached > the original number (which is 3), and the block will be added to > neededReplications and soon some node in the pipeline (dn1 or > dn2) is asked to replicate it to dn4. 
After some time, dn4 and dn3 both report this block, > and then one node is chosen to invalidate.- > Here is one log I found in our cluster: > {noformat} > 2015-07-08 01:05:34,675 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: > /logs/***_bigdata_spam/logs/application_1435099124107_470749/xx.xx.4.62_45454.tmp. > BP-1386326728-xx.xx.2.131-1382089338395 > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-a7c0f8f6-2399-4980-9479-efa08487b7b3:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-c75145a0-ed63-4180-87ee-d48ccaa647c5:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW]]} > 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.7.75:50010 is added to > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} > size 0 > 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.4.62:50010 is added to > blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} > size 0 > 2015-07-08 01:05:35,003 INFO BlockStateChange: BLOCK* ask xx.xx.4.62:50010 to > replicate blk_3194502674_2121080184 to datanode(s) xx.xx.4.65:50010 > 2015-07-08 01:05:35,403 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.7.73:50010 is added to blk_3194502674_2121080184 size > 
67750 > 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: xx.xx.4.65:50010 is added to blk_3194502674_2121080184 size > 67750 > 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* InvalidateBlocks: add > blk_3194502674_2121080184 to xx.xx.7.75:50010 > 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* chooseExcessReplicates: > (xx.xx.7.75:50010, blk_3194502674_2121080184) is added to invalidated blocks > set > 2015-07-08 01:05:35,852 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > InvalidateBlocks: ask xx.xx.7.75:50010 to delete [blk_3194502674_2121080184, > blk_3194497594_2121075104] > {noformat} > Some days the number of occurrences of this situation can reach 40, which is bad for > performance and wastes network bandwidth. > Our base version is Hadoop 2.4, and I have checked Hadoop 2.7.1 and did not find > any difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
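The race the reporter describes can be reduced to a toy model (this is illustrative code, not the actual NameNode implementation): the NameNode counts only replicas whose incremental block report said RECEIVED_BLOCK, so when a finalized report (dn1's) arrives before the others, the block briefly looks under-replicated and an unnecessary fourth copy is scheduled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the IBR race: only RECEIVED_BLOCK reports count toward the
// replication check, so report ordering alone can trigger re-replication.
public class IbrRaceModel {
    public enum ReportState { RECEIVING_BLOCK, RECEIVED_BLOCK }

    private final Map<String, ReportState> reports = new HashMap<>();
    private final int expectedReplication;

    public IbrRaceModel(int expectedReplication) {
        this.expectedReplication = expectedReplication;
    }

    public void processIncrementalBlockReport(String dn, ReportState state) {
        reports.put(dn, state);
    }

    // Mirrors the check that looks only at fully reported (finalized) replicas.
    public int finalizedReplicas() {
        return (int) reports.values().stream()
            .filter(s -> s == ReportState.RECEIVED_BLOCK).count();
    }

    public boolean needsReplication() {
        return finalizedReplicas() < expectedReplication;
    }

    public static void main(String[] args) {
        IbrRaceModel nn = new IbrRaceModel(3);
        // dn2 and dn1 report before dn3; dn2 is still RECEIVING.
        nn.processIncrementalBlockReport("dn2", ReportState.RECEIVING_BLOCK);
        nn.processIncrementalBlockReport("dn1", ReportState.RECEIVED_BLOCK);
        // Only one finalized replica is visible, so ReplicationMonitor would
        // schedule a 4th copy even though dn2 and dn3 will report soon.
        System.out.println("needsReplication = " + nn.needsReplication());
    }
}
```

Once dn2 and dn3 eventually send RECEIVED_BLOCK, the block is over-replicated and chooseExcessReplicates invalidates one copy, which is exactly the churn shown in the log above.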
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733172#comment-14733172 ] Konstantin Shvachko commented on HDFS-8999: --- Spent some time browsing jira. This issue was discussed earlier in HDFS-1172 (linking). # NN cannot rely on locations reported by the client (or a primary DN) because it leads to a race condition between the client report and block reports from the DN that contains the replica. The block report may not contain the replica that was reported by the client. As noted in [Hairong's comment|https://issues.apache.org/jira/browse/HDFS-1172?focusedCommentId=12874030&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12874030] # [~hairong] proposed a solution, which makes NN place replicas that were not yet reported by DNs into the {{pendingReplication}} queue instead of {{neededReplication}}. This is absolutely logical, because NN knows that missing replicas were in the succeeded pipeline and can assume they will be reported soon. I don't know why HDFS-1172 was never committed. Maybe it is time to revisit it now. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. 
># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
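The queue choice Hairong proposed can be sketched as follows (hypothetical names, not the actual BlockManager API): replicas that were in the succeeded pipeline but have not yet sent blockReceived are treated as "pending" (expected to arrive soon) rather than "needed" (requiring a new copy).

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed decision: count unreported pipeline replicas as
// expected-soon instead of missing, so no spurious re-replication starts.
public class ReplicaQueueChooser {
    public static String chooseQueue(Set<String> pipelineNodes,
                                     Set<String> reportedNodes,
                                     int expectedReplication) {
        if (reportedNodes.size() >= expectedReplication) {
            return "none"; // fully reported, nothing to do
        }
        // Replicas the pipeline succeeded on but which are not yet reported.
        Set<String> unreported = new HashSet<>(pipelineNodes);
        unreported.removeAll(reportedNodes);
        return reportedNodes.size() + unreported.size() >= expectedReplication
            ? "pendingReplication"  // wait for the expected blockReceived msgs
            : "neededReplication";  // genuinely short, re-replicate
    }
}
```

With this rule, the HDFS-8763 scenario (one RECEIVED report, two pipeline nodes still unreported) lands in {{pendingReplication}} and times out back to {{neededReplication}} only if the reports never arrive.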
[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733165#comment-14733165 ] Hadoop QA commented on HDFS-6955: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 24s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 186m 53s | Tests passed in hadoop-hdfs. 
| | | | 228m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754404/HDFS-6955-05.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9b68577 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12323/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12323/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12323/console | This message was automatically generated. > DN should reserve disk space for a full block when creating tmp files > - > > Key: HDFS-6955 > URL: https://issues.apache.org/jira/browse/HDFS-6955 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Arpit Agarwal >Assignee: Kanaka Kumar Avvaru > Attachments: HDFS-6955-01.patch, HDFS-6955-02.patch, > HDFS-6955-03.patch, HDFS-6955-04.patch, HDFS-6955-05.patch > > > HDFS-6898 is introducing disk space reservation for RBW files to avoid > running out of disk space midway through block creation. > This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732518#comment-14732518 ] Kanaka Kumar Avvaru commented on HDFS-6955: --- Thanks for the review [~vinayrpet] 1. Reserved space is not released unless we call {{cleanupBlock()}}. Yes, it is a good point to verify double release; I have updated the test case to assert this. Please correct me if you still feel some other call is clearing the space. 2. Thanks for pointing out the name correction. I have renamed it in the latest version of the patch (v5) > DN should reserve disk space for a full block when creating tmp files > - > > Key: HDFS-6955 > URL: https://issues.apache.org/jira/browse/HDFS-6955 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Arpit Agarwal >Assignee: Kanaka Kumar Avvaru > Attachments: HDFS-6955-01.patch, HDFS-6955-02.patch, > HDFS-6955-03.patch, HDFS-6955-04.patch, HDFS-6955-05.patch > > > HDFS-6898 is introducing disk space reservation for RBW files to avoid > running out of disk space midway through block creation. > This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
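The double-release concern in the review above can be illustrated with a minimal accounting sketch (hypothetical names, not the FsVolumeImpl API): space is reserved up front when the tmp file is created, and a guard flag makes the release in {{cleanupBlock()}} idempotent so a second release cannot drive the volume's reserved counter negative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of reserve-on-create / release-once-on-cleanup accounting.
// The 'released' flag guards against double release corrupting the counter.
public class TmpSpaceReservation {
    private final AtomicLong volumeReserved;
    private final long bytes;
    private boolean released = false;

    public TmpSpaceReservation(AtomicLong volumeReserved, long bytes) {
        this.volumeReserved = volumeReserved;
        this.bytes = bytes;
        volumeReserved.addAndGet(bytes); // reserve a full block up front
    }

    // Called from cleanup; safe to call more than once.
    public synchronized void release() {
        if (!released) {
            volumeReserved.addAndGet(-bytes);
            released = true;
        }
    }
}
```

A test along the lines Kanaka describes would reserve, release twice, and assert the volume-wide counter returns exactly to its starting value.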
[jira] [Updated] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kanaka Kumar Avvaru updated HDFS-6955: -- Attachment: HDFS-6955-05.patch > DN should reserve disk space for a full block when creating tmp files > - > > Key: HDFS-6955 > URL: https://issues.apache.org/jira/browse/HDFS-6955 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Arpit Agarwal >Assignee: Kanaka Kumar Avvaru > Attachments: HDFS-6955-01.patch, HDFS-6955-02.patch, > HDFS-6955-03.patch, HDFS-6955-04.patch, HDFS-6955-05.patch > > > HDFS-6898 is introducing disk space reservation for RBW files to avoid > running out of disk space midway through block creation. > This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8998) Small files storage supported inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732434#comment-14732434 ] Yong Zhang commented on HDFS-8998: -- Hi [~andrew.wang], thanks for your comments, and sorry for the late reply. I read the Ozone design document and reviewed the existing code. Ozone is designed as a cloud file system with multi-tenancy, which is a different goal, but it also stores some metadata in LevelDB to reduce memory usage. {quote} The design you proposed sounds like it needs compaction to be coordinated by the NN, rather than offloading to the DNs. Level/RocksDB I think would also better handle concurrent writes without the concept of "locked" and "unlocked" blocks. {quote} Maybe "small file zone" is not exactly an HDFS directory, but let's just call it a 'small file zone'. A client creates files under the small file zone, and writes to these files are simply appended to an existing block. If a block is being written, it is locked, and this lock info is kept in the NN so that other clients do not use the block until the write finishes. Compaction only happens on block rewrite, because one block belongs to more than one file and file deletion only deletes the INode; a block is rewritten when more than a threshold of its data should be removed. This is controlled by the NN; in other words, deletion is an offline operation. {quote} Also, could you comment on the usecase where you see the issues with # of files affecting DNs before NNs? IIUC this design does not address NN memory consumption, which is the issue we see first in practice. {quote} Yes, most of the work is on the DN, because we already have a JIRA for keeping metadata in LevelDB. {quote} Goal # of files, expected size of a "small" file Any bad behavior if a large file is accidentally written to the small file zone? {quote} I have also faced some problems with this: a user may copy a file from local to HDFS, or stream writes to HDFS, and it is hard to identify such cases ahead of time. So, as I mentioned before, all data written is appended to an existing block. 
{quote} Support for rename into / out of small file zone? {quote} Yes, but a rename only changes metadata; we will add an xattr to identify small files moved out of the small file zone. {quote} Is there a way to convert a bunch of small files into a compacted file, like with HAR? {quote} Files will be bunched at the block level. {quote} How common is it for a user to know apriori that a bunch of small files will be written, and is okay putting them in a zone? A lot of the time I see this happening by accident, either a poorly written app or misconfiguration. {quote} When a file finishes being written, we close the output stream and the block append finishes. I will do some tests on the existing append feature and update the document's Reliability chapter. I will propose an updated design soon. > Small files storage supported inside HDFS > - > > Key: HDFS-8998 > URL: https://issues.apache.org/jira/browse/HDFS-8998 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Yong Zhang >Assignee: Yong Zhang > Attachments: HDFS-8998.design.001.pdf > > > HDFS has problems storing small files, as described in this blog post > (http://blog.cloudera.com/blog/2009/02/the-small-files-problem). > The blog also suggests some ways to store small files in HDFS, but they are > not good solutions; HAR files and SequenceFiles seem better suited for read-only > files. > Currently each HDFS block belongs to only one HDFS file; if there are too many > small files, many small blocks accumulate on the DataNodes, which puts a heavy > load on them. > This JIRA will show how to merge small blocks online into big ones, how to > delete small files, and so on. > Currently we have many open JIRAs for improving HDFS scalability on the NameNode, > such as HDFS-7836, HDFS-8286 and so on. > Small file metadata (INode and BlocksMap) will also be kept in the NameNode. > A design document will be uploaded soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9008) Balancer#Parameters class could use a builder pattern
[ https://issues.apache.org/jira/browse/HDFS-9008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732384#comment-14732384 ] Tsz Wo Nicholas Sze commented on HDFS-9008: --- Could we keep the fields final? > Balancer#Parameters class could use a builder pattern > - > > Key: HDFS-9008 > URL: https://issues.apache.org/jira/browse/HDFS-9008 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Attachments: HDFS-9008-trunk-v1.patch, HDFS-9008-trunk-v2.patch > > > The Balancer#Parameters class is violating a few checkstyle rules. > # Instance variables are not privately scoped and do not have accessor > methods. > # The Balancer#Parameter constructor has too many arguments (according to > checkstyle). > Changing this class to use the builder pattern could fix both of these style > issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
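The builder shape under discussion can be sketched as follows; consistent with the review comment above, every field stays final and the telescoping constructor disappears. The field names here are illustrative, not the full Balancer#Parameters set.

```java
// Builder-pattern sketch: final fields, private constructor, defaults kept
// in one place, and a fluent Builder replacing the long constructor.
public class BalancerParameters {
    private final double threshold;
    private final int maxIdleIterations;
    private final boolean runDuringUpgrade;

    private BalancerParameters(Builder b) {
        this.threshold = b.threshold;
        this.maxIdleIterations = b.maxIdleIterations;
        this.runDuringUpgrade = b.runDuringUpgrade;
    }

    public double getThreshold() { return threshold; }
    public int getMaxIdleIterations() { return maxIdleIterations; }
    public boolean getRunDuringUpgrade() { return runDuringUpgrade; }

    public static class Builder {
        // Defaults live here instead of in a telescoping constructor call.
        private double threshold = 10.0;
        private int maxIdleIterations = 5;
        private boolean runDuringUpgrade = false;

        public Builder setThreshold(double t) { threshold = t; return this; }
        public Builder setMaxIdleIterations(int n) { maxIdleIterations = n; return this; }
        public Builder setRunDuringUpgrade(boolean b) { runDuringUpgrade = b; return this; }

        public BalancerParameters build() { return new BalancerParameters(this); }
    }
}
```

Callers then name only the parameters they override, e.g. `new BalancerParameters.Builder().setThreshold(5.0).build()`, which fixes both checkstyle complaints (argument count and field scoping) without sacrificing immutability.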
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732377#comment-14732377 ] Hadoop QA commented on HDFS-8704: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 17s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 18s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 36s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 220m 47s | Tests failed in hadoop-hdfs. 
| | | | 264m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestNodeCount | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754381/HDFS-8704-HDFS-7285-007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 60bd765 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12322/console | This message was automatically generated. 
> Erasure Coding: client fails to write large file when one datanode fails > > > Key: HDFS-8704 > URL: https://issues.apache.org/jira/browse/HDFS-8704 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, > HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, > HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, > HDFS-8704-HDFS-7285-007.patch > > > I tested the current code on a 5-node cluster using RS(3,2). When a datanode is > down, the client succeeds in writing a file smaller than a block group but fails > to write a large one. {{TestDFSStripedOutputStreamWithFailure}} only tests > files smaller than a block group; this JIRA will add more test situations. > A streamer may encounter some bad datanodes when writing the blocks allocated to > it. When it fails to connect to a datanode or to send a packet, the streamer needs to > prepare for the next block. First it removes the packets of the current block > from its data queue. If the first packet of the next block is already in > the data queue, the streamer resets its state and starts to wait for the > next block allocated to it; otherwise it just waits for the first packet > of the next block. The streamer checks periodically whether it has been asked to > terminate while waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7982) huge non dfs space used
[ https://issues.apache.org/jira/browse/HDFS-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732342#comment-14732342 ] dengkanghua commented on HDFS-7982: --- Hi regis le bretonnic, did you resolve this issue in your environment? I have the same problem in Hadoop 2.6.0. > huge non dfs space used > --- > > Key: HDFS-7982 > URL: https://issues.apache.org/jira/browse/HDFS-7982 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: regis le bretonnic > Fix For: 2.7.0 > > > Hi... > I'm trying to load an external textfile table into an internal ORC table using > Hive. My process failed with the following error: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /tmp/hive/blablabla could only be replicated to 0 nodes instead of > minReplication (=1). There are 3 datanode(s) running and no node(s) are > excluded in this operation. > After investigating, I saw that the quantity of "non dfs space" grows more > and more until the job fails. > Just before failing, the "non dfs used space" reaches 54 GB on each datanode. > I still have space in "remaining DFS". > Here is the dfsadmin report just before the issue: > [hdfs@hadoop-01 data]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. 
> Configured Capacity: 475193597952 (442.56 GB) > Present Capacity: 290358095182 (270.42 GB) > DFS Remaining: 228619903369 (212.92 GB) > DFS Used: 61738191813 (57.50 GB) > DFS Used%: 21.26% > Under replicated blocks: 38 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 192.168.3.36:50010 (hadoop-04.X.local) > Hostname: hadoop-04.X.local > Decommission Status : Normal > Configured Capacity: 158397865984 (147.52 GB) > DFS Used: 20591481196 (19.18 GB) > Non DFS Used: 61522602976 (57.30 GB) > DFS Remaining: 76283781812 (71.04 GB) > DFS Used%: 13.00% > DFS Remaining%: 48.16% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 182 > Last contact: Tue Mar 24 10:56:05 CET 2015 > Name: 192.168.3.35:50010 (hadoop-03.X.local) > Hostname: hadoop-03.X.local > Decommission Status : Normal > Configured Capacity: 158397865984 (147.52 GB) > DFS Used: 20555853589 (19.14 GB) > Non DFS Used: 61790296136 (57.55 GB) > DFS Remaining: 76051716259 (70.83 GB) > DFS Used%: 12.98% > DFS Remaining%: 48.01% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 184 > Last contact: Tue Mar 24 10:56:05 CET 2015 > Name: 192.168.3.37:50010 (hadoop-05.X.local) > Hostname: hadoop-05.X.local > Decommission Status : Normal > Configured Capacity: 158397865984 (147.52 GB) > DFS Used: 20590857028 (19.18 GB) > Non DFS Used: 61522603658 (57.30 GB) > DFS Remaining: 76284405298 (71.05 GB) > DFS Used%: 13.00% > DFS Remaining%: 48.16% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 182 > Last contact: Tue Mar 24 10:56:05 CET 2015 > I was expected to find a temporary space used within my filesystem (ie /data). 
> I found the DFS usage under /data/hadoop/hdfs/data (19GB) but no trace of > 57GB for non DFS... > [root@hadoop-05 hadoop]# df -h /data > Filesystem Size Used Avail Use% Mounted on > /dev/sdb1 148G 20G 121G 14% /data > I also checked dfs.datanode.du.reserved, which is set to zero. > [root@hadoop-05 hadoop]# hdfs getconf -confkey dfs.datanode.du.reserved > 0 > Did I miss something? Where is the non-DFS space on Linux? Why did I get the > message "could only be replicated to 0 nodes instead of minReplication (=1). > There are 3 datanode(s) running and no node(s) are excluded in this > operation." given that the datanodes were up and running with DFS space still > remaining? > This error is blocking us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
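One piece of context for the report above: the "Non DFS Used" figure is not measured directly; it is derived from the other three numbers as configured capacity minus DFS used minus DFS remaining, so any disk usage the DataNode cannot account for lands in that bucket. Working through hadoop-05's own line from the dfsadmin report:

```java
// How dfsadmin computes the "Non DFS Used" figure shown above.
public class NonDfsUsed {
    public static long nonDfsUsed(long configuredCapacity, long dfsUsed,
                                  long dfsRemaining) {
        // Everything on the volume that is neither HDFS block data nor
        // reported free space is attributed to "non DFS".
        return configuredCapacity - dfsUsed - dfsRemaining;
    }

    public static void main(String[] args) {
        // hadoop-05's numbers from the dfsadmin report above.
        long nonDfs = nonDfsUsed(158397865984L,  // Configured Capacity
                                 20590857028L,   // DFS Used
                                 76284405298L);  // DFS Remaining
        System.out.println(nonDfs); // 61522603658, matching "Non DFS Used"
    }
}
```

Because it is a residual, a mismatch between what `df` shows and what the DataNode reports as remaining (for example, space consumed briefly by in-flight writes) inflates this number even when nothing extra is visible on the filesystem.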
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732339#comment-14732339 ] Walter Su commented on HDFS-8704: - {{DFSTestUtil.waitReplication(..)}} works for a file smaller than a block group. For a file that has at least two block groups, you can use {{StripedFileTestUtil.waitBlockGroupsReported(..)}}. Or you can just call {{triggerBlockReports}} if you are confident the blocks already exist and are finalized. In this case I think it's ok. > Erasure Coding: client fails to write large file when one datanode fails > > > Key: HDFS-8704 > URL: https://issues.apache.org/jira/browse/HDFS-8704 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, > HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, > HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, > HDFS-8704-HDFS-7285-007.patch > > > I tested the current code on a 5-node cluster using RS(3,2). When a datanode is > down, the client succeeds in writing a file smaller than a block group but fails > to write a large one. {{TestDFSStripedOutputStreamWithFailure}} only tests > files smaller than a block group; this JIRA will add more test situations. > A streamer may encounter some bad datanodes when writing the blocks allocated to > it. When it fails to connect to a datanode or to send a packet, the streamer needs to > prepare for the next block. First it removes the packets of the current block > from its data queue. If the first packet of the next block is already in > the data queue, the streamer resets its state and starts to wait for the > next block allocated to it; otherwise it just waits for the first packet > of the next block. The streamer checks periodically whether it has been asked to > terminate while waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732332#comment-14732332 ] Walter Su commented on HDFS-8383: - bq. When only one streamer fails, do we need to do anything? I think we can just ignore the failed streamer unless more than 3 streamers have failed. The offline decode work will be started by some datanode later. Maintaining the correctness of UC.replicas is required by lease recovery. bq. I think it's not right to set the failed status of a streamer in the output stream, due to the asynchrony. So I make it a follow-on. bq. Not very clear about the error handling. For example, streamer_i fails to write a packet of block_j, but it succeeds to write block_j+1, could you give some detailed description about this situation? Are you talking about different block groups? We haven't solved restarting a streamer after a single failure yet. This JIRA doesn't cover two failures from two block groups; it should not be a problem once the single-failure case is solved, except for follow-on #4 of my last comment. > Tolerate multiple failures in DFSStripedOutputStream > > > Key: HDFS-8383 > URL: https://issues.apache.org/jira/browse/HDFS-8383 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo Nicholas Sze >Assignee: Walter Su > Attachments: HDFS-8383.00.patch, HDFS-8383.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8829) DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning
[ https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732323#comment-14732323 ] Hadoop QA commented on HDFS-8829: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 30s | The applied patch generated 13 new checkstyle issues (total was 675, now 686). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 201m 39s | Tests failed in hadoop-hdfs. 
| | | | 246m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.fs.TestFcHdfsCreateMkdir | | | hadoop.fs.TestSWebHdfsFileContextMainOperations | | | hadoop.fs.TestEnhancedByteBufferAccess | | | hadoop.fs.TestUnbuffer | | | hadoop.hdfs.server.namenode.TestFileTruncate | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestBackupNode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754364/HDFS-8829.0004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9b68577 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12321/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12321/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12321/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12321/console | This message was automatically generated. 
> DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning > --- > > Key: HDFS-8829 > URL: https://issues.apache.org/jira/browse/HDFS-8829 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.3.0, 2.6.0 >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-8829.0001.patch, HDFS-8829.0002.patch, > HDFS-8829.0003.patch, HDFS-8829.0004.patch > > > {code:java} > private void initDataXceiver(Configuration conf) throws IOException { > // find free port or use privileged port provided > TcpPeerServer tcpPeerServer; > if (secureResources != null) { > tcpPeerServer = new TcpPeerServer(secureResources); > } else { > tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout, > DataNode.getStreamingAddr(conf)); > } > > tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); > {code} > The last line sets SO_RCVBUF explicitly, thus disabling TCP auto-tuning on > some systems. > Shall we make this behavior configurable? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
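The configurable behavior the issue asks for can be sketched like this (the sentinel convention and method names are illustrative, not the actual patch): call `setReceiveBufferSize()` only when an explicit size is configured, and otherwise skip the call entirely so the kernel's TCP receive-buffer auto-tuning stays in effect.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: leave SO_RCVBUF untouched unless the operator configured a size.
public class ReceiveBufferConfig {
    public static final int AUTO_TUNE = 0; // sentinel: do not set SO_RCVBUF

    public static ServerSocket open(int port, int configuredRcvBuf)
            throws IOException {
        ServerSocket server = new ServerSocket();
        if (configuredRcvBuf > 0) {
            // Explicit size requested. This must happen before bind() so the
            // window scale negotiated on accepted connections reflects it.
            server.setReceiveBufferSize(configuredRcvBuf);
        }
        // With configuredRcvBuf <= 0 we never touch the option, so the OS
        // keeps auto-tuning the buffer per connection.
        server.bind(new InetSocketAddress(port));
        return server;
    }
}
```

The key point is that auto-tuning is disabled by the mere act of setting the option, regardless of the value chosen, so "configurable" here has to mean the ability to not call the setter at all.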
[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732320#comment-14732320 ]

Li Bo commented on HDFS-8383:
-

Thanks [~walter.k.su] for the work! I have just read the code and found some points to discuss:
1) When only one streamer fails, do we need to do anything? I think we can just ignore the failed streamer unless more than 3 streamers are found failed. The offline decode work will be started by some datanode later.
2) I think it's not right to set the failed status of a streamer in the output stream, due to the asynchrony. I have given some reasons in HDFS-8704. The output stream doesn't need to care about the status of each streamer if just one or two streamers fail. This will not complicate the logic of the output stream.
3) The error handling is not very clear to me. For example, streamer_i fails to write a packet of block_j but succeeds in writing block_j+1; could you give a detailed description of this situation?

> Tolerate multiple failures in DFSStripedOutputStream
>
>                 Key: HDFS-8383
>                 URL: https://issues.apache.org/jira/browse/HDFS-8383
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Walter Su
>         Attachments: HDFS-8383.00.patch, HDFS-8383.01.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
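The tolerance rule in point 1) — ignore failures while the erasure code can still reconstruct the missing blocks — can be sketched as follows. The class and method names are hypothetical, not DFSStripedOutputStream's actual API:

```java
public class FailureTolerance {
    /**
     * For an RS(k, m) striped layout there are k data + m parity streamers;
     * a write can survive as long as at most m streamers have failed, since
     * the missing internal blocks can be reconstructed offline later.
     */
    static boolean canContinueWriting(int failedStreamers, int numParityBlocks) {
        return failedStreamers <= numParityBlocks;
    }

    public static void main(String[] args) {
        // RS(6, 3): 9 streamers in total, up to 3 failures are tolerable.
        System.out.println(canContinueWriting(3, 3)); // true
        System.out.println(canContinueWriting(4, 3)); // false
    }
}
```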
[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732318#comment-14732318 ]

Walter Su commented on HDFS-8383:
-

*1. What's the difference between datanodeError and externalError?*
They are both error states of a streamer. datanodeError is set inside, by the streamer itself. externalError is set outside, by DFSOutputStream. We provide one node for each internal block and have no node replacement, so if a node is marked in error, the streamer is dead. externalError is an error signal from outside, meaning another streamer has datanodeError and is probably dead already. In this case, all the remaining healthy streamers receive externalError from DFSOutputStream and prepare to start a recovery.

*2. What's the difference between {{failed}} and datanodeError?*
No difference, mostly. {{failed}} can be removed. Some unexpected error, like an NPE, is not datanodeError but should be {{failed}}; in that case the streamer will close. So {{failed}} == error && streamerClosed.

*3. How does a recovery begin?*
The failed streamer which has datanodeError will be dead. It will not trigger recovery. When a streamer fails, it saves lastException. When DFSOutputStream writes to this streamer, it first calls {{checkClosed()}} to check whether the streamer is healthy by checking lastException. When DFSOutputStream finds out the streamer failed, it notifies the other streamers by setting externalError, and they begin recovery.

*4. How does a recovery begin if DFSOutputStream doesn't write to the failed streamer?*
Suppose DFSOutputStream has just finished writing to streamer#3, streamer#5 has already failed, and DFSOutputStream happens to suspend (possible if the client calls write(b) slowly) and never touches streamer#5 again. DFSOutputStream doesn't know streamer#5 failed, so there is no recovery. When it calls {{close()}} it will check streamer#5 one last time, which will trigger recovery.

*5. What if a second streamer fails during recovery?*
The first recovery will succeed. The second failed streamer will have datanodeError and be dead. A second recovery will begin once the conditions of #3/#4 are met.

*6. How does a second recovery begin if the first recovery (1001 -> 1002) is unfinished?*
The second recovery will be scheduled. It should bump the GS to 1003, because the second failure may come from a streamer that has already finished bumping the GS to 1002. The second recovery should wait for (or force) the first one to finish.

*7. How does a third recovery begin if the first recovery (1001 -> 1002) is unfinished?*
The third recovery is merged with the second one; it is only scheduled once.

Have I answered your questions, Jing?

==
*follow-on:*
1. Remove {{failed}}.
2. The Coordinator periodically searches for failed streamers and starts recovery automatically. It shouldn't depend on DFSOutputStream.
3. We faked {{DataStreamer#block}} if the streamer failed. We also faked {{DataStreamer#bytesCurBlock}}. But {{dataQueue}} is lost. (DFSOutputStream is asynchronous with the streamer, so part of {{dataQueue}} belongs to the old block and part of it belongs to the new block when DFSOutputStream begins writing the next block.) So it's hard to restart a failed streamer when moving on to the next block group. We have 2 options:
3a. Replace the failed streamer with a new one. We have to cache the new-block part of {{dataQueue}}.
3b. Restart the failed streamer. HDFS-8704 tries to restart the failed streamer. It disables {{checkClosed()}} and treats a failed streamer as a normal one, so {{dataQueue}} is not lost, and we can simplify {{getBlockGroup}}.
4. Block recovery when the block group is ending. This is nothing like BlockConstructionStage.PIPELINE_CLOSE_RECOVERY. The fastest streamer has ended the previous block and sent a request to the NN to get a new block group, while some streamers are planning to bump the GS for the old blocks. There is no way to bump the ended/finalized block. I have no clue how to solve this. My first plan is to disable block recovery in this situation.

> Tolerate multiple failures in DFSStripedOutputStream
>
>                 Key: HDFS-8383
>                 URL: https://issues.apache.org/jira/browse/HDFS-8383
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Walter Su
>         Attachments: HDFS-8383.00.patch, HDFS-8383.01.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
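Points 5-7 above amount to a small scheduling rule: one recovery runs at a time, and later failures are coalesced into a single pending follow-up recovery targeting a strictly higher GS. A hypothetical sketch of that rule (not the actual Coordinator code):

```java
public class RecoveryScheduler {
    private long currentGS;
    private boolean recoveryRunning;
    private long pendingTargetGS = -1;  // at most one follow-up recovery queued

    RecoveryScheduler(long initialGS) { this.currentGS = initialGS; }

    /** Returns the GS that the triggered (or already pending) recovery targets. */
    synchronized long onStreamerFailure() {
        if (!recoveryRunning) {
            recoveryRunning = true;
            return ++currentGS;              // first recovery: e.g. 1001 -> 1002
        }
        if (pendingTargetGS < 0) {
            // Second failure while recovering: schedule one follow-up recovery
            // to a strictly higher GS (1003), since a streamer may already have
            // finished bumping to 1002.
            pendingTargetGS = currentGS + 1;
        }
        return pendingTargetGS;              // third failure: merged, still 1003
    }
}
```

Starting from GS 1001, the first failure triggers a recovery to 1002, and any number of failures during that recovery share a single scheduled recovery to 1003.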
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732314#comment-14732314 ]

Li Bo commented on HDFS-8704:
-

Thanks to [~walter.k.su] and [~zhz] for the review. The failed test cases are caused by the unit tests themselves; I have fixed them in patch 007. The tests hang at {{TestDFSStripedOutputStreamWithFailure#DFSTestUtil.waitReplication}}; I just omitted this statement in the patch in order to make the tests pass. [~walter.k.su], could you help me check this problem? I will switch to HDFS-8383 later.

> Erasure Coding: client fails to write large file when one datanode fails
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, HDFS-8704-HDFS-7285-007.patch
>
> I test current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, client succeeds to write a file smaller than a block group but fails to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations.
> A streamer may encounter some bad datanodes when writing blocks allocated to it. When it fails to connect datanode or send a packet, the streamer needs to prepare for the next block. First it removes the packets of current block from its data queue. If the first packet of next block has already been in the data queue, the streamer will reset its state and start to wait for the next block allocated for it; otherwise it will just wait for the first packet of next block. The streamer will check periodically if it is asked to terminate during its waiting.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
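The "removes the packets of current block from its data queue" step described above can be sketched as follows; the class and the minimal {{Packet}} stand-in are hypothetical, not the real {{DFSPacket}}/{{DataStreamer}} types:

```java
import java.util.Deque;

public class StreamerQueueSketch {
    /** Minimal stand-in for a data packet: just the block it belongs to. */
    static final class Packet {
        final long blockId;
        Packet(long blockId) { this.blockId = blockId; }
    }

    /**
     * Drop every queued packet that still targets the failed block, so the
     * streamer can reset its state and wait cleanly for the next block
     * allocated to it (packets of the next block, if any, are kept).
     */
    static void purgeCurrentBlock(Deque<Packet> dataQueue, long failedBlockId) {
        dataQueue.removeIf(p -> p.blockId == failedBlockId);
    }
}
```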
[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8704: Attachment: HDFS-8704-HDFS-7285-007.patch > Erasure Coding: client fails to write large file when one datanode fails > > > Key: HDFS-8704 > URL: https://issues.apache.org/jira/browse/HDFS-8704 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, > HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, > HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch, > HDFS-8704-HDFS-7285-007.patch > > > I test current code on a 5-node cluster using RS(3,2). When a datanode is > corrupt, client succeeds to write a file smaller than a block group but fails > to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests > files smaller than a block group, this jira will add more test situations. > A streamer may encounter some bad datanodes when writing blocks allocated to > it. When it fails to connect datanode or send a packet, the streamer needs to > prepare for the next block. First it removes the packets of current block > from its data queue. If the first packet of next block has already been in > the data queue, the streamer will reset its state and start to wait for the > next block allocated for it; otherwise it will just wait for the first packet > of next block. The streamer will check periodically if it is asked to > terminate during its waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9030) libwebhdfs lacks headers and documentation
[ https://issues.apache.org/jira/browse/HDFS-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-9030: --- Description: This library is useless without header files to include and documentation on how to use it. Both appear to be missing from the mvn package and site documentation. (was: This library is useless without header files to include and documentation on how to use it.) > libwebhdfs lacks headers and documentation > -- > > Key: HDFS-9030 > URL: https://issues.apache.org/jira/browse/HDFS-9030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > This library is useless without header files to include and documentation on > how to use it. Both appear to be missing from the mvn package and site > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9030) libwebhdfs lacks headers and documentation
[ https://issues.apache.org/jira/browse/HDFS-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-9030: --- Target Version/s: 3.0.0 > libwebhdfs lacks headers and documentation > -- > > Key: HDFS-9030 > URL: https://issues.apache.org/jira/browse/HDFS-9030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > This library is useless without header files to include and documentation on > how to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9030) libwebhdfs lacks headers and documentation
Allen Wittenauer created HDFS-9030: -- Summary: libwebhdfs lacks headers and documentation Key: HDFS-9030 URL: https://issues.apache.org/jira/browse/HDFS-9030 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Priority: Blocker This library is useless without header files to include and documentation on how to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9030) libwebhdfs lacks headers and documentation
[ https://issues.apache.org/jira/browse/HDFS-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-9030: --- Affects Version/s: 3.0.0 > libwebhdfs lacks headers and documentation > -- > > Key: HDFS-9030 > URL: https://issues.apache.org/jira/browse/HDFS-9030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > This library is useless without header files to include and documentation on > how to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9029) libwebhdfs is not in the mvn package and likely missing from all distributions
[ https://issues.apache.org/jira/browse/HDFS-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732288#comment-14732288 ] Allen Wittenauer commented on HDFS-9029: Basic problem: {code} $ pwd [...]/hadoop-hdfs-project/hadoop-hdfs $ find . -name '*dylib' ./target/hadoop-hdfs-3.0.0-SNAPSHOT/lib/native/libhdfs.0.0.0.dylib ./target/hadoop-hdfs-3.0.0-SNAPSHOT/lib/native/libhdfs.dylib ./target/native/target/libwebhdfs.0.0.0.dylib ./target/native/target/libwebhdfs.dylib ./target/native/target/usr/local/lib/libhdfs.0.0.0.dylib ./target/native/target/usr/local/lib/libhdfs.dylib {code} libwebhdfs content needs to get copied into the target/hadoop-hdfs-3.0.0-SNAPSHOT/ dir after it is built. > libwebhdfs is not in the mvn package and likely missing from all distributions > -- > > Key: HDFS-9029 > URL: https://issues.apache.org/jira/browse/HDFS-9029 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > libwebhdfs is not in the tar.gz generated by maven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
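One way to get the built library copied into the packaged tree would be an extra copy step in hadoop-hdfs's pom. The fragment below is only a sketch: the phase, the paths (mirroring the `find` output above), and the use of maven-antrun-plugin are assumptions, not the fix eventually committed:

```xml
<!-- Sketch: copy the built libwebhdfs artifacts next to libhdfs in the
     distribution directory so they end up in the tar.gz. Paths are
     assumptions about the local build layout shown above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-libwebhdfs</id>
      <phase>prepare-package</phase>
      <goals><goal>run</goal></goals>
      <configuration>
        <target>
          <copy todir="${project.build.directory}/${project.artifactId}-${project.version}/lib/native">
            <fileset dir="${project.build.directory}/native/target"
                     includes="libwebhdfs.*"/>
          </copy>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
```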
[jira] [Updated] (HDFS-9029) libwebhdfs is not in the mvn package and likely missing from all distributions
[ https://issues.apache.org/jira/browse/HDFS-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-9029: --- Environment: (was: OS X) > libwebhdfs is not in the mvn package and likely missing from all distributions > -- > > Key: HDFS-9029 > URL: https://issues.apache.org/jira/browse/HDFS-9029 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Blocker > > libwebhdfs is not in the tar.gz generated by maven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9029) libwebhdfs is not in the mvn package and likely missing from all distributions
[ https://issues.apache.org/jira/browse/HDFS-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-9029: --- Summary: libwebhdfs is not in the mvn package and likely missing from all distributions (was: libwebhdfs is not in the mvn package) > libwebhdfs is not in the mvn package and likely missing from all distributions > -- > > Key: HDFS-9029 > URL: https://issues.apache.org/jira/browse/HDFS-9029 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: OS X >Reporter: Allen Wittenauer >Priority: Blocker > > libwebhdfs is not in the tar.gz generated by maven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9029) libwebhdfs is not in the mvn package
Allen Wittenauer created HDFS-9029: -- Summary: libwebhdfs is not in the mvn package Key: HDFS-9029 URL: https://issues.apache.org/jira/browse/HDFS-9029 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Environment: OS X Reporter: Allen Wittenauer Priority: Blocker libwebhdfs is not in the tar.gz generated by maven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)