[
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733502#action_12733502
]
Ruyue Ma commented on HDFS-200:
-------------------------------
to: dhruba borthakur
> This is not related to HDFS-4379. Let me explain why.
> The problem is actually related to HDFS-xxx. The namenode waits for 10
> minutes after losing heartbeats from a datanode before declaring it dead.
> During those 10 minutes, the NN is free to choose the dead datanode as a
> possible replica for a newly allocated block.
> If, during a write, the dfsclient sees that a block replica location for a
> newly allocated block is not connectable, it re-requests the NN for a
> fresh set of replica locations for the block. It tries this
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds
> between each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you
> have only 4 datanodes in the cluster, every retry picks the dead datanode
> and the above logic bails out.
> One solution is to change the value of dfs.client.block.write.retries to a
> much larger value, say 200 or so. Better still, increase the number of
> nodes in your cluster.
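To make the failure mode concrete, here is a minimal sketch of the retry
loop described above (simplified for illustration; conf, namenode, src, and
clientName are DFSClient fields, and this is not the actual Hadoop source):

    // Simplified sketch: allocate a block, try to connect to its replicas,
    // and retry on failure. With only a few datanodes, every retry can get
    // the same dead node back from the NN.
    int retriesLeft = conf.getInt("dfs.client.block.write.retries", 3);
    boolean success = false;
    do {
      // The NN may still return a dead datanode during its 10-minute
      // heartbeat-timeout window.
      LocatedBlock lb = locateFollowingBlock(System.currentTimeMillis());
      Block block = lb.getBlock();
      DatanodeInfo[] nodes = lb.getLocations();
      success = createBlockOutputStream(nodes, clientName, false);
      if (!success) {
        namenode.abandonBlock(block, src, clientName);
        try {
          Thread.sleep(6000);  // sleep 6 seconds between retries
        } catch (InterruptedException ignored) {
        }
      }
    } while (!success && --retriesLeft > 0);
    if (!success) {
      throw new IOException("Unable to create new block.");
    }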
Our modification: when requesting block locations from the namenode, we pass
the NN a list of datanodes to exclude. This exclusion list of dead datanodes
applies only to a single block allocation.
+++ hadoop-new/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java  2009-07-20 00:19:03.000000000 +0800
@@ -2734,6 +2734,7 @@
       LocatedBlock lb = null;
       boolean retry = false;
       DatanodeInfo[] nodes;
+      DatanodeInfo[] excludedNodes = null;
       int count = conf.getInt("dfs.client.block.write.retries", 3);
       boolean success;
       do {
@@ -2745,7 +2746,7 @@
         success = false;
         long startTime = System.currentTimeMillis();
-        lb = locateFollowingBlock(startTime);
+        lb = locateFollowingBlock(startTime, excludedNodes);
         block = lb.getBlock();
         nodes = lb.getLocations();
@@ -2755,6 +2756,19 @@
         success = createBlockOutputStream(nodes, clientName, false);
         if (!success) {
+          LOG.info("Excluding node: " + nodes[errorIndex]);
+          // Mark the failed datanode as excluded for subsequent
+          // allocation attempts of this block
+          DatanodeInfo errorNode = nodes[errorIndex];
+          if (excludedNodes != null) {
+            DatanodeInfo[] newExcludedNodes =
+                new DatanodeInfo[excludedNodes.length + 1];
+            System.arraycopy(excludedNodes, 0, newExcludedNodes, 0,
+                excludedNodes.length);
+            newExcludedNodes[excludedNodes.length] = errorNode;
+            excludedNodes = newExcludedNodes;
+          } else {
+            excludedNodes = new DatanodeInfo[] { errorNode };
+          }
+
           LOG.info("Abandoning block " + block);
           namenode.abandonBlock(block, src, clientName);
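For completeness: the diff above only shows the client-side bookkeeping; the
locateFollowingBlock(long, DatanodeInfo[]) overload it calls is not included.
A hedged sketch of what such an overload could look like, assuming a
ClientProtocol.addBlock variant that accepts the exclusion list (that RPC
change is an assumption, not part of the posted patch):

    // Hypothetical overload: forward the per-allocation exclusion list to
    // the namenode so it does not hand the failed datanodes back as
    // replica locations for this block.
    private LocatedBlock locateFollowingBlock(long start,
        DatanodeInfo[] excludedNodes) throws IOException {
      // Assumes an addBlock variant on ClientProtocol taking excludedNodes.
      return namenode.addBlock(src, clientName, excludedNodes);
    }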
> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: dhruba borthakur
> Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt,
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt,
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch,
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch,
> fsyncConcurrentReaders9.patch,
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz,
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java,
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it
> says:
> * A reader is guaranteed to be able to read data that was 'flushed' before
> the reader opened the file.
> However, this feature is not yet implemented. Note that the operation
> 'flushed' is now called "sync".
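To see the missing guarantee concretely, here is a minimal writer/reader
pair in the spirit of the attached Writer.java and Reader.java (assuming the
pre-append-era FSDataOutputStream.sync() API; the path and class name are
illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncVisibility {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/sync-test");

        // Writer: write some bytes and sync() without closing the stream.
        FSDataOutputStream out = fs.create(p);
        out.writeBytes("synced line\n");
        out.sync();  // per the design doc, a new reader should now see this

        // Reader: opens the file after sync(); with the guarantee
        // implemented it reads the synced bytes, today it may see none.
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[64];
        int n = in.read(buf);
        System.out.println("read " + n + " bytes");
        in.close();
        out.close();
      }
    }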