[
https://issues.apache.org/jira/browse/HADOOP-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486546
]
Konstantin Shvachko commented on HADOOP-1093:
---------------------------------------------
0. This patch does not apply: it has
Index: src/test/org/apache/hadoop/dfs/NNBench.java
where it should have
Index: src/java/org/apache/hadoop/dfs/DFSClient.java
so all changes intended for the client end up applied to NNBench instead.
Was it compiled manually or something?
1. I like that data-nodes confirm written blocks rather than the client.
I am not sure we are fixing the problem completely here, though.
If blockReport() happens before blockReceived(), the received block will be removed, won't it?
2. I think we should retain verification of the minimal block replication on the name-node as it was before.
Suppose we write to a file for a long time and only at the end get a message that the first block was not written properly.
The client should rather fail on allocating the second block in that case and retry.
To accelerate data-node reporting of received blocks, we should move blockReceived() before the blockReport().
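A minimal sketch of the ordering I mean, in a data-node-style heartbeat loop (illustrative names and stubs, not the actual DataNode code):

    // Hedged sketch: acknowledge freshly written blocks before the full
    // block report, so a report cannot race ahead of the ack and get the
    // still-unconfirmed block removed on the name-node (the point 1 race).
    public class OfferServiceSketch {
      volatile boolean shouldRun = true;

      void offerService() throws InterruptedException {
        while (shouldRun) {
          sendHeartbeat();            // liveness first
          sendBlockReceived();        // ack new blocks BEFORE the full report
          maybeSendBlockReport();     // periodic full inventory
          Thread.sleep(3000);         // heartbeat interval (illustrative)
        }
      }

      void sendHeartbeat() {}         // stubs standing in for the real RPCs
      void sendBlockReceived() {}
      void maybeSendBlockReport() {}
    }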
3. Exponential backoff seems a little aggressive. You start with a 400 msec sleep, and on the last (out of 5) retry of allocating the next block the client will sleep for 32 seconds.
If the name-node is not busy, this will substantially slow down the process; if the name-node is busy, the timeouts should take care of the overload.
I think we should have more experimental data on this issue before we apply that approach.
This seems like a change of general strategy, which we should consider for all communications rather than just this one case, and it belongs in a separate issue.
In this particular case the slowdown is not justified, since by the time the data-node returns control to the client everything is already successfully replicated, written, and confirmed.
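For reference, a generic exponential back-off loop of this shape (a minimal sketch with illustrative constants and interfaces, not the patch's actual code):

    import java.io.IOException;

    /** Illustrative back-off loop; multiplier and retry count are made up. */
    public class BackoffSketch {
      interface BlockAllocator { Object addBlock() throws IOException; }

      static Object allocateWithBackoff(BlockAllocator nn)
          throws IOException, InterruptedException {
        long sleepMs = 400;                  // initial back-off
        IOException last = null;
        for (int retry = 0; retry < 5; retry++) {
          try {
            return nn.addBlock();            // e.g. the addBlock() RPC
          } catch (IOException e) {          // e.g. NotReplicatedYetException
            last = e;
            Thread.sleep(sleepMs);
            sleepMs *= 2;                    // each retry waits twice as long
          }
        }
        throw last;                          // out of retries
      }
    }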
4. Local disk is faster than the network, so if the disk is full or read-only there is no reason to send the data over the wire, since it will be redistributed again anyway.
This again looks like an attempt to optimize, but it has little to do with solving the problem.
5. The default should be optimal for the most common usage scenario, and should be well tested.
10 handlers have handled the traffic well so far. Why change the default?
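A site that really needs more handlers can override the value per-cluster instead of us changing the shipped default; a sketch (the key name is assumed from hadoop-default.xml, and normally this would go in hadoop-site.xml rather than code):

    import org.apache.hadoop.conf.Configuration;

    public class HandlerCountOverride {
      public static void main(String[] args) {
        // Hypothetical per-cluster override of an assumed key name.
        Configuration conf = new Configuration();
        conf.setInt("dfs.namenode.handler.count", 40);
        System.out.println(conf.getInt("dfs.namenode.handler.count", 10));
      }
    }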
I don't see enough motivation for changes 2 through 5 yet. They should be
discussed in separate issues.
-1 on including 2-5.
I ran change 1; it works on my small cluster. But it needs:
a) to move blockReceived() before blockReport(), maybe even before sendHeartbeat(), as sketched above under point 2;
b) to be verified and confirmed with a successful NNBench run.
> NNBench generates millions of NotReplicatedYetException in Namenode log
> -----------------------------------------------------------------------
>
> Key: HADOOP-1093
> URL: https://issues.apache.org/jira/browse/HADOOP-1093
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.0
> Reporter: Nigel Daley
> Assigned To: dhruba borthakur
> Fix For: 0.13.0
>
> Attachments: nyr2.patch
>
>
> Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:
> 2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
>     at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
>     at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> I run NNBench to create files with block size set to 1 and replication set to 1. NNBench then writes 1 byte to the file. Minimum replication for the cluster is the default, i.e. 1.
> If it encounters an exception while trying to do either the create or write operations, it loops and tries again. Multiply this by 1000 files per node and a few hundred nodes.
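A minimal sketch of the create-and-retry loop described in the report above (illustrative path and constants, not the actual NNBench code):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NNBenchLoopSketch {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/nnbench/file-0");    // hypothetical path
        boolean done = false;
        while (!done) {
          try {
            // block size 1 and replication 1, as in the report
            FSDataOutputStream out = fs.create(file, true, 4096, (short) 1, 1L);
            out.write(0);                           // the single byte
            out.close();
            done = true;
          } catch (IOException e) {
            // loop and try again; every failed attempt is another
            // addBlock() call and another NotReplicatedYetException
          }
        }
      }
    }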