[ https://issues.apache.org/jira/browse/HADOOP-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486546 ]

Konstantin Shvachko commented on HADOOP-1093:
---------------------------------------------

0. This patch does not apply, since it has
Index: src/test/org/apache/hadoop/dfs/NNBench.java
instead of
Index: src/java/org/apache/hadoop/dfs/DFSClient.java
so all changes intended for the client are treated as changes to NNBench.
Was it compiled manually or something?

1. I like that data-nodes confirm written blocks rather than the client.
I am not sure we are fixing the problem completely here, though.
If blockReport() happens before blockReceived(), the received block will be 
removed, won't it?

2. I think we should retain verification of the minimal block replication on 
the name-node as it was before.
Suppose we write to a file for a long time and only at the end get a message 
that the first block was not written properly.
I think the client should rather fail on allocating the second block in that 
case and retry.
To accelerate data-node reporting of received blocks, we should move 
blockReceived() before the blockReport() call.
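
What I mean is roughly the following sketch. The stub name-node and the 
retry loop are illustrative only, not the actual DFSClient code; 
NotReplicatedYetException and addBlock() are the real names from the log, 
everything else is made up for the example:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: the client fails on allocating the next block while
// the previous block's replication is still unconfirmed, and retries.
public class AddBlockRetrySketch {
    static class NotReplicatedYetException extends Exception {}

    // Stub name-node: rejects the first two calls, then succeeds,
    // mimicking a previous block whose replication is still pending.
    static final AtomicInteger calls = new AtomicInteger();
    static String addBlock() throws NotReplicatedYetException {
        if (calls.incrementAndGet() <= 2) throw new NotReplicatedYetException();
        return "blk_0002";
    }

    static String retryAddBlock(int maxRetries) throws Exception {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return addBlock();
            } catch (NotReplicatedYetException e) {
                if (attempt == maxRetries) throw e;
                Thread.sleep(100); // short fixed pause; see point 3 on backoff
            }
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(retryAddBlock(5)); // prints blk_0002
    }
}
```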

3. Exponential backoff seems a little aggressive. You start with a 400 msec 
sleep, and on the last (out of 5) retry of allocating the next block the 
client will sleep for 32 seconds.
If the name-node is not busy, this will substantially slow down the process; 
if the name-node is busy, the timeouts should take care of the overload. I 
think we need more experimental data on this issue before we apply that 
approach. This seems like a change of general strategy that we should 
consider for all communications rather than just for one case, so it belongs 
in a separate issue.
In this particular case the slowdown is not justified, since by the time the 
data-node returns to the client, everything is successfully replicated, 
written, and confirmed.
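
For reference, here is a generic doubling sketch of the pattern; the patch's 
actual constants may differ from mine, and plain doubling from 400 msec over 
5 retries tops out well below 32 seconds, so a 32-second final sleep would 
imply a larger base or multiplier:

```java
// Illustrative exponential backoff: the delay doubles on each retry.
public class BackoffSketch {
    static long[] backoffDelays(long baseMillis, int retries) {
        long[] delays = new long[retries];
        for (int i = 0; i < retries; i++) {
            delays[i] = baseMillis << i; // base * 2^i
        }
        return delays;
    }

    public static void main(String[] args) {
        // 400, 800, 1600, 3200, 6400 ms with plain doubling
        for (long d : backoffDelays(400, 5)) {
            System.out.println(d + " ms");
        }
    }
}
```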

4. The local disk is faster than the network, so if the disk is full or 
read-only there is no reason to send data over the wire, since it will be 
redistributed again anyway. This again looks like an attempt to optimize, 
but it has little to do with solving the problem.

5. The default should be optimal for the most common usage scenario, and 
should be well tested.
10 handlers have handled the traffic well so far. Why change the default?

I don't see enough motivation for changes 2 through 5 yet. They should be 
discussed in separate issues.
-1 on including 2-5.

I ran 1; it works on my small cluster. But it needs
a) to move blockReceived() before blockReport(), maybe even before 
sendHeartbeat(), and
b) to be verified and confirmed with a successful NNBench run.
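
A sketch of (a): in the data-node service loop, report received blocks 
before the full block report, and possibly before the heartbeat. The class 
below is a simplified stand-in for DataNode.offerService(), not the actual 
code; only the call order matters here:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the data-node service loop; records the order of
// name-node calls so the reordering proposed in (a) is explicit.
public class OfferServiceOrderSketch {
    final List<String> callOrder = new ArrayList<>();

    void sendHeartbeat() { callOrder.add("sendHeartbeat"); }
    void blockReceived() { callOrder.add("blockReceived"); }
    void blockReport()   { callOrder.add("blockReport"); }

    // One iteration with blockReceived() moved ahead of blockReport()
    // (and, per the stronger suggestion, ahead of sendHeartbeat() too).
    void offerServiceIteration() {
        blockReceived(); // confirm freshly written blocks first
        sendHeartbeat();
        blockReport();   // full report no longer races ahead of confirmations
    }

    public static void main(String[] args) {
        OfferServiceOrderSketch dn = new OfferServiceOrderSketch();
        dn.offerServiceIteration();
        System.out.println(dn.callOrder);
    }
}
```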

> NNBench generates millions of NotReplicatedYetException in Namenode log
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-1093
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1093
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Nigel Daley
>         Assigned To: dhruba borthakur
>             Fix For: 0.13.0
>
>         Attachments: nyr2.patch
>
>
> Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes 
> yielded 2.3 million of these exceptions in the NN log:
>    2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server 
> handler 0 on 8020 call error:
>    org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
>         at 
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> I run NNBench to create files with block size set to 1 and replication set to 
> 1.  NNBench then writes 1 byte to the file.  Minimum replication for the 
> cluster is the default, ie 1.  If it encounters an exception while trying to 
> do either the create or write operations, it loops and tries again.  Multiply 
> this by 1000 files per node and a few hundred nodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
