[jira] Created: (HDFS-1239) All datanodes are bad in 2nd phase

2010-06-16 Thread Thanh Do (JIRA)
All datanodes are bad in 2nd phase
--

 Key: HDFS-1239
 URL: https://issues.apache.org/jira/browse/HDFS-1239
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Setup:
+ # datanodes = 2
+ replication factor = 2
+ failure type = transient fault (a Java I/O call throws an exception or returns false)
+ # failures = 2
+ when/where failures happen = during the 2nd phase of the pipeline; one failure 
happens at each datanode while it performs I/O 
(e.g. DataOutputStream.flush())
 
- Details:
 
This is similar to HDFS-1237.
In this case, node1 throws an exception, which causes the client to rebuild
the pipeline with only node2 and redo the whole transfer; that attempt then
hits the second failure. At this point the client considers all datanodes bad
and never retries the whole thing again
(i.e. it never asks the namenode for a new set of datanodes).
In HDFS-1237 the bug is due to a permanent disk fault; in this case the 
errors are transient.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1238) A block is stuck in ongoingRecovery due to exception not propagated

2010-06-16 Thread Thanh Do (JIRA)
A block is stuck in ongoingRecovery due to exception not propagated 


 Key: HDFS-1238
 URL: https://issues.apache.org/jira/browse/HDFS-1238
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Setup:
+ # datanodes = 2
+ replication factor = 2
+ failure type = transient (i.e. a Java I/O call that throws an IOException or 
returns false)
+ # failures = 2
+ When/where failures happen: (this is a subtle bug) the first failure is a 
transient failure at a datanode during the second phase. Due to the first 
failure, the DFSClient calls recoverBlock. The second failure is injected 
during this recoverBlock process (i.e. another failure during the recovery 
process).
 
- Details:
 
The expectation here is that since the DFSClient performs lots of retries,
two transient failures should be masked properly by those retries.
We found one case where the failures are not transparent to the user.
 
Here are the stack traces of when/where the two failures happen (please ignore 
the line numbers).
 
1. The first failure:
Exception is thrown at
call(void java.io.DataOutputStream.flush())
SourceLoc: org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java(252)
Stack Trace:
  [0] datanode.BlockReceiver (flush:252)
  [1] datanode.BlockReceiver (receivePacket:660)
  [2] datanode.BlockReceiver (receiveBlock:743)
  [3] datanode.DataXceiver (writeBlock:468)
  [4] datanode.DataXceiver (run:119)
 
2. The second failure:
False is returned at
   call(boolean java.io.File.renameTo(File))
   SourceLoc: org/apache/hadoop/hdfs/server/datanode/FSDataset.java(105)
Stack Trace:
  [0] datanode.FSDataset (tryUpdateBlock:1008)
  [1] datanode.FSDataset (updateBlock:859)
  [2] datanode.DataNode (updateBlock:1780)
  [3] datanode.DataNode (syncBlock:2032)
  [4] datanode.DataNode (recoverBlock:1962)
  [5] datanode.DataNode (recoverBlock:2101)
 
This is what we found out:
The first failure causes the DFSClient to call recoverBlock,
which exposes the 2nd failure. The 2nd failure makes
renameTo return false, which then causes an IOException to be thrown
from the function that calls renameTo.
But this IOException is not propagated properly!
It is dropped inside DN.syncBlock(). Specifically, DN.syncBlock
calls DN.updateBlock(), which gets the exception. But syncBlock
only catches it and prints a warning without propagating the exception,
so syncBlock returns without any exception,
and thus recoverBlock returns without executing the finally{} block
(see below).
 
Now the client retries recoverBlock 3-5 more times,
but these retries always see exceptions! The reason is that the first
time we call recoverBlock(blk), the blk is put into
an ongoingRecovery list inside DN.recoverBlock().
Normally, blk is only removed (ongoingRecovery.remove(blk)) inside the 
finally{} block.
But since the exception is not propagated properly, that finally{}
block is never reached, so the blk is stuck
forever inside the ongoingRecovery list. Hence, the next time the
client performs the retry, it gets the error message
"Block ... is already being recovered" and recoverBlock() throws an
IOException. As a result, the client, which drives this whole
process from processDatanodeError, returns
from the pde function with closed = true, and hence it never
retries the whole thing again from the beginning; instead it
just returns an error.
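
For illustration only, here is a minimal Java sketch of the "catch and warn" pattern described above. All names (SyncSketch, BlockRecord, DatanodeStub, the updateBlock signature) are hypothetical placeholders, not the actual DataNode source; the point is that when every per-datanode updateBlock fails, the error must be surfaced to the caller, otherwise the caller's ongoingRecovery bookkeeping for the block is never cleaned up.

    import java.io.IOException;
    import java.util.List;

    // Simplified sketch of the dropped-exception pattern (hypothetical names).
    class SyncSketch {

      void syncBlock(String blockId, List<BlockRecord> records) throws IOException {
        int successes = 0;
        for (BlockRecord r : records) {
          try {
            // In the report, this is where renameTo returns false and an
            // IOException is thrown back to syncBlock.
            r.datanode.updateBlock(blockId, r.newGenerationStamp);
            successes++;
          } catch (IOException e) {
            // The problematic part: the failure is only logged and swallowed.
            System.err.println("WARN: updateBlock failed on " + r.datanode + ": " + e);
          }
        }
        if (successes == 0) {
          // Without rethrowing here, syncBlock "succeeds", recoverBlock never
          // learns about the failure, and the block stays marked as being
          // recovered, so every client retry gets
          // "Block ... is already being recovered".
          throw new IOException("Block recovery failed on all datanodes for " + blockId);
        }
      }

      static class BlockRecord {
        DatanodeStub datanode;
        long newGenerationStamp;
      }

      interface DatanodeStub {
        void updateBlock(String blockId, long generationStamp) throws IOException;
      }
    }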


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)





[jira] Created: (HDFS-1237) Client logic for 1st phase and 2nd phase failover are different

2010-06-16 Thread Thanh Do (JIRA)
Client logic for 1st phase and 2nd phase failover are different
---

 Key: HDFS-1237
 URL: https://issues.apache.org/jira/browse/HDFS-1237
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Setup:
number of datanodes = 4
replication factor = 2 (2 datanodes in the pipeline)
number of failures injected = 2
failure type: crash
Where/when failures happen: there are two scenarios. First, two datanodes 
crash at the same time during the first phase of the pipeline. Second, two 
datanodes crash during the second phase of the pipeline.
 
- Details:
 
In this setting, we set the datanode's heartbeat interval to the namenode to 1 
second.
This is just to show that once the NN has declared a datanode dead, the 
DFSClient will not
get that dead datanode back from the server. Here are our observations:
 
1. If the two crashes happen during the first phase,
the client waits for 6 seconds (which is enough time for the NN to detect
dead datanodes in this setting). After waiting for 6 seconds, the client
asks the NN again, the NN is able to give it two fresh, healthy datanodes,
and the experiment is successful!
 
2. BUT, if the two crashes happen during the second phase (e.g. renameTo),
the client *never waits for 6 secs*, which implies that the client logic
for the 1st phase and the 2nd phase is different.  What happens here is that 
the DFSClient gives
up and (we believe) never falls back to the outer while loop to contact the
NN again.  So the two crashes in this second phase are not masked properly,
and the write operation fails.
 
In summary, scenario (1) is good, but scenario (2) is not successful. This shows
bad retry logic during the second phase.  (We note again that we changed
the setup a bit by setting the DN's heartbeat interval to 1 second.  If we used
the default interval, scenario (1) would fail too, because the NN would give the
client the same dead datanodes.)
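
As a side note, below is a minimal sketch of how such a shortened-heartbeat setup might look in a test harness. The property dfs.heartbeat.interval (in seconds) is standard HDFS configuration; the recheck property name and the MiniDFSCluster constructor are assumptions based on 0.20-era test code, not taken from this report.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    // Hedged sketch of the experimental setup described above: shorten heartbeats
    // so the namenode can notice crashed datanodes within a few seconds.
    public class ShortHeartbeatSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.heartbeat.interval", 1L);       // datanode heartbeat, seconds
        conf.setInt("heartbeat.recheck.interval", 1000);  // assumed 0.20-era name, ms
        MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
        try {
          // ... run the write workload and inject the two crashes here ...
        } finally {
          cluster.shutdown();
        }
      }
    }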

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Updated: (HDFS-1236) Client uselessly retries recoverBlock 5 times

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1236:
---

Description: 
Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)

The setup:
+ # available datanodes = 4
+ Replication factor = 2 (hence there are 2 datanodes in the pipeline)
+ Failure type = bad disk at datanode (not a crash)
+ # failures = 2
+ # disks / datanode = 1
+ Where/when the failures happen: this is a scenario where each disk of the two 
datanodes in the pipeline goes bad at the same time during the 2nd phase of the 
pipeline (the data transfer phase).
 
Details:
 
In this case, the client calls processDatanodeError,
which calls datanode.recoverBlock on those two datanodes.
But since these two datanodes have bad disks (although they are still alive),
recoverBlock() fails.
Here, the client's retry logic ends when the streamer is closed (closed == 
true).
But before this happens, the client retries 5 times
(maxRecoveryErrorCount) and fails every time, until it finishes.
What is interesting is that
during each retry there is a wait of 1 second in
DataStreamer.run (i.e. dataQueue.wait(1000)),
so there is roughly a 5-second total wait before it declares failure.
 
This is a different bug than HDFS-1235, where the client retries
3 times with a 6-second wait each (resulting in 25 seconds of wait time).
In this experiment, the total wait time we observe is only
12 seconds (we are not sure why it is 12). So the DFSClient quits without
contacting the namenode again (say, to ask for a new set of
two datanodes).
Interestingly, this is another
bug showing that the client retry logic is complex and not deterministic,
depending on where and when failures happen.
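
For illustration, here is a minimal Java sketch of the bounded retry-and-wait pattern described above (simplified, with hypothetical names; not the actual DFSClient code): each attempt is preceded by roughly a one-second wait on a queue, and after maxRecoveryErrorCount failed attempts the streamer simply gives up, without ever going back to the namenode for a different set of datanodes.

    // Hypothetical sketch of the bounded recovery retry loop (not DFSClient code).
    class BoundedRecoveryRetrySketch {
      private final Object dataQueue = new Object();
      private static final int MAX_RECOVERY_ERROR_COUNT = 5;

      boolean runRecovery(RecoverableTarget target) throws InterruptedException {
        int recoveryErrorCount = 0;
        while (recoveryErrorCount < MAX_RECOVERY_ERROR_COUNT) {
          synchronized (dataQueue) {
            dataQueue.wait(1000);          // ~1 second between attempts
          }
          if (target.recoverBlock()) {
            return true;                   // recovery succeeded
          }
          recoveryErrorCount++;            // bad disks on every pipeline node: always fails
        }
        return false;                      // give up; the caller closes the stream
      }

      interface RecoverableTarget {
        boolean recoverBlock();
      }
    }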

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)

The setup:
# available datanodes = 4
Replication factor = 2 (hence there are 2 datanodes in the pipeline)
Failure type = Bad disk at datanode (not crashes)
# failures = 2
# disks / datanode = 1
Where/when the failures happen: This is a scenario where each disk of the two 
datanodes in the pipeline go bad at the same time during the 2nd phase of the 
pipeline (the data transfer phase).
 
Details:
 
In this case, the client will call processDatanodeError
which will call datanode.recoverBlock to those two datanodes.
But since these two datanodes have bad disks (although they're still alive),
then recoverBlock() will fail.
For this one, the client's retry logic ends when streamer is closed (close == 
true).
But before this happen, the client will retry 5 times
(maxRecoveryErrorCount) and will fail all the time, until
it finishes.  What is interesting is that
during each retry, there is a wait of 1 second in
DataStreamer.run (i.e. dataQueue.wait(1000)).
So it will be a 5-second total wait before declaring it fails.
 
This is a different bug than HDFS-1235, where the client retries
3 times for 6 seconds (resulting in 25 seconds wait time).
In this experiment, what we get for the total wait time is only
12 seconds (not sure why it is 12). So the DFSClient quits without
contacting the namenode again (say to ask for a new set of
two datanodes).
So interestingly we find another
bug that shows client retry logic is complex and not deterministic
depending on where and when failures happen.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)


> Client uselessly retries recoverBlock 5 times
> -
>
> Key: HDFS-1236
> URL: https://issues.apache.org/jira/browse/HDFS-1236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> Summary:
> Client uselessly retries recoverBlock 5 times
> The same behavior is also seen in append protocol (HDFS-1229)
> The setup:
> + # available datanodes = 4
> + Replication factor = 2 (hence there are 2 datanodes in the pipeline)
> + Failure type = Bad disk at datanode (not crashes)
> + # failures = 2
> + # disks / datanode = 1
> + Where/when the failures happen: This is a scenario where each disk of the 
> two datanodes in the pipeline go bad at the same time during the 2nd phase of 
> the pipeline (the data transfer phase).
>  
> Details:
>  
> In this case, the client will call processDatanodeError
> which will call datanode.recoverBlock to those two datanodes.
> But since 

[jira] Updated: (HDFS-1236) Client uselessly retries recoverBlock 5 times

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1236:
---

Description: 
Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)

The setup:
# available datanodes = 4
Replication factor = 2 (hence there are 2 datanodes in the pipeline)
Failure type = bad disk at datanode (not a crash)
# failures = 2
# disks / datanode = 1
Where/when the failures happen: this is a scenario where each disk of the two 
datanodes in the pipeline goes bad at the same time during the 2nd phase of the 
pipeline (the data transfer phase).
 
Details:
 
In this case, the client calls processDatanodeError,
which calls datanode.recoverBlock on those two datanodes.
But since these two datanodes have bad disks (although they are still alive),
recoverBlock() fails.
Here, the client's retry logic ends when the streamer is closed (closed == 
true).
But before this happens, the client retries 5 times
(maxRecoveryErrorCount) and fails every time, until it finishes.
What is interesting is that
during each retry there is a wait of 1 second in
DataStreamer.run (i.e. dataQueue.wait(1000)),
so there is roughly a 5-second total wait before it declares failure.
 
This is a different bug than HDFS-1235, where the client retries
3 times with a 6-second wait each (resulting in 25 seconds of wait time).
In this experiment, the total wait time we observe is only
12 seconds (we are not sure why it is 12). So the DFSClient quits without
contacting the namenode again (say, to ask for a new set of
two datanodes).
Interestingly, this is another
bug showing that the client retry logic is complex and not deterministic,
depending on where and when failures happen.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)

The setup:
# available datanodes = 4
Replication factor = 2 (hence there are 2 datanodes in the pipeline)
Failure type = Bad disk at datanode (not crashes)
# failures = 2
# disks / datanode = 1
Where/when the failures happen: This is a scenario where each disk of the two 
datanodes in the pipeline go bad at the same time during the 2nd phase of the 
pipeline (the data transfer phase).
 
Details:
 
In this case, the client will call processDatanodeError
which will call datanode.recoverBlock to those two datanodes.
But since these two datanodes have bad disks (although they're still alive),
then recoverBlock() will fail.
For this one, the client's retry logic ends when streamer is closed (close == 
true).
But before this happen, the client will retry 5 times
(maxRecoveryErrorCount) and will fail all the time, until
it finishes.  What is interesting is that
during each retry, there is a wait of 1 second in
DataStreamer.run (i.e. dataQueue.wait(1000)).
So it will be a 5-second total wait before declaring it fails.
 
This is a different bug than HDFS-1235, where the client retries
3 times for 6 seconds (resulting in 25 seconds wait time).
In this experiment, what we get for the total wait time is only
12 seconds (not sure why it is 12). So the DFSClient quits without
contacting the namenode again (say to ask for a new set of
two datanodes).
So interestingly we find another
bug that shows client retry logic is complex and not deterministic
depending on where and when failures happen.

Component/s: hdfs client

> Client uselessly retries recoverBlock 5 times
> -
>
> Key: HDFS-1236
> URL: https://issues.apache.org/jira/browse/HDFS-1236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> Summary:
> Client uselessly retries recoverBlock 5 times
> The same behavior is also seen in append protocol (HDFS-1229)
> The setup:
> # available datanodes = 4
> Replication factor = 2 (hence there are 2 datanodes in the pipeline)
> Failure type = Bad disk at datanode (not crashes)
> # failures = 2
> # disks / datanode = 1
> Where/when the failures happen: This is a scenario where each disk of the two 
> datanodes in the pipeline go bad at the same time during the 2nd phase of the 
> pipeline (the data transfer phase).
>  
> Details:
>  
> In this case, the client will call processDatanodeError
> which will call datanode.recoverBlock to those two datanodes.
> But since these two datanodes have bad disks (although they're still alive),
> then recoverBlock() will fail.
> For this one, the client's retry logic ends when streamer is closed (close == 
> true).
> But before this happen, the client will r

[jira] Created: (HDFS-1236) Client uselessly retries recoverBlock 5 times

2010-06-16 Thread Thanh Do (JIRA)
Client uselessly retries recoverBlock 5 times
-

 Key: HDFS-1236
 URL: https://issues.apache.org/jira/browse/HDFS-1236
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Thanh Do


Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)

The setup:
# available datanodes = 4
Replication factor = 2 (hence there are 2 datanodes in the pipeline)
Failure type = bad disk at datanode (not a crash)
# failures = 2
# disks / datanode = 1
Where/when the failures happen: this is a scenario where each disk of the two 
datanodes in the pipeline goes bad at the same time during the 2nd phase of the 
pipeline (the data transfer phase).
 
Details:
 
In this case, the client calls processDatanodeError,
which calls datanode.recoverBlock on those two datanodes.
But since these two datanodes have bad disks (although they are still alive),
recoverBlock() fails.
Here, the client's retry logic ends when the streamer is closed (closed == 
true).
But before this happens, the client retries 5 times
(maxRecoveryErrorCount) and fails every time, until it finishes.
What is interesting is that
during each retry there is a wait of 1 second in
DataStreamer.run (i.e. dataQueue.wait(1000)),
so there is roughly a 5-second total wait before it declares failure.
 
This is a different bug than HDFS-1235, where the client retries
3 times with a 6-second wait each (resulting in 25 seconds of wait time).
In this experiment, the total wait time we observe is only
12 seconds (we are not sure why it is 12). So the DFSClient quits without
contacting the namenode again (say, to ask for a new set of
two datanodes).
Interestingly, this is another
bug showing that the client retry logic is complex and not deterministic,
depending on where and when failures happen.




[jira] Updated: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1234:
---

Description: 
- Summary: Datanode 'alive' but with its disk failed, Namenode still thinks 
it's alive
 
- Setups:
+ Replication = 1
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ Failure type = bad disk
+ When/where failure happens = first phase of the pipeline
 
- Details:
In this experiment we have two datanodes, each with 1 disk.
If one datanode has a failed disk (but the node is still alive), the datanode
does not keep track of this.  From the perspective of the namenode,
that datanode is still alive, so the namenode gives the same datanode back
to the client.  The client retries 3 times, each time asking the namenode for
a new set of datanodes, and always gets the same datanode.
And every time the client tries to write there, it gets an exception.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: Datanode 'alive' but with its disk failed, Namenode still thinks 
it's alive
 
- Setups:
+ Replication = 1
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ Failure type = bad disk
+ When/where failure happens = first phase of the pipeline
 
- Details:
In this experiment we have two datanodes. Each node has 1 disk.
However, if one datanode has a failed disk (but the node is still alive), the 
datanode
does not keep track of this.  From the perspective of the namenode,
that datanode is still alive, and thus the namenode gives back the same datanode
to the client.  The client will retry 3 times by asking the namenode to
give a new set of datanodes, and always get the same datanode.
And every time the client wants to write there, it gets an exception.


> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
> -
>
> Key: HDFS-1234
> URL: https://issues.apache.org/jira/browse/HDFS-1234
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: Datanode 'alive' but with its disk failed, Namenode still thinks 
> it's alive
>  
> - Setups:
> + Replication = 1
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + Failure type = bad disk
> + When/where failure happens = first phase of the pipeline
>  
> - Details:
> In this experiment we have two datanodes. Each node has 1 disk.
> However, if one datanode has a failed disk (but the node is still alive), the 
> datanode
> does not keep track of this.  From the perspective of the namenode,
> that datanode is still alive, and thus the namenode gives back the same 
> datanode
> to the client.  The client will retry 3 times by asking the namenode to
> give a new set of datanodes, and always get the same datanode.
> And every time the client wants to write there, it gets an exception.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1235) Namenode returning the same Datanode to client, due to infrequent heartbeat

2010-06-16 Thread Thanh Do (JIRA)
Namenode returning the same Datanode to client, due to infrequent heartbeat
---

 Key: HDFS-1235
 URL: https://issues.apache.org/jira/browse/HDFS-1235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Thanh Do


This bug has been reported before.
Basically, since the datanode's heartbeat messages are infrequent (~ every 10 
minutes),
the NameNode keeps giving the client the same datanode even if that datanode is 
dead.
 
We want to point out that the client waits 6 seconds before retrying,
which amounts to long and useless retries in this scenario,
because within 6 secs the namenode has not yet declared the datanode dead.

Overall, this happens when a datanode is dead during the first phase of the 
pipeline (file setup).
If a datanode is dead during the second phase (byte transfer), the DFSClient 
could still
proceed with the other surviving datanodes (which is consistent with what the
Hadoop books say -- the write should proceed as long as at least one good
datanode remains).  But unfortunately this does not hold during the first 
phase of the
pipeline.  Overall, we suggest that the namenode take the client's view of
unreachable datanodes into consideration.  That is, if a client says that it 
cannot reach
DN-X, then the namenode might give the client a node other than X (but the 
namenode
does not have to declare X dead).
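
To make the suggestion concrete, here is a hypothetical Java sketch of an exclude-list style allocation call. This interface is purely illustrative of the proposal above; it is not the 0.20.1 ClientProtocol API.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative-only sketch: the client reports datanodes it cannot reach,
    // and block allocation skips them without declaring those nodes dead.
    class ExcludeAwareAllocationSketch {

      interface BlockAllocator {
        // excludedNodes: datanodes the client reports as unreachable for this write
        String[] allocateBlockTargets(String path, Set<String> excludedNodes);
      }

      String[] retryWithExclusions(BlockAllocator namenode, String path,
                                   String unreachableDatanode) {
        Set<String> excluded = new HashSet<String>();
        excluded.add(unreachableDatanode);   // e.g. the "DN-X" from the text above
        return namenode.allocateBlockTargets(path, excluded);
      }
    }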

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive

2010-06-16 Thread Thanh Do (JIRA)
Datanode 'alive' but with its disk failed, Namenode thinks it's alive
-

 Key: HDFS-1234
 URL: https://issues.apache.org/jira/browse/HDFS-1234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: Datanode 'alive' but with its disk failed, Namenode still thinks 
it's alive
 
- Setups:
+ Replication = 1
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ Failure type = bad disk
+ When/where failure happens = first phase of the pipeline
 
- Details:
In this experiment we have two datanodes, each with 1 disk.
If one datanode has a failed disk (but the node is still alive), the datanode
does not keep track of this.  From the perspective of the namenode,
that datanode is still alive, so the namenode gives the same datanode back
to the client.  The client retries 3 times, each time asking the namenode for
a new set of datanodes, and always gets the same datanode.
And every time the client tries to write there, it gets an exception.




[jira] Created: (HDFS-1232) Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut

2010-06-16 Thread Thanh Do (JIRA)
Corrupted block if a crash happens before writing to checksumOut but after 
writing to dataOut
-

 Key: HDFS-1232
 URL: https://issues.apache.org/jira/browse/HDFS-1232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: block is corrupted if a crash happens before writing to checksumOut 
but
after writing to dataOut. 
 
- Setup:
+ # available datanodes = 1
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = (see below)
 
- Details:
The order of processing a packet during a client write/append at the datanode
is: first forward the packet downstream, then write to the block data file, and
finally write to the checksum file. Hence, if a crash happens BEFORE the write
to the checksum file but AFTER the write to the data file, the block is 
corrupted.
Worse, if this is the only available replica, the block is lost.
 
We also found this problem in the case where there are 3 replicas for a 
particular block
and two failures happen during an append (see HDFS-1231).
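
For illustration, here is a minimal Java sketch of the write ordering described above (hypothetical helper names, not the actual BlockReceiver code), with the crash window marked:

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Simplified sketch of the packet-handling order (illustrative only).
    class PacketWriteOrderSketch {
      void receivePacket(byte[] data, byte[] checksum,
                         DataOutputStream mirrorOut,    // next datanode downstream
                         DataOutputStream dataOut,      // block data file
                         DataOutputStream checksumOut)  // checksum (.meta) file
          throws IOException {
        mirrorOut.write(data);        // 1) forward to the downstream datanode
        dataOut.write(data);          // 2) append to the block data file
        // <-- a crash here: the data file already has the new bytes but the
        //     checksum file does not, so the stored CRC no longer matches the
        //     block contents and the replica is corrupted.
        checksumOut.write(checksum);  // 3) append to the checksum file
      }
    }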

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1233) Bad retry logic at DFSClient

2010-06-16 Thread Thanh Do (JIRA)
Bad retry logic at DFSClient


 Key: HDFS-1233
 URL: https://issues.apache.org/jira/browse/HDFS-1233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: failover bug, bad retry logic at DFSClient, cannot failover to the 
2nd disk
 
- Setups:
+ # available datanodes = 1
+ # disks / datanode = 2
+ # failures = 1
+ failure type = bad disk
+ When/where failure happens = (see below)
 
- Details:

The setup is:
1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2).
 
We injected a single disk failure to see whether the client can fail over to
the second disk or not.
 
If a persistent disk failure happens during createBlockOutputStream
(the first phase of pipeline creation) (e.g. say DN1-Disk1 is bad),
then createBlockOutputStream (cbos) gets an exception and it
retries!  When it retries it gets the same DN1 from the namenode,
and then DN1 calls DN.writeBlock(), FSVolume.createTmpFile,
and finally getNextVolume(), which uses a moving (round-robin) volume index.
Thus, on the second try, the write successfully goes to the second disk.
So essentially createBlockOutputStream is wrapped in a
do/while(retry && --count >= 0). The first cbos fails, and the second
succeeds in this particular scenario.
 
NOW, say cbos is successful, but the failure is persistent.
Then the "retry" happens in a different while loop.
First, hasError is set to true in ResponseProcessor.run().
Thus, DataStreamer.run() goes back to the loop
while(!closed && clientRunning && !lastPacketInBlock).
This second iteration of the loop calls
processDatanodeError because hasError has been set to true.
In processDatanodeError (pde), the client sees that this is the only datanode
in the pipeline, and hence it considers the node bad, although actually
only 1 disk is bad!  Hence, pde throws an IOException suggesting that
all the datanodes in the pipeline (in this case, only DN1) are bad,
and the exception is thrown to the client.
But if that exception were caught by the outermost
do/while(retry && --count >= 0) loop, then this outer retry would be
successful (as suggested in the previous paragraph).
 
In summary, in a deployment scenario where we only have one datanode
that has multiple disks, if one disk goes bad, the current
retry logic on the DFSClient side is not robust enough to mask the
failure from the client.
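
For illustration, here is a hypothetical Java sketch (simplified names, not the actual DFSClient code) contrasting the two retry paths described above: the setup-phase do/while around createBlockOutputStream, which re-runs volume selection and can land on the second disk, versus the streaming-phase error handler, which only shrinks the pipeline and gives up when a single datanode is left.

    // Illustrative-only sketch of the two client retry paths.
    class TwoRetryPathsSketch {

      // Phase 1: pipeline setup. Retrying re-runs volume selection on the
      // datanode, so a bad first disk can be masked by the second disk.
      boolean setupPipeline(PipelineTarget target, int retries) {
        boolean success;
        int count = retries;
        do {
          success = target.createBlockOutputStream();
        } while (!success && --count >= 0);
        return success;
      }

      // Phase 2: data streaming. The error handler only removes "bad" datanodes
      // from the pipeline; with one datanode left it concludes all datanodes are
      // bad and aborts, even though only one of that node's disks has failed.
      boolean handleStreamingError(int datanodesInPipeline) {
        if (datanodesInPipeline <= 1) {
          return false;   // abort: nothing left to fail over to, from the client's view
        }
        return true;      // drop the bad node and rebuild the pipeline
      }

      interface PipelineTarget {
        boolean createBlockOutputStream();
      }
    }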

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1231) Generation Stamp mismatches, leading to failed append

2010-06-16 Thread Thanh Do (JIRA)
Generation Stamp mismatches, leading to failed append
-

 Key: HDFS-1231
 URL: https://issues.apache.org/jira/browse/HDFS-1231
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: recoverBlock is not atomic, so a retry fails when it runs into
a failure part-way through the recovery.
 
- Setup:
+ # available datanodes = 3
+ # disks / datanode = 1
+ # failures = 2
+ failure type = crash
+ When/where failure happens = (see below)
 
- Details:
Suppose there are 3 datanodes in the pipeline: dn3, dn2, and dn1, with dn1 as 
the primary.
When appending, the client first calls dn1.recoverBlock to make all the 
datanodes in 
the pipeline agree on the new Generation Stamp (GS1) and the length of the block.
The client then sends a data packet to dn3. dn3 in turn forwards this packet to 
the downstream
datanodes (dn2 and dn1) and starts writing to its own disk, then it crashes 
AFTER writing to the block
file but BEFORE writing to the meta file. The client notices the crash and calls 
dn1.recoverBlock().
dn1.recoverBlock() first creates a syncList (by calling getMetadataInfo at dn2 
and dn1).
Then dn1 calls NameNode.getNextGS() to get a new Generation Stamp (GS2).
Then it calls dn2.updateBlock(), which returns successfully.
Now it starts calling its own updateBlock and crashes after renaming
blk_X_GS1.meta to blk_X_GS1.meta_tmpGS2.
Therefore, from the client's point of view, dn1.recoverBlock() fails,
but the GS for the corresponding block has been incremented in the namenode 
(GS2).
The client retries by calling dn2.recoverBlock with the old GS (GS1), which does 
not match
the new GS at the NameNode (GS1) --> exception, and the append fails.
 
After all of this, we have:
- in dn3 (which has crashed):
tmp/blk_X
tmp/blk_X_GS1.meta
- in dn2:
current/blk_X
current/blk_X_GS2
- in dn1:
current/blk_X
current/blk_X_GS1.meta_tmpGS2
- in the NameNode, block X has generation stamp GS1 (because dn1 has not called
commitSynchronization yet).
 
Therefore, when the crashed datanodes restart: at dn1 the block is invalid 
because 
there is no meta file; at dn3 the block file and meta file are finalized, but 
the 
block is corrupted because of a CRC mismatch; and at dn2 the GS of the block is 
GS2,
which is not equal to the generation stamp of the block maintained in the 
NameNode.
Hence, block blk_X is inaccessible.
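
For illustration, here is a hedged Java sketch of the non-atomic, rename-based meta-file update that the crash interrupts (hypothetical file layout and method names, not the actual FSDataset code); a crash between the two renames leaves only the "...meta_tmpGS2" file, which is the dn1 state listed above.

    import java.io.File;
    import java.io.IOException;

    // Illustrative-only sketch of a two-step, rename-based generation-stamp update.
    class MetaUpdateSketch {
      void updateGenerationStamp(File dir, long blockId, long oldGS, long newGS)
          throws IOException {
        File oldMeta = new File(dir, "blk_" + blockId + "_" + oldGS + ".meta");
        File tmpMeta = new File(dir, "blk_" + blockId + "_" + oldGS + ".meta_tmp" + newGS);
        File newMeta = new File(dir, "blk_" + blockId + "_" + newGS + ".meta");

        if (!oldMeta.renameTo(tmpMeta)) {           // step 1
          throw new IOException("rename to tmp failed for " + oldMeta);
        }
        // <-- a crash here leaves only the tmp meta file: on restart the replica
        //     has no usable meta file, while the generation stamp handed out by
        //     the namenode has already moved on, so later recovery attempts
        //     disagree about the GS.
        if (!tmpMeta.renameTo(newMeta)) {           // step 2
          throw new IOException("rename to final meta failed for " + tmpMeta);
        }
      }
    }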

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Resolved: (HDFS-1219) Data Loss due to edits log truncation

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-1219.
---

Resolution: Duplicate

Why file this bug if it's the same as 955?

> Data Loss due to edits log truncation
> -
>
> Key: HDFS-1219
> URL: https://issues.apache.org/jira/browse/HDFS-1219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.2
>Reporter: Thanh Do
>
> We found this problem almost at the same time as HDFS developers.
> Basically, the edits log is truncated before fsimage.ckpt is renamed to 
> fsimage.
> Hence, any crash that happens after the truncation but before the renaming 
> will lead
> to data loss. A detailed description can be found here:
> https://issues.apache.org/jira/browse/HDFS-955
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu




[jira] Commented: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879666#action_12879666
 ] 

Todd Lipcon commented on HDFS-1220:
---

I believe we fixed this in trunk by saving to an fsimage_ckpt dir and then 
moving it into place atomically once all the files are on disk. See HDFS-955?

> Namenode unable to start due to truncated fstime
> 
>
> Key: HDFS-1220
> URL: https://issues.apache.org/jira/browse/HDFS-1220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: updating the fstime file on disk is not atomic, so it is possible 
> that,
> if a crash happens in the middle, the next time the NameNode reboots it will
> read a stale fstime file and be unable to start successfully.
>  
> - Details:
> Basically, this involve 3 steps:
> 1) delete fstime file (timeFile.delete())
> 2) truncate fstime file (new FileOutputStream(timeFile))
> 3) write new time to fstime file (out.writeLong(checkpointTime))
> If a crash happens after step 2 and before step 3, then on the next reboot 
> the NameNode
> gets an exception when reading the time (8 bytes) from an empty fstime file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu
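
For illustration, here is a hedged Java sketch of the write-then-atomic-rename idea Todd refers to above (illustrative only, not the actual trunk code): the new timestamp is written and synced to a temporary file first, so a crash can never leave an empty or truncated fstime behind.

    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Illustrative-only sketch of an atomic fstime update.
    class AtomicTimeFileSketch {
      void writeCheckpointTime(File timeFile, long checkpointTime) throws IOException {
        File tmp = new File(timeFile.getParentFile(), timeFile.getName() + ".tmp");
        FileOutputStream fos = new FileOutputStream(tmp);
        DataOutputStream out = new DataOutputStream(fos);
        try {
          out.writeLong(checkpointTime);
          out.flush();
          fos.getFD().sync();             // make sure the bytes are on disk before the rename
        } finally {
          out.close();
        }
        // Install the new file only after its contents are complete.
        if (!tmp.renameTo(timeFile)) {
          timeFile.delete();              // fall back for platforms where renameTo
          if (!tmp.renameTo(timeFile)) {  // does not overwrite an existing file
            throw new IOException("Failed to replace " + timeFile);
          }
        }
      }
    }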




[jira] Commented: (HDFS-1223) DataNode fails stop due to a bad disk (or storage directory)

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879665#action_12879665
 ] 

Todd Lipcon commented on HDFS-1223:
---

Already fixed by HDFS-457

> DataNode fails stop due to a bad disk (or storage directory)
> 
>
> Key: HDFS-1223
> URL: https://issues.apache.org/jira/browse/HDFS-1223
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> A datanode can store block files in multiple volumes.
> If a datanode sees a bad volume during startup (i.e., it faces an exception
> when accessing that volume), it simply fail-stops, making all block files
> stored in the other healthy volumes inaccessible. Consequently, these lost
> replicas will have to be regenerated later on other datanodes.
> If a datanode were able to mark the bad disk and continue working with the
> healthy ones, this would increase availability and avoid unnecessary
> regeneration. As an extreme example, consider one datanode which has
> 2 volumes V1 and V2, each containing one 64MB block file.
> During startup, the datanode gets an exception when accessing V1; it then
> fail-stops, so both block files have to be regenerated later on.
> If the datanode instead marked V1 as bad and continued working with V2, the 
> number
> of replicas that need to be regenerated would be cut in half.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Commented: (HDFS-1224) Stale connection makes node miss append

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879664#action_12879664
 ] 

Todd Lipcon commented on HDFS-1224:
---

If the node has crashed, the TCP connection should be broken and thus it won't 
re-use an existing connection, no?
Even so, does this cause any actual problems aside from a shorter pipeline?
Given we only cache IPC connections for a short amount of time, the likelihood 
of a DN restart while a connection is cached is very small

> Stale connection makes node miss append
> ---
>
> Key: HDFS-1224
> URL: https://issues.apache.org/jira/browse/HDFS-1224
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: if a datanode crashes and restarts, it may miss an append.
>  
> - Setup:
> + # available datanodes = 3
> + # replica = 3 
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = after the first append succeed
>  
> - Details:
> Since each datanode maintains a pool of IPC connections, whenever it wants
> to make an IPC call, it first looks into the pool. If the connection is not 
> there,
> it is created and put into the pool. Otherwise the existing connection is 
> used.
> Suppose that the append pipeline contains dn1, dn2, and dn3, with dn1 as the 
> primary.
> After the client appends to block X successfully, dn2 crashes and restarts.
> Now the client writes a new block Y to dn1, dn2 and dn3. The write is 
> successful.
> The client starts appending to block Y. It first calls dn1.recoverBlock().
> Dn1 will first create a proxy for each of the datanodes in the pipeline
> (in order to make RPC calls like getMetadataInfo() or updateBlock()). 
> However, because
> dn2 has just crashed and restarted, its connection in dn1's pool has become 
> stale. Dn1 uses
> this connection to make a call to dn2, hence an exception. Therefore, the 
> append is
> made only to dn1 and dn3, although dn2 is alive and the write of block Y to 
> dn2 was
> successful.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Commented: (HDFS-1227) UpdateBlock fails due to unmatched file length

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879662#action_12879662
 ] 

Todd Lipcon commented on HDFS-1227:
---

Believe this is addressed by HDFS-1186 in the 20-append branch

> UpdateBlock fails due to unmatched file length
> --
>
> Key: HDFS-1227
> URL: https://issues.apache.org/jira/browse/HDFS-1227
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: client append is not atomic; hence it is possible that,
> when retrying during an append, there is an exception in updateBlock
> indicating an unmatched file length, making the append fail.
>  
> - Setup:
> + # available datanodes = 3
> + # disks / datanode = 1
> + # failures = 2
> + failure type = bad disk
> + When/where failure happens = (see below)
> + This bug is non-deterministic, to reproduce it, add a sufficient sleep 
> before out.write() in BlockReceiver.receivePacket() in dn1 and dn2 but not dn3
>  
> - Details:
>  Suppose client appends 16 bytes to block X which has length 16 bytes at dn1, 
> dn2, dn3.
> Dn1 is primary. The pipeline is dn3-dn2-dn1. recoverBlock succeeds.
> The client starts sending data to dn3, the first datanode in the pipeline.
> dn3 forwards the packet to downstream datanodes, and starts writing
> data to its disk. Suppose there is an exception in dn3 when writing to disk.
> Client gets the exception, it starts the recovery code by calling 
> dn1.recoverBlock() again.
> dn1 in turn calls dn2.getMetadataInfo() and dn1.getMetaDataInfo() to build 
> the syncList.
> Suppose that at the time getMetadataInfo() is called at both datanodes (dn1 
> and dn2),
> the previous packet (which was sent from dn3) has not reached the disk yet.
> Hence, the block info given by getMetadataInfo contains a length of 16 bytes.
> But after that, the packet "comes" to disk, so the block file length now 
> becomes 32 bytes.
> Using the syncList (which contains block info with length 16 bytes), dn1 
> calls updateBlock at
> dn2 and dn1, which fails, because the length in the new block info (given by 
> updateBlock,
> which is 16 bytes) does not match the actual length on disk (which is 32 
> bytes).
>  
> Note that this bug is non-deterministic. It depends on the thread 
> interleaving
> at the datanodes.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Commented: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879663#action_12879663
 ] 

Todd Lipcon commented on HDFS-1226:
---

This is addressed by combination of HDFS-142, HDFS-200, HDFS-1057 in 20 append

> Last block is temporary unavailable for readers because of crashed appender
> ---
>
> Key: HDFS-1226
> URL: https://issues.apache.org/jira/browse/HDFS-1226
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: the last block is unavailable to subsequent readers if the 
> appender crashes in the
> middle of an append workload.
>  
> - Setup:
> + # available datanodes = 3
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = (see below)
>  
> - Details:
> Say a client is appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
> a successful
> recoverBlock at the primary datanode, the client calls createOutputStream, 
> which makes all datanodes
> move the block file and the meta file from the current directory to the tmp 
> directory. Now suppose
> the client crashes. Since all replicas of block X are in the tmp folders of 
> the corresponding datanodes,
> subsequent readers cannot read block X.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Created: (HDFS-1230) BlocksMap.blockinfo is not getting cleared immediately after deleting a block.This will be cleared only after block report comes from the datanode.Why we need to maintain t

2010-06-16 Thread Gokul (JIRA)
BlocksMap.blockinfo is not getting cleared immediately after deleting a 
block.This will be cleared only after block report comes from the datanode.Why 
we need to maintain the blockinfo till that time.


 Key: HDFS-1230
 URL: https://issues.apache.org/jira/browse/HDFS-1230
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.1
Reporter: Gokul


BlocksMap.blockinfo is not cleared immediately after deleting a block; it is 
cleared only after a block report comes from the datanode. Why do we need to 
maintain the blockinfo until that time? It increases namenode memory usage 
unnecessarily.




[jira] Updated: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1229:
---

Description: 
Setup:

+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recoverBlock
 
Details:
--
Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2  agree on the new GS of the block.
1) Client first creates DFSOutputStream by calling
 
>OutputStream result = new DFSOutputStream(src, buffersize, progress,
>lastBlock, stat, 
> conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls 
processDatanodeError(true, true)
(i.e., hasError = true, isAppend = true), and starts the DataStreamer
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreamer.run() also calls processDatanodeError()
> while (!closed && clientRunning) {
>  ...
>  boolean doSleep = processDatanodeError(hasError, false); /* let's call this PDNE 2 */
 
3) Now in the PDNE 1, we have following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>try {
> primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> newBlock = primary.recoverBlock(block, isAppend, newnodes); 
> /*exception here*/
> ...
>catch (IOException e) {
> ...
> if (recoveryErrorCount > maxRecoveryErrorCount) { 
> // this condition is false
> }
> ...
> return true;
>} // end catch
>finally {...}
>
>this.hasError = false;
>lastException = null;
>errorIndex = 0;
>success = createBlockOutputStream(nodes, clientName, true);
>}
>...
 
Because dn1 crashes during the client's call to recoverBlock, we get an exception.
Hence we go to the catch block, in which processDatanodeError returns true
before reaching the code that sets hasError to false. Also, because
createBlockOutputStream() is not called (due to the early return),
blockStream is still null.
 
4) Now PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2.
Because hasError = false, PDNE 2 returns false immediately without doing 
anything

> if (!hasError) { return false; }
 
5) Still in DataStreamer.run(), after PDNE 2 returns false, we still have
blockStream = null, hence the following code is executed:

if (blockStream == null) {
   nodes = nextBlockOutputStream(src);
   this.setName("DataStreamer for file " + src + " block " + block);
   response = new ResponseProcessor(nodes);
   response.start();
}
 
nextBlockOutputStream, which asks the namenode to allocate a new block, is called.
(This is not good, because we are appending, not writing.)
The namenode gives it a new block ID and a set of datanodes, including the crashed dn1.
This leads to createOutputStream() failing, because it tries to contact dn1
(which has crashed) first. The client retries 5 times without any success,
because every time it asks the namenode for a new block! Again we see
that the retry logic at the client is weird!

*This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)*

  was:
Setup:

+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recoverBlock
 
Details:
--
Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2  agree on the new GS of the block.
1) Client first creates DFSOutputStream by calling
 
>OutputStream result = new DFSOutputStream(src, buffersize, progress,
>lastBlock, stat, 
> conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls 
processDataNodeError(true, true)
(i.e, hasError = true, isAppend = true), and starts the DataStreammer
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreammer.run() also calls processDatanodeError()
> while (!closed && clientRunning) {
>  ...
>  boolean doSleep = processDatanodeError(hasError, false); /let's call 
> this PDNE 2*/
 
3) Now in the PDNE 1, we have following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>try {
> primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> newBlock = primary.recoverBlock(block, isAppend, newnodes); 
> /*exception here*/
> ...
>catch (IOException e) {
> ...
> if (recoveryError

[jira] Updated: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1229:
---

Description: 
Setup:

+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recoverBlock
 
Details:
--
Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2  agree on the new GS of the block.
1) Client first creates DFSOutputStream by calling
 
>OutputStream result = new DFSOutputStream(src, buffersize, progress,
>lastBlock, stat, 
> conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls 
processDatanodeError(true, true)
(i.e., hasError = true, isAppend = true), and starts the DataStreamer
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreamer.run() also calls processDatanodeError()
> while (!closed && clientRunning) {
>  ...
>  boolean doSleep = processDatanodeError(hasError, false); /* let's call this PDNE 2 */
 
3) Now in the PDNE 1, we have following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>try {
> primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> newBlock = primary.recoverBlock(block, isAppend, newnodes); 
> /*exception here*/
> ...
>catch (IOException e) {
> ...
> if (recoveryErrorCount > maxRecoveryErrorCount) { 
> /* this condition is false */
> }
> ...
> return true;
>} // end catch
>finally {...}
>
>this.hasError = false;
>lastException = null;
>errorIndex = 0;
>success = createBlockOutputStream(nodes, clientName, true);
>}
>...
 
Because dn1 crashes during the client's call to recoverBlock, we get an exception.
Hence we go to the catch block, in which processDatanodeError returns true
before reaching the code that sets hasError to false. Also, because
createBlockOutputStream() is not called (due to the early return),
blockStream is still null.
 
4) Now PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2.
Because hasError = false, PDNE 2 returns false immediately without doing 
anything
>if (!hasError) {
> return false;
>}
 
5) Still in DataStreamer.run(), after PDNE 2 returns false, we still have
blockStream = null, hence the following code is executed:
>  if (blockStream == null) {
>   nodes = nextBlockOutputStream(src);
>   this.setName("DataStreamer for file " + src +
> " block " + block);
>response = new ResponseProcessor(nodes);
>response.start();
>  }
 
nextBlockOutputStream, which asks the namenode to allocate a new block, is called.
(This is not good, because we are appending, not writing.)
The namenode gives it a new block ID and a set of datanodes, including the crashed dn1.
This leads to createOutputStream() failing, because it tries to contact dn1
(which has crashed) first. The client retries 5 times without any success,
because every time it asks the namenode for a new block! Again we see
that the retry logic at the client is weird!

*This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)*

  was:
- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recoverBlock
 
- Details:
Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2  agree on the new GS of the block.
1) Client first creates DFSOutputStream by calling
 
>OutputStream result = new DFSOutputStream(src, buffersize, progress,
>lastBlock, stat, 
> conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls 
processDataNodeError(true, true)
(i.e, hasError = true, isAppend = true), and starts the DataStreammer
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreammer.run() also calls processDatanodeError()
> while (!closed && clientRunning) {
>  ...
>  boolean doSleep = processDatanodeError(hasError, false); /let's call 
> this PDNE 2*/
 
3) Now in the PDNE 1, we have following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>try {
> primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> newBlock = primary.recoverBlock(block, isAppend, newn

[jira] Created: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock

2010-06-16 Thread Thanh Do (JIRA)
DFSClient incorrectly asks for new block if primary crashes during first 
recoverBlock
-

 Key: HDFS-1229
 URL: https://issues.apache.org/jira/browse/HDFS-1229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do


- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recoverBlock
 
- Details:
Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2  agree on the new GS of the block.
1) Client first creates DFSOutputStream by calling
 
>OutputStream result = new DFSOutputStream(src, buffersize, progress,
>lastBlock, stat, 
> conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls processDatanodeError(true, true)
(i.e., hasError = true, isAppend = true), and starts the DataStreamer:
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreamer.run() also calls processDatanodeError():
> while (!closed && clientRunning) {
>  ...
>  boolean doSleep = processDatanodeError(hasError, false); /* let's call
> this PDNE 2 */
 
3) Now in the PDNE 1, we have following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>try {
> primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> newBlock = primary.recoverBlock(block, isAppend, newnodes); 
> /*exception here*/
> ...
>catch (IOException e) {
> ...
> if (recoveryErrorCount > maxRecoveryErrorCount) { 
> /* this condition is false */
> }
> ...
> return true;
>} // end catch
>finally {...}
>
>this.hasError = false;
>lastException = null;
>errorIndex = 0;
>success = createBlockOutputStream(nodes, clientName, true);
>}
>...
 
Because dn1 crashes during the client's call to recoverBlock, an exception is thrown.
Control enters the catch block, where processDatanodeError returns true before it ever
reaches the line that sets hasError to false. Also, because createBlockOutputStream() is
never called (due to the early return), blockStream is still null.
 
4) Now that PDNE 1 has finished, we reach streamer.start(), which calls PDNE 2.
Because hasError is still false, PDNE 2 returns false immediately without doing
anything:
>if (!hasError) {
> return false;
>}
 
5) Still in DataStreamer.run(), after PDNE 2 returns false, blockStream is still null,
so the following code is executed:
>  if (blockStream == null) {
>   nodes = nextBlockOutputStream(src);
>   this.setName("DataStreamer for file " + src +
> " block " + block);
>response = new ResponseProcessor(nodes);
>response.start();
>  }
 
nextBlockOutputStream, which asks the namenode to allocate a new block, is called.
(This is wrong, because we are appending, not writing a new block.)
The namenode hands back a new block ID and a set of datanodes that includes the crashed dn1,
so createBlockOutputStream() fails because it tries to contact dn1 (which has crashed) first.
The client retries 5 times without any success, because every time it asks the
namenode for a new block. Again, the client-side retry logic is questionable.

*This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)*

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1228) CRC does not match when retrying appending a partial block

2010-06-16 Thread Thanh Do (JIRA)
CRC does not match when retrying appending a partial block
--

 Key: HDFS-1228
 URL: https://issues.apache.org/jira/browse/HDFS-1228
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: when appending to a partial block, it is possible that a retry after an
exception fails due to a checksum mismatch. The append operation is not atomic (it
should either complete or fail completely, but here it does neither).
 
- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = bad disk
+ When/where failure happens = (see below)
 
- Details:
Client writes 16 bytes to dn1 and dn2. The write completes; so far so good.
The meta file now contains a 7-byte header + a 4-byte checksum (CK1, the checksum
for the 16 bytes). The client then appends 16 more bytes; assume there is an
exception at BlockReceiver.receivePacket() at dn2, so the client knows dn2 is bad.
BUT the append at dn1 has completed (i.e., both the data portion and the checksum
portion have been written to the corresponding block file and meta file), meaning
that the checksum file at dn1 now contains the 7-byte header + a 4-byte checksum
(CK2, the checksum for the 32 bytes of data). Because dn2 hit an exception, the
client calls recoverBlock and starts the append again at dn1. dn1 receives the 16
bytes of data and verifies whether the pre-computed crc on disk (CK2) matches what
it just recalculated (CK1), which obviously does not match.
Hence an exception, and the retry fails.
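
A tiny self-contained example of why the verification must fail, using
java.util.zip.CRC32 as a stand-in for the datanode's per-chunk checksum (the data
values are made up; only the CK1-vs-CK2 comparison matters):

  import java.util.zip.CRC32;

  public class CrcMismatchSketch {
    public static void main(String[] args) {
      byte[] chunk = new byte[32];
      for (int i = 0; i < chunk.length; i++) chunk[i] = (byte) i;

      CRC32 ck1 = new CRC32();
      ck1.update(chunk, 0, 16);   // CK1: checksum after the original 16-byte write

      CRC32 ck2 = new CRC32();
      ck2.update(chunk, 0, 32);   // CK2: checksum dn1 stored after completing the append

      // The retried append recomputes the checksum of the first 16 bytes (CK1) and
      // compares it with what is on disk (CK2); they differ, so verification fails.
      System.out.println("CK1 = " + Long.toHexString(ck1.getValue()));
      System.out.println("CK2 = " + Long.toHexString(ck2.getValue()));
      System.out.println("match = " + (ck1.getValue() == ck2.getValue())); // false
    }
  }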
 
- A similar bug has been reported at
https://issues.apache.org/jira/browse/HDFS-679
but here it manifests in a different context.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1219) Data Loss due to edits log truncation

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879655#action_12879655
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1219:
--

Then, is this the same as HDFS-955?

> Data Loss due to edits log truncation
> -
>
> Key: HDFS-1219
> URL: https://issues.apache.org/jira/browse/HDFS-1219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.2
>Reporter: Thanh Do
>
> We found this problem almost at the same time as HDFS developers.
> Basically, the edits log is truncated before fsimage.ckpt is renamed to 
> fsimage.
> Hence, any crash happens after the truncation but before the renaming will 
> lead
> to a data loss. Detailed description can be found here:
> https://issues.apache.org/jira/browse/HDFS-955
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1227) UpdateBlock fails due to unmatched file length

2010-06-16 Thread Thanh Do (JIRA)
UpdateBlock fails due to unmatched file length
--

 Key: HDFS-1227
 URL: https://issues.apache.org/jira/browse/HDFS-1227
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: client append is not atomic; hence, it is possible that when retrying an
append, an exception in updateBlock indicating an unmatched file length makes the
append fail.
 
- Setup:
+ # available datanodes = 3
+ # disks / datanode = 1
+ # failures = 2
+ failure type = bad disk
+ When/where failure happens = (see below)
+ This bug is non-deterministic; to reproduce it, add a sufficient sleep before
out.write() in BlockReceiver.receivePacket() in dn1 and dn2, but not in dn3
 
- Details:
Suppose the client appends 16 bytes to block X, which has a length of 16 bytes, at
dn1, dn2, and dn3. dn1 is the primary. The pipeline is dn3-dn2-dn1. recoverBlock succeeds.
The client starts sending data to dn3, the first datanode in the pipeline.
dn3 forwards the packet to the downstream datanodes and starts writing the data to
its disk. Suppose there is an exception in dn3 when writing to disk.
The client gets the exception and starts the recovery code by calling
dn1.recoverBlock() again.
dn1 in turn calls dn2.getMetadataInfo() and dn1.getMetaDataInfo() to build the syncList.
Suppose that at the time getMetadataInfo() is called at both datanodes (dn1 and dn2),
the previous packet (sent from dn3) has not yet reached disk.
Hence, the block info returned by getMetadataInfo() reports a length of 16 bytes.
But after that, the packet "arrives" on disk, and the block file length becomes 32 bytes.
Using the syncList (which contains block info with a 16-byte length), dn1 calls
updateBlock at dn2 and dn1, which fails, because the length in the new block info
(given by updateBlock, which is 16 bytes) does not match the actual length on disk
(which is 32 bytes).
 
Note that this bug is non-deterministic; it depends on the thread interleaving at
the datanodes.
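
The failing comparison can be pictured with the following minimal sketch; it is not
the actual FSDataset/updateBlock code, just the length check that the race described
above trips over (names are invented):

  import java.io.File;
  import java.io.IOException;

  public class UpdateBlockLengthSketch {
    static void updateBlock(File blockFile, long lengthFromSyncList) throws IOException {
      long onDiskLength = blockFile.length();      // 32 bytes once the late packet lands
      if (onDiskLength != lengthFromSyncList) {    // the syncList still says 16 bytes
        throw new IOException("Block length mismatch: recovery expected "
            + lengthFromSyncList + " bytes but found " + onDiskLength + " on disk");
      }
      // ... otherwise rename the meta file to the new generation stamp, etc.
    }
  }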

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1226:
---

Description: 
- Summary: the last block is unavailable to subsequent readers if the appender
crashes in the middle of an append workload.
 
- Setup:
+ # available datanodes = 3
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = (see below)
 
- Details:
Say a client is appending to block X at 3 datanodes: dn1, dn2, and dn3. After a
successful recoverBlock at the primary datanode, the client calls createOutputStream,
which makes all datanodes move the block file and the meta file from the current
directory to the tmp directory. Now suppose the client crashes. Since all replicas of
block X are in the tmp folders of the corresponding datanodes, subsequent readers
cannot read block X.
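
Schematically (directory and file names follow the description above; this is not the
DataNode source, and it assumes readers only consult current/):

  import java.io.File;
  import java.io.IOException;

  public class TmpBlockSketch {
    public static void main(String[] args) throws IOException {
      File current = new File("data/current");
      File tmp = new File("data/tmp");
      current.mkdirs();
      tmp.mkdirs();

      File block = new File(current, "blk_X");
      File meta = new File(current, "blk_X_1001.meta");
      block.createNewFile();
      meta.createNewFile();

      // createOutputStream() for the append: every datanode moves the replica into tmp/
      block.renameTo(new File(tmp, block.getName()));
      meta.renameTo(new File(tmp, meta.getName()));

      // the appender crashes here, so nothing ever moves the files back to current/

      // a reader that only consults current/ no longer sees block X
      boolean visible = new File(current, "blk_X").exists();
      System.out.println("block X visible to readers: " + visible);  // false
    }
  }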

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


  was:
- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)



> Last block is temporary unavailable for readers because of crashed appender
> ---
>
> Key: HDFS-1226
> URL: https://issues.apache.org/jira/browse/HDFS-1226
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: the last block is unavailable to subsequent readers if appender 
> crashes in the
> middle of appending workload.
>  
> - Setup:
> + # available datanodes = 3
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = (see below)
>  
> - Details:
> Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
> successful 
> recoverBlock at primary datanode, client calls createOutputStream, which make 
> all datanodes
> move the block file and the meta file from current directory to tmp 
> directory. Now suppose
> the client crashes. Since all replicas of block X are in tmp folders of 
> corresponding datanode,
> subsequent readers cannot read block X.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1226:
---

Description: 
- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


  was:
- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.

- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
 
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.


> Last block is temporary unavailable for readers because of crashed appender
> ---
>
> Key: HDFS-1226
> URL: https://issues.apache.org/jira/browse/HDFS-1226
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: the last block is unavailable to subsequent readers if appender 
> crashes in the
> middle of appending workload.
>  
> - Setup:
> # available datanodes = 3
> # disks / datanode = 1
> # failures = 1
> failure type = crash
> When/where failure happens = (see below)
>  
> - Details:
> Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
> successful 
> recoverBlock at primary datanode, client calls createOutputStream, which make 
> all datanodes
> move the block file and the meta file from current directory to tmp 
> directory. Now suppose
> the client crashes. Since all replicas of block X are in tmp folders of 
> corresponding datanode,
> subsequent readers cannot read block X.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender

2010-06-16 Thread Thanh Do (JIRA)
Last block is temporary unavailable for readers because of crashed appender
---

 Key: HDFS-1226
 URL: https://issues.apache.org/jira/browse/HDFS-1226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.

- Summary: the last block is unavailable to subsequent readers if appender 
crashes in the
middle of appending workload.
 
- Setup:
# available datanodes = 3
# disks / datanode = 1
# failures = 1
failure type = crash
When/where failure happens = (see below)
 
- Details:
 
Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After 
successful 
recoverBlock at primary datanode, client calls createOutputStream, which make 
all datanodes
move the block file and the meta file from current directory to tmp directory. 
Now suppose
the client crashes. Since all replicas of block X are in tmp folders of 
corresponding datanode,
subsequent readers cannot read block X.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1225) Block lost when primary crashes in recoverBlock

2010-06-16 Thread Thanh Do (JIRA)
Block lost when primary crashes in recoverBlock
---

 Key: HDFS-1225
 URL: https://issues.apache.org/jira/browse/HDFS-1225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: Block is lost if the primary datanode crashes in the middle of
tryUpdateBlock.
 
- Setup:
# available datanode = 2
# replica = 2
# disks / datanode = 1
# failures = 1
# failure type = crash
When/where failure happens = (see below)
 
- Details:
Suppose we have 2 datanodes, dn1 and dn2, and dn1 is the primary.
The client appends to blk_X_1001, and a crash happens during dn1.recoverBlock,
at the point right after blk_X_1001.meta is renamed to blk_X_1001.meta_tmp1002.
Interestingly, in this case block X is eventually lost. Why?
After dn1.recoverBlock crashes at the rename, what is left in dn1's current directory is:
1) blk_X
2) blk_X_1001.meta_tmp1002
==> this is an invalid block, because it has no meta file associated with it.
dn2 (after dn1's crash) now contains:
1) blk_X
2) blk_X_1002.meta
(note that the rename at dn2 completed, because dn1 called dn2.updateBlock() before
calling its own updateBlock()).
But namenode.commitBlockSynchronization is never reported to the namenode, because
dn1 has crashed. Therefore, from the namenode's point of view, block X still has
GS 1001. Hence, the block is lost.
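
The crash window can be sketched as follows (file names taken from the description
above; this is not the actual FSDataset.tryUpdateBlock code):

  import java.io.File;
  import java.io.IOException;

  public class MetaRenameCrashSketch {
    static void tryUpdateBlock(File currentDir) throws IOException {
      File oldMeta = new File(currentDir, "blk_X_1001.meta");
      File tmpMeta = new File(currentDir, "blk_X_1001.meta_tmp1002");
      File newMeta = new File(currentDir, "blk_X_1002.meta");

      if (!oldMeta.renameTo(tmpMeta)) {
        throw new IOException("rename " + oldMeta + " -> " + tmpMeta + " failed");
      }
      // <-- dn1 crashes here: current/ holds only blk_X and blk_X_1001.meta_tmp1002,
      //     an invalid replica, and commitBlockSynchronization is never sent, so the
      //     namenode still expects generation stamp 1001.
      if (!tmpMeta.renameTo(newMeta)) {
        throw new IOException("rename " + tmpMeta + " -> " + newMeta + " failed");
      }
    }
  }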

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1224) Stale connection makes node miss append

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1224:
---

Description: 
- Summary: if a datanode crashes and restarts, it may miss an append.
 
- Setup:
+ # available datanodes = 3
+ # replica = 3 
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = after the first append succeed
 
- Details:
Since each datanode maintains a pool of IPC connections, whenever it wants to make
an IPC call, it first looks into the pool. If the connection is not there, it is
created and put into the pool; otherwise the existing connection is used.
Suppose the append pipeline contains dn1, dn2, and dn3, with dn1 as the primary.
After the client appends to block X successfully, dn2 crashes and restarts.
Now the client writes a new block Y to dn1, dn2, and dn3. The write is successful.
The client then starts appending to block Y. It first calls dn1.recoverBlock().
dn1 first creates a proxy corresponding to each of the datanodes in the pipeline
(in order to make RPC calls such as getMetadataInfo() or updateBlock()). However,
because dn2 has just crashed and restarted, its connection in dn1's pool has become
stale. dn1 uses this connection to make a call to dn2, and hence gets an exception.
Therefore, the append is made only to dn1 and dn3, even though dn2 is alive and the
write of block Y to dn2 was successful.
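
Schematically, the pool behaves like the following sketch (invented names, not the
actual org.apache.hadoop.ipc client code): a cached connection is returned without
any liveness check, so the first RPC to the restarted dn2 fails.

  import java.util.HashMap;
  import java.util.Map;

  class ConnectionPoolSketch {
    static class Connection {
      final String address;
      boolean brokenByPeerRestart;     // becomes true once the remote side restarts
      Connection(String address) { this.address = address; }
      void call(String method) {
        if (brokenByPeerRestart) {
          throw new RuntimeException("stale connection to " + address);
        }
        System.out.println(method + " on " + address + " ok");
      }
    }

    private final Map<String, Connection> pool = new HashMap<String, Connection>();

    Connection get(String address) {
      Connection c = pool.get(address);
      if (c == null) {                 // only created if absent; never re-validated
        c = new Connection(address);
        pool.put(address, c);
      }
      return c;                        // may be stale if the peer crashed and restarted
    }
  }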

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: if a datanode crashes and restarts, it may miss an append.
 
- Setup:
+ available datanodes = 3
+ replica = 3 
+ disks / datanode = 1
+ failures = 1
+ failure type = crash
+ When/where failure happens = after the first append succeed
 
- Details:
Since each datanode maintains a pool of IPC connections, whenever it wants
to make an IPC call, it first looks into the pool. If the connection is not 
there, 
it is created and put in to the pool. Otherwise the existing connection is used.
Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the primary.
After the client appends to block X successfully, dn2 crashes and restarts.
Now client writes a new block Y to dn1, dn2 and dn3. The write is successful.
Client starts appending to block Y. It first calls dn1.recoverBlock().
Dn1 will first create a proxy corresponding with each of the datanode in the 
pipeline
(in order to make RPC call like getMetadataInfo( )  or updateBlock( )). 
However, because
dn2 has just crashed and restarts, its connection in dn1's pool become stale. 
Dn1 uses
this connection to make a call to dn2, hence an exception. Therefore, append 
will be
made only to dn1 and dn3, although dn2 is alive and the write of block Y to dn2 
has
been successful.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


> Stale connection makes node miss append
> ---
>
> Key: HDFS-1224
> URL: https://issues.apache.org/jira/browse/HDFS-1224
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: if a datanode crashes and restarts, it may miss an append.
>  
> - Setup:
> + # available datanodes = 3
> + # replica = 3 
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = after the first append succeed
>  
> - Details:
> Since each datanode maintains a pool of IPC connections, whenever it wants
> to make an IPC call, it first looks into the pool. If the connection is not 
> there, 
> it is created and put in to the pool. Otherwise the existing connection is 
> used.
> Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the 
> primary.
> After the client appends to block X successfully, dn2 crashes and restarts.
> Now client writes a new block Y to dn1, dn2 and dn3. The write is successful.
> Client starts appending to block Y. It first calls dn1.recoverBlock().
> Dn1 will first create a proxy corresponding with each of the datanode in the 
> pipeline
> (in order to make RPC call like getMetadataInfo( )  or updateBlock( )). 
> However, because
> dn2 has just crashed and restarts, its connection in dn1's pool become stale. 
> Dn1 uses
> this connection to make a call to dn2, hence an exception. Therefore, append 
> will be
> made only to dn1 and dn3, although dn2 is alive and the write of block Y to 
> dn2 has
> been successful.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechR

[jira] Updated: (HDFS-1224) Stale connection makes node miss append

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1224:
---

Affects Version/s: 0.20.1
  Component/s: data-node

> Stale connection makes node miss append
> ---
>
> Key: HDFS-1224
> URL: https://issues.apache.org/jira/browse/HDFS-1224
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: if a datanode crashes and restarts, it may miss an append.
>  
> - Setup:
> + # available datanodes = 3
> + # replica = 3 
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = after the first append succeed
>  
> - Details:
> Since each datanode maintains a pool of IPC connections, whenever it wants
> to make an IPC call, it first looks into the pool. If the connection is not 
> there, 
> it is created and put in to the pool. Otherwise the existing connection is 
> used.
> Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the 
> primary.
> After the client appends to block X successfully, dn2 crashes and restarts.
> Now client writes a new block Y to dn1, dn2 and dn3. The write is successful.
> Client starts appending to block Y. It first calls dn1.recoverBlock().
> Dn1 will first create a proxy corresponding with each of the datanode in the 
> pipeline
> (in order to make RPC call like getMetadataInfo( )  or updateBlock( )). 
> However, because
> dn2 has just crashed and restarts, its connection in dn1's pool become stale. 
> Dn1 uses
> this connection to make a call to dn2, hence an exception. Therefore, append 
> will be
> made only to dn1 and dn3, although dn2 is alive and the write of block Y to 
> dn2 has
> been successful.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1224) Stale connection makes node miss append

2010-06-16 Thread Thanh Do (JIRA)
Stale connection makes node miss append
---

 Key: HDFS-1224
 URL: https://issues.apache.org/jira/browse/HDFS-1224
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Thanh Do


- Summary: if a datanode crashes and restarts, it may miss an append.
 
- Setup:
+ available datanodes = 3
+ replica = 3 
+ disks / datanode = 1
+ failures = 1
+ failure type = crash
+ When/where failure happens = after the first append succeed
 
- Details:
Since each datanode maintains a pool of IPC connections, whenever it wants
to make an IPC call, it first looks into the pool. If the connection is not 
there, 
it is created and put in to the pool. Otherwise the existing connection is used.
Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the primary.
After the client appends to block X successfully, dn2 crashes and restarts.
Now client writes a new block Y to dn1, dn2 and dn3. The write is successful.
Client starts appending to block Y. It first calls dn1.recoverBlock().
Dn1 will first create a proxy corresponding with each of the datanode in the 
pipeline
(in order to make RPC call like getMetadataInfo( )  or updateBlock( )). 
However, because
dn2 has just crashed and restarts, its connection in dn1's pool become stale. 
Dn1 uses
this connection to make a call to dn2, hence an exception. Therefore, append 
will be
made only to dn1 and dn3, although dn2 is alive and the write of block Y to dn2 
has
been successful.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1223) DataNode fails stop due to a bad disk (or storage directory)

2010-06-16 Thread Thanh Do (JIRA)
DataNode fails stop due to a bad disk (or storage directory)


 Key: HDFS-1223
 URL: https://issues.apache.org/jira/browse/HDFS-1223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do


A datanode can store block files in multiple volumes.
If a datanode sees a bad volume during startup (i.e., it faces an exception when
accessing that volume), it simply fail-stops, making all block files stored in the
other, healthy volumes inaccessible. Consequently, these lost replicas will later be
re-generated on other datanodes.
If a datanode were able to mark the bad disk and continue working with the healthy
ones, this would increase availability and avoid unnecessary regeneration. As an
extreme example, consider one datanode that has 2 volumes, V1 and V2, each containing
one 64 MB block file. During startup, the datanode gets an exception when accessing
V1; it then fail-stops, so both block files must be re-generated later.
If the datanode instead marked V1 as bad and continued working with V2, the number of
replicas that need to be regenerated would be cut in half.
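
A hypothetical sketch of the behavior argued for here (not existing DataNode code):
scan the configured volumes at startup, mask the bad ones, and fail stop only if no
volume is usable.

  import java.io.File;
  import java.util.ArrayList;
  import java.util.List;

  public class VolumeScanSketch {
    static List<File> scanVolumes(List<File> configuredVolumes) {
      List<File> healthy = new ArrayList<File>();
      for (File vol : configuredVolumes) {
        if (vol.isDirectory() && vol.canRead() && vol.canWrite()) {
          healthy.add(vol);                                  // keep serving from this volume
        } else {
          System.err.println("Marking bad volume: " + vol);  // mask it, do not abort
        }
      }
      if (healthy.isEmpty()) {
        throw new RuntimeException("All volumes are bad; datanode cannot start");
      }
      return healthy;
    }
  }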

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-16 Thread Thanh Do (JIRA)
NameNode fail stop in spite of multiple metadata directories


 Key: HDFS-1222
 URL: https://issues.apache.org/jira/browse/HDFS-1222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do


Despite the ability to configure multiple name directories (to store the fsimage)
and edits directories, the NameNode fail-stops most of the time it faces an exception
when accessing these directories.

The NameNode fail-stops if an exception happens while loading the fsimage, reading
fstime, loading the edits log, writing fsimage.ckpt, and so on, even though good
replicas still exist. The NameNode could have tried to work with the other replicas
and marked the faulty one.
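
A hypothetical sketch of the suggested fallback (not the actual FSImage code): try
each configured name directory in turn and let an I/O error disqualify only that
directory.

  import java.io.File;
  import java.io.IOException;
  import java.util.List;

  public class NameDirFallbackSketch {
    static File pickUsableImage(List<File> nameDirs) throws IOException {
      IOException last = null;
      for (File dir : nameDirs) {
        try {
          File fsimage = new File(dir, "current/fsimage");
          if (!fsimage.canRead()) {
            throw new IOException("cannot read " + fsimage);
          }
          return fsimage;                                    // found a good replica
        } catch (IOException e) {
          last = e;
          System.err.println("Marking faulty name directory: " + dir);
        }
      }
      throw new IOException("No usable name directory left", last);
    }
  }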

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1221) NameNode unable to start due to stale edits log after a crash

2010-06-16 Thread Thanh Do (JIRA)
NameNode unable to start due to stale edits log after a crash
-

 Key: HDFS-1221
 URL: https://issues.apache.org/jira/browse/HDFS-1221
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: 
If a crash happens during FSEditLog.createEditLogFile(), the edits log file on disk
may be stale. During the next reboot, the NameNode will get an exception when parsing
the edits file because of the stale data, leading to an unsuccessful reboot.
Note: this is just one example. Since the edits log (and the fsimage) carries no
checksum, it is vulnerable to corruption as well.
 
- Details:
The steps to create a new edits log (which we infer from the HDFS code) are:
1) truncate the file to zero size
2) write FSConstants.LAYOUT_VERSION to the buffer
3) insert the end-of-file marker OP_INVALID at the end of the buffer
4) preallocate 1 MB of data and fill it with 0
5) flush the buffer to disk

Note that only in steps 1, 4, and 5 is the data on disk actually changed.
Now, suppose a crash happens after step 4 but before step 5.
On the next reboot, the NameNode will fetch this edits log file (which contains all
zeros). The first thing parsed is the LAYOUT_VERSION, which is 0. This is OK, because
the NameNode has code to handle that case (although we expect LAYOUT_VERSION to be
-18, don't we?).
It then parses the operation code, which happens to be 0. Unfortunately, since 0 is
the value for OP_ADD, the NameNode expects the parameters corresponding to that
operation. It calls readString to read the path, which throws
an exception leading to a failed reboot.
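
The parsing hazard can be reproduced with a small self-contained sketch, using an
all-zeros buffer to stand in for the preallocated-but-unflushed edits file (OP_ADD = 0
as stated above; the rest of the names are illustrative):

  import java.io.ByteArrayInputStream;
  import java.io.DataInputStream;
  import java.io.IOException;

  public class StaleEditsSketch {
    static final byte OP_ADD = 0;   // 0 is the OP_ADD opcode

    public static void main(String[] args) throws IOException {
      byte[] staleEdits = new byte[1024 * 1024];   // step 4 wrote zeros; step 5 never ran
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(staleEdits));

      int layoutVersion = in.readInt();  // 0, not the expected -18
      byte opcode = in.readByte();       // 0 == OP_ADD, so the loader expects OP_ADD's
                                         // operands (path, replication, ...) and fails
                                         // while decoding them from the zero-filled data
      System.out.println("layoutVersion=" + layoutVersion + " opcode=" + opcode
          + " looksLikeAdd=" + (opcode == OP_ADD));
    }
  }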

We found this problem almost at the same time as HDFS developers.
Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage.
Hence, any crash that happens after the truncation but before the renaming will lead
to data loss. A detailed description can be found here:
https://issues.apache.org/jira/browse/HDFS-955
This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1221) NameNode unable to start due to stale edits log after a crash

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1221:
---

Description: 
- Summary: 
If a crash happens during FSEditLog.createEditLogFile(), the
edits log file on disk may be stale. During next reboot, NameNode 
will get an exception when parsing the edits file, because of stale data, 
leading to unsuccessful reboot.
Note: This is just one example. Since we see that edits log (and fsimage)
does not have checksum, they are vulnerable to corruption too.
 
- Details:
The steps to create new edits log (which we infer from HDFS code) are:
1) truncate the file to zero size
2) write FSConstants.LAYOUT_VERSION to buffer
3) insert the end-of-file marker OP_INVALID to the end of the buffer
4) preallocate 1MB of data, and fill the data with 0
5) flush the buffer to disk
 
Note that only in step 1, 4, 5, the data on disk is actually changed.
Now, suppose a crash happens after step 4, but before step 5.
In the next reboot, NameNode will fetch this edits log file (which contains
all 0). The first thing parsed is the LAYOUT_VERSION, which is 0. This is OK,
because NameNode has code to handle that case.
(but we expect LAYOUT_VERSION to be -18, don't we). 
Now it parses the operation code, which happens to be 0. Unfortunately, since 0
is the value for OP_ADD, the NameNode expects some parameters corresponding 
to that operation. Now NameNode calls readString to read the path, which throws
an exception leading to a failed reboot.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: 
If a crash happens during FSEditLog.createEditLogFile(), the
edits log file on disk may be stale. During next reboot, NameNode 
will get an exception when parsing the edits file, because of stale data, 
leading to unsuccessful reboot.
Note: This is just one example. Since we see that edits log (and fsimage)
does not have checksum, they are vulnerable to corruption too.
 
- Details:
The steps to create new edits log (which we infer from HDFS code) are:
1) truncate the file to zero size
2) write FSConstants.LAYOUT_VERSION to buffer
3) insert the end-of-file marker OP_INVALID to the end of the buffer
4) preallocate 1MB of data, and fill the data with 0
5) flush the buffer to disk
 
Note that only in step 1, 4, 5, the data on disk is actually changed.
Now, suppose a crash happens after step 4, but before step 5.
In the next reboot, NameNode will fetch this edits log file (which contains
all 0). The first thing parsed is the LAYOUT_VERSION, which is 0. This is OK,
because NameNode has code to handle that case.
(but we expect LAYOUT_VERSION to be -18, don't we). 
Now it parses the operation code, which happens to be 0. Unfortunately, since 0
is the value for OP_ADD, the NameNode expects some parameters corresponding 
to that operation. Now NameNode calls readString to read the path, which throws
an exception leading to a failed reboot.

We found this problem almost at the same time as HDFS developers.
Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage.
Hence, any crash happens after the truncation but before the renaming will lead
to a data loss. Detailed description can be found here:
https://issues.apache.org/jira/browse/HDFS-955
This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

Component/s: name-node

> NameNode unable to start due to stale edits log after a crash
> -
>
> Key: HDFS-1221
> URL: https://issues.apache.org/jira/browse/HDFS-1221
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: 
> If a crash happens during FSEditLog.createEditLogFile(), the
> edits log file on disk may be stale. During next reboot, NameNode 
> will get an exception when parsing the edits file, because of stale data, 
> leading to unsuccessful reboot.
> Note: This is just one example. Since we see that edits log (and fsimage)
> does not have checksum, they are vulnerable to corruption too.
>  
> - Details:
> The steps to create new edits log (which we infer from HDFS code) are:
> 1) truncate the file to zero size
> 2) write FSConstants.LAYOUT_VERSION to buffer
> 3) insert the end-of-file marker OP_INVALID to the end of the buffer
> 4) preallocate 1MB of data, and fill the data with 0
> 5) flush the buffer to disk
>  
> Note that only in step 1, 4, 5, the data on disk is actually changed.
> Now, suppose a crash happ

[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating the fstime file on disk is not atomic, so it is possible that,
if a crash happens in the middle, the next time the NameNode reboots it will read a
stale fstime file and hence be unable to start successfully.
 
- Details:
Below is the code for updating the fstime file on disk:
  void writeCheckpointTime(StorageDirectory sd) throws IOException {
if (checkpointTime < 0L)
  return; // do not write negative time
File timeFile = getImageFile(sd, NameNodeFile.TIME);
if (timeFile.exists()) { timeFile.delete(); }
DataOutputStream out = new DataOutputStream(
new FileOutputStream(timeFile));
try {
  out.writeLong(checkpointTime);
} finally {
  out.close();
}
  }


Basically, this involves 3 steps:
1) delete the fstime file (timeFile.delete())
2) truncate/recreate the fstime file (new FileOutputStream(timeFile))
3) write the new time to the fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from the empty fstime file.
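
The reboot-side failure is easy to reproduce in isolation; the following self-contained
sketch (file name and read path are illustrative only) shows that reading the 8-byte
timestamp from an empty file throws java.io.EOFException:

  import java.io.DataInputStream;
  import java.io.EOFException;
  import java.io.File;
  import java.io.FileInputStream;
  import java.io.FileOutputStream;
  import java.io.IOException;

  public class EmptyFstimeSketch {
    public static void main(String[] args) throws IOException {
      File timeFile = new File("fstime");
      new FileOutputStream(timeFile).close();   // step 2 happened, step 3 did not

      DataInputStream in = new DataInputStream(new FileInputStream(timeFile));
      try {
        long checkpointTime = in.readLong();    // 8 bytes expected, 0 available
        System.out.println("checkpointTime = " + checkpointTime);
      } catch (EOFException e) {
        System.out.println("reboot fails: " + e);  // this is what the NameNode hits
      } finally {
        in.close();
      }
    }
  }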


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu

  was:
- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu


> Namenode unable to start due to truncated fstime
> 
>
> Key: HDFS-1220
> URL: https://issues.apache.org/jira/browse/HDFS-1220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: updating fstime file on disk is not atomic, so it is possible that
> if a crash happens in the middle, next time when NameNode reboots, it will
> read stale fstime, hence unable to start successfully.
>  
> - Details:
> Below is the code for updating fstime file on disk
>   void writeCheckpointTime(StorageDirectory sd) throws IOException {
> if (checkpointTime < 0L)
>   return; // do not write negative time
> File timeFile = getImageFile(sd, NameNodeFile.TIME);
> if (timeFile.exists()) { timeFile.delete(); }
> DataOutputStream out = new DataOutputStream(
> new 
> FileOutputStream(timeFile));
> try {
>   out.writeLong(checkpointTime);
> } finally {
>   out.close();
> }
>   }
> Basically, this involve 3 steps:
> 1) delete fstime file (timeFile.delete())
> 2) truncate fstime file (new FileOutputStream(timeFile))
> 3) write new time to fstime file (out.writeLong(checkpointTime))
> If a crash happens after step 2 and before step 3, in the next reboot, 
> NameNode
> got an exception when reading the time (8 byte) from an empty fstime file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu

  was:
- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Below is the code for updating fstime file on disk
  void writeCheckpointTime(StorageDirectory sd) throws IOException {
if (checkpointTime < 0L)
  return; // do not write negative time
File timeFile = getImageFile(sd, NameNodeFile.TIME);
if (timeFile.exists()) { timeFile.delete(); }
DataOutputStream out = new DataOutputStream(
new FileOutputStream(timeFile));
try {
  out.writeLong(checkpointTime);
} finally {
  out.close();
}
  }


Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu


> Namenode unable to start due to truncated fstime
> 
>
> Key: HDFS-1220
> URL: https://issues.apache.org/jira/browse/HDFS-1220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: updating fstime file on disk is not atomic, so it is possible that
> if a crash happens in the middle, next time when NameNode reboots, it will
> read stale fstime, hence unable to start successfully.
>  
> - Details:
> Basically, this involve 3 steps:
> 1) delete fstime file (timeFile.delete())
> 2) truncate fstime file (new FileOutputStream(timeFile))
> 3) write new time to fstime file (out.writeLong(checkpointTime))
> If a crash happens after step 2 and before step 3, in the next reboot, 
> NameNode
> got an exception when reading the time (8 byte) from an empty fstime file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)
Namenode unable to start due to truncated fstime


 Key: HDFS-1220
 URL: https://issues.apache.org/jira/browse/HDFS-1220
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do


- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Below is the code for updating fstime file on disk
  void writeCheckpointTime(StorageDirectory sd) throws IOException {
if (checkpointTime < 0L)
  return; // do not write negative time 

 
File timeFile = getImageFile(sd, NameNodeFile.TIME);
if (timeFile.exists()) { timeFile.delete(); }
DataOutputStream out = new DataOutputStream(
new FileOutputStream(timeFile));
try {
  out.writeLong(checkpointTime);
} finally {
  out.close();
}
  }
 
Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu

  was:
- Summary: updating fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, next time when NameNode reboots, it will
read stale fstime, hence unable to start successfully.
 
- Details:
Below is the code for updating fstime file on disk
  void writeCheckpointTime(StorageDirectory sd) throws IOException {
if (checkpointTime < 0L)
  return; // do not write negative time 

 
File timeFile = getImageFile(sd, NameNodeFile.TIME);
if (timeFile.exists()) { timeFile.delete(); }
DataOutputStream out = new DataOutputStream(
new FileOutputStream(timeFile));
try {
  out.writeLong(checkpointTime);
} finally {
  out.close();
}
  }
 
Basically, this involve 3 steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, in the next reboot, NameNode
got an exception when reading the time (8 byte) from an empty fstime file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu


> Namenode unable to start due to truncated fstime
> 
>
> Key: HDFS-1220
> URL: https://issues.apache.org/jira/browse/HDFS-1220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> - Summary: updating fstime file on disk is not atomic, so it is possible that
> if a crash happens in the middle, next time when NameNode reboots, it will
> read stale fstime, hence unable to start successfully.
>  
> - Details:
> Basically, this involve 3 steps:
> 1) delete fstime file (timeFile.delete())
> 2) truncate fstime file (new FileOutputStream(timeFile))
> 3) write new time to fstime file (out.writeLong(checkpointTime))
> If a crash happens after step 2 and before step 3, in the next reboot, 
> NameNode
> got an exception when reading the time (8 byte) from an empty fstime file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1219) Data Loss due to edits log truncation

2010-06-16 Thread Thanh Do (JIRA)
Data Loss due to edits log truncation
-

 Key: HDFS-1219
 URL: https://issues.apache.org/jira/browse/HDFS-1219
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Thanh Do


We found this problem almost at the same time as HDFS developers.
Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage.
Hence, any crash that happens after the truncation but before the renaming will lead
to data loss. A detailed description can be found here:
https://issues.apache.org/jira/browse/HDFS-955

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879633#action_12879633
 ] 

Scott Carey commented on HDFS-1114:
---

bq. # Capacity should be divided by a reference size 8 or 4 depending on the 
64bit or 32bit java version

What about -XX:+UseCompressedOops ? 

All users should be using this flag on a 64-bit JVM to save a lot of space. It
only works up to -Xmx32G though; beyond that it's large pointers again.

> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, 
> h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch, 
> h1114_20100616b.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1211) 0.20 append: Block receiver should not log "rewind" packets at INFO level

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1211.


Resolution: Fixed

I just committed this. Thanks Todd!

> 0.20 append: Block receiver should not log "rewind" packets at INFO level
> -
>
> Key: HDFS-1211
> URL: https://issues.apache.org/jira/browse/HDFS-1211
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 0.20-append
>
> Attachments: hdfs-1211.txt
>
>
> In the 0.20 append implementation, it logs an INFO level message for every 
> packet that "rewinds" the end of the block file. This is really noisy for 
> applications like HBase which sync every edit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1209) Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts

2010-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879630#action_12879630
 ] 

dhruba borthakur commented on HDFS-1209:


It still does not apply cleanly, can you pl post a new patch

> Add conf dfs.client.block.recovery.retries to configure number of block 
> recovery attempts
> -
>
> Key: HDFS-1209
> URL: https://issues.apache.org/jira/browse/HDFS-1209
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1209.txt
>
>
> This variable is referred to in the TestFileAppend4 tests, but it isn't 
> actually looked at by DFSClient (I'm betting this is in FB's branch).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1210) DFSClient should log exception when block recovery fails

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1210.


Fix Version/s: 0.20-append
   Resolution: Fixed

I just committed this. Thanks Todd.

> DFSClient should log exception when block recovery fails
> 
>
> Key: HDFS-1210
> URL: https://issues.apache.org/jira/browse/HDFS-1210
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append, 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Trivial
> Fix For: 0.20-append
>
> Attachments: hdfs-1210.txt
>
>
> Right now we just retry without necessarily showing the exception. It can be 
> useful to see what the error was that prevented the recovery RPC from 
> succeeding.
> (I believe this only applies in 0.20 style of block recovery)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode

2010-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879626#action_12879626
 ] 

dhruba borthakur commented on HDFS-1118:


Code looks good to me. I will commit this to trunk.

> DFSOutputStream socket leak when cannot connect to DataNode
> ---
>
> Key: HDFS-1118
> URL: https://issues.apache.org/jira/browse/HDFS-1118
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append, 0.20.1, 0.20.2
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.20-append
>
> Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch
>
>
> The offending code is in {{DFSOutputStream.nextBlockOutputStream}}
> This function retries several times to call {{createBlockOutputStream}}. Each 
> time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}.
> That object is never closed, but overwritten the next time 
> {{createBlockOutputStream}} is called.
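
A minimal sketch of the general fix pattern (close the socket from a failed 
attempt before the next retry). The class and method names here are hypothetical 
stand-ins, not the actual DFSOutputStream code.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RetryWithCleanup {
  // Hypothetical stand-in for the DFSOutputStream.s field described above.
  private Socket s;

  Socket connectWithRetries(InetSocketAddress target, int retries) throws IOException {
    IOException last = null;
    for (int i = 0; i < retries; i++) {
      try {
        s = new Socket();
        s.connect(target, 3000);
        return s;                      // success: the caller now owns the socket
      } catch (IOException e) {
        last = e;
        // The leak described above: without this close, the failed Socket
        // is simply overwritten on the next attempt and its fd is leaked.
        if (s != null) {
          try { s.close(); } catch (IOException ignored) { }
          s = null;
        }
      }
    }
    throw last != null ? last : new IOException("no attempts made");
  }
}
{code}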

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1204.


Resolution: Fixed

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: sam rash
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt, hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1207) 0.20-append: stallReplicationWork should be volatile

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1207.


Fix Version/s: 0.20-append
   Resolution: Fixed

I just committed this. Thanks Todd!

> 0.20-append: stallReplicationWork should be volatile
> 
>
> Key: HDFS-1207
> URL: https://issues.apache.org/jira/browse/HDFS-1207
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: hdfs-1207.txt
>
>
> the stallReplicationWork member in FSNamesystem is accessed by multiple 
> threads without synchronization, but isn't marked volatile. I believe this is 
> responsible for about 1% failure rate on 
> TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at 
> logs I see replication happening even though we've supposedly disabled it)
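
For context, a minimal sketch of the visibility problem being described, with 
hypothetical names rather than the FSNamesystem code: without volatile, the 
monitor thread may never observe the flag flip.

{code}
public class ReplicationPauseFlag {
  // Written by the test/admin thread, read by the replication monitor thread.
  // Without 'volatile' the reader may keep using a stale cached value.
  private volatile boolean stallReplicationWork = false;

  void setStall(boolean stall) {
    stallReplicationWork = stall;
  }

  void replicationMonitorLoop() throws InterruptedException {
    while (true) {
      if (!stallReplicationWork) {
        computeReplicationWork();
      }
      Thread.sleep(3000);
    }
  }

  private void computeReplicationWork() {
    // pick under-replicated blocks, schedule transfers, etc.
  }
}
{code}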

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1209) Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879620#action_12879620
 ] 

Todd Lipcon commented on HDFS-1209:
---

This should apply after HDFS-1210 - can you commit that one first?

> Add conf dfs.client.block.recovery.retries to configure number of block 
> recovery attempts
> -
>
> Key: HDFS-1209
> URL: https://issues.apache.org/jira/browse/HDFS-1209
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1209.txt
>
>
> This variable is referred to in the TestFileAppend4 tests, but it isn't 
> actually looked at by DFSClient (I'm betting this is in FB's branch).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879618#action_12879618
 ] 

Todd Lipcon commented on HDFS-1204:
---

I think it does not - it looks like it was a regression caused by HDFS-200 in 
branch 20 append.

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: sam rash
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt, hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1209) Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts

2010-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879617#action_12879617
 ] 

dhruba borthakur commented on HDFS-1209:


This one does not apply cleanly to 0.20-append. Can you please post a new patch?

> Add conf dfs.client.block.recovery.retries to configure number of block 
> recovery attempts
> -
>
> Key: HDFS-1209
> URL: https://issues.apache.org/jira/browse/HDFS-1209
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1209.txt
>
>
> This variable is referred to in the TestFileAppend4 tests, but it isn't 
> actually looked at by DFSClient (I'm betting this is in FB's branch).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879616#action_12879616
 ] 

dhruba borthakur commented on HDFS-1204:


Sam/Todd: can you please comment on whether this bug exists in Hadoop trunk?

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: sam rash
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt, hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1206) TestFiHFlush depends on BlocksMap implementation

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879615#action_12879615
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1206:
--

Saw it failing again.
{noformat}
Testcase: hFlushFi01_a took 4.553 sec
FAILED

junit.framework.AssertionFailedError: 
at 
org.apache.hadoop.hdfs.TestFiHFlush.runDiskErrorTest(TestFiHFlush.java:56)
at 
org.apache.hadoop.hdfs.TestFiHFlush.hFlushFi01_a(TestFiHFlush.java:72)
{noformat}

> TestFiHFlush depends on BlocksMap implementation
> 
>
> Key: HDFS-1206
> URL: https://issues.apache.org/jira/browse/HDFS-1206
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo (Nicholas), SZE
>
> When I was testing HDFS-1114, the patch passed all tests except TestFiHFlush. 
> Then I tried to print out some debug messages; however, TestFiHFlush 
> succeeded after I added the messages.
> TestFiHFlush probably depends on the speed of BlocksMap.  If BlocksMap is 
> slow enough, then it will pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-

Status: Patch Available  (was: Open)

Hudson does not seem to be working. It did not pick up my previous submission for 
a long time. Re-submitting.

> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, 
> h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch, 
> h1114_20100616b.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-

Status: Open  (was: Patch Available)

Thanks for the detailed review, Suresh.

   1.  BlocksMap.java

done.

   2. LightWeightGSet.java

Done all except the following.

  * remove() - for better readability ...

Is an implicit else better than an explicit else?

   3. TestGSet.java
  * In exception tests, ...

Catching specific exceptions but I did not change the messages.

  * println should use Log.info instead of System.out.println?

No, print(..) and println(..) work together.

  * add some comments to ...
  * add some comments to ...
  * add comments to ...

Added some more comments.

> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, 
> h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch, 
> h1114_20100616b.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-

Attachment: h1114_20100616b.patch

h1114_20100616b:
- Rewrote the code for capacity computation.
- Following Java convention, throw NPE instead of IllegalArgumentException when the 
parameter is null.
- Split toString() into two methods.
- Caught specific exceptions and added more comments in the test.

> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, 
> h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch, 
> h1114_20100616b.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1000) libhdfs needs to be updated to use the new UGI

2010-06-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879584#action_12879584
 ] 

Suresh Srinivas commented on HDFS-1000:
---

+1 patch looks good.

> libhdfs needs to be updated to use the new UGI
> --
>
> Key: HDFS-1000
> URL: https://issues.apache.org/jira/browse/HDFS-1000
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
> Attachments: fs-javadoc.patch, hdfs-1000-bp20.3.patch, 
> hdfs-1000-bp20.4.patch, hdfs-1000-bp20.patch, hdfs-1000-trunk.1.patch
>
>
> libhdfs needs to be updated w.r.t the APIs in the new UGI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879582#action_12879582
 ] 

Suresh Srinivas commented on HDFS-1114:
---

# BlocksMap.java 
#* typo exponient. Should be exponent?
#* Capacity should be divided by a reference size of 8 or 4 depending on the 
64-bit or 32-bit java version
#* The current capacity calculation seems quite complex. Add more explanation on 
why it is implemented that way.
# LightWeightGSet.java
#* "which uses a hash table for storing the elements" - should this say "uses an 
array"?
#* Add a comment that the size of entries is a power of two
#* Throw HadoopIllegalArgumentException instead of IllegalArgumentException 
(for the 0.20 version of the patch it could remain IllegalArgumentException)
#* remove() - for better readability no need for else if and else since the 
previous block returns
#* toString() - prints all the entries. This is a bad idea if someone 
passes this object to Log unknowingly. If all the details of the HashMap are 
needed, we should have some other method such as dump() or printDetails() to do 
the same.
# TestGSet.java
#* In exception tests, instead of printing a log when the expected exception 
happens, print a log in Assert.fail(), like Assert.fail("Expected exception 
was not thrown"). Checks for exceptions should be more specific, instead of 
catching Exception. It is also a good idea to document these exceptions in the 
javadoc for the methods in GSet.
#* println should use Log.info instead of System.out.println?
#* add some comments to classes on what they do/how they are used
#* add some comments to GSetTestCase members, denominator etc., and the constructor
#* add comments to testGSet() on what each of the cases is accomplishing 
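
A minimal, illustrative sketch of the power-of-two-capacity idea touched on in 
the comments above. The names are hypothetical; the real LightWeightGSet differs, 
for example by avoiding the separate node objects used here.

{code}
public class TinyGSet<E> {
  // In this sketch each bucket is a small linked list of separate node
  // objects, which the real implementation would avoid.
  private static final class Node<E> {
    final E element;
    Node<E> next;
    Node(E element, Node<E> next) { this.element = element; this.next = next; }
  }

  private final Node<E>[] entries;
  private final int mask;  // capacity - 1; valid because capacity is a power of two

  @SuppressWarnings("unchecked")
  TinyGSet(int recommendedCapacity) {
    // Round the recommended capacity down to a power of two.
    int capacity = Integer.highestOneBit(Math.max(recommendedCapacity, 1));
    entries = (Node<E>[]) new Node[capacity];
    mask = capacity - 1;
  }

  private int index(Object key) {
    // Masking is cheaper than % and never yields a negative index.
    return key.hashCode() & mask;
  }

  /** Insert without checking for duplicates, to keep the sketch short. */
  void put(E element) {
    int i = index(element);
    entries[i] = new Node<E>(element, entries[i]);
  }

  boolean contains(Object key) {
    for (Node<E> n = entries[index(key)]; n != null; n = n.next) {
      if (n.element.equals(key)) {
        return true;
      }
    }
    return false;
  }
}
{code}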



> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, 
> h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879563#action_12879563
 ] 

Todd Lipcon commented on HDFS-1057:
---

[for branch 0.20 append, +1 -- I've been running with this for 6 weeks, it 
works, and looks good!]

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> ---
>
> Key: HDFS-1057
> URL: https://issues.apache.org/jira/browse/HDFS-1057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: sam rash
>Priority: Blocker
> Attachments: conurrent-reader-patch-1.txt, 
> conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
> hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
> calling flush(). Therefore, if there is a concurrent reader, it's possible to 
> race here - the reader will see the new length while those bytes are still in 
> the buffers of BlockReceiver. Thus the client will potentially see checksum 
> errors or EOFs. Additionally, the last checksum chunk of the file is made 
> accessible to readers even though it is not stable.
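
A minimal sketch of the ordering constraint described above, using hypothetical 
names rather than the BlockReceiver code: the length visible to readers should 
only advance after the corresponding bytes have been flushed.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

public class VisibleLengthOrdering {
  private final OutputStream blockFile;
  private final AtomicLong bytesOnDisk = new AtomicLong(); // what concurrent readers may see

  VisibleLengthOrdering(OutputStream blockFile) {
    this.blockFile = blockFile;
  }

  void receivePacket(byte[] data) throws IOException {
    blockFile.write(data);
    blockFile.flush();                   // 1) push the bytes out of this writer's buffers first
    bytesOnDisk.addAndGet(data.length);  // 2) only then advertise the new length
    // Advancing bytesOnDisk before flush() lets a reader see a length whose
    // bytes are still buffered by the writer, hence ChecksumExceptions/EOFs.
  }

  long getVisibleLength() {
    return bytesOnDisk.get();
  }
}
{code}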

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1141) completeFile does not check lease ownership

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1141.


Resolution: Fixed

Pulled into hadoop-0.20-append

> completeFile does not check lease ownership
> ---
>
> Key: HDFS-1141
> URL: https://issues.apache.org/jira/browse/HDFS-1141
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt
>
>
> completeFile should check that the caller still owns the lease of the file 
> that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
> case in HDFS-1139.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-142.
---

Resolution: Fixed

I have committed this. Thanks Sam, Nicholas and Todd.

> In 0.20, move blocks being written into a blocksBeingWritten directory
> --
>
> Key: HDFS-142
> URL: https://issues.apache.org/jira/browse/HDFS-142
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
>Priority: Blocker
> Fix For: 0.20-append
>
> Attachments: appendFile-recheck-lease.txt, appendQuestions.txt, 
> deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, 
> deleteTmp_0.18.patch, dont-recover-rwr-when-rbw-available.txt, 
> handleTmp1.patch, hdfs-142-commitBlockSynchronization-unknown-datanode.txt, 
> HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, 
> hdfs-142-minidfs-fix-from-409.txt, 
> HDFS-142-multiple-blocks-datanode-exception.patch, 
> hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, 
> hdfs-142-testleaserecovery-fix.txt, HDFS-142_20-append2.patch, 
> HDFS-142_20.patch, recentInvalidateSets-assertion-fix.txt, 
> recover-rbw-v2.txt, testfileappend4-deaddn.txt, 
> validateBlockMetaData-synchronized.txt
>
>
> Before 0.18, when a Datanode restarts, it deletes files under the data-dir/tmp  
> directory since these files are not valid anymore. But in 0.18 it moves these 
> files to the normal directory, incorrectly making them valid blocks. One of the 
> following would work :
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-18 format (i.e. no generation), delete 
> them.
> Currently the effect of this bug is that these files end up failing block 
> verification and eventually get deleted, but they cause incorrect over-replication 
> at the namenode before that.
> Also it looks like our policy regarding treating files under tmp needs to be 
> defined better. Right now there are probably one or two more bugs with it. 
> Dhruba, please file them if you remember.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1207) 0.20-append: stallReplicationWork should be volatile

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1207:
--

Attachment: hdfs-1207.txt

> 0.20-append: stallReplicationWork should be volatile
> 
>
> Key: HDFS-1207
> URL: https://issues.apache.org/jira/browse/HDFS-1207
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1207.txt
>
>
> the stallReplicationWork member in FSNamesystem is accessed by multiple 
> threads without synchronization, but isn't marked volatile. I believe this is 
> responsible for about 1% failure rate on 
> TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at 
> logs I see replication happening even though we've supposedly disabled it)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1194) Secondary namenode fails to fetch the image from the primary

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1194:
--

Affects Version/s: (was: 0.20-append)

Removing append tag, since it's unrelated.

> Secondary namenode fails to fetch the image from the primary
> 
>
> Key: HDFS-1194
> URL: https://issues.apache.org/jira/browse/HDFS-1194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
> Environment: Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
> CentOS 5
>Reporter: Dmytro Molkov
>Assignee: Dmytro Molkov
>
> We just hit the problem described in HDFS-1024 again.
> After more investigation of the underlying problems with 
> CancelledKeyException there are some findings:
> One of the symptoms: the transfer becomes really slow (it does 700 kb/s) when 
> I am doing the fetch using wget. At the same time disk and network are OK 
> since I can copy at 50 mb/s using scp.
> I was taking jstacks of the namenode while the transfer is in process and we 
> found that every stack trace has one thread of jetty sitting in this place:
> {code}
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:452)
>   at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>   at 
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> {code}
> Here is a jetty code that corresponds to this:
> {code}
> // Look for JVM bug 
> if (selected==0 && wait>0 && (now-before)<wait/2 && _selector.selectedKeys().size()==0)
> {
> if (_jvmBug++>5)  // TODO tune or configure this
> {
> // Probably JVM BUG!
> 
> Iterator iter = _selector.keys().iterator();
> while(iter.hasNext())
> {
> key = (SelectionKey) iter.next();
> if (key.isValid()&&key.interestOps()==0)
> {
> key.cancel();
> }
> }
> try
> {
> Thread.sleep(20);  // tune or configure this
> }
> catch (InterruptedException e)
> {
> Log.ignore(e);
> }
> } 
> }
> {code}
> Based on this it is obvious we are hitting a jetty workaround for a JVM bug 
> that doesn't handle select() properly.
> There is a jetty JIRA for this http://jira.codehaus.org/browse/JETTY-937 (it 
> actually introduces the workaround for the JVM bug that we are hitting)
> They say that the problem was fixed in 6.1.22, there is a person on that JIRA 
> also saying that switching to using SocketConnector instead of 
> SelectChannelConnector helped in their case.
> Since we are hitting the same bug in our world, one option is to adopt the 
> newer Jetty version, which has a better workaround; that might not help if we 
> are still hitting the JVM bug constantly, but the newer workaround should 
> still behave better.
> Another approach is to switch to using SocketConnector, which will eliminate 
> the problem completely, although I am not sure what problems that will bring.
> The Java version we are running is listed in the Environment field.
> Any thoughts?
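
For reference, a minimal sketch of the SocketConnector alternative mentioned 
above, assuming the Jetty 6 (org.mortbay) API; the port number is arbitrary and 
chosen only for the example.

{code}
import org.mortbay.jetty.Server;
import org.mortbay.jetty.bio.SocketConnector;

public class BlockingHttpServer {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    // SocketConnector uses plain blocking I/O, so the NIO select() JVM bug
    // and Jetty's sleep-based workaround are never exercised.
    SocketConnector connector = new SocketConnector();
    connector.setPort(50070);
    server.addConnector(connector);
    server.start();
    server.join();
  }
}
{code}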

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1218:
--

Attachment: hdfs-1281.txt

Here's a patch, but it won't apply on top of the branch currently. It requires 
HDFS-1057, and possibly some other FSDataset patches (possibly HDFS-1056), to be 
applied first so that it applies without conflict.

> 20 append: Blocks recovered on startup should be treated with lower priority 
> during block synchronization
> -
>
> Key: HDFS-1218
> URL: https://issues.apache.org/jira/browse/HDFS-1218
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: 0.20-append
>
> Attachments: hdfs-1281.txt
>
>
> When a datanode experiences power loss, it can come back up with truncated 
> replicas (due to local FS journal replay). Those replicas should not be 
> allowed to truncate the block during block synchronization if there are other 
> replicas from DNs that have _not_ restarted.
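
A minimal sketch of the prioritisation idea, with hypothetical types rather than 
the actual recoverBlock/syncBlock code: replicas recovered after a datanode 
restart are only trusted when nothing else is available.

{code}
import java.util.ArrayList;
import java.util.List;

public class ReplicaPriority {
  static final class ReplicaInfo {
    final String datanode;
    final long length;
    final boolean recoveredOnStartup;   // replica came back after a DN restart
    ReplicaInfo(String datanode, long length, boolean recoveredOnStartup) {
      this.datanode = datanode;
      this.length = length;
      this.recoveredOnStartup = recoveredOnStartup;
    }
  }

  /** Pick the replicas to synchronize against, preferring non-restarted ones. */
  static List<ReplicaInfo> chooseForSync(List<ReplicaInfo> all) {
    List<ReplicaInfo> preferred = new ArrayList<ReplicaInfo>();
    for (ReplicaInfo r : all) {
      if (!r.recoveredOnStartup) {
        preferred.add(r);
      }
    }
    // Only fall back to possibly-truncated startup replicas if there is
    // nothing else; otherwise they must not dictate the recovery length.
    return preferred.isEmpty() ? all : preferred;
  }
}
{code}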

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1141) completeFile does not check lease ownership

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-1141:
---

Fix Version/s: 0.20-append

> completeFile does not check lease ownership
> ---
>
> Key: HDFS-1141
> URL: https://issues.apache.org/jira/browse/HDFS-1141
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt
>
>
> completeFile should check that the caller still owns the lease of the file 
> that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
> case in HDFS-1139.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization

2010-06-16 Thread Todd Lipcon (JIRA)
20 append: Blocks recovered on startup should be treated with lower priority 
during block synchronization
-

 Key: HDFS-1218
 URL: https://issues.apache.org/jira/browse/HDFS-1218
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.20-append


When a datanode experiences power loss, it can come back up with truncated 
replicas (due to local FS journal replay). Those replicas should not be allowed 
to truncate the block during block synchronization if there are other replicas 
from DNs that have _not_ restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879524#action_12879524
 ] 

Tom White commented on HDFS-1216:
-

HADOOP-6800 will upgrade to JUnit 4.8.1, so perhaps you'd like to use that.

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur resolved HDFS-1216.


Resolution: Fixed

I just committed this. Thanks Todd!

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1217) Some methods in the NameNdoe should not be public

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879517#action_12879517
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1217:
--

Here are some suggestions:
{code}
+++ src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java   
(working copy)
@@ -1137,7 +1137,7 @@
* @param nodeReg data node registration
* @throws IOException
*/
-  public void verifyRequest(NodeRegistration nodeReg) throws IOException {
+  void verifyRequest(NodeRegistration nodeReg) throws IOException {
 verifyVersion(nodeReg.getVersion());
 if (!namesystem.getRegistrationID().equals(nodeReg.getRegistrationID()))
   throw new UnregisteredNodeException(nodeReg);
@@ -1149,19 +1149,12 @@
* @param version
* @throws IOException
*/
-  public void verifyVersion(int version) throws IOException {
+  private static void verifyVersion(int version) throws IOException {
 if (version != LAYOUT_VERSION)
   throw new IncorrectVersionException(version, "data node");
   }
 
-  /**
-   * Returns the name of the fsImage file
-   */
-  public File getFsImageName() throws IOException {
-return getFSImage().getFsImageName();
-  }
-
-  public FSImage getFSImage() {
+  FSImage getFSImage() {
 return namesystem.dir.fsImage;
   }
 
@@ -1169,7 +1162,7 @@
* Returns the name of the fsImage file uploaded by periodic
* checkpointing
*/
-  public File[] getFsImageNameCheckpoint() throws IOException {
+  File[] getFsImageNameCheckpoint() throws IOException {
 return getFSImage().getFsImageNameCheckpoint();
   }
 
@@ -1187,7 +1180,7 @@
* 
* @return the http address.
*/
-  public InetSocketAddress getHttpAddress() {
+  InetSocketAddress getHttpAddress() {
 return httpAddress;
   }
{code}

> Some methods in the NameNdoe should not be public
> -
>
> Key: HDFS-1217
> URL: https://issues.apache.org/jira/browse/HDFS-1217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>
> There are quite a few NameNode methods which are not required to be public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1216:
--

Attachment: (was: junit-4.5.txt)

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1216:
--

Attachment: junit-4.5.txt

Ah, uploaded wrong file. Take 2.

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1216:
--

Attachment: junit-4.5.txt

Update to junit 4.5 (it's not the newest, but it's what we use in trunk, so we 
should be consistent)

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1217) Some methods in the NameNdoe should not be public

2010-06-16 Thread Tsz Wo (Nicholas), SZE (JIRA)
Some methods in the NameNdoe should not be public
-

 Key: HDFS-1217
 URL: https://issues.apache.org/jira/browse/HDFS-1217
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE


There are quite a few NameNode methods which are not required to be public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879513#action_12879513
 ] 

dhruba borthakur commented on HDFS-1216:


+1

> Update to JUnit 4 in branch 20 append
> -
>
> Key: HDFS-1216
> URL: https://issues.apache.org/jira/browse/HDFS-1216
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: junit-4.5.txt
>
>
> A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
> Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1216) Update to JUnit 4 in branch 20 append

2010-06-16 Thread Todd Lipcon (JIRA)
Update to JUnit 4 in branch 20 append
-

 Key: HDFS-1216
 URL: https://issues.apache.org/jira/browse/HDFS-1216
 Project: Hadoop HDFS
  Issue Type: Task
  Components: test
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append


A lot of the append tests are JUnit 4 style. We should upgrade in branch - 
Junit 4 is entirely backward compatible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1054) Remove unnecessary sleep after failure in nextBlockOutputStream

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1054:
--

Fix Version/s: 0.20-append

> Remove unnecessary sleep after failure in nextBlockOutputStream
> ---
>
> Key: HDFS-1054
> URL: https://issues.apache.org/jira/browse/HDFS-1054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append, 0.20.3, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append, 0.21.0
>
> Attachments: hdfs-1054-0.20-append.txt, hdfs-1054.txt, hdfs-1054.txt
>
>
> If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds 
> before retrying. I don't see a great reason to wait at all, much less 6 
> seconds (especially now that HDFS-630 ensures that a retry won't go back to 
> the bad node). We should at least make it configurable, and perhaps something 
> like backoff makes some sense.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1215) TestNodeCount infinite loops on branch-20-append

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-1215.
---

  Assignee: Todd Lipcon
Resolution: Fixed

Dhruba committed this to the 20-append branch.

> TestNodeCount infinite loops on branch-20-append
> 
>
> Key: HDFS-1215
> URL: https://issues.apache.org/jira/browse/HDFS-1215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: 
> 0025-Fix-TestNodeCount-to-not-infinite-loop-after-HDFS-40.patch
>
>
> HDFS-409 made some minicluster changes, which got incorporated into one of 
> the earlier 20-append patches. This breaks TestNodeCount so it infinite loops 
> on the branch. This patch fixes it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-988) saveNamespace can corrupt edits log

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-988:
-

Fix Version/s: 0.20-append

Marking this as fixed for the append branch (it's committed there, but not 
resolved for trunk yet)

> saveNamespace can corrupt edits log
> ---
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.20-append
>
> Attachments: hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1215) TestNodeCount infinite loops on branch-20-append

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1215:
--

Attachment: 0025-Fix-TestNodeCount-to-not-infinite-loop-after-HDFS-40.patch

Here's a -p1 patch that fixes this issue.

> TestNodeCount infinite loops on branch-20-append
> 
>
> Key: HDFS-1215
> URL: https://issues.apache.org/jira/browse/HDFS-1215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: 
> 0025-Fix-TestNodeCount-to-not-infinite-loop-after-HDFS-40.patch
>
>
> HDFS-409 made some minicluster changes, which got incorporated into one of 
> the earlier 20-append patches. This breaks TestNodeCount so it infinite loops 
> on the branch. This patch fixes it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1215) TestNodeCount infinite loops on branch-20-append

2010-06-16 Thread Todd Lipcon (JIRA)
TestNodeCount infinite loops on branch-20-append


 Key: HDFS-1215
 URL: https://issues.apache.org/jira/browse/HDFS-1215
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20-append
Reporter: Todd Lipcon
 Fix For: 0.20-append


HDFS-409 made some minicluster changes, which got incorporated into one of the 
earlier 20-append patches. This breaks TestNodeCount so it infinite loops on 
the branch. This patch fixes it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-826:
--

Fix Version/s: 0.20-append

> Allow a mechanism for an application to detect that datanode(s)  have died in 
> the write pipeline
> 
>
> Key: HDFS-826
> URL: https://issues.apache.org/jira/browse/HDFS-826
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.20-append
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.20-append, 0.21.0
>
> Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, 
> Replicable4.txt, ReplicableHdfs.txt, ReplicableHdfs2.txt, ReplicableHdfs3.txt
>
>
> HDFS does not replicate the last block of the file that is currently being 
> written to by an application. Every datanode death in the write pipeline 
> decreases the reliability of the last block of the file being written. This 
> situation can be improved if the application can be notified of a datanode 
> death in the write pipeline. Then, the application can decide what is the 
> right course of action to take on this event.
> In our use-case, the application can close the file on the first datanode 
> death, and start writing to a newly created file. This ensures that the 
> reliability guarantee of a block is close to 3 at all times.
> One idea is to make DFSOutputStream.write() throw an exception if the number 
> of datanodes in the write pipeline falls below the minimum.replication.factor 
> that is set on the client (this is backward compatible).
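
A minimal sketch of the proposal in the last paragraph, with hypothetical names 
rather than the actual DFSOutputStream code: surface pipeline shrinkage to the 
application as an exception from write().

{code}
import java.io.IOException;

public class PipelineAwareWriter {
  private final int minReplication;            // client-side minimum, e.g. from config
  private volatile int liveDatanodesInPipeline;

  PipelineAwareWriter(int minReplication, int initialPipelineSize) {
    this.minReplication = minReplication;
    this.liveDatanodesInPipeline = initialPipelineSize;
  }

  // Called by the streamer when a datanode is dropped from the pipeline.
  void onDatanodeDeath() {
    liveDatanodesInPipeline--;
  }

  void write(byte[] b) throws IOException {
    // The idea above: surface pipeline shrinkage to the application instead
    // of silently continuing with fewer replicas for the last block.
    if (liveDatanodesInPipeline < minReplication) {
      throw new IOException("write pipeline has " + liveDatanodesInPipeline
          + " datanodes, below the required minimum of " + minReplication);
    }
    // ... hand the bytes to the real streamer here ...
  }
}
{code}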

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-793) DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-793:
-

Affects Version/s: (was: 0.20-append)

Removing 0.20-append, since it was already applied to 20 branch in the form of 
HDFS-872.

> DataNode should first receive the whole packet ack message before it 
> constructs and sends its own ack message for the packet
> 
>
> Key: HDFS-793
> URL: https://issues.apache.org/jira/browse/HDFS-793
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20.2, 0.21.0
>
> Attachments: separateSendRcvAck-0.20-yahoo.patch, 
> separateSendRcvAck-0.20.patch, separateSendRcvAck.patch, 
> separateSendRcvAck1.patch, separateSendRcvAck2.patch
>
>
> Currently BlockReceiver#PacketResponder interleaves receiving ack message and 
> sending ack message for the same packet. It reads a portion of the message, 
> sends a portion of its ack, and continues like this until it hits the end of 
> the message. The problem is that if it gets an error receiving the ack, it is 
> not able to send an ack that reflects the source of the error.
> The PacketResponder should receive the whole packet ack message first and 
> then construct and send out its own ack.
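
A minimal sketch of the receive-then-send ordering. The ack wire format used here 
(one seqno plus one status per downstream node) is hypothetical and simplified; 
the real protocol differs.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PacketAckRelay {
  /**
   * Read the complete downstream ack first, then build and send this node's
   * ack in one piece, so a receive error can still be reported accurately.
   */
  static void relayAck(DataInputStream fromDownstream, DataOutputStream toUpstream,
                       int downstreamNodes, short myStatus) throws IOException {
    // 1) Receive the whole ack before writing anything upstream.
    long seqno = fromDownstream.readLong();
    short[] statuses = new short[downstreamNodes];
    for (int i = 0; i < downstreamNodes; i++) {
      statuses[i] = fromDownstream.readShort();
    }
    // 2) Only now construct and send our own ack; if step 1 failed we could
    //    instead send an ack that points at the real source of the error.
    toUpstream.writeLong(seqno);
    toUpstream.writeShort(myStatus);
    for (short s : statuses) {
      toUpstream.writeShort(s);
    }
    toUpstream.flush();
  }
}
{code}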

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-630:
--

Fix Version/s: 0.20-append

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client, name-node
>Affects Versions: 0.20-append
>Reporter: Ruyue Ma
>Assignee: Cosmin Lehene
> Fix For: 0.20-append, 0.21.0
>
> Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
> 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
> 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
> 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
> 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
> hdfs-630-0.20-append.patch, hdfs-630-0.20.txt, HDFS-630.patch
>
>
> created from hdfs-200.
> If during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not-connectable, it re-requests the NN to get a 
> fresh set of replica locations of the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry ( see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes applies to only one block 
> allocation.
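
A minimal sketch of the exclude-list idea. The Namenode and Prober interfaces are 
hypothetical stand-ins for the example, not the real ClientProtocol.

{code}
import java.util.ArrayList;
import java.util.List;

public class ExcludingBlockAllocator {
  /** Hypothetical stand-in for the namenode RPC that allocates a block. */
  interface Namenode {
    List<String> addBlock(String file, List<String> excludedDatanodes) throws Exception;
  }

  /** Hypothetical connectivity probe for the first datanode in the pipeline. */
  interface Prober {
    boolean canConnect(String datanode);
  }

  static List<String> allocateBlock(Namenode nn, Prober prober, String file, int retries)
      throws Exception {
    List<String> excluded = new ArrayList<String>();   // scoped to this one block
    for (int attempt = 0; attempt < retries; attempt++) {
      List<String> pipeline = nn.addBlock(file, excluded);
      String first = pipeline.get(0);
      if (prober.canConnect(first)) {
        return pipeline;
      }
      // Tell the namenode which node was bad, so a small cluster does not
      // keep handing back the same dead datanode on every retry.
      excluded.add(first);
    }
    throw new Exception("could not allocate a reachable pipeline for " + file);
  }
}
{code}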

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-457) better handling of volume failure in Data Node storage

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-457:
-

Affects Version/s: (was: 0.20-append)

Removing 0.20-append tag - this isn't append-specific.

> better handling of volume failure in Data Node storage
> --
>
> Key: HDFS-457
> URL: https://issues.apache.org/jira/browse/HDFS-457
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch, 
> HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch, 
> jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip
>
>
> Current implementation shuts DataNode down completely when one of the 
> configured volumes of the storage fails.
> This is rather wasteful behavior because it  decreases utilization (good 
> storage becomes unavailable) and imposes extra load on the system 
> (replication of the blocks from the good volumes). These problems will become 
> even more prominent when we move to mixed (heterogeneous) clusters with many 
> more volumes per Data Node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-561) Fix write pipeline READ_TIMEOUT

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-561:
--

Fix Version/s: 0.20-append

> Fix write pipeline READ_TIMEOUT
> ---
>
> Key: HDFS-561
> URL: https://issues.apache.org/jira/browse/HDFS-561
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Affects Versions: 0.20-append
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Fix For: 0.20-append, 0.21.0
>
> Attachments: h561-01.patch, h561-02.patch, hdfs-561-0.20-append.txt
>
>
> When writing a file, the pipeline status read timeouts for datanodes are not 
> set up properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-445) pread() fails when cached block locations are no longer valid

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-445:
--

Fix Version/s: 0.20-append

> pread() fails when cached block locations are no longer valid
> -
>
> Key: HDFS-445
> URL: https://issues.apache.org/jira/browse/HDFS-445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Fix For: 0.20-append, 0.21.0
>
> Attachments: 445-06.patch, 445-08.patch, hdfs-445-0.20-append.txt, 
> HDFS-445-0_20.2.patch
>
>
> when cached block locations are no longer valid (e.g., datanodes restart on 
> different ports), pread() will fail, whereas normal read() still succeeds 
> through re-fetching of block locations from namenode (up to a max number of 
> times). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable

2010-06-16 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HDFS-127:
---

Affects Version/s: (was: 0.20-append)

> DFSClient block read failures cause open DFSInputStream to become unusable
> --
>
> Key: HDFS-127
> URL: https://issues.apache.org/jira/browse/HDFS-127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Reporter: Igor Bolotin
>Assignee: Igor Bolotin
> Fix For: 0.21.0
>
> Attachments: 4681.patch, h127_20091016.patch, h127_20091019.patch, 
> h127_20091019b.patch, hdfs-127-branch20-redone-v2.txt, 
> hdfs-127-branch20-redone.txt, hdfs-127-regression-test.txt
>
>
> We are using some Lucene indexes directly from HDFS, and for quite a long time 
> we were using Hadoop version 0.15.3.
> When we tried to upgrade to Hadoop 0.19, index searches started to fail with 
> exceptions like:
> 2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: 
> java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 
> file=/usr/collarity/data/urls-new/part-0/20081110-163426/_0.tis
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
> at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
> at java.io.DataInputStream.read(DataInputStream.java:132)
> at 
> org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
> at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
> at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
> at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
> at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
> at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
> at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
> at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
> at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54) 
> ...
> The investigation showed that the root of this issue is that we exceeded the 
> number of xcievers in the datanodes, and that was fixed by raising the 
> configuration setting to 2k.
> However, one thing that bothered me was that even after the datanodes recovered 
> from overload and most of the client servers had been shut down, we still 
> observed errors in the logs of the running servers.
> Further investigation showed that the fix for HADOOP-1911 introduced another 
> problem: the DFSInputStream instance might become unusable once the number of 
> failures over the lifetime of this instance exceeds the configured threshold.
> The fix for this specific issue seems to be trivial: just reset the failure 
> counter before reading the next block (a patch will be attached shortly).
> This also seems to be related to HADOOP-3185, but I'm not sure I really 
> understand the necessity of keeping track of failed block accesses in the DFS 
> client.
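
The one-line fix described above, as a hedged, self-contained sketch (the class 
and method names are invented for illustration; this is not the actual 
DFSInputStream): the failure counter is scoped to the current block and reset 
whenever the stream seeks to a new block, so transient failures on earlier 
blocks cannot permanently poison an open stream.

    import java.io.IOException;

    abstract class ResilientBlockReaderSketch {
        private final int maxBlockAcquireFailures;   // analogous to the configured threshold
        private int failures;                        // failures for the *current* block only

        ResilientBlockReaderSketch(int maxBlockAcquireFailures) {
            this.maxBlockAcquireFailures = maxBlockAcquireFailures;
        }

        protected abstract void seekToBlock(long blockIndex) throws IOException;

        void blockSeekTo(long blockIndex) throws IOException {
            failures = 0;                            // the "reset before next block" fix
            seekToBlock(blockIndex);
        }

        void onReadFailure(IOException cause) throws IOException {
            if (++failures > maxBlockAcquireFailures) {
                throw new IOException("Could not obtain block after " + failures
                        + " attempts", cause);
            }
            // otherwise the caller chooses another datanode and retries
        }
    }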

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2010-06-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-101:
--

Fix Version/s: 0.20-append

> DFS write pipeline : DFSClient sometimes does not detect second datanode 
> failure 
> -
>
> Key: HDFS-101
> URL: https://issues.apache.org/jira/browse/HDFS-101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append, 0.20.1
>Reporter: Raghu Angadi
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20-append, 0.20.2, 0.21.0
>
> Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch, 
> detectDownDN2.patch, detectDownDN3-0.20-yahoo.patch, 
> detectDownDN3-0.20.patch, detectDownDN3.patch, 
> hdfs-101-branch-0.20-append-cdh3.txt, hdfs-101.tar.gz, 
> HDFS-101_20-append.patch, pipelineHeartbeat.patch, 
> pipelineHeartbeat_yahoo.patch
>
>
> When the first datanode's write to second datanode fails or times out 
> DFSClient ends up marking first datanode as the bad one and removes it from 
> the pipeline. Similar problem exists on DataNode as well and it is fixed in 
> HADOOP-3339. From HADOOP-3339 : 
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of 
> DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
> coarse control. We don't know what state the responder is in and interrupting 
> has different effects depending on responder state. To fix this properly we 
> need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should 
> properly read all the data left in the socket. Also, DataNode's closing of 
> the socket should not result in a TCP reset, otherwise I think DFSClient will 
> not be able to read from the socket.
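
As a rough, heavily hedged illustration of the direction the attached patches 
take (the status enum and reply array below are invented for the example and 
are not the DataTransferProtocol wire format): rather than assuming the first 
datanode is at fault whenever the downstream write fails, look at the per-node 
ack statuses and evict the first node that did not ack success.

    // One status per datanode in pipeline order; illustrative only.
    enum AckStatus { SUCCESS, ERROR, TIMEOUT }

    final class PipelineAckSketch {
        /** @return index of the first bad datanode, or -1 if all acked success. */
        static int firstBadNode(AckStatus[] replies) {
            for (int i = 0; i < replies.length; i++) {
                if (replies[i] != AckStatus.SUCCESS) {
                    return i;   // evict this node, not node 0 by default
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            AckStatus[] acks = { AckStatus.SUCCESS, AckStatus.ERROR, AckStatus.SUCCESS };
            System.out.println("errorIndex = " + firstBadNode(acks));  // prints 1
        }
    }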

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-16 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879479#action_12879479
 ] 

Hairong Kuang commented on HDFS-1057:
-

> they aren't guaranteed to be since there are methods to change the 
> bytesOnDisk separate from the lastCheckSum bytes.
I do not see any place that updates bytesOnDisk except for BlockReceiver. 
That's why I suggested removing setBytesOnDisk from ReplicaInPipeline, etc.

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> ---
>
> Key: HDFS-1057
> URL: https://issues.apache.org/jira/browse/HDFS-1057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: sam rash
>Priority: Blocker
> Attachments: conurrent-reader-patch-1.txt, 
> conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
> hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
> calling flush(). Therefore, if there is a concurrent reader, it's possible to 
> race here - the reader will see the new length while those bytes are still in 
> the buffers of BlockReceiver. Thus the client will potentially see checksum 
> errors or EOFs. Additionally, the last checksum chunk of the file is made 
> accessible to readers even though it is not stable.
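
For illustration, a minimal sketch of the ordering fix implied by this 
description, using hypothetical names rather than the real BlockReceiver: the 
packet is flushed to the block file first, and only afterwards is the length 
that a concurrent reader may observe advanced.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.concurrent.atomic.AtomicLong;

    final class PacketWriterSketch {
        // The length that concurrent readers are allowed to see.
        private final AtomicLong visibleBytesOnDisk = new AtomicLong(0);

        void receivePacket(OutputStream blockFile, byte[] packet) throws IOException {
            blockFile.write(packet);
            blockFile.flush();                           // 1) bytes are really on disk
            visibleBytesOnDisk.addAndGet(packet.length); // 2) only now publish the new length
        }

        long getVisibleBytesOnDisk() {
            return visibleBytesOnDisk.get();
        }
    }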

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1214) hdfs client metadata cache

2010-06-16 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879466#action_12879466
 ] 

Joydeep Sen Sarma commented on HDFS-1214:
-

While a cache can be maintained on the application side, that is harder and it 
seems like the wrong place to implement it. In the case of a query compiler, 
different compilation stages may fetch metadata to figure out the cost of a 
query. Furthermore, different queries compiled in the same JVM may end up 
requesting metadata for the same objects.

The application can identify the calls that can deal with out-of-date metadata, 
so a separate API, an overlaid filesystem driver, or additional flags in the 
current API are all acceptable.

Ideally the cache should be write-through (it's very common for a single JVM to 
be reading and writing the same fs object repeatedly).
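
A minimal, generic sketch of the write-through idea above (not a proposed HDFS 
API; the class and its names are hypothetical): cached entries are served while 
they are younger than a freshness bound, and a write issued from the same JVM 
immediately refreshes the cached entry, so the writer sees its own update on 
the next read.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class RecentStatusCacheSketch<K, V> {
        interface Loader<K, V> { V load(K key) throws Exception; }

        private static final class Entry<V> {
            final V value;
            final long fetchedAtMillis;
            Entry(V value, long fetchedAtMillis) {
                this.value = value;
                this.fetchedAtMillis = fetchedAtMillis;
            }
        }

        private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
        private final long maxAgeMillis;        // how stale "recent" is allowed to be
        private final Loader<K, V> loader;      // e.g. a getFileStatus RPC to the namenode

        RecentStatusCacheSketch(long maxAgeMillis, Loader<K, V> loader) {
            this.maxAgeMillis = maxAgeMillis;
            this.loader = loader;
        }

        /** Serve a cached value if it is recent enough, otherwise re-fetch it. */
        V getRecent(K key) throws Exception {
            Entry<V> e = cache.get(key);
            long now = System.currentTimeMillis();
            if (e != null && now - e.fetchedAtMillis <= maxAgeMillis) {
                return e.value;
            }
            V fresh = loader.load(key);
            cache.put(key, new Entry<V>(fresh, now));
            return fresh;
        }

        /** Write-through: a write from this JVM immediately refreshes the entry. */
        void onWrite(K key, V newStatus) {
            cache.put(key, new Entry<V>(newStatus, System.currentTimeMillis()));
        }
    }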

> hdfs client metadata cache
> --
>
> Key: HDFS-1214
> URL: https://issues.apache.org/jira/browse/HDFS-1214
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Reporter: Joydeep Sen Sarma
>
> In some applications, latency is affected by the cost of making RPC calls to 
> the namenode to fetch metadata. The most obvious case is calls to fetch 
> file/directory status. Applications like Hive like to make optimizations 
> based on file size/number etc., and for such optimizations 'recent' status 
> data (as opposed to the most up-to-date) is acceptable. In such cases, a cache 
> on the DFS client that transparently caches metadata would greatly benefit 
> applications.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1212) Harmonize HDFS JAR library versions with Common

2010-06-16 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HDFS-1212:


Attachment: HDFS-1212.patch

> Harmonize HDFS JAR library versions with Common
> ---
>
> Key: HDFS-1212
> URL: https://issues.apache.org/jira/browse/HDFS-1212
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: HDFS-1212.patch
>
>
> HDFS part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1214) hdfs client metadata cache

2010-06-16 Thread Joydeep Sen Sarma (JIRA)
hdfs client metadata cache
--

 Key: HDFS-1214
 URL: https://issues.apache.org/jira/browse/HDFS-1214
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Reporter: Joydeep Sen Sarma


In some applications, latency is affected by the cost of making RPC calls to 
the namenode to fetch metadata. The most obvious case is calls to fetch 
file/directory status. Applications like Hive like to make optimizations based 
on file size/number etc., and for such optimizations 'recent' status data (as 
opposed to the most up-to-date) is acceptable. In such cases, a cache on the 
DFS client that transparently caches metadata would greatly benefit 
applications.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-16 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879460#action_12879460
 ] 

sam rash commented on HDFS-1057:


1. they aren't guaranteed to be since there are methods to change the 
bytesOnDisk separate from the lastCheckSum bytes.  It's entirely conceivable 
that something could update the bytes on disk w/o updating the lastChecksum 
with the current set of methods

If we are ok with a loosely coupled guarantee, then we can use bytesOnDisk and 
be careful never to call setBytesOnDisk() for any RBW

2. Oh, your previous comments indicated we shouldn't change either 
ReplicaInPipelineInterface or ReplicaInPipeline.  If that's not the case and we 
can do this, then my comment above doesn't hold: we use bytesOnDisk and 
guarantee it's in sync with the checksum in a single synchronized method (I 
like this; see the sketch below).

3. I will update the patch to treat missing last blocks as 0-length and 
reinstate the unit test.

Thanks for all the help on this.
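
To make the point in item 2 concrete, a hedged, stand-alone sketch (class and 
method names are made up; this is not the patch itself): a single synchronized 
setter plus a synchronized snapshot getter, so a reader can never observe a new 
length paired with a stale checksum.

    final class ChunkChecksumSketch {
        private long dataLength;
        private byte[] lastChecksum;

        /** Update length and last-chunk checksum together, under one lock. */
        synchronized void set(long dataLength, byte[] lastChecksum) {
            this.dataLength = dataLength;
            this.lastChecksum = lastChecksum == null ? null : lastChecksum.clone();
        }

        /** Consistent snapshot: both values are read under the same lock. */
        synchronized Snapshot snapshot() {
            return new Snapshot(dataLength,
                    lastChecksum == null ? null : lastChecksum.clone());
        }

        static final class Snapshot {
            final long dataLength;
            final byte[] lastChecksum;
            Snapshot(long dataLength, byte[] lastChecksum) {
                this.dataLength = dataLength;
                this.lastChecksum = lastChecksum;
            }
        }
    }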

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> ---
>
> Key: HDFS-1057
> URL: https://issues.apache.org/jira/browse/HDFS-1057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: sam rash
>Priority: Blocker
> Attachments: conurrent-reader-patch-1.txt, 
> conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
> hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
> calling flush(). Therefore, if there is a concurrent reader, it's possible to 
> race here - the reader will see the new length while those bytes are still in 
> the buffers of BlockReceiver. Thus the client will potentially see checksum 
> errors or EOFs. Additionally, the last checksum chunk of the file is made 
> accessible to readers even though it is not stable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-16 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879454#action_12879454
 ] 

Hairong Kuang commented on HDFS-1057:
-

Sam, the patch is in good shape. Thanks for working on this. A few minor 
comments: 
1. ReplicaBeingWritten.java: dataLength and bytesOnDisk are the same, right? We 
do not need to introduce another field dataLength. I am also hesitant to declare 
dataLength & lastChecksum as volatile. Accesses to them are already 
synchronized, and the normal case is writing without reading. 
2. We probably should remove setBytesOnDisk from ReplicaInPipelineInterface & 
ReplicaInPipeline.

> In 0.20, I made it so that client just treats this as a 0-length file. one of 
> our internal tools saw this rather frequently in 0.20.
Good to know. Then could you please handle this case the same way in trunk as 
well? Thanks again, Sam.

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> ---
>
> Key: HDFS-1057
> URL: https://issues.apache.org/jira/browse/HDFS-1057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: sam rash
>Priority: Blocker
> Attachments: conurrent-reader-patch-1.txt, 
> conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
> hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
> calling flush(). Therefore, if there is a concurrent reader, it's possible to 
> race here - the reader will see the new length while those bytes are still in 
> the buffers of BlockReceiver. Thus the client will potentially see checksum 
> errors or EOFs. Additionally, the last checksum chunk of the file is made 
> accessible to readers even though it is not stable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1213) Implement an Apache Commons VFS Driver for HDFS

2010-06-16 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-1213:
---

Summary: Implement an Apache Commons VFS Driver for HDFS  (was: Implement a 
VFS Driver for HDFS)

> Implement an Apache Commons VFS Driver for HDFS
> ---
>
> Key: HDFS-1213
> URL: https://issues.apache.org/jira/browse/HDFS-1213
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Reporter: Michael D'Amour
> Attachments: pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, 
> pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar
>
>
> We have an open source ETL tool (Kettle) which uses VFS for many input/output 
> steps/jobs.  We would like to be able to read/write HDFS from Kettle using 
> VFS.  
>  
> I haven't been able to find anything out there other than "it would be nice."
>  
> I had some time a few weeks ago to begin writing a VFS driver for HDFS and we 
> (Pentaho) would like to be able to contribute this driver.  I believe it 
> supports all the major file/folder operations and I have written unit tests 
> for all of these operations.  The code is currently checked into an open 
> Pentaho SVN repository under the Apache 2.0 license.  There are some current 
> limitations, such as a lack of authentication (Kerberos), which appears to be 
> coming in 0.22.0. The driver does support username/password, but I just 
> can't use them yet.
> I will be attaching the code for the driver once the case is created.  The 
> project does not modify existing hadoop/hdfs source.
> Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146
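
For reference, this is roughly how a Kettle step (or any other consumer) might 
exercise the contributed driver through the standard Commons VFS API, assuming 
the driver registers an "hdfs" scheme with the default FileSystemManager; the 
scheme, host, port and path in the URL below are assumptions for illustration.

    import org.apache.commons.vfs.FileObject;
    import org.apache.commons.vfs.FileSystemManager;
    import org.apache.commons.vfs.VFS;

    public class HdfsVfsExample {
        public static void main(String[] args) throws Exception {
            FileSystemManager fsManager = VFS.getManager();
            // Resolve a directory through the generic VFS entry point; no
            // HDFS-specific classes appear in application code.
            FileObject dir = fsManager.resolveFile("hdfs://namenode:9000/user/pentaho/input");
            for (FileObject child : dir.getChildren()) {
                System.out.println(child.getName().getBaseName()
                        + " (" + child.getType() + ")");
            }
            dir.close();
        }
    }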

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2010-06-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-101:
-

Attachment: hdfs-101-branch-0.20-append-cdh3.txt

Hey Nicolas,

I just compared our two patches side by side. The one I've been testing with is 
attached; it made a noticeable improvement in recovery by detecting the correct 
down node in cluster failure testing. Here are a few differences I noticed 
(though maybe only because the diffs are against different trees):

- It looks like your patch doesn't maintain wire compatibility when mirrorError 
is true, since it constructs a "replies" list with only 2 elements (not based 
on the number of downstream nodes).
- When receiving packets in BlockReceiver, I am explicitly forwarding 
HEART_BEAT packets, whereas it looks like you're not checking for them. Have 
you verified, by leaving a connection open with no data flowing, that 
heartbeats are handled properly in BlockReceiver?

> DFS write pipeline : DFSClient sometimes does not detect second datanode 
> failure 
> -
>
> Key: HDFS-101
> URL: https://issues.apache.org/jira/browse/HDFS-101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append, 0.20.1
>Reporter: Raghu Angadi
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20.2, 0.21.0
>
> Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch, 
> detectDownDN2.patch, detectDownDN3-0.20-yahoo.patch, 
> detectDownDN3-0.20.patch, detectDownDN3.patch, 
> hdfs-101-branch-0.20-append-cdh3.txt, hdfs-101.tar.gz, 
> HDFS-101_20-append.patch, pipelineHeartbeat.patch, 
> pipelineHeartbeat_yahoo.patch
>
>
> When the first datanode's write to second datanode fails or times out 
> DFSClient ends up marking first datanode as the bad one and removes it from 
> the pipeline. Similar problem exists on DataNode as well and it is fixed in 
> HADOOP-3339. From HADOOP-3339 : 
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of 
> DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
> coarse control. We don't know what state the responder is in and interrupting 
> has different effects depending on responder state. To fix this properly we 
> need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should 
> properly read all the data left in the socket. Also, DataNode's closing of 
> the socket should not result in a TCP reset, otherwise I think DFSClient will 
> not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1213) Implement a VFS Driver for HDFS

2010-06-16 Thread Michael D'Amour (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879448#action_12879448
 ] 

Michael D'Amour commented on HDFS-1213:
---

Allen, sorry for any confusion: I am referring to Apache Commons VFS 
(http://commons.apache.org/vfs/).

> Implement a VFS Driver for HDFS
> ---
>
> Key: HDFS-1213
> URL: https://issues.apache.org/jira/browse/HDFS-1213
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Reporter: Michael D'Amour
> Attachments: pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, 
> pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar
>
>
> We have an open source ETL tool (Kettle) which uses VFS for many input/output 
> steps/jobs.  We would like to be able to read/write HDFS from Kettle using 
> VFS.  
>  
> I haven't been able to find anything out there other than "it would be nice."
>  
> I had some time a few weeks ago to begin writing a VFS driver for HDFS and we 
> (Pentaho) would like to be able to contribute this driver.  I believe it 
> supports all the major file/folder operations and I have written unit tests 
> for all of these operations.  The code is currently checked into an open 
> Pentaho SVN repository under the Apache 2.0 license.  There are some current 
> limitations, such as a lack of authentication (Kerberos), which appears to be 
> coming in 0.22.0. The driver does support username/password, but I just 
> can't use them yet.
> I will be attaching the code for the driver once the case is created.  The 
> project does not modify existing hadoop/hdfs source.
> Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1213) Implement a VFS Driver for HDFS

2010-06-16 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879447#action_12879447
 ] 

Arun C Murthy commented on HDFS-1213:
-

Michael, could you please upload this as a patch rather than a tarball?

> Implement a VFS Driver for HDFS
> ---
>
> Key: HDFS-1213
> URL: https://issues.apache.org/jira/browse/HDFS-1213
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Reporter: Michael D'Amour
> Attachments: pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, 
> pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar
>
>
> We have an open source ETL tool (Kettle) which uses VFS for many input/output 
> steps/jobs.  We would like to be able to read/write HDFS from Kettle using 
> VFS.  
>  
> I haven't been able to find anything out there other than "it would be nice."
>  
> I had some time a few weeks ago to begin writing a VFS driver for HDFS and we 
> (Pentaho) would like to be able to contribute this driver.  I believe it 
> supports all the major file/folder operations and I have written unit tests 
> for all of these operations.  The code is currently checked into an open 
> Pentaho SVN repository under the Apache 2.0 license.  There are some current 
> limitations, such as a lack of authentication (Kerberos), which appears to be 
> coming in 0.22.0. The driver does support username/password, but I just 
> can't use them yet.
> I will be attaching the code for the driver once the case is created.  The 
> project does not modify existing hadoop/hdfs source.
> Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2010-06-16 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879439#action_12879439
 ] 

Nicolas Spiegelberg commented on HDFS-101:
--

Todd, your assumption is correct.  I needed a couple of small things from the 
HDFS-793 patch (namely, getNumOfReplies) to make HDFS-101 compatible with 
HDFS-872.

> DFS write pipeline : DFSClient sometimes does not detect second datanode 
> failure 
> -
>
> Key: HDFS-101
> URL: https://issues.apache.org/jira/browse/HDFS-101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append, 0.20.1
>Reporter: Raghu Angadi
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20.2, 0.21.0
>
> Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch, 
> detectDownDN2.patch, detectDownDN3-0.20-yahoo.patch, 
> detectDownDN3-0.20.patch, detectDownDN3.patch, hdfs-101.tar.gz, 
> HDFS-101_20-append.patch, pipelineHeartbeat.patch, 
> pipelineHeartbeat_yahoo.patch
>
>
> When the first datanode's write to second datanode fails or times out 
> DFSClient ends up marking first datanode as the bad one and removes it from 
> the pipeline. Similar problem exists on DataNode as well and it is fixed in 
> HADOOP-3339. From HADOOP-3339 : 
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of 
> DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
> coarse control. We don't know what state the responder is in and interrupting 
> has different effects depending on responder state. To fix this properly we 
> need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should 
> properly read all the data left in the socket. Also, DataNode's closing of 
> the socket should not result in a TCP reset, otherwise I think DFSClient will 
> not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1213) Implement a VFS Driver for HDFS

2010-06-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879437#action_12879437
 ] 

Allen Wittenauer commented on HDFS-1213:


Do you mean VFS as in the Linux virtual file system kernel API or some other 
VFS?

> Implement a VFS Driver for HDFS
> ---
>
> Key: HDFS-1213
> URL: https://issues.apache.org/jira/browse/HDFS-1213
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Reporter: Michael D'Amour
> Attachments: pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, 
> pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar
>
>
> We have an open source ETL tool (Kettle) which uses VFS for many input/output 
> steps/jobs.  We would like to be able to read/write HDFS from Kettle using 
> VFS.  
>  
> I haven't been able to find anything out there other than "it would be nice."
>  
> I had some time a few weeks ago to begin writing a VFS driver for HDFS and we 
> (Pentaho) would like to be able to contribute this driver.  I believe it 
> supports all the major file/folder operations and I have written unit tests 
> for all of these operations.  The code is currently checked into an open 
> Pentaho SVN repository under the Apache 2.0 license.  There are some current 
> limitations, such as a lack of authentication (Kerberos), which appears to be 
> coming in 0.22.0. The driver does support username/password, but I just 
> can't use them yet.
> I will be attaching the code for the driver once the case is created.  The 
> project does not modify existing hadoop/hdfs source.
> Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


