[jira] Commented: (HDFS-86) Corrupted blocks get deleted but not replicated

2010-08-19 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900342#action_12900342
 ] 

Thanh Do commented on HDFS-86:
--

I have a cluster of two nodes. Say a block has 2 replicas and one of them 
gets corrupted. The corrupted block is reported to the NN, but it is never 
deleted or replicated, even after the NN restarts.
I am not sure whether this is a bug or just a policy.
I am playing with the append-trunk.

 Corrupted blocks get deleted but not replicated
 ---

 Key: HDFS-86
 URL: https://issues.apache.org/jira/browse/HDFS-86
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: blockInvalidate.patch


 When I test the patch to HADOOP-1345 on a two-node dfs cluster, I see that 
 dfs correctly deletes the corrupted replica and successfully retries reading 
 from the other correct replica, but the block does not get replicated. The 
 block remains with only 1 replica until the next block report comes in.
 In my test case, since the dfs cluster has only 2 datanodes, the target of 
 replication is the same as the target of block invalidation. After poking 
 through the logs, I found out that the namenode sent the replication request 
 before the block invalidation request.
 This is because the namenode does not handle block invalidation well. In 
 FSNamesystem.invalidateBlock, it first puts the invalidate request in a queue 
 and then immediately removes the replica from its state, which triggers 
 choosing a replication target for the block. When requests are sent back to 
 the target datanode as a reply to a heartbeat message, the replication 
 requests have higher priority than the invalidate requests.
 This problem could be solved if the namenode removed an invalidated replica 
 from its state only after the invalidate request is sent to the datanode.
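
For illustration, here is a minimal sketch of the ordering fix proposed in 
the last paragraph. The class, queue, and method names below are hypothetical 
stand-ins, not the actual FSNamesystem internals:
{noformat}
// Sketch only: illustrates the proposed ordering fix. Names like
// invalidateQueue and removeStoredReplica are hypothetical.
class InvalidationSketch {
  interface BlockQueue { void add(String datanode, long blockId); }
  private BlockQueue invalidateQueue;

  void invalidateBlock(long blockId, String datanode) {
    // 1. Queue the invalidate; it is delivered to the datanode as a reply
    //    to its next heartbeat.
    invalidateQueue.add(datanode, blockId);
    // 2. Do NOT remove the replica from the namenode's state here. Doing so
    //    marks the block under-replicated immediately and can pick the same
    //    datanode as the replication target before the invalidate arrives.
  }

  void onInvalidateSent(long blockId, String datanode) {
    // 3. Only now remove the replica from the namenode's state, so
    //    replication targeting can no longer race with the invalidation.
    removeStoredReplica(blockId, datanode);
  }

  private void removeStoredReplica(long blockId, String datanode) { /* ... */ }
}
{noformat}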

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-86) Corrupted blocks get deleted but not replicated

2010-08-19 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900357#action_12900357
 ] 

Hairong Kuang commented on HDFS-86:
---

This jira is too old. It should be closed.

Now HDFS has a different policy for corrupt replicas. A corrupt replica does 
not get deleted until a good replica gets replicated.

The problem you see is caused by the 2-node cluster. Because there is no 
extra node on which to place a good replica, the corrupt one never gets 
deleted. If you add one more node to the cluster, the problem will go away.
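
As a toy restatement of that policy (all helper names below are hypothetical, 
and the replica counts are hard-wired to the 2-node scenario):
{noformat}
// Sketch only: the corrupt copy is deleted only once enough good replicas
// exist. On a 2-node cluster with replication 2 there is never a spare node
// for the new good replica, so the corrupt copy is never deleted.
class CorruptReplicaPolicySketch {
  int liveReplicas(long blockId) { return 2; }      // stand-in
  int corruptReplicas(long blockId) { return 1; }   // stand-in
  int targetReplication(long blockId) { return 2; } // stand-in

  boolean mayDeleteCorruptReplica(long blockId) {
    int good = liveReplicas(blockId) - corruptReplicas(blockId);
    // Keep the corrupt copy as a last-resort data source until enough
    // good replicas have been created elsewhere.
    return good >= targetReplication(blockId);
  }
}
{noformat}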




[jira] Resolved: (HDFS-86) Corrupted blocks get deleted but not replicated

2010-08-19 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang resolved HDFS-86.
---

Resolution: Invalid




[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-08-19 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900465#action_12900465
 ] 

Hairong Kuang commented on HDFS-1346:
-

Todd, yours is missing this patch: 
https://issues.apache.org/jira/secure/attachment/12439379/pipelineHeartbeat.patch.
HDFS-101 says that it fixes a bug in the handling of pipeline heartbeats in 
Yahoo's hadoop security branch 0.20, but I did not put the bug description 
there.

Koji, do you still remember exactly what problem pipelineHeartbeat.patch 
fixed?

 DFSClient receives out of order packet ack
 --

 Key: HDFS-1346
 URL: https://issues.apache.org/jira/browse/HDFS-1346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: blockrecv-diff.txt, outOfOrder.patch


 When running 0.20 patched with HDFS-101, we sometimes see an error as 
 follows:
 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block 
 blk_-2871223654872350746_21421120 java.io.IOException: Responseprocessor: 
 Expecting seqno for block blk_-2871223654872350746_21421120 10280 but 
 received 10281
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)
 This indicates that the DFS client expects an ack for packet N, but receives 
 an ack for packet N+1.
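
For context, a simplified sketch of the kind of check that produces this 
message; the real logic lives in DFSClient's ResponseProcessor, and the types 
below are illustrative stand-ins:
{noformat}
// Sketch only: the client compares each ack's sequence number against the
// oldest unacknowledged packet it has sent.
class AckCheckSketch {
  void checkAck(String block, long expectedSeqno, long receivedSeqno)
      throws java.io.IOException {
    if (receivedSeqno != expectedSeqno) {
      // The client expected the ack for packet N but got the ack for N+1.
      throw new java.io.IOException(
          "Responseprocessor: Expecting seqno for block " + block + " "
          + expectedSeqno + " but received " + receivedSeqno);
    }
  }
}
{noformat}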




[jira] Commented: (HDFS-1347) TestDelegationToken uses mortbay.log for logging

2010-08-19 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900467#action_12900467
 ] 

Boris Shkolnik commented on HDFS-1347:
--

Ran the tests manually; ant test passed.

 TestDelegationToken uses mortbay.log for logging
 

 Key: HDFS-1347
 URL: https://issues.apache.org/jira/browse/HDFS-1347
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1347.patch


 needs to be changed to commons.log
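
For illustration, a sketch of the kind of change involved; the exact lines in 
TestDelegationToken may differ:
{noformat}
// Sketch only: replace Jetty's org.mortbay.log logger with commons-logging.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TestDelegationTokenLoggingSketch {
  // Before (the accidental dependency):
  //   import org.mortbay.log.Log;
  //   Log.info("got token");
  // After:
  private static final Log LOG =
      LogFactory.getLog(TestDelegationTokenLoggingSketch.class);

  void example() {
    LOG.info("got token");
  }
}
{noformat}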




[jira] Commented: (HDFS-1320) Add LOG.isDebugEnabled() guard for each LOG.debug(...)

2010-08-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900484#action_12900484
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1320:
--

bq. does the JVM not optimize for this case in the fast-path?

Hi Ryan, from the benchmark results 
[here|https://issues.apache.org/jira/browse/HADOOP-6884?focusedCommentId=12900087&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12900087],
it does not seem the JVM optimizes this. I think the JVM cannot do anything 
in general, since parameter evaluation may have side effects. It is hard for 
the JVM to determine whether it is safe to skip those instructions.
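
For illustration, a minimal sketch of the guard pattern and of why the JVM 
cannot simply elide the unguarded call (the expensive helper below is a 
stand-in for any costly argument construction):
{noformat}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class DebugGuardSketch {
  private static final Log LOG = LogFactory.getLog(DebugGuardSketch.class);
  private int calls = 0;

  private String expensiveState() {
    calls++;                                // side effect: the JVM cannot
    StringBuilder sb = new StringBuilder(); // prove skipping this is safe
    for (int i = 0; i < 10000; i++) sb.append(i);
    return sb.toString();
  }

  void unguarded() {
    // The argument is fully evaluated even when debug logging is off.
    LOG.debug("state=" + expensiveState());
  }

  void guarded() {
    // The guard skips argument construction entirely when debug is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("state=" + expensiveState());
    }
  }
}
{noformat}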

 Add LOG.isDebugEnabled() guard for each LOG.debug(...)
 

 Key: HDFS-1320
 URL: https://issues.apache.org/jira/browse/HDFS-1320
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
 Fix For: 0.22.0

 Attachments: HDFS-1320-0.22-1.patch, HDFS-1320-0.22-2.patch, 
 HDFS-1320-0.22.patch


 Each LOG.debug(...) should be executed only if LOG.isDebugEnabled() is 
 true; in some cases it is expensive to construct the string that is being 
 printed to the log. It is much easier to always use LOG.isDebugEnabled(), 
 because that is easier to check than reasoning in each case about whether 
 the guard is necessary or not.




[jira] Created: (HDFS-1348) DecommissionManager holds fsnamesystem lock during the whole process of checking if decommissioning DataNodes are finished or not

2010-08-19 Thread Hairong Kuang (JIRA)
DecommissionManager holds fsnamesystem lock during the whole process of 
checking if decommissioning DataNodes are finished or not


 Key: HDFS-1348
 URL: https://issues.apache.org/jira/browse/HDFS-1348
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0


NameNode normally is busy all the time; its log is full of activity every 
second. But once in a while, NameNode seems to pause for more than 10 seconds 
without doing anything, leaving a blank in its log even though no garbage 
collection is happening.

One culprit is DecommissionManager. Its monitor holds the fsnamesystem lock 
during the whole process of checking whether decommissioning DataNodes are 
finished or not, during which it checks every block of up to a default of 5 
datanodes.
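
For illustration, a hypothetical sketch of one way to shorten the pause: take 
the lock per datanode rather than around the whole scan. All names below are 
illustrative, not the committed fix:
{noformat}
class DecommissionCheckSketch {
  private final Object fsnamesystem = new Object();

  void check(java.util.List<String> decommissioningNodes) {
    for (String node : decommissioningNodes) {
      synchronized (fsnamesystem) {   // short, per-node critical section
        if (isDecommissionFinished(node)) {
          markDecommissioned(node);
        }
      }
      // Lock released between nodes, letting other NameNode work proceed.
    }
  }

  private boolean isDecommissionFinished(String node) { return false; }
  private void markDecommissioned(String node) { }
}
{noformat}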




[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

2010-08-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900499#action_12900499
 ] 

Todd Lipcon commented on HDFS-1073:
---

Hey Sanjay,

Thanks for reviving this. The notes you wrote above seem accurate.

A couple of questions:

bq. while writing edit logs to multiple files, a failure of the system can 
result in different amounts of data written to each file - the txid allows 
one to pick the one with the most transactions.

Isn't this also doable by just seeing which has more non-zero bytes? I.e., 
seek to the end of the file, scan backwards through the 0 bytes, and stop. 
Whichever valid log is longer wins. Even in the case with the transaction id, 
you have to do something like this for a few reasons: a) we'd rather scan 
backward from the end of the edit log than forward from the beginning, since 
that makes for a faster startup, and b) even if we see a higher transaction 
id header on the last entry, that entry might have been incompletely written 
to the file, so we still have to verify that it deserializes correctly.
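
For illustration, a sketch of that backward scan, assuming preallocated edit 
files padded with zero bytes (the prefix found this way must still be 
validated by deserializing its tail):
{noformat}
class EditLogTailSketch {
  // Returns the offset just past the last non-zero byte, i.e. the logical
  // end of the written region of a zero-padded edits file.
  static long findLogicalEnd(java.io.RandomAccessFile f)
      throws java.io.IOException {
    long pos = f.length();
    byte[] buf = new byte[8192];
    while (pos > 0) {
      int n = (int) Math.min(buf.length, pos);
      f.seek(pos - n);
      f.readFully(buf, 0, n);
      int i = n - 1;
      while (i >= 0 && buf[i] == 0) i--;  // skip trailing zero padding
      if (i >= 0) return pos - n + i + 1; // last non-zero byte found
      pos -= n;
    }
    return 0;                             // the file is all zeros
  }
}
{noformat}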

bq. Main disadvantage is that the editlogs will be little bigger.

So are you suggesting that each edit will include a header with the 
transaction ID in it? Isn't this redundant if the header of the whole edit 
file has the starting txid -- i.e. is there ever a case where we'd skip a 
txid?

bq. In order to do an offline fsck one needs to dump the block map; clearly 
one does not want to lock the whole system to do an atomic dump. The 
transaction id of when the dump was started can be written into the dump to 
allow the fsck to report consistently.

Sorry, can you elaborate a little bit here? In order to get a consistent dump 
of the block map don't we need to take the FSN lock and thus stall all 
operations? Is the idea that the BackupNode would do the blockmap dump offline 
since it can hold a lock for some time without stalling clients? If that's the 
case, what's the purpose of the offline nature of the fsck instead of just 
having BackupNode allow fsck to point directly at it and access memory under 
the same lock?

Mahadev said:
bq. Is it the minimum set of code changes that is making you guys decide 
against the txn-based snapshots and logging?

I don't think either way has been decided/rejected yet. What you're saying 
has been my view - that doing it txid-based is a bigger change, since we have 
to introduce the txid concept and add extra code that allows replaying 
partial edit log files (i.e. a subrange of the edits within). But it's 
certainly doable, and Sanjay has presented some good advantages.


 Simpler model for Namenode's fs Image and edit Logs 
 

 Key: HDFS-1073
 URL: https://issues.apache.org/jira/browse/HDFS-1073
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia
Assignee: Todd Lipcon
 Attachments: hdfs1073.pdf


 The naming and handling of the NN's fsImage and edit logs can be 
 significantly improved, resulting in simpler and more robust code.




[jira] Updated: (HDFS-1347) TestDelegationToken uses mortbay.log for logging

2010-08-19 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1347:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

committed to trunk




[jira] Updated: (HDFS-535) TestFileCreation occasionally fails because of an exception in DataStreamer.

2010-08-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-535:
-

Status: Open  (was: Patch Available)

This patch does indeed cause testFsCloseAfterClusterShutdown to fail for me:
{noformat}Testcase: testFsClose took 3.137 sec
Testcase: testFsCloseAfterClusterShutdown took 2.751 sec
  FAILED
Failed to close file after cluster shutdown
junit.framework.AssertionFailedError: Failed to close file after cluster 
shutdown
  at 
org.apache.hadoop.hdfs.TestFileCreation.testFsCloseAfterClusterShutdown(TestFileCreation.java:851){noformat}
Canceling the patch for Konstantin to update, although I don't believe we've 
seen this problem for a while, so maybe we can just close this issue?

 TestFileCreation occasionally fails because of an exception in DataStreamer.
 

 Key: HDFS-535
 URL: https://issues.apache.org/jira/browse/HDFS-535
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, test
Affects Versions: 0.20.1
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: TestFileCreate.patch


 One of the test cases, namely {{testFsCloseAfterClusterShutdown()}}, of 
 {{TestFileCreation}} fails occasionally.




[jira] Commented: (HDFS-718) configuration parameter to prevent accidental formatting of HDFS filesystem

2010-08-19 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900580#action_12900580
 ] 

Jakob Homan commented on HDFS-718:
--

I'm +1. Once a Hadoop cluster is up and running in production, it can 
potentially hold very critical and valuable information. An extra, optional 
safeguard that saves one such cluster and doesn't add any serious complexity 
to the code is worth it. A steady-state cluster is a very valuable thing...

 configuration parameter to prevent accidental formatting of HDFS filesystem
 ---

 Key: HDFS-718
 URL: https://issues.apache.org/jira/browse/HDFS-718
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
 Environment: Any
Reporter: Andrew Ryan
Assignee: Andrew Ryan
Priority: Minor
 Attachments: HDFS-718.patch-2.txt, HDFS-718.patch.txt


 Currently, any time the NameNode is not running, an HDFS filesystem will 
 accept the 'format' command and will duly format itself. There are those of 
 us who have multi-PB HDFS filesystems and are really quite uncomfortable with 
 this behavior. There is a Y/N confirmation in the format command, but if the 
 formatter genuinely believes themselves to be doing the right thing, the 
 filesystem will be formatted.
 This patch adds a configuration parameter to the namenode, 
 dfs.namenode.support.allowformat, which defaults to true, the current 
 behavior: always allow formatting if the NameNode is down or some other 
 process is not holding the namenode lock. But if 
 dfs.namenode.support.allowformat is set to false, the NameNode will not 
 allow itself to be formatted until this config parameter is changed back to 
 true.
 The general idea is that for production HDFS filesystems, the user would 
 format the HDFS once, then set dfs.namenode.support.allowformat to false 
 for all time.
 The attached patch was generated against trunk and passes (+1) on my test 
 machine. We have a 0.20 version that we are using in our cluster as well.
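
For illustration, a hypothetical sketch of the guard such a patch would add; 
the property name comes from the description above, while the surrounding 
code is illustrative:
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class FormatGuardSketch {
  // Refuse to format unless the config flag allows it. Defaults to true,
  // preserving the current behavior, as described above.
  static void checkFormatAllowed(Configuration conf) throws IOException {
    boolean allowFormat =
        conf.getBoolean("dfs.namenode.support.allowformat", true);
    if (!allowFormat) {
      throw new IOException("Formatting is disabled for this filesystem; "
          + "set dfs.namenode.support.allowformat to true to re-enable it.");
    }
  }
}
{noformat}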
