[jira] [Commented] (HDFS-1195) Offer rate limits for replicating data

2015-01-07 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267740#comment-14267740
 ] 

Cosmin Lehene commented on HDFS-1195:
-

[~kevinweil] is this still valid?

 Offer rate limits for replicating data 
 ---

 Key: HDFS-1195
 URL: https://issues.apache.org/jira/browse/HDFS-1195
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 0.20.2
 Environment: Linux, Hadoop 0.20.1 CDH
Reporter: Kevin Weil

 If a rack of Hadoop nodes goes down, there is a lot of data to re-replicate.  
 It would be great to have a configuration option to rate-limit the amount of 
 bandwidth used for re-replication so as not to saturate network backlinks.  
 There is a similar option for rate limiting the speed at which a DFS 
 rebalance takes place: dfs.balance.bandwidthPerSec.
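
 The balancer throttle mentioned above simply caps bytes per second on the
 sending side, and a re-replication limit would presumably follow the same
 pattern. The sketch below only illustrates that idea, assuming a hypothetical
 dfs.replication.bandwidthPerSec setting analogous to
 dfs.balance.bandwidthPerSec; it is not the actual datanode code.

{code:title=ReplicationThrottler.java (sketch)}
/**
 * Minimal sketch of a byte-rate throttle for re-replication traffic,
 * assuming a hypothetical dfs.replication.bandwidthPerSec setting
 * analogous to dfs.balance.bandwidthPerSec. Illustration only.
 */
public class ReplicationThrottler {
  private final long bytesPerSec;               // configured limit, e.g. 1 MB/s
  private long periodStart = System.currentTimeMillis();
  private long bytesSent = 0;

  public ReplicationThrottler(long bytesPerSec) {
    this.bytesPerSec = bytesPerSec;
  }

  /** Call before sending numBytes of re-replication data; sleeps if the
   *  sender is ahead of the configured rate. */
  public synchronized void throttle(long numBytes) throws InterruptedException {
    bytesSent += numBytes;
    long elapsed = System.currentTimeMillis() - periodStart;
    long expected = bytesSent * 1000 / bytesPerSec;  // time this much data "should" take
    if (expected > elapsed) {
      Thread.sleep(expected - elapsed);
    }
    if (elapsed > 1000) {                            // reset the window each second
      periodStart = System.currentTimeMillis();
      bytesSent = 0;
    }
  }
}
{code}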



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-06-04 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875519#action_12875519
 ] 

Cosmin Lehene commented on HDFS-630:


There's a patch for 0.20 adapted by tlipcon. Can we use that?

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Fix For: 0.21.0

 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.
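
 To make the proposal concrete, here is a toy, self-contained sketch of the
 per-block exclude list described above; the node names, the pickTarget helper
 and the reachability check are made up for illustration and are not the
 DFSClient or NameNode API.

{code:title=ExcludedNodesSketch.java (illustration)}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the idea above: remember datanodes that failed for the
 *  current block and ask for a target that avoids them. Not HDFS code. */
public class ExcludedNodesSketch {

  // Stand-in for the namenode's placement choice.
  static String pickTarget(List<String> live, List<String> excluded) {
    for (String node : live) {
      if (!excluded.contains(node)) {
        return node;
      }
    }
    return null;                                     // nothing left to try
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> nodes = Arrays.asList("dn1:50010", "dn2:50010", "dn3:50010");
    List<String> excluded = new ArrayList<String>(); // valid for one block only
    int retries = 3;                                 // dfs.client.block.write.retries default
    while (retries-- > 0) {
      String target = pickTarget(nodes, excluded);
      if (target == null) {
        break;
      }
      boolean reachable = !target.startsWith("dn1"); // pretend dn1 is dead
      if (reachable) {
        System.out.println("writing block via " + target);
        return;
      }
      excluded.add(target);  // without this, a small cluster keeps re-picking dn1
      Thread.sleep(10);      // the real client sleeps 6 seconds between retries
    }
    System.out.println("bailing out: no usable datanode");
  }
}
{code}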

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException

2010-03-30 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-1024:


Affects Version/s: 0.20.1, 0.20.2, 0.20.3, 0.21.0
Fix Version/s: 0.21.0

Adding affected versions

 SecondaryNamenode fails to checkpoint because namenode fails with 
 CancelledKeyException
 ---

 Key: HDFS-1024
 URL: https://issues.apache.org/jira/browse/HDFS-1024
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
Priority: Blocker
 Fix For: 0.20.3, 0.21.0, 0.22.0

 Attachments: HDFS-1024.patch, HDFS-1024.patch.1


 The secondary namenode fails to retrieve the entire fsimage from the 
 Namenode. It fetches a part of the fsimage but believes that it has fetched 
 the entire fsimage file and proceeds ahead with the checkpointing. Stack 
 traces will be attached below.
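
 The failure mode described here, a partial HTTP transfer being treated as
 complete, can in principle be caught by comparing the advertised length with
 the bytes actually received. The snippet below only illustrates that kind of
 check with a plain HttpURLConnection; it is not the HDFS-1024 patch.

{code:title=LengthCheckedFetch.java (illustration)}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Illustration only: treat a download as failed unless the number of bytes
 *  read matches the Content-Length the server advertised. */
public class LengthCheckedFetch {
  public static long fetch(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    long expected = conn.getContentLengthLong();   // -1 if the server did not say
    long received = 0;
    byte[] buf = new byte[64 * 1024];
    InputStream in = conn.getInputStream();
    try {
      int n;
      while ((n = in.read(buf)) > 0) {
        received += n;                             // a real client would also write to disk
      }
    } finally {
      in.close();
    }
    if (expected >= 0 && received != expected) {
      // This is the situation described above: a short read mistaken for success.
      throw new IOException("truncated transfer: got " + received
          + " of " + expected + " bytes from " + url);
    }
    return received;
  }
}
{code}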

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-02-16 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834158#action_12834158
 ] 

Cosmin Lehene commented on HDFS-909:


@Todd what's the state of this patch? This happens more often than I initially 
thought. Just hit it again.

 Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
 operations  corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
 hdfs-909.txt, hdfs-909.txt


 Closing the edits log file can race with a write to the edits log file,
 resulting in the OP_INVALID end-of-file marker first being overwritten by the
 concurrent threads (in setReadyToFlush) and then removed twice from the
 buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the
 FSEditLog.logSync() method level. However, at a lower level, access to
 EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT
 synchronized. These can be called from concurrent threads, as in the example
 above.
 So if a rollEditLog or rollFSImage happens at the same time as a write
 operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which
 will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file
 marker) and then remove it twice (once from each thread) in flushAndSync()!
 Hence a valid byte will be missing from the edits log, leading to a silent
 SecondaryNameNode failure and a full HDFS failure upon cluster restart.
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with 
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
 {code}
 EDIT: moved the logs to a comment to make this readable
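
 To see the race in isolation, the model below strips the double-buffer scheme
 down to its essentials: two threads reaching the swap/flush pair without a
 shared lock is what overwrites and then double-strips the end-of-file marker.
 The class is a sketch of the mechanism described above, not the real
 EditLogFileOutputStream or the committed patch; serializing the swap/flush
 pair on one lock, as sketched, is one straightforward way to close the race,
 though the actual fix may differ.

{code:title=EditBufferModel.java (sketch)}
import java.io.ByteArrayOutputStream;

/** Stripped-down model of the double-buffer scheme discussed above.
 *  Not the real EditLogFileOutputStream; it only shows why an
 *  unsynchronized setReadyToFlush()/flushAndSync() pair can lose a byte,
 *  and how serializing them on one lock avoids it. */
public class EditBufferModel {
  static final byte OP_INVALID = -1;     // end-of-file marker

  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady   = new ByteArrayOutputStream();

  /** Analogue of an FSEditLog.write: append one opcode to the current buffer. */
  public synchronized void write(byte op) {
    bufCurrent.write(op);
  }

  /** Append the marker and swap buffers. If two threads run this concurrently
   *  without the lock, both can append a marker and both can swap, which is
   *  the "overwritten ... then removed twice" situation in the description. */
  public synchronized void setReadyToFlush() {
    bufCurrent.write(OP_INVALID);
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
  }

  /** Flush the ready buffer, dropping the trailing marker exactly once. */
  public synchronized void flushAndSync() {
    byte[] data = bufReady.toByteArray();
    int len = data.length > 0 ? data.length - 1 : 0;  // strip exactly one marker
    // ... write data[0..len) to the edits file and sync; omitted here ...
    bufReady.reset();
  }
}
{code}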

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-02-16 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834264#action_12834264
 ] 

Cosmin Lehene commented on HDFS-909:


@Todd,

Thanks! We're using 0.21 :)

 Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
 operations  corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
 hdfs-909.txt, hdfs-909.txt


 Closing the edits log file can race with a write to the edits log file,
 resulting in the OP_INVALID end-of-file marker first being overwritten by the
 concurrent threads (in setReadyToFlush) and then removed twice from the
 buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the
 FSEditLog.logSync() method level. However, at a lower level, access to
 EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT
 synchronized. These can be called from concurrent threads, as in the example
 above.
 So if a rollEditLog or rollFSImage happens at the same time as a write
 operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which
 will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file
 marker) and then remove it twice (once from each thread) in flushAndSync()!
 Hence a valid byte will be missing from the edits log, leading to a silent
 SecondaryNameNode failure and a full HDFS failure upon cluster restart.
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with 
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
 {code}
 EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-02-02 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-909:
---

Description: 
Closing the edits log file can race with a write to the edits log file, 
resulting in the OP_INVALID end-of-file marker first being overwritten by the 
concurrent threads (in setReadyToFlush) and then removed twice from the buffer, 
losing a good byte from the edits log.

Example:
{code}
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
EditLogFileOutputStream.flushAndSync()
OR
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()

VERSUS

FSNameSystem.completeFile -> FSEditLog.logSync() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.flush() 
-> EditLogFileOutputStream.flushAndSync()
OR
Any FSEditLog.write
{code}

Access to the edits flush operations is synchronized only at the 
FSEditLog.logSync() method level. However, at a lower level, access to 
EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
synchronized. These can be called from concurrent threads, as in the example 
above.

So if a rollEditLog or rollFSImage happens at the same time as a write 
operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which 
will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file 
marker) and then remove it twice (once from each thread) in flushAndSync()! 
Hence a valid byte will be missing from the edits log, leading to a silent 
SecondaryNameNode failure and a full HDFS failure upon cluster restart.

We got to this point after investigating a corrupted edits file that made HDFS 
unable to start with 

{code:title=namenode.log}
java.io.IOException: Incorrect data format. logVersion is -20 but 
writables.length is 768. 
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
{code}

EDIT: moved the logs to a comment to make this readable


  was:
Closing the edits log file can race with a write to the edits log file, 
resulting in the OP_INVALID end-of-file marker first being overwritten by the 
concurrent threads (in setReadyToFlush) and then removed twice from the buffer, 
losing a good byte from the edits log.

Example:
{code}
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
EditLogFileOutputStream.flushAndSync()
OR
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()

VERSUS

FSNameSystem.completeFile -> FSEditLog.logSync() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.flush() 
-> EditLogFileOutputStream.flushAndSync()
OR
Any FSEditLog.write
{code}

Access to the edits flush operations is synchronized only at the 
FSEditLog.logSync() method level. However, at a lower level, access to 
EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
synchronized. These can be called from concurrent threads, as in the example 
above.

So if a rollEditLog or rollFSImage happens at the same time as a write 
operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which 
will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file 
marker) and then remove it twice (once from each thread) in flushAndSync()! 
Hence a valid byte will be missing from the edits log, leading to a silent 
SecondaryNameNode failure and a full HDFS failure upon cluster restart.

We got to this point after investigating a corrupted edits file that made HDFS 
unable to start with 

{code:title=namenode.log}
java.io.IOException: Incorrect data format. logVersion is -20 but 
writables.length is 768. 
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
{code}

In the edits file we found the first 2 entries:
{code:title=edits}

[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-02-02 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828543#action_12828543
 ] 

Cosmin Lehene commented on HDFS-909:


@Konstantin I moved the details to a comment and broke it onto more lines, but I 
missed the log entry that really messes up the layout. Unfortunately I can't edit 
the comment, so if you can, please break the log entry lines in my comment to 
give this page a decent layout. Sorry, and thanks.

P.S. I'll look at the code again to see the race issue you described.

 Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
 operations  corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt


 Closing the edits log file can race with a write to the edits log file,
 resulting in the OP_INVALID end-of-file marker first being overwritten by the
 concurrent threads (in setReadyToFlush) and then removed twice from the
 buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the
 FSEditLog.logSync() method level. However, at a lower level, access to
 EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT
 synchronized. These can be called from concurrent threads, as in the example
 above.
 So if a rollEditLog or rollFSImage happens at the same time as a write
 operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which
 will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file
 marker) and then remove it twice (once from each thread) in flushAndSync()!
 Hence a valid byte will be missing from the edits log, leading to a silent
 SecondaryNameNode failure and a full HDFS failure upon cluster restart.
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with 
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
 {code}
 EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-26 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804947#action_12804947
 ] 

Cosmin Lehene commented on HDFS-630:


I'm glad it finally got into both 0.21 and trunk. It was a long-lived issue. 
Thanks for the support! :)

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Fix For: 0.21.0, 0.22.0

 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-01-20 Thread Cosmin Lehene (JIRA)
Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
operations  corrupts edits log
-

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Priority: Blocker
 Fix For: 0.21.0, 0.22.0


Closing the edits log file can race with a write to the edits log file, 
resulting in the OP_INVALID end-of-file marker first being overwritten by the 
concurrent threads (in setReadyToFlush) and then removed twice from the buffer, 
losing a good byte from the edits log.

Example:
{code}
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
EditLogFileOutputStream.flushAndSync()
OR
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> FSEditLog.purgeEditLog() 
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() -> 
EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()

VERSUS

FSNameSystem.completeFile -> FSEditLog.logSync() -> 
EditLogOutputStream.setReadyToFlush()
FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.flush() 
-> EditLogFileOutputStream.flushAndSync()
OR
Any FSEditLog.write
{code}

Access to the edits flush operations is synchronized only at the 
FSEditLog.logSync() method level. However, at a lower level, access to 
EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
synchronized. These can be called from concurrent threads, as in the example 
above.

So if a rollEditLog or rollFSImage happens at the same time as a write 
operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which 
will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file 
marker) and then remove it twice (once from each thread) in flushAndSync()! 
Hence a valid byte will be missing from the edits log, leading to a silent 
SecondaryNameNode failure and a full HDFS failure upon cluster restart.

We got to this point after investigating a corrupted edits file that made HDFS 
unable to start with 

{code:title=namenode.log}
java.io.IOException: Incorrect data format. logVersion is -20 but 
writables.length is 768. 
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
{code}

In the edits file we found the first 2 entries:
{code:title=edits}
FFEC090005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D31323631303832363238303934000836373130383836340003F6CBB87EF376E3E604039665F9549DE069A5735E04039665ADCC71A050B16ABF015A179A00039665066861646F6F700A737570657267726F757001010003003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F3833323831343837313930373033313732373900352F68626173652F64656D6F5F5F75736572732F3336343035313634362F746573742F36393137333831323838333034343734333836000D3132363130383236333138363902
...
{code}
This is the completeFile operation that's missing the last byte 

{code:title=completeFile}
FFEC090005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D31323631303832363238303934000836373130383836340003F6CBB87EF376E3E604039665F9549DE069A5735E04039665ADCC71A050B16ABF015A179A00039665066861646F6F700A737570657267726F757001??
{code}
followed by a rename operation

{code:Title=rename}
010003003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F3833323831343837313930373033313732373900352F68626173652F64656D6F5F5F75736572732F3336343035313634362F746573742F36393137333831323838333034343734333836000D31323631303832363331383639
{code}

The first byte of the rename was instead read as part of the completeFile() 
operation. As a result, the next operation was read as 0x00 (OP_ADD), followed 
by an int (length) that would have been 0x300, which is 768; this is the value 
that was read and failed in the following code

{code:Title=FSEditLog.java}
 case OP_ADD:
case OP_CLOSE: {
  // versions > 0 support per file replication
  // get name and replication
  int length = in.readInt();
  

[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

2010-01-20 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802941#action_12802941
 ] 

Cosmin Lehene commented on HDFS-909:


Hi Todd, 

I haven't checked yet, so it's possible it affects 0.20 as well. I forgot to add 
that the issue is particularly nasty because it first fails silently. In our 
case, the log was corrupted on the 17th of December, but we only discovered it 
yesterday when we restarted HDFS. It can be detected early by monitoring the 
secondary-namenode.out log file.

 Race condition between rollEditLog or rollFSImage ant FSEditsLog.write 
 operations  corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Priority: Blocker
 Fix For: 0.21.0, 0.22.0


 Closing the edits log file can race with a write to the edits log file,
 resulting in the OP_INVALID end-of-file marker first being overwritten by the
 concurrent threads (in setReadyToFlush) and then removed twice from the
 buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() ->
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
 FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile -> FSEditLog.logSync() ->
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the
 FSEditLog.logSync() method level. However, at a lower level, access to
 EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT
 synchronized. These can be called from concurrent threads, as in the example
 above.
 So if a rollEditLog or rollFSImage happens at the same time as a write
 operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which
 will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file
 marker) and then remove it twice (once from each thread) in flushAndSync()!
 Hence a valid byte will be missing from the edits log, leading to a silent
 SecondaryNameNode failure and a full HDFS failure upon cluster restart.
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with 
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
 {code}
 In the edits file we found the first 2 entries:
 {code:title=edits}
 FFEC090005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D31323631303832363238303934000836373130383836340003F6CBB87EF376E3E604039665F9549DE069A5735E04039665ADCC71A050B16ABF015A179A00039665066861646F6F700A737570657267726F757001010003003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F3833323831343837313930373033313732373900352F68626173652F64656D6F5F5F75736572732F3336343035313634362F746573742F36393137333831323838333034343734333836000D3132363130383236333138363902
 ...
 {code}
 This is the completeFile operation that's missing the last byte 
 {code:title=completeFile}
 FFEC090005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D31323631303832363238303934000836373130383836340003F6CBB87EF376E3E604039665F9549DE069A5735E04039665ADCC71A050B16ABF015A179A00039665066861646F6F700A737570657267726F757001??
 {code}
 followed by a rename operation
 {code:Title=rename}
 

[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-18 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-18 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Open  (was: Patch Available)

Tests fail erratically; canceling again.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-0.21-svn-2.patch

attaching 0.21 patch with javadoc link fixed

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-trunk-svn-4.patch

Patch for trunk with the javadoc link fixed. 
The TestFiHFlush test that failed previously seems to work fine when running the 
tests with ant, so nothing was changed regarding it.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Open  (was: Patch Available)

Canceling to restart build

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

Trying the trunk patch one more time. I don't exactly know how to trigger a 
0.21 patch/build.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2010-01-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

I have an "it runs on my machine" feeling. Trying once more.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, 
 hdfs-630-0.20.txt, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-12-26 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-0.21-svn-1.patch
0001-Fix-HDFS-630-trunk-svn-3.patch

New patches for 0.21 and trunk. ClientProtocol versionID is 53L for 0.21 and 
54L for trunk.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 
 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 
 HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-12-22 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793640#action_12793640
 ] 

Cosmin Lehene commented on HDFS-630:


@stack unfortunately, no. The patch needs to be changed for trunk. 
{code:title=ClientProtocol.java}
Index: src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
===
--- src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
(revision 891402)
+++ src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
(working copy)
@@ -44,9 +44,9 @@
* Compared to the previous version the following changes have been 
introduced:
* (Only the latest change is reflected.
* The log of historical changes can be retrieved from the svn).
-   * 50: change LocatedBlocks to include last block information.
+   * 51: changed addBlock to include a list of excluded datanodes.
*/
-  public static final long versionID = 50L;
+  public static final long versionID = 51L;
{code}

The versionID in 0.21 changes from 50L to 51L. The problem is that on trunk it is 
already 52L, so it should probably change from 52L to 53L. This could, however, be 
ignored on trunk and changed independently. I'm not sure what the right approach 
is. I could create another patch for trunk, but that would just render the 
versionID meaningless: it's 51L on 0.21, but on trunk 51L is something else.
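
For readers following the patch discussion, the protocol change being versioned 
here boils down to one extra parameter on addBlock. The interface below is only 
a sketch of that shape, with placeholder types invented for illustration; the 
exact signature and versionID values are in the attached patches, not here.

{code:title=ClientProtocolSketch.java (illustration)}
/** Sketch of what the versioned change amounts to: addBlock() learns about
 *  datanodes the client already failed to reach. Placeholder types are used
 *  so the sketch stands alone; the real ones live in
 *  org.apache.hadoop.hdfs.protocol, and the real signature may differ. */
public interface ClientProtocolSketch {
  // Any incompatible ClientProtocol change requires bumping versionID, hence
  // the 50L -> 51L (0.21) versus 52L -> 53L (trunk) discussion above.
  long versionID = 51L;

  /** Ask the namenode for the next block, excluding the given datanodes for
   *  this one block allocation. */
  LocatedBlockSketch addBlock(String src, String clientName,
                              DatanodeInfoSketch[] excludedNodes);

  class LocatedBlockSketch {}
  class DatanodeInfoSketch {}
}
{code}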

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
 Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-12-16 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-0.21-svn.patch

new patch for 0.21:

- removed previous addBlock method
- changed ClientProtocol version
- changed log level in DFSClient to debug for the node exclusion operation
- refactored TestDFSClientExcludedNodes to junit4


 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, 
 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-23 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Open  (was: Patch Available)

Can't see that build issue locally and can't figure out what caused it on the 
build server. Trying one more time.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not-connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have
 few datanodes in the cluster, every retry may pick the dead datanode and
 the above logic bails out.
 Our solution: when getting block location from namenode, we give nn the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-23 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-20 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-trunk-svn-2.patch

I reformatted the code a little, trying to stay close to the style of the files it 
changes. There's no consistent style across files, however.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-18 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-trunk-svn-1.patch

The last patch doesn't apply on trunk after the commit for HDFS-764. Here's a new 
patch for trunk that also fixes the previous javac warning.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
 HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-svn.patch

Fixed the old method overload in NameNode.addBlock: 
it returned addBlock(src, clientName, null, null); instead of addBlock(src, 
clientName, previous, null);, 
so when called it never committed the previous block. 
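
To make the fix concrete, a short sketch of the corrected delegation described above 
(signatures simplified; the final null argument stands for the excludedNodes parameter):

  @Override
  public LocatedBlock addBlock(String src, String clientName, Block previous)
      throws IOException {
    // Forward 'previous' so the NameNode can commit the previous block,
    // instead of dropping it by passing null.
    return addBlock(src, clientName, previous, null);
  }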

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

Fix for 0.21 and trunk. 

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-16 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-svn.patch

I've run:
patch -p1 < 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch
svn add src/test/hdfs/org/apache/hadoop/hdfs/TestDFSClientExcludedNodes.java
svn diff > 0001-Fix-HDFS-630-svn.patch

I really hope this works. It appears there's no easy way to generate a patch 
from git and have it working in this setup. 

Dhruba: if it still won't work, please run the patch with -p1 and then generate 
a patch that will work. 
By the way, a unit test is included with the last 3 patches.


 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-14 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch

The patch applies on trunk as well. However, since it's a git patch, I guess it 
caused some confusion. Here is the unified patch.

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Fix For: 0.21.0

 Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
 0001-Fix-HDFS-630-for-0.21.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-13 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777500#action_12777500
 ] 

Cosmin Lehene commented on HDFS-630:


stack: I can't reproduce it on 0.21. I did find it in the NN log before 
upgrading the HBase jar to the patched HDFS. 

java.io.IOException: Cannot complete block: block has not been COMMITTED by the 
client
at 
org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction.convertToCompleteBlock(BlockInfoUnderConstruction.java:158)
at 
org.apache.hadoop.hdfs.server.namenode.BlockManager.completeBlock(BlockManager.java:288)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1243)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:637)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)
at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)

I should point out that 
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)

line 621 in the NameNode means it was called from an unpatched DFSClient that 
calls the old NameNode interface:
line 621: return addBlock(src, clientName, null, null); 

This is part of public LocatedBlock addBlock(String src, String clientName, 
Block previous):

  @Override
  public LocatedBlock addBlock(String src, String clientName,
                               Block previous)
      throws IOException {
    return addBlock(src, clientName, null, null);
  }

This is different from your stacktrace http://pastie.org/695936, which calls the 
complete() method. 

However, could you search for the same error while adding a new block with 
addBlock() (like mine)? If you find it, you could figure out what the entry 
point in the NameNode is, and if it's line 621 you might have an unpatched 
DFSClient. 

However, even with an unpatched DFSClient I still fail to figure out why it 
would cause this. Perhaps I should get a better understanding of the cause of 
the exception. So far, from the code comments in BlockInfoUnderConstruction I 
have that
"the state of the block (the generation stamp and the length) has not been 
committed by the client or it does not have at least a minimal number of 
replicas reported from data-nodes."
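
To spell the reasoning out, a rough, hypothetical sketch of the NameNode-side flow implied 
above; commitBlock and allocateNewBlock are illustrative helpers, not the actual 
FSNamesystem methods:

  LocatedBlock getAdditionalBlock(String src, String clientName,
                                  Block previous, DatanodeInfo[] excludedNodes)
      throws IOException {
    if (previous != null) {
      commitBlock(src, previous);  // record the client-reported length/generation stamp
    }
    // An unpatched client passes previous == null, so the commit step is skipped and a
    // later complete()/addBlock() fails with "block has not been COMMITTED by the client".
    return allocateNewBlock(src, clientName, excludedNodes);
  }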

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Fix For: 0.21.0

 Attachments: 0001-Fix-HDFS-630-for-0.21.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-10-15 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Affects Version/s: (was: 0.20.1)
   Status: Patch Available  (was: Open)

Adapted for 0.21 branch. 

Added excludedNodes back to BlockPlacementPolicy. 
Adapted it to use HashMap<Node, Node> instead of List<Node> since 
BlockPlacementPolicyDefault was changed to use HashMap. However, I'm not sure if 
it's supposed to be a HashMap... 
Luckily, Dhruba didn't remove the code that dealt with excludedNodes from 
BlockPlacementPolicyDefault, so I only had to wire up the methods.


I also added a unit test - it's practically a functional test that spins up a 
MiniDFSCluster with 3 DataNodes and kills one before creating the file. 
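
For illustration, a rough sketch of that kind of test, assuming the standard MiniDFSCluster 
test utilities; the class and method names here are illustrative and not copied from the 
attached TestDFSClientExcludedNodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class ExcludedNodesTestSketch {
  public void testWriteSucceedsWithOneDeadDatanode() throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 3, true, null); // 3 datanodes
    try {
      cluster.waitActive();
      cluster.stopDataNode(0);                      // kill one datanode before the write
      FileSystem fs = cluster.getFileSystem();
      FSDataOutputStream out = fs.create(new Path("/excluded-nodes-test"), (short) 2);
      out.write(new byte[4096]);                    // should succeed once the dead node is excluded
      out.close();
    } finally {
      cluster.shutdown();
    }
  }
}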

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Fix For: 0.21.0

 Attachments: HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-10-15 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-for-0.21.patch

 In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
 datanodes when locating the next block.
 ---

 Key: HDFS-630
 URL: https://issues.apache.org/jira/browse/HDFS-630
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Ruyue Ma
Priority: Minor
 Fix For: 0.21.0

 Attachments: 0001-Fix-HDFS-630-for-0.21.patch, HDFS-630.patch


 created from hdfs-200.
 If during a write, the dfsclient sees that a block replica location for a 
 newly allocated block is not connectable, it re-requests the NN to get a 
 fresh set of replica locations of the block. It tries this 
 dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
 each retry (see DFSClient.nextBlockOutputStream).
 This setting works well when you have a reasonably sized cluster; if you have 
 few datanodes in the cluster, every retry may pick the dead datanode and 
 the above logic bails out.
 Our solution: when getting block locations from the namenode, we give the NN the 
 excluded datanodes. The list of dead datanodes is only for one block 
 allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.