[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread Yajun Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830480#action_12830480
 ] 

Yajun Dong commented on HDFS-951:
-

> Also the file will be left in an incomplete/being-created state if that 
> DFSClient instance does not get a chance to close.
> But it will not be nice at all for users with respect to their user experience. 
Agreed. 

I have come across this problem: DFSClient should complete the files being 
created when IOExceptions are encountered. 

Normally, however, we cannot complete these files successfully before all 
blocks of the files being created are reported to the NameNode. In short: the 
last block is in an indeterminate state at this time.

In this case, the only option I see is to delete the last, failing block and 
then complete/close the file.
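
For illustration only, a minimal sketch of that recovery path (the namenode, 
src, clientName and lastBlock names are assumed from DFSClient, and this is 
not an actual patch):

{code}
// Hedged sketch: on total pipeline failure, give up on the unreported
// last block and close the file with the blocks the NameNode knows about.
private void abandonLastBlockAndClose() throws IOException {
  if (lastBlock != null) {
    // Drop the block whose replicas all sit on failed datanodes.
    namenode.abandonBlock(lastBlock.getBlock(), src, clientName);
  }
  // Retry complete() until the NameNode agrees that all remaining
  // blocks of the file have been fully reported.
  while (!namenode.complete(src, clientName)) {
    try {
      Thread.sleep(400);
    } catch (InterruptedException ignored) {
    }
  }
}
{code}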

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the data streamer, the logic 
>   if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
>     continue;
>   }
> will just cause closed to be set to true in closeInternal().
> And the DataOutputStream will not get a chance to clean up; it will throw an 
> exception or return null for subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830479#action_12830479
 ] 

He Yongqiang commented on HDFS-951:
---

>>If all of the datanodes fail, the last block is in an indeterminate state
Yes, I know this is the main problem. I think there should be some policy for 
handling that.
>>If you wait an hour for the hard lease limit
I am not sure I would get what I expect after one hour. But it would not be 
nice at all for users with respect to their user experience. For example, if 
you upload a file to a website and only get to know what happened to it after 
an hour, how would that be? (This is just an example, and of course there are 
workarounds for it.)

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the data streamer, the logic 
>   if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
>     continue;
>   }
> will just cause closed to be set to true in closeInternal().
> And the DataOutputStream will not get a chance to clean up; it will throw an 
> exception or return null for subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-935) Real user in delegation token.

2010-02-05 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830460#action_12830460
 ] 

Kan Zhang commented on HDFS-935:


See my comments in MAPREDUCE-1464.

> Real user in delegation token.
> --
>
> Key: HDFS-935
> URL: https://issues.apache.org/jira/browse/HDFS-935
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-935.3.patch, HDFS-935.5.patch, HDFS-935.6.patch
>
>
> The delegation token should also contain the real user which got it issued on 
> behalf of an effective user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830456#action_12830456
 ] 

Todd Lipcon commented on HDFS-951:
--

I'm not quite understanding what you're describing, I think.

If all of the datanodes fail, the last block is in an indeterminate state - we 
don't know what length ever made it to the DNs, so we can't really close the 
file properly. I suppose we could use the length from the last acked seqno, but 
the DNs will still have the replicas in the "rbw" state. There is some kind of 
state transition for recovery of rbw replicas described in the HDFS-265 
document - I don't recall off the top of my head if it will function if none of 
the DNs are up.

If you wait an hour for the hard lease limit, does the file end up in some kind 
of state that you expect?
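
As a rough illustration of the "length from the last acked seqno" idea (the 
names below are assumed, not actual DFSClient code):

{code}
// Hedged sketch: derive a conservative length from the last packet the
// (now dead) pipeline acknowledged, rather than from the bytes we sent.
long ackedLength = startOffsetOfLastBlock              // assumed field
    + bytesCoveredBySeqno(lastAckedSeqno);             // hypothetical helper
// Even with this length, the DN replicas remain in the "rbw" state and
// would still need HDFS-265 style recovery before they can be finalized.
{code}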

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the data streamer, the logic 
>   if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
>     continue;
>   }
> will just cause closed to be set to true in closeInternal().
> And the DataOutputStream will not get a chance to clean up; it will throw an 
> exception or return null for subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830448#action_12830448
 ] 

He Yongqiang commented on HDFS-951:
---

Btw, the code line in the description is from trunk. It seems the problem is 
still there in the client code.

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the data streamer, the logic 
>   if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
>     continue;
>   }
> will just cause closed to be set to true in closeInternal().
> And the DataOutputStream will not get a chance to clean up; it will throw an 
> exception or return null for subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830447#action_12830447
 ] 

He Yongqiang commented on HDFS-951:
---

@Todd,
The problem was seen in hadoop-0.19.2; I am not sure about 0.20.1.

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the data streamer, the logic 
>   if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
>     continue;
>   }
> will just cause closed to be set to true in closeInternal().
> And the DataOutputStream will not get a chance to clean up; it will throw an 
> exception or return null for subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-935) Real user in delegation token.

2010-02-05 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830435#action_12830435
 ] 

Kan Zhang commented on HDFS-935:


+1, pending hudson.

> Real user in delegation token.
> --
>
> Key: HDFS-935
> URL: https://issues.apache.org/jira/browse/HDFS-935
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-935.3.patch, HDFS-935.5.patch, HDFS-935.6.patch
>
>
> The delegation token should also contain the real user which got it issued on 
> behalf of an effective user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-935) Real user in delegation token.

2010-02-05 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-935:
--

Attachment: HDFS-935.6.patch

> Real user in delegation token.
> --
>
> Key: HDFS-935
> URL: https://issues.apache.org/jira/browse/HDFS-935
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-935.3.patch, HDFS-935.5.patch, HDFS-935.6.patch
>
>
> The delegation token should also contain the real user which got it issued on 
> behalf of an effective user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830421#action_12830421
 ] 

Todd Lipcon commented on HDFS-200:
--

That seems like a bit of a tenuous assumption. I agree that it's currently 
true, but it just seems like a "coincidence" :) What about checking:

{code}
LocatedBlock lastLocated = newInfo.get(newInfo.locatedBlockCount() - 1);
if (lastLocated.getStartOffset() + lastLocated.getBlockSize()
    == newInfo.getFileLength()) {
  // the last block we located is the last block in the file
  ...
}
{code}

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830417#action_12830417
 ] 

dhruba borthakur commented on HDFS-200:
---

I meant "For a HDFS file, only the last block can be partial in size".

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830416#action_12830416
 ] 

dhruba borthakur commented on HDFS-200:
---

Hi Todd, I think your point of concern merits investigation. I will post a new 
patch pretty soon. Thanks for looking over this one.

One option is to check whether the size of the lastBlock in newInfo is smaller 
than the size of the previous block. If so, it means that we have reached the 
last block of the file; only then trigger the code to retrieve the last block 
size from the datanode. (For a HDFS file, all blocks except the last one can 
be partial.)
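
A rough sketch of that check (it uses the LocatedBlocks accessors from the 
discussion above; updateBlockLengthFromDatanode is a hypothetical helper, and 
this is not the actual patch):

{code}
List<LocatedBlock> blocks = newInfo.getLocatedBlocks();
if (blocks.size() >= 2) {
  LocatedBlock last = blocks.get(blocks.size() - 1);
  LocatedBlock prev = blocks.get(blocks.size() - 2);
  if (last.getBlockSize() < prev.getBlockSize()) {
    // The last located block is shorter than a full block, so it must be
    // the final (possibly partial) block of the file; only now is it worth
    // asking a datanode for its up-to-date length.
    updateBlockLengthFromDatanode(last);
  }
}
{code}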


> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-737) Improvement in metasave output

2010-02-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830404#action_12830404
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-737:
-

+1  HDFS-737.3.rel20.patch looks good.

> Improvement in metasave output
> --
>
> Key: HDFS-737
> URL: https://issues.apache.org/jira/browse/HDFS-737
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.21.0, 0.22.0
>
> Attachments: HDFS-737.1.patch, HDFS-737.2.patch, HDFS-737.3.patch, 
> HDFS-737.3.rel20.patch
>
>
> This jira tracks the following improvements in metasave output:
> 1. A summary of total files/directories and blocks at the beginning
> 2. Full path names of the files should also be written out for 
> under-replicated/corrupt or missing blocks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log

2010-02-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830399#action_12830399
 ] 

Konstantin Shvachko commented on HDFS-909:
--

> NN is in safe mode, so there shouldn't be any new edits coming in in the 
> first place.

True, no new edits, but those started before entering safe mode can still run.

> what if the order of storage dirs is EDITS then IMAGE, so we kill our current 
> edit log, and then crash before saving the current image?

This is a reasonable concern. 
Historically, fsimage and edits were always in the same directory, so we first 
saved the image in that directory and then reset the edits. Now the order may 
change, and this was not taken care of.
We should file a new jira to fix this.

> Race condition between rollEditLog or rollFSImage and FSEditsLog.write 
> operations corrupts edits log
> -
>
> Key: HDFS-909
> URL: https://issues.apache.org/jira/browse/HDFS-909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
> Environment: CentOS
>Reporter: Cosmin Lehene
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
> Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
> hdfs-909.txt
>
>
> Closing the edits log file can race with a write to the edits log file, 
> resulting in the OP_INVALID end-of-file marker first being overwritten by the 
> concurrent threads (in setReadyToFlush) and then removed twice from the 
> buffer, losing a good byte from the edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush() 
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR 
> Any FSEditLog.write
> {code}
> Access to the edits flush operations is synchronized only at the 
> FSEdits.logSync() method level. However, at a lower level, access to 
> EditsLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
> synchronized. These can be called from concurrent threads as in the example 
> above.
> So if a rollEditLog or rollFSImage happens at the same time as a write 
> operation, they can race in EditLogFileOutputStream.setReadyToFlush, which 
> will overwrite the last byte (normally the FSEditsLog.OP_INVALID which is the 
> "end-of-file marker") and then remove it twice (once from each thread) in 
> flushAndSync()! Hence a valid byte will be missing from the edits log, which 
> leads to a silent SecondaryNameNode failure and a full HDFS failure upon 
> cluster restart. 
> We got to this point after investigating a corrupted edits file that made 
> HDFS unable to start with 
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but 
> writables.length is 768. 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-737) Improvement in metasave output

2010-02-05 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-737:
-

Attachment: HDFS-737.3.rel20.patch

Patch for branch 20 attached.

> Improvement in metasave output
> --
>
> Key: HDFS-737
> URL: https://issues.apache.org/jira/browse/HDFS-737
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.21.0, 0.22.0
>
> Attachments: HDFS-737.1.patch, HDFS-737.2.patch, HDFS-737.3.patch, 
> HDFS-737.3.rel20.patch
>
>
> This jira tracks the following improvements in metasave output:
> 1. A summary of total files/directories and blocks at the beginning
> 2. Full path names of the files should also be written out for 
> under-replicated/corrupt or missing blocks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-245) Create symbolic links in HDFS

2010-02-05 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-245:
-

Attachment: symlink38-hdfs.patch

Attached patch merges with trunk to resolve conflicts with the DFSClient 
refactoring.

> Create symbolic links in HDFS
> -
>
> Key: HDFS-245
> URL: https://issues.apache.org/jira/browse/HDFS-245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: Eli Collins
> Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
> designdocv1.txt, designdocv2.txt, designdocv3.txt, 
> HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
> symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
> symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
> symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
> symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
> symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
> symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
> symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
> symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
> symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
> symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
> symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
> symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
> symlink37-hdfs.patch, symlink38-hdfs.patch, symLink4.patch, symLink5.patch, 
> symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-898) Sequential generation of block ids

2010-02-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-898:
-

Attachment: FreeBlockIds.pdf

Here is what I got running the block id freeing tool on some cluster images. 
All 10 images tolerated the reset of 8 bits without causing any collisions 
among the projected ids.

> Sequential generation of block ids
> --
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: DuplicateBlockIds.patch, FreeBlockIds.pdf, 
> HighBitProjection.pdf
>
>
> This is a proposal to replace random generation of block ids with a 
> sequential generator in order to avoid block id reuse in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830350#action_12830350
 ] 

Jakob Homan commented on HDFS-938:
--

+1

> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, contrib.ivy.jackson.patch-3, 
> HDFS-938-BP20-1.patch, HDFS-938-BP20-2.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-954) There are two security packages in hdfs, should be one

2010-02-05 Thread Jakob Homan (JIRA)
There are two security packages in hdfs, should be one
--

 Key: HDFS-954
 URL: https://issues.apache.org/jira/browse/HDFS-954
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jakob Homan


Currently the test source tree has both
src/test/hdfs/org/apache/hadoop/hdfs/security with:
SecurityTestUtil.java
TestAccessToken.java
TestClientProtocolWithDelegationToken.java

and 
src/test/hdfs/org/apache/hadoop/security with:
TestDelegationToken.java
TestGroupMappingServiceRefresh.java
TestPermission.java

These should be combined into one package and possibly some things moved to 
common.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830334#action_12830334
 ] 

Todd Lipcon commented on HDFS-200:
--

Regarding the above comment - I suppose it doesn't entirely matter, since you 
only care about getting the true file length if you're reading at the end of 
the file. So, when you get near the end (10 blocks away) you'll get the updated 
length. However, you'll still do useless getBlockInfo calls on earlier blocks 
in the file. Perhaps some check against the LocatedBlocks structure, to see if 
the last block in the block list is actually the same as the last block of the 
file, is in order?

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830327#action_12830327
 ] 

Todd Lipcon commented on HDFS-200:
--

Hey Dhruba,

In the client-side length fetching code, you check LocatedBlock last = 
newInfo.get(newInfo.locatedBlockCount()-1) to fetch the size.
However, this is the last block of the file only in cases where the file has 
fewer than prefetchSize blocks, right? Has anyone tried a test after setting 
the block size to 1MB?

The trunk append code solves this issue by adding a lastBlock field to 
LocatedBlocks. I imagine you avoided this to keep wire compatibility
in this patch.

-Todd

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-02-05 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-200:
--

Attachment: fsyncConcurrentReaders15_20.txt

Merged with latest trunk

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-938:


Attachment: HDFS-938-BP20-2.patch

Addressed Jakob's comments in HDFS-938-BP20-2.patch.
Removed the TestGroupMappingServiceRefresh.java changes.


> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, contrib.ivy.jackson.patch-3, 
> HDFS-938-BP20-1.patch, HDFS-938-BP20-2.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-953) TestBlockReport times out intermittently

2010-02-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830316#action_12830316
 ] 

Konstantin Boudnik commented on HDFS-953:
-

Yes, it seems very similar. There's a strange problem with the test which never 
happens on Mac: writing a block finishes very quickly, so the test won't 
notice that a replica has become TEMPORARY and will eventually time out (after 
4 seconds or so).
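
Roughly, the failing wait follows this pattern (a sketch of the pattern, not 
the test's exact code; getReplicaState is a hypothetical helper):

{code}
// If the DN finishes writing before the first poll, the replica has already
// moved past TEMPORARY, and this loop can only end in the timeout.
final long timeoutMillis = 4000;  // approximate
long start = System.currentTimeMillis();
while (getReplicaState(block) != ReplicaState.TEMPORARY) {
  if (System.currentTimeMillis() - start > timeoutMillis) {
    fail("Was waiting too long for a replica to become TEMPORARY");
  }
  Thread.sleep(100);
}
{code}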

> TestBlockReport times out intermittently
> 
>
> Key: HDFS-953
> URL: https://issues.apache.org/jira/browse/HDFS-953
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>
> TestBlockReport appears to occasionally time out with "Was waiting too long 
> for a replica to become TEMPORARY". Looks like something similar to HDFS-733
> Test failure here:
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockReport/blockReport_08/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830313#action_12830313
 ] 

Todd Lipcon commented on HDFS-909:
--

I think we're talking about different things. I'm discussing the saveFSImage 
that is called by
FSNamesystem.saveNamespace, where you found the race above.

FSImage.saveFSImage has this code:
{noformat}
1240   if (dirType.isOfType(NameNodeDirType.IMAGE))
1241 saveFSImage(getImageFile(sd, NameNodeFile.IMAGE_NEW));
1242   if (dirType.isOfType(NameNodeDirType.EDITS)) {
1243 editLog.createEditLogFile(getImageFile(sd, NameNodeFile.EDITS));
1244 File editsNew = getImageFile(sd, NameNodeFile.EDITS_NEW);
1245 if (editsNew.exists())
1246   editLog.createEditLogFile(editsNew);
1247   }
{noformat}

On line 1243 we truncate EDITS. Then if EDITS_NEW exists, we truncate it on 
1246. All of this happens when
the NN is in safe mode, so there shouldn't be any new edits coming in in the 
first place.

I'm contending that lines 1243 and 1245 should both be deleted. We should 
always create the image as IMAGE_NEW (line 1241). Touching EDITS seems 
incorrect - what if the order of storage dirs is EDITS then IMAGE, so we run 
line 1243, kill our current edit log, and then crash before saving the current 
image?

(This is possibly orthogonal to the issue you raised - even though there are no 
edits, there can be an ongoing sync. I think the NN should call 
editLog.waitForSyncToFinish before entering safe mode to avoid this issue; see 
the sketch below.)
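
A rough sketch of the ordering being proposed (hypothetical, not a patch; it 
reuses the names from the snippet above, and the iteration over storage 
directories is assumed):

{code}
enterSafeMode();
editLog.waitForSyncToFinish();  // drain any in-flight logSync() first
for (StorageDirectory sd : dirs) {
  // Write every image copy as IMAGE_NEW before touching any edit log, so a
  // crash mid-save can never leave us with neither an image nor the edits.
  if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE))
    saveFSImage(getImageFile(sd, NameNodeFile.IMAGE_NEW));
}
// Only after all images are safely written are the edit logs reset/rolled.
{code}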

> Race condition between rollEditLog or rollFSImage and FSEditsLog.write 
> operations corrupts edits log
> -
>
> Key: HDFS-909
> URL: https://issues.apache.org/jira/browse/HDFS-909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
> Environment: CentOS
>Reporter: Cosmin Lehene
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
> Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
> hdfs-909.txt
>
>
> Closing the edits log file can race with a write to the edits log file, 
> resulting in the OP_INVALID end-of-file marker first being overwritten by the 
> concurrent threads (in setReadyToFlush) and then removed twice from the 
> buffer, losing a good byte from the edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush() 
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR 
> Any FSEditLog.write
> {code}
> Access to the edits flush operations is synchronized only at the 
> FSEdits.logSync() method level. However, at a lower level, access to 
> EditsLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT 
> synchronized. These can be called from concurrent threads as in the example 
> above.
> So if a rollEditLog or rollFSImage happens at the same time as a write 
> operation, they can race in EditLogFileOutputStream.setReadyToFlush, which 
> will overwrite the last byte (normally the FSEditsLog.OP_INVALID which is the 
> "end-of-file marker") and then remove it twice (once from each thread) in 
> flushAndSync()! Hence a valid byte will be missing from the edits log, which 
> leads to a silent SecondaryNameNode failure and a full HDFS failure upon 
> cluster restart. 
> We got to this point after investigating a corrupted edits file that made 
> HDFS unable to start with 
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but 
> writables.length is 768. 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-894) DatanodeID.ipcPort is not updated when existing node re-registers

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830309#action_12830309
 ] 

Todd Lipcon commented on HDFS-894:
--

Filed HDFS-953 for the testpatch failure seen above.

I think this is ready to commit.

> DatanodeID.ipcPort is not updated when existing node re-registers
> -
>
> Key: HDFS-894
> URL: https://issues.apache.org/jira/browse/HDFS-894
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-894.txt
>
>
> In FSNamesystem.registerDatanode, it checks if a registering node is a 
> reregistration of an old one based on storage ID. If so, it simply updates 
> the old one with the new registration info. However, the new ipcPort is lost 
> when this happens.
> I produced manually this by setting up a DN with IPC port set to 0 (so it 
> picks an ephemeral port) and then restarting the DN. At this point, the NN's 
> view of the ipcPort is stale, and clients will not be able to achieve 
> pipeline recovery.
> This should be easy to fix and unit test, but not sure when I'll get to it, 
> so anyone else should feel free to grab it if they get to it first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-953) TestBlockReport times out intermittently

2010-02-05 Thread Todd Lipcon (JIRA)
TestBlockReport times out intermittently


 Key: HDFS-953
 URL: https://issues.apache.org/jira/browse/HDFS-953
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 0.22.0
Reporter: Todd Lipcon


TestBlockReport appears to occasionally time out with "Was waiting too long for 
a replica to become TEMPORARY". Looks like something similar to HDFS-733
Test failure here:
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/111/testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockReport/blockReport_08/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-833) TestDiskError.testReplicationError fails with locked storage error

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830297#action_12830297
 ] 

Todd Lipcon commented on HDFS-833:
--

TestDatanodeBlockScanner failed with this same error here:

http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/112/testReport/org.apache.hadoop.hdfs/TestDatanodeBlockScanner/testDatanodeBlockScanner/

{noformat}
2010-02-04 07:08:23,803 INFO  common.Storage (Storage.java:lock(523)) - Cannot 
lock storage 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h2.grid.sp2.yahoo.net/trunk/build/test/data/dfs/name1.
 The directory is already locked.
{noformat}
though with no stack trace - the test eventually failed with a 
FileNotFoundException:

{noformat}
java.io.FileNotFoundException: 
http://localhost:53291/blockScannerReport?listblocks
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1288)
at org.apache.hadoop.hdfs.DFSTestUtil.urlGet(DFSTestUtil.java:286)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.waitForVerification(TestDatanodeBlockScanner.java:70)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testDatanodeBlockScanner(TestDatanodeBlockScanner.java:129)
{noformat}

(I assume because the NN got in a bad state due to locked storage)

> TestDiskError.testReplicationError fails with locked storage error 
> ---
>
> Key: HDFS-833
> URL: https://issues.apache.org/jira/browse/HDFS-833
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: gary murry
>Priority: Blocker
>
> The current build is failing with on TestDiskError.testReplication with the 
> following error:
> Cannot lock storage 
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/test/data/dfs/name1.
>  The directory is already locked.
>  http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/171/ 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-914) Refactor DFSOutputStream and DFSInputStream out of DFSClient

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830298#action_12830298
 ] 

Todd Lipcon commented on HDFS-914:
--

The failures seem to be the same as HDFS-615 and HDFS-833 - the NN storage has 
an "already locked" error, and then the test fails because the NN isn't in a 
good state.

> Refactor DFSOutputStream and DFSInputStream out of DFSClient
> 
>
> Key: HDFS-914
> URL: https://issues.apache.org/jira/browse/HDFS-914
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-914.txt, hdfs-914.txt, hdfs-914.txt
>
>
> I'd like to propose refactoring DFSClient to extract DFSOutputStream and 
> DFSInputStream into a new org.apache.hadoop.hdfs.client package. DFSClient 
> has become unmanageably large, containing 8 inner classes and approaching 
> 4 kloc. Factoring out the non-static inner classes will also make them easier 
> to test in isolation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830284#action_12830284
 ] 

Jakob Homan commented on HDFS-938:
--

For the 938 backport, it looks like you got all the references in HDFS. Since 
this patch is being backported in three pieces, rather than the usual one, I 
have a question as to whether it's correct that 
org/apache/hadoop/security/TestGroupMappingServiceRefresh.java is being patched 
here.

> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, contrib.ivy.jackson.patch-3, 
> HDFS-938-BP20-1.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830264#action_12830264
 ] 

Boris Shkolnik commented on HDFS-938:
-

Moving the ivy configuration stuff to HADOOP-6544.


> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, contrib.ivy.jackson.patch-3, 
> HDFS-938-BP20-1.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-938:


Attachment: contrib.ivy.jackson.patch-3

> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, contrib.ivy.jackson.patch-3, 
> HDFS-938-BP20-1.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-938:


Attachment: contrib.ivy.jackson.patch-1

> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> contrib.ivy.jackson.patch-1, HDFS-938-BP20-1.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-938) Replace calls to UGI.getUserName() with UGI.getShortUserName()

2010-02-05 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-938:


Attachment: contrib.ivy.jackson.patch-1

Added eclipse.template/.classpath to include the JSON libs.

> Replace calls to UGI.getUserName() with UGI.getShortUserName()
> --
>
> Key: HDFS-938
> URL: https://issues.apache.org/jira/browse/HDFS-938
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: contrib.ivy.jackson.patch, contrib.ivy.jackson.patch-1, 
> HDFS-938-BP20-1.patch, HDFS-938.patch
>
>
> HADOOP-6526 details why UGI.getUserName() will not work to identify users. 
> Until the proposed UGI.getLocalName() is implemented, calls to getUserName() 
> should be replaced with the short name. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-935) Real user in delegation token.

2010-02-05 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-935:
--

Attachment: HDFS-935.5.patch

> Real user in delegation token.
> --
>
> Key: HDFS-935
> URL: https://issues.apache.org/jira/browse/HDFS-935
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-935.3.patch, HDFS-935.5.patch
>
>
> The delegation token should also contain the real user which got it issued on 
> behalf of an effective user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-914) Refactor DFSOutputStream and DFSInputStream out of DFSClient

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830208#action_12830208
 ] 

Todd Lipcon commented on HDFS-914:
--

The failed tests were unrelated - those same errors have shown up on a lot of 
test-builds recently. I ran TestDatanodeBlockScanner locally on a patched tree 
and it passed. Sorry that I didn't comment here.

> Refactor DFSOutputStream and DFSInputStream out of DFSClient
> 
>
> Key: HDFS-914
> URL: https://issues.apache.org/jira/browse/HDFS-914
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-914.txt, hdfs-914.txt, hdfs-914.txt
>
>
> I'd like to propose refactoring DFSClient to extract DFSOutputStream and 
> DFSInputStream into a new org.apache.hadoop.hdfs.client package. DFSClient 
> has become unmanageably large, containing 8 inner classes and approaching 
> 4 kloc. Factoring out the non-static inner classes will also make them easier 
> to test in isolation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-914) Refactor DFSOutputStream and DFSInputStream out of DFSClient

2010-02-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830203#action_12830203
 ] 

Konstantin Shvachko commented on HDFS-914:
--

This did not get approval from Hudson and was still committed. I don't see any 
comments about failed tests, and I don't see any jiras filed about that.

> Refactor DFSOutputStream and DFSInputStream out of DFSClient
> 
>
> Key: HDFS-914
> URL: https://issues.apache.org/jira/browse/HDFS-914
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-914.txt, hdfs-914.txt, hdfs-914.txt
>
>
> I'd like to propose refactoring DFSClient to extract DFSOutputStream and 
> DFSInputStream into a new org.apache.hadoop.hdfs.client package. DFSClient 
> has become unmanageably large, containing 8 inner classes and approaching 
> 4 kloc. Factoring out the non-static inner classes will also make them easier 
> to test in isolation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-945) Make NameNode resilient to DoS attacks (malicious or otherwise)

2010-02-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830200#action_12830200
 ] 

Allen Wittenauer commented on HDFS-945:
---

QoS (which is really what we're talking about here) is better done at the 
application layer, IMO. Passing this work off to an already overworked 
iptables (which is providing security, since Hadoop doesn't have much of any) 
is an idea that won't scale, especially at Yahoo! levels.
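To make "application layer" concrete, a hedged sketch of per-user throttling
(all names hypothetical; a token bucket the server could consult before
servicing an expensive call such as a large listStatus):

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical per-user rate limiter: each user gets a token bucket
    // refilled at a fixed rate; calls are rejected when the bucket is empty.
    class PerUserThrottle {
      private static final long CAPACITY = 100;       // max burst per user
      private static final long REFILL_PER_SEC = 10;  // sustained rate

      private static class Bucket {
        long tokens = CAPACITY;
        long lastRefillMs = System.currentTimeMillis();
      }

      private final ConcurrentHashMap<String, Bucket> buckets =
          new ConcurrentHashMap<String, Bucket>();

      // Returns true if the user may proceed with one more call.
      boolean tryAcquire(String user) {
        Bucket b = buckets.get(user);
        if (b == null) {
          buckets.putIfAbsent(user, new Bucket());
          b = buckets.get(user);
        }
        synchronized (b) {
          long now = System.currentTimeMillis();
          long added = (now - b.lastRefillMs) * REFILL_PER_SEC / 1000;
          if (added > 0) {
            b.tokens = Math.min(CAPACITY, b.tokens + added);
            b.lastRefillMs = now;
          }
          if (b.tokens > 0) {
            b.tokens--;
            return true;
          }
          return false;
        }
      }
    }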

> Make NameNode resilient to DoS attacks (malicious or otherwise)
> ---
>
> Key: HDFS-945
> URL: https://issues.apache.org/jira/browse/HDFS-945
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Arun C Murthy
>
> We've seen defective applications cause havoc on the NameNode, e.g. by 
> doing 100k+ 'listStatus' calls on very large directories (60k files), etc.
> I'd like to start a discussion around how we prevent such applications, and 
> possibly malicious ones in the future, from taking down the NameNode.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830148#action_12830148
 ] 

Todd Lipcon commented on HDFS-951:
--

Which version are you seeing this issue in? I noticed something similar when 
working on HDFS-915 on trunk, but I haven't seen it on 0.20.1.

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the DataStreamer, the logic 
>  if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
> continue;
>   }
> will just let closeInternal() set closed=true.
> And the DFSOutputStream will not get a chance to clean up. It will throw an 
> exception or return null on subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830136#action_12830136
 ] 

He Yongqiang commented on HDFS-951:
---

Throwing an exception on future close() and write() calls is perfectly fine.

I am referring to the second one. 
Also, the file will be left in an incomplete/being-created state if that 
DFSClient instance does not get a chance to close (which is common for many 
daemon apps that use HDFS as the backend). 

> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the DataStreamer, the logic 
>  if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
> continue;
>   }
> will just let closeInternal() set closed=true.
> And the DFSOutputStream will not get a chance to clean up. It will throw an 
> exception or return null on subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-951) DFSClient should handle all nodes in a pipeline failed.

2010-02-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830125#action_12830125
 ] 

dhruba borthakur commented on HDFS-951:
---

If all datanodes in the pipeline are dead, then the application cannot write 
to the file anymore. (This can be improved, of course.) Are you saying that 
throwing exceptions from the write/close calls (after all datanodes in the 
pipeline have failed) is a problem?

Or are you saying that when all datanodes in the pipeline fail, all resources 
associated with that OutputStream should be automatically released?
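For the second interpretation, a minimal sketch of what releasing that state
could look like (hypothetical names, written outside the real DFSOutputStream;
the point is only that write()/close() fail fast afterwards instead of leaking
queued packets):

    import java.io.IOException;
    import java.util.LinkedList;

    class StreamerState {
      private final LinkedList<Object> dataQueue = new LinkedList<Object>();
      private final LinkedList<Object> ackQueue = new LinkedList<Object>();
      private volatile boolean closed = false;
      private volatile IOException lastException;

      // Called once every datanode in the pipeline has failed.
      synchronized void abandonPipeline(IOException cause) {
        lastException = cause;  // rethrown on the next write() or close()
        dataQueue.clear();      // release unacknowledged packets
        ackQueue.clear();
        closed = true;          // later write() throws, close() is a no-op
      }

      void checkOpen() throws IOException {
        if (closed) {
          throw lastException != null ? lastException
              : new IOException("all datanodes in the pipeline failed");
        }
      }
    }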


> DFSClient should handle all nodes in a pipeline failed.
> ---
>
> Key: HDFS-951
> URL: https://issues.apache.org/jira/browse/HDFS-951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> processDatanodeError -> setupPipelineForAppendOrRecovery will set 
> streamerClosed to true if all nodes in the pipeline have failed in the past, 
> and just return.
> Back in run() in the DataStreamer, the logic 
>  if (streamerClosed || hasError || dataQueue.size() == 0 || !clientRunning) {
> continue;
>   }
> will just let closeInternal() set closed=true.
> And the DFSOutputStream will not get a chance to clean up. It will throw an 
> exception or return null on subsequent write/close calls.
> This leaves the file being written in an incomplete state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-898) Sequential generation of block ids

2010-02-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829998#action_12829998
 ] 

Konstantin Shvachko commented on HDFS-898:
--

Yes, the birthday problem is similar, but not the same. In BDP you project 
yyyy/mm/dd to mm/dd, but in BDP the years are arbitrary, while in our case the 
yyyy-s are limited to a certain number, as Nicholas mentioned. 
Also, birthdays do not uniquely identify a person, while in our case all 
blocks have unique ids.
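For reference, the standard birthday-problem estimate (a textbook
approximation, not taken from the attached analysis): drawing n ids uniformly
at random from a space of size N gives

    P(collision) ~= 1 - e^{-n^2 / (2N)}

so with 64-bit ids (N = 2^64), n = 2^32 allocated blocks already puts the
collision probability near 1 - e^{-1/2}, about 39%. A sequential generator
sidesteps this entirely.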

> Sequential generation of block ids
> --
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: DuplicateBlockIds.patch, HighBitProjection.pdf
>
>
> This is a proposal to replace random generation of block ids with a 
> sequential generator in order to avoid block id reuse in the future.
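A minimal sketch of such a generator (hypothetical; a real implementation
would also have to persist the counter in the image/edit log so ids survive a
NameNode restart):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sequential block id source: ids increase monotonically,
    // so none is ever reused as long as the last value is persisted and
    // restored on restart.
    class SequentialBlockIdGenerator {
      private final AtomicLong lastId;

      SequentialBlockIdGenerator(long lastPersistedId) {
        this.lastId = new AtomicLong(lastPersistedId);
      }

      long nextBlockId() {
        return lastId.incrementAndGet();
      }
    }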

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.