[jira] Commented: (HDFS-417) Improvements to Hadoop Thrift bindings

2009-07-01 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726134#action_12726134
 ] 

dhruba borthakur commented on HDFS-417:
---

Is it possible to enhance this patch so that the thrift-hdfs server can be run 
separately from the NN?

> Improvements to Hadoop Thrift bindings
> --
>
> Key: HDFS-417
> URL: https://issues.apache.org/jira/browse/HDFS-417
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/thriftfs
> Environment: Tested under Linux x86-64
>Reporter: Carlos Valiente
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: all.diff, BlockManager.java, build_xml.diff, 
> DefaultBlockManager.java, DFSBlockManager.java, gen.diff, 
> hadoop-4707-31c331.patch.gz, HADOOP-4707-55c046a.txt, hadoop-4707-6bc958.txt, 
> hadoop-4707-867f26.txt.gz, HADOOP-4707.diff, HADOOP-4707.patch, 
> HADOOP-4707.patch, hadoopfs_thrift.diff, hadoopthriftapi.jar, 
> HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py, 
> hdfs_py_venky.diff, libthrift.jar, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> # Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them 
> consistent (in particular FileStatus.{blockReplication,blockSize} vs 
> FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a 
> Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2) 
> the fact that we're dealing with Hadoop is already explicit in the namespace. 
>  The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
> *Perl*:
> use HadoopFS;
> my $client = HadoopFS::FileSystemClient->new(..);
>  _instead of:_
> my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
> *Python*:
> from hadoopfs import FileSystem
> client = FileSystem.Client(..)
> _instead of_
> from hadoopfs import ThriftHadoopFileSystem
> client = ThriftHadoopFileSystem.Client(..)
> (See also the attached diff [^scripts_hdfs_py.diff] for the
>  new version of 'scripts/hdfs.py').
> *C++*:
> hadoopfs::FileSystemClient client(..);
>  _instead of_:
> hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing 
> with a Thrift object, and its purpose (to act as a handle for file 
> operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and 
> consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, 
> MalformedInputException and ThriftIOException/IOException, to improve 
> compatibility of existing clients with future versions of the interface which 
> might add new fields to those objects (like stack traces for the exception 
> types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in 
> [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also 
> attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the 
> Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java 
> (see attachment [^build_xml.diff]), and modified HadoopThriftServer.java to 
> make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be 
> regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant 
> target.
> The whole changeset is also included as [^all.diff].
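The renames and versioned fields described above can be pictured with a short IDL fragment. The following is a hypothetical sketch only, not the attached [^hadoopfs_thrift.diff]: the structs, methods and field ids shown here are illustrative.

```thrift
/*
 * Hypothetical sketch of hadoopfs.thrift after the changes described
 * above; struct/service/method names and field ids are illustrative.
 */
namespace java org.apache.hadoop.thriftfs.api
namespace py hadoopfs
namespace perl HadoopFS
namespace cpp hadoopfs

exception IOException {          // was ThriftIOException
  1: string message,             // explicit field ids ("version tags") let
}                                // future versions add fields compatibly

struct FileHandle {              // was ThriftHandle
  1: i64 id,
}

struct FileStatus {
  1: string path,
  2: i64 length,
  3: bool isDir,
  4: i16 blockReplication,       // camelCase, was block_replication
  5: i64 blockSize,              // camelCase, was blocksize
}

service FileSystem {             // was ThriftHadoopFileSystem
  FileHandle create(1: string path) throws (1: IOException e),
  FileStatus stat(1: string path) throws (1: IOException e),
}
```

With the shorter names, generated client code reads as `HadoopFS::FileSystemClient` (Perl) or `hadoopfs::FileSystemClient` (C++), as shown in the comparison above.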

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-414) add fuse-dfs to src/contrib/build.xml test target

2009-07-01 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-414:
--

Status: Open  (was: Patch Available)

Patch does not merge with latest trunk.

> add fuse-dfs to src/contrib/build.xml test target
> -
>
> Key: HDFS-414
> URL: https://issues.apache.org/jira/browse/HDFS-414
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Pete Wyckoff
>Assignee: Pete Wyckoff
>Priority: Minor
> Attachments: HADOOP-4644.txt
>
>
> Since the contrib/build.xml test target now specifically includes contrib 
> projects rather than all of them, fuse-dfs needs to be added.
> Note that fuse-dfs' test target is gated on -Dfusedfs=1 and -Dlibhdfs=1, so 
> just running ant test-contrib will not actually trigger it.
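The property gating described above can be sketched in Ant. This is a hypothetical illustration of the pattern, not the actual fuse-dfs build.xml; the target and script names are made up.

```xml
<!-- Hypothetical sketch of a property-gated contrib test target: the
     "if" attribute means the target body only runs when -Dfusedfs=1
     was passed on the ant command line, matching the behaviour
     described above. Names are illustrative. -->
<target name="test" if="fusedfs">
  <fail unless="libhdfs"
        message="fuse-dfs tests also require -Dlibhdfs=1"/>
  <exec executable="sh" dir="${basedir}" failonerror="true">
    <arg value="test/fuse_dfs_test.sh"/>
  </exec>
</target>
```

Because `if`/`unless` test only whether a property is *set*, `-Dfusedfs=0` would still enable the target; that is standard Ant behaviour.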




[jira] Updated: (HDFS-301) Provide better error messages when fs.default.name is invalid

2009-07-01 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-301:
--

Status: Open  (was: Patch Available)

@Steve: this patch does not merge with HDFS trunk anymore.

> Provide better error messages when fs.default.name is invalid
> -
>
> Key: HDFS-301
> URL: https://issues.apache.org/jira/browse/HDFS-301
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-5095-1.patch
>
>
> This is the follow-on to HADOOP-5687: it's not enough to detect bad URIs; we 
> need good error messages and a set of tests to make sure everything works as 
> intended.
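The kind of error message being asked for can be sketched in a few lines. This is an illustrative Python sketch under assumed rules (the real checks live in Hadoop's Java code, and the function name is made up):

```python
from urllib.parse import urlparse

def check_default_name(value):
    """Return a specific, actionable error message for an invalid
    fs.default.name-style URI, or None if it looks OK.

    Illustrative sketch only: not the actual Hadoop validation logic.
    """
    if not value:
        return "fs.default.name is unset; expected something like hdfs://host:port"
    parsed = urlparse(value)
    if not parsed.scheme:
        return ("fs.default.name '%s' has no scheme; "
                "expected something like hdfs://host:port" % value)
    if parsed.scheme == "hdfs" and not parsed.netloc:
        return "fs.default.name '%s' is missing an authority (host:port)" % value
    return None
```

The point is that each failure mode gets its own message naming the offending value, rather than a generic "invalid URI".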




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-01 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726329#action_12726329
 ] 

dhruba borthakur commented on HDFS-385:
---

@Tom: I would appreciate it a lot if you could review it one more time. Thanks.

@Jingkei: can you use HDFS-207 to address the issue you have raised?

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.
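The default heuristic just described, placed behind a pluggable interface, might look roughly like the following. This is a hypothetical Python sketch, not the attached Java patch; the class and method names (loosely echoing the chooseTarget API discussed elsewhere on this issue) are illustrative.

```python
import random
from abc import ABC, abstractmethod

class BlockPlacementPolicy(ABC):
    """Pluggable placement interface (illustrative sketch)."""

    @abstractmethod
    def choose_targets(self, writer_rack, racks, replication):
        """Return up to `replication` (rack, node) targets for a new block."""

class DefaultPlacementPolicy(BlockPlacementPolicy):
    """Mimics the default heuristic: local rack, then a remote rack,
    then another node on that same remote rack."""

    def choose_targets(self, writer_rack, racks, replication):
        # racks maps rack name -> list of node names
        targets = [(writer_rack, random.choice(racks[writer_rack]))]
        if replication >= 2:
            # second replica on a random remote rack
            remote = random.choice([r for r in racks if r != writer_rack])
            second = random.choice(racks[remote])
            targets.append((remote, second))
            if replication >= 3:
                # third replica on a different node of that same rack
                others = [n for n in racks[remote] if n != second]
                targets.append((remote, random.choice(others)))
        return targets
```

Swapping in a different subclass (e.g. one optimized for a particular failure model) would then not require touching the NameNode itself.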




[jira] Updated: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-04 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-385:
--

Fix Version/s: 0.21.0
   Status: Patch Available  (was: Open)

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727842#action_12727842
 ] 

dhruba borthakur commented on HDFS-385:
---

This patch is ready for commit. I would like to see if anybody else would like 
to review this JIRA.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-297) Implement a pure Java CRC32 calculator

2009-07-07 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728534#action_12728534
 ] 

dhruba borthakur commented on HDFS-297:
---

This looks good. Does it make sense to run the unit test for a longer period of 
time (more than the current 24 iterations), just to get more coverage?

Is this new code written from scratch, or has it been taken from some Apache 
project? (Just making sure that there aren't any GPL/LGPL issues.)
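For context, the table-driven pure CRC32 under review can be sketched compactly and cross-checked against a reference implementation over many random iterations, which is essentially the extended-coverage test suggested above. This is an illustrative Python sketch, not the attached PureJavaCrc32.java.

```python
import random
import zlib

def make_table():
    """Build the standard 256-entry CRC-32 lookup table
    (reflected polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()

def crc32(data, crc=0):
    """Pure table-driven CRC-32, byte at a time."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

# Cross-check against the zlib reference over many random inputs,
# echoing the "run more iterations for coverage" idea above.
rng = random.Random(1)
for _ in range(1000):
    data = bytes(rng.randrange(256) for _ in range(rng.randrange(128)))
    assert crc32(data) == zlib.crc32(data)
```

A Java version would use the same table but typically unrolls the loop and processes several bytes per step, which is where the attached patch gets its speed.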

> Implement a pure Java CRC32 calculator
> --
>
> Key: HDFS-297
> URL: https://issues.apache.org/jira/browse/HDFS-297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Todd Lipcon
> Attachments: crc32-results.txt, hadoop-5598-evil.txt, 
> hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, hdfs-297.txt, 
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in CRC calculation. In particular, it was spending 5 seconds in CRC 
> calculation out of a total of 6 seconds for the write. I suspect that it is 
> the Java-JNI boundary that is causing us grief.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-08 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Attachment: softMount8.txt

I incorporated both of Konstantin's review comments and added a unit test.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> much? It could probably be something like 10 minutes.
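The proposed timeout amounts to bounding the retry loop inside close(). A language-neutral sketch of that shape follows; this is hypothetical, not the DFSClient API, and the injected clock/sleep parameters exist only to make the sketch testable.

```python
import time

class NotYetReplicatedError(Exception):
    """Stand-in for the NotYetReplicated exception mentioned above."""

def close_with_timeout(complete_file, timeout_s=600, retry_interval_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry complete_file() until it succeeds or the deadline passes,
    instead of looping forever. 600 s matches the ~10 minutes
    suggested in the issue description."""
    deadline = clock() + timeout_s
    while True:
        try:
            return complete_file()
        except NotYetReplicatedError:
            if clock() >= deadline:
                raise TimeoutError(
                    "close() gave up after %ds" % timeout_s)
            sleep(retry_interval_s)
```

The open design question in the issue is only the deadline value, not the loop shape: too short and slow-but-healthy pipelines fail, too long and a stuck client blocks the user indefinitely.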




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-08 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Status: Patch Available  (was: Open)

Trigger Hudson QA tests.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> much? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728594#action_12728594
 ] 

dhruba borthakur commented on HDFS-385:
---

@Hong:
* javadoc for chooseTarget contains an unused parameter: I will fix.
* two versions of chooseTarget: I will make the default as you suggested.
* asymmetry for the chooseTarget that takes a list of DatanodeDescriptor: I 
would like to leave it as is, because lots of code in the namenode depends on 
this behaviour. Is this OK with you?

@Matei:
* HashMap vs HashSet: not much difference, as explained by Nicholas. Also, the 
last path does not have any excludedNodes parameters.
* verifyBlockPlacement part of the abstract API? This method is used by fsck 
to verify that a block satisfies the placement policy.
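The fsck-style check mentioned in the last bullet can be sketched in a few lines. This is an illustrative sketch under an assumed minimum-racks policy; the function name and signature are made up, not the patch's verifyBlockPlacement API.

```python
def verify_block_placement(replica_nodes, rack_of, min_racks=2):
    """Illustrative fsck-style placement check: return how many
    additional racks a block's replicas would need to satisfy a
    minimum-rack policy (0 means the policy is satisfied).

    replica_nodes: node names holding replicas of one block
    rack_of: mapping from node name to rack name
    """
    distinct_racks = {rack_of[node] for node in replica_nodes}
    return max(0, min_racks - len(distinct_racks))
```

fsck would call something like this per block and report any block with a nonzero result as violating the configured policy.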


> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-297) Implement a pure Java CRC32 calculator

2009-07-08 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728838#action_12728838
 ] 

dhruba borthakur commented on HDFS-297:
---

> During testing I ran on many more iterations than just the 24. W

Thanks for the info.

> I think we now have the fastest crc32 in the wes

Way to go!

> Implement a pure Java CRC32 calculator
> --
>
> Key: HDFS-297
> URL: https://issues.apache.org/jira/browse/HDFS-297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Todd Lipcon
> Attachments: crc32-results.txt, hadoop-5598-evil.txt, 
> hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, hdfs-297.txt, 
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in CRC calculation. In particular, it was spending 5 seconds in CRC 
> calculation out of a total of 6 seconds for the write. I suspect that it is 
> the Java-JNI boundary that is causing us grief.




[jira] Updated: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-385:
--

Status: Open  (was: Patch Available)

Cancelling patch to incorporate Hong's review comments.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Updated: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-385:
--

Attachment: BlockPlacementPluggable5.txt

Incorporated Hong's review comments. I did not change the "asymmetry for 
chooseTarget which takes a list of DatanodeDescriptor" because this list is 
actually generated by the BlockManager and represents the datanodes where the 
block currently resides. It would require one additional memory allocation if 
I changed this list to an array.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Updated: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-385:
--

Status: Patch Available  (was: Open)

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728865#action_12728865
 ] 

dhruba borthakur commented on HDFS-385:
---

{quote}

 +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

{quote}

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728986#action_12728986
 ] 

dhruba borthakur commented on HDFS-385:
---

The usage model is to define a policy for the entire cluster when you create 
the cluster. This is especially useful when you have an HDFS instance on an 
Amazon EC2 instance, for example. This is not intended to be dynamic in any 
shape or form for a specified cluster.

> Existing files will retain policy 1. Fsck will report violations for policy 2 
> for the old files; correct?
Correct.

> It would be an admin error to configure NN and Balancer with different 
> policies; correct? There is no check for this; correct?
Correct.  

> Q. The policy manager is global to the file system. Can it have its own 
> config to do different policies for different subtrees?
Sure it can. I do not have a use case right now that needs different policies 
for different files, but when it is required, we can always do that.

You could also use this to co-locate blocks of the same file on the same set of 
datanodes. But here again, I do not see a need for different policies for 
different files.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729453#action_12729453
 ] 

dhruba borthakur commented on HDFS-385:
---

I agree that this API might have to evolve over time. We should mark it as 
"unstable" in bold letters.

> In the past folks have complained that hadoop is too easy to misconfigure.

The default policy should work well for 99.9% of the people out there. Only a 
system admin can change the default policy, and one has to write Java code to 
implement a new policy, making it even tougher for most people to change the 
policy.

>Given the above should the system record the policy in the fsImage to prevent 
>it from being changed? Similarly should the balancer check to see if it has 
>the same policy as the NN?
This can be done. It is mostly to reduce configuration errors, right? If so, 
can we defer it until we see it become a problem?

> However the experimentation is useful and as long it does not impact the base 
> code in a negative way, we should be able to add such features to hadoop 
> after careful review.

Thanks. Please review the code and provide some feedback if you so desire.



> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on another 
> node of that same remote rack. This algorithm is baked into the NameNode's 
> code. It would be nice to make the block placement algorithm a pluggable 
> interface. This will allow experimentation with different placement 
> algorithms based on workloads, availability guarantees and failure models.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-09 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Status: Open  (was: Patch Available)

Re-triggering Hudson QA.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> much? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-09 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Fix Version/s: 0.21.0
   Status: Patch Available  (was: Open)

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> much? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-10 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729898#action_12729898
 ] 

dhruba borthakur commented on HDFS-278:
---

I do not think the Findbugs failure is caused by this patch. If anybody has 
any clues, please let me know.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-10 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729904#action_12729904
 ] 

dhruba borthakur commented on HDFS-385:
---

I am all for marking this API as experimental and subject to change at any time.

I will also open a JIRA (and patch) for exposing a fileid for an HDFS file.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the second 
> replica on a remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked into the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This would allow 
> experimentation with different placement algorithms based on workloads, 
> availability guarantees and failure models.




[jira] Created: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-10 Thread dhruba borthakur (JIRA)
HDFS should expose a fileid to uniquely identify a file
---

 Key: HDFS-487
 URL: https://issues.apache.org/jira/browse/HDFS-487
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur


HDFS should expose an id that uniquely identifies a file. This helps in 
developing applications that work correctly even when files are moved from one 
directory to another. A typical use-case is to make the Pluggable Block 
Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-10 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729913#action_12729913
 ] 

dhruba borthakur commented on HDFS-487:
---

A few preliminary requirements as I see them:

1. The fileid should be unique per path for the lifetime of a filesystem.
2. FileStatus should contain the fileid; typical applications like "hadoop dfs 
-ls" and "fsck" should be able to display the fileid.
3. There is no need for an API to map a fileid to a pathname.
4. Regular files as well as directories have valid fileids.
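The requirements above can be sketched in code. This is a minimal illustration, not the actual Hadoop API; the class and field names here are invented:

```java
// Hypothetical sketch of requirements 2 and 4: a FileStatus-like record that
// carries a fileid and an "ls"-style formatter that displays it.
public class FileStatusSketch {
    static final class Status {
        final String path;
        final long fileId;   // unique per path for the filesystem's lifetime (req. 1)
        final boolean isDir; // directories get valid fileids too (req. 4)
        Status(String path, long fileId, boolean isDir) {
            this.path = path; this.fileId = fileId; this.isDir = isDir;
        }
    }

    // An "ls"-style line that surfaces the fileid (req. 2).
    static String format(Status s) {
        return String.format("%s %d %s", s.isDir ? "d" : "-", s.fileId, s.path);
    }

    public static void main(String[] args) {
        System.out.println(format(new Status("/user/dhruba/data", 42L, false)));
    }
}
```

Note that, per requirement 3, the sketch deliberately offers no way to go from a fileid back to a pathname.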


> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-11 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730035#action_12730035
 ] 

dhruba borthakur commented on HDFS-278:
---

Thanks for the tip, Nicholas. I will compile the latest trunk from common and 
check in the files lib/hadoop-core-0.21.0-dev.jar and 
lib/hadoop-core-test-0.21.0-dev.jar.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Attachment: softMount9.txt

Merged patch with latest trunk

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Status: Patch Available  (was: Open)

Triggering HudsonQA tests

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

Status: Open  (was: Patch Available)

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730097#action_12730097
 ] 

dhruba borthakur commented on HDFS-278:
---

{quote}

[exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
{quote}

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730119#action_12730119
 ] 

dhruba borthakur commented on HDFS-278:
---

All unit tests passed except TestHDFSCLI and TestNNThroughputBenchmark. 

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Commented: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730190#action_12730190
 ] 

dhruba borthakur commented on HDFS-278:
---

The unit test that failed is TestHDFSCLI, and it also fails on hdfs trunk 
(without this patch). So this patch is ready to commit.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-278) Should DFS outputstream's close wait forever?

2009-07-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-278:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this.

> Should DFS outputstream's close wait forever?
> -
>
> Key: HDFS-278
> URL: https://issues.apache.org/jira/browse/HDFS-278
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch, softMount4.txt, softMount5.txt, softMount6.txt, 
> softMount7.txt, softMount8.txt, softMount9.txt
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing a {{NotYetReplicated}} exception, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.




[jira] Updated: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-14 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-487:
--

Attachment: fileid1.txt

This patch implements a 64-bit fileid per file. It uses the already existing 
generation stamp counter to populate this field.

One major question is left unanswered by this patch: what to do with files 
that were created by older versions of Hadoop. I think a good answer would be 
to assign them unique fileids.
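The scheme above can be illustrated with a toy sketch: a namespace that hands out 64-bit fileids from a monotonically increasing counter, mimicking the reuse of the generation stamp counter described in the comment. The class, method names, and starting value are invented for illustration; this is not the patch's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy namespace: each newly created file gets the next value of a shared
// 64-bit counter as its fileid, so ids are unique within the filesystem.
public class FileIdAllocator {
    private long generationStamp = 1000L;            // shared counter (invented start)
    private final Map<String, Long> idByPath = new HashMap<>();

    public synchronized long create(String path) {
        long id = ++generationStamp;                 // counter advances on every create
        idByPath.put(path, id);
        return id;
    }

    public synchronized Long idOf(String path) { return idByPath.get(path); }

    public static void main(String[] args) {
        FileIdAllocator nn = new FileIdAllocator();
        long a = nn.create("/a");
        long b = nn.create("/b");
        System.out.println(a != b);   // true: ids are unique per file
    }
}
```

Deleting and recreating a path would call create() again and yield a fresh id, which is the behavior discussed later in this thread.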

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-15 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731602#action_12731602
 ] 

dhruba borthakur commented on HDFS-487:
---

Using a globally unique UUID means that it has to be at least 128 bits, whereas 
an id that is unique within a cluster needs only 64 bits. This has some impact 
on the amount of memory needed by the NN. What do other people think about 
using 64-bit ids vs 128-bit ids?

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-15 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731661#action_12731661
 ] 

dhruba borthakur commented on HDFS-487:
---

I am of the opinion that it would be sufficient to go with a 64-bit fileid that 
is unique within the namespace.

Another use case: "distcp -update" looks at the modification time and length of 
a file to determine whether it should be copied. This is not ideal, especially 
if somebody deletes and recreates a file with different contents within the 
time precision of the clock used by HDFS. It will also not work when HDFS 
supports "truncate". (The modtime of a file can be set by an application.) 
Having a unique fileid makes distcp work accurately.
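The two comparison strategies can be contrasted in a short sketch. The class and field names are invented for illustration; this is not distcp's actual code. The (modtime, length) heuristic misses a delete-and-recreate that happens to preserve both values, while a fileid comparison catches it:

```java
// Sketch: deciding whether a destination file is stale, two ways.
public class DistcpCheck {
    static final class Meta {
        final long modTime, length, fileId;
        Meta(long modTime, long length, long fileId) {
            this.modTime = modTime; this.length = length; this.fileId = fileId;
        }
    }

    // The current heuristic: copy only if modtime or length differ.
    static boolean changedByHeuristic(Meta src, Meta dst) {
        return src.modTime != dst.modTime || src.length != dst.length;
    }

    // With fileids: a recreated file has a new id even if size/time match.
    static boolean changedByFileId(Meta src, Meta dst) {
        return src.fileId != dst.fileId || changedByHeuristic(src, dst);
    }

    public static void main(String[] args) {
        // Same modtime and length, but the source file was recreated (new fileid).
        Meta src = new Meta(1000L, 4096L, 7L);
        Meta dst = new Meta(1000L, 4096L, 5L);
        System.out.println(changedByHeuristic(src, dst)); // false: copy wrongly skipped
        System.out.println(changedByFileId(src, dst));    // true: recreation detected
    }
}
```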


> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Updated: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-07-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-200:
--

Attachment: fsyncConcurrentReaders12_20.txt

Here is another patch that prints out the exception stack trace when the 
ReplicationMonitor encounters an exception. If you can reproduce this failure 
case with this patch, that would be great.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
> fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731849#action_12731849
 ] 

dhruba borthakur commented on HDFS-487:
---

> So truncating a file would change the fileid?

Truncating a file does not change the fileid. There isn't an operation that can 
change the fileid of an existing file. The fileid is associated with a file at 
file creation time. If you delete a file and then recreate a file with the same 
pathname, the new file will get a new fileid. The reason I mention truncate is 
to exemplify the fact that the heuristic used by the "distcp -update" option 
might not work very well when HDFS supports truncate. "distcp -update" could 
use the fileid to reduce the probability of not detecting modified files.

> I am still not clear about block placement use case.. may be it can use id of 
> the first block (it comes for free).

The id of a block is a concatenation of a 64-bit blockid and a 64-bit 
generation stamp. An error while writing to a block causes the generation stamp 
of that block to be modified. So, the blockid of the first block of a file does 
not remain fixed for the lifetime of that file. That means it cannot be used 
as a unique identifier for a file.
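This instability can be shown with a toy sketch; the class and field names are invented, and the stamp-bump is simulated rather than driven by a real pipeline failure:

```java
// Toy illustration: a block's identity is (blockId, genStamp), and a write
// error bumps the generation stamp, so the identity of a file's first block
// changes even though the file itself is unchanged.
public class BlockIdentity {
    static final class Block {
        final long blockId;
        long genStamp;
        Block(long blockId, long genStamp) {
            this.blockId = blockId; this.genStamp = genStamp;
        }
        String identity() { return blockId + "_" + genStamp; }
    }

    public static void main(String[] args) {
        Block first = new Block(123L, 1001L);
        String before = first.identity();
        first.genStamp++;  // simulated pipeline error assigns a new stamp
        System.out.println(before.equals(first.identity())); // false: identity moved
    }
}
```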

> (3) separation of block management.

UUIDs probably make it somewhat future-proof, but we can also upgrade the 
unique-within-filesystem fileid to a globally unique fileid when the use case 
arises. Such an upgrade would be easy to do. (The tradeoff is using more memory 
in the NN.)

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732054#action_12732054
 ] 

dhruba borthakur commented on HDFS-487:
---

The API for pluggable block placement (HDFS-385) provides the pathname of the 
file to the block placement policy. The block placement policy can use the 
filename to determine what kind of placement algorithm to use for blocks in 
that file. This works well in the current NN design. However, if in the future 
we separate out the Block Manager from the NN, the Block Manager might not know 
the pathname to which a block belongs. In that case, the Block Manager will not 
be able to provide the filename when invoking the 
pluggable-block-placement-policy API. So, in some sense, using a fileid 
(instead of a filename) is future-proofing the API.

Again, to emphasize: HDFS-385 does not really need fileids, although they are 
good to have. The API designed in HDFS-385 should be marked as "experimental", 
and we can change it if/when the Block Manager is separated out from the NN. 
Which option do you prefer?

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732079#action_12732079
 ] 

dhruba borthakur commented on HDFS-487:
---

> Thanks Dhruba. It looks like the specifics of why a fileid is required (or if 
> it even helps) depend on future features. I think it is better to wait.

Thanks Raghu.

Sanjay: given Raghu's opinion, do you really think that HDFS-385 needs to 
change to use the fileid (rather than the filename)?

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use the fileid instead of the filename.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732269#action_12732269
 ] 

dhruba borthakur commented on HDFS-385:
---

The cookie approach has a major disadvantage. Let's assume that the cookie is 
implemented as a pathname for Hadoop 0.21. Then somebody builds an app against 
this API that parses the cookie as an HDFS pathname. For Hadoop 0.22, we change 
the cookie to be a fileid. The earlier application still compiles against the 
Hadoop 0.22 release, but at runtime the app will fail because it is unable to 
parse the cookie as a pathname. Instead, if we change the API signature for 
Hadoop 0.22 to reflect that the pathname is no longer available (a fileid is 
available instead), then the app will fail at compile time, which might be 
better than failing at runtime. No?
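The contrast can be sketched with two hypothetical interfaces (none of these names are from HDFS-385 itself): an opaque cookie fails only at runtime when its encoding changes, while a typed parameter surfaces the change at compile time.

```java
// Sketch: opaque-cookie API vs typed API for a block placement policy.
public class CookieVsTyped {
    interface CookiePolicy { void choose(String cookie); } // opaque: meaning is only a convention
    interface TypedPolicy  { void choose(long fileId);   } // explicit: misuse fails to compile

    // A 0.21-era app's assumption, baked into its cookie handling.
    static boolean looksLikePath(String cookie) { return cookie.startsWith("/"); }

    public static void main(String[] args) {
        CookiePolicy app = cookie -> {
            if (!looksLikePath(cookie))
                throw new IllegalArgumentException("not a path: " + cookie);
        };
        app.choose("/user/data");  // fine while the cookie is a pathname
        try {
            app.choose("1234");    // 0.22 switches the cookie to a fileid: runtime failure
        } catch (IllegalArgumentException e) {
            System.out.println("runtime failure: " + e.getMessage());
        }
        // With TypedPolicy, the same switch changes the method signature, so
        // the stale app stops compiling instead of failing in production.
    }
}
```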



> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the second 
> replica on a remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked into the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This would allow 
> experimentation with different placement algorithms based on workloads, 
> availability guarantees and failure models.




[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732270#action_12732270
 ] 

dhruba borthakur commented on HDFS-385:
---

> Shall we postpond the recording of the policy to another future Jira?

+1. We should mark this API as experimental in nature and avoid persisting it 
in the fsimage. 

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on the local rack, the second 
> replica on a remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked into the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This would allow 
> experimentation with different placement algorithms based on workloads, 
> availability guarantees and failure models.




[jira] Commented: (HDFS-492) Expose corrupt replica/block information

2009-07-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732801#action_12732801
 ] 

dhruba borthakur commented on HDFS-492:
---

This utility will be a great help for monitoring and processing corrupt blocks 
in the file system.

Is it possible to also make HDFS throw a CorruptBlockException (which can be a 
subclass of IOException) when an app tries to read a block that is corrupt? 
This helps in developing intelligent clients that can do a variety of things 
for this type of problem: maybe retrieve the file from an external location 
(maybe from tape) or from an alternate datacenter (if it is replicated there).

> Expose corrupt replica/block information
> 
>
> Key: HDFS-492
> URL: https://issues.apache.org/jira/browse/HDFS-492
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, name-node
>Affects Versions: 0.21.0
>Reporter: Bill Zeller
>Priority: Minor
> Attachments: hdfs-492-4.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> This adds two additional functions to FSNamesystem to provide more 
> information about corrupt replicas. It also adds two servlets to the namenode 
> that provide information (in JSON) about all blocks with corrupt replicas, as 
> well as information about a specific block. It also changes the file-browsing 
> servlet by adding a link from block ids to the above-mentioned block 
> information page.
> These JSON pages are designed to be used by client-side tools that wish to 
> analyze corrupt blocks/replicas. The only change to an existing (non-servlet) 
> class is described below.
> Currently, CorruptReplicasMap stores a map of corrupt replica information and 
> allows insertion and deletion. It also gives information about the corrupt 
> replicas for a specific block, but it does not allow iteration over all 
> corrupt blocks. Two additional functions will be added to FSNamesystem (which 
> will call BlockManager, which will call CorruptReplicasMap). The first will 
> return the size of the corrupt replicas map, which is the number of blocks 
> that have corrupt replicas (and may be less than the total number of corrupt 
> replicas, since a block can have multiple corrupt replicas). The second will 
> allow "paging" through a list of block ids that contain corrupt replicas:
> {{public synchronized List getCorruptReplicaBlockIds(int n, Long 
> startingBlockId)}}
> {{n}} is the number of block ids to return and {{startingBlockId}} is the 
> block id offset. To prevent a large number of items from being returned at 
> one time, {{n}} is constrained to 0 <= {{n}} <= 100. If {{startingBlockId}} 
> is null, up to {{n}} items are returned starting at the beginning of the 
> list. Ordering is enforced through the internal use of TreeMap in 
> CorruptReplicasMap.
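The paging contract described above can be sketched in plain Java. This is an illustrative sketch, not Hadoop code: the class name CorruptBlockPager and the markCorrupt helper are hypothetical; only the clamping of n, the null-offset behavior, and the TreeMap ordering are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of the proposed paging API. TreeMap keeps block ids sorted, which
// gives the stable ordering the paging contract relies on.
public class CorruptBlockPager {
    private final TreeMap<Long, String> corruptReplicas = new TreeMap<>();

    // Hypothetical helper standing in for CorruptReplicasMap insertion.
    public void markCorrupt(long blockId, String datanode) {
        corruptReplicas.put(blockId, datanode);
    }

    public synchronized List<Long> getCorruptReplicaBlockIds(int n, Long startingBlockId) {
        int limit = Math.max(0, Math.min(n, 100));   // constrain 0 <= n <= 100
        List<Long> page = new ArrayList<>(limit);
        // null startingBlockId means "start from the beginning of the list";
        // otherwise resume at the first id >= the given offset.
        Long key = (startingBlockId == null)
                ? (corruptReplicas.isEmpty() ? null : corruptReplicas.firstKey())
                : corruptReplicas.ceilingKey(startingBlockId);
        while (key != null && page.size() < limit) {
            page.add(key);
            key = corruptReplicas.higherKey(key);
        }
        return page;
    }
}
```

A caller pages through the list by passing the last id it saw (or the next larger one) as the new offset on each call.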




[jira] Commented: (HDFS-495) Hadoop FSNamesystem startFileInternal() getLease() has bug

2009-07-20 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733384#action_12733384
 ] 

dhruba borthakur commented on HDFS-495:
---

Absolutely right. This fix is posted as part of patch attached to HDFS-200

> Hadoop FSNamesystem startFileInternal() getLease() has bug
> --
>
> Key: HDFS-495
> URL: https://issues.apache.org/jira/browse/HDFS-495
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Ruyue Ma
>Priority: Minor
> Fix For: 0.20.1
>
>
> Original code:
> {code}
> //
> // If the file is under construction, then it must be in our
> // leases. Find the appropriate lease record.
> //
> Lease lease = leaseManager.getLease(new StringBytesWritable(holder));
> //
> // We found the lease for this file. And surprisingly the original
> // holder is trying to recreate this file. This should never occur.
> //
> if (lease != null) {
>   throw new AlreadyBeingCreatedException(
>       "failed to create file " + src + " for " + holder +
>       " on client " + clientMachine +
>       " because current leaseholder is trying to recreate file.");
> }
> {code}
> Problem: if another client (which holds some file leases) tries to recreate 
> the file under construction, it cannot trigger lease recovery.
> Reason: we should instead check:
> {code}
> if (new StringBytesWritable(holder).equals(pendingFile.clientName)) {
>   throw new AlreadyBeingCreatedException(
>       "failed to create file " + src + " for " + holder +
>       " on client " + clientMachine +
>       " because current leaseholder is trying to recreate file.");
> }
> {code}




[jira] Commented: (HDFS-495) Hadoop FSNamesystem startFileInternal() getLease() has bug

2009-07-21 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733738#action_12733738
 ] 

dhruba borthakur commented on HDFS-495:
---

Is it possible for you to post this patch and a unit test (instead of waiting 
for HDFS-200)? Thanks. Here is a page that describes how to write unit tests: 
http://wiki.apache.org/hadoop/HowToContribute

> Hadoop FSNamesystem startFileInternal() getLease() has bug
> --
>
> Key: HDFS-495
> URL: https://issues.apache.org/jira/browse/HDFS-495
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Ruyue Ma
>Priority: Minor
> Fix For: 0.20.1
>
>
> Original code:
> {code}
> //
> // If the file is under construction, then it must be in our
> // leases. Find the appropriate lease record.
> //
> Lease lease = leaseManager.getLease(new StringBytesWritable(holder));
> //
> // We found the lease for this file. And surprisingly the original
> // holder is trying to recreate this file. This should never occur.
> //
> if (lease != null) {
>   throw new AlreadyBeingCreatedException(
>       "failed to create file " + src + " for " + holder +
>       " on client " + clientMachine +
>       " because current leaseholder is trying to recreate file.");
> }
> {code}
> Problem: if another client (which holds some file leases) tries to recreate 
> the file under construction, it cannot trigger lease recovery.
> Reason: we should instead check:
> {code}
> if (new StringBytesWritable(holder).equals(pendingFile.clientName)) {
>   throw new AlreadyBeingCreatedException(
>       "failed to create file " + src + " for " + holder +
>       " on client " + clientMachine +
>       " because current leaseholder is trying to recreate file.");
> }
> {code}




[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734114#action_12734114
 ] 

dhruba borthakur commented on HDFS-435:
---

Very cool stuff! And the guide is very helpful. I have some questions about 
the user guide.

{quote}
pointcut callReceivePacket() :
  call (* OutputStream.write(..))
  && withincode (* BlockReceiver.receivePacket(..))
  // to further limit the application of this aspect a very narrow
  // 'target' can be used as follows
  //   && target(DataOutputStream)
  && !within(BlockReceiverAspects +);
{quote}

Can you please explain the above pointcut in detail: what it means, what its 
intention is, and whether things like "pointcut" and "withincode" are AspectJ 
constructs? Thanks.


> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will make it possible to increase test 
> coverage of error handling and recovery mechanisms, reduce reproduction time, 
> and increase the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base: 
> faults have to be injected at build time and have to be configurable, e.g. 
> all faults could be turned off, or only some of them would be allowed to 
> happen. Also, fault injection has to be separated from the production build.




[jira] Commented: (HDFS-496) Use PureJavaCrc32 in HDFS

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734121#action_12734121
 ] 

dhruba borthakur commented on HDFS-496:
---

For the record: PureJavaCrc32 computes the same CRC value as the current 
implementation, so this patch does not change the HDFS data format. Can you 
please link this with the corresponding JIRA in the Common project, because 
that JIRA has the performance numbers.

> Use PureJavaCrc32 in HDFS
> -
>
> Key: HDFS-496
> URL: https://issues.apache.org/jira/browse/HDFS-496
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-496.txt
>
>
> Common now has a pure java CRC32 implementation which is more efficient than 
> java.util.zip.CRC32. This issue is to make use of it.
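The "same CRC value" point above matters because checksums are persisted with block data, so a drop-in replacement must be bit-identical. As an illustration, CRC-32 has a well-known check value that any conforming implementation must reproduce. The sketch below uses only the JDK's java.util.zip.CRC32 (PureJavaCrc32 itself lives in Hadoop Common and implements the same java.util.zip.Checksum interface); the helper name CrcCheck is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcCheck {
    // The standard CRC-32 "check" value: CRC32("123456789") == 0xCBF43926.
    // Any alternative implementation (such as PureJavaCrc32, which
    // implements java.util.zip.Checksum) must produce the same values for
    // the on-disk data format to stay unchanged.
    public static long crcOf(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```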




[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734125#action_12734125
 ] 

dhruba borthakur commented on HDFS-200:
---

Hi Ruyue, your option of excluding specific datanodes (specified by the 
client) sounds reasonable. This might help in the case of network 
partitioning, where a specific client loses access to a set of datanodes while 
those datanodes are alive and well and able to send heartbeats to the 
namenode. Can you please create a separate JIRA for your proposed fix and 
attach your patch there? Thanks.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
> fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".




[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734307#action_12734307
 ] 

dhruba borthakur commented on HDFS-167:
---

Hi Bill, will it be possible for you to submit this as a patch and a unit 
test? Details are here: http://wiki.apache.org/hadoop/HowToContribute

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Priority: Minor
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
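The "retries left -1" line in the log above is the signature of a retry counter that is decremented but never used to terminate the loop. A correctly bounded loop checks the counter before each attempt, so it can never be observed below zero. The sketch below is not the actual DFSClient code; BoundedRetry and Attempt are hypothetical names used only to illustrate the pattern.

```java
// Minimal sketch of a correctly bounded retry loop (illustrative only,
// not DFSClient code): the remaining-retries counter is tested *before*
// each attempt, so the loop cannot run with a negative count.
public class BoundedRetry {
    interface Attempt { boolean tryOnce(); }

    /** Runs op until it succeeds or maxRetries attempts are exhausted;
     *  returns the number of attempts actually made. */
    public static int run(Attempt op, int maxRetries) {
        int attempts = 0;
        for (int retriesLeft = maxRetries; retriesLeft > 0; retriesLeft--) {
            attempts++;
            if (op.tryOnce()) {
                return attempts;   // success
            }
        }
        return attempts;           // gave up after maxRetries attempts
    }
}
```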




[jira] Commented: (HDFS-435) Add orthogonal fault injection mechanism/framework

2009-07-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734308#action_12734308
 ] 

dhruba borthakur commented on HDFS-435:
---

> Dhruba, where should we put the doc? Any idea?

Docs typically go into src/docs, but we want the doc in an editable and open 
format, and PDF is not editable. One option is to convert it to Forrest XML 
and check it in under src/docs/src/documentation/content/xdocs.


> Add orthogonal fault injection mechanism/framework
> --
>
> Key: HDFS-435
> URL: https://issues.apache.org/jira/browse/HDFS-435
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: Fault injection development guide and Framework 
> HowTo.pdf, Fault injection development guide and Framework HowTo.pdf
>
>
> It'd be great to have a fault injection mechanism for Hadoop.
> Having such a solution in place will make it possible to increase test 
> coverage of error handling and recovery mechanisms, reduce reproduction time, 
> and increase the reproduction rate of problems.
> Ideally, the system has to be orthogonal to the current code and test base: 
> faults have to be injected at build time and have to be configurable, e.g. 
> all faults could be turned off, or only some of them would be allowed to 
> happen. Also, fault injection has to be separated from the production build.




[jira] Assigned: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned HDFS-167:
-

Assignee: Bill Zeller

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)




[jira] Commented: (HDFS-497) One of the DFSClient::create functions ignores parameter

2009-07-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734600#action_12734600
 ] 

dhruba borthakur commented on HDFS-497:
---

What is your recommendation to fix this? Do we need to get rid of the progress 
parameter?

> One of the DFSClient::create functions ignores parameter
> 
>
> Key: HDFS-497
> URL: https://issues.apache.org/jira/browse/HDFS-497
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.1
>Reporter: Bill Zeller
>Priority: Minor
>
> DFSClient::create(String src, boolean overwrite, Progressable progress) 
> ignores progress parameter




[jira] Created: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-07-24 Thread dhruba borthakur (JIRA)
Implement erasure coding as a layer on HDFS
---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur


The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
file system can be reduced. Keeping three copies of the same data is very 
costly, especially when the size of storage is huge. One idea is to reduce the 
replication factor and do erasure coding of a set of blocks so that the 
overall probability of failure of a block remains the same as before.

Many forms of error-correcting codes are available; see 
http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
described DiskReduce: 
https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.

My opinion is to discuss implementation strategies that are not part of base 
HDFS, but instead form a layer on top of HDFS.
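To make the storage trade-off concrete, the simplest erasure code is single XOR parity (RAID-5 style): from k equal-sized data blocks one parity block is computed, and any single missing block can be rebuilt from the other k. Real deployments would use stronger codes (e.g. Reed-Solomon), and the class below is a toy sketch with hypothetical names, not HDFS code.

```java
// Toy single-parity erasure code: k data blocks + 1 parity block tolerate
// the loss of any one block, storing k+1 blocks instead of the 3k blocks
// that triple replication would need. All blocks are assumed equal-sized.
public class XorParity {
    /** Parity is the byte-wise XOR of all blocks. */
    public static byte[] parity(byte[][] blocks) {
        byte[] p = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < p.length; i++)
                p[i] ^= b[i];
        return p;
    }

    /** Rebuild the one missing block: XOR of the parity block and every
     *  surviving data block cancels out all the known terms. */
    public static byte[] recover(byte[][] surviving, byte[] parity) {
        byte[][] all = new byte[surviving.length + 1][];
        System.arraycopy(surviving, 0, all, 0, surviving.length);
        all[surviving.length] = parity;
        return parity(all);
    }
}
```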




[jira] Assigned: (HDFS-504) HDFS updates the modification time of a file when the file is closed.

2009-07-24 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned HDFS-504:
-

Assignee: Chun Zhang

> HDFS updates the modification time of a file when the file is closed.
> -
>
> Key: HDFS-504
> URL: https://issues.apache.org/jira/browse/HDFS-504
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Chun Zhang
>Assignee: Chun Zhang
>Priority: Minor
> Fix For: 0.20.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Current HDFS updates the modification time of a file when the file is 
> created. We would like to update the modification time of the file when the 
> file is closed after being written to. This helps HDFS Raid to detect file 
> changes more aggressively.
> Solution includes:
> 1. Made changes to 
> closeFile@/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java;
> 2. Added unit test to 
> testTimesAtClose@/org/apache/hadoop/hdfs/TestSetTimes.java.




[jira] Commented: (HDFS-504) HDFS updates the modification time of a file when the file is closed.

2009-07-25 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735330#action_12735330
 ] 

dhruba borthakur commented on HDFS-504:
---

If I look at the patch file, I see that there are lots of extraneous changes 
caused by changes to indentation. Hadoop uses two spaces for indentation. Can 
you please ensure that your patch has only the lines that need to be changed 
for this JIRA? Thanks.

> HDFS updates the modification time of a file when the file is closed.
> -
>
> Key: HDFS-504
> URL: https://issues.apache.org/jira/browse/HDFS-504
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Chun Zhang
>Assignee: Chun Zhang
>Priority: Minor
> Fix For: 0.20.1
>
> Attachments: pathfile.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Current HDFS updates the modification time of a file when the file is 
> created. We would like to update the modification time of the file when the 
> file is closed after being written to. This helps HDFS Raid to detect file 
> changes more aggressively.
> Solution includes:
> 1. Made changes to 
> closeFile@/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java;
> 2. Added unit test to 
> testTimesAtClose@/org/apache/hadoop/hdfs/TestSetTimes.java.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-07-25 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735331#action_12735331
 ] 

dhruba borthakur commented on HDFS-503:
---

@Hong: does somebody have an online copy of the paper that we can reference?

> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce: 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but instead form a layer on top of HDFS.




[jira] Commented: (HDFS-504) HDFS updates the modification time of a file when the file is closed.

2009-07-27 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735621#action_12735621
 ] 

dhruba borthakur commented on HDFS-504:
---

The new patch looks better. There are still a few empty changed lines in 
TestSetModTimes.java; can you please remove these empty lines?

The unit test added a section that verifies that access times and modification 
times persist after a cluster restart. Maybe this is not needed for a unit test 
that is testing that the mod-time of a file gets updated when the file is 
closed?

> HDFS updates the modification time of a file when the file is closed.
> -
>
> Key: HDFS-504
> URL: https://issues.apache.org/jira/browse/HDFS-504
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Chun Zhang
>Assignee: Chun Zhang
>Priority: Minor
> Fix For: 0.20.1
>
> Attachments: pathfile.txt, pathfile.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Current HDFS updates the modification time of a file when the file is 
> created. We would like to update the modification time of the file when the 
> file is closed after being written to. This helps HDFS Raid to detect file 
> changes more aggressively.
> Solution includes:
> 1. Made changes to 
> closeFile@/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java;
> 2. Added unit test to 
> testTimesAtClose@/org/apache/hadoop/hdfs/TestSetTimes.java.




[jira] Commented: (HDFS-506) Incorrect UserName at Solaris because it has no "whoami" command by default

2009-07-27 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735768#action_12735768
 ] 

dhruba borthakur commented on HDFS-506:
---

Is it possible for you to find out the name of the equivalent command in 
Solaris? Better still, if you can submit a patch that incorporates your 
findings, that would be great. Thanks.

> Incorrect UserName at Solaris because it has no "whoami" command by default
> ---
>
> Key: HDFS-506
> URL: https://issues.apache.org/jira/browse/HDFS-506
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.1
> Environment: OS: SunOS 5.10
>Reporter: Urko Benito
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Solaris environment has no __whoami__ command, so __getUnixUserName()__ 
> in the UnixUserGroupInformation class fails, because it calls 
> Shell.USER_NAME_COMMAND, which is defined as "whoami".
> It therefore throws an exception and sets the default "DrWho" user name, 
> ignoring all the FileSystem permissions.




[jira] Commented: (HDFS-504) HDFS updates the modification time of a file when the file is closed.

2009-07-29 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736947#action_12736947
 ] 

dhruba borthakur commented on HDFS-504:
---

You can run the tests in your own workspace by running something similar to:

ant -Dpatch.file=pathfile.txt -Dforrest.home=/home/dhruba/forrest/ \
  -Dfindbugs.home=/home/dhruba/findbugs -Dscratch.dir=$RESULTS \
  -Dsvn.cmd=/usr/local/bin/svn -Dgrep.cmd=/bin/grep \
  -Dpatch.cmd=/usr/bin/patch -Djava5.home=/usr/local/jdk1.5.0_07/ test-patch

> HDFS updates the modification time of a file when the file is closed.
> -
>
> Key: HDFS-504
> URL: https://issues.apache.org/jira/browse/HDFS-504
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Chun Zhang
>Assignee: Chun Zhang
>Priority: Minor
> Fix For: 0.20.1
>
> Attachments: pathfile.txt, pathfile.txt, pathfile.txt, pathfile.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Current HDFS updates the modification time of a file when the file is 
> created. We would like to update the modification time of the file when the 
> file is closed after being written to. This helps HDFS Raid to detect file 
> changes more aggressively.
> Solution includes:
> 1. Made changes to 
> closeFile@/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java;
> 2. Added unit test to 
> testTimesAtClose@/org/apache/hadoop/hdfs/TestSetTimes.java.




[jira] Commented: (HDFS-506) Incorrect UserName at Solaris because it has no "whoami" command by default

2009-07-29 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736955#action_12736955
 ] 

dhruba borthakur commented on HDFS-506:
---

One could install a whoami script on the Solaris machine: it could be a shell 
script that internally uses the id command. In that case, no changes would be 
needed in Hadoop. Is my assumption correct?
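A minimal version of the workaround suggested above could look like the script below (an assumption-laden sketch: "id -un" prints the effective user name per POSIX, but on older Solaris releases the XPG4 variant /usr/xpg4/bin/id may be needed, since the legacy /usr/bin/id does not support -un). The script would need to be installed somewhere on the daemon's PATH.

```shell
#!/bin/sh
# Minimal whoami replacement for systems (such as stock Solaris) that lack
# the command: print the effective user name via the POSIX "id" utility.
# On old Solaris, substitute /usr/xpg4/bin/id if plain "id -un" fails.
id -un
```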

> Incorrect UserName at Solaris because it has no "whoami" command by default
> ---
>
> Key: HDFS-506
> URL: https://issues.apache.org/jira/browse/HDFS-506
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.1
> Environment: OS: SunOS 5.10
>Reporter: Urko Benito
> Attachments: PermissionChecker.java.diff, Shell.java.diff, 
> test-hadoop-security.tar.gz, UnixUserGroupInformation.java.diff
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Solaris environment has no __whoami__ command, so __getUnixUserName()__ 
> in the UnixUserGroupInformation class fails, because it calls 
> Shell.USER_NAME_COMMAND, which is defined as "whoami".
> It therefore throws an exception and sets the default "DrWho" user name, 
> ignoring all the FileSystem permissions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-30 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737114#action_12737114
 ] 

dhruba borthakur commented on HDFS-167:
---

It would be really nice if we could split this into two patches: one patch that 
fixes the infinite loop, and another with the code cleanup that makes 
dfs.namenode private. That way, the infinite-loop fix can be backported by 
people to previous Hadoop releases, especially Hadoop 0.20. 

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
> Attachments: hdfs-167-4.patch, hdfs-167-5.patch
>
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-30 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737359#action_12737359
 ] 

dhruba borthakur commented on HDFS-167:
---

The change that makes DFSClient.namenode private instead of public cannot be 
easily backported to previous releases because it changes a public API, doesn't 
it?

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
> Attachments: hdfs-167-4.patch, hdfs-167-5.patch
>
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-30 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737372#action_12737372
 ] 

dhruba borthakur commented on HDFS-167:
---

It appears that you have already filed HDFS-514 to check in only the changes to 
DFSClient.namenode. Thanks.

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
> Attachments: hdfs-167-4.patch, hdfs-167-5.patch
>
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-07-30 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-200:
--

Attachment: fsyncConcurrentReaders13_20.txt

This patch fixes the exception "Runtime exception. 
java.lang.IllegalStateException: generationStamp (=1) == 
GenerationStamp.WILDCARD_STAMP" reported by stack earlier.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders3.patch, 
> fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, 
> fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-07-31 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737762#action_12737762
 ] 

dhruba borthakur commented on HDFS-167:
---

If there is a patch for 0.20 attached to this JIRA, that would be awesome. If 
not, I can live with that too.

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
> Attachments: hdfs-167-4.patch, hdfs-167-5.patch, hdfs-167-6.patch
>
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-516) Low Latency distributed reads

2009-07-31 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737788#action_12737788
 ] 

dhruba borthakur commented on HDFS-516:
---

This sounds interesting. The patch you posted includes only the changes needed 
to the Hadoop code; it does not include your implementation of the RadFileSystem.

> Low Latency distributed reads
> -
>
> Key: HDFS-516
> URL: https://issues.apache.org/jira/browse/HDFS-516
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jay Booth
>Priority: Minor
> Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low-latency random reads using NIO on the server side 
> and simulated OS paging with LRU caching and lookahead on the client side.  
> Some applications could include Lucene searching (term->doc and doc->offset 
> mappings are likely to be in local cache, thus much faster than Nutch's 
> current FsDirectory impl) and binary search through record files (bytes at 
> the 1/2, 1/4, 1/8 marks are likely to be cached).
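
The client-side "simulated OS paging with LRU caching" described above can be approximated with a small access-ordered map; the following is a sketch of that general technique only, not the code in radfs.patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU page cache of the kind the description alludes to:
// LinkedHashMap in access-order mode evicts the least-recently-used page
// once the configured capacity is exceeded.
class LruPageCache extends LinkedHashMap<Long, byte[]> {
    private final int maxPages;

    LruPageCache(int maxPages) {
        super(16, 0.75f, true);  // true => iterate in access order, not insertion order
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        // Called after every put(): drop the least-recently-used page
        // whenever the cache grows beyond its capacity.
        return size() > maxPages;
    }
}
```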

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-167) DFSClient continues to retry indefinitely

2009-08-01 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737904#action_12737904
 ] 

dhruba borthakur commented on HDFS-167:
---

Thanks for the 0.20 patch. We added a new constructor to DFSClient:

{quote}

   * This constructor was written to allow easy testing of the DFSClient class.
+   * End users will most likely want to use one of the other constructors.
+   */
+  public DFSClient(ClientProtocol namenode, ClientProtocol rpcNamenode,
+   Configuration conf, FileSystem.Statistics stats)

{quote}

is it possible to make this constructor package-private (instead of public)?

> DFSClient continues to retry indefinitely
> -
>
> Key: HDFS-167
> URL: https://issues.apache.org/jira/browse/HDFS-167
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Reporter: Derek Wollenstein
>Assignee: Bill Zeller
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: hdfs-167-4.patch, hdfs-167-5.patch, hdfs-167-6.patch, 
> hdfs-167-for-20-1.patch
>
>
> I encountered a bug when trying to upload data using the Hadoop DFS Client.  
> After receiving a NotReplicatedYetException, the DFSClient will normally 
> retry its upload up to some limited number of times.  In this case, I found 
> that this retry loop continued indefinitely, to the point that the number of 
> tries remaining was negative:
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for 
> replication for 21 seconds
> 2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: 
> NotReplicatedYetException sleeping 
> /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_2009
> 0325_us/logs_20090325_us_13 retries left -1
> The stack trace for the failure that's retrying is:
> 2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.NotReplicated
> YetException: Not replicated yet:
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> 2009-03-25 16:20:02 [INFO] 
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.Client.call(Client.java:697)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 2009-03-25 16:20:02 [INFO]  at 
> java.lang.reflect.Method.invoke(Method.java:597)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 2009-03-25 16:20:02 [INFO]  at $Proxy0.addBlock(Unknown Source)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> 2009-03-25 16:20:02 [INFO]  at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-504) HDFS updates the modification time of a file when the file is closed.

2009-08-03 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-504:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Chun!

> HDFS updates the modification time of a file when the file is closed.
> -
>
> Key: HDFS-504
> URL: https://issues.apache.org/jira/browse/HDFS-504
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Chun Zhang
>Assignee: Chun Zhang
>Priority: Minor
> Fix For: 0.20.1
>
> Attachments: pathfile.txt, pathfile.txt, pathfile.txt, pathfile.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently, HDFS updates the modification time of a file when the file is 
> created. We would like to update the modification time of the file when the 
> file is closed after being written to. This helps HDFS Raid to detect file 
> changes more aggressively.
> Solution includes:
> 1. Made changes to 
> closeFile@/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java;
> 2. Added unit test to 
> testTimesAtClose@/org/apache/hadoop/hdfs/TestSetTimes.java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-527) Refactor DFSClient constructors

2009-08-04 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739273#action_12739273
 ] 

dhruba borthakur commented on HDFS-527:
---

I like this one. Better to expose fewer public APIs.

> Refactor DFSClient constructors
> ---
>
> Key: HDFS-527
> URL: https://issues.apache.org/jira/browse/HDFS-527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
> Attachments: h527_20090804.patch, h527_20090804b.patch
>
>
> There are 5 constructors in DFSClient.  It seems unnecessary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-528) Add dfsadmin -waitDatanodes feature to block until DNs have reported

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739364#action_12739364
 ] 

dhruba borthakur commented on HDFS-528:
---

Another generic approach is to specify the number of datanodes to wait for as 
a percentage of the total number of datanodes in the cluster. You would have to 
use the "includelist" feature of HDFS to list all the known datanodes (which 
most admins probably do). In fact, the NN could exit safemode only once the 
specified percentage of datanodes have checked in with the NN. 

Many times, when we restart our cluster, many datanodes fail to join the NN. 
However, the NN exits safemode because it finds at least one replica of every 
block. Then the NN starts replicating blocks. We have to manually enter 
safemode, manually look at the datanodes that have refused to join the NN, fix 
them, and then exit safemode. Your proposed feature helps handle this scenario 
elegantly.
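
The percentage-based variant reduces to a check like the following when deciding whether safemode may be left. This is a sketch under the assumption of a hypothetical percentage setting; it is not actual NameNode code, and the names are illustrative:

```java
// Sketch: safemode may only be left once the fraction of datanodes that
// have checked in reaches a configured threshold, in addition to the
// usual block-replica condition. The threshold source is hypothetical.
class SafeModeCheck {
    private final double minDatanodePct;   // e.g. 0.99, from configuration
    private final int knownDatanodes;      // size of the "includelist"

    SafeModeCheck(double minDatanodePct, int knownDatanodes) {
        this.minDatanodePct = minDatanodePct;
        this.knownDatanodes = knownDatanodes;
    }

    boolean mayLeaveSafeMode(int reportedDatanodes, boolean blocksOk) {
        // Both conditions must hold: the existing block-replica threshold,
        // plus the new datanode head-count threshold.
        return blocksOk
            && reportedDatanodes >= (int) Math.ceil(minDatanodePct * knownDatanodes);
    }
}
```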

> Add dfsadmin -waitDatanodes feature to block until DNs have reported
> 
>
> Key: HDFS-528
> URL: https://issues.apache.org/jira/browse/HDFS-528
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-528.txt
>
>
> When starting up a fresh cluster programmatically, users often want to wait 
> until DFS is "writable" before continuing in a script. "dfsadmin -safemode 
> wait" doesn't quite work for this on a completely fresh cluster, since when 
> there are 0 blocks on the system, 100% of them are accounted for before any 
> DNs have reported.
> This JIRA is to add a command which waits until a certain number of DNs have 
> reported as alive to the NN.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-512) Set block id as the key to Block

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739367#action_12739367
 ] 

dhruba borthakur commented on HDFS-512:
---

There are some advantages to using the generation stamp as part of the unique 
identifier for a Block object. It ensures that all code correctly recognizes 
that blocks with different generation stamps are different blocks and can have 
different contents. It might not be a big deal for the NN data structures, 
especially because the NN first checks whether a block belongs to a file before 
inserting it into the BlocksMap. But for external tools that use a block 
interface (e.g. Balancer, fsck, etc.), it might be helpful to understand that 
blocks with different generation stamps are different blocks (do these 
utilities use the Block object at all?).

@Raghu: > This is probably a good time to add Block to ReplicaInfo. 

If we follow Raghu's suggestion, then can we continue using the genstamp as 
part of the Block key?

There are other cases (especially during block report processing) where we 
would have to do wildcard lookups for a block. But the cost of these extra 
lookups should be minimal because they occur only on the error code path.


> Set block id as the key to Block
> 
>
> Key: HDFS-512
> URL: https://issues.apache.org/jira/browse/HDFS-512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Append Branch
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: Append Branch
>
> Attachments: blockKey.patch
>
>
> Currently the key to Block is block id + generation stamp. I would propose to 
> change it to be only block id. This is based on the following properties of 
> the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only 
> one generation of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier 
> since most of the time we know a block's id but may not know its generation 
> stamp.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-512) Set block id as the key to Block

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739645#action_12739645
 ] 

dhruba borthakur commented on HDFS-512:
---

> does HDFS have any maps or sets that need to contain two instances of Block 
> with the same block id but different gen stamps

I agree that this is not the case. So let's get rid of the genstamp from the 
Block key. Can we still use it in the Block.equals() method?
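
The id-only keying the thread converges on can be made concrete with a toy class: hash and compare on the block id alone (so maps can be probed without knowing the genstamp), with generation mismatches detected explicitly. A sketch only, not the actual Block class:

```java
// Sketch of the proposal: a block keyed purely by its id, so that maps
// keyed on Block can be probed without knowing the generation stamp.
// The generation stamp is still carried and can be compared explicitly.
class ToyBlock {
    final long blockId;
    final long genStamp;

    ToyBlock(long blockId, long genStamp) {
        this.blockId = blockId;
        this.genStamp = genStamp;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(blockId);       // the key is the id alone
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ToyBlock && ((ToyBlock) o).blockId == blockId;
    }

    // Generation mismatch is detected explicitly rather than via equals().
    boolean sameGeneration(ToyBlock other) {
        return blockId == other.blockId && genStamp == other.genStamp;
    }
}
```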

> Set block id as the key to Block
> 
>
> Key: HDFS-512
> URL: https://issues.apache.org/jira/browse/HDFS-512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Append Branch
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: Append Branch
>
> Attachments: blockKey.patch
>
>
> Currently the key to Block is block id + generation stamp. I would propose to 
> change it to be only block id. This is based on the following properties of 
> the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only 
> one generation of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier 
> since most of the time we know a block's id but may not know its generation 
> stamp.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-528) Add dfsadmin -waitDatanodes feature to block until DNs have reported

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739647#action_12739647
 ] 

dhruba borthakur commented on HDFS-528:
---

It would be nice if we could integrate this with the safemode code in the NN. 
Then no new command-line utilities are needed. 

If one sets a non-zero value for the new config parameter, the NN will not 
exit safemode until that many DNs have checked in. 
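The integration being suggested could look like an ordinary hdfs-site.xml entry. The property name below is purely illustrative; no such key is defined by this patch:

```xml
<!-- Hypothetical property name, shown only to illustrate the suggestion -->
<property>
  <name>dfs.namenode.safemode.min.datanodes</name>
  <value>50</value>
  <description>If set to a non-zero value, the NameNode stays in safe mode
  until at least this many datanodes have reported in.</description>
</property>
```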

> Add dfsadmin -waitDatanodes feature to block until DNs have reported
> 
>
> Key: HDFS-528
> URL: https://issues.apache.org/jira/browse/HDFS-528
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-528.txt
>
>
> When starting up a fresh cluster programmatically, users often want to wait 
> until DFS is "writable" before continuing in a script. "dfsadmin -safemode 
> wait" doesn't quite work for this on a completely fresh cluster, since when 
> there are 0 blocks on the system, 100% of them are accounted for before any 
> DNs have reported.
> This JIRA is to add a command which waits until a certain number of DNs have 
> reported as alive to the NN.




[jira] Commented: (HDFS-512) Set block id as the key to Block

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739694#action_12739694
 ] 

dhruba borthakur commented on HDFS-512:
---

Yes, sounds good to me. 

> Set block id as the key to Block
> 
>
> Key: HDFS-512
> URL: https://issues.apache.org/jira/browse/HDFS-512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Append Branch
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: Append Branch
>
> Attachments: blockKey.patch
>
>
> Currently the key to Block is block id + generation stamp. I would propose to 
> change it to be only block id. This is based on the following properties of 
> the dfs cluster:
> 1. On each datanode only one replica of a block exists. Therefore there is only 
> one generation of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, searching for a block/replica's meta information is easier 
> since most of the time we know a block's id but may not know its generation 
> stamp.




[jira] Commented: (HDFS-15) All replicas of a block end up on only 1 rack

2009-08-05 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739821#action_12739821
 ] 

dhruba borthakur commented on HDFS-15:
--

Does this problem exist (or has it been observed) on 0.20?

> All replicas of a block end up on only 1 rack
> -
>
> Key: HDFS-15
> URL: https://issues.apache.org/jira/browse/HDFS-15
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
>Priority: Critical
>
> HDFS replicas placement strategy guarantees that the replicas of a block 
> exist on at least two racks when its replication factor is greater than one. 
> But fsck still reports that the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling only 
> check the block's replication factor but not the rack requirement. When an 
> over-replicated block loses a replica due to decommission, corruption, or 
> heartbeat loss, the namenode does not take any action to guarantee that the 
> remaining replicas are on different racks.
>  




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-06 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Summary: Allow applications to know that a read request failed because 
block is missing  (was: Allow applications to know that a read failed beucase 
block is missing)

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Created: (HDFS-532) Allow applications to know that a read failed beucase block is missing

2009-08-06 Thread dhruba borthakur (JIRA)
Allow applications to know that a read failed beucase block is missing
--

 Key: HDFS-532
 URL: https://issues.apache.org/jira/browse/HDFS-532
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur


I have an application that has intelligence to retrieve data from alternate 
locations if HDFS cannot provide this data. This can happen when data in HDFS 
is corrupted or the block is missing. HDFS already throws ChecksumException if 
the block is corrupted and throws a generic IOException if the block is 
missing. I would like HDFS to throw BlockMissingException when a read request 
encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-532) Allow applications to know that a read failed beucase block is missing

2009-08-06 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

  Component/s: hdfs client
Fix Version/s: 0.21.0

> Allow applications to know that a read failed beucase block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Commented: (HDFS-518) Create new tests for Append's hflush

2009-08-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740316#action_12740316
 ] 

dhruba borthakur commented on HDFS-518:
---

Also, we have to test writes across checksum boundaries (bytes.per.checksum).

> Create new tests for Append's hflush
> 
>
> Key: HDFS-518
> URL: https://issues.apache.org/jira/browse/HDFS-518
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: HDFS-518.patch
>
>
> According to the test plan, a number of new features are going to be 
> implemented as part of this umbrella (HDFS-265) JIRA. 
> These new features have to be tested properly. hflush is one piece of new 
> functionality which requires new tests to be developed.




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-06 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Attachment: BlockMissingException.patch

If a read request encounters a missing block, then raise a 
BlockMissingException. BlockMissingException is derived from IOException.
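The shape of such an exception is straightforward. The sketch below is only a guess at the patch's contents: the IOException parentage is taken from the comment above, while the field names are assumptions for illustration.

```java
import java.io.IOException;

// Hypothetical sketch of a typed exception so callers can distinguish
// "block has no locations" from generic I/O failures.
class BlockMissingException extends IOException {
  private static final long serialVersionUID = 1L;

  private final String filename; // file being read
  private final long offset;     // offset at which the block has no locations

  BlockMissingException(String filename, String description, long offset) {
    super(description);
    this.filename = filename;
    this.offset = offset;
  }

  String getFile() { return filename; }
  long getOffset() { return offset; }
}
```

Because it subclasses IOException, existing catch blocks keep working; an application can catch BlockMissingException ahead of the generic IOException and fall back to its alternate data source.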

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Status: Patch Available  (was: Open)

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Status: Open  (was: Patch Available)

Need to fix the javac warning introduced by this patch.

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Attachment: BlockMissingException2.txt

Fix javac warning in earlier patch.

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch, BlockMissingException2.txt
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

Status: Patch Available  (was: Open)

{quote}

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

{quote}

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch, BlockMissingException2.txt
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Updated: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-08-12 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-200:
--

Attachment: fsyncConcurrentReaders14_20.txt

This patch ignores deleting blocks during block report processing if the file 
is underConstruction. Updated the test case to test this corner case.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
> fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, 
> Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".




[jira] Updated: (HDFS-532) Allow applications to know that a read request failed because block is missing

2009-08-13 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-532:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this.

> Allow applications to know that a read request failed because block is missing
> --
>
> Key: HDFS-532
> URL: https://issues.apache.org/jira/browse/HDFS-532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockMissingException.patch, BlockMissingException2.txt
>
>
> I have an application that has intelligence to retrieve data from alternate 
> locations if HDFS cannot provide this data. This can happen when data in HDFS 
> is corrupted or the block is missing. HDFS already throws ChecksumException 
> if the block is corrupted and throws a generic IOException if the block is 
> missing. I would like HDFS to throw BlockMissingException when a read request 
> encounters a block that has no locations associated with it.




[jira] Commented: (HDFS-535) TestFileCreation occasionally fails because of an exception in DataStreamer.

2009-08-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743105#action_12743105
 ] 

dhruba borthakur commented on HDFS-535:
---

+1 looks good to me.

> TestFileCreation occasionally fails because of an exception in DataStreamer.
> 
>
> Key: HDFS-535
> URL: https://issues.apache.org/jira/browse/HDFS-535
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.20.1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: TestFileCreate.patch
>
>
> One of test cases, namely {{testFsCloseAfterClusterShutdown()}}, of 
> {{TestFileCreation}} fails occasionally.




[jira] Commented: (HDFS-173) Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2009-08-14 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743107#action_12743107
 ] 

dhruba borthakur commented on HDFS-173:
---

The proposal sounds like a feasible idea.

A related problem is that a Namenode server handler thread will remain busy 
during all this time. Is that a problem that needs to be solved too?


> Recursively deleting a directory with millions of files makes NameNode 
> unresponsive for other commands until the deletion completes
> ---
>
> Key: HDFS-173
> URL: https://issues.apache.org/jira/browse/HDFS-173
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>
> Delete a directory with millions of files. This could take several minutes 
> (observed 12 mins for 9 million files). While the operation is in progress 
> FSNamesystem lock is held and the requests from clients are not handled until 
> deletion completes.




[jira] Commented: (HDFS-15) All replicas of a block end up on only 1 rack

2009-08-19 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745313#action_12745313
 ] 

dhruba borthakur commented on HDFS-15:
--

> A different queue (neededReplicationsForRacks) is maintained for blocks which 
> do not have sufficient rac

There was a time when the Namenode was littered with plenty of ad-hoc data 
structures, each for its own purpose. There was an effort to consolidate the 
functionality of these data structures into a smaller set. I am not against 
this patch, but is it really difficult to integrate this new data structure 
into neededReplication as explained in your first proposal?

> All replicas of a block end up on only 1 rack
> -
>
> Key: HDFS-15
> URL: https://issues.apache.org/jira/browse/HDFS-15
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hairong Kuang
>Assignee: Jitendra Nath Pandey
>Priority: Critical
>
> HDFS replicas placement strategy guarantees that the replicas of a block 
> exist on at least two racks when its replication factor is greater than one. 
> But fsck still reports that the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling only 
> check the block's replication factor but not the rack requirement. When an 
> over-replicated block loses a replica due to decommission, corruption, or 
> heartbeat loss, the namenode does not take any action to guarantee that the 
> remaining replicas are on different racks.
>  




[jira] Commented: (HDFS-573) Porting libhdfs to Windows

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748092#action_12748092
 ] 

dhruba borthakur commented on HDFS-573:
---

Is there a way to create a new file that contains functions that implement 
hsearch/hcreate etc. for Windows? In that case, we can continue to use the 
current code on Linux.

> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few consts that I believe are extraneous, and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.




[jira] Commented: (HDFS-573) Porting libhdfs to Windows

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748105#action_12748105
 ] 

dhruba borthakur commented on HDFS-573:
---

OK, thanks for the explanation.

> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few consts that I believe are extraneous, and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.




[jira] Commented: (HDFS-570) When opening a file for read, make the file length available to client.

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748106#action_12748106
 ] 

dhruba borthakur commented on HDFS-570:
---

Are we implementing algorithm 1 or algorithm 2 (as described in the append 
design doc)?

> When opening a file for read, make the file length available to client.
> ---
>
> Key: HDFS-570
> URL: https://issues.apache.org/jira/browse/HDFS-570
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client
>Affects Versions: Append Branch
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Append Branch
>
>
> In order to support read consistency, DFSClient needs the file length at the 
> file opening time.
> For more details, see Section 4 in the [append design 
> doc|https://issues.apache.org/jira/secure/attachment/12415768/appendDesign2.pdf].




[jira] Commented: (HDFS-553) BlockSender reports wrong failed position in ChecksumException

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748149#action_12748149
 ] 

dhruba borthakur commented on HDFS-553:
---

I propose that we backport this to 0.20. This is a very simple and low-risk fix 
that helps applications that depend on the correct offset being recorded inside 
ChecksumException. If folks agree, then I will pull this into 0.20 as well.

> BlockSender reports wrong failed position in ChecksumException
> --
>
> Key: HDFS-553
> URL: https://issues.apache.org/jira/browse/HDFS-553
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.21.0
>
> Attachments: crcFailedPosition.patch
>
>
> BlockSender sets a wrong position in ChecksumException that indicates the 
> offset where crc mismatch occurs.




[jira] Commented: (HDFS-531) Renaming of configuration keys

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748266#action_12748266
 ] 

dhruba borthakur commented on HDFS-531:
---

> dfs.max.replication.streams and dfs.heartbeat.recheck.interval should have a 
> more specific context? dfs.namenode.*?

I like this idea. Does it mean that we have dfs.namenode.* dfs.client.* and 
dfs.datanode.* types of property names?

> Renaming of configuration keys
> --
>
> Key: HDFS-531
> URL: https://issues.apache.org/jira/browse/HDFS-531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.21.0
>
> Attachments: changed_config_keys.txt
>
>
> Keys in configuration files should be standardized so that key names reflect 
> the components they are used in.
> For example:
>dfs.backup.address should be renamed to dfs.namenode.backup.address 
>dfs.data.dir   should be renamed to dfs.datanode.data.dir
> This change will impact both hdfs and common sources.
> Following convention is proposed:
> 1. Any key related to hdfs should begin with 'dfs'.
> 2. Any key related to namenode, datanode or client should begin with 
> dfs.namenode, dfs.datanode or dfs.client respectively.  




[jira] Commented: (HDFS-543) Break FSDatasetInterface#writeToBlock() into writeToTemporary, writeToRBW, and append

2009-08-26 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748270#action_12748270
 ] 

dhruba borthakur commented on HDFS-543:
---

+1 looks good.

For writeToTemporary() and FSDataSet.append(), this patch does not invoke 
stopWriters() to terminate any existing writers. Would it be wrong to invoke 
stopWriters() in all three of these methods?

> Break FSDatasetInterface#writeToBlock() into writeToTemporary, writeToRBW, 
> and append
> -
>
> Key: HDFS-543
> URL: https://issues.apache.org/jira/browse/HDFS-543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: Append Branch
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: Append Branch
>
> Attachments: writeToReplica.patch, writeToReplica1.patch, 
> writeToReplica2.patch, writeToReplica3.patch
>
>
> FSDatasetInterface#writeToBlock() currently allows creating/recovering a 
> temporary replica, creating/recovering an RBW replica, or appending to a replica. The 
> implementation of the method in FSDataset is very complicated and error 
> prone. I'd like to break this method into three:
> 1. writeToTemporary allows creating a Temporary replica or recovering from a 
> packet error for a Temporary replica;
> 2. writeToRBW allows creating an RBW replica or recovering from a packet error 
> for an RBW replica;
> 3. append allows appending to an existing Finalized replica.
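The three-way split above could be modeled roughly as follows. This is a hedged sketch in Python, not the actual Java FSDataset code; the method names, the ReplicaState values, and the state checks are simplified stand-ins for the real interface:

```python
class ReplicaState:
    TEMPORARY = "TEMPORARY"
    RBW = "RBW"            # replica being written
    FINALIZED = "FINALIZED"

class FSDataset:
    """Toy model of the proposed three-way split of writeToBlock()."""

    def __init__(self):
        self.replicas = {}  # block id -> replica state

    def write_to_temporary(self, block_id):
        # Create a Temporary replica, or recover an existing Temporary
        # replica after a packet error.
        state = self.replicas.get(block_id)
        if state not in (None, ReplicaState.TEMPORARY):
            raise ValueError("cannot write temporary over %s replica" % state)
        self.replicas[block_id] = ReplicaState.TEMPORARY

    def write_to_rbw(self, block_id):
        # Create an RBW replica, or recover an existing RBW replica
        # after a packet error.
        state = self.replicas.get(block_id)
        if state not in (None, ReplicaState.RBW):
            raise ValueError("cannot write RBW over %s replica" % state)
        self.replicas[block_id] = ReplicaState.RBW

    def append(self, block_id):
        # Append only to an existing Finalized replica, which then
        # transitions back to RBW while being written.
        if self.replicas.get(block_id) != ReplicaState.FINALIZED:
            raise ValueError("append requires a finalized replica")
        self.replicas[block_id] = ReplicaState.RBW
```

Separating the entry points this way makes each state transition explicit, which is where a single writeToBlock() tends to accumulate error-prone branching.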




[jira] Created: (HDFS-575) DFSClient read performance can be improved by staggering connection setup to datanode(s)

2009-08-27 Thread dhruba borthakur (JIRA)
DFSClient read performance can be improved by staggering connection setup to 
datanode(s)


 Key: HDFS-575
 URL: https://issues.apache.org/jira/browse/HDFS-575
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur


The DFS client opens a socket connection to a datanode for the n-th block, fetches 
the n-th block from that datanode, and only then opens a socket connection to the 
datanode that contains the (n+1)-th block. Sequential reads might show performance 
improvements if the socket connection to the datanode containing the (n+1)-th block 
can be set up in parallel while the data for the n-th block is being fetched. The 
amount of improvement, if any, has to be measured.
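The staggered setup described above can be sketched as follows. This is a hedged illustration of the scheduling idea only, not DFSClient code; `connect` and `fetch` are placeholder callables standing in for datanode connection setup and block transfer:

```python
from concurrent.futures import ThreadPoolExecutor

def read_blocks(locations, connect, fetch):
    """Yield blocks in order, pre-connecting to the datanode of block n+1
    in the background while block n is still being fetched."""
    if not locations:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        conn = connect(locations[0])
        for i, loc in enumerate(locations):
            # Kick off connection setup for the next block's datanode.
            nxt = (pool.submit(connect, locations[i + 1])
                   if i + 1 < len(locations) else None)
            yield fetch(conn, loc)   # fetch block n on the open connection
            if nxt is not None:
                conn = nxt.result()  # ideally already connected by now
```

Whether the overlap pays off depends on how connection-setup latency compares to block transfer time, which is exactly the measurement the issue calls for.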




[jira] Commented: (HDFS-570) When opening a file for read, make the file length available to client.

2009-08-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749018#action_12749018
 ] 

dhruba borthakur commented on HDFS-570:
---

This approach sounds good to me.

> When opening a file for read, make the file length available to client.
> ---
>
> Key: HDFS-570
> URL: https://issues.apache.org/jira/browse/HDFS-570
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client
>Affects Versions: Append Branch
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Append Branch
>
> Attachments: h570_20090828.patch
>
>
> In order to support read consistency, DFSClient needs the file length at the 
> file opening time.  In the current implementation, DFSClient obtains the file 
> length at the file opening time, but the length is inaccurate if the file is 
> being written to.
> For more details, see Section 4 in the [append design 
> doc|https://issues.apache.org/jira/secure/attachment/12415768/appendDesign2.pdf].




[jira] Updated: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-08-28 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-503:
--

Attachment: raid1.txt

Here is a preliminary version of an erasure-coding implementation for HDFS.

This package implements a Distributed Raid File System. It is used along with
an instance of the Hadoop Distributed File System (HDFS). It can be used to
provide better protection against data corruption, and it can also be used to
reduce the total storage requirements of HDFS.

The Distributed Raid File System consists of two main software components. The
first is the RaidNode, a daemon that creates parity files from specified HDFS
files. The second, "raidfs", is software layered over an HDFS client that
intercepts all calls an application makes to the HDFS client. If HDFS
encounters corrupted data while reading a file, the raidfs client detects it,
uses the relevant parity blocks to recover the corrupted data (if possible),
and returns the data to the application. The fact that parity data was used to
satisfy its read request is completely transparent to the application.

The primary use of this feature is to save disk space for HDFS files.
HDFS typically stores data in triplicate.
The Distributed Raid File System can be configured so that a set of
data blocks of a file are combined to form one or more parity blocks.
This allows one to reduce the replication factor of an HDFS file from 3 to 2
while keeping the failure probability roughly the same as before. This typically
saves 25% to 30% of the storage space in an HDFS cluster.

The RaidNode periodically scans all the paths specified in the configuration
file. For each path, it recursively scans for files that have more than 2 blocks
and that have not been modified during the last few hours (the default is 24 hours).
It picks the specified number of blocks from the file (as specified by the
stripe size), generates a parity block by combining them, and stores the result
as another HDFS file in the specified destination directory. There is a
one-to-one mapping between an HDFS file and its parity file. The RaidNode also
periodically finds parity files that are orphaned and deletes them.
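The stripe-to-parity combination described above can be sketched with simple
XOR, which is the basic scheme this preliminary patch implements (the function
names below are illustrative, not part of the patch):

```python
def xor_parity(blocks):
    """Combine a stripe of equal-sized data blocks into one XOR parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Rebuild a single lost block: XOR of the survivors and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_parity(survivors + [parity])
```

XOR parity can reconstruct at most one lost block per stripe, which is why the
replication factor is reduced only from 3 to 2; more advanced erasure codes
would allow further reduction.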

The Distributed Raid FileSystem is layered over a DistributedFileSystem
instance and intercepts all calls that go into HDFS. HDFS throws a
ChecksumException or a BlockMissingException when a file read encounters bad
data. The layered Distributed Raid FileSystem catches these exceptions, locates
the corresponding parity file, extracts the original data from the parity
blocks, and feeds the extracted data back to the application in a completely
transparent way.

The layered Distributed Raid FileSystem does not fix the data loss that it
encounters while serving data. It merely makes the application transparently
use the parity blocks to re-create the original data. A command-line tool,
"fsckraid", is currently under development; it will fix corrupted files
by extracting the data from the associated parity files. An administrator
can run "fsckraid" manually as and when needed.

More details are in src/contrib/raid/README.


> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: raid1.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce: 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but instead form a layer on top of HDFS.




[jira] Updated: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-08-28 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-503:
--

Tags: fb

> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: raid1.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce: 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but instead form a layer on top of HDFS.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-08-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749081#action_12749081
 ] 

dhruba borthakur commented on HDFS-503:
---

Hi Nicholas, I agree with you completely. The current patch implements basic 
XOR. Once this patch is accepted by the community, I plan to make the algorithm 
pluggable, so that people can plug more advanced erasure codes into the 
framework laid out by this patch.

If you have the time and energy, please review the patch and provide any 
feedback you may have. Thanks.

> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: raid1.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce: 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but instead form a layer on top of HDFS.




[jira] Commented: (HDFS-245) Create symbolic links in HDFS

2009-08-30 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749370#action_12749370
 ] 

dhruba borthakur commented on HDFS-245:
---

Hi Yajun, this patch might not be part of 0.21 because I am short on resources 
to write the test plan, negative unit tests, performance tests at scale, etc. If 
somebody can volunteer to write the test plan, user guide and a few negative 
unit tests, that would be of great help (HDFS-254, HDFS-255).

> Create symbolic links in HDFS
> -
>
> Key: HDFS-245
> URL: https://issues.apache.org/jira/browse/HDFS-245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, 
> symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, 
> symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, 
> symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.




[jira] Commented: (HDFS-578) Support for using server default values for blockSize and replication when creating a file

2009-08-31 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749412#action_12749412
 ] 

dhruba borthakur commented on HDFS-578:
---

There are many other parameters for which the client could default to the server 
settings, for example io.file.buffer.size. What if we allow an option for the 
client to fetch a subset of the configuration from the Namenode, use that as a 
base, and then overlay the client-side hdfs-site.xml on top of it?

> Support for using server default values for blockSize and replication when 
> creating a file
> --
>
> Key: HDFS-578
> URL: https://issues.apache.org/jira/browse/HDFS-578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client, name-node
>Reporter: Kan Zhang
>Assignee: Kan Zhang
>
> This is a sub-task of HADOOP-4952. This improvement makes it possible for a 
> client to specify that it wants to use the server default values for 
> blockSize and replication params when creating a file.



