[jira] Commented: (HDFS-472) Document hdfsproxy design and set-up guide
[ https://issues.apache.org/jira/browse/HDFS-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755387#action_12755387 ]

Hudson commented on HDFS-472:
-----------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #34 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/34/])

Document hdfsproxy design and set-up guide
------------------------------------------

    Key: HDFS-472
    URL: https://issues.apache.org/jira/browse/HDFS-472
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: contrib/hdfsproxy
    Reporter: zhiyong zhang
    Assignee: zhiyong zhang
    Fix For: 0.21.0
    Attachments: HDFS-472.patch, HDFS-472.patch, HDFS-472.patch, hdfsproxy.pdf, hdfsproxy.pdf

Currently hdfsproxy has only a README file that does not follow the code closely. More documentation is needed on the design, build, and set-up guide.
[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755388#action_12755388 ]

Hudson commented on HDFS-385:
-----------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #34 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/34/]). Add support for an API that allows a module external to HDFS to specify how HDFS blocks should be placed. (dhruba)

Design a pluggable interface to place replicas of blocks in HDFS
----------------------------------------------------------------

    Key: HDFS-385
    URL: https://issues.apache.org/jira/browse/HDFS-385
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: dhruba borthakur
    Assignee: dhruba borthakur
    Fix For: 0.21.0
    Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, BlockPlacementPluggable7.txt

The current HDFS code typically places one replica on the local rack, the second replica on a random remote rack, and the third replica on a random node of that remote rack. This algorithm is baked into the NameNode's code. It would be nice to make the block placement algorithm a pluggable interface. This would allow experimentation with different placement algorithms based on workloads, availability guarantees, and failure models.
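As a rough illustration, a pluggable policy could look something like the following. The class name BlockPlacementPolicy also appears in HDFS-623's build output later in this digest, but the method and its signature here are assumptions, not the committed HDFS-385 API:

{code}
import org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor;

// Illustrative sketch only; the real API's methods and parameters may differ.
public abstract class BlockPlacementPolicy {
  /**
   * Choose numReplicas datanodes to host the replicas of a new block of the
   * given file, preferring the writer's own node for the first replica.
   */
  public abstract DatanodeDescriptor[] chooseTarget(String srcPath,
      int numReplicas, DatanodeDescriptor writer, long blockSize);
}
{code}

A custom policy (for example, one tuned to a specific failure model) would then subclass this and be selected by configuration instead of being hard-coded in the NameNode.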
[jira] Commented: (HDFS-612) FSDataset should not use org.mortbay.log.Log
[ https://issues.apache.org/jira/browse/HDFS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755496#action_12755496 ]

Hudson commented on HDFS-612:
-----------------------------

Integrated in Hadoop-Hdfs-trunk #84 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/84/])

FSDataset should not use org.mortbay.log.Log
--------------------------------------------

    Key: HDFS-612
    URL: https://issues.apache.org/jira/browse/HDFS-612
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: data-node
    Affects Versions: 0.21.0
    Reporter: Tsz Wo (Nicholas), SZE
    Assignee: Tsz Wo (Nicholas), SZE
    Fix For: 0.21.0
    Attachments: h612_20090911.patch, h612_20090911b.patch

Some code in FSDataset uses org.mortbay.log.Log.
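The kind of change implied is swapping Jetty's logger for the class's own Commons Logging logger; a hedged sketch, with the class and call site below invented for illustration:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class FSDatasetExample {
  static final Log LOG = LogFactory.getLog(FSDatasetExample.class);

  void reportProblem(String msg) {
    // before: org.mortbay.log.Log.warn(msg);  // Jetty's logger, wrong dependency
    LOG.warn(msg);                             // Hadoop's Commons Logging logger
  }
}
{code}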
[jira] Commented: (HDFS-202) Add a bulk FileSystem.getFileBlockLocations
[ https://issues.apache.org/jira/browse/HDFS-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755567#action_12755567 ]

Sanjay Radia commented on HDFS-202:
-----------------------------------

bq. Maybe we should punt that until someone develops an append-savvy distcp?

+1

Why is DetailedFileStatus[] better than Map<FileStatus, BlockLocation[]>? The latter seems more transparent. I was holding out on a file system interface returning a map, but that is old school. Fine, I am convinced. I suspect you also want the RPC signature to return a map (that makes me more nervous because most RPCs do not support that - but ours does, I guess).

Wrt the new FileContext API, my proposal is that it provide a single getBlockLocations method:

  Map<FileStatus, BlockLocation[]> getBlockLocations(Path[] path)

and abandon the BlockLocation[] getBlockLocations(path, start, end). (Of course FileSystem will continue to support the old getBlockLocations.)

Add a bulk FileSystem.getFileBlockLocations
-------------------------------------------

    Key: HDFS-202
    URL: https://issues.apache.org/jira/browse/HDFS-202
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Arun C Murthy
    Assignee: Jakob Homan

Currently map-reduce applications (specifically file-based input-formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file. The downsides are multiple:
# Even with a few thousand files to process, the number of RPCs quickly starts getting noticeable.
# The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. Assuming a few thousand input files, it results in that many RPCs and 'searches'.

It would be nice to have a FileSystem.getFileBlockLocations which can take in a directory, and return the block-locations for all files in that directory. We could eliminate both the per-file RPC and also the 'search' by a 'scan'. When I tested this for terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. Clearly this is much more important for latency-sensitive applications...
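A self-contained sketch of the bulk call under discussion; the interface name is invented for illustration, and only the Map-returning method shape comes from the comment above:

{code}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// Hypothetical interface; not a committed Hadoop API.
interface BulkBlockLocations {
  /** One RPC/scan returning block locations for every file under the given paths. */
  Map<FileStatus, BlockLocation[]> getBlockLocations(Path[] paths)
      throws IOException;
}
{code}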
[jira] Updated: (HDFS-598) Eclipse launch task for HDFS
[ https://issues.apache.org/jira/browse/HDFS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HDFS-598:
---------------------------------

    Attachment: HDFS-598-v2.patch

Yep, Tom, didn't check that one after the backport. The move into hdfs and hdfs-with-mr confused it, too. I've updated the patch to work.

Eclipse launch task for HDFS
----------------------------

    Key: HDFS-598
    URL: https://issues.apache.org/jira/browse/HDFS-598
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: build
    Environment: Eclipse 3.5
    Reporter: Eli Collins
    Assignee: Eli Collins
    Priority: Trivial
    Attachments: HDFS-598-v2.patch, hdfs-598.patch

Porting HDFS launch task from HADOOP-5911. See MAPREDUCE-905.
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-617:
---------------------------

    Status: Open  (was: Patch Available)

Support for non-recursive create() in HDFS
------------------------------------------

    Key: HDFS-617
    URL: https://issues.apache.org/jira/browse/HDFS-617
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Attachments: h617-01.patch, h617-02.patch, h617-03.patch, h617-04.patch

HADOOP-4952 calls for a create call that doesn't automatically create missing parent directories.
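A hedged sketch of the intended semantics: fail when the parent directory is missing instead of silently creating it. The method name mirrors FileSystem#createNonRecursive, but the exact parameter list below is an assumption, not necessarily this patch's API (types are from org.apache.hadoop.fs and java.io):

{code}
// Sketch only: parameter list assumed for illustration.
void createStrict(FileSystem fs) throws IOException {
  try {
    FSDataOutputStream out = fs.createNonRecursive(new Path("/a/b/file"),
        true /* overwrite */, 4096, (short) 3, 64L << 20, null /* progress */);
    out.close();
  } catch (FileNotFoundException e) {
    // Parent /a/b is missing; unlike the classic create(), it is NOT
    // created automatically and the call fails instead.
  }
}
{code}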
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755575#action_12755575 ]

Sanjay Radia commented on HDFS-222:
-----------------------------------

Clearly this is a hack to support parallel copies of large files in distcp. (It is an embarrassment that Hadoop does not support this.) The proper way to do this is to create a first-class abstraction for a file as a container of blocks, but that is a long project. So the new concat method would be marked as limited-private.

Breaking the FileSystem abstraction issue - I don't get it: all file system impls can support a concat of files, though most cannot do it atomically.

Owen, are you proposing that we add this to DistributedFileSystem and not FileSystem, and that distcp does a class narrow to use it if it is available? I am fine with that.

Support for concatenating of files into a single file
------------------------------------------------------

    Key: HDFS-222
    URL: https://issues.apache.org/jira/browse/HDFS-222
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Venkatesh S
    Assignee: Boris Shkolnik

An API to concatenate files of the same size and replication factor on HDFS into a single larger file.
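A sketch of the "class narrow" idea being discussed, assuming a concat method on DistributedFileSystem; the method signature is an assumption, not a committed API:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class ConcatExample {
  void concatIfPossible(Configuration conf, Path target, Path[] parts)
      throws IOException {
    FileSystem fs = target.getFileSystem(conf);
    if (fs instanceof DistributedFileSystem) {
      // Narrow to the HDFS-specific type to reach the non-portable call.
      ((DistributedFileSystem) fs).concat(target, parts);  // assumed signature
    } else {
      // Fall back to byte-copying the parts into target sequentially.
    }
  }
}
{code}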
[jira] Commented: (HDFS-598) Eclipse launch task for HDFS
[ https://issues.apache.org/jira/browse/HDFS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755595#action_12755595 ]

Eli Collins commented on HDFS-598:
----------------------------------

Thanks Phil!

Eclipse launch task for HDFS
----------------------------

    Key: HDFS-598
    URL: https://issues.apache.org/jira/browse/HDFS-598
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: build
    Environment: Eclipse 3.5
    Reporter: Eli Collins
    Assignee: Eli Collins
    Priority: Trivial
    Attachments: HDFS-598-v2.patch, hdfs-598.patch

Porting HDFS launch task from HADOOP-5911. See MAPREDUCE-905.
[jira] Commented: (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755610#action_12755610 ]

Faisal Khan commented on HDFS-573:
----------------------------------

I ran unit tests on Ziliang's patch for libhdfs on Linux, and here is the output: http://pages.cs.wisc.edu/~faisal/libhdfs_testresult.txt . Tests look ok.

Porting libhdfs to Windows
--------------------------

    Key: HDFS-573
    URL: https://issues.apache.org/jira/browse/HDFS-573
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client
    Environment: Windows, Visual Studio 2008
    Reporter: Ziliang Guo
    Original Estimate: 336h
    Remaining Estimate: 336h

The current C code in libhdfs is written using C99 conventions and also uses a few POSIX-specific functions such as hcreate, hsearch, and pthread mutex locks. To compile it using Visual Studio would require converting the code in hdfsJniHelper.c and hdfs.c to C89 and replacing/reimplementing the POSIX functions. The code also uses the stdint.h header, which is not part of the original C89, but there is what appears to be a BSD-licensed reimplementation floating around, written to be compatible with MSVC. I have already done the other necessary conversions, created a simplistic hash bucket for use with hcreate and hsearch, and successfully built a DLL of libhdfs. Further testing is needed to see if it is usable by other programs to actually access HDFS, which will likely happen in the next few weeks as the Condor Project continues with its file transfer work.

In the process, I've removed a few consts that I believe are extraneous and also fixed an incorrect array initialization where someone was attempting to initialize with something like this:

  JavaVMOption options[noArgs];

where noArgs was being incremented in the code above. This was in the hdfsJniHelper.c file, in the getJNIEnv function.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755619#action_12755619 ]

Doug Cutting commented on HDFS-222:
-----------------------------------

bq. add this to DistributedFileSystem and not FileSystem and that distcp does a class narrow to use it if it is available

+1 This sounds like a reasonable plan.

Support for concatenating of files into a single file
------------------------------------------------------

    Key: HDFS-222
    URL: https://issues.apache.org/jira/browse/HDFS-222
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Venkatesh S
    Assignee: Boris Shkolnik

An API to concatenate files of the same size and replication factor on HDFS into a single larger file.
[jira] Commented: (HDFS-202) Add a bulk FileSystem.getFileBlockLocations
[ https://issues.apache.org/jira/browse/HDFS-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755621#action_12755621 ]

dhruba borthakur commented on HDFS-202:
---------------------------------------

+1

Add a bulk FileSystem.getFileBlockLocations
-------------------------------------------

    Key: HDFS-202
    URL: https://issues.apache.org/jira/browse/HDFS-202
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Arun C Murthy
    Assignee: Jakob Homan

Currently map-reduce applications (specifically file-based input-formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file. The downsides are multiple:
# Even with a few thousand files to process, the number of RPCs quickly starts getting noticeable.
# The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. Assuming a few thousand input files, it results in that many RPCs and 'searches'.

It would be nice to have a FileSystem.getFileBlockLocations which can take in a directory, and return the block-locations for all files in that directory. We could eliminate both the per-file RPC and also the 'search' by a 'scan'. When I tested this for terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. Clearly this is much more important for latency-sensitive applications...
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755627#action_12755627 ]

Raghu Angadi commented on HDFS-516:
-----------------------------------

bq. somehow, from 213 seconds to 112 seconds to stream 1GB from a remote HDFS file.

This is 5 MBps for HDFS and 9 MBps for RadFS. Assuming 9 MBps is probably the 100 Mbps network limit (is it?), 5 MBps is too low for any FS. Since both reads are from the same physical files, this may not be hardware related. Could you check what is causing this delay? It might be affecting other benchmarks as well. Checking netstat on the client while this read is going on might help.

Regarding reads in RadFS: does the client fetch 32KB each time (single RPC), or does it pipeline (multiple requests for a single client's stream)?

@Todd, I essentially see this as a POC of what could/should be improved in HDFS for addressing latency issues. Contrib makes sense, but I would not expect this to go to production in this form, and it should be marked 'Experimental'. The benchmarks also help greatly in setting priorities for features. I don't think this needs a branch since it does not touch core at all.

Low Latency distributed reads
-----------------------------

    Key: HDFS-516
    URL: https://issues.apache.org/jira/browse/HDFS-516
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Jay Booth
    Priority: Minor
    Attachments: hdfs-516-20090912.patch
    Original Estimate: 168h
    Remaining Estimate: 168h

I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side. Some applications could include Lucene searching (term-doc and doc-offset mappings are likely to be in local cache, thus much faster than Nutch's current FsDirectory impl) and binary search through record files (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
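For readers unfamiliar with the client-side technique the description mentions, here is a toy LRU chunk cache in the spirit of that design; the class and field names are invented for illustration and are not from the attached patch:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache of fixed-size chunks keyed by file offset.
class ChunkCache extends LinkedHashMap<Long, byte[]> {
  private final int maxChunks;

  ChunkCache(int maxChunks) {
    super(16, 0.75f, true);  // accessOrder = true gives LRU iteration order
    this.maxChunks = maxChunks;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
    return size() > maxChunks;  // evict the least-recently-used chunk
  }
}
{code}

Lookahead would then amount to populating the cache with the next few chunks whenever a read misses.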
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-617:
---------------------------

    Status: Patch Available  (was: Open)

Support for non-recursive create() in HDFS
------------------------------------------

    Key: HDFS-617
    URL: https://issues.apache.org/jira/browse/HDFS-617
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Attachments: h617-01.patch, h617-02.patch, h617-03.patch, h617-04.patch, h617-06.patch

HADOOP-4952 calls for a create call that doesn't automatically create missing parent directories.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755640#action_12755640 ]

Todd Lipcon commented on HDFS-516:
----------------------------------

bq. I essentially see this as POC of what could/should be improved in HDFS for addressing latency issues. Contrib makes sense, but I would not expect this to go to production in this form and should be marked 'Experimental'.

+1

Low Latency distributed reads
-----------------------------

    Key: HDFS-516
    URL: https://issues.apache.org/jira/browse/HDFS-516
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Jay Booth
    Priority: Minor
    Attachments: hdfs-516-20090912.patch
    Original Estimate: 168h
    Remaining Estimate: 168h

I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side. Some applications could include Lucene searching (term-doc and doc-offset mappings are likely to be in local cache, thus much faster than Nutch's current FsDirectory impl) and binary search through record files (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
[jira] Updated: (HDFS-616) Create functional tests for new design of the block report
[ https://issues.apache.org/jira/browse/HDFS-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-616:
------------------------------------

    Attachment: HDFS-616.patch

This patch adds BlockReport03 through BlockReport08 cases according to HDFS-551's test plan. The modifications are done against the Append branch.

Create functional tests for new design of the block report
-----------------------------------------------------------

    Key: HDFS-616
    URL: https://issues.apache.org/jira/browse/HDFS-616
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Reporter: Konstantin Boudnik
    Assignee: Konstantin Boudnik
    Attachments: HDFS-616.patch
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-617:
----------------------------------------

    Component/s: name-node
                 hdfs client
    Fix Version/s: 0.21.0
    Hadoop Flags: [Reviewed]

+1 the patch is perfect!

Support for non-recursive create() in HDFS
------------------------------------------

    Key: HDFS-617
    URL: https://issues.apache.org/jira/browse/HDFS-617
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Fix For: 0.21.0
    Attachments: h617-01.patch, h617-02.patch, h617-03.patch, h617-04.patch, h617-06.patch

HADOOP-4952 calls for a create call that doesn't automatically create missing parent directories.
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-617:
----------------------------------------

    Resolution: Fixed
    Hadoop Flags: [Incompatible change, Reviewed]  (was: [Reviewed])
    Status: Resolved  (was: Patch Available)

I have committed this. Thanks, Kan!

Please add a release note.

Support for non-recursive create() in HDFS
------------------------------------------

    Key: HDFS-617
    URL: https://issues.apache.org/jira/browse/HDFS-617
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Fix For: 0.21.0
    Attachments: h617-01.patch, h617-02.patch, h617-03.patch, h617-04.patch, h617-06.patch

HADOOP-4952 calls for a create call that doesn't automatically create missing parent directories.
[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
[ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755716#action_12755716 ]

Tsz Wo (Nicholas), SZE commented on HDFS-621:
---------------------------------------------

I guess you need a separate MAPREDUCE patch for MiniMR.

Exposing MiniDFS and MiniMR clusters as a single process command-line
----------------------------------------------------------------------

    Key: HDFS-621
    URL: https://issues.apache.org/jira/browse/HDFS-621
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: test, tools
    Reporter: Philip Zeyliger
    Priority: Minor

It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type

  bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster

to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess. I've been using just such a system for a couple of weeks, and I like it. It's significantly easier than developing a lot of scripts to start a pseudo-distributed cluster and then clean up after it. I figure others might find it useful as well.

I'm at a bit of a loss as to where to put it in 0.21. hdfs-with-mr tests have all the required libraries, so I've put it there. I could conceivably split this into minimr and minihdfs, but it's specifically the fact that they're configured to talk to each other that I like about having them together. And one JVM is better than two for my test programs.
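Roughly what such an entry point wraps, for readers who haven't used the mini clusters; the constructor arguments below are from memory of the 0.20-era test APIs and may not match this patch exactly:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

class MiniClusterExample {
  void start() throws Exception {
    Configuration conf = new Configuration();
    // One-datanode HDFS cluster, freshly formatted.
    MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);
    // One-tasktracker MR cluster pointed at the mini HDFS.
    MiniMRCluster mr =
        new MiniMRCluster(1, dfs.getFileSystem().getUri().toString(), 1);
  }
}
{code}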
[jira] Updated: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
[ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HDFS-621:
---------------------------------

    Assignee: Philip Zeyliger
    Status: Patch Available  (was: Open)

Exposing MiniDFS and MiniMR clusters as a single process command-line
----------------------------------------------------------------------

    Key: HDFS-621
    URL: https://issues.apache.org/jira/browse/HDFS-621
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: test, tools
    Reporter: Philip Zeyliger
    Assignee: Philip Zeyliger
    Priority: Minor
    Attachments: HDFS-621.patch

It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type

  bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster

to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess. I've been using just such a system for a couple of weeks, and I like it. It's significantly easier than developing a lot of scripts to start a pseudo-distributed cluster and then clean up after it. I figure others might find it useful as well.

I'm at a bit of a loss as to where to put it in 0.21. hdfs-with-mr tests have all the required libraries, so I've put it there. I could conceivably split this into minimr and minihdfs, but it's specifically the fact that they're configured to talk to each other that I like about having them together. And one JVM is better than two for my test programs.
[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
[ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755743#action_12755743 ]

Tsz Wo (Nicholas), SZE commented on HDFS-621:
---------------------------------------------

bq. If you would, take a quick look to see how I use MiniMRCluster. Do you feel I'm abusing the fact that hdfs-hdfswithmr-test exists?

No more mapreduce code in hdfs, please. Having hdfs-with-mr in hdfs is a mistake. It leads to a circular dependency. Indeed, we should move hdfs-with-mr to mapreduce.

Exposing MiniDFS and MiniMR clusters as a single process command-line
----------------------------------------------------------------------

    Key: HDFS-621
    URL: https://issues.apache.org/jira/browse/HDFS-621
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: test, tools
    Reporter: Philip Zeyliger
    Assignee: Philip Zeyliger
    Priority: Minor
    Attachments: HDFS-621.patch

It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type

  bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster

to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess. I've been using just such a system for a couple of weeks, and I like it. It's significantly easier than developing a lot of scripts to start a pseudo-distributed cluster and then clean up after it. I figure others might find it useful as well.

I'm at a bit of a loss as to where to put it in 0.21. hdfs-with-mr tests have all the required libraries, so I've put it there. I could conceivably split this into minimr and minihdfs, but it's specifically the fact that they're configured to talk to each other that I like about having them together. And one JVM is better than two for my test programs.
[jira] Updated: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
[ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HDFS-621:
---------------------------------

    Attachment: HDFS-621-0.20-patch

Attaching the 0.20 version. Conveniently, where to place it is not a problem there :)

Exposing MiniDFS and MiniMR clusters as a single process command-line
----------------------------------------------------------------------

    Key: HDFS-621
    URL: https://issues.apache.org/jira/browse/HDFS-621
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: test, tools
    Reporter: Philip Zeyliger
    Assignee: Philip Zeyliger
    Priority: Minor
    Attachments: HDFS-621-0.20-patch, HDFS-621.patch

It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type

  bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster

to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess. I've been using just such a system for a couple of weeks, and I like it. It's significantly easier than developing a lot of scripts to start a pseudo-distributed cluster and then clean up after it. I figure others might find it useful as well.

I'm at a bit of a loss as to where to put it in 0.21. hdfs-with-mr tests have all the required libraries, so I've put it there. I could conceivably split this into minimr and minihdfs, but it's specifically the fact that they're configured to talk to each other that I like about having them together. And one JVM is better than two for my test programs.
[jira] Updated: (HDFS-618) Support for non-recursive mkdir in HDFS
[ https://issues.apache.org/jira/browse/HDFS-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-618:
---------------------------

    Attachment: h618-06.patch

Support for non-recursive mkdir in HDFS
---------------------------------------

    Key: HDFS-618
    URL: https://issues.apache.org/jira/browse/HDFS-618
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Affects Versions: 0.21.0
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Attachments: h618-03.patch, h618-04.patch, h618-06.patch

The existing mkdirs call automatically creates missing parent directories. HADOOP-4952 calls for a mkdir call that doesn't.
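As a rough illustration of the semantics in question (fc is assumed to be an org.apache.hadoop.fs.FileContext; the parameter shape is an assumption, not necessarily this patch's API):

{code}
// With createParent == false, the call is expected to fail when /a/b is
// missing, rather than silently creating the missing parents.
fc.mkdir(new Path("/a/b/c"), FsPermission.getDefault(), false /* createParent */);
{code}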
[jira] Commented: (HDFS-618) Support for non-recursive mkdir in HDFS
[ https://issues.apache.org/jira/browse/HDFS-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755757#action_12755757 ]

Kan Zhang commented on HDFS-618:
--------------------------------

Attached a new patch for the latest trunk. Also updated the test to check for FileAlreadyExistsException.

Support for non-recursive mkdir in HDFS
---------------------------------------

    Key: HDFS-618
    URL: https://issues.apache.org/jira/browse/HDFS-618
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Affects Versions: 0.21.0
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Attachments: h618-03.patch, h618-04.patch, h618-06.patch

The existing mkdirs call automatically creates missing parent directories. HADOOP-4952 calls for a mkdir call that doesn't.
[jira] Updated: (HDFS-574) Hadoop Doc Split: HDFS Docs
[ https://issues.apache.org/jira/browse/HDFS-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel updated HDFS-574:
---------------------------------

    Attachment: hdfs-logo.jpg

New logo file: hdfs-logo.jpg

Hadoop Doc Split: HDFS Docs
---------------------------

    Key: HDFS-574
    URL: https://issues.apache.org/jira/browse/HDFS-574
    Project: Hadoop HDFS
    Issue Type: Task
    Components: documentation
    Affects Versions: 0.21.0
    Reporter: Corinne Chandel
    Assignee: Owen O'Malley
    Priority: Blocker
    Attachments: Hadoop-Doc-Split-2.doc, Hadoop-Doc-Split.doc, HDFS-574-hdfs.patch, hdfs-logo.jpg

Hadoop Doc Split: HDFS Docs

Please note that I am unable to directly check all of the new links. Some links may break and will need to be updated.
[jira] Updated: (HDFS-617) Support for non-recursive create() in HDFS
[ https://issues.apache.org/jira/browse/HDFS-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-617:
---------------------------

    Release Note: Support for non-recursive create()

Support for non-recursive create() in HDFS
------------------------------------------

    Key: HDFS-617
    URL: https://issues.apache.org/jira/browse/HDFS-617
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Fix For: 0.21.0
    Attachments: h617-01.patch, h617-02.patch, h617-03.patch, h617-04.patch, h617-06.patch

HADOOP-4952 calls for a create call that doesn't automatically create missing parent directories.
[jira] Updated: (HDFS-616) Create functional tests for new design of the block report
[ https://issues.apache.org/jira/browse/HDFS-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-616:
------------------------------------

    Attachment: HDFS-616.patch

All test cases are now in place. Three of them keep failing because some of the functionality isn't implemented yet.

Create functional tests for new design of the block report
-----------------------------------------------------------

    Key: HDFS-616
    URL: https://issues.apache.org/jira/browse/HDFS-616
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: test
    Affects Versions: Append Branch
    Reporter: Konstantin Boudnik
    Assignee: Konstantin Boudnik
    Attachments: HDFS-616.patch, HDFS-616.patch
[jira] Updated: (HDFS-551) Create new functional test for a block report.
[ https://issues.apache.org/jira/browse/HDFS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-551:
------------------------------------

    Attachment: BlockReportTestPlan.html

The last three test cases were removed because they essentially re-enforce the behavior of BlockReport_09.

Create new functional test for a block report.
-----------------------------------------------

    Key: HDFS-551
    URL: https://issues.apache.org/jira/browse/HDFS-551
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: test
    Affects Versions: 0.21.0
    Reporter: Konstantin Boudnik
    Assignee: Konstantin Boudnik
    Fix For: 0.21.0
    Attachments: BlockReportTestPlan.html, BlockReportTestPlan.html, BlockReportTestPlan.html, BlockReportTestPlan.html, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch, HDFS-551.patch

It turned out that there's no test for block report functionality. One would be extremely valuable.
[jira] Updated: (HDFS-592) Allow client to get a new generation stamp from NameNode
[ https://issues.apache.org/jira/browse/HDFS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-592:
-------------------------------

    Attachment: newGS2.patch

This patch incorporates Kan's review comments and adds a unit test for the new API in ClientProtocol.

I understand your concern about the naming of the API, and what you said makes good sense. But I still do not like the name pipelineRecovery or recoverPipeline. This patch uses the name getNewStampForPipeline. If you still do not like the name, could we resolve the naming issue later? I will keep this in mind.

Allow client to get a new generation stamp from NameNode
---------------------------------------------------------

    Key: HDFS-592
    URL: https://issues.apache.org/jira/browse/HDFS-592
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: name-node
    Affects Versions: Append Branch
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: Append Branch
    Attachments: newGS.patch, newGS1.patch, newGS2.patch

This issue aims to add an API to ClientProtocol that fetches a new generation stamp and an access token from NameNode to support append or pipeline recovery.
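For context, the rough shape of the call being named; only the method name getNewStampForPipeline comes from the comment above, while the parameters and return type are assumptions:

{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.Block;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

// Hypothetical sketch of the ClientProtocol addition under discussion.
interface ClientProtocolAddition {
  /**
   * Fetch a new generation stamp (and access token) for the given block,
   * to support append or pipeline recovery.
   */
  LocatedBlock getNewStampForPipeline(Block block, String clientName)
      throws IOException;
}
{code}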
[jira] Commented: (HDFS-592) Allow client to get a new generation stamp from NameNode
[ https://issues.apache.org/jira/browse/HDFS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755800#action_12755800 ]

Hairong Kuang commented on HDFS-592:
------------------------------------

    [exec] +1 overall.
    [exec]
    [exec]     +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec]     +1 tests included. The patch appears to include 6 new or modified tests.
    [exec]
    [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

Allow client to get a new generation stamp from NameNode
---------------------------------------------------------

    Key: HDFS-592
    URL: https://issues.apache.org/jira/browse/HDFS-592
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: name-node
    Affects Versions: Append Branch
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: Append Branch
    Attachments: newGS.patch, newGS1.patch, newGS2.patch

This issue aims to add an API to ClientProtocol that fetches a new generation stamp and an access token from NameNode to support append or pipeline recovery.
[jira] Commented: (HDFS-592) Allow client to get a new generation stamp from NameNode
[ https://issues.apache.org/jira/browse/HDFS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755816#action_12755816 ]

Kan Zhang commented on HDFS-592:
--------------------------------

bq. If you still do not like the name, could we resolve the naming issue later?

Well, I have said enough on this. It's your call.

I have a further comment on the following lease checking. It seems that if the client sends NULL for clientName, the checking is bypassed, which could become a security loophole.

{code}
+    if (clientName != null
+        && !pendingFile.getClientName().equals(clientName)) {
+      throw new LeaseExpiredException("Lease mismatch: " + block +
+          " owned by " + pendingFile.getClientName() +
+          " but is accessed by " + clientName);
+    }
{code}

Allow client to get a new generation stamp from NameNode
---------------------------------------------------------

    Key: HDFS-592
    URL: https://issues.apache.org/jira/browse/HDFS-592
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: name-node
    Affects Versions: Append Branch
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: Append Branch
    Attachments: newGS.patch, newGS1.patch, newGS2.patch

This issue aims to add an API to ClientProtocol that fetches a new generation stamp and an access token from NameNode to support append or pipeline recovery.
[jira] Updated: (HDFS-592) Allow client to get a new generation stamp from NameNode
[ https://issues.apache.org/jira/browse/HDFS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-592:
-------------------------------

    Attachment: newGS3.patch

Kan, thanks for catching this. ClientName should never be null in the real system. But anyway, the new patch checks the null case and adds a new unit test for this.

Allow client to get a new generation stamp from NameNode
---------------------------------------------------------

    Key: HDFS-592
    URL: https://issues.apache.org/jira/browse/HDFS-592
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: name-node
    Affects Versions: Append Branch
    Reporter: Hairong Kuang
    Assignee: Hairong Kuang
    Fix For: Append Branch
    Attachments: newGS.patch, newGS1.patch, newGS2.patch, newGS3.patch

This issue aims to add an API to ClientProtocol that fetches a new generation stamp and an access token from NameNode to support append or pipeline recovery.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755823#action_12755823 ]

Jay Booth commented on HDFS-516:
--------------------------------

Yeah, I was puzzled by the performance too. I dug through the DFS code and I'm saving a bit on new socket and object creation, maybe a couple instructions here and there, but that shouldn't add up to 100 seconds for a gigabyte (approx 20 blocks). I'm calling read() a bajillion times in a row, so it's conceivable (although unlikely) that I'm pegging the CPU and that's the limiting factor. I'm busy for a couple days but will get back to you with some figures from netstat, top, and whatever else I can think of, along with another streaming case that works with read(b, off, len) to see if that changes things. I'll do a little more digging into DFS as well to see if I can isolate the cause.

I definitely did run them several times on the same machine, and another time on a different cluster with similar results, so it wasn't simply bad luck on the rack placement on EC2 (well, maybe, but unlikely). Will report back when I have more numbers.

After I get those, my roadmap for this is to add checksum support and better DatanodeInfo caching. User groups would come after that.

Low Latency distributed reads
-----------------------------

    Key: HDFS-516
    URL: https://issues.apache.org/jira/browse/HDFS-516
    Project: Hadoop HDFS
    Issue Type: New Feature
    Reporter: Jay Booth
    Priority: Minor
    Attachments: hdfs-516-20090912.patch
    Original Estimate: 168h
    Remaining Estimate: 168h

I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side. Some applications could include Lucene searching (term-doc and doc-offset mappings are likely to be in local cache, thus much faster than Nutch's current FsDirectory impl) and binary search through record files (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
[jira] Commented: (HDFS-618) Support for non-recursive mkdir in HDFS
[ https://issues.apache.org/jira/browse/HDFS-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755825#action_12755825 ]

Hadoop QA commented on HDFS-618:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12419697/h618-06.patch
  against trunk revision 815496.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 6 new or modified tests.

    -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/8/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/8/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/8/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/8/console

This message is automatically generated.

Support for non-recursive mkdir in HDFS
---------------------------------------

    Key: HDFS-618
    URL: https://issues.apache.org/jira/browse/HDFS-618
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client, name-node
    Affects Versions: 0.21.0
    Reporter: Kan Zhang
    Assignee: Kan Zhang
    Attachments: h618-03.patch, h618-04.patch, h618-06.patch

The existing mkdirs call automatically creates missing parent directories. HADOOP-4952 calls for a mkdir call that doesn't.
[jira] Created: (HDFS-622) checkMinReplication should count only live node.
checkMinReplication should count only live node.
------------------------------------------------

    Key: HDFS-622
    URL: https://issues.apache.org/jira/browse/HDFS-622
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: name-node
    Affects Versions: 0.21.0
    Reporter: Konstantin Shvachko
    Assignee: Konstantin Shvachko
    Fix For: 0.21.0

{{BlockManager.checkMinReplication(Block)}} currently counts all replicas of the block, even if they are corrupt. Corrupt replicas should be excluded.
[jira] Updated: (HDFS-622) checkMinReplication should count only live node.
[ https://issues.apache.org/jira/browse/HDFS-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-622:
-------------------------------------

    Attachment: liveReplicas.patch

Here is a simple patch which replaces {{blocksMap.numNodes(block) >= minReplication}} with {{countNodes(block).liveReplicas() >= minReplication}}.

checkMinReplication should count only live node.
------------------------------------------------

    Key: HDFS-622
    URL: https://issues.apache.org/jira/browse/HDFS-622
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: name-node
    Affects Versions: 0.21.0
    Reporter: Konstantin Shvachko
    Assignee: Konstantin Shvachko
    Fix For: 0.21.0
    Attachments: liveReplicas.patch

{{BlockManager.checkMinReplication(Block)}} currently counts all replicas of the block, even if they are corrupt. Corrupt replicas should be excluded.
[jira] Updated: (HDFS-622) checkMinReplication should count only live node.
[ https://issues.apache.org/jira/browse/HDFS-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-622:
-------------------------------------

    Status: Patch Available  (was: Open)

checkMinReplication should count only live node.
------------------------------------------------

    Key: HDFS-622
    URL: https://issues.apache.org/jira/browse/HDFS-622
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: name-node
    Affects Versions: 0.21.0
    Reporter: Konstantin Shvachko
    Assignee: Konstantin Shvachko
    Fix For: 0.21.0
    Attachments: liveReplicas.patch

{{BlockManager.checkMinReplication(Block)}} currently counts all replicas of the block, even if they are corrupt. Corrupt replicas should be excluded.
[jira] Created: (HDFS-623) hdfs jar-test ant target fails with the latest commons jar's from the common trunk
hdfs jar-test ant target fails with the latest commons jars from the common trunk
----------------------------------------------------------------------------------

    Key: HDFS-623
    URL: https://issues.apache.org/jira/browse/HDFS-623
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: Giridharan Kesavan

    [javac] somelocation/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestReplicationPolicy.java:67: incompatible types
    [javac] found   : org.apache.hadoop.hdfs.server.namenode.ReplicationTargetChooser
    [javac] required: org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy
    [javac]     replicator = fsNamesystem.blockManager.replicator;
    [javac]                                            ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 5 errors