[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777202#action_12777202 ]
Hudson commented on HDFS-222:
Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #72 (See [http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/])
Support for concatenating of files into a single file
Key: HDFS-222
URL: https://issues.apache.org/jira/browse/HDFS-222
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Venkatesh S
Assignee: Boris Shkolnik
Fix For: 0.22.0
Attachments: HDFS-222-1.patch, HDFS-222-10.patch, HDFS-222-10.patch, HDFS-222-2.patch, HDFS-222-3.patch, HDFS-222-4.patch, HDFS-222-5.patch, HDFS-222-6.patch, HDFS-222-7.patch, HDFS-222-8.patch, HDFS-222-9.patch, HDFS-222-9.patch, HDFS-222.patch
An API to concatenate files of same size and replication factor on HDFS into a single larger file.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771418#action_12771418 ]
Hudson commented on HDFS-222:
Integrated in Hadoop-Hdfs-trunk #124 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/124/])
Support for concatenating of files into a single file without copying. Contributed by Boris Shkolnik.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771110#action_12771110 ]
Hairong Kuang commented on HDFS-222:
+1 The patch looks good.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770696#action_12770696 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423359/HDFS-222-10.patch
against trunk revision 830003.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770697#action_12770697 ]
Hairong Kuang commented on HDFS-222:
Hope that this is the last comment :-)
1. FSNamesystem.java: there is no need for an isDirectory() check on an INodeFile.
2. FSDirectory.java: optimization: when removing the src inode, there is no need to traverse the src path again to collect all inodes on the path, since the src inode and its ancestor inodes are already known.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770762#action_12770762 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423380/HDFS-222-10.patch
against trunk revision 830003.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770366#action_12770366 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423268/HDFS-222-9.patch
against trunk revision 830003.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769372#action_12769372 ]
Hairong Kuang commented on HDFS-222:
Boris, sorry that there are more comments. I hope that this will be the last iteration:
# Examine all logging at the info level to see whether you can either remove it or change it to the debug level.
# Remove the change to BlockManager.java, since the code is already commented out.
# FSNamesystem.java:
#* no need to verify quota, since target and sources are in the same directory
#* should use getINodeFile to get the inodes of target and sources
#* no need to check whether srcInode is root
#* minor: concatInternal would be better as a synchronized method in FSDirectory
# FSDirectory.java:
#* no need to track or update disk space consumed in unprotectedConcat
#* in deleteFileWithoutCommit, there is no need to update the parent's modification time, or the disk space consumed, for every source
#* you might be able to remove deleteFileWithoutCommit by taking advantage of the fact that target and sources are under one parent
#* minor: optimize the number of copies when concatenating blocks
# FSEditLog.java: check the version number when handling OP_CONCAT_DELETE in loadEditLogs.
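The version check requested in the FSEditLog.java item can be pictured with a small sketch. It assumes that layout versions are negative integers that decrease as the on-disk format evolves; the class name, the opcode value and the threshold below are placeholders chosen for illustration and are not taken from the patch.

```java
import java.io.IOException;

/** Illustrative only: gates a new edit-log opcode on the layout version of the log being read. */
final class EditLogVersionSketch {
    // Assumption: placeholder opcode value for OP_CONCAT_DELETE.
    static final byte OP_CONCAT_DELETE = 16;
    // Assumption: first layout version whose edit logs may contain the new opcode.
    // Layout versions are taken to be negative and to decrease over time.
    static final int FIRST_VERSION_WITH_CONCAT = -22;

    /** True if an edit log written at logVersion may legitimately contain OP_CONCAT_DELETE. */
    static boolean supportsConcat(int logVersion) {
        return logVersion <= FIRST_VERSION_WITH_CONCAT;
    }

    /** Rejects the opcode when it shows up in a log that predates it. */
    static void checkOpcode(byte opcode, int logVersion) throws IOException {
        if (opcode == OP_CONCAT_DELETE && !supportsConcat(logVersion)) {
            throw new IOException("Unexpected opcode " + opcode + " for version " + logVersion);
        }
    }
}
```

With such a guard, an older edit log that somehow contains the new opcode is rejected with an explicit error instead of being misread.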
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767570#action_12767570 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12422587/HDFS-222-8.patch
against trunk revision 826149.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766430#action_12766430 ]
Venkatesh S commented on HDFS-222:
"Perhaps we should restrict the operation to concat files in the same directory."
Reasonable, but can it be recursive and include subdirectories as well?
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766353#action_12766353 ]
Hairong Kuang commented on HDFS-222:
"Perhaps we should restrict the operation to concat files in the same directory."
+1. This is a reasonable restriction that makes the code much cleaner and much less error-prone.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763328#action_12763328 ]
Boris Shkolnik commented on HDFS-222:
dhruba borthakur wrote:
bq. Does it make it easier to use if this can be bin/hadoop hdfs -concat fileA fileB
That can be done, unless anyone objects. I will look into this.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763330#action_12763330 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12421595/HDFS-222-5.patch
against trunk revision 822153.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762548#action_12762548 ]
dhruba borthakur commented on HDFS-222:
This is a useful feature to have. I am guessing that there will be a few more hdfs-specific tools that we will develop going forward. This tool is currently invoked by bin/hadoop jar org.apache.hdfs.tools.HDFSConcat fileA fileB. Would it be easier to use if this were bin/hadoop hdfs -concat fileA fileB?
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762324#action_12762324 ]
Hairong Kuang commented on HDFS-222:
Some initial comments:
* ClientProtocol.java:
*# the protocol's version should be bumped;
*# unnecessary changes to the rename signature.
* FSNamesystem.java:
*# I would suggest the following changes to the code organization so that the method naming is consistent with existing namespace changes:
concat: an unsynchronized method which contains non-inode-related checks on the input parameters, calls concatInternal, and syncs the edit log;
concatInternal: a synchronized private method which does the real work;
remove unprotectedConcat in FSNamesystem and add a method concat to FSDirectory which performs all inode-related checks and namespace changes.
*# permission checking: I would prefer to perform permission checking on target and srcs in one place. We need WRITE permission on the parent of the source node, not on the ancestor.
*# Block size checking could be simplified by requiring that all files have the same preferred block size and that each file's last block is full, except for the last file.
*# INodeFile means that the inode represents a file, so checking whether an inode is a directory should be performed before converting the inode to INodeFile.
* FSEditLog.java: since the edit log has a new op, the on-disk layout version should be updated.
* minor: all concat-related methods should have the same signatures. Some of them have src as the 2nd parameter. For the first parameter, I prefer to use target instead of trg.
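The simplified block-size rule suggested above can be sketched as a single pass over the files in concat order. This is only an illustration under stated assumptions: FileInfo and blockSizesAllowConcat are hypothetical names, not HDFS classes or methods, and a file length that is a whole multiple of the preferred block size is taken to mean that its last block is full.

```java
import java.util.List;

/** Illustrative only: the simplified block-size rule for concat candidates. */
final class BlockSizeCheckSketch {
    /** Hypothetical stand-in for per-file metadata; not an HDFS class. */
    static final class FileInfo {
        final long preferredBlockSize;
        final long length;

        FileInfo(long preferredBlockSize, long length) {
            this.preferredBlockSize = preferredBlockSize;
            this.length = length;
        }
    }

    /**
     * filesInOrder lists the target first, then the sources in concat order.
     * Every file must share one preferred block size and be non-empty, and every
     * file except the last one must end on a full block.
     */
    static boolean blockSizesAllowConcat(List<FileInfo> filesInOrder) {
        if (filesInOrder.isEmpty()) {
            return false;
        }
        long blockSize = filesInOrder.get(0).preferredBlockSize;
        if (blockSize <= 0) {
            return false;
        }
        for (int i = 0; i < filesInOrder.size(); i++) {
            FileInfo f = filesInOrder.get(i);
            if (f.preferredBlockSize != blockSize || f.length <= 0) {
                return false;
            }
            boolean isLastFile = (i == filesInOrder.size() - 1);
            // A length that is a whole multiple of the block size means the final block is full.
            if (!isLastFile && f.length % blockSize != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Only the final file may end in a partial block, so the combined file still has at most one short block, at its end.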
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759818#action_12759818 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420606/HDFS-222-4.patch
against trunk revision 818801.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 14 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759313#action_12759313 ]
Hadoop QA commented on HDFS-222:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420476/HDFS-222-3.patch
against trunk revision 818575.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 14 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 20 javac compiler warnings (more than the trunk's current 18 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
-1 release audit. The applied patch generated 106 release audit warnings (more than the trunk's current 105 warnings).
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/console
This message is automatically generated.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756627#action_12756627 ]
Boris Shkolnik commented on HDFS-222:
Here is the tentative plan:
# API void concat(String trg, String ... srcs) will be added to DistributedFileSystem and DFSClient.
# The actual implementation will be in FSNamesystem.java and FSDirectory.java.
# The following prerequisites will be checked before the actual blocks are moved:
#* Files are not empty and not null
#* NameNode is not in SafeMode
#* Permissions are valid:
#** Write permissions for target file
#** Read permissions for src files
#** Write permissions in the source parent directory (for delete).
# All the blocks of all the files are of the same size and same replication level.
Actions:
# Actual blocks moved to the target file
# Access/modification times to be updated for the following files:
#* target file
#* src directory
# Quotas updated:
#* Target directory - NSQuota +0, DSQuota +Sum(block sizes)
#* Src directory - NSQuota -1, DSQuota -Sum(block sizes)
Error handling using exceptions.
Note: Should srcs be an array instead of varargs? Seems safer and easier to use.
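The quota arithmetic in the plan can be written out as a short sketch. The names below (ConcatQuotaSketch, QuotaDelta, sourceDirDelta, targetDirDelta) are hypothetical, and the sketch follows the plan literally by summing block sizes without accounting for replication. A later review comment in this thread notes that quota verification is unnecessary once target and sources share a directory.

```java
import java.util.List;

/** Illustrative only: quota deltas as listed in the tentative plan. */
final class ConcatQuotaSketch {
    /** Hypothetical pair of namespace (NSQuota) and diskspace (DSQuota) changes. */
    static final class QuotaDelta {
        final long namespace;
        final long diskspace;

        QuotaDelta(long namespace, long diskspace) {
            this.namespace = namespace;
            this.diskspace = diskspace;
        }
    }

    static long sum(long[] blockSizes) {
        long total = 0;
        for (long size : blockSizes) {
            total += size;
        }
        return total;
    }

    /** Directory of one source file: one name fewer, and that file's blocks leave. */
    static QuotaDelta sourceDirDelta(long[] srcBlockSizes) {
        return new QuotaDelta(-1, -sum(srcBlockSizes));
    }

    /** Directory of the target: no new name, but the blocks of every source arrive. */
    static QuotaDelta targetDirDelta(List<long[]> allSrcBlockSizes) {
        long total = 0;
        for (long[] blocks : allSrcBlockSizes) {
            total += sum(blocks);
        }
        return new QuotaDelta(0, total);
    }
}
```

When all files live in one directory, the diskspace deltas cancel out and only the namespace count drops, by one per source file.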
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756805#action_12756805 ]
Boris Shkolnik commented on HDFS-222:
To simplify (and to avoid the overwrite question), I suggest we concatenate the srcs' blocks TO the target file, i.e. if we have
File1 {Block11, Block12}
File2 {Block21, Block22}
File3 {Block31, Block32}
and we do concat(File1, File2, File3), we get
File1 {Block11, Block12, Block21, Block22, Block31, Block32}
and File2, File3 deleted.
To make things atomic we would need to introduce one new OP_CONCAT_DELETE for the EditsLog, which will be recorded only when every block is moved and the source files are deleted (we cannot just call FSDirectory.delete() for this reason).
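The File1/File2/File3 example can be reproduced with a toy in-memory namespace. This is a minimal sketch, not the NameNode implementation: ConcatSemanticsSketch and its methods are hypothetical, blocks are plain strings, and the edit log is a list to which one record is appended only after all blocks have moved and the sources are gone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: concat appends the sources' blocks to the target and deletes the sources. */
final class ConcatSemanticsSketch {
    // File name -> ordered block IDs; a toy stand-in for the namespace.
    private final Map<String, List<String>> files = new LinkedHashMap<>();
    // Toy edit log: one entry per completed concat.
    private final List<String> editLog = new ArrayList<>();

    synchronized void create(String name, String... blocks) {
        files.put(name, new ArrayList<>(Arrays.asList(blocks)));
    }

    synchronized void concat(String target, String... srcs) {
        // Validate everything first so that a failure leaves the namespace untouched.
        if (!files.containsKey(target)) {
            throw new IllegalArgumentException("no such target: " + target);
        }
        Set<String> seen = new HashSet<>();
        for (String src : srcs) {
            if (src.equals(target) || !files.containsKey(src) || !seen.add(src)) {
                throw new IllegalArgumentException("bad source: " + src);
            }
        }
        // Move the blocks in order and drop each source entry.
        List<String> targetBlocks = files.get(target);
        for (String src : srcs) {
            targetBlocks.addAll(files.remove(src));
        }
        // A single record, written only once the whole operation is done.
        editLog.add("OP_CONCAT_DELETE " + target + " " + String.join(",", srcs));
    }

    synchronized List<String> blocksOf(String name) {
        return files.get(name);
    }

    synchronized boolean exists(String name) {
        return files.containsKey(name);
    }

    synchronized List<String> editLog() {
        return new ArrayList<>(editLog);
    }
}
```

Calling create for the three files and then concat("File1", "File2", "File3") leaves File1 holding all six blocks in order, removes File2 and File3, and adds exactly one edit-log entry.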
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756829#action_12756829 ]
Doug Cutting commented on HDFS-222:
That sounds reasonable.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755575#action_12755575 ]
Sanjay Radia commented on HDFS-222:
Clearly this is a hack to support parallel copies of large files in distcp. (It is an embarrassment that Hadoop does not support this.) The proper way to do this is to create a first-class abstraction for a file as a container for blocks. But that is a long project. So the new concat method would be marked as limited-private.
On the issue of breaking the FileSystem abstraction, I don't get it: all file system impls can support a concat of files, though most cannot do this atomically.
Owen, are you proposing that we add this to DistributedFileSystem and not FileSystem, and that distcp does a class narrow to use it if it is available? I am fine with that.
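The "class narrow" idea can be sketched with stand-in types. None of the names below are Hadoop classes: GenericFs plays the role of FileSystem, ConcatFs the role of DistributedFileSystem, and mergeParts is a hypothetical helper of the kind a copy tool might use. The fallback branch only reports which strategy would be chosen; it does not copy any data.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: use concat when the concrete file system offers it, else fall back. */
final class ClassNarrowSketch {
    /** Stand-in for the generic file system abstraction (assumption, not a Hadoop type). */
    interface GenericFs {
        String scheme();
    }

    /** Stand-in for a file system that exposes concat outside the generic abstraction. */
    static final class ConcatFs implements GenericFs {
        final List<String> concatCalls = new ArrayList<>();

        public String scheme() {
            return "hdfs";
        }

        void concat(String target, String... srcs) {
            concatCalls.add(target + " <- " + String.join(",", srcs));
        }
    }

    /** Stand-in for any other file system without a cheap concat. */
    static final class PlainFs implements GenericFs {
        public String scheme() {
            return "file";
        }
    }

    /** Narrows to the concat-capable type when possible and reports the strategy used. */
    static String mergeParts(GenericFs fs, String target, String... parts) {
        if (fs instanceof ConcatFs) {
            ((ConcatFs) fs).concat(target, parts);
            return "concat";
        }
        // Fallback: a real tool would copy the parts into the target sequentially here.
        return "copy";
    }
}
```

The generic abstraction stays unchanged this way, and only callers that care about the faster path need to know about the narrower type.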
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755619#action_12755619 ]
Doug Cutting commented on HDFS-222:
"add this to DistributedFileSystem and not FileSystem, and that distcp does a class narrow to use it if it is available"
+1 This sounds like a reasonable plan.
[jira] Commented: (HDFS-222) Support for concatenating of files into a single file
[ https://issues.apache.org/jira/browse/HDFS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747043#action_12747043 ]
Doug Cutting commented on HDFS-222:
What's the use case? I'm guessing the end-goal is cross-version distcp again. Is that right? If so, I wonder if we should discuss that as a distinct issue and craft an end-to-end solution for it?