[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842077#action_12842077 ]

Qi Liu commented on HDFS-814:

Please take a look at HDFS-246. Is there any reason why HDFS-246 is not applied? It seems to me that this patch is only a subset of HDFS-246.

> Add an api to get the visible length of a DFSDataInputStream.
>
> Key: HDFS-814
> URL: https://issues.apache.org/jira/browse/HDFS-814
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs client
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.21.0, 0.22.0
>
> Attachments: getLength-yahoo-0.20.patch, h814_20091221.patch, h814_20091221_0.21.patch, privateInputStream.patch
>
> Hflush guarantees that the bytes written before are visible to the new readers. However, there is no way to get the length of the visible bytes. The visible length is useful in some applications like SequenceFile.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841597#action_12841597 ]

Tsz Wo (Nicholas), SZE commented on HDFS-814:

getLength-yahoo-0.20.patch looks good. Thanks Hairong.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799002#action_12799002 ]

Hudson commented on HDFS-814:

Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #94 (See [http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/94/])
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794614#action_12794614 ]

Hudson commented on HDFS-814:

Integrated in Hadoop-Hdfs-trunk #182 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/182/])
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794446#action_12794446 ]

dhruba borthakur commented on HDFS-814:

> but it has the following issue: Applications that don't care about very accurate file lengths will pay the cost for files

This will happen only if the file is being written to when somebody else calls getFileStatus on it. That should never happen for the most typical application that runs on HDFS: a map-reduce job.

> Cost of ls -r of a dir (say MR output dir) can go up when some of the files in the subtree are open for writing.

I suspect that this is not a typical use case. The MR job output directory will typically be empty until the job is committed and all files are renamed into the output directory (from the tmp directory).

I am fine with this patch because it does not introduce a FileSystem/FileContext API.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794351#action_12794351 ]

Hudson commented on HDFS-814:

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See [http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/]) . Add an api to get the visible length of a DFSDataInputStream.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794189#action_12794189 ]

Hudson commented on HDFS-814:

Integrated in Hadoop-Hdfs-trunk-Commit #155 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/155/]) . Add an api to get the visible length of a DFSDataInputStream.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794166#action_12794166 ]

Tsz Wo (Nicholas), SZE commented on HDFS-814:

I forgot to say that the failure of TestFiDataTransferProtocol2 is not related to this. See HDFS-849.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793876#action_12793876 ]

Hadoop QA commented on HDFS-814:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428660/h814_20091221.patch
against trunk revision 893066.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/157/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/157/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/157/console

This message is automatically generated.
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793861#action_12793861 ]

Sanjay Radia commented on HDFS-814:

I like Dhruba's suggestion because of the transparent behaviour, but it has the following issue: applications that don't care about very accurate file lengths will pay the cost for files that happen to be open for writing. The cost of ls -r of a dir (say, an MR output dir) can go up when some of the files in the subtree are open for writing.

Isn't it acceptable to say that listStatus returns the last known file size, while DFSDataInputStream.getVisibleLen() gives a more accurate result?
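
Editor's sketch: the distinction drawn in the comment above, between the last known size reported by a listing and the visible length seen through a stream, can be illustrated with a toy model. This is not HDFS code. `ToyFile` and all of its methods are hypothetical names invented for illustration, and the model assumes (for simplicity) that the recorded length is only updated when the file is closed.

```java
/** Toy model only; none of these names are HDFS APIs. */
class VisibleLengthSketch {
  public static void main(String[] args) {
    ToyFile f = new ToyFile();
    f.write(100);
    f.hflush();   // the first 100 bytes become visible to new readers
    f.write(50);  // written but not yet flushed
    System.out.println("listed=" + f.listedLength() + " visible=" + f.visibleLength());
    f.close();
    System.out.println("listed=" + f.listedLength() + " visible=" + f.visibleLength());
  }
}

/** Hypothetical file that tracks three different notions of "length". */
class ToyFile {
  private long written;    // bytes handed to the writer so far
  private long visible;    // bytes made visible by the last hflush
  private long lastKnown;  // length recorded by the toy name node (assumed: updated on close only)

  void write(long n) {
    if (n < 0) throw new IllegalArgumentException("n must be non-negative");
    written += n;
  }

  /** Makes everything written so far visible to new readers. */
  void hflush() { visible = written; }

  /** Flushes and records the final length. */
  void close() { hflush(); lastKnown = visible; }

  /** What a cheap listing would report: the last recorded size. */
  long listedLength() { return lastKnown; }

  /** What a reader asking the stream would see: the flushed bytes. */
  long visibleLength() { return visible; }
}
```

In this model only callers that ask for `visibleLength()` pay for the more accurate answer, which is the trade-off the comment proposes; a listing stays cheap and may lag behind.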
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793818#action_12793818 ]

Tsz Wo (Nicholas), SZE commented on HDFS-814:

> Can we also enhance DFSClient.getFileInfo() to return the current length of a file (for a file that is being written into)...

I think it may be a good idea to change DFSClient.getFileInfo(). But we cannot easily update stat.length, since there is no API to update the length in FileStatus.

> The benefit of this approach might reduce confusion to users...especially if DFSClient.getFileInfo() and DFSClient.getFileLength() returns different file sizes for the same file. Also, I am guessing that this will not introduce any new performance impact.

There is no method called DFSClient.getFileLength(). Do you mean DFSClient.DFSInputStream.getFileLength()? That method is not visible, since DFSClient.DFSInputStream is package private (and my patch changes it to private).

As for performance, the impact is hard to estimate, since DFSClient.open(..) adds two round trips (one to the NN and one to a DN).
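
Editor's sketch: the performance concern above can be made concrete with a little arithmetic. The figure of two extra round trips per open (one to the NN, one to a DN) is taken from the comment; everything else is an assumption made for illustration, in particular that a plain listing costs a single round trip and that every file under construction triggers one open. `RoundTripSketch` and `roundTrips` are hypothetical names, not HDFS APIs.

```java
/** Back-of-the-envelope cost model; an illustration, not a measurement. */
class RoundTripSketch {
  /** Assumed cost of one listing call that returns every status in the directory. */
  static final int LISTING_ROUND_TRIPS = 1;

  /** Extra round trips per open file: one to the NN and one to a DN (figure from the comment above). */
  static final int EXTRA_PER_OPEN_FILE = 2;

  /**
   * Round trips needed to list a directory.
   *
   * @param openFiles number of files in the directory that are open for writing
   * @param refreshOpenFiles whether the listing opens each such file to fetch its current length
   */
  static long roundTrips(int openFiles, boolean refreshOpenFiles) {
    if (openFiles < 0) throw new IllegalArgumentException("openFiles must be non-negative");
    long extra = refreshOpenFiles ? (long) EXTRA_PER_OPEN_FILE * openFiles : 0L;
    return LISTING_ROUND_TRIPS + extra;
  }

  public static void main(String[] args) {
    System.out.println("no refresh, 5 open files:   " + roundTrips(5, false));
    System.out.println("with refresh, 5 open files: " + roundTrips(5, true));
  }
}
```

Under these assumptions the extra cost grows linearly with the number of files being written and is zero when none are, which is why the thread's disagreement turns on how common open files are in listed directories.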
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793802#action_12793802 ]

Arun C Murthy commented on HDFS-814:

bq. Related thought: should we move DFSDataInputStream outside of DFSClient since it will have some significant functionality i.e. getVisibleLength?

Let me elaborate: my thinking is that we need an HDFS-specific input stream which is *the* input stream, with features such as getVisibleLength etc. (possibly even getCurrentDatanode, getCurrentBlock?)
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793800#action_12793800 ]

Arun C Murthy commented on HDFS-814:

+1, this will be useful for SequenceFile etc.

Related thought: should we move DFSDataInputStream outside of DFSClient, since it will have some significant functionality, i.e. getVisibleLength?
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793728#action_12793728 ]

dhruba borthakur commented on HDFS-814:

Code looks good. Can we also enhance DFSClient.getFileInfo() to return the current length of a file (for a file that is being written into)... something like this:

{quote}
public FileStatus getFileInfo(String src) throws IOException {
  checkOpen();
  try {
    FileStatus stat = namenode.getFileInfo(src);
    // For a file that is still being written, ask the stream for its current length.
    if (stat.isUnderConstruction()) {
      stat.length = DFSClient.open(src).getFileLength();
    }
    return stat;
  } catch(RemoteException re) {
    throw re.unwrapRemoteException(AccessControlException.class);
  }
}
{quote}

This approach might reduce confusion for users, especially if DFSClient.getFileInfo() and DFSClient.getFileLength() return different file sizes for the same file. Also, I am guessing that this will not introduce any new performance impact.
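
Editor's sketch: a reply elsewhere in this thread points out that there is no API to update the length in FileStatus. One way to read the proposal above under that constraint is to return a fresh status object instead of mutating the one from the name node. The following is a self-contained toy, not the patch and not HDFS code: `SketchFileStatus`, `SketchClient`, `withLength`, and the in-memory maps standing in for the name node and data nodes are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Demo entry point for the toy client below. */
class GetFileInfoSketch {
  public static void main(String[] args) {
    SketchClient client = new SketchClient();
    client.addFile("/closed", 10L, false, 10L);
    client.addFile("/open", 0L, true, 64L);
    System.out.println("/closed length = " + client.getFileInfo("/closed").length);
    System.out.println("/open length   = " + client.getFileInfo("/open").length);
    System.out.println("extra lookups  = " + client.extraLookups);
  }
}

/** Hypothetical immutable stand-in for a file status; the length cannot be patched in place. */
final class SketchFileStatus {
  final String path;
  final long length;
  final boolean underConstruction;

  SketchFileStatus(String path, long length, boolean underConstruction) {
    this.path = path;
    this.length = length;
    this.underConstruction = underConstruction;
  }

  /** Returns a copy with a different length, leaving this object unchanged. */
  SketchFileStatus withLength(long newLength) {
    return new SketchFileStatus(path, newLength, underConstruction);
  }
}

/** Hypothetical client; the maps simulate the name node's records and the visible lengths. */
final class SketchClient {
  private final Map<String, SketchFileStatus> recorded = new HashMap<>();
  private final Map<String, Long> visibleLengths = new HashMap<>();
  int extraLookups = 0; // counts the additional work done for files under construction

  void addFile(String path, long recordedLength, boolean underConstruction, long visibleLength) {
    recorded.put(path, new SketchFileStatus(path, recordedLength, underConstruction));
    visibleLengths.put(path, visibleLength);
  }

  /** Returns the recorded status, substituting the visible length for files still being written. */
  SketchFileStatus getFileInfo(String src) {
    SketchFileStatus stat = recorded.get(src);
    if (stat == null) {
      return null; // unknown path
    }
    if (stat.underConstruction) {
      extraLookups++;
      return stat.withLength(visibleLengths.get(src));
    }
    return stat;
  }
}
```

The `extraLookups` counter is there to make the cost visible: closed files take the cheap path, and only files under construction incur the additional lookup that the thread debates.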
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793714#action_12793714 ]

Tsz Wo (Nicholas), SZE commented on HDFS-814:

> Can this be added to 0.21 as well?

Sure. This is a part of hflush. How does the patch look to you?
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793402#action_12793402 ]

dhruba borthakur commented on HDFS-814:

Can this be added to 0.21 as well?
[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787982#action_12787982 ]

dhruba borthakur commented on HDFS-814:

The "visible length" of a file should be the same as the "length" of a file, shouldn't it? If so, isn't it appropriate to return that length in the getFileStatus() call (even for files that are currently being written to)?