[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3110:
------------------------------

    Fix Version/s:     (was: 3.0.0)
                   2.0.1-alpha

Committed to branch-2 as well.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs, performance
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 2.0.1-alpha
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3110:
------------------------------

    Component/s: performance

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs, performance
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.0.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3110:
------------------------------

    Release Note: libhdfs is enhanced to read directly into user-supplied buffers when possible, reducing the number of memory copies.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.0.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3110:
------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.24.0)
                   3.0.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I was able to run the tests. I got some errors on the write side, but then I verified that those errors were present before this patch as well. So, I don't think it's worth blocking this patch on it, given it makes good progress getting the tests back running at all!

Thanks for the contribution, Henry! Nice speedup!

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.0.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.5.patch

You're right - my original thinking was that checking errno != 0 let us know that we aren't at EOF, but in fact that's captured by checking whether readDirect == 0 (since EOF and 0-byte reads are indistinguishable by return code). I've removed the check and reworded the comment.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
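Editorial sketch of the point in the HDFS-3110.5.patch message above: with read(2)-style results, a return of 0 already covers both EOF and a 0-byte request, so a caller can branch on the return code alone and look at errno only after -1. This is not code from the patch. The mem_file type and the mem_read and read_fully functions are hypothetical stand-ins backed by an in-memory buffer, assumed only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an open file; not a libhdfs type. */
typedef struct {
    const char *data;
    size_t      len;
    size_t      pos;
} mem_file;

/* Hypothetical reader with read(2)-style results:
 * >0 bytes read, 0 at EOF or for a 0-byte request, -1 with errno set. */
static int mem_read(mem_file *f, void *buf, int length)
{
    if (f == NULL || buf == NULL || length < 0) {
        errno = EINVAL;
        return -1;
    }
    size_t remaining = f->len - f->pos;
    size_t n = (size_t)length < remaining ? (size_t)length : remaining;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (int)n;
}

/* Read until `length` bytes have arrived or the reader returns 0.
 * Only the return code drives the loop; errno is meaningful to the
 * caller only when this function returns -1. */
static int read_fully(mem_file *f, char *buf, int length)
{
    int total = 0;
    assert(length >= 0);
    while (total < length) {
        int n = mem_read(f, buf + total, length - total);
        if (n < 0)
            return -1;   /* error: errno was set by the reader */
        if (n == 0)
            break;       /* EOF: no separate errno check is needed */
        total += n;
    }
    return total;
}
```

A short read followed by a 0 return is the normal end-of-file sequence here; a caller that also tested errno after a 0 return would be reading a stale value.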
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.4.patch

Patch addresses Todd's latest comments.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch, HDFS-3110.4.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.3.patch

New patch that's actually a diff vs trunk this time :/

I incorporated most of Todd's suggestions. I've left HDFS_FILE_SUPPORTS_DIRECT_READ in hdfs.h for now so that users who *really* want to turn off support for some reason (perhaps a bug) have access to the flag that they can set in hdfsFile's guts.

I ran the tests against the default local filesystem when no fs.default.name is set, and observed no errors except that the tests expect readDirect to be available.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch,
> HDFS-3110.3.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
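Editorial sketch of what "turning off support" through an exposed flag bit could look like for the HDFS-3110.3.patch message above. The message names HDFS_FILE_SUPPORTS_DIRECT_READ but gives neither its value nor the layout of hdfsFile, so everything below is assumed: demo_handle, DEMO_SUPPORTS_DIRECT_READ, and both helper functions are hypothetical names, not part of hdfs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit value; the message names the real flag but not its value. */
#define DEMO_SUPPORTS_DIRECT_READ (1 << 0)

/* Hypothetical stand-in for a file handle; the real layout may differ. */
struct demo_handle {
    void *stream;   /* placeholder for the underlying stream reference */
    int   flags;    /* bit field holding per-file capabilities */
};

/* Report whether the direct-read bit is currently set. */
static bool demo_supports_direct_read(const struct demo_handle *h)
{
    assert(h != NULL);
    return (h->flags & DEMO_SUPPORTS_DIRECT_READ) != 0;
}

/* Clear only the direct-read bit, leaving any other flag bits untouched. */
static void demo_disable_direct_read(struct demo_handle *h)
{
    assert(h != NULL);
    h->flags &= ~DEMO_SUPPORTS_DIRECT_READ;
}
```

Masking with the complement of the bit, rather than assigning 0 to the whole field, keeps the opt-out safe if other flag bits are added later.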
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.2.patch

Patch addressing Todd's concerns.

I added a 'flags' field to hdfsFile that has a bit set if a direct read is supported. I detect that by trying to issue a 0-byte read when the file is created. If an exception is thrown, the flag is cleared, otherwise it is set. Once the flag is set, all subsequent hdfsRead calls will be diverted to hdfsReadDirect.

An alternative is to use reflection to grab the input stream inside FSDataInputStream and use reflection to look for ByteBufferReadable, but that feels a little fragile (and complex to do in C); plus if some FS implements read(ByteBuffer) only to stub it out with an UnsupportedOperationException or similar, reads would never work correctly.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
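Editorial sketch of the probe-then-dispatch scheme described in the HDFS-3110.2.patch message above: issue a 0-byte read at open time, set a capability bit if it succeeds, and route later reads by that bit. It is a minimal model under stated assumptions, not the patch itself. There is no JNI here; a failing function pointer that sets ENOTSUP stands in for the Java exception, and probe_file, PROBE_SUPPORTS_DIRECT_READ, and all probe_* functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PROBE_SUPPORTS_DIRECT_READ (1 << 0)   /* assumed bit value */

typedef struct probe_file probe_file;
typedef int (*probe_read_fn)(probe_file *f, void *buf, int length);

/* Hypothetical handle backed by an in-memory buffer. */
struct probe_file {
    const char   *data;
    size_t        len, pos;
    int           flags;
    probe_read_fn direct;        /* stands in for read(ByteBuffer) */
    int           direct_calls;  /* counters so the routing is observable */
    int           copy_calls;
};

static int probe_copy_out(probe_file *f, void *buf, int length)
{
    size_t remaining = f->len - f->pos;
    size_t n = (size_t)length < remaining ? (size_t)length : remaining;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (int)n;
}

/* A direct path that works. */
static int probe_direct_ok(probe_file *f, void *buf, int length)
{
    f->direct_calls++;
    return probe_copy_out(f, buf, length);
}

/* A direct path that is stubbed out; models a thrown exception. */
static int probe_direct_unsupported(probe_file *f, void *buf, int length)
{
    (void)f; (void)buf; (void)length;
    errno = ENOTSUP;
    return -1;
}

/* The ordinary path, always available. */
static int probe_copying_read(probe_file *f, void *buf, int length)
{
    f->copy_calls++;
    return probe_copy_out(f, buf, length);
}

/* Open: try a 0-byte direct read and record the outcome in `flags`. */
static void probe_open(probe_file *f, const char *data, size_t len,
                       probe_read_fn direct)
{
    char probe;
    int saved_errno = errno;

    memset(f, 0, sizeof *f);
    f->data = data;
    f->len = len;
    f->direct = direct;
    if (direct(f, &probe, 0) == 0)
        f->flags |= PROBE_SUPPORTS_DIRECT_READ;
    f->direct_calls = 0;   /* do not count the probe itself */
    errno = saved_errno;   /* a failed probe is not an error for the caller */
}

/* Read: divert to the direct path only when the probe succeeded. */
static int probe_read(probe_file *f, void *buf, int length)
{
    if (f->flags & PROBE_SUPPORTS_DIRECT_READ)
        return f->direct(f, buf, length);
    return probe_copying_read(f, buf, length);
}
```

Probing by behavior rather than by interface is what guards against the stubbed-out case the message raises: a stream that declares the method but always fails is detected once at open time and never selected afterwards.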
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.1.patch

Here's a patch with a simple smoke test. I have also reworked test-libhdfs.sh to work against trunk, since it had not been updated since the days of the ant build.

After setting HADOOP_HOME, running test-libhdfs.sh spins up a mini cluster (from HDFS-3167) and runs the tests correctly for me from a checkout of trunk. This is a better situation than before, even though the script is not yet integrated with the automatic build (there is a todo in the hdfs pom.xml for this).

Let me know if I should split the refactor of the test script up from this JIRA.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Attachment: HDFS-3110.0.patch

Patch against trunk, taking into account Todd's comments from when this was initially part of HDFS-2834.

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Status: Patch Available  (was: Open)

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>         Attachments: HDFS-3110.0.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-3110:
---------------------------------

    Component/s: libhdfs

> libhdfs implementation of direct read API
> -----------------------------------------
>
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs,
> which leads to significant performance increases when reading local data from
> C.