[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-06-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3110:
--

Fix Version/s: 2.0.1-alpha (was: 3.0.0)

Committed to branch-2 as well.

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs, performance
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-06 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3110:
--

Component/s: performance

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs, performance
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.0.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-05 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3110:
--

Release Note: libhdfs is enhanced to read directly into user-supplied 
buffers when possible, reducing the number of memory copies.

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.0.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-05 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3110:
--

Resolution: Fixed
Fix Version/s: 3.0.0 (was: 0.24.0)
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I was able to run the tests. I got some errors on the write side, but then I 
verified that those errors were present before this patch as well. So, I don't 
think it's worth blocking this patch on them, given that it makes good progress 
toward getting the tests running again at all!

Thanks for the contribution, Henry! Nice speedup!

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.0.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-04 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.5.patch

You're right - my original thinking was that checking errno != 0 let us know 
that we aren't at EOF, but in fact that's captured by checking whether 
readDirect == 0 (since EOF and 0-byte reads are indistinguishable by return 
code). 

I've removed the check and reworded the comment.
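
The return-code convention discussed above can be illustrated with a small, self-contained C sketch. It does not use libhdfs; `read_fn`, `read_fully`, `mem_src` and `mem_read` are hypothetical names invented for this example. The loop decides between "error", "EOF" and "keep going" purely from the return value, without consulting errno, and it never requests 0 bytes, so a 0 return can only mean EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader signature (not the libhdfs API):
 * returns bytes read, 0 at EOF or for a 0-byte request, -1 on error. */
typedef int (*read_fn)(void *ctx, void *buf, size_t n);

/* Read until `want` bytes have arrived, EOF is hit, or an error occurs.
 * Returns the number of bytes read, or -1 on error. */
static long read_fully(read_fn rd, void *ctx, void *buf, size_t want) {
    size_t got = 0;
    assert(rd != NULL && buf != NULL);
    while (got < want) {
        /* want - got > 0 here, so a 0 return cannot be a 0-byte read. */
        int n = rd(ctx, (char *)buf + got, want - got);
        if (n < 0) return -1; /* error is signalled by the return code */
        if (n == 0) break;    /* EOF: no errno check needed */
        got += (size_t)n;
    }
    return (long)got;
}

/* In-memory source that hands out at most `chunk` bytes per call,
 * to exercise short reads. */
typedef struct mem_src {
    const char *data;
    size_t len, pos, chunk;
} mem_src;

static int mem_read(void *ctx, void *buf, size_t n) {
    mem_src *s = ctx;
    size_t left = s->len - s->pos;
    if (n > left) n = left;
    if (n > s->chunk) n = s->chunk;
    if (n > 0) memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```

A caller would pass `mem_read` and a `mem_src` to `read_fully`; a real reader would slot into the same `read_fn` shape as long as it follows the same return-code convention.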

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-03 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.4.patch

Patch addresses Todd's latest comments.

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch, HDFS-3110.4.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-03 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.3.patch

New patch that's actually a diff vs trunk this time :/

I incorporated most of Todd's suggestions. I've left 
HDFS_FILE_SUPPORTS_DIRECT_READ in hdfs.h for now so that users who *really* 
want to turn off support for some reason (perhaps a bug) have access to the 
flag that they can set in hdfsFile's guts. 

I ran the tests against the default local filesystem when no fs.default.name is 
set, and observed no errors except that the tests expect readDirect to be 
available.

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
> HDFS-3110.3.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-03 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.2.patch

Patch addressing Todd's concerns.

I added a 'flags' field to hdfsFile that has a bit set if a direct read is 
supported. I detect that by trying to issue a 0-byte read when the file is 
created. If an exception is thrown, the flag is cleared; otherwise it is set. 
Once the flag is set, all subsequent hdfsRead calls will be diverted to 
hdfsReadDirect.

An alternative is to use reflection to grab the input stream inside 
FSDataInputStream and check whether it implements ByteBufferReadable, but that 
feels a little fragile (and complex to do in C); plus if some FS implements 
read(ByteBuffer) only to stub it out with an UnsupportedOperationException or 
similar, reads would never work correctly.
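
The probe-and-flag approach described above (the 0-byte read, not the reflection alternative) might look roughly like the following self-contained C mock. This is only a sketch and not the code in the patch: `mock_file`, `MOCK_SUPPORTS_DIRECT_READ`, `mock_open`, `mock_read` and `mock_read_direct` are hypothetical names, and a `-1` return stands in for the Java exception that an unsupported direct read would raise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is a mock, not the libhdfs source. */
#define MOCK_SUPPORTS_DIRECT_READ (1u << 0)

typedef struct mock_file {
    const char *data;       /* backing bytes standing in for file contents */
    size_t len;
    size_t pos;
    int backend_has_direct; /* stands in for "the stream supports read(ByteBuffer)" */
    unsigned flags;
    int direct_calls;       /* how many reads took the direct path */
    int copy_calls;         /* how many reads took the fallback path */
} mock_file;

/* Copy up to n bytes from the current position and advance it. */
static int copy_out(mock_file *f, void *buf, size_t n) {
    size_t left = f->len - f->pos;
    if (n > left) n = left;
    if (n > 0) memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (int)n;
}

/* Direct path: -1 stands in for the exception thrown when unsupported. */
static int mock_read_direct(mock_file *f, void *buf, size_t n) {
    if (!f->backend_has_direct) return -1;
    return copy_out(f, buf, n);
}

/* Open: probe with a 0-byte direct read and record the result in flags. */
static void mock_open(mock_file *f, const char *data, size_t len, int has_direct) {
    char probe;
    memset(f, 0, sizeof *f);
    f->data = data;
    f->len = len;
    f->backend_has_direct = has_direct;
    if (mock_read_direct(f, &probe, 0) == 0)
        f->flags |= MOCK_SUPPORTS_DIRECT_READ;
    else
        f->flags &= ~MOCK_SUPPORTS_DIRECT_READ;
}

/* Public read: divert to the direct path whenever the flag is set. */
static int mock_read(mock_file *f, void *buf, size_t n) {
    assert(f != NULL && buf != NULL);
    if (f->flags & MOCK_SUPPORTS_DIRECT_READ) {
        f->direct_calls++;
        return mock_read_direct(f, buf, n);
    }
    f->copy_calls++;
    return copy_out(f, buf, n);
}
```

Probing once at open time keeps the per-read cost down to a single bit test, and a backend that only stubs out the direct read is caught by the probe rather than on the first real read.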

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-04-01 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.1.patch

Here's a patch with a simple smoke test. I have also reworked test-libhdfs.sh 
to work against trunk, since it had not been updated since the days of the ant 
build. 

After setting HADOOP_HOME, running test-libhdfs.sh spins up a mini cluster 
(from HDFS-3167) and runs the tests correctly for me from a checkout of trunk. 
This is a better situation than before, even though the script is not yet 
integrated with the automatic build (there is a todo in the hdfs pom.xml for 
this).

Let me know if I should split the refactor of the test script up from this 
JIRA. 

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-03-22 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Attachment: HDFS-3110.0.patch

Patch against trunk, taking into account Todd's comments from when this was 
initially part of HDFS-2834.

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-03-22 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Status: Patch Available  (was: Open)

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
> Attachments: HDFS-3110.0.patch
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.





[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API

2012-03-16 Thread Henry Robinson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3110:
-

Component/s: libhdfs

> libhdfs implementation of direct read API
> -
>
> Key: HDFS-3110
> URL: https://issues.apache.org/jira/browse/HDFS-3110
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 0.24.0
>
>
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
> which leads to significant performance increases when reading local data from 
> C.
