[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430857#comment-13430857
 ] 

Zhanwei.Wang commented on HDFS-2656:
------------------------------------

@Jing Zhao

Hi, I just had a quick look at your patch, and I have a few concerns:

1) hdfsOpenFile does not seem to actually open an HDFS file; it only creates a 
handle in libhdfs2. When opening for read, it cannot report errors such as the 
file not existing. When opening for write, libhdfs2 does not hold the lease, 
which means other clients, such as the Java client and the libhdfs client, can 
still open the file for write. This is a big semantic difference from libhdfs.
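
To illustrate the open-for-read case: one way to surface a missing file at 
open time would be to issue a WebHDFS GETFILESTATUS request before returning 
the handle. The sketch below only builds the request URL and maps the HTTP 
status to an errno value. The function names, the example host, and the port 
are my assumptions for illustration, not code from the patch, and the path is 
not URL-encoded here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): builds the WebHDFS URL that an
 * open-for-read could request to confirm the file exists.
 * Returns 0 on success, -1 if host is empty, path is not absolute,
 * or buf is too small. The path is assumed to need no URL-encoding. */
int build_getfilestatus_url(char *buf, size_t buflen,
                            const char *host, int port, const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);
    if (strlen(host) == 0 || path[0] != '/')
        return -1;
    int n = snprintf(buf, buflen, "http://%s:%d/webhdfs/v1%s?op=GETFILESTATUS",
                     host, port, path);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output error or truncated URL */
    return 0;
}

/* Hypothetical mapping from the HTTP status of that request to an errno
 * value that hdfsOpenFile could report to the caller. */
int open_status_to_errno(int http_status)
{
    switch (http_status) {
    case 200: return 0;         /* file exists */
    case 403: return EACCES;    /* access denied */
    case 404: return ENOENT;    /* file not found */
    default:  return EIO;       /* anything else: generic I/O error */
    }
}
```

With this, an open for read could fail immediately with ENOENT instead of 
deferring the error to the first read.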

2) Each read/write creates a new connection to the namenode and datanode and 
closes it after the operation. This looks like a performance issue.
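
A rough sketch of the alternative, keeping one connection for the lifetime of 
an open file. The network calls are replaced by stubs that only count 
connections, and all names here are hypothetical, so this shows the handle 
lifecycle rather than a real client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file handle: one connection lives as long as the file is open. */
typedef struct {
    int connected;      /* nonzero while the (stubbed) connection is open */
    int connect_count;  /* how many times a connection was established */
} file_handle;

/* Stubs standing in for a real HTTP connect and disconnect. */
static void stub_connect(file_handle *h)    { h->connected = 1; h->connect_count++; }
static void stub_disconnect(file_handle *h) { h->connected = 0; }

/* Open only initializes the handle; no connection is made yet. */
void handle_open(file_handle *h)
{
    assert(h != NULL);
    h->connected = 0;
    h->connect_count = 0;
}

/* Connect lazily on the first I/O call and reuse the connection afterwards. */
int handle_read(file_handle *h)
{
    assert(h != NULL);
    if (!h->connected)
        stub_connect(h);
    return 0;   /* a real client would read from the connection here */
}

/* The connection is torn down only when the file is closed. */
void handle_close(file_handle *h)
{
    assert(h != NULL);
    if (h->connected)
        stub_disconnect(h);
}
```

After handle_open and any number of handle_read calls, connect_count stays at 
1, whereas connecting per operation would grow it with every read.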

[~btoddb]
I do not think libhdfs2 is a replacement for libhdfs, since:
1) The performance of this patch seems to be an issue. In my implementation, I 
set up only one HTTP connection per opened file and keep it until the file is 
closed. The performance is about 5%~10% slower than the Java client (without 
short-circuit read); I did not compare it with libhdfs. I think the overhead 
comes from the HTTP server and data locality.
2) In my implementation, hdfsFlush is implemented as closing and reopening the 
file, which has different semantics from libhdfs. In this patch, the semantic 
difference of open is even bigger.

I use libhdfs2 because the JVM uses too much memory and I also want better 
performance. Currently I have moved to libhdfs3, which is a real replacement 
for libhdfs.

About the checksum in libhdfs2, I don't think that is a problem: the HTTP 
server uses the Java client to read data and already verifies the checksum, and 
the HTTP connection is over TCP, so the possibility of reading unexpected data 
is very small.


                
> Implement a pure c client based on webhdfs
> ------------------------------------------
>
>                 Key: HDFS-2656
>                 URL: https://issues.apache.org/jira/browse/HDFS-2656
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Zhanwei.Wang
>         Attachments: HDFS-2656.patch, HDFS-2656.unfinished.patch
>
>
> Currently, the implementation of libhdfs is based on JNI. The overhead of the 
> JVM seems a little high, and libhdfs also cannot be used in an environment 
> without HDFS.
> It seems a good idea to implement a pure C client by wrapping webhdfs. It 
> could also be used to access different versions of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
