[
https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571004#action_12571004
]
Craig Macdonald commented on HADOOP-4:
--------------------------------------
Hi Pete,
Definitely using the latest tar this time ;-)
My first time using the new build system - looks good!
Some comments:
1. Firstly, I shouldn't have deleted my last comment - though it was clearly in
error, as I was reading the wrong version of fuse_dfs.c. In your comments, could
you say which file you've just uploaded?
For posterity, previous comment was:
{quote}
I will try the newer version tomorrow when @work. I note that fi->fh isn't used
or set in dfs_read in your latest version. Could we set it in dfs_open for
O_RDONLY, and then use it if available?
I'm not clear on the semantics of hdfsPread - does it assume that the offset
comes after the previous offset?
If so, we need to check that the current read on a file is strictly after the
previous read for a previously opened FH to be of use - hdfsTell could help
here.
{quote}
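On the hdfsPread question in the quote above: as I understand libhdfs, hdfsPread is a positional (pread-style) read, each call carrying its own offset, so it makes no assumption about the previous read position and a handle cached in fi->fh could be reused without any hdfsTell bookkeeping. A rough sketch of what I mean (hypothetical code, not the actual fuse_dfs.c; it assumes libhdfs's hdfsOpenFile/hdfsPread signatures and a `fs` handle connected during init):

```c
/* Sketch only: cache the hdfsFile handle in fi->fh for read-only opens
 * and reuse it in dfs_read.  Assumes libhdfs (hdfs.h) and FUSE 2.x;
 * the global `fs` is assumed to be connected elsewhere. */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <hdfs.h>

static hdfsFS fs; /* assumed: hdfsConnect(...) during startup */

static int dfs_open(const char *path, struct fuse_file_info *fi)
{
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return 0; /* write path not shown in this sketch */
    hdfsFile h = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
    if (!h)
        return -EIO;
    fi->fh = (uint64_t)(uintptr_t)h; /* stash the handle for dfs_read */
    return 0;
}

static int dfs_read(const char *path, char *buf, size_t size,
                    off_t offset, struct fuse_file_info *fi)
{
    hdfsFile h = (hdfsFile)(uintptr_t)fi->fh;
    /* hdfsPread is positional: no check against the previous read
     * position should be needed between calls. */
    tSize n = hdfsPread(fs, h, (tOffset)offset, buf, (tSize)size);
    return (n < 0) ? -EIO : (int)n;
}

static int dfs_release(const char *path, struct fuse_file_info *fi)
{
    if (fi->fh)
        hdfsCloseFile(fs, (hdfsFile)(uintptr_t)fi->fh);
    return 0;
}
```

The handle would then be closed once in dfs_release rather than per read, which is where I'd expect the speedup to come from.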
2. With respect to read speed, this is indeed a bit faster in our test
setting (nearer 6MB/sec), but still not on par with the Hadoop fs shell (about
10.5MB/sec). FUSE version 2.7.2:
{noformat}
# time bin/hadoop fs -cat /user/craigm/data.df > /dev/null
real 0m50.347s
user 0m16.023s
sys 0m6.644s
# time cat /misc/hdfs/user/craigm/data.df > /dev/null
real 1m31.263s
user 0m0.131s
sys 0m2.384s
{noformat}
I'm trying to measure the CPU taken by fuse_dfs for the same read, so we know
how much CPU time it burns.
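One way to sample that (a sketch; in practice PID would be the fuse_dfs process, e.g. $(pgrep -o fuse_dfs) - here $$ stands in so the script is self-contained):

```shell
#!/bin/sh
# Sketch: snapshot a process's accumulated CPU time before and after the
# read, via ps.  Substitute the fuse_dfs PID for $$ on a real mount.
PID=$$
BEFORE=$(ps -o cputime= -p "$PID" | tr -d ' ')
cat /misc/hdfs/user/craigm/data.df > /dev/null 2>&1  # the read under test
AFTER=$(ps -o cputime= -p "$PID" | tr -d ' ')
echo "CPU before: $BEFORE  after: $AFTER"
```

The difference between the two cputime samples is the CPU burned by the daemon during the read.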
Can I ask how your timing test compares to using the Hadoop fs shell on the
same machine? When reading, client CPU usage is around 45%, similar to the
Hadoop fs shell's.
I feel it would be good to aim for performance similar to the Hadoop fs shell,
as that seems reasonable compared to NFS in my test setting, and it should
scale better as the number of concurrent reads increases, given available wire
bandwidth.
3. With respect to the build system, it could be clearer what --with-dfspath=
is meant to point to. src/Makefile.am seems to assume that the include files
are at ${dfspath}/include/linux and the hdfs.so at ${dfspath}/include/shared.
This isn't how a Hadoop installation is laid out. Perhaps it would be better to
take an option pointing at the Hadoop installation and derive the paths from
there?
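Something along these lines, say (a hypothetical configure.ac fragment, not from the tarball; the subdirectories are my guesses at the source-tree layout and would need checking):

```m4
AC_ARG_WITH([hadoop],
  [AS_HELP_STRING([--with-hadoop=DIR], [root of the Hadoop installation])],
  [HADOOP_HOME=$withval],
  [HADOOP_HOME=/usr/local/hadoop])
# Derive include/lib paths from the tree instead of asking for them
# directly; these subdirectories are assumptions about the layout.
HDFS_CPPFLAGS="-I$HADOOP_HOME/src/c++/libhdfs"
HDFS_LDFLAGS="-L$HADOOP_HOME/build/libhdfs"
AC_SUBST([HDFS_CPPFLAGS])
AC_SUBST([HDFS_LDFLAGS])
```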
4. src/Makefile.am assumes an amd64 architecture - the same problem I noted
for my shell script: guessing the location of the JRE shared libraries.
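The guess could at least be derived rather than hard-coded, e.g. (a sketch; the uname -m to JVM arch-name mapping is an assumption covering common platforms only):

```shell
#!/bin/sh
# Sketch: derive the JRE shared-library directory instead of assuming
# amd64.  The arch mapping below is incomplete by design.
ARCH=$(uname -m)
case "$ARCH" in
  x86_64) JARCH=amd64 ;;
  i?86)   JARCH=i386 ;;
  *)      JARCH=$ARCH ;;
esac
JVM_LIB_DIR="${JAVA_HOME:-/usr/java/default}/jre/lib/$JARCH/server"
echo "$JVM_LIB_DIR"
```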
5 (minor). The last tar.gz had a link to aclocal.m4 in the external folder
that was absolute, i.e. pointing into your installation. It should be removed
when building the tar file.
6 (minor). Update print_usage if you're happy with the specification of the
filesystem options. I made no changes to my shell script or my autofs mount
for this version to work :-)
Cheers
Craig
> tool to mount dfs on linux
> --------------------------
>
> Key: HADOOP-4
> URL: https://issues.apache.org/jira/browse/HADOOP-4
> Project: Hadoop Core
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.5.0
> Environment: linux only
> Reporter: John Xing
> Assignee: Doug Cutting
> Attachments: fuse-dfs.tar.gz, fuse-dfs.tar.gz, fuse-dfs.tar.gz,
> fuse-dfs.tar.gz, fuse-dfs.tar.gz,
> fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz,
> fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz,
> fuse-j-hadoopfs-03.tar.gz, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c,
> fuse_dfs.sh, Makefile
>
>
> tool to mount dfs on linux
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.