[
https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571004#action_12571004
]
Craig Macdonald commented on HADOOP-4:
--------------------------------------
Hi Pete,
Definitely using the latest tar this time ;-)
My first time using the new build system - looks good!
Some comments:
1. Firstly, I shouldn't have deleted my last comment - though it was clearly in
error, as I was reading the wrong version of fuse_dfs.c. In your comments, could
you say which file you've just uploaded?
For posterity, previous comment was:
{quote}
I will try the newer version tomorrow when @work. I note that fi->fh isn't used
or set in dfs_read in your latest version. Could we set it in dfs_open for
O_RDONLY, and then use it if available?
I'm not clear on the semantics of hdfsPread - does it assume that the offset
comes after the previous offset?
If so, we need to check that the current read on a file is strictly after the
previous read for a previously opened FH to be of use - hdfsTell could help
here.
{quote}
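On the hdfsPread question in the quote above: as I understand libhdfs, hdfsPread is a positional (pread-style) read, each call carrying its own offset, so it makes no assumption about the previous read position and a handle cached in fi->fh could be reused without any hdfsTell bookkeeping. A rough sketch of what I mean (hypothetical code, not the actual fuse_dfs.c; it assumes libhdfs's hdfsOpenFile/hdfsPread signatures and a `fs` handle connected during init):

```c
/* Sketch only: cache the hdfsFile handle in fi->fh for read-only opens
 * and reuse it in dfs_read.  Assumes libhdfs (hdfs.h) and FUSE 2.x;
 * the global `fs` is assumed to be connected elsewhere. */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <hdfs.h>

static hdfsFS fs; /* assumed: hdfsConnect(...) during startup */

static int dfs_open(const char *path, struct fuse_file_info *fi)
{
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return 0; /* write path not shown in this sketch */
    hdfsFile h = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
    if (!h)
        return -EIO;
    fi->fh = (uint64_t)(uintptr_t)h; /* stash the handle for dfs_read */
    return 0;
}

static int dfs_read(const char *path, char *buf, size_t size,
                    off_t offset, struct fuse_file_info *fi)
{
    hdfsFile h = (hdfsFile)(uintptr_t)fi->fh;
    /* hdfsPread is positional: no check against the previous read
     * position should be needed between calls. */
    tSize n = hdfsPread(fs, h, (tOffset)offset, buf, (tSize)size);
    return (n < 0) ? -EIO : (int)n;
}

static int dfs_release(const char *path, struct fuse_file_info *fi)
{
    if (fi->fh)
        hdfsCloseFile(fs, (hdfsFile)(uintptr_t)fi->fh);
    return 0;
}
```

The handle would then be closed once in dfs_release rather than per read, which is where I'd expect the speedup to come from.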
2. With respect to read speed, this is indeed a bit faster in our test
setting (nearer 6MB/sec), but still not on par with the Hadoop fs shell (about
10.5MB/sec). FUSE version 2.7.2:
{noformat}
# time bin/hadoop fs -cat /user/craigm/data.df > /dev/null
real 0m50.347s
user 0m16.023s
sys 0m6.644s
# time cat /misc/hdfs/user/craigm/data.df > /dev/null
real 1m31.263s
user 0m0.131s
sys 0m2.384s
{noformat}
I'm trying to measure the CPU taken by fuse_dfs for the same read, so we know
how much CPU time it burns.
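One way to sample that (a sketch; in practice PID would be the fuse_dfs process, e.g. $(pgrep -o fuse_dfs) - here $$ stands in so the script is self-contained):

```shell
#!/bin/sh
# Sketch: snapshot a process's accumulated CPU time before and after the
# read, via ps.  Substitute the fuse_dfs PID for $$ on a real mount.
PID=$$
BEFORE=$(ps -o cputime= -p "$PID" | tr -d ' ')
cat /misc/hdfs/user/craigm/data.df > /dev/null 2>&1  # the read under test
AFTER=$(ps -o cputime= -p "$PID" | tr -d ' ')
echo "CPU before: $BEFORE  after: $AFTER"
```

The difference between the two cputime samples is the CPU burned by the daemon during the read.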
Can I ask how your timing test compares to using the Hadoop fs shell on the
same machine? When reading, client CPU usage is around 45%, similar to the
Hadoop fs shell's.
I feel it would be good to aim for performance similar to the Hadoop fs shell,
as that seems reasonable compared to NFS in my test setting, and it should
scale better as the number of concurrent reads increases, given available wire
bandwidth.
3. With respect to the build system, it could be clearer what --with-dfspath=
is meant to point to. src/Makefile.am seems to assume that the include files
are at ${dfspath}/include/linux and the hdfs.so at ${dfspath}/include/shared.
This isn't how a Hadoop installation is laid out. Perhaps it would be better to
take an option pointing at the Hadoop installation and derive the paths from
there?
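Something along these lines, say (a hypothetical configure.ac fragment, not from the tarball; the subdirectories are my guesses at the source-tree layout and would need checking):

```m4
AC_ARG_WITH([hadoop],
  [AS_HELP_STRING([--with-hadoop=DIR], [root of the Hadoop installation])],
  [HADOOP_HOME=$withval],
  [HADOOP_HOME=/usr/local/hadoop])
# Derive include/lib paths from the tree instead of asking for them
# directly; these subdirectories are assumptions about the layout.
HDFS_CPPFLAGS="-I$HADOOP_HOME/src/c++/libhdfs"
HDFS_LDFLAGS="-L$HADOOP_HOME/build/libhdfs"
AC_SUBST([HDFS_CPPFLAGS])
AC_SUBST([HDFS_LDFLAGS])
```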
4. src/Makefile.am assumes an amd64 architecture - the same problem I noted
for my shell script: guessing the location of the JRE shared libraries.
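The guess could at least be derived rather than hard-coded, e.g. (a sketch; the uname -m to JVM arch-name mapping is an assumption covering common platforms only):

```shell
#!/bin/sh
# Sketch: derive the JRE shared-library directory instead of assuming
# amd64.  The arch mapping below is incomplete by design.
ARCH=$(uname -m)
case "$ARCH" in
  x86_64) JARCH=amd64 ;;
  i?86)   JARCH=i386 ;;
  *)      JARCH=$ARCH ;;
esac
JVM_LIB_DIR="${JAVA_HOME:-/usr/java/default}/jre/lib/$JARCH/server"
echo "$JVM_LIB_DIR"
```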
5 (minor). The last tar.gz had a link to aclocal.m4 in the external folder
that was absolute, i.e. pointing into your installation. It should be removed
when building the tar file.
6 (minor). Update print_usage if you're happy with the specification of the
filesystem options. I made no changes to my shell script or my autofs mount
for this version to work :-)
Cheers
Craig
> tool to mount dfs on linux
> --------------------------
>
> Key: HADOOP-4
> URL: https://issues.apache.org/jira/browse/HADOOP-4
> Project: Hadoop Core
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.5.0
> Environment: linux only
> Reporter: John Xing
> Assignee: Doug Cutting
> Attachments: fuse-dfs.tar.gz, fuse-dfs.tar.gz, fuse-dfs.tar.gz,
> fuse-dfs.tar.gz, fuse-dfs.tar.gz,
> fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz,
> fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz,
> fuse-j-hadoopfs-03.tar.gz, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c,
> fuse_dfs.sh, Makefile
>
>
> tool to mount dfs on linux
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.