[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570484#action_12570484 ]

Craig Macdonald commented on HADOOP-4:
--------------------------------------

Pete,

I have been experimenting with fuse_dfs.c and have a few questions:

(1) I am using a previous version of fuse_dfs.c, mainly because I don't have 
bootstrap.sh. However, regarding the new fuse_dfs.c option parsing: is it 
compatible with invocation via mount.fuse and autofs?

This is how I currently mount, using an autofs map containing:
{code}
hdfs            -fstype=fuse,rw,nodev,nonempty,noatime,allow_other  
:/path/to/fuse_dfs_moutn/fuse_dfs.sh\#dfs\://namenode\:9000
{code}
fuse_dfs.sh is just a shell script that sets CLASSPATH and LD_LIBRARY_PATH 
and then, essentially, execs fuse_dfs. If I changed to the more recent 
version, I would probably have to move the dfs://namenode:9000 configuration 
into the script, I think.

(2) Have you done any sort of performance testing? I'm experimenting with HDFS 
for use in a mixed environment (hadoop and non-hadoop jobs), and the throughput 
I see is miserable. For example, I use a test cluster of 8 P3-1GHz nodes, with 
a similar client machine on a 100Mbit network.

Below, I compare cat-ing a 512MB file from (a) an NFS mount on the same network 
as the cluster nodes, (b) the hadoop command-line frontend, and (c) the FUSE 
HDFS filesystem.

{noformat}
# (a)
$ time cat /mnt/tmp/data.df > /dev/null

real 0m47.280s
user 0m0.059s
sys 0m2.476s

# (b)
$ time bin/hadoop fs -cat hdfs:///user/craigm/data.df > /dev/null

real 0m48.839s
user 0m16.256s
sys 0m7.001s

# (c)
$ time cat /misc/hdfs/user/craigm/data.df >/dev/null

real    1m41.686s
user    0m0.135s
sys     0m2.302s
{noformat}

Note that both NFS and hadoop fs -cat achieve about 10.5MB/sec, while the HDFS 
fuse mount (at /misc/hdfs) achieves only about 5MB/sec. Is this an expected 
overhead for FUSE?

I did try tuning rd_buf_size to match the size of the reads the kernel was 
requesting - i.e. 128KB instead of 32KB - however this made matters worse:

{noformat}
# with 128KB buffer size
$ time cat /misc/hdfs/user/craigm/data.df >/dev/null

real    2m11.080s
user    0m0.113s
sys     0m2.180s
{noformat}

Perhaps one option would be to keep the HDFS file open between reads and time 
out the connection when idle; another would be to read ahead more than 
requested and cache it in memory. Both would complicate the neat code, though!

(3) If I use autofs for HDFS, then mounts will time out quickly (after 30 
seconds), and then reconnect on demand. Perhaps fuse_dfs.c could implement 
the FUSE destroy operation to free up the connection to the namenode, etc.?

Cheers

Craig

> tool to mount dfs on linux
> --------------------------
>
>                 Key: HADOOP-4
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.5.0
>         Environment: linux only
>            Reporter: John Xing
>            Assignee: Doug Cutting
>         Attachments: fuse-dfs.tar.gz, fuse-dfs.tar.gz, fuse-dfs.tar.gz, 
> fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, 
> fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, 
> fuse-j-hadoopfs-03.tar.gz, fuse_dfs.c, fuse_dfs.c, Makefile
>
>
> tool to mount dfs on linux

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
