[ https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040860#comment-13040860 ]
Brian Bockelman commented on HDFS-420:
--------------------------------------

Hi,

I added the ref-counting in there originally because I thought I could get a ref-counting scheme to work. I really couldn't come up with a way to safely disconnect the filesystem. OTOH, there really seems to be no reason to disconnect the filesystem: we've had mounts lasting for a month.

I'm hoping future versions of libhdfs will help out here, so I'd like to keep things factored out into doConnect and doDisconnect. That way we can revisit ref-counting in the future if we'd like.

When I get back from vacation, I'll put Eli's final patch through the paces locally.

Brian

> Fuse-dfs should cache fs handles
> --------------------------------
>
>                 Key: HDFS-420
>                 URL: https://issues.apache.org/jira/browse/HDFS-420
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.20.2
>         Environment: Fedora Core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP (AMD 64), gcc 4.3.2, Java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) Runtime Environment (build 1.6.0_0-b12), OpenJDK 64-Bit Server VM (build 10.0-b19, mixed mode))
>            Reporter: Dima Brodsky
>            Assignee: Brian Bockelman
>             Fix For: 0.23.0
>
>         Attachments: fuse_dfs_020_memleaks.patch, fuse_dfs_020_memleaks_v3.patch, fuse_dfs_020_memleaks_v8.patch, hdfs-420-1.patch, hdfs-420-2.patch
>
>
> Fuse-dfs should cache fs handles on a per-user basis. This significantly increases performance, and has the side effect of fixing the current code, which leaks fs handles.
>
> The original bug description follows:
>
> I run the following test:
> 1. Run Hadoop DFS in single-node mode
> 2. Start up fuse_dfs
> 3. Copy my source tree, about 250 MB, into the DFS:
>        cp -av * /mnt/hdfs/
>
> In /var/log/messages I keep seeing:
>
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 1229385138/1229963739
>
> and then eventually:
>
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
>
> and the file system hangs. Hadoop is still running and I don't see any errors in its logs. I have to unmount the DFS and restart fuse_dfs, and then everything is fine again. At some point I see the following messages in /var/log/messages:
>
> ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log fuse_dfs.c:1464
>
> Is this a known issue? Am I just flooding the system too much? All of this is being performed on a single, dual-core machine.
>
> Thanks!
> ttyl
> Dima

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira