[ https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751359#action_12751359 ]

Zhang Bingjun commented on HDFS-420:
------------------------------------

A memory leak was found in the function hdfsFreeFileInfo() in hdfs.c (under 
c++/libhdfs/). The bug can be fixed easily, as described in this issue: 
https://issues.apache.org/jira/browse/HDFS-596

In my test, before fixing the bug, 1 GB of memory was exhausted after writing 
14000 files. After fixing the bug, only 34 MB of memory was consumed after 
writing 18000 files.
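
For reference, the fix amounts to freeing the per-entry strings, not just the 
array, in hdfsFreeFileInfo(). The sketch below only illustrates that idea, 
assuming the hdfsFileInfo field names from hdfs.h; the patch attached to 
HDFS-596 is the authoritative change.

    #include <stdlib.h>
    #include "hdfs.h"

    void hdfsFreeFileInfo(hdfsFileInfo *hdfsFileInfo, int numEntries)
    {
        /* Free the strings owned by each entry before freeing the array.
         * Before the fix, not all of these strings were freed, so calls to
         * hdfsListDirectory()/hdfsGetPathInfo() leaked memory over time. */
        int i;
        for (i = 0; i < numEntries; ++i) {
            free(hdfsFileInfo[i].mName);   /* path name string */
            free(hdfsFileInfo[i].mOwner);  /* owner string */
            free(hdfsFileInfo[i].mGroup);  /* group string */
        }
        free(hdfsFileInfo);                /* the array itself */
    }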

Another bug in fuse-dfs is its failure to release the file system handle hdfsFS. 
hdfsFS is a global reference in the JNI code of libhdfs. Failing to release it 
will, at a minimum, leak the memory used to store the global reference itself 
and the underlying Java object it points to. The current implementation of 
fuse-dfs opens HDFS and generates a new global reference (hdfsFS) for each file 
read/write operation. If the underlying objects (FS) are shared, the memory 
leak may come mainly from storing the global references themselves in the 
Java VM. 
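
One way to avoid creating a fresh hdfsFS (and therefore a fresh JNI global 
reference) on every FUSE operation would be to connect once, cache the handle, 
and disconnect at unmount. The sketch below is just an illustration of that 
idea using the public libhdfs calls hdfsConnect()/hdfsDisconnect(); the helper 
names and the wiring into fuse-dfs are made up for the example.

    #include "hdfs.h"

    static hdfsFS cached_fs = NULL;   /* shared across all FUSE operations */

    /* Hypothetical helper: return the cached connection, creating it on
     * first use instead of once per read/write operation. */
    static hdfsFS get_cached_fs(const char *nn_host, tPort nn_port)
    {
        if (cached_fs == NULL) {
            cached_fs = hdfsConnect(nn_host, nn_port);
        }
        return cached_fs;
    }

    /* Hypothetical helper: called once at unmount so libhdfs can tear down
     * the connection (and, ideally, its global reference). */
    static void release_cached_fs(void)
    {
        if (cached_fs != NULL) {
            hdfsDisconnect(cached_fs);
            cached_fs = NULL;
        }
    }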

I am still thinking about a clean way to release the global reference (hdfsFS). 
I will update this issue once I have something.
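
In JNI terms, releasing the handle would mean deleting the global reference 
that libhdfs created with NewGlobalRef(); whether hdfsDisconnect() already 
does this in the version in question is something I still need to check in 
hdfs.c. Just as a sketch of the shape of the call:

    /* env is the JNIEnv* for the current thread; fs is the hdfsFS handle,
     * which libhdfs stores as a jobject global reference. */
    (*env)->DeleteGlobalRef(env, (jobject)fs);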

Thanks!

> fuse_dfs is unable to connect to the dfs after a copying a large number of 
> files into the dfs over fuse
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-420
>                 URL: https://issues.apache.org/jira/browse/HDFS-420
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>         Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP 
> (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) 
> Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 
> 10.0-b19, mixed mode)
>            Reporter: Dima Brodsky
>
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>      cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime 
> /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 
> 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> and the file system hangs.  hadoop is still running and I don't see any 
> errors in its logs.  I have to unmount the dfs and restart fuse_dfs and then 
> everything is fine again.  At some point I see the following messages in 
> /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log
>  fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close 
> file_handle(139676770220176) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log
>  fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close 
> file_handle(139677114812832) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log
>  fuse_dfs.c:1464
> Is this a known issue?  Am I just flooding the system too much?  All of this 
> is being performed on a single, dual-core machine.
> Thanks!
> ttyl
> Dima

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
