[ https://issues.apache.org/jira/browse/HADOOP-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661289#action_12661289 ]

Craig Macdonald commented on HADOOP-4932:
-----------------------------------------

The more I think about this, the more it looks like doConnectAsUser() needs to
maintain a cache of FS connections.

At present, we in effect do an hdfsConnectAsUser for each fuse-dfs operation
(except read/write). In doing so, a Configuration object is created, used to
obtain the correct file system, and then destroyed. It is true that
FileSystem.get(URI,Configuration) does cache filesystems; however, we create a
new Configuration object for each call, with the related expense of parsing the
XML configuration files etc. (if I read the source correctly).

However, in libhdfs, hdfsConnectAsUser creates a new global reference of type 
hdfsFS (i.e. a malloc) for every call, regardless of whether the returned 
FileSystem already had an existing global reference or not.

I believe that fuse-dfs should maintain a cache of hdfsFS filesystem handles on
a per-user basis. This should make fs operations considerably faster.
Pete, I believe you suggested ghashtable (GLib's GHashTable)? Effectively, we
need a Map<uid_t, hdfsFS> in C. Clearing out the cache would be a minor issue.
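A per-user handle cache of the kind proposed above could look roughly like the
following C sketch. It is a minimal illustration under stated assumptions, not
fuse-dfs code: instead of ghashtable it uses a small hand-rolled chained hash
table keyed by uid_t and guarded by a pthread mutex, and the names fs_handle_t,
fs_cache, fs_cache_get and fs_cache_clear, as well as the connect and release
callbacks, are all hypothetical. In fuse-dfs the connect callback would
presumably wrap hdfsConnectAsUser, and the release callback would be whatever
mechanism is chosen for freeing hdfsFS handles.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for the opaque hdfsFS handle type. */
typedef void *fs_handle_t;

/* Hypothetical callbacks: connect would wrap hdfsConnectAsUser in fuse-dfs,
   release would free a handle without closing the shared FileSystem. */
typedef fs_handle_t (*fs_connect_fn)(uid_t uid, void *ctx);
typedef void (*fs_release_fn)(fs_handle_t fs, void *ctx);

struct fs_entry {
  uid_t uid;
  fs_handle_t fs;
  struct fs_entry *next;            /* chaining for colliding uids */
};

#define FS_CACHE_BUCKETS 64

struct fs_cache {
  struct fs_entry *buckets[FS_CACHE_BUCKETS];
  pthread_mutex_t lock;             /* FUSE may call us from many threads */
  fs_connect_fn connect;
  void *ctx;
};

static void fs_cache_init(struct fs_cache *c, fs_connect_fn connect, void *ctx) {
  for (int i = 0; i < FS_CACHE_BUCKETS; i++) c->buckets[i] = NULL;
  pthread_mutex_init(&c->lock, NULL);
  c->connect = connect;
  c->ctx = ctx;
}

/* Return the cached handle for uid, connecting only on the first request. */
static fs_handle_t fs_cache_get(struct fs_cache *c, uid_t uid) {
  size_t b = (size_t)uid % FS_CACHE_BUCKETS;
  pthread_mutex_lock(&c->lock);
  for (struct fs_entry *e = c->buckets[b]; e != NULL; e = e->next) {
    if (e->uid == uid) {
      fs_handle_t fs = e->fs;
      pthread_mutex_unlock(&c->lock);
      return fs;
    }
  }
  /* Cache miss: connect while holding the lock (simple, but serialises
     first-time connects). */
  fs_handle_t fs = c->connect(uid, c->ctx);
  if (fs != NULL) {
    struct fs_entry *e = malloc(sizeof *e);
    if (e != NULL) {                /* on malloc failure the handle is returned uncached */
      e->uid = uid;
      e->fs = fs;
      e->next = c->buckets[b];
      c->buckets[b] = e;
    }
  }
  pthread_mutex_unlock(&c->lock);
  return fs;
}

/* Drop every cached entry, passing each handle to the release callback. */
static void fs_cache_clear(struct fs_cache *c, fs_release_fn release) {
  pthread_mutex_lock(&c->lock);
  for (int i = 0; i < FS_CACHE_BUCKETS; i++) {
    struct fs_entry *e = c->buckets[i];
    while (e != NULL) {
      struct fs_entry *next = e->next;
      if (release != NULL) release(e->fs, c->ctx);
      free(e);
      e = next;
    }
    c->buckets[i] = NULL;
  }
  pthread_mutex_unlock(&c->lock);
}
```

Holding the lock across the connect call keeps the sketch simple, at the cost
of serialising first-time connections for different users; once a user's
handle is cached, later lookups take the lock only briefly.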

An alternative would be to add an hdfsFSFree function to libhdfs so that we
don't leak hdfsFS handles. Performing hdfsDisconnect is not the correct course
of action, as this will close the Java FileSystem object.
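The difference between freeing a handle and disconnecting can be illustrated
with a toy model. This is a hypothetical C sketch, not libhdfs: struct
shared_fs stands in for the single Java FileSystem object that FileSystem.get
hands back from its cache, and struct fs_handle stands in for the per-call
reference that hdfsConnectAsUser allocates. In libhdfs that per-call handle is
a JNI global reference, so a real hdfsFSFree would presumably delete that
reference rather than call free(); all names below are illustrative.

```c
#include <stdlib.h>

/* Hypothetical model of the one shared FileSystem object. */
struct shared_fs { int open; };

/* Hypothetical model of the per-call handle returned by each connect. */
struct fs_handle { struct shared_fs *target; };

static struct shared_fs the_fs = { 1 };

/* Every call allocates a fresh handle, even though the target is shared. */
static struct fs_handle *model_connect(void) {
  struct fs_handle *h = malloc(sizeof *h);
  if (h != NULL) h->target = &the_fs;
  return h;
}

/* hdfsFSFree analogue: release only this caller's handle. */
static void model_fs_free(struct fs_handle *h) {
  free(h);
}

/* hdfsDisconnect analogue: also closes the shared object for everyone. */
static void model_disconnect(struct fs_handle *h) {
  if (h == NULL) return;
  h->target->open = 0;
  free(h);
}
```

Freeing one handle leaves other users of the shared object unaffected, whereas
the disconnect analogue closes it for everyone, including handles obtained
afterwards.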

I'm not sure that this solves the problem that Dima reports; however, I do
believe there is a problem here.

> fuse_dfs is unable to connect to the dfs after copying a large number of
> files into the dfs over fuse
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4932
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4932
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.19.0
>         Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP 
> (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) 
> Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 
> 10.0-b19, mixed mode)
>            Reporter: Dima Brodsky
>
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>      cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> and the file system hangs. Hadoop is still running and I don't see any
> errors in its logs. I have to unmount the dfs and restart fuse_dfs, and then
> everything is fine again. At some point I see the following messages in
> /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log fuse_dfs.c:1464
> Is this a known issue? Am I just flooding the system too much? All of this
> is being performed on a single dual-core machine.
> Thanks!
> ttyl
> Dima

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.