David,

What kernel are you running on the file server? I've heard on the list that the stock Red Hat kernels are compiled with too small a stack size, and that NFS and Lustre do not behave well together on the same node. This configuration needs a stack size of at least 8k.
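If it helps, here is a rough sketch of one way to check. It rests on assumptions that are not from this thread: that the small-stack build is selected by the `CONFIG_4KSTACKS` option (a 32-bit x86 option on 2.6-era kernels) and that the distribution installs the kernel config as `/boot/config-<release>`. The function names are made up for illustration.

```python
import os
import platform


def uses_4k_stacks(config_text):
    """Return True if the kernel config text enables CONFIG_4KSTACKS.

    Assumption: 4k stacks are selected by CONFIG_4KSTACKS=y. A line such as
    "# CONFIG_4KSTACKS is not set" (or no mention at all) counts as False.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return True
    return False


def check_running_kernel():
    """Return (release, result); result is None if no config file is found."""
    release = platform.release()
    # Assumption: the config is installed next to the kernel image in /boot.
    path = "/boot/config-" + release
    if not os.path.exists(path):
        return release, None
    with open(path) as fh:
        return release, uses_4k_stacks(fh.read())


if __name__ == "__main__":
    release, four_k = check_running_kernel()
    if four_k is None:
        print("kernel %s: no config file found in /boot" % release)
    elif four_k:
        print("kernel %s: CONFIG_4KSTACKS=y (4k stacks)" % release)
    else:
        print("kernel %s: CONFIG_4KSTACKS not enabled" % release)
```

Run it on the NFS/Samba client itself. A result of "not enabled" only says the option is absent from the config file; it does not by itself confirm the stack size.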
-mb

On Mar 11, 2011, at 12:37 PM, David Noriega wrote:

> We've been running Lustre happily for a few months now, but we have
> one client that can be troublesome at times and it happens to be the
> most important client. It's our "file server" client as it runs NFS and
> Samba. I'm not sure where to start. I've seen this client disconnect
> from Lustre nodes, but then recover and reconnect. There are hundreds
> of messages in dmesg about a few inodes. The big problem happened a
> few weeks ago when this client was booted and never could reconnect.
> The client and the Lustre nodes simply kept saying HELLO to each
> other.
>
> Anyways as of right now this is what I see in dmesg:
>
> nfsd: non-standard errno: -108
> LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108
> LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 2114
> previous similar messages
> LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) failure
> -108 inode 561619132
> LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped
> 777 previous similar messages
> LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) inode
> 18382976 mdc close failed: rc = -108
> nfsd: non-standard errno: -108
> LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) Skipped
> 17238 previous similar messages
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) @@@
> IMP_INVALID req@ffff81032da81800 x1360479978792199/t0
> o35->lustre-MDT0000_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to
> 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
> LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) Skipped
> 19011 previous similar messages
> nfsd: non-standard errno: -108
>
> LustreError: 11-0: an error occurred while communicating with
> 192.168.5.104@tcp. The mds_close operation failed with -116
> LustreError: 520:0:(file.c:116:ll_close_inode_openhandle()) inode
> 12094041 mdc close failed: rc = -116
> LustreError: 30271:0:(llite_nfs.c:96:search_inode_for_lustre())
> failure -2 inode 560111661
>
>
> Any ideas?
>
> --
> Personally, I liked the university. They gave us money and facilities,
> we didn't have to produce anything! You've never been out of college!
> You don't know what it's like out there! I've worked in the private
> sector. They expect results. -Ray Ghostbusters
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------
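As a reading aid for the dmesg output quoted above, the negative numbers are kernel errno values. The sketch below tallies them from pasted log text and attaches the standard Linux names for the three codes that appear (-108, -116, -2). The table is hardcoded from the Linux errno numbering so the result does not depend on the machine running the script; the function names and the regular expression are illustrative assumptions, and the output is a summary of the log, not a diagnosis.

```python
import re
from collections import Counter

# Linux errno numbers for the codes seen in the quoted log.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    116: ("ESTALE", "Stale file handle"),
}

# A minus sign followed by digits, not glued to a preceding word character
# (so the "-0" in the message ID "11-0" is skipped).
NEG_CODE = re.compile(r"(?<![\w-])-(\d+)\b")


def count_errnos(log_text):
    """Count how often each negative return code appears in the text."""
    return Counter(int(m.group(1)) for m in NEG_CODE.finditer(log_text))


def describe(code):
    """Return 'NAME (message)' for a known code, else a placeholder."""
    if code in LINUX_ERRNO:
        name, message = LINUX_ERRNO[code]
        return "%s (%s)" % (name, message)
    return "unknown errno %d" % code


if __name__ == "__main__":
    sample = (
        "nfsd: non-standard errno: -108\n"
        "LustreError: 11-0: The mds_close operation failed with -116\n"
        "LustreError: failure -2 inode 560111661\n"
    )
    for code, hits in sorted(count_errnos(sample).items()):
        print("-%d x%d: %s" % (code, hits, describe(code)))
```

Note that the "Skipped N previous similar messages" lines mean the raw counts understate how often each error actually occurred.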