Jim:

Actually, do you have the source rpms for the binary rpms that you sent me?

Becky

On Thu, Aug 2, 2012 at 1:11 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> I am running a 2.8.6 client against a 2.8.5 server on CentOS-6.2 and tried
> to reproduce your problem. I have over 100 files in my directory and my
> "ls" is working. So, I'm thinking that the creation and/or installation of
> the kernel module and client are hosed somehow on your site.
>
> There *might* be a difference in the two OrangeFS versions that you are
> using. I will install from the rpms that you sent me and see if I can
> recreate the problem.
>
> Becky
>
> On Thu, Aug 2, 2012 at 12:35 PM, Becky Ligon <[email protected]> wrote:
>
>> Remind me again of your production OS? And, did you create the kernel
>> module using this OS?
>>
>> Becky
>>
>>
>> On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon <[email protected]> wrote:
>>
>>> Jim:
>>>
>>> It might be easier for me to debug if you can set up an account for me
>>> and let me look at your environment. Is this possible?
>>>
>>> Becky
>>>
>>>
>>> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
>>>
>>>> Actually a user has figured it out, at least in one directory: 66
>>>> entries work, but 67 fail:
>>>>
>>>> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
>>>> figure out what was going on. If you are interested, you could try
>>>> these steps in this directory above:
>>>>
>>>> 1) ls |wc -l (gives you 66)
>>>>
>>>> 2) emacs a.txt (creates new file a.txt)
>>>>
>>>> 3) CTRL x-s (saves new file)
>>>>
>>>> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>>>>
>>>> 5) rm a.txt (removes file; can't tab to finish the name, must type it
>>>> in its entirety)
>>>>
>>>> 6) ls |wc -l (gives you 66 - all back to normal)
>>>>
>>>> --Jim
>>>>
>>>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
>>>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
>>>> > of them) now.
>>>> >
>>>> > Output on a sample directory:
>>>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>>>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>>> > output
>>>> > getncfromNCAR.csh
>>>> > mz4assim_conus_1h_20070101.nc
>>>> > mz4assim_conus_1h_20070102.nc
>>>> >
>>>> >
>>>> > Another user indicates this is based on how many files are in the
>>>> > directory. If he knows the file name, the file is still accessible,
>>>> > but ls or tab completion or anything like that fails. If he deletes
>>>> > files to get it under a not-exactly-determined amount, ls works again.
>>>> >
>>>> > --Jim
>>>> >
>>>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]> wrote:
>>>> >> Jim:
>>>> >>
>>>> >> Are you running 2.8.6 on the server and the client? Or, just 2.8.6
>>>> >> from the head node?
>>>> >>
>>>> >> Can you run a "ls" on their directories that appear to be missing
>>>> >> data? Can you also run pvfs2-ls on those same directories? Please
>>>> >> send me the output from both commands.
>>>> >>
>>>> >> Thanks,
>>>> >> Becky
>>>> >>
>>>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>>
>>>> >>> So, since switching over to 2.8.6, I've had two users report that
>>>> >>> their larger directories are missing files / data.
>>>> >>>
>>>> >>> Now I'm really in for it....I'm asking for more details, but I'll
>>>> >>> need to address this pretty thoroughly and rapidly...File systems
>>>> >>> that lose user data are not useful.
>>>> >>>
>>>> >>> --Jim
>>>> >>>
>>>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> > Jim:
>>>> >>> >
>>>> >>> > The documentation link that I sent doesn't seem to work. Instead:
>>>> >>> >
>>>> >>> > go to www.orangefs.org and click on the html link for the
>>>> >>> > install guide, about midway down the page.
>>>> >>> >
>>>> >>> > the install guide has a section on setting up a client and in
>>>> >>> > section 3.3 is the description of the pvfs2tab file.
>>>> >>> >
>>>> >>> > Becky
>>>> >>> >
>>>> >>> >
>>>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>
>>>> >>> >> Jim:
>>>> >>> >>
>>>> >>> >> To generate a new config file, issue the command:
>>>> >>> >>
>>>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>>>> >>> >>
>>>> >>> >> You will be asked a set of questions regarding your installation.
>>>> >>> >> This utility may not provide everything you need; it just depends
>>>> >>> >> on your setup. To help you, I will forward you a copy of our
>>>> >>> >> production conf file. You can compare it to your own needs and
>>>> >>> >> modify the new conf file as needed. After you create a new conf
>>>> >>> >> file, I would be happy to review it for you.
>>>> >>> >>
>>>> >>> >> I'm not sure how your clients have started without a proper
>>>> >>> >> pvfs2tab file, unless you have the appropriate info in your fstab
>>>> >>> >> file. The mount info could be in either file. I will send you a
>>>> >>> >> copy of our production pvfs2tab file as an example.
>>>> >>> >>
>>>> >>> >> The link below will describe how to create the entries in the
>>>> >>> >> pvfs2tab/fstab file.
>>>> >>> >>
>>>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
>>>> >>> >>
>>>> >>> >> Thanks for giving 2.8.6 a try! Let me know how it goes!
>>>> >>> >>
>>>> >>> >> Becky
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>>
>>>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there
>>>> >>> >>> and a full cluster at the moment, so I can't intentionally reboot
>>>> >>> >>> it. If it crashes on me today, I'll take the opportunity to
>>>> >>> >>> update everything as soon as it comes back and reboot it again.
>>>> >>> >>> Otherwise, I'll try early tomorrow morning to load and reboot.
>>>> >>> >>>
>>>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
>>>> >>> >>> format was out of date. Can you suggest a new config file format
>>>> >>> >>> to use based on what I gave you? Also, I've never had a pvfs2tab
>>>> >>> >>> file on my clients, and my attempts to create one so far have
>>>> >>> >>> failed. It seems I don't know the proper syntax, and I haven't
>>>> >>> >>> found sufficiently clear documentation on that either. It has
>>>> >>> >>> worked for ~4 years without one, but...
>>>> >>> >>>
>>>> >>> >>> --Jim
>>>> >>> >>>
>>>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>> > Jim:
>>>> >>> >>> >
>>>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core
>>>> >>> >>> > process using gdb and see if you can tell in which function it
>>>> >>> >>> > seems to be spinning?
>>>> >>> >>> > Also, you can try turning on client debugging, so we can see
>>>> >>> >>> > what the client core is doing. To turn on debugging
>>>> >>> >>> > dynamically, issue the following:
>>>> >>> >>> >
>>>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>>>> >>> >>> >
>>>> >>> >>> > With the CPU so high, the client-core may or may not see the
>>>> >>> >>> > change in gossip_debug settings.
>>>> >>> >>> > If it does, then a lot of output will be generated!
>>>> >>> >>> > Before you reboot your system, make a copy of the client log
>>>> >>> >>> > and send that to me, along with any information you might get
>>>> >>> >>> > from gdb.
>>>> >>> >>> >
>>>> >>> >>> > When you can, please try using 2.8.6 on your head node and see
>>>> >>> >>> > if you can reproduce the problem.
>>>> >>> >>> >
>>>> >>> >>> > Thanks,
>>>> >>> >>> > Becky
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>> >>
>>>> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened
>>>> >>> >>> >> on reboot (i.e., all entries are lost). Already checked. Also,
>>>> >>> >>> >> I didn't see anything in /var/log/messages (I looked there when
>>>> >>> >>> >> the problem started mounting). There appears to be no "paper
>>>> >>> >>> >> trail" of this incident, which is why it's been so hard to
>>>> >>> >>> >> track down.
>>>> >>> >>> >>
>>>> >>> >>> >> --Jim
>>>> >>> >>> >>
>>>> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>> >> > Jim:
>>>> >>> >>> >> >
>>>> >>> >>> >> > Please send the pvfs2-client.log from your head node and the
>>>> >>> >>> >> > /var/log/messages just before you rebooted. I'm thinking
>>>> >>> >>> >> > that the high CPU utilization is coming from a failed
>>>> >>> >>> >> > operation that wasn't cleaned up properly.
>>>> >>> >>> >> >
>>>> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of
>>>> >>> >>> >> > these high CPU utilization issues.
>>>> >>> >>> >> > It would be worthwhile for you to apply 2.8.6 to your head
>>>> >>> >>> >> > node and see if this particular situation comes up again.
>>>> >>> >>> >> >
>>>> >>> >>> >> > Becky
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5.
>>>> >>> >>> >> >> I don't have a crash file, and it looks like it's still in
>>>> >>> >>> >> >> the process of bringing down my head node. Symptoms were:
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but
>>>> >>> >>> >> >> probably from) the pvfs2 volume. At some point, CPU usage
>>>> >>> >>> >> >> spikes on the head node. Top shows both the scp and the
>>>> >>> >>> >> >> pvfs2-client-core using 100% of a core. The load avg just
>>>> >>> >>> >> >> keeps going up and up. At about 29, I lost responsiveness
>>>> >>> >>> >> >> from the server. CPU load shows 62.5% iowait, 25% system,
>>>> >>> >>> >> >> 12.5% idle, all others 0. The only processes of note
>>>> >>> >>> >> >> running are the one SCP and the pvfs2 process.
>>>> >>> >>> >> >>
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to
>>>> >>> >>> >> >> go hit the front panel reset button. When it comes back up,
>>>> >>> >>> >> >> I doubt there will be any written logs of what happened.
>>>> >>> >>> >> >> Hence, why I can never catch the logs of the crash; it
>>>> >>> >>> >> >> *thinks* it's working until the system goes non-responsive
>>>> >>> >>> >> >> and resets.
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> --Jim

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
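
For anyone trying to reproduce the entry-count threshold described in Jim's Aug 2 message (66 entries list fine, 67 fail with "Invalid argument"), the manual steps could be automated roughly as below. This is only a sketch under assumptions, not an OrangeFS tool: the function name `find_listing_limit`, the `probe_NNNN.tmp` file names, and the `list_dir` parameter are all made up for illustration, and it uses nothing beyond the Python standard library. It also assumes the failure seen by `ls` would show up as an `OSError` from `os.listdir`, which has not been confirmed on an affected client.

```python
import errno
import os


def find_listing_limit(directory, max_new_files=200, list_dir=os.listdir):
    """Add empty files to `directory` one at a time until listing it fails.

    Returns the number of entries present when the listing first raised
    OSError, or None if no failure occurred after `max_new_files` additions.
    All probe files created here are removed again before returning.
    `list_dir` is injectable (hypothetical hook) so the logic can be tested
    without a broken mount.
    """
    # Count the entries that are already there; if this first listing
    # fails, the directory is already past the limit and the error propagates.
    baseline = len(list_dir(directory))
    created = []
    try:
        for i in range(max_new_files):
            # Hypothetical probe file name; any unused name would do.
            path = os.path.join(directory, f"probe_{i:04d}.tmp")
            with open(path, "w"):
                pass  # create an empty file, like saving a.txt in the thread
            created.append(path)
            try:
                list_dir(directory)
            except OSError as exc:
                # On the affected clients the reported error was EINVAL
                # ("Invalid argument"); report whatever errno we got.
                name = errno.errorcode.get(exc.errno, "unknown")
                print(f"listing failed with {name} at "
                      f"{baseline + len(created)} entries")
                return baseline + len(created)
        return None
    finally:
        # Mirror step 5 of the manual procedure: remove what we added.
        for path in created:
            os.remove(path)
```

It would be run against a scratch directory on the mounted volume, for example `find_listing_limit("/mnt/pvfs2/gould/salmon/test2")`, using the path from the thread. Since it creates and deletes files, a throwaway directory is the safer choice. Because direct access by file name reportedly still works when the listing fails, the cleanup step removes each probe file by its full path instead of relying on a listing.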
