Jim:

Actually, do you have the source rpms from the binary rpms that you sent me?

Becky

On Thu, Aug 2, 2012 at 1:11 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> I am running a 2.8.6 client against a 2.8.5 server on CentOS-6.2 and tried
> to reproduce your problem.  I have over 100 files in my directory and my
> "ls" is working.  So, I'm thinking that the creation and/or installation of
> the kernel module and client are hosed somehow on your site.
>
> There *might* be a difference in the two OrangeFS versions that you are
> using.  I will install from the rpms that you sent me and see if I can
> recreate the problem.
>
> Becky
>
> On Thu, Aug 2, 2012 at 12:35 PM, Becky Ligon <[email protected]> wrote:
>
>> Remind me again of your production OS?  And, did you create the kernel
>> module using this OS?
>>
>> Becky
>>
>>
>> On Thu, Aug 2, 2012 at 12:32 PM, Becky Ligon <[email protected]> wrote:
>>
>>> Jim:
>>>
>>> It might be easier for me to debug if you can set up an account for me
>>> and let me look at your environment.  Is this possible?
>>>
>>> Becky
>>>
>>>
>>> On Thu, Aug 2, 2012 at 11:48 AM, Jim Kusznir <[email protected]> wrote:
>>>
>>>> Actually a user has figured it out, at least in one directory: 66
>>>> entries work, but 67 fail:
>>>>
>>>> Also, in this directory /mnt/pvfs2/gould/salmon/test2 I tried to
>>>> figure out what was going on. If you are interested, you could try
>>>> these steps in this directory above:
>>>>
>>>> 1) ls |wc -l (gives you 66)
>>>>
>>>> 2) emacs a.txt (creates new file a.txt)
>>>>
>>>> 3) CTRL x-s (saves new file)
>>>>
>>>> 4) ls |wc -l (gives "ls: reading directory .: Invalid argument" error)
>>>>
>>>> 5) rm a.txt (removes file; can't tab-complete the name, must type it
>>>> in full)
>>>>
>>>> 6) ls |wc -l (gives you 66 - all back to normal)
>>>>
>>>> --Jim
>>>>
>>>> On Thu, Aug 2, 2012 at 8:47 AM, Jim Kusznir <[email protected]> wrote:
>>>> > I'm still running 2.8.5 on the servers, and 2.8.6 on the clients (all
>>>> > of them) now.
>>>> >
>>>> > Output on a sample directory:
>>>> > [kusznir@aeolus ~]$ ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>>> > ls: reading directory /mnt/pvfs2/airpact/MOZART4_CONUS: Invalid argument
>>>> > [kusznir@aeolus ~]$ pvfs2-ls /mnt/pvfs2/airpact/MOZART4_CONUS
>>>> > output
>>>> > getncfromNCAR.csh
>>>> > mz4assim_conus_1h_20070101.nc
>>>> > mz4assim_conus_1h_20070102.nc
>>>> >
>>>> >
>>>> > Another user indicates this is based on how many files are in the
>>>> > directory.  If he knows the file name, the file is still accessible,
>>>> > but ls or tab completion or anything like that fail.  If he deletes
>>>> > files to get it under a not-exactly-determined amount, ls works again.
>>>> >
>>>> > --Jim
>>>> >
>>>> > On Thu, Aug 2, 2012 at 6:34 AM, Becky Ligon <[email protected]> wrote:
>>>> >> Jim:
>>>> >>
>>>> >> Are you running 2.8.6 on the server and the client?  Or, just 2.8.6
>>>> >> from the head node?
>>>> >>
>>>> >> Can you run a "ls" on their directories that appear to be missing
>>>> >> data?  Can you also run pvfs2-ls on those same directories?  Please
>>>> >> send me the output from both commands.
>>>> >>
>>>> >> Thanks,
>>>> >> Becky
>>>> >>
>>>> >> On Wed, Aug 1, 2012 at 8:07 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>>
>>>> >>> So, since switching over to 2.8.6, I've had two users report that
>>>> >>> their larger directories are missing files / data.
>>>> >>>
>>>> >>> Now I'm really in for it....I'm asking for more details, but I'll
>>>> >>> need to address this pretty thoroughly and rapidly...File systems
>>>> >>> that lose user data are not useful.
>>>> >>>
>>>> >>> --Jim
>>>> >>>
>>>> >>> On Tue, Jul 31, 2012 at 12:36 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> > Jim:
>>>> >>> >
>>>> >>> > The documentation link that I sent doesn't seem to work.  Instead:
>>>> >>> >
>>>> >>> > go to www.orangefs.org and click on the html link for the install
>>>> >>> > guide, about midway down the page.
>>>> >>> >
>>>> >>> > the install guide has a section on setting up a client and in
>>>> >>> > section 3.3 is the description of the pvfs2tab file.
>>>> >>> >
>>>> >>> > Becky
>>>> >>> >
>>>> >>> >
>>>> >>> > On Tue, Jul 31, 2012 at 3:26 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>
>>>> >>> >> Jim:
>>>> >>> >>
>>>> >>> >> To generate a new config file, issue the command:
>>>> >>> >>
>>>> >>> >> /opt/pvfs2/bin/pvfs2-genconfig <config file name>
>>>> >>> >>
>>>> >>> >> You will be asked a set of questions regarding your installation.
>>>> >>> >> This utility may not provide everything you need, just depends on
>>>> >>> >> your setup.  To help you, I will forward you a copy of our
>>>> >>> >> production conf file.  You can compare it to your own needs and
>>>> >>> >> modify the new conf file as needed.  After you create a new conf
>>>> >>> >> file, I would be happy to review it for you.
>>>> >>> >>
>>>> >>> >> I'm not sure how your clients have started without a proper
>>>> >>> >> pvfs2tab file, unless you have the appropriate info in your fstab
>>>> >>> >> file.  The mount info could be in either file.  I will send you a
>>>> >>> >> copy of our production pvfs2tab file as an example.
>>>> >>> >>
>>>> >>> >> The link below will describe how to create the entries in the
>>>> >>> >> pvfs2tab/fstab file.
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc//pvfs2-quickstart/pvfs2quickstart.php#subsec:client
>>>> >>> >>
>>>> >>> >> Thanks for giving 2.8.6 a try!  Let me know how it goes!
>>>> >>> >>
>>>> >>> >> Becky
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> On Tue, Jul 31, 2012 at 2:14 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>>
>>>> >>> >>> I've got 2.8.6 ready to install, but I've got 15 users on there
>>>> >>> >>> and a full cluster at the moment, so I can't intentionally reboot
>>>> >>> >>> it.  If it crashes on me today, I'll take the opportunity to
>>>> >>> >>> update everything as soon as it comes back and reboot it again.
>>>> >>> >>> Otherwise, I'll try early tomorrow morning to load and reboot.
>>>> >>> >>>
>>>> >>> >>> Also, you previously mentioned my pvfs2 server configuration file
>>>> >>> >>> format was out of date.  Can you suggest a new config file format
>>>> >>> >>> to use based on what I gave you?  Also, I've never had a pvfs2tab
>>>> >>> >>> file on my clients, and my attempts to create one so far have
>>>> >>> >>> failed.  It seems I don't know the proper syntax, and I haven't
>>>> >>> >>> found sufficiently clear documentation on that either.  It has
>>>> >>> >>> worked for ~4 years without one, but...
>>>> >>> >>>
>>>> >>> >>> --Jim
>>>> >>> >>>
>>>> >>> >>> On Tue, Jul 31, 2012 at 10:03 AM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>> > Jim:
>>>> >>> >>> >
>>>> >>> >>> > Next time this happens, can you attach to the pvfs2-client-core
>>>> >>> >>> > process using gdb and see if you can tell in which function it
>>>> >>> >>> > seems to be spinning?  Also, you can try turning on client
>>>> >>> >>> > debugging, so we can see what the client core is doing.  To turn
>>>> >>> >>> > on debugging dynamically, issue the following:
>>>> >>> >>> >
>>>> >>> >>> > echo "all" > /proc/sys/pvfs2/client-debug
>>>> >>> >>> >
>>>> >>> >>> > With the CPU so high, the client-core may or may not see the
>>>> >>> >>> > change in gossip_debug settings.  If it does, then a lot of
>>>> >>> >>> > output will be generated!  Before you reboot your system, make a
>>>> >>> >>> > copy of the client log and send that to me, along with any
>>>> >>> >>> > information you might get from gdb.
>>>> >>> >>> >
>>>> >>> >>> > When you can, please try using 2.8.6 on your head node and see
>>>> >>> >>> > if you can reproduce the problem.
>>>> >>> >>> >
>>>> >>> >>> > Thanks,
>>>> >>> >>> > Becky
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> > On Tue, Jul 31, 2012 at 12:45 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>> >>
>>>> >>> >>> >> Unfortunately, the pvfs2-client.log is truncated and reopened
>>>> >>> >>> >> on reboot (e.g., all entries are lost).  Already checked.
>>>> >>> >>> >> Also, I didn't see anything in /var/log/messages (I looked
>>>> >>> >>> >> there when the problem started mounting).  There appears to be
>>>> >>> >>> >> no "paper trail" of this incident, which is why it's been so
>>>> >>> >>> >> hard to track down.
>>>> >>> >>> >>
>>>> >>> >>> >> --Jim
>>>> >>> >>> >>
>>>> >>> >>> >> On Mon, Jul 30, 2012 at 1:18 PM, Becky Ligon <[email protected]> wrote:
>>>> >>> >>> >> > Jim:
>>>> >>> >>> >> >
>>>> >>> >>> >> > Please send the pvfs2-client.log from your head node and the
>>>> >>> >>> >> > /var/log/messages just before you rebooted.  I'm thinking
>>>> >>> >>> >> > that the high CPU utilization is coming from a failed
>>>> >>> >>> >> > operation that wasn't cleaned up properly.
>>>> >>> >>> >> >
>>>> >>> >>> >> > As I noted in my previous email, 2.8.6 addressed some of
>>>> >>> >>> >> > these high CPU utilization issues.  It would be worthwhile
>>>> >>> >>> >> > for you to apply 2.8.6 to your head node and see if this
>>>> >>> >>> >> > particular situation comes up again.
>>>> >>> >>> >> >
>>>> >>> >>> >> > Becky
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >> > On Mon, Jul 30, 2012 at 3:03 PM, Jim Kusznir <[email protected]> wrote:
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> I think I caught a pvfs2-induced crash in progress on 2.8.5.
>>>> >>> >>> >> >> I don't have a crash file, and it looks like it's still in
>>>> >>> >>> >> >> the process of bringing down my head node.  Symptoms were:
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> Someone was doing an scp from (or to, not sure which, but
>>>> >>> >>> >> >> probably from) the pvfs2 volume.  At some point, CPU usage
>>>> >>> >>> >> >> spikes on the head node.  Top shows both the scp and the
>>>> >>> >>> >> >> pvfs2-client-core using 100% of a core.  The load avg just
>>>> >>> >>> >> >> keeps going up and up.  At about 29, I lost responsiveness
>>>> >>> >>> >> >> from the server.  CPU load shows 62.5% iowait, 25% system,
>>>> >>> >>> >> >> 12.5% idle, all others 0.  The only processes of note
>>>> >>> >>> >> >> running are the one SCP and the pvfs2 process.
>>>> >>> >>> >> >>
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> My machine has now gone unresponsive; I'll probably need to
>>>> >>> >>> >> >> go hit the front panel reset button.  When it comes back up,
>>>> >>> >>> >> >> I doubt there will be any written logs of what happened.
>>>> >>> >>> >> >> Hence, why I can never catch the logs of the crash; it
>>>> >>> >>> >> >> *thinks* it's working until the system goes non-responsive
>>>> >>> >>> >> >> and resets.
>>>> >>> >>> >> >>
>>>> >>> >>> >> >> --Jim
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >> > --
>>>> >>> >>> >> > Becky Ligon
>>>> >>> >>> >> > OrangeFS Support and Development
>>>> >>> >>> >> > Omnibond Systems
>>>> >>> >>> >> > Anderson, South Carolina


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users