Jim:

One more thing:  can you send me the pvfs2-client.log files from the nodes
where a KP has occurred?  If possible, I'd like the corresponding
/var/log/messages log file from when the KP happened.

Thanks,
Becky
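
One way to gather the two requested files on a node is sketched below. The function name collect_kp_logs is hypothetical, and /tmp/pvfs2-client.log is only an assumed location for the client log; pass the real path if the client was started with a different log file.

```shell
#!/bin/sh
# Sketch only: copy the PVFS2 client log and the syslog file from this
# node into a per-node directory so they can be sent along together.
# collect_kp_logs is a hypothetical helper, not part of PVFS2/OrangeFS.
collect_kp_logs() {
    dest=$1
    # Assumed default location of the client log; override with $2.
    client_log=${2:-/tmp/pvfs2-client.log}
    syslog=${3:-/var/log/messages}
    node=$(uname -n)

    mkdir -p "$dest/$node" || return 1
    for f in "$client_log" "$syslog"; do
        if [ -r "$f" ]; then
            # -p keeps timestamps, which helps match the log to the KP time.
            cp -p "$f" "$dest/$node/" && echo "copied $f"
        else
            echo "skipped $f (not readable)" >&2
        fi
    done
}
```

For example, "collect_kp_logs /root/kp-logs" run as root on each affected node would leave the copies under /root/kp-logs/<nodename>/.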

On Wed, Jul 25, 2012 at 1:05 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> Can you also send me your PVFS server config file?
>
> Becky
>
>
> On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:
>
>> Jim:
>>
>> Can you send me the kmod-pvfs2-...rpm?  I'd like to see how its files are
>> laid out.
>>
>> Thanks,
>> Becky
>>
>>
>> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
>>
>>> Hi Becky:
>>>
>>> Thanks for all your input.  I was on travel and am currently catching
>>> up on e-mail, so here are answers to your questions:
>>>
>>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>>> (CentOS 6.2) clusters identically.
>>> 2) I can mount manually using the init script.  It just will not run
>>> on boot.  It tries, but fails with the error message supplied.
>>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>>> for ROCKS clusters...Any software to be installed on each node needs
>>> to be its own RPM).  It appears to me that the module is being loaded
>>> successfully.
>>> 4) Ok, that sounds plausible.  I'll make those corrections and see if
>>> that fixes things.
>>>
>>> Of course, the mount on boot was one of two show-stopping issues.  The
>>> second show-stopping issue is how many kernel panics are being caused
>>> by OrangeFS.  I've been experiencing 3-8 KPs a week on a light to
>>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>>>
>>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6).  For my
>>> users, I absolutely must have a "traditional filesystem interface"
>>> (e.g., MPI-IO or pvfs-* commands are not acceptable; they need to work
>>> on the files like they would for any other filesystem).
>>>
>>> --Jim
>>>
>>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
>>> > Jim:
>>> >
>>> > In your init script, you need to add the LD_LIBRARY_PATH variable,
>>> > since your pvfs library is not in a standard location:
>>> >
>>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>>> >
>>> > Remove the LD_PRELOAD.  It is not needed here.
>>> >
>>> > Before "modprobe" will work, you have to run the command "depmod" to
>>> > update the modules list.  The "make kmod_install" does not
>>> > automatically do this.
>>> > NOTE:  if you place the kernel module (pvfs2.ko) somewhere other than
>>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe
>>> > to load the module.  Instead, use "/sbin/insmod <path>/pvfs2.ko".  If
>>> > you are using the rpm spec that I gave you (and it looks like you
>>> > are), then pvfs2.ko is located in /opt/pvfs2/lib/pvfs2.ko, in which
>>> > case, you have to use the "insmod" command to load it and the "rmmod"
>>> > command to unload it.
>>> >
>>> > When you issue a "stop", your script does not stop the client nor does
>>> > it unload the kernel module.  This will cause problems if you issue a
>>> > "start" by starting another pvfs2-client.  I will send you the init
>>> > script that we use here.  Maybe, you can modify it to accommodate your
>>> > environment.  We have more checks in it than you have in yours.
>>> >
>>> > I am not familiar with how PVFS reacts to the "intr" option that you
>>> > specify in the mount command.  What is its purpose?
>>> >
>>> > Becky
>>> >
>>> >
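
A minimal sketch of how the advice in the message above might translate into the start and stop branches of the init script. It assumes pvfs2.ko is in /opt/pvfs2/lib (as with the rpm spec mentioned above), uses a hypothetical RUN variable for dry runs, and assumes killall is an acceptable way to stop pvfs2-client. It is not the init script Becky refers to, which was not posted in this thread.

```shell
#!/bin/sh
# Sketch only: start/stop helpers for a pvfs2-client init script.
# Paths and the file system URL come from the script quoted in this thread.
PVFS2_PREFIX=/opt/pvfs2
MOUNT_POINT=/mnt/pvfs2
FS_URL=tcp://pvfs2-io-0-0:3334/pvfs2-fs

# Hypothetical dry-run switch: RUN=echo prints commands instead of running them.
RUN=${RUN:-}

# The pvfs library is not in a standard location.
LD_LIBRARY_PATH=$PVFS2_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH

pvfs2_start() {
    # pvfs2.ko is outside /lib/modules/`uname -r`, so use insmod, not modprobe.
    if ! grep -q '^pvfs2 ' /proc/modules 2>/dev/null; then
        $RUN /sbin/insmod "$PVFS2_PREFIX/lib/pvfs2.ko" || return 1
    fi
    # Start the client only if one is not already running.
    if ! pgrep -x pvfs2-client >/dev/null 2>&1; then
        $RUN "$PVFS2_PREFIX/sbin/pvfs2-client" \
            -p "$PVFS2_PREFIX/sbin/pvfs2-client-core" || return 1
    fi
    $RUN mkdir -p "$MOUNT_POINT" || return 1
    # "-o intr" is left out here because its effect on PVFS is unclear.
    $RUN mount -t pvfs2 "$FS_URL" "$MOUNT_POINT"
}

pvfs2_stop() {
    $RUN umount "$MOUNT_POINT" || return 1
    # Assumption: killall is a reasonable way to stop the client daemon.
    $RUN killall pvfs2-client
    # Give the client a moment to exit before unloading the module.
    $RUN sleep 1
    $RUN /sbin/rmmod pvfs2
}
```

The "start)" and "stop)" branches of the case statement would then call pvfs2_start and pvfs2_stop, so that a later start does not launch a second pvfs2-client. Setting RUN=echo first prints the command sequence without touching the system.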
>>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
>>> >>
>>> >> Jim:
>>> >>
>>> >> I just realized that you have already sent me your init script.  Let
>>> >> me take a closer look at it.
>>> >>
>>> >> Becky
>>> >>
>>> >>
>>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
>>> >>>
>>> >>> Jim:
>>> >>>
>>> >>> I have successfully booted my CentOS 6.2 system (using
>>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted
>>> >>> the client.  Thus, I can only guess that there is something in your
>>> >>> environment causing the problem.  Is it possible for you to mount the
>>> >>> client by issuing the commands manually once the system is running?
>>> >>> Can you send me a copy of your startup script for mounting the client
>>> >>> from your /etc/init.d directory?
>>> >>>
>>> >>> Becky
>>> >>>
>>> >>>
>>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>
>>> >>>> Jim:
>>> >>>>
>>> >>>> I have been able to successfully mount-on-boot on a VM with the
>>> >>>> 2.6.32-220.13.1.el6.x86_64 kernel.  However, I was using the
>>> >>>> Scientific Linux 6 distro and NOT CentOS 6.2.  Next, I will try a
>>> >>>> CentOS 6.2 distro and see what happens with it.
>>> >>>>
>>> >>>> Becky
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>
>>> >>>>> Jim:
>>> >>>>>
>>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment?
>>> >>>>> If so, which version of OrangeFS are you running?
>>> >>>>>
>>> >>>>> Becky
>>> >>>>>
>>> >>>>>
>>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> I cannot reproduce the pvfs2 crash on demand.  I have not yet seen it
>>> >>>>>> on CentOS 6, but I haven't placed CentOS 6 into production yet.
>>> >>>>>>
>>> >>>>>> On my CentOS 5 systems, it's not reproducible on demand, but it seems
>>> >>>>>> to happen with moderate file access from a few different processes.
>>> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
>>> >>>>>> pvfs2 client) will do it.  This has happened since the beginning of
>>> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
>>> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
>>> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
>>> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
>>> >>>>>>
>>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>>> >>>>>> ---------------
>>> >>>>>> #!/bin/sh
>>> >>>>>> #
>>> >>>>>> # chkconfig: 2345 99 99
>>> >>>>>> #
>>> >>>>>> # description: mount pvfs2 filesystem
>>> >>>>>> #
>>> >>>>>>
>>> >>>>>> . /etc/rc.d/init.d/functions
>>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>>> >>>>>> case "$1" in
>>> >>>>>> start)
>>> >>>>>>         echo -n "Mounting PVFS2 Filesystem: "
>>> >>>>>>         modprobe pvfs2
>>> >>>>>>         /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>>> >>>>>>         mkdir -p /mnt/pvfs2
>>> >>>>>>         mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>>> >>>>>>         touch /var/lock/subsys/pvfs2-client
>>> >>>>>>         ;;
>>> >>>>>>
>>> >>>>>> stop)
>>> >>>>>>         echo -n "Unmounting PVFS2 Filesystem: "
>>> >>>>>>         umount /mnt/pvfs2
>>> >>>>>>         rm -f /var/lock/subsys/pvfs2-client
>>> >>>>>>         ;;
>>> >>>>>>
>>> >>>>>> restart)
>>> >>>>>>         $0 stop
>>> >>>>>>         $0 start
>>> >>>>>>         ;;
>>> >>>>>>
>>> >>>>>> status)
>>> >>>>>>         status $NAME
>>> >>>>>>         ;;
>>> >>>>>> *)
>>> >>>>>>         echo "Usage: $NAME {start|stop|restart|status}"
>>> >>>>>>         exit 1
>>> >>>>>> esac
>>> >>>>>>
>>> >>>>>> exit 0
>>> >>>>>> ----------------
>>> >>>>>> I've tried with the export commented and uncommented, no difference.
>>> >>>>>>
>>> >>>>>> --Jim
>>> >>>>>>
>>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>> > Thanks, Jim.
>>> >>>>>> >
>>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production
>>> >>>>>> > environment.  So, I should be able to set up a VM with your kernel
>>> >>>>>> > version and test.  Can you give me a scenario to try in order to
>>> >>>>>> > reproduce the problem?
>>> >>>>>> >
>>> >>>>>> > I am also setting up a CentOS 6 VM, so I can analyze the
>>> >>>>>> > mount-with-boot issue.
>>> >>>>>> >
>>> >>>>>> > Becky
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>> >>
>>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>>> >>>>>> >> [root@aeoltest torque]# uname -a
>>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>>> >>>>>> >> [root@aeoltest torque]#
>>> >>>>>> >>
>>> >>>>>> >>
>>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>> >> > Jim:
>>> >>>>>> >> >
>>> >>>>>> >> > We are working on a few corrections to the user library, as we
>>> >>>>>> >> > speak, that were identified last week.  Using LD_PRELOAD would
>>> >>>>>> >> > definitely get around the kernel issues at hand, but I ask that
>>> >>>>>> >> > you wait until we have all of the current corrections in place
>>> >>>>>> >> > before using it.
>>> >>>>>> >> >
>>> >>>>>> >> > I also have some questions for you.  I am working on the "won't
>>> >>>>>> >> > mount on boot" issue and would like to know the specific kernel
>>> >>>>>> >> > that you are using under CentOS 6.2.
>>> >>>>>> >> >
>>> >>>>>> >> > Thanks,
>>> >>>>>> >> > Becky
>>> >>>>>> >> >
>>> >>>>>> >> >
>>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>> >> >>
>>> >>>>>> >> >> I managed to get a screenshot from an IP-KVM with the last chunk of
>>> >>>>>> >> >> a pvfs-induced KP on a compute node; image attached.
>>> >>>>>> >> >>
>>> >>>>>> >> >> With respect to client access methods, perhaps I should switch to a
>>> >>>>>> >> >> user space solution.  I remember hearing about an LD_PRELOAD client
>>> >>>>>> >> >> module (not using fuse, but being entirely userspace).  Is that
>>> >>>>>> >> >> "ready" with 2.8.6?  If not, perhaps I need to switch to the fuse
>>> >>>>>> >> >> module...
>>> >>>>>> >> >>
>>> >>>>>> >> >> --Jim
>>> >>>>>> >> >>
>>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
>>> >>>>>> >> >> > Hello Becky,
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>>> >>>>>> >> >> >> Andrew:
>>> >>>>>> >> >> >>
>>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question
>>> >>>>>> >> >> >> marks in the "ls" output, but we are working on it.
>>> >>>>>> >> >> >>
>>> >>>>>> >> >> >> Just FYI!
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client
>>> >>>>>> >> >> > during update then.
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > Best regards,
>>> >>>>>> >> >> > Andrew Savchenko
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > _______________________________________________
>>> >>>>>> >> >> > Pvfs2-users mailing list
>>> >>>>>> >> >> > [email protected]
>>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>> >>>>>> >> >> >
>>> >>>>>> >> >
>>> >>>>>> >> >
>>> >>>>>> >> >
>>> >>>>>> >> >
>>> >>>>>> >> > --
>>> >>>>>> >> > Becky Ligon
>>> >>>>>> >> > OrangeFS Support and Development
>>> >>>>>> >> > Omnibond Systems
>>> >>>>>> >> > Anderson, South Carolina
>>> >>>>>> >> >
>>> >>>>>> >> >


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
