Jim:

One more thing: can you send me the pvfs2-client.log files from the nodes where a kernel panic (KP) has occurred? If possible, I'd also like the corresponding /var/log/messages file from when the KP happened.

Thanks,
Becky

On Wed, Jul 25, 2012 at 1:05 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> Can you also send me your PVFS server config file?
>
> Becky
>
> On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:
>
>> Jim:
>>
>> Can you send me the kmod-pvfs2-...rpm? I'd like to see how its files are
>> laid out.
>>
>> Thanks,
>> Becky
>>
>> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
>>
>>> Hi Becky:
>>>
>>> Thanks for all your input. I was on travel and am currently catching
>>> up on e-mail, so here are answers to your questions:
>>>
>>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>>> (CentOS 6.2) clusters identically.
>>> 2) I can mount manually using the init script. It just will not run
>>> on boot. It tries, but fails with the error message supplied.
>>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>>> for ROCKS clusters...Any software to be installed on each node needs
>>> to be its own RPM). It appears to me that the module is being loaded
>>> successfully.
>>> 4) Ok, that sounds plausible. I'll make those corrections and see if
>>> that fixes things.
>>>
>>> Of course, the mount on boot was one of two show-stopping issues. The
>>> second show-stopping issue is how many kernel panics are being caused
>>> by OrangeFS. I've been experiencing 3-8 KPs a week on a light to
>>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>>>
>>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6). For my
>>> users, I absolutely must have a "traditional filesystem interface"
>>> (e.g., MPI-IO or pvfs-* commands are not acceptable; they need to work
>>> on the files like they would for any other filesystem).
>>>
>>> --Jim
>>>
>>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
>>> > Jim:
>>> >
>>> > In your init script, you need to add the LD_LIBRARY_PATH variable, since
>>> > your pvfs library is not in a standard location:
>>> >
>>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>>> >
>>> > Remove the LD_PRELOAD. It is not needed here.
>>> >
>>> > Before "modprobe" will work, you have to run the command "depmod" to update
>>> > the modules list. The "make kmod_install" does not automatically do this.
>>> > NOTE: if you place the kernel module (pvfs2.ko) somewhere other than
>>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe to load
>>> > the module. Instead, use "/sbin/insmod <path>/pvfs2.ko". If you are using
>>> > the rpm spec that I gave you (and it looks like you are), then pvfs2.ko is
>>> > located in /opt/pvfs2/lib/pvfs2.ko, in which case, you have to use the
>>> > "insmod" command to load it and the "rmmod" command to unload it.
>>> >
>>> > When you issue a "stop", your script does not stop the client nor does it
>>> > unload the kernel module. This will cause problems if you issue a "start"
>>> > by starting another pvfs2-client. I will send you the init script that we
>>> > use here. Maybe, you can modify it to accommodate your environment. We
>>> > have more checks in it than you have in yours.
>>> >
>>> > I am not familiar with how PVFS reacts to the "intr" option that you specify
>>> > in the mount command. What is its purpose?
>>> >
>>> > Becky
>>> >
>>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
>>> >>
>>> >> Jim:
>>> >>
>>> >> I just realized that you have already sent me your init script. Let me
>>> >> take a closer look at it.
>>> >>
>>> >> Becky
>>> >>
>>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
>>> >>>
>>> >>> Jim:
>>> >>>
>>> >>> I have successfully booted my CentOS 6.2 system (using
>>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted the
>>> >>> client. Thus, I can only guess that there is something in your environment
>>> >>> causing the problem. Is it possible for you to mount the client by issuing
>>> >>> the commands manually once the system is running? Can you send me a copy of
>>> >>> your startup script for mounting the client from your /etc/init.d directory?
>>> >>>
>>> >>> Becky
>>> >>>
>>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>
>>> >>>> Jim:
>>> >>>>
>>> >>>> I have been able to successfully mount-on-boot on a VM with the
>>> >>>> 2.6.32-220.13.1.el6.x86_64. However, I was using the Scientific Linux 6
>>> >>>> distro and NOT CentOS 6.2. Next, I will try a CentOS 6.2 distro and see
>>> >>>> what happens with it.
>>> >>>>
>>> >>>> Becky
>>> >>>>
>>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>
>>> >>>>> Jim:
>>> >>>>>
>>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment? If
>>> >>>>> so, which version of OrangeFS are you running?
>>> >>>>>
>>> >>>>> Becky
>>> >>>>>
>>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> I cannot reproduce the pvfs2 crash on demand. I have not yet seen it
>>> >>>>>> on centos 6, but I haven't placed centos6 into production yet.
>>> >>>>>>
>>> >>>>>> On my centos5 systems, it's not reproducible on demand, but it seems to
>>> >>>>>> happen with moderate file access from a few different processes.
>>> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
>>> >>>>>> pvfs2 client) will do it.
>>> >>>>>> This has happened since the beginning of
>>> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
>>> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
>>> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
>>> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
>>> >>>>>>
>>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>>> >>>>>> ---------------
>>> >>>>>> #!/bin/sh
>>> >>>>>> #
>>> >>>>>> # chkconfig: 2345 99 99
>>> >>>>>> #
>>> >>>>>> # description: mount pvfs2 filesystem
>>> >>>>>> #
>>> >>>>>>
>>> >>>>>> . /etc/rc.d/init.d/functions
>>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>>> >>>>>> case "$1" in
>>> >>>>>>   start)
>>> >>>>>>     echo -n "Mounting PVFS2 Filesystem: "
>>> >>>>>>     modprobe pvfs2
>>> >>>>>>     /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>>> >>>>>>     mkdir -p /mnt/pvfs2
>>> >>>>>>     mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>>> >>>>>>     touch /var/lock/subsys/pvfs2-client
>>> >>>>>>     ;;
>>> >>>>>>
>>> >>>>>>   stop)
>>> >>>>>>     echo -n "Unmounting PVFS2 Filesystem: "
>>> >>>>>>     umount /mnt/pvfs2
>>> >>>>>>     rm -f /var/lock/subsys/pvfs2-client
>>> >>>>>>     ;;
>>> >>>>>>
>>> >>>>>>   restart)
>>> >>>>>>     $0 stop
>>> >>>>>>     $0 start
>>> >>>>>>     ;;
>>> >>>>>>
>>> >>>>>>   status)
>>> >>>>>>     status $NAME
>>> >>>>>>     ;;
>>> >>>>>>   *)
>>> >>>>>>     echo "Usage: $NAME {start|stop|restart|status}"
>>> >>>>>>     exit 1
>>> >>>>>> esac
>>> >>>>>>
>>> >>>>>> exit 0
>>> >>>>>> ----------------
>>> >>>>>> I've tried with the export commented and uncommented, no difference.
>>> >>>>>>
>>> >>>>>> --Jim
>>> >>>>>>
>>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>> > Thanks, Jim.
>>> >>>>>> >
>>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production
>>> >>>>>> > environment.
>>> >>>>>> > So, I should be able to set up a VM with your kernel version and test. Can you
>>> >>>>>> > give me a scenario to try in order to reproduce the problem?
>>> >>>>>> >
>>> >>>>>> > I am also setting up a CentOS 6 VM, so I can analyze the mount-with-boot
>>> >>>>>> > issue.
>>> >>>>>> >
>>> >>>>>> > Becky
>>> >>>>>> >
>>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>> >>
>>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>>> >>>>>> >> [root@aeoltest torque]# uname -a
>>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>>> >>>>>> >> [root@aeoltest torque]#
>>> >>>>>> >>
>>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
>>> >>>>>> >> > Jim:
>>> >>>>>> >> >
>>> >>>>>> >> > We are working on a few corrections to the user library, as we speak, that
>>> >>>>>> >> > were identified last week. Using LD_PRELOAD would definitely get around the
>>> >>>>>> >> > kernel issues at hand, but I ask that you wait until we have all of the
>>> >>>>>> >> > current corrections in place before using it.
>>> >>>>>> >> >
>>> >>>>>> >> > I also have some questions for you.
>>> >>>>>> >> > I am working on the "won't mount on boot" issue and would like to know the
>>> >>>>>> >> > specific kernel that you are using under CentOS 6.2.
>>> >>>>>> >> >
>>> >>>>>> >> > Thanks,
>>> >>>>>> >> > Becky
>>> >>>>>> >> >
>>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
>>> >>>>>> >> >>
>>> >>>>>> >> >> I managed to get a screenshot of an ip-kvm with the last chunk of a
>>> >>>>>> >> >> pvfs-induced KP on a compute node; image attached.
>>> >>>>>> >> >>
>>> >>>>>> >> >> With respect to client access methods, perhaps I should switch to a
>>> >>>>>> >> >> user space solution. I remember hearing about an LD_PRELOAD client
>>> >>>>>> >> >> module (not using fuse, but being entirely userspace). Is that
>>> >>>>>> >> >> "ready" with 2.8.6? If not, perhaps I need to switch to the fuse
>>> >>>>>> >> >> module...
>>> >>>>>> >> >>
>>> >>>>>> >> >> --Jim
>>> >>>>>> >> >>
>>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
>>> >>>>>> >> >> > Hello Becky,
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>>> >>>>>> >> >> >> Andrew:
>>> >>>>>> >> >> >>
>>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question marks in the
>>> >>>>>> >> >> >> "ls" output, but we are working on it.
>>> >>>>>> >> >> >>
>>> >>>>>> >> >> >> Just FYI!
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client during
>>> >>>>>> >> >> > update then.
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > Best regards,
>>> >>>>>> >> >> > Andrew Savchenko
>>> >>>>>> >> >> >
>>> >>>>>> >> >> > _______________________________________________
>>> >>>>>> >> >> > Pvfs2-users mailing list
>>> >>>>>> >> >> > [email protected]
>>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
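
For reference, the init script changes Becky describes in her Jul 20 reply (set LD_LIBRARY_PATH, drop LD_PRELOAD, load the module with insmod because it lives in /opt/pvfs2/lib, and have "stop" shut down the client and unload the module) might look roughly like the sketch below. This is not the script Becky offered to send, which does not appear in this thread. The DRY_RUN switch, the run helper, and the pvfs2_start/pvfs2_stop names are made up for illustration, and stopping the client with killall is an assumption. The status action is left out because the original referenced $NAME, which the script shown never sets.

```shell
#!/bin/sh
#
# chkconfig: 2345 99 99
# description: mount pvfs2 filesystem
#
# Sketch only. Paths come from the thread; DRY_RUN, run(), and the
# pvfs2_* function names are hypothetical helpers.

PVFS2_HOME=/opt/pvfs2
MODULE=$PVFS2_HOME/lib/pvfs2.ko   # outside /lib/modules, so insmod rather than modprobe
MNT=/mnt/pvfs2
FS=tcp://pvfs2-io-0-0:3334/pvfs2-fs
LOCK=/var/lock/subsys/pvfs2-client

# The pvfs library is not in a standard location. No LD_PRELOAD here.
export LD_LIBRARY_PATH="$PVFS2_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pvfs2_start() {
    run /sbin/insmod "$MODULE" || return 1
    run "$PVFS2_HOME/sbin/pvfs2-client" -p "$PVFS2_HOME/sbin/pvfs2-client-core" || return 1
    run mkdir -p "$MNT"
    # "intr" is kept from the original script; its effect on PVFS is an open question above.
    run mount -t pvfs2 -o intr "$FS" "$MNT" || return 1
    run touch "$LOCK"
}

pvfs2_stop() {
    # Leave the client and module alone if the filesystem is still busy.
    run umount "$MNT" || return 1
    run killall pvfs2-client      # assumption: one way to stop the client daemon
    run /sbin/rmmod pvfs2
    run rm -f "$LOCK"
}

case "${1:-}" in
    start)   pvfs2_start ;;
    stop)    pvfs2_stop ;;
    restart) pvfs2_stop; pvfs2_start ;;
    "")      : ;;                 # no argument: only define the functions
    *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
esac
```

Running it with DRY_RUN=1 prints the commands instead of executing them, which is a cheap way to check the ordering before trying it at boot.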
