Re: [Cooker] XFS+HIGHMEM lockup, with fix.
Todd Lyons wrote:
> Bryan Whitehead wrote on Mon, Nov 18, 2002 at 04:52:57PM -0800:
> > > ahh, great work, just came home after being away for the weekend, and
> > > that one made me very happy=)
> > I'd hope Mandrake would release a new kernel. If we are hitting this
> > bug, I'm sure others are
>
> Bryan, is there any indication that this bug exists in only the XFS
> driver (ie patches we got were bad...grabbed at just the wrong moment in
> time) or is it more kernel wide? It seems to be isolated to the XFS
> driver only in my limited view.

This is only in the XFS driver. The patch I sent will only affect the xfs module.

--
Bryan Whitehead
SysAdmin - JPL - Interferometry Systems and Technology
Phone: 818 354 2903
[EMAIL PROTECTED]
Re: [Cooker] XFS+HIGHMEM lockup, with fix.
Bryan Whitehead wrote on Mon, Nov 18, 2002 at 04:52:57PM -0800:
> > ahh, great work, just came home after being away for the weekend, and
> > that one made me very happy=)
> I'd hope Mandrake would release a new kernel. If we are hitting this
> bug, I'm sure others are

Bryan, is there any indication that this bug exists in only the XFS driver (ie patches we got were bad...grabbed at just the wrong moment in time) or is it more kernel wide? It seems to be isolated to the XFS driver only in my limited view.

Blue skies...
Todd

--
MandrakeSoft USA http://www.mandrakesoft.com
Mandrake: An amalgam of good ideas from RedHat, Debian, and MandrakeSoft. All in all, IMHO, an unbeatable combination. --Levi Ramsey on Cooker ML
Cooker Version mandrake-release-9.1-0.1mdk
Kernel 2.4.19-19mdksecure
Re: [Cooker] XFS+HIGHMEM lockup, with fix.
Per Øyvind Karlsen wrote:
> Bryan Whitehead wrote:
> > After working with SGI for the past week, with kdb backtraces and many
> > rebuilds of the kernel. The problem with the xfs filesystem as mandrake
> > ships it was found. Finally! Now I can use all the memory we bought for
> > our machines!! :)
> >
> > I included a patch that needs to be dropped into the 2.4.19-q16/patches
> > directory in the kernel source as shipped with mandrake 9.0.
> >
> > Please apply and release a new kernel... :)
> >
> > diff -ru linux-2.4.19/fs/xfs.orig/pagebuf/page_buf_io.c linux-2.4.19/fs/xfs/pagebuf/page_buf_io.c
> > --- linux-2.4.19/fs/xfs.orig/pagebuf/page_buf_io.c 2002-11-13 17:25:21.0 -0800
> > +++ linux-2.4.19/fs/xfs/pagebuf/page_buf_io.c 2002-11-13 17:31:40.0 -0800
> > @@ -309,7 +309,8 @@
> >  __pb_block_prepare_write_async(ip, page, cpoff, cpoff+csize, at_eof, NULL, pbmapp, PBF_WRITE);
> > -memset((void *) (kmap(page) + cpoff), 0, csize);
> > +/* __pb_block_prepare_write already kmap'd the page */
> > +memset((void *) (page_address(page) + cpoff), 0, csize);
> >  pagebuf_commit_write_core(ip, page, cpoff, cpoff + csize);
> >  pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + cpoff + csize;
> > Only in linux-2.4.19/fs/xfs/pagebuf: page_buf_io.c.orig
>
> ahh, great work, just came home after being away for the weekend, and
> that one made me very happy=)

I'd hope Mandrake would release a new kernel. If we are hitting this bug, I'm sure others are

--
Bryan Whitehead
SysAdmin - JPL - Interferometry Systems and Technology
Phone: 818 354 2903
[EMAIL PROTECTED]
Re: [Cooker] XFS+HIGHMEM lockup, with fix.
Bryan Whitehead wrote:
> After working with SGI for the past week, with kdb backtraces and many
> rebuilds of the kernel. The problem with the xfs filesystem as mandrake
> ships it was found. Finally! Now I can use all the memory we bought for
> our machines!! :)
>
> I included a patch that needs to be dropped into the 2.4.19-q16/patches
> directory in the kernel source as shipped with mandrake 9.0.
>
> Please apply and release a new kernel... :)
>
> diff -ru linux-2.4.19/fs/xfs.orig/pagebuf/page_buf_io.c linux-2.4.19/fs/xfs/pagebuf/page_buf_io.c
> --- linux-2.4.19/fs/xfs.orig/pagebuf/page_buf_io.c 2002-11-13 17:25:21.0 -0800
> +++ linux-2.4.19/fs/xfs/pagebuf/page_buf_io.c 2002-11-13 17:31:40.0 -0800
> @@ -309,7 +309,8 @@
>  __pb_block_prepare_write_async(ip, page, cpoff, cpoff+csize, at_eof, NULL, pbmapp, PBF_WRITE);
> - memset((void *) (kmap(page) + cpoff), 0, csize);
> + /* __pb_block_prepare_write already kmap'd the page */
> + memset((void *) (page_address(page) + cpoff), 0, csize);
>  pagebuf_commit_write_core(ip, page, cpoff, cpoff + csize);
>  pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + cpoff + csize;
> Only in linux-2.4.19/fs/xfs/pagebuf: page_buf_io.c.orig

ahh, great work, just came home after being away for the weekend, and that one made me very happy=)

--
Mvh
Per Øyvind Karlsen
Delonic Technology Group AS
Sysadmin, developer, greasemonkey
www.delonic.no - +47 41681061
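For anyone wondering why one extra kmap() could matter, here is a rough userland sketch of one plausible reading of the patch. It is not kernel code and not SGI's analysis. It assumes that kmap() on a highmem page takes a reference on a slot in a small fixed pool, that the commit step drops exactly one reference per write, and that a caller finding the pool full would block. Every name in it (toy_kmap, toy_kunmap, POOL_SLOTS, pages_until_stall) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. The slot count is an ASSUMPTION chosen to keep the
 * example small; the real mapping pool is much larger. */
#define POOL_SLOTS 4
#define MAX_PAGES  64

struct toy_page {
    bool highmem;    /* true: page needs a temporary mapping slot */
    int  map_count;  /* outstanding toy_kmap() references */
};

static int slots_in_use;

/* Stand-in for kmap(): lowmem pages need no slot. Returns false where a
 * real caller would have to wait for a free slot. */
static bool toy_kmap(struct toy_page *p)
{
    if (!p->highmem)
        return true;
    if (p->map_count == 0) {
        if (slots_in_use == POOL_SLOTS)
            return false;
        slots_in_use++;
    }
    p->map_count++;
    return true;
}

/* Stand-in for kunmap(): the slot is freed only when the last
 * reference is dropped. */
static void toy_kunmap(struct toy_page *p)
{
    if (!p->highmem)
        return;
    if (--p->map_count == 0)
        slots_in_use--;
}

/* Before the patch: the prepare step maps the page, the memset maps it
 * a second time, and the commit step unmaps once (ASSUMPTION). */
static bool write_page_buggy(struct toy_page *p)
{
    if (!toy_kmap(p)) return false;  /* prepare_write */
    if (!toy_kmap(p)) return false;  /* memset(kmap(page) + ...) */
    toy_kunmap(p);                   /* commit_write */
    return true;
}

/* After the patch: the memset reuses the existing mapping, which is
 * what page_address() does, so references stay balanced. */
static bool write_page_fixed(struct toy_page *p)
{
    if (!toy_kmap(p)) return false;  /* prepare_write */
    toy_kunmap(p);                   /* commit_write */
    return true;
}

/* Writes up to npages distinct pages and returns how many completed
 * before a caller would have had to wait for a slot. */
int pages_until_stall(bool highmem, bool buggy, int npages)
{
    struct toy_page pages[MAX_PAGES] = {0};
    int i;

    if (npages > MAX_PAGES)
        npages = MAX_PAGES;
    slots_in_use = 0;
    for (i = 0; i < npages; i++) {
        bool ok;
        pages[i].highmem = highmem;
        ok = buggy ? write_page_buggy(&pages[i])
                   : write_page_fixed(&pages[i]);
        if (!ok)
            return i;
    }
    return npages;
}
```

In this model, pages_until_stall(true, true, 16) stops once the pool is used up, while the fixed path and the lowmem case run through all 16 pages. That would at least be consistent with a lockup that needs HIGHMEM in use but does not depend on load.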
Re: [Cooker] XFS+HIGHMEM lockup, with fix.
> Sweet! I'm impressed that you found such an obscure bug. Could you go
> through some of the steps that you went through to do this debugging?
> I would suggest the following mentionables:
> 1) enabling kdb

Not that hard in the mandrake kernel, just switch the kdb option in the .spec file from 0 to 1.

> 2) interpreting the data

I did a full backtrace of every process on the machine (after the machine locked up). Once you're in the kdb console you just type "ps" for the process list, and then "bta" to backtrace all processes. The problem was that there were NO xfs backtraces; however, one process was screwed up and a backtrace could not be done. This meant that something in that process got totally hosed. It's also the process that hard locked the machine.

I pointed out that the load on the machine can be very low when the bug is triggered, and that it doesn't get triggered when there is no high memory available (even when running the highmem kernel). In linux, when HIGHMEM is turned on there is still "regular" (LOWMEM) memory. As long as xfs was only playing with the LOWMEM segment of memory, there were no lockups. But as soon as xfs used HIGHMEM (no matter the load on the machine) it locked hard.

> 3) What made you zero in on that particular memset (ties back to #2)

I ran SGI's pre3 of XFS 1.2 kernels and had no problems. After that I used their own brew of 1.1 with highmem, and didn't have a problem. After I told SGI, they sent me a one liner patch.

> 4) Documentation that says the __pb_block_prepare_write already kmap'ed
> the page. If the answer is "use the source Luke" then so be it

SGI read the source... and knows it... ;)

Mandrake must have pulled the xfs patches from SGI at exactly the wrong time :(

--
Bryan Whitehead
SysAdmin - JPL - Interferometry Systems and Technology
Phone: 818 354 2903
[EMAIL PROTECTED]
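To make the LOWMEM/HIGHMEM split above a bit more concrete, here is a small sketch of how much of a machine's RAM would count as high memory. It assumes the usual 32-bit x86 default of roughly 896 MB of directly mapped low memory and 4 KB pages; neither figure comes from this thread, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: default i386 layout with about 896 MB of permanently
 * mapped low memory and 4 KB pages. Other configurations differ. */
#define LOWMEM_LIMIT_BYTES (896ULL * 1024 * 1024)
#define TOY_PAGE_SIZE      4096ULL

/* True if the page frame lies above the directly mapped region, i.e.
 * it would need a temporary mapping before the kernel can touch it. */
bool pfn_is_highmem(uint64_t pfn)
{
    return pfn * TOY_PAGE_SIZE >= LOWMEM_LIMIT_BYTES;
}

/* Number of highmem pages on a machine with ram_bytes of memory. */
uint64_t highmem_pages(uint64_t ram_bytes)
{
    if (ram_bytes <= LOWMEM_LIMIT_BYTES)
        return 0;
    return (ram_bytes - LOWMEM_LIMIT_BYTES) / TOY_PAGE_SIZE;
}
```

Under these assumptions a 512 MB box has no highmem pages at all, so a highmem-only bug could never show up there even on a highmem kernel, while a 1 GB box has 32768 of them.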
Re: [Cooker] XFS+HIGHMEM lockup, with fix.
Bryan Whitehead wrote on Fri, Nov 15, 2002 at 03:51:03PM -0800:
> After working with SGI for the past week, with kdb backtraces and many
> rebuilds of the kernel. The problem with the xfs filesystem as mandrake
> ships it was found. Finally! Now I can use all the memory we bought for
> our machines!! :)
> I included a patch that needs to be dropped into the 2.4.19-q16/patches
> directory in the kernel source as shipped with mandrake 9.0.

Sweet! I'm impressed that you found such an obscure bug. Could you go through some of the steps that you went through to do this debugging? I would suggest the following mentionables:
1) enabling kdb
2) interpreting the data
3) What made you zero in on that particular memset (ties back to #2)
4) Documentation that says the __pb_block_prepare_write already kmap'ed the page. If the answer is "use the source Luke" then so be it

Congrats!

Blue skies...
Todd

--
MandrakeSoft USA http://www.mandrakesoft.com
cat /boot/vmlinuz > /dev/dsp #for great justice
Cooker Version mandrake-release-9.1-0.1mdk
Kernel 2.4.19-19mdksecure