Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs
Chris Mason wrote: > > On Thursday, February 01, 2001 02:16:43 PM -0200 Rik van Riel ><[EMAIL PROTECTED]> wrote: > >> About the system hanging completely, I wonder if it goes >> away by pressing sysrq-S (sync all disks). If it does, >> maybe Reiserfs was blocking all the pages in the inactive >> list from being written because one of the active pages >> (not a replacement candidate) needed to be written out >> first? Or does the Reiserfs ->writepage() function handle >> this? > In answer to Rik's question, no, sysrq-S doesn't fix it. The sound of the disk grind changes momentarily then it resumes. > In most cases, the reiserfs writepage func is the same as block_write_full_page. >The only difference should come when a packed tail has been mmap'd. > > Since JOURNAL_MAX_BATCH was at 100, the log should have only pinned 400k. More >blocks could be pinned, but kreiserfsd should be in the process of flushing those. > > I've been trying out a few things to send memory pressure down to reiserfs, but they >have mostly been based on the code to make fs/buffer.c use writepage for flushing. I >should have done something simple first, I'll start on that now. > > -chris http://stuph.org/VM/ is back up, no thanks to the network solutions. The files listed there have all the relevant backtraces. -d - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs
My apologies...my internic data isn't updated, http://208.179.0.18/VM/ -d -- There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs
Chris Mason wrote: > Sorry, can't seem to resolve stuph.org. What is kreiserfsd doing during when the >system is waiting for more ram? With JOURNAL_MAX_BATCH set to 100, kreiserfsd will >end up responsible for sending log blocks/metadata to disk and freeing the pinned >buffers. > > -chris (http://208.179.0.18/VM/) [schedule_timeout+115/148] [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92] [reiserfs_journal_commit_thread+173/224] [kernel_thread+40/56] -d -- There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs
My apologies...my internic data isn't updated, http://208.179.0.18/VM/ -d -- There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs
On Thursday, February 01, 2001 02:16:43 PM -0200 Rik van Riel <[EMAIL PROTECTED]> wrote: > > About the system hanging completely, I wonder if it goes > away by pressing sysrq-S (sync all disks). If it does, > maybe Reiserfs was blocking all the pages in the inactive > list from being written because one of the active pages > (not a replacement candidate) needed to be written out > first? Or does the Reiserfs ->writepage() function handle > this? > In most cases, the reiserfs writepage func is the same as block_write_full_page. The only difference should come when a packed tail has been mmap'd. Since JOURNAL_MAX_BATCH was at 100, the log should have only pinned 400k. More blocks could be pinned, but kreiserfsd should be in the process of flushing those. I've been trying out a few things to send memory pressure down to reiserfs, but they have mostly been based on the code to make fs/buffer.c use writepage for flushing. I should have done something simple first, I'll start on that now. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM brokenness, possibly related to reiserfs
On Wednesday, January 31, 2001 11:02:46 PM -0800 David Ford <[EMAIL PROTECTED]> wrote: > (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect > anything). > > Ok, having approached this slightly more intelligently here are [better] > results. > > The dumps are large so they are located at http://stuph.org/VM/. Here's > the story. Sorry, can't seem to resolve stuph.org. What is kreiserfsd doing during when the system is waiting for more ram? With JOURNAL_MAX_BATCH set to 100, kreiserfsd will end up responsible for sending log blocks/metadata to disk and freeing the pinned buffers. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM brokenness, possibly related to reiserfs
On Wed, 31 Jan 2001, David Ford wrote: > The dumps are large so they are located at http://stuph.org/VM/. I can't seem to resolve this domain ... > Here's the story. I boot and startx, I load xmms and netscape > to eat away memory. When free buffers/cache falls below 7M the > system stalls and the only recovery is sysrq-E or reboot. At > the moment of stall the disk will grind continuously for about > 25 to 30 minutes then go silent. At this point in time the only > recovery is reboot, sysrq-E won't work. > Kernel 2.4.1, with reiserfs, devfs, no patches applied. Between 2.4.0 and 2.4.1, a few VM changes were made by Linus and Marcelo. These are good changes, but they seem to need a little bit of tuning in related code to get system behaviour right again. I'm working on these changes and should have a (few) rebalancing patches out soon to fix the performance problem. About the system hanging completely, I wonder if it goes away by pressing sysrq-S (sync all disks). If it does, maybe Reiserfs was blocking all the pages in the inactive list from being written because one of the active pages (not a replacement candidate) needed to be written out first? Or does the Reiserfs ->writepage() function handle this? regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM brokenness, possibly related to reiserfs
On Thu, 1 Feb 2001, David Ford wrote: > Correct, the point of the matter is to find stress points. It > will do the exact same thing when it reaches the end of swap. > I suspect a relation to reiserfs fighting for buffers perhaps. > This fight occurs a few megs before the OOM routine trips. Ah, so this is the problem. I've already heard that the OOM killer is no longer triggered for some reason, I'll enter the bug in the Linux-MM bugzilla so I don't forget it and I'll fix it ASAP. ObBugzillaURL: http://www.linux-mm.org/bugzilla.shtml regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM brokenness, possibly related to reiserfs
Correct, the point of the matter is to find stress points. It will do the exact same thing when it reaches the end of swap. I suspect a relation to reiserfs fighting for buffers perhaps. This fight occurs a few megs before the OOM routine trips. -d Ed Tomlinson wrote: > Hi, > > Gather this is with no swap space allocated... And the question is why does > the oom handler not get triggered? > > Ed Tomlinson > > David Ford wrote: > > > (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect > > anything). > > > > Ok, having approached this slightly more intelligently here are [better] > > results. > > > > The dumps are large so they are located at http://stuph.org/VM/. Here's > > the story. I boot and startx, I load xmms and netscape to eat away > > memory. When free buffers/cache falls below 7M the system stalls and > > the only recovery is sysrq-E or reboot. At the moment of stall the disk > > will grind continuously for about 25 to 30 minutes then go silent. At > > this point in time the only recovery is reboot, sysrq-E won't work. > > > > If I move the mouse or type a key within 30 seconds of this incident, > > that user input will take about 5 minutes to register. After that > > initial minute, nothing more will happen. > > > > Kernel 2.4.1, with reiserfs, devfs, no patches applied. > > > > "klog-X" are basically the same thing but I'm running top, syslogd, and > > klogd with -20 priority. I didn't note anything out of the ordinary in > > top. These are snapshots where I've managed to murder processes and > > restart the problem without rebooting. > > > > In the second instance, I had my finger on the kill button and managed > > to kill netscape and recover partially. However the system was heavily > > loaded even after the kill. > > > > I have xmms in STOPped state so it's just waiting. > > > > kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is > > taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal. > > The system load was hovering at 1.00 for a few minutes then dropped to > > zero. However scrolling text in an rxvt is slow enough to watch blocks > > move. Running "ps aux" takes nearly one third of a second for total > > time. Total number of processes is ~40. > > > > Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3 > > 1(L-TLB) 4 2 > > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] > > [process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692] > > [kernel_thread+31/56] [kernel_thread+40/56] > > > > Jan 31 22:31:51 nifty kernel: kswapdS CBF75FAC 5704 4 > > 1(L-TLB) 5 3 > > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] > > [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92] > > [kswapd+213/244] [kernel_thread+40/56] > > > > Jan 31 22:31:52 nifty kernel: bdflush S CBF7 5912 6 > > 1(L-TLB) 7 5 > > Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216] > > [kernel_thread+40/56] > > > > > > In the fourth snapshot, I have put xmms in STOP state again inside the > > memory shortage, memory is at 4800 free buffers/cache and 1592 free mem. > > > > As I entered this shortage period I started a 'ps -eo ... > file' to try > > and record data there. This is the only disk activity happening. Load > > is ~4.00. I have now killed the ps. > > > > Load has dropped significantly and I have tolerable but quite laggy user > > input responsiveness now. > > > > Memory is currently 4900/1588 like above. Load is about 2.00 and will > > continue dropping if I don't do anything. Any processes I exec which > > need to be loaded from disk take several seconds. I.e. 'uptime' takes > > about 4 seconds to execute. > > > > Snapshot #5 will be the last one and I will reboot. Once memory is > > freed from xmms (back to 150megs free), everything is peachy. > > > > > > -d > > > > -- > > There is a natural aristocracy among men. The grounds of this are virtue > and talents. > > Thomas Jefferson The good thing about standards is that there are so many > to choose > > from. Andrew S. Tanenbaum > > > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to [EMAIL PROTECTED] > > Please read the FAQ at http://www.tux.org/lkml/ > > -- There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM brokenness, possibly related to reiserfs
Hi, Gather this is with no swap space allocated... And the question is why does the oom handler not get triggered? Ed Tomlinson David Ford wrote: > (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect > anything). > > Ok, having approached this slightly more intelligently here are [better] > results. > > The dumps are large so they are located at http://stuph.org/VM/. Here's > the story. I boot and startx, I load xmms and netscape to eat away > memory. When free buffers/cache falls below 7M the system stalls and > the only recovery is sysrq-E or reboot. At the moment of stall the disk > will grind continuously for about 25 to 30 minutes then go silent. At > this point in time the only recovery is reboot, sysrq-E won't work. > > If I move the mouse or type a key within 30 seconds of this incident, > that user input will take about 5 minutes to register. After that > initial minute, nothing more will happen. > > Kernel 2.4.1, with reiserfs, devfs, no patches applied. > > "klog-X" are basically the same thing but I'm running top, syslogd, and > klogd with -20 priority. I didn't note anything out of the ordinary in > top. These are snapshots where I've managed to murder processes and > restart the problem without rebooting. > > In the second instance, I had my finger on the kill button and managed > to kill netscape and recover partially. However the system was heavily > loaded even after the kill. > > I have xmms in STOPped state so it's just waiting. > > kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is > taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal. > The system load was hovering at 1.00 for a few minutes then dropped to > zero. However scrolling text in an rxvt is slow enough to watch blocks > move. Running "ps aux" takes nearly one third of a second for total > time. Total number of processes is ~40. > > Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3 > 1(L-TLB) 4 2 > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] > [process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692] > [kernel_thread+31/56] [kernel_thread+40/56] > > Jan 31 22:31:51 nifty kernel: kswapdS CBF75FAC 5704 4 > 1(L-TLB) 5 3 > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] > [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92] > [kswapd+213/244] [kernel_thread+40/56] > > Jan 31 22:31:52 nifty kernel: bdflush S CBF7 5912 6 > 1(L-TLB) 7 5 > Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216] > [kernel_thread+40/56] > > > In the fourth snapshot, I have put xmms in STOP state again inside the > memory shortage, memory is at 4800 free buffers/cache and 1592 free mem. > > As I entered this shortage period I started a 'ps -eo ... > file' to try > and record data there. This is the only disk activity happening. Load > is ~4.00. I have now killed the ps. > > Load has dropped significantly and I have tolerable but quite laggy user > input responsiveness now. > > Memory is currently 4900/1588 like above. Load is about 2.00 and will > continue dropping if I don't do anything. Any processes I exec which > need to be loaded from disk take several seconds. I.e. 'uptime' takes > about 4 seconds to execute. > > Snapshot #5 will be the last one and I will reboot. Once memory is > freed from xmms (back to 150megs free), everything is peachy. > > > -d > > -- > There is a natural aristocracy among men. The grounds of this are virtue and talents. > Thomas Jefferson The good thing about standards is that there are so many to choose > from. Andrew S. Tanenbaum > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
VM brokenness, possibly related to reiserfs
(Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect anything). Ok, having approached this slightly more intelligently here are [better] results. The dumps are large so they are located at http://stuph.org/VM/. Here's the story. I boot and startx, I load xmms and netscape to eat away memory. When free buffers/cache falls below 7M the system stalls and the only recovery is sysrq-E or reboot. At the moment of stall the disk will grind continuously for about 25 to 30 minutes then go silent. At this point in time the only recovery is reboot, sysrq-E won't work. If I move the mouse or type a key within 30 seconds of this incident, that user input will take about 5 minutes to register. After that initial minute, nothing more will happen. Kernel 2.4.1, with reiserfs, devfs, no patches applied. "klog-X" are basically the same thing but I'm running top, syslogd, and klogd with -20 priority. I didn't note anything out of the ordinary in top. These are snapshots where I've managed to murder processes and restart the problem without rebooting. In the second instance, I had my finger on the kill button and managed to kill netscape and recover partially. However the system was heavily loaded even after the kill. I have xmms in STOPped state so it's just waiting. kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal. The system load was hovering at 1.00 for a few minutes then dropped to zero. However scrolling text in an rxvt is slow enough to watch blocks move. Running "ps aux" takes nearly one third of a second for total time. Total number of processes is ~40. Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3 1(L-TLB) 4 2 Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] [process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692] [kernel_thread+31/56] [kernel_thread+40/56] Jan 31 22:31:51 nifty kernel: kswapdS CBF75FAC 5704 4 1(L-TLB) 5 3 Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148] [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92] [kswapd+213/244] [kernel_thread+40/56] Jan 31 22:31:52 nifty kernel: bdflush S CBF7 5912 6 1(L-TLB) 7 5 Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216] [kernel_thread+40/56] In the fourth snapshot, I have put xmms in STOP state again inside the memory shortage, memory is at 4800 free buffers/cache and 1592 free mem. As I entered this shortage period I started a 'ps -eo ... > file' to try and record data there. This is the only disk activity happening. Load is ~4.00. I have now killed the ps. Load has dropped significantly and I have tolerable but quite laggy user input responsiveness now. Memory is currently 4900/1588 like above. Load is about 2.00 and will continue dropping if I don't do anything. Any processes I exec which need to be loaded from disk take several seconds. I.e. 'uptime' takes about 4 seconds to execute. Snapshot #5 will be the last one and I will reboot. Once memory is freed from xmms (back to 150megs free), everything is peachy. -d -- There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/