Re: Linux 2.4.5-ac15 / 2.4.6-pre5
Mike Galbraith wrote on Friday, 22 June 2001:

> > 6 5 1 77232 2692 2136 47004 560 892 2048 1524 10428 285529 2 98 0
> ^
> Was disk running? (I bet not.. bet it stopped just after stall began)

There was no disk activity during the stall.

Walter

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.5-ac15 / 2.4.6-pre5
On Fri, 22 Jun 2001, Walter Hofmann wrote:

> Ok, I managed to press SysRq-T this time and got a trace for my hang.
> Symbols are resolved by klog. If you prefer ksymoops please tell me, I
> used klog because ksymoops seems to drop all lines without symbols.

Someone else might want that and/or a complete trace. I can see enough
to say it looks an awful lot like a little gremlin that's been plaguing
me off and on for months. (off at the moment. if he moved into your box,
you can keep him.. I don't want him back:))

> There seem to be no kernel daemons in the trace? Is this normal, or is
> the log buffer too small? If it is the latter, how can I increase its
> size?

I don't think it matters much. I strongly suspect we'd just see more of
the same. Try commenting out the current->policy |= SCHED_YIELD in
__alloc_pages() just for grins (more or less).

> 6 5 1 77232 2692 2136 47004 560 892 2048 1524 10428 285529 2 98 0
^
Was disk running? (I bet not.. bet it stopped just after stall began)

-Mike
Re: Linux 2.4.5-ac15 / 2.4.6-pre5
Mike Galbraith wrote on Thursday, 21 June 2001:

> On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
>
> > > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> > ^
> > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> > block on IO, so they loop insanely).
>
> Why doesn't the VM hang the syncing of queued IO on these guys via
> wait_event or such instead of trying to just let the allocation fail?
> (which seems to me will only cause the allocation to be resubmitted,
> effectively changing nothing but adding overhead) Does failing the
> allocation in fact accomplish more than what I'm (uhoh:) assuming?

Ok, I managed to press SysRq-T this time and got a trace for my hang.
Symbols are resolved by klog. If you prefer ksymoops please tell me, I
used klog because ksymoops seems to drop all lines without symbols.

There seem to be no kernel daemons in the trace? Is this normal, or is
the log buffer too small? If it is the latter, how can I increase its
size?

Kernel was 2.4.6pre5 plus Rik's patch (at the end). I see the same hangs
with the ac series.
Walter

Jun 22 15:42:09 frodo kernel: 2672 1021 1 1035 (NOTLB) 1050 1004
Jun 22 15:42:10 frodo kernel: Call Trace: [sys_wait4+875/924] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: mysqld S 7FFF 0 1035 1021 1055 (NOTLB)
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+23/152] [do_select+153/520] [sys_select+1071/1436] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: smbd S 7FFF 0 1050 1 (NOTLB) 1051 1021
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+23/152] [do_select+153/520] [sys_select+1071/1436] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: sshd S 7FFF 0 1051 1 (NOTLB) 1060 1050
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+23/152] [do_select+153/520] [sys_select+1071/1436] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: mysqld R 5644 1055 1035 1056 (NOTLB)
Jun 22 15:42:10 frodo kernel: Call Trace: [__alloc_pages+272/656] [_alloc_pages+24/28] [__get_free_pages+10/24] [__pollwait+51/148] [pipe_poll+38/100] [do_pollfd+94/176] [do_poll+167/228]
Jun 22 15:42:10 frodo kernel: [sys_poll+603/884] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: mysqld S C5C8A000 5704 1056 1055 (NOTLB)
Jun 22 15:42:10 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: wwwoffled S C5F7BF10 2672 1060 1 4417 (NOTLB) 1064 1051
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+120/152] [process_timeout+0/76] [do_select+153/520] [sys_select+1071/1436] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: cron S C5F5DF7C 0 1064 1 (NOTLB) 1068 1060
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+120/152] [process_timeout+0/76] [sys_nanosleep+304/428] [system_call+51/56]
Jun 22 15:42:10 frodo kernel: in.identd S 7FFF 0 1068 1 1070 (NOTLB) 1083 1064
Jun 22 15:42:10 frodo kernel: Call Trace: [schedule_timeout+23/152] [wait_for_connect+308/420] [tcp_accept+134/408] [inet_accept+48/316] [sys_accept+102/244] [do_fork+1567/1756] [schedule+714/1064]
Jun 22 15:42:10 frodo kernel: [restore_sigcontext+273/312] [sys_socketcall+172/476] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd R 3444 1070 1068 1081 (NOTLB)
Jun 22 15:42:11 frodo kernel: Call Trace: [__alloc_pages+272/656] [_alloc_pages+24/28] [__get_free_pages+10/24] [sys_poll+310/884] [handle_IRQ_event+49/92] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd S C5B7A000 16 1071 1070 (NOTLB) 1076
Jun 22 15:42:11 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd S C7806000 0 1076 1070 (NOTLB) 1077 1071
Jun 22 15:42:11 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd S C7FBC000 0 1077 1070 (NOTLB) 1078 1076
Jun 22 15:42:11 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd S C7FB8000 2676 1078 1070 (NOTLB) 1081 1077
Jun 22 15:42:11 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: in.identd S C6964000 0 1081 1070 (NOTLB) 1078
Jun 22 15:42:11 frodo kernel: Call Trace: [sys_rt_sigsuspend+255/284] [system_call+51/56]
Jun 22 15:42:11 frodo kernel: nscd S C739BF14 0 1083 1 1085 (NOTLB) 1098 1068
Jun 22 15:42:11 frodo kernel: Call Trace: [schedule_timeout+120/152] [process_timeout+0/76] [do_poll+55/228] [sys_poll+603/884] [sys_newstat+103/116]
Re: Linux 2.4.5-ac15
On Wed, 20 Jun 2001, Rik van Riel wrote:

> > FWIW, here is the vmstat output for the second (short) hang. Taken with
> > ac14, vmstat 1 was started (long) before the hang and interrupted about
> > five seconds after it. The machine has 128MB RAM and 256MB swap.
> >
> >    procs                      memory       swap          io       system         cpu
> >  r  b  w   swpd   free   buff  cache   si    so    bi    bo    in     cs  us  sy  id
> >  1  0  0  77000   1464  18444  67324    8     0   152   224   386   1345  26  19  55
> >  2  4  2  77084   1524  18396  66904    0  1876   108  2220  2464  66079   1  98   1
>
> Does the following patch help with this problem, or are
> you both experiencing something unrelated to this particular
> buglet ?

Hi Rik,

I tried 2.4.6-pre5 with your patch (quoted at the end).

Observations: I still see this hang, it seemed to last longer than with
ac14/ac15 (say, 30 seconds). There was no heavy swapping going on, either
before or after the hang. During the hang there was no disc activity.

Compared with 2.4.5ac I saw that 2.4.6-pre5 uses much less swap
(according to xosview). With the load I tried (many open browser windows)
the ac series used to use 80-100MB of swap; 2.4.6-pre5 only used 40MB
swap for roughly the same number of windows open.

I forgot to press SysRq-T to get a trace, I'm afraid. kdb didn't compile
with this kernel either (although patching worked).
I had vmstat running in another window and stopped it a couple of seconds
after the hang, here are the last lines of its output:

   procs                      memory       swap          io       system         cpu
 r  b  w   swpd   free   buff  cache   si    so    bi    bo    in     cs  us  sy  id
 2  0  0  36424   3232    888  45036    0     0     4     0   255   3250  56  13  32
 1  0  0  36424   3096    888  45048    0     0    12     0   140   1010  37   6  58
 4  0  0  36424   2964    888  45060    0     0    12     0   228   1304  90   6   4
 3  0  0  36424   3052    900  44668    0     0    88     0   259   2522  88  12   0
 2  0  0  36424   3164    900  44524    0     0     4     0   144   3556  87  13   0
 3  0  0  36424   2812    900  44468    0     0     8     0   211   2007  87  11   3
 5  0  0  36424   2812    912  44108    0     0    20     0   196   1243  92   8   0
 4  0  0  36424   2812    920  43836    0     0   108     0   271   2928  88  12   0
 4  0  0  36424   2808    920  42728    0     0   228     0   284   2042  85  11   5
 2  0  0  36424   3112    924  42416   76  5004   288  5260   385    948  84  11   6
 4  0  0  36424   2816    940  42016    0     0   100     0   223   1252  94   3   3
 3  0  0  36424   2812    944  41472    0     0     0     0   229   1392  92   8   0
 3  0  0  36424   2812    948  41112    0     0    68     0   264   1107  95   3   2
 1  0  0  36424   2932    948  40756    0     0     0     0   262    879  92   8   0
 2  0  0  36424   2808    952  40740    0     0     0     0   191   2244  36  12  53
 4  0  0  36424   2808    952  40504   32     0    32     0   242    975  93   6   2
 2  0  0  36424   3252    956  40008    0     0    64     0   249   2505  85  15   0
 3  0  0  36424   2972    956  39996    0     0     8     0   127   1419  88  10   2
 3  0  0  36424   2988    956  39108    0     0    20     0   247   1632  83  17   0
 2  0  0  36424   3332    964  38496    0     0   176     0   218    955  91   9   0
 3  0  0  36424   3180    964  38724  120     0   232     0   112   3026  89  11   0
   procs                      memory       swap          io       system         cpu
 r  b  w   swpd   free   buff  cache   si    so    bi    bo    in     cs  us  sy  id
 4  0  0  36424   3020    968  38800   64     0    64     0   158   2008  87  13   1
 3  0  0  36424   2808    936  38192    0     0   192   552   232    678  90   6   4
 2  0  0  36424   2988    936  37632    0     0     0     4   167    678  98   2   0
 2  0  0  36424   2868    940  37592    0     0     4   104   177   1137  93   7   0
 3  0  0  36396   2852    940  37592    0     0     0    20   185   1125  93   7   0
 4  0  0  36396   2848    984  37624    0     0    60    64   193   1245  92   8   0
 5  0  0  36396   2244   1000  37656    0     0    28   176   161   2377  69  31   0
 1  0  0  36396   2364   1004  37660    0     0     8   244   180   1836  75  25   0
 1  0  1  36396   2484   1004  37780  100     0   104   248   178   2369  61  38   1
 4  0  1  36384   2020   1012  38328  520     0   560   148   185   1696  58  19  22
 6  0  0  45940   1744   1012  47676  108   724   368   868  6886 186930   1  99   0
 2  0  1  45856   2528   1028  46480  272  5480   752  5524   264   2413  82  18   0
 5  0  0  46072   2732   1028  45740    0  6636     8  6636   297   1165  84  16   0
 4  0  0  46072   2532   1028  45776    0     0    20     4   245   3310  88  13   0
 3  0  0  46072   2392   1040  45336    0     0    24     0   119   1296  91   9   0
 2  0  0  46072   2832   1052  44872    0     0    48     4   113   1276  91   9   0
 3  0  0  46072   2392   1056  44544    0     0     0     0   104    943  97   3   0
 2  0  0  46068   2808   1056  44112 1104     0  1164     0   144    870  70  11  19
 1  0  0  46052
Re: Linux 2.4.5-ac15
On Fri, 22 Jun 2001, Marcelo Tosatti wrote:

> On Fri, 22 Jun 2001, Mike Galbraith wrote:
>
> > One thing that _could_ be done about looping allocations is to steal
> > a page from the clean list ignoring PageReferenced (if you have any).
> > That would be a very expensive 'rob Peter to pay Paul' trade though.
>
> Don't like it.

(I like it only slightly better than using cpu to heat air;)

Oh well. Someone will think up the right answer eventually.

-Mike
Re: Linux 2.4.5-ac15
On Fri, 22 Jun 2001, Mike Galbraith wrote:

> On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
>
> > On Thu, 21 Jun 2001, Mike Galbraith wrote:
> >
> > > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > >
> > > > > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> > > > ^
> > > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> > > > block on IO, so they loop insanely).
> > >
> > > Why doesn't the VM hang the syncing of queued IO on these guys via
> > > wait_event or such instead of trying to just let the allocation fail?
> ...
> > > Does failing the allocation in fact accomplish more than what I'm
> > > (uhoh:) assuming?
> >
> > No.
>
> hmm..
>
> Jun 18 07:11:36 kernel: reclaim_page: salvaged ref:1 age:0 buf:0 cnt:1
> Jun 18 07:11:36 last message repeated 27 times
>
> One thing that _could_ be done about looping allocations is to steal
> a page from the clean list ignoring PageReferenced (if you have any).
> That would be a very expensive 'rob Peter to pay Paul' trade though.

Don't like it. This goes against the aging logic.
Re: Linux 2.4.5-ac15
On Thu, 21 Jun 2001, Marcelo Tosatti wrote:

> On Thu, 21 Jun 2001, Mike Galbraith wrote:
>
> > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> >
> > > > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> > > ^
> > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> > > block on IO, so they loop insanely).
> >
> > Why doesn't the VM hang the syncing of queued IO on these guys via
> > wait_event or such instead of trying to just let the allocation fail?
...
> > Does failing the allocation in fact accomplish more than what I'm
> > (uhoh:) assuming?
>
> No.

hmm..

Jun 18 07:11:36 kernel: reclaim_page: salvaged ref:1 age:0 buf:0 cnt:1
Jun 18 07:11:36 last message repeated 27 times

One thing that _could_ be done about looping allocations is to steal
a page from the clean list ignoring PageReferenced (if you have any).
That would be a very expensive 'rob Peter to pay Paul' trade though.

-Mike
Re: Linux 2.4.5-ac15
On Thursday 21 June 2001 21:50, Marcelo Tosatti wrote:
> On Thu, 21 Jun 2001, Daniel Phillips wrote:
> > On Thursday 21 June 2001 07:44, Marcelo Tosatti wrote:
> > > On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > > > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > > > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they
> > > > > can't block on IO, so they loop insanely).
> > > >
> > > > Why doesn't the VM hang the syncing of queued IO on these guys via
> > > > wait_event or such instead of trying to just let the allocation fail?
> > >
> > > Actually the VM should limit the amount of data being queued for _all_
> > > kinds of allocations.
> > >
> > > The problem is the lack of a mechanism which allows us to account the
> > > approximated amount of queued IO by the VM. (except for swap pages)
> >
> > Coincidence - that's what I started working on two days ago, and I'm
> > moving into the second generation design today. Look at
> > 'queued_sectors'. I found pretty quickly it's not enough, today I'm
> > adding 'submitted_sectors' to the soup. This will allow me to
> > distinguish between traffic generated by my own thread and other traffic.
>
> Could you expand on this, please ?

OK, I am doing opportunistic flushing, so I want to know that nobody else
is using the disk, and so long as that's true, I'll keep flushing out
buffers. Conversely, if anybody else queues a request I'll bail out of
the flush loop as soon as I've flushed the absolute minimum number of
buffers, i.e., the ones that were dirtied more than
bdflush_params->age_buffer ago.

But how do I know if somebody else is submitting requests? The surest way
to know is to have a submitted_sectors counter that just counts every
submission, and compare that to the number of sectors I know I've
submitted. (This counter wraps, so I actually track the difference from
its value on entering the flush loop).
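The counter comparison described above can be sketched as follows. This is only an illustrative model, not the patch under discussion: the 32-bit counter width is an assumption, and the names `COUNTER_MASK`, `sectors_since` and `others_submitted` are hypothetical. It shows why tracking the difference from the value sampled on entering the flush loop stays correct even when the counter wraps.

```python
# Illustrative model only; the counter width (32 bits) is an assumption.
COUNTER_MASK = 0xFFFFFFFF


def sectors_since(start: int, now: int) -> int:
    """Sectors submitted since `start`, correct across a single wrap."""
    return (now - start) & COUNTER_MASK


def others_submitted(start: int, now: int, mine: int) -> bool:
    """True if the global counter advanced by more than our own submissions."""
    return sectors_since(start, now) > mine


# Counter sampled just before it wraps; the flusher then submits 48 sectors.
start = 0xFFFFFFF0
now = (start + 48) & COUNTER_MASK  # the raw value has wrapped past zero
print(others_submitted(start, now, mine=48))
# Someone else submits 8 more sectors while the flusher is in its loop.
print(others_submitted(start, (now + 8) & COUNTER_MASK, mine=48))
```

Masking the subtraction is what makes the raw counter values irrelevant: only the distance travelled since the sample matters, which matches the "track the difference" remark in the message.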
The first thing I found (duh) is that nobody else ever submits anything
while I'm in the flush loop because I'm on UP and I never (almost never)
yield the CPU. On SMP I will get other threads submitting, but only
rarely will the submission happen while I'm in the flush loop. No good,
I'm not detecting the other disk activity reliably, back to the drawing
board.

My original plan was to compute a running average of submission rates and
use that to control my opportunistic flushing. I departed from that
because I seemed to get good results with a much simpler strategy, the
patch I already posted. It's fundamentally flawed though - it works fine
for constant light load and constant full load, but not for sporadic
loads. What I need is something a lot smoother, more analog, so I'll
return to my original plan.

What I want to notice is that the IO submission rate has fallen below a
certain level; then, when the IO backlog has also fallen below a few ms
worth of transfers, I can do the opportunistic flushing. In the flush
loop I want to submit enough buffers to make sure I'm using the full
bandwidth, but not so many that I create a big backlog that gets in the
way of a surge in demand from some other source. I'm still working out
the details of that, I will not post an updated patch today after all ;-)

By the way, there's a really important throughput benefit for doing this
early flushing that I didn't put in the list when I first wrote about it.
It's this: whenever we have a bunch of buffers dirtied, if the disk
bandwidth is available we want to load up the disk right away, not 5
seconds from now. If we wait 5 seconds, we just wasted 5 seconds of disk
bandwidth. Again, duh.

So my goal in doing this was initially to have it cost as little in
throughput as possible - I see now that it's actually a win for
throughput. End of discussion about whether to put in the effort or not.
--
Daniel
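The "running average of submission rates" idea in the message above could look roughly like the sketch below. The message does not say which kind of average is meant, so the exponential moving average here is an assumption, as are the class name `FlushGate`, the smoothing factor and both thresholds. It is a toy model of the decision only, not kernel code.

```python
from dataclasses import dataclass


@dataclass
class FlushGate:
    """Decides when opportunistic flushing looks safe (illustrative only)."""

    alpha: float = 0.25        # EMA smoothing factor (assumed value)
    rate_limit: float = 100.0  # smoothed sectors per tick counted as "quiet" (assumed)
    backlog_limit: int = 64    # queued sectors standing in for "a few ms" (assumed)
    avg_rate: float = 0.0

    def sample(self, submitted_this_tick: int) -> None:
        """Fold one tick's submissions into the running average."""
        self.avg_rate += self.alpha * (submitted_this_tick - self.avg_rate)

    def may_flush(self, queued_sectors: int) -> bool:
        """Allow flushing only when both the smoothed rate and the backlog are low."""
        return self.avg_rate < self.rate_limit and queued_sectors < self.backlog_limit


# A short burst from another source followed by a quiet period.
gate = FlushGate()
for burst in (800, 800, 0, 0, 0, 0, 0, 0):
    gate.sample(burst)
    print(f"{gate.avg_rate:7.1f} {gate.may_flush(queued_sectors=10)}")
```

Because the average decays gradually, the gate stays closed for a few ticks after a burst ends, which is the smoother, "more analog" behaviour the message asks for on sporadic loads.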
Re: Linux 2.4.5-ac15
On Thu, 21 Jun 2001, Daniel Phillips wrote:

> On Thursday 21 June 2001 07:44, Marcelo Tosatti wrote:
> > On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they
> > > > can't block on IO, so they loop insanely).
> > >
> > > Why doesn't the VM hang the syncing of queued IO on these guys via
> > > wait_event or such instead of trying to just let the allocation fail?
> >
> > Actually the VM should limit the amount of data being queued for _all_
> > kinds of allocations.
> >
> > The problem is the lack of a mechanism which allows us to account the
> > approximated amount of queued IO by the VM. (except for swap pages)
>
> Coincidence - that's what I started working on two days ago, and I'm moving
> into the second generation design today. Look at 'queued_sectors'. I found
> pretty quickly it's not enough, today I'm adding 'submitted_sectors' to the
> soup. This will allow me to distinguish between traffic generated by my own
> thread and other traffic.

Could you expand on this, please ?
Re: Linux 2.4.5-ac15
On Thursday 21 June 2001 07:44, Marcelo Tosatti wrote:
> On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they
> > > can't block on IO, so they loop insanely).
> >
> > Why doesn't the VM hang the syncing of queued IO on these guys via
> > wait_event or such instead of trying to just let the allocation fail?
>
> Actually the VM should limit the amount of data being queued for _all_
> kinds of allocations.
>
> The problem is the lack of a mechanism which allows us to account the
> approximated amount of queued IO by the VM. (except for swap pages)

Coincidence - that's what I started working on two days ago, and I'm
moving into the second generation design today. Look at 'queued_sectors'.
I found pretty quickly it's not enough, today I'm adding
'submitted_sectors' to the soup. This will allow me to distinguish
between traffic generated by my own thread and other traffic.

> > Does failing the allocation in fact accomplish more than what I'm
> > (uhoh:) assuming?
>
> No.
>
> It sucks really badly.

Amen.

--
Daniel
Re: Linux 2.4.5-ac15
On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > >
> > > > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> > > ^
> > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> > > block on IO, so they loop insanely).
> >
> > Why doesn't the VM hang the syncing of queued IO on these guys via
> > wait_event or such instead of trying to just let the allocation fail?
>
> Actually the VM should limit the amount of data being queued for _all_
> kind of allocations.

Limiting the amount of data being queued for IO will make things less
ragged, but you can't limit the IO.. pages returning to service upon
completion is the only thing keeping you alive. That's why I hate not
seeing my disk utterly saturated when things get hot and heavy.

The only thing that I can see that's possible is to let tasks proceed in
an ordered fashion as pages return.. take a number and wait. IMHO, right
now we try to maintain low latency way too long and end up with the
looping problem because of that. We need a more controlled latency
roll-down to the full disk speed wall. We hit it and go splat ;-)

> The problem is the lack of a mechanism which allows us to account the
> approximated amount of queued IO by the VM. (except for swap pages)

Ingo once mentioned an io thingy for vm, but I got kind of dizzy trying
to figure out exactly how I'd implement it, what with clustering and
getting information to separate io threads and back ;-)

> You can see it this way: To get free memory we're "polling" instead of
> waiting on the IO completion of pages.
>
> > (which seems to me will only cause the allocation to be resubmitted,
> > effectively changing nothing but adding overhead)
>
> Yes.

(not that overhead really matters once you are well and truly iobound)

	-Mike
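Mike's "take a number and wait" idea can be pictured with a small userspace model. This is only a sketch under stated assumptions, not kernel code and not anything posted in the thread: `struct page_queue`, `take_ticket()`, `pages_returned()` and `may_proceed()` are hypothetical names, one returned page is assumed to admit exactly one waiter, and counter wrap-around is ignored.

```c
#include <assert.h>

/* Hypothetical model: allocators queue up in ticket order and are
 * admitted one at a time as pages come back from completed IO. */
struct page_queue {
	unsigned long next_ticket;	/* next number to hand out */
	unsigned long now_serving;	/* tickets below this may proceed */
};

/* An allocator that cannot get a page takes a number instead of looping. */
static unsigned long take_ticket(struct page_queue *q)
{
	return q->next_ticket++;
}

/* Called when IO completes and n pages return to service: admit up to
 * n waiters, oldest first.  Pages beyond the number of waiters are
 * simply not tracked in this model. */
static void pages_returned(struct page_queue *q, unsigned long n)
{
	unsigned long waiting = q->next_ticket - q->now_serving;

	q->now_serving += (n < waiting) ? n : waiting;
}

/* A waiter may go ahead once its number has been called. */
static int may_proceed(const struct page_queue *q, unsigned long ticket)
{
	assert(ticket < q->next_ticket);	/* ticket must have been issued */
	return ticket < q->now_serving;
}
```

The point of the ordering is that a task which has already waited is never overtaken by a newcomer, so latency degrades gradually as the disk saturates instead of everyone spinning at once.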
Re: Linux 2.4.5-ac15
On Thu, 21 Jun 2001, Mike Galbraith wrote:
> On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> >
> > > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> > ^
> > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> > block on IO, so they loop insanely).
>
> Why doesn't the VM hang the syncing of queued IO on these guys via
> wait_event or such instead of trying to just let the allocation fail?

Actually the VM should limit the amount of data being queued for _all_
kind of allocations.

The problem is the lack of a mechanism which allows us to account the
approximated amount of queued IO by the VM. (except for swap pages)

You can see it this way: To get free memory we're "polling" instead of
waiting on the IO completion of pages.

> (which seems to me will only cause the allocation to be resubmitted,
> effectively changing nothing but adding overhead)

Yes.

> Does failing the allocation in fact accomplish more than what I'm
> (uhoh:) assuming?

No.

It sucks really badly.
Re: Linux 2.4.5-ac15
On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > 2 4 2 77084 1524 18396 66904 0 1876 108 2220 2464 66079 1 98 1
> ^
> Ok, I suspect that GFP_BUFFER allocations are fucking up here (they can't
> block on IO, so they loop insanely).

Why doesn't the VM hang the syncing of queued IO on these guys via
wait_event or such instead of trying to just let the allocation fail?

(which seems to me will only cause the allocation to be resubmitted,
effectively changing nothing but adding overhead)

Does failing the allocation in fact accomplish more than what I'm
(uhoh:) assuming?

	-Mike
Re: Linux 2.4.5-ac15
On Thursday 21 June 2001 21:50, Marcelo Tosatti wrote:
> On Thu, 21 Jun 2001, Daniel Phillips wrote:
> > On Thursday 21 June 2001 07:44, Marcelo Tosatti wrote:
> > > On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > > > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > > > > Ok, I suspect that GFP_BUFFER allocations are fucking up here (they
> > > > > can't block on IO, so they loop insanely).
> > > >
> > > > Why doesn't the VM hang the syncing of queued IO on these guys via
> > > > wait_event or such instead of trying to just let the allocation fail?
> > >
> > > Actually the VM should limit the amount of data being queued for _all_
> > > kind of allocations.
> > >
> > > The problem is the lack of a mechanism which allows us to account the
> > > approximated amount of queued IO by the VM. (except for swap pages)
> >
> > Coincidence - that's what I started working on two days ago, and I'm moving
> > into the second generation design today. Look at 'queued_sectors'. I found
> > pretty quickly it's not enough, today I'm adding 'submitted_sectors' to the
> > soup. This will allow me to distinguish between traffic generated by my own
> > thread and other traffic.
>
> Could you expand on this, please?

OK, I am doing opportunistic flushing, so I want to know that nobody else
is using the disk, and so long as that's true, I'll keep flushing out
buffers. Conversely, if anybody else queues a request I'll bail out of the
flush loop as soon as I've flushed the absolute minimum number of buffers,
i.e., the ones that were dirtied more than bdflush_params->age_buffer ago.

But how do I know if somebody else is submitting requests? The surest way
to know is to have a submitted_sectors counter that just counts every
submission, and compare that to the number of sectors I know I've
submitted. (This counter wraps, so I actually track the difference from
the value on entering the flush loop.)

The first thing I found (duh) is that nobody else ever submits anything
while I'm in the flush loop because I'm on UP and I never (almost never)
yield the CPU. On SMP I will get other threads submitting, but only rarely
will the submission happen while I'm in the flush loop. No good, I'm not
detecting the other disk activity reliably, back to the drawing board.

My original plan was to compute a running average of submission rates and
use that to control my opportunistic flushing. I departed from that
because I seemed to get good results with a much simpler strategy, the
patch I already posted. It's fundamentally flawed though - it works fine
for constant light load and constant full load, but not for sporadic
loads. What I need is something a lot smoother, more analog, so I'll
return to my original plan.

What I want to notice is that the IO submission rate has fallen below a
certain level; then, when the IO backlog has also fallen below a few ms
worth of transfers, I can do the opportunistic flushing. In the flush loop
I want to submit enough buffers to make sure I'm using the full bandwidth,
but not so many that I create a big backlog that gets in the way of a
surge in demand from some other source. I'm still working out the details
of that, I will not post an updated patch today after all ;-)

By the way, there's a really important throughput benefit for doing this
early flushing that I didn't put in the list when I first wrote about it.
It's this: whenever we have a bunch of buffers dirtied, if the disk
bandwidth is available we want to load up the disk right away, not 5
seconds from now. If we wait 5 seconds, we just wasted 5 seconds of disk
bandwidth. Again, duh. So my goal in doing this was initially to have it
cost as little in throughput as possible - I see now that it's actually a
win for throughput. End of discussion about whether to put in the effort
or not.

--
Daniel
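The two bookkeeping ideas Daniel describes, a wrapping submission counter compared against one's own submissions and a running average of the submission rate, can be sketched in plain C. This is an illustrative userspace model only, not Daniel's patch: `struct flush_state`, `flush_begin()`, `account_submit()`, `foreign_sectors()` and `rate_avg_update()` are hypothetical names, and the shift-based exponential average is an assumed smoothing method, since the thread does not say which one he intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-running count of every sector submitted by anyone.
 * It is allowed to wrap; only differences are ever used. */
static uint32_t submitted_sectors;

struct flush_state {
	uint32_t start;	/* submitted_sectors on entering the flush loop */
	uint32_t mine;	/* sectors the flusher itself submitted since then */
};

static void flush_begin(struct flush_state *fs)
{
	fs->start = submitted_sectors;
	fs->mine = 0;
}

/* Account a submission; fs is NULL when someone other than the flusher
 * submits. */
static void account_submit(struct flush_state *fs, uint32_t nsect)
{
	submitted_sectors += nsect;
	if (fs != NULL)
		fs->mine += nsect;
}

/* Sectors submitted by others since flush_begin().  Unsigned subtraction
 * stays correct across a wrap of the global counter. */
static uint32_t foreign_sectors(const struct flush_state *fs)
{
	return (submitted_sectors - fs->start) - fs->mine;
}

/* Assumed smoothing: avg moves 1/2^shift of the way toward each sample. */
static uint32_t rate_avg_update(uint32_t avg, uint32_t sample, unsigned shift)
{
	assert(shift < 32);
	if (sample >= avg)
		return avg + ((sample - avg) >> shift);
	return avg - ((avg - sample) >> shift);
}
```

A flush loop built on this would bail out once `foreign_sectors()` becomes non-zero. As Daniel notes, that check rarely fires on a uniprocessor, which is the motivation for falling back to the smoothed rate.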
Re: Linux 2.4.5-ac15
On Tue, 19 Jun 2001, Walter Hofmann wrote:
> On Sun, 17 Jun 2001, Walter Hofmann wrote:
>
> > I had already two crashes with ac15. The system was still ping-able, but
> > login over the network didn't work anymore.
> >
> > The first crash happened after I started xosview and noticed that the
> > system almost used up the swap (for no apparent reason). The second
> > crash happened shortly after I started fsck on a crypto-loop device.
>
> FWIW, here is the vmstat output for the second (short) hang. Taken with
> ac14, vmstat 1 was started (long) before the hang and interrupted about
> five seconds after it. The machine has 128MB RAM and 256MB swap.

>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si   so    bi    bo   in    cs  us  sy  id
>  1  0  0  77000   1464  18444  67324   8    0   152   224  386  1345  26  19  55
>  2  4  2  77084   1524  18396  66904   0 1876   108  2220 2464 66079   1  98   1

Does the following patch help with this problem, or are you
both experiencing something unrelated to this particular buglet?

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


--- linux/mm/swapfile.c.~1~	Thu May  3 16:34:46 2001
+++ linux/mm/swapfile.c	Thu May  3 16:36:07 2001
@@ -67,8 +67,14 @@
 	}
 	/* No luck, so now go finegrined as usual. -Andrea */
 	for (offset = si->lowest_bit; offset <= si->highest_bit ; offset++) {
-		if (si->swap_map[offset])
+		if (si->swap_map[offset]) {
+			/* Any full pages we find we should avoid
+			 * looking at next time. */
+			if (offset == si->lowest_bit)
+				si->lowest_bit++;
 			continue;
+		}
+
 	got_page:
 		if (offset == si->lowest_bit)
 			si->lowest_bit++;
@@ -79,6 +85,7 @@
 		si->cluster_next = offset+1;
 		return offset;
 	}
+	si->highest_bit = 0;
 	return 0;
 }
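The effect of Rik's patch is easier to see in a cut-down userspace model of the scan loop. This is a sketch under assumptions, not the kernel function: `struct swap_area` is a hypothetical stand-in for the real swap info structure, the cluster logic and the rest of the `got_page` bookkeeping are left out, and `free_slot()` is an assumed helper modelling a free path that widens the scan window again.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the kernel's per-swap-area state. */
struct swap_area {
	unsigned char *swap_map;	/* use count per slot, 0 means free */
	unsigned long lowest_bit;	/* first slot worth scanning */
	unsigned long highest_bit;	/* last slot worth scanning */
};

/* Fine-grained scan with the two changes from the patch: in-use slots at
 * the start of the window are skipped on later scans, and a failed scan
 * closes the window.  Returns 0 when no slot is free (slot 0 is never
 * handed out in this model). */
static unsigned long scan_swap_map(struct swap_area *si)
{
	unsigned long offset;

	for (offset = si->lowest_bit; offset <= si->highest_bit; offset++) {
		if (si->swap_map[offset]) {
			/* Full slot at the window start: skip it next time. */
			if (offset == si->lowest_bit)
				si->lowest_bit++;
			continue;
		}
		if (offset == si->lowest_bit)
			si->lowest_bit++;
		si->swap_map[offset] = 1;
		return offset;
	}
	si->highest_bit = 0;	/* nothing free: later scans exit at once */
	return 0;
}

/* Assumed free path: releasing a slot re-opens the window around it. */
static void free_slot(struct swap_area *si, unsigned long offset)
{
	assert(si->swap_map[offset] != 0);	/* freeing a free slot is a bug */
	si->swap_map[offset] = 0;
	if (offset < si->lowest_bit)
		si->lowest_bit = offset;
	if (offset > si->highest_bit)
		si->highest_bit = offset;
}
```

Without the first change, every allocation on a nearly full swap area rescans the same leading run of occupied slots; without the second, a completely full area is rescanned end to end on each attempt.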
Re: Linux 2.4.5-ac15
Walter Hofmann <[EMAIL PROTECTED]> writes:

> It hung when I tried to close a browser window after reading the
> text in it for quite some time. No swapping was going on.

I've just seen this as well (for the first time) with -ac15. I was
playing music with madplay at the time, and then did a "find . -type f
-print0 | xargs -0 chmod 644" on a large directory tree on a reiserfs
partition. A few seconds after I started the command, I got a hang
which lasted a few seconds, then another, then another just after the
find finished. It hasn't happened again since.

All I got in the kernel log was:

2001-06-20 20:15:52.260230500 warning: Sound: DMA (output) timed out - IRQ/DRQ config error?
2001-06-20 20:16:07.472837500 warning: Sound: DMA (output) timed out - IRQ/DRQ config error?

which makes sense, since the sound paused at the same time...

Memory stats at the moment (i.e. about five minutes after it happened,
with exactly the same stuff running):

(azz:~) free
             total       used       free     shared    buffers     cached
Mem:        288240     286652       1588        196      30348     224860
-/+ buffers/cache:      31444     256796
Swap:      1048784      52176     996608
(azz:~) vmstat
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 52184 1588 30348 224876 0 25362 153 400 68 10 22

.config available on request.

--
Adam Sampson <[EMAIL PROTECTED]> <URL:http://azz.us-lot.org/>
Re: Linux 2.4.5-ac15
On Sun, 17 Jun 2001, Walter Hofmann wrote:

> I had already two crashes with ac15. The system was still ping-able, but
> login over the network didn't work anymore.
>
> The first crash happened after I started xosview and noticed that the
> system almost used up the swap (for no apparent reason). The second
> crash happened shortly after I started fsck on a crypto-loop device.
>
> This does not happen with ac14, even under heavy load.
>
> I noticed a second problem: Sometimes the system hangs completely for
> approximately ten seconds, but continues just fine after that. I have
> seen this with ac14 and ac15, but not with ac12.

FWIW, here is the vmstat output for the second (short) hang. Taken with
ac14, vmstat 1 was started (long) before the hang and interrupted about
five seconds after it. The machine has 128MB RAM and 256MB swap.

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 1 0 77332 1584 15632 67740 44 0 448 0 496 932 84 15 1
1 2 0 77456 1848 15944 66960 0 0 372 724 625 2296 62 20 18
3 0 1 77456 1780 16208 67044 72 0 33680 584 1695 20 20 61
2 0 0 77404 1464 16672 66652 0 0 572 0 530 2649 26 19 55
3 1 0 77344 1464 17000 66480 124 0 656 0 419 879 12 16 72
0 3 0 77344 1468 17076 66388 184 0 1080 0 561 654 8 8 84
0 5 0 77892 1464 17184 66892 176 128 800 396 415 1050 14 11 74
0 5 0 77892 1600 17216 66868 16 0 68 1020 508 295 5 5 90
0 3 0 77892 1464 17316 66784 56 0 37268 464 1287 22 14 64
2 3 0 77892 1464 17524 66828 76 0 440 0 398 987 8 12 79
1 3 0 77892 1464 17780 66680 32 0 512 0 367 1061 10 10 79
1 1 0 77880 1464 18020 66392 224 0 756 0 394 1579 43 12 44
2 1 0 77784 2172 18324 64820 16 0 992 0 529 1745 37 19 44
0 4 0 77936 1848 18428 65180 124 0 252 920 570 451 23 9 69
0 2 0 77888 1680 18564 65656 84 0 744 0 532 721 21 12 67
3 0 0 77876 1464 18700 65564 4 0 1176 0 487 804 26 16 58
0 3 1 77496 1468 18712 65700 424 100 1296 384 401 532 70 10 20
2 0 0 77920 1508 18804 65504 72 248 968 260 525 709 40 9 51
2 2 0 77908 1728 18788 65388 0 120 1000 568 568 608 41 8 51
0 4 0 77908 1620 18828 65548 0 0 172 356 545 420 22 8 69
1 1 0 77904 1712 18472 65464 36 0 1600 0 485 621 52 15 33
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 1 0 78124 1528 18496 64940 116 20 884 288 545 604 54 16 30
4 0 0 78124 1468 18548 64260 4 0 468 0 449 663 49 6 46
3 0 0 77844 3416 18492 63932 100 0 304 0 431 1915 80 16 4
1 2 0 77844 2892 18536 64204 60 0 284 820 583 917 64 13 23
1 0 0 77844 2824 18544 64236 0 0 4068 591 550 36 6 58
3 0 0 77844 2604 18568 64372 0 0 120 0 455 474 64 13 23
1 0 0 77844 2472 18572 64440 0 0 56 0 399 617 35 9 56
1 0 0 77844 2456 18572 64460 0 0 0 0 515 721 8 6 87
0 0 0 77844 2448 18572 64468 0 0 4 0 469 655 8 8 83
1 0 0 77844 2384 18572 64528 0 0 0 428 538 641 7 10 83
0 0 0 77844 2388 18572 64528 0 0 0 0 492 733 3 9 89
0 0 0 77844 2368 18572 64548 0 0 0 0 520 804 11 7 82
0 0 0 77844 2336 18572 64580 0 0 0 0 473 680 6 6 89
1 0 0 77844 2276 18584 64608 0 0 12 0 490 966 30 13 56
2 0 0 77844 2228 18584 64648 0 0 0 344 539 589 47 7 47
3 0 0 77844 2228 18588 64692 0 0 4 0 381 455 29 11 60
2 0 1 77844 2180 18588 64700 0 0 0 0 453 781 33 9 58
1 0 0 77844 2160 18604 64708 0 0 16 0 390 852 18 5 77
2 0 1 77844 1940 18616 64912 124 0 212 0 318 756 40 8 52
3 0 0 77844 1680 18620 65180 240 0 244 576 492 1632 87 13 0
2 0 1 77844 1528 18540 65540 584 0 592 0 352 2466 90 10 0
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 77844 1800 18516 65588 40 0 40 0 357 675 89 11 0
3 5 2 77844 1464 18536 65916 1508 44 1660 264 435 852 37 16 47
1 0 0 77844 1484 18532 65968 864 0 936 0 386 667 89 7 5
1 0 1 77844 1464 18344 66220 1328 0 1416 280 416
Re: Linux 2.4.5-ac15
On Sun, 17 Jun 2001, Walter Hofmann wrote:
> I had already two crashes with ac15. The system was still ping-able, but
> login over the network didn't work anymore.
>
> The first crash happened after I started xosview and noticed that the
> system almost used up the swap (for no apparent reason). The second
> crash happened shortly after I started fsck on a crypto-loop device.
>
> This does not happen with ac14, even under heavy load.

I had a hang with ac14 now, too. It hung when I tried to close a browser
window after reading the text in it for quite some time. No swapping was
going on.

Walter
Re: Linux 2.4.5-ac15
I had already two crashes with ac15. The system was still ping-able, but
login over the network didn't work anymore.

The first crash happened after I started xosview and noticed that the
system almost used up the swap (for no apparent reason). The second
crash happened shortly after I started fsck on a crypto-loop device.

This does not happen with ac14, even under heavy load.

I noticed a second problem: Sometimes the system hangs completely for
approximately ten seconds, but continues just fine after that. I have
seen this with ac14 and ac15, but not with ac12.

This is a mixed IDE/SCSI (Adaptec) system, 128MB RAM/256MB swap on a
Gigabyte 440LX mainboard with a Pentium II.

Walter
Re: Linux 2.4.5-ac15
mach_kbd_rate was changed to kbd_rate, but not defined.

vt.c: In function `vt_ioctl':
vt.c:504: `kbd_rate' undeclared (first use in this function)
vt.c:504: (Each undeclared identifier is reported only once
vt.c:504: for each function it appears in.)
vt.c:510: `kbd_rate' used prior to declaration
vt.c:510: warning: implicit declaration of function `kbd_rate'
make[3]: *** [vt.o] Error 1
make[2]: *** [first_rule] Error 2
make[1]: *** [_subdir_char] Error 2
make: *** [_dir_drivers] Error 2

--
Tom Vier <[EMAIL PROTECTED]>
DSA Key id 0x27371A2C
Linux 2.4.5-ac15
ftp://ftp.kernel.org/pub/linux/kernel/people/alan/2.4/

Intermediate diffs are available from
http://www.bzimage.org

2.4.5-ac15
o  Enable MMX extensions on Cyrix MII (me)
o  Make pid on core dump configurable (Ben LaHaise)
o  Random UML fixups, add fcntl64/getdents64 (Jeff Dike)
o  Add multicast support to UML (Harland Welte)
o  Ensure promise raid driver doesn't look at non disk devices (Arjan van de Ven)
o  Fix IDE chipsets that incorrectly think a 64K DMA is in fact zero size (Mark Lord)
o  Fix generic alpha build trident driver (Michal Jaegermann)
o  SHM accounting fixes (Christoph Rohland)
o  Update refill_inactive to match Linus tree (Rik van Riel)
o  Add Asustek L8400K to the dmi data (me)
o  Add kernel mode keyboard rate setup (Sergey Tursanov)
o  Alpha compile fix (Richard Henderson)
o  Add Ali1533 to the isa dma quirks (Angelo Di Filippo)
o  Fix a procfs oops (Al Viro)
o  Alpha symbol/warning fixes (Michal Jaegermann)
o  Some laptops take a long time for the cs4281 and codec bus to wake up (Rik van Riel)
o  Fix potential flags corruption on error path in comx-mixcom driver (me)

2.4.5-ac14
o  Fix oops on command abort on aha152x (me)
   | This so far is only a partial fix
o  Switch to unlazy swap cache free up (Marcelo Tosatti)
o  Page launder changes (Rik van Riel)
o  Remove dead irda irlap compression code (Dag Brattli)
o  Fix bug where init/main.c executes freed code (Hans-Peter Nilsson)
o  Fix ramfs accounting. truncate/freepage hook (Christoph Rohland)
o  Add MTWEOF ioctl to parallel tape (Russ Ingram)
o  Add driver for CATC based USB ethernet (Vojtech Pavlik)
o  Update cris architecture code (Bjorn Wesen)
o  Clean up reiserfs tail->full page convert (Chris Mason)
o  Clean up lp init, fix lp= option handling (Tim Waugh)
o  Don't panic on out of memory during ps/2 setup (Andrey Panin)
o  Initialise vc_cons objects in full (Richard Hirst)
o  Further Configure.help resync (Eric Raymond)
o  Fix misdeclaration of xtime (Petr Vandrovec)
o  Add yet more sb variants (Andrey Panin)
o  Fix bogus VIA warning triggers (I hope) (me)
o  Fix 3c509 symbols when building nonpnp (Keith Owens)

2.4.5-ac13
o  Fix i2o_block to use invalidate_device (me)
o  Fix viodasd to use invalidate_device (me)
o  Fix missing ipc alloc check (Manfred Spraul)
o  Use skb_purge_queue in isdn (Kai Germaschewski)
o  Fix epic100 printk error (Francois Romieu)
o  Resync with master Configure.help (Eric Raymond)
o  Avoid oops when reading swap proc during swapon (Paul Menage)
o  Sony pi driver update (Stelian Pop)
o  Sony motioneye camera driver (Stelian Pop, Andrew Tridgell)
o  Fix eepro100 access by user to some registers (Andrey Savochkin)
o  Small APM real mode reboot clean ups (Stephen Rothwell)
o  Fix isofs buffer leak on invalid iocharset (Tachino Nobuhiro)
o  Fix default encoding on pwc videocam (Mark Cooke)
o  Clean up FAT further, fix endian bug, and times before 1/1/1980 (OGAWA Hirofumi)
o  Support combo parallel/serial PCI cards (Tim Waugh)
o  CS46xx mmap oops fix (me)

2.4.5-ac12
o  Report apic timer vector in hex too (Philip Pokorny)
   | With 0x in front so we can tell on reports..
o  Report card services differently if kernel (Jeff Garzik)
o  Don't terminate init on sysrq unless forced (Adam Slattery)
o  Add more pci wrappers when PCI is off (Jeff Garzik)
o  Remove 4K object from the stack in emu10k1 (me)
o  Remove 3.5K object from the i2o_proc stack (me)
o  Remove 3K object from the ewrk3 ioctl stack (me)
o  Fix bugs in the es1371 locking (me)
o  Fix ohci iso alignments (Roman Weissgaerber)
o  Updated megaraid driver (Atul Mukker)
   | In particular this now uses the new PCI api

2.4.5-ac11
o  Fix the megaraid driver ioctl check (me)
o  Fix the moxa ioctl checks