Re: konqueror deadlocks on 2.6.22
On Tuesday 22 January 2008, Al Boldi wrote:
> Chris Mason wrote:
> > Running fsync in data=ordered means that all of the dirty blocks on the
> > FS will get written before fsync returns.
>
> Hm, that's strange, I expected this kind of behaviour from data=journal.
>
> data=writeback should return immediately, which seems it does, but
> data=ordered should only wait for the metadata flush, it shouldn't wait
> for the filedata flush. Are you sure it waits for both?

I oversimplified. data=ordered means that all data blocks are written
before the metadata that references them commits. So, if you add 1GB to
fileA in a transaction and then run fsync(fileB) in the same transaction,
the 1GB from fileA is sent to disk (and waited on) before the fsync on
fileB returns.

-chris
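A quick way to see this effect from the shell (an untested sketch; the file
names are arbitrary, and conv=fsync assumes GNU dd):

    dd if=/dev/zero of=fileA bs=1M count=1024    # dirty ~1GB, left unsynced
    touch fileB
    time dd if=/dev/zero of=fileB bs=4k count=1 conv=fsync

Under data=ordered, the fsync on tiny fileB should take roughly as long as
flushing fileA's gigabyte, since both land in the same journal transaction;
under data=writeback it should return almost immediately.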
Re: konqueror deadlocks on 2.6.22
Chris Mason wrote:
> Running fsync in data=ordered means that all of the dirty blocks on the FS
> will get written before fsync returns.

Hm, that's strange, I expected this kind of behaviour from data=journal.

data=writeback should return immediately, which seems it does, but
data=ordered should only wait for the metadata flush, it shouldn't wait for
the filedata flush. Are you sure it waits for both?

Thanks!

--
Al
Re: konqueror deadlocks on 2.6.22
On Tuesday 22 January 2008, Al Boldi wrote:
> Ingo Molnar wrote:
> > * Oliver Pinter (Pintér Olivér) <[EMAIL PROTECTED]> wrote:
> > > and then please update to CFS-v24.1
> > > http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch
> > >
> > > > Yes with CFSv20.4, as in the log.
> > > >
> > > > It also hangs on 2.6.23.13
> >
> > my feeling is that this is some sort of timing dependent race in
> > konqueror/kde/qt that is exposed when a different scheduler is put in.
> >
> > If it disappears with CFS-v24.1 it is probably just because the timings
> > will change again. Would be nice to debug this on the konqueror side and
> > analyze why it fails and how. You can probably tune the timings by
> > enabling SCHED_DEBUG and tweaking /proc/sys/kernel/*sched* values - in
> > particular sched_latency and the granularity settings. Setting wakeup
> > granularity to 0 might be one of the things that could make a
> > difference.
>
> Thanks Ingo, but Mike suggested that data=writeback may make a difference,
> which it does indeed.
>
> So the bug seems to be related to data=ordered, although I haven't gotten
> any feedback from the ext3 gurus yet.
>
> Seems rather critical though, as data=writeback is a dangerous mode to run.

Running fsync in data=ordered means that all of the dirty blocks on the FS
will get written before fsync returns.

Your original stack trace shows everyone either performing writeback for a
log commit or waiting for the log commit to return. The key task in your
trace is kjournald, stuck in get_request_wait. It could be a block layer
bug, not giving him requests quickly enough, or it could be the scheduler
not giving him back the cpu fast enough. At any rate, that's where to
concentrate the debugging.

You should be able to simulate this by running a few instances of the below
loop and looking for stalls:

    while true; do
        time dd if=/dev/zero of=foo bs=50M count=4 oflag=sync
    done
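To get "a few instances", something along these lines should do (a sketch;
the instance count and file names are arbitrary):

    #!/bin/sh
    # Launch four writer loops in parallel, each syncing 200MB per pass.
    for i in 1 2 3 4; do
        while true; do
            time dd if=/dev/zero of=foo.$i bs=50M count=4 oflag=sync
        done &
    done
    wait

Any dd pass that suddenly takes far longer than its neighbours is a stall
worth capturing with SysRq.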
Re: konqueror deadlocks on 2.6.22
Ingo Molnar wrote:
> * Oliver Pinter (Pintér Olivér) <[EMAIL PROTECTED]> wrote:
> > and then please update to CFS-v24.1
> > http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch
> >
> > > Yes with CFSv20.4, as in the log.
> > >
> > > It also hangs on 2.6.23.13
>
> my feeling is that this is some sort of timing dependent race in
> konqueror/kde/qt that is exposed when a different scheduler is put in.
>
> If it disappears with CFS-v24.1 it is probably just because the timings
> will change again. Would be nice to debug this on the konqueror side and
> analyze why it fails and how. You can probably tune the timings by
> enabling SCHED_DEBUG and tweaking /proc/sys/kernel/*sched* values - in
> particular sched_latency and the granularity settings. Setting wakeup
> granularity to 0 might be one of the things that could make a
> difference.

Thanks Ingo, but Mike suggested that data=writeback may make a difference,
which it does indeed.

So the bug seems to be related to data=ordered, although I haven't gotten
any feedback from the ext3 gurus yet.

Seems rather critical though, as data=writeback is a dangerous mode to run.

Thanks!

--
Al
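For anyone wanting to compare the two modes: ext3 refuses to switch data
modes on a live remount, so the mode has to be set at mount time (device
and mount point below are only examples):

    umount /home
    mount -t ext3 -o data=writeback /dev/hda3 /home

For the root filesystem, set it in /etc/fstab or pass
rootflags=data=writeback on the kernel command line.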
Re: konqueror deadlocks on 2.6.22
* Oliver Pinter (Pintér Olivér) <[EMAIL PROTECTED]> wrote:

> and then please update to CFS-v24.1
> http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch
>
> > Yes with CFSv20.4, as in the log.
> >
> > It also hangs on 2.6.23.13

my feeling is that this is some sort of timing dependent race in
konqueror/kde/qt that is exposed when a different scheduler is put in.

If it disappears with CFS-v24.1 it is probably just because the timings
will change again. Would be nice to debug this on the konqueror side and
analyze why it fails and how. You can probably tune the timings by
enabling SCHED_DEBUG and tweaking /proc/sys/kernel/*sched* values - in
particular sched_latency and the granularity settings. Setting wakeup
granularity to 0 might be one of the things that could make a difference.

	Ingo
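For reference, on a 2.6.23-era CFS kernel the tunables look roughly like
this (the exact knob names vary between CFS versions, so treat these as an
example, not gospel):

    grep . /proc/sys/kernel/sched_*                     # list current values
    echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns   # zero wakeup granularity
    echo 20000000 > /proc/sys/kernel/sched_latency_ns       # target latency, in ns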
Re: konqueror deadlocks on 2.6.22
On Sun, 2008-01-20 at 08:41 +0300, Al Boldi wrote:
> logic. Any ideas how this could be fixed?

BTW, no idea, fs is taboo land here. (panic() is my very favorite
function... ;)

	-Mike
Re: konqueror deadlocks on 2.6.22
On Sun, 2008-01-20 at 08:41 +0300, Al Boldi wrote:
> BTW Mike: Your server bounces my messages.

Hm. I don't have a server. Might have something to do with some naughty
task frequently scribbling zeros to /etc/resolv.conf when I brutally reboot
my box (I give myself cause to do that quite a lot). Fixed for the
thousandth time. Some day I'll track/shoot the bugger.

	-Mike
Re: konqueror deadlocks on 2.6.22
Mike Galbraith wrote:
> On Sat, 2008-01-19 at 21:14 +0300, Al Boldi wrote:
> > I was just attacked by some deadlock issue involving sqlite3 and
> > konqueror. While sqlite3 continues to slowly fill a 7M-record db in
> > transaction mode, konqueror hangs for a few minutes, then continues only
> > to hang again and again.
> >
> > Looks like an fs/blockIO issue involving fsync.
> >
> > As a workaround, is there a way to make fsync soft?
>
> Do you have the fs mounted data=writeback? A while back, I ran into
> starvation on the order of minutes with my old/full ext2 fs until
> mounting data=writeback.

You are absolutely right. With data=writeback the hangs completely
disappear, and sqlite3 insert performance increases 10x.

Now data=writeback is known to be faster than data=ordered, but a 10x
increase probably points to some sync contention within the data=ordered
logic. Any ideas how this could be fixed?

Thanks a lot!

BTW Mike: Your server bounces my messages.

--
Al
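A rough way to quantify the difference is to time the same batched insert
load once under each data mode (a sketch; the row count here is scaled down
from the 7M-record case, and the db/file names are arbitrary):

    sqlite3 bench.db 'CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT);'
    { echo 'BEGIN;'
      i=0
      while [ $i -lt 100000 ]; do
          echo "INSERT INTO t (v) VALUES ('row $i');"
          i=$((i+1))
      done
      echo 'COMMIT;'; } > inserts.sql
    time sqlite3 bench.db < inserts.sql    # repeat once per data= mode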
Re: konqueror deadlocks on 2.6.22
add cc (ingo)

and then please update to CFS-v24.1
http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch

On 1/19/08, Al Boldi <[EMAIL PROTECTED]> wrote:
> Oliver Pinter (Pintér Olivér) wrote:
> > This kernel is vanilla 2.6.22.y or with CFS?
>
> Yes with CFSv20.4, as in the log.
>
> It also hangs on 2.6.23.13
>
> > On 1/19/08, Al Boldi <[EMAIL PROTECTED]> wrote:
> > > I was just attacked by some deadlock issue involving sqlite3 and
> > > konqueror. While sqlite3 continues to slowly fill a 7M-record db in
> > > transaction mode, konqueror hangs for a few minutes, then continues
> > > only to hang again and again.
> > >
> > > Looks like an fs/blockIO issue involving fsync.
> > >
> > > As a workaround, is there a way to make fsync soft?
>
> Thanks!
>
> --
> Al

--
Thanks,
Oliver
Re: konqueror deadlocks on 2.6.22
Oliver Pinter (Pintér Olivér) wrote:
> This kernel is vanilla 2.6.22.y or with CFS?

Yes with CFSv20.4, as in the log.

It also hangs on 2.6.23.13

> On 1/19/08, Al Boldi <[EMAIL PROTECTED]> wrote:
> > I was just attacked by some deadlock issue involving sqlite3 and
> > konqueror. While sqlite3 continues to slowly fill a 7M-record db in
> > transaction mode, konqueror hangs for a few minutes, then continues only
> > to hang again and again.
> >
> > Looks like an fs/blockIO issue involving fsync.
> >
> > As a workaround, is there a way to make fsync soft?

Thanks!

--
Al
Re: konqueror deadlocks on 2.6.22
This kernel is vanilla 2.6.22.y or with CFS?

On 1/19/08, Al Boldi <[EMAIL PROTECTED]> wrote:
> I was just attacked by some deadlock issue involving sqlite3 and
> konqueror. While sqlite3 continues to slowly fill a 7M-record db in
> transaction mode, konqueror hangs for a few minutes, then continues only
> to hang again and again.
>
> Looks like an fs/blockIO issue involving fsync.
>
> As a workaround, is there a way to make fsync soft?
>
> Thanks!
>
> --
> Al
>
> [quoted SysRq log snipped; the full "Show Blocked State" dump appears in
> the original report at the bottom of this thread]
konqueror deadlocks on 2.6.22
I was just attacked by some deadlock issue involving sqlite3 and konqueror.
While sqlite3 continues to slowly fill a 7M-record db in transaction mode,
konqueror hangs for a few minutes, then continues only to hang again and
again.

Looks like an fs/blockIO issue involving fsync.

As a workaround, is there a way to make fsync soft?

Thanks!

--
Al

---
Jan 19 20:36:13 localhost kernel: SysRq : Show Blocked State
Jan 19 20:36:13 localhost kernel: task             PC    stack   pid  father
Jan 19 20:36:13 localhost kernel: kjournald     D c153b4c0     0   951      2
Jan 19 20:36:13 localhost kernel: c1579d70 0046 0010 c153b4c0 c153b5fc c1579dbc 0001
Jan 19 20:36:13 localhost kernel: c1579d78 c03f163e c1527848 c0216d50 0010 c1527878 0001 d7ca8dc0
Jan 19 20:36:13 localhost kernel: c153b4c0 c012dcb0 c1579dbc c1579dbc 0001 d62092c0 d7ca8dc0
Jan 19 20:36:13 localhost kernel: Call Trace:
Jan 19 20:36:13 localhost kernel: [] io_schedule+0xe/0x20
Jan 19 20:36:13 localhost kernel: [] get_request_wait+0x100/0x120
Jan 19 20:36:13 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:13 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:13 localhost kernel: [] elv_merge+0xba/0x150
Jan 19 20:36:13 localhost kernel: [] __make_request+0x6c/0x2f0
Jan 19 20:36:13 localhost kernel: [] generic_make_request+0x13f/0x1d0
Jan 19 20:36:14 localhost kernel: [] __slab_alloc+0x87/0xf0
Jan 19 20:36:14 localhost kernel: [] mempool_alloc+0x2a/0xc0
Jan 19 20:36:14 localhost kernel: [] submit_bio+0x46/0xe0
Jan 19 20:36:14 localhost kernel: [] smp_apic_timer_interrupt+0x28/0x30
Jan 19 20:36:14 localhost kernel: [] apic_timer_interrupt+0x28/0x30
Jan 19 20:36:14 localhost kernel: [] bio_alloc_bioset+0x7f/0x160
Jan 19 20:36:14 localhost kernel: [] end_buffer_write_sync+0x0/0x70
Jan 19 20:36:14 localhost kernel: [] submit_bh+0xd1/0x130
Jan 19 20:36:14 localhost kernel: [] journal_do_submit_data+0x29/0x30
Jan 19 20:36:14 localhost kernel: [] journal_submit_data_buffers+0x115/0x170
Jan 19 20:36:14 localhost kernel: [] journal_commit_transaction+0x1af/0xc30
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] kjournald+0x197/0x1e0
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] kjournald+0x0/0x1e0
Jan 19 20:36:14 localhost kernel: [] kthread+0x6a/0x70
Jan 19 20:36:14 localhost kernel: [] kthread+0x0/0x70
Jan 19 20:36:14 localhost kernel: [] kernel_thread_helper+0x7/0x10
Jan 19 20:36:14 localhost kernel: ===
Jan 19 20:36:14 localhost kernel: konqueror     D c14b39f0     0  1922   1918
Jan 19 20:36:14 localhost kernel: dbb13e08 0082 dbb13e08 c14b39f0 c14b3b2c c1517f40 c1517f00 0009813c
Jan 19 20:36:14 localhost kernel: dbb13e3c c01aa5e1 c1517f50 c14b39f0 c012dcb0
Jan 19 20:36:14 localhost kernel: dbb13e48 dbb13e48 c1517f50 0001 0003 c14b39f0 c012dcb0
Jan 19 20:36:14 localhost kernel: Call Trace:
Jan 19 20:36:14 localhost kernel: [] log_wait_commit+0xf1/0x140
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] autoremove_wake_function+0x0/0x50
Jan 19 20:36:14 localhost kernel: [] journal_stop+0x146/0x1c0
Jan 19 20:36:14 localhost kernel: [] journal_start+0x93/0xc0
Jan 19 20:36:14 localhost kernel: [] __writepage+0x0/0x30
Jan 19 20:36:14 localhost kernel: [] journal_force_commit+0x1c/0x30
Jan 19 20:36:14 localhost kernel: [] ext3_force_commit+0x25/0x30
Jan 19 20:36:14 localhost kernel: [] write_inode+0x4b/0x50
Jan 19 20:36:14 localhost kernel: [] __sync_single_inode+0x1a3/0x1d0
Jan 19 20:36:14 localhost kernel: [] __writeback_single_inode+0x4e/0x1b0
Jan 19 20:36:14 localhost kernel: [] do_writepages+0x3d/0x50
Jan 19 20:36:14 localhost kernel: [] __filemap_fdatawrite_range+0x87/0x90
Jan 19 20:36:14 localhost kernel: [] ext3_sync_file+0x98/0xe0
Jan 19 20:36:14 localhost kernel: [] filemap_fdatawrite+0x23/0x30
Jan 19 20:36:14 localhost kernel: [] do_fsync+0x6d/0x90
Jan 19 20:36:14 localhost kernel: [] __do_fsync+0x27/0x50
Jan 19 20:36:14 localhost kernel: [] syscall_call+0x7/0xb
Jan 19 20:36:14 localhost kernel: [] svc_seq_show+0x110/0x120
Jan 19 20:36:14 localhost kernel: ===
Jan 19 20:36:14 localhost kernel: sqlite3       D c4aa4000     0  5507   2021
Jan 19 20:36:14 localhost kernel: dd53bbf0 00200082 c0217bef c4aa4000 c4aa413c dd53bc40 dd53bc48
Jan 19 20:36:14 localhost kernel: dd53bbf8 c03f163e c13e0758 c017b4b7 c03f17d5 c017b490 c89c3690 0002
Jan 19 20:36:14 localhost kernel: c4aa4000 c017b490 c03f1882 0002 c89c3690 0002 0
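The dump above comes from the SysRq "show blocked tasks" facility; to
capture the same output during a hang (assuming a kernel built with
CONFIG_MAGIC_SYSRQ):

    echo 1 > /proc/sys/kernel/sysrq    # allow sysrq via /proc
    echo w > /proc/sysrq-trigger       # dump tasks stuck in D state
    dmesg | tail -n 200                # read the dump back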