Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-29 Thread Richard Weinberger
On Wed, Jan 29, 2014 at 8:13 PM, Andrew Ruder  wrote:
> On Wed, Jan 29, 2014 at 04:56:04PM +0100, Richard Weinberger wrote:
>> Does the issue also happen if you disable preemption?
>> i.e. CONFIG_PREEMPT_NONE=y
>
> CONFIG_PREEMPT_NONE=y still breaks.  I suspect that sync_filesystem
> has some blocking behavior that allows other processes to schedule.

Okay, I have to test this on real hardware.
May take a few days.

BTW: s/not.at/nod.at/g ;)

> Another log is attached and the patch I used to create this log is
> below.  I think the most interesting part of this patch is the last hunk
> that modifies ubifs_remount_ro.  This clearly shows that after the mount
> has been marked as read-only we have dirty inodes waiting for the
> writeback to come in and hit all the asserts.
>
> Here's some of the important parts of the log:
>
>   170 pre sync_filesystem
> # Notice that while we were running sync_filesystem
> # an inode (0xd75ab658) snuck in with a sys_rename
>   204 inode: d75ab658
>   205 [ cut here ]
>   206 WARNING: CPU: 0 PID: 625 at fs/fs-writeback.c:1213 
> __mark_inode_dirty+0x2a4/0x2f4()
>   207 Modules linked in:
>   208 CPU: 0 PID: 625 Comm: fsstress Tainted: GW
> 3.12.0-00041-g7f12d39-dirty #250
>   209 [] (unwind_backtrace+0x0/0x128) from [] 
> (show_stack+0x20/0x24)
>   210 [] (show_stack+0x20/0x24) from [] 
> (dump_stack+0x20/0x28)
>   211 [] (dump_stack+0x20/0x28) from [] 
> (warn_slowpath_common+0x78/0x98)
>   212 [] (warn_slowpath_common+0x78/0x98) from [] 
> (warn_slowpath_null+0x2c/0x34)
>   213 [] (warn_slowpath_null+0x2c/0x34) from [] 
> (__mark_inode_dirty+0x2a4/0x2f4)
>   214 [] (__mark_inode_dirty+0x2a4/0x2f4) from [] 
> (ubifs_rename+0x4a4/0x600)
>   215 [] (ubifs_rename+0x4a4/0x600) from [] 
> (vfs_rename+0x280/0x3f4)
>   216 [] (vfs_rename+0x280/0x3f4) from [] 
> (SyS_renameat+0x18c/0x218)
>   217 [] (SyS_renameat+0x18c/0x218) from [] 
> (SyS_rename+0x2c/0x30)
>   218 [] (SyS_rename+0x2c/0x30) from [] 
> (ret_fast_syscall+0x0/0x2c)
>   219 ---[ end trace 35ebec8569a53526 ]---
>   754 post sync_filesystem
>   755 pre prepare_remount
>   756 post prepare_remount
>   757 prepare_remount success
>   758 UBIFS: background thread "ubifs_bgt0_0" stops
>   759 we are now a read-only mount
>   760 bdi.work_list = d7ac4110, .next = d7ac4110
> # WE HAVE DIRTY I/O (notice the .next != &b_dirty)
>   761 bdi.wb.b_dirty = d7ac40d8, .next = d75accd0
>   762 bdi.wb.b_io = d7ac40e0, .next = d7ac40e0
>   763 bdi.wb.b_more_io = d7ac40e8, .next = d7ac40e8
>   764 do_remount_sb success
> # And now our friend (inode 0xd75ab658) comes in with a writeback after
> # we are read-only triggering the assert
>   779 inode: d75ab658
>   780 UBIFS assert failed in reserve_space at 125 (pid 11)
>   781 CPU: 0 PID: 11 Comm: kworker/u2:1 Tainted: GW
> 3.12.0-00041-g7f12d39-dirty #250
>   782 Workqueue: writeback bdi_writeback_workfn (flush-ubifs_0_0)
>   783 [] (unwind_backtrace+0x0/0x128) from [] 
> (show_stack+0x20/0x24)
>   784 [] (show_stack+0x20/0x24) from [] 
> (dump_stack+0x20/0x28)
>   785 [] (dump_stack+0x20/0x28) from [] 
> (make_reservation+0x8c/0x560)
>   786 [] (make_reservation+0x8c/0x560) from [] 
> (ubifs_jnl_write_inode+0xbc/0x214)
>   787 [] (ubifs_jnl_write_inode+0xbc/0x214) from [] 
> (ubifs_write_inode+0xec/0x17c)
>   788 [] (ubifs_write_inode+0xec/0x17c) from [] 
> (__writeback_single_inode+0x1fc/0x308)
>   789 [] (__writeback_single_inode+0x1fc/0x308) from [] 
> (writeback_sb_inodes+0x1f8/0x3c4)
>   790 [] (writeback_sb_inodes+0x1f8/0x3c4) from [] 
> (__writeback_inodes_wb+0x80/0xc0)
>   791 [] (__writeback_inodes_wb+0x80/0xc0) from [] 
> (wb_writeback+0x198/0x310)
>   792 [] (wb_writeback+0x198/0x310) from [] 
> (bdi_writeback_workfn+0x15c/0x454)
>   793 [] (bdi_writeback_workfn+0x15c/0x454) from [] 
> (process_one_work+0x280/0x420)
>   794 [] (process_one_work+0x280/0x420) from [] 
> (worker_thread+0x240/0x380)
>   795 [] (worker_thread+0x240/0x380) from [] 
> (kthread+0xbc/0xc8)
>   796 [] (kthread+0xbc/0xc8) from [] 
> (ret_from_fork+0x14/0x20)
>
> - Andy
>
>
>
>
> --- patch ---
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 9f4935b..48e4272 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -93,6 +93,10 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
>  {
> trace_writeback_queue(bdi, work);
>
> +   do {
> +   extern bool ubifs_enable_debug;
> +   WARN_ON(ubifs_enable_debug);
> +   } while (0);
> spin_lock_bh(&bdi->wb_lock);
> list_add_tail(&work->list, &bdi->work_list);
> spin_unlock_bh(&bdi->wb_lock);
> @@ -194,6 +198,11 @@ static void redirty_tail(struct inode *inode, struct 
> bdi_writeback *wb)
> if (time_before(inode->dirtied_when, tail->dirtied_when))
> inode->dirtied_when = jiffies;
> }
> +   do {
> +   extern bool ubifs_enable_debug;
> +   if (ubifs_enable_debug) pr_info("inode: %p\n", inode);
> +   WARN_ON(ubifs_enable_debug);
> +   } while (0);

Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-29 Thread Andrew Ruder
On Wed, Jan 29, 2014 at 04:56:04PM +0100, Richard Weinberger wrote:
> Does the issue also happen if you disable preemption?
> i.e. CONFIG_PREEMPT_NONE=y

CONFIG_PREEMPT_NONE=y still breaks.  I suspect that sync_filesystem
has some blocking behavior that allows other processes to schedule.

Another log is attached and the patch I used to create this log is
below.  I think the most interesting part of this patch is the last hunk
that modifies ubifs_remount_ro.  This clearly shows that after the mount
has been marked as read-only we have dirty inodes waiting for the
writeback to come in and hit all the asserts.

Here's some of the important parts of the log:

  170 pre sync_filesystem
# Notice that while we were running sync_filesystem
# an inode (0xd75ab658) snuck in with a sys_rename
  204 inode: d75ab658
  205 [ cut here ]
  206 WARNING: CPU: 0 PID: 625 at fs/fs-writeback.c:1213 
__mark_inode_dirty+0x2a4/0x2f4()
  207 Modules linked in:
  208 CPU: 0 PID: 625 Comm: fsstress Tainted: GW
3.12.0-00041-g7f12d39-dirty #250
  209 [] (unwind_backtrace+0x0/0x128) from [] 
(show_stack+0x20/0x24)
  210 [] (show_stack+0x20/0x24) from [] 
(dump_stack+0x20/0x28)
  211 [] (dump_stack+0x20/0x28) from [] 
(warn_slowpath_common+0x78/0x98)
  212 [] (warn_slowpath_common+0x78/0x98) from [] 
(warn_slowpath_null+0x2c/0x34)
  213 [] (warn_slowpath_null+0x2c/0x34) from [] 
(__mark_inode_dirty+0x2a4/0x2f4)
  214 [] (__mark_inode_dirty+0x2a4/0x2f4) from [] 
(ubifs_rename+0x4a4/0x600)
  215 [] (ubifs_rename+0x4a4/0x600) from [] 
(vfs_rename+0x280/0x3f4)
  216 [] (vfs_rename+0x280/0x3f4) from [] 
(SyS_renameat+0x18c/0x218)
  217 [] (SyS_renameat+0x18c/0x218) from [] 
(SyS_rename+0x2c/0x30)
  218 [] (SyS_rename+0x2c/0x30) from [] 
(ret_fast_syscall+0x0/0x2c)
  219 ---[ end trace 35ebec8569a53526 ]---
  754 post sync_filesystem
  755 pre prepare_remount
  756 post prepare_remount
  757 prepare_remount success
  758 UBIFS: background thread "ubifs_bgt0_0" stops
  759 we are now a read-only mount
  760 bdi.work_list = d7ac4110, .next = d7ac4110
# WE HAVE DIRTY I/O (notice the .next != &b_dirty)
  761 bdi.wb.b_dirty = d7ac40d8, .next = d75accd0
  762 bdi.wb.b_io = d7ac40e0, .next = d7ac40e0
  763 bdi.wb.b_more_io = d7ac40e8, .next = d7ac40e8
  764 do_remount_sb success
# And now our friend (inode 0xd75ab658) comes in with a writeback after
# we are read-only triggering the assert
  779 inode: d75ab658
  780 UBIFS assert failed in reserve_space at 125 (pid 11)
  781 CPU: 0 PID: 11 Comm: kworker/u2:1 Tainted: GW
3.12.0-00041-g7f12d39-dirty #250
  782 Workqueue: writeback bdi_writeback_workfn (flush-ubifs_0_0)
  783 [] (unwind_backtrace+0x0/0x128) from [] 
(show_stack+0x20/0x24)
  784 [] (show_stack+0x20/0x24) from [] 
(dump_stack+0x20/0x28)
  785 [] (dump_stack+0x20/0x28) from [] 
(make_reservation+0x8c/0x560)
  786 [] (make_reservation+0x8c/0x560) from [] 
(ubifs_jnl_write_inode+0xbc/0x214)
  787 [] (ubifs_jnl_write_inode+0xbc/0x214) from [] 
(ubifs_write_inode+0xec/0x17c)
  788 [] (ubifs_write_inode+0xec/0x17c) from [] 
(__writeback_single_inode+0x1fc/0x308)
  789 [] (__writeback_single_inode+0x1fc/0x308) from [] 
(writeback_sb_inodes+0x1f8/0x3c4)
  790 [] (writeback_sb_inodes+0x1f8/0x3c4) from [] 
(__writeback_inodes_wb+0x80/0xc0)
  791 [] (__writeback_inodes_wb+0x80/0xc0) from [] 
(wb_writeback+0x198/0x310)
  792 [] (wb_writeback+0x198/0x310) from [] 
(bdi_writeback_workfn+0x15c/0x454)
  793 [] (bdi_writeback_workfn+0x15c/0x454) from [] 
(process_one_work+0x280/0x420)
  794 [] (process_one_work+0x280/0x420) from [] 
(worker_thread+0x240/0x380)
  795 [] (worker_thread+0x240/0x380) from [] 
(kthread+0xbc/0xc8)
  796 [] (kthread+0xbc/0xc8) from [] 
(ret_from_fork+0x14/0x20)

- Andy




--- patch ---
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9f4935b..48e4272 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -93,6 +93,10 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 {
trace_writeback_queue(bdi, work);
 
+   do {
+   extern bool ubifs_enable_debug;
+   WARN_ON(ubifs_enable_debug);
+   } while (0);
spin_lock_bh(&bdi->wb_lock);
list_add_tail(&work->list, &bdi->work_list);
spin_unlock_bh(&bdi->wb_lock);
@@ -194,6 +198,11 @@ static void redirty_tail(struct inode *inode, struct 
bdi_writeback *wb)
if (time_before(inode->dirtied_when, tail->dirtied_when))
inode->dirtied_when = jiffies;
}
+   do {
+   extern bool ubifs_enable_debug;
+   if (ubifs_enable_debug) pr_info("inode: %p\n", inode);
+   WARN_ON(ubifs_enable_debug);
+   } while (0);
list_move(&inode->i_wb_list, &wb->b_dirty);
 }
 
@@ -204,6 +213,11 @@ static void requeue_io(struct inode *inode, struct 
bdi_writeback *wb)
 {
assert_spin_locked(&wb->list_lock);
list_move(&inode->i_wb_list, &wb->b_more_io);
+   do {
+  

Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-29 Thread Richard Weinberger


Am 29.01.2014 16:46, schrieb Andrew Ruder:
> On Wed, Jan 29, 2014 at 08:30:45AM +0100, Richard Weinberger wrote:
>> BTW: Can you please share your .config?
> 
> No problem.  FYI, this is for a board that is still in development so
> not all my changes have been submitted for inclusion yet.  I would be
> happy to share the changes now if necessary but I've attached the --stat
> to show that as far as mtd/ubifs goes, I'm basically running a stock
> v3.12:

Does the issue also happen if you disable preemption?
i.e. CONFIG_PREEMPT_NONE=y

Maybe it is really a race...

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-29 Thread Andrew Ruder
On Wed, Jan 29, 2014 at 08:30:45AM +0100, Richard Weinberger wrote:
> BTW: Can you please share your .config?

No problem.  FYI, this is for a board that is still in development so
not all my changes have been submitted for inclusion yet.  I would be
happy to share the changes now if necessary but I've attached the --stat
to show that as far as mtd/ubifs goes, I'm basically running a stock
v3.12:

% git diff --stat v3.12..HEAD
 Documentation/devicetree/bindings/usb/pxa-usb.txt |5 +-
 arch/arm/boot/dts/Makefile|2 +
 arch/arm/boot/dts/pxa270-elecsys_z2.dts   |  330 ++
 arch/arm/boot/dts/pxa270-elecsys_z4.dts   |  434 +++
 arch/arm/boot/dts/pxa27x.dtsi |   13 +-
 arch/arm/boot/dts/pxa2xx.dtsi |   14 +-
 arch/arm/boot/dts/pxa3xx.dtsi |4 +
 arch/arm/configs/elecsys_z2_defconfig | 3089 +++
 arch/arm/mach-pxa/Kconfig |   18 +
 arch/arm/mach-pxa/Makefile|3 +-
 arch/arm/mach-pxa/generic.c   |3 +-
 arch/arm/mach-pxa/include/mach/hardware.h |2 +-
 arch/arm/mach-pxa/irq.c   |8 +-
 arch/arm/mach-pxa/{pxa-dt.c => pxa27x-dt.c}   |   44 +-
 arch/arm/mach-pxa/pxa27x.c|   25 +-
 arch/arm/mach-pxa/{pxa-dt.c => pxa3xx-dt.c}   |   11 +-
 arch/arm/mach-pxa/sleep.S |2 +-
 drivers/irqchip/Kconfig   |8 +
 drivers/irqchip/Makefile  |1 +
 drivers/irqchip/irq-zeus.c|  192 +++
 drivers/mmc/host/pxamci.c |   33 +-
 drivers/net/ethernet/8390/ne.c|4 +-
 drivers/net/ethernet/davicom/dm9000.c |   68 +-
 drivers/tty/serial/pxa.c  |6 +-
 24 files changed, 4222 insertions(+), 97 deletions(-)

And the .config I'm using is attached.

Thanks,
Andy





config.gz
Description: application/gunzip


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-29 Thread Andrew Ruder
On Wed, Jan 29, 2014 at 08:17:35AM +0100, Richard Weinberger wrote:
> So you can trigger this by running fsstress on /mnt and then call
> mount -o remount,ro /mnt?

That's all it takes.  I actually run the remount until it succeeds;
with fsstress going in the background, there is a pretty narrow window
in which fsstress has to sneak in a write between the start of
sync_filesystem and the remount's check for writers (-EBUSY).

> Can you also trigger it on nandsim or mtdram?

Haven't tried, but I can.  I just don't really have that great of
a development environment for doing this on a desktop machine.

> I did a quick test on my testbed using mtdram and was unable to trigger it.
> But I fear my box is too fast.

I think there is definitely a speed component of my 416 MHz PXA 270 that
makes this pretty easy to hit.  What if you did something like (changing
the ubiattach and mount lines obviously)?

I'll see if I can do this in a qemu, but it might take me a while to get
that setup.

- Andy


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-28 Thread Richard Weinberger
Am 29.01.2014 06:32, schrieb Andrew Ruder:
> Ok, I've got some more useful information.  I have been adding
> a multitude of WARN_ON's and printk's all over the remount code and have
> come up with the attached log.
> 
> A little bit of explanation:
> 
> Line 1: sync_filesystem (from do_remount_sb)
> Line 188: sync_filesystem ends
> Line 43, 64, 81, 98, 114, 135, 156, 173: write operations that occur
> during sync_filesystem.  Before each warning I print the inode pointer.
> Line 197: read-only remount has completely finished (this message is
> from userspace post remount)
> Line 199: a sync is called, there are apparently dirty inodes in our
> now-readonly ubifs filesystem
> Line 215: failed assert that occurs because the writeback triggers for
> inode 0xd75b9450 (see line 41, it got in with a sys_write while we were
> running our sync_filesystem in do_remount_sb)
> 
> Does this help?  It looks like there is a race condition between the
> writeback code and the remount read-only.  Nothing is done to lock out
> writes during the first half of the do_remount_sb and some stuff makes
> it into the writeback worker queue while we are busy syncing the
> filesystem only to trigger later when ubifs has decided it is
> read-only...
> 
> Note: I only barely know what I am talking about - filesystems still not
> my forte :)

BTW: Can you please share your .config?

Thanks,
//richard


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-28 Thread Richard Weinberger
Am 29.01.2014 06:32, schrieb Andrew Ruder:
> Ok, I've got some more useful information.  I have been adding
> a multitude of WARN_ON's and printk's all over the remount code and have
> come up with the attached log.
> 
> A little bit of explanation:
> 
> Line 1: sync_filesystem (from do_remount_sb)
> Line 188: sync_filesystem ends
> Line 43, 64, 81, 98, 114, 135, 156, 173: write operations that occur
> during sync_filesystem.  Before each warning I print the inode pointer.
> Line 197: read-only remount has completely finished (this message is
> from userspace post remount)
> Line 199: a sync is called, there are apparently dirty inodes in our
> now-readonly ubifs filesystem
> Line 215: failed assert that occurs because the writeback triggers for
> inode 0xd75b9450 (see line 41, it got in with a sys_write while we were
> running our sync_filesystem in do_remount_sb)
> 
> Does this help?  It looks like there is a race condition between the
> writeback code and the remount read-only.  Nothing is done to lock out
> writes during the first half of the do_remount_sb and some stuff makes
> it into the writeback worker queue while we are busy syncing the
> filesystem only to trigger later when ubifs has decided it is
> read-only...

So you can trigger this by running fsstress on /mnt and then call
mount -o remount,ro /mnt?

Can you also trigger it on nandsim or mtdram?

I did a quick test on my testbed using mtdram and was unable to trigger it.
But I fear my box is too fast.

Thanks,
//richard


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-28 Thread Andrew Ruder
Ok, I've got some more useful information.  I have been adding
a multitude of WARN_ON's and printk's all over the remount code and have
come up with the attached log.

A little bit of explanation:

Line 1: sync_filesystem (from do_remount_sb)
Line 188: sync_filesystem ends
Line 43, 64, 81, 98, 114, 135, 156, 173: write operations that occur
during sync_filesystem.  Before each warning I print the inode pointer.
Line 197: read-only remount has completely finished (this message is
from userspace post remount)
Line 199: a sync is called; there are apparently dirty inodes in our
now-read-only ubifs filesystem
Line 215: failed assert that occurs because the writeback triggers for
inode 0xd75b9450 (see line 41, it got in with a sys_write while we were
running our sync_filesystem in do_remount_sb)

Does this help?  It looks like there is a race condition between the
writeback code and the remount read-only.  Nothing is done to lock out
writes during the first half of the do_remount_sb and some stuff makes
it into the writeback worker queue while we are busy syncing the
filesystem only to trigger later when ubifs has decided it is
read-only...

Note: I only barely know what I am talking about - filesystems still not
my forte :)

Thanks,
Andy


  1: pre sync_filesystem
  2: [ cut here ]
  3: WARNING: CPU: 0 PID: 645 at fs/fs-writeback.c:98 
bdi_queue_work+0xc8/0x130()
  4: Modules linked in:
  5: CPU: 0 PID: 645 Comm: mount Tainted: GW
3.12.0-00041-g7f12d39-dirty #236
  6: [] (unwind_backtrace+0x0/0x128) from [] 
(show_stack+0x20/0x24)
  7: [] (show_stack+0x20/0x24) from [] 
(dump_stack+0x20/0x28)
  8: [] (dump_stack+0x20/0x28) from [] 
(warn_slowpath_common+0x78/0x98)
  9: [] (warn_slowpath_common+0x78/0x98) from [] 
(warn_slowpath_null+0x2c/0x34)
 10: [] (warn_slowpath_null+0x2c/0x34) from [] 
(bdi_queue_work+0xc8/0x130)
 11: [] (bdi_queue_work+0xc8/0x130) from [] 
(writeback_inodes_sb_nr+0xb4/0xd4)
 12: [] (writeback_inodes_sb_nr+0xb4/0xd4) from [] 
(writeback_inodes_sb+0x30/0x34)
 13: [] (writeback_inodes_sb+0x30/0x34) from [] 
(sync_filesystem+0x50/0xbc)
 14: [] (sync_filesystem+0x50/0xbc) from [] 
(do_remount_sb+0xb8/0x1fc)
 15: [] (do_remount_sb+0xb8/0x1fc) from [] 
(do_mount+0x2e8/0x81c)
 16: [] (do_mount+0x2e8/0x81c) from [] (SyS_mount+0x94/0xc8)
 17: [] (SyS_mount+0x94/0xc8) from [] 
(ret_fast_syscall+0x0/0x2c)
 18: ---[ end trace c6e04f3813781dc3 ]---
 19: Adding tail
 20: writeback_workfn d781c110 d700fe8c
 21: inode: d75a2c30
 22: inode: d75b8c30
 23: [ cut here ]
 24: WARNING: CPU: 0 PID: 645 at fs/fs-writeback.c:98 
bdi_queue_work+0xc8/0x130()
 25: Modules linked in:
 26: CPU: 0 PID: 645 Comm: mount Tainted: GW
3.12.0-00041-g7f12d39-dirty #236
 27: [] (unwind_backtrace+0x0/0x128) from [] 
(show_stack+0x20/0x24)
 28: [] (show_stack+0x20/0x24) from [] 
(dump_stack+0x20/0x28)
 29: [] (dump_stack+0x20/0x28) from [] 
(warn_slowpath_common+0x78/0x98)
 30: [] (warn_slowpath_common+0x78/0x98) from [] 
(warn_slowpath_null+0x2c/0x34)
 31: [] (warn_slowpath_null+0x2c/0x34) from [] 
(bdi_queue_work+0xc8/0x130)
 32: [] (bdi_queue_work+0xc8/0x130) from [] 
(sync_inodes_sb+0xbc/0x1bc)
 33: [] (sync_inodes_sb+0xbc/0x1bc) from [] 
(sync_filesystem+0x88/0xbc)
 34: [] (sync_filesystem+0x88/0xbc) from [] 
(do_remount_sb+0xb8/0x1fc)
 35: [] (do_remount_sb+0xb8/0x1fc) from [] 
(do_mount+0x2e8/0x81c)
 36: [] (do_mount+0x2e8/0x81c) from [] (SyS_mount+0x94/0xc8)
 37: [] (SyS_mount+0x94/0xc8) from [] 
(ret_fast_syscall+0x0/0x2c)
 38: ---[ end trace c6e04f3813781dc4 ]---
 39: Adding tail
 40: writeback_workfn d781c110 d700fe9c
 41: inode: d75b9450
 42: [ cut here ]
 43: WARNING: CPU: 0 PID: 651 at fs/fs-writeback.c:1215 
__mark_inode_dirty+0x2dc/0x32c()
 44: Modules linked in:
 45: CPU: 0 PID: 651 Comm: fsstress Tainted: GW
3.12.0-00041-g7f12d39-dirty #236
 46: [] (unwind_backtrace+0x0/0x128) from [] 
(show_stack+0x20/0x24)
 47: [] (show_stack+0x20/0x24) from [] 
(dump_stack+0x20/0x28)
 48: [] (dump_stack+0x20/0x28) from [] 
(warn_slowpath_common+0x78/0x98)
 49: [] (warn_slowpath_common+0x78/0x98) from [] 
(warn_slowpath_null+0x2c/0x34)
 50: [] (warn_slowpath_null+0x2c/0x34) from [] 
(__mark_inode_dirty+0x2dc/0x32c)
 51: [] (__mark_inode_dirty+0x2dc/0x32c) from [] 
(__set_page_dirty_nobuffers+0xf4/0x110)
 52: [] (__set_page_dirty_nobuffers+0xf4/0x110) from [] 
(ubifs_write_end+0x1c8/0x294)
 53: [] (ubifs_write_end+0x1c8/0x294) from [] 
(generic_file_buffered_write+0x17c/0x260)
 54: [] (generic_file_buffered_write+0x17c/0x260) from [] 
(__generic_file_aio_write+0x420/0x448)
 55: [] (__generic_file_aio_write+0x420/0x448) from [] 
(generic_file_aio_write+0x68/0xac)
 56: [] (generic_file_aio_write+0x68/0xac) from [] 
(ubifs_aio_write+0x17c/0x18c)
 57: [] (ubifs_aio_write+0x17c/0x18c) from [] 
(do_sync_write+0x7c/0xa0)
 58: [] (do_sync_write+0x7c/0xa0) from [] 
(vfs_write+0xe4/0x194)
 59: [] (vfs_write+0xe4/0x194) from [] (SyS_

Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-27 Thread Andrew Ruder
On Sat, Jan 25, 2014 at 04:02:15PM +0100, Richard Weinberger wrote:
> So ubifs_bgt0_0 stops and the fun begins.
> Can you trigger the issue also by unmounting /mnt?
> I.e umount -l /mnt
> The background thread should only stop after all io is done...

Did some experiments last week to see if I could trigger the bug with
full debug messages enabled.  Biggest problem is that I don't have
non-volatile memory available, serial logging slows it down too much to
trigger the bug, and the reboot tends to shut down any attempt to
offload the log to capture the relevant messages.

That being said, I was able to trigger the bug with the following:

[root@buildroot ~]# (sleep 5 ; while ! mount -o remount,ro /mnt ; do true ; 
done ; echo remount > /dev/kmsg ; sleep 5 ; echo reboot > /dev/kmsg ; reboot ) &
[2] 564
[root@buildroot ~]# fsstress -p 10 -n 10 -X -d /mnt/fsstress -l 0

In my log I can see the "remount" message and 100ms later I can see the
first ubifs assert.  I've attached the relevant portion of the logs
below from the first time I see LEB 44 mentioned through the asserts.
I've put the logs on the web due to concerns of flooding the mailing
list with 100's of kB in attachments.

https://gist.github.com/aeruder/8651928

ubi_corruption.txt is the kernel log
afterwards.txt is the console log with the ensuing issue with ubifs

I also have logs of the recovery process in the Linux kernel later on,
(still takes 2 mounts), an image of the MTD device, and would be happy
to try anything or enable any additional debug messages.

> Can you also please find out whether fsstress is still running when
> reboot takes action?

Thanks for taking a look.  I'm reading everything I can find about ubifs
to see if I can make some headway into understanding what is going on
but filesystems are definitely not my forte :).

Cheers,
Andrew Ruder


Re: [BUG] reproducible ubifs reboot assert and corruption

2014-01-24 Thread Andrew Ruder
Just expanding distribution a little bit to see if anyone has any ideas.

On Tue, Jan 21, 2014 at 11:15:10PM -0600, Andrew Ruder wrote:
> Problem:
> ubifs corruption to the point where uboot can no longer deal with it and
> it takes multiple mounts to recover filesystem from Linux.
> 
> My hardware:
> NOR flash
> [3.244397] 0.flash: Found 1 x16 devices at 0x0 in 16-bit bank. 
> Manufacturer ID 0x01 Chip ID 0x002301
> PXA270
> 
> My software:
> 3.12.0 with some patches (not mtd or ubi or flash related) to support my
> board.
> 
> I'm able to reproduce this fairly readily by the following sequence of
> commands.  I'm not able to trigger it with full debug messages enabled
> but possibly I could enable them for certain files within the ubifs
> driver if something specific would be helpful.  But basically it seems
> like if I reboot (or mount -o remount,ro) while ubifs is writing, it
> sometimes crashes and leaves the partition in a bad state.  Sorry about
> some of the strangeness of the commands, I am scripting this with expect
> and this is my lame attempt to give me something to pattern recognize
> off of.
> 
> I should also point out, I have not had any problems at all when I do an
> abrupt shutdown (either via pulling power or letting a hardware watchdog
> take care of the reset).
> 
> ==
> ==
> ==
> 
> Here's the log of ubifs driver crashing.
> 
> [0.00] Booting Linux on physical CPU 0x0
> [0.00] Linux version 3.12.0-00041-g7f12d39-dirty 
> (andy@andrewruder-hplin) (gcc version 4.8.2 (Buildroot 
> 2013.11-rc1-00028-gf388663) ) #210 PREEMPT Tue Jan 21 21:36:54 CST 2014
> [0.00] CPU: XScale-PXA270 [69054118] revision 8 (ARMv5TE), cr=397f
> ...
> [3.211589] 0.flash: Found 1 x16 devices at 0x0 in 16-bit bank. 
> Manufacturer ID 0x01 Chip ID 0x002301
> [3.322188] Amd/Fujitsu Extended Query Table at 0x0040
> [3.327405]   Amd/Fujitsu Extended Query version 1.5.
> [3.370661] number of CFI chips: 1
> [3.378822] 5 cmdlinepart partitions found on MTD device 0.flash
> [3.385273] Creating 5 MTD partitions on "0.flash":
> [3.390212] 0x-0x0008 : "uboot"
> [3.423423] 0x0008-0x000a : "env"
> [3.503423] 0x000a-0x000c : "env_redund"
> [3.603453] 0x000c-0x000e : "env_default"
> [3.674772] 0x000e-0x0400 : "data"
> ...
> Welcome to Buildroot
> buildroot login: root
> Password: 
> [root@buildroot ~]# echo 7 > /proc/sys/kernel/printk; a=$?; sleep 2 ; [ x$a = 
> x0 ] || echo "@@FAIL@@"
> [root@buildroot ~]# ubiattach -m 4 -d 0; a=$?; sleep 2 ; [ x$a = x0 ] || echo 
> "@@FAIL@@"
> [   10.055595] UBI: attaching mtd4 to ubi0
> [   10.107901] UBI: scanning is finished
> [   10.172362] UBI: attached mtd4 (name "data", size 63 MiB) to ubi0
> [   10.178509] UBI: PEB size: 131072 bytes (128 KiB), LEB size: 130944 bytes
> [   10.220887] UBI: min./max. I/O unit sizes: 1/512, sub-page size 1
> [   10.228545] UBI: VID header offset: 64 (aligned 64), data offset: 128
> [   10.240011] UBI: good PEBs: 505, bad PEBs: 0, corrupted PEBs: 0
> [   10.247227] UBI: user volume: 1, internal volumes: 1, max. volumes count: 
> 128
> [   10.255658] UBI: max/mean erase counter: 42/22, WL threshold: 4096, image 
> sequence number: 2104545903
> [   10.266343] UBI: available PEBs: 4, total reserved PEBs: 501, PEBs 
> reserved for bad PEB handling: 0
> [   10.276740] UBI: background thread "ubi_bgt0d" started, PID 510
> UBI device number 0, total 505 LEBs (66126720 bytes, 63.1 MiB), available 4 
> LEBs (523776 bytes, 511.5 KiB), LEB size 130944 bytes (127.9 KiB)
> [root@buildroot ~]# mkdir -p /mnt; a=$?; sleep 2 ; [ x$a = x0 ] || echo 
> "@@FAIL@@"
> [root@buildroot ~]# mount -t ubifs ubi0:rootfs /mnt; a=$?; sleep 2 ; [ x$a = 
> x0 ] || echo "@@FAIL@@"
> [   14.463104] UBIFS: background thread "ubifs_bgt0_0" started, PID 516
> [   14.985636] UBIFS: mounted UBI device 0, volume 0, name "rootfs"
> [   14.991670] UBIFS: LEB size: 130944 bytes (127 KiB), min./max. I/O unit 
> sizes: 8 bytes/512 bytes
> [   15.008984] UBIFS: FS size: 63769728 bytes (60 MiB, 487 LEBs), journal 
> size 3142656 bytes (2 MiB, 24 LEBs)
> [   15.020052] UBIFS: reserved for root: 3012001 bytes (2941 KiB)
> [   15.027374] UBIFS: media format: w4/r0 (latest is w4/r0), UUID 
> 567F5BCB-663D-4F84-BF34-3282E100D2D9, small LPT model
> [root@buildroot ~]# rm -fr /mnt/fsstress; a=$?; sleep 2 ; [ x$a = x0 ] || 
> echo "@@FAIL@@"
> [root@buildroot ~]# mkdir -p /mnt/fsstress; a=$?; sleep 2 ; [ x$a = x0 ] || 
> echo "@@FAIL@@"
> [root@buildroot ~]# (sleep 41 ; reboot) &
> [1] 522
> [root@buildroot ~]# fsstress -p 10 -n 10 -X -d /mnt/fsstress -l 0
> seed = 653587
> [   64.672770] UBIFS: background thread "ubifs_bgt0_0" stops
> The system is going down NOW!
>