Re: Post ext3 conversion problems

2016-09-19 Thread Sean Greenslade
On Mon, Sep 19, 2016 at 02:30:28PM +0800, Qu Wenruo wrote:
> All chunks are completely converted to DUP; there are no small chunks, and
> all are at their maximum chunk size.
> So at the chunk level, there is nothing related to the convert yet.
> 
> But in the extent tree, I found several extents that are heavily referenced,
> such as extent 158173081600 or 183996522496.
> 
> If you're not using out-of-band dedupe, then it's quite possible that's a
> structure left over from the convert.

I never ran any sort of dedup on this partition.

> I'm not sure whether it's related to the bug, but did you do the
> balance/defrag operation just after removing the ext_save subvolume?

That's quite possible. I did it in a live boot, so I don't have the bash
history to check. I checked it just now using "btrfs subvol list -d",
and there's nothing listed. I ran a full balance after that, but the
problem remains. So whatever the problem is, it can survive a full
balance after the ext_save subvol is completely deleted.
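
For anyone who wants to look for the heavily referenced extents mentioned above, here is a rough Python sketch that scans a "btrfs-debug-tree -t extent" dump and lists the extents with the highest reference counts. It assumes each EXTENT_ITEM key line is followed by an "extent refs N ..." line; that layout, the regular expressions, and the helper name top_referenced_extents are assumptions, so treat it as a starting point rather than a tested tool.

```python
import re

# Assumed dump layout (not checked against every btrfs-progs version):
#   item 3 key (158173081600 EXTENT_ITEM 4096) itemoff 16100 itemsize 53
#       extent refs 12 gen 118000 flags DATA
ITEM_RE = re.compile(r"^\s*item \d+ key \((\d+) (\w+) (\d+)\)")
REFS_RE = re.compile(r"extent refs (\d+)")


def top_referenced_extents(lines, limit=10):
    """Return (bytenr, length, refs) tuples sorted by refs, highest first."""
    found = []
    current = None  # (bytenr, length) of the EXTENT_ITEM being read, if any
    for line in lines:
        item = ITEM_RE.match(line)
        if item:
            # Only remember the key if it describes an extent item.
            if item.group(2) == "EXTENT_ITEM":
                current = (int(item.group(1)), int(item.group(3)))
            else:
                current = None
            continue
        refs = REFS_RE.search(line)
        if refs and current is not None:
            found.append((current[0], current[1], int(refs.group(1))))
            current = None
    found.sort(key=lambda entry: entry[2], reverse=True)
    return found[:limit]
```

Feeding it the lines of a saved dump file (for example via open(path)) should give a short list to compare against the extent numbers quoted above.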

--Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Post ext3 conversion problems

2016-09-18 Thread Sean Greenslade
On Mon, Sep 19, 2016 at 10:20:37AM +0800, Qu Wenruo wrote:
> 
> -95 is -EOPNOTSUPP.
> 
> Not a common errno in btrfs.
> 
> Most EOPNOTSUPP errors are related to discard and failed fallocate/drop extents.
> 
> So, are you using the discard mount option?

I did indeed have the discard mount option enabled. I tried booting with
discard disabled, but the same problem appeared.
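
One way to double-check that discard is really off after a reboot is to look at the live mount options rather than fstab. The Python sketch below parses text in the /proc/mounts format; the function name and the sample line are made up for illustration.

```python
def mount_options(proc_mounts_text, mount_point):
    """Return the set of options for mount_point, or None if it is not mounted.

    Expects text in the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: later mounts shadow earlier ones.
            result = set(fields[3].split(","))
    return result


# Hypothetical sample line; on a real system read /proc/mounts instead.
sample = "/dev/sda2 /home btrfs rw,relatime,ssd,discard,space_cache 0 0\n"
opts = mount_options(sample, "/home")
print("discard enabled:", opts is not None and "discard" in opts)
```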

> 
> Normally a btrfs-debug-tree would help in most cases, but this time it seems
> to be a runtime scrub bug rather than on-disk metadata corruption.
> 
> What I can see here is that, after all your operations, your fs should be a
> normal btrfs rather than a converted one.
> 
> To confirm my idea, would you please upload the following things if your
> filesystem is not too large?
> 
> # btrfs-debug-tree -t extent 
> # btrfs-debug-tree -t chunk 
> # btrfs-debug-tree -t dev 
> 
> The dump contains no file/dir names or data, so it's just
> chunk/extent allocation info.
> You can upload them without worry.
> 
> > Not a mess, I think it's a good bug report. I think Qu and David know
> > more about the latest iteration of the convert code. If you can wait
> > until next week at least to see if they have questions that'd be best.
> > If you need to get access to the computer sooner than later I suggest
> > btrfs-image -c9 -t4 -s to make a filename sanitized copy of the
> > filesystem metadata for them to look at, just in case. They might be
> > able to figure out the problem just from the stack trace, but better
> > to have the image before blowing away the file system, just in case
> > they want it.
> 
> Yes, btrfs-image dump would be the best.
> Although sanitizing may take a long time and the output may be too large.

I had posted a btrfs-image before. It was run with a single -s flag:

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

Here's the debug tree data:

http://phead.us/tmp/wheatley_chunk_2016-09-18.dump.gz
http://phead.us/tmp/wheatley_extent_2016-09-18.dump.gz
http://phead.us/tmp/wheatley_dev_2016-09-18.dump.gz

Thanks,

--Sean



Re: Post ext3 conversion problems

2016-09-18 Thread Qu Wenruo



At 09/17/2016 04:23 AM, Chris Murphy wrote:

On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade
 wrote:

Hi, all. I've been playing around with an old laptop of mine, and I
figured I'd use it as a learning / bugfinding opportunity. Its /home
partition was originally ext3. I have a full partition image of this
drive as a backup, so I can do (and have done) potentially destructive
things. The system disk is a ~6 year old SSD.

To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)


Although there are reports of false btrfsck alerts with progs 4.7.1,
btrfs-convert is not related to those false alerts, so I assume it's OK.



and ran a simple btrfs-convert on it. After patching up the fstab and
rebooting, everything seemed fine. I deleted the recovery subvol, ran a
full balance, ran a full defrag, and rebooted again. I then decided to
try (as an experiment) using DUP mode for data and metadata. I ran that
balance without issue, then started using the machine. Sometime later, I
got the following remount ro:

[ 7316.764235] [ cut here ]
[ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764297] BTRFS: Transaction aborted (error -95)
[ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs 
iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 
snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp 
snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core 
snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi 
thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) 
vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables 
x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci 
uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
[ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
[ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O
4.7.3-5-ck #1
[ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903   
 11/08/2010
[ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7316.764513]  0286 6101f47d 8800230dbc78 
812f0215
[ 7316.764522]  8800230dbcc8  8800230dbcb8 
8107ae6f
[ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
880014101d40
[ 7316.764538] Call Trace:
[ 7316.764551]  [] dump_stack+0x63/0x8e
[ 7316.764560]  [] __warn+0xcf/0xf0
[ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
[ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
[ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
[ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]


This means btrfs_update_inode_fallback() fails.



[ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
[ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]


Scrub code then. Not that familiar though.


[ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
[ 7316.764799]  [] process_one_work+0x1ed/0x490
[ 7316.764806]  [] worker_thread+0x49/0x500
[ 7316.764813]  [] ? process_one_work+0x490/0x490
[ 7316.764820]  [] kthread+0xda/0xf0
[ 7316.764830]  [] ret_from_fork+0x1f/0x40
[ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
[ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
[ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
errno=-95 unknown


-95 is -EOPNOTSUPP.

Not a common errno in btrfs.

Most EOPNOTSUPP errors are related to discard and failed fallocate/drop extents.

So, are you using the discard mount option?


[ 7316.764859] BTRFS info (device sda2): forced readonly
[ 7316.765396] pending csums is 9437184

After seeing this, I decided to attempt a repair (confident that I could
restore from backup if it failed). At the time, I was unaware of the
issues with progs 4.7.1, so when I ran the check and saw all the
incorrect backrefs messages, I figured that was my problem and ran the
--repair. Of course, this didn't make the messages go away on subsequent
checks, so I looked further and found this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=155791

I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
the logs from these, unfortunately). The repair seemed to work (I also
used --init-extent-tree), as current checks don't report any errors.


Personally I trust btrfsck quite a lot, as it's based on the many errors we
have encountered, and it's much easier to write code that exposes problems.


Unless there is something wrong that we have never seen before, at least your
on-disk metadata should be OK.




The system boots and mounts the FS just 

Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 07:27:58PM -0700, Liu Bo wrote:
> Interesting, seems that we get errors from 
> 
> btrfs_finish_ordered_io
>   insert_reserved_file_extent
>     __btrfs_drop_extents
> 
> And splitting an inline extent throws -95.

Heh, you beat me to the draw. I was just coming to the same conclusion
myself from poking at the source code. What's interesting is that it
seems to be a quite explicit thing:

	if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
		ret = -EOPNOTSUPP;
		break;
	}

So now the question is why is this happening? Clearly the presence of
inline extents isn't an issue by itself, since another one of my btrfs
/home partitions has plenty of them.
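
To make the failure mode concrete, here is a toy Python model of the rule described above: dropping a byte range that only partly overlaps an inline extent would require splitting it, and that case is rejected with -EOPNOTSUPP. This is a deliberately simplified illustration and not the kernel's __btrfs_drop_extents; the Extent type and the drop_range function are hypothetical, and the real code handles many more cases.

```python
from dataclasses import dataclass

EOPNOTSUPP = 95  # Linux errno value, matching the -95 in the trace


@dataclass
class Extent:
    start: int    # file offset where the extent begins
    length: int   # number of bytes covered
    inline: bool  # True if the data is stored inline in the metadata


def drop_range(extents, start, end):
    """Drop the byte range [start, end) from a list of extents.

    Returns (0, remaining_extents) on success, or (-EOPNOTSUPP, extents)
    if the range would split an inline extent.
    """
    remaining = []
    for ext in extents:
        ext_end = ext.start + ext.length
        if ext_end <= start or ext.start >= end:
            remaining.append(ext)        # no overlap, keep unchanged
            continue
        if start <= ext.start and end >= ext_end:
            continue                     # fully covered, drop the whole extent
        if ext.inline:
            return -EOPNOTSUPP, extents  # partial overlap: cannot split
        # Regular extent: keep the pieces outside the dropped range.
        if ext.start < start:
            remaining.append(Extent(ext.start, start - ext.start, False))
        if ext_end > end:
            remaining.append(Extent(end, ext_end - end, False))
    return 0, remaining


inline_file = [Extent(0, 180, inline=True)]
print(drop_range(inline_file, 0, 4096)[0])  # whole extent covered: 0
print(drop_range(inline_file, 0, 100)[0])   # partial overlap: -95
```

In this model a write that replaces the whole inline extent is fine, which fits the observation that inline extents alone are not a problem; only a range that cuts through one fails.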

I added some debug prints to my kernel to catch the inode that tripped
the error. Here's the relevant chunk (with filenames scrubbed) from
btrfs-debug-tree:

Inode 140345 triggered the transaction abort.

leaf 175131459584 items 51 free space 7227 generation 118521 owner 5
fs uuid 1d9ee7c7-d13a-4c3c-b730-256c70841c5b
chunk uuid b67a1a82-ff22-48b5-af1b-9d5f85ebee25
item 0 key (140343 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 1 transid 1 size 180 nbytes 0
block group 0 mode 40755 links 1 uid 1000 gid 1000
rdev 0 flags 0x0(none)
item 1 key (140343 INODE_REF 131327) itemoff 16107 itemsize 16
inode ref index 199 namelen 6 name: 
item 2 key (140343 DIR_ITEM 1073386496) itemoff 16072 itemsize 35
location key (142600 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 3 key (140343 DIR_ITEM 1148422723) itemoff 16037 itemsize 35
location key (142601 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 4 key (140343 DIR_ITEM 2415965623) itemoff 16004 itemsize 33
location key (131550 INODE_ITEM 0) type SYMLINK
namelen 3 datalen 0 name: 
item 5 key (140343 DIR_ITEM 2448077466) itemoff 15965 itemsize 39
location key (140565 INODE_ITEM 0) type FILE
namelen 9 datalen 0 name: 
item 6 key (140343 DIR_ITEM 2566671093) itemoff 15930 itemsize 35
location key (140564 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 7 key (140343 DIR_ITEM 3391512089) itemoff 15873 itemsize 57
location key (142599 INODE_ITEM 0) type FILE
namelen 27 datalen 0 name: 
item 8 key (140343 DIR_ITEM 3621719155) itemoff 15838 itemsize 35
location key (131627 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 9 key (140343 DIR_ITEM 3701680574) itemoff 15798 itemsize 40
location key (142603 INODE_ITEM 0) type FIFO
namelen 10 datalen 0 name: 
item 10 key (140343 DIR_ITEM 3816117430) itemoff 15763 itemsize 35
location key (140563 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 11 key (140343 DIR_ITEM 4214885080) itemoff 15729 itemsize 34
location key (131544 INODE_ITEM 0) type SYMLINK
namelen 4 datalen 0 name: 
item 12 key (140343 DIR_ITEM 4253409616) itemoff 15687 itemsize 42
location key (140352 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: 
item 13 key (140343 DIR_INDEX 2) itemoff 15653 itemsize 34
location key (131544 INODE_ITEM 0) type SYMLINK
namelen 4 datalen 0 name: 
item 14 key (140343 DIR_INDEX 3) itemoff 15620 itemsize 33
location key (131550 INODE_ITEM 0) type SYMLINK
namelen 3 datalen 0 name: 
item 15 key (140343 DIR_INDEX 4) itemoff 15585 itemsize 35
location key (131627 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 16 key (140343 DIR_INDEX 5) itemoff 15543 itemsize 42
location key (140352 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: 
item 17 key (140343 DIR_INDEX 6) itemoff 15508 itemsize 35
location key (140563 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 18 key (140343 DIR_INDEX 7) itemoff 15473 itemsize 35
location key (140564 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 19 key (140343 DIR_INDEX 8) itemoff 15434 itemsize 39
location key (140565 INODE_ITEM 0) type FILE
namelen 9 datalen 0 name: 
item 20 key (140343 DIR_INDEX 9) itemoff 15377 itemsize 57
location key (142599 INODE_ITEM 0) type FILE
namelen 27 datalen 0 name: 
item 21 key (140343 DIR_INDEX 10) itemoff 15342 itemsize 35
location key (142600 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 22 key (140343 DIR_INDEX 

Re: Post ext3 conversion problems

2016-09-16 Thread Liu Bo
On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
> 
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
> 
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
> btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
> ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic 
> iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common 
> ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse 
> input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 
> snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore 
> shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm 
> sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop 
> sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw 
> atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt 
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O   
>  4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 
>11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513]  0286 6101f47d 8800230dbc78 
> 812f0215
> [ 7316.764522]  8800230dbcc8  8800230dbcb8 
> 8107ae6f
> [ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
> 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551]  [] dump_stack+0x63/0x8e
> [ 7316.764560]  [] __warn+0xcf/0xf0
> [ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 
> [btrfs]
> [ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 
> [btrfs]
> [ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799]  [] process_one_work+0x1ed/0x490
> [ 7316.764806]  [] worker_thread+0x49/0x500
> [ 7316.764813]  [] ? process_one_work+0x490/0x490
> [ 7316.764820]  [] kthread+0xda/0xf0
> [ 7316.764830]  [] ret_from_fork+0x1f/0x40
> [ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
> errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
> 
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
> 
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
> 
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
> 
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting, seems that we get errors from 

btrfs_finish_ordered_io
  insert_reserved_file_extent

Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 05:45:59PM -0600, Chris Murphy wrote:
> On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
>  wrote:
> 
> > In the meantime, is there any way to make the kernel more verbose about
> > btrfs errors? It would be nice to see, for example, what was in the
> > transaction that failed, or at least what files / metadata it was
> > touching.
> 
> No idea. Maybe one of the compile time options:
> 
> 
> CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
> This also requires mount options, either check_int or check_int_data
> CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> CONFIG_BTRFS_DEBUG=y
> https://patchwork.kernel.org/patch/846462/
> CONFIG_BTRFS_ASSERT=y
> 
> Actually, even before that maybe if you did a 'btrfs-debug-tree /dev/sdX'
> 
> That might explode in the vicinity of the problem. Thing is, btrfs
> check doesn't see anything wrong with the metadata, so chances are
> debug-tree won't either.

Hmm, I'll probably have a go at compiling the latest mainline kernel
with CONFIG_BTRFS_DEBUG enabled. It certainly can't hurt to try.

And as you suspected, btrfs-debug-tree didn't explode / error out on me.
I didn't thoroughly inspect the output (as I have very little
understanding of the btrfs internals), but it all seemed OK.

--Sean



Re: Post ext3 conversion problems

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
 wrote:

> In the meantime, is there any way to make the kernel more verbose about
> btrfs errors? It would be nice to see, for example, what was in the
> transaction that failed, or at least what files / metadata it was
> touching.

No idea. Maybe one of the compile time options:


CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
This also requires mount options, either check_int or check_int_data
CONFIG_BTRFS_FS_RUN_SANITY_TESTS
CONFIG_BTRFS_DEBUG=y
https://patchwork.kernel.org/patch/846462/
CONFIG_BTRFS_ASSERT=y

Actually, even before that maybe if you did a 'btrfs-debug-tree /dev/sdX'

That might explode in the vicinity of the problem. Thing is, btrfs
check doesn't see anything wrong with the metadata, so chances are
debug-tree won't either.
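
Whether a given kernel was built with the options listed above can be checked by scanning its config text (for example the contents of /proc/config.gz, when the kernel provides it). A small Python sketch follows; the function name and the sample text are illustrative only, and the option list is simply the one from this message.

```python
BTRFS_DEBUG_OPTIONS = (
    "CONFIG_BTRFS_FS_CHECK_INTEGRITY",
    "CONFIG_BTRFS_FS_RUN_SANITY_TESTS",
    "CONFIG_BTRFS_DEBUG",
    "CONFIG_BTRFS_ASSERT",
)


def btrfs_debug_status(config_text):
    """Map each option to its value ('y', 'm', ...) or None if it is not set."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as '# CONFIG_FOO is not set'
        name, value = line.split("=", 1)
        values[name] = value
    return {opt: values.get(opt) for opt in BTRFS_DEBUG_OPTIONS}


# Made-up sample; a real config would come from the running kernel.
sample = """\
CONFIG_BTRFS_FS=m
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
CONFIG_BTRFS_DEBUG=y
"""
for option, value in btrfs_debug_status(sample).items():
    print(f"{option}: {value or 'not set'}")
```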

-- 
Chris Murphy


Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 02:23:44PM -0600, Chris Murphy wrote:
> Not a mess, I think it's a good bug report. I think Qu and David know
> more about the latest iteration of the convert code. If you can wait
> until next week at least to see if they have questions that'd be best.
> If you need to get access to the computer sooner than later I suggest
> btrfs-image -c9 -t4 -s to make a filename sanitized copy of the
> filesystem metadata for them to look at, just in case. They might be
> able to figure out the problem just from the stack trace, but better
> to have the image before blowing away the file system, just in case
> they want it.

I can hang on to the system in its current state, I don't particularly
need this machine fully operational.

Just to be proactive, I ran the btrfs-image as follows:

btrfs-image -c9 -t4 -s -w /dev/sda2 dumpfile

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

In the meantime, is there any way to make the kernel more verbose about
btrfs errors? It would be nice to see, for example, what was in the
transaction that failed, or at least what files / metadata it was
touching.

--Sean



Re: Post ext3 conversion problems

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade
 wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
>
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
> btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
> ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic 
> iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common 
> ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse 
> input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 
> snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore 
> shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm 
> sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop 
> sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw 
> atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt 
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O   
>  4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 
>11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513]  0286 6101f47d 8800230dbc78 
> 812f0215
> [ 7316.764522]  8800230dbcc8  8800230dbcb8 
> 8107ae6f
> [ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
> 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551]  [] dump_stack+0x63/0x8e
> [ 7316.764560]  [] __warn+0xcf/0xf0
> [ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 
> [btrfs]
> [ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 
> [btrfs]
> [ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799]  [] process_one_work+0x1ed/0x490
> [ 7316.764806]  [] worker_thread+0x49/0x500
> [ 7316.764813]  [] ? process_one_work+0x490/0x490
> [ 7316.764820]  [] kthread+0xda/0xf0
> [ 7316.764830]  [] ret_from_fork+0x1f/0x40
> [ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
> errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?


Not a mess, I think it's a good bug report. I think Qu and David know
more about the latest 

Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
Hi, all. I've been playing around with an old laptop of mine, and I
figured I'd use it as a learning / bugfinding opportunity. Its /home
partition was originally ext3. I have a full partition image of this
drive as a backup, so I can do (and have done) potentially destructive
things. The system disk is a ~6 year old SSD.

To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
and ran a simple btrfs-convert on it. After patching up the fstab and
rebooting, everything seemed fine. I deleted the recovery subvol, ran a
full balance, ran a full defrag, and rebooted again. I then decided to
try (as an experiment) using DUP mode for data and metadata. I ran that
balance without issue, then started using the machine. Sometime later, I
got the following remount ro:

[ 7316.764235] [ cut here ]
[ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764297] BTRFS: Transaction aborted (error -95)
[ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs 
iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 
snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp 
snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core 
snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi 
thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) 
vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables 
x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci 
uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
[ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
[ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O
4.7.3-5-ck #1
[ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903   
 11/08/2010
[ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7316.764513]  0286 6101f47d 8800230dbc78 
812f0215
[ 7316.764522]  8800230dbcc8  8800230dbcb8 
8107ae6f
[ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
880014101d40
[ 7316.764538] Call Trace:
[ 7316.764551]  [] dump_stack+0x63/0x8e
[ 7316.764560]  [] __warn+0xcf/0xf0
[ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
[ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
[ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
[ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
[ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
[ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
[ 7316.764799]  [] process_one_work+0x1ed/0x490
[ 7316.764806]  [] worker_thread+0x49/0x500
[ 7316.764813]  [] ? process_one_work+0x490/0x490
[ 7316.764820]  [] kthread+0xda/0xf0
[ 7316.764830]  [] ret_from_fork+0x1f/0x40
[ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
[ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
[ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
errno=-95 unknown
[ 7316.764859] BTRFS info (device sda2): forced readonly
[ 7316.765396] pending csums is 9437184

After seeing this, I decided to attempt a repair (confident that I could
restore from backup if it failed). At the time, I was unaware of the
issues with progs 4.7.1, so when I ran the check and saw all the
incorrect backrefs messages, I figured that was my problem and ran the
--repair. Of course, this didn't make the messages go away on subsequent
checks, so I looked further and found this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=155791

I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
the logs from these, unfortunately). The repair seemed to work (I also
used --init-extent-tree), as current checks don't report any errors.

The system boots and mounts the FS just fine. I can read from it all
day, scrubs complete without failure, but just using the system for a
while will eventually trigger the same "Transaction aborted (error -95)"
error.
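
For reference, the number in that message is a negated Linux errno. Below is a minimal Python sketch for pulling it out of a log line and naming it; the lookup table is a tiny hand-written subset of Linux errno values (only a few entries, for illustration), and the function name is hypothetical.

```python
import re

# Small hand-picked subset of Linux errno values, for illustration only.
LINUX_ERRNO_NAMES = {
    5: "EIO",
    12: "ENOMEM",
    28: "ENOSPC",
    30: "EROFS",
    95: "EOPNOTSUPP",
}

ABORT_RE = re.compile(r"Transaction aborted \(error (-?\d+)\)")


def decode_abort(line):
    """Return (errno_number, name) from a 'Transaction aborted' line, or None."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    number = abs(int(match.group(1)))
    return number, LINUX_ERRNO_NAMES.get(number, "unknown")


print(decode_abort("[ 7316.764297] BTRFS: Transaction aborted (error -95)"))
```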

I realize this is something of a mess, and that I was less than
methodical with my actions so far. Given that I have a full backup that
can be restored if need be (and I certainly could try running the
convert again), what is my best course of action?

Thanks,

--Sean
