New problems keep happening. Now btrfs check --repair is failing with an
assertion error:

check_owner_ref: Assertion `rec->is_root` failed.

I do not know what that means, but I did run the command as root (as
shown in the screen photos). Photos of the errors are here:
http://imgur.com/a/ZwkUR

SUMMARY of the past 2 days:

* 2 days ago, a computer that had not given any prior problems (and is
  only a few months old) had its root fs go read-only under light,
  normal usage (browsing the web).
* I was able to run btrfs rescue zero-log to get the root fs to mount.
* Immediately after rebooting and mounting the fs, I ran btrfs check
  --repair. However, that gave some messages that looked concerning, and
  I posted them here. (The full details are copied lower in this
  message. Screen pictures are here: http://imgur.com/a/fT1RV)
* dmesg showed: BTRFS error (device dm-0): qgroup generation mismatch
* I've been trying to research the issue for the last two days but have
  found absolutely no information. (Maybe I'm asking in the wrong
  places?)
* The system worked normally yesterday (as far as I could tell).
* This morning my system locked up with btrfs-transaction consuming 100%
  CPU and the NMI watchdog reporting BUG: soft lockup with
  btrfs-transaction:314.
* I ran btrfs rescue zero-log. It took two reboots to get the root fs to
  mount.
* The system appeared to be operating normally for about 30 minutes,
  then the root fs went read-only again.
* I ran btrfs rescue zero-log (again!).
* btrfs continues to report errors. I have linked to screen pictures
  (imgur.com).
* btrfs check --repair is giving an error (mentioned above and in the
  screen pictures).

All previous information is copied below in this message for easy
reference.

What do I do? Replace the disk? Stop using btrfs with dm-crypt? Look
elsewhere for help?

On Tue, Aug 9, 2016 at 6:54 PM, Dave T <davestechs...@gmail.com> wrote:
> The original problem from 2 days ago just happened again. I ran btrfs
> rescue zero-log (again) and the root filesystem mounted but it was
> read-only on first boot.
> I rebooted again and everything seems normal.
> But clearly there is a problem that needs to be resolved.
>
> Problem:
> The root file system becomes read-only during normal usage. See copied
> message below for more information about the error.
>
> I'm happy to provide more info upon request. I appreciate any help. Is
> this btrfs? A bad disk? Something else?
>
> Linux x99 4.6.4-1-ARCH #1 SMP PREEMPT Mon Jul 11 19:12:32 CEST 2016
> x86_64 GNU/Linux
>
> btrfs-progs v4.6.1
>
>
> On Tue, Aug 9, 2016 at 2:07 PM, Dave T <davestechs...@gmail.com> wrote:
>> My system locked up with btrfs-transaction consuming 100% CPU and NMI
>> watchdog reporting BUG: soft lockup with btrfs-transaction:314.
>>
>> This comes 2 days after a serious event involving BTRFS where my
>> system would not mount the root fs. (I gave details in an email to the
>> list two days ago and copied again below.)
>>
>> Here are full details of today's "bug" (or whatever it was).
>>
>> When I left work last night I left my system running and I locked the
>> session. The only things open were KDE Plasma, some terminal windows
>> and some plain text documents in Kate editor. No real work was running
>> on the local machine.
>>
>> This morning I came to work and noticed that my computer was slightly
>> warm and the fans were running at higher than normal RPM.
>>
>> I logged in and opened top in an existing terminal. I saw that
>> btrfs-transaction was consuming 100% of a CPU core and kworker was
>> consuming 100% of another CPU core.
>>
>> I tried to run a command (to view logs) in another terminal window,
>> but the system became unresponsive. I was able to switch to another
>> virtual console, but it was very slow. I took photos with my phone.
>> See link below for two images (top and virtual console):
>>
>> http://imgur.com/a/fT1RV
>>
>> These photos show what I reported above:
>> * btrfs-transaction consuming 100% CPU
>> * NMI watchdog reporting BUG: soft lockup with btrfs-transaction:314
>>
>> I hard reset my system, expecting the worst, but it rebooted normally.
>> journalctl -xb -p3 showed no entries.
>>
>> Obviously I have a serious problem. However, I have no clue about what
>> the problem might be (except that it seemingly involves btrfs). What
>> other information can I provide?
>>
>> On Sun, Aug 7, 2016 at 6:44 PM, Dave <davestechs...@gmail.com> wrote:
>>> I am running Arch Linux on a system with full disk encryption and the
>>> storage is a Samsung 950 Pro NVMe drive (512 GB). The computer is a
>>> couple of months old. No bad behavior until now. (I'm only using 21 GB
>>> of the 512 GB on the disk.)
>>>
>>> btrfs-progs v4.5.1
>>>
>>> Today I was using my system normally and browsing the web. Firefox
>>> stopped responding suddenly and for no apparent reason. Then (KDE)
>>> Plasma stopped responding. I could not log out of KDE.
>>>
>>> I killed my user session (pkill -u me), then I tried to run startx. At
>>> that point I noticed my root filesystem was read-only.
>>>
>>> As a first step, I rebooted. That didn't help anything. I tried
>>> rebooting several more times -- no change.
>>>
>>> The root filesystem (btrfs) would not mount. (See error below.) I
>>> booted into a LiveUSB environment and ran this command:
>>>
>>> cryptsetup open --type luks /dev/xxx cryptroot
>>>
>>> It opens.
>>> Then I ran:
>>>
>>> mount -t btrfs -o noatime,nodiratime,ssd,compress=lzo,defaults,space_cache,subvolid=257 /dev/mapper/cryptroot /mnt
>>>
>>> The error message is shown here:
>>>
>>> [ 2300.967048] BTRFS info (device dm-0): use ssd allocation scheme
>>> [ 2300.967058] BTRFS info (device dm-0): use lzo compression
>>> [ 2300.967066] BTRFS info (device dm-0): disk space caching is enabled
>>> [ 2300.967069] BTRFS: has skinny extents
>>> [ 2300.995393] BTRFS: error (device dm-0) in btrfs_replay_log:2413: errno=-22 unknown (Failed to recover log tree)
>>> [ 2300.997617] BTRFS info (device dm-0): delayed_refs has NO entry
>>> [ 2300.997673] BTRFS error (device dm-0): cleaner transaction attach returned -30
>>> [ 2301.035405] BTRFS: open_ctree failed
>>>
>>> It is exactly the same error I saw when trying to boot normally as
>>> mentioned above.
>>>
>>> Based on these two links:
>>>
>>>> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
>>>> https://btrfs.wiki.kernel.org/index.php/Btrfs-zero-log
>>>
>>> I decided to take a chance on running this command:
>>>
>>> btrfs rescue zero-log
>>>
>>> That worked and I can mount the filesystem.
>>>
>>> I ran btrfs check --repair.
>>> Here is the output:
>>>
>>> root@broken / # umount /mnt
>>> root@broken / # btrfs check --repair /dev/mapper/cryptroot
>>> enabling repair mode
>>> Checking filesystem on /dev/mapper/cryptroot
>>> checking extents
>>> bad metadata [292414476288, 292414492672) crossing stripe boundary
>>> bad metadata [292414541824, 292414558208) crossing stripe boundary
>>> bad metadata [292414672896, 292414689280) crossing stripe boundary
>>> bad metadata [292414869504, 292414885888) crossing stripe boundary
>>> bad metadata [292415000576, 292415016960) crossing stripe boundary
>>> bad metadata [292415066112, 292415082496) crossing stripe boundary
>>> bad metadata [292415131648, 292415148032) crossing stripe boundary
>>> bad metadata [292415262720, 292415279104) crossing stripe boundary
>>> bad metadata [292415328256, 292415344640) crossing stripe boundary
>>> bad metadata [292415393792, 292415410176) crossing stripe boundary
>>> repaired damaged extent references
>>> Fixed 0 roots.
>>> checking free space cache
>>> cache and super generation don't match, space cache will be invalidated
>>> checking fs roots
>>> checking csums
>>> checking root refs
>>> checking quota groups
>>> Ignoring qgroup relation key 258
>>> Ignoring qgroup relation key 263
>>> Ignoring qgroup relation key 71776119061217538
>>> Ignoring qgroup relation key 71776119061217543
>>> Counts for qgroup id: 257 are different
>>> our: referenced 10412273664 referenced compressed 10412273664
>>> disk: referenced 10411311104 referenced compressed 10411311104
>>> diff: referenced 962560 referenced compressed 962560
>>> our: exclusive 10412273664 exclusive compressed 10412273664
>>> disk: exclusive 10412273664 exclusive compressed 10412273664
>>> found 21570773057 bytes used err is 0
>>> total csum bytes: 19563456
>>> total tree bytes: 403767296
>>> total fs tree bytes: 349667328
>>> total extent tree bytes: 27328512
>>> btree space waste bytes: 66313360
>>> file data blocks allocated: 39882014720
>>>  referenced 28043988992
>>> extent buffer leak: start 20987904 len 16384
>>> extent buffer leak: start 292688068608 len 16384
>>> extent buffer leak: start 60915712 len 16384
>>> extent buffer leak: start 29569581056 len 16384
>>> extent buffer leak: start 29569597440 len 16384
>>> extent buffer leak: start 292412063744 len 16384
>>> extent buffer leak: start 292405870592 len 16384
>>> extent buffer leak: start 292405936128 len 16384
>>> extent buffer leak: start 292413964288 len 16384
>>>
>>> Then I checked dmesg and saw this error information:
>>>
>>> [ 4925.562422] BTRFS info (device dm-0): use ssd allocation scheme
>>> [ 4925.562432] BTRFS info (device dm-0): use lzo compression
>>> [ 4925.562440] BTRFS info (device dm-0): disk space caching is enabled
>>> [ 4925.562444] BTRFS: has skinny extents
>>> [ 4925.578705] BTRFS error (device dm-0): qgroup generation mismatch, marked as inconsistent
>>> [ 4925.584033] BTRFS: checking UUID tree
>>>
>>> What should I do next? I'm a simple user.
>>>
>>> I already ran memtest86+ overnight using 8 CPU cores in parallel (so
>>> it was a very thorough memory test). There were 0 RAM errors.
>>>
>>> I have used btrfs since 2012 with no issues. I am concerned
>>> about the present issue because I do not understand the cause.