On Sun, 12 Jun 2016, Yaroslav Halchenko wrote:

> On Fri, 10 Jun 2016, Chris Murphy wrote:
> > > Are those issues something which was fixed since 4.6.0-rc4+ or I should
> > > be on look out for them to come back?  What other information should I
> > > provide if I run into them again to help you troubleshoot/fix it?
> > >
> > > P.S. Please CC me the replies
>
> > 4.6.2 is current and it's a lot easier to just use that and see if it
> > still happens than for someone to track down whether it's been fixed
> > since a six week old RC.
>
> Dear Chris,
>
> Thank you for the reply!  Now running v4.7-rc2-300-g3d0f0b6
>
> The thing is that this issue doesn't happen right away, and it takes a
> while for it to develop, and seems to be only after an intensive load.
> So the version I run will always be "X weeks old" if I just keep hopping
> the recent release of master, and it would be an indefinite goose
> chase if left un-analyzed.  That is why I would still appreciate an
> advice on what specifics to report/attempt if such crash happens next
> time, or may be if someone is having an idea of what could have lead to
> this crash to start with.

The beast died on me this morning :-/  The last kern.log message was
"Fixing recursive fault but reboot is needed!".  One of the tracebacks is
the same as before (ending in btrfs_commit_transaction), so I guess it
could be the same issue as before?

Most probably I will go through the same kernel build/upgrade dance again,
BUT I still hope that someone might either spot the sign of an issue fixed
recently (since v4.7-rc2-300-g3d0f0b6) or, if nothing is spotted, actually
look in detail at what could be a new issue which hasn't been addressed
yet.  I would be "happy" to provide more information or enable any
additional monitoring needed to gather more details on the next crash (a
rough sketch of what I could set up is included below, after my
questions).

I rebooted the box around 11am; it had been completely unresponsive for
some time before that, but I think it still "somewhat functioned" after
the last traceback reported in the kern.log, which I shared at
http://www.onerussian.com/tmp/kern-smaug-20160809.log -- otherwise
journalctl -b -1 doesn't show any other grave errors.  The very last oops
from the kern.log is also cited below.

Out of academic interest: why does ext4 functionality appear within the
stack for btrfs_commit_transaction?  Is some logic common/reused between
the two file systems, or is it merely that some partitions are on ext4 and
something in btrfs triggered them as well?
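In case it helps, here is roughly what I could set up before the next
occurrence so that the final console output survives even if the local
filesystems are already wedged.  This is only an untested sketch on my
side; the receiver IP/MAC/interface, ports and the mount point below are
placeholders, not anything specific to this box:

  # Panic (and auto-reboot) instead of limping along after an oops,
  # so the box does not sit half-dead for hours:
  sysctl -w kernel.panic_on_oops=1
  sysctl -w kernel.panic=30              # reboot 30s after a panic

  # Stream kernel messages over UDP to another host, so the last oops
  # is captured even when local logging can no longer reach the disk
  # (192.168.1.2/eth0 would be this box, 192.168.1.1 with the given
  # MAC the receiving log host -- all placeholders):
  modprobe netconsole \
      netconsole=6665@192.168.1.2/eth0,6666@192.168.1.1/00:11:22:33:44:55
  # ... and on the receiving host, something like:  nc -l -u -p 6666

  # Also keep an eye on btrfs' own error counters for the affected fs:
  btrfs device stats /mnt/btrfs

A full kdump/crash setup would of course give a complete vmcore, but
netconsole seems like a much smaller hammer to start with.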
Aug 9 07:46:15 smaug kernel: [5132590.362689] Oops: 0000 [#3] SMP
Aug 9 07:46:15 smaug kernel: [5132590.367913] Modules linked in: uas usb_storage vboxdrv(O) nls_utf8 ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs veth xt_addrtype ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc cpufreq_stats cpufreq_userspace cpufreq_conservative cpufreq_powersave xt_pkttype nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT nf_reject_ipv4 iptable_mangle xt_multiport xt_state xt_limit xt_conntrack nfsd nf_conntrack_ftp auth_rpcgss oid_registry nfs_acl nfs lockd grace nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables fscache sunrpc binfmt_misc intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_watchdog ipmi_poweroff ipmi_devintf kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass fuse crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper cryptd snd_timer snd soundcore pcspkr evdev joydev ast ttm drm_kms_helper i2c_i801 drm i2c_algo_bit mei_me lpc_ich mfd_core mei ipmi_si ioatdma shpchp wmi ipmi_msghandler ecryptfs cbc tpm_tis tpm acpi_power_meter acpi_pad button sha256_ssse3 sha256_generic hmac encrypted_keys autofs4 ext4 crc16 jbd2 mbcache btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 md_mod ses enclosure sg sd_mod hid_generic usbhid hid crc32c_intel mpt3sas raid_class scsi_transport_sas xhci_pci xhci_hcd ehci_pci ahci ehci_hcd libahci libata usbcore ixgbe scsi_mod usb_common dca ptp pps_core mdio fjes
Aug 9 07:46:15 smaug kernel: [5132590.538375] CPU: 6 PID: 2878531 Comm: git Tainted: G D W IO 4.7.0-rc2+ #1
Aug 9 07:46:15 smaug kernel: [5132590.547950] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
Aug 9 07:46:15 smaug kernel: [5132590.557009] task: ffff8817b855b0c0 ti: ffff88000e0dc000 task.ti: ffff88000e0dc000
Aug 9 07:46:15 smaug kernel: [5132590.566572] RIP: 0010:[<ffffffffa0444be3>]  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug 9 07:46:15 smaug kernel: [5132590.578009] RSP: 0018:ffff88000e0df8f0  EFLAGS: 00010282
Aug 9 07:46:15 smaug kernel: [5132590.585427] RAX: ffff88155eae8140 RBX: ffff881ed5a9d128 RCX: 0000000002400040
Aug 9 07:46:15 smaug kernel: [5132590.594678] RDX: 00000000000fd0e4 RSI: 0000000000000002 RDI: ffff882034d0f000
Aug 9 07:46:15 smaug kernel: [5132590.603929] RBP: ffff882034d0f000 R08: 0000000000000001 R09: 0000000000001569
Aug 9 07:46:15 smaug kernel: [5132590.613264] R10: 00000000107aa8b7 R11: fffffffffffffff0 R12: ffff881ed5a9d128
Aug 9 07:46:15 smaug kernel: [5132590.622566] R13: ffff882033909000 R14: ffff881816302a00 R15: ffff881ed5a9d128
Aug 9 07:46:15 smaug kernel: [5132590.631846] FS:  0000000000000000(0000) GS:ffff88207fc80000(0000) knlGS:0000000000000000
Aug 9 07:46:15 smaug kernel: [5132590.642060] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 9 07:46:15 smaug kernel: [5132590.649898] CR2: 00000000000fd0e4 CR3: 0000000001a06000 CR4: 00000000001406e0
Aug 9 07:46:15 smaug kernel: [5132590.659130] Stack:
Aug 9 07:46:15 smaug kernel: [5132590.663228]  ffffffffa049cc54 0000156902020200 ffff881ed5a9d128 0000000000000801
Aug 9 07:46:15 smaug kernel: [5132590.672811]  ffff881ed5a9d128 ffff882033909000 ffff881816302a00 ffff881ed5a9d128
Aug 9 07:46:15 smaug kernel: [5132590.682392]  ffffffffa0470b9d ffff881ed5a9d128 0000000000000801 ffffffff8121fe67
Aug 9 07:46:15 smaug kernel: [5132590.691981] Call Trace:
Aug 9 07:46:15 smaug kernel: [5132590.696597]  [<ffffffffa049cc54>] ? __ext4_journal_start_sb+0x34/0xf0 [ext4]
Aug 9 07:46:15 smaug kernel: [5132590.705791]  [<ffffffffa0470b9d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
Aug 9 07:46:15 smaug kernel: [5132590.714340]  [<ffffffff8121fe67>] ? __mark_inode_dirty+0x177/0x360
Aug 9 07:46:15 smaug kernel: [5132590.722623]  [<ffffffff8120e389>] ? generic_update_time+0x79/0xd0
Aug 9 07:46:15 smaug kernel: [5132590.730814]  [<ffffffff8120da8d>] ? file_update_time+0xbd/0x110
Aug 9 07:46:15 smaug kernel: [5132590.738845]  [<ffffffff81175f69>] ? __generic_file_write_iter+0x99/0x1e0
Aug 9 07:46:15 smaug kernel: [5132590.747708]  [<ffffffffa04631b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4]
Aug 9 07:46:15 smaug kernel: [5132590.756756]  [<ffffffff811f170b>] ? __vfs_write+0xeb/0x160
Aug 9 07:46:15 smaug kernel: [5132590.764301]  [<ffffffff811f2103>] ? __kernel_write+0x53/0x100
Aug 9 07:46:15 smaug kernel: [5132590.772081]  [<ffffffff810ff672>] ? do_acct_process+0x462/0x4e0
Aug 9 07:46:15 smaug kernel: [5132590.780035]  [<ffffffff810ffd4c>] ? acct_process+0xdc/0x100
Aug 9 07:46:15 smaug kernel: [5132590.787648]  [<ffffffff8107e403>] ? do_exit+0x7f3/0xb80
Aug 9 07:46:15 smaug kernel: [5132590.794894]  [<ffffffff8102fa5c>] ? oops_end+0x9c/0xd0
Aug 9 07:46:15 smaug kernel: [5132590.802027]  [<ffffffff81062d35>] ? no_context+0x135/0x390
Aug 9 07:46:15 smaug kernel: [5132590.809496]  [<ffffffff815ca1f8>] ? page_fault+0x28/0x30
Aug 9 07:46:15 smaug kernel: [5132590.816808]  [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
Aug 9 07:46:15 smaug kernel: [5132590.826213]  [<ffffffff810ba590>] ? wait_woken+0x90/0x90
Aug 9 07:46:15 smaug kernel: [5132590.833501]  [<ffffffffa039a11b>] ? btrfs_sync_file+0x2fb/0x3e0 [btrfs]
Aug 9 07:46:15 smaug kernel: [5132590.842074]  [<ffffffff81225318>] ? do_fsync+0x38/0x60
Aug 9 07:46:15 smaug kernel: [5132590.849114]  [<ffffffff8122558c>] ? SyS_fsync+0xc/0x10
Aug 9 07:46:15 smaug kernel: [5132590.856096]  [<ffffffff815c81f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
Aug 9 07:46:15 smaug kernel: [5132590.864522] Code: 56 41 55 41 54 55 53 48 89 fd 65 48 8b 04 25 c0 d4 00 00 48 83 ec 10 48 85 ff 48 8b 80 90 06 00 00 74 20 48 85 c0 74 33 48 8b 10 <48> 3b 3a 75 29 83 40 14 01 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e
Aug 9 07:46:15 smaug kernel: [5132590.888065] RIP  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug 9 07:46:15 smaug kernel: [5132590.896830]  RSP <ffff88000e0df8f0>
Aug 9 07:46:15 smaug kernel: [5132590.902039] CR2: 00000000000fd0e4
Aug 9 07:46:15 smaug kernel: [5132590.907032] ---[ end trace 3b9450d000ed06b4 ]---
Aug 9 07:46:15 smaug kernel: [5132590.914612] Fixing recursive fault but reboot is needed!

Thank you very much in advance for any ideas/feedback.  Please CC me the
responses.

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html