Re: BTRFS Balance Hard System Crash (Blinking LEDs)
An update... I encountered blinking LEDs while I was away from my computer again. I'm now pretty confident it wasn't an issue with btrfs balance, but rather the sd-card not being seated well.

I just updated my old "F2FS Segmentation fault" post in linux-f2fs-devel. In short, fsck for f2fs was failing and badblocks was coming up with only errors. I cleaned the sd-card contacts, put it back in, and badblocks is running cleanly now.

It's just too late for me, and I have to rebuild that partition since, for whatever reason, cryptsetup no longer recognizes the partition as being LUKS even though badblocks was run in non-destructive mode.

On Fri, Mar 26, 2021 at 11:51 AM Nathan Royce wrote:
>
> Oh man, I'm hoping things aren't starting to fall apart here.
> I was doing my normal routine (tv, browsing, ... (no filesystem
> manipulations)) and out of the blue "kodi" just crashes. It's actually
> not all that uncommon, and I fired up "iotop" to make sure "coredump"
> was happening, and it was.
> I then did something else in the terminal, maybe an "ls", and that came up
> with:
> *
> error while loading shared libraries: /usr/lib/libutil.so.1: ELF file
> version does not match current one
> *
...
Re: F2FS Segmentation Fault
I don't know how much of it was the issue, but when I unmounted the sd-card, closed the cryptsetup mapping for it, and then ran non-destructive badblocks on it, I was getting ONLY errors. I stopped bb, pulled out the card, blew on it, wiped down the contacts with rubbing alcohol, let it dry, and put it back in. Now bb is running cleanly.

I then stopped bb and tried to cryptsetup-open it, and it said the partition is not a valid LUKS device. Weird, since I was using non-destructive mode. Looks like I'm now forced to rebuild that partition.

I wish I had troubleshot whether the sd-card was properly seated. I know I've experienced something similar in the past, where files suddenly couldn't be read; once I reseated the sd-card, everything was fine. The last time I even had to remove the card was maybe 1-2 weeks ago, when I had to deal with a noisy power-supply fan.

The whole debacle (including btrfs, keyboard leds blinking) may very well have been from the sd-card not being seated well.

On Sat, Mar 27, 2021 at 7:02 AM Nathan Royce wrote:
>
> An update, not quite 1 year later. I encountered another segfault issue.
>
> It began with my email report to the linux-btrfs mailing list titled
> "BTRFS Balance Hard System Crash (Blinking LEDs)" just the other day.
...
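For anyone hitting the same "not a valid LUKS device" message: "cryptsetup isLuks <device>" is the proper check, but as a rough sketch, a LUKS header normally starts with the six magic bytes "LUKS" 0xBA 0xBE, so a quick look at the start of the partition (or of an image of it) shows whether that signature is still there. The helper name has_luks_magic is made up for illustration, and it only inspects the primary header at offset 0; it says nothing about whether the rest of the header is intact.

```shell
# Hypothetical helper (not part of cryptsetup): succeed if the file or
# block device given as $1 begins with the LUKS magic "LUKS" 0xBA 0xBE.
has_luks_magic() {
    # Read the first 6 bytes and render them as a plain hex string.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4c554b53babe" ]
}

# Example (reading a real partition needs root; /dev/sdX2 is a placeholder):
# has_luks_magic /dev/sdX2 && echo "LUKS signature present"
```

If the signature is missing, the start of the partition has been overwritten or is being misread, which would fit the card not making proper contact.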
Re: F2FS Segmentation Fault
5d9] i_addr[42] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x1bb20aba] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5d9] i_addr[43] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x164914cd] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5d9] i_addr[44] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x18432b76] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5d9] i_addr[45] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0xcfefd9c5] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e484] i_addr[366] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0xd672fbb7] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e484] i_addr[367] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0xa113bab3] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e484] i_addr[368] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x1af84de0] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e484] i_addr[369] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x147f77a5] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5b0] i_addr[30] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0xb8fb4384] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5b0] i_addr[31] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x7dc7364] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5b0] i_addr[32] = 0 [ASSERT] (fsck_chk_data_blk:1555) --> blkaddress is not valid. [0x20350042] [FIX] (fsck_chk_inode_blk: 788) --> [0x9e5b0] i_addr[33] = 0 Segmentation fault $ sudo fsck.f2fs -af /dev/mapper/lukssdi2 Info: Fix the reported corruption. 
Info: Force to fix corruption Info: Segments per section = 1 Info: Sections per zone = 1 Info: sector size = 512 Info: total sectors = 124168159 (60628 MB) Can't find a valid F2FS superblock at 0x0 Mismatch segment0(3096048428) cp_blkaddr(24874649) Can't find a valid F2FS superblock at 0x1 * journal: * Mar 27 06:22:07 neon kernel: Code: 41 f6 c1 04 75 53 4d 85 c9 74 0b 0f b6 0f 88 0a 41 f6 c1 02 75 53 42 c6 04 0a 00 c3 0f 1f 44 00 00 48 8b 0f 4c 89 c6 48 89 0a <4a> 8b 4c 0f f8 48 8d 7a 08 48 83 e7 f8 4a 89 4c 0a f8 48 89 d1 48 Mar 27 06:22:07 neon kernel: fsck.f2fs[6302]: segfault at 5564d32978f5 ip 55651c124919 sp 7ffd202ff9f8 error 4 in fsck.f2fs[55651c121000+1c000] * This kde-neon kernel version is 5.4.0, and the associated tools version is f2fs-tools-1.11.0. There hasn't been any power-outage that I'm aware of. With that segfault, I'm thinking that fs is now toast, and I need to rebuild that arch-linux partition. At least /home was on btrfs which is still accessible. While I'm at it (scrubbing btrfs), I think I'll memtest my RAM and badblocks my sd-card. On Tue, Jul 14, 2020 at 12:54 AM Jaegeuk Kim wrote: > > On 07/13, Nathan Royce wrote: > > On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim wrote: > > > > > > Hi Nathan, > > > > > > Could you try to say "N" here to move forward to fix the corrupted > > > metadata? > > > > > > Thanks, > > * > > Do you want to restore lost files into ./lost_found/? [Y/N] N ...
Re: BTRFS Balance Hard System Crash (Blinking LEDs)
Oh man, I'm hoping things aren't starting to fall apart here. I was doing my normal routine (tv, browsing, ... (no filesystem manipulations)) and out of the blue "kodi" just crashes. It's actually not all that uncommon, and I fired up "iotop" to make sure "coredump" was happening, and it was. I then did something else in the terminal, maybe an "ls", and that came up with: * error while loading shared libraries: /usr/lib/libutil.so.1: ELF file version does not match current one * Again, it was just out of the blue. Same with other commands like "coredumpctl" or "sync". Even "pacman -Qo /usr/lib/libutil.so.1" caused SEGV. Everything seemed fine after I had last booted (minus what I wrote in my last email). And the oddest thing is that, like I said before, my system/root stuff (eg, /usr/lib/libutil.so.1) is being run from my sd-card (F2FS, not BTRFS). I see the coredumps were written out ~11:05, and journalctl started showing issues arise ~10:56 (typically takes a long time to write out on a slow sd-card): * ... Mar 26 11:05:13 computerName systemd-coredump[70088]: Process 70078 (pacman) of user 1000 dumped core. Stack trace of thread 70078: #0 0x75cf62ee58a5 do_lookup_x (ld-linux-x86-64.so.2 + 0xa8a5) #1 0x75cf62ee6231 _dl_lookup_symbol_x (ld-linux-x86-64.so.2 + 0xb231) #2 0x75cf62ee7dc7 _dl_relocate_object (ld-linux-x86-64.so.2 + 0xcdc7) #3 0x75cf62edfcdd dl_main (ld-linux-x86-64.so.2 + 0x4cdd) #4 0x75cf62ef769f _dl_sysdep_start (ld-linux-x86-64.so.2 + 0x1c69f) #5 0x75cf62edd063 _dl_start (ld-linux-x86-64.so.2 + 0x2063) #6 0x75cf62edc098 _start (ld-linux-x86-64.so.2 + 0x1098) ... Mar 26 11:05:10 computerName kernel: Code: b4 24 d0 00 00 00 49 89 df 48 89 44 24 38 48 89 fb 4c 89 5c 24 60 eb 12 0f 1f 44 00 00 49 83 c4 04 83 e2 01 0f 85 f3 05 00 00 <41> 8b 04 24 48 89 c2 48 31 d8 48 d1 e8 75 e4 48 83 ec 08 4c 89 e0 Mar 26 11:05:09 computerName kernel: pacman[70078]: segfault at 75d29d0d5640 ip 75cf62ee58a5 sp 7ad0e460 error 4 in ld-2.33.so[75cf62edc000+24000] ... 
Mar 26 10:58:59 computerName kernel: Code: 84 e7 05 00 00 44 8b 33 45 85 f5 74 e4 66 0f ef ff 66 0f ef f6 66 0f ef e4 48 89 ef f3 0f 10 44 24 48 66 0f ef db 66 0f ef d2 <66> 0f 42 a0 61 72 70 cb ee 33 bb 14 5f 79 1d 76 e5 28 0f 11 44 24 Mar 26 10:58:59 computerName kernel: chrome[41222]: segfault at cb707262 ip 77cb36b101ae sp 7fff250007c0 error 5 in i965_dri.so[77cb36aa9000+8fa000] ... Mar 26 10:58:25 computerName plasmashell[43148]: KCrash: Application Name = kate path = /usr/bin pid = 43148 Mar 26 10:58:25 computerName plasmashell[43148]: KCrash: crashing... crashRecursionCounter = 2 Mar 26 10:58:07 computerName systemd[1424]: Started Kate - Advanced Text Editor. ... Mar 26 10:56:51 computerName sudo[69237]:userName : TTY=pts/3 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/iotop ... Mar 26 10:56:32 computerName kernel: audit: type=1701 audit(1616774192.320:455): auid=1000 uid=1000 gid=1000 ses=3 pid=54221 comm="VideoPlayer" exe="/usr/local/lib/kodi/kodi.bin" sig=11 res=1 Mar 26 10:56:32 computerName kernel: Code: 00 00 01 00 00 00 00 00 00 00 02 04 00 00 48 7b 00 00 10 49 4a d0 48 7b 00 00 00 00 00 00 01 00 00 00 55 00 00 00 00 00 00 00 d8 8f 34 4f 7b 00 00 00 39 10 d0 48 7b 00 00 00 00 fa 00 00 fa Mar 26 10:56:32 computerName kernel: VideoPlayer[61823]: segfault at 7b48d0579730 ip 7b48d0579730 sp 7b48ff043248 error 15 ... * As you can see, pretty much everything was crashing (probably not surprising if glibc is involved). Now, like I said, I don't believe it's related to my BTRFS drive since glibc was referenced which is located on my F2FS drive. I ended up rebooting (again) and everything seems fine (so far) as I write this and have the recorded DVR playing (kodi). I don't know what those "kernel: Code:" is supposed to be/mean to me. On Fri, Mar 26, 2021 at 8:29 AM Nathan Royce wrote: > > * > ...I "think" this is where the "emergency" drop out of boot occurred, > and I just did a "systemctl reboot" which had the next boot succeed. > Nope, I'm wrong. 
For whatever reason, this appears to be the boot that > ended up working (searching for the first "microcode" reference > indicating the start of a boot). > Mar 25 21:44:17 computerName kernel: BTRFS critical (device dm-3): > unable to add free space :-17 ...
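On the question above of what the "kernel: Code:" lines mean: as far as I know, a Code: line is a hex dump of the instruction bytes around the faulting instruction pointer (the byte in <..> is where it stopped), and the kernel tree ships scripts/decodecode for disassembling such a line. The "error N" value on the segfault lines is easier to read by hand. The sketch below assumes the x86 page-fault error-code bits (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 4 instruction fetch) and assumes the kernel prints the value in hex; the function name decode_pf_error is made up for illustration.

```shell
# Hypothetical helper: decode the "error N" field of an x86 segfault line.
# $1 is the value as printed in the log, assumed to be hexadecimal.
decode_pf_error() {
    code=$((0x$1))
    # Bit 0: set = protection violation, clear = page not present.
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # Bit 4: instruction fetch; bit 1: write; otherwise a read.
    if [ $((code & 16)) -ne 0 ]; then access="instruction fetch"
    elif [ $((code & 2)) -ne 0 ]; then access="write"
    else access="read"; fi
    # Bit 2: set = fault happened in user mode.
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    printf '%s, %s, %s\n' "$access" "$cause" "$mode"
}

decode_pf_error 4    # pacman line
decode_pf_error 5    # chrome line
decode_pf_error 15   # VideoPlayer line
```

Applied to the log above, this reads "error 4" (pacman) as a user-mode read of an unmapped page, "error 5" (chrome) as a user-mode read that hit a protection violation, and "error 15" (VideoPlayer, where the fault address equals ip) as an instruction fetch from a page that isn't executable.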
BTRFS Balance Hard System Crash (Blinking LEDs)
* ...I "think" this is where the "emergency" drop out of boot occurred, and I just did a "systemctl reboot" which had the next boot succeed. Nope, I'm wrong. For whatever reason, this appears to be the boot that ended up working (searching for the first "microcode" reference indicating the start of a boot). Mar 25 21:44:17 computerName kernel: BTRFS critical (device dm-3): unable to add free space :-17 ...v 13 times Mar 25 21:42:59 computerName kernel: BTRFS critical (device dm-3): unable to add free space :-17 Mar 25 21:42:59 computerName kernel: BTRFS critical (device dm-3): unable to add free space :-17 ...v 36 times Mar 25 21:40:45 computerName kernel: BTRFS critical (device dm-3): unable to add free space :-17 Mar 25 21:40:44 computerName kernel: BTRFS critical (device dm-3): unable to add free space :-17 Mar 25 21:40:44 computerName kernel: ---[ end trace 880e498e00cd6fcd ]--- Mar 25 21:40:44 computerName kernel: ret_from_fork+0x22/0x30 Mar 25 21:40:44 computerName kernel: ? __kthread_bind_mask+0x70/0x70 Mar 25 21:40:44 computerName kernel: kthread+0x144/0x170 Mar 25 21:40:44 computerName kernel: balance_kthread+0x35/0x50 [btrfs] Mar 25 21:40:44 computerName kernel: ? btrfs_balance+0xee0/0xee0 [btrfs] Mar 25 21:40:44 computerName kernel: btrfs_balance+0x765/0xee0 [btrfs] Mar 25 21:40:44 computerName kernel: btrfs_relocate_chunk+0x2a/0xc0 [btrfs] Mar 25 21:40:44 computerName kernel: btrfs_relocate_block_group+0x164/0x310 [btrfs] Mar 25 21:40:44 computerName kernel: relocate_block_group+0x2e9/0x5f0 [btrfs] Mar 25 21:40:44 computerName kernel: prepare_to_merge+0x246/0x280 [btrfs] Mar 25 21:40:44 computerName kernel: btrfs_commit_transaction+0x79b/0xa70 [btrfs] Mar 25 21:40:44 computerName kernel: btrfs_finish_extent_commit+0xb6/0x2c0 [btrfs] Mar 25 21:40:44 computerName kernel: ? clear_extent_bit+0x43/0x60 [btrfs] Mar 25 21:40:44 computerName kernel: unpin_extent_range+0x299/0x4d0 [btrfs] Mar 25 21:40:44 computerName kernel: ? 
kmem_cache_free+0xad/0x1e0 Mar 25 21:40:44 computerName kernel: __btrfs_add_free_space+0xaf/0x4d0 [btrfs] Mar 25 21:40:44 computerName kernel: link_free_space+0x27/0x60 [btrfs] Mar 25 21:40:44 computerName kernel: Call Trace: Mar 25 21:40:44 computerName kernel: CR2: 3e4fc2c21000 CR3: 00012f60a003 CR4: 001606f0 Mar 25 21:40:44 computerName kernel: CS: 0010 DS: ES: CR0: 80050033 Mar 25 21:40:44 computerName kernel: FS: () GS:95cad820() knlGS: Mar 25 21:40:44 computerName kernel: R13: 95ca79eeac08 R14: 95ca79eeac00 R15: c000 Mar 25 21:40:44 computerName kernel: R10: 95ca181731e0 R11: R12: 95c9d6b57c30 Mar 25 21:40:44 computerName kernel: RBP: R08: R09: 95c9d6b57c30 Mar 25 21:40:44 computerName kernel: RDX: RSI: 026e55e9 RDI: 95ca79eeac08 Mar 25 21:40:44 computerName kernel: RAX: 95ca4f5389b0 RBX: 026e55e9 RCX: 95ca4f538328 Mar 25 21:40:44 computerName kernel: RSP: 0018:b171c067ba60 EFLAGS: 00010246 Mar 25 21:40:44 computerName kernel: Code: 89 e7 49 c7 44 24 08 00 00 00 00 49 c7 44 24 10 00 00 00 00 4c 89 21 e8 16 93 28 fa 31 c0 5b 5d 41 5c 41 5d c3 48 85 d2 75 c1 <0f> 0b b8 ef ff ff ff eb eb 0f 0b b8 ef ff ff ff eb e2 66 0f 1f 44 Mar 25 21:40:44 computerName kernel: RIP: 0010:tree_insert_offset+0x88/0xa0 [btrfs] Mar 25 21:40:44 computerName kernel: Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015 Mar 25 21:40:44 computerName kernel: CPU: 0 PID: 998 Comm: btrfs-balance Tainted: G OE 5.8.7-dirty #1 Mar 25 21:40:44 computerName kernel: intel_gtt syscopyarea sysfillrect sysimgblt snd fb_sys_fops soundcore mei lpc_ich evdev mac_hid nct6775 hwmon_vid v4l2loopback_dc(OE) videodev drm mc agpgart fuse ip_tables x_tables f2fs dm_crypt cbc enc> Mar 25 21:40:44 computerName kernel: Modules linked in: ccm cmac algif_hash bnep btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic nls_iso8859_1 nls_cp437 vfat fat snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device tda18271 a> Mar 25 21:40:44 computerName kernel: WARNING: CPU: 0 PID: 998 at fs/btrfs/free-space-cache.c:1499 tree_insert_offset+0x88/0xa0 [btrfs] ...boot crash AFTER balance Mar 25 21:40:39 computerName kernel: BTRFS info (device dm-3): found 8 extents, stage: move data extents Mar 25 21:40:37 computerName kernel: BTRFS info (device dm-3): relocating block group 3875364929536 flags data Mar 25 21:40:37 computerName kernel: BTRFS info (device dm-3): balance: resume -dusage=90 -musage=90 -susage=90 Mar 25 21:40:37 computerName kernel: BTRFS error (device dm-3): incorrect extent count for 2672774086656; counted 7070, expected 7073 Mar 25 21:40:37 computerName systemd[1]: Mounted Mar 25 21:40:33 computerName kernel: BTRFS info (device dm-3): bdev /dev/mapper/ errs: wr 0, rd 56, flush 0, corrupt 0, gen 0 Mar 25 21:40:32 computerName kerne
Re: [PATCH] kconfig: streamline_config.pl: check defined(ENV variable) before using it
Heard, but all the same if it isn't important (which I'm assuming), I'd just as soon be left out of it. That's just the way I am in general, not wanting to be seen unless I have to be seen. Thanks though. On Wed, Sep 2, 2020 at 9:14 PM Masahiro Yamada wrote: > > Even if you do not write the code, > reporting bugs is a great contribution, > and the Reported-by exists for that, I think. > > So, I just want to add your Reported-by tag > (if you do not mind). > > > -- > Best Regards > Masahiro Yamada
Re: [PATCH] kconfig: streamline_config.pl: check defined(ENV variable) before using it
Thanks, but I'd just as soon not be acknowledged/credited. All I did was submit a report. On Wed, Sep 2, 2020 at 11:47 AM Masahiro Yamada wrote: > > Applied to linux-kbuild/fixes with Nathan's tag > > Reported-by: Nathan Royce > > > > Nathan, > I think adding your tag is OK to credit your contribution. > Please let me know if you do not have it in > the commit log. > > > > -- > Best Regards > Masahiro Yamada
Re: localmodconfig - "intel_rapl_perf config not found!!"
Correct. I'm building for 5.8.3 and I'm currently on 5.7.4 (1 month doesn't seem particularly old).

On Tue, Aug 25, 2020 at 2:13 PM Randy Dunlap wrote:
>
> so intel_rapl_perf is listed in your lsmod.cfg file:
> intel_rapl_perf 16384 2
>
> You say Linux 5.8.3. I'm guessing that your "make localmodconfig" tree
> is Linux 5.8.3 (?). What kernel version are you running?
> I think that it's older, and some file/module names have changed since then.
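One way to test the "names have changed" guess is to check whether any Makefile in the tree being configured still builds an object with the module's name, which is roughly the kind of lookup streamline_config.pl needs in order to map a loaded module to a config symbol. This is only a sketch: the helper name module_in_tree is made up, it relies on GNU grep's --include option, and it ignores multi-object modules and other Makefile subtleties.

```shell
# Hypothetical helper: succeed if some Makefile under tree $2 mentions an
# object file named after module $1. lsmod prints "_" where the object file
# may be spelled with "-", so both spellings are accepted.
module_in_tree() {
    name=$1
    tree=$2
    pattern=$(printf '%s' "$name" | sed 's/[-_]/[-_]/g')
    grep -rqE --include=Makefile "(^|[[:space:]])${pattern}\.o([[:space:]]|\$)" "$tree"
}

# Example, run from the top of the 5.8.3 source tree:
# module_in_tree intel_rapl_perf . || echo "intel_rapl_perf: no Makefile builds this in the target tree"
```

If the check fails for the tree being configured but passes for the running kernel's source, the warning most likely just reflects a module that was renamed or removed between the two versions.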
localmodconfig - "intel_rapl_perf config not found!!"
Intel Haswell
Linux 5.8.3

This is the first time I've used localmodconfig, after reading what it does and liking the "supposed" kernel customization specific to the system. I only use quotes on "supposed" because I DO still see entries I have no interest in (not applicable to my system/needs). I don't know if another email would be warranted for localmodconfig only or if my expectation of it is unrealistic.

The "intel_rapl_perf config not found!!" comes up with every .config I try. The simplest test I can come up with would be:
*
make defconfig # x86_64_defconfig
lsmod > lsmod.cfg
make localmodconfig LSMOD=lsmod.cfg
*
lsmod.cfg
*
Module                  Size     Used by
uinput                  20480    1
rfcomm                  94208    16
ccm                     20480    9
cmac                    16384    5
algif_hash              16384    2
bnep                    28672    2
btrfs                   1556480  1
blake2b_generic         20480    0
xor                     24576    1 btrfs
raid6_pq                122880   1 btrfs
libcrc32c               16384    1 btrfs
crc32c_generic          16384    0
nls_iso8859_1           16384    1
nls_cp437               20480    1
vfat                    24576    1
fat                     90112    1 vfat
snd_usb_audio           311296   0
snd_usbmidi_lib         45056    1 snd_usb_audio
snd_rawmidi             45056    1 snd_usbmidi_lib
snd_seq_device          16384    1 snd_rawmidi
tda18271                53248    1
au8522_dig              16384    1
au8522_common           16384    1 au8522_dig
au0828                  69632    1
tveeprom                28672    1 au0828
dvb_core                176128   1 au0828
videobuf2_vmalloc       20480    2 dvb_core,au0828
videobuf2_memops        20480    1 videobuf2_vmalloc
videobuf2_v4l2          28672    1 au0828
intel_rapl_msr          20480    0
btusb                   57344    0
videodev                274432   2 videobuf2_v4l2,au0828
btrtl                   24576    1 btusb
btbcm                   20480    1 btusb
videobuf2_common        57344    3 videobuf2_v4l2,dvb_core,au0828
intel_rapl_common       32768    1 intel_rapl_msr
rc_core                 61440    1 au0828
btintel                 32768    1 btusb
bluetooth               688128   49 btrtl,btintel,btbcm,bnep,btusb,rfcomm
x86_pkg_temp_thermal    20480    0
intel_powerclamp        20480    0
mousedev                24576    0
ecdh_generic            16384    2 bluetooth
ecc                     36864    1 ecdh_generic
crc16                   16384    1 bluetooth
rtl8821ae               290816   0
coretemp                20480    0
snd_hda_codec_hdmi      73728    1
btcoexist               225280   1 rtl8821ae
kvm_intel               335872   0
rtl_pci                 36864    1 rtl8821ae
rtlwifi                 139264   3 rtl_pci,rtl8821ae,btcoexist
kvm                     876544   1 kvm_intel
iTCO_wdt                16384    0
mei_hdcp                24576    0
iTCO_vendor_support     16384    1 iTCO_wdt
mac80211                954368   3 rtl_pci,rtl8821ae,rtlwifi
i915                    2703360  60
snd_hda_codec_realtek   143360   1
irqbypass               16384    1 kvm
snd_hda_codec_generic   98304    1 snd_hda_codec_realtek
ledtrig_audio           16384    2 snd_hda_codec_generic,snd_hda_codec_realtek
intel_cstate            16384    0
snd_hda_intel           53248    4
snd_soc_rt5640          147456   0
intel_uncore            163840   0
cfg80211                925696   2 rtlwifi,mac80211
snd_intel_dspcfg        24576    1 snd_hda_intel
snd_soc_rl6231          20480    1 snd_soc_rt5640
intel_rapl_perf         16384    2
snd_hda_codec           176128   4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
snd_soc_core            311296   1 snd_soc_rt5640
rfkill                  32768    11 bluetooth,cfg80211
libarc4                 16384    1 mac80211
snd_compress            32768    1 snd_soc_core
alx                     57344    0
input_leds              16384    0
snd_hda_core            114688   5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
i2c_algo_bit            16384    1 i915
ac97_bus                16384    1 snd_soc_core
snd_pcm_dmaengine       16384    1 snd_soc_core
snd_hwdep               20480    2 snd_usb_audio,snd_hda_codec
mdio                    16384    1 alx
drm_kms_helper          266240   1 i915
snd_pcm                 159744   9 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_soc_rt5640,snd_compress,snd_soc_core,snd_hda_core,snd_pcm_dmaengine
cec                     69632    2 drm_kms_helper,i915
snd_timer               49152    1 snd_pcm
mei_me                  49152    1
intel_gtt               24576    1 i915
syscopyarea             16384    1 drm_kms_helper
sysfillrect             16384    1 drm_kms_helper
sysimgblt               16384    1 drm_kms_helper
snd                     118784   22 snd_hda_codec_generic,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,snd_soc_core,snd_pcm,snd_rawmidi
fb_sys_fops             16384    1 drm_kms_helper
soundcore               16384    1 snd
mei                     131072   3 mei_hd
Re: F2FS Segmentation Fault
On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim wrote: > > Hi Nathan, > > Could you try to say "N" here to move forward to fix the corrupted metadata? > > Thanks, * Do you want to restore lost files into ./lost_found/? [Y/N] N Info: Write valid nat_bits in checkpoint [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18eca] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecb] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecc] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee3] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee4] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee5] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f78] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f79] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f7a] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d621] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d622] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa32] in NAT [FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa33] in NAT Info: Write valid nat_bits in checkpoint Done. * * Info: Fix the reported corruption. 
Info: Force to fix corruption Info: Segments per section = 1 Info: Sections per zone = 1 Info: sector size = 512 Info: total sectors = 124168159 (60628 MB) Info: MKFS version "Linux version 5.1.15.a-1-hardened (builduser@slave-1) (gcc version 9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019" Info: FSCK version from "Linux version 4.19.13-dirty (nater@devx64) (gcc version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018" to "Linux version 4.19.13-dirty (nater@devx64) (gcc version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018" Info: superblock features = 0 : Info: superblock encrypt level = 0, salt = Info: total FS sectors = 124168152 (60628 MB) Info: CKPT version = 63f2b4a Info: checkpoint state = 281 : allow_nocrc nat_bits unmount Info: No error was reported * I'm now booted in from my SDHC card. So it "seems" I'm good to go. But with the actions taken and the files I've seen displayed during the fsck, I'm thinking I'm going to reinstall all packages. Assuming the issue was related to the power outage, I do wonder why there weren't any fsck issues at bootup at that time. I hadn't had any disk issues before with that card. At least now I know the issue would be resolved by not saving the lost files and I can continue on my merry way.
F2FS Segmentation Fault
I won't re-format unless I hear something within a few days in case you want me to try something.

Preface: There was a notable power outage a couple of nights ago. When the power returned, everything seemed fine. No issues during bootup or anything.

Then today, I went to open an application and my system started acting up, with programs suddenly closing (or crashing?). I switched tty and tried to log in but wasn't even given the chance to enter my password. I switched to another tty and tried logging in as root, which succeeded (somehow). I looked at the journal and saw an entry saying something about /bin/login not being a valid exec format.

I went to reboot, and when it got to the fsck part of initramfs, it failed and I was kicked to root. I ran fsck and saw a bunch of issues, but I guess nothing could get resolved enough to let me reboot.

Oh, in case you're wondering, my / (system) is on a 64GB SDHC card. I just happened to also have an older / system on my mechanical drive using BTRFS which I could boot to (which I'm on now). I ran fsck from this older system and it seems I got the same results:
*
Info: Fix the reported corruption.
Info: Force to fix corruption Info: Segments per section = 1 Info: Sections per zone = 1 Info: sector size = 512 Info: total sectors = 124168159 (60628 MB) Info: MKFS version "Linux version 5.1.15.a-1-hardened (builduser@slave-1) (gcc version 9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019" Info: FSCK version from "Linux version 4.19.13-dirty (userName@computerName) (gcc version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018" to "Linux version 4.19.13-dirty (userName@computerName) (gcc version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018" Info: superblock features = 0 : Info: superblock encrypt level = 0, salt = Info: total FS sectors = 124168152 (60628 MB) Info: CKPT version = 63f2b4a Info: checkpoint state = 55 : crc fsck compacted_summary unmount NID[0x18eca] is unreachable, blkaddr:0xcf1d9d3c NID[0x18ecb] is unreachable, blkaddr:0x5db5f91f NID[0x18ecc] is unreachable, blkaddr:0x4653d NID[0x18ee3] is unreachable, blkaddr:0x144dc401 NID[0x18ee4] is unreachable, blkaddr:0x558cfba9 NID[0x18ee5] is unreachable, blkaddr:0x45553 NID[0x18f78] is unreachable, blkaddr:0x560555ac NID[0x18f79] is unreachable, blkaddr:0x58cccb0d NID[0x18f7a] is unreachable, blkaddr:0x53d84 NID[0x4d621] is unreachable, blkaddr:0x4fc1d NID[0x4d622] is unreachable, blkaddr:0x4fc1e NID[0x7fa32] is unreachable, blkaddr:0x20b0ca3a NID[0x7fa33] is unreachable, blkaddr:0xf71b60 [FSCK] Unreachable nat entries[Fail] [0xd] [FSCK] SIT valid block bitmap checking[Fail] [FSCK] Hard link checking for regular file[Ok..] [0x4f6] [FSCK] valid_block_count matching with CP [Fail] [0x736fcb] [FSCK] valid_node_count matcing with CP (de lookup) [Fail] [0x70327] [FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0x70334] [FSCK] valid_inode_count matched with CP [Fail] [0x6f09e] [FSCK] free segment_count matched with CP [Ok..] [0x3bfc] [FSCK] next block offset is free [Ok..] 
[FSCK] fixing SIT types
[FSCK] other corrupted bugs [Fail]
Do you want to restore lost files into ./lost_found/? [Y/N] Y
Segmentation fault
*
*
Message: Process 3425 (fsck.f2fs) of user 0 dumped core.

Stack trace of thread 3425:
#0 0x55f8515739c8 n/a (fsck.f2fs)
#1 0x55f851575261 n/a (fsck.f2fs)
#2 0x55f851572c56 n/a (fsck.f2fs)
#3 0x55f85156a3f0 n/a (fsck.f2fs)
#4 0x7f51420feee3 __libc_start_main (libc.so.6)
#5 0x55f85156a95e n/a (fsck.f2fs)
*
So if you want more information or need me to try something, let me know soon if you would. Otherwise, I'll just be reformatting my card in a few days. It could just have been a fluke caused by the power outage that didn't manifest itself until today.
Re: Kernel 5.2.8 - au0828 - Tuner Is Busy
While your mention of quirks-table.h certainly had possibilities, I'm afraid adding the "AU0828_DEVICE(0x05e1, 0x0400, "Hauppauge", "Woodbury")," entry for my tuner did not make any difference regarding the "Tuner is busy. Error -19" message. I don't know if this means anything, but I see https://patchwork.kernel.org/patch/97726/ from 2010 which contains changes for the 0x0400 model. I guess it never got pulled in. Really, it's fine for me just to hang back at v5.1 for a year or two until ATSC 3.0 USB tuners come out at a reasonable price. On Mon, Aug 19, 2019 at 4:44 PM shuah wrote: > You said you make changes to the > > "Whenever I update my kernel, I edit the > ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my > 0x400 device. > I've been doing it for years and it's been working fine... until now..." > > Please send me the changes you make to the file. I see the following > WOODBURY devices. I am assuming you add 0x400 entry. > > { USB_DEVICE(0x05e1, 0x0480), > .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, > { USB_DEVICE(0x2040, 0x8200), > .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, > > > There is another table in sound/usb/quirks-table.h for AU0828 > devices. In addition to 812658d88d26, 66354f18fe5f makes change > to this table to add a flag. I see two entries in that table: > > AU0828_DEVICE(0x05e1, 0x0480, "Hauppauge", "Woodbury"), > AU0828_DEVICE(0x2040, 0x8200, "Hauppauge", "Woodbury"), > > Since these drivers are now coupled doing resource sharing, > could it be that with your change to au02828 device table, > your changes are bow incomplete. > > I don't have a Woodbury device though. This is something to > try. > > Did you consider sending patch to add your device variant, > so you don't have to keep making this change whenever you > go to a new kernel? > > thanks, > -- Shuah
Re: Kernel 5.2.8 - au0828 - Tuner Is Busy
examined. If no changes are found in that branch/tag range, then the next step would be to analyze any commits that are affected by parents/children (references) of au0828 within that version range, and continually move up/down the line. (eg. linux/usb.h which is referenced by au0828.h) This way, the scope is very narrow at the beginning and widens as needed. I think it's something that could be implemented in the git tool and the user only needs to provide a starting place. Just a thought. I can only hope that I incorrectly used bisecting and someone can point to what I did wrong and provide a better way. (maybe I wouldn't have to mrproper, so the testing wouldn't take days?) On Mon, Aug 19, 2019 at 3:49 PM shuah wrote: > > On 8/16/19 7:15 PM, Nathan Royce wrote: > Hi Nathan, > > Just catching up with this thread. Let me know what you find. Can you > build your own kernel and see what you can find? > > thanks, > -- Shuah
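For what it's worth, git bisect already accepts path arguments after "--", which gives the narrow starting scope described above: only commits touching the listed paths are considered, and the scope can be widened by restarting with more paths (for example include/linux/usb.h). The sketch below is a thin wrapper to show the argument order; the name bisect_paths is made up, and the v5.2 (bad) and v5.1 (good) tags in the example are assumptions based on the versions mentioned in this thread.

```shell
# Hypothetical wrapper: start a bisect between a bad and a good revision,
# limited to commits that touch the given paths.
# Usage: bisect_paths <bad> <good> <path>...
bisect_paths() {
    bad=$1
    good=$2
    shift 2
    git bisect start "$bad" "$good" -- "$@"
}

# Example, run from the top of a kernel git tree:
# bisect_paths v5.2 v5.1 drivers/media/usb/au0828 sound/usb/quirks-table.h
# ...build and test, then mark each step with "git bisect good" or
# "git bisect bad", and finish with "git bisect reset".
```

Path limiting reduces how many commits get tested, not how long each build takes, and it will miss a regression introduced outside the listed paths, which is where widening the path list comes in.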
Re: Kernel 5.2.8 - au0828 - Tuner Is Busy
(resubmitting due to non "reply-to-all"): Bugger, I just sent a reply to your last message, but it bounced back with: * 550 5.7.1 Content-Policy reject msg: The message contains HTML subpart, therefore we consider it SPAM or Outlook Virus. TEXT/PLAIN is accepted.! BF:; S1728494AbfHSVzk * I just switched this email to plain-text and will resubmit my previous email as plain-text. Anyway, yeah, all I did in au0828-cards.c was add my 0x0400 like: * { USB_DEVICE(0x2040, 0x7281), .driver_info = AU0828_BOARD_HAUPPAUGE_HVR950Q_MXL }, { USB_DEVICE(0x05e1, 0x0400), .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, { USB_DEVICE(0x05e1, 0x0480), .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, { USB_DEVICE(0x2040, 0x8200), .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, * That's all I've ever had to do. I never knew about the quirks-table.h. I'll take a look. I saw in the log the 0x05e1 addition was made in 2016, but maybe it only applies to the Media Controller API change requirement now (thus, not having caused any problems in the past since the API wasn't being used). I've never sent in a patch before (anywhere. I just point out a problem and let the dev code it in their style). Also I don't want to be a bother in case something even that small could somehow break something else, especially for something "off-brand"(?). I never really minded building the module by itself. I've just now started the build for linux-5.2.y with the quirks-table.h change along with au0828-cards.c. Thanks for that heads-up. Hopefully that does the trick (whatever the trick/quirk is). On Mon, Aug 19, 2019 at 4:44 PM shuah wrote: > > On 8/19/19 2:49 PM, shuah wrote: > > Hi Nathan, > > > > Just catching up with this thread. Let me know what you find. Can you > > build your own kernel and see what you can find? > > > > You said you make changes to the > > "Whenever I update my kernel, I edit the > ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my > 0x400 device. 
> I've been doing it for years and it's been working fine... until now..." > > Please send me the changes you make to the file. I see the following > WOODBURY devices. I am assuming you add 0x400 entry. > > { USB_DEVICE(0x05e1, 0x0480), > .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, > { USB_DEVICE(0x2040, 0x8200), > .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY }, > > > There is another table in sound/usb/quirks-table.h for AU0828 > devices. In addition to 812658d88d26, 66354f18fe5f makes change > to this table to add a flag. I see two entries in that table: > > AU0828_DEVICE(0x05e1, 0x0480, "Hauppauge", "Woodbury"), > AU0828_DEVICE(0x2040, 0x8200, "Hauppauge", "Woodbury"), > > Since these drivers are now coupled doing resource sharing, > could it be that with your change to au02828 device table, > your changes are bow incomplete. > > I don't have a Woodbury device though. This is something to > try. > > Did you consider sending patch to add your device variant, > so you don't have to keep making this change whenever you > go to a new kernel? > > thanks, > -- Shuah
Re: Kernel 5.2.8 - au0828 - Tuner Is Busy
On Fri, Aug 16, 2019 at 1:42 PM Greg Kroah-Hartman wrote: > If you revert that one commit, does things start working again? > > thanks, > > greg k-h Hey Greg, I just got finished building it after running "$ git revert 812658d88d26" and verifying it reverted by comparing one of the files from git log -p, but alas, no joy. On Fri, Aug 16, 2019 at 5:41 PM Brad Love wrote: > > Hi Nathan, > > I don't have a "woodbury", but I have a Hauppauge 950Q sitting around > and tested it on latest mainline kernel. w_scan is ok and streaming is > fine. There's no unexpected errors. The 950Q uses the same au0828 bridge > and au8522 demod as woodbury, but a different tuner. Your problem > wouldn't appear to be a general au0828 issue. > > You might have to check out git bisect. That will be the quickest way to > get to the bottom, if you've got points A and B, and are > building/running your own kernel. > > Cheers, > > Brad Thanks Brad, I'll explore bisecting and hopefully will be able to narrow down the cause. I wasn't running my own kernel, but rather using the Arch Linux kernel and modding the one module and putting it in "extramodules".
Kernel 5.2.8 - au0828 - Tuner Is Busy
Right up front, I must say I do NOT have a Hauppauge tuner. I think it's like maybe Mygica/Geniatech: Bus 002 Device 004: ID 05e1:0400 Syntek Semiconductor Co., Ltd Whenever I update my kernel, I edit the ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my 0x400 device. I've been doing it for years and it's been working fine... until now... * Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19 <...18 more repeated entries...> Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19 Aug 16 12:07:10 computerName tvheadend[3276]: main: Log started * "w_scan" behaves the same way. * $ modprobe au0828 Aug 16 12:52:52 computerName kernel: videodev: Linux video capture interface: v2.00 Aug 16 12:52:52 computerName kernel: au0828: au0828_init() Debugging is enabled Aug 16 12:52:52 computerName kernel: au0828: au0828 driver loaded Aug 16 12:52:52 computerName kernel: au0828: au0828_usb_probe() vendor id 0x5e1 device id 0x400 ifnum:0 Aug 16 12:52:52 computerName kernel: au0828: au0828_gpio_setup() Aug 16 12:52:52 computerName kernel: au0828: au0828_i2c_register() Aug 16 12:52:52 computerName kernel: au0828: i2c bus registered Aug 16 12:52:52 computerName kernel: au0828: au0828_card_setup() Aug 16 12:52:52 computerName kernel: tveeprom: Encountered bad packet header [20]. Corrupt or not a Hauppauge eeprom. Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom: warning: unknown hauppauge model #0 Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom: hauppauge eeprom: model=0 Aug 16 12:52:52 computerName kernel: au0828: au0828_analog_register called for intf#0! 
Aug 16 12:52:52 computerName kernel: au0828: au0828_dvb_register() Aug 16 12:52:52 computerName kernel: au8522 7-0047: creating new instance Aug 16 12:52:52 computerName kernel: tda18271 7-0060: creating new instance Aug 16 12:52:52 computerName kernel: tda18271: TDA18271HD/C2 detected @ 7-0060 Aug 16 12:52:53 computerName kernel: au0828: dvb_register() Aug 16 12:52:53 computerName kernel: dvbdev: DVB: registering new adapter (au0828) Aug 16 12:52:53 computerName kernel: usb 2-2.3: DVB: registering adapter 0 frontend 0 (Auvitek AU8522 QAM/8VSB Frontend)... Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity: media entity 'Auvitek AU8522 QAM/8VSB Frontend' registered. Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity: media entity 'dvb-demux' registered. Aug 16 12:52:53 computerName kernel: au0828: Registered device AU0828 [Hauppauge Woodbury] Aug 16 12:52:53 computerName kernel: usbcore: registered new interface driver au0828 * The "eeprom" thing has never been an issue with regard to my tuner working. It still worked in spite of it. It's odd because: * $ lsmod | grep au0828 au0828 86016 0 tveeprom 28672 1 au0828 dvb_core 176128 1 au0828 v4l2_common 20480 1 au0828 videobuf2_vmalloc 20480 2 dvb_core,au0828 videobuf2_v4l2 28672 1 au0828 videobuf2_common 61440 3 videobuf2_v4l2,dvb_core,au0828 videodev 253952 4 v4l2_common,videobuf2_v4l2,videobuf2_common,au0828 rc_core 61440 1 au0828 media 61440 6 videodev,snd_usb_audio,videobuf2_v4l2,dvb_core,videobuf2_common,au0828 $ ls -la /dev/dvb/adapter0/ total 0 drwxr-xr-x 2 root root 120 Aug 16 12:01 . drwxr-xr-x 3 root root 60 Aug 16 12:01 .. crw-rw+ 1 root video 212, 4 Aug 16 12:01 demux0 crw-rw+ 1 root video 212, 5 Aug 16 12:01 dvr0 crw-rw+ 1 root video 212, 3 Aug 16 12:01 frontend0 crw-rw+ 1 root video 212, 7 Aug 16 12:01 net0 * The previous kernel version I was on that worked was 5.1.15. I just reverted back to the previous version and it's working again. 
I don't know what broke and where, between the versions. I saw https://lkml.org/lkml/2019/1/21/1020 but this is back in January so I don't know if something was more recently applied to au0828 that makes use of the API. "lsof" didn't show anything related to "/dev/dvb" being used. Oh neat! Someone posted a neat git feature which I tried and I get: * $ git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative v5.1.15..v5.2.8 drivers/media/usb/au0828/ * be50f19fee84 - media: au0828: fix null dereference in error path (12 days ago) * c942fddf8793 - treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 (3 months ago) * 16216333235a - treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1 (3 months ago) * ec8f24b7faaf - treewide: Add SPDX license identifier - Makefile/Kconfig (3 months ago) * 14340de506c9 - media: prefix header search paths with $(srctree)/ (3 months ago) * f604f0f5afb8 - media: au0828: stop video streaming only when last user stops (4 months ago) * 898bc40bfcc2 - media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable() (4 months ago) * 383b0e5b6ebb - medi
Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13
One more thought that may be nothing: when kmemleak crashed, SUnreclaim was at 932552 kB, and after I reclaimed/cleared it was at 299840 kB. There weren't any performance issues like when I had a leak of 5.5 GB in the 4.18 kernel. On Mon, Jan 7, 2019 at 3:52 AM Catalin Marinas wrote: > > Under memory pressure, kmemleak may fail to allocate memory. See this > patch for an attempt to slightly improve things but it's not a proper > solution: > > http://lkml.kernel.org/r/20190102180619.12392-1-...@lca.pw > > -- > Catalin
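For anyone wanting to track the same figure, SUnreclaim can be read from /proc/meminfo. The sketch below is only an illustration: the helper names `parse_meminfo` and `sunreclaim_kb` are hypothetical, and the arithmetic reads the two numbers in the message above as before and after values, which is an assumption about what was meant.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its number (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sunreclaim_kb(path="/proc/meminfo"):
    """Read the current SUnreclaim value in kB from a live Linux system."""
    with open(path) as fh:
        return parse_meminfo(fh.read())["SUnreclaim"]


# Figures from the message above, read as "before" and "after" (assumption).
before_kb, after_kb = 932552, 299840
freed_kb = before_kb - after_kb
print(f"freed {freed_kb} kB (about {freed_kb / 1024:.0f} MiB)")
```

Under that reading, roughly 618 MiB of unreclaimable slab went away after the kmemleak data was cleared.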
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
Wow, my system got wrecked (exaggeration) during this latest stretch... Pulseaudio was stretched to the limit and beyond and was forced to restart. Anything that was producing audio had to be restarted to get it back. This time was much like the first time and went from timestamp 573100.060927 (line 1) to 572506.604155 (line 11069), where 100% (literally) of it was that event 37 in the journal, no other kernel log entries except for the systemd-hostnamed audit before it all went down. And as usual, it was my USB TV tuner (tvheadend really) giving the Poll Timeout log entries. Those same uploaded trace files will be updated with the latest bugout. On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman wrote: > > The event type 37 is a host controller event, most likely a event ring full > error. > > So there are probably so many events that we fill the event ring before we > can handle them. > > Could you take traces of this? > Note that the trace file will be huge. > > mount -t debugfs none /sys/kernel/debug > echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb > echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable > > copy the traces somewhere safe once the error is triggered: > cp /sys/kernel/debug/tracing/trace / > > -Mathias
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
You can ignore the last set of files with a sample of 1. I got a nice sample of like 150 about 6 hours ago. The link I included in the previous reply contains the same filenames, just updated. The journal timestamps (to correspond with the trace times) go from "[513438.430253] computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37" to "[513438.796965] computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37" That's 150 of them in less than 1/2 second. On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman wrote: > > The event type 37 is a host controller event, most likely a event ring full > error. > > So there are probably so many events that we fill the event ring before we > can handle them. > > Could you take traces of this? > Note that the trace file will be huge. > > mount -t debugfs none /sys/kernel/debug > echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb > echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable > > copy the traces somewhere safe once the error is triggered: > cp /sys/kernel/debug/tracing/trace / > > -Mathias
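To put a number on that burst, the two journal timestamps can be subtracted directly. This is a small worked example, not part of the original report: `burst_rate` is a hypothetical helper, and the count of 150 is the rough estimate given above, not an exact tally.

```python
import re

# Monotonic journal timestamps look like "[513438.430253]".
STAMP = re.compile(r"\[(\d+\.\d+)\]")


def burst_rate(first_line, last_line, count):
    """Return (span_seconds, events_per_second) for a burst of `count` events."""
    start = float(STAMP.search(first_line).group(1))
    end = float(STAMP.search(last_line).group(1))
    span = end - start
    return span, count / span


first = "[513438.430253] computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37"
last = "[513438.796965] computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37"

span, rate = burst_rate(first, last, 150)  # 150 is the approximate count from the message
print(f"{span:.6f} s, about {rate:.0f} events/s")
```

The span works out to a little under 0.37 seconds, so the burst ran at roughly 400 messages per second, in line with the "around 400 in a 2 second span" figure from the first report only in order of magnitude.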
Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13
I'm not all that sure it was memory-related, based on my Sun, 6 Jan 2019 13:17:04 -0600 post. You'll see the log entries at 3AM, and based on earlier entries I likely went to sleep around 1AM, which would mean any memory-intensive applications (e.g. a virtual machine) would've been closed out. I have 8GB RAM in my desktop. On Mon, Jan 7, 2019 at 3:52 AM Catalin Marinas wrote: > > Hi Nathan, > > On Tue, Jan 01, 2019 at 01:17:06PM -0600, Nathan Royce wrote: > > I had a leak somewhere and I was directed to look into SUnreclaim > > which was 5.5 GB after an uptime of a little over 1 month on an 8 GB > > system. kmalloc-2048 was a problem. > > I just had enough and needed to find out the cause for my lagging system. > > > > I finally upgraded from 4.18.16 to 4.19.13 and enabled kmemleak to > > hunt for the culprit. I don't think a day had elapsed before kmemleak > > crashed and disabled itself. > > Under memory pressure, kmemleak may fail to allocate memory. See this > patch for an attempt to slightly improve things but it's not a proper > solution: > > http://lkml.kernel.org/r/20190102180619.12392-1-...@lca.pw > > -- > Catalin
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
OK, I finally got one... but there was only 1 journal log entry. The previous time there were like maybe 10 (also very little), but the 2 times before that had enough for me to have to page through the log. I actually messed up on a variable in my script so missed the actual time, but the trace still encompassed entries around the log entry time when I copied it manually. I fixed the script, tested it and have it running again for the next time. GZip compressed the 1GB trace down to 43MB, but PLZip got it down to 19.5MB: https://1drv.ms/f/s!AkkOvekTOCrYn0kEFtJzreV7gCTD All 3 files are from the same trace, but wanted to give you options in case you didn't have plzip. The journal entry (time): [501180.585516] computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37 On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman wrote: > > The event type 37 is a host controller event, most likely a event ring full > error. > > So there are probably so many events that we fill the event ring before we > can handle them. > > Could you take traces of this? > Note that the trace file will be huge. > > mount -t debugfs none /sys/kernel/debug > echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb > echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable > > copy the traces somewhere safe once the error is triggered: > cp /sys/kernel/debug/tracing/trace / > > -Mathias
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
I'm only posting to say I'm still waiting... The error came up while I slept, and when I copied that log and looked at it (yes, it WAS huge, just as you said), the timestamps at the head/tail were much later than the journal logged times. So I made a little script to monitor the journal kernel entries for that message and have it copy the file after maybe 5 seconds. And now, I'm just waiting for that error to occur again. On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman wrote: > > The event type 37 is a host controller event, most likely a event ring full > error. > > So there are probably so many events that we fill the event ring before we > can handle them. > > Could you take traces of this? > Note that the trace file will be huge. > > mount -t debugfs none /sys/kernel/debug > echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb > echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable > > copy the traces somewhere safe once the error is triggered: > cp /sys/kernel/debug/tracing/trace / > > -Mathias
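The actual monitoring script was not posted, so the following is only a rough sketch of how such a watcher might be written. It assumes it runs as root with debugfs mounted as in Mathias's instructions; the destination path and the function names `first_hit` and `main` are hypothetical.

```python
import shutil
import subprocess
import time

# Message text as it appears in the journal entries quoted in this thread.
NEEDLE = "ERROR unknown event type 37"
TRACE = "/sys/kernel/debug/tracing/trace"  # path from Mathias's instructions
DEST = "/var/tmp/xhci-trace.txt"           # hypothetical destination


def first_hit(lines, needle=NEEDLE):
    """Return the first line containing `needle`, or None if the stream ends."""
    for line in lines:
        if needle in line:
            return line.rstrip("\n")
    return None


def main(delay=5.0):
    """Follow kernel messages and copy the trace shortly after the first hit."""
    # -k: kernel messages only, -f: follow, -n 0: skip existing entries.
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "short-monotonic"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        hit = first_hit(proc.stdout)
        if hit is not None:
            print("triggered by:", hit)
            time.sleep(delay)  # let the burst finish before copying
            # Plain read/write copy; the trace file reports no useful size.
            with open(TRACE, "rb") as src, open(DEST, "wb") as dst:
                shutil.copyfileobj(src, dst)
    finally:
        proc.terminate()
```

Calling `main()` blocks until the message shows up once, then copies the trace after the delay. Printing the triggering line with its monotonic timestamp makes it easier to find the matching region in the trace afterwards.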
Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13
e Jan 06 03:27:16 computername kernel: kmemleak: Kernel memory leak detector disabled Jan 06 03:27:16 computername kernel: kmemleak: Automatic memory scanning thread ended Jan 06 03:27:16 computername kernel: kmemleak: Kmemleak disabled without freeing internal data. Reclaim the memory with "echo clear > /sys/kernel/debug/kmemleak". Jan 06 03:27:15 computername plasmashell[1065]: qml: temp unit: 0 Jan 06 03:27:21 computername plasmashell[1065]: qml: temp unit: 0 Jan 06 03:27:24 computername plasmashell[1065]: qml: temp unit: 0 * On Tue, Jan 1, 2019 at 7:04 PM Nathan Royce wrote: > > It was unrelated to my USB issue. It happened again after I rebooted > within 4 hours of uptime. > This time there were 2 traces, one right after the other and included > another line number. > * > Jan 01 17:47:54 computername plasmashell[1048]: qt.qpa.xcb: > QXcbConnection: XCB error: 2 (BadValue), sequence: 45625, resource id: > 69206018, major code: 142 (Unknown), minor code: 3 > Jan 01 17:50:14 computername kernel: WARNING: CPU: 3 PID: 2154 at > mm/page_alloc.c:4262 __alloc_pages_nodemask+0xf74/0xfb0 > Jan 01 17:50:15 computername kernel: Modules linked in: rfcomm ccm > bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271 > au8522_dig au8522_common au0828 tveeprom dvb_core arc4 v4l2_common > intel_rapl snd_soc_rt5640 iTCO_wdt rtl8821ae x86_pkg_temp_thermal > btcoexist i> > Jan 01 17:50:16 computername kernel: soundcore mei_me lpc_ich mei > crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64 > serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg uas > usb_storage dm_crypt dm_mod sr_mod cdrom sd_mod hid_logitech_hidpp > hid_logitech_> > Jan 01 17:50:16 computername kernel: CPU: 3 PID: 2154 Comm: > PeripBusCEC Not tainted 4.19.13-dirty #2 > Jan 01 17:50:16 computername kernel: Hardware name: To Be Filled By > O.E.M. 
To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015 > Jan 01 17:50:16 computername kernel: RIP: > 0010:__alloc_pages_nodemask+0xf74/0xfb0 > Jan 01 17:50:16 computername kernel: Code: ff 0f 0b e9 dc fc ff ff 0f > 0b 48 8b b4 24 80 00 00 00 8b 7c 24 18 44 89 f1 48 c7 c2 40 9e 4a b6 > e8 91 ef ff ff e9 d3 f1 ff ff <0f> 0b e9 a9 fc ff ff e8 c0 7f ea ff 85 > d2 0f 85 15 fd ff ff 48 c7 > Jan 01 17:50:16 computername kernel: RSP: 0018:999e032731e0 EFLAGS: > 00010202 > Jan 01 17:50:16 computername kernel: RAX: 8bbcbabc0040 RBX: > 0040 RCX: 0020 > Jan 01 17:50:16 computername kernel: RDX: RSI: > 0002 RDI: 8bbd9fdfc000 > Jan 01 17:50:16 computername kernel: RBP: 0020 R08: > 0040 R09: 0f82 > Jan 01 17:50:16 computername kernel: R10: 0020 R11: > R12: > Jan 01 17:50:16 computername kernel: R13: R14: > R15: > Jan 01 17:50:16 computername kernel: FS: 7f9515642700() > GS:8bbd9818() knlGS: > Jan 01 17:50:16 computername kernel: CS: 0010 DS: ES: CR0: > 80050033 > Jan 01 17:50:16 computername kernel: CR2: 7fdbd95b1000 CR3: > 00011087c003 CR4: 001626e0 > Jan 01 17:50:16 computername kernel: Call Trace: > Jan 01 17:50:16 computername kernel: ? ___slab_alloc+0x43f/0x630 > Jan 01 17:50:16 computername kernel: ? orc_find+0x108/0x190 > Jan 01 17:50:16 computername kernel: ? kmem_cache_alloc+0x1c5/0x210 > Jan 01 17:50:16 computername kernel: ? unwind_next_frame+0x2f8/0x460 > Jan 01 17:50:16 computername kernel: new_slab+0x2fb/0x6f0 > Jan 01 17:50:16 computername kernel: ? _raw_spin_unlock+0x16/0x30 > Jan 01 17:50:16 computername kernel: ? deactivate_slab.isra.27+0x5b4/0x690 > Jan 01 17:50:16 computername kernel: ___slab_alloc+0x43f/0x630 > Jan 01 17:50:16 computername kernel: ? alloc_extent_state+0x1f/0xd0 [btrfs] > Jan 01 17:50:16 computername kernel: ? create_object+0x43/0x2a0 > Jan 01 17:50:16 computername kernel: ? ___slab_alloc+0x58d/0x630 > Jan 01 17:50:16 computername kernel: ? 
create_object+0x43/0x2a0 > Jan 01 17:50:16 computername kernel: __slab_alloc.isra.28+0x52/0x70 > Jan 01 17:50:16 computername kernel: ? create_object+0x43/0x2a0 > Jan 01 17:50:16 computername kernel: kmem_cache_alloc+0x1c5/0x210 > Jan 01 17:50:16 computername kernel: ? alloc_extent_state+0x1f/0xd0 [btrfs] > Jan 01 17:50:16 computername kernel: create_object+0x43/0x2a0 > Jan 01 17:50:16 computername kernel: ? alloc_extent_state+0x1f/0xd0 [btrfs] > Jan 01 17:50:16 computername kernel: kmem_cache_alloc+0x1a6/0x210 > Jan 01 17:50:16 computername kernel: alloc_extent_state+0x1f/0xd0 [btrfs] > Jan 01 17:50:16 computername kernel: __clear_extent_bit+0x297/0x390 [btrfs] > Jan 01 17:50:16 computername
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
...But then again, maybe it wasn't the cable. It's acting up again. On Tue, Jan 1, 2019 at 3:14 PM Nathan Royce wrote: > > Looks like this particular issue may have been due to a touchy/finicky > connection. > > I removed my tuner from my hub and removed the hub from my > motherboard's USB and put my tuner in directly. > It STILL produced the error, but after I put everything back and > played around a little, the errors stopped. > > Just to be sure, I also rebooted and it's still fine. No xhci errors at all. > The only thing I've done recently (within the past few days) was play > with my scanner which is also on that hub and maybe brushed my tuner > cable or something.
Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13
kernel: FS: () GS:8bbd9800() knlGS: Jan 01 17:50:16 computername kernel: CS: 0010 DS: ES: CR0: 80050033 Jan 01 17:50:16 computername kernel: CR2: 5636c5b233d0 CR3: 000150a0a006 CR4: 001626f0 Jan 01 17:50:16 computername kernel: Call Trace: Jan 01 17:50:16 computername kernel: ? orc_find+0x108/0x190 Jan 01 17:50:16 computername kernel: ? unwind_next_frame+0x121/0x460 Jan 01 17:50:16 computername kernel: ? kcryptd_crypt+0x1d1/0x3a0 [dm_crypt] Jan 01 17:50:16 computername kernel: ? _raw_spin_lock+0x2e/0x40 Jan 01 17:50:16 computername kernel: ? _raw_spin_unlock+0x16/0x30 Jan 01 17:50:16 computername kernel: new_slab+0x2fb/0x6f0 Jan 01 17:50:16 computername kernel: ? _raw_spin_lock+0x13/0x40 Jan 01 17:50:16 computername kernel: ? deactivate_slab.isra.27+0x5b4/0x690 Jan 01 17:50:16 computername kernel: ___slab_alloc+0x43f/0x630 Jan 01 17:50:16 computername kernel: ? create_object+0x43/0x2a0 Jan 01 17:50:16 computername kernel: ? ___slab_alloc+0x58d/0x630 Jan 01 17:50:16 computername kernel: ? create_object+0x43/0x2a0 Jan 01 17:50:16 computername kernel: __slab_alloc.isra.28+0x52/0x70 Jan 01 17:50:16 computername kernel: ? create_object+0x43/0x2a0 Jan 01 17:50:16 computername kernel: kmem_cache_alloc+0x1c5/0x210 Jan 01 17:50:16 computername kernel: ? mempool_alloc+0x65/0x180 Jan 01 17:50:16 computername kernel: create_object+0x43/0x2a0 Jan 01 17:50:16 computername kernel: ? mempool_alloc+0x65/0x180 Jan 01 17:50:16 computername kernel: kmem_cache_alloc+0x1a6/0x210 Jan 01 17:50:16 computername kernel: ? wait_woken+0x80/0x80 Jan 01 17:50:16 computername kernel: mempool_alloc+0x65/0x180 Jan 01 17:50:16 computername kernel: ? crypt_convert+0x96b/0xf50 [dm_crypt] Jan 01 17:50:16 computername kernel: bio_alloc_bioset+0x14c/0x220 Jan 01 17:50:16 computername kernel: ? 
_raw_spin_lock_irqsave+0x25/0x50 Jan 01 17:50:16 computername kernel: kcryptd_crypt+0x1d1/0x3a0 [dm_crypt] Jan 01 17:50:16 computername kernel: process_one_work+0x1eb/0x410 Jan 01 17:50:16 computername kernel: worker_thread+0x2d/0x3d0 Jan 01 17:50:16 computername kernel: ? process_one_work+0x410/0x410 Jan 01 17:50:16 computername kernel: kthread+0x112/0x130 Jan 01 17:50:16 computername kernel: ? kthread_park+0x80/0x80 Jan 01 17:50:16 computername kernel: ret_from_fork+0x35/0x40 Jan 01 17:50:16 computername kernel: ---[ end trace 2a9048666fdb2311 ]--- Jan 01 17:50:16 computername kernel: kmemleak: Cannot allocate a kmemleak_object structure Jan 01 17:50:16 computername kernel: kmemleak: Kernel memory leak detector disabled Jan 01 17:50:16 computername kernel: kmemleak: Automatic memory scanning thread ended Jan 01 17:50:16 computername kernel: kmemleak: Kmemleak disabled without freeing internal data. Reclaim the memory with "echo clear > /sys/kernel/debug/kmemleak". Jan 01 17:50:25 computername plasmashell[1048]: qt.qpa.xcb: QXcbConnection: XCB error: 2 (BadValue), sequence: 47417, resource id: 71303170, major code: 142 (Unknown), minor code: 3 ***** On Tue, Jan 1, 2019 at 1:17 PM Nathan Royce wrote: > > Kernel 4.19.13 > > * > Jan 01 02:04:20 computername kernel: xhci_hcd :00:14.0: ERROR > unknown event type 37 > Jan 01 02:04:20 computername kernel: WARNING: CPU: 2 PID: 2236 at > mm/page_alloc.c:4254 __alloc_pages_nodemask+0xf52/0xfb0 > Jan 01 02:04:20 computername kernel: Modules linked in: rfcomm ccm > bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271 > au8522_dig au8522_common au0828 snd_usb_audio tveeprom snd_usbmidi_lib > dvb_core mousedev snd_rawmidi snd_seq_device btusb v4l2_common btrtl > vide> > Jan 01 02:04:20 computername kernel: llc intel_rapl_perf soundcore > alx i2c_i801 mdio evdev lpc_ich mei_me mei pcc_cpufreq mac_hid > crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64 > serpent_sse2_x86_64 serpent_generic xts 
algif_skcipher af_alg uas > usb_storage dm_c> > Jan 01 02:04:20 computername kernel: CPU: 2 PID: 2236 Comm: > MainLoopThread Tainted: GW 4.19.13-dirty #2 > Jan 01 02:04:20 computername kernel: Hardware name: To Be Filled By > O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015 > Jan 01 02:04:20 computername kernel: RIP: > 0010:__alloc_pages_nodemask+0xf52/0xfb0 > Jan 01 02:04:20 computername kernel: Code: c7 44 24 54 00 00 00 00 25 > ff ff f7 ff 89 44 24 18 e9 ea f3 ff ff 48 89 9c 24 80 00 00 00 e9 ad > f3 ff ff 0f 0b e9 dc fc ff ff <0f> 0b 48 8b b4 24 80 00 00 00 8b 7c 24 > 18 44 89 f1 48 c7 c2 40 9e > Jan 01 02:04:20 computername kernel: RSP: 0018:af9f81066e90 EFLAGS: > 00010046 > Jan 01 02:04:20 computername kernel: RAX: RBX: > 0040 RCX: > Jan 01 02:04:20 computername kernel: RDX: RSI: > 0002 RDI: 9d26dfdfc000 > Jan 01 02:04:20 computername
Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
Looks like this particular issue may have been due to a touchy/finicky connection. I removed my tuner from my hub and removed the hub from my motherboard's USB and put my tuner in directly. It STILL produced the error, but after I put everything back and played around a little, the errors stopped. Just to be sure, I also rebooted and it's still fine. No xhci errors at all. The only thing I've done recently (within the past few days) was play with my scanner which is also on that hub and maybe brushed my tuner cable or something. On Tue, Jan 1, 2019 at 12:57 PM Nathan Royce wrote: > > Kernel 4.19.13 > > 00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB > xHCI Controller > > Around 400 "unknown event type 37" messages logged in a 2 second span. > * > Jan 01 02:08:07 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 > QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT > Jan 01 02:08:00 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 > QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT > Jan 01 02:07:56 computername kernel: xhci_hcd :00:14.0: ERROR > unknown event type 37 > ... > Jan 01 02:07:55 computername kernel: xhci_hcd :00:14.0: ERROR > unknown event type 37 > Jan 01 02:07:55 computername kernel: xhci_hcd :00:14.0: ERROR > unknown event type 37 > Jan 01 02:07:52 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 > QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT > Jan 01 02:07:44 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 > QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT > * > > I question whether this also caused kmemleak to crash as well (will > post after this). > > Regarding my tv tuner, it isn't supported by the kernel specifically, > but is close enough that all I have to do is alter a single source > file to include my device's pid, and it works just fine almost all of > the time.
kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13
Kernel 4.19.13 * Jan 01 02:04:20 computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37 Jan 01 02:04:20 computername kernel: WARNING: CPU: 2 PID: 2236 at mm/page_alloc.c:4254 __alloc_pages_nodemask+0xf52/0xfb0 Jan 01 02:04:20 computername kernel: Modules linked in: rfcomm ccm bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271 au8522_dig au8522_common au0828 snd_usb_audio tveeprom snd_usbmidi_lib dvb_core mousedev snd_rawmidi snd_seq_device btusb v4l2_common btrtl vide> Jan 01 02:04:20 computername kernel: llc intel_rapl_perf soundcore alx i2c_i801 mdio evdev lpc_ich mei_me mei pcc_cpufreq mac_hid crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg uas usb_storage dm_c> Jan 01 02:04:20 computername kernel: CPU: 2 PID: 2236 Comm: MainLoopThread Tainted: GW 4.19.13-dirty #2 Jan 01 02:04:20 computername kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015 Jan 01 02:04:20 computername kernel: RIP: 0010:__alloc_pages_nodemask+0xf52/0xfb0 Jan 01 02:04:20 computername kernel: Code: c7 44 24 54 00 00 00 00 25 ff ff f7 ff 89 44 24 18 e9 ea f3 ff ff 48 89 9c 24 80 00 00 00 e9 ad f3 ff ff 0f 0b e9 dc fc ff ff <0f> 0b 48 8b b4 24 80 00 00 00 8b 7c 24 18 44 89 f1 48 c7 c2 40 9e Jan 01 02:04:20 computername kernel: RSP: 0018:af9f81066e90 EFLAGS: 00010046 Jan 01 02:04:20 computername kernel: RAX: RBX: 0040 RCX: Jan 01 02:04:20 computername kernel: RDX: RSI: 0002 RDI: 9d26dfdfc000 Jan 01 02:04:20 computername kernel: RBP: R08: 0040 R09: 0f82 Jan 01 02:04:20 computername kernel: R10: R11: R12: Jan 01 02:04:20 computername kernel: R13: R14: R15: Jan 01 02:04:20 computername kernel: FS: 7f7db94d5700() GS:9d26d810() knlGS: Jan 01 02:04:20 computername kernel: CS: 0010 DS: ES: CR0: 80050033 Jan 01 02:04:20 computername kernel: CR2: 92c9da10 CR3: 0001baefe002 CR4: 001626e0 Jan 01 02:04:20 computername kernel: Call Trace: Jan 01 02:04:20 
computername kernel: ? __dm_make_request.isra.18+0x3f/0xa0 [dm_mod] Jan 01 02:04:20 computername kernel: ? orc_find+0x108/0x190 Jan 01 02:04:20 computername kernel: ? do_try_to_free_pages+0xc6/0x370 Jan 01 02:04:20 computername kernel: new_slab+0x2fb/0x6f0 Jan 01 02:04:20 computername kernel: ? _raw_spin_lock+0x13/0x40 Jan 01 02:04:20 computername kernel: ? deactivate_slab.isra.27+0x5b4/0x690 Jan 01 02:04:20 computername kernel: ___slab_alloc+0x43f/0x630 Jan 01 02:04:20 computername kernel: ? create_object+0x43/0x2a0 Jan 01 02:04:20 computername kernel: ? ___slab_alloc+0x58d/0x630 Jan 01 02:04:20 computername kernel: ? create_object+0x43/0x2a0 Jan 01 02:04:20 computername kernel: __slab_alloc.isra.28+0x52/0x70 Jan 01 02:04:20 computername kernel: ? create_object+0x43/0x2a0 Jan 01 02:04:20 computername kernel: kmem_cache_alloc+0x1c5/0x210 Jan 01 02:04:20 computername kernel: ? mempool_alloc+0x65/0x180 Jan 01 02:04:20 computername kernel: create_object+0x43/0x2a0 Jan 01 02:04:20 computername kernel: ? mempool_alloc+0x65/0x180 Jan 01 02:04:20 computername kernel: kmem_cache_alloc+0x1a6/0x210 Jan 01 02:04:20 computername kernel: ? wait_woken+0x80/0x80 Jan 01 02:04:20 computername kernel: mempool_alloc+0x65/0x180 Jan 01 02:04:20 computername kernel: ? __process_bio+0x170/0x170 [dm_mod] Jan 01 02:04:20 computername kernel: bio_alloc_bioset+0x14c/0x220 Jan 01 02:04:20 computername kernel: ? create_object+0x249/0x2a0 Jan 01 02:04:20 computername kernel: ? __process_bio+0x170/0x170 [dm_mod] Jan 01 02:04:20 computername kernel: alloc_io+0x24/0x120 [dm_mod] Jan 01 02:04:20 computername kernel: __split_and_process_bio+0x53/0x1a0 [dm_mod] Jan 01 02:04:20 computername kernel: ? generic_make_request_checks+0x49a/0x6f0 Jan 01 02:04:20 computername kernel: ? blk_queue_enter+0x233/0x260 Jan 01 02:04:20 computername kernel: __dm_make_request.isra.18+0x3f/0xa0 [dm_mod] Jan 01 02:04:20 computername kernel: generic_make_request+0x1b9/0x3d0 Jan 01 02:04:20 computername kernel: ? 
__se_sys_madvise.cold.2+0xbd/0xbd Jan 01 02:04:20 computername kernel: submit_bio+0x45/0x140 Jan 01 02:04:20 computername kernel: __swap_writepage+0x133/0x3c0 Jan 01 02:04:20 computername kernel: ? __frontswap_store+0x6e/0xf0 Jan 01 02:04:20 computername kernel: shmem_writepage+0x229/0x310 Jan 01 02:04:20 computername kernel: pageout.isra.11+0x117/0x350 Jan 01 02:04:20 computername kernel: shrink_page_list+0x7ea/0xc80 Jan 01 02:04:20 computername kernel: shrink_inactive_list+0x29f/0x6b0 Jan 01 02:04:20 computername kernel: shrink_node_memcg+0x20f/0x780 Jan 01 02:04:20 computername kernel: shrink_node+0xcf/0x4a0 Jan 01 02:04:20 computernam
kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13
Kernel 4.19.13 00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller Around 400 "unknown event type 37" messages logged in a 2 second span. * Jan 01 02:08:07 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT Jan 01 02:08:00 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT Jan 01 02:07:56 computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37 ... Jan 01 02:07:55 computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37 Jan 01 02:07:55 computername kernel: xhci_hcd :00:14.0: ERROR unknown event type 37 Jan 01 02:07:52 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT Jan 01 02:07:44 computername tvheadend[2370]: linuxdvb: Auvitek AU8522 QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT * I wonder whether this also caused kmemleak to crash (will post after this). Regarding my tv tuner, it isn't supported by the kernel specifically, but is close enough that all I have to do is alter a single source file to include my device's pid, and it works just fine almost all of the time.
drivers/tty/serial/samsung.c s3c24xx_uart_copy_rx_to_tty
No idea why, but I will say that something I've done recently was re-enable my ath9k_htc wireless adapter, which tends to firmware-panic quite a bit, which in turn sometimes kills off my ppp usb adapter. I have a script running that monitors the journal and restarts hostapd every time my ath device firmware-panics and comes back alive. Same with netctl for my ppp. The first time I noticed this particular issue was yesterday when my ssh session became sluggish and I was eventually forced to pull power from my odroid. I only have 4 cores enabled, otherwise the usb issues increase. I cobbled together a setup with my ATX power supply providing power to everything (including usb devices and hubs). * Apr 27 10:27:39 computername kernel: [] (s3c64xx_serial_handle_irq) from [] (__handle_irq_event_percpu+0x50/0x11c) Apr 27 10:28:42 computername systemd-journald[22668]: Missed 5173 kernel messages Apr 27 10:28:42 computername kernel: CPU: 0 PID: 99 Comm: mmcqd/1 Tainted: G D W 4.14.0-dirty #6 Apr 27 10:28:42 computername kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) Apr 27 10:28:42 computername kernel: [] (unwind_backtrace) from [] (show_stack+0x10/0x14) Apr 27 10:28:42 computername kernel: [] (show_stack) from [] (dump_stack+0x88/0x9c) Apr 27 10:28:42 computername kernel: [] (dump_stack) from [] (__warn+0xe8/0x100) Apr 27 10:28:42 computername kernel: [] (__warn) from [] (warn_slowpath_null+0x20/0x28) Apr 27 10:28:42 computername kernel: [] (warn_slowpath_null) from [] (s3c24xx_uart_copy_rx_to_tty+0xa0/0xd4) Apr 27 10:28:42 computername kernel: [] (s3c24xx_uart_copy_rx_to_tty) from [] (s3c24xx_serial_rx_chars+0x14c/0x1b8) Apr 27 10:28:42 computername kernel: [] (s3c24xx_serial_rx_chars) from [] (s3c64xx_serial_handle_irq+0x48/0x60) Apr 27 10:28:42 computername kernel: [] (s3c64xx_serial_handle_irq) from [] (__handle_irq_event_percpu+0x50/0x11c) Apr 27 10:28:42 computername kernel: [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x2c/0x7c) Apr 
27 10:28:42 computername kernel: [] (handle_irq_event_percpu) from [] (handle_irq_event+0x38/0x5c) Apr 27 10:28:42 computername kernel: [] (handle_irq_event) from [] (handle_fasteoi_irq+0xa4/0x158) Apr 27 10:28:42 computername kernel: [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x24/0x34) Apr 27 10:28:42 computername kernel: [] (generic_handle_irq) from [] (__handle_domain_irq+0x5c/0xb4) Apr 27 10:28:42 computername kernel: [] (__handle_domain_irq) from [] (gic_handle_irq+0x3c/0x78) Apr 27 10:28:42 computername kernel: [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90) Apr 27 10:28:42 computername kernel: Exception stack(0xede8baf8 to 0xede8bb40) Apr 27 10:28:42 computername kernel: bae0: ede8bcf4 ede98c80 Apr 27 10:28:42 computername kernel: bb00: 0002 a00c0113 eddfec00 ede8bce0 ee20f904 ede8bd58 Apr 27 10:28:42 computername kernel: bb20: 0100 c0c02080 c0801550 ede8bb48 c074c7dc c074c7e0 600c0113 Apr 27 10:28:42 computername kernel: [] (__irq_svc) from [] (_raw_spin_unlock_irqrestore+0x10/0x14) Apr 27 10:28:42 computername kernel: [] (_raw_spin_unlock_irqrestore) from [] (dw_mci_request_end+0xa8/0xdc) Apr 27 10:28:42 computername kernel: [] (dw_mci_request_end) from [] (dw_mci_tasklet_func+0x31c/0x3dc) Apr 27 10:28:42 computername kernel: [] (dw_mci_tasklet_func) from [] (tasklet_action+0x7c/0x118) Apr 27 10:28:42 computername kernel: [] (tasklet_action) from [] (__do_softirq+0xe0/0x248) Apr 27 10:28:42 computername kernel: [] (__do_softirq) from [] (irq_exit+0xd8/0x140) Apr 27 10:28:42 computername kernel: [] (irq_exit) from [] (__handle_domain_irq+0x60/0xb4) Apr 27 10:28:42 computername kernel: [] (__handle_domain_irq) from [] (gic_handle_irq+0x3c/0x78) Apr 27 10:28:42 computername kernel: [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90) Apr 27 10:28:42 computername kernel: Exception stack(0xede8bc28 to 0xede8bc70) Apr 27 10:28:42 computername kernel: bc20: ede8bcf4 ede98c80 0001 7fff ede8bcf4 Apr 27 10:28:42 computername kernel: bc40: ede8a000 0002 
c0c05448 ede8bcf0 0100 ede8bc78 Apr 27 10:28:42 computername kernel: bc60: c074c7ec c074c7f0 600c0013 Apr 27 10:28:42 computername kernel: [] (__irq_svc) from [] (_raw_spin_unlock_irq+0xc/0x10) Apr 27 10:28:42 computername kernel: [] (_raw_spin_unlock_irq) from [] (wait_for_common+0xa0/0x168) Apr 27 10:28:42 computername kernel: [] (wait_for_common) from [] (mmc_wait_for_req_done+0x8c/0x110) Apr 27 10:28:42 computername kernel: [] (mmc_wait_for_req_done) from [] (mmc_wait_for_cmd+0x68/0x9c) Apr 27 10:28:42 computername kernel: [] (mmc_wait_for_cmd) from [] (__mmc_send_status+0x68/0x98) Apr 27 10:28:42 computername kernel: [] (__mmc_send_status) from [] (card_busy_detect+0x64/0x150) Apr 27 10:28:42 computername kernel: [] (card_busy_detect) from [] (mmc_blk_err_check+0x180/0x5bc) Apr 27 10:28:42 computername kernel: [] (mmc_blk_err_check) from []
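The journal-watching script mentioned in this message is not shown. A minimal sketch of the idea in Python follows, assuming the kernel prints `ath: firmware panic!` when the adapter dies and `ath9k_htc: FW Version` once the firmware has been reloaded (both strings appear in the logs in this thread). The function names and the `hostapd.service` unit name are assumptions for illustration, not the actual script.

```python
import subprocess
from typing import Callable, Iterable, Iterator

PANIC_MARKER = "ath: firmware panic!"    # logged when the adapter firmware dies
READY_MARKER = "ath9k_htc: FW Version"   # logged once the firmware is reloaded


def watch(lines: Iterable[str], restart: Callable[[], None]) -> int:
    """Call restart() each time the adapter comes back after a firmware panic.

    Returns the number of restarts that were triggered.
    """
    panicked = False
    restarts = 0
    for line in lines:
        if PANIC_MARKER in line:
            panicked = True
        elif panicked and READY_MARKER in line:
            restart()
            restarts += 1
            panicked = False
    return restarts


def restart_hostapd() -> None:
    # The unit name is an assumption; adjust it to the local setup.
    subprocess.run(["systemctl", "restart", "hostapd.service"], check=False)


def follow_kernel_journal() -> Iterator[str]:
    # journalctl: -k = kernel messages only, -f = follow, -o cat = message text only.
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    yield from proc.stdout


# Typical use (blocks forever, so it is left commented out here):
# watch(follow_kernel_journal(), restart_hostapd)
```

Keeping the matching logic in `watch` separate from the `journalctl` and `systemctl` calls means it can be tried against a saved log before being pointed at the live journal.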
Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
I finally got around to applying your patch and building the toolchain (based on master source (gcc8)), but alas, while there is no firmware panic in the log, wifi drops off the face of the planet (the ssid disappears and hostapd doesn't know wifi failed; nothing in the log either).

On Wed, Jun 7, 2017 at 5:39 PM, Tobias Diedrich wrote:
> Oleksij Rempel wrote:
>> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
>> > Oleksij Rempel wrote:
>> >> Yes, this is "normal" problem. The firmware has no error handler for PCI
>> >> bus related exceptions. So if we filed to read PCI bus first time, we
>> >> have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
>> >> and provide an kernel "firmware panic!" message.
>> >> Every one who can or will to fix this, is welcome.
>> >>
>> >>> *
>> >>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
>> >>> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
>> > [...]
>> >
>> >> memdmp 50ae78 50ae88
>> >
>> > 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940 l...j..@
>> >
>> > [...copy to bin...]
>> > $ bin/objdump -b binary -m xtensa -D /tmp/memdump.bin
>> > [..]
>> >0: 6c1004 entry a1, 32
>> >3: 126aa2 l32r a2, 0xfffdaa8c
>> >6: 0c0200 memw
>> >9: 8820 l32i.n a8, a2, 0 <--Exception cause
>> > PC still points at load
>> >b: c020 movi.n a2, 0
>> >d: 081940 extui a9, a8, 1, 1
>> >
>> > Judging from that it should be fairly simple to at least implement
>> > some sort of retry, possible after triggering a PCIe link retrain?
>>
>> I assume, yes.
>>
>> > There are some related PCIe root complex registers that may point to
>> > what exactly failed if they were dumped.
>> >
>> > The root complex registers live at 0x0004 and I think match the
>> > registers described for the root complex in the AR9344 datasheet.
>>
>> Suddenly I don't have ar7010 docs to tell..
>>
>> > PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
>> > "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
>> > the hierarchy reports any of the following errors and the associated
>> > enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
>> > ERR_NONFATAL."
>> >
>> > AFAICS link retrain can be done by setting bit3 (INIT_RST,
>> > "Application request to initiate a training reset") in
>> > PCIE_APP (0x4).
>> >
>> > See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
>> > flips some bits in the RC to enable the PCIe bus for reading the
>> > EEPROM).
>> >
>> > The root complex pci configuration space is at 0x2 which could
>> > have further error details:
>> >> memdmp 2 20200
>> >
>> > 02: a02a 168c 0010 0006 0001 0001 .*..
>> > 020010:
>> > 020020:
>> > 020030: 0040 01ff ...@
>> > 020040: 5bc3 5001 [.P.
>> > 020050: 0080 7005 ..p.
>> > 020060:
>> > 020070: 0042 0010 8701 2010 0013 4411 .BD.
>> > 020080: 3011 00c0 03c0 0...
>> > 020090: 0010
>> > 0200a0:
>> > 0200b0:
>> > 0200c0:
>> > 0200d0:
>> > 0200e0:
>> > 0200f0:
>> > 020100: 1401 0001 0006 2030 ...0
>> > 020110: 2000 00a0
>> > 020120:
>> > 020130:
>> > 020140: 0001 0002
>> > 020150: 8000 00ff
>> > 020160:
>> > 020170:
>> > 020180:
>> > 020190:
>> > 0201a0:
>> > 0201b0:
>> > 0201c0:
>> > 0201d0:
>> > 0201e0:
>> > 0201f0:
Re: kernel BUG at fs/btrfs/ctree.c:3182 - occurred during heavy NFS transfer
I'm guessing this is related. I noticed my tv wasn't recording to my drive, and when I tried to touch a file on the drive, my console became unresponsive. Trying to reboot took like 5 minutes to even stop the processes, and in the end it couldn't unmount the drive and I had to cut the power to finally get it to boot.

Nov 01 17:41:42 dd kernel: [ cut here ]
Nov 01 17:41:42 dd kernel: WARNING: CPU: 0 PID: 227 at fs/btrfs/file.c:547 btrfs_drop_extent_cache+0x4b4/0x4e8 [btrfs]
Nov 01 17:41:42 dd kernel: Modules linked in: arc4 tda18271 au8522_dig au8522_common ath9k_htc ath9k_common au0828 btusb v4l2_common ath9k_hw btintel videobuf2_vmalloc btbcm videobuf2_memops tveeprom bluetooth ath dvb_core videobuf2_v4l2 videodev mac80211 ecdh_generic vi
Nov 01 17:41:42 dd kernel: CPU: 0 PID: 227 Comm: mount Not tainted 4.13.0-dirty #2
Nov 01 17:41:42 dd kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Nov 01 17:41:42 dd kernel: [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
Nov 01 17:41:42 dd kernel: [] (show_stack) from [] (dump_stack+0x88/0x9c)
Nov 01 17:41:42 dd kernel: [] (dump_stack) from [] (__warn+0xe8/0x100)
Nov 01 17:41:42 dd kernel: [] (__warn) from [] (warn_slowpath_null+0x20/0x28)
Nov 01 17:41:42 dd kernel: [] (warn_slowpath_null) from [] (btrfs_drop_extent_cache+0x4b4/0x4e8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_drop_extent_cache [btrfs]) from [] (__btrfs_drop_extents+0x618/0x1000 [btrfs])
Nov 01 17:41:42 dd kernel: [] (__btrfs_drop_extents [btrfs]) from [] (btrfs_drop_extents+0x60/0x80 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_drop_extents [btrfs]) from [] (replay_one_extent+0x718/0x818 [btrfs])
Nov 01 17:41:42 dd kernel: [] (replay_one_extent [btrfs]) from [] (replay_one_buffer+0x248/0x780 [btrfs])
Nov 01 17:41:42 dd kernel: [] (replay_one_buffer [btrfs]) from [] (walk_down_log_tree+0x144/0x38c [btrfs])
Nov 01 17:41:42 dd kernel: [] (walk_down_log_tree [btrfs]) from [] (walk_log_tree+0xd0/0x1e8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (walk_log_tree [btrfs]) from [] (btrfs_recover_log_trees+0x21c/0x49c [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_recover_log_trees [btrfs]) from [] (open_ctree+0x232c/0x2400 [btrfs])
Nov 01 17:41:42 dd kernel: [] (open_ctree [btrfs]) from [] (btrfs_mount+0xecc/0xfa8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_mount [btrfs]) from [] (mount_fs+0x2c/0x164)
Nov 01 17:41:42 dd kernel: [] (mount_fs) from [] (vfs_kern_mount.part.3+0x48/0xe0)
Nov 01 17:41:42 dd kernel: [] (vfs_kern_mount.part.3) from [] (btrfs_mount+0x350/0xfa8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_mount [btrfs]) from [] (mount_fs+0x2c/0x164)
Nov 01 17:41:42 dd kernel: [] (mount_fs) from [] (vfs_kern_mount.part.3+0x48/0xe0)
Nov 01 17:41:42 dd kernel: [] (vfs_kern_mount.part.3) from [] (do_mount+0x1a8/0xc44)
Nov 01 17:41:42 dd kernel: [] (do_mount) from [] (SyS_mount+0x54/0xc0)
Nov 01 17:41:42 dd kernel: [] (SyS_mount) from [] (__sys_trace_return+0x0/0x10)
Nov 01 17:41:42 dd kernel: ---[ end trace 35a26e49cc780cf9 ]---
kernel BUG at fs/btrfs/ctree.c:3182 - occurred during heavy NFS transfer
ODroid XU4 Arch Linux Kernel 4.13 (custom) 4TB USB 3.0 mechanical WD Drive/hub (had bad-block issues in the past that were "corrected") Occurred when using rsync to copy files to an encfs mount over nfs (only 22MB made it). Note, I keep the activity on my odroid very low or things start to bug out left and right. I even set the kernel config so only the 4 LITTLE ARM cores are used rather than include the other 4 big cores. Heavy IO such as from an rsync is one such thing to cause a bugout it seems. Surprisingly, the whigout didn't cause my drive to remount RO. Nov 01 09:43:46 dd kernel: [ cut here ] Nov 01 09:43:46 dd kernel: kernel BUG at fs/btrfs/ctree.c:3182! Nov 01 09:43:46 dd kernel: Internal error: Oops - BUG: 0 [#1] SMP ARM Nov 01 09:43:46 dd kernel: Modules linked in: nf_conntrack_netlink nfnetlink cmac ccm ppp_deflate ppp_async ppp_generic slhc bridge stp llc nf_log_ipv4 ipt_REJECT nf_reject_ipv4 xt_recent nf_log_ipv6 iptable_filter nf_log_common ipt_MASQUERADE xt_LOG nf_nat_masquerade_ip Nov 01 09:43:46 dd kernel: usbserial btrfs xor xor_neon lzo_compress lzo_decompress zlib_deflate raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace crypto_user sunrpc ip_tables x_tables Nov 01 09:43:46 dd kernel: CPU: 3 PID: 476 Comm: nfsd Not tainted 4.13.0-dirty #2 Nov 01 09:43:46 dd kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) Nov 01 09:43:46 dd kernel: task: e9987080 task.stack: e8c28000 Nov 01 09:43:46 dd kernel: PC is at btrfs_set_item_key_safe+0x138/0x144 [btrfs] Nov 01 09:43:46 dd kernel: LR is at comp_keys+0x4c/0x68 [btrfs] Nov 01 09:43:46 dd kernel: pc : []lr : [] psr: 60010013 Nov 01 09:43:46 dd kernel: sp : e8c29890 ip : 006c fp : 0001 Nov 01 09:43:46 dd kernel: r10: ec8ce000 r9 : d75feb40 r8 : e8c29893 Nov 01 09:43:46 dd kernel: r7 : c52816c8 r6 : c0c05448 r5 : 002e r4 : e8c29986 Nov 01 09:43:46 dd kernel: r3 : 00040d00 r2 : 00040d00 r1 : e8c29986 r0 : Nov 01 09:43:46 dd kernel: Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment 
none Nov 01 09:43:46 dd kernel: Control: 10c5387d Table: 4bc2806a DAC: 0051 Nov 01 09:43:46 dd kernel: Process nfsd (pid: 476, stack limit = 0xe8c28218) Nov 01 09:43:46 dd kernel: Stack: (0xe8c29890 to 0xe8c2a000) Nov 01 09:43:47 dd kernel: 9880: 8b002000 365c 6c00 0001 Nov 01 09:43:47 dd kernel: 98a0: 00040d00 c52816c8 a000 286a e8c29975 Nov 01 09:43:47 dd kernel: 98c0: d75feb40 bf19a130 00365c8b a000 0001 Nov 01 09:43:47 dd kernel: 98e0: dbe24000 bf1ab0ec c52816c8 00011000 00365c8b Nov 01 09:43:47 dd kernel: 9900: e99f5800 286a 0001 e9335248 Nov 01 09:43:47 dd kernel: 9920: e73472d0 a000 Nov 01 09:43:48 dd kernel: 9940: ec8ce000 bf221f04 c0c05448 Nov 01 09:43:48 dd kernel: 9960: e8c29a7c 0004 365c8b10 a0006c00 Nov 01 09:43:48 dd kernel: 9980: 5c8b 0036 006c 0100 c500 d0bf41e0 e93350a8 Nov 01 09:43:48 dd kernel: 99a0: 1000 00365c8b 00a0006c c5281b00 00040d00 Nov 01 09:43:48 dd kernel: 99c0: d75feb44 c52816c8 c24da070 6000 a000 0001 Nov 01 09:43:48 dd kernel: 99e0: bf1cdb60 a000 0001 Nov 01 09:43:48 dd kernel: 9a00: 0001 0035 e8c29a7c eb78f9ab 6000 e8c29a8c c24da000 Nov 01 09:43:48 dd kernel: 9a20: e8c29b74 e93350b4 e9335080 e93350a8 e99f5800 e9335248 Nov 01 09:43:49 dd kernel: 9a40: a000 a000 ee22d000 ec8ce000 e73472d0 d75feb40 Nov 01 09:43:49 dd kernel: 9a60: e8c29cf8 e9335220 c0c05448 e8c29aa0 c52816c8 Nov 01 09:43:49 dd kernel: 9a80: c24da9d0 c24da9d0 e8c29a8c 2000 c5281bd8 c5281bec Nov 01 09:43:49 dd kernel: 9aa0: c07289cc c5281bd8 00040d00 d75feb44 e9335080 Nov 01 09:43:49 dd kernel: 9ac0: d75feb40 d75fe2d0 ffef c0c05448 e73472d0 bf1ce878 e8c29b74 e8c29cf8 Nov 01 09:43:49 dd kernel: 9ae0: 8000 9007 00365c8b e8c29b20 f0802000 Nov 01 09:43:49 dd kernel: 9b00: d75feb40 c0162e10 60010013 e99f5800 c0c57dd8 ee22d000 Nov 01 09:43:49 dd kernel: 9b20: e9335248 ec8ce000 e8c29cf8 e000 e9ad8c80 e9335118 Nov 01 09:43:49 dd kernel: 9b40: c0c57dd8 e000 ee22d298 0001 e9ad8c80 e8c29b70 e8c28000 c0162e10 Nov 01 09:43:50 dd kernel: 9b60: 60010013 ee22d284 e8c29b74 e8c29b74 
8bc96be0 Nov 01 09:43:50 dd kernel: 9b80: 365c 0100 00a0 c0235cc4 e73472d0 Nov 01 09:43:50 dd kernel: 9ba0: ecc24390 00040d00 ee991000 c0c05448 ee22d000 ec8ce240 e9335080 ec8ce240 Nov 01 09:43:50 dd
Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
On Sat, Jun 3, 2017 at 2:57 AM, Oleksij Rempel wrote:
> Hm... this function and file:
> linux/drivers/net/wireless/ath/ath9k/common-beacon.c
> didn't changed since 2015. So, it should be some thing different.
> Can you run
> git bisect to find exact patch caused this regression?
>
That was the first time I experienced the x/0 issue and I don't know how I'd reproduce it.

> Yes, this is "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we filed to read PCI bus first time, we
> have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> and provide an kernel "firmware panic!" message.
> Every one who can or will to fix this, is welcome.
>
Thanks for that explanation. I'm not sure it's something I could tackle though. My best bet in the meantime is to coax systemd to restart the services when the device pops up. However, every attempt has failed so far.

> It is possible. If adapter is used in AP mode, then lots of WiFi noise
> is dumped over this interface. I assume the reproducibility depends on
> external environment, not internal.
>
I find this quite believable. I have 2.4 GHz happening with the TP-Link, ZTE Mobley, Bluetooth, Logitech Unifying, USB 3.0. Though all 4 devices are going through the USB 2.0 port, and the TP-Link and Mobley have long USB cables in the attic and the hub connects through a 2m USB extension. So yeah, I've got a lot of variables in play.
ath9k - Division by zero in kernel (as well as firmware panic)
ODroid XU4 $ uname -a Linux computer 4.12.0-rc3-dirty #1 SMP Wed May 31 15:02:05 CDT 2017 armv7l GNU/Linux $ lsusb ... Bus 001 Device 002: ID 2109:2813 VIA Labs, Inc. Bus 001 Device 010: ID 0cf3:7015 Qualcomm Atheros Communications TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros AR7010+AR9287] ... * Jun 02 16:20:11 computer hostapd[14954]: vwlan0: interface state COUNTRY_UPDATE->HT_SCAN Jun 02 16:20:17 computer hostapd[14954]: 20/40 MHz operation not permitted on channel pri=7 sec=3 based on overlapping BSSes Jun 02 16:20:18 computer kernel: Division by zero in kernel. Jun 02 16:20:18 computer kernel: CPU: 1 PID: 14507 Comm: kworker/u16:2 Tainted: GW 4.12.0-rc3-dirty #1 Jun 02 16:20:18 computer kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) Jun 02 16:20:18 computer kernel: Workqueue: phy5 ieee80211_scan_work [mac80211] Jun 02 16:20:18 computer kernel: [] (unwind_backtrace) from [] (show_stack+0x10/0x14) Jun 02 16:20:18 computer kernel: [] (show_stack) from [] (dump_stack+0x88/0x9c) Jun 02 16:20:18 computer kernel: [] (dump_stack) from [] (Ldiv0_64+0x8/0x18) Jun 02 16:20:18 computer kernel: [] (Ldiv0_64) from [] (ath9k_get_next_tbtt+0x58/0x5c [ath9k_common]) Jun 02 16:20:18 computer kernel: [] (ath9k_get_next_tbtt [ath9k_common]) from [] (ath9k_cmn_beacon_config Jun 02 16:20:18 computer kernel: [] (ath9k_cmn_beacon_config_ap [ath9k_common]) from [] (ath9k_htc_beacon Jun 02 16:20:18 computer kernel: [] (ath9k_htc_beacon_config_ap [ath9k_htc]) from [] (ath9k_htc_vif_recon Jun 02 16:20:18 computer kernel: [] (ath9k_htc_vif_reconfig [ath9k_htc]) from [] (ath9k_htc_sw_scan_compl Jun 02 16:20:18 computer kernel: [] (ath9k_htc_sw_scan_complete [ath9k_htc]) from [] (__ieee80211_scan_co Jun 02 16:20:18 computer kernel: [] (__ieee80211_scan_completed [mac80211]) from [] (ieee80211_scan_work+ Jun 02 16:20:18 computer kernel: [] (ieee80211_scan_work [mac80211]) from [] (process_one_work+0x1d8/0x40 Jun 02 16:20:18 computer kernel: [] (process_one_work) from 
[] (worker_thread+0x4c/0x564) Jun 02 16:20:18 computer kernel: [] (worker_thread) from [] (kthread+0x14c/0x154) Jun 02 16:20:18 computer kernel: [] (kthread) from [] (ret_from_fork+0x14/0x3c) Jun 02 16:20:18 computer hostapd[14954]: Using interface wlan0 with hwaddr and ssid "" Jun 02 16:20:18 computer kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vwlan0: link becomes ready * This is a new one on me. The "normal" problem (search shows to be a very old issue) I consistently (daily or multiple times/day) encounter is: * Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic! exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038. Jun 02 14:55:30 computer kernel: usb 1-1.1: USB disconnect, device number 9 Jun 02 14:55:30 computer systemd-networkd[11959]: vwlan0: Lost carrier Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state Jun 02 14:55:30 computer kernel: wlan0: deauthenticating from by local choice (Reason: 3=DEAUTH_LEAVING) Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us Jun 02 14:55:30 computer systemd-networkd[11959]: wlan0: Lost carrier Jun 02 14:55:30 computer systemd[1]: Stopping A simple WPA encrypted wireless connection using a static IP... -- Subject: Unit netctl@wlan0.service has begun shutting down -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit netctl@wlan0.service has begun shutting down. 
Jun 02 14:55:30 computer kernel: device vwlan0 left promiscuous mode Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state Jun 02 14:55:30 computer audit: ANOM_PROMISCUOUS dev=vwlan0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295 Jun 02 14:55:30 computer hostapd[13218]: vwlan0: AP-STA-DISCONNECTED Jun 02 14:55:30 computer hostapd[13218]: Failed to set beacon parameters Jun 02 14:55:30 computer hostapd[13218]: vwlan0: INTERFACE-DISABLED Jun 02 14:55:30 computer kernel: usb 1-1.1: ath9k_htc: USB layer deinitialized Jun 02 14:55:30 computer systemd[1]: Starting Load/Save RF Kill Switch Status... -- Subject: Unit systemd-rfkill.service has begun start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit systemd-rfkill.service has begun starting up. Jun 02 14:55:30 computer systemd[1]: Started Load/Save RF Kill Switch Status. -- Subject: Unit systemd-rfkill.service has finished start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit systemd-rfkill.service has finished starting up. -- -- The start-up result is done. Jun 02 14:55:30 computer network[13261]: Stopping network profile 'wlan0'... Jun 02 14:55:30 computer kernel: usb 1-1.1: new high-speed USB devic
ath: firmware panic! exccause: 0x0000000d
I find that every time all of the cpu cores are being used, when compiling the kernel source for example, I end up losing my wireless adapter. It seems to be an old issue: https://bbs.archlinux.org/viewtopic.php?id=182173 ARM ODroid XU4 $ uname -a Linux server 4.11.0-rc1-00315-g106e4da60209-dirty #1 SMP Sun Mar 12 16:44:41 CDT 2017 armv7l GNU/Linux $ lsusb ... Bus 003 Device 009: ID 0cf3:7015 Qualcomm Atheros Communications TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros A R7010+AR9287] ... * Mar 27 02:48:49 server kernel: usb 3-1.2.4: ath: firmware panic! exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038 Mar 27 02:48:49 server kernel: usb 3-1.2.4: USB disconnect, device number 7 Mar 27 02:48:49 server kernel: ath: phy0: Chip reset failed Mar 27 02:48:49 server kernel: ath: phy0: Unable to reset channel (2442 Mhz) reset status -22 Mar 27 02:48:49 server kernel: ath: phy0: Unable to set channel Mar 27 02:48:49 server kernel: ath: phy0: RX failed to go idle in 10 ms RXSM=0x4ceb Mar 27 02:48:49 server kernel: ath: phy0: Failed to wakeup in 500us Mar 27 02:48:49 server kernel: ath: phy0: RX failed to go idle in 10 ms RXSM=0x4ceb Mar 27 02:48:49 server kernel: ath: phy0: Failed to wakeup in 500us Mar 27 02:48:50 server kernel: br0: port 2(custom_wlan0) entered disabled state Mar 27 02:48:50 server audit: ANOM_PROMISCUOUS dev=custom_wlan0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967 Mar 27 02:48:50 server kernel: device custom_wlan0 left promiscuous mode Mar 27 02:48:50 server kernel: br0: port 2(custom_wlan0) entered disabled state Mar 27 02:48:50 server hostapd[422]: custom_wlan0: AP-STA-DISCONNECTED Mar 27 02:48:50 server systemd-networkd[414]: custom_wlan0: Lost carrier Mar 27 02:48:50 server hostapd[422]: Failed to set beacon parameters Mar 27 02:48:50 server hostapd[422]: custom_wlan0: INTERFACE-DISABLED Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: USB layer deinitialized Mar 27 02:48:50 server systemd[1]: Starting Load/Save RF 
Kill Switch Status... -- Subject: Unit systemd-rfkill.service has begun start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit systemd-rfkill.service has begun starting up. Mar 27 02:48:50 server systemd[1]: Started Load/Save RF Kill Switch Status. -- Subject: Unit systemd-rfkill.service has finished start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit systemd-rfkill.service has finished starting up. -- -- The start-up result is done. Mar 27 02:48:50 server kernel: usb 3-1.2.4: new high-speed USB device number 9 using xhci-hcd Mar 27 02:48:50 server kernel: usb 3-1.2.4: New USB device found, idVendor=0cf3, idProduct=7015 Mar 27 02:48:50 server kernel: usb 3-1.2.4: New USB device strings: Mfr=16, Product=32, SerialNumber=48 Mar 27 02:48:50 server kernel: usb 3-1.2.4: Product: USB WLAN Mar 27 02:48:50 server kernel: usb 3-1.2.4: Manufacturer: ATHEROS Mar 27 02:48:50 server kernel: usb 3-1.2.4: SerialNumber: 12345 Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: Firmware ath9k_htc/htc_7010-1.4.0.fw requested Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: Transferred FW: ath9k_htc/htc_7010-1.4.0.fw, size: 72812 Mar 27 02:48:50 server kernel: ath9k_htc 3-1.2.4:1.0: ath9k_htc: HTC initialized with 45 credits Mar 27 02:48:50 server kernel: ath9k_htc 3-1.2.4:1.0: ath9k_htc: FW Version: 1.4 ... *
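To compare panics across reports, the `firmware panic!` line can be picked apart mechanically. A minimal Python sketch follows, assuming the line keeps the `exccause: ...; pc: ...; badvaddr: ...` layout shown in the log above; `parse_panic` is a hypothetical helper name, not part of any driver or tool.

```python
import re
from typing import Dict, Optional

# Matches the register dump that ath9k_htc logs on a firmware panic.
PANIC_RE = re.compile(
    r"ath: firmware panic!\s+"
    r"exccause: (?P<exccause>0x[0-9a-fA-F]+); "
    r"pc: (?P<pc>0x[0-9a-fA-F]+); "
    r"badvaddr: (?P<badvaddr>0x[0-9a-fA-F]+)"
)


def parse_panic(line: str) -> Optional[Dict[str, int]]:
    """Return exccause, pc and badvaddr as integers, or None if the line is not a panic."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    return {name: int(value, 16) for name, value in match.groupdict().items()}
```

Collecting the parsed values over a few days would show whether the firmware always faults at the same `pc` and `badvaddr`, or whether there is more than one failure site.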
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
Sure, I went ahead and rebuilt it just using the bare exynos_defconfig and adding XTS and ECB and no other changes. No flags were used. No patches were used other than the 2 you provided. Just the barest of bears, the barest of bones, the barest of deserts, the barest of hairless cats.

I also wiped out the 4.10.1 modules directory and zImage and dtb before copying them into place.

*
[ 16.280951] s5p-jpeg 11f6.jpeg: Samsung S5P JPEG codec
[ 16.327434] CPU: 3 PID: 115 Comm: irq/69-1083 Not tainted 4.10.1-dirty #1
[ 16.334527] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 16.340533] task: edc52d00 task.stack: edcc
[ 16.345040] PC is at post_crypt+0x194/0x1a0 [xts]
[ 16.349712] LR is at post_crypt+0x188/0x1a0 [xts]
[ 16.354390] pc : []lr : []psr: 200d0113
[ 16.354390] sp : edcc1ea8 ip : ed6f38f4 fp : 30702272
[ 16.365838] r10: 8ee5436d r9 : r8 : ed6f3800
[ 16.371023] r7 : r6 : 0400 r5 : r4 :
[ 16.377523] r3 : ef5ead22 r2 : 0200 r1 : 0200 r0 :
[ 16.384024] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 16.391128] Control: 10c5387d Table: 6d6f806a DAC: 0051
[ 16.396847] Process irq/69-1083 (pid: 115, stack limit = 0xedcc0210)
[ 16.403519] Stack: (0xedcc1ea8 to 0xedcc2000)
[ 16.407853] 1ea0: c0c08304 ef5ead20 ecd69200 ef5ead20 ecd69200 ed6f39dc
[ 16.416011] 1ec0: 0400 0400 c010f774 c0113bac
[ 16.424156] 1ee0: 0010 0010 000f
[ 16.432302] 1f00: ed6f3800 edcae3bc 000c edcae3e8 600d0113 ee889d5c bf182764
[ 16.440447] 1f20: edcae390 c0566d84 0001 edcacec0 eea14b00 eea14b00
[ 16.448592] 1f40: edcacec0 c01651c4 eeb00528 c01651e0 edcc edcacee4 c01654b4
[ 16.456738] 1f60: c01652b8 eeb00500 edcc edcacf00 edcacec0 c0165388
[ 16.464884] 1f80: eeb00528 c013673c edcc edcacf00 c0136634
[ 16.473029] 1fa0: c0107778
[ 16.481174] 1fc0:
[ 16.489320] 1fe0: 0013
[ 16.497473] [] (post_crypt [xts]) from [] (decrypt_done+0x4c/0x54 [xts])
[ 16.505877] [] (decrypt_done [xts]) from [] (s5p_aes_interrupt+0x1bc/0x208)
[ 16.514544] [] (s5p_aes_interrupt) from [] (irq_thread_fn+0x1c/0x54)
[ 16.522592] [] (irq_thread_fn) from [] (irq_thread+0x12c/0x1e0)
[ 16.530220] [] (irq_thread) from [] (kthread+0x108/0x138)
[ 16.537324] [] (kthread) from [] (ret_from_fork+0x14/0x3c)
[ 16.544514] Code: eb471ad2 e598c118 e58d0020 e1a04000 (e5906004)
[ 16.550709] ---[ end trace 0e5ce4ea2ad2d7e2 ]---
[ 16.555224] genirq: exiting task "irq/69-1083" (115) is an active IRQ thread (irq 69)
*

I'm sure you could just copy my crypttab and fstab entries that are shown in my first email.

On Fri, Mar 10, 2017 at 12:06 PM, Krzysztof Kozlowski wrote:
> On Thu, Mar 09, 2017 at 05:16:35AM -0600, Nathan Royce wrote:
>> Gave it a try on 4.10.1, but still to no avail:
>
> (...)
>
>> Also for the sake of testing, I did not add any FLAGS for compilation this
>> time.
>
> Damn, I am fixing bugs around but not the one you are hitting. Can you
> also check if exynos_defconfig (+XTS + any other needed settings for
> you) also has this issue?
>
> I want to reproduce it but my setup does not use cryptswap. Probably I
> will have to set it up.
>
> Best regards,
> Krzysztof
>
config_s5psss.tar.gz
Description: GNU Zip compressed data
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
Gave it a try on 4.10.1, but still to no avail: * [8.516138] raid6: using intx1 recovery algorithm [ [0;32m OK [0m] Started Flush Journal to Persistent Storage. [9.692091] Unable to handle kernel NULL pointer dereference at virtual address 0004 [9.698896] pgd = c0004000 [9.701489] [0004] *pgd= [9.705055] Internal error: Oops: 17 [#1] SMP ARM [9.709677] Modules linked in: xor_neon zlib_deflate aes_arm raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables [9.719177] xor: measuring software checksum speed [9.727455] CPU: 2 PID: 121 Comm: irq/69-1083 Not tainted 4.10.1-dirty #1 [9.728911]arm4regs : 304.000 MB/sec [9.738707] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [9.738913]8regs : 224.000 MB/sec [9.748924]32regs: 208.000 MB/sec [9.753095] task: edc80b00 task.stack: edd08000 [9.757626] PC is at post_crypt+0x1b4/0x1c4 [9.758914]neon : 316.000 MB/sec [9.758927] xor: using function: neon (316.000 MB/sec) [9.771040] LR is at post_crypt+0x1a8/0x1c4 [9.775197] pc : []lr : []psr: 200c0013 [9.775197] sp : edd09e90 ip : edcd64f4 fp : 02cfca75 [9.786670] r10: 3df4074e r9 : c0c0540c r8 : edcd6400 [9.791831] r7 : r6 : 0400 r5 : r4 : [9.798333] r3 : ef4a775a r2 : 0200 r1 : 0200 r0 : [9.804834] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [9.811901] Control: 10c5387d Table: 6c61c06a DAC: 0051 [9.817618] Process irq/69-1083 (pid: 121, stack limit = 0xedd08218) [9.824291] Stack: (0xedd09e90 to 0xedd0a000) [9.828624] 9e80: ef4a7758 ecca6200 ef4a7758 ecca6200 [9.836781] 9ea0: edcd65dc 0400 0400 eea8f810 0002 [9.844926] 9ec0: 0010 0010 [9.853072] 9ee0: 000f 00040a01 ee958390 edcd6400 ee9583bc 000c ee9583e8 [9.861217] 9f00: 600c0013 ee889d20 c033608c ee958390 c05a7ea8 0001 [9.869363] 9f20: ee957b40 eea8a400 eea8a400 ee957b40 c016ee68 c0c0540c c016ee84 [9.877508] 9f40: edd08000 ee957b64 eea8a400 c016f198 ee957b80 c016ef7c 00040a01 [9.885653] 9f60: eea21380 edd08000 ee957b80 ee957b40 c016f04c eea213a8 [9.893800] 9f80: ee889d20 
c0138710 edd08000 ee957b80 c0138608 [9.901944] 9fa0: c0107a38 [9.910089] 9fc0: [9.918235] 9fe0: 0013 [9.926399] [] (post_crypt) from [] (decrypt_done+0x4c/0x54) [9.933761] [] (decrypt_done) from [] (s5p_aes_interrupt+0x1bc/0x208) [9.941908] [] (s5p_aes_interrupt) from [] (irq_thread_fn+0x1c/0x54) [9.949956] [] (irq_thread_fn) from [] (irq_thread+0x14c/0x204) [9.957585] [] (irq_thread) from [] (kthread+0x108/0x138) [9.964681] [] (kthread) from [] (ret_from_fork+0x14/0x3c) [9.971871] Code: eb0114aa e598c118 e58d001c e1a04000 (e5906004) [9.977963] ---[ end trace 8c160bf6676cfe1c ]--- [9.982560] genirq: exiting task "irq/69-1083" (121) is an active IRQ thread (irq 69) [ 11.715339] Btrfs loaded, crc32c=crc32c-generic * Also for the sake of testing, I did not add any FLAGS for compilation this time. On Wed, Mar 8, 2017 at 3:15 PM, Krzysztof Kozlowski wrote: > On Wed, Mar 08, 2017 at 07:45:42PM +0200, Krzysztof Kozlowski wrote: > I sent a fix. At least for spin lock recursion in tcrypt. > > Could you give it a try? > > Best regards, > Krzysztof
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
OK, I just tried 4.10.0 and the output is looking the same.

I can't say my setup is all that odd. The cryptographic use is only with the swap partition found in my original email (seen in Herbert's reply).

My normal build goes as such:
1) git clean -xdf
2) git reset --hard
3) curl https://github.com/tobetter/linux/commit/9cdf86bac1db2d74bf98508226e86679581f8f80.patch | git apply - //usb: host: xhci-plat: Get PHYs for xhci's hcds
4) curl https://github.com/tobetter/linux/commit/142cf1b68fa0e1710f3623875d5c269cbbc2f005.patch | git apply - //base: platform: name the device already during allocation
5) curl https://github.com/tobetter/linux/commit/3772f11d73289ea40825f40ba5c64b5b0e3888ff.patch | git apply - //phy: exynos5-usbdrd: Calibrate LOS levels for exynos5420/5800
6) sed -i -e "s/static void exynos5420_usbdrd_phy_calibrate/static int exynos5420_usbdrd_phy_calibrate/" ./drivers/phy/phy-exynos5-usbdrd.c
7) //duplicate entry in drivers/media/usb/au0828/au0828-cards.c for my 0x400 vid tuner.
8) HOST_EXTRACFLAGS="-O3 -pipe -mfpu=neon-vfpv4 -mfloat-abi=hard -march=armv7-a -mtune=cortex-a15.cortex-a7" make -j 8 zImage exynos5422-odroidxu4.dtb modules 2>&1 | tee make.log
9) INSTALL_MOD_PATH=./tmp INSTALL_FW_PATH=./tmp make modules_install firmware_install 2>&1 | tee makeModFirm.log
10) sudo cp -rv ./tmp/lib/* /usr/lib
11) sudo cp -v ./arch/arm/boot/zImage /boot/zImage-4.10.0
12) sudo cp -v ./arch/arm/boot/dts/exynos5422-odroidxu4.dtb /boot/exynos5422-odroidxu4-4.10.0.dtb
13) sudo ln -s /boot/zImage-4.10.0 /boot/zImage
14) sudo ln -s /boot/exynos5422-odroidxu4-4.10.0.dtb /boot/exynos5422-odroidxu4.dtb
15) sudo sync
16) sudo systemctl reboot

I've attached the config I use.

On Mon, Mar 6, 2017 at 11:35 AM, Krzysztof Kozlowski wrote:
> On Mon, Mar 06, 2017 at 10:18:45AM -0600, Nathan Royce wrote:
>> I tried the patch you submitted, however it also fails for the most part.
>>
>> "For the most part" because "xts" is now found.
>> $ grep xts /proc/crypto
>> name : xts(aes)
>> driver : xts(ecb-aes-s5p)
>
> Ah, so probably I did not fix the original issue but some other... or
> maybe there are multiple issues.
>
> Could you attach your config and any other essential reproduction steps
> (unusual settings?).
>
> I saw you tried v4.10.1, could you try just v4.10?
>
> Best regards,
> Krzysztof
>
>>
>> Fail:
>> *
>> [ 21.057756] xor: using function: neon (352.000 MB/sec)
>> [ 21.064243] Unable to handle kernel NULL pointer dereference at
>> virtual address 0004
>> [ 21.070966] pgd = c0004000
>> [ 21.073599] [0004] *pgd=
>> [ 21.077165] Internal error: Oops: 17 [#1] SMP ARM
>> [ 21.081836] Modules linked in: xor aes_arm xor_neon zlib_deflate
>> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> ip_tables x_tables
>> [ 21.095239] CPU: 5 PID: 121 Comm: irq/69-1083 Not tainted
>> 4.10.1-dirty #1
>> [ 21.102288] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [ 21.108355] task: ee3e3700 task.stack: edcf6000
>> [ 21.112821] PC is at post_crypt+0x1b4/0x1c4
>> [ 21.116972] LR is at post_crypt+0x1a8/0x1c4
>> [ 21.121131] pc : []lr : []psr: 200c0093
>> [ 21.121131] sp : edcf7e68 ip : ec59dcf4 fp : 117ce9ac
>> [ 21.132576] r10: 244525e3 r9 : c0c0540c r8 : ec59dc00
>> [ 21.137768] r7 : r6 : 0400 r5 : r4 :
>> [ 21.144267] r3 : ef49fcde r2 : 0200 r1 : 0200 r0 :
>> [ 21.150768] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM
>> Segment none
>> [ 21.157964] Control: 10c5387d Table: 6618c06a DAC: 0051
>> [ 21.163677] Process irq/69-1083 (pid: 121, stack limit = 0xedcf6218)
>> [ 21.170350] Stack: (0xedcf7e68 to 0xedcf8000)
>> [ 21.174684] 7e60: ef49fcdc ec93f200 ef49fcdc
>> ec93f200 ec59dddc 0400
>> [ 21.182853] 7e80: 0400 ef49fcdc
>> c01100fc
>> [ 21.190983] 7ea0: c0110f80 0010
>> 0010 000f 00040a01
>> [ 21.199128] 7ec0: ec59dc00 c0c0540c
>> 600c0013 0002
>> [ 21.207274] 7ee0: ee889d20 c033608c eea21c90 c05a80d0 eea21ce8
>> eea21c90 000c 00040a01
>> [ 21.215418] 7f00: eea21ce8 eea21c90 000c eea21ce8
>> c05a8290 0001
>> [ 21.223564] 7f20: eea2a600 eea8a400 eea8a400 eea2a600 c016ee68
>> c0c0540c c016ee84
>> [ 21.231710] 7f40: edcf6000 eea2a624 eea8a400 c016f198 eea2a640
>> c016ef7c 00040a01
>> [ 21.239868] 7f60: 000
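Steps 13 and 14 in the build list above use a plain "ln -s", which fails with "File exists" when /boot/zImage is already a symlink left over from an earlier build (the list does not show how the old link gets removed). As a minimal sketch, assuming Python 3 is available on the board and that /boot sits on a filesystem that supports symlinks, the link can be swapped in one step by creating it under a temporary name and renaming it over the old one. The helper name replace_symlink is hypothetical and not part of any tool mentioned in this thread.

```python
import os


def replace_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing an existing link if there is one."""
    # Build the new link next to the final name so the rename below stays
    # on the same filesystem.
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target, tmp_path)
    try:
        # rename(2) replaces the old link in a single step, so there is no
        # moment at which link_path is missing.
        os.replace(tmp_path, link_path)
    except OSError:
        # Do not leave the temporary link behind if the rename failed.
        os.unlink(tmp_path)
        raise
```

For step 13 this would be called as replace_symlink("/boot/zImage-4.10.0", "/boot/zImage"), run as root; step 14 works the same way with the .dtb paths.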
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
22.027743] [] (__handle_domain_irq) from [] (gic_handle_irq+0x38/0x74)
[ 22.036061] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 22.043510] Exception stack(0xc0c01f38 to 0xc0c01f80)
[ 22.048529] 1f20: 0001
[ 22.056685] 1f40: c0114e60 c0c0 c0c05490 c0c0542c c0b4ff88 c0c01f90
[ 22.064832] 1f60: efffc7c0 600e0013 c0c01f88 c0108480 c0108484 600e0013
[ 22.072978] [] (__irq_svc) from [] (arch_cpu_idle+0x38/0x3c)
[ 22.080347] [] (arch_cpu_idle) from [] (do_idle+0x164/0x1f8)
[ 22.087708] [] (do_idle) from [] (cpu_startup_entry+0x18/0x1c)
[ 22.095258] [] (cpu_startup_entry) from [] (start_kernel+0x374/0x394)
[ 22.103389] handlers:
[ 22.105635] [] irq_default_primary_handler threaded [] s5p_aes_interrupt
[ 22.114046] Disabling IRQ #69
[ 23.496638] Btrfs loaded, crc32c=crc32c-generic
*

Do I need to add "irqpoll" to my u-boot boot config now?

Yeah, the mailing list bounced my original email because I wasn't using plain-text, but my full post shows in Herbert's reply.

On Sun, Mar 5, 2017 at 11:16 AM, Krzysztof Kozlowski wrote:
> On Fri, Mar 03, 2017 at 12:02:10PM +0800, Herbert Xu wrote:
>> On Thu, Mar 02, 2017 at 05:35:30PM -0600, Nathan Royce wrote:
>> > ARM ODroid XU4
>> >
>> > $ cat /proc/config.gz | gunzip | grep XTS
>> > CONFIG_CRYPTO_XTS=y
>> >
>> > $ grep xts /proc/crypto
>> > //4.9.13
>> > name : xts(aes)
>> > driver : xts(aes-generic)
>> > //4.10.1
>> >
>> > //cbc can be found though
>> >
>> > CRYPTTAB:
>> > cryptswap1 UUID= /dev/urandom
>> > swap,offset=2048,cipher=aes-xts-plain64:sha512,size=512,nofail
>> >
>> > FSTAB:
>> > /dev/mapper/cryptswap1 none swap sw 0 0
>> >
>> > Boot Log:
>> > [ 10.535985] [ cut here ]
>> > [ 10.539252] WARNING: CPU: 0 PID: 0 at crypto/skcipher.c:430
>> > skcipher_walk_first+0x13c/0x14c
>> > [ 10.547542] Modules linked in: xor xor_neon aes_arm zlib_deflate
>> > dm_crypt raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> > ip_tables x_tables
>> > [ 10.561716] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.1-dirty #1
>> > [ 10.568049] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> > [ 10.574171] [] (unwind_backtrace) from []
>> > (show_stack+0x10/0x14)
>> > [ 10.581893] [] (show_stack) from []
>> > (dump_stack+0x84/0x98)
>> > [ 10.589073] [] (dump_stack) from []
>> > (__warn+0xe8/0x100)
>> > [ 10.595975] [] (__warn) from []
>> > (warn_slowpath_null+0x20/0x28)
>> > [ 10.603546] [] (warn_slowpath_null) from []
>> > (skcipher_walk_first+0x13c/0x14c)
>> > [ 10.612390] [] (skcipher_walk_first) from []
>> > (skcipher_walk_virt+0x1c/0x38)
>> > [ 10.621056] [] (skcipher_walk_virt) from []
>> > (post_crypt+0x38/0x1c4)
>> > [ 10.629022] [] (post_crypt) from []
>> > (decrypt_done+0x4c/0x54)
>> > [ 10.636389] [] (decrypt_done) from []
>> > (s5p_aes_complete+0x70/0xfc)
>> > [ 10.644274] [] (s5p_aes_complete) from []
>> > (s5p_aes_interrupt+0x134/0x1a0)
>> > [ 10.652771] [] (s5p_aes_interrupt) from []
>> > (__handle_irq_event_percpu+0x9c/0x124)
>>
>> This looks like a bug in the s5p driver. It's calling the completion
>> function straight from the IRQ handler, which is triggering the
>> sanity check in skcipher_walk_first.
>>
>> The s5p driver needs to schedule a tasklet to call the completion
>> function.
>
> Tasklet... or threaded IRQ handler maybe? I sent a fix.
>
> BTW, I subscribe the crypto list but I could not find the original email
> there.
>
> Best regards,
> Krzysztof
>
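Herbert's diagnosis rests on reading the backtrace from the bottom up: s5p_aes_interrupt calls s5p_aes_complete, which reaches skcipher_walk_first while still in interrupt context. The following is a small sketch, not a kernel tool, that pulls that call path out of an ARM-style "(callee) from [addr] (caller+off/len)" trace. It assumes the trace has already been unquoted and unwrapped to one frame per line; the function name call_chain and the regular expression are illustrative choices and are only modeled on the lines quoted in this thread.

```python
import re
from typing import List

# One frame looks like: "[] (post_crypt) from [] (decrypt_done+0x4c/0x54)".
# The bracketed address may be empty (as in this archive) or filled in.
FRAME_RE = re.compile(
    r"\((?P<callee>[\w.]+)\)\s+from\s+\[[^\]]*\]\s+"
    r"\((?P<caller>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def call_chain(trace: str) -> List[str]:
    """Return function names ordered from outermost caller to innermost callee."""
    frames = [m.groupdict() for m in FRAME_RE.finditer(trace)]
    if not frames:
        return []
    # The trace lists the innermost frame first, so collect the callees in
    # printed order, add the caller of the last frame, then reverse.
    chain = [frame["callee"] for frame in frames]
    chain.append(frames[-1]["caller"])
    chain.reverse()
    return chain
```

Fed the last few frames of the boot log above, the result starts at the interrupt handler and ends at the crypto walk helper, which is the ordering Herbert describes.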
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
Yup, when I disabled the s5p driver, xts DID show in the /proc/crypto list.

Heh, I was about to ask if it was something I should push towards another maintainer for s5p stuff, but found you listed in that as well. If I am incorrect in that assumption, do let me know whom else I should make aware of this issue.

Also let me know if you would like the rest of the kernel panic. Maybe you already have enough to go on and don't need it.

Thanks for all that clarity.

On Fri, Mar 3, 2017 at 6:04 AM, Herbert Xu wrote:
> On Fri, Mar 03, 2017 at 04:36:18AM -0600, Nathan Royce wrote:
>> I do have ECB selected as well:
>> DM_CRYPT=y
>> CRYPTO_ECB=y
>> CRYPTO_XTS=y
>>
>> name : ecb(aes)
>> driver : ecb-aes-s5p
>> module : kernel
>> priority : 100
>> refcnt : 1
>> selftest : passed
>> internal : no
>> type : ablkcipher
>> async: yes
>> blocksize: 16
>> min keysize : 16
>> max keysize : 32
>> ivsize : 0
>> geniv:
>> //still no "xts" can be found in the list
>
> Weird. So you can't find any instances of xts in /proc/crypto
> at all? Even if the self-test fails it should still register an
> entry there...
>
> In any case, I think disabling the s5p driver should work at
> least.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
I do have ECB selected as well:
DM_CRYPT=y
CRYPTO_ECB=y
CRYPTO_XTS=y

name : ecb(aes)
driver : ecb-aes-s5p
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : ablkcipher
async: yes
blocksize: 16
min keysize : 16
max keysize : 32
ivsize : 0
geniv:
//still no "xts" can be found in the list

I saw this about the regression that sounds similar to my issue, except even when I built-in dm_crypt (no initramfs. just diving straight into system), it still fails:
http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg23748.html

On Fri, Mar 3, 2017 at 3:33 AM, Herbert Xu wrote:
> On Fri, Mar 03, 2017 at 03:00:26AM -0600, Nathan Royce wrote:
>> OK, I went ahead and enabled self tests
>> "CRYPTO_MANAGER_DISABLE_TESTS=n", and my system was able to boot,
>> albeit with failures:
>> *
>> Mar 02 23:14:38 server kernel: ---[ end trace 1c8a91f28cbcebf3 ]---
>> Mar 02 23:14:38 server kernel: alg: skcipher: encryption failed on
>> test 1 for xts(ecb-aes-s5p): ret=35
>> Mar 02 23:14:38 server kernel: device-mapper: table: 254:0: crypt:
>> Error allocating crypto tfm
>> Mar 02 23:14:38 server kernel: device-mapper: ioctl: error adding
>> target to table
>> Mar 02 23:14:39 server systemd-cryptsetup[234]: Failed to activate
>> with key file '/dev/urandom': Invalid argument
>> *
>> (weird that it asked for the passphrase)
>>
>> But I do question whether the root issue is related to s5p... Maybe
>> there is a correlation in the warning, but to me it looks like the
>> issue is something else.
>
> I see. Do you have ECB enabled in your config? The new XTS requires
> ECB to be present so that could be your problem.
>
> There is already a patch on its way to stable to add the Kconfig
> select on ECB.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
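Checking by eye whether "xts" appears in /proc/crypto, and which driver backs it, comes up repeatedly in this thread. As a rough sketch, assuming the usual /proc/crypto layout of "key : value" lines with a blank line between entries, the listing can be parsed into dictionaries and queried by algorithm name. The function names parse_proc_crypto and drivers_for are made up for this illustration.

```python
from typing import Dict, List


def parse_proc_crypto(text: str) -> List[Dict[str, str]]:
    """Split /proc/crypto style text into one dict per registered algorithm."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not "key : value"
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def drivers_for(entries: List[Dict[str, str]], name: str) -> List[str]:
    """List the drivers registered for an algorithm name such as 'xts(aes)'."""
    return [e.get("driver", "") for e in entries if e.get("name") == name]
```

On a running system the text would come from reading /proc/crypto; an empty result for "xts(aes)" corresponds to the 4.10.1 behaviour reported here, while 4.9.13 would be expected to list "xts(aes-generic)".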
Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.
OK, I went ahead and enabled self tests "CRYPTO_MANAGER_DISABLE_TESTS=n", and my system was able to boot, albeit with failures:
*
Mar 02 23:14:38 server kernel: ---[ end trace 1c8a91f28cbcebf3 ]---
Mar 02 23:14:38 server kernel: alg: skcipher: encryption failed on test 1 for xts(ecb-aes-s5p): ret=35
Mar 02 23:14:38 server kernel: device-mapper: table: 254:0: crypt: Error allocating crypto tfm
Mar 02 23:14:38 server kernel: device-mapper: ioctl: error adding target to table
Mar 02 23:14:39 server systemd-cryptsetup[234]: Failed to activate with key file '/dev/urandom': Invalid argument
*
(weird that it asked for the passphrase)

But I do question whether the root issue is related to s5p... Maybe there is a correlation in the warning, but to me it looks like the issue is something else.

In my OP, I noted that the xts crypto isn't even found in /proc/crypto in 4.10. I'd think it would at least be listed, even if it isn't used.

CBC is listed in /proc/crypto with kernel 4.9.13 and 4.10.1 (cbc-aes-s5p)
XTS is listed in /proc/crypto with kernel 4.9.13 but NOT 4.10.1

I should also add that I didn't include other tainted messages since they followed the messages I first posted. I was assuming that when the first issue would work, the others would follow suit. I just didn't want to inundate with possible junk. I still have the log file if you think it would be helpful to post the rest.

PS: I also noticed the bounce from my first mail submission because I didn't enable plain-text for the e-mail (marked as spam because the email contained html). I rectified that for this reply.
On Thu, Mar 2, 2017 at 10:02 PM, Herbert Xu wrote:
> On Thu, Mar 02, 2017 at 05:35:30PM -0600, Nathan Royce wrote:
>> ARM ODroid XU4
>>
>> $ cat /proc/config.gz | gunzip | grep XTS
>> CONFIG_CRYPTO_XTS=y
>>
>> $ grep xts /proc/crypto
>> //4.9.13
>> name : xts(aes)
>> driver : xts(aes-generic)
>> //4.10.1
>>
>> //cbc can be found though
>>
>> CRYPTTAB:
>> cryptswap1 UUID= /dev/urandom
>> swap,offset=2048,cipher=aes-xts-plain64:sha512,size=512,nofail
>>
>> FSTAB:
>> /dev/mapper/cryptswap1 none swap sw 0 0
>>
>> Boot Log:
>> [ 10.535985] [ cut here ]
>> [ 10.539252] WARNING: CPU: 0 PID: 0 at crypto/skcipher.c:430
>> skcipher_walk_first+0x13c/0x14c
>> [ 10.547542] Modules linked in: xor xor_neon aes_arm zlib_deflate
>> dm_crypt raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> ip_tables x_tables
>> [ 10.561716] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.1-dirty #1
>> [ 10.568049] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [ 10.574171] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14)
>> [ 10.581893] [] (show_stack) from []
>> (dump_stack+0x84/0x98)
>> [ 10.589073] [] (dump_stack) from []
>> (__warn+0xe8/0x100)
>> [ 10.595975] [] (__warn) from []
>> (warn_slowpath_null+0x20/0x28)
>> [ 10.603546] [] (warn_slowpath_null) from []
>> (skcipher_walk_first+0x13c/0x14c)
>> [ 10.612390] [] (skcipher_walk_first) from []
>> (skcipher_walk_virt+0x1c/0x38)
>> [ 10.621056] [] (skcipher_walk_virt) from []
>> (post_crypt+0x38/0x1c4)
>> [ 10.629022] [] (post_crypt) from []
>> (decrypt_done+0x4c/0x54)
>> [ 10.636389] [] (decrypt_done) from []
>> (s5p_aes_complete+0x70/0xfc)
>> [ 10.644274] [] (s5p_aes_complete) from []
>> (s5p_aes_interrupt+0x134/0x1a0)
>> [ 10.652771] [] (s5p_aes_interrupt) from []
>> (__handle_irq_event_percpu+0x9c/0x124)
>
> This looks like a bug in the s5p driver. It's calling the completion
> function straight from the IRQ handler, which is triggering the
> sanity check in skcipher_walk_first.
>
> The s5p driver needs to schedule a tasklet to call the completion
> function.
>
> Do you have crypto self-test enabled? If so it should've caught
> this at run-time. Otherwise you can disable the s5p driver until
> it's fixed.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
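The self-test failure in the journal excerpt above ("alg: skcipher: encryption failed on test 1 for xts(ecb-aes-s5p): ret=35") names the exact driver that failed. As a hedged sketch, the snippet below extracts those fields from log text so failures can be listed per driver. The pattern is modeled only on the single line quoted in this thread; other kernel versions may word the message differently, and the function name find_selftest_failures is hypothetical. It assumes each log entry is on one unwrapped line.

```python
import re
from typing import Dict, List, Union

# Modeled on: "alg: skcipher: encryption failed on test 1 for xts(ecb-aes-s5p): ret=35"
SELFTEST_RE = re.compile(
    r"alg: (?P<type>\w+): (?P<op>encryption|decryption) failed on "
    r"test (?P<test>\d+) for (?P<driver>\S+): ret=(?P<ret>-?\d+)"
)


def find_selftest_failures(log: str) -> List[Dict[str, Union[str, int]]]:
    """Return one record per crypto self-test failure line found in the log."""
    failures: List[Dict[str, Union[str, int]]] = []
    for match in SELFTEST_RE.finditer(log):
        failures.append(
            {
                "type": match["type"],
                "op": match["op"],
                "test": int(match["test"]),
                "driver": match["driver"],
                "ret": int(match["ret"]),
            }
        )
    return failures
```

The "ret" value is reported as-is; the sketch does not try to interpret what the number means.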