Re: BTRFS Balance Hard System Crash (Blinking LEDs)

2021-03-27 Thread Nathan Royce
An update... I encountered blinking LEDs while I was away from my
computer again.
I'm now pretty confident it wasn't an issue with btrfs balance, but
rather the sd-card not being seated well.

I just updated my old "F2FS Segmentation fault" post in linux-f2fs-devel.
In short, fsck for f2fs was failing and badblocks was reporting nothing
but errors. I cleaned the sd-card contacts, put the card back in, and
badblocks is running cleanly now.
It's just too late for me: I have to rebuild that partition since,
for whatever reason, cryptsetup no longer recognizes it as LUKS, even
though badblocks was run in non-destructive mode.

On Fri, Mar 26, 2021 at 11:51 AM Nathan Royce  wrote:
>
> Oh man, I'm hoping things aren't starting to fall apart here.
> I was doing my normal routine (tv, browsing, ... (no filesystem
> manipulations)) and out of the blue "kodi" just crashes. It's actually
> not all that uncommon, and I fired up "iotop" to make sure "coredump"
> was happening, and it was.
> I then did something else in the terminal, maybe an "ls", and that came up 
> with:
> *
> error while loading shared libraries: /usr/lib/libutil.so.1: ELF file
> version does not match current one
> *
...


Re: F2FS Segmentation Fault

2021-03-27 Thread Nathan Royce
I don't know how much of the problem this explains, but when I
unmounted the sd-card, closed its cryptsetup mapping, and then ran
non-destructive badblocks on it, I was getting ONLY errors.
I stopped bb, then pulled out the card, blew on it, wiped down the
contacts with rubbing alcohol, let it dry, put it back in and now bb
is running cleanly.
I then stopped bb, tried to cryptsetup-open it and it said the
partition is not a valid LUKS device.
Weird, since I was running badblocks in non-destructive mode.
Looks like I'm now forced to rebuild that partition.
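
For anyone wanting to check the same thing before rebuilding: a LUKS
volume starts with the magic bytes "LUKS" 0xba 0xbe followed by a
two-byte big-endian version. The sketch below only looks at those first
eight bytes; it is not what cryptsetup itself runs, the function name is
made up for illustration, and the device path in the usage note is a
placeholder.

```python
import struct

# First six bytes of a LUKS1/LUKS2 primary header.
LUKS_MAGIC = b"LUKS\xba\xbe"


def luks_header_version(path):
    """Return the LUKS version stored after the magic, or None if the
    magic bytes are missing (hypothetical helper, header check only)."""
    with open(path, "rb") as dev:
        head = dev.read(8)
    if len(head) < 8 or head[:6] != LUKS_MAGIC:
        return None
    # The two bytes after the magic hold the version, big-endian.
    (version,) = struct.unpack(">H", head[6:8])
    return version
```

Something like luks_header_version("/dev/sdX2") (placeholder path, needs
read access to the device) returning None would suggest the header
itself was overwritten or is unreadable, rather than a key problem.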

I wish I had checked whether the sd-card was properly seated. I know
I've experienced something similar in the past, where files suddenly
couldn't be read. Once I reseated the sd-card, everything was fine.
The last time I had to even remove the card was maybe 1-2 weeks ago
when I had to deal with a noisy power-supply fan.

The whole debacle (including btrfs, keyboard leds blinking) may very
well have been from the sd-card not being seated well.

On Sat, Mar 27, 2021 at 7:02 AM Nathan Royce  wrote:
>
> An update, not quite 1 year later. I encountered another segfault issue.
>
> It began with my email report to the linux-btrfs mailing list titled
> "BTRFS Balance Hard System Crash (Blinking LEDs)" just the other day.
...


Re: F2FS Segmentation Fault

2021-03-27 Thread Nathan Royce
5d9] i_addr[42] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x1bb20aba]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5d9] i_addr[43] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x164914cd]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5d9] i_addr[44] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x18432b76]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5d9] i_addr[45] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0xcfefd9c5]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e484] i_addr[366] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0xd672fbb7]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e484] i_addr[367] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0xa113bab3]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e484] i_addr[368] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x1af84de0]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e484] i_addr[369] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x147f77a5]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5b0] i_addr[30] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0xb8fb4384]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5b0] i_addr[31] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x7dc7364]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5b0] i_addr[32] = 0
[ASSERT] (fsck_chk_data_blk:1555)  --> blkaddress is not valid. [0x20350042]
[FIX] (fsck_chk_inode_blk: 788)  --> [0x9e5b0] i_addr[33] = 0
Segmentation fault
$ sudo fsck.f2fs -af /dev/mapper/lukssdi2
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 124168159 (60628 MB)
Can't find a valid F2FS superblock at 0x0
Mismatch segment0(3096048428) cp_blkaddr(24874649)
Can't find a valid F2FS superblock at 0x1
*
journal:
*
Mar 27 06:22:07 neon kernel: Code: 41 f6 c1 04 75 53 4d 85 c9 74 0b 0f
b6 0f 88 0a 41 f6 c1 02 75 53 42 c6 04 0a 00 c3 0f 1f 44 00 00 48 8b
0f 4c 89 c6 48 89 0a <4a> 8b 4c 0f f8 48 8d 7a 08 48 83 e7 f8 4a 89 4c
0a f8 48 89 d1 48
Mar 27 06:22:07 neon kernel: fsck.f2fs[6302]: segfault at 5564d32978f5
ip 55651c124919 sp 7ffd202ff9f8 error 4 in
fsck.f2fs[55651c121000+1c000]
*
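
A note on reading the segfault line above: the numbers are hex, and the
bracketed part after the binary name is the start and size of the
mapped region that contains the faulting instruction pointer. The
sketch below pulls those fields out and computes the offset of "ip"
inside that region. The function name and the returned dict layout are
made up for illustration.

```python
import re

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+)"
    r" error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Parse one kernel segfault message; return a dict or None."""
    # Collapse whitespace so messages wrapped by the mail client still match.
    match = SEGFAULT_RE.search(" ".join(line.split()))
    if match is None:
        return None
    info = {"object": match.group("object"), "offset": None}
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        value = match.group(key)
        info[key] = int(value, 16) if value is not None else None
    base, size = info["base"], info["size"]
    if base is not None and base <= info["ip"] < base + size:
        # Offset of the faulting instruction inside the mapped region.
        info["offset"] = info["ip"] - base
    return info
```

For the fsck.f2fs line above this gives an offset of 0x3919. That
offset is relative to the mapped region, which is not necessarily the
same as an offset from the start of the file.
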
This kde-neon kernel version is 5.4.0, and the associated tools
version is f2fs-tools-1.11.0.

There hasn't been any power-outage that I'm aware of.

With that segfault, I'm thinking that fs is now toast, and I need to
rebuild that arch-linux partition. At least /home was on btrfs which
is still accessible.

While I'm at it (scrubbing btrfs), I think I'll memtest my RAM and
badblocks my sd-card.

On Tue, Jul 14, 2020 at 12:54 AM Jaegeuk Kim  wrote:
>
> On 07/13, Nathan Royce wrote:
> > On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim  wrote:
> > >
> > > Hi Nathan,
> > >
> > > Could you try to say "N" here to move forward to fix the corrupted 
> > > metadata?
> > >
> > > Thanks,
> > *
> > Do you want to restore lost files into ./lost_found/? [Y/N] N
...


Re: BTRFS Balance Hard System Crash (Blinking LEDs)

2021-03-26 Thread Nathan Royce
Oh man, I'm hoping things aren't starting to fall apart here.
I was doing my normal routine (tv, browsing, ... (no filesystem
manipulations)) and out of the blue "kodi" just crashes. It's actually
not all that uncommon, and I fired up "iotop" to make sure "coredump"
was happening, and it was.
I then did something else in the terminal, maybe an "ls", and that came up with:
*
error while loading shared libraries: /usr/lib/libutil.so.1: ELF file
version does not match current one
*
Again, it was just out of the blue. Same with other commands like
"coredumpctl" or "sync". Even "pacman -Qo /usr/lib/libutil.so.1"
caused SEGV.
Everything seemed fine after I had last booted (minus what I wrote in
my last email).
And the oddest thing is that, like I said before, my system/root stuff
(eg, /usr/lib/libutil.so.1) is being run from my sd-card (F2FS, not
BTRFS).

I see the coredumps were written out ~11:05, while journalctl shows the
issues starting ~10:56 (coredumps typically take a long time to write
out on a slow sd-card):
*
...
Mar 26 11:05:13 computerName systemd-coredump[70088]: Process 70078
(pacman) of user 1000 dumped core.

Stack trace of thread 70078:
#0  0x75cf62ee58a5
do_lookup_x (ld-linux-x86-64.so.2 + 0xa8a5)
#1  0x75cf62ee6231
_dl_lookup_symbol_x (ld-linux-x86-64.so.2 + 0xb231)
#2  0x75cf62ee7dc7
_dl_relocate_object (ld-linux-x86-64.so.2 + 0xcdc7)
#3  0x75cf62edfcdd
dl_main (ld-linux-x86-64.so.2 + 0x4cdd)
#4  0x75cf62ef769f
_dl_sysdep_start (ld-linux-x86-64.so.2 + 0x1c69f)
#5  0x75cf62edd063
_dl_start (ld-linux-x86-64.so.2 + 0x2063)
#6  0x75cf62edc098
_start (ld-linux-x86-64.so.2 + 0x1098)
...
Mar 26 11:05:10 computerName kernel: Code: b4 24 d0 00 00 00 49 89 df
48 89 44 24 38 48 89 fb 4c 89 5c 24 60 eb 12 0f 1f 44 00 00 49 83 c4
04 83 e2 01 0f 85 f3 05 00 00 <41> 8b 04 24 48 89 c2 48 31 d8 48 d1 e8
75 e4 48 83 ec 08 4c 89 e0
Mar 26 11:05:09 computerName kernel: pacman[70078]: segfault at
75d29d0d5640 ip 75cf62ee58a5 sp 7ad0e460 error 4 in
ld-2.33.so[75cf62edc000+24000]
...
Mar 26 10:58:59 computerName kernel: Code: 84 e7 05 00 00 44 8b 33 45
85 f5 74 e4 66 0f ef ff 66 0f ef f6 66 0f ef e4 48 89 ef f3 0f 10 44
24 48 66 0f ef db 66 0f ef d2 <66> 0f 42 a0 61 72 70 cb ee 33 bb 14 5f
79 1d 76 e5 28 0f 11 44 24
Mar 26 10:58:59 computerName kernel: chrome[41222]: segfault at
cb707262 ip 77cb36b101ae sp 7fff250007c0 error 5 in
i965_dri.so[77cb36aa9000+8fa000]
...
Mar 26 10:58:25 computerName plasmashell[43148]: KCrash: Application
Name = kate path = /usr/bin pid = 43148
Mar 26 10:58:25 computerName plasmashell[43148]: KCrash: crashing...
crashRecursionCounter = 2
Mar 26 10:58:07 computerName systemd[1424]: Started Kate - Advanced Text Editor.
...
Mar 26 10:56:51 computerName sudo[69237]:userName : TTY=pts/3 ;
PWD=/ ; USER=root ; COMMAND=/usr/bin/iotop
...
Mar 26 10:56:32 computerName kernel: audit: type=1701
audit(1616774192.320:455): auid=1000 uid=1000 gid=1000 ses=3 pid=54221
comm="VideoPlayer" exe="/usr/local/lib/kodi/kodi.bin" sig=11 res=1
Mar 26 10:56:32 computerName kernel: Code: 00 00 01 00 00 00 00 00 00
00 02 04 00 00 48 7b 00 00 10 49 4a d0 48 7b 00 00 00 00 00 00 01 00
00 00 55 00 00 00 00 00 00 00  d8 8f 34 4f 7b 00 00 00 39 10 d0 48
7b 00 00 00 00 fa 00 00 fa
Mar 26 10:56:32 computerName kernel: VideoPlayer[61823]: segfault at
7b48d0579730 ip 7b48d0579730 sp 7b48ff043248 error 15
...
*
As you can see, pretty much everything was crashing (probably not
surprising if glibc is involved).
Now, like I said, I don't believe it's related to my BTRFS drive since
glibc was referenced which is located on my F2FS drive.
I ended up rebooting (again) and everything seems fine (so far) as I
write this and have the recorded DVR playing (kodi).
I don't know what those "kernel: Code:" lines are supposed to mean.
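
For reference, a "kernel: Code:" line is a hex dump of the instruction
bytes around the faulting instruction pointer, with the byte at "ip"
wrapped in <..>; the kernel tree ships scripts/decodecode to disassemble
such a dump. The "error N" field on the segfault lines is the x86
page-fault error code, printed in hex. Below is a minimal sketch that
decodes its low five bits; the function name and the wording of the
labels are my own.

```python
# (bit mask, meaning when set, meaning when clear) for x86 page-fault codes.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex):
    """Decode the hex 'error' field of a kernel segfault message."""
    code = int(code_hex, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts
```

Read this way, "error 4" (pacman) is a user-mode access to an unmapped
page, "error 5" (chrome) is a user-mode access that hit a protection
violation, and "error 15" (VideoPlayer, where the fault address equals
ip) is a user-mode instruction fetch from a page that may not be
executed.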

On Fri, Mar 26, 2021 at 8:29 AM Nathan Royce  wrote:
>
> *
> ...I "think" this is where the "emergency" drop out of boot occurred,
> and I just did a "systemctl reboot" which had the next boot succeed.
> Nope, I'm wrong. For whatever reason, this appears to be the boot that
> ended up working (searching for the first "microcode" reference
> indicating the start of a boot).
> Mar 25 21:44:17 computerName kernel: BTRFS critical (device dm-3):
> unable to add free space :-17
...


BTRFS Balance Hard System Crash (Blinking LEDs)

2021-03-26 Thread Nathan Royce
*
...I "think" this is where the "emergency" drop out of boot occurred,
and I just did a "systemctl reboot" which had the next boot succeed.
Nope, I'm wrong. For whatever reason, this appears to be the boot that
ended up working (searching for the first "microcode" reference
indicating the start of a boot).
Mar 25 21:44:17 computerName kernel: BTRFS critical (device dm-3):
unable to add free space :-17
...v 13 times
Mar 25 21:42:59 computerName kernel: BTRFS critical (device dm-3):
unable to add free space :-17
Mar 25 21:42:59 computerName kernel: BTRFS critical (device dm-3):
unable to add free space :-17
...v 36 times
Mar 25 21:40:45 computerName kernel: BTRFS critical (device dm-3):
unable to add free space :-17
Mar 25 21:40:44 computerName kernel: BTRFS critical (device dm-3):
unable to add free space :-17
Mar 25 21:40:44 computerName kernel: ---[ end trace 880e498e00cd6fcd ]---
Mar 25 21:40:44 computerName kernel:  ret_from_fork+0x22/0x30
Mar 25 21:40:44 computerName kernel:  ? __kthread_bind_mask+0x70/0x70
Mar 25 21:40:44 computerName kernel:  kthread+0x144/0x170
Mar 25 21:40:44 computerName kernel:  balance_kthread+0x35/0x50 [btrfs]
Mar 25 21:40:44 computerName kernel:  ? btrfs_balance+0xee0/0xee0 [btrfs]
Mar 25 21:40:44 computerName kernel:  btrfs_balance+0x765/0xee0 [btrfs]
Mar 25 21:40:44 computerName kernel:  btrfs_relocate_chunk+0x2a/0xc0 [btrfs]
Mar 25 21:40:44 computerName kernel:
btrfs_relocate_block_group+0x164/0x310 [btrfs]
Mar 25 21:40:44 computerName kernel:  relocate_block_group+0x2e9/0x5f0 [btrfs]
Mar 25 21:40:44 computerName kernel:  prepare_to_merge+0x246/0x280 [btrfs]
Mar 25 21:40:44 computerName kernel:
btrfs_commit_transaction+0x79b/0xa70 [btrfs]
Mar 25 21:40:44 computerName kernel:
btrfs_finish_extent_commit+0xb6/0x2c0 [btrfs]
Mar 25 21:40:44 computerName kernel:  ? clear_extent_bit+0x43/0x60 [btrfs]
Mar 25 21:40:44 computerName kernel:  unpin_extent_range+0x299/0x4d0 [btrfs]
Mar 25 21:40:44 computerName kernel:  ? kmem_cache_free+0xad/0x1e0
Mar 25 21:40:44 computerName kernel:  __btrfs_add_free_space+0xaf/0x4d0 [btrfs]
Mar 25 21:40:44 computerName kernel:  link_free_space+0x27/0x60 [btrfs]
Mar 25 21:40:44 computerName kernel: Call Trace:
Mar 25 21:40:44 computerName kernel: CR2: 3e4fc2c21000 CR3:
00012f60a003 CR4: 001606f0
Mar 25 21:40:44 computerName kernel: CS:  0010 DS:  ES:  CR0:
80050033
Mar 25 21:40:44 computerName kernel: FS:  ()
GS:95cad820() knlGS:
Mar 25 21:40:44 computerName kernel: R13: 95ca79eeac08 R14:
95ca79eeac00 R15: c000
Mar 25 21:40:44 computerName kernel: R10: 95ca181731e0 R11:
 R12: 95c9d6b57c30
Mar 25 21:40:44 computerName kernel: RBP:  R08:
 R09: 95c9d6b57c30
Mar 25 21:40:44 computerName kernel: RDX:  RSI:
026e55e9 RDI: 95ca79eeac08
Mar 25 21:40:44 computerName kernel: RAX: 95ca4f5389b0 RBX:
026e55e9 RCX: 95ca4f538328
Mar 25 21:40:44 computerName kernel: RSP: 0018:b171c067ba60 EFLAGS: 00010246
Mar 25 21:40:44 computerName kernel: Code: 89 e7 49 c7 44 24 08 00 00
00 00 49 c7 44 24 10 00 00 00 00 4c 89 21 e8 16 93 28 fa 31 c0 5b 5d
41 5c 41 5d c3 48 85 d2 75 c1 <0f> 0b b8 ef ff ff ff eb eb 0f 0b b8 ef
ff ff ff eb e2 66 0f 1f 44
Mar 25 21:40:44 computerName kernel: RIP:
0010:tree_insert_offset+0x88/0xa0 [btrfs]
Mar 25 21:40:44 computerName kernel: Hardware name: To Be Filled By
O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015
Mar 25 21:40:44 computerName kernel: CPU: 0 PID: 998 Comm:
btrfs-balance Tainted: G   OE 5.8.7-dirty #1
Mar 25 21:40:44 computerName kernel:  intel_gtt syscopyarea
sysfillrect sysimgblt snd fb_sys_fops soundcore mei lpc_ich evdev
mac_hid nct6775 hwmon_vid v4l2loopback_dc(OE) videodev drm mc agpgart
fuse ip_tables x_tables f2fs dm_crypt cbc enc>
Mar 25 21:40:44 computerName kernel: Modules linked in: ccm cmac
algif_hash bnep btrfs blake2b_generic xor raid6_pq libcrc32c
crc32c_generic nls_iso8859_1 nls_cp437 vfat fat snd_usb_audio
snd_usbmidi_lib snd_rawmidi snd_seq_device tda18271 a>
Mar 25 21:40:44 computerName kernel: WARNING: CPU: 0 PID: 998 at
fs/btrfs/free-space-cache.c:1499 tree_insert_offset+0x88/0xa0 [btrfs]
...boot crash AFTER balance
Mar 25 21:40:39 computerName kernel: BTRFS info (device dm-3): found 8
extents, stage: move data extents
Mar 25 21:40:37 computerName kernel: BTRFS info (device dm-3):
relocating block group 3875364929536 flags data
Mar 25 21:40:37 computerName kernel: BTRFS info (device dm-3):
balance: resume -dusage=90 -musage=90 -susage=90
Mar 25 21:40:37 computerName kernel: BTRFS error (device dm-3):
incorrect extent count for 2672774086656; counted 7070, expected 7073
Mar 25 21:40:37 computerName systemd[1]: Mounted 
Mar 25 21:40:33 computerName kernel: BTRFS info (device dm-3): bdev
/dev/mapper/ errs: wr 0, rd 56, flush 0, corrupt 0, gen 0
Mar 25 21:40:32 computerName kerne
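
A note on the ":-17" in the "unable to add free space" lines: kernel
functions return negative errno values, and 17 is EEXIST ("File
exists"). The same value shows up in the "Code:" dump of the WARNING as
the bytes "b8 ef ff ff ff", i.e. loading the 32-bit value -17 into eax.
A minimal sketch of both decodings follows; the function names are made
up, and Python's errno table reflects the host platform, which on Linux
matches the kernel's numbering for this value.

```python
import errno
import os


def describe_kernel_errno(value):
    """Map a (usually negative) kernel return value to its errno name."""
    code = abs(value)
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


def imm32_from_hex(hex_bytes):
    """Interpret four space-separated hex bytes as a little-endian int32."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    return int.from_bytes(raw, "little", signed=True)
```

So describe_kernel_errno(-17) names EEXIST, and
imm32_from_hex("ef ff ff ff") is -17. My reading, which may be wrong, is
that the free-space tree already contained an entry at the offset being
inserted.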

Re: [PATCH] kconfig: streamline_config.pl: check defined(ENV variable) before using it

2020-09-02 Thread Nathan Royce
Heard, but all the same if it isn't important (which I'm assuming),
I'd just as soon be left out of it. That's just the way I am in
general, not wanting to be seen unless I have to be seen. Thanks
though.

On Wed, Sep 2, 2020 at 9:14 PM Masahiro Yamada  wrote:
>
> Even if you do not write the code,
> reporting bugs is a great contribution,
> and the Reported-by exists for that, I think.
>
> So, I just want to add your Reported-by tag
> (if you do not mind).
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH] kconfig: streamline_config.pl: check defined(ENV variable) before using it

2020-09-02 Thread Nathan Royce
Thanks, but I'd just as soon not be acknowledged/credited. All I did
was submit a report.

On Wed, Sep 2, 2020 at 11:47 AM Masahiro Yamada  wrote:
>
> Applied to linux-kbuild/fixes with Nathan's tag
>
> Reported-by: Nathan Royce 
>
>
>
> Nathan,
> I think adding your tag is OK to credit your contribution.
> Please let me know if you do not have it in
> the commit log.
>
>
>
> --
> Best Regards
> Masahiro Yamada


Re: localmodconfig - "intel_rapl_perf config not found!!"

2020-08-25 Thread Nathan Royce
Correct. I'm building for 5.8.3 and I'm currently on 5.7.4 (1 month
doesn't seem particularly old).

On Tue, Aug 25, 2020 at 2:13 PM Randy Dunlap  wrote:
>
> so intel_rapl_perf is listed in your lsmod.cfg file:
> intel_rapl_perf 16384  2
>
> You say Linux 5.8.3.  I'm guessing that your "make localmodconfig" tree
> is Linux 5.8.3 (?).  What kernel version are you running?
> I think that it's older, and some file/module names have changed since then.


localmodconfig - "intel_rapl_perf config not found!!"

2020-08-25 Thread Nathan Royce
Intel Haswell
Linux 5.8.3

This is the first time I've used localmodconfig, after reading what it
does and liking the "supposed" kernel customization specific to the
system. I only put "supposed" in quotes because I DO still see entries
I have no interest in (not applicable to my system/needs).
I don't know if localmodconfig would warrant a separate email, or if my
expectation of it is unrealistic.

The "intel_rapl_perf config not found!!" comes up with every .config I try.
The simplest test I can come up with would be:
*
make defconfig  # x86_64_defconfig
lsmod > lsmod.cfg
make localmodconfig LSMOD=lsmod.cfg
*
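
To narrow down which loaded modules the tree has no config for, here is
a rough sketch of the kind of lookup involved: collect the module names
from the lsmod output, scan the tree's Makefile/Kbuild files for
"obj-$(CONFIG_...) += name.o" lines, and report names with no match.
This is a simplification and not what streamline_config.pl actually
does: it ignores line continuations and composite objects, treating "-"
and "_" as equivalent is my assumption, and all function names are made
up.

```python
import os
import re

# Matches Makefile lines such as:  obj-$(CONFIG_FOO) += foo.o bar.o
OBJ_LINE = re.compile(r"^obj-\$\((CONFIG_[A-Za-z0-9_]+)\)\s*[+:]?=\s*(.+)$")


def modules_from_lsmod(text):
    """Return module names from lsmod output, skipping the header line."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].isdigit():
            names.append(fields[0])
    return names


def config_map(tree):
    """Map module name -> CONFIG symbol found under the source tree."""
    mapping = {}
    for root, _dirs, files in os.walk(tree):
        for fname in files:
            if fname not in ("Makefile", "Kbuild"):
                continue
            with open(os.path.join(root, fname), errors="replace") as fh:
                for line in fh:
                    match = OBJ_LINE.match(line.strip())
                    if match is None:
                        continue
                    for obj in match.group(2).split():
                        if obj.endswith(".o"):
                            name = obj[:-2].replace("-", "_")
                            mapping[name] = match.group(1)
    return mapping


def missing_configs(lsmod_text, tree):
    """List loaded modules with no obj-$(CONFIG_...) entry in the tree."""
    known = config_map(tree)
    return [n for n in modules_from_lsmod(lsmod_text) if n not in known]
```

A module that shows up in this list was either built out of tree or is
named differently in the tree being configured than in the running
kernel.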

lsmod.cfg
*
Module                    Size  Used by
uinput                   20480  1
rfcomm                   94208  16
ccm                      20480  9
cmac                     16384  5
algif_hash               16384  2
bnep                     28672  2
btrfs                  1556480  1
blake2b_generic          20480  0
xor                      24576  1 btrfs
raid6_pq                122880  1 btrfs
libcrc32c                16384  1 btrfs
crc32c_generic           16384  0
nls_iso8859_1            16384  1
nls_cp437                20480  1
vfat                     24576  1
fat                      90112  1 vfat
snd_usb_audio           311296  0
snd_usbmidi_lib          45056  1 snd_usb_audio
snd_rawmidi              45056  1 snd_usbmidi_lib
snd_seq_device           16384  1 snd_rawmidi
tda18271                 53248  1
au8522_dig               16384  1
au8522_common            16384  1 au8522_dig
au0828                   69632  1
tveeprom                 28672  1 au0828
dvb_core                176128  1 au0828
videobuf2_vmalloc        20480  2 dvb_core,au0828
videobuf2_memops         20480  1 videobuf2_vmalloc
videobuf2_v4l2           28672  1 au0828
intel_rapl_msr           20480  0
btusb                    57344  0
videodev                274432  2 videobuf2_v4l2,au0828
btrtl                    24576  1 btusb
btbcm                    20480  1 btusb
videobuf2_common         57344  3 videobuf2_v4l2,dvb_core,au0828
intel_rapl_common        32768  1 intel_rapl_msr
rc_core                  61440  1 au0828
btintel                  32768  1 btusb
bluetooth               688128  49 btrtl,btintel,btbcm,bnep,btusb,rfcomm
x86_pkg_temp_thermal     20480  0
intel_powerclamp         20480  0
mousedev                 24576  0
ecdh_generic             16384  2 bluetooth
ecc                      36864  1 ecdh_generic
crc16                    16384  1 bluetooth
rtl8821ae               290816  0
coretemp                 20480  0
snd_hda_codec_hdmi       73728  1
btcoexist               225280  1 rtl8821ae
kvm_intel               335872  0
rtl_pci                  36864  1 rtl8821ae
rtlwifi                 139264  3 rtl_pci,rtl8821ae,btcoexist
kvm                     876544  1 kvm_intel
iTCO_wdt                 16384  0
mei_hdcp                 24576  0
iTCO_vendor_support      16384  1 iTCO_wdt
mac80211                954368  3 rtl_pci,rtl8821ae,rtlwifi
i915                   2703360  60
snd_hda_codec_realtek   143360  1
irqbypass                16384  1 kvm
snd_hda_codec_generic    98304  1 snd_hda_codec_realtek
ledtrig_audio            16384  2 snd_hda_codec_generic,snd_hda_codec_realtek
intel_cstate             16384  0
snd_hda_intel            53248  4
snd_soc_rt5640          147456  0
intel_uncore            163840  0
cfg80211                925696  2 rtlwifi,mac80211
snd_intel_dspcfg         24576  1 snd_hda_intel
snd_soc_rl6231           20480  1 snd_soc_rt5640
intel_rapl_perf          16384  2
snd_hda_codec           176128  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
snd_soc_core            311296  1 snd_soc_rt5640
rfkill                   32768  11 bluetooth,cfg80211
libarc4                  16384  1 mac80211
snd_compress             32768  1 snd_soc_core
alx                      57344  0
input_leds               16384  0
snd_hda_core            114688  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
i2c_algo_bit             16384  1 i915
ac97_bus                 16384  1 snd_soc_core
snd_pcm_dmaengine        16384  1 snd_soc_core
snd_hwdep                20480  2 snd_usb_audio,snd_hda_codec
mdio                     16384  1 alx
drm_kms_helper          266240  1 i915
snd_pcm                 159744  9 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_soc_rt5640,snd_compress,snd_soc_core,snd_hda_core,snd_pcm_dmaengine
cec                      69632  2 drm_kms_helper,i915
snd_timer                49152  1 snd_pcm
mei_me                   49152  1
intel_gtt                24576  1 i915
syscopyarea              16384  1 drm_kms_helper
sysfillrect              16384  1 drm_kms_helper
sysimgblt                16384  1 drm_kms_helper
snd                     118784  22 snd_hda_codec_generic,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,snd_soc_core,snd_pcm,snd_rawmidi
fb_sys_fops              16384  1 drm_kms_helper
soundcore                16384  1 snd
mei                     131072  3 mei_hd

Re: F2FS Segmentation Fault

2020-07-13 Thread Nathan Royce
On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim  wrote:
>
> Hi Nathan,
>
> Could you try to say "N" here to move forward to fix the corrupted metadata?
>
> Thanks,
*
Do you want to restore lost files into ./lost_found/? [Y/N] N
Info: Write valid nat_bits in checkpoint
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18eca] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18ecb] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18ecc] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18ee3] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18ee4] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18ee5] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18f78] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18f79] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x18f7a] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x4d621] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x4d622] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x7fa32] in NAT
[FIX] (nullify_nat_entry:2273)  --> Remove nid [0x7fa33] in NAT
Info: Write valid nat_bits in checkpoint

Done.
*

*
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 124168159 (60628 MB)
Info: MKFS version
  "Linux version 5.1.15.a-1-hardened (builduser@slave-1) (gcc version
9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
Info: FSCK version
  from "Linux version 4.19.13-dirty (nater@devx64) (gcc version 8.2.1
20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
to "Linux version 4.19.13-dirty (nater@devx64) (gcc version 8.2.1
20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 
Info: total FS sectors = 124168152 (60628 MB)
Info: CKPT version = 63f2b4a
Info: checkpoint state = 281 :  allow_nocrc nat_bits unmount
Info: No error was reported
*
I'm now booted in from my SDHC card. So it "seems" I'm good to go.
But with the actions taken and the files I've seen displayed during
the fsck, I'm thinking I'm going to reinstall all packages.
Assuming the issue was related to the power outage, I do wonder why
there weren't any fsck issues at bootup at that time. I hadn't had any
disk issues before with that card.
At least now I know the issue would be resolved by not saving the lost
files and I can continue on my merry way.
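
For comparing the fsck runs: the "checkpoint state" value is a hex
bitmask, and the names printed after the colon are the set flags. The
table in this sketch contains only the flags that appear in the two
runs (state 55 before the repair, 281 after it), with bit values
inferred from those two lines; other flags exist and would be reported
as leftover bits. The function name is made up.

```python
# Flag bits inferred from "55 : crc fsck compacted_summary unmount"
# and "281 : allow_nocrc nat_bits unmount"; not a complete list.
CP_FLAGS = [
    (0x200, "allow_nocrc"),
    (0x080, "nat_bits"),
    (0x040, "crc"),
    (0x010, "fsck"),
    (0x004, "compacted_summary"),
    (0x001, "unmount"),
]


def decode_cp_state(state_hex):
    """Return (flag names, leftover bits) for a hex checkpoint state."""
    state = int(state_hex, 16)
    names = [name for mask, name in CP_FLAGS if state & mask]
    known = sum(mask for mask, _name in CP_FLAGS)
    return names, state & ~known
```

The difference that stands out between the two runs is that the "fsck"
flag was set before the repair and is gone afterwards.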


F2FS Segmentation Fault

2020-07-13 Thread Nathan Royce
I won't re-format unless I hear something within a few days in case
you want me to try something.

Preface: There was a notable power outage a couple of nights ago.
When the power returned, everything seemed fine. No issues during
bootup or anything.
Then today, I went to open an application and my system started
acting up, with programs suddenly closing (/crashing?).
I switched tty and tried to log in, but I wasn't even allowed to enter
my password.
I switched to another and tried logging in as root which succeeded (somehow).
I looked at the journal and saw an entry saying something about
/bin/login not being a valid exec format.
I went to reboot, and when it got to the fsck part of initramfs, it
failed and I was kicked to a root shell.
I ran fsck and saw a bunch of issues, but I guess nothing could get
resolved enough to let me reboot.
Oh, in case you're wondering, my / (system) is on a 64GB SDHC card.
I just happened to also have an older / system on my mechanical drive
using BTRFS which I could boot to (which I'm on now).
I ran fsck from this older system and it seems I got the same results:

*
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 124168159 (60628 MB)
Info: MKFS version
  "Linux version 5.1.15.a-1-hardened (builduser@slave-1) (gcc version
9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
Info: FSCK version
  from "Linux version 4.19.13-dirty (userName@computerName) (gcc
version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
2018"
to "Linux version 4.19.13-dirty (userName@computerName) (gcc
version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
2018"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 
Info: total FS sectors = 124168152 (60628 MB)
Info: CKPT version = 63f2b4a
Info: checkpoint state = 55 :  crc fsck compacted_summary unmount

NID[0x18eca] is unreachable, blkaddr:0xcf1d9d3c
NID[0x18ecb] is unreachable, blkaddr:0x5db5f91f
NID[0x18ecc] is unreachable, blkaddr:0x4653d
NID[0x18ee3] is unreachable, blkaddr:0x144dc401
NID[0x18ee4] is unreachable, blkaddr:0x558cfba9
NID[0x18ee5] is unreachable, blkaddr:0x45553
NID[0x18f78] is unreachable, blkaddr:0x560555ac
NID[0x18f79] is unreachable, blkaddr:0x58cccb0d
NID[0x18f7a] is unreachable, blkaddr:0x53d84
NID[0x4d621] is unreachable, blkaddr:0x4fc1d
NID[0x4d622] is unreachable, blkaddr:0x4fc1e
NID[0x7fa32] is unreachable, blkaddr:0x20b0ca3a
NID[0x7fa33] is unreachable, blkaddr:0xf71b60
[FSCK] Unreachable nat entries[Fail] [0xd]
[FSCK] SIT valid block bitmap checking[Fail]
[FSCK] Hard link checking for regular file[Ok..] [0x4f6]
[FSCK] valid_block_count matching with CP [Fail] [0x736fcb]
[FSCK] valid_node_count matcing with CP (de lookup)   [Fail] [0x70327]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0x70334]
[FSCK] valid_inode_count matched with CP  [Fail] [0x6f09e]
[FSCK] free segment_count matched with CP [Ok..] [0x3bfc]
[FSCK] next block offset is free  [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs   [Fail]

Do you want to restore lost files into ./lost_found/? [Y/N] Y
Segmentation fault
*

*
   Message: Process 3425 (fsck.f2fs) of user 0 dumped core.

Stack trace of thread 3425:
#0  0x55f8515739c8 n/a (fsck.f2fs)
#1  0x55f851575261 n/a (fsck.f2fs)
#2  0x55f851572c56 n/a (fsck.f2fs)
#3  0x55f85156a3f0 n/a (fsck.f2fs)
#4  0x7f51420feee3 __libc_start_main (libc.so.6)
#5  0x55f85156a95e n/a (fsck.f2fs)
*

So if you want more information or need me to try something, let me
know soon if you would. Otherwise, I'll just be reformatting my card
in a few days.
It could've just been a fluke that occurred because of the power outage
but didn't manifest itself until today.


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-19 Thread Nathan Royce
While your mention of quirks-table.h certainly had possibilities, I'm
afraid adding the "AU0828_DEVICE(0x05e1, 0x0400, "Hauppauge",
"Woodbury")," entry for my tuner did not make any difference regarding
the "Tuner is busy. Error -19" message.

I don't know if this means anything, but I see
https://patchwork.kernel.org/patch/97726/ from 2010 which contains
changes for the 0x0400 model. I guess it never got pulled in.

Really, it's fine for me just to hang back at v5.1 for a year or two
until ATSC 3.0 USB tuners come out at a reasonable price.

On Mon, Aug 19, 2019 at 4:44 PM shuah  wrote:
> You said you make changes to the
>
> "Whenever I update my kernel, I edit the
> ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my
> 0x400 device.
> I've been doing it for years and it's been working fine... until now..."
>
> Please send me the changes you make to the file. I see the following
> WOODBURY devices. I am assuming you add 0x400 entry.
>
> { USB_DEVICE(0x05e1, 0x0480),
>  .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
>  { USB_DEVICE(0x2040, 0x8200),
>  .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
>
>
> There is another table in sound/usb/quirks-table.h for AU0828
> devices. In addition to 812658d88d26, 66354f18fe5f makes change
> to this table to add a flag. I see two entries in that table:
>
> AU0828_DEVICE(0x05e1, 0x0480, "Hauppauge", "Woodbury"),
> AU0828_DEVICE(0x2040, 0x8200, "Hauppauge", "Woodbury"),
>
> Since these drivers are now coupled doing resource sharing,
> could it be that with your change to au0828 device table,
> your changes are now incomplete.
>
> I don't have a Woodbury device though. This is something to
> try.
>
> Did you consider sending patch to add your device variant,
> so you don't have to keep making this change whenever you
> go to a new kernel?
>
> thanks,
> -- Shuah


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-19 Thread Nathan Royce
 examined.
If no changes are found in that branch/tag range, then the next step
would be to analyze any commits that are affected by parents/children
(references) of au0828 within that version range, and continually move
up/down the line. (eg. linux/usb.h which is referenced by au0828.h)
This way, the scope is very narrow at the beginning and widens as needed.
I think it's something that could be implemented in the git tool and
the user only needs to provide a starting place. Just a thought.

I can only hope that I incorrectly used bisecting and someone can
point to what I did wrong and provide a better way. (maybe I wouldn't
have to mrproper, so the testing wouldn't take days?)
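
On why bisecting should need far fewer builds than walking the commits
one by one: each test halves the remaining range, so roughly log2(N)
builds are needed for N commits. The sketch below shows the idea on a
plain ordered list. It assumes a linear history with a known-good first
entry and a known-bad last entry, which is simpler than what git bisect
handles on a real commit graph, and is_bad stands in for "build, boot,
and try the tuner".

```python
def first_bad(commits, is_bad):
    """Return (index of the first bad commit, number of tests run).

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. is_bad(commit) tests a single commit.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return hi, tests
```

With 1000 commits between the good and bad points, this needs at most
10 tests.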

On Mon, Aug 19, 2019 at 3:49 PM shuah  wrote:
>
> On 8/16/19 7:15 PM, Nathan Royce wrote:
> Hi Nathan,
>
> Just catching up with this thread. Let me know what you find. Can you
> build your own kernel and see what you can find?
>
> thanks,
> -- Shuah


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-19 Thread Nathan Royce
(resubmitting due to non "reply-to-all"):

Bugger, I just sent a reply to your last message, but it bounced back with:
*
550 5.7.1 Content-Policy reject msg: The message contains HTML
subpart, therefore we consider it SPAM or Outlook Virus. TEXT/PLAIN is
accepted.! BF:; S1728494AbfHSVzk
*
I just switched this email to plain-text and will resubmit my previous
email as plain-text.

Anyway, yeah, all I did in au0828-cards.c was add my 0x0400 like:
*
{ USB_DEVICE(0x2040, 0x7281),
    .driver_info = AU0828_BOARD_HAUPPAUGE_HVR950Q_MXL },
{ USB_DEVICE(0x05e1, 0x0400),
    .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
{ USB_DEVICE(0x05e1, 0x0480),
    .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
{ USB_DEVICE(0x2040, 0x8200),
    .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
*
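
To make the two-table point concrete, here is a toy model (not kernel
code) of the ID lists quoted in this thread: the stock au0828-cards.c
entries and the two Woodbury entries shuah lists from
sound/usb/quirks-table.h. The dict/set layout and the function name are
made up; it only shows which table would match a given vendor/product
pair.

```python
# (vendor, product) -> board, from the stock au0828-cards.c entries above.
AU0828_IDS = {
    (0x2040, 0x7281): "HAUPPAUGE_HVR950Q_MXL",
    (0x05E1, 0x0480): "HAUPPAUGE_WOODBURY",
    (0x2040, 0x8200): "HAUPPAUGE_WOODBURY",
}
# Woodbury entries quoted from sound/usb/quirks-table.h.
QUIRK_IDS = {(0x05E1, 0x0480), (0x2040, 0x8200)}


def coverage(vendor, product, cards=AU0828_IDS, quirks=QUIRK_IDS):
    """Report which of the two tables contains the given USB ID."""
    key = (vendor, product)
    return {"au0828-cards": key in cards, "quirks-table": key in quirks}


# Adding 0x05e1:0x0400 to the cards table only, as in my local edit:
patched_cards = dict(AU0828_IDS)
patched_cards[(0x05E1, 0x0400)] = "HAUPPAUGE_WOODBURY"
```

With only the cards table patched, coverage(0x05E1, 0x0400,
cards=patched_cards) reports a match in au0828-cards but not in
quirks-table, which is the gap the quirks-table.h change is meant to
close.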

That's all I've ever had to do. I never knew about the quirks-table.h.
I'll take a look.
I saw in the log the 0x05e1 addition was made in 2016, but maybe it
only applies to the Media Controller API change requirement now (thus,
not having caused any problems in the past since the API wasn't being
used).

I've never sent in a patch before, anywhere (I just point out a
problem and let the dev code it in their style). Also, I don't want to
be a bother in case something even that small could somehow break
something else, especially for something "off-brand"(?).
I never really minded building the module by itself.

I've just now started the build for linux-5.2.y with the
quirks-table.h change along with au0828-cards.c.
Thanks for that heads-up. Hopefully that does the trick (whatever the
trick/quirk is).

On Mon, Aug 19, 2019 at 4:44 PM shuah  wrote:
>
> On 8/19/19 2:49 PM, shuah wrote:
> > Hi Nathan,
> >
> > Just catching up with this thread. Let me know what you find. Can you
> > build your own kernel and see what you can find?
> >
>
> You said you make changes to the
>
> "Whenever I update my kernel, I edit the
> ./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my
> 0x400 device.
> I've been doing it for years and it's been working fine... until now..."
>
> Please send me the changes you make to the file. I see the following
> WOODBURY devices. I am assuming you add 0x400 entry.
>
> { USB_DEVICE(0x05e1, 0x0480),
>  .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
>  { USB_DEVICE(0x2040, 0x8200),
>  .driver_info = AU0828_BOARD_HAUPPAUGE_WOODBURY },
>
>
> There is another table in sound/usb/quirks-table.h for AU0828
> devices. In addition to 812658d88d26, 66354f18fe5f makes a change
> to this table to add a flag. I see two entries in that table:
>
> AU0828_DEVICE(0x05e1, 0x0480, "Hauppauge", "Woodbury"),
> AU0828_DEVICE(0x2040, 0x8200, "Hauppauge", "Woodbury"),
>
> Since these drivers are now coupled doing resource sharing,
> could it be that with your change to the au0828 device table,
> your changes are now incomplete?
>
> I don't have a Woodbury device though. This is something to
> try.
>
> Did you consider sending a patch to add your device variant,
> so you don't have to keep making this change whenever you
> go to a new kernel?
>
> thanks,
> -- Shuah


Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-16 Thread Nathan Royce
On Fri, Aug 16, 2019 at 1:42 PM Greg Kroah-Hartman
 wrote:
> If you revert that one commit, do things start working again?
>
> thanks,
>
> greg k-h
Hey Greg, I just got finished building it after running "$ git revert
812658d88d26" and verifying it reverted by comparing one of the files
from git log -p, but alas, no joy.

On Fri, Aug 16, 2019 at 5:41 PM Brad Love  wrote:
>
> Hi Nathan,
>
> I don't have a "woodbury", but I have a Hauppauge 950Q sitting around
> and tested it on latest mainline kernel. w_scan is ok and streaming is
> fine. There are no unexpected errors. The 950Q uses the same au0828 bridge
> and au8522 demod as woodbury, but a different tuner. Your problem
> wouldn't appear to be a general au0828 issue.
>
> You might have to check out git bisect. That will be the quickest way to
> get to the bottom, if you've got points A and B, and are
> building/running your own kernel.
>
> Cheers,
>
> Brad
Thanks Brad, I'll explore bisecting and hopefully will be able to
narrow down the cause.
I wasn't running my own kernel, but rather using the Arch Linux kernel
and modding the one module and putting it in "extramodules".


Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-16 Thread Nathan Royce
Right up front, I must say I do NOT have a Hauppauge tuner. I think
it's like maybe Mygica/Geniatech:
Bus 002 Device 004: ID 05e1:0400 Syntek Semiconductor Co., Ltd

Whenever I update my kernel, I edit the
./drivers/media/usb/au0828/au0828-cards.c file adding an entry for my
0x400 device.
I've been doing it for years and it's been working fine... until now...

*
Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19
<...18 more repeated entries...>
Aug 16 12:07:20 computerName kernel: usb 2-2.3: Tuner is busy. Error -19
Aug 16 12:07:10 computerName tvheadend[3276]: main: Log started
*
"w_scan" behaves the same way.

*
$ modprobe au0828
Aug 16 12:52:52 computerName kernel: videodev: Linux video capture
interface: v2.00
Aug 16 12:52:52 computerName kernel: au0828: au0828_init() Debugging is enabled
Aug 16 12:52:52 computerName kernel: au0828: au0828 driver loaded
Aug 16 12:52:52 computerName kernel: au0828: au0828_usb_probe() vendor
id 0x5e1 device id 0x400 ifnum:0
Aug 16 12:52:52 computerName kernel: au0828: au0828_gpio_setup()
Aug 16 12:52:52 computerName kernel: au0828: au0828_i2c_register()
Aug 16 12:52:52 computerName kernel: au0828: i2c bus registered
Aug 16 12:52:52 computerName kernel: au0828: au0828_card_setup()
Aug 16 12:52:52 computerName kernel: tveeprom: Encountered bad packet
header [20]. Corrupt or not a Hauppauge eeprom.
Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom:
warning: unknown hauppauge model #0
Aug 16 12:52:52 computerName kernel: au0828: hauppauge_eeprom:
hauppauge eeprom: model=0
Aug 16 12:52:52 computerName kernel: au0828: au0828_analog_register
called for intf#0!
Aug 16 12:52:52 computerName kernel: au0828: au0828_dvb_register()
Aug 16 12:52:52 computerName kernel: au8522 7-0047: creating new instance
Aug 16 12:52:52 computerName kernel: tda18271 7-0060: creating new instance
Aug 16 12:52:52 computerName kernel: tda18271: TDA18271HD/C2 detected @ 7-0060
Aug 16 12:52:53 computerName kernel: au0828: dvb_register()
Aug 16 12:52:53 computerName kernel: dvbdev: DVB: registering new
adapter (au0828)
Aug 16 12:52:53 computerName kernel: usb 2-2.3: DVB: registering
adapter 0 frontend 0 (Auvitek AU8522 QAM/8VSB Frontend)...
Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity:
media entity 'Auvitek AU8522 QAM/8VSB Frontend' registered.
Aug 16 12:52:53 computerName kernel: dvbdev: dvb_create_media_entity:
media entity 'dvb-demux' registered.
Aug 16 12:52:53 computerName kernel: au0828: Registered device AU0828
[Hauppauge Woodbury]
Aug 16 12:52:53 computerName kernel: usbcore: registered new interface
driver au0828
*
The "eeprom" thing has never been an issue with regard to my tuner
working. It still worked in spite of it.

It's odd because:
*
$ lsmod | grep au0828
au0828 86016  0
tveeprom   28672  1 au0828
dvb_core  176128  1 au0828
v4l2_common20480  1 au0828
videobuf2_vmalloc  20480  2 dvb_core,au0828
videobuf2_v4l2 28672  1 au0828
videobuf2_common   61440  3 videobuf2_v4l2,dvb_core,au0828
videodev  253952  4
v4l2_common,videobuf2_v4l2,videobuf2_common,au0828
rc_core61440  1 au0828
media  61440  6
videodev,snd_usb_audio,videobuf2_v4l2,dvb_core,videobuf2_common,au0828

$ ls -la /dev/dvb/adapter0/
total 0
drwxr-xr-x  2 root root 120 Aug 16 12:01 .
drwxr-xr-x  3 root root  60 Aug 16 12:01 ..
crw-rw+ 1 root video 212, 4 Aug 16 12:01 demux0
crw-rw+ 1 root video 212, 5 Aug 16 12:01 dvr0
crw-rw+ 1 root video 212, 3 Aug 16 12:01 frontend0
crw-rw+ 1 root video 212, 7 Aug 16 12:01 net0
*

The previous kernel version I was on that worked was 5.1.15.
I just reverted back to the previous version and it's working again.
I don't know what broke and where, between the versions.

I saw https://lkml.org/lkml/2019/1/21/1020 but this is back in January
so I don't know if something was more recently applied to au0828 that
makes use of the API.
"lsof" didn't show anything related to "/dev/dvb" being used.

Oh neat! Someone posted a neat git feature which I tried and I get:
*
$ git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset
%s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative
v5.1.15..v5.2.8 drivers/media/usb/au0828/
* be50f19fee84 - media: au0828: fix null dereference in error path (12 days ago)
* c942fddf8793 - treewide: Replace GPLv2 boilerplate/reference with
SPDX - rule 157 (3 months ago)
* 16216333235a - treewide: Replace GPLv2 boilerplate/reference with
SPDX - rule 1 (3 months ago)
* ec8f24b7faaf - treewide: Add SPDX license identifier -
Makefile/Kconfig (3 months ago)
* 14340de506c9 - media: prefix header search paths with $(srctree)/ (3
months ago)
* f604f0f5afb8 - media: au0828: stop video streaming only when last
user stops (4 months ago)
* 898bc40bfcc2 - media: au0828: Fix NULL pointer dereference in
au0828_analog_stream_enable() (4 months ago)
* 383b0e5b6ebb - medi

Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13

2019-01-10 Thread Nathan Royce
One more thought that may be nothing, but when kmemleak crashed,
SUnreclaim was at 932552 kB, and after reclaimed/cleared 299840 kB.
There weren't any performance issues like when I had a leak of 5.5 GB
in the 4.18 kernel.
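
A minimal sketch of how the two SUnreclaim readings could be taken and
compared. It assumes the second figure (299840 kB) is the value read
after the clear; the function names sunreclaim_kb and slab_delta are
made up for illustration.

```shell
# Print the SUnreclaim value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; another path can be passed for testing.
sunreclaim_kb() {
    awk '$1 == "SUnreclaim:" { print $2 }' "${1:-/proc/meminfo}"
}

# Difference between a "before" and an "after" reading, in kB and rounded MiB.
# Usage: slab_delta <before_kb> <after_kb>
slab_delta() {
    awk -v b="$1" -v a="$2" \
        'BEGIN { d = b - a; printf "%d kB (%.0f MiB)\n", d, d / 1024 }'
}
```

With the figures above, slab_delta 932552 299840 prints
"632712 kB (618 MiB)", i.e. roughly 618 MiB came back after the clear
under that reading.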

On Mon, Jan 7, 2019 at 3:52 AM Catalin Marinas  wrote:
>
> Under memory pressure, kmemleak may fail to allocate memory. See this
> patch for an attempt to slightly improve things but it's not a proper
> solution:
>
> http://lkml.kernel.org/r/20190102180619.12392-1-...@lca.pw
>
> --
> Catalin


Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-09 Thread Nathan Royce
Wow, my system got wrecked (exaggeration) during this latest stretch...
Pulseaudio was stretched to the limit and beyond and was forced to
restart. Anything that was producing audio had to be restarted to get
it back.
This time was much like the first time and went from timestamp
573100.060927 (line 1) to 572506.604155 (line 11069), where 100%
(literally) of it was that event 37 in the journal, no other kernel
log entries except for the systemd-hostnamed audit before it all went
down.
And as usual, it was my USB TV tuner (tvheadend really) giving the
Poll Timeout log entries.
Those same uploaded trace files will be updated with the latest bugout.

On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman
 wrote:
>
> The event type 37 is a host controller event, most likely an event ring full
> error.
>
> So there are probably so many events that we fill the event ring before we 
> can handle them.
>
> Could you take traces of this?
> Note that the trace file will be huge.
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> copy the traces somewhere safe once the error is triggered:
> cp /sys/kernel/debug/tracing/trace  /
>
> -Mathias


Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-09 Thread Nathan Royce
You can ignore the last set of files with a sample of 1.
I got a nice sample of like 150 about 6 hours ago.
The link I included in the previous reply contains the same filenames,
just updated.
The journal timestamps (to correspond with the trace times) go from
"[513438.430253] computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37"
to
"[513438.796965] computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37"
That's 150 of them in less than 1/2 second.
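
As a worked check of that figure, a small sketch that turns two
monotonic journal timestamps and an event count into a rate. The
function name event_rate is made up for illustration.

```shell
# Events per second between two monotonic journal timestamps.
# Usage: event_rate <first_ts> <last_ts> <count>
event_rate() {
    awk -v t0="$1" -v t1="$2" -v n="$3" 'BEGIN {
        dt = t1 - t0
        if (dt <= 0) { print "non-positive interval"; exit 1 }
        printf "%.6f s, %.0f events/s\n", dt, n / dt
    }'
}
```

event_rate 513438.430253 513438.796965 150 prints
"0.366712 s, 409 events/s", so the burst ran at roughly 400 events per
second.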

On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman
 wrote:
>
> The event type 37 is a host controller event, most likely an event ring full
> error.
>
> So there are probably so many events that we fill the event ring before we 
> can handle them.
>
> Could you take traces of this?
> Note that the trace file will be huge.
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> copy the traces somewhere safe once the error is triggered:
> cp /sys/kernel/debug/tracing/trace  /
>
> -Mathias


Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13

2019-01-08 Thread Nathan Royce
I'm not all that sure it was memory related based on my Sun, 6 Jan
2019 13:17:04 -0600 post.
You'll see the log entries at 3AM, and based on earlier entries I
likely went to sleep around 1AM which would mean any memory-intensive
applications (eg. virtual machine) would've been closed out.
I have 8GB RAM in my desktop.


On Mon, Jan 7, 2019 at 3:52 AM Catalin Marinas  wrote:
>
> Hi Nathan,
>
> On Tue, Jan 01, 2019 at 01:17:06PM -0600, Nathan Royce wrote:
> > I had a leak somewhere and I was directed to look into SUnreclaim
> > which was 5.5 GB after an uptime of a little over 1 month on an 8 GB
> > system. kmalloc-2048 was a problem.
> > I just had enough and needed to find out the cause for my lagging system.
> >
> > I finally upgraded from 4.18.16 to 4.19.13 and enabled kmemleak to
> > hunt for the culprit. I don't think a day had elapsed before kmemleak
> > crashed and disabled itself.
>
> Under memory pressure, kmemleak may fail to allocate memory. See this
> patch for an attempt to slightly improve things but it's not a proper
> solution:
>
> http://lkml.kernel.org/r/20190102180619.12392-1-...@lca.pw
>
> --
> Catalin


Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-08 Thread Nathan Royce
OK, I finally got one... but there was only 1 journal log entry. The
previous time there were like maybe 10 (also very little), but the 2
times before that had enough for me to have to page through the log.

I actually messed up on a variable in my script so missed the actual
time, but the trace still encompassed entries around the log entry
time when I copied it manually.
I fixed the script, tested it and have it running again for the next time.

GZip compressed the 1GB trace down to 43MB, but PLZip got it down to
19.5MB: https://1drv.ms/f/s!AkkOvekTOCrYn0kEFtJzreV7gCTD
All 3 files are from the same trace, but wanted to give you options in
case you didn't have plzip.
The journal entry (time): [501180.585516] computername kernel:
xhci_hcd 0000:00:14.0: ERROR unknown event type 37

On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman
 wrote:
>
> The event type 37 is a host controller event, most likely an event ring full
> error.
>
> So there are probably so many events that we fill the event ring before we 
> can handle them.
>
> Could you take traces of this?
> Note that the trace file will be huge.
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> copy the traces somewhere safe once the error is triggered:
> cp /sys/kernel/debug/tracing/trace  /
>
> -Mathias


Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-06 Thread Nathan Royce
I'm only posting to say I'm still waiting...
The error came up while I slept, and when I copied that log and looked
at it (yes, it WAS huge, just as you said), the timestamps at the
head/tail were much later than the journal logged times.
So I made a little script to monitor the journal kernel entries for
that message and have it copy the file after maybe 5 seconds. And now,
I'm just waiting for that error to occur again.
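
The script itself isn't included in this thread; what follows is only a
rough sketch of the idea, not the actual script. It reads log lines on
stdin, and on the first match waits a few seconds and copies the trace
file. The function name copy_trace_on_match and the /var/tmp
destination are assumptions for illustration.

```shell
# Read log lines on stdin; on the first line containing PATTERN, wait DELAY
# seconds, copy SRC to a timestamped file in DEST_DIR and print its path.
# Returns non-zero if the input ends without a match or the copy fails.
# Usage: copy_trace_on_match <pattern> <delay> <src> <dest_dir>
copy_trace_on_match() {
    pattern=$1 delay=$2 src=$3 dest_dir=$4
    while IFS= read -r line; do
        case $line in
            *"$pattern"*)
                # Give the trace buffer a moment to record what follows.
                sleep "$delay"
                dest="$dest_dir/trace-$(date +%Y%m%d-%H%M%S)"
                cp "$src" "$dest" && echo "$dest"
                return
                ;;
        esac
    done
    return 1
}

# Intended use (needs root and the tracing setup quoted in this thread;
# the destination directory is a placeholder):
#   journalctl -kf -o short-monotonic |
#       copy_trace_on_match "ERROR unknown event type 37" 5 \
#           /sys/kernel/debug/tracing/trace /var/tmp
```

Copying only once per run keeps a burst of several hundred identical
messages from triggering several hundred copies of a very large file.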

On Wed, Jan 2, 2019 at 5:32 AM Mathias Nyman
 wrote:
>
> The event type 37 is a host controller event, most likely an event ring full
> error.
>
> So there are probably so many events that we fill the event ring before we 
> can handle them.
>
> Could you take traces of this?
> Note that the trace file will be huge.
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> copy the traces somewhere safe once the error is triggered:
> cp /sys/kernel/debug/tracing/trace  /
>
> -Mathias


Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13

2019-01-06 Thread Nathan Royce
e
Jan 06 03:27:16 computername kernel: kmemleak: Kernel memory leak
detector disabled
Jan 06 03:27:16 computername kernel: kmemleak: Automatic memory
scanning thread ended
Jan 06 03:27:16 computername kernel: kmemleak: Kmemleak disabled
without freeing internal data. Reclaim the memory with "echo clear >
/sys/kernel/debug/kmemleak".
Jan 06 03:27:15 computername plasmashell[1065]: qml: temp unit: 0
Jan 06 03:27:21 computername plasmashell[1065]: qml: temp unit: 0
Jan 06 03:27:24 computername plasmashell[1065]: qml: temp unit: 0
*

On Tue, Jan 1, 2019 at 7:04 PM Nathan Royce  wrote:
>
> It was unrelated to my USB issue. It happened again after I rebooted
> within 4 hours of uptime.
> This time there were 2 traces, one right after the other and included
> another line number.
> *
> Jan 01 17:47:54 computername plasmashell[1048]: qt.qpa.xcb:
> QXcbConnection: XCB error: 2 (BadValue), sequence: 45625, resource id:
> 69206018, major code: 142 (Unknown), minor code: 3
> Jan 01 17:50:14 computername kernel: WARNING: CPU: 3 PID: 2154 at
> mm/page_alloc.c:4262 __alloc_pages_nodemask+0xf74/0xfb0
> Jan 01 17:50:15 computername kernel: Modules linked in: rfcomm ccm
> bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271
> au8522_dig au8522_common au0828 tveeprom dvb_core arc4 v4l2_common
> intel_rapl snd_soc_rt5640 iTCO_wdt rtl8821ae x86_pkg_temp_thermal
> btcoexist i>
> Jan 01 17:50:16 computername kernel:  soundcore mei_me lpc_ich mei
> crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64
> serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg uas
> usb_storage dm_crypt dm_mod sr_mod cdrom sd_mod hid_logitech_hidpp
> hid_logitech_>
> Jan 01 17:50:16 computername kernel: CPU: 3 PID: 2154 Comm:
> PeripBusCEC Not tainted 4.19.13-dirty #2
> Jan 01 17:50:16 computername kernel: Hardware name: To Be Filled By
> O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015
> Jan 01 17:50:16 computername kernel: RIP:
> 0010:__alloc_pages_nodemask+0xf74/0xfb0
> Jan 01 17:50:16 computername kernel: Code: ff 0f 0b e9 dc fc ff ff 0f
> 0b 48 8b b4 24 80 00 00 00 8b 7c 24 18 44 89 f1 48 c7 c2 40 9e 4a b6
> e8 91 ef ff ff e9 d3 f1 ff ff <0f> 0b e9 a9 fc ff ff e8 c0 7f ea ff 85
> d2 0f 85 15 fd ff ff 48 c7
> Jan 01 17:50:16 computername kernel: RSP: 0018:999e032731e0 EFLAGS: 
> 00010202
> Jan 01 17:50:16 computername kernel: RAX: 8bbcbabc0040 RBX:
> 0040 RCX: 0020
> Jan 01 17:50:16 computername kernel: RDX:  RSI:
> 0002 RDI: 8bbd9fdfc000
> Jan 01 17:50:16 computername kernel: RBP: 0020 R08:
> 0040 R09: 0f82
> Jan 01 17:50:16 computername kernel: R10: 0020 R11:
>  R12: 
> Jan 01 17:50:16 computername kernel: R13:  R14:
>  R15: 
> Jan 01 17:50:16 computername kernel: FS:  7f9515642700()
> GS:8bbd9818() knlGS:
> Jan 01 17:50:16 computername kernel: CS:  0010 DS:  ES:  CR0:
> 80050033
> Jan 01 17:50:16 computername kernel: CR2: 7fdbd95b1000 CR3:
> 00011087c003 CR4: 001626e0
> Jan 01 17:50:16 computername kernel: Call Trace:
> Jan 01 17:50:16 computername kernel:  ? ___slab_alloc+0x43f/0x630
> Jan 01 17:50:16 computername kernel:  ? orc_find+0x108/0x190
> Jan 01 17:50:16 computername kernel:  ? kmem_cache_alloc+0x1c5/0x210
> Jan 01 17:50:16 computername kernel:  ? unwind_next_frame+0x2f8/0x460
> Jan 01 17:50:16 computername kernel:  new_slab+0x2fb/0x6f0
> Jan 01 17:50:16 computername kernel:  ? _raw_spin_unlock+0x16/0x30
> Jan 01 17:50:16 computername kernel:  ? deactivate_slab.isra.27+0x5b4/0x690
> Jan 01 17:50:16 computername kernel:  ___slab_alloc+0x43f/0x630
> Jan 01 17:50:16 computername kernel:  ? alloc_extent_state+0x1f/0xd0 [btrfs]
> Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
> Jan 01 17:50:16 computername kernel:  ? ___slab_alloc+0x58d/0x630
> Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
> Jan 01 17:50:16 computername kernel:  __slab_alloc.isra.28+0x52/0x70
> Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
> Jan 01 17:50:16 computername kernel:  kmem_cache_alloc+0x1c5/0x210
> Jan 01 17:50:16 computername kernel:  ? alloc_extent_state+0x1f/0xd0 [btrfs]
> Jan 01 17:50:16 computername kernel:  create_object+0x43/0x2a0
> Jan 01 17:50:16 computername kernel:  ? alloc_extent_state+0x1f/0xd0 [btrfs]
> Jan 01 17:50:16 computername kernel:  kmem_cache_alloc+0x1a6/0x210
> Jan 01 17:50:16 computername kernel:  alloc_extent_state+0x1f/0xd0 [btrfs]
> Jan 01 17:50:16 computername kernel:  __clear_extent_bit+0x297/0x390 [btrfs]
> Jan 01 17:50:16 computername 

Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-02 Thread Nathan Royce
...But then again, maybe it wasn't the cable. It's acting up again.

On Tue, Jan 1, 2019 at 3:14 PM Nathan Royce  wrote:
>
> Looks like this particular issue may have been due to a touchy/finicky
> connection.
>
> I removed my tuner from my hub and removed the hub from my
> motherboard's USB and put my tuner in directly.
> It STILL produced the error, but after I put everything back and
> played around a little, the errors stopped.
>
> Just to be sure, I also rebooted and it's still fine. No xhci errors at all.
> The only thing I've done recently (within the past few days) was play
> with my scanner which is also on that hub and maybe brushed my tuner
> cable or something.


Re: kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13

2019-01-01 Thread Nathan Royce
 kernel: FS:  ()
GS:8bbd9800() knlGS:
Jan 01 17:50:16 computername kernel: CS:  0010 DS:  ES:  CR0:
80050033
Jan 01 17:50:16 computername kernel: CR2: 5636c5b233d0 CR3:
000150a0a006 CR4: 001626f0
Jan 01 17:50:16 computername kernel: Call Trace:
Jan 01 17:50:16 computername kernel:  ? orc_find+0x108/0x190
Jan 01 17:50:16 computername kernel:  ? unwind_next_frame+0x121/0x460
Jan 01 17:50:16 computername kernel:  ? kcryptd_crypt+0x1d1/0x3a0 [dm_crypt]
Jan 01 17:50:16 computername kernel:  ? _raw_spin_lock+0x2e/0x40
Jan 01 17:50:16 computername kernel:  ? _raw_spin_unlock+0x16/0x30
Jan 01 17:50:16 computername kernel:  new_slab+0x2fb/0x6f0
Jan 01 17:50:16 computername kernel:  ? _raw_spin_lock+0x13/0x40
Jan 01 17:50:16 computername kernel:  ? deactivate_slab.isra.27+0x5b4/0x690
Jan 01 17:50:16 computername kernel:  ___slab_alloc+0x43f/0x630
Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 17:50:16 computername kernel:  ? ___slab_alloc+0x58d/0x630
Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 17:50:16 computername kernel:  __slab_alloc.isra.28+0x52/0x70
Jan 01 17:50:16 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 17:50:16 computername kernel:  kmem_cache_alloc+0x1c5/0x210
Jan 01 17:50:16 computername kernel:  ? mempool_alloc+0x65/0x180
Jan 01 17:50:16 computername kernel:  create_object+0x43/0x2a0
Jan 01 17:50:16 computername kernel:  ? mempool_alloc+0x65/0x180
Jan 01 17:50:16 computername kernel:  kmem_cache_alloc+0x1a6/0x210
Jan 01 17:50:16 computername kernel:  ? wait_woken+0x80/0x80
Jan 01 17:50:16 computername kernel:  mempool_alloc+0x65/0x180
Jan 01 17:50:16 computername kernel:  ? crypt_convert+0x96b/0xf50 [dm_crypt]
Jan 01 17:50:16 computername kernel:  bio_alloc_bioset+0x14c/0x220
Jan 01 17:50:16 computername kernel:  ? _raw_spin_lock_irqsave+0x25/0x50
Jan 01 17:50:16 computername kernel:  kcryptd_crypt+0x1d1/0x3a0 [dm_crypt]
Jan 01 17:50:16 computername kernel:  process_one_work+0x1eb/0x410
Jan 01 17:50:16 computername kernel:  worker_thread+0x2d/0x3d0
Jan 01 17:50:16 computername kernel:  ? process_one_work+0x410/0x410
Jan 01 17:50:16 computername kernel:  kthread+0x112/0x130
Jan 01 17:50:16 computername kernel:  ? kthread_park+0x80/0x80
Jan 01 17:50:16 computername kernel:  ret_from_fork+0x35/0x40
Jan 01 17:50:16 computername kernel: ---[ end trace 2a9048666fdb2311 ]---
Jan 01 17:50:16 computername kernel: kmemleak: Cannot allocate a
kmemleak_object structure
Jan 01 17:50:16 computername kernel: kmemleak: Kernel memory leak
detector disabled
Jan 01 17:50:16 computername kernel: kmemleak: Automatic memory
scanning thread ended
Jan 01 17:50:16 computername kernel: kmemleak: Kmemleak disabled
without freeing internal data. Reclaim the memory with "echo clear >
/sys/kernel/debug/kmemleak".
Jan 01 17:50:25 computername plasmashell[1048]: qt.qpa.xcb:
QXcbConnection: XCB error: 2 (BadValue), sequence: 47417, resource id:
71303170, major code: 142 (Unknown), minor code: 3
*****

On Tue, Jan 1, 2019 at 1:17 PM Nathan Royce  wrote:
>
> Kernel 4.19.13
>
> *
> Jan 01 02:04:20 computername kernel: xhci_hcd 0000:00:14.0: ERROR
> unknown event type 37
> Jan 01 02:04:20 computername kernel: WARNING: CPU: 2 PID: 2236 at
> mm/page_alloc.c:4254 __alloc_pages_nodemask+0xf52/0xfb0
> Jan 01 02:04:20 computername kernel: Modules linked in: rfcomm ccm
> bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271
> au8522_dig au8522_common au0828 snd_usb_audio tveeprom snd_usbmidi_lib
> dvb_core mousedev snd_rawmidi snd_seq_device btusb v4l2_common btrtl
> vide>
> Jan 01 02:04:20 computername kernel:  llc intel_rapl_perf soundcore
> alx i2c_i801 mdio evdev lpc_ich mei_me mei pcc_cpufreq mac_hid
> crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64
> serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg uas
> usb_storage dm_c>
> Jan 01 02:04:20 computername kernel: CPU: 2 PID: 2236 Comm:
> MainLoopThread Tainted: GW 4.19.13-dirty #2
> Jan 01 02:04:20 computername kernel: Hardware name: To Be Filled By
> O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015
> Jan 01 02:04:20 computername kernel: RIP:
> 0010:__alloc_pages_nodemask+0xf52/0xfb0
> Jan 01 02:04:20 computername kernel: Code: c7 44 24 54 00 00 00 00 25
> ff ff f7 ff 89 44 24 18 e9 ea f3 ff ff 48 89 9c 24 80 00 00 00 e9 ad
> f3 ff ff 0f 0b e9 dc fc ff ff <0f> 0b 48 8b b4 24 80 00 00 00 8b 7c 24
> 18 44 89 f1 48 c7 c2 40 9e
> Jan 01 02:04:20 computername kernel: RSP: 0018:af9f81066e90 EFLAGS: 
> 00010046
> Jan 01 02:04:20 computername kernel: RAX:  RBX:
> 0040 RCX: 
> Jan 01 02:04:20 computername kernel: RDX:  RSI:
> 0002 RDI: 9d26dfdfc000
> Jan 01 02:04:20 computername 

Re: kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-01 Thread Nathan Royce
Looks like this particular issue may have been due to a touchy/finicky
connection.

I removed my tuner from my hub and removed the hub from my
motherboard's USB and put my tuner in directly.
It STILL produced the error, but after I put everything back and
played around a little, the errors stopped.

Just to be sure, I also rebooted and it's still fine. No xhci errors at all.
The only thing I've done recently (within the past few days) was play
with my scanner which is also on that hub and maybe brushed my tuner
cable or something.

On Tue, Jan 1, 2019 at 12:57 PM Nathan Royce  wrote:
>
> Kernel 4.19.13
>
> 00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB
> xHCI Controller
>
> Around 400 "unknown event type 37" messages logged in a 2 second span.
> *
> Jan 01 02:08:07 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
> QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
> Jan 01 02:08:00 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
> QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
> Jan 01 02:07:56 computername kernel: xhci_hcd 0000:00:14.0: ERROR
> unknown event type 37
> ...
> Jan 01 02:07:55 computername kernel: xhci_hcd 0000:00:14.0: ERROR
> unknown event type 37
> Jan 01 02:07:55 computername kernel: xhci_hcd 0000:00:14.0: ERROR
> unknown event type 37
> Jan 01 02:07:52 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
> QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
> Jan 01 02:07:44 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
> QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
> *
>
> I question whether this also caused kmemleak to crash (will
> post after this).
>
> Regarding my tv tuner, it isn't supported by the kernel specifically,
> but is close enough that all I have to do is alter a single source
> file to include my device's pid, and it works just fine almost all of
> the time.


kmemleak: Cannot allocate a kmemleak_object structure - Kernel 4.19.13

2019-01-01 Thread Nathan Royce
Kernel 4.19.13

*
Jan 01 02:04:20 computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37
Jan 01 02:04:20 computername kernel: WARNING: CPU: 2 PID: 2236 at
mm/page_alloc.c:4254 __alloc_pages_nodemask+0xf52/0xfb0
Jan 01 02:04:20 computername kernel: Modules linked in: rfcomm ccm
bnep nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat tda18271
au8522_dig au8522_common au0828 snd_usb_audio tveeprom snd_usbmidi_lib
dvb_core mousedev snd_rawmidi snd_seq_device btusb v4l2_common btrtl
vide>
Jan 01 02:04:20 computername kernel:  llc intel_rapl_perf soundcore
alx i2c_i801 mdio evdev lpc_ich mei_me mei pcc_cpufreq mac_hid
crypto_user ip_tables x_tables serpent_avx2 serpent_avx_x86_64
serpent_sse2_x86_64 serpent_generic xts algif_skcipher af_alg uas
usb_storage dm_c>
Jan 01 02:04:20 computername kernel: CPU: 2 PID: 2236 Comm:
MainLoopThread Tainted: GW 4.19.13-dirty #2
Jan 01 02:04:20 computername kernel: Hardware name: To Be Filled By
O.E.M. To Be Filled By O.E.M./H97M-ITX/ac, BIOS P1.80 07/27/2015
Jan 01 02:04:20 computername kernel: RIP:
0010:__alloc_pages_nodemask+0xf52/0xfb0
Jan 01 02:04:20 computername kernel: Code: c7 44 24 54 00 00 00 00 25
ff ff f7 ff 89 44 24 18 e9 ea f3 ff ff 48 89 9c 24 80 00 00 00 e9 ad
f3 ff ff 0f 0b e9 dc fc ff ff <0f> 0b 48 8b b4 24 80 00 00 00 8b 7c 24
18 44 89 f1 48 c7 c2 40 9e
Jan 01 02:04:20 computername kernel: RSP: 0018:af9f81066e90 EFLAGS: 00010046
Jan 01 02:04:20 computername kernel: RAX:  RBX:
0040 RCX: 
Jan 01 02:04:20 computername kernel: RDX:  RSI:
0002 RDI: 9d26dfdfc000
Jan 01 02:04:20 computername kernel: RBP:  R08:
0040 R09: 0f82
Jan 01 02:04:20 computername kernel: R10:  R11:
 R12: 
Jan 01 02:04:20 computername kernel: R13:  R14:
 R15: 
Jan 01 02:04:20 computername kernel: FS:  7f7db94d5700()
GS:9d26d810() knlGS:
Jan 01 02:04:20 computername kernel: CS:  0010 DS:  ES:  CR0:
80050033
Jan 01 02:04:20 computername kernel: CR2: 92c9da10 CR3:
0001baefe002 CR4: 001626e0
Jan 01 02:04:20 computername kernel: Call Trace:
Jan 01 02:04:20 computername kernel:  ?
__dm_make_request.isra.18+0x3f/0xa0 [dm_mod]
Jan 01 02:04:20 computername kernel:  ? orc_find+0x108/0x190
Jan 01 02:04:20 computername kernel:  ? do_try_to_free_pages+0xc6/0x370
Jan 01 02:04:20 computername kernel:  new_slab+0x2fb/0x6f0
Jan 01 02:04:20 computername kernel:  ? _raw_spin_lock+0x13/0x40
Jan 01 02:04:20 computername kernel:  ? deactivate_slab.isra.27+0x5b4/0x690
Jan 01 02:04:20 computername kernel:  ___slab_alloc+0x43f/0x630
Jan 01 02:04:20 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 02:04:20 computername kernel:  ? ___slab_alloc+0x58d/0x630
Jan 01 02:04:20 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 02:04:20 computername kernel:  __slab_alloc.isra.28+0x52/0x70
Jan 01 02:04:20 computername kernel:  ? create_object+0x43/0x2a0
Jan 01 02:04:20 computername kernel:  kmem_cache_alloc+0x1c5/0x210
Jan 01 02:04:20 computername kernel:  ? mempool_alloc+0x65/0x180
Jan 01 02:04:20 computername kernel:  create_object+0x43/0x2a0
Jan 01 02:04:20 computername kernel:  ? mempool_alloc+0x65/0x180
Jan 01 02:04:20 computername kernel:  kmem_cache_alloc+0x1a6/0x210
Jan 01 02:04:20 computername kernel:  ? wait_woken+0x80/0x80
Jan 01 02:04:20 computername kernel:  mempool_alloc+0x65/0x180
Jan 01 02:04:20 computername kernel:  ? __process_bio+0x170/0x170 [dm_mod]
Jan 01 02:04:20 computername kernel:  bio_alloc_bioset+0x14c/0x220
Jan 01 02:04:20 computername kernel:  ? create_object+0x249/0x2a0
Jan 01 02:04:20 computername kernel:  ? __process_bio+0x170/0x170 [dm_mod]
Jan 01 02:04:20 computername kernel:  alloc_io+0x24/0x120 [dm_mod]
Jan 01 02:04:20 computername kernel:
__split_and_process_bio+0x53/0x1a0 [dm_mod]
Jan 01 02:04:20 computername kernel:  ? generic_make_request_checks+0x49a/0x6f0
Jan 01 02:04:20 computername kernel:  ? blk_queue_enter+0x233/0x260
Jan 01 02:04:20 computername kernel:
__dm_make_request.isra.18+0x3f/0xa0 [dm_mod]
Jan 01 02:04:20 computername kernel:  generic_make_request+0x1b9/0x3d0
Jan 01 02:04:20 computername kernel:  ? __se_sys_madvise.cold.2+0xbd/0xbd
Jan 01 02:04:20 computername kernel:  submit_bio+0x45/0x140
Jan 01 02:04:20 computername kernel:  __swap_writepage+0x133/0x3c0
Jan 01 02:04:20 computername kernel:  ? __frontswap_store+0x6e/0xf0
Jan 01 02:04:20 computername kernel:  shmem_writepage+0x229/0x310
Jan 01 02:04:20 computername kernel:  pageout.isra.11+0x117/0x350
Jan 01 02:04:20 computername kernel:  shrink_page_list+0x7ea/0xc80
Jan 01 02:04:20 computername kernel:  shrink_inactive_list+0x29f/0x6b0
Jan 01 02:04:20 computername kernel:  shrink_node_memcg+0x20f/0x780
Jan 01 02:04:20 computername kernel:  shrink_node+0xcf/0x4a0
Jan 01 02:04:20 computernam

kernel: xhci_hcd 0000:00:14.0: ERROR unknown event type 37 - Kernel 4.19.13

2019-01-01 Thread Nathan Royce
Kernel 4.19.13

00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB
xHCI Controller

Around 400 "unknown event type 37" messages logged in a 2 second span.
*
Jan 01 02:08:07 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
Jan 01 02:08:00 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
Jan 01 02:07:56 computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37
...
Jan 01 02:07:55 computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37
Jan 01 02:07:55 computername kernel: xhci_hcd 0000:00:14.0: ERROR
unknown event type 37
Jan 01 02:07:52 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
Jan 01 02:07:44 computername tvheadend[2370]: linuxdvb: Auvitek AU8522
QAM/8VSB Frontend #0 : ATSC-T #0 - poll TIMEOUT
*

I wonder whether this also caused kmemleak to crash (I will post about
that after this).
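
To put a number on a burst like the one above, a small sketch along these lines could tally matching messages per second. It assumes one journal entry per line in the default short timestamp format (the lines quoted in this mail are wrapped by the mail client), and `count_per_second` is a hypothetical helper name, not part of any tool.

```python
from collections import Counter


def count_per_second(lines, needle="unknown event type 37"):
    """Count log lines containing `needle`, grouped by timestamp.

    Assumes each line starts with a 15-character timestamp such as
    "Jan 01 02:07:56" (default journalctl short output).
    """
    counts = Counter()
    for line in lines:
        if needle in line:
            counts[line[:15]] += 1
    return counts
```

Feeding it the lines of a saved `journalctl -k` dump would show how many of the errors landed in each second.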

Regarding my tv tuner, it isn't supported by the kernel specifically,
but is close enough that all I have to do is alter a single source
file to include my device's pid, and it works just fine almost all of
the time.


drivers/tty/serial/samsung.c s3c24xx_uart_copy_rx_to_tty

2018-04-27 Thread Nathan Royce
No idea why, but I will say that something I've done recently was
re-enable my ath9k_htc wireless adapter, which tends to firmware-panic
quite a bit and sometimes also kills off my ppp usb adapter.
I have a script running that monitors the journal and restarts
hostapd every time my ath device firmware-panics and comes back alive.
Same with netctl for my ppp.
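
A minimal sketch of that kind of monitor might look like the following. It is not the script described above. The two marker strings are taken from kernel log lines this adapter produces, while the `hostapd` unit name, the `PanicWatcher` class and the `watch()` helper are assumptions made for illustration.

```python
import subprocess

# Marker strings as they appear in the kernel log for this adapter.
PANIC_MARKER = "ath: firmware panic!"
READY_MARKER = "ath9k_htc: FW Version"
# Assumption: hostapd runs as a systemd unit named "hostapd".
RESTART_CMD = ["systemctl", "restart", "hostapd"]


class PanicWatcher:
    """Tracks whether a firmware panic was seen and the device is back."""

    def __init__(self):
        self.panicked = False

    def feed(self, line):
        """Return True when the service should be restarted."""
        if PANIC_MARKER in line:
            self.panicked = True
            return False
        if self.panicked and READY_MARKER in line:
            self.panicked = False
            return True
        return False


def watch():
    """Follow new kernel messages and restart hostapd after a recovery."""
    watcher = PanicWatcher()
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        if watcher.feed(line):
            subprocess.run(RESTART_CMD, check=False)
```

Calling `watch()` as root would block and follow the journal. Waiting for the firmware version line rather than restarting on the panic itself avoids restarting hostapd while the interface is still gone.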

The first time I noticed this particular issue was yesterday, when
my ssh session became sluggish and I was eventually forced to pull
power from my odroid. I only have 4 cores enabled, otherwise the usb
issues increase.
I cobbled together a setup with my ATX power supply providing power to
everything (including usb devices and hubs).
*
Apr 27 10:27:39 computername kernel: []
(s3c64xx_serial_handle_irq) from []
(__handle_irq_event_percpu+0x50/0x11c)
Apr 27 10:28:42 computername systemd-journald[22668]: Missed 5173
kernel messages
Apr 27 10:28:42 computername kernel: CPU: 0 PID: 99 Comm: mmcqd/1
Tainted: G  D W   4.14.0-dirty #6
Apr 27 10:28:42 computername kernel: Hardware name: SAMSUNG EXYNOS
(Flattened Device Tree)
Apr 27 10:28:42 computername kernel: [] (unwind_backtrace)
from [] (show_stack+0x10/0x14)
Apr 27 10:28:42 computername kernel: [] (show_stack) from
[] (dump_stack+0x88/0x9c)
Apr 27 10:28:42 computername kernel: [] (dump_stack) from
[] (__warn+0xe8/0x100)
Apr 27 10:28:42 computername kernel: [] (__warn) from
[] (warn_slowpath_null+0x20/0x28)
Apr 27 10:28:42 computername kernel: [] (warn_slowpath_null)
from [] (s3c24xx_uart_copy_rx_to_tty+0xa0/0xd4)
Apr 27 10:28:42 computername kernel: []
(s3c24xx_uart_copy_rx_to_tty) from []
(s3c24xx_serial_rx_chars+0x14c/0x1b8)
Apr 27 10:28:42 computername kernel: []
(s3c24xx_serial_rx_chars) from []
(s3c64xx_serial_handle_irq+0x48/0x60)
Apr 27 10:28:42 computername kernel: []
(s3c64xx_serial_handle_irq) from []
(__handle_irq_event_percpu+0x50/0x11c)
Apr 27 10:28:42 computername kernel: []
(__handle_irq_event_percpu) from []
(handle_irq_event_percpu+0x2c/0x7c)
Apr 27 10:28:42 computername kernel: []
(handle_irq_event_percpu) from []
(handle_irq_event+0x38/0x5c)
Apr 27 10:28:42 computername kernel: [] (handle_irq_event)
from [] (handle_fasteoi_irq+0xa4/0x158)
Apr 27 10:28:42 computername kernel: [] (handle_fasteoi_irq)
from [] (generic_handle_irq+0x24/0x34)
Apr 27 10:28:42 computername kernel: [] (generic_handle_irq)
from [] (__handle_domain_irq+0x5c/0xb4)
Apr 27 10:28:42 computername kernel: []
(__handle_domain_irq) from [] (gic_handle_irq+0x3c/0x78)
Apr 27 10:28:42 computername kernel: [] (gic_handle_irq)
from [] (__irq_svc+0x6c/0x90)
Apr 27 10:28:42 computername kernel: Exception stack(0xede8baf8 to 0xede8bb40)
Apr 27 10:28:42 computername kernel: bae0:
  ede8bcf4 ede98c80
Apr 27 10:28:42 computername kernel: bb00:  0002 a00c0113
eddfec00 ede8bce0  ee20f904 ede8bd58
Apr 27 10:28:42 computername kernel: bb20: 0100 c0c02080 c0801550
ede8bb48 c074c7dc c074c7e0 600c0113 
Apr 27 10:28:42 computername kernel: [] (__irq_svc) from
[] (_raw_spin_unlock_irqrestore+0x10/0x14)
Apr 27 10:28:42 computername kernel: []
(_raw_spin_unlock_irqrestore) from []
(dw_mci_request_end+0xa8/0xdc)
Apr 27 10:28:42 computername kernel: [] (dw_mci_request_end)
from [] (dw_mci_tasklet_func+0x31c/0x3dc)
Apr 27 10:28:42 computername kernel: []
(dw_mci_tasklet_func) from [] (tasklet_action+0x7c/0x118)
Apr 27 10:28:42 computername kernel: [] (tasklet_action)
from [] (__do_softirq+0xe0/0x248)
Apr 27 10:28:42 computername kernel: [] (__do_softirq) from
[] (irq_exit+0xd8/0x140)
Apr 27 10:28:42 computername kernel: [] (irq_exit) from
[] (__handle_domain_irq+0x60/0xb4)
Apr 27 10:28:42 computername kernel: []
(__handle_domain_irq) from [] (gic_handle_irq+0x3c/0x78)
Apr 27 10:28:42 computername kernel: [] (gic_handle_irq)
from [] (__irq_svc+0x6c/0x90)
Apr 27 10:28:42 computername kernel: Exception stack(0xede8bc28 to 0xede8bc70)
Apr 27 10:28:42 computername kernel: bc20:   ede8bcf4
ede98c80  0001 7fff ede8bcf4
Apr 27 10:28:42 computername kernel: bc40: ede8a000 0002 
c0c05448  ede8bcf0 0100 ede8bc78
Apr 27 10:28:42 computername kernel: bc60: c074c7ec c074c7f0 600c0013 
Apr 27 10:28:42 computername kernel: [] (__irq_svc) from
[] (_raw_spin_unlock_irq+0xc/0x10)
Apr 27 10:28:42 computername kernel: []
(_raw_spin_unlock_irq) from [] (wait_for_common+0xa0/0x168)
Apr 27 10:28:42 computername kernel: [] (wait_for_common)
from [] (mmc_wait_for_req_done+0x8c/0x110)
Apr 27 10:28:42 computername kernel: []
(mmc_wait_for_req_done) from [] (mmc_wait_for_cmd+0x68/0x9c)
Apr 27 10:28:42 computername kernel: [] (mmc_wait_for_cmd)
from [] (__mmc_send_status+0x68/0x98)
Apr 27 10:28:42 computername kernel: [] (__mmc_send_status)
from [] (card_busy_detect+0x64/0x150)
Apr 27 10:28:42 computername kernel: [] (card_busy_detect)
from [] (mmc_blk_err_check+0x180/0x5bc)
Apr 27 10:28:42 computername kernel: [] (mmc_blk_err_check)
from [] 

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2018-04-24 Thread Nathan Royce
I finally got around to applying your patch and building the toolchain
(based on master source (gcc8)), but alas, while there is no firmware
panic in the log, wifi drops off the face of the planet (the ssid
disappears and hostapd doesn't know wifi failed (nothing in the log
either)).

On Wed, Jun 7, 2017 at 5:39 PM, Tobias Diedrich
 wrote:
> Oleksij Rempel wrote:
>> Am 07.06.2017 um 02:12 schrieb Tobias Diedrich:
>> > Oleksij Rempel wrote:
>> >> Yes, this is "normal" problem. The firmware has no error handler for PCI
>> >> bus related exceptions. So if we filed to read PCI bus first time, we
>> >> have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
>> >> and provide an kernel "firmware panic!" message.
>> >> Every one who can or will to fix this, is welcome.
>> >>
>> >>> *
>> >>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
>> >>> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
>> > [...]
>> >
>> >> memdmp 50ae78 50ae88
>> >
>> > 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..@
>> >
>> > [...copy to bin...]
>> > $ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin
>> > [..]
>> >    0:   6c1004          entry   a1, 32
>> >    3:   126aa2          l32r    a2, 0xfffdaa8c
>> >    6:   0c0200          memw
>> >    9:   8820            l32i.n  a8, a2, 0  <--Exception cause
>> > PC still points at load
>> >    b:   c020            movi.n  a2, 0
>> >    d:   081940          extui   a9, a8, 1, 1
>> >
>> > Judging from that it should be fairly simple to at least implement
>> > some sort of retry, possible after triggering a PCIe link retrain?
>>
>> I assume, yes.
>>
>> > There are some related PCIe root complex registers that may point to
>> > what exactly failed if they were dumped.
>> >
>> > The root complex registers live at 0x0004 and I think match the
>> > registers described for the root complex in the AR9344 datasheet.
>>
>> Suddenly I don't have ar7010 docs to tell..
>>
>> > PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
>> > "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
>> > the hierarchy reports any of the following errors and the associated
>> > enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
>> > ERR_NONFATAL."
>> >
>> > AFAICS link retrain can be done by setting bit3 (INIT_RST,
>> > "Application request to initiate a training reset") in
>> > PCIE_APP (0x4).
>> >
>> > See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
>> > flips some bits in the RC to enable the PCIe bus for reading the
>> > EEPROM).
>> >
>> > The root complex pci configuration space is at 0x2 which could
>> > have further error details:
>> >> memdmp 2 20200
>> >
>> > 02: a02a 168c 0010 0006  0001 0001   .*..
>> > 020010:          
>> > 020020:          
>> > 020030:    0040    01ff  ...@
>> > 020040: 5bc3 5001        [.P.
>> > 020050: 0080 7005        ..p.
>> > 020060:          
>> > 020070: 0042 0010  8701  2010 0013 4411  .BD.
>> > 020080: 3011    00c0 03c0    0...
>> > 020090:    0010      
>> > 0200a0:          
>> > 0200b0:          
>> > 0200c0:          
>> > 0200d0:          
>> > 0200e0:          
>> > 0200f0:          
>> > 020100: 1401 0001     0006 2030  ...0
>> > 020110:    2000  00a0    
>> > 020120:          
>> > 020130:          
>> > 020140: 0001 0002        
>> > 020150:   8000 00ff      
>> > 020160:          
>> > 020170:          
>> > 020180:          
>> > 020190:          
>> > 0201a0:          
>> > 0201b0:          
>> > 0201c0:          
>> > 0201d0:          
>> > 0201e0:          
>> > 0201f0:         

Re: kernel BUG at fs/btrfs/ctree.c:3182 - occurred during heavy NFS transfer

2017-11-01 Thread Nathan Royce
I'm guessing this is related.
I noticed my tv wasn't recording to my drive and when I tried to touch
a file on the drive, my console became unresponsive.
Trying to reboot took like 5 minutes to even stop the processes and in
the end couldn't unmount the drive and I had to cut the power to
finally get it to boot.

Nov 01 17:41:42 dd kernel: [ cut here ]
Nov 01 17:41:42 dd kernel: WARNING: CPU: 0 PID: 227 at
fs/btrfs/file.c:547 btrfs_drop_extent_cache+0x4b4/0x4e8 [btrfs]
Nov 01 17:41:42 dd kernel: Modules linked in: arc4 tda18271 au8522_dig
au8522_common ath9k_htc ath9k_common au0828 btusb v4l2_common ath9k_hw
btintel videobuf2_vmalloc btbcm videobuf2_memops tveeprom bluetooth
ath dvb_core videobuf2_v4l2 videodev mac80211 ecdh_generic vi
Nov 01 17:41:42 dd kernel: CPU: 0 PID: 227 Comm: mount Not tainted
4.13.0-dirty #2
Nov 01 17:41:42 dd kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Nov 01 17:41:42 dd kernel: [] (unwind_backtrace) from
[] (show_stack+0x10/0x14)
Nov 01 17:41:42 dd kernel: [] (show_stack) from []
(dump_stack+0x88/0x9c)
Nov 01 17:41:42 dd kernel: [] (dump_stack) from []
(__warn+0xe8/0x100)
Nov 01 17:41:42 dd kernel: [] (__warn) from []
(warn_slowpath_null+0x20/0x28)
Nov 01 17:41:42 dd kernel: [] (warn_slowpath_null) from
[] (btrfs_drop_extent_cache+0x4b4/0x4e8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_drop_extent_cache
[btrfs]) from [] (__btrfs_drop_extents+0x618/0x1000 [btrfs])
Nov 01 17:41:42 dd kernel: [] (__btrfs_drop_extents [btrfs])
from [] (btrfs_drop_extents+0x60/0x80 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_drop_extents [btrfs])
from [] (replay_one_extent+0x718/0x818 [btrfs])
Nov 01 17:41:42 dd kernel: [] (replay_one_extent [btrfs])
from [] (replay_one_buffer+0x248/0x780 [btrfs])
Nov 01 17:41:42 dd kernel: [] (replay_one_buffer [btrfs])
from [] (walk_down_log_tree+0x144/0x38c [btrfs])
Nov 01 17:41:42 dd kernel: [] (walk_down_log_tree [btrfs])
from [] (walk_log_tree+0xd0/0x1e8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (walk_log_tree [btrfs]) from
[] (btrfs_recover_log_trees+0x21c/0x49c [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_recover_log_trees
[btrfs]) from [] (open_ctree+0x232c/0x2400 [btrfs])
Nov 01 17:41:42 dd kernel: [] (open_ctree [btrfs]) from
[] (btrfs_mount+0xecc/0xfa8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_mount [btrfs]) from
[] (mount_fs+0x2c/0x164)
Nov 01 17:41:42 dd kernel: [] (mount_fs) from []
(vfs_kern_mount.part.3+0x48/0xe0)
Nov 01 17:41:42 dd kernel: [] (vfs_kern_mount.part.3) from
[] (btrfs_mount+0x350/0xfa8 [btrfs])
Nov 01 17:41:42 dd kernel: [] (btrfs_mount [btrfs]) from
[] (mount_fs+0x2c/0x164)
Nov 01 17:41:42 dd kernel: [] (mount_fs) from []
(vfs_kern_mount.part.3+0x48/0xe0)
Nov 01 17:41:42 dd kernel: [] (vfs_kern_mount.part.3) from
[] (do_mount+0x1a8/0xc44)
Nov 01 17:41:42 dd kernel: [] (do_mount) from []
(SyS_mount+0x54/0xc0)
Nov 01 17:41:42 dd kernel: [] (SyS_mount) from []
(__sys_trace_return+0x0/0x10)
Nov 01 17:41:42 dd kernel: ---[ end trace 35a26e49cc780cf9 ]---


kernel BUG at fs/btrfs/ctree.c:3182 - occurred during heavy NFS transfer

2017-11-01 Thread Nathan Royce
ODroid XU4
Arch Linux
Kernel 4.13 (custom)
4TB USB 3.0 mechanical WD Drive/hub (had bad-block issues in the past
that were "corrected")
Occurred when using rsync to copy files to an encfs mount over nfs
(only 22MB made it).

Note, I keep the activity on my odroid very low or things start to bug
out left and right. I even set the kernel config so only the 4 LITTLE
ARM cores are used rather than include the other 4 big cores.
Heavy IO, such as from an rsync, seems to be one thing that causes a bugout.

Surprisingly, the whigout didn't cause my drive to remount RO.

Nov 01 09:43:46 dd kernel: [ cut here ]
Nov 01 09:43:46 dd kernel: kernel BUG at fs/btrfs/ctree.c:3182!
Nov 01 09:43:46 dd kernel: Internal error: Oops - BUG: 0 [#1] SMP ARM
Nov 01 09:43:46 dd kernel: Modules linked in: nf_conntrack_netlink
nfnetlink cmac ccm ppp_deflate ppp_async ppp_generic slhc bridge stp
llc nf_log_ipv4 ipt_REJECT nf_reject_ipv4 xt_recent nf_log_ipv6
iptable_filter nf_log_common ipt_MASQUERADE xt_LOG
nf_nat_masquerade_ip
Nov 01 09:43:46 dd kernel:  usbserial btrfs xor xor_neon lzo_compress
lzo_decompress zlib_deflate raid6_pq nfsd auth_rpcgss oid_registry
nfs_acl lockd grace crypto_user sunrpc ip_tables x_tables
Nov 01 09:43:46 dd kernel: CPU: 3 PID: 476 Comm: nfsd Not tainted
4.13.0-dirty #2
Nov 01 09:43:46 dd kernel: Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Nov 01 09:43:46 dd kernel: task: e9987080 task.stack: e8c28000
Nov 01 09:43:46 dd kernel: PC is at btrfs_set_item_key_safe+0x138/0x144 [btrfs]
Nov 01 09:43:46 dd kernel: LR is at comp_keys+0x4c/0x68 [btrfs]
Nov 01 09:43:46 dd kernel: pc : []lr : []
psr: 60010013
Nov 01 09:43:46 dd kernel: sp : e8c29890  ip : 006c  fp : 0001
Nov 01 09:43:46 dd kernel: r10: ec8ce000  r9 : d75feb40  r8 : e8c29893
Nov 01 09:43:46 dd kernel: r7 : c52816c8  r6 : c0c05448  r5 : 002e
 r4 : e8c29986
Nov 01 09:43:46 dd kernel: r3 : 00040d00  r2 : 00040d00  r1 : e8c29986
 r0 : 
Nov 01 09:43:46 dd kernel: Flags: nZCv  IRQs on  FIQs on  Mode SVC_32
ISA ARM  Segment none
Nov 01 09:43:46 dd kernel: Control: 10c5387d  Table: 4bc2806a  DAC: 0051
Nov 01 09:43:46 dd kernel: Process nfsd (pid: 476, stack limit = 0xe8c28218)
Nov 01 09:43:46 dd kernel: Stack: (0xe8c29890 to 0xe8c2a000)
Nov 01 09:43:47 dd kernel: 9880:
8b002000 365c 6c00 0001
Nov 01 09:43:47 dd kernel: 98a0:  00040d00 c52816c8 
a000  286a e8c29975
Nov 01 09:43:47 dd kernel: 98c0: d75feb40 bf19a130 00365c8b 
a000   0001
Nov 01 09:43:47 dd kernel: 98e0: dbe24000 bf1ab0ec  c52816c8
00011000  00365c8b 
Nov 01 09:43:47 dd kernel: 9900:  e99f5800 286a 
0001 e9335248  
Nov 01 09:43:47 dd kernel: 9920:  e73472d0  
a000   
Nov 01 09:43:48 dd kernel: 9940:  ec8ce000  
  bf221f04 c0c05448
Nov 01 09:43:48 dd kernel: 9960:  e8c29a7c  
0004 365c8b10  a0006c00
Nov 01 09:43:48 dd kernel: 9980:  5c8b 0036 006c
0100 c500 d0bf41e0 e93350a8
Nov 01 09:43:48 dd kernel: 99a0: 1000  00365c8b 
00a0006c  c5281b00 00040d00
Nov 01 09:43:48 dd kernel: 99c0: d75feb44 c52816c8 c24da070 6000
 a000  0001
Nov 01 09:43:48 dd kernel: 99e0:  bf1cdb60 a000 
0001   
Nov 01 09:43:48 dd kernel: 9a00: 0001 0035 e8c29a7c eb78f9ab
6000  e8c29a8c c24da000
Nov 01 09:43:48 dd kernel: 9a20:  e8c29b74 e93350b4 e9335080
e93350a8  e99f5800 e9335248
Nov 01 09:43:49 dd kernel: 9a40: a000  a000 
ee22d000 ec8ce000 e73472d0 d75feb40
Nov 01 09:43:49 dd kernel: 9a60: e8c29cf8 e9335220 c0c05448 e8c29aa0
c52816c8   
Nov 01 09:43:49 dd kernel: 9a80:  c24da9d0 c24da9d0 e8c29a8c
 2000 c5281bd8 c5281bec
Nov 01 09:43:49 dd kernel: 9aa0:  c07289cc c5281bd8 00040d00
d75feb44   e9335080
Nov 01 09:43:49 dd kernel: 9ac0: d75feb40 d75fe2d0 ffef c0c05448
e73472d0 bf1ce878 e8c29b74 e8c29cf8
Nov 01 09:43:49 dd kernel: 9ae0: 8000  9007 
00365c8b  e8c29b20 f0802000
Nov 01 09:43:49 dd kernel: 9b00: d75feb40  c0162e10 60010013
 e99f5800 c0c57dd8 ee22d000
Nov 01 09:43:49 dd kernel: 9b20: e9335248 ec8ce000  e8c29cf8
e000 e9ad8c80 e9335118 
Nov 01 09:43:49 dd kernel: 9b40: c0c57dd8 e000 ee22d298 0001
e9ad8c80 e8c29b70 e8c28000 c0162e10
Nov 01 09:43:50 dd kernel: 9b60: 60010013   
ee22d284 e8c29b74 e8c29b74 8bc96be0
Nov 01 09:43:50 dd kernel: 9b80: 365c 0100  
00a0 c0235cc4  e73472d0
Nov 01 09:43:50 dd kernel: 9ba0: ecc24390 00040d00 ee991000 c0c05448
ee22d000 ec8ce240 e9335080 ec8ce240
Nov 01 09:43:50 dd 

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-03 Thread Nathan Royce
On Sat, Jun 3, 2017 at 2:57 AM, Oleksij Rempel  wrote:
> Hm... this function and file:
> linux/drivers/net/wireless/ath/ath9k/common-beacon.c
> didn't changed since 2015. So, it should be some thing different.
> Can you run
> git bisect to find exact patch caused this regression?
>
That was the first time I experienced the x/0 issue and don't know how
I'd reproduce it.

> Yes, this is "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we filed to read PCI bus first time, we
> have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> and provide an kernel "firmware panic!" message.
> Every one who can or will to fix this, is welcome.
>
Thanks for that explanation. I'm not sure it's something I could
tackle though. My best bet in the meantime is to coax systemd to
restart the services when the device pops up. However, every attempt
has failed so far.

> It is possible. If adapter is used in AP mode, then lots of WiFi noise
> is dumped over this interface. I assume the reproducibility depends on
> external environment, not internal.
>
I find this quite believable. I have 2.4ghz happening with the
TP-Link, ZTE Mobley, bluetooth, logitech unifying, usb 3.0. Though all
4 devices are going through the USB 2.0 port, and the tp-link and
mobley have long usb cables in the attic and the hub connects through
a 2m usb extension. So yeah, I've got a lot of variables in play.


ath9k - Division by zero in kernel (as well as firmware panic)

2017-06-02 Thread Nathan Royce
ODroid XU4

$ uname -a
Linux computer 4.12.0-rc3-dirty #1 SMP Wed May 31 15:02:05 CDT 2017
armv7l GNU/Linux

$ lsusb
...
Bus 001 Device 002: ID 2109:2813 VIA Labs, Inc.
Bus 001 Device 010: ID 0cf3:7015 Qualcomm Atheros Communications
TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros AR7010+AR9287]
...

*
Jun 02 16:20:11 computer hostapd[14954]: vwlan0: interface state
COUNTRY_UPDATE->HT_SCAN
Jun 02 16:20:17 computer hostapd[14954]: 20/40 MHz operation not
permitted on channel pri=7 sec=3 based on overlapping BSSes
Jun 02 16:20:18 computer kernel: Division by zero in kernel.
Jun 02 16:20:18 computer kernel: CPU: 1 PID: 14507 Comm: kworker/u16:2
Tainted: GW   4.12.0-rc3-dirty #1
Jun 02 16:20:18 computer kernel: Hardware name: SAMSUNG EXYNOS
(Flattened Device Tree)
Jun 02 16:20:18 computer kernel: Workqueue: phy5 ieee80211_scan_work [mac80211]
Jun 02 16:20:18 computer kernel: [] (unwind_backtrace) from
[] (show_stack+0x10/0x14)
Jun 02 16:20:18 computer kernel: [] (show_stack) from
[] (dump_stack+0x88/0x9c)
Jun 02 16:20:18 computer kernel: [] (dump_stack) from
[] (Ldiv0_64+0x8/0x18)
Jun 02 16:20:18 computer kernel: [] (Ldiv0_64) from
[] (ath9k_get_next_tbtt+0x58/0x5c [ath9k_common])
Jun 02 16:20:18 computer kernel: [] (ath9k_get_next_tbtt
[ath9k_common]) from [] (ath9k_cmn_beacon_config
Jun 02 16:20:18 computer kernel: []
(ath9k_cmn_beacon_config_ap [ath9k_common]) from []
(ath9k_htc_beacon
Jun 02 16:20:18 computer kernel: []
(ath9k_htc_beacon_config_ap [ath9k_htc]) from []
(ath9k_htc_vif_recon
Jun 02 16:20:18 computer kernel: [] (ath9k_htc_vif_reconfig
[ath9k_htc]) from [] (ath9k_htc_sw_scan_compl
Jun 02 16:20:18 computer kernel: []
(ath9k_htc_sw_scan_complete [ath9k_htc]) from []
(__ieee80211_scan_co
Jun 02 16:20:18 computer kernel: []
(__ieee80211_scan_completed [mac80211]) from []
(ieee80211_scan_work+
Jun 02 16:20:18 computer kernel: [] (ieee80211_scan_work
[mac80211]) from [] (process_one_work+0x1d8/0x40
Jun 02 16:20:18 computer kernel: [] (process_one_work) from
[] (worker_thread+0x4c/0x564)
Jun 02 16:20:18 computer kernel: [] (worker_thread) from
[] (kthread+0x14c/0x154)
Jun 02 16:20:18 computer kernel: [] (kthread) from
[] (ret_from_fork+0x14/0x3c)
Jun 02 16:20:18 computer hostapd[14954]: Using interface wlan0 with
hwaddr  and ssid ""
Jun 02 16:20:18 computer kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
vwlan0: link becomes ready
*
This is a new one on me.

The "normal" problem (search shows to be a very old issue) I
consistently (daily or multiple times/day) encounter is:
*
Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
Jun 02 14:55:30 computer kernel: usb 1-1.1: USB disconnect, device number 9
Jun 02 14:55:30 computer systemd-networkd[11959]: vwlan0: Lost carrier
Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
Jun 02 14:55:30 computer kernel: wlan0: deauthenticating from
 by local choice (Reason: 3=DEAUTH_LEAVING)
Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
Jun 02 14:55:30 computer systemd-networkd[11959]: wlan0: Lost carrier
Jun 02 14:55:30 computer systemd[1]: Stopping A simple WPA encrypted
wireless connection using a static IP...
-- Subject: Unit netctl@wlan0.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit netctl@wlan0.service has begun shutting down.
Jun 02 14:55:30 computer kernel: device vwlan0 left promiscuous mode
Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
Jun 02 14:55:30 computer audit: ANOM_PROMISCUOUS dev=vwlan0 prom=0
old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
Jun 02 14:55:30 computer hostapd[13218]: vwlan0: AP-STA-DISCONNECTED 
Jun 02 14:55:30 computer hostapd[13218]: Failed to set beacon parameters
Jun 02 14:55:30 computer hostapd[13218]: vwlan0: INTERFACE-DISABLED
Jun 02 14:55:30 computer kernel: usb 1-1.1: ath9k_htc: USB layer deinitialized
Jun 02 14:55:30 computer systemd[1]: Starting Load/Save RF Kill Switch Status...
-- Subject: Unit systemd-rfkill.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-rfkill.service has begun starting up.
Jun 02 14:55:30 computer systemd[1]: Started Load/Save RF Kill Switch Status.
-- Subject: Unit systemd-rfkill.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-rfkill.service has finished starting up.
--
-- The start-up result is done.
Jun 02 14:55:30 computer network[13261]: Stopping network profile 'wlan0'...
Jun 02 14:55:30 computer kernel: usb 1-1.1: new high-speed USB devic

ath: firmware panic! exccause: 0x0000000d

2017-03-27 Thread Nathan Royce
I find that every time all of the cpu cores are being used, when
compiling the kernel source for example, I end up losing my wireless
adapter.
It seems to be an old issue: https://bbs.archlinux.org/viewtopic.php?id=182173

ARM ODroid XU4

$ uname -a
Linux server 4.11.0-rc1-00315-g106e4da60209-dirty #1 SMP Sun Mar 12
16:44:41 CDT 2017 armv7l GNU/Linux

$ lsusb
...
Bus 003 Device 009: ID 0cf3:7015 Qualcomm Atheros Communications
TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros AR7010+AR9287]
...

*
Mar 27 02:48:49 server kernel: usb 3-1.2.4: ath: firmware panic!
exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038
Mar 27 02:48:49 server kernel: usb 3-1.2.4: USB disconnect, device number 7
Mar 27 02:48:49 server kernel: ath: phy0: Chip reset failed
Mar 27 02:48:49 server kernel: ath: phy0: Unable to reset channel
(2442 Mhz) reset status -22
Mar 27 02:48:49 server kernel: ath: phy0: Unable to set channel
Mar 27 02:48:49 server kernel: ath: phy0: RX failed to go idle in 10
ms RXSM=0x4ceb
Mar 27 02:48:49 server kernel: ath: phy0: Failed to wakeup in 500us
Mar 27 02:48:49 server kernel: ath: phy0: RX failed to go idle in 10
ms RXSM=0x4ceb
Mar 27 02:48:49 server kernel: ath: phy0: Failed to wakeup in 500us
Mar 27 02:48:50 server kernel: br0: port 2(custom_wlan0) entered disabled state
Mar 27 02:48:50 server audit: ANOM_PROMISCUOUS dev=custom_wlan0 prom=0
old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967
Mar 27 02:48:50 server kernel: device custom_wlan0 left promiscuous mode
Mar 27 02:48:50 server kernel: br0: port 2(custom_wlan0) entered disabled state
Mar 27 02:48:50 server hostapd[422]: custom_wlan0: AP-STA-DISCONNECTED

Mar 27 02:48:50 server systemd-networkd[414]: custom_wlan0: Lost carrier
Mar 27 02:48:50 server hostapd[422]: Failed to set beacon parameters
Mar 27 02:48:50 server hostapd[422]: custom_wlan0: INTERFACE-DISABLED
Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: USB layer deinitialized
Mar 27 02:48:50 server systemd[1]: Starting Load/Save RF Kill Switch Status...
-- Subject: Unit systemd-rfkill.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-rfkill.service has begun starting up.
Mar 27 02:48:50 server systemd[1]: Started Load/Save RF Kill Switch Status.
-- Subject: Unit systemd-rfkill.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-rfkill.service has finished starting up.
--
-- The start-up result is done.
Mar 27 02:48:50 server kernel: usb 3-1.2.4: new high-speed USB device
number 9 using xhci-hcd
Mar 27 02:48:50 server kernel: usb 3-1.2.4: New USB device found,
idVendor=0cf3, idProduct=7015
Mar 27 02:48:50 server kernel: usb 3-1.2.4: New USB device strings:
Mfr=16, Product=32, SerialNumber=48
Mar 27 02:48:50 server kernel: usb 3-1.2.4: Product: USB WLAN
Mar 27 02:48:50 server kernel: usb 3-1.2.4: Manufacturer: ATHEROS
Mar 27 02:48:50 server kernel: usb 3-1.2.4: SerialNumber: 12345
Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: Firmware
ath9k_htc/htc_7010-1.4.0.fw requested
Mar 27 02:48:50 server kernel: usb 3-1.2.4: ath9k_htc: Transferred FW:
ath9k_htc/htc_7010-1.4.0.fw, size: 72812
Mar 27 02:48:50 server kernel: ath9k_htc 3-1.2.4:1.0: ath9k_htc: HTC
initialized with 45 credits
Mar 27 02:48:50 server kernel: ath9k_htc 3-1.2.4:1.0: ath9k_htc: FW Version: 1.4
...
*
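
Since the panic line carries the same three fields each time, a sketch like the one below could pull them out so that occurrences can be compared. It assumes the message keeps the `exccause; pc; badvaddr` layout shown in the log above, and `parse_panic` is a hypothetical helper name.

```python
import re

# Layout as seen in the log: "firmware panic! exccause: ...; pc: ...; badvaddr: ..."
PANIC_RE = re.compile(
    r"firmware panic!\s+exccause:\s*(0x[0-9a-fA-F]+);\s*"
    r"pc:\s*(0x[0-9a-fA-F]+);\s*badvaddr:\s*(0x[0-9a-fA-F]+)"
)


def parse_panic(line):
    """Return the panic fields as integers, or None if the line has no panic."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    exccause, pc, badvaddr = (int(group, 16) for group in match.groups())
    return {"exccause": exccause, "pc": pc, "badvaddr": badvaddr}
```

Collecting the returned dictionaries over a few days would show whether the program counter and bad address stay the same from one panic to the next.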


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-10 Thread Nathan Royce
Sure, I went ahead and rebuilt it just using the bare exynos_defconfig
and adding XTS and ECB and no other changes.

No flags were used. No patches were used other than the 2 you
provided. Just the barest of bears, the barest of bones, the barest of
deserts, the barest of hairless cats.

I also wiped out the 4.10.1 modules directory and zImage and dtb
before copying them into place.
*
[   16.280951] s5p-jpeg 11f6.jpeg: Samsung S5P JPEG codec
[   16.327434] CPU: 3 PID: 115 Comm: irq/69-1083 Not tainted 4.10.1-dirty #1
[   16.334527] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   16.340533] task: edc52d00 task.stack: edcc
[   16.345040] PC is at post_crypt+0x194/0x1a0 [xts]
[   16.349712] LR is at post_crypt+0x188/0x1a0 [xts]
[   16.354390] pc : []lr : []psr: 200d0113
[   16.354390] sp : edcc1ea8  ip : ed6f38f4  fp : 30702272
[   16.365838] r10: 8ee5436d  r9 :   r8 : ed6f3800
[   16.371023] r7 :   r6 : 0400  r5 :   r4 : 
[   16.377523] r3 : ef5ead22  r2 : 0200  r1 : 0200  r0 : 
[   16.384024] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   16.391128] Control: 10c5387d  Table: 6d6f806a  DAC: 0051
[   16.396847] Process irq/69-1083 (pid: 115, stack limit = 0xedcc0210)
[   16.403519] Stack: (0xedcc1ea8 to 0xedcc2000)
[   16.407853] 1ea0:   c0c08304 ef5ead20 ecd69200
ef5ead20 ecd69200 ed6f39dc
[   16.416011] 1ec0: 0400   0400 
c010f774 c0113bac 
[   16.424156] 1ee0:     
0010 0010 000f
[   16.432302] 1f00: ed6f3800 edcae3bc 000c edcae3e8 
600d0113 ee889d5c bf182764
[   16.440447] 1f20: edcae390 c0566d84  0001 edcacec0
eea14b00  eea14b00
[   16.448592] 1f40: edcacec0 c01651c4 eeb00528 c01651e0 edcc
edcacee4  c01654b4
[   16.456738] 1f60:  c01652b8 eeb00500 edcc 
edcacf00 edcacec0 c0165388
[   16.464884] 1f80: eeb00528 c013673c edcc edcacf00 c0136634
  
[   16.473029] 1fa0:    c0107778 
  
[   16.481174] 1fc0:     
  
[   16.489320] 1fe0:     0013
  
[   16.497473] [] (post_crypt [xts]) from []
(decrypt_done+0x4c/0x54 [xts])
[   16.505877] [] (decrypt_done [xts]) from []
(s5p_aes_interrupt+0x1bc/0x208)
[   16.514544] [] (s5p_aes_interrupt) from []
(irq_thread_fn+0x1c/0x54)
[   16.522592] [] (irq_thread_fn) from []
(irq_thread+0x12c/0x1e0)
[   16.530220] [] (irq_thread) from [] (kthread+0x108/0x138)
[   16.537324] [] (kthread) from []
(ret_from_fork+0x14/0x3c)
[   16.544514] Code: eb471ad2 e598c118 e58d0020 e1a04000 (e5906004)
[   16.550709] ---[ end trace 0e5ce4ea2ad2d7e2 ]---
[   16.555224] genirq: exiting task "irq/69-1083" (115) is an
active IRQ thread (irq 69)
*
I'm sure you could just copy my crypttab and fstab entries that are
shown in my first email.

On Fri, Mar 10, 2017 at 12:06 PM, Krzysztof Kozlowski  wrote:
> On Thu, Mar 09, 2017 at 05:16:35AM -0600, Nathan Royce wrote:
>> Gave it a try on 4.10.1, but still to no avail:
>
> (...)
>
>> Also for the sake of testing, I did not add any FLAGS for compilation this 
>> time.
>
> Damn, I am fixing bugs around but not the one you are hitting. Can you
> also check if exynos_defconfig (+XTS + any other needed settings for
> you) also has this issue?
>
> I want to reproduce it but my setup does not use cryptswap. Probably I
> will have to set it up.
>
> Best regards,
> Krzysztof
>


config_s5psss.tar.gz
Description: GNU Zip compressed data


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-09 Thread Nathan Royce
Gave it a try on 4.10.1, but still to no avail:
*
[8.516138] raid6: using intx1 recovery algorithm
[  OK  ] Started Flush Journal to Persistent Storage.
[9.692091] Unable to handle kernel NULL pointer dereference at
virtual address 0004
[9.698896] pgd = c0004000
[9.701489] [0004] *pgd=
[9.705055] Internal error: Oops: 17 [#1] SMP ARM
[9.709677] Modules linked in: xor_neon zlib_deflate aes_arm
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
ip_tables x_tables
[9.719177] xor: measuring software checksum speed
[9.727455] CPU: 2 PID: 121 Comm: irq/69-1083 Not tainted 4.10.1-dirty #1
[9.728911]arm4regs  :   304.000 MB/sec
[9.738707] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[9.738913]8regs :   224.000 MB/sec
[9.748924]32regs:   208.000 MB/sec
[9.753095] task: edc80b00 task.stack: edd08000
[9.757626] PC is at post_crypt+0x1b4/0x1c4
[9.758914]neon  :   316.000 MB/sec
[9.758927] xor: using function: neon (316.000 MB/sec)
[9.771040] LR is at post_crypt+0x1a8/0x1c4
[9.775197] pc : []lr : []psr: 200c0013
[9.775197] sp : edd09e90  ip : edcd64f4  fp : 02cfca75
[9.786670] r10: 3df4074e  r9 : c0c0540c  r8 : edcd6400
[9.791831] r7 :   r6 : 0400  r5 :   r4 : 
[9.798333] r3 : ef4a775a  r2 : 0200  r1 : 0200  r0 : 
[9.804834] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[9.811901] Control: 10c5387d  Table: 6c61c06a  DAC: 0051
[9.817618] Process irq/69-1083 (pid: 121, stack limit = 0xedd08218)
[9.824291] Stack: (0xedd09e90 to 0xedd0a000)
[9.828624] 9e80: ef4a7758
ecca6200 ef4a7758 ecca6200
[9.836781] 9ea0: edcd65dc 0400   0400
 eea8f810 0002
[9.844926] 9ec0:     
 0010 0010
[9.853072] 9ee0: 000f 00040a01 ee958390 edcd6400 ee9583bc
000c ee9583e8 
[9.861217] 9f00:  600c0013 ee889d20 c033608c ee958390
c05a7ea8  0001
[9.869363] 9f20: ee957b40 eea8a400 eea8a400 ee957b40 c016ee68
c0c0540c  c016ee84
[9.877508] 9f40: edd08000 ee957b64 eea8a400 c016f198 ee957b80
 c016ef7c 00040a01
[9.885653] 9f60:  eea21380 edd08000  ee957b80
ee957b40 c016f04c eea213a8
[9.893800] 9f80: ee889d20 c0138710 edd08000 ee957b80 c0138608
  
[9.901944] 9fa0:    c0107a38 
  
[9.910089] 9fc0:     
  
[9.918235] 9fe0:     0013
  
[9.926399] [] (post_crypt) from []
(decrypt_done+0x4c/0x54)
[9.933761] [] (decrypt_done) from []
(s5p_aes_interrupt+0x1bc/0x208)
[9.941908] [] (s5p_aes_interrupt) from []
(irq_thread_fn+0x1c/0x54)
[9.949956] [] (irq_thread_fn) from []
(irq_thread+0x14c/0x204)
[9.957585] [] (irq_thread) from [] (kthread+0x108/0x138)
[9.964681] [] (kthread) from []
(ret_from_fork+0x14/0x3c)
[9.971871] Code: eb0114aa e598c118 e58d001c e1a04000 (e5906004)
[9.977963] ---[ end trace 8c160bf6676cfe1c ]---
[9.982560] genirq: exiting task "irq/69-1083" (121) is an
active IRQ thread (irq 69)
[   11.715339] Btrfs loaded, crc32c=crc32c-generic
*

Also for the sake of testing, I did not add any FLAGS for compilation this time.

On Wed, Mar 8, 2017 at 3:15 PM, Krzysztof Kozlowski  wrote:
> On Wed, Mar 08, 2017 at 07:45:42PM +0200, Krzysztof Kozlowski wrote:
> I sent a fix. At least for spin lock recursion in tcrypt.
>
> Could you give it a try?
>
> Best regards,
> Krzysztof


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-06 Thread Nathan Royce
OK, I just tried 4.10.0 and the output is looking the same.

I can't say my setup is all that odd. The cryptographic use is only
with the swap partition found in my original email (seen in Herbert's
reply).

My normal build goes as such:
1) git clean -xdf
2) git reset --hard
3) curl https://github.com/tobetter/linux/commit/9cdf86bac1db2d74bf98508226e86679581f8f80.patch | git apply -
   //usb: host: xhci-plat: Get PHYs for xhci's hcds
4) curl https://github.com/tobetter/linux/commit/142cf1b68fa0e1710f3623875d5c269cbbc2f005.patch | git apply -
   //base: platform: name the device already during allocation
5) curl https://github.com/tobetter/linux/commit/3772f11d73289ea40825f40ba5c64b5b0e3888ff.patch | git apply -
   //phy: exynos5-usbdrd: Calibrate LOS levels for exynos5420/5800
6) sed -i -e "s/static void exynos5420_usbdrd_phy_calibrate/static int exynos5420_usbdrd_phy_calibrate/" ./drivers/phy/phy-exynos5-usbdrd.c
7) //duplicate entry in drivers/media/usb/au0828/au0828-cards.c for my 0x400 vid tuner.
8) HOST_EXTRACFLAGS="-O3 -pipe -mfpu=neon-vfpv4 -mfloat-abi=hard -march=armv7-a -mtune=cortex-a15.cortex-a7" make -j 8 zImage exynos5422-odroidxu4.dtb modules 2>&1 | tee make.log
9) INSTALL_MOD_PATH=./tmp INSTALL_FW_PATH=./tmp make modules_install firmware_install 2>&1 | tee makeModFirm.log
10) sudo cp -rv ./tmp/lib/* /usr/lib
11) sudo cp -v ./arch/arm/boot/zImage /boot/zImage-4.10.0
12) sudo cp -v ./arch/arm/boot/dts/exynos5422-odroidxu4.dtb /boot/exynos5422-odroidxu4-4.10.0.dtb
13) sudo ln -s /boot/zImage-4.10.0 /boot/zImage
14) sudo ln -s /boot/exynos5422-odroidxu4-4.10.0.dtb /boot/exynos5422-odroidxu4.dtb
15) sudo sync
16) sudo systemctl reboot
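
A side note on steps 13 and 14: a plain "ln -s" refuses to overwrite a link that already exists, so on a rebuild those two steps would need the old links removed first. Below is a minimal Python sketch of one way to repoint such a link in a single rename. The helper name repoint_symlink and the temporary-name scheme are hypothetical and not part of the build described above.

```python
import os
import tempfile


def repoint_symlink(link_path, target):
    """Make link_path a symlink to target, replacing any existing entry."""
    # Create the new link under a temporary name next to the final one,
    # then rename it into place. A rename within one directory is atomic
    # on POSIX, so the link never goes missing in between.
    tmp_path = "{}.tmp.{}".format(link_path, os.getpid())
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)


if __name__ == "__main__":
    # Demonstration in a scratch directory rather than /boot.
    with tempfile.TemporaryDirectory() as scratch:
        link = os.path.join(scratch, "zImage")
        repoint_symlink(link, "zImage-4.9.13")
        repoint_symlink(link, "zImage-4.10.0")
        print(os.readlink(link))
```

Run against the real paths, this would need root and would be called as repoint_symlink("/boot/zImage", "/boot/zImage-4.10.0").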

I've attached the config I use.

On Mon, Mar 6, 2017 at 11:35 AM, Krzysztof Kozlowski  wrote:
> On Mon, Mar 06, 2017 at 10:18:45AM -0600, Nathan Royce wrote:
>> I tried the patch you submitted, however it also fails for the most part.
>>
>> "For the most part" because "xts" is now found.
>> $ grep xts /proc/crypto
>> name : xts(aes)
>> driver   : xts(ecb-aes-s5p)
>
> Ah, so probably I did not fix the original issue but some other... or
> maybe there are multiple issues.
>
> Could you attach your config and any other essential reproduction steps 
> (unusual settings?).
>
> I saw you tried v4.10.1, could you try just v4.10?
>
> Best regards,
> Krzysztof
>
>>
>> Fail:
>> *
>> [   21.057756] xor: using function: neon (352.000 MB/sec)
>> [   21.064243] Unable to handle kernel NULL pointer dereference at
>> virtual address 0004
>> [   21.070966] pgd = c0004000
>> [   21.073599] [0004] *pgd=
>> [   21.077165] Internal error: Oops: 17 [#1] SMP ARM
>> [   21.081836] Modules linked in: xor aes_arm xor_neon zlib_deflate
>> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> ip_tables x_tables
>> [   21.095239] CPU: 5 PID: 121 Comm: irq/69-1083 Not tainted 
>> 4.10.1-dirty #1
>> [   21.102288] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [   21.108355] task: ee3e3700 task.stack: edcf6000
>> [   21.112821] PC is at post_crypt+0x1b4/0x1c4
>> [   21.116972] LR is at post_crypt+0x1a8/0x1c4
>> [   21.121131] pc : []lr : []psr: 200c0093
>> [   21.121131] sp : edcf7e68  ip : ec59dcf4  fp : 117ce9ac
>> [   21.132576] r10: 244525e3  r9 : c0c0540c  r8 : ec59dc00
>> [   21.137768] r7 :   r6 : 0400  r5 :   r4 : 
>> [   21.144267] r3 : ef49fcde  r2 : 0200  r1 : 0200  r0 : 
>> [   21.150768] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
>> Segment none
>> [   21.157964] Control: 10c5387d  Table: 6618c06a  DAC: 0051
>> [   21.163677] Process irq/69-1083 (pid: 121, stack limit = 0xedcf6218)
>> [   21.170350] Stack: (0xedcf7e68 to 0xedcf8000)
>> [   21.174684] 7e60:   ef49fcdc ec93f200 ef49fcdc
>> ec93f200 ec59dddc 0400
>> [   21.182853] 7e80:   0400  ef49fcdc
>> c01100fc  
>> [   21.190983] 7ea0:    c0110f80 0010
>> 0010 000f 00040a01
>> [   21.199128] 7ec0:  ec59dc00 c0c0540c  
>> 600c0013 0002 
>> [   21.207274] 7ee0: ee889d20 c033608c eea21c90 c05a80d0 eea21ce8
>> eea21c90 000c 00040a01
>> [   21.215418] 7f00: eea21ce8 eea21c90 000c  eea21ce8
>> c05a8290  0001
>> [   21.223564] 7f20: eea2a600 eea8a400 eea8a400 eea2a600 c016ee68
>> c0c0540c  c016ee84
>> [   21.231710] 7f40: edcf6000 eea2a624 eea8a400 c016f198 eea2a640
>>  c016ef7c 00040a01
>> [   21.239868] 7f60: 000

Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-06 Thread Nathan Royce
[   22.027743] [] (__handle_domain_irq) from []
(gic_handle_irq+0x38/0x74)
[   22.036061] [] (gic_handle_irq) from []
(__irq_svc+0x6c/0x90)
[   22.043510] Exception stack(0xc0c01f38 to 0xc0c01f80)
[   22.048529] 1f20:
0001 
[   22.056685] 1f40:  c0114e60 c0c0 c0c05490 c0c0542c
c0b4ff88 c0c01f90 
[   22.064832] 1f60:  efffc7c0 600e0013 c0c01f88 c0108480
c0108484 600e0013 
[   22.072978] [] (__irq_svc) from []
(arch_cpu_idle+0x38/0x3c)
[   22.080347] [] (arch_cpu_idle) from []
(do_idle+0x164/0x1f8)
[   22.087708] [] (do_idle) from []
(cpu_startup_entry+0x18/0x1c)
[   22.095258] [] (cpu_startup_entry) from []
(start_kernel+0x374/0x394)
[   22.103389] handlers:
[   22.105635] [] irq_default_primary_handler threaded
[] s5p_aes_interrupt
[   22.114046] Disabling IRQ #69
[   23.496638] Btrfs loaded, crc32c=crc32c-generic
*
Do I need to add "irqpoll" to my u-boot boot config now?

Yeah, the mailing list bounced my original email because I wasn't
using plain-text, but my full post shows in Herbert's reply.

On Sun, Mar 5, 2017 at 11:16 AM, Krzysztof Kozlowski  wrote:
> On Fri, Mar 03, 2017 at 12:02:10PM +0800, Herbert Xu wrote:
>> On Thu, Mar 02, 2017 at 05:35:30PM -0600, Nathan Royce wrote:
>> > ARM ODroid XU4
>> >
>> > $ cat /proc/config.gz | gunzip | grep XTS
>> > CONFIG_CRYPTO_XTS=y
>> >
>> > $ grep xts /proc/crypto
>> > //4.9.13
>> > name : xts(aes)
>> > driver   : xts(aes-generic)
>> > //4.10.1
>> > 
>> > //cbc can be found though
>> >
>> > CRYPTTAB:
>> > cryptswap1 UUID= /dev/urandom
>> > swap,offset=2048,cipher=aes-xts-plain64:sha512,size=512,nofail
>> >
>> > FSTAB:
>> > /dev/mapper/cryptswap1 none swap sw 0 0
>> >
>> > Boot Log:
>> > [   10.535985] [ cut here ]
>> > [   10.539252] WARNING: CPU: 0 PID: 0 at crypto/skcipher.c:430
>> > skcipher_walk_first+0x13c/0x14c
>> > [   10.547542] Modules linked in: xor xor_neon aes_arm zlib_deflate
>> > dm_crypt raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> > ip_tables x_tables
>> > [   10.561716] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.1-dirty #1
>> > [   10.568049] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> > [   10.574171] [] (unwind_backtrace) from []
>> > (show_stack+0x10/0x14)
>> > [   10.581893] [] (show_stack) from []
>> > (dump_stack+0x84/0x98)
>> > [   10.589073] [] (dump_stack) from []
>> > (__warn+0xe8/0x100)
>> > [   10.595975] [] (__warn) from []
>> > (warn_slowpath_null+0x20/0x28)
>> > [   10.603546] [] (warn_slowpath_null) from []
>> > (skcipher_walk_first+0x13c/0x14c)
>> > [   10.612390] [] (skcipher_walk_first) from []
>> > (skcipher_walk_virt+0x1c/0x38)
>> > [   10.621056] [] (skcipher_walk_virt) from []
>> > (post_crypt+0x38/0x1c4)
>> > [   10.629022] [] (post_crypt) from []
>> > (decrypt_done+0x4c/0x54)
>> > [   10.636389] [] (decrypt_done) from []
>> > (s5p_aes_complete+0x70/0xfc)
>> > [   10.644274] [] (s5p_aes_complete) from []
>> > (s5p_aes_interrupt+0x134/0x1a0)
>> > [   10.652771] [] (s5p_aes_interrupt) from []
>> > (__handle_irq_event_percpu+0x9c/0x124)
>>
>> This looks like a bug in the s5p driver.  It's calling the completion
>> function straight from the IRQ handler, which is triggering the
>> sanity check in skcipher_walk_first.
>>
>> The s5p driver needs to schedule a tasklet to call the completion
>> function.
>
> Tasklet... or threaded IRQ handler maybe? I sent a fix.
>
> BTW, I subscribe to the crypto list but I could not find the original email
> there.
>
> Best regards,
> Krzysztof
>


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-03 Thread Nathan Royce
Yup, when I disabled the s5p driver, xts DID show in the /proc/crypto list.

Heh, I was about to ask if it was something I should push towards
another maintainer for s5p stuff, but found you listed in that as
well.
If I am incorrect in that assumption, do let me know whom else I
should make aware of this issue.
Also let me know if you would like the rest of the kernel panic. Maybe
you already have enough to go on and don't need it.

Thanks for all that clarity.

On Fri, Mar 3, 2017 at 6:04 AM, Herbert Xu  wrote:
> On Fri, Mar 03, 2017 at 04:36:18AM -0600, Nathan Royce wrote:
>> I do have ECB selected as well:
>> DM_CRYPT=y
>> CRYPTO_ECB=y
>> CRYPTO_XTS=y
>>
>> name : ecb(aes)
>> driver   : ecb-aes-s5p
>> module   : kernel
>> priority : 100
>> refcnt   : 1
>> selftest : passed
>> internal : no
>> type : ablkcipher
>> async: yes
>> blocksize: 16
>> min keysize  : 16
>> max keysize  : 32
>> ivsize   : 0
>> geniv: 
>> //still no "xts" can be found in the list
>
> Weird.  So you can't find any instances of xts in /proc/crypto
> at all? Even if the self-test fails it should still register an
> entry there...
>
> In any case, I think disabling the s5p driver should work at
> least.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-03 Thread Nathan Royce
I do have ECB selected as well:
DM_CRYPT=y
CRYPTO_ECB=y
CRYPTO_XTS=y

name : ecb(aes)
driver   : ecb-aes-s5p
module   : kernel
priority : 100
refcnt   : 1
selftest : passed
internal : no
type : ablkcipher
async: yes
blocksize: 16
min keysize  : 16
max keysize  : 32
ivsize   : 0
geniv: 
//still no "xts" can be found in the list

I saw this report about a regression that sounds similar to my issue,
except that even with dm_crypt built in (no initramfs, just booting
straight into the system), it still fails:
http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg23748.html

On Fri, Mar 3, 2017 at 3:33 AM, Herbert Xu  wrote:
> On Fri, Mar 03, 2017 at 03:00:26AM -0600, Nathan Royce wrote:
>> OK, I went ahead and enabled self tests
>> "CRYPTO_MANAGER_DISABLE_TESTS=n", and my system was able to boot,
>> albeit with failures:
>> *
>> Mar 02 23:14:38 server kernel: ---[ end trace 1c8a91f28cbcebf3 ]---
>> Mar 02 23:14:38 server kernel: alg: skcipher: encryption failed on
>> test 1 for xts(ecb-aes-s5p): ret=35
>> Mar 02 23:14:38 server kernel: device-mapper: table: 254:0: crypt:
>> Error allocating crypto tfm
>> Mar 02 23:14:38 server kernel: device-mapper: ioctl: error adding
>> target to table
>> Mar 02 23:14:39 server systemd-cryptsetup[234]: Failed to activate
>> with key file '/dev/urandom': Invalid argument
>> *
>> (weird that it asked for the passphrase)
>>
>> But I do question whether the root issue is related to s5p... Maybe
>> there is a correlation in the warning, but to me it looks like the
>> issue is something else.
>
> I see.  Do you have ECB enabled in your config? The new XTS requires
> ECB to be present so that could be your problem.
>
> There is already a patch on its way to stable to add the Kconfig
> select on ECB.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: XTS Crypto Not Found In /proc/crypto Even After Compiled for 4.10.1.

2017-03-03 Thread Nathan Royce
OK, I went ahead and enabled self tests
"CRYPTO_MANAGER_DISABLE_TESTS=n", and my system was able to boot,
albeit with failures:
*
Mar 02 23:14:38 server kernel: ---[ end trace 1c8a91f28cbcebf3 ]---
Mar 02 23:14:38 server kernel: alg: skcipher: encryption failed on
test 1 for xts(ecb-aes-s5p): ret=35
Mar 02 23:14:38 server kernel: device-mapper: table: 254:0: crypt:
Error allocating crypto tfm
Mar 02 23:14:38 server kernel: device-mapper: ioctl: error adding
target to table
Mar 02 23:14:39 server systemd-cryptsetup[234]: Failed to activate
with key file '/dev/urandom': Invalid argument
*
(weird that it asked for the passphrase)
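
For reading the "ret=35" in the self-test line above: in Linux's generic errno numbering, 35 is EDEADLK. The sketch below decodes such a line with a small hardcoded table, so the result does not depend on the machine running it. The function name decode_ret and the table are illustrative, and the table covers only a handful of codes. Tying EDEADLK to a particular check in the kernel is an inference, not something this log states.

```python
import re

# Small subset of Linux's generic errno numbers, hardcoded on purpose:
# Python's errno module reflects the host OS, which may number them differently.
LINUX_ERRNO_NAMES = {
    11: "EAGAIN",
    12: "ENOMEM",
    22: "EINVAL",
    35: "EDEADLK",
    38: "ENOSYS",
}


def decode_ret(log_line):
    """Return the errno name for a 'ret=N' field in log_line, or None."""
    match = re.search(r"\bret=(-?\d+)", log_line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # accept either sign convention
    return LINUX_ERRNO_NAMES.get(code, "unknown errno {}".format(code))


if __name__ == "__main__":
    line = ("alg: skcipher: encryption failed on test 1 "
            "for xts(ecb-aes-s5p): ret=35")
    print(decode_ret(line))
```

The "Invalid argument" reported by systemd-cryptsetup a few lines later is the standard message for EINVAL (22), which is a different code from the one in the self-test line.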

But I do question whether the root issue is related to s5p... Maybe
there is a correlation in the warning, but to me it looks like the
issue is something else.
In my OP, I noted that the xts crypto isn't even found in /proc/crypto
in 4.10. I'd think it would at least be listed, even if it isn't used.
CBC is listed in /proc/crypto with kernel 4.9.13 and 4.10.1 (cbc-aes-s5p)
XTS is listed in /proc/crypto with kernel 4.9.13 but NOT 4.10.1
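
The comparison above can be repeated without eyeballing grep output. /proc/crypto is a list of "key : value" records separated by blank lines, so a short parser is enough. This is a rough sketch; the function names parse_proc_crypto and find_algorithms are made up for illustration.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into a list of {field: value} dicts."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def find_algorithms(entries, needle):
    """Return the records whose 'name' field contains needle."""
    return [entry for entry in entries if needle in entry.get("name", "")]


if __name__ == "__main__":
    try:
        with open("/proc/crypto") as handle:
            records = parse_proc_crypto(handle.read())
    except OSError:
        records = []
    for record in find_algorithms(records, "xts"):
        print(record.get("name"), record.get("driver"), record.get("selftest"))
```

On a kernel showing the problem described here, the loop would print nothing for "xts" while the same call with "cbc" would still list cbc-aes-s5p.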

I should also add that I didn't include the other tainted messages,
since they followed the messages I first posted.
I was assuming that once the first issue was fixed, the others would
follow suit. I just didn't want to flood the list with possible junk.
I still have the log file if you think it would be helpful to post the rest.

PS: I also noticed the bounce from my first mail submission because I
didn't enable plain-text for the e-mail (marked as spam because the
email contained html). I rectified that for this reply.

On Thu, Mar 2, 2017 at 10:02 PM, Herbert Xu  wrote:
> On Thu, Mar 02, 2017 at 05:35:30PM -0600, Nathan Royce wrote:
>> ARM ODroid XU4
>>
>> $ cat /proc/config.gz | gunzip | grep XTS
>> CONFIG_CRYPTO_XTS=y
>>
>> $ grep xts /proc/crypto
>> //4.9.13
>> name : xts(aes)
>> driver   : xts(aes-generic)
>> //4.10.1
>> 
>> //cbc can be found though
>>
>> CRYPTTAB:
>> cryptswap1 UUID= /dev/urandom
>> swap,offset=2048,cipher=aes-xts-plain64:sha512,size=512,nofail
>>
>> FSTAB:
>> /dev/mapper/cryptswap1 none swap sw 0 0
>>
>> Boot Log:
>> [   10.535985] [ cut here ]
>> [   10.539252] WARNING: CPU: 0 PID: 0 at crypto/skcipher.c:430
>> skcipher_walk_first+0x13c/0x14c
>> [   10.547542] Modules linked in: xor xor_neon aes_arm zlib_deflate
>> dm_crypt raid6_pq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
>> ip_tables x_tables
>> [   10.561716] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.1-dirty #1
>> [   10.568049] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [   10.574171] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14)
>> [   10.581893] [] (show_stack) from []
>> (dump_stack+0x84/0x98)
>> [   10.589073] [] (dump_stack) from []
>> (__warn+0xe8/0x100)
>> [   10.595975] [] (__warn) from []
>> (warn_slowpath_null+0x20/0x28)
>> [   10.603546] [] (warn_slowpath_null) from []
>> (skcipher_walk_first+0x13c/0x14c)
>> [   10.612390] [] (skcipher_walk_first) from []
>> (skcipher_walk_virt+0x1c/0x38)
>> [   10.621056] [] (skcipher_walk_virt) from []
>> (post_crypt+0x38/0x1c4)
>> [   10.629022] [] (post_crypt) from []
>> (decrypt_done+0x4c/0x54)
>> [   10.636389] [] (decrypt_done) from []
>> (s5p_aes_complete+0x70/0xfc)
>> [   10.644274] [] (s5p_aes_complete) from []
>> (s5p_aes_interrupt+0x134/0x1a0)
>> [   10.652771] [] (s5p_aes_interrupt) from []
>> (__handle_irq_event_percpu+0x9c/0x124)
>
> This looks like a bug in the s5p driver.  It's calling the completion
> function straight from the IRQ handler, which is triggering the
> sanity check in skcipher_walk_first.
>
> The s5p driver needs to schedule a tasklet to call the completion
> function.
>
> Do you have crypto self-test enabled? If so it should've caught
> this at run-time.  Otherwise you can disable the s5p driver until
> it's fixed.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt