On 2018-10-22 14:28, Sean Young wrote:
On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote:
On 10/22/2018 11:14 AM, Sean Young wrote:
On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote:
On 2018-10-22 10:59, Sean Young wrote:
On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote:
Hi Sean,

Can you take a look at this, it appears to be an RC issue, see my analysis 
below.

On 10/20/2018 03:26 PM, Torbjorn Jansson wrote:
Hello

i'm using the pulse8 usb cec adapter to control my tv.
i have a few scripts that poll the power status of my tv and after a while it 
stops working returning errors when trying to check if tv is on or off.
this i think matches a kernel oops i'm seeing that i suspect is related to this.

i have sometimes been able to recover from this problem by completely cutting 
power to my tv and also unplugging the usb cec adapter.
i have a feeling that the tv is at least partly to blame for cec-ctl not 
working but in any case there shouldn't be a kernel oops.


also every now and then i see this in dmesg:
cec cec0: transmit: failed 05
cec cec0: transmit: failed 06
but that doesn't appear to do any harm as far as i can tell.

any idea whats causing the oops?

the ops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P           OE 
4.18.12-200.fc28.x86_64 #1
Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018
Workqueue: events pulse8_irq_work_handler [pulse8_cec]
RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core]

Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, 
flags);

The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is 
called
from rc_register_device(), except when the protocol is CEC:

          /* Ensure that the lirc kfifo is setup before we start the thread */
          if (dev->allowed_protocols != RC_PROTO_BIT_CEC) {
                  rc = ir_lirc_register(dev);
                  if (rc < 0)
                          goto out_rx;
          }

So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was 
never
initialized.

Could this be fall-out from the lirc changes you did not too long ago?

Yes, this is broken. My bad, sorry. I think this must have been broken since
v4.16. I can write a patch but I don't have a patch but I'm on the train
to ELCE in Edinburgh now, with no hardware to test on.


Sean


the kernel oops have been happening for a while now.
yesterday when i checked my logs i can see them at least back a couple of
months when i was running 4.17

also my scripts to poll status of my tv and turn it on/off works for a while
so it doesn't crash right away.
maybe it only crashes when i send cec command to turn on/off tv and only
polling for status is no problem.


i think i have a separate issue too because i had problems even before the
kernel oopses started.
but i suspect this is caused by my tv locking up the cec bus because
unplugging power to tv for a few minutes (i must wait or it will still be
just as broken) and then back used to resolve the cec errors from my
scripts.


Would you be able to test the following patch please?

Sean,

I think you should be able to test this with the vivid driver. Load the vivid 
driver,
run:

cec-ctl --tv; cec-ctl -d1 --playback

Then:

cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5

Ah, thanks. That will help with testing/reproducing.
That said, I tried this, but it doesn't crash for me, but perhaps I need to run
some RC command first...

Hmm I think those commands should be enough. It probably needs
CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now.

Thanks,

Sean


FYI the commands i run is as follows.

getting status, this is run frequently several times per minute (once every 10 or 15 seconds):
cec-ctl --to=0 --give-device-power-status

then when i want to power on or off this is run:
cec-ctl --to=0 --image-view-on
or
cec-ctl --to=0 --standby


it usually takes day or two before i get a kernel oops.
but i haven't studied super closely when exactly the oops happens in relation to what commands i sent since by now I'm used to it not working most of the time.

Reply via email to