On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote: > Hi, > > > There is some race condition between tty_port_put and flush_to_ldisc > which lead to use after free case: > (Kernel 4.1) > > [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 > ... > ... > ... > > [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] > (tty_ldisc_ref+0x24/0x60) > [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] > (flush_to_ldisc+0x6c/0x21c) > [1403.5132] r5:dbcd4a84 r4:00000000 > [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] > (process_one_work+0x214/0x570) > [1403.5132] r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 > r5:cb37b780 > [1403.5132] r4:dbcd4a84 > [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] > (worker_thread+0x60/0x580) > [1403.5132] r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 > r5:cb37b798 > [1403.5132] r4:cb37b780 > [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] > (kthread+0xec/0x104) > [1403.5132] r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 > r5:d8a3fc80 > [1403.5132] r4:00000000 > [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c) > > > for checking further we entered some debug prints and added delay in > flush_to_ldisc to reproduce > and seems there is some issue with workqueue implementation of TTY: > > bool tty_buffer_cancel_work(struct tty_port *port) > { > bool ret; > ret = cancel_work_sync(&port->buf.work); // Check return value of > cancel_work_sync > pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned > int)&port->buf.work, (void *)_RET_IP_, ret); > return ret; > } > > static void flush_to_ldisc(struct work_struct *work) > { > ... > mdelay(100); // Added Delay to reproduce race > > if (flag_work_cancel) { > pr_emerg("scheduled work after stopping work %x\n", (unsigned > int)work); > > .... > } > > static void pty_cleanup(struct tty_struct *tty) > { > ... > flag_work_cancel = 1; > ... > } > > > [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c 0 // > Since return is 0 so no work is pending > > [1403.5129] scheduled work after stopping work dbcd4a84 // Still same work > is scheduled after cancelled > [1403.5130] Unable to handle kernel paging request at virtual address > 6b6b6b83 // Kernel OOPs occured because of use after free
Ok, after my initial "use a newer kernel" comment, this really does look strange. There has also been a lot of workqueue fixes and rework since 4.1, and that might be the thing that fixes this issue here. However, are you sure you are not just calling flush_to_ldisc() directly through some codepath somehow? If you look at the stack in the pr_emerg() message, where did it come from? From the same workqueue that you already stopped? Testing on a newer kernel would be great, if possible. thanks, greg k-h