Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Quoting Jens Axboe (2013-04-19 09:32:50) > > > > No wonder this thing crashes. Chris, can't the original bio carry > > bbio in bi_private and let end_bio_extent_readpage() free the bbio > > instead of abusing bi_bdev like this? > > Ugh, wtf. > > Chris, time for a swim in the bay :-) Yeah, I can't really defend this one. We needed a space for an int and I assumed end_io meant the FS was free to do horrible things. Really though, I'll just take a quick dip in the lake and patch this out of btrfs. Jan is probably right about changing around our endio callbacks to explicitly pass the mirror, it should be less complex and cleaner. Many thanks to everyone here that tracked it down. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: > (cc'ing btrfs people) > > On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: > > RIP: 0010:[] [] > > ftrace_raw_event_block_bio_complete+0x73/0xf0 > ... > > [] bio_endio+0x80/0x90 > > [] btrfs_end_bio+0xf6/0x190 [btrfs] > > [] bio_endio+0x3d/0x90 > > [] req_bio_endio+0xa3/0xe0 > > Ugh > > In fs/btrfs/volumes.c > > static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) > { > ... > bio->bi_bdev = (struct block_device *) > (unsigned long)bbio->mirror_num; > ... > } > > static void btrfs_end_bio(struct bio *bio, int err) > { > ... > bio->bi_bdev = (struct block_device *) > (unsigned long)bbio->mirror_num; > > ... > } > > In fs/btrfs/extent_io.c > > static void end_bio_extent_readpage(struct bio *bio, int err) > { > int mirror; > ... > mirror = (int)(unsigned long)bio->bi_bdev; > ... > } > > Ewweehh > > No wonder this thing crashes. Chris, can't the original bio carry > bbio in bi_private and let end_bio_extent_readpage() free the bbio > instead of abusing bi_bdev like this? Ugh, wtf. Chris, time for a swim in the bay :-) -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: > On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: > > No wonder this thing crashes. Chris, can't the original bio carry > > bbio in bi_private and let end_bio_extent_readpage() free the bbio > > instead of abusing bi_bdev like this? > > BTW, I think it's a bit too late to fix this properly from btrfs side > unless we're gonna do -rc8, so let's revert the TP patch for now and > sort this out in the next devel cycle. AFAICS, while disturbingly > (ha|yu)cky, the bi_bdev trick should be okay without the new TP. The tracing commit was already backed out yesterday, so no problems there. I figured it'd be a bit too late to fix something this nasty in a suitably clean and straight forward fashion. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Quoting Tejun Heo (2013-04-19 01:57:54) > > Ewweehh > > No wonder this thing crashes. Chris, can't the original bio carry > bbio in bi_private and let end_bio_extent_readpage() free the bbio > instead of abusing bi_bdev like this? Yes, we can definitely carry bbio up higher in the stack. I'll patch it up right now. I do agree that it'll be too big for -final, but we'll have it either way. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Fri, April 19, 2013 at 07:57 (+0200), Tejun Heo wrote: > (cc'ing btrfs people) > > On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: >> RIP: 0010:[] [] >> ftrace_raw_event_block_bio_complete+0x73/0xf0 > ... >> [] bio_endio+0x80/0x90 >> [] btrfs_end_bio+0xf6/0x190 [btrfs] >> [] bio_endio+0x3d/0x90 >> [] req_bio_endio+0xa3/0xe0 > > Ugh > > In fs/btrfs/volumes.c > > static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) > { > ... > bio->bi_bdev = (struct block_device *) > (unsigned long)bbio->mirror_num; > ... > } > > static void btrfs_end_bio(struct bio *bio, int err) > { > ... > bio->bi_bdev = (struct block_device *) > (unsigned long)bbio->mirror_num; > > ... > } > > In fs/btrfs/extent_io.c > > static void end_bio_extent_readpage(struct bio *bio, int err) > { > int mirror; > ... > mirror = (int)(unsigned long)bio->bi_bdev; > ... > } > > Ewweehh > > No wonder this thing crashes. Chris, can't the original bio carry > bbio in bi_private and let end_bio_extent_readpage() free the bbio > instead of abusing bi_bdev like this? Oops. It's been my patch back in 2011 (commit 2774b2ca3), sent as an RFC-Patch and just slipped in without further discussion of that exact change. Hackish, yes - my reasoning was because the block layer changed bio->bi_bdev anyway, no one would want to look into it after the bio returned (and in fact it didn't hurt for like two years now). Although the block layer changes bi_bdev, it stays a valid bdev pointer, I admit. One way around this would be what you suggest, however that would mean the caller of (btrfs|btree)_submit_bio_hook gets its completion called in the end, but must know that the private is in fact a bbio which in turn carries the caller's private. Doesn't sound clean to me, either. The best idea I currently have is to add a dispatcher function that does the freeing of bbio and calls the actual completion with mirror_num as a separate parameter. That would make all the btrfs completions incompatible with bio_end_io_t, but it shouldn't hurt. At least now I know I wasn't invited to LSF for a good reason :-) -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/19/2013 02:17 PM, Tejun Heo wrote: > On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: >> No wonder this thing crashes. Chris, can't the original bio carry >> bbio in bi_private and let end_bio_extent_readpage() free the bbio >> instead of abusing bi_bdev like this? > > BTW, I think it's a bit too late to fix this properly from btrfs side > unless we're gonna do -rc8, so let's revert the TP patch for now and > sort this out in the next devel cycle. AFAICS, while disturbingly > (ha|yu)cky, the bi_bdev trick should be okay without the new TP. Thank you for investigating this. And sorry for my incompleted panic picture before that let you detour. :-( Regards, Wanlong Gao > > Thanks. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: > No wonder this thing crashes. Chris, can't the original bio carry > bbio in bi_private and let end_bio_extent_readpage() free the bbio > instead of abusing bi_bdev like this? BTW, I think it's a bit too late to fix this properly from btrfs side unless we're gonna do -rc8, so let's revert the TP patch for now and sort this out in the next devel cycle. AFAICS, while disturbingly (ha|yu)cky, the bi_bdev trick should be okay without the new TP. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? BTW, I think it's a bit too late to fix this properly from btrfs side unless we're gonna do -rc8, so let's revert the TP patch for now and sort this out in the next devel cycle. AFAICS, while disturbingly (ha|yu)cky, the bi_bdev trick should be okay without the new TP. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/19/2013 02:17 PM, Tejun Heo wrote: On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? BTW, I think it's a bit too late to fix this properly from btrfs side unless we're gonna do -rc8, so let's revert the TP patch for now and sort this out in the next devel cycle. AFAICS, while disturbingly (ha|yu)cky, the bi_bdev trick should be okay without the new TP. Thank you for investigating this. And sorry for my incompleted panic picture before that let you detour. :-( Regards, Wanlong Gao Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Fri, April 19, 2013 at 07:57 (+0200), Tejun Heo wrote: (cc'ing btrfs people) On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: RIP: 0010:[812484d3] [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0 ... [811b6c10] bio_endio+0x80/0x90 [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs] [811b6bcd] bio_endio+0x3d/0x90 [81249873] req_bio_endio+0xa3/0xe0 Ugh In fs/btrfs/volumes.c static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } static void btrfs_end_bio(struct bio *bio, int err) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } In fs/btrfs/extent_io.c static void end_bio_extent_readpage(struct bio *bio, int err) { int mirror; ... mirror = (int)(unsigned long)bio-bi_bdev; ... } Ewweehh No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? Oops. It's been my patch back in 2011 (commit 2774b2ca3), sent as an RFC-Patch and just slipped in without further discussion of that exact change. Hackish, yes - my reasoning was because the block layer changed bio-bi_bdev anyway, no one would want to look into it after the bio returned (and in fact it didn't hurt for like two years now). Although the block layer changes bi_bdev, it stays a valid bdev pointer, I admit. One way around this would be what you suggest, however that would mean the caller of (btrfs|btree)_submit_bio_hook gets its completion called in the end, but must know that the private is in fact a bbio which in turn carries the caller's private. Doesn't sound clean to me, either. The best idea I currently have is to add a dispatcher function that does the freeing of bbio and calls the actual completion with mirror_num as a separate parameter. That would make all the btrfs completions incompatible with bio_end_io_t, but it shouldn't hurt. At least now I know I wasn't invited to LSF for a good reason :-) -Jan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Quoting Tejun Heo (2013-04-19 01:57:54) Ewweehh No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? Yes, we can definitely carry bbio up higher in the stack. I'll patch it up right now. I do agree that it'll be too big for -final, but we'll have it either way. -chris -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: On Thu, Apr 18, 2013 at 10:57:54PM -0700, Tejun Heo wrote: No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? BTW, I think it's a bit too late to fix this properly from btrfs side unless we're gonna do -rc8, so let's revert the TP patch for now and sort this out in the next devel cycle. AFAICS, while disturbingly (ha|yu)cky, the bi_bdev trick should be okay without the new TP. The tracing commit was already backed out yesterday, so no problems there. I figured it'd be a bit too late to fix something this nasty in a suitably clean and straight forward fashion. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: (cc'ing btrfs people) On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: RIP: 0010:[812484d3] [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0 ... [811b6c10] bio_endio+0x80/0x90 [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs] [811b6bcd] bio_endio+0x3d/0x90 [81249873] req_bio_endio+0xa3/0xe0 Ugh In fs/btrfs/volumes.c static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } static void btrfs_end_bio(struct bio *bio, int err) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } In fs/btrfs/extent_io.c static void end_bio_extent_readpage(struct bio *bio, int err) { int mirror; ... mirror = (int)(unsigned long)bio-bi_bdev; ... } Ewweehh No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? Ugh, wtf. Chris, time for a swim in the bay :-) -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Quoting Jens Axboe (2013-04-19 09:32:50) No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? Ugh, wtf. Chris, time for a swim in the bay :-) Yeah, I can't really defend this one. We needed a space for an int and I assumed end_io meant the FS was free to do horrible things. Really though, I'll just take a quick dip in the lake and patch this out of btrfs. Jan is probably right about changing around our endio callbacks to explicitly pass the mirror, it should be less complex and cleaner. Many thanks to everyone here that tracked it down. -chris -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
(cc'ing btrfs people) On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: > RIP: 0010:[] [] > ftrace_raw_event_block_bio_complete+0x73/0xf0 ... > [] bio_endio+0x80/0x90 > [] btrfs_end_bio+0xf6/0x190 [btrfs] > [] bio_endio+0x3d/0x90 > [] req_bio_endio+0xa3/0xe0 Ugh In fs/btrfs/volumes.c static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) { ... bio->bi_bdev = (struct block_device *) (unsigned long)bbio->mirror_num; ... } static void btrfs_end_bio(struct bio *bio, int err) { ... bio->bi_bdev = (struct block_device *) (unsigned long)bbio->mirror_num; ... } In fs/btrfs/extent_io.c static void end_bio_extent_readpage(struct bio *bio, int err) { int mirror; ... mirror = (int)(unsigned long)bio->bi_bdev; ... } Ewweehh No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 10:30 PM, Jens Axboe wrote: > On Thu, Apr 18 2013, Wanlong Gao wrote: >> On 04/18/2013 09:35 PM, Jens Axboe wrote: >>> On Thu, Apr 18 2013, Wanlong Gao wrote: > A bio is always fully initialized, regardless of which internal > allocator it came from. If people are doing private kmallocs, then they > better be using bio_init() as well. > > Wanlong, would it be possible to get a full dmesg on boot see I can see > what drivers and file systems are in use? Anything special about your > setup. Sure, attached. >>> >>> Does the below help? >> >> Still got panic with this patch. :-( > > Could you capture the full dump? I'd like to see what rcx was, and that > seems to have scrolled off. > I got the panic message using netconsole: EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda8): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda9): warning: maximal mount count reached, running e2fsck is recommended EXT4-fs (sda9): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda10): mounted filesystem with ordered data mode. Opts: (null) SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled XFS (sda11): Mounting Filesystem XFS (sda11): Ending clean mount kjournald starting. Commit interval 5 seconds EXT3-fs (sda12): warning: maximal mount count reached, running e2fsck is recommended EXT3-fs (sda12): using internal journal EXT3-fs (sda12): mounted filesystem with ordered data mode XFS (sda13): Mounting Filesystem XFS (sda13): Ending clean mount xor: measuring software checksum speed prefetch64-sse: 7572.000 MB/sec generic_sse: 6300.000 MB/sec xor: using function: prefetch64-sse (7572.000 MB/sec) raid6: sse2x14796 MB/s raid6: sse2x25449 MB/s raid6: sse2x45816 MB/s raid6: using algorithm sse2x4 (5816 MB/s) raid6: using ssse3x2 recovery algorithm Btrfs loaded device fsid 01548710-abce-4290-801d-5f19dde14497 devid 1 transid 131526 /dev/sda14 BUG: unable to handle kernel NULL pointer dereference at 0001 IP: [] ftrace_raw_event_block_bio_complete+0x73/0xf0 PGD 0 Oops: [#1] SMP Modules linked in: btrfs raid6_pq zlib_deflate xor xfs libcrc32c netconsole ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp ppdev parport_pc parport fuse cpufreq_ondemand cachefiles fscache bridge stp llc ipv6 ext3 jbd dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support sg acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode serio_raw pcspkr i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core igb dca i2c_algo_bit i2c_core ptp pps_core dm_mod ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix CPU 3 Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc7-4-gbb33db7 #1 Lenovo Lenovo WQ TR280 G3/TR350 G7/TB36D RIP: 0010:[] [] ftrace_raw_event_block_bio_complete+0x73/0xf0 RSP: 0018:8802bfc63bb8 EFLAGS: 00010202 RAX: 8802b1e88698 RBX: 8802b05f6bc0 RCX: 0001 RDX: RSI: RDI: 8802b1e8869c RBP: 8802bfc63c08 R08: 000a R09: R10: 002c R11: R12: 81a95b60 R13: 0100 R14: 8802b1e88694 R15: FS: () GS:8802bfc6() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0001 CR3: 01a0c000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper/3 (pid: 0, threadinfo 8802b4598000, task 8802b4594a80) Stack: 8802b4546c40 0286 0100 8802b4546c40 8802bfc63c08 8802b1f82160 8802b05f6bc0 1000 8802bfc63c38 811b6c10 Call Trace: [] bio_endio+0x80/0x90 [] btrfs_end_bio+0xf6/0x190 [btrfs] [] bio_endio+0x3d/0x90 [] req_bio_endio+0xa3/0xe0 [] blk_update_request+0x10f/0x480 [] blk_update_bidi_request+0x27/0xb0 [] blk_end_bidi_request+0x2f/0x80 [] blk_end_request+0x10/0x20 [] scsi_end_request+0x40/0xb0 [] scsi_io_completion+0x9f/0x660 [] scsi_finish_command+0xc9/0x130 [] scsi_softirq_done+0x147/0x170 [] blk_done_softirq+0x82/0xa0 [] __do_softirq+0xe7/0x250 [] irq_exit+0xb5/0xc0 [] do_IRQ+0x66/0xe0 [] common_interrupt+0x6d/0x6d [] ? cpuidle_wrap_enter+0x50/0x90 [] ? cpuidle_wrap_enter+0x49/0x90 [] cpuidle_enter_tk+0x10/0x20 [] cpuidle_enter_state+0x17/0x50 [] cpuidle_idle_call+0xaa/0x110 []
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Could it be a module not recompiled with a newer kernel? -- Steve Jens Axboe wrote: > >Hmm dunno. It happens right after we've completed the bio, which >touches >a lot of fields too. bi_bdev sits between bi_next (which we definitely >used) and bi_flags. > >But adding slab use-after-free debugging would show for sure. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Linus Torvalds wrote: > On Thu, Apr 18, 2013 at 11:13 AM, Jens Axboe wrote: > > On Thu, Apr 18 2013, Tejun Heo wrote: > >> On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: > >> > > >> > Yep, thanks Linus for that hint... Must be someone abusing it for a > >> > flag field post submission? Crazy. > >> > >> Let's hope that's not the case because there'll be blood if it is. :) > > > > Yeah, it's beyond the amount of crazy I've come to expect from various > > random users of IO interfaces :-) > > I think it's more likely to be some use-after-free after a long timeout. > > Wanlong says it happens a few minutes after boot, so maybe something > times out a command, does the blk_complete_request(), and free's the > bio, which gets re-used before the softirq actually ends up running. > > I note that Wanlong uses the SLAB allocator, not the SLUB one. I > wonder if the thing goes away with SLUB, and if not, if > CONFIG_SLUB_DEBUG_ON=y might help debug it? Hmm dunno. It happens right after we've completed the bio, which touches a lot of fields too. bi_bdev sits between bi_next (which we definitely used) and bi_flags. But adding slab use-after-free debugging would show for sure. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 11:13 AM, Jens Axboe wrote: > On Thu, Apr 18 2013, Tejun Heo wrote: >> On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: >> > >> > Yep, thanks Linus for that hint... Must be someone abusing it for a >> > flag field post submission? Crazy. >> >> Let's hope that's not the case because there'll be blood if it is. :) > > Yeah, it's beyond the amount of crazy I've come to expect from various > random users of IO interfaces :-) I think it's more likely to be some use-after-free after a long timeout. Wanlong says it happens a few minutes after boot, so maybe something times out a command, does the blk_complete_request(), and free's the bio, which gets re-used before the softirq actually ends up running. I note that Wanlong uses the SLAB allocator, not the SLUB one. I wonder if the thing goes away with SLUB, and if not, if CONFIG_SLUB_DEBUG_ON=y might help debug it? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: > On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: > > > If I'm not mistaken, it seems like we have a bio which has 1 as its > > > ->bi_bdev. Ummm no idea whatsoever how that can happen tho right > > > now. Looking into it. > > > > Yep, thanks Linus for that hint... Must be someone abusing it for a > > flag field post submission? Crazy. > > Let's hope that's not the case because there'll be blood if it is. :) Yeah, it's beyond the amount of crazy I've come to expect from various random users of IO interfaces :-) -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: > > If I'm not mistaken, it seems like we have a bio which has 1 as its > > ->bi_bdev. Ummm no idea whatsoever how that can happen tho right > > now. Looking into it. > > Yep, thanks Linus for that hint... Must be someone abusing it for a > flag field post submission? Crazy. Let's hope that's not the case because there'll be blood if it is. :) -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Hello, Wanlong. On Thu, Apr 18, 2013 at 10:45:10PM +0800, Wanlong Gao wrote: > OK, but I should capture it tomorrow morning because this remote machine has > already panicked > and need hard reboot. Can you please apply the following patch when you try the next time and report the kernel log? It should prevent the oops from happening and dump information on at least who are the involved parties. Thanks. diff --git a/block/blk-core.c b/block/blk-core.c index 074b758..d5efebe 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -150,6 +150,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq) } EXPORT_SYMBOL(blk_rq_init); + + static void req_bio_endio(struct request *rq, struct bio *bio, unsigned int nbytes, int error) { @@ -174,8 +176,15 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_integrity_advance(bio, nbytes); /* don't actually finish bio if it's part of flush sequence */ - if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) + if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) { + if (bio->bi_bdev && (unsigned long)bio->bi_bdev < 4096) { + printk_ratelimited("XXX req_bio_endio: queue %s bio=%p bi_bdev=%p bi_end_io=%pf\n", + (rq->q && rq->q->backing_dev_info.dev) ? + dev_name(rq->q->backing_dev_info.dev) : "", + bio, bio->bi_bdev, bio->bi_end_io); + } bio_endio(bio, error); + } } void blk_dump_rq_flags(struct request *rq, char *msg) diff --git a/fs/bio.c b/fs/bio.c index bb5768f..da64e5a 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1428,7 +1428,10 @@ void bio_endio(struct bio *bio, int error) else if (!test_bit(BIO_UPTODATE, >bi_flags)) error = -EIO; - trace_block_bio_complete(bio, error); + if (bio->bi_bdev && (unsigned long)bio->bi_bdev < 4096) + printk_ratelimited("XXX bio_endio: skipping TP for %p\n", bio); + else + trace_block_bio_complete(bio, error); if (bio->bi_end_io) bio->bi_end_io(bio, error); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: > Hello, > > On Thu, Apr 18, 2013 at 09:08:34AM -0700, Linus Torvalds wrote: > > On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe wrote: > > > > > > Could you capture the full dump? I'd like to see what rcx was, and that > > > seems to have scrolled off. > > > > You don't need it. The %cr2 value shows the address of the bad access, > > and it's 0x0001. > > If I'm not mistaken, it seems like we have a bio which has 1 as its > ->bi_bdev. Ummm no idea whatsoever how that can happen tho right > now. Looking into it. Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Hello, On Thu, Apr 18, 2013 at 09:08:34AM -0700, Linus Torvalds wrote: > On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe wrote: > > > > Could you capture the full dump? I'd like to see what rcx was, and that > > seems to have scrolled off. > > You don't need it. The %cr2 value shows the address of the bad access, > and it's 0x0001. If I'm not mistaken, it seems like we have a bio which has 1 as its ->bi_bdev. Ummm no idea whatsoever how that can happen tho right now. Looking into it. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe wrote: > > Could you capture the full dump? I'd like to see what rcx was, and that > seems to have scrolled off. You don't need it. The %cr2 value shows the address of the bad access, and it's 0x0001. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 10:30 PM, Jens Axboe wrote: > On Thu, Apr 18 2013, Wanlong Gao wrote: >> On 04/18/2013 09:35 PM, Jens Axboe wrote: >>> On Thu, Apr 18 2013, Wanlong Gao wrote: > A bio is always fully initialized, regardless of which internal > allocator it came from. If people are doing private kmallocs, then they > better be using bio_init() as well. > > Wanlong, would it be possible to get a full dmesg on boot see I can see > what drivers and file systems are in use? Anything special about your > setup. Sure, attached. >>> >>> Does the below help? >> >> Still got panic with this patch. :-( > > Could you capture the full dump? I'd like to see what rcx was, and that > seems to have scrolled off. OK, but I should capture it tomorrow morning because this remote machine has already panicked and need hard reboot. Thanks, Wanlong Gao > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Wanlong Gao wrote: > On 04/18/2013 09:35 PM, Jens Axboe wrote: > > On Thu, Apr 18 2013, Wanlong Gao wrote: > >>> A bio is always fully initialized, regardless of which internal > >>> allocator it came from. If people are doing private kmallocs, then they > >>> better be using bio_init() as well. > >>> > >>> Wanlong, would it be possible to get a full dmesg on boot see I can see > >>> what drivers and file systems are in use? Anything special about your > >>> setup. > >> > >> Sure, attached. > > > > Does the below help? > > Still got panic with this patch. :-( Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 09:35 PM, Jens Axboe wrote: > On Thu, Apr 18 2013, Wanlong Gao wrote: >>> A bio is always fully initialized, regardless of which internal >>> allocator it came from. If people are doing private kmallocs, then they >>> better be using bio_init() as well. >>> >>> Wanlong, would it be possible to get a full dmesg on boot see I can see >>> what drivers and file systems are in use? Anything special about your >>> setup. >> >> Sure, attached. > > Does the below help? Still got panic with this patch. :-( Thanks, Wanlong Gao > > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > index 9e5b8c2..f9a51a6 100644 > --- a/kernel/trace/blktrace.c > +++ b/kernel/trace/blktrace.c > @@ -780,15 +780,23 @@ static void blk_add_trace_bio_bounce(void *ignore, > blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0); > } > > +static struct request_queue *blk_trace_get_queue(struct block_device *bdev) > +{ > + if (bdev->bd_disk == NULL) > + return NULL; > + > + return bdev_get_queue(bdev); > +} > + > static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int > error) > { > struct request_queue *q; > struct blk_trace *bt; > > - if (!bio->bi_bdev) > + q = blk_trace_get_queue(bio->bi_bdev); > + if (!q) > return; > > - q = bdev_get_queue(bio->bi_bdev); > bt = q->blk_trace; > > /* > @@ -1641,14 +1649,6 @@ static ssize_t blk_trace_mask2str(char *buf, int mask) > return p - buf; > } > > -static struct request_queue *blk_trace_get_queue(struct block_device *bdev) > -{ > - if (bdev->bd_disk == NULL) > - return NULL; > - > - return bdev_get_queue(bdev); > -} > - > static ssize_t sysfs_blk_trace_attr_show(struct device *dev, >struct device_attribute *attr, >char *buf) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Wanlong Gao wrote: > > A bio is always fully initialized, regardless of which internal > > allocator it came from. If people are doing private kmallocs, then they > > better be using bio_init() as well. > > > > Wanlong, would it be possible to get a full dmesg on boot see I can see > > what drivers and file systems are in use? Anything special about your > > setup. > > Sure, attached. Does the below help? diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 9e5b8c2..f9a51a6 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -780,15 +780,23 @@ static void blk_add_trace_bio_bounce(void *ignore, blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0); } +static struct request_queue *blk_trace_get_queue(struct block_device *bdev) +{ + if (bdev->bd_disk == NULL) + return NULL; + + return bdev_get_queue(bdev); +} + static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error) { struct request_queue *q; struct blk_trace *bt; - if (!bio->bi_bdev) + q = blk_trace_get_queue(bio->bi_bdev); + if (!q) return; - q = bdev_get_queue(bio->bi_bdev); bt = q->blk_trace; /* @@ -1641,14 +1649,6 @@ static ssize_t blk_trace_mask2str(char *buf, int mask) return p - buf; } -static struct request_queue *blk_trace_get_queue(struct block_device *bdev) -{ - if (bdev->bd_disk == NULL) - return NULL; - - return bdev_get_queue(bdev); -} - static ssize_t sysfs_blk_trace_attr_show(struct device *dev, struct device_attribute *attr, char *buf) -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Wed, Apr 17 2013, Steven Rostedt wrote: > On Wed, 2013-04-17 at 16:36 +0800, Wanlong Gao wrote: > > Hi Tj, all, > > > > I can get kernel panic on my machine with kernel 3.9.0-rc7-4-gbb33db7 every > > time > > after booting for several minutes. > > > > Here attached the panic picture and the config. > > I'm sorry that the panic log is not completed. > > > > But fortunately, the panic was gone when I reverted the below commit from > > Tj, > > > > commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f > > Author: Tejun Heo > > Date: Fri Jan 11 13:06:33 2013 -0800 > > > > block: add missing block_bio_complete() tracepoint > > > > I think this will be helpful for you to resolve this bug, and this may be > > urgent, > > because it's already rc-7 now. > > > Doing a objdump on the code I found this: > > 8123d479: 8b 77 48mov0x48(%rdi),%esi > 8123d47c: 48 8d 7d c8 lea-0x38(%rbp),%rdi > 8123d480: 48 8b 4d b8 mov-0x48(%rbp),%rcx > 8123d484: ba 28 00 00 00 mov$0x28,%edx > 8123d489: 45 89 e8mov%r13d,%r8d > 8123d48c: e8 1f d3 ea ff callq 810ea7b0 > > 8123d491: 48 85 c0test %rax,%rax > 8123d494: 49 89 c6mov%rax,%r14 > 8123d497: 74 5a je 8123d4f3 > > 8123d499: 48 89 c7mov%rax,%rdi > 8123d49c: e8 5f 4f ea ff callq 810e2400 > > 8123d4a1: 48 8b 4b 10 mov0x10(%rbx),%rcx > 8123d4a5: 31 d2 xor%edx,%edx > 8123d4a7: 48 85 c9test %rcx,%rcx > 8123d4aa: 74 02 je 8123d4ae > > > 8123d4ac: 8b 11 mov(%rcx),%edx <<< > BUG IS HERE > > 8123d4ae: 89 50 08mov%edx,0x8(%rax) > 8123d4b1: 48 8b 13mov(%rbx),%rdx > 8123d4b4: 48 8d 78 20 lea0x20(%rax),%rdi > 8123d4b8: 48 89 50 10 mov%rdx,0x10(%rax) > 8123d4bc: 8b 53 30mov0x30(%rbx),%edx > 8123d4bf: 44 89 78 1c mov%r15d,0x1c(%rax) > 8123d4c3: c1 ea 09shr$0x9,%edx > 8123d4c6: 89 50 18mov%edx,0x18(%rax) > 8123d4c9: 8b 53 30mov0x30(%rbx),%edx > 8123d4cc: 48 8b 73 20 mov0x20(%rbx),%rsi > > According to the picture, that showed where in the code dump, it has the > command that died: "<8b> 11 89 50 08". > > Just before that, rcx was tested for zero, and passed (wasn't zero). But > it seems that it wasn't a real address space either. > > bio is allocated with kmalloc and not kzalloc, so it is possible that if > bio->bi_bdev never got initialized and is just garbage. A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Will get this reverted for 3.9 final, so we have some time to investigate. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Wed, Apr 17 2013, Steven Rostedt wrote: On Wed, 2013-04-17 at 16:36 +0800, Wanlong Gao wrote: Hi Tj, all, I can get kernel panic on my machine with kernel 3.9.0-rc7-4-gbb33db7 every time after booting for several minutes. Here attached the panic picture and the config. I'm sorry that the panic log is not completed. But fortunately, the panic was gone when I reverted the below commit from Tj, commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f Author: Tejun Heo t...@kernel.org Date: Fri Jan 11 13:06:33 2013 -0800 block: add missing block_bio_complete() tracepoint I think this will be helpful for you to resolve this bug, and this may be urgent, because it's already rc-7 now. Doing a objdump on the code I found this: 8123d479: 8b 77 48mov0x48(%rdi),%esi 8123d47c: 48 8d 7d c8 lea-0x38(%rbp),%rdi 8123d480: 48 8b 4d b8 mov-0x48(%rbp),%rcx 8123d484: ba 28 00 00 00 mov$0x28,%edx 8123d489: 45 89 e8mov%r13d,%r8d 8123d48c: e8 1f d3 ea ff callq 810ea7b0 trace_current_buffer_lock_reserve 8123d491: 48 85 c0test %rax,%rax 8123d494: 49 89 c6mov%rax,%r14 8123d497: 74 5a je 8123d4f3 ftrace_raw_event_block_bio_complete+0xb3 8123d499: 48 89 c7mov%rax,%rdi 8123d49c: e8 5f 4f ea ff callq 810e2400 ring_buffer_event_data 8123d4a1: 48 8b 4b 10 mov0x10(%rbx),%rcx 8123d4a5: 31 d2 xor%edx,%edx 8123d4a7: 48 85 c9test %rcx,%rcx 8123d4aa: 74 02 je 8123d4ae ftrace_raw_event_block_bio_complete+0x6e 8123d4ac: 8b 11 mov(%rcx),%edx BUG IS HERE 8123d4ae: 89 50 08mov%edx,0x8(%rax) 8123d4b1: 48 8b 13mov(%rbx),%rdx 8123d4b4: 48 8d 78 20 lea0x20(%rax),%rdi 8123d4b8: 48 89 50 10 mov%rdx,0x10(%rax) 8123d4bc: 8b 53 30mov0x30(%rbx),%edx 8123d4bf: 44 89 78 1c mov%r15d,0x1c(%rax) 8123d4c3: c1 ea 09shr$0x9,%edx 8123d4c6: 89 50 18mov%edx,0x18(%rax) 8123d4c9: 8b 53 30mov0x30(%rbx),%edx 8123d4cc: 48 8b 73 20 mov0x20(%rbx),%rsi According to the picture, that showed where in the code dump, it has the command that died: 8b 11 89 50 08. Just before that, rcx was tested for zero, and passed (wasn't zero). But it seems that it wasn't a real address space either. bio is allocated with kmalloc and not kzalloc, so it is possible that if bio-bi_bdev never got initialized and is just garbage. A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Will get this reverted for 3.9 final, so we have some time to investigate. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Wanlong Gao wrote: A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Sure, attached. Does the below help? diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 9e5b8c2..f9a51a6 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -780,15 +780,23 @@ static void blk_add_trace_bio_bounce(void *ignore, blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0); } +static struct request_queue *blk_trace_get_queue(struct block_device *bdev) +{ + if (bdev-bd_disk == NULL) + return NULL; + + return bdev_get_queue(bdev); +} + static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error) { struct request_queue *q; struct blk_trace *bt; - if (!bio-bi_bdev) + q = blk_trace_get_queue(bio-bi_bdev); + if (!q) return; - q = bdev_get_queue(bio-bi_bdev); bt = q-blk_trace; /* @@ -1641,14 +1649,6 @@ static ssize_t blk_trace_mask2str(char *buf, int mask) return p - buf; } -static struct request_queue *blk_trace_get_queue(struct block_device *bdev) -{ - if (bdev-bd_disk == NULL) - return NULL; - - return bdev_get_queue(bdev); -} - static ssize_t sysfs_blk_trace_attr_show(struct device *dev, struct device_attribute *attr, char *buf) -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 09:35 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Sure, attached. Does the below help? Still got panic with this patch. :-( Thanks, Wanlong Gao diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 9e5b8c2..f9a51a6 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -780,15 +780,23 @@ static void blk_add_trace_bio_bounce(void *ignore, blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0); } +static struct request_queue *blk_trace_get_queue(struct block_device *bdev) +{ + if (bdev-bd_disk == NULL) + return NULL; + + return bdev_get_queue(bdev); +} + static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error) { struct request_queue *q; struct blk_trace *bt; - if (!bio-bi_bdev) + q = blk_trace_get_queue(bio-bi_bdev); + if (!q) return; - q = bdev_get_queue(bio-bi_bdev); bt = q-blk_trace; /* @@ -1641,14 +1649,6 @@ static ssize_t blk_trace_mask2str(char *buf, int mask) return p - buf; } -static struct request_queue *blk_trace_get_queue(struct block_device *bdev) -{ - if (bdev-bd_disk == NULL) - return NULL; - - return bdev_get_queue(bdev); -} - static ssize_t sysfs_blk_trace_attr_show(struct device *dev, struct device_attribute *attr, char *buf) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Wanlong Gao wrote: On 04/18/2013 09:35 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Sure, attached. Does the below help? Still got panic with this patch. :-( Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 10:30 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: On 04/18/2013 09:35 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Sure, attached. Does the below help? Still got panic with this patch. :-( Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. OK, but I should capture it tomorrow morning because this remote machine has already panicked and need hard reboot. Thanks, Wanlong Gao -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe ax...@kernel.dk wrote: Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. You don't need it. The %cr2 value shows the address of the bad access, and it's 0x0001. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Hello, On Thu, Apr 18, 2013 at 09:08:34AM -0700, Linus Torvalds wrote: On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe ax...@kernel.dk wrote: Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. You don't need it. The %cr2 value shows the address of the bad access, and it's 0x0001. If I'm not mistaken, it seems like we have a bio which has 1 as its -bi_bdev. Ummm no idea whatsoever how that can happen tho right now. Looking into it. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: Hello, On Thu, Apr 18, 2013 at 09:08:34AM -0700, Linus Torvalds wrote: On Thu, Apr 18, 2013 at 7:30 AM, Jens Axboe ax...@kernel.dk wrote: Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. You don't need it. The %cr2 value shows the address of the bad access, and it's 0x0001. If I'm not mistaken, it seems like we have a bio which has 1 as its -bi_bdev. Ummm no idea whatsoever how that can happen tho right now. Looking into it. Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Hello, Wanlong. On Thu, Apr 18, 2013 at 10:45:10PM +0800, Wanlong Gao wrote: OK, but I should capture it tomorrow morning because this remote machine has already panicked and need hard reboot. Can you please apply the following patch when you try the next time and report the kernel log? It should prevent the oops from happening and dump information on at least who are the involved parties. Thanks. diff --git a/block/blk-core.c b/block/blk-core.c index 074b758..d5efebe 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -150,6 +150,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq) } EXPORT_SYMBOL(blk_rq_init); + + static void req_bio_endio(struct request *rq, struct bio *bio, unsigned int nbytes, int error) { @@ -174,8 +176,15 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_integrity_advance(bio, nbytes); /* don't actually finish bio if it's part of flush sequence */ - if (bio-bi_size == 0 !(rq-cmd_flags REQ_FLUSH_SEQ)) + if (bio-bi_size == 0 !(rq-cmd_flags REQ_FLUSH_SEQ)) { + if (bio-bi_bdev (unsigned long)bio-bi_bdev 4096) { + printk_ratelimited(XXX req_bio_endio: queue %s bio=%p bi_bdev=%p bi_end_io=%pf\n, + (rq-q rq-q-backing_dev_info.dev) ? + dev_name(rq-q-backing_dev_info.dev) : unknown, + bio, bio-bi_bdev, bio-bi_end_io); + } bio_endio(bio, error); + } } void blk_dump_rq_flags(struct request *rq, char *msg) diff --git a/fs/bio.c b/fs/bio.c index bb5768f..da64e5a 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1428,7 +1428,10 @@ void bio_endio(struct bio *bio, int error) else if (!test_bit(BIO_UPTODATE, bio-bi_flags)) error = -EIO; - trace_block_bio_complete(bio, error); + if (bio-bi_bdev (unsigned long)bio-bi_bdev 4096) + printk_ratelimited(XXX bio_endio: skipping TP for %p\n, bio); + else + trace_block_bio_complete(bio, error); if (bio-bi_end_io) bio-bi_end_io(bio, error); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: If I'm not mistaken, it seems like we have a bio which has 1 as its -bi_bdev. Ummm no idea whatsoever how that can happen tho right now. Looking into it. Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. Let's hope that's not the case because there'll be blood if it is. :) -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Tejun Heo wrote: On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: If I'm not mistaken, it seems like we have a bio which has 1 as its -bi_bdev. Ummm no idea whatsoever how that can happen tho right now. Looking into it. Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. Let's hope that's not the case because there'll be blood if it is. :) Yeah, it's beyond the amount of crazy I've come to expect from various random users of IO interfaces :-) -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18, 2013 at 11:13 AM, Jens Axboe ax...@kernel.dk wrote: On Thu, Apr 18 2013, Tejun Heo wrote: On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. Let's hope that's not the case because there'll be blood if it is. :) Yeah, it's beyond the amount of crazy I've come to expect from various random users of IO interfaces :-) I think it's more likely to be some use-after-free after a long timeout. Wanlong says it happens a few minutes after boot, so maybe something times out a command, does the blk_complete_request(), and free's the bio, which gets re-used before the softirq actually ends up running. I note that Wanlong uses the SLAB allocator, not the SLUB one. I wonder if the thing goes away with SLUB, and if not, if CONFIG_SLUB_DEBUG_ON=y might help debug it? Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Thu, Apr 18 2013, Linus Torvalds wrote: On Thu, Apr 18, 2013 at 11:13 AM, Jens Axboe ax...@kernel.dk wrote: On Thu, Apr 18 2013, Tejun Heo wrote: On Thu, Apr 18, 2013 at 10:39:00AM -0700, Jens Axboe wrote: Yep, thanks Linus for that hint... Must be someone abusing it for a flag field post submission? Crazy. Let's hope that's not the case because there'll be blood if it is. :) Yeah, it's beyond the amount of crazy I've come to expect from various random users of IO interfaces :-) I think it's more likely to be some use-after-free after a long timeout. Wanlong says it happens a few minutes after boot, so maybe something times out a command, does the blk_complete_request(), and free's the bio, which gets re-used before the softirq actually ends up running. I note that Wanlong uses the SLAB allocator, not the SLUB one. I wonder if the thing goes away with SLUB, and if not, if CONFIG_SLUB_DEBUG_ON=y might help debug it? Hmm dunno. It happens right after we've completed the bio, which touches a lot of fields too. bi_bdev sits between bi_next (which we definitely used) and bi_flags. But adding slab use-after-free debugging would show for sure. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
Could it be a module not recompiled with a newer kernel? -- Steve Jens Axboe ax...@kernel.dk wrote: Hmm dunno. It happens right after we've completed the bio, which touches a lot of fields too. bi_bdev sits between bi_next (which we definitely used) and bi_flags. But adding slab use-after-free debugging would show for sure. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 10:30 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: On 04/18/2013 09:35 PM, Jens Axboe wrote: On Thu, Apr 18 2013, Wanlong Gao wrote: A bio is always fully initialized, regardless of which internal allocator it came from. If people are doing private kmallocs, then they better be using bio_init() as well. Wanlong, would it be possible to get a full dmesg on boot see I can see what drivers and file systems are in use? Anything special about your setup. Sure, attached. Does the below help? Still got panic with this patch. :-( Could you capture the full dump? I'd like to see what rcx was, and that seems to have scrolled off. I got the panic message using netconsole: EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda8): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda9): warning: maximal mount count reached, running e2fsck is recommended EXT4-fs (sda9): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (sda10): mounted filesystem with ordered data mode. Opts: (null) SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled XFS (sda11): Mounting Filesystem XFS (sda11): Ending clean mount kjournald starting. Commit interval 5 seconds EXT3-fs (sda12): warning: maximal mount count reached, running e2fsck is recommended EXT3-fs (sda12): using internal journal EXT3-fs (sda12): mounted filesystem with ordered data mode XFS (sda13): Mounting Filesystem XFS (sda13): Ending clean mount xor: measuring software checksum speed prefetch64-sse: 7572.000 MB/sec generic_sse: 6300.000 MB/sec xor: using function: prefetch64-sse (7572.000 MB/sec) raid6: sse2x14796 MB/s raid6: sse2x25449 MB/s raid6: sse2x45816 MB/s raid6: using algorithm sse2x4 (5816 MB/s) raid6: using ssse3x2 recovery algorithm Btrfs loaded device fsid 01548710-abce-4290-801d-5f19dde14497 devid 1 transid 131526 /dev/sda14 BUG: unable to handle kernel NULL pointer dereference at 0001 IP: [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0 PGD 0 Oops: [#1] SMP Modules linked in: btrfs raid6_pq zlib_deflate xor xfs libcrc32c netconsole ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp ppdev parport_pc parport fuse cpufreq_ondemand cachefiles fscache bridge stp llc ipv6 ext3 jbd dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support sg acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode serio_raw pcspkr i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core igb dca i2c_algo_bit i2c_core ptp pps_core dm_mod ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix CPU 3 Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc7-4-gbb33db7 #1 Lenovo Lenovo WQ TR280 G3/TR350 G7/TB36D RIP: 0010:[812484d3] [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0 RSP: 0018:8802bfc63bb8 EFLAGS: 00010202 RAX: 8802b1e88698 RBX: 8802b05f6bc0 RCX: 0001 RDX: RSI: RDI: 8802b1e8869c RBP: 8802bfc63c08 R08: 000a R09: R10: 002c R11: R12: 81a95b60 R13: 0100 R14: 8802b1e88694 R15: FS: () GS:8802bfc6() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0001 CR3: 01a0c000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper/3 (pid: 0, threadinfo 8802b4598000, task 8802b4594a80) Stack: 8802b4546c40 0286 0100 8802b4546c40 8802bfc63c08 8802b1f82160 8802b05f6bc0 1000 8802bfc63c38 811b6c10 Call Trace: IRQ [811b6c10] bio_endio+0x80/0x90 [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs] [811b6bcd] bio_endio+0x3d/0x90 [81249873] req_bio_endio+0xa3/0xe0 [8124af9f] blk_update_request+0x10f/0x480 [8124b337] blk_update_bidi_request+0x27/0xb0 [8124c52f] blk_end_bidi_request+0x2f/0x80 [8124c5d0] blk_end_request+0x10/0x20 [8139c1b0] scsi_end_request+0x40/0xb0 [8139c52f] scsi_io_completion+0x9f/0x660 [81393899] scsi_finish_command+0xc9/0x130 [8139cc57] scsi_softirq_done+0x147/0x170 [81252c52] blk_done_softirq+0x82/0xa0 [8105f537] __do_softirq+0xe7/0x250 [8105f7a5] irq_exit+0xb5/0xc0
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
(cc'ing btrfs people) On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote: RIP: 0010:[812484d3] [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0 ... [811b6c10] bio_endio+0x80/0x90 [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs] [811b6bcd] bio_endio+0x3d/0x90 [81249873] req_bio_endio+0xa3/0xe0 Ugh In fs/btrfs/volumes.c static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } static void btrfs_end_bio(struct bio *bio, int err) { ... bio-bi_bdev = (struct block_device *) (unsigned long)bbio-mirror_num; ... } In fs/btrfs/extent_io.c static void end_bio_extent_readpage(struct bio *bio, int err) { int mirror; ... mirror = (int)(unsigned long)bio-bi_bdev; ... } Ewweehh No wonder this thing crashes. Chris, can't the original bio carry bbio in bi_private and let end_bio_extent_readpage() free the bbio instead of abusing bi_bdev like this? -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Wed, 2013-04-17 at 16:36 +0800, Wanlong Gao wrote: > Hi Tj, all, > > I can get kernel panic on my machine with kernel 3.9.0-rc7-4-gbb33db7 every > time > after booting for several minutes. > > Here attached the panic picture and the config. > I'm sorry that the panic log is not completed. > > But fortunately, the panic was gone when I reverted the below commit from Tj, > > commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f > Author: Tejun Heo > Date: Fri Jan 11 13:06:33 2013 -0800 > > block: add missing block_bio_complete() tracepoint > > I think this will be helpful for you to resolve this bug, and this may be > urgent, > because it's already rc-7 now. Doing a objdump on the code I found this: 8123d479: 8b 77 48mov0x48(%rdi),%esi 8123d47c: 48 8d 7d c8 lea-0x38(%rbp),%rdi 8123d480: 48 8b 4d b8 mov-0x48(%rbp),%rcx 8123d484: ba 28 00 00 00 mov$0x28,%edx 8123d489: 45 89 e8mov%r13d,%r8d 8123d48c: e8 1f d3 ea ff callq 810ea7b0 8123d491: 48 85 c0test %rax,%rax 8123d494: 49 89 c6mov%rax,%r14 8123d497: 74 5a je 8123d4f3 8123d499: 48 89 c7mov%rax,%rdi 8123d49c: e8 5f 4f ea ff callq 810e2400 8123d4a1: 48 8b 4b 10 mov0x10(%rbx),%rcx 8123d4a5: 31 d2 xor%edx,%edx 8123d4a7: 48 85 c9test %rcx,%rcx 8123d4aa: 74 02 je 8123d4ae 8123d4ac: 8b 11 mov(%rcx),%edx <<< BUG IS HERE 8123d4ae: 89 50 08mov%edx,0x8(%rax) 8123d4b1: 48 8b 13mov(%rbx),%rdx 8123d4b4: 48 8d 78 20 lea0x20(%rax),%rdi 8123d4b8: 48 89 50 10 mov%rdx,0x10(%rax) 8123d4bc: 8b 53 30mov0x30(%rbx),%edx 8123d4bf: 44 89 78 1c mov%r15d,0x1c(%rax) 8123d4c3: c1 ea 09shr$0x9,%edx 8123d4c6: 89 50 18mov%edx,0x18(%rax) 8123d4c9: 8b 53 30mov0x30(%rbx),%edx 8123d4cc: 48 8b 73 20 mov0x20(%rbx),%rsi According to the picture, that showed where in the code dump, it has the command that died: "<8b> 11 89 50 08". Just before that, rcx was tested for zero, and passed (wasn't zero). But it seems that it wasn't a real address space either. bio is allocated with kmalloc and not kzalloc, so it is possible that if bio->bi_bdev never got initialized and is just garbage. -- Steve > > > Thanks, > Wanlong Gao -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Wed, 2013-04-17 at 16:36 +0800, Wanlong Gao wrote: Hi Tj, all, I can get kernel panic on my machine with kernel 3.9.0-rc7-4-gbb33db7 every time after booting for several minutes. Here attached the panic picture and the config. I'm sorry that the panic log is not completed. But fortunately, the panic was gone when I reverted the below commit from Tj, commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f Author: Tejun Heo t...@kernel.org Date: Fri Jan 11 13:06:33 2013 -0800 block: add missing block_bio_complete() tracepoint I think this will be helpful for you to resolve this bug, and this may be urgent, because it's already rc-7 now. Doing a objdump on the code I found this: 8123d479: 8b 77 48mov0x48(%rdi),%esi 8123d47c: 48 8d 7d c8 lea-0x38(%rbp),%rdi 8123d480: 48 8b 4d b8 mov-0x48(%rbp),%rcx 8123d484: ba 28 00 00 00 mov$0x28,%edx 8123d489: 45 89 e8mov%r13d,%r8d 8123d48c: e8 1f d3 ea ff callq 810ea7b0 trace_current_buffer_lock_reserve 8123d491: 48 85 c0test %rax,%rax 8123d494: 49 89 c6mov%rax,%r14 8123d497: 74 5a je 8123d4f3 ftrace_raw_event_block_bio_complete+0xb3 8123d499: 48 89 c7mov%rax,%rdi 8123d49c: e8 5f 4f ea ff callq 810e2400 ring_buffer_event_data 8123d4a1: 48 8b 4b 10 mov0x10(%rbx),%rcx 8123d4a5: 31 d2 xor%edx,%edx 8123d4a7: 48 85 c9test %rcx,%rcx 8123d4aa: 74 02 je 8123d4ae ftrace_raw_event_block_bio_complete+0x6e 8123d4ac: 8b 11 mov(%rcx),%edx BUG IS HERE 8123d4ae: 89 50 08mov%edx,0x8(%rax) 8123d4b1: 48 8b 13mov(%rbx),%rdx 8123d4b4: 48 8d 78 20 lea0x20(%rax),%rdi 8123d4b8: 48 89 50 10 mov%rdx,0x10(%rax) 8123d4bc: 8b 53 30mov0x30(%rbx),%edx 8123d4bf: 44 89 78 1c mov%r15d,0x1c(%rax) 8123d4c3: c1 ea 09shr$0x9,%edx 8123d4c6: 89 50 18mov%edx,0x18(%rax) 8123d4c9: 8b 53 30mov0x30(%rbx),%edx 8123d4cc: 48 8b 73 20 mov0x20(%rbx),%rsi According to the picture, that showed where in the code dump, it has the command that died: 8b 11 89 50 08. Just before that, rcx was tested for zero, and passed (wasn't zero). But it seems that it wasn't a real address space either. bio is allocated with kmalloc and not kzalloc, so it is possible that if bio-bi_bdev never got initialized and is just garbage. -- Steve Thanks, Wanlong Gao -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/