Re: [PATCH v2] fs: guard_bio_eod() needs to consider partitions
On Tue, Oct 24, 2017 at 11:21:48AM -0600, Greg Edwards wrote: > guard_bio_eod() needs to look at the partition capacity, not just the > capacity of the whole device, when determining if truncation is > necessary. > > [ 60.268688] attempt to access beyond end of device > [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506 > [ 60.268693] buffer_io_error: 2 callbacks suppressed > [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async > page read > > Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and > partitions index") > Signed-off-by: Greg Edwards > --- > Changes from v1: > * use __disk_get_part instead of disk_get_part, similar to what > blk_partition_remap does Al, Christoph, Any thoughts on this version? It would be nice to get this fixed before 4.14 goes out, as this is a regression from previous releases. Greg
[PATCH v2] fs: guard_bio_eod() needs to consider partitions
guard_bio_eod() needs to look at the partition capacity, not just the capacity of the whole device, when determining if truncation is necessary. [ 60.268688] attempt to access beyond end of device [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506 [ 60.268693] buffer_io_error: 2 callbacks suppressed [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Signed-off-by: Greg Edwards --- Changes from v1: * use __disk_get_part instead of disk_get_part, similar to what blk_partition_remap does fs/buffer.c | 10 +- include/linux/genhd.h | 1 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/buffer.c b/fs/buffer.c index 170df856bdb9..b96f3b98a6ef 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3055,8 +3055,16 @@ void guard_bio_eod(int op, struct bio *bio) sector_t maxsector; struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1]; unsigned truncated_bytes; + struct hd_struct *part; + + rcu_read_lock(); + part = __disk_get_part(bio->bi_disk, bio->bi_partno); + if (part) + maxsector = part_nr_sects_read(part); + else + maxsector = get_capacity(bio->bi_disk); + rcu_read_unlock(); - maxsector = get_capacity(bio->bi_disk); if (!maxsector) return; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index ea652bfcd675..cb8c5fd74b71 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -242,6 +242,7 @@ static inline dev_t part_devt(struct hd_struct *part) return part_to_dev(part)->devt; } +extern struct hd_struct *__disk_get_part(struct gendisk *disk, int partno); extern struct hd_struct *disk_get_part(struct gendisk *disk, int partno); static inline void disk_put_part(struct hd_struct *part) -- 2.13.6
Re: [PATCH] fs: guard_bio_eod() needs to consider partitions
On Tue, Oct 24, 2017 at 09:46:15AM +0200, Christoph Hellwig wrote: > On Tue, Oct 24, 2017 at 03:16:36AM +0100, Al Viro wrote: >>> Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and >>> partitions index") >>> Signed-off-by: Greg Edwards >> >> That smells like a nasty source of overhead, especially on SMP boxen... > > We could just use __disk_get_part similar to the partition remapping, > or just pass the block_device to guard_bio_eod. Do you have a preference? __disk_get_part would be isolated to guard_bio_eod. If we pass in the block_device, it would also touch all the mpage_bio_submit callers.
[PATCH] fs: guard_bio_eod() needs to consider partitions
guard_bio_eod() needs to look at the partition capacity, not just the capacity of the whole device, when determining if truncation is necessary. [ 60.268688] attempt to access beyond end of device [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506 [ 60.268693] buffer_io_error: 2 callbacks suppressed [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Signed-off-by: Greg Edwards --- fs/buffer.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/buffer.c b/fs/buffer.c index 170df856bdb9..989d3b085efe 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3055,8 +3055,15 @@ void guard_bio_eod(int op, struct bio *bio) sector_t maxsector; struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1]; unsigned truncated_bytes; + struct hd_struct *part; + + part = disk_get_part(bio->bi_disk, bio->bi_partno); + if (part) + maxsector = part_nr_sects_read(part); + else + maxsector = get_capacity(bio->bi_disk); + disk_put_part(part); - maxsector = get_capacity(bio->bi_disk); if (!maxsector) return; -- 2.13.6
[PATCH] mpt3sas: Fix resume on WarpDrive flash cards
mpt3sas crashes on resume after suspend with WarpDrive flash cards. The reply_post_host_index array is not set back up after the resume, and we deference a stale pointer in _base_interrupt(). [ 47.309711] BUG: unable to handle kernel paging request at c90001f8006c [ 47.318289] IP: [] _base_interrupt+0x49f/0xa30 [mpt3sas] [ 47.326749] PGD 41ccaa067 PUD 41ccab067 PMD 3466c067 PTE 0 [ 47.333848] Oops: 0002 [#1] SMP ... [ 47.452708] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #6 [ 47.460506] Hardware name: Dell Inc. OptiPlex 990/06D7TR, BIOS A18 09/24/2013 [ 47.469629] task: 81c0d500 ti: 81c0 task.ti: 81c0 [ 47.479112] RIP: 0010:[] [] _base_interrupt+0x49f/0xa30 [mpt3sas] [ 47.490466] RSP: 0018:88041d203e30 EFLAGS: 00010002 [ 47.497801] RAX: 0001 RBX: 880033f4c000 RCX: 0001 [ 47.506973] RDX: c90001f8006c RSI: 0082 RDI: 0082 [ 47.516141] RBP: 88041d203eb0 R08: 8804118e2820 R09: 0001 [ 47.525300] R10: 0001 R11: 100c R12: [ 47.534457] R13: 880412c487e0 R14: 88041a8987d8 R15: 0001 [ 47.543632] FS: () GS:88041d20() knlGS: [ 47.553796] CS: 0010 DS: ES: CR0: 80050033 [ 47.561632] CR2: c90001f8006c CR3: 01c06000 CR4: 000406f0 [ 47.570883] Stack: [ 47.575015] 1d211228 88041d2100c0 8800c47d8130 0100 [ 47.584625] 8804100c 100c 88041a8992a0 88041a8987f8 [ 47.594230] 88041d203e00 8e55 038c 880414ad4280 [ 47.603862] Call Trace: [ 47.608474] [ 47.610413] [] ? call_timer_fn+0x35/0x120 [ 47.620539] [] handle_irq_event_percpu+0x7f/0x1c0 [ 47.629061] [] handle_irq_event+0x2c/0x50 [ 47.636859] [] handle_edge_irq+0x6f/0x130 [ 47.644654] [] handle_irq+0x73/0x120 [ 47.652011] [] ? atomic_notifier_call_chain+0x1a/0x20 [ 47.660854] [] do_IRQ+0x4b/0xd0 [ 47.66] [] common_interrupt+0x8c/0x8c [ 47.675635] Move the reply_post_host_index array setup into mpt3sas_base_map_resources(), which is also in the resume path. Cc: sta...@vger.kernel.org Signed-off-by: Greg Edwards --- drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 751f13e..750f82c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -2188,6 +2188,17 @@ mpt3sas_base_map_resources(struct MPT3SAS_ADAPTER *ioc) } else ioc->msix96_vector = 0; + if (ioc->is_warpdrive) { + ioc->reply_post_host_index[0] = (resource_size_t __iomem *) + &ioc->chip->ReplyPostHostIndex; + + for (i = 1; i < ioc->cpu_msix_table_sz; i++) + ioc->reply_post_host_index[i] = + (resource_size_t __iomem *) + ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) + * 4))); + } + list_for_each_entry(reply_q, &ioc->reply_queue_list, list) pr_info(MPT3SAS_FMT "%s: IRQ %d\n", reply_q->name, ((ioc->msix_enable) ? "PCI-MSI-X enabled" : @@ -5280,17 +5291,6 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc) if (r) goto out_free_resources; - if (ioc->is_warpdrive) { - ioc->reply_post_host_index[0] = (resource_size_t __iomem *) - &ioc->chip->ReplyPostHostIndex; - - for (i = 1; i < ioc->cpu_msix_table_sz; i++) - ioc->reply_post_host_index[i] = - (resource_size_t __iomem *) - ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1) - * 4))); - } - pci_set_drvdata(ioc->pdev, ioc->shost); r = _base_get_ioc_facts(ioc, CAN_SLEEP); if (r) -- 2.7.4
Re: [PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ
On Tue, Jul 29, 2014 at 12:45:31PM +0200, Joerg Roedel wrote: > On Wed, Jul 23, 2014 at 10:13:26AM -0600, Greg Edwards wrote: >> A user process setting the CPU affinity of an IRQ for a KVM >> direct-assigned device via /proc/irq//smp_affinity can race with >> the IRQ being released by QEMU, resulting in a NULL iommu pointer >> dereference in get_irte(). > > Maybe I wasn't clear enough, what I am missing is a panic message with a > backtrace from the NULL pointer deref you are seeing in the commit > message. Sure, sorry I didn't send that along previously. We originally saw this on a 3.9 kernel, then recently on a 3.14 stable kernel. I hadn't reproduced it on current tip yet. Here it is for Linus' tree as of this morning: [ 6638.327851] BUG: unable to handle kernel NULL pointer dereference at 0090 [ 6638.335955] IP: [] intel_ioapic_set_affinity+0x82/0x1b0 [ 6638.343012] PGD 99172e067 PUD 1026979067 PMD 0 [ 6638.347858] Oops: [#1] SMP [ 6638.351386] Modules linked in: [ 6638.354718] CPU: 1 PID: 3354 Comm: affin Not tainted 3.16.0-rc7-7-g31dab71 #1 [ 6638.362480] Hardware name: Supermicro SYS-F617R2-RT+/X9DRFR, BIOS 3.0a 01/29/2014 [ 6638.370259] task: 881025b0e720 ti: 88099173c000 task.ti: 88099173c000 [ 6638.378063] RIP: 0010:[] [] intel_ioapic_set_affinity+0x82/0x1b0 [ 6638.387649] RSP: 0018:88099173fdb0 EFLAGS: 00010046 [ 6638.393314] RAX: 0082 RBX: 880a36294600 RCX: 0082 [ 6638.400824] RDX: RSI: RDI: 8266af00 [ 6638.408341] RBP: 88099173fdf8 R08: R09: 88103ec00490 [ 6638.415867] R10: R11: R12: 88099173fe90 [ 6638.423402] R13: 005f R14: 880faa38fe80 R15: 880faa38fe80 [ 6638.430942] FS: 7f7161f05740() GS:88107fc4() knlGS: [ 6638.439456] CS: 0010 DS: ES: CR0: 80050033 [ 6638.445640] CR2: 0090 CR3: 00099140d000 CR4: 001427e0 [ 6638.453227] Stack: [ 6638.455699] 81c44740 88099173fdc8 c991fd3b [ 6638.463660] 880a36294600 88099173fe90 88099173fe90 [ 6638.471623] 0286 88099173fe08 8190aac5 88099173fe28 [ 6638.479605] Call Trace: [ 6638.482562] [] set_remapped_irq_affinity+0x25/0x40 [ 6638.489534] [] irq_do_set_affinity+0x1c/0x50 [ 6638.495985] [] irq_set_affinity_locked+0x98/0xd0 [ 6638.502781] [] __irq_set_affinity+0x46/0x70 [ 6638.509142] [] write_irq_affinity.isra.6+0xdc/0x100 [ 6638.516198] [] irq_affinity_list_proc_write+0x1c/0x20 [ 6638.523440] [] proc_reg_write+0x3d/0x80 [ 6638.529474] [] vfs_write+0xb7/0x1f0 [ 6638.535165] [] ? putname+0x29/0x40 [ 6638.540771] [] SyS_write+0x55/0xd0 [ 6638.546386] [] system_call_fastpath+0x16/0x1b [ 6638.552970] Code: ff 48 85 d2 74 68 4c 8b 7a 30 4d 85 ff 74 5f 48 c7 c7 00 af 66 82 e8 9e 1b 1d 00 49 8b 57 20 41 0f b7 77 28 48 c7 c7 00 af 66 82 <48> 8b 8a 90 00 00 00 41 0f b7 57 2a 01 f2 48 89 c6 48 63 d2 48 [ 6638.574199] RIP [] intel_ioapic_set_affinity+0x82/0x1b0 [ 6638.581735] RSP [ 6638.585872] CR2: 0090 > Also I am still wondering why it is possible to set affinity from > userspace while the irq is about to be freed. Shouldn't the proc files > are already unregistered when the irq is freed? The QEMU thread is doing a KVM_DEASSIGN_DEV_IRQ ioctl. It looks like the call chain is: kvm_vm_ioctl_assigned_device kvm_vm_ioctl_deassign_dev_irq kvm_deassign_irq deassign_host_irq pci_disable_msix free_msi_irqs arch_teardown_msi_irqs default_teardown_msi_irqs arch_teardown_msi_irq native_teardown_msi_irq irq_free_hwirq irq_free_hwirqs irq_free_hwirqs() does: 460 for (i = from, j = cnt; j > 0; i++, j--) { 461 irq_set_status_flags(i, _IRQ_NOREQUEST | _IRQ_NOPROBE); 462 arch_teardown_hwirq(i); <--- QEMU is down this path to free_irte() 463 } 464 irq_free_descs(from, cnt); < proc files unregistered down this path I guess the question is if it would be safe to change that ordering? I don't know. Greg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ
A user process setting the CPU affinity of an IRQ for a KVM direct-assigned device via /proc/irq//smp_affinity can race with the IRQ being released by QEMU, resulting in a NULL iommu pointer dereference in get_irte(). Signed-off-by: Greg Edwards --- Dropped the Cc: for stable since this likely wouldn't ever be seen in the real world. We saw it on an automated CI stress test. drivers/iommu/intel_irq_remapping.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index 9b17489..d926676 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -70,6 +70,11 @@ static int get_irte(int irq, struct irte *entry) raw_spin_lock_irqsave(&irq_2_ir_lock, flags); + if (unlikely(!irq_iommu->iommu)) { + raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags); + return -1; + } + index = irq_iommu->irte_index + irq_iommu->sub_handle; *entry = *(irq_iommu->iommu->ir_table->base + index); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu/vt-d: fix race between free_irte() and get_irte()
On Wed, Jul 23, 2014 at 04:40:24PM +0200, Joerg Roedel wrote: > On Tue, Jul 22, 2014 at 08:27:19AM -0600, Greg Edwards wrote: >> get_irte() can race with free_irte() and dereference a NULL iommu >> pointer. > > Have you seen any real occurance of this race? Get_irte is called in the > set_affinity path, how can that race with the irq being freed? Yes, that's how we hit it. A process was setting the CPU affinity while QEMU was releasing the IRQ. We have a CI stress test that turned this up. Greg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] iommu/vt-d: fix race between free_irte() and get_irte()
get_irte() can race with free_irte() and dereference a NULL iommu pointer. Signed-off-by: Greg Edwards Cc: sta...@vger.kernel.org --- drivers/iommu/intel_irq_remapping.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index 9b17489..2d67e6d 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -70,6 +70,12 @@ static int get_irte(int irq, struct irte *entry) raw_spin_lock_irqsave(&irq_2_ir_lock, flags); + /* ensure we're not racing with free_irte() */ + if (unlikely(!irq_iommu->iommu)) { + raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags); + return -1; + } + index = irq_iommu->irte_index + irq_iommu->sub_handle; *entry = *(irq_iommu->iommu->ir_table->base + index); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] KVM: IOMMU: hva align mapping page size
When determining the page size we could use to map with the IOMMU, the page size should also be aligned with the hva, not just the gfn. The gfn may not reflect the real alignment within the hugetlbfs file. Signed-off-by: Greg Edwards Cc: sta...@vger.kernel.org --- virt/kvm/iommu.c | 4 1 file changed, 4 insertions(+) diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 72a130b..c329c8f 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -103,6 +103,10 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot) while ((gfn << PAGE_SHIFT) & (page_size - 1)) page_size >>= 1; + /* Make sure hva is aligned to the page size we want to map */ + while (__gfn_to_hva_memslot(slot, gfn) & (page_size - 1)) + page_size >>= 1; + /* * Pin all pages we are about to map in memory. This is * important because we unmap and unpin in 4kb steps later. -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] KVM: IOMMU: hva align mapping page size
When determining the page size we could use to map with the IOMMU, the page size should be aligned with the hva, not the gfn. The gfn may not reflect the real alignment within the hugetlbfs file. Most of the time, this works fine. However, if the hugetlbfs file is backed by non-contiguous huge pages, a multi-huge page memslot starts at an unaligned offset within the hugetlbfs file, and the gfn is aligned with respect to the huge page size, kvm_host_page_size() will return the huge page size and we will use that to map with the IOMMU. When we later unpin that same memslot, the IOMMU returns the unmap size as the huge page size, and we happily unpin that many pfns in monotonically increasing order, not realizing we are spanning non-contiguous huge pages and partially unpin the wrong huge page. Instead, ensure the IOMMU mapping page size is aligned with the hva corresponding to the gfn, which does reflect the alignment within the hugetlbfs file. Signed-off-by: Greg Edwards Cc: sta...@vger.kernel.org --- This resolves the bug previously reported (and misdiagnosed) here: http://www.spinics.net/lists/kvm/msg97599.html virt/kvm/iommu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 72a130b..0e2ff32 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -99,8 +99,8 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot) while ((gfn + (page_size >> PAGE_SHIFT)) > end_gfn) page_size >>= 1; - /* Make sure gfn is aligned to the page size we want to map */ - while ((gfn << PAGE_SHIFT) & (page_size - 1)) + /* Make sure hva is aligned to the page size we want to map */ + while (__gfn_to_hva_memslot(slot, gfn) & (page_size - 1)) page_size >>= 1; /* -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix mmap_kmem (was: [question] What's the difference between /dev/kmem and /dev/mem)
On Sat, Aug 13, 2005 at 08:18:07PM +0200, Arjan van de Ven wrote: | On Sat, 2005-08-13 at 10:37 -0700, Linus Torvalds wrote: | > Actually, the more I looked at that mmap_kmem() function, the less I liked | > it. Let's get that sucker fixed better first. It's still not wonderful, | > but at least now it tries to verify the whole _range_ of the mapping. | | actually if that is your goal this just isn't enough... assume the | situation of a 1 page "forbidden gap", if you mmap 3 pages with the gap | in the middle then the code you send still doesn't cope. At which | point... it gets messy... mmap_mem suffers from a lack of proper checks as well. For example, on Altix page 0 of each node is reserved for prom and a read or write to it will cause an MCA. mmaping /dev/mem with offset 0 will nicely explode. Would adding a pfn_valid test in mmap_mem be the best bet, or could we consolidate the checks currently in mmap_kmem into mmap_mem? Greg - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] CON_CONSDEV bit not set correctly on last console
On Wed, Apr 06, 2005 at 03:53:17PM -0700, Andrew Morton wrote: | Greg Edwards <[EMAIL PROTECTED]> wrote: | > | > According to include/linux/console.h, CON_CONSDEV flag should be set on | > the last console specified on the boot command line: | > | > 86 #define CON_PRINTBUFFER (1) | > 87 #define CON_CONSDEV (2) /* Last on the command line */ | > 88 #define CON_ENABLED (4) | > 89 #define CON_BOOT(8) | > | > This does not currently happen if there is more than one console specified | > on the boot commandline. Instead, it gets set on the first console on the | > command line. This can cause problems for things like kdb that look for | > the CON_CONSDEV flag to see if the console is valid. | > | > Additionaly, it doesn't look like CON_CONSDEV is reassigned to the next | > preferred console at unregister time if the console being unregistered | > currently has that bit set. | > | > Example (from sn2 ia64): | > | > elilo vmlinuz root= console=ttyS0 console=ttySG0 | > | > in this case, the flags on ttySG console struct will be 0x4 (should be | > 0x6). | > | > Attached patch against bk fixes both issues for the cases I looked at. It | > uses selected_console (which gets incremented for each console specified | > on the command line) as the indicator of which console to set CON_CONSDEV | > on. When adding the console to the list, if the previous one had | > CON_CONSDEV set, it masks it out. Tested on ia64 and x86. | | The `console=a console=b' behaviour seem basically random to me :(. And it | gets re-randomised on a regular basis. | | I wonder if we should leave the existing behaviour alone (continue to set | CON_CONSDEV on the first console) and just change the documentation? | That'll minimise the disruption which we cause. The problem with the current behavior is it breaks overriding the default from the boot line. In the ia64 case, there may be a global append line defining console=a in elilo.conf. Then you want to boot your kernel, and want to override the default by passing console=b on the boot line. elilo constructs the kernel cmdline by starting with the value of the global append line, then tacks on whatever else you specify, which puts console=b last. You can always edit the elilo.conf and change the global value, but that seems like a pretty big hammer for something you may just want to test. Greg - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on the last console specified on the boot command line: 86 #define CON_PRINTBUFFER (1) 87 #define CON_CONSDEV (2) /* Last on the command line */ 88 #define CON_ENABLED (4) 89 #define CON_BOOT(8) This does not currently happen if there is more than one console specified on the boot commandline. Instead, it gets set on the first console on the command line. This can cause problems for things like kdb that look for the CON_CONSDEV flag to see if the console is valid. Additionaly, it doesn't look like CON_CONSDEV is reassigned to the next preferred console at unregister time if the console being unregistered currently has that bit set. Example (from sn2 ia64): elilo vmlinuz root= console=ttyS0 console=ttySG0 in this case, the flags on ttySG console struct will be 0x4 (should be 0x6). Attached patch against bk fixes both issues for the cases I looked at. It uses selected_console (which gets incremented for each console specified on the command line) as the indicator of which console to set CON_CONSDEV on. When adding the console to the list, if the previous one had CON_CONSDEV set, it masks it out. Tested on ia64 and x86. Signed-off-by: Greg Edwards <[EMAIL PROTECTED]> printk.c | 12 ++-- 1 files changed, 10 insertions(+), 2 deletions(-) Index: bk-2.6/kernel/printk.c === --- bk-2.6.orig/kernel/printk.c 2005-04-05 10:55:19.258081161 -0500 +++ bk-2.6/kernel/printk.c 2005-04-05 11:50:16.949394443 -0500 @@ -861,8 +861,10 @@ void register_console(struct console * c break; console->flags |= CON_ENABLED; console->index = console_cmdline[i].index; - if (i == preferred_console) + if (i == selected_console) { console->flags |= CON_CONSDEV; + preferred_console = selected_console; + } break; } @@ -882,6 +884,8 @@ void register_console(struct console * c if ((console->flags & CON_CONSDEV) || console_drivers == NULL) { console->next = console_drivers; console_drivers = console; + if (console->next) + console->next->flags &= ~CON_CONSDEV; } else { console->next = console_drivers->next; console_drivers->next = console; @@ -922,10 +926,14 @@ int unregister_console(struct console * /* If last console is removed, we re-enable picking the first * one that gets registered. Without that, pmac early boot console * would prevent fbcon from taking over. +* +* If this isn't the last console and it has CON_CONSDEV set, we +* need to set it on the next preferred console. */ if (console_drivers == NULL) preferred_console = selected_console; - + else if (console->flags & CON_CONSDEV) + console_drivers->flags |= CON_CONSDEV; release_console_sem(); return res; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/