Re: [PATCH v2] fs: guard_bio_eod() needs to consider partitions

2017-11-01 Thread Greg Edwards
On Tue, Oct 24, 2017 at 11:21:48AM -0600, Greg Edwards wrote:
> guard_bio_eod() needs to look at the partition capacity, not just the
> capacity of the whole device, when determining if truncation is
> necessary.
>
> [   60.268688] attempt to access beyond end of device
> [   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
> [   60.268693] buffer_io_error: 2 callbacks suppressed
> [   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async 
> page read
>
> Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and 
> partitions index")
> Signed-off-by: Greg Edwards 
> ---
> Changes from v1:
>   * use __disk_get_part instead of disk_get_part, similar to what
> blk_partition_remap does

Al, Christoph,

Any thoughts on this version?

It would be nice to get this fixed before 4.14 goes out, as this is a
regression from previous releases.

Greg


[PATCH v2] fs: guard_bio_eod() needs to consider partitions

2017-10-24 Thread Greg Edwards
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.

[   60.268688] attempt to access beyond end of device
[   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[   60.268693] buffer_io_error: 2 callbacks suppressed
[   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page 
read

Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and 
partitions index")
Signed-off-by: Greg Edwards 
---
Changes from v1:
  * use __disk_get_part instead of disk_get_part, similar to what
blk_partition_remap does

 fs/buffer.c   | 10 +-
 include/linux/genhd.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 170df856bdb9..b96f3b98a6ef 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3055,8 +3055,16 @@ void guard_bio_eod(int op, struct bio *bio)
sector_t maxsector;
struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned truncated_bytes;
+   struct hd_struct *part;
+
+   rcu_read_lock();
+   part = __disk_get_part(bio->bi_disk, bio->bi_partno);
+   if (part)
+   maxsector = part_nr_sects_read(part);
+   else
+   maxsector = get_capacity(bio->bi_disk);
+   rcu_read_unlock();
 
-   maxsector = get_capacity(bio->bi_disk);
if (!maxsector)
return;
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ea652bfcd675..cb8c5fd74b71 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -242,6 +242,7 @@ static inline dev_t part_devt(struct hd_struct *part)
return part_to_dev(part)->devt;
 }
 
+extern struct hd_struct *__disk_get_part(struct gendisk *disk, int partno);
 extern struct hd_struct *disk_get_part(struct gendisk *disk, int partno);
 
 static inline void disk_put_part(struct hd_struct *part)
-- 
2.13.6



Re: [PATCH] fs: guard_bio_eod() needs to consider partitions

2017-10-24 Thread Greg Edwards
On Tue, Oct 24, 2017 at 09:46:15AM +0200, Christoph Hellwig wrote:
> On Tue, Oct 24, 2017 at 03:16:36AM +0100, Al Viro wrote:
>>> Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and 
>>> partitions index")
>>> Signed-off-by: Greg Edwards 
>>
>> That smells like a nasty source of overhead, especially on SMP boxen...
>
> We could just use __disk_get_part similar to the partition remapping,
> or just pass the block_device to guard_bio_eod.

Do you have a preference?  __disk_get_part would be isolated to
guard_bio_eod.  If we pass in the block_device, it would also touch all
the mpage_bio_submit callers.


[PATCH] fs: guard_bio_eod() needs to consider partitions

2017-10-23 Thread Greg Edwards
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.

[   60.268688] attempt to access beyond end of device
[   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[   60.268693] buffer_io_error: 2 callbacks suppressed
[   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page 
read

Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and 
partitions index")
Signed-off-by: Greg Edwards 
---
 fs/buffer.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 170df856bdb9..989d3b085efe 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3055,8 +3055,15 @@ void guard_bio_eod(int op, struct bio *bio)
sector_t maxsector;
struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned truncated_bytes;
+   struct hd_struct *part;
+
+   part = disk_get_part(bio->bi_disk, bio->bi_partno);
+   if (part)
+   maxsector = part_nr_sects_read(part);
+   else
+   maxsector = get_capacity(bio->bi_disk);
+   disk_put_part(part);
 
-   maxsector = get_capacity(bio->bi_disk);
if (!maxsector)
return;
 
-- 
2.13.6



[PATCH] mpt3sas: Fix resume on WarpDrive flash cards

2016-07-30 Thread Greg Edwards
mpt3sas crashes on resume after suspend with WarpDrive flash cards.  The
reply_post_host_index array is not set back up after the resume, and we
deference a stale pointer in _base_interrupt().

[   47.309711] BUG: unable to handle kernel paging request at c90001f8006c
[   47.318289] IP: [] _base_interrupt+0x49f/0xa30 [mpt3sas]
[   47.326749] PGD 41ccaa067 PUD 41ccab067 PMD 3466c067 PTE 0
[   47.333848] Oops: 0002 [#1] SMP
...
[   47.452708] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #6
[   47.460506] Hardware name: Dell Inc. OptiPlex 990/06D7TR, BIOS A18 09/24/2013
[   47.469629] task: 81c0d500 ti: 81c0 task.ti: 
81c0
[   47.479112] RIP: 0010:[]  [] 
_base_interrupt+0x49f/0xa30 [mpt3sas]
[   47.490466] RSP: 0018:88041d203e30  EFLAGS: 00010002
[   47.497801] RAX: 0001 RBX: 880033f4c000 RCX: 0001
[   47.506973] RDX: c90001f8006c RSI: 0082 RDI: 0082
[   47.516141] RBP: 88041d203eb0 R08: 8804118e2820 R09: 0001
[   47.525300] R10: 0001 R11: 100c R12: 
[   47.534457] R13: 880412c487e0 R14: 88041a8987d8 R15: 0001
[   47.543632] FS:  () GS:88041d20() 
knlGS:
[   47.553796] CS:  0010 DS:  ES:  CR0: 80050033
[   47.561632] CR2: c90001f8006c CR3: 01c06000 CR4: 000406f0
[   47.570883] Stack:
[   47.575015]  1d211228 88041d2100c0 8800c47d8130 
0100
[   47.584625]  8804100c 100c 88041a8992a0 
88041a8987f8
[   47.594230]  88041d203e00 8e55 038c 
880414ad4280
[   47.603862] Call Trace:
[   47.608474]  
[   47.610413]  [] ? call_timer_fn+0x35/0x120
[   47.620539]  [] handle_irq_event_percpu+0x7f/0x1c0
[   47.629061]  [] handle_irq_event+0x2c/0x50
[   47.636859]  [] handle_edge_irq+0x6f/0x130
[   47.644654]  [] handle_irq+0x73/0x120
[   47.652011]  [] ? atomic_notifier_call_chain+0x1a/0x20
[   47.660854]  [] do_IRQ+0x4b/0xd0
[   47.66]  [] common_interrupt+0x8c/0x8c
[   47.675635]  

Move the reply_post_host_index array setup into
mpt3sas_base_map_resources(), which is also in the resume path.

Cc: sta...@vger.kernel.org
Signed-off-by: Greg Edwards 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 751f13e..750f82c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2188,6 +2188,17 @@ mpt3sas_base_map_resources(struct MPT3SAS_ADAPTER *ioc)
} else
ioc->msix96_vector = 0;
 
+   if (ioc->is_warpdrive) {
+   ioc->reply_post_host_index[0] = (resource_size_t __iomem *)
+   &ioc->chip->ReplyPostHostIndex;
+
+   for (i = 1; i < ioc->cpu_msix_table_sz; i++)
+   ioc->reply_post_host_index[i] =
+   (resource_size_t __iomem *)
+   ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1)
+   * 4)));
+   }
+
list_for_each_entry(reply_q, &ioc->reply_queue_list, list)
pr_info(MPT3SAS_FMT "%s: IRQ %d\n",
reply_q->name,  ((ioc->msix_enable) ? "PCI-MSI-X enabled" :
@@ -5280,17 +5291,6 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
if (r)
goto out_free_resources;
 
-   if (ioc->is_warpdrive) {
-   ioc->reply_post_host_index[0] = (resource_size_t __iomem *)
-   &ioc->chip->ReplyPostHostIndex;
-
-   for (i = 1; i < ioc->cpu_msix_table_sz; i++)
-   ioc->reply_post_host_index[i] =
-   (resource_size_t __iomem *)
-   ((u8 __iomem *)&ioc->chip->Doorbell + (0x4000 + ((i - 1)
-   * 4)));
-   }
-
pci_set_drvdata(ioc->pdev, ioc->shost);
r = _base_get_ioc_facts(ioc, CAN_SLEEP);
if (r)
-- 
2.7.4



Re: [PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ

2014-07-29 Thread Greg Edwards
On Tue, Jul 29, 2014 at 12:45:31PM +0200, Joerg Roedel wrote:
> On Wed, Jul 23, 2014 at 10:13:26AM -0600, Greg Edwards wrote:
>> A user process setting the CPU affinity of an IRQ for a KVM
>> direct-assigned device via /proc/irq//smp_affinity can race with
>> the IRQ being released by QEMU, resulting in a NULL iommu pointer
>> dereference in get_irte().
>
> Maybe I wasn't clear enough, what I am missing is a panic message with a
> backtrace from the NULL pointer deref you are seeing in the commit
> message.

Sure, sorry I didn't send that along previously.  We originally saw this on a
3.9 kernel, then recently on a 3.14 stable kernel.  I hadn't reproduced it on
current tip yet.  Here it is for Linus' tree as of this morning:

[ 6638.327851] BUG: unable to handle kernel NULL pointer dereference at 
0090
[ 6638.335955] IP: [] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.343012] PGD 99172e067 PUD 1026979067 PMD 0 
[ 6638.347858] Oops:  [#1] SMP 
[ 6638.351386] Modules linked in:
[ 6638.354718] CPU: 1 PID: 3354 Comm: affin Not tainted 
3.16.0-rc7-7-g31dab71 #1
[ 6638.362480] Hardware name: Supermicro SYS-F617R2-RT+/X9DRFR, BIOS 3.0a 
01/29/2014
[ 6638.370259] task: 881025b0e720 ti: 88099173c000 task.ti: 
88099173c000
[ 6638.378063] RIP: 0010:[]  [] 
intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.387649] RSP: 0018:88099173fdb0  EFLAGS: 00010046
[ 6638.393314] RAX: 0082 RBX: 880a36294600 RCX: 0082
[ 6638.400824] RDX:  RSI:  RDI: 8266af00
[ 6638.408341] RBP: 88099173fdf8 R08:  R09: 88103ec00490
[ 6638.415867] R10:  R11:  R12: 88099173fe90
[ 6638.423402] R13: 005f R14: 880faa38fe80 R15: 880faa38fe80
[ 6638.430942] FS:  7f7161f05740() GS:88107fc4() 
knlGS:
[ 6638.439456] CS:  0010 DS:  ES:  CR0: 80050033
[ 6638.445640] CR2: 0090 CR3: 00099140d000 CR4: 001427e0
[ 6638.453227] Stack:
[ 6638.455699]  81c44740 88099173fdc8  
c991fd3b
[ 6638.463660]  880a36294600 88099173fe90 88099173fe90 

[ 6638.471623]  0286 88099173fe08 8190aac5 
88099173fe28
[ 6638.479605] Call Trace:
[ 6638.482562]  [] set_remapped_irq_affinity+0x25/0x40
[ 6638.489534]  [] irq_do_set_affinity+0x1c/0x50
[ 6638.495985]  [] irq_set_affinity_locked+0x98/0xd0
[ 6638.502781]  [] __irq_set_affinity+0x46/0x70
[ 6638.509142]  [] write_irq_affinity.isra.6+0xdc/0x100
[ 6638.516198]  [] irq_affinity_list_proc_write+0x1c/0x20
[ 6638.523440]  [] proc_reg_write+0x3d/0x80
[ 6638.529474]  [] vfs_write+0xb7/0x1f0
[ 6638.535165]  [] ? putname+0x29/0x40
[ 6638.540771]  [] SyS_write+0x55/0xd0
[ 6638.546386]  [] system_call_fastpath+0x16/0x1b
[ 6638.552970] Code: ff 48 85 d2 74 68 4c 8b 7a 30 4d 85 ff 74 5f 48 c7 c7 00 
af 66 82 e8 9e 1b 1d 00 49 8b 57 20 41 0f b7 77 28 48 c7 c7 00 af 66 82 <48> 8b 
8a 90 00 00 00 41 0f b7 57 2a 01 f2 48 89 c6 48 63 d2 48 
[ 6638.574199] RIP  [] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.581735]  RSP 
[ 6638.585872] CR2: 0090

> Also I am still wondering why it is possible to set affinity from
> userspace while the irq is about to be freed. Shouldn't the proc files
> are already unregistered when the irq is freed?

The QEMU thread is doing a KVM_DEASSIGN_DEV_IRQ ioctl.  It looks like the call
chain is:

  kvm_vm_ioctl_assigned_device
kvm_vm_ioctl_deassign_dev_irq
  kvm_deassign_irq
deassign_host_irq
  pci_disable_msix
free_msi_irqs
  arch_teardown_msi_irqs
default_teardown_msi_irqs
  arch_teardown_msi_irq
native_teardown_msi_irq
  irq_free_hwirq
irq_free_hwirqs

irq_free_hwirqs() does:

460 for (i = from, j = cnt; j > 0; i++, j--) {
461 irq_set_status_flags(i, _IRQ_NOREQUEST | _IRQ_NOPROBE);
462 arch_teardown_hwirq(i);   <--- QEMU is down this 
path to free_irte()
463 }
464 irq_free_descs(from, cnt);   < proc files unregistered 
down this path


I guess the question is if it would be safe to change that ordering?  I don't 
know.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ

2014-07-23 Thread Greg Edwards
A user process setting the CPU affinity of an IRQ for a KVM
direct-assigned device via /proc/irq//smp_affinity can race with
the IRQ being released by QEMU, resulting in a NULL iommu pointer
dereference in get_irte().

Signed-off-by: Greg Edwards 
---

Dropped the Cc: for stable since this likely wouldn't ever be seen in
the real world.  We saw it on an automated CI stress test.

 drivers/iommu/intel_irq_remapping.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 9b17489..d926676 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -70,6 +70,11 @@ static int get_irte(int irq, struct irte *entry)
 
raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
 
+   if (unlikely(!irq_iommu->iommu)) {
+   raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+   return -1;
+   }
+
index = irq_iommu->irte_index + irq_iommu->sub_handle;
*entry = *(irq_iommu->iommu->ir_table->base + index);
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] iommu/vt-d: fix race between free_irte() and get_irte()

2014-07-23 Thread Greg Edwards
On Wed, Jul 23, 2014 at 04:40:24PM +0200, Joerg Roedel wrote:
> On Tue, Jul 22, 2014 at 08:27:19AM -0600, Greg Edwards wrote:
>> get_irte() can race with free_irte() and dereference a NULL iommu
>> pointer.
>
> Have you seen any real occurance of this race? Get_irte is called in the
> set_affinity path, how can that race with the irq being freed?

Yes, that's how we hit it.  A process was setting the CPU affinity while
QEMU was releasing the IRQ.  We have a CI stress test that turned this
up.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] iommu/vt-d: fix race between free_irte() and get_irte()

2014-07-22 Thread Greg Edwards
get_irte() can race with free_irte() and dereference a NULL iommu
pointer.

Signed-off-by: Greg Edwards 
Cc: sta...@vger.kernel.org
---
 drivers/iommu/intel_irq_remapping.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 9b17489..2d67e6d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -70,6 +70,12 @@ static int get_irte(int irq, struct irte *entry)
 
raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
 
+   /* ensure we're not racing with free_irte() */
+   if (unlikely(!irq_iommu->iommu)) {
+   raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+   return -1;
+   }
+
index = irq_iommu->irte_index + irq_iommu->sub_handle;
*entry = *(irq_iommu->iommu->ir_table->base + index);
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] KVM: IOMMU: hva align mapping page size

2013-11-04 Thread Greg Edwards
When determining the page size we could use to map with the IOMMU, the
page size should also be aligned with the hva, not just the gfn.  The
gfn may not reflect the real alignment within the hugetlbfs file.

Signed-off-by: Greg Edwards 
Cc: sta...@vger.kernel.org
---
 virt/kvm/iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 72a130b..c329c8f 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -103,6 +103,10 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)
while ((gfn << PAGE_SHIFT) & (page_size - 1))
page_size >>= 1;
 
+   /* Make sure hva is aligned to the page size we want to map */
+   while (__gfn_to_hva_memslot(slot, gfn) & (page_size - 1))
+   page_size >>= 1;
+
/*
 * Pin all pages we are about to map in memory. This is
 * important because we unmap and unpin in 4kb steps later.
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] KVM: IOMMU: hva align mapping page size

2013-11-01 Thread Greg Edwards
When determining the page size we could use to map with the IOMMU, the
page size should be aligned with the hva, not the gfn.  The gfn may not
reflect the real alignment within the hugetlbfs file.

Most of the time, this works fine.  However, if the hugetlbfs file is
backed by non-contiguous huge pages, a multi-huge page memslot starts at
an unaligned offset within the hugetlbfs file, and the gfn is aligned
with respect to the huge page size, kvm_host_page_size() will return the
huge page size and we will use that to map with the IOMMU.

When we later unpin that same memslot, the IOMMU returns the unmap size
as the huge page size, and we happily unpin that many pfns in
monotonically increasing order, not realizing we are spanning
non-contiguous huge pages and partially unpin the wrong huge page.

Instead, ensure the IOMMU mapping page size is aligned with the hva
corresponding to the gfn, which does reflect the alignment within the
hugetlbfs file.

Signed-off-by: Greg Edwards 
Cc: sta...@vger.kernel.org
---
This resolves the bug previously reported (and misdiagnosed) here:

 http://www.spinics.net/lists/kvm/msg97599.html

 virt/kvm/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 72a130b..0e2ff32 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -99,8 +99,8 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)
while ((gfn + (page_size >> PAGE_SHIFT)) > end_gfn)
page_size >>= 1;
 
-   /* Make sure gfn is aligned to the page size we want to map */
-   while ((gfn << PAGE_SHIFT) & (page_size - 1))
+   /* Make sure hva is aligned to the page size we want to map */
+   while (__gfn_to_hva_memslot(slot, gfn) & (page_size - 1))
page_size >>= 1;
 
/*
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix mmap_kmem (was: [question] What's the difference between /dev/kmem and /dev/mem)

2005-08-16 Thread Greg Edwards
On Sat, Aug 13, 2005 at 08:18:07PM +0200, Arjan van de Ven wrote:
| On Sat, 2005-08-13 at 10:37 -0700, Linus Torvalds wrote:
| > Actually, the more I looked at that mmap_kmem() function, the less I liked 
| > it.  Let's get that sucker fixed better first. It's still not wonderful, 
| > but at least now it tries to verify the whole _range_ of the mapping.
| 
| actually if that is your goal this just isn't enough... assume the
| situation of a 1 page "forbidden gap", if you mmap 3 pages with the gap
| in the middle then the code you send still doesn't cope. At which
| point... it gets messy...

mmap_mem suffers from a lack of proper checks as well.  For example, on
Altix page 0 of each node is reserved for prom and a read or write to it
will cause an MCA.  mmaping /dev/mem with offset 0 will nicely explode.
Would adding a pfn_valid test in mmap_mem be the best bet, or could we
consolidate the checks currently in mmap_kmem into mmap_mem?

Greg
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CON_CONSDEV bit not set correctly on last console

2005-04-06 Thread Greg Edwards
On Wed, Apr 06, 2005 at 03:53:17PM -0700, Andrew Morton wrote:
| Greg Edwards <[EMAIL PROTECTED]> wrote:
| >
| > According to include/linux/console.h, CON_CONSDEV flag should be set on
| > the last console specified on the boot command line:
| > 
| >  86 #define CON_PRINTBUFFER (1)
| >  87 #define CON_CONSDEV (2) /* Last on the command line */
| >  88 #define CON_ENABLED (4)
| >  89 #define CON_BOOT(8)
| > 
| > This does not currently happen if there is more than one console specified
| > on the boot commandline.  Instead, it gets set on the first console on the
| > command line.  This can cause problems for things like kdb that look for
| > the CON_CONSDEV flag to see if the console is valid.
| > 
| > Additionaly, it doesn't look like CON_CONSDEV is reassigned to the next
| > preferred console at unregister time if the console being unregistered
| > currently has that bit set.
| > 
| > Example (from sn2 ia64):
| > 
| > elilo vmlinuz root= console=ttyS0 console=ttySG0
| > 
| > in this case, the flags on ttySG console struct will be 0x4 (should be
| > 0x6).
| > 
| > Attached patch against bk fixes both issues for the cases I looked at.  It
| > uses selected_console (which gets incremented for each console specified
| > on the command line) as the indicator of which console to set CON_CONSDEV
| > on.  When adding the console to the list, if the previous one had
| > CON_CONSDEV set, it masks it out.  Tested on ia64 and x86.
| 
| The `console=a console=b' behaviour seem basically random to me :(.  And it
| gets re-randomised on a regular basis.
| 
| I wonder if we should leave the existing behaviour alone (continue to set
| CON_CONSDEV on the first console) and just change the documentation? 
| That'll minimise the disruption which we cause.

The problem with the current behavior is it breaks overriding the default
from the boot line.  In the ia64 case, there may be a global append line
defining console=a in elilo.conf.  Then you want to boot your kernel, and
want to override the default by passing console=b on the boot line.
elilo constructs the kernel cmdline by starting with the value of the
global append line, then tacks on whatever else you specify, which puts
console=b last.

You can always edit the elilo.conf and change the global value, but that
seems like a pretty big hammer for something you may just want to test.

Greg
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] CON_CONSDEV bit not set correctly on last console

2005-04-05 Thread Greg Edwards
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:

 86 #define CON_PRINTBUFFER (1)
 87 #define CON_CONSDEV (2) /* Last on the command line */
 88 #define CON_ENABLED (4)
 89 #define CON_BOOT(8)

This does not currently happen if there is more than one console specified
on the boot commandline.  Instead, it gets set on the first console on the
command line.  This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.

Additionaly, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.

Example (from sn2 ia64):

elilo vmlinuz root= console=ttyS0 console=ttySG0

in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).

Attached patch against bk fixes both issues for the cases I looked at.  It
uses selected_console (which gets incremented for each console specified
on the command line) as the indicator of which console to set CON_CONSDEV
on.  When adding the console to the list, if the previous one had
CON_CONSDEV set, it masks it out.  Tested on ia64 and x86.

Signed-off-by: Greg Edwards <[EMAIL PROTECTED]>


 printk.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

Index: bk-2.6/kernel/printk.c
===
--- bk-2.6.orig/kernel/printk.c 2005-04-05 10:55:19.258081161 -0500
+++ bk-2.6/kernel/printk.c  2005-04-05 11:50:16.949394443 -0500
@@ -861,8 +861,10 @@ void register_console(struct console * c
break;
console->flags |= CON_ENABLED;
console->index = console_cmdline[i].index;
-   if (i == preferred_console)
+   if (i == selected_console) {
console->flags |= CON_CONSDEV;
+   preferred_console = selected_console;
+   }
break;
}
 
@@ -882,6 +884,8 @@ void register_console(struct console * c
if ((console->flags & CON_CONSDEV) || console_drivers == NULL) {
console->next = console_drivers;
console_drivers = console;
+   if (console->next)
+   console->next->flags &= ~CON_CONSDEV;
} else {
console->next = console_drivers->next;
console_drivers->next = console;
@@ -922,10 +926,14 @@ int unregister_console(struct console * 
/* If last console is removed, we re-enable picking the first
 * one that gets registered. Without that, pmac early boot console
 * would prevent fbcon from taking over.
+*
+* If this isn't the last console and it has CON_CONSDEV set, we
+* need to set it on the next preferred console.
 */
if (console_drivers == NULL)
preferred_console = selected_console;
-   
+   else if (console->flags & CON_CONSDEV)
+   console_drivers->flags |= CON_CONSDEV;
 
release_console_sem();
return res;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/