Re: alpha iommu fixes

2001-05-22 Thread Albert D. Cahalan

David S. Miller writes:

> What are these "devices", and what drivers "just program the cards to
> start the dma on those hundred mbyte of ram"?

Hmmm, I have a few cards that are used that way. They are used
for communication between nodes of a cluster.

One might put 16 cards in a system. The cards are quite happy to
do a 2 GB DMA transfer. Scatter-gather is possible, but it cuts
performance. Typically the driver would provide a huge chunk
of memory for an app to use, mapped using large pages on x86 or
using BAT registers on ppc. (reserved during boot of course)
The app would crunch numbers using the CPU (with AltiVec, VIS,
3dnow, etc.) and instruct the device to transfer data to/from
the memory region.

Remote nodes initiate DMA too, even supplying the PCI bus address
on both sides of the interconnect. :-) No IOMMU problems with
that one, eh? The other node may transfer data at will.






-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 10:24 PM +0100 2001-05-22, Alan Cox wrote:
>> On the main board, and not just the old ones. These days it's
>> typically in the chipset's south bridge. "Third-party DMA" is
>> sometimes called "fly-by DMA". The ISA card is a slave, as is memory,
>> and the DMA chip reads from one and writes to the other.
>
>There is also another mode which will give the Alpha kittens, I suspect. A
>few PCI cards do SB emulation by snooping the PCI bus. So the kernel writes
>to the ISA DMA controller, which does a pointless ISA transfer, and the PCI
>card sniffs the DMA controller setup (as it goes to PCI, then, when nobody
>claims it, on to the ISA bridge), then does bus-mastering DMA of its own to
>fake the ISA DMA.

That's sick.
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 2:02 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
>>  64KB for 8-bit DMA; 128KB for 16-bit DMA. [...]  This doesn't
>>  apply to bus-master DMA, just the legacy (8237) stuff.
>
>Would this 8237 be something on the ISA card, or something on
>the old pc mainboards?  I'm wondering if we can safely ignore
>this issue altogether here...

On the main board, and not just the old ones. These days it's 
typically in the chipset's south bridge. "Third-party DMA" is 
sometimes called "fly-by DMA". The ISA card is a slave, as is memory, 
and the DMA chip reads from one and writes to the other.

IDE didn't originally use DMA at all (but floppies did), just 
programmed IO. These days, PC chipsets mostly have some form of 
extended higher-performance DMA facilities for stuff like IDE, but 
I'm not really familiar with the details.

I do wish Linux didn't have so much PC legacy sh^Htuff 
embedded into the i386 architecture.

>  > There was also a 24-bit address limitation.
>
>Yes, that's in the number of address lines going to the isa card.
>We work around that one by having an iommu arena from 8M to 16M
>and forcing all ISA traffic to go through there.


-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-22 Thread Alan Cox

> On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
> > I'm also wondering if ISA needs the sg to start on a 64k boundary,
> Traditionally, ISA could not do DMA across a 64k boundary.

The ISA DMA controller on the x86 cannot cross a 64K boundary (128K for
16-bit) because it did not carry the 16-bit address into the top latch
byte.




Re: alpha iommu fixes

2001-05-22 Thread Alan Cox

> ISA cards can do sg?

The AHA1542 SCSI controller, for one. It wasn't that uncommon.



Re: alpha iommu fixes

2001-05-22 Thread Pavel Machek

Hi!

>  > > [..]  Even sparc64's fancy
>  > > iommu-based pci_map_single() always succeeds.
>  > 
>  > Whatever sparc64 does to hide the driver bugs you can break it if you
>  > pci_map 4G+1 bytes of physical memory.
> 
> Which is an utterly stupid thing to do.
> 
> Please construct a plausible situation where this would occur legally
> and not be a driver bug, given the maximum number of PCI busses and
> slots found on sparc64 and the maximum _concurrent_ usage of PCI dma
> space for any given driver (which isn't doing something stupid).

What stops you from plugging PCI-to-PCI bridges in order to create
some large number of slots, like 128?
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
> 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...]  This doesn't
> apply to bus-master DMA, just the legacy (8237) stuff.

Would this 8237 be something on the ISA card, or something on
the old pc mainboards?  I'm wondering if we can safely ignore
this issue altogether here...

> There was also a 24-bit address limitation.

Yes, that's in the number of address lines going to the isa card.
We work around that one by having an iommu arena from 8M to 16M
and forcing all ISA traffic to go through there.


r~



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 04:40:17PM -0400, Jeff Garzik wrote:
> ISA cards can do sg?

No, but the host iommu can.  The isa card sees whatever
view of memory presented to it by the iommu.
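
In other words, the scatter/gather happens in the bridge, not on the card:
consecutive IOMMU page-table entries map a run of contiguous bus addresses
onto arbitrary physical pages, so the device sees a single linear buffer.
A toy model of that translation (hypothetical structure and names, nothing
Alpha-specific beyond the 8KB page size):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 13                   /* Alpha uses 8KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Toy IOMMU arena: ptes[i] holds the physical page frame that bus
 * page (dma_base + i*PAGE_SIZE) translates to. */
struct arena {
	uint32_t dma_base;
	uint32_t ptes[16];
};

/* Map n scattered physical pages at consecutive bus addresses,
 * returning the single contiguous bus address the device will use. */
static uint32_t iommu_map_sg(struct arena *a, const uint32_t *phys_pages, int n)
{
	for (int i = 0; i < n; ++i)
		a->ptes[i] = phys_pages[i] >> PAGE_SHIFT;
	return a->dma_base;             /* device sees one linear region */
}

/* Translate a bus address back to physical, as the IOMMU would. */
static uint32_t iommu_translate(const struct arena *a, uint32_t bus)
{
	uint32_t idx = (bus - a->dma_base) >> PAGE_SHIFT;
	return (a->ptes[idx] << PAGE_SHIFT) | (bus & (PAGE_SIZE - 1));
}
```

The card never knows the pages were discontiguous; it just bursts through
the bus-address range and the bridge consults the PTEs on the fly.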


r~



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 1:28 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
>>  I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
>Traditionally, ISA could not do DMA across a 64k boundary.
>
>The only ISA card I have (a soundblaster compatible) appears
>to work without caring for this, but I suppose we should pay
>lip service to pedantics.

64KB for 8-bit DMA; 128KB for 16-bit DMA. It's a limitation of the 
legacy third-party-DMA controllers, which had only 16-bit address 
registers (the high part of the address lives in a non-counting 
register). This doesn't apply to bus-master DMA, just the legacy 
(8237) stuff. There was also a 24-bit address limitation.
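
That crossing rule can be checked mechanically: since the 8237 reloads only
the low 16 address bits while the page register holding the high bits does
not count, a transfer must stay inside one 64KB page (128KB for the 16-bit
channels) and below 16MB. A minimal illustrative sketch (not kernel code;
the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* An 8237 channel reloads only the low 16 address bits; the page
 * register holding bits 16-23 does not increment.  A transfer is
 * therefore legal only if it stays within one 64KB page (128KB for
 * the 16-bit channels) and entirely below 16MB (24-bit limit). */
static int isa_dma_ok(uint32_t bus_addr, uint32_t len, int is_16bit)
{
	uint32_t page = is_16bit ? 0x20000 : 0x10000;

	if (len == 0 || bus_addr + len > 0x1000000)  /* 24-bit address limit */
		return 0;
	/* crossing test: first and last byte must share the same page */
	return (bus_addr & ~(page - 1)) == ((bus_addr + len - 1) & ~(page - 1));
}
```

A driver doing legacy DMA would bounce-buffer any request for which this
check fails.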
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
> I'm also wondering if ISA needs the sg to start on a 64k boundary,

Traditionally, ISA could not do DMA across a 64k boundary.

The only ISA card I have (a soundblaster compatible) appears
to work without caring for this, but I suppose we should pay
lip service to pedantics.


r~



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 11:12 PM +1200 2001-05-22, Chris Wedgwood wrote:
>On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
>
> Electrically (someone correct me, I'm probably wrong) PCI is
> limited to 6 physical plug-in slots I believe, let's say it's 8
> to choose an arbitrary larger number to be safe.
>
>Minor nit... it can in fact be higher than this, but typically it is
>not. CompactPCI implementations may go higher (different electrical
>characteristics allow for this).

Compact PCI specifies a max of 8 slots (one of which is typically the 
system board). Regular PCI doesn't have a hard and fast slot limit 
(except for the logical limit of 32 devices per bus); the limits are 
driven by electrical loading concerns. As I recall, a bus of typical 
length can accommodate 10 "loads", where a load is either a device 
pin or a slot connector (that is, an expansion card counts as two 
loads, one for the device and one for the connector). (I take this to 
be a rule of thumb, not a hard spec, based on the detailed electrical 
requirements in the PCI spec.)

Still, the presence of bridges opens up the number of devices on a 
root PCI bus to a very high number, logically. Certainly having three 
or four quad Ethernet cards, so 12 or 16 devices, is a plausible 
configuration. As for bandwidth, a 64-bit, 66 MHz PCI bus has a nominal
burst bandwidth of 533 MB/s, which would be saturated by 20 full-duplex
100baseT ports that were themselves saturated in both directions
(all ignoring overhead). Full saturation is not reasonable
for either PCI or Ethernet; I'm just looking at order-of-magnitude 
numbers here.
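
The order-of-magnitude claim is easy to verify: 8 bytes per clock at
66 MHz gives 528 MB/s (commonly rounded to 533), while 20 ports at
100 Mb/s each way carry 500 MB/s. A quick sketch of that arithmetic
(helper names are made up):

```c
#include <assert.h>

/* Nominal burst bandwidth of a PCI bus, in MB/s: bytes transferred
 * per clock times the clock rate in MHz. */
static long pci_burst_mb_s(int bus_bytes, int clock_mhz)
{
	return (long)bus_bytes * clock_mhz;
}

/* Aggregate full-duplex Ethernet load, in MB/s: each port carries
 * mbit_s in each direction; divide by 8 bits per byte. */
static long enet_full_duplex_mb_s(int ports, int mbit_s)
{
	return (long)ports * 2 * mbit_s / 8;
}
```

So 20 saturated ports sit just under the nominal bus limit, which is the
point: the comparison only works if nothing else ever touches the bus.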

The bottom line is: don't make any hard and fast assumption about the 
number of devices connected to a root PCI bus.
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Tue, May 22, 2001 at 07:55:18PM +0400, Ivan Kokshaysky wrote:
> Yes. Though those races more likely would cause silent data
> corruption, but not immediate crash.

Ok. I wasn't sure if it was crashing or not for you.

Andrea



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Tue, May 22, 2001 at 06:44:09PM +0400, Ivan Kokshaysky wrote:
> On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
> > Ivan could you test the above fix on the platforms that needs the
> > align_entry hack?
> 
> That was one of the first things I noticed, and I've tried exactly
> that (2 instead of ~1UL).

Just in case (I guess it wouldn't matter much), are you sure you also
tried it with the locking fixes applied?

Andrea



Re: alpha iommu fixes

2001-05-22 Thread Ivan Kokshaysky

On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
> Ivan could you test the above fix on the platforms that needs the
> align_entry hack?

That was one of the first things I noticed, and I've tried exactly
that (2 instead of ~1UL).
No, it wasn't the cause of the crashes on pyxis, so I left it as is.
But it's probably worth changing, at least for correctness.

Ivan.



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

While merging all the recent fixes into my tree, and while reserving the
pci32 space above -1M to get a dynamic window of almost 1G without
dropping the direct window, I noticed and fixed a severe bug. So now I
am wondering whether the real reason for the crash when an invalid entry
is cached in the TLB and we do DMA through it (on the ES40 and, according
to Ivan, on other platforms as well) could simply be this software bug:

	for (i = 0; i < n; ++i)
		ptes[p+i] = ~1UL;

Here we reserve entries by also setting all the bits above 31 to 1. The
Tsunami specs say that bits 32 through 63 _must_ be zero, so the above is
definitely buggy. Maybe this is related to the fact that the crashes
triggered on >=4G machines.

I will change it to:

	for (i = 0; i < n; ++i)
		ptes[p+i] = 0x2;

which is obviously correct for our internal management of the allocation
in the critical sections, and which is a definitely necessary fix
according to the specs. Maybe this is the right(tm) fix; then I can drop
the artificial alignment, and the Tsunami will re-fetch the PTE from
memory automatically when we do I/O through an invalid PTE. If the
Tsunami is fixed by it, I bet we can then drop the align_entry field from
the pci_iommu_arena structure altogether, and what was referred to as a
hardware bug on the other platforms would in fact be a software bug in
the iommu code.

I am optimistic that this is the definitive fix, so for now I will leave
out the so-far-absolutely-necessary artificial alignment on the Tsunami
and put in this critical fix (until I get confirmation); if it works, I
will drop the align_entry field from the pci_iommu_arena structure
altogether.

Ivan could you test the above fix on the platforms that needs the
align_entry hack?

Andrea



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote:
> diff -ruNp linux/arch/alpha/kernel/pci_iommu.c 
>linux-new/arch/alpha/kernel/pci_iommu.c
> --- linux/arch/alpha/kernel/pci_iommu.c   Fri Mar  2 11:12:07 2001
> +++ linux-new/arch/alpha/kernel/pci_iommu.c   Mon May 21 01:25:25 2001
> @@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
>   paddr &= ~PAGE_MASK;
>   npages = calc_npages(paddr + size);
>   dma_ofs = iommu_arena_alloc(arena, npages);
> - if (dma_ofs < 0)
> - return -1;
> + if (dma_ofs < 0) {
> + /* If we attempted a direct map above but failed, die.  */
> + if (leader->dma_address == 0)
> + return -1;
> +
> + /* Otherwise, break up the remaining virtually contiguous
> +hunks into individual direct maps.  */
> + for (sg = leader; sg < end; ++sg)
> + if (sg->dma_address == 2 || sg->dma_address == -2)
 should be == 1

> + sg->dma_address = 0;
> +
> + /* Retry.  */
> + return sg_fill(leader, end, out, arena, max_dma);
> + }
>  
>   out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
>   out->dma_length = size;

I am going to merge this one (however, it won't help on the big-memory
machines; it will only hide the problem on machines without much memory
above 2G).

Andrea



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote:
 diff -ruNp linux/arch/alpha/kernel/pci_iommu.c 
linux-new/arch/alpha/kernel/pci_iommu.c
 --- linux/arch/alpha/kernel/pci_iommu.c   Fri Mar  2 11:12:07 2001
 +++ linux-new/arch/alpha/kernel/pci_iommu.c   Mon May 21 01:25:25 2001
 @@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
   paddr = ~PAGE_MASK;
   npages = calc_npages(paddr + size);
   dma_ofs = iommu_arena_alloc(arena, npages);
 - if (dma_ofs  0)
 - return -1;
 + if (dma_ofs  0) {
 + /* If we attempted a direct map above but failed, die.  */
 + if (leader-dma_address == 0)
 + return -1;
 +
 + /* Otherwise, break up the remaining virtually contiguous
 +hunks into individual direct maps.  */
 + for (sg = leader; sg  end; ++sg)
 + if (sg-dma_address == 2 || sg-dma_address == -2)
 should be == 1

 + sg-dma_address = 0;
 +
 + /* Retry.  */
 + return sg_fill(leader, end, out, arena, max_dma);
 + }
  
   out-dma_address = arena-dma_base + dma_ofs*PAGE_SIZE + paddr;
   out-dma_length = size;

I am going to merge this one (however it won't help on the big memory
machines, it will only try to hide the problem on the machines with not
much memory above 2G).

Andrea
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Tue, May 22, 2001 at 06:44:09PM +0400, Ivan Kokshaysky wrote:
 On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
  Ivan could you test the above fix on the platforms that needs the
  align_entry hack?
 
 That was one of the first things I noticed, and I've tried exactly
 that (2 instead of ~1UL).

just in case (I guess it wouldn't matter much but), but are you sure you
tried it with also the locking fixes applied too?

Andrea
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...]  This doesn't
 apply to bus-master DMA, just the legacy (8237) stuff.

Would this 8237 be something on the ISA card, or something on
the old pc mainboards?  I'm wondering if we can safely ignore
this issue altogether here...

 There was also a 24-bit address limitation.

Yes, that's in the number of address lines going to the isa card.
We work around that one by having an iommu arena from 8M to 16M
and forcing all ISA traffic to go through there.


r~
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Pavel Machek

Hi!

[..]  Even sparc64's fancy
iommu-based pci_map_single() always succeeds.
   
   Whatever sparc64 does to hide the driver bugs you can break it if you
   pci_map 4G+1 bytes of phyical memory.
 
 Which is an utterly stupid thing to do.
 
 Please construct a plausable situation where this would occur legally
 and not be a driver bug, given the maximum number of PCI busses and
 slots found on sparc64 and the maximum _concurrent_ usage of PCI dma
 space for any given driver (which isn't doing something stupid).

What stops you from plugging PCI-to-PCI bridges in order to create
some large number of slots, like 128?
Pavel
-- 
I'm [EMAIL PROTECTED] In my country we have almost anarchy and I don't care.
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Albert D. Cahalan

David S. Miller writes:

 What are these devices, and what drivers just program the cards to
 start the dma on those hundred mbyte of ram?

Hmmm, I have a few cards that are used that way. They are used
for communication between nodes of a cluster.

One might put 16 cards in a system. The cards are quite happy to
do a 2 GB DMA transfer. Scatter-gather is possible, but it cuts
performance. Typically the driver would provide a huge chunk
of memory for an app to use, mapped using large pages on x86 or
using BAT registers on ppc. (reserved during boot of course)
The app would crunch numbers using the CPU (with AltiVec, VIS,
3dnow, etc.) and instruct the device to transfer data to/from
the memory region.

Remote nodes initiate DMA too, even supplying the PCI bus address
on both sides of the interconnect. :-) No IOMMU problems with
that one, eh? The other node may transfer data at will.






-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 11:12 PM +1200 2001-05-22, Chris Wedgwood wrote:
On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:

 Electrically (someone correct me, I'm probably wrong) PCI is
 limited to 6 physical plug-in slots I believe, let's say it's 8
 to choose an arbitrary larger number to be safe.

Minor nit... it can in fact be higher than this, but typically it is
not. CompactPCI implementations may go higher (different electrical
characteristics allow for this).

Compact PCI specifies a max of 8 slots (one of which is typically the 
system board). Regular PCI doesn't have a hard and fast slot limit 
(except for the logical limit of 32 devices per bus); the limits are 
driven by electrical loading concerns. As I recall, a bus of typical 
length can accommodate 10 loads, where a load is either a device 
pin or a slot connector (that is, an expansion card counts as two 
loads, one for the device and one for the connector). (I take this to 
be a rule of thumb, not a hard spec, based on the detailed electrical 
requirements in the PCI spec.)

Still, the presence of bridges opens up the number of devices on a 
root PCI bus to a very high number, logically. Certainly having three 
or four quad Ethernet cards, so 12 or 16 devices, is a plausible 
configuration. As for bandwidth, a 64x66 PCI bus has a nominal burst 
bandwidth of 533 MB/second, which would be saturated by 20 full 
duplex 100baseT ports that were themselves saturated in both 
directions (all ignoring overhead). Full saturation is not reasonable 
for either PCI or Ethernet; I'm just looking at order-of-magnitude 
numbers here.

The bottom line is: don't make any hard and fast assumption about the 
number of devices connected to a root PCI bus.
-- 
/Jonathan Lundell.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Alan Cox

 ISA cards can do sg?

AHA1542 scsi for one. It wasnt that uncommon.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
 I'm also wondering if ISA needs the sg to start on a 64k boundary,

Traditionally, ISA could not do DMA across a 64k boundary.

The only ISA card I have (a soundblaster compatible) appears
to work without caring for this, but I suppose we should pay
lip service to pedantics.


r~
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

On Tue, May 22, 2001 at 07:55:18PM +0400, Ivan Kokshaysky wrote:
 Yes. Though those races more likely would cause silent data
 corruption, but not immediate crash.

Ok. I wasn't sure if it was crashing or not for you.

Andrea
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 10:24 PM +0100 2001-05-22, Alan Cox wrote:
   On the main board, and not just the old ones. These days it's
  typically in the chipset's south bridge. Third-party DMA is
  sometimes called fly-by DMA. The ISA card is a slave, as is memory,
  and the DMA chip reads from one ands writes to the other.

There is also another mode which will give the Alpha kittens I suspect. A
few PCI cards do SB emulation by snooping the PCI bus. So the kernel writes
to the ISA DMA controller which does a pointless ISA transfer and the PCI
card sniffs the DMA controller setup (as it goes to pci, then when nobody
claims it on to the isa bridge) then does bus mastering DMA of its own to fake
the ISA dma

That's sick.
-- 
/Jonathan Lundell.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Richard Henderson

On Tue, May 22, 2001 at 04:40:17PM -0400, Jeff Garzik wrote:
 ISA cards can do sg?

No, but the host iommu can.  The isa card sees whatever
view of memory presented to it by the iommu.


r~
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Alan Cox

 On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
  I'm also wondering if ISA needs the sg to start on a 64k boundary,
 Traditionally, ISA could not do DMA across a 64k boundary.

The ISA dmac on the x86 needs a 64K boundary (128K for 16bit) because it
did not carry the 16 bit address to the top latch byte. 

 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Ivan Kokshaysky

On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
 Ivan could you test the above fix on the platforms that needs the
 align_entry hack?

That was one of the first things I noticed, and I've tried exactly
that (2 instead of ~1UL).
No, it wasn't the cause of the crashes on pyxis, so I left it as is.
But probably it worth to be changed, at least for correctness.

Ivan.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Andrea Arcangeli

While merging all the recent fixes in my tree and while reserving the
pci32 space above -1M to have a dynamic window of almost 1G without
dropping down the direct window, I noticed and fixed a severe bug, and
so now I started to wonder if the real reason of the crash when an
invalid entry is cached in the tlb and we do dma through it (both of
es40 and other platforms as well according to Ivan) could be just this
new software bug:

for (i = 0; i  n; ++i)
ptes[p+i] = ~1UL;

we reserve by setting also all the bits over 31 to 1. The tsunami specs
says that bits between 32 and 63 _must_ be zero, so the above is
definitely buggy. Maybe this has relactions with the fact the crashes
triggered on =4G machines.

I will change it to:

for (i = 0; i  n; ++i)
ptes[p+i] = 0x2;

which is just obviously correct for our internal management of the
allocation in the critical sections and that is a definitely necessary
fix according to the specs. Maybe this is the right(tm) fix and then I
can drop the artificial alignment and the tsunami will go to re-fetch
the pte on memory automatically when we do the I/O through an invalid
pte then. If tsunami gets fixed by it I can bet then we can drop the
align_entry field from the pci_iommu_arena structure all together and
what was referred as hardware bug for the other platforms would be
infact a software bug in the iommu code.

I am optimistic this is the definitive fix so I will left out the
so far absolutely necessary artifical alignment on the tsunami for now
and I will put in this critical fix for now (until I get the confirm),
and if it works I will drop the align_entry field all together from the
pci_iommu_arena structure.

Ivan could you test the above fix on the platforms that needs the
align_entry hack?

Andrea
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 2:02 PM -0700 2001-05-22, Richard Henderson wrote:
On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
  64KB for 8-bit DMA; 128KB for 16-bit DMA. [...]  This doesn't
  apply to bus-master DMA, just the legacy (8237) stuff.

Would this 8237 be something on the ISA card, or something on
the old pc mainboards?  I'm wondering if we can safely ignore
this issue altogether here...

On the main board, and not just the old ones. These days it's 
typically in the chipset's south bridge. Third-party DMA is 
sometimes called fly-by DMA. The ISA card is a slave, as is memory, 
and the DMA chip reads from one ands writes to the other.

IDE didn't originally use DMA at all (but floppies did), just 
programmed IO. These days, PC chipsets mostly have some form of 
extended higher-performance DMA facilities for stuff like IDE, but 
I'm not really familiar with the details.

asideI do wish Linux didn't have so much PC legacy sh^Htuff 
embedded into the i386 architecture./aside

   There was also a 24-bit address limitation.

Yes, that's in the number of address lines going to the isa card.
We work around that one by having an iommu arena from 8M to 16M
and forcing all ISA traffic to go through there.


-- 
/Jonathan Lundell.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-22 Thread Jonathan Lundell

At 1:28 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
>>  I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
>Traditionally, ISA could not do DMA across a 64k boundary.
>
>The only ISA card I have (a soundblaster compatible) appears
>to work without caring for this, but I suppose we should pay
>lip service to pedantics.

64KB for 8-bit DMA; 128KB for 16-bit DMA. It's a limitation of the 
legacy third-party-DMA controllers, which had only 16-bit address 
registers (the high part of the address lives in a non-counting 
register). This doesn't apply to bus-master DMA, just the legacy 
(8237) stuff. There was also a 24-bit address limitation.
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote:
> should probably just go ahead and allocate the 512M or 1G
> scatter-gather arena.

I just got a bug report in my mailbox about pci_map failures even after
I enlarged the window to 1G, argh (at first it looked apparently stable
after growing the window), so I'm stuck again. It seems I was right in not
being careless about the pci_map_* bugs today, even if the 1G window
looked to offer a reasonable margin at first.

The pci_map_* failure triggers during a benchmark with a certain driver
that does massive DMA (similar to the examples I gave previously). The
developers of the driver simply told me the hardware wants to do massive
zerocopy DMA to userspace, and they apparently ruled out a memleak in
the driver missing some pci_unmap_* after I told them to check for
that. Even enabling HIGHMEM would not be enough, because they do DMA
on userspace memory on the network side, so it won't be taken care of
by create_bounces(); I would at least need to put another bounce
buffer layer in the driver to make highmem work.

Other more efficient ways to go besides highmem plus additional bounce
buffer layer are:

2) fixing all buggy drivers now (would be a great pain as it seems to me
   I should do that alone apparently as it seems everybody else doesn't
   care about those bugs for 2.4)
3) let the "massive DMA" hardware use DAC

Theoretically I could also cheat again and take way 4), that is, try
to enlarge the window beyond 1G and see if the bugs get hidden also
during the benchmark that way, but I would take this as a last resort as
this would again not be a definitive solution and I'd risk getting stuck
again tomorrow like I am right now.

I think I will prefer to take the dirty way 3) just for those drivers to
solve this production problem, even if it won't be implemented in a
generic manner at first (I got the idea from the Quadrics folks, who do
this just now with their nics, if I understood well).

If I understand correctly, on the tsunami enabling DAC simply means
enabling the pchip->pctl |= MWIN (monster window) bit during the boot
stage on both pchips.

Then the device driver of the "massive DMA" hardware should simply
program the registers of the nic to use DAC with bus addresses that
are the phys addresses of the destination/source memory of the DMA,
only changed to have bit 40 set to 1. Those should be all the changes
necessary to make pci64 work on tsunami at the same time as the
pci32 direct/dynamic windows, and it would be very efficient; it
sounds like the best way to work around the broken pci_map_* in 2.4,
given that fixing pci_map_* the right way is a pain.

Andrea



Re: alpha iommu fixes

2001-05-21 Thread Jens Axboe

On Mon, May 21 2001, Andi Kleen wrote:
> On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote:
> >  > That's currently the case, but at least on IA32 the block layer
> >  > must be fixed soon because it's a serious performance problem in
> >  > some cases (and fixing it is not very hard).
> > 
> > If such a far reaching change goes into 2.4.x, I would probably
> > begin looking at enhancing the PCI dma interfaces as needed ;-)
> 
> Hmm, I don't think it'll be a far reaching change. As far as I can see 
> all it needs is a new entry point for block device drivers that uses 
> bh->b_page. When that entry point exists skip the create_bounce call 
> in __make_request. After that it is purely problem for selected drivers.

I've already done it, however not as a 2.4 solution. The partial and WIP
patches are here:

*.kernel.org/pub/linux/kernel/people/axboe/v2.5/bio-7

Block drivers can indicate the need for bounce buffers above a certain
page.

Of course I can hack up something for 2.4 as well, but is this really a
pressing need?

-- 
Jens Axboe




Re: alpha iommu fixes

2001-05-21 Thread Richard Henderson

On Mon, May 21, 2001 at 03:51:51PM +0400, Ivan Kokshaysky wrote:
> I'm unable to reproduce it with an *8Mb* window, so I'm asking.

Me neither.  But Tom Vier, the guy who started this thread,
was able to use up the 8MB.  Which is completely believable.

The following should alleviate the situation on these smaller
machines where the direct map does cover all physical memory.
Really, we were failing gratuitously before.

On Tsunami and Titan, especially with more than 4G of ram, we
should probably just go ahead and allocate the 512M or 1G
scatter-gather arena.

(BTW, Andrea, it's easy enough to work around the Cypress
problem by marking the last 1M of the 1G arena in use.)


r~



diff -ruNp linux/arch/alpha/kernel/pci_iommu.c linux-new/arch/alpha/kernel/pci_iommu.c
--- linux/arch/alpha/kernel/pci_iommu.c Fri Mar  2 11:12:07 2001
+++ linux-new/arch/alpha/kernel/pci_iommu.c Mon May 21 01:25:25 2001
@@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
paddr &= ~PAGE_MASK;
npages = calc_npages(paddr + size);
dma_ofs = iommu_arena_alloc(arena, npages);
-   if (dma_ofs < 0)
-   return -1;
+   if (dma_ofs < 0) {
+   /* If we attempted a direct map above but failed, die.  */
+   if (leader->dma_address == 0)
+   return -1;
+
+   /* Otherwise, break up the remaining virtually contiguous
+  hunks into individual direct maps.  */
+   for (sg = leader; sg < end; ++sg)
+   if (sg->dma_address == 2 || sg->dma_address == -2)
+   sg->dma_address = 0;
+
+   /* Retry.  */
+   return sg_fill(leader, end, out, arena, max_dma);
+   }
 
out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
out->dma_length = size;



Re: alpha iommu fixes

2001-05-21 Thread Ivan Kokshaysky

On Mon, May 21, 2001 at 06:55:29AM -0700, Jonathan Lundell wrote:
> 8 slots (and  you're right, 6 is a practical upper limit, fewer for 
> 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot 
> limit becomes largely irrelevant.

True, but the bandwidth limit is highly relevant. That's why modern
systems have multiple root buses, not bridged ones.

Ivan.



Re: alpha iommu fixes

2001-05-21 Thread Jonathan Lundell

At 3:19 AM -0700 2001-05-21, David S. Miller wrote:
>This is totally wrong in two ways.
>
>Let me fix this, the IOMMU on these machines is per PCI bus, so this
>figure should be drastically lower.
>
>Electrically (someone correct me, I'm probably wrong) PCI is limited
>to 6 physical plug-in slots I believe, let's say it's 8 to choose an
>arbitrary larger number to be safe.
>
>Then we have:
>
>max bytes per bttv: max_gbuffers * max_gbufsize
>   64   * 0x208000  == 133.12MB
>
>133.12MB * 8 PCI slots == ~1.06 GB
>
>Which is still only half of the total IOMMU space available per
>controller.

8 slots (and  you're right, 6 is a practical upper limit, fewer for 
66 MHz) *per bus*. Buses can proliferate like crazy, so the slot 
limit becomes largely irrelevant. A typical quad Ethernet card, for 
example (and this is true for many/most multiple-device cards), has a 
bridge, its own internal PCI bus, and four "slots" ("devices" in PCI 
terminology).
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-21 Thread Peter Rival

Andrea Arcangeli wrote:

> On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote:
> > How many physical PCI slots on a Tsunami system?  (I know the
>
> on tsunamis probably not many, but on a Typhoon (the one in the es40
> that is the 4-way extension) I don't know, but certainly the box is
> large.
>

ES40 has either 8 or 10 PCI slots across 2 PCI buses.  And then there's
Wildfire - 14 slots per PCI drawer (4 PCI buses) * 2 drawers/QBB * 8 QBBs =
224 PCI slots & 64 PCI buses.  BTW, Titan (aka ES45) has 10 slots as well,
but with 3 buses instead.

 - Pete




Re: alpha iommu fixes

2001-05-21 Thread Ivan Kokshaysky

On Mon, May 21, 2001 at 01:19:59PM +0200, Andrea Arcangeli wrote:
> Alpha in mainline is just screwedup if a single pci bus tries to dynamic
> map more than 128mbyte, changing it to 512mbyte is trivial, growing more

Could you just describe the configuration where increasing the sg window
from 128 to 512Mb actually fixes the "out of ptes" problem? I mean which
drivers are involved, what kind of load, etc.
I'm unable to reproduce it with an *8Mb* window, so I'm asking.

Ivan.



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
 > How about a new function (pci_nonrepresentable_address() or whatever) 
 > that returns true when page cache contains pages that are not representable
 > physically as void *. On IA32 it would return true only if CONFIG_PAE is 
 > true and there is memory >4GB. 

No, if we're going to change anything, let's do it right.

Sure, you'll make this one check "portable", but the guts of the
main ifdef stuff for DAC support is still there.

I'd rather live with the hackish stuff temporarily, and get this all
cleaned up in one shot when we have a real DAC support API.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote:
> How many physical PCI slots on a Tsunami system?  (I know the

on tsunamis probably not many, but on a Typhoon (the one in the es40
that is the 4-way extension) I don't know, but certainly the box is
large.

Andrea



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:59:58AM -0700, David S. Miller wrote:
> This still leaves around 800MB IOMMU space free on that sparc64 PCI
> controller.

if it was 400mbyte you would be screwed too. The point here is that the
margin is way too small to allow ignoring the issue completely; furthermore
there can be fragmentation effects in the pagetables, at least in the way
alpha manages them, which is to find contiguous virtual pci bus addresses
for each sg. Alpha in mainline is just screwed if a single pci bus tries to
dynamically map more than 128mbyte. Changing it to 512mbyte is trivial, but
growing more has performance implications as it needs to reduce the direct
windows, which I don't like as it would also increase the number of machines
that get bitten by drivers that still use virt_to_bus, and also
increase the pressure on the iommu ptes.

Now, I'm not asking to break the API for 2.4 to take care of that; you
seem convinced about fixing this for 2.5 and I'm ok with that.
I just changed the printk on running out of entries to be KERN_ERR at
least, so we know; if somebody has real-life trouble with 2.4 I will go
HIGHMEM, which is a matter of 2 hours for me to implement.

The only thing I suggest is to change the API before starting to fix the
drivers, I mean: don't start checking for bus address 0 before changing
the API to return failure in another way. It's true x86 reserves the
zero page anyway because it's a magic bios thing, but for example on
alpha such a 0 bus address that we cannot use wastes 8 mbyte of the DMA
virtual bus addresses that we reserve for the ISA cards (of course we
almost never need 16mbyte of ram all under isa dma, but since it's so
low cost to allow that, I think we will, just in case).

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
 > > max bytes per bttv: max_gbuffers * max_gbufsize
 > >64   * 0x208000  == 133.12MB
 > > 
 > > 133.12MB * 8 PCI slots == ~1.06 GB
 > > 
 > > Which is still only half of the total IOMMU space available per
 > > controller.
 > 
 > and it is the double of the iommu space that I am going to reserve for
 > pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll
 > change to 512mbyte)

How many physical PCI slots on a Tsunami system?  (I know the
answer, this question is rhetorical :-)

See?  This is why I think all these examples are silly, and we
need to be realistic about this whole situation.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
> max bytes per bttv: max_gbuffers * max_gbufsize
>   64   * 0x208000  == 133.12MB
> 
> 133.12MB * 8 PCI slots == ~1.06 GB
> 
> Which is still only half of the total IOMMU space available per
> controller.

and it is double the iommu space that I am going to reserve for
pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll
change it to 512mbyte). Also, bttv is not doing that large a dma, and by
default it only uses 2 buffers in the ring. bttv is not a good example
of what can really overflow the pci virtual address space in real life
(when I mentioned it, it was only to point out it still uses
virt_to_bus); filling a pci bus with bttv cards sounds quite silly
anyway ;)

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
 > > I think such designs which gobble up a gig or so of DMA mappings on
 > 
 > they map something like 200mbyte I think. I've also seen other cards doing
 > the same kind of stuff, again for distributed computing.

Ok, 200MB, let's see what this gives us as an example.

200MB multiplied by 6 PCI slots, which uses up about 1.2GB IOMMU
space.

This still leaves around 800MB IOMMU space free on that sparc64 PCI
controller.

It wouldn't run out of space, and this is assuming that Sun ever made
a sparc64 system with 6 physical PCI card slots (I don't think they
ever did honestly, I think 4 physical card slots was the maximum).

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
> I think such designs which gobble up a gig or so of DMA mappings on

they map something like 200mbyte I think. I've also seen other cards doing
the same kind of stuff, again for distributed computing.

> to be using dual address cycles, ie. 64-bit PCI addressing.  It is
> the same response you would give to someone trying to obtain 3 or more
> gigabytes of user address space in a process on x86, right?  You might

I've never seen those running on 64bit boxes even if they are supposed to
run there too.

Here it's a little different: the 32bit virtual address space limitation
isn't always a showstopper for those kinds of CPU-intensive apps (they
don't need huge caches).

> respond to that person "What you really need is x86-64." for example
> :-)

for the 32bit virtual address space issues of course yes ;)

> To me, from this perspective, the Quadrics sounds instead like a very
> broken piece of hardware.  And in any event, is there even a Quadrics

they're not the only ones doing that; I've seen others doing that kind of
stuff. It's just a matter of moving memory fast across a cluster:
if you delegate that work to a separate engine (btw they run a
32bit sparc cpu, also guess why they aren't pci64) you can spend many
more cpu cycles of the main CPU on the userspace computations.

> driver for sparc64? :-)  (I'm a free software old-fart, so please
> excuse my immediate association between "high end" and "proprietary"
> :-)

:)

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
 > On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
 > > egrep illegal_highdma net/core/dev.c
 > 
 > There is just no portable way for the driver to figure out if it should
 > set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally
 > set even on IA32. Currently it requires an architecture ifdef to set properly.

Well, certainly, this could perhaps be a bug in the Acenic driver.
It should check if DAC cycles can be used on the platform, for example.

But please, let's get back to the original problem though.

The original claim is that the situation was not handled at all.  All
I'm trying to say is simply that the net stack does check via
illegal_highdma() the condition you stated was not being checked at
all.  To me it sounded like you were claiming that HIGHMEM pages went
totally unchecked through device transmit, and that is totally untrue.

If you were trying to point out the problem with what the Acenic
driver is doing, just state that next time, ok? :-)

There is no question that what Acenic is doing with ifdefs needs a
clean portable solution.  This will be part of the 64-bit DAC API
interfaces (whenever those become really necessary, I simply don't
see the need right now).

Plainly, I'm going to be highly reluctant to make changes to the PCI
dma API in 2.4.x.  It is already hard enough to get all the PCI drivers
in line and using it.  Suggesting this kind of change is similar to
saying "let's change the arguments to request_irq()".  We would do it
to fix a true "people actually hit this" kind of bug, of course.  Yet
we would avoid it at all possible costs due to the disruption this
would cause.

I'm not trying to be a big bad guy about this.  What I'm trying to do
is make sure at least one person (me :-) is thinking about the
ramifications any such change has on all current drivers which use
these interfaces already.  And also, to port maintainers...

Later,
David S. Miller
[EMAIL PROTECTED]





Re: alpha iommu fixes

2001-05-21 Thread Gerd Knorr

>  This without considering bttv and friends are not even trying to use the
>  pci_map_* yet, I hope you don't watch TV on your sparc64 if you have
>  enough ram.

The bttv devel versions[1] are fixed already; they should work
out-of-the-box on sparc too.  Just watching TV is harmless (needs
lots of I/O bandwidth, but doesn't need much address space).
Video capture does a better job of eating iommu resources ...

  Gerd

[1] http://bytesex.org/bttv/, 0.8.x versions.

-- 
Gerd Knorr <[EMAIL PROTECTED]>  --  SuSE Labs, Außenstelle Berlin



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen

On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
> 
> Andi Kleen writes:
>  > [BTW, the 2.4.4 netstack does not seem to make any attempt to handle the
>  > pagecache > 4GB case on IA32 for sendfile, as the pci_* functions are dummies 
>  > here.  It probably needs bounce buffers there for this case]
> 
> egrep illegal_highdma net/core/dev.c

There is just no portable way for the driver to figure out if it should
set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally
set even on IA32. Currently it requires an architecture ifdef to set properly.

-Andi



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen

On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote:
>  > That's currently the case, but at least on IA32 the block layer
>  > must be fixed soon because it's a serious performance problem in
>  > some cases (and fixing it is not very hard).
> 
> If such a far reaching change goes into 2.4.x, I would probably
> begin looking at enhancing the PCI dma interfaces as needed ;-)

Hmm, I don't think it'll be a far reaching change. As far as I can see 
all it needs is a new entry point for block device drivers that uses 
bh->b_page. When that entry point exists skip the create_bounce call 
in __make_request. After that it is purely problem for selected drivers.

[BTW, the 2.4.4 netstack does not seem to make any attempt to handle the
pagecache > 4GB case on IA32 for sendfile, as the pci_* functions are dummies 
here.  It probably needs bounce buffers there for this case]



-Andi



Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

> I may be really wrong on this because I haven't checked anything about
> the GART yet, but I suspect you cannot use the GART for this stuff on
> ia32 in 2.4 because I recall it provides not a huge margin of
> mapping entries, which would far too easily trigger the bugs in the
> device drivers not checking for pci_map_* failures, even in a common
> desktop/webserver/fileserver kind of usage of a high end machine.

Not all chipsets support reading through GART address space from PCI either,
it is meant for AGP to use.




Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


David S. Miller writes:
 > 
 > 1) I showed you in a private email that I calculated the
 >maximum possible IOMMU space that one could allocate
 >to bttv cards in a fully loaded Sunfire sparc64 system
 >to be between 300MB and 400MB.  This is assuming that
 >every PCI slot contained a bttv card, and it still
 >used only ~%35 of the available IOMMU resources.

This is totally wrong in two ways.

Let me fix this, the IOMMU on these machines is per PCI bus, so this
figure should be drastically lower.

Electrically (someone correct me, I'm probably wrong) PCI is limited
to 6 physical plug-in slots I believe, let's say it's 8 to choose an
arbitrary larger number to be safe.

Then we have:

max bytes per bttv: max_gbuffers * max_gbufsize
64   * 0x208000  == 133.12MB

133.12MB * 8 PCI slots == ~1.06 GB

Which is still only half of the total IOMMU space available per
controller.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > I just gave you a test case that triggers on sparc64 in an earlier email.

If you are talking about the bttv card:

1) I showed you in a private email that I calculated the
   maximum possible IOMMU space that one could allocate
   to bttv cards in a fully loaded Sunfire sparc64 system
   to be between 300MB and 400MB.  This is assuming that
   every PCI slot contained a bttv card, and it still
   used only ~%35 of the available IOMMU resources.

2) It currently doesn't even use the portable APIs yet anyways,
   so effectively it is not supported on sparc64.

The only other examples you showed were theoretical, for cards and
configurations that simply are not supported or cannot happen
on sparc64 with current kernels.

 > Chris just gave a real-world example of applications where that kind of
 > design is useful, and there are certainly other kinds of apps where that
 > kind of hardware design can be useful too.
 > 
 > A name of a high end pci32 card that AFAIK can trigger those bugs is the
 > Quadrics, which is a very nice piece of hardware btw.

I think such designs which gobble up a gig or so of DMA mappings on
pci32 are not useful in the slightest.  These cards really ought
to be using dual address cycles, ie. 64-bit PCI addressing.  It is
the same response you would give to someone trying to obtain 3 or more
gigabytes of user address space in a process on x86, right?  You might
respond to that person "What you really need is x86-64." for example
:-)

To me, from this perspective, the Quadrics sounds instead like a very
broken piece of hardware.  And in any event, is there even a Quadrics
driver for sparc64? :-)  (I'm a free software old-fart, so please
excuse my immediate association between "high end" and "proprietary"
:-)

Finally Andrea, have you even begun to consider the possible
starvation cases once we make this a resource allocation which can
fail under "normal" conditions?  Maybe a device eats all the IOMMU
entries and immediately obtains a new mapping whenever it frees any
mapping, effectively keeping out all other devices.

This may be easily solved, I don't know.

But this, along with the potential scsi layer issues, is basically the
reason I'm trying hard to keep the API as it is right now for 2.4.x.
Changing this in 2.4.x is going to open up Pandora's box, really.

Later,
David S. Miller
[EMAIL PROTECTED]





Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 11:42:16AM +0200, Andi Kleen wrote:
> [actually most IA32 boxes already have one in form of the AGP GART, it's just
> not commonly used for serious things yet]

I may be really wrong on this because I haven't checked anything about
the GART yet, but I suspect you cannot use the GART for this stuff on
ia32 in 2.4 because I recall it provides not a huge margin of
mapping entries, which would far too easily trigger the bugs in the
device drivers not checking for pci_map_* failures, even in a common
desktop/webserver/fileserver kind of usage of a high end machine.

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
 > > Certainly, when this changes, we can make the interfaces adapt to
 > > this.
 > 
 > I am just curious why you didn't consider that case when designing the
 > interfaces. Was that a deliberate decision or just an oversight?
 > [I guess the first, but why?]

I didn't want the API to do exactly what we needed it to do
but not one bit more.  I tried very hard to keep it as minimal
as possible, and I even fought many additions to the API (a few
of which turned out to be reasonable, see pci_pool threads).

To this end, since HIGHMEM is needed anyways on such machines
(ie. the "sizeof(void *)" issue), I decided to not consider that
case.

Working on pages is useful even _ignoring_ the specific issues we are
talking about.  It really is the one generic way to represent all
pieces of memory inside the kernel (regardless of HIGHMEM and similar
issues).

But I simply did not see anyone who would really make use of it in
the 2.4.x timeframe.  (and I made this estimate in the middle of
2.3.x, so I didn't even see zerocopy coming along so clearly, shrug)

 > That's currently the case, but at least on IA32 the block layer
 > must be fixed soon because it's a serious performance problem in
 > some cases (and fixing it is not very hard).

If such a far reaching change goes into 2.4.x, I would probably
begin looking at enhancing the PCI dma interfaces as needed ;-)

 > Now that will probably first use DAC
 > and not an IO-MMU, and thus not use the pci mapping API, but I would not be 
 > surprised if people came up with IO-MMU schemes for it too.
 > [actually most IA32 boxes already have one in form of the AGP GART, it's just
 > not commonly used for serious things yet]

DAC usage should go through a portable PCI dma API as well,
for the reasons you mention as well as others.  If we do this
from the beginning, there will be no chance for things like
virt_to_bus64() et al. to start sneaking into the PCI drivers :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen

On Mon, May 21, 2001 at 02:30:09AM -0700, David S. Miller wrote:
> 
> Andi Kleen writes:
>  > On the topic of to the PCI DMA code: one thing I'm missing
 >  > are pci_map_single()/pci_map_sg() that take struct page * instead
>  > of direct pointers. Currently I don't see how you would implement IO-MMU IO
>  > on a 32bit box with more than 4GB of memory, because the address won't
>  > fit into the pointer.
> 
> How does the buffer get there in the first place? :-)

I guess you know ;)

e.g. via page table tricks from user space, like the PAE mode on IA32
or via kmap.

> Certainly, when this changes, we can make the interfaces adapt to
> this.

I am just curious why you didn't consider that case when designing the
interfaces. Was that a deliberate decision or just an oversight?
[I guess the first, but why?]

> 
> Because of this, for example, the sbus IOMMU stuff on sparc32 still
> uses HIGHMEM exactly because of this pointer limitation.  In fact,
> any machine using >4GB of memory currently cannot be supported without
> highmem enabled, which is going to enable bounce buffering in the block
> I/O layer, etc.

That's currently the case, but at least on IA32 the block layer must be
fixed soon because it's a serious performance problem in some cases
(and fixing it is not very hard). Now that will probably first use DAC
and not an IO-MMU, and thus not use the pci mapping API, but I would not be
surprised if people came up with IO-MMU schemes for it too.
[actually most IA32 boxes already have one in form of the AGP GART, it's just
not commonly used for serious things yet]

-Andi



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
 > On the topic of to the PCI DMA code: one thing I'm missing
 >  > are pci_map_single()/pci_map_sg() that take struct page * instead
 > of direct pointers. Currently I don't see how you would implement IO-MMU IO
 > on a 32bit box with more than 4GB of memory, because the address won't
 > fit into the pointer.

How does the buffer get there in the first place? :-)

Yes, the zerocopy stuff is capable of doing this.  But the block
I/O layer is not, neither is any other subsystem to my knowledge.

Certainly, when this changes, we can make the interfaces adapt to
this.

Because of this, for example, the sbus IOMMU stuff on sparc32 still
uses HIGHMEM exactly because of this pointer limitation.  In fact,
any machine using >4GB of memory currently cannot be supported without
highmem enabled, which is going to enable bounce buffering in the block
I/O layer, etc.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen


On the topic of to the PCI DMA code: one thing I'm missing
are pci_map_single()/pci_map_sg() that take struct page * instead
of direct pointers. Currently I don't see how you would implement IO-MMU IO
on a 32bit box with more than 4GB of memory, because the address won't
fit into the pointer.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 12:05:40AM -0700, David S. Miller wrote:
> together.  And it was agreed upon that the routines will not allow
> failure in 2.4.x and we would work on resolving this in 2.5.x and no
> sooner.

I'm glad you at least consider fixing all those bugs for 2.5, but
that doesn't change the fact that if somebody runs out of entries now
on sparc64, the only thing I can do in the short term is use HIGHMEM,
so that the serialization limiting the maximum amount of simultaneous
pci32 DMA happens in the code that allocates the bounce buffers. Tell
me the best way to get rid of those bugs altogether if you can.

Furthermore, some archs may not provide a huge number of entries for
the legacy pci32 cards, so you could trigger those bugs more easily
without needing uncommon hardware; those bugs render the iommu
unusable for those archs in 2.4 because they would trigger the device
driver bugs far too easily.

Please tell Andrew to worry about that; if somebody had ever worried
about it, we would have all the network drivers correct by now, and
the needed panics in the lowlevel scsi layer.

This is without considering that bttv and friends are not even trying
to use pci_map_* yet; I hope you don't watch TV on your sparc64 if you
have enough ram.

I hate these kinds of broken compromises between something that works
almost all the time and something that breaks when you are using more
than a few harddisks and a few nics, and that cannot be fixed the
right way in the short term once it triggers (bttv is fixable in the
short term of course; I'm only talking about running out of pci mappings).

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > Tell me the best way to get rid of those bugs altogether if you can.

Please give me a test case that triggers the bug on sparc64
and I will promptly work on a fix, ok?

I mean a test case you _actually_ trigger, not some fantasy case.

In theory it can happen, but nobody is showing me that it actually
does ever happen.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

>  > Look at the history of kernel API's over time. Everything that can
>  > go wrong eventually does.
> 
> I agree, and it will be dealt with in 2.5.x
> 
> The scsi layer in 2.4.x is simply not able to handle failure in these
> code paths, as Gerard Roudier has mentioned.

On that I am unconvinced. It is certainly grungy enough that fighting that war
in 2.5 makes sense however.





Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Alan Cox writes:
 > Pages allocated in main memory and mapped for access by PCI devices. On some
 > HP systems there is now way for such a page to stay coherent. It is quite
 > possible to sync the view but there is no sane way to allow any
 > pci_alloc_consistent to succeed

This is not what the HP folk told me, and in fact they said that
pci_alloc_consistent could be made to work via disabling the cache
attribute in the cpu side mappings or something similar in the PCI
controller IOMMU mappings.

Please someone on the HPPA team provide details :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Alan Cox writes:
 > Ok how about a PIV Xeon with 64Gb of memory and 5 AMI Megaraids, which are
 > limited to the low 2Gb range for pci mapping and otherwise need bounce buffers.
 > Or how about any consistent alloc on certain HP machines which totally lack
 > coherency - also I suspect the R10K on an O2 might fall into that - Ralf ?

If they need bounce buffers because of a device specific DMA range
limitation (this is what I gather this is), then the PCI dma interface
is of no help to this case.

 > Look at the history of kernel API's over time. Everything that can
 > go wrong eventually does.

I agree, and it will be dealt with in 2.5.x

The scsi layer in 2.4.x is simply not able to handle failure in these
code paths, as Gerard Roudier has mentioned.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

> Alan Cox writes:
>  > And how do you propose to implement cache coherent pci allocations
>  > on machines which lack the ability to have pages coherent between
>  > I/O and memory space ?
> 
> Pages, being in memory space, are never in I/O space.

Ok my fault. Let me try that again with clearer Linux terminology.

Pages allocated in main memory and mapped for access by PCI devices. On some
HP systems there is no way for such a page to stay coherent. It is quite
possible to sync the view, but there is no sane way to allow any
pci_alloc_consistent to succeed.

Alan





Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

> What are these "devices", and what drivers "just program the cards to
> start the dma on those hundred mbyte of ram"?
> 
> Are we designing Linux for hypothetical systems with hypothetical
> devices and drivers, or for the real world?

Ok how about a PIV Xeon with 64Gb of memory and 5 AMI Megaraids, which are
limited to the low 2Gb range for pci mapping and otherwise need bounce buffers.
Or how about any consistent alloc on certain HP machines which totally lack
coherency - also I suspect the R10K on an O2 might fall into that - Ralf ?

Look at the history of kernel API's over time. Everything that can go wrong
eventually does.




Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

> Andrew Morton writes:
>  > Well this is news to me.  No drivers understand this.
>  > How long has this been the case?  What platforms?
> 
> The DMA interfaces may never fail and I've discussed this over and
> over with port maintainers a _long_ time ago.

And how do you propose to implement cache coherent pci allocations on machines
which lack the ability to have pages coherent between I/O and memory space?



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > Assume I have a dozen PCI cards that do DMA using SG tables that
 > can map up to some hundred mbytes of ram each, so I can just program
 > the cards to start the dma on those hundred mbyte of ram; most of the
 > time the I/O is not simultaneous, but very rarely it happens to be
 > simultaneous and in turn it tries to pci_map_sg more than 4G of physical
 > ram.

What are these "devices", and what drivers "just program the cards to
start the dma on those hundred mbyte of ram"?

Are we designing Linux for hypothetical systems with hypothetical
devices and drivers, or for the real world?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > I just gave you a test case that triggers on sparc64 in an earlier email.

If you are talking about the bttv card:

1) I showed you in a private email that I calculated the
   maximum possible IOMMU space that one could allocate
   to bttv cards in a fully loaded Sunfire sparc64 system
   to be between 300MB and 400MB.  This is assuming that
   every PCI slot contained a bttv card, and it still
   used only ~35% of the available IOMMU resources.

2) It currently doesn't even use the portable APIs yet anyways,
   so effectively it is not supported on sparc64.

The only other examples you showed were theoretical, for cards and
configurations that simply are not supported or cannot happen
on sparc64 with current kernels.

 > Chris just gave a real world example of applications where that kind of
 > design is useful, and there are certainly other kinds of apps where that
 > kind of hardware design can be useful too.
 > 
 > A name of a high end pci32 card that AFAIK can trigger those bugs is the
 > Quadrics, which is a very nice piece of hardware btw.

I think such designs which gobble up a gig or so of DMA mappings on
pci32 are not useful in the slightest.  These cards really ought
to be using dual address cycles, ie. 64-bit PCI addressing.  It is
the same response you would give to someone trying to obtain 3 or more
gigabytes of user address space in a process on x86, right?  You might
respond to that person "What you really need is x86-64." for example
:-)

To me, from this perspective, the Quadrics sounds instead like a very
broken piece of hardware.  And in any event, is there even a Quadrics
driver for sparc64? :-)  (I'm a free software old-fart, so please
excuse my immediate association between "high end" and "proprietary"
:-)

Finally Andrea, have you even begun to consider the possible
starvation cases once we make this a resource allocation which can
fail under normal conditions?  Maybe one device eats all the IOMMU
entries and immediately obtains a new mapping whenever it frees one,
effectively keeping out all other devices.

This may be easily solved, I don't know.

But this, along with the potential scsi layer issues, is basically the
reason I'm trying hard to keep the API as it is right now for 2.4.x.
Changing this in 2.4.x is going to open up Pandora's Box, really.

Later,
David S. Miller
[EMAIL PROTECTED]





Re: alpha iommu fixes

2001-05-21 Thread Gerd Knorr

> This without considering bttv and friends are not even trying to use the
> pci_map_* yet, I hope you don't watch TV on your sparc64 if you have
> enough ram.

The bttv devel versions[1] are fixed already; they should work
out-of-the-box on sparc too.  Just watching TV is harmless (it needs
lots of I/O bandwidth, but doesn't need much address space).
Video capture does a better job of eating iommu resources ...

  Gerd

[1] http://bytesex.org/bttv/, 0.8.x versions.

-- 
Gerd Knorr [EMAIL PROTECTED]  --  SuSE Labs, Außenstelle Berlin



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:59:58AM -0700, David S. Miller wrote:
> This still leaves around 800MB IOMMU space free on that sparc64 PCI
> controller.

if it were 400mbyte you would be screwed too; the point here is that the
margin is way too small to allow ignoring the issue completely.
Furthermore there can be fragmentation effects in the pagetables, at
least in the way alpha manages them, which is to find contiguous virtual
pci bus addresses for each sg. Alpha in mainline is simply broken if a
single pci bus tries to dynamically map more than 128mbyte; changing it
to 512mbyte is trivial, but growing it more has performance implications
as it needs to shrink the direct windows, which I don't like as it would
also increase the number of machines that get bitten by drivers still
using virt_to_bus, and also increase the pressure on the iommu ptes.

Now I'm not asking to break the API for 2.4 to take care of that; you
seem convinced about fixing this for 2.5 and I'm ok with that.
I just changed the printk for running out of entries to be KERN_ERR at
least, so we will know if somebody has real life trouble with 2.4; if
so, I will go HIGHMEM, which is a matter of 2 hours for me to implement.

The only thing I suggest is to change the API before starting to fix the
drivers, I mean: don't start checking for bus address 0 before changing
the API to return failure in another way. It's true x86 reserves the
zero page anyway because it's a magic bios thing, but for example on
alpha such a 0 bus address that we cannot use wastes 8 mbyte of DMA
virtual bus addresses that we reserve for the ISA cards (of course we
almost never need 16 mbyte of ram all under isa dma, but since it's so
low cost to allow that I think we will, just in case).

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
 > On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
 > > max bytes per bttv: max_gbuffers * max_gbufsize
 > >                     64           * 0x208000  == 133.12MB
 > > 
 > > 133.12MB * 8 PCI slots == ~1.06 GB
 > > 
 > > Which is still only half of the total IOMMU space available per
 > > controller.
 > 
 > and it is double the iommu space that I am going to reserve for
 > pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll
 > change it to 512mbyte)

How many physical PCI slots on a Tsunami system?  (I know the
answer, this question is rhetorical :-)

See?  This is why I think all these examples are silly, and we
need to be realistic about this whole situation.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Richard Henderson

On Mon, May 21, 2001 at 03:51:51PM +0400, Ivan Kokshaysky wrote:
> I'm unable to reproduce it with an *8Mb* window, so I'm asking.

Me neither.  But Tom Vier, the guy who started this thread,
was able to use up the 8MB.  Which is completely believable.

The following should alleviate the situation on these smaller
machines where the direct map does cover all physical memory.
Really, we were failing gratuitously before.

On Tsunami and Titan, especially with more than 4G ram, we
should probably just go ahead and allocate the 512M or 1G
scatter-gather arena.

(BTW, Andrea, it's easy enough to work around the Cypress
problem by marking the last 1M of the 1G arena in use.)


r~



diff -ruNp linux/arch/alpha/kernel/pci_iommu.c linux-new/arch/alpha/kernel/pci_iommu.c
--- linux/arch/alpha/kernel/pci_iommu.c Fri Mar  2 11:12:07 2001
+++ linux-new/arch/alpha/kernel/pci_iommu.c Mon May 21 01:25:25 2001
@@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
 	paddr &= ~PAGE_MASK;
 	npages = calc_npages(paddr + size);
 	dma_ofs = iommu_arena_alloc(arena, npages);
-	if (dma_ofs < 0)
-		return -1;
+	if (dma_ofs < 0) {
+		/* If we attempted a direct map above but failed, die.  */
+		if (leader->dma_address == 0)
+			return -1;
+
+		/* Otherwise, break up the remaining virtually contiguous
+		   hunks into individual direct maps.  */
+		for (sg = leader; sg < end; ++sg)
+			if (sg->dma_address == 2 || sg->dma_address == -2)
+				sg->dma_address = 0;
+
+		/* Retry.  */
+		return sg_fill(leader, end, out, arena, max_dma);
+	}
 
 	out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
 	out->dma_length = size;



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote:
> should probably just go ahead and allocate the 512M or 1G
> scatter-gather arena.

I just got a bugreport in my mailbox about pci_map failures even after
I enlarged the window to 1G, argghh (at first it looked apparently stable
after growing the window), so I'm stuck again. It seems I was right not
to be careless about the pci_map_* bugs today, even if the 1G window
looked to offer a reasonable margin at first.

The pci_map_* failure triggers during a benchmark with a certain driver
that does massive DMA (similar to the examples I gave previously); the
developers of the driver simply told me the hardware wants to do massive
zerocopy dma to userspace, and they apparently excluded that it could be
a memleak in the driver missing some pci_unmap_* after I told them to
check for that. Even enabling HIGHMEM would not be enough because they
do dma to userspace on the network side, so it won't be taken care of
by create_bounces(), so I would at least need to put another bounce
buffer layer in the driver to make highmem work.

Other more efficient ways to go besides highmem plus an additional bounce
buffer layer are:

2) fixing all buggy drivers now (this would be a great pain as it seems
   I would have to do it alone; apparently everybody else doesn't care
   about those bugs for 2.4)
3) letting the massive DMA hardware use DAC

Theoretically I could also cheat again and take a way 4), that is to try
to enlarge the window beyond 1G and see if the bugs get hidden also
during the benchmark that way, but I would take this as a last resort as
it would again not be a definitive solution and I'd risk getting stuck
again tomorrow like I am right now.

I think I will prefer to take the dirty way 3) just for those drivers to
solve this production problem, even if it won't be implemented in a
generic manner at first (I got the idea from the quadrics folks, who do
this now with their nics if I understood well).

If I understand correctly, on the tsunami enabling DAC simply means
enabling the pchip->pctl |= MWIN (monster window) bit during the boot
stage on both pchips.

Then the device driver of the massive DMA hardware should simply
program the registers of the nic to use DAC with bus addresses that
are the phys addresses of the destination/source memory of the DMA,
only changed to have bit 40 set to 1. Those should be all the changes
necessary to make pci64 work on tsunami at the same time as the pci32
direct/dynamic windows, and it would be very efficient; it sounds like
the best way to work around the broken pci_map_* in 2.4, given that
fixing pci_map_* the right way is a pain.

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


David S. Miller writes:
  
 > 1) I showed you in a private email that I calculated the
 >    maximum possible IOMMU space that one could allocate
 >    to bttv cards in a fully loaded Sunfire sparc64 system
 >    to be between 300MB and 400MB.  This is assuming that
 >    every PCI slot contained a bttv card, and it still
 >    used only ~35% of the available IOMMU resources.

This is totally wrong in two ways.

Let me fix this: the IOMMU on these machines is per PCI bus, so this
figure should be drastically lower.

Electrically (someone correct me, I'm probably wrong) PCI is limited
to 6 physical plug-in slots I believe, let's say it's 8 to choose an
arbitrary larger number to be safe.

Then we have:

max bytes per bttv: max_gbuffers * max_gbufsize
64   * 0x208000  == 133.12MB

133.12MB * 8 PCI slots == ~1.06 GB

Which is still only half of the total IOMMU space available per
controller.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: alpha iommu fixes

2001-05-21 Thread Ivan Kokshaysky

On Mon, May 21, 2001 at 06:55:29AM -0700, Jonathan Lundell wrote:
> 8 slots (and you're right, 6 is a practical upper limit, fewer for
> 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot
> limit becomes largely irrelevant.

True, but the bandwidth limit is highly relevant. That's why modern
systems have multiple root buses, not bridged ones.

Ivan.



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote:
 How many physical PCI slots on a Tsunami system?  (I know the

on tsunamis probably not many, but on a Typhoon (the one in the es40,
that is the 4-way extension) I don't know, but certainly the box is
large.

Andrea



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen

On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
 
 Andi Kleen writes:
   [BTW, the 2.4.4 netstack does not seem to make any attempt to handle the
  pagecache > 4GB case on IA32 for sendfile, as the pci_* functions are dummies 
   here.  It probably needs bounce buffers there for this case]
 
 egrep illegal_highdma net/core/dev.c

There is just no portable way for the driver to figure out if it should
set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally
set even on IA32. Currently it requires an architecture ifdef to set properly.
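A toy model of the check under discussion (not the actual 2.4 net stack code; the flag value and signature here are made up for illustration): the stack refuses to hand highmem fragments to a device unless the driver has set the "I can do high DMA" feature flag, which is exactly the flag acenic.c is said to set unconditionally.

```python
# Illustrative flag value, not the real NETIF_F_HIGHDMA constant.
NETIF_F_HIGHDMA = 1 << 0

def illegal_highdma(dev_features, frag_page_numbers, highmem_start_pfn):
    """Model of the check: True if any fragment lives in highmem
    but the device did not advertise high-DMA capability."""
    if dev_features & NETIF_F_HIGHDMA:
        return False
    return any(p >= highmem_start_pfn for p in frag_page_numbers)
```

The point Andi makes is that a driver cannot portably decide whether setting the flag is safe; whether DAC cycles actually work is a property of the platform, not the card.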

-Andi



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen

On Mon, May 21, 2001 at 02:30:09AM -0700, David S. Miller wrote:
 
 Andi Kleen writes:
   On the topic of the PCI DMA code: one thing I'm missing
  are pci_map_single()/pci_map_sg() variants that take struct page *
  instead of direct pointers. Currently I don't see how you would implement IO-MMU IO
   on a 32bit box with more than 4GB of memory, because the address won't
   fit into the pointer.
 
 How does the buffer get there in the first place? :-)

I guess you know ;)

e.g. via page table tricks from user space, like the PAE mode on IA32
or via kmap.

 Certainly, when this changes, we can make the interfaces adapt to
 this.

I am just curious why you didn't consider that case when designing the
interfaces. Was that a deliberate decision or just an oversight?
[I guess the first, but why?]

 
 Because of this, for example, the sbus IOMMU stuff on sparc32 still
 uses HIGHMEM exactly because of this pointer limitation.  In fact,
 any machine using 4GB of memory currently cannot be supported without
 highmem enabled, which is going to enable bounce buffering in the block
 I/O layer, etc.

That's currently the case, but at least on IA32 the block layer must be
fixed soon because it's a serious performance problem in some cases
(and fixing it is not very hard). Now that will probably first use DAC
and not an IO-MMU, and thus not use the pci mapping API, but I would not be 
surprised if people came up with IO-MMU schemes for it too.
[actually most IA32 boxes already have one in form of the AGP GART, it's just
not commonly used for serious things yet]

-Andi



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
  On the topic of the PCI DMA code: one thing I'm missing
  are pci_map_single()/pci_map_sg() variants that take struct page *
  instead of direct pointers. Currently I don't see how you would implement IO-MMU IO
  on a 32bit box with more than 4GB of memory, because the address won't
  fit into the pointer.

How does the buffer get there in the first place? :-)

Yes, the zerocopy stuff is capable of doing this.  But the block
I/O layer is not, neither is any other subsystem to my knowledge.

Certainly, when this changes, we can make the interfaces adapt to
this.

Because of this, for example, the sbus IOMMU stuff on sparc32 still
uses HIGHMEM exactly because of this pointer limitation.  In fact,
any machine using 4GB of memory currently cannot be supported without
highmem enabled, which is going to enable bounce buffering in the block
I/O layer, etc.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Jens Axboe

On Mon, May 21 2001, Andi Kleen wrote:
 On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote:
That's currently the case, but at least on IA32 the block layer
must be fixed soon because it's a serious performance problem in
some cases (and fixing it is not very hard).
  
  If such a far reaching change goes into 2.4.x, I would probably
  begin looking at enhancing the PCI dma interfaces as needed ;-)
 
 Hmm, I don't think it'll be a far reaching change. As far as I can see 
 all it needs is a new entry point for block device drivers that uses 
 bh->b_page. When that entry point exists skip the create_bounce call 
 in __make_request. After that it is purely problem for selected drivers.

I've already done it, however not as a 2.4 solution. The partial and WIP
patches are here:

*.kernel.org/pub/linux/kernel/people/axboe/v2.5/bio-7

Block drivers can indicate the need for bounce buffers above a certain
page.
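The bounce decision being described can be modeled roughly like this (hypothetical names; the real 2.4 path is the create_bounce() call in __make_request that this change would skip): the block layer copies into a low bounce page any page the driver says it cannot address.

```python
def needs_bounce(page_number, driver_max_page):
    """Model: True if this I/O page is above the driver's addressable
    limit and must be copied into a low bounce buffer first."""
    return page_number > driver_max_page

def submit(page_numbers, driver_max_page):
    """Return how many pages of a request would get bounced."""
    return sum(needs_bounce(p, driver_max_page) for p in page_numbers)
```

A driver that can reach all of memory sets its limit to the top page and never pays for the copy.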

Of course I can hack up something for 2.4 as well, but is this really a
pressing need?

-- 
Jens Axboe




Re: alpha iommu fixes

2001-05-21 Thread Peter Rival

Andrea Arcangeli wrote:

 On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote:
  How many physical PCI slots on a Tsunami system?  (I know the

 on tsunamis probably not many, but on a Typhoon (the one in the es40
 that is the 4-way extension) I don't know, but certainly the box is
 large.


ES40 has either 8 or 10 PCI slots across 2 PCI buses.  And then there's
Wildfire - 14 slots per PCI drawer (4 PCI buses) * 2 drawers/QBB * 8 QBBs =
224 PCI slots and 64 PCI buses.  BTW, Titan (aka ES45) has 10 slots as well,
but with 3 buses instead.

 - Pete




Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

 I can be really wrong on this because I haven't checked anything about
 the GART yet, but I suspect you cannot use the GART for this stuff on
 ia32 in 2.4 because I think I recall it provides only a small margin of
 mapping entries, so it would far too easily trigger the bugs in the
 device drivers not checking for pci_map_* failures, even in a common
 desktop/webserver/fileserver kind of usage of a high end machine.

Not all chipsets support reading through GART address space from PCI either,
it is meant for AGP to use.




Re: alpha iommu fixes

2001-05-21 Thread Ivan Kokshaysky

On Mon, May 21, 2001 at 01:19:59PM +0200, Andrea Arcangeli wrote:
 Alpha in mainline is just screwed up if a single pci bus tries to dynamically
 map more than 128mbyte; changing it to 512mbyte is trivial, growing more

Could you just describe the configuration where increasing the sg window
from 128 to 512Mb actually fixes the out-of-ptes problem? I mean which
drivers are involved, what kind of load etc.
I'm unable to reproduce it with an *8Mb* window, so I'm asking.
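For concreteness, a toy model of the sg window being argued over, assuming Alpha's 8KB IOMMU page size (an assumption for illustration): the window holds a fixed number of ptes, and since pci_map_sg() has no failure return in 2.4, any allocation that would exceed the window hits the unchecked-error-path bug.

```python
IOMMU_PAGE = 8192   # assumed 8KB IOMMU page size

def window_entries(window_bytes):
    """Number of mapping ptes a scatter-gather window provides."""
    return window_bytes // IOMMU_PAGE

def can_map(outstanding_bytes, request_bytes, window_bytes):
    """Model: would this mapping fit, or would the driver hit the
    no-error-return path that this thread is worried about?"""
    return outstanding_bytes + request_bytes <= window_bytes

entries_128mb = window_entries(128 * 2**20)   # 16384 ptes in a 128MB window
```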

Ivan.



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
 I think such designs which gobble up a gig or so of DMA mappings on

they map something like 200mbyte I think. I've also seen other cards doing
the same kind of stuff, again for distributed computing.

 to be using dual address cycles, ie. 64-bit PCI addressing.  It is
 the same response you would give to someone trying to obtain 3 or more
 gigabytes of user address space in a process on x86, right?  You might

I've never seen those running on 64bit boxes even if they are supposed to
run there too.

Here it's a little different, 32bit virtual address space limitation
isn't always a showstopper for those kind of CPU intensive apps (they
don't need huge caches).

 respond to that person "What you really need is x86-64." for example
 :-)

for the 32bit virtual address space issues of course yes ;)

 To me, from this perspective, the Quadrics sounds instead like a very
 broken piece of hardware.  And in any event, is there even a Quadrics

they're not the only ones doing that, I've seen others doing that kind of
stuff; it's just a matter of moving memory fast across a cluster.
If you delegate that work to a separate engine (btw, they run a
32bit sparc cpu, also guess why they aren't pci64) you can spend many
more cpu cycles of the main CPU on the userspace computations.

 driver for sparc64? :-)  (I'm a free software old-fart, so please
 excuse my immediate association between high end and proprietary
 :-)

:)

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
   Certainly, when this changes, we can make the interfaces adapt to
   this.
  
  I am just curious why you didn't consider that case when designing the
  interfaces. Was that a deliberate decision or just an oversight?
  [I guess the first, but why?]

I didn't want the API to do exactly what we needed it to do
but not one bit more.  I tried very hard to keep it as minimal
as possible, and I even fought many additions to the API (a few
of which turned out to be reasonable, see pci_pool threads).

To this end, since HIGHMEM is needed anyways on such machines
(ie. the sizeof(void *) issue), I decided to not consider that
case.

Working on pages is useful even _ignoring_ the specific issues we are
talking about.  It really is the one generic way to represent all
pieces of memory inside the kernel (regardless of HIGHMEM and similar
issues).

But I simply did not see anyone who would really make use of it in
the 2.4.x timeframe.  (and I made this estimate in the middle of
2.3.x, so I didn't even see zerocopy coming along so clearly, shrug)

  That's currently the case, but at least on IA32 the block layer
  must be fixed soon because it's a serious performance problem in
  some cases (and fixing it is not very hard).

If such a far reaching change goes into 2.4.x, I would probably
begin looking at enhancing the PCI dma interfaces as needed ;-)

  Now that will probably first use DAC
  and not an IO-MMU, and thus not use the pci mapping API, but I would not be 
  surprised if people came up with IO-MMU schemes for it too.
  [actually most IA32 boxes already have one in form of the AGP GART, it's just
  not commonly used for serious things yet]

DAC usage should go through a portable PCI dma API as well,
for the reasons you mention as well as others.  If we do this
from the beginning, there will be no chance for things like
virt_to_bus64() et al. to start sneaking into the PCI drivers :-)
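The portable-DAC idea can be sketched as follows (a hypothetical API, not the 2.4 one; the function name and shape are invented for illustration): the driver states what it can address via a mask, and a single mapping routine decides between a 32-bit single-address cycle and a 64-bit dual-address cycle, instead of each driver open-coding a virt_to_bus64() behind arch ifdefs.

```python
def pci_map_with_mask(phys_addr, dma_mask):
    """Hypothetical portable mapping routine.

    Returns (bus_address, uses_dac). Raises if the device cannot
    reach the address at all (that case needs a bounce or an IOMMU)."""
    if phys_addr > dma_mask:
        raise ValueError("address not reachable: bounce or IOMMU required")
    uses_dac = phys_addr >= (1 << 32)   # above 4GB: needs dual-address cycle
    return phys_addr, uses_dac
```

The design point is that the SAC/DAC decision lives in one place per platform, so drivers stay ifdef-free.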

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
  On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
   egrep illegal_highdma net/core/dev.c
  
  There is just no portable way for the driver to figure out if it should
  set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally
  set even on IA32. Currently it requires an architecture ifdef to set properly.

Well, certainly, this could perhaps be a bug in the Acenic driver.
It should check if DAC cycles can be used on the platform, for example.

But please, let's get back to the original problem though.

The original claim is that the situation was not handled at all.  All
I'm trying to say is simply that the net stack does check via
illegal_highdma() the condition you stated was not being checked at
all.  To me it sounded like you were claiming that HIGHMEM pages went
totally unchecked through device transmit, and that is totally untrue.

If you were trying to point out the problem with what the Acenic
driver is doing, just state that next time ok? :-)

There is no question that what Acenic is doing with ifdefs needs a
clean portable solution.  This will be part of the 64-bit DAC API
interfaces (whenever those become really necessary, I simply don't
see the need right now).

Plainly, I'm going to be highly reluctant to make changes to the PCI
dma API in 2.4.x.  It is already hard enough to get all the PCI drivers
in line and using it.  Suggesting this kind of change is similar to
saying "let's change the arguments to request_irq()".  We would do it
to fix a true "people actually hit this" kind of bug, of course.  Yet
we would avoid it at all possible costs due to the disruption this
would cause.

I'm not trying to be a big bad guy about this.  What I'm trying to do
is make sure at least one person (me :-) is thinking about the
ramifications any such change has on all current drivers which use
these interfaces already.  And also, to port maintainers...

Later,
David S. Miller
[EMAIL PROTECTED]





Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 12:05:40AM -0700, David S. Miller wrote:
 together.  And it was agreed upon that the routines will not allow
 failure in 2.4.x and we would work on resolving this in 2.5.x and no
 sooner.

I'm glad you at least considered fixing all those bugs for 2.5, but
that won't change the fact that if somebody runs out of entries now with
sparc64, the only thing I can do in the short term is to use HIGHMEM, so that
the serialization to limit the amount of max simultaneous pci32 DMA will
happen in the code that allocates the bounce buffers. Tell me a better way
to get rid of those bugs altogether if you can.

Furthermore, some archs for the legacy pci32 cards may not provide a huge
amount of entries, so you could more easily trigger those bugs
without the need of uncommon hardware; those bugs render the iommu
unusable for those archs in 2.4 because it would trigger the device
driver bugs far too easily.

Please tell Andrew to worry about that; if somebody had ever worried about
that we would have all the network drivers correct just now and the needed
panics in the lowlevel scsi layer.

This is without considering that bttv and friends are not even trying to use
the pci_map_* API yet; I hope you don't watch TV on your sparc64 if you have
enough ram.

I hate those kinds of broken compromises between something that works
almost all the time and something that breaks when you are using more than
a few harddisks and a few nics, and that is unfixable in the right way in a
short term after it triggers (bttv is fixable in a short term of course,
I'm only talking about when you run out of pci mappings).

Andrea



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andrea Arcangeli writes:
  On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
   I think such designs which gobble up a gig or so of DMA mappings on
  
  they maps something like 200mbyte I think. I also seen other cards doing
  the same kind of stuff again for the distributed computing.

Ok, 200MB, let's see what this gives us as an example.

200MB multiplied by 6 PCI slots, which uses up about 1.2GB IOMMU
space.

This still leaves around 800MB IOMMU space free on that sparc64 PCI
controller.

It wouldn't run out of space, and this is assuming that Sun ever made
a sparc64 system with 6 physical PCI card slots (I don't think they
ever did honestly, I think 4 physical card slots was the maximum).
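Checking the arithmetic above (a sketch; the 2GB figure is the per-controller IOMMU space this thread quotes for sparc64):

```python
MB = 2**20
used = 200 * MB * 6          # six slots of 200MB mappings: ~1.2GB
total = 2 * 2**30            # 2GB of IOMMU space per controller
free = total - used          # ~800MB left over, as stated
```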

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andi Kleen


On the topic of the PCI DMA code: one thing I'm missing
are pci_map_single()/pci_map_sg() variants that take struct page *
instead of direct pointers. Currently I don't see how you would implement IO-MMU IO
on a 32bit box with more than 4GB of memory, because the address won't
fit into the pointer.
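A sketch of why a (struct page *, offset) pair works where a void * cannot (hypothetical pci_map_page() name and an assumed 4KB page size, both for illustration only): a page number times the page size is a 64-bit quantity, so pages above the 4GB line remain nameable even though no 32-bit pointer can reach them.

```python
PAGE_SIZE = 4096   # assumed page size

def pci_map_page(page_number, offset, length):
    """Hypothetical page-based mapping call: returns the bus address.
    A real IOMMU implementation would also program a mapping entry."""
    assert 0 <= offset < PAGE_SIZE and length > 0
    return page_number * PAGE_SIZE + offset

# A page above the 4GB line: its bus address exceeds 2**32 and is
# therefore unrepresentable in a 32-bit void *.
addr = pci_map_page(0x100001, 0x10, 64)
```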

-Andi



Re: alpha iommu fixes

2001-05-21 Thread Alan Cox

 Andrew Morton writes:
   Well this is news to me.  No drivers understand this.
   How long has this been the case?  What platforms?
 
 The DMA interfaces may never fail and I've discussed this over and
 over with port maintainers a _long_ time ago.

And how do you propose to implement cache coherent pci allocations on machines
which lack the ability to have pages coherent between I/O and memory space ?



Re: alpha iommu fixes

2001-05-21 Thread Jonathan Lundell

At 3:19 AM -0700 2001-05-21, David S. Miller wrote:
This is totally wrong in two ways.

Let me fix this, the IOMMU on these machines is per PCI bus, so this
figure should be drastically lower.

Electrically (someone correct me, I'm probably wrong) PCI is limited
to 6 physical plug-in slots I believe, let's say it's 8 to choose an
arbitrary larger number to be safe.

Then we have:

max bytes per bttv: max_gbuffers * max_gbufsize
   64   * 0x208000  == 133.12MB

133.12MB * 8 PCI slots == ~1.06 GB

Which is still only half of the total IOMMU space available per
controller.

8 slots (and  you're right, 6 is a practical upper limit, fewer for 
66 MHz) *per bus*. Buses can proliferate like crazy, so the slot 
limit becomes largely irrelevant. A typical quad Ethernet card, for 
example (and this is true for many/most multiple-device cards), has a 
bridge, its own internal PCI bus, and four "slots" (devices in PCI 
terminology).
-- 
/Jonathan Lundell.



Re: alpha iommu fixes

2001-05-21 Thread David S. Miller


Andi Kleen writes:
  How about a new function (pci_nonrepresentable_address() or whatever) 
  that returns true when page cache contains pages that are not representable
  physically as void *. On IA32 it would return true only if CONFIG_PAE is 
  true and there is memory > 4GB. 

No, if we're going to change anything, let's do it right.

Sure, you'll make this one check portable, but the guts of the
main ifdef stuff for DAC support is still there.

I'd rather live with the hackish stuff temporarily, and get this all
cleaned up in one shot when we have a real DAC support API.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: alpha iommu fixes

2001-05-21 Thread Andrea Arcangeli

On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
 max bytes per bttv: max_gbuffers * max_gbufsize
   64   * 0x208000  == 133.12MB
 
 133.12MB * 8 PCI slots == ~1.06 GB
 
 Which is still only half of the total IOMMU space available per
 controller.

and that is double the iommu space that I am going to reserve for
pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll
change it to 512mbyte). Also bttv is not doing that large a dma, and by
default it only uses 2 buffers in the ring. bttv is not a good example
of what can really overflow the pci virtual address space in real life
(when I mentioned it, it was only to point out it still uses
virt_to_bus); filling a pci bus with bttv cards sounds quite silly
anyways ;)

Andrea


