Re: alpha iommu fixes
David S. Miller writes:
> What are these "devices", and what drivers "just program the cards to
> start the dma on those hundred mbyte of ram"?

Hmmm, I have a few cards that are used that way. They are used for
communication between nodes of a cluster. One might put 16 cards in a
system. The cards are quite happy to do a 2 GB DMA transfer.
Scatter-gather is possible, but it cuts performance.

Typically the driver would provide a huge chunk of memory for an app
to use, mapped using large pages on x86 or using BAT registers on ppc
(reserved during boot, of course). The app would crunch numbers using
the CPU (with AltiVec, VIS, 3DNow!, etc.) and instruct the device to
transfer data to/from the memory region.

Remote nodes initiate DMA too, even supplying the PCI bus address on
both sides of the interconnect. :-) No IOMMU problems with that one,
eh? The other node may transfer data at will.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: alpha iommu fixes
At 10:24 PM +0100 2001-05-22, Alan Cox wrote:
>> On the main board, and not just the old ones. These days it's
>> typically in the chipset's south bridge. "Third-party DMA" is
>> sometimes called "fly-by DMA". The ISA card is a slave, as is memory,
>> and the DMA chip reads from one and writes to the other.
>
>There is also another mode which will give the Alpha kittens I suspect. A
>few PCI cards do SB emulation by snooping the PCI bus. So the kernel writes
>to the ISA DMA controller which does a pointless ISA transfer and the PCI
>card sniffs the DMA controller setup (as it goes to pci, then when nobody
>claims it on to the isa bridge) then does bus mastering DMA of its own to
>fake the ISA dma

That's sick.
--
/Jonathan Lundell.
Re: alpha iommu fixes
At 2:02 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
>> 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...] This doesn't
>> apply to bus-master DMA, just the legacy (8237) stuff.
>
>Would this 8237 be something on the ISA card, or something on
>the old pc mainboards? I'm wondering if we can safely ignore
>this issue altogether here...

On the main board, and not just the old ones. These days it's
typically in the chipset's south bridge. "Third-party DMA" is
sometimes called "fly-by DMA". The ISA card is a slave, as is memory,
and the DMA chip reads from one and writes to the other.

IDE didn't originally use DMA at all (but floppies did), just
programmed IO. These days, PC chipsets mostly have some form of
extended higher-performance DMA facilities for stuff like IDE, but
I'm not really familiar with the details.

I do wish Linux didn't have so much PC legacy sh^Htuff embedded into
the i386 architecture.

>> There was also a 24-bit address limitation.
>
>Yes, that's in the number of address lines going to the isa card.
>We work around that one by having an iommu arena from 8M to 16M
>and forcing all ISA traffic to go through there.
--
/Jonathan Lundell.
Re: alpha iommu fixes
> On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
> > I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
> Traditionally, ISA could not do DMA across a 64k boundary.

The ISA dmac on the x86 cannot cross a 64K boundary (128K for 16-bit)
because it does not carry the 16-bit address into the top latch byte.
Re: alpha iommu fixes
> ISA cards can do sg?

AHA1542 scsi for one. It wasn't that uncommon.
Re: alpha iommu fixes
Hi!

> > > [..] Even sparc64's fancy
> > > iommu-based pci_map_single() always succeeds.
> >
> > Whatever sparc64 does to hide the driver bugs you can break it if you
> > pci_map 4G+1 bytes of physical memory.
>
> Which is an utterly stupid thing to do.
>
> Please construct a plausible situation where this would occur legally
> and not be a driver bug, given the maximum number of PCI busses and
> slots found on sparc64 and the maximum _concurrent_ usage of PCI dma
> space for any given driver (which isn't doing something stupid).

What stops you from plugging in PCI-to-PCI bridges in order to create
some large number of slots, like 128?

Pavel
--
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't
care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
Re: alpha iommu fixes
On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
> 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...] This doesn't
> apply to bus-master DMA, just the legacy (8237) stuff.

Would this 8237 be something on the ISA card, or something on
the old pc mainboards? I'm wondering if we can safely ignore
this issue altogether here...

> There was also a 24-bit address limitation.

Yes, that's in the number of address lines going to the isa card.
We work around that one by having an iommu arena from 8M to 16M
and forcing all ISA traffic to go through there.

r~
Re: alpha iommu fixes
On Tue, May 22, 2001 at 04:40:17PM -0400, Jeff Garzik wrote:
> ISA cards can do sg?

No, but the host iommu can. The isa card sees whatever view of memory
is presented to it by the iommu.

r~
Re: alpha iommu fixes
At 1:28 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
>> I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
>Traditionally, ISA could not do DMA across a 64k boundary.
>
>The only ISA card I have (a soundblaster compatible) appears
>to work without caring for this, but I suppose we should pay
>lip service to pedantics.

64KB for 8-bit DMA; 128KB for 16-bit DMA. It's a limitation of the
legacy third-party-DMA controllers, which had only 16-bit address
registers (the high part of the address lives in a non-counting
register). This doesn't apply to bus-master DMA, just the legacy
(8237) stuff.

There was also a 24-bit address limitation.
--
/Jonathan Lundell.
Re: alpha iommu fixes
On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
> I'm also wondering if ISA needs the sg to start on a 64k boundary,

Traditionally, ISA could not do DMA across a 64k boundary.

The only ISA card I have (a soundblaster compatible) appears
to work without caring for this, but I suppose we should pay
lip service to pedantics.

r~
Re: alpha iommu fixes
At 11:12 PM +1200 2001-05-22, Chris Wedgwood wrote:
>On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
>> Electrically (someone correct me, I'm probably wrong) PCI is
>> limited to 6 physical plug-in slots I believe, let's say it's 8
>> to choose an arbitrary larger number to be safe.
>
>Minor nit... it can in fact be higher than this, but typically it is
>not. CompactPCI implementations may go higher (different electrical
>characteristics allow for this).

CompactPCI specifies a max of 8 slots (one of which is typically the
system board). Regular PCI doesn't have a hard and fast slot limit
(except for the logical limit of 32 devices per bus); the limits are
driven by electrical loading concerns. As I recall, a bus of typical
length can accommodate 10 "loads", where a load is either a device pin
or a slot connector (that is, an expansion card counts as two loads,
one for the device and one for the connector). (I take this to be a
rule of thumb, not a hard spec, based on the detailed electrical
requirements in the PCI spec.)

Still, the presence of bridges opens up the number of devices on a
root PCI bus to a very high number, logically. Certainly having three
or four quad Ethernet cards, so 12 or 16 devices, is a plausible
configuration.

As for bandwidth, a 64-bit/66 MHz PCI bus has a nominal burst
bandwidth of 533 MB/second, which would be saturated by 20 full-duplex
100baseT ports that were themselves saturated in both directions (all
ignoring overhead). Full saturation is not reasonable for either PCI
or Ethernet; I'm just looking at order-of-magnitude numbers here.

The bottom line is: don't make any hard and fast assumptions about the
number of devices connected to a root PCI bus.
--
/Jonathan Lundell.
Re: alpha iommu fixes
On Tue, May 22, 2001 at 07:55:18PM +0400, Ivan Kokshaysky wrote:
> Yes. Though those races more likely would cause silent data
> corruption, but not immediate crash.

Ok. I wasn't sure if it was crashing or not for you.

Andrea
Re: alpha iommu fixes
On Tue, May 22, 2001 at 06:44:09PM +0400, Ivan Kokshaysky wrote:
> On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
> > Ivan could you test the above fix on the platforms that need the
> > align_entry hack?
>
> That was one of the first things I noticed, and I've tried exactly
> that (2 instead of ~1UL).

Just in case (I guess it wouldn't matter much), but are you sure you
tried it with the locking fixes applied too?

Andrea
Re: alpha iommu fixes
On Tue, May 22, 2001 at 04:29:16PM +0200, Andrea Arcangeli wrote:
> Ivan could you test the above fix on the platforms that need the
> align_entry hack?

That was one of the first things I noticed, and I've tried exactly
that (2 instead of ~1UL). No, it wasn't the cause of the crashes on
pyxis, so I left it as is. But it's probably worth changing anyway,
at least for correctness.

Ivan.
Re: alpha iommu fixes
While merging all the recent fixes in my tree, and while reserving the
pci32 space above -1M to have a dynamic window of almost 1G without
dropping the direct window, I noticed and fixed a severe bug. So now I
started to wonder if the real reason for the crash when an invalid
entry is cached in the tlb and we do dma through it (both on es40 and
on other platforms as well, according to Ivan) could be just this
software bug:

	for (i = 0; i < n; ++i)
		ptes[p+i] = ~1UL;

We reserve entries by setting all the bits over 31 to 1 as well. The
tsunami specs say that bits between 32 and 63 _must_ be zero, so the
above is definitely buggy. Maybe this is related to the fact that the
crashes triggered on >=4G machines. I will change it to:

	for (i = 0; i < n; ++i)
		ptes[p+i] = 0x2;

which is just as correct for our internal management of the allocation
in the critical sections, and which is a definitely necessary fix
according to the specs.

Maybe this is the right(tm) fix; then I can drop the artificial
alignment, and the tsunami will re-fetch the pte from memory
automatically when we do I/O through an invalid pte. If tsunami gets
fixed by it, I bet we can then drop the align_entry field from the
pci_iommu_arena structure altogether, and what was referred to as a
hardware bug on the other platforms would in fact be a software bug in
the iommu code.

I am optimistic this is the definitive fix, so for now I will leave
out the so far absolutely necessary artificial alignment on the
tsunami and put in this critical fix (until I get confirmation); if it
works I will drop the align_entry field altogether from the
pci_iommu_arena structure.

Ivan, could you test the above fix on the platforms that need the
align_entry hack?

Andrea
Re: alpha iommu fixes
On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote:
> diff -ruNp linux/arch/alpha/kernel/pci_iommu.c linux-new/arch/alpha/kernel/pci_iommu.c
> --- linux/arch/alpha/kernel/pci_iommu.c	Fri Mar  2 11:12:07 2001
> +++ linux-new/arch/alpha/kernel/pci_iommu.c	Mon May 21 01:25:25 2001
> @@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
>  	paddr &= ~PAGE_MASK;
>  	npages = calc_npages(paddr + size);
>  	dma_ofs = iommu_arena_alloc(arena, npages);
> -	if (dma_ofs < 0)
> -		return -1;
> +	if (dma_ofs < 0) {
> +		/* If we attempted a direct map above but failed, die. */
> +		if (leader->dma_address == 0)
> +			return -1;
> +
> +		/* Otherwise, break up the remaining virtually contiguous
> +		   hunks into individual direct maps. */
> +		for (sg = leader; sg < end; ++sg)
> +			if (sg->dma_address == 2 || sg->dma_address == -2)

should be == 1

> +				sg->dma_address = 0;
> +
> +		/* Retry. */
> +		return sg_fill(leader, end, out, arena, max_dma);
> +	}
>
>  	out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
>  	out->dma_length = size;

I am going to merge this one (however it won't help on the big memory
machines; it will only try to hide the problem on machines with not
much memory above 2G).

Andrea
Re: alpha iommu fixes
On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote: > should probably just go ahead and allocate the 512M or 1G > scatter-gather arena. I just have a bugreport in my mailbox about pci_map faliures even after I enlarged to window to 1G argghh (at first it looked apparently stable by growing the window), so I'm stuck again, it seems I was right in not being careless about the pci_map_* bugs today even if the 1G window looked to offer a rasonable marging at first. The pci_map_* failed triggers during a benchmark with a certain driver that does massive DMA (similar to the examples I did previously), the developers of the driver simply told me the hardware wants to do massive zerocopy dma to userspace and they apparently excluded it could be a memleak in the driver missing some pci_unmap_* after I told them to check for that. Even enabling HIGHMEM would not be enough because they do dma on userspace but on the network side, so it won't be taken care by create_bounces(), so I at least would need to put another bounce buffer layer in the driver to make highmem to work. Other more efficient ways to go besides highmem plus additional bounce buffer layer are: 2) fixing all buggy drivers now (would be a great pain as it seems to me I should do that alone apparently as it seems everybody else doesn't care about those bugs for 2.4) 3) let the "massing DMA" hardware to use DAC Theoritically I could also cheat again and take a way 4) that is to try to enlarge the window beyond 1G and see if the bugs gets hided also during the benchmark that way, but I would take this as last resort as this would again not be a definitive solution and I'd risk to get stuck again tomorrow like I'm right now. I think I will prefer to take a dirty way 3) just for those drivers to solve this production problem even if it won't be implemented in a generic manner at first (I got the idea from the quadrics folks that do this just now with their nics if I understood well). 
If I understand correctly, on the Tsunami enabling DAC simply means setting the pchip->pctl |= MWIN (monster window) bit during the boot stage on both pchips. Then the device driver of the "massive DMA" hardware should simply program the registers of the NIC to use DAC with bus addresses that are the physical addresses of the destination/source memory of the DMA, changed only to have bit 40 set to 1. Those should be all the changes necessary to make pci64 work on Tsunami at the same time as the pci32 direct/dynamic windows; it would be very efficient, and it sounds like the best way to work around the broken pci_map_* in 2.4, given that fixing pci_map_* the right way is a pain.

Andrea

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: alpha iommu fixes
On Mon, May 21 2001, Andi Kleen wrote:
> On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote:
> > > That's currently the case, but at least on IA32 the block layer
> > > must be fixed soon because it's a serious performance problem in
> > > some cases (and fixing it is not very hard).
> >
> > If such a far reaching change goes into 2.4.x, I would probably
> > begin looking at enhancing the PCI dma interfaces as needed ;-)
>
> Hmm, I don't think it'll be a far reaching change. As far as I can see
> all it needs is a new entry point for block device drivers that uses
> bh->b_page. When that entry point exists skip the create_bounce call
> in __make_request. After that it is purely a problem for selected drivers.

I've already done it, however not as a 2.4 solution. The partial and WIP patches are here: *.kernel.org/pub/linux/kernel/people/axboe/v2.5/bio-7

A block driver can indicate the need for bounce buffers above a certain page. Of course I can hack up something for 2.4 as well, but is this really a pressing need?

-- Jens Axboe
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:51:51PM +0400, Ivan Kokshaysky wrote:
> I'm unable to reproduce it with *8Mb* window, so I'm asking.

Me either. But Tom Vier, the guy who started this thread, was able to use up the 8MB. Which is completely believable.

The following should alleviate the situation on these smaller machines where the direct map does cover all physical memory. Really, we were failing gratuitously before.

On Tsunami and Titan, especially with more than 4G ram, we should probably just go ahead and allocate the 512M or 1G scatter-gather arena. (BTW, Andrea, it's easy enough to work around the Cypress problem by marking the last 1M of the 1G arena in use.)

r~

diff -ruNp linux/arch/alpha/kernel/pci_iommu.c linux-new/arch/alpha/kernel/pci_iommu.c
--- linux/arch/alpha/kernel/pci_iommu.c	Fri Mar  2 11:12:07 2001
+++ linux-new/arch/alpha/kernel/pci_iommu.c	Mon May 21 01:25:25 2001
@@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
 	paddr &= ~PAGE_MASK;
 	npages = calc_npages(paddr + size);
 	dma_ofs = iommu_arena_alloc(arena, npages);
-	if (dma_ofs < 0)
-		return -1;
+	if (dma_ofs < 0) {
+		/* If we attempted a direct map above but failed, die. */
+		if (leader->dma_address == 0)
+			return -1;
+
+		/* Otherwise, break up the remaining virtually contiguous
+		   hunks into individual direct maps. */
+		for (sg = leader; sg < end; ++sg)
+			if (sg->dma_address == 2 || sg->dma_address == -2)
+				sg->dma_address = 0;
+
+		/* Retry. */
+		return sg_fill(leader, end, out, arena, max_dma);
+	}
 	out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
 	out->dma_length = size;
Re: alpha iommu fixes
On Mon, May 21, 2001 at 06:55:29AM -0700, Jonathan Lundell wrote:
> 8 slots (and you're right, 6 is a practical upper limit, fewer for
> 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot
> limit becomes largely irrelevant.

True, but the bandwidth limit is highly relevant. That's why modern systems have multiple root buses, not bridged ones.

Ivan.
Re: alpha iommu fixes
At 3:19 AM -0700 2001-05-21, David S. Miller wrote: >This is totally wrong in two ways. > >Let me fix this, the IOMMU on these machines is per PCI bus, so this >figure should be drastically lower. > >Electrically (someone correct me, I'm probably wrong) PCI is limited >to 6 physical plug-in slots I believe, let's say it's 8 to choose an >arbitrary larger number to be safe. > >Then we have: > >max bytes per bttv: max_gbuffers * max_gbufsize > 64 * 0x208000 == 133.12MB > >133.12MB * 8 PCI slots == ~1.06 GB > >Which is still only half of the total IOMMU space available per >controller. 8 slots (and you're right, 6 is a practical upper limit, fewer for 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot limit becomes largely irrelevant. A typical quad Ethernet card, for example (and this is true for many/most multiple-device cards), has a bridge, its own internal PCI bus, and four "slots" ("devices" in PCI terminology). -- /Jonathan Lundell.
Re: alpha iommu fixes
Andrea Arcangeli wrote: > On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote: > > How many physical PCI slots on a Tsunami system? (I know the > > on tsunamis probably not many, but on a Typhoon (the one in the es40 > that is the 4-way extension) I don't know, but certainly the box is > large. > ES40 has either 8 or 10 PCI slots across 2 PCI buses. And then there's Wildfire - 14 slots per PCI drawer (4 PCI buses) * 2 drawers/QBB * 8 QBBs = 224 PCI slots & 64 PCI buses. BTW, Titan (aka ES45) has 10 slots as well, but with 3 buses instead. - Pete
Re: alpha iommu fixes
On Mon, May 21, 2001 at 01:19:59PM +0200, Andrea Arcangeli wrote:
> Alpha in mainline is just screwedup if a single pci bus tries to dynamic
> map more than 128mbyte, changing it to 512mbyte is trivial, growing more

Could you just describe the configuration where increasing the sg window from 128 to 512Mb actually fixes the "out of ptes" problem? I mean which drivers are involved, what kind of load, etc. I'm unable to reproduce it with an *8Mb* window, so I'm asking.

Ivan.
Re: alpha iommu fixes
Andi Kleen writes: > How about a new function (pci_nonrepresentable_address() or whatever) > that returns true when page cache contains pages that are not representable > physically as void *. On IA32 it would return true only if CONFIG_PAE is > true and there is memory >4GB. No, if we're going to change anything, let's do it right. Sure, you'll make this one check "portable", but the guts of the main ifdef stuff for DAC support is still there. I'd rather live with the hackish stuff temporarily, and get this all cleaned up in one shot when we have a real DAC support API. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote:
> How many physical PCI slots on a Tsunami system? (I know the
> answer, this question is rhetorical :-)

On Tsunamis probably not many, but on a Typhoon (the one in the ES40, that is the 4-way extension) I don't know; certainly the box is large.

Andrea
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:59:58AM -0700, David S. Miller wrote:
> This still leaves around 800MB IOMMU space free on that sparc64 PCI
> controller.

If it was 400mbyte you were screwed too; the point here is that the margin is way too small to allow ignoring the issue completely. Furthermore there can be fragmentation effects in the pagetables, at least in the way alpha manages them, which is to find contiguous virtual PCI bus addresses for each sg.

Alpha in mainline is just screwed up if a single PCI bus tries to dynamically map more than 128mbyte; changing it to 512mbyte is trivial, but growing more has performance implications, as it requires shrinking the direct windows, which I don't like, as it would also increase the number of machines that will get bitten by drivers that still use virt_to_bus, and increase the pressure on the iommu ptes too.

Now, I'm not asking to break the API for 2.4 to take care of that; you seem convinced about fixing this for 2.5 and I'm ok with that. I just changed the printk for running out of entries to be KERN_ERR at least, so we will know if somebody has real-life trouble with 2.4; if so, I will go HIGHMEM, which is a matter of 2 hours for me to implement. The only thing I suggest is to change the API before starting to fix the drivers; I mean: don't start checking for bus address 0 before changing the API to return failure in another way. It's true x86 reserves the zero page anyway because it's a magic bios thing, but for example on alpha such an unusable 0 bus address wastes 8mbyte of the DMA virtual bus addresses that we reserve for the ISA cards (of course we almost never need 16mbyte of ram all under ISA DMA, but since it's so low cost to allow that, I think we will, just in case).

Andrea
Re: alpha iommu fixes
Andrea Arcangeli writes: > On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote: > > max bytes per bttv: max_gbuffers * max_gbufsize > >64 * 0x208000 == 133.12MB > > > > 133.12MB * 8 PCI slots == ~1.06 GB > > > > Which is still only half of the total IOMMU space available per > > controller. > > and it is the double of the iommu space that I am going to reserve for > pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll > change to 512mbyte) How many physical PCI slots on a Tsunami system? (I know the answer, this question is rhetorical :-) See? This is why I think all these examples are silly, and we need to be realistic about this whole situation. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
> max bytes per bttv: max_gbuffers * max_gbufsize
> 64 * 0x208000 == 133.12MB
>
> 133.12MB * 8 PCI slots == ~1.06 GB
>
> Which is still only half of the total IOMMU space available per
> controller.

And it is double the iommu space that I am going to reserve for PCI dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll change it to 512mbyte). Also, bttv is not doing that large a DMA; by default it only uses 2 buffers in the ring. bttv is not a good example of what can really overflow the PCI virtual address space in real life (when I mentioned it, it was only to point out that it still uses virt_to_bus); filling a PCI bus with bttv cards sounds quite silly anyway ;)

Andrea
Re: alpha iommu fixes
Andrea Arcangeli writes: > On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote: > > I think such designs which gobble up a gig or so of DMA mappings on > they maps something like 200mbyte I think. I also seen other cards doing > the same kind of stuff again for the distributed computing. Ok, 200MB, let's see what this gives us as an example. 200MB multiplied by 6 PCI slots, which uses up about 1.2GB IOMMU space. This still leaves around 800MB IOMMU space free on that sparc64 PCI controller. It wouldn't run out of space, and this is assuming that Sun ever made a sparc64 system with 6 physical PCI card slots (I don't think they ever did honestly, I think 4 physical card slots was the maximum). Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
> I think such designs which gobble up a gig or so of DMA mappings on

They map something like 200mbyte I think. I have also seen other cards doing the same kind of stuff, again for distributed computing.

> to be using dual address cycles, ie. 64-bit PCI addressing. It is
> the same response you would give to someone trying to obtain 3 or more
> gigabytes of user address space in a process on x86, right? You might

I have never seen those running on 64bit boxes, even if they are supposed to run there too. Here it's a little different: the 32bit virtual address space limitation isn't always a showstopper for those kinds of CPU-intensive apps (they don't need huge caches).

> respond to that person "What you really need is x86-64." for example
> :-)

For the 32bit virtual address space issues, of course yes ;)

> To me, from this perspective, the Quadrics sounds instead like a very
> broken piece of hardware. And in any event, is there even a Quadrics

They're not the only ones doing that; I have seen others doing that kind of stuff. It's just a matter of moving memory around fast across a cluster: if you delegate that work to a separate engine (btw they run a 32bit sparc cpu, also guess why they aren't pci64) you can spend many more cycles of the main CPU on the userspace computations.

> driver for sparc64? :-) (I'm a free software old-fart, so please
> excuse my immediate association between "high end" and "proprietary"
> :-)

:)

Andrea
Re: alpha iommu fixes
Andi Kleen writes:
> On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
> > egrep illegal_highdma net/core/dev.c
> There is just no portable way for the driver to figure out if it should
> set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally
> set even on IA32. Currently it requires an architecture ifdef to set properly.

Well, certainly, this could perhaps be a bug in the Acenic driver. It should check if DAC cycles can be used on the platform, for example.

But please, let's get back to the original problem though. The original claim is that the situation was not handled at all. All I'm trying to say is simply that the net stack does check, via illegal_highdma(), the condition you stated was not being checked at all. To me it sounded like you were claiming that HIGHMEM pages went totally unchecked through device transmit, and that is totally untrue. If you were trying to point out the problem with what the Acenic driver is doing, just state that next time ok? :-)

There is no question that what Acenic is doing with ifdefs needs a clean portable solution. This will be part of the 64-bit DAC API interfaces (whenever those become really necessary; I simply don't see the need right now).

Plainly, I'm going to be highly reluctant to make changes to the PCI dma API in 2.4.x. It is already hard enough to get all the PCI drivers in line and using it. Suggesting this kind of change is similar to saying "let's change the arguments to request_irq()". We would do it to fix a true "people actually hit this" kind of bug, of course. Yet we would avoid it at all possible costs due to the disruption this would cause.

I'm not trying to be a big bad guy about this. What I'm trying to do is make sure at least one person (me :-) is thinking about the ramifications any such change has on all current drivers which use these interfaces already. And also, on port maintainers...

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
> This without considering bttv and friends are not even trying to use the > pci_map_* yet, I hope you don't watch TV on your sparc64 if you have > enough ram. The bttv devel versions[1] are fixed already, they should work out-of-the box on sparc too. Just watching TV is harmless (needs lots of I/O bandwidth, but doesn't need much address space). Video capture does a better job on eating iommu resources ... Gerd [1] http://bytesex.org/bttv/, 0.8.x versions. -- Gerd Knorr <[EMAIL PROTECTED]> -- SuSE Labs, Außenstelle Berlin
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote: > > Andi Kleen writes: > > [BTW, the 2.4.4 netstack does not seem to make any attempt to handle the > > pagecache > 4GB case on IA32 for sendfile, as the pci_* functions are dummies > > here. It probably needs bounce buffers there for this case] > > egrep illegal_highdma net/core/dev.c There is just no portable way for the driver to figure out if it should set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally set even on IA32. Currently it requires an architecture ifdef to set properly. -Andi
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote:
> > That's currently the case, but at least on IA32 the block layer
> > must be fixed soon because it's a serious performance problem in
> > some cases (and fixing it is not very hard).
>
> If such a far reaching change goes into 2.4.x, I would probably
> begin looking at enhancing the PCI dma interfaces as needed ;-)

Hmm, I don't think it'll be a far reaching change. As far as I can see, all it needs is a new entry point for block device drivers that uses bh->b_page. When that entry point exists, skip the create_bounce call in __make_request. After that it is purely a problem for selected drivers.

[BTW, the 2.4.4 netstack does not seem to make any attempt to handle the pagecache > 4GB case on IA32 for sendfile, as the pci_* functions are dummies here. It probably needs bounce buffers there for this case]

-Andi
Re: alpha iommu fixes
> I can be really wrong on this because I didn't checked anything about > the GART yet but I suspect you cannot use the GART for this stuff on > ia32 in 2.4 because I think I recall it provides not an huge marging of > mapping entries that so would far too easily trigger the bugs in the > device drivers not checking for pci_map_* faliures also in a common > desktop/webserver/fileserver kind of usage of an high end machine. Not all chipsets support reading through GART address space from PCI either, it is meant for AGP to use.
Re: alpha iommu fixes
David S. Miller writes: > > 1) I showed you in a private email that I calculated the >maximum possible IOMMU space that one could allocate >to bttv cards in a fully loaded Sunfire sparc64 system >to be between 300MB and 400MB. This is assuming that >every PCI slot contained a bttv card, and it still >used only ~%35 of the available IOMMU resources. This is totally wrong in two ways. Let me fix this, the IOMMU on these machines is per PCI bus, so this figure should be drastically lower. Electrically (someone correct me, I'm probably wrong) PCI is limited to 6 physical plug-in slots I believe, let's say it's 8 to choose an arbitrary larger number to be safe. Then we have: max bytes per bttv: max_gbuffers * max_gbufsize 64 * 0x208000 == 133.12MB 133.12MB * 8 PCI slots == ~1.06 GB Which is still only half of the total IOMMU space available per controller. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
Andrea Arcangeli writes:
> I just gave you a test case that triggers on sparc64 in an earlier email.

If you are talking about the bttv card:

1) I showed you in a private email that I calculated the maximum possible IOMMU space that one could allocate to bttv cards in a fully loaded Sunfire sparc64 system to be between 300MB and 400MB. This is assuming that every PCI slot contained a bttv card, and it still used only ~35% of the available IOMMU resources.

2) It currently doesn't even use the portable APIs yet anyways, so effectively it is not supported on sparc64.

The only other examples you showed were theoretical, for cards and configurations that simply are not supported or cannot happen on sparc64 with current kernels.

> Chris just gave a real world example of applications where that kind of
> design is useful and there are certainly other kind of apps where that
> kind of hardware design can be useful too.
>
> A name of an high end pci32 card that AFIK can trigger those bugs is the
> Quadrics which is a very nice piece of hardware btw.

I think such designs which gobble up a gig or so of DMA mappings on pci32 are not useful in the slightest. These cards really ought to be using dual address cycles, ie. 64-bit PCI addressing. It is the same response you would give to someone trying to obtain 3 or more gigabytes of user address space in a process on x86, right? You might respond to that person "What you really need is x86-64." for example :-)

To me, from this perspective, the Quadrics sounds instead like a very broken piece of hardware. And in any event, is there even a Quadrics driver for sparc64? :-) (I'm a free software old-fart, so please excuse my immediate association between "high end" and "proprietary" :-)

Finally Andrea, have you even begun to consider the possible starvation cases once we make this a resource allocation which can fail under "normal" conditions? Maybe a device eats all the IOMMU entries and immediately obtains a new mapping whenever it frees one, effectively keeping all other devices out. This may be easily solved, I don't know. But this, along with the potential scsi layer issues, is basically the reason I'm trying hard to keep the API as it is right now for 2.4.x. Changing this in 2.4.x is going to open up Pandora's Box, really.

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 11:42:16AM +0200, Andi Kleen wrote:
> [actually most IA32 boxes already have one in form of the AGP GART, it's just
> not commonly used for serious things yet]

I can be really wrong on this because I haven't checked anything about the GART yet, but I suspect you cannot use the GART for this stuff on ia32 in 2.4: I think I recall it does not provide a huge margin of mapping entries, and so it would far too easily trigger the bugs in the device drivers that don't check for pci_map_* failures, even in a common desktop/webserver/fileserver kind of usage of a high-end machine.

Andrea
Re: alpha iommu fixes
Andi Kleen writes:
> > Certainly, when this changes, we can make the interfaces adapt to
> > this.
>
> I am just curious why you didn't consider that case when designing the
> interfaces. Was that a deliberate decision or just an oversight?
> [I guess the first, but why?]

I wanted the API to do exactly what we needed it to do, and not one bit more. I tried very hard to keep it as minimal as possible, and I even fought many additions to the API (a few of which turned out to be reasonable, see the pci_pool threads). To this end, since HIGHMEM is needed anyways on such machines (ie. the "sizeof(void *)" issue), I decided to not consider that case.

Working on pages is useful even _ignoring_ the specific issues we are talking about. It really is the one generic way to represent all pieces of memory inside the kernel (regardless of HIGHMEM and similar issues). But I simply did not see anyone who would really make use of it in the 2.4.x timeframe. (And I made this estimate in the middle of 2.3.x, so I didn't even see zerocopy coming along so clearly, shrug.)

> That's currently the case, but at least on IA32 the block layer
> must be fixed soon because it's a serious performance problem in
> some cases (and fixing it is not very hard).

If such a far reaching change goes into 2.4.x, I would probably begin looking at enhancing the PCI dma interfaces as needed ;-)

> Now that will probably first use DAC
> and not an IO-MMU, and thus not use the pci mapping API, but I would not be
> surprised if people came up with IO-MMU schemes for it too.
> [actually most IA32 boxes already have one in form of the AGP GART, it's just
> not commonly used for serious things yet]

DAC usage should go through a portable PCI dma API as well, for the reasons you mention as well as others. If we do this from the beginning, there will be no chance for things like virt_to_bus64() et al. to start sneaking into the PCI drivers :-)

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 02:30:09AM -0700, David S. Miller wrote:
> > Andi Kleen writes:
> > On the topic of the PCI DMA code: one thing I'm missing
> > are pci_map_single()/pci_map_sg() that take struct page * instead
> > of direct pointers. Currently I don't see how you would implement IO-MMU IO
> > on a 32bit box with more than 4GB of memory, because the address won't
> > fit into the pointer.
>
> How does the buffer get there in the first place? :-)

I guess you know ;) e.g. via page table tricks from user space, like the PAE mode on IA32, or via kmap.

> Certainly, when this changes, we can make the interfaces adapt to
> this.

I am just curious why you didn't consider that case when designing the interfaces. Was that a deliberate decision or just an oversight? [I guess the first, but why?]

> Because of this, for example, the sbus IOMMU stuff on sparc32 still
> uses HIGHMEM exactly because of this pointer limitation. In fact,
> any machine using >4GB of memory currently cannot be supported without
> highmem enabled, which is going to enable bounce buffering in the block
> I/O layer, etc.

That's currently the case, but at least on IA32 the block layer must be fixed soon because it's a serious performance problem in some cases (and fixing it is not very hard). Now that will probably first use DAC and not an IO-MMU, and thus not use the pci mapping API, but I would not be surprised if people came up with IO-MMU schemes for it too. [actually most IA32 boxes already have one in form of the AGP GART, it's just not commonly used for serious things yet]

-Andi
Re: alpha iommu fixes
Andi Kleen writes:
> On the topic of the PCI DMA code: one thing I'm missing
> are pci_map_single()/pci_map_sg() that take struct page * instead
> of direct pointers. Currently I don't see how you would implement IO-MMU IO
> on a 32bit box with more than 4GB of memory, because the address won't
> fit into the pointer.

How does the buffer get there in the first place? :-)

Yes, the zerocopy stuff is capable of doing this. But the block I/O layer is not, neither is any other subsystem to my knowledge. Certainly, when this changes, we can make the interfaces adapt to this.

Because of this, for example, the sbus IOMMU stuff on sparc32 still uses HIGHMEM exactly because of this pointer limitation. In fact, any machine using >4GB of memory currently cannot be supported without highmem enabled, which is going to enable bounce buffering in the block I/O layer, etc.

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On the topic of the PCI DMA code: one thing I'm missing are pci_map_single()/pci_map_sg() variants that take struct page * instead of direct pointers. Currently I don't see how you would implement IO-MMU IO on a 32bit box with more than 4GB of memory, because the address won't fit into the pointer.

-Andi
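A page-based variant along the lines Andi asks for might look like the following declaration sketch. The signatures here are an illustration, not a quote from this thread (though the kernel later did grow a pci_map_page() of roughly this shape):

```c
/* Sketch: pass (struct page *, offset) instead of a kernel-virtual
 * pointer, so memory above 4GB stays representable on a 32-bit host. */
dma_addr_t pci_map_page(struct pci_dev *hwdev, struct page *page,
                        unsigned long offset, size_t size, int direction);
void pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_addr,
                    size_t size, int direction);
```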
Re: alpha iommu fixes
On Mon, May 21, 2001 at 12:05:40AM -0700, David S. Miller wrote:
> together. And it was agreed upon that the routines will not allow
> failure in 2.4.x and we would work on resolving this in 2.5.x and no
> sooner.

I'm glad you are at least considering fixing all those bugs for 2.5, but that doesn't change the fact that if somebody runs out of entries now with sparc64, the only thing I can do in the short term is use HIGHMEM, so that the serialization limiting the maximum amount of simultaneous pci32 DMA happens in the code that allocates the bounce buffers. Tell me a better way to get rid of those bugs altogether if you can.

Furthermore, some arch may not provide a huge number of entries for the legacy pci32 cards, so you could more easily trigger those bugs without needing uncommon hardware; those bugs render the iommu unusable for those archs in 2.4 because they would trigger the device driver bugs far too easily. Please tell Andrew to worry about that; if somebody had ever worried about it, we would have all network drivers correct by now, and the needed panics in the lowlevel scsi layer.

This is without considering that bttv and friends are not even trying to use pci_map_* yet; I hope you don't watch TV on your sparc64 if you have enough ram. I hate these kinds of broken compromises between something that works almost all the time and something that breaks when you are using more than a few harddisks and a few nics, and that is unfixable in the right way in the short term once it triggers (bttv is fixable in the short term of course; I'm only talking about when you run out of pci mappings).

Andrea
Re: alpha iommu fixes
Andrea Arcangeli writes: > Tell me the best way to get rid of those bugs altogether if you can. Please give me a test case that triggers the bug on sparc64 and I will promptly work on a fix, ok? I mean a test case you _actually_ trigger, not some fantasy case. In theory it can happen, but nobody is showing me that it actually does ever happen. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
> > Look at the history of kernel API's over time. Everything that can > > go wrong eventually does. > > I agree, and it will be dealt with in 2.5.x > > The scsi layer in 2.4.x is simply not able to handle failure in these > code paths, as Gerard Roudier has mentioned. On that I am unconvinced. It is certainly grungy enough that fighting that war in 2.5 makes sense however.
Re: alpha iommu fixes
Alan Cox writes: > Pages allocated in main memory and mapped for access by PCI devices. On some > HP systems there is no way for such a page to stay coherent. It is quite > possible to sync the view but there is no sane way to allow any > pci_alloc_consistent to succeed This is not what the HP folk told me, and in fact they said that pci_alloc_consistent could be made to work via disabling the cache attribute in the cpu side mappings or something similar in the PCI controller IOMMU mappings. Please someone on the HPPA team provide details :-) Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
Alan Cox writes: > Ok how about a PIV Xeon with 64Gb of memory and 5 AMI Megaraids, which are > limited to the low 2Gb range for pci mapping and otherwise need bounce buffers. > Or how about any consistent alloc on certain HP machines which totally lack > coherency - also I suspect the R10K on an O2 might fall into that - Ralf ? If they need bounce buffers because of a device specific DMA range limitation (this is what I gather this is), then the PCI dma interface is of no help to this case. > Look at the history of kernel API's over time. Everything that can > go wrong eventually does. I agree, and it will be dealt with in 2.5.x The scsi layer in 2.4.x is simply not able to handle failure in these code paths, as Gerard Roudier has mentioned. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
> Alan Cox writes: > > And how do you propose to implement cache coherent pci allocations > > on machines which lack the ability to have pages coherent between > > I/O and memory space ? > > Pages, being in memory space, are never in I/O space. Ok my fault. Let me try that again with clearer Linux terminology. Pages allocated in main memory and mapped for access by PCI devices. On some HP systems there is no way for such a page to stay coherent. It is quite possible to sync the view but there is no sane way to allow any pci_alloc_consistent to succeed Alan
Re: alpha iommu fixes
> What are these "devices", and what drivers "just program the cards to > start the dma on those hundred mbyte of ram"? > > Are we designing Linux for hypothetical systems with hypothetical > devices and drivers, or for the real world? Ok how about a PIV Xeon with 64Gb of memory and 5 AMI Megaraids, which are limited to the low 2Gb range for pci mapping and otherwise need bounce buffers. Or how about any consistent alloc on certain HP machines which totally lack coherency - also I suspect the R10K on an O2 might fall into that - Ralf ? Look at the history of kernel API's over time. Everything that can go wrong eventually does.
Re: alpha iommu fixes
> Andrew Morton writes: > > Well this is news to me. No drivers understand this. > > How long has this been the case? What platforms? > > The DMA interfaces may never fail and I've discussed this over and > over with port maintainers a _long_ time ago. And how do you propose to implement cache coherent pci allocations on machines which lack the ability to have pages coherent between I/O and memory space ?
Re: alpha iommu fixes
Andrea Arcangeli writes: > Assume I have a dozen PCI cards that do DMA using SG tables that > can map up to some hundred mbytes of ram each, so I can just program > the cards to start the dma on those hundred mbyte of ram, most of the > time the I/O is not simultaneous, but very rarely it happens to be > simultaneous and in turn it tries to pci_map_sg more than 4G of physical > ram. What are these "devices", and what drivers "just program the cards to start the dma on those hundred mbyte of ram"? Are we designing Linux for hypothetical systems with hypothetical devices and drivers, or for the real world? Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
Andrea Arcangeli writes: > I just gave you a test case that triggers on sparc64 in an earlier email. If you are talking about the bttv card: 1) I showed you in a private email that I calculated the maximum possible IOMMU space that one could allocate to bttv cards in a fully loaded Sunfire sparc64 system to be between 300MB and 400MB. This is assuming that every PCI slot contained a bttv card, and it still used only ~35% of the available IOMMU resources. 2) It currently doesn't even use the portable APIs yet anyways, so effectively it is not supported on sparc64. The only other examples you showed were theoretical, for cards and configurations that simply are not supported or cannot happen on sparc64 with current kernels. > Chris just gave a real world example of applications where that kind of design is useful and there are certainly other kinds of apps where that kind of hardware design can be useful too. A name of a high end pci32 card that AFAIK can trigger those bugs is the Quadrics, which is a very nice piece of hardware btw. I think such designs which gobble up a gig or so of DMA mappings on pci32 are not useful in the slightest. These cards really ought to be using dual address cycles, ie. 64-bit PCI addressing. It is the same response you would give to someone trying to obtain 3 or more gigabytes of user address space in a process on x86, right? You might respond to that person "What you really need is x86-64." for example :-) To me, from this perspective, the Quadrics sounds instead like a very broken piece of hardware. And in any event, is there even a Quadrics driver for sparc64? :-) (I'm a free software old-fart, so please excuse my immediate association between "high end" and "proprietary" :-) Finally Andrea, have you even begun to consider the possible starvation cases once we make this a resource allocation which can fail under normal conditions? Maybe the device eating all the IOMMU entries immediately obtains a new mapping whenever it frees one, effectively keeping out all other devices. This may be easily solved, I don't know. But this, along with the potential scsi layer issues, is basically the reason I'm trying hard to keep the API as it is right now for 2.4.x. Changing this in 2.4.x is going to open up Pandora's Box, really. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
> This without considering bttv and friends are not even trying to use the pci_map_* yet, I hope you don't watch TV on your sparc64 if you have enough ram. The bttv devel versions[1] are fixed already, they should work out-of-the-box on sparc too. Just watching TV is harmless (needs lots of I/O bandwidth, but doesn't need much address space). Video capture does a better job of eating iommu resources ... Gerd [1] http://bytesex.org/bttv/, 0.8.x versions. -- Gerd Knorr [EMAIL PROTECTED] -- SuSE Labs, Außenstelle Berlin
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:59:58AM -0700, David S. Miller wrote: > This still leaves around 800MB IOMMU space free on that sparc64 PCI controller. if it was 400mbyte you were screwed too, the point here is that the margin is way too small to allow ignoring the issue completely, furthermore there can be fragmentation effects in the pagetables, at least in the way alpha manages them, which is to find contiguous virtual pci bus addresses for each sg. Alpha in mainline is just screwed up if a single pci bus tries to dynamically map more than 128mbyte, changing it to 512mbyte is trivial, growing more has performance implications as it needs to reduce the direct windows, which I don't like as it would also increase the number of machines that get bitten by drivers that still use virt_to_bus and also increase the pressure on the iommu ptes too. Now I'm not asking to break the API for 2.4 to take care of that, you seem convinced to fix this for 2.5 and I'm ok with that, I just changed the printk of running out of entries to be KERN_ERR at least, so we know about it; if somebody has real life troubles with 2.4 I will go HIGHMEM, which is a matter of 2 hours for me to implement. The only thing I suggest is to change the API before starting to fix the drivers, I mean: don't start checking for bus address 0 before changing the API to return failure in another way. It's true x86 is reserving the zero page anyways because it's a magic bios thing, but for example on the alpha such a 0 bus address that we cannot use wastes 8 mbyte of DMA virtual bus addresses that we reserve for the ISA cards (of course we almost never need 16mbyte of ram all under isa dma but since it's so low cost to allow that I think we will, just in case). Andrea
Re: alpha iommu fixes
Andrea Arcangeli writes: > On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote: > > max bytes per bttv: max_gbuffers * max_gbufsize > > 64 * 0x208000 == 133.12MB > > 133.12MB * 8 PCI slots == ~1.06 GB > > Which is still only half of the total IOMMU space available per controller. > and it is the double of the iommu space that I am going to reserve for pci dynamic mappings on the tsunami (right now it is 128Mbyte... and I'll change to 512mbyte) How many physical PCI slots on a Tsunami system? (I know the answer, this question is rhetorical :-) See? This is why I think all these examples are silly, and we need to be realistic about this whole situation. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:51:51PM +0400, Ivan Kokshaysky wrote: > I'm unable to reproduce it with *8Mb* window, so I'm asking. Me either. But Tom Vier, the guy who started this thread, was able to use up the 8MB. Which is completely believable. The following should alleviate the situation on these smaller machines where the direct map does cover all physical memory. Really, we were failing gratuitously before. On Tsunami and Titan, especially with more than 4G ram, we should probably just go ahead and allocate the 512M or 1G scatter-gather arena. (BTW, Andrea, it's easy enough to work around the Cypress problem by marking the last 1M of the 1G arena in use.) r~

diff -ruNp linux/arch/alpha/kernel/pci_iommu.c linux-new/arch/alpha/kernel/pci_iommu.c
--- linux/arch/alpha/kernel/pci_iommu.c Fri Mar 2 11:12:07 2001
+++ linux-new/arch/alpha/kernel/pci_iommu.c Mon May 21 01:25:25 2001
@@ -402,8 +402,20 @@ sg_fill(struct scatterlist *leader, stru
 	paddr &= ~PAGE_MASK;
 	npages = calc_npages(paddr + size);
 	dma_ofs = iommu_arena_alloc(arena, npages);
-	if (dma_ofs < 0)
-		return -1;
+	if (dma_ofs < 0) {
+		/* If we attempted a direct map above but failed, die. */
+		if (leader->dma_address == 0)
+			return -1;
+
+		/* Otherwise, break up the remaining virtually contiguous
+		   hunks into individual direct maps. */
+		for (sg = leader; sg < end; ++sg)
+			if (sg->dma_address == 2 || sg->dma_address == -2)
+				sg->dma_address = 0;
+
+		/* Retry. */
+		return sg_fill(leader, end, out, arena, max_dma);
+	}
 
 	out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
 	out->dma_length = size;
Re: alpha iommu fixes
On Mon, May 21, 2001 at 10:53:39AM -0700, Richard Henderson wrote: > should probably just go ahead and allocate the 512M or 1G scatter-gather arena. I just have a bugreport in my mailbox about pci_map failures even after I enlarged the window to 1G argghh (at first it looked apparently stable by growing the window), so I'm stuck again, it seems I was right in not being careless about the pci_map_* bugs today even if the 1G window looked to offer a reasonable margin at first. The pci_map_* failure triggers during a benchmark with a certain driver that does massive DMA (similar to the examples I gave previously), the developers of the driver simply told me the hardware wants to do massive zerocopy dma to userspace and they apparently excluded that it could be a memleak in the driver missing some pci_unmap_* after I told them to check for that. Even enabling HIGHMEM would not be enough because they do dma to userspace but on the network side, so it won't be taken care of by create_bounces(), so I at least would need to put another bounce buffer layer in the driver to make highmem work. Other more efficient ways to go besides highmem plus an additional bounce buffer layer are: 2) fixing all the buggy drivers now (would be a great pain as it seems to me I would have to do that alone, apparently, as it seems everybody else doesn't care about those bugs for 2.4) 3) letting the massive-DMA hardware use DAC Theoretically I could also cheat again and take way 4) that is to try to enlarge the window beyond 1G and see if the bugs get hidden during the benchmark that way too, but I would take this as a last resort as this would again not be a definitive solution and I'd risk getting stuck again tomorrow like I am right now. I think I will prefer to take the dirty way 3) just for those drivers to solve this production problem even if it won't be implemented in a generic manner at first (I got the idea from the quadrics folks who do this just now with their nics if I understood well).
If I understand correctly, on the tsunami enabling DAC simply means setting the pchip->pctl |= MWIN (monster window) bit during the boot stage on both pchips. Then the device driver of the massive-DMA hardware should simply program the registers of the nic to use DAC with bus addresses that are the phys address of the destination/source memory of the DMA, only changed to have bit 40 set to 1. Those should be all the changes needed to make pci64 work on tsunami at the same time as the pci32 direct/dynamic windows, and it would be very efficient; it sounds like the best way to work around the broken pci_map_* in 2.4 given that fixing pci_map_* the right way is a pain. Andrea
Re: alpha iommu fixes
David S. Miller writes: > 1) I showed you in a private email that I calculated the maximum > possible IOMMU space that one could allocate to bttv cards in a > fully loaded Sunfire sparc64 system to be between 300MB and 400MB. > This is assuming that every PCI slot contained a bttv card, and it > still used only ~35% of the available IOMMU resources. This is totally wrong in two ways. Let me fix this, the IOMMU on these machines is per PCI bus, so this figure should be drastically lower. Electrically (someone correct me, I'm probably wrong) PCI is limited to 6 physical plug-in slots I believe, let's say it's 8 to choose an arbitrary larger number to be safe. Then we have: max bytes per bttv: max_gbuffers * max_gbufsize 64 * 0x208000 == 133.12MB 133.12MB * 8 PCI slots == ~1.06 GB Which is still only half of the total IOMMU space available per controller. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21, 2001 at 06:55:29AM -0700, Jonathan Lundell wrote: > 8 slots (and you're right, 6 is a practical upper limit, fewer for 66 > MHz) *per bus*. Buses can proliferate like crazy, so the slot limit > becomes largely irrelevant. True, but the bandwidth limit is highly relevant. That's why modern systems have multiple root buses, not bridged ones. Ivan.
Re: alpha iommu fixes
On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote: > How many physical PCI slots on a Tsunami system? (I know the on tsunamis probably not many, but on a Typhoon (the one in the es40 that is the 4-way extension) I don't know, but certainly the box is large. Andrea
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote: > > [BTW, the 2.4.4 netstack does not seem to make any attempt to handle the > > pagecache 4GB case on IA32 for sendfile, as the pci_* functions are > > dummies here. It probably needs bounce buffers there for this case] > egrep illegal_highdma net/core/dev.c There is just no portable way for the driver to figure out if it should set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally set even on IA32. Currently it requires an architecture ifdef to set properly. -Andi
Re: alpha iommu fixes
On Mon, May 21, 2001 at 02:30:09AM -0700, David S. Miller wrote: > Andi Kleen writes: > > On the topic of the PCI DMA code: one thing I'm missing are > > pci_map_single()/pci_map_sg() that take struct page * instead of > > direct pointers. Currently I don't see how you would implement > > IO-MMU IO on a 32bit box with more than 4GB of memory, because the > > address won't fit into the pointer. > How does the buffer get there in the first place? :-) I guess you know ;) e.g. via page table tricks from user space, like the PAE mode on IA32, or via kmap. > Certainly, when this changes, we can make the interfaces adapt to this. I am just curious why you didn't consider that case when designing the interfaces. Was that a deliberate decision or just an oversight? [I guess the first, but why?] > Because of this, for example, the sbus IOMMU stuff on sparc32 still > uses HIGHMEM exactly because of this pointer limitation. In fact, any > machine using 4GB of memory currently cannot be supported without > highmem enabled, which is going to enable bounce buffering in the > block I/O layer, etc. That's currently the case, but at least on IA32 the block layer must be fixed soon because it's a serious performance problem in some cases (and fixing it is not very hard). Now that will probably first use DAC and not an IO-MMU, and thus not use the pci mapping API, but I would not be surprised if people came up with IO-MMU schemes for it too. [actually most IA32 boxes already have one in form of the AGP GART, it's just not commonly used for serious things yet] -Andi
Re: alpha iommu fixes
Andi Kleen writes: > On the topic of the PCI DMA code: one thing I'm missing are > pci_map_single()/pci_map_sg() that take struct page * instead of > direct pointers. Currently I don't see how you would implement IO-MMU > IO on a 32bit box with more than 4GB of memory, because the address > won't fit into the pointer. How does the buffer get there in the first place? :-) Yes, the zerocopy stuff is capable of doing this. But the block I/O layer is not, neither is any other subsystem to my knowledge. Certainly, when this changes, we can make the interfaces adapt to this. Because of this, for example, the sbus IOMMU stuff on sparc32 still uses HIGHMEM exactly because of this pointer limitation. In fact, any machine using 4GB of memory currently cannot be supported without highmem enabled, which is going to enable bounce buffering in the block I/O layer, etc. Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On Mon, May 21 2001, Andi Kleen wrote: > On Mon, May 21, 2001 at 03:00:24AM -0700, David S. Miller wrote: > > > That's currently the case, but at least on IA32 the block layer must > > > be fixed soon because it's a serious performance problem in some > > > cases (and fixing it is not very hard). > > If such a far reaching change goes into 2.4.x, I would probably begin > > looking at enhancing the PCI dma interfaces as needed ;-) > Hmm, I don't think it'll be a far reaching change. As far as I can see > all it needs is a new entry point for block device drivers that uses > bh->b_page. When that entry point exists, skip the create_bounce call > in __make_request. After that it is purely a problem for selected > drivers. I've already done it, however not as a 2.4 solution. The partial and WIP patches are here: *.kernel.org/pub/linux/kernel/people/axboe/v2.5/bio-7 Block drivers can indicate the need for bounce buffers above a certain page. Of course I can hack up something for 2.4 as well, but is this really a pressing need? -- Jens Axboe
Re: alpha iommu fixes
Andrea Arcangeli wrote: > On Mon, May 21, 2001 at 04:04:28AM -0700, David S. Miller wrote: > > How many physical PCI slots on a Tsunami system? (I know the > on tsunamis probably not many, but on a Typhoon (the one in the es40 > that is the 4-way extension) I don't know, but certainly the box is large. ES40 has either 8 or 10 PCI slots across 2 PCI buses. And then there's Wildfire - 14 slots per PCI drawer (4 PCI buses) * 2 drawers/QBB * 8 QBBs = 224 PCI slots, 64 PCI buses. BTW, Titan (aka ES45) has 10 slots as well, but with 3 buses instead. - Pete
Re: alpha iommu fixes
> I can be really wrong on this because I didn't check anything about the > GART yet but I suspect you cannot use the GART for this stuff on ia32 > in 2.4 because I think I recall it does not provide a huge margin of > mapping entries, and so it would far too easily trigger the bugs in the > device drivers not checking for pci_map_* failures, also in a common > desktop/webserver/fileserver kind of usage of a high end machine. Not all chipsets support reading through GART address space from PCI either, it is meant for AGP to use.
Re: alpha iommu fixes
On Mon, May 21, 2001 at 01:19:59PM +0200, Andrea Arcangeli wrote:
> Alpha in mainline is just screwed up if a single pci bus tries to dynamically map more than 128mbyte, changing it to 512mbyte is trivial, growing more

Could you just describe the configuration where increasing the sg window from 128 to 512Mb actually fixes the out-of-PTEs problem? I mean which drivers are involved, what kind of load, etc. I'm unable to reproduce it with an *8Mb* window, so I'm asking.

Ivan.
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
> I think such designs which gobble up a gig or so of DMA mappings on

they map something like 200mbyte I think. I have also seen other cards doing the same kind of stuff, again for distributed computing.

> to be using dual address cycles, ie. 64-bit PCI addressing. It is the
> same response you would give to someone trying to obtain 3 or more
> gigabytes of user address space in a process on x86, right? You might

I have never seen those running on 64bit boxes, even if they are supposed to run there too. Here it's a little different: the 32bit virtual address space limitation isn't always a showstopper for those kinds of CPU-intensive apps (they don't need huge caches).

> respond to that person "What you really need is x86-64." for example :-)

for the 32bit virtual address space issues, of course yes ;)

> To me, from this perspective, the Quadrics sounds instead like a very
> broken piece of hardware. And in any event, is there even a Quadrics

they're not the only ones doing that, I have seen others doing that kind of stuff. It's just a matter of moving information (memory) fast across a cluster: if you delegate that work to a separate engine (btw they run a 32bit sparc cpu, also guess why they aren't pci64) you can spend many more cycles of the main CPU on the userspace computations.

> driver for sparc64? :-) (I'm a free software old-fart, so please excuse
> my immediate association between high end and proprietary :-)

:)

Andrea
Re: alpha iommu fixes
Andi Kleen writes:
> Certainly, when this changes, we can make the interfaces adapt to this. I am just curious why you didn't consider that case when designing the interfaces. Was that a deliberate decision or just an oversight? [I guess the first, but why?]

I wanted the API to do exactly what we needed it to do and not one bit more. I tried very hard to keep it as minimal as possible, and I even fought many additions to the API (a few of which turned out to be reasonable, see the pci_pool threads). To this end, since HIGHMEM is needed anyway on such machines (ie. the sizeof(void *) issue), I decided not to consider that case.

Working on pages is useful even _ignoring_ the specific issues we are talking about. It really is the one generic way to represent all pieces of memory inside the kernel (regardless of HIGHMEM and similar issues). But I simply did not see anyone who would really make use of it in the 2.4.x timeframe. (And I made this estimate in the middle of 2.3.x, so I didn't even see zerocopy coming along so clearly, shrug.)

> That's currently the case, but at least on IA32 the block layer must be fixed soon because it's a serious performance problem in some cases (and fixing it is not very hard).

If such a far reaching change goes into 2.4.x, I would probably begin looking at enhancing the PCI dma interfaces as needed ;-)

> Now that will probably first use DAC and not an IO-MMU, and thus not use the pci mapping API, but I would not be surprised if people came up with IO-MMU schemes for it too. [actually most IA32 boxes already have one in the form of the AGP GART, it's just not commonly used for serious things yet]

DAC usage should go through a portable PCI dma API as well, for the reasons you mention as well as others. If we do this from the beginning, there will be no chance for things like virt_to_bus64() et al. to start sneaking into the PCI drivers :-)

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
Andi Kleen writes:
> On Mon, May 21, 2001 at 03:34:50AM -0700, David S. Miller wrote:
> > egrep illegal_highdma net/core/dev.c
> There is just no portable way for the driver to figure out if it should set this flag or not. e.g. acenic.c gets it wrong: it is unconditionally set even on IA32. Currently it requires an architecture ifdef to set properly.

Well, certainly, this could perhaps be a bug in the Acenic driver. It should check whether DAC cycles can be used on the platform, for example.

But please, let's get back to the original problem. The original claim was that the situation was not handled at all. All I'm trying to say is simply that the net stack does check, via illegal_highdma(), the condition you stated was not being checked at all. To me it sounded like you were claiming that HIGHMEM pages went totally unchecked through device transmit, and that is totally untrue. If you were trying to point out the problem with what the Acenic driver is doing, just state that next time, ok? :-)

There is no question that what Acenic is doing with ifdefs needs a clean, portable solution. This will be part of the 64-bit DAC API interfaces (whenever those become really necessary; I simply don't see the need right now).

Plainly, I'm going to be highly reluctant to make changes to the PCI dma API in 2.4.x. It is already hard enough to get all the PCI drivers in line and using it. Suggesting this kind of change is similar to saying "let's change the arguments to request_irq()". We would do it to fix a true bug people actually hit, of course. Yet we would avoid it at all possible costs due to the disruption this would cause.

I'm not trying to be a big bad guy about this. What I'm trying to do is make sure at least one person (me :-) is thinking about the ramifications any such change has on all the current drivers which use these interfaces already. And also on port maintainers...

Later, David S. Miller [EMAIL PROTECTED]
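The illegal_highdma() check DaveM refers to can be modelled outside the kernel. Everything below is an illustrative assumption, not 2.4 source: the structures stand in for net_device and skb fragments, and HIGHMEM_START stands in for the boundary above which pages count as highmem (roughly 896MB on i386; the value here is arbitrary). The real check's shape is the same: a highmem fragment is only legal if the driver advertised high-DMA capability.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_F_HIGHDMA 0x1                 /* device can DMA to highmem pages */
#define HIGHMEM_START 0x38000000ull       /* assumed highmem boundary (~896MB) */

/* Invented stand-ins for struct net_device and an skb fragment. */
struct dev_caps { unsigned features; };
struct frag     { uint64_t page_phys; };  /* physical address of the frag's page */

/* A highmem fragment on a device without the feature flag must be
 * bounced (copied to low memory) before transmit. */
static int illegal_highdma(const struct dev_caps *dev, const struct frag *f)
{
    return f->page_phys >= HIGHMEM_START && !(dev->features & DEV_F_HIGHDMA);
}
```

This is also why unconditionally setting the flag, as the Acenic driver did, is wrong: the flag is a promise about what the hardware and platform can address, not a per-architecture constant.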
Re: alpha iommu fixes
On Mon, May 21, 2001 at 12:05:40AM -0700, David S. Miller wrote:
> together. And it was agreed upon that the routines will not allow failure in 2.4.x and we would work on resolving this in 2.5.x and no sooner.

I'm glad you at least considered fixing all those bugs for 2.5, but that doesn't change the fact that if somebody runs out of entries now, on sparc64 the only thing I can do in the short term is use HIGHMEM, so that the serialization limiting the maximum number of simultaneous pci32 DMA mappings happens in the code that allocates the bounce buffers. Tell me a better way to get rid of all those bugs together if you can.

Furthermore, some archs may not provide a huge number of entries for the legacy pci32 cards, so you could more easily trigger those bugs without needing uncommon hardware; those bugs render the iommu unusable on those archs in 2.4 because the device driver bugs would be triggered far too easily. Please tell Andrew to worry about that; if somebody had ever worried about it, we would have all the network drivers correct by now, and the needed panics in the lowlevel scsi layer. This is without considering that bttv and friends are not even trying to use pci_map_* yet; I hope you don't watch TV on your sparc64 if you have enough ram.

I hate those kinds of broken compromises between something that works almost all the time and something that breaks when you are using more than a few harddisks and a few nics, and that is unfixable in the right way in the short term after it triggers (bttv is fixable in the short term of course; I'm only talking about running out of pci mappings).

Andrea
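Andrea's worry is what happens when a fixed-size IOMMU window runs out of mapping entries and drivers never check. A toy allocator makes the failure mode concrete; all the names here are invented for illustration, since the very problem under discussion is that the 2.4 pci_map_single() had no error return for callers to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a fixed-size IOMMU window: a handful of mapping
 * entries, and a map routine that can genuinely fail. */
#define IOMMU_ENTRIES     4
#define DMA_MAPPING_ERROR UINT32_MAX

static uint8_t entry_used[IOMMU_ENTRIES];

static uint32_t toy_pci_map_single(void)
{
    for (size_t i = 0; i < IOMMU_ENTRIES; i++) {
        if (!entry_used[i]) {
            entry_used[i] = 1;
            return (uint32_t)i * 0x2000;   /* bus address of this entry */
        }
    }
    return DMA_MAPPING_ERROR;              /* window exhausted: caller MUST check */
}

static void toy_pci_unmap_single(uint32_t bus_addr)
{
    entry_used[bus_addr / 0x2000] = 0;
}
```

An interface shaped like this forces every driver to handle exhaustion; an interface that "may never fail" forces the platform either to panic or to silently hand out a wrong address once the window is gone, which is Andrea's complaint.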
Re: alpha iommu fixes
Andrea Arcangeli writes:
> On Mon, May 21, 2001 at 03:11:52AM -0700, David S. Miller wrote:
> > I think such designs which gobble up a gig or so of DMA mappings on
> they map something like 200mbyte I think. I have also seen other cards doing the same kind of stuff, again for distributed computing.

Ok, 200MB, let's see what this gives us as an example. 200MB multiplied by 6 PCI slots uses up about 1.2GB of IOMMU space. This still leaves around 800MB of IOMMU space free on that sparc64 PCI controller. It wouldn't run out of space, and this is assuming that Sun ever made a sparc64 system with 6 physical PCI card slots (I don't think they ever did, honestly; I think 4 physical card slots was the maximum).

Later, David S. Miller [EMAIL PROTECTED]
Re: alpha iommu fixes
On the topic of the PCI DMA code: one thing I'm missing are pci_map_single()/pci_map_sg() variants that take a struct page * instead of a direct pointer. Currently I don't see how you would implement IO-MMU IO on a 32bit box with more than 4GB of memory, because the address won't fit into the pointer.

-Andi
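Andi's point can be sketched outside the kernel: a (page, offset) pair can name physical memory that a 32-bit void * cannot. The structure and function names below are illustrative stand-ins, not kernel API; the arithmetic is the whole argument.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* With PAE, a 32-bit box can have physical pages above 4GB.  A page
 * frame number plus an offset still describes such memory exactly,
 * while a 32-bit pointer cannot hold the physical address. */
struct page_ref {
    uint64_t pfn;   /* page frame number */
};

/* Physical (bus-side) address of a byte inside the page. */
static uint64_t page_phys_addr(const struct page_ref *pg, uint32_t offset)
{
    return (pg->pfn << PAGE_SHIFT) + offset;
}

/* True if the address survives truncation to a 32-bit pointer. */
static int fits_in_void_ptr32(uint64_t phys)
{
    return phys <= 0xFFFFFFFFull;
}
```

This is why a page-based pci_map_*() variant is the natural fix: the mapping layer receives (pfn, offset) and programs the IOMMU with the full 64-bit physical address, and no void * ever has to represent it.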
Re: alpha iommu fixes
Andrew Morton writes:
> Well this is news to me. No drivers understand this. How long has this been the case? What platforms?
> The DMA interfaces may never fail and I've discussed this over and over with port maintainers a _long_ time ago.

And how do you propose to implement cache coherent pci allocations on machines which lack the ability to have pages coherent between I/O and memory space?
Re: alpha iommu fixes
At 3:19 AM -0700 2001-05-21, David S. Miller wrote:
> This is totally wrong in two ways. Let me fix this, the IOMMU on these machines is per PCI bus, so this figure should be drastically lower. Electrically (someone correct me, I'm probably wrong) PCI is limited to 6 physical plug-in slots I believe, let's say it's 8 to choose an arbitrary larger number to be safe. Then we have:
>
> max bytes per bttv: max_gbuffers * max_gbufsize
>                     64 * 0x208000 == 133.12MB
> 133.12MB * 8 PCI slots == ~1.06 GB
>
> Which is still only half of the total IOMMU space available per controller.

8 slots (and you're right, 6 is a practical upper limit, fewer for 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot limit becomes largely irrelevant. A typical quad Ethernet card, for example (and this is true for many/most multiple-device cards), has a bridge, its own internal PCI bus, and four slots (devices in PCI terminology).
-- 
/Jonathan Lundell.
Re: alpha iommu fixes
Andi Kleen writes:
> How about a new function (pci_nonrepresentable_address() or whatever) that returns true when the page cache contains pages that are not representable physically as void *. On IA32 it would return true only if CONFIG_PAE is set and there is memory above 4GB.

No, if we're going to change anything, let's do it right. Sure, you'll make this one check portable, but the guts of the main ifdef stuff for DAC support are still there. I'd rather live with the hackish stuff temporarily and get this all cleaned up in one shot when we have a real DAC support API.

Later, David S. Miller [EMAIL PROTECTED]
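For concreteness, Andi's proposed predicate could look something like the sketch below. Only the name comes from his mail; the body is an assumption built from his description (32-bit pointers plus PAE-style memory above 4GB), not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumed implementation of Andi's proposal: true when some physical
 * page lies beyond what a native pointer can represent. */
static int pci_nonrepresentable_address(uint64_t max_pfn, unsigned ptr_bits)
{
    uint64_t top = max_pfn << PAGE_SHIFT;   /* one past the highest physical byte */
    if (ptr_bits >= 64)
        return 0;                           /* 64-bit box: always representable */
    return top > (1ull << ptr_bits);        /* e.g. PAE memory above 4GB on IA32 */
}
```

DaveM's objection stands regardless of the exact body: the check is portable, but the driver code guarded by it would still need DAC-specific ifdefs until a real DAC API exists.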
Re: alpha iommu fixes
On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
> max bytes per bttv: max_gbuffers * max_gbufsize
>                     64 * 0x208000 == 133.12MB
> 133.12MB * 8 PCI slots == ~1.06 GB
>
> Which is still only half of the total IOMMU space available per controller.

and it is double the iommu space that I am going to reserve for pci dynamic mappings on the tsunami (right now it is 128Mbyte, and I'll change it to 512Mbyte). Also, bttv is not doing that large a DMA; by default it only uses 2 buffers in the ring. bttv is not a good example of what can really overflow the pci virtual address space in real life (when I mentioned it, it was only to point out that it still uses virt_to_bus); filling a pci bus with bttv cards sounds quite silly anyway ;)

Andrea