> > A real AMD64 machine can also run with more than 4GB of ram and do DMA
> > without having to bounce buffering to PCI devices.  We don't do
> > software bounce buffering yet to cope with this deficiency in
> > large-memory Intel AMD64-clones.
> 
> You're talking about DMA to really high memory, i.e. above physical
> 4GB, is that right?

Yes.  A real AMD64 machine can reach there.  An Intel one cannot.

> I'm not that clued in with hardware, and I don't
> know where to search to find out the answer to this, but: all I/O
> devices these days can do DMA to above 4GB in big 64-bit systems,

No, they cannot.  Especially when the physical connector of the card
is using 32 bit addressing.

> but
> a limitation in the Intel hardware means that the kernel has to
> intercept this to help,

It is not a limit of Intel hardware, per se.  It is a lack of anything
in hardware to do the translation.

> by catching it in a low memory buffer and then
> transferring the data to higher memory manually, or by doing
> memory-to-memory DMA into high memory, or something.

This is called bounce buffering.  It is a software technique, and
it is what we use on i386 machines to let the floppy controller work,
since the ISA DMA controller is limited; it can only DMA up to an address
of 16MB.

Same thing on PCI bus, except AMD fixed it in the AMD64.
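
A rough sketch of the bounce buffering idea in C, for illustration only.
This is not the kernel's code; the names (dma_reachable,
bounce_before_write, bounce_after_read) are made up here, and the
"device" is assumed to be able to reach only addresses below the limit:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The ISA DMA controller can only reach the first 16MB. */
#define ISA_DMA_LIMIT ((uint64_t)16 * 1024 * 1024)

/* Nonzero if [paddr, paddr + len) lies wholly below the limit. */
static int
dma_reachable(uint64_t paddr, size_t len, uint64_t limit)
{
	return len <= limit && paddr <= limit - len;
}

/*
 * Memory -> device: if the source is out of reach, the CPU first
 * copies it into a low buffer and the device DMAs from there.
 */
static void
bounce_before_write(void *low_buf, const void *src, size_t len)
{
	memcpy(low_buf, src, len);
}

/*
 * Device -> memory: the device DMAs into the low buffer, then the
 * CPU copies the data up to where it was really wanted.
 */
static void
bounce_after_read(void *dst, const void *low_buf, size_t len)
{
	memcpy(dst, low_buf, len);
}
```

The extra memcpy on every out-of-reach transfer is the cost of the
technique.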

> But the AMD64
> hardware can do it directly without help to bounce the data from low
> to high, supporting DMA directly to physical RAM 4GB+?

Right.  Using hacks to the GART, which let Jason Wright implement a
standard iommu model much like sparc64 has.  This lets you do 32-bit
limited physical DMA to a 64 bit physical address.  You can completely
stuff an AMD64 machine, and do DMA directly to/from any physical ram
in the machine.
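
As a toy model of what an iommu does (not the GART code itself; the
struct, the 16-page aperture, and the function names below are all
assumptions made up for illustration): the device is handed a 32-bit
address inside an aperture, and a table translates each page of that
aperture to a 64-bit physical page.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_MASK	(((uint64_t)1 << PAGE_SHIFT) - 1)
#define TOY_PAGES	16	/* tiny aperture, for illustration */

struct toy_iommu {
	uint32_t base;			/* device-visible aperture start */
	uint64_t phys[TOY_PAGES];	/* physical page behind each slot */
	uint8_t	 used[TOY_PAGES];
};

/*
 * Map the page containing paddr into a free slot.  Returns the 32-bit
 * address to give the device, or 0 if the aperture is full.
 */
static uint32_t
toy_iommu_map(struct toy_iommu *m, uint64_t paddr)
{
	for (size_t i = 0; i < TOY_PAGES; i++) {
		if (!m->used[i]) {
			m->used[i] = 1;
			m->phys[i] = paddr & ~PAGE_MASK;
			return m->base + (uint32_t)(i << PAGE_SHIFT) +
			    (uint32_t)(paddr & PAGE_MASK);
		}
	}
	return 0;
}

/*
 * What the hardware does on each device access: turn the 32-bit
 * device address back into the 64-bit physical one.  Returns
 * UINT64_MAX for an address that is not mapped.
 */
static uint64_t
toy_iommu_translate(const struct toy_iommu *m, uint32_t daddr)
{
	if (daddr < m->base)
		return UINT64_MAX;
	size_t i = (daddr - m->base) >> PAGE_SHIFT;
	if (i >= TOY_PAGES || !m->used[i])
		return UINT64_MAX;
	return m->phys[i] | (daddr & PAGE_MASK);
}
```

The device only ever sees 32-bit addresses, and no data is copied; the
only work is setting up and tearing down the table entries.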

> But none of this applies to devices that want to DMA below 4GB,
> because that has always been supported.  It only applies on Intel
> machines with 4GB+ RAM when devices want to DMA to 4GB+ but their DMA
> chips or the processor itself can't do it without software help from
> the kernel.

We only use the iommu if the machine has > 4GB of ram.  If it has less
than 4GB of ram, there is no point in enabling the iommu.
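
Expressed as a check (a sketch only; iommu_needed is a made-up name,
and ram_top is assumed to be the address just past the highest byte of
physical RAM):

```c
#include <stdint.h>

#define FOUR_GB	((uint64_t)4 * 1024 * 1024 * 1024)

/* Nonzero if some RAM sits above what a 32-bit DMA address can reach. */
static int
iommu_needed(uint64_t ram_top)
{
	return ram_top > FOUR_GB;
}
```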

The iommu code is slightly slower, since it has to do mapping
operations at various times.  But it is a hell of a lot better than
bounce buffering.
