Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-11 Thread Paul Walmsley
On Fri, 8 Aug 2008, Catalin Marinas wrote:

 There are already CPUs with weaker memory ordering model than ARM (e.g.
 Alpha) and they are supported by Linux. Of course, there may be problems
 with drivers since most of them are developed in x86.

For the OMAP SoC, most of the drivers are specific to ARM.  The devices 
just aren't available for x86 or other architectures.  (With a few 
exceptions - smc91x and MUSB are the two that come to mind.)


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-08 Thread Russell King - ARM Linux
On Thu, Aug 07, 2008 at 06:07:50PM -0500, Woodruff, Richard wrote:
 If I write a series of control register commands to device A, then
 write a go operation to device  B, I would hope all of A's writes
 had completed before B gets the go.  SO gives you this.  DEVICE may
 not without barriers.

This, I think, is where the problem lies.

Device regions of the same type *are* ordered with respect to each other.

So, shared device accesses occur in program order.  Unshared device
accesses occur in program order.

However, shared device accesses may occur out of order with unshared
device accesses or memory accesses.  Unshared device accesses may
occur out of order with shared device accesses or memory accesses.

So, if both device A and device B are mapped as shared devices, then
accesses to both occur in program order.

If device A is mapped as a shared device and device B as an unshared
device, then you have to use read backs and possibly barriers to
ensure ordering.  Remember, a barrier only affects up to the CPU.
It doesn't affect write posting downstream of the CPU.


RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-08 Thread Paul Walmsley
On Thu, 7 Aug 2008, Catalin Marinas wrote:

 Many of the architecture people in ARM seem to be on holiday, I'll try
 to get clarification in about a week's time.

That would be really great.  For reference, here is the assembly code in 
question:

http://www.mail-archive.com/linux-omap@vger.kernel.org/msg01349.html

It's the code following the omap34xx_sram_configure_core_dpll entry point.


- Paul


Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-08 Thread Catalin Marinas
On Thu, 2008-08-07 at 22:20 +0100, Russell King - ARM Linux wrote:
 On Thu, Aug 07, 2008 at 03:38:55PM -0500, Woodruff, Richard wrote:
  
   From: Russell King - ARM Linux [mailto:[EMAIL PROTECTED]
    Is DEVICE really safe for things other than FIFOs without the use of
    barriers?
  
   As far as I'm aware, yes - and that comment is based solely upon the
   fact that no one has reported any problems with the kernel which have
   been tracked down to using the device memory type on ARMv6 and above...
  
    We do in some drivers today get spurious interrupts when DEVICE is
    used but don't see them when using SO.
  
   ... until now, or even that very sentence.
  
  That is our fault then I suppose for not discussing this on arm-linux.
  In OMAP2 and OMAP3 this has been observed.  In vendor kernels where
  time stands still and lots of validation has happened we did stick
  with SO for OMAP2.  On some internal kernels already we have gone to
  SO for OMAP3 as customers ramp and need the errors gone.  The faster
  the system clocks the more it seems to show.
 
 To do that, and then ask about when Linux is going to start exploiting
 the weak memory types is a little unfair don't you think?

There are already CPUs with weaker memory ordering model than ARM (e.g.
Alpha) and they are supported by Linux. Of course, there may be problems
with drivers since most of them are developed in x86.

On the strongly-ordered instead of normal uncached memory, the unaligned
accesses to SO memory which are not faulted have unpredictable behaviour
(according to the ARM ARM, though some v7 implementations may not be
bothered). If you use such memory for skbuff for example, is there any
risk of unaligned accesses when the network packets are processed? Is
there any other example that would make this fail?

  The thing with these effects, especially spurious IRQs is there usually
  are several reasons they show up and several ways to make them go away.
  In the beginning there have been lots then they drop off as the system
  software matures.  Then if the program survives long enough to be
  optimized they start to show up again but in lesser numbers.  This has
  been the OMAP2/3 experience so far.  Going SO to control regions has
  stamped them out at this point.
 
 What you're therefore asking for is a weak memory ordering model which
 doesn't require any effort on the software programmers part - that's
 a CPU architecture thing which you'll need to talk to ARM about.
 
 x86 can do this for the most part because x86's development has been
 such that the hardware has had to work around the software to make
 improvements.  On ARM, normally when there's updates, software has
 to work around the hardware.

For ARM CPU (RISC architecture) to get faster while keeping the power
consumption low, it must become weakly ordered (maybe CISC architectures
can cope with this). Anyway, it seems that ia64 requires some barriers
as well. That's more like evolution in the CPU field and while the
software becomes more complex, the overall performance is better.

   That's not unexpected if you don't have the right barriers in place
   at the end of things such as IRQ controllers ack/mask functions.
  
  Yes. I've submitted patches (to linux-omap) and Catalin did submit
  patches (to arm-linux) for PIC barriers.  In the past they have been
  rejected by Tony or you for different reasons.  Tony last rejected
  it because he thought it should be generic at the ARM level.  I
  don't recall what your last stance was.
 
 Looking back, I never commented on that patch.  I did on the previous
 patch which was adding DSBs in a way which would break stuff.  The
 patch to add them to the interrupt controllers has never been reposted.
 
 However, adding barriers may not be the correct answer for this.
 See Documentation/io_ordering.txt - reading back from a safe register
 on the target device ensures that the previous writes should hit the
 device before the read completes, without the overhead of a full
 barrier.
 
 This point is even more important if you have some form of write
 posting between the CPU and the device (eg, a PCI bus) - a DSB
 won't reach down to the target PCI device which may be behind some
 write-posting bridges.
 
 So, in the case of arch/arm/common/gic.c, we should be reading one of
 the gic control registers after the writes.  In the case of
 arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
 should be a sufficient solution.

I need to check in ARM when people come from holidays but a simple LDR
might not be enough to guarantee that a CPSIE etc. happens after it. You
may need to add either an LDR + CMP (or some other usage of the loaded
register) or LDR + DSB. I agree that DSB alone is not enough.

  Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping
  to write to registers of that device.  Then when you go to write to its
  FIFO use a DEVICE mapping.
 
 I believe ARMv7 has some 

Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-08 Thread Russell King - ARM Linux
On Fri, Aug 08, 2008 at 12:44:49PM +0100, Catalin Marinas wrote:
 There are already CPUs with weaker memory ordering model than ARM (e.g.
 Alpha) and they are supported by Linux. Of course, there may be problems
 with drivers since most of them are developed in x86.

There are, and they are _constantly_ complaining about drivers not
having the necessary barriers.

Consider that for a moment - how long has Linux supported weakly
ordered architectures, and how long does it take to fix ordering
problems... 10 or so years?

  So, in the case of arch/arm/common/gic.c, we should be reading one of
  the gic control registers after the writes.  In the case of
  arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
  should be a sufficient solution.
 
 I need to check in ARM when people come from holidays but a simple LDR
 might not be enough to guarantee that a CPSIE etc. happens after it. You
 may need to add either an LDR + CMP (or some other usage of the loaded
 register) or LDR + DSB. I agree that DSB alone is not enough.

Okay, I give up on this issue.  Weak memory ordering seems to be a
very very big can of worms.  And then there's this:

14:07 <rmk> so we're back to making readl() itself do something with the
            data... which brings us back to that question about why bother
            with weak ordering
14:10 <willy> you can't have weak ordering for device control registers
14:11 <rmk> yes you can, provided they're ordered wrt each other.
14:11 <willy> weak ordering works great for SMP or for just covering up latency
14:12 <willy> no, you can't.  see writel(); readl(); udelay(1); writel();.
              You didn't wait for 1 microsecond before accessing the device
              again.

Or, to put it another way, it seems that on Linux _all_ devices must
be strongly ordered or be seen to Linux as being strongly ordered
(iow, readl and writel and friends _must_ have a barrier.)

And of course, putting barriers into readl and writel, we might as well
use strongly ordered mappings anyway, because that'll save us a few
bytes of program memory.

TBH, this is becoming soo much of a joke, it's untrue.

Let's go back to having a strongly ordered memory model.  Please.


RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Paul Walmsley
Hello Catalin,

On Wed, 6 Aug 2008, Catalin Marinas wrote:

 On Tue, 2008-08-05 at 07:15 -0500, Woodruff, Richard wrote:
   Is the controller allowed to write dirty cache lines out at any time
   it
   likes? Surely a better fix is to drain the cache of the changes before
   changing the clock for the SDRAM?
  
  - Previously the SRAM was marked as cached.  _Execution_ using that
  attribute will generate line fetches to SRAM.  This will cause
  displacement write-outs of resident DDR lines.  Similarly,
  _load/store_ sequences in that code have the same effect.  These
  cast-outs deadlock the CPU and it can't fetch to progress.
 [...]
  * Flushing the entire L1 & L2 cache frequently is very expensive and
  better not done if you don't have to.  Also, it is not sufficient for the
  context-restore path which needs to NOT live in DDR.
  
  The code needs to execute in a non-cached region.
 
 I don't think there is any guarantee that dirty cache line won't be
 evicted to SDRAM even if your code uses uncached memory only. The CPU is
 allowed to do speculative reads from the normal cached memory and these
 reads may force a dirty cache line to be written back to memory. You may
 need to do at least a cache clean operation (invalidate not necessary).

If we turn off speculative reads via a CP15 control register Z-bit write 
for the duration of the SRAM code execution, and use either normal 
non-cached memory or strongly-ordered memory for the SRAM code, will that 
effectively prevent any cache line writeback during that time?  (assuming 
interrupts are disabled, that is).

Also, a somewhat-related question about strongly-ordered memory regions: 
these are described as bracketing accesses to those regions with data 
memory barriers.  In this case, are those barriers specific to the 
strongly-ordered pages, or will they affect all memory transactions, even 
to normal cached memory?


- Paul


Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Russell King - ARM Linux
On Wed, Aug 06, 2008 at 07:21:14AM -0500, Woodruff, Richard wrote:
 Most of the weak memory attributes in newer ARMs are not exploited
 today in tree.  I'll guess this was more a correctness and capability
 judgment from Russell.

Not entirely true.  We do as much as is safe to do - which is basically
using 'device' mappings for devices and ioremap, and 'memory' mappings
for the main memory, module and vmalloc mappings.

What we don't do is mark DMA memory as being normal uncached memory,
thereby allowing that to be reordered with device accesses - we make
it strongly ordered.  The reason being that the kernel doesn't have
barriers necessary to ensure that writes to DMA memory hit physical
memory before the device access to enable DMA hits the DMA controller.

Those kinds of bugs can be absolute hell to track down - think about
a DMA controller accessing an uninitialised DMA descriptor, resulting
in it scribbling over random bits of memory.

The only real way to do this is to audit lots of drivers to ensure
that:

1. DMA is not started until accesses to memory allocated by
   dma_alloc_coherent() have hit memory
2. accesses to dma_alloc_coherent() memory always read current data, even
   if the DMA controller has just updated the descriptor you're reading.

Linux presently - and quite rightly - assumes that accesses to DMA
coherent memory _are_ coherent with DMA.  If not, the API would be
a joke.
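
[Editor's sketch] The ordering requirement in point 1 above can be illustrated
in plain user-space C.  This is only a model under stated assumptions: the
struct names, DESC_VALID and start_dma() are hypothetical, and a C11 release
fence stands in for the kernel's wmb().  It does not talk to real hardware; it
just shows where the barrier would sit relative to the descriptor writes and
the "go" write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_VALID 0x1u                /* hypothetical "descriptor ready" flag */

struct dma_desc {                      /* hypothetical descriptor layout */
    uint32_t addr;
    uint32_t len;
    uint32_t flags;
};

struct fake_dma_ctrl {                 /* stand-in for the controller's registers */
    volatile uint32_t go;
    const struct dma_desc *desc;
};

/* Fill in the descriptor, then (and only then) tell the controller to start. */
void start_dma(struct fake_dma_ctrl *ctrl, struct dma_desc *d,
               uint32_t addr, uint32_t len)
{
    assert(len != 0);

    d->addr  = addr;
    d->len   = len;
    d->flags = DESC_VALID;

    /* Stand-in for wmb(): descriptor stores must not be reordered after
     * the stores that start the transfer. */
    atomic_thread_fence(memory_order_release);

    ctrl->desc = d;
    ctrl->go   = 1;
}
```

Without the fence, a weakly ordered mapping would allow the "go" store to be
observed before the descriptor contents, which is the uninitialised-descriptor
case described above.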


Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Catalin Marinas
On Thu, 2008-08-07 at 08:30 +0100, Russell King - ARM Linux wrote:
 What we don't do is mark DMA memory as being normal uncached memory,
 thereby allowing that to be reordered with device accesses - we make
 it strongly ordered.  The reason being that the kernel doesn't have
 barriers necessary to ensure that writes to DMA memory hit physical
 memory before the device access to enable DMA hits the DMA controller.

We have mb() and related which provides the ordering (there is also
mmiowb() but my understanding is that we don't need this on ARM).

http://lwn.net/Articles/283776/

 Those kinds of bugs can be absolute hell to track down - think about
 a DMA controller accessing an uninitialised DMA descriptor, resulting
 in it scribbling over random bits of memory.

Yes, indeed, but ARM is not the only architecture with a weak memory
ordering model so drivers should be fixed, in theory.

 Linux presently - and quite rightly - assumes that accesses to DMA
 coherent memory _are_ coherent with DMA.  If not, the API would be
 a joke.

As I understand it, the DMA mapping doesn't guarantee any ordering,
drivers must use barriers. According to Documentation/DMA-mapping.txt:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

[...]

  IMPORTANT: Consistent DMA memory does not preclude the usage of
 proper memory barriers.  The CPU may reorder stores to
 consistent memory just as it may normal memory.

-- 
Catalin



RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Woodruff, Richard

 From: Russell King - ARM Linux [mailto:[EMAIL PROTECTED]
  Most of the weak memory attributes in newer ARMs are not exploited
  today in tree.  I'll guess this was more a correctness and capability
  judgment from Russell.

 Not entirely true.  We do as much as is safe to do - which is basically
 using 'device' mappings for devices and ioremap, and 'memory' mappings
 for the main memory, module and vmalloc mappings.

Is DEVICE really safe for things other than FIFOs without the use of barriers?
Accesses are in order but they can be buffered.  The bus is free to
post/buffer these writes as long as it preserves the order (and it does).  I
want to know the write happened before moving on in some code.  Yes, some
barrier can be added to the code to fix it, but it's not there in most drivers
today.

I recall you gave an explanation on list a while back where MPCORE had to use 
DEVICE or it wouldn't work.  That restriction is not there for others.

We do in some drivers today get spurious interrupts when DEVICE is used but 
don't see them when using SO.

Originally the IC-Architect wanted two memory windows per device, one SO for 
register control and one DEVICE for FIFO access.  Given that we do DMA (which 
doesn't care about how ARM sees the world) on the performance hungry devices 
not doing this was ok.

 What we don't do is mark DMA memory as being normal uncached memory,
 thereby allowing that to be reordered with device accesses - we make
 it strongly ordered.  The reason being that the kernel doesn't have
 barriers necessary to ensure that writes to DMA memory hit physical
 memory before the device access to enable DMA hits the DMA controller.

For an experiment a couple years back we did convert the dma alloc pool 
addresses as NC.  All worked -except- for OHCI-USB which started failing some 
tests.

Regards,
Richard W.



Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Russell King - ARM Linux
On Thu, Aug 07, 2008 at 01:56:40PM -0500, Woodruff, Richard wrote:
  From: Russell King - ARM Linux [mailto:[EMAIL PROTECTED]
   Most of the weak memory attributes in newer ARMs are not exploited
   today in tree.  I'll guess this was more a correctness and capability
   judgment from Russell.
 
  Not entirely true.  We do as much as is safe to do - which is basically
  using 'device' mappings for devices and ioremap, and 'memory' mappings
  for the main memory, module and vmalloc mappings.
 
 Is DEVICE really safe for things other than FIFOs without the use of
 barriers?

As far as I'm aware, yes - and that comment is based solely upon the
fact that no one has reported any problems with the kernel which have
been tracked down to using the device memory type on ARMv6 and above...

 We do in some drivers today get spurious interrupts when DEVICE is
 used but don't see them when using SO.

... until now, or even that very sentence.

That's not unexpected if you don't have the right barriers in place
at the end of things such as IRQ controllers ack/mask functions.

Can you give me more information - which OMAP platform, which IRQ
controller, which device is easiest to provoke this behaviour, and
I'll look at it.

 Originally the IC-Architect wanted two memory windows per device, one
 SO for register control and one DEVICE for FIFO access.  Given that we
 do DMA (which doesn't care about how ARM sees the world) on the
 performance hungry devices not doing this was ok.

I'm not sure what point you're making there.

 For an experiment a couple years back we did convert the dma alloc
 pool addresses as NC.  All worked -except- for OHCI-USB which started
 failing some tests.

If we go down the route of marking DMA as 'normal memory non-cacheable'
we're going to have a never ending stream of drivers which don't work
correctly.  We're forever going to be bug hunting drivers, having to
add barriers as required.  Arguably those barriers should be there
already, but if drivers are developed on platforms without weak ordering,
authors just don't think about it, and _certainly_ can't test them.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Woodruff, Richard

 From: Russell King - ARM Linux [mailto:[EMAIL PROTECTED]
  Is DEVICE really safe for things other than FIFOs without the use of
  barriers?

 As far as I'm aware, yes - and that comment is based solely upon the
 fact that no one has reported any problems with the kernel which have
 been tracked down to using the device memory type on ARMv6 and above...

  We do in some drivers today get spurious interrupts when DEVICE is
  used but don't see them when using SO.

 ... until now, or even that very sentence.

That is our fault then I suppose for not discussing this on arm-linux. In OMAP2 
and OMAP3 this has been observed.  In vendor kernels where time stands still 
and lots of validation has happened we did stick with SO for OMAP2.  On some 
internal kernels already we have gone to SO for OMAP3 as customers ramp and 
need the errors gone.  The faster the system clocks the more it seems to show.

The thing with these effects, especially spurious IRQs is there usually are 
several reasons they show up and several ways to make them go away.  In the 
beginning there have been lots then they drop off as the system software 
matures.  Then if the program survives long enough to be optimized they start 
to show up again but in lesser numbers.  This has been the OMAP2/3 experience 
so far.  Going SO to control regions has stamped them out at this point.

 That's not unexpected if you don't have the right barriers in place
 at the end of things such as IRQ controllers ack/mask functions.

Yes. I've submitted patches (to linux-omap) and Catalin did submit patches (to 
arm-linux) for PIC barriers.  In the past they have been rejected by Tony or 
you for different reasons.  Tony last rejected it because he thought it should 
be generic at the ARM level.  I don't recall what your last stance was.

It is consistently observed, that with these irq-controller barriers in place, 
spurious irqs go down (but not necessarily away).

Our internal kernels still have this in them for OMAP2 and OMAP3.

 Can you give me more information - which OMAP platform, which IRQ
 controller, which device is easiest to provoke this behaviour, and
 I'll look at it.

Lately, a full OMAP3 system running with the 3d-GFX driver is causing these. 
Camera driver operation has been one which raised them on and off.  If it 
persists exporting an environment to you should be possible.  I expect it will 
take time and coordination to do this.

The pure linux-omap kernel has periodically seen spurious irqs with UART.  
However, if you use the irq-controller barriers they tend to go away.

  Originally the IC-Architect wanted two memory windows per device, one
  SO for register control and one DEVICE for FIFO access.  Given that we
  do DMA (which doesn't care about how ARM sees the world) on the
  performance hungry devices not doing this was ok.

 I'm not sure what point you're making there.

Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping to 
write to registers of that device.  Then when you go to write to its FIFO use a 
DEVICE mapping.

Say TX IRQ happens at UART, I might check status bits through a SO mapping, but 
when it comes time to fill the FIFO I write to the DEVICE mapping.  Maybe I can 
even arrange it such that I burst in order using the natural FIFO width.  Even 
if you don't burst you can take advantage of the bus posting effects.  Fill the 
FIFO and get out of there without a big stall time.

Like I said previously, a system likely will use DMA to the FIFO if performance 
matters, so not optimizing here has been the choice.

  For an experiment a couple years back we did convert the dma alloc
  pool addresses as NC.  All worked -except- for OHCI-USB which started
  failing some tests.

 If we go down the route of marking DMA as 'normal memory non-cacheable'
 we're going to have a never ending stream of drivers which don't work
 correctly.  We're forever going to be bug hunting drivers, having to
 add barriers as required.  Arguably those barriers should be there
 already, but if drivers are developed on platforms without weak ordering,
 authors just don't think about it, and _certainly_ can't test them.

Is this just the case for an attribute to be made available from an API 
change/addition to allow a driver to make use of it?  The default can always be 
conservative.

The trend is ARMs are depending more on pipeline and prefetch tricks to 
perform. For these tricks to work weak memory features need to be used at times.

Regards,
Richard W.



Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Russell King - ARM Linux
On Thu, Aug 07, 2008 at 03:38:55PM -0500, Woodruff, Richard wrote:
 
  From: Russell King - ARM Linux [mailto:[EMAIL PROTECTED]
   Is DEVICE really safe for things other than FIFOs without the use of
   barriers?
 
  As far as I'm aware, yes - and that comment is based solely upon the
  fact that no one has reported any problems with the kernel which have
  been tracked down to using the device memory type on ARMv6 and above...
 
   We do in some drivers today get spurious interrupts when DEVICE is
   used but don't see them when using SO.
 
  ... until now, or even that very sentence.
 
 That is our fault then I suppose for not discussing this on arm-linux.
 In OMAP2 and OMAP3 this has been observed.  In vendor kernels where
 time stands still and lots of validation has happened we did stick
 with SO for OMAP2.  On some internal kernels already we have gone to
 SO for OMAP3 as customers ramp and need the errors gone.  The faster
 the system clocks the more it seems to show.

To do that, and then ask about when Linux is going to start exploiting
the weak memory types is a little unfair don't you think?

 The thing with these effects, especially spurious IRQs is there usually
 are several reasons they show up and several ways to make them go away.
 In the beginning there have been lots then they drop off as the system
 software matures.  Then if the program survives long enough to be
 optimized they start to show up again but in lesser numbers.  This has
 been the OMAP2/3 experience so far.  Going SO to control regions has
 stamped them out at this point.

What you're therefore asking for is a weak memory ordering model which
doesn't require any effort on the software programmers part - that's
a CPU architecture thing which you'll need to talk to ARM about.

x86 can do this for the most part because x86's development has been
such that the hardware has had to work around the software to make
improvements.  On ARM, normally when there's updates, software has
to work around the hardware.

  That's not unexpected if you don't have the right barriers in place
  at the end of things such as IRQ controllers ack/mask functions.
 
 Yes. I've submitted patches (to linux-omap) and Catalin did submit
 patches (to arm-linux) for PIC barriers.  In the past they have been
 rejected by Tony or you for different reasons.  Tony last rejected
 it because he thought it should be generic at the ARM level.  I
 don't recall what your last stance was.

Looking back, I never commented on that patch.  I did on the previous
patch which was adding DSBs in a way which would break stuff.  The
patch to add them to the interrupt controllers has never been reposted.

However, adding barriers may not be the correct answer for this.
See Documentation/io_ordering.txt - reading back from a safe register
on the target device ensures that the previous writes should hit the
device before the read completes, without the overhead of a full
barrier.

This point is even more important if you have some form of write
posting between the CPU and the device (eg, a PCI bus) - a DSB
won't reach down to the target PCI device which may be behind some
write-posting bridges.

So, in the case of arch/arm/common/gic.c, we should be reading one of
the gic control registers after the writes.  In the case of
arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
should be a sufficient solution.

But, not a barrier.
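
[Editor's sketch] The read-back idea can be modelled in user-space C.  This is
a toy model, not kernel code: struct posted_bus, bus_write(), bus_read() and
mask_irq_with_readback() are hypothetical names, and the "bus" is just a queue
that holds writes until a read forces them out, which is the property the
read-back relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS   4
#define POST_DEPTH 8

/* Hypothetical bus model: writes are posted, reads drain the queue first. */
struct posted_bus {
    uint32_t regs[NUM_REGS];                 /* what the device has seen */
    struct { size_t reg; uint32_t val; } q[POST_DEPTH];
    size_t pending;                          /* writes still in flight */
};

/* Deliver every queued write to the device, oldest first. */
void bus_drain(struct posted_bus *b)
{
    for (size_t i = 0; i < b->pending; i++)
        b->regs[b->q[i].reg] = b->q[i].val;
    b->pending = 0;
}

/* Posted write: returns before the device has seen the value. */
void bus_write(struct posted_bus *b, size_t reg, uint32_t val)
{
    assert(reg < NUM_REGS);
    if (b->pending == POST_DEPTH)
        bus_drain(b);                        /* queue full: forced to flush */
    b->q[b->pending].reg = reg;
    b->q[b->pending].val = val;
    b->pending++;
}

/* Read: cannot complete until earlier writes have reached the device. */
uint32_t bus_read(struct posted_bus *b, size_t reg)
{
    assert(reg < NUM_REGS);
    bus_drain(b);
    return b->regs[reg];
}

/* Mask, then read back a harmless register, as suggested for INTC_REVISION. */
void mask_irq_with_readback(struct posted_bus *b, size_t mask_reg,
                            size_t safe_reg, uint32_t bits)
{
    bus_write(b, mask_reg, bits);
    (void)bus_read(b, safe_reg);             /* flushes the posted write */
}
```

In this model a CPU-side barrier would have nothing to act on, since the
queue lives downstream of the CPU; only the read pulls the writes through.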

 However, if you use the irq-controller barriers they tend to go away.

Great, so solving that should prevent them.

   Originally the IC-Architect wanted two memory windows per device, one
   SO for register control and one DEVICE for FIFO access.  Given that we
   do DMA (which doesn't care about how ARM sees the world) on the
   performance hungry devices not doing this was ok.
 
  I'm not sure what point you're making there.
 
 Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping
 to write to registers of that device.  Then when you go to write to its
 FIFO use a DEVICE mapping.

I believe ARMv7 has some restrictions on dual mapping of the same
space with different types, so don't expect this technique to always
work.

 Say TX IRQ happens at UART, I might check status bits through a SO mapping,
 but when it comes time to fill the FIFO I write to the DEVICE mapping.

Why?  Firstly, the read _has_ to complete before the program can
continue.  (If it hasn't completed, you don't have the data to decide
what to do next.)  Secondly, any previous device writes will have to
complete before the read completes.

So what does reading the status bits through a SO mapping gain you?
The answer is, all other reads and writes previously issued by the
program completing.  Does that affect the status that the UART is
giving you?

   For an experiment a couple years back we did convert the dma alloc
   pool addresses as NC.  All worked -except- for OHCI-USB which started
   failing some tests.
 
  If we go down the route of 

Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-07 Thread Russell King - ARM Linux
On Thu, Aug 07, 2008 at 10:20:33PM +0100, Russell King - ARM Linux wrote:
 In the case of arch/arm/mach-omap2/irq.c, reading the INTC_REVISION
 reg after masking should be a sufficient solution.

And here's a patch to do exactly that.

diff --git a/arch/arm/mach-omap2/irq.c b/arch/arm/mach-omap2/irq.c
index 9ef15b3..27610b1 100644
--- a/arch/arm/mach-omap2/irq.c
+++ b/arch/arm/mach-omap2/irq.c
@@ -48,6 +48,7 @@ static struct omap_irq_bank {
 static void omap_ack_irq(unsigned int irq)
 {
 	__raw_writel(0x1, irq_banks[0].base_reg + INTC_CONTROL);
+	__raw_readl(irq_banks[0].base_reg + INTC_REVISION);
 }
 
 static void omap_mask_irq(unsigned int irq)
@@ -61,6 +62,7 @@ static void omap_mask_irq(unsigned int irq)
 	}
 
 	__raw_writel(1 << irq, irq_banks[0].base_reg + INTC_MIR_SET0 + offset);
+	__raw_readl(irq_banks[0].base_reg + INTC_REVISION);
 }
 
 static void omap_unmask_irq(unsigned int irq)



Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-06 Thread Catalin Marinas
On Mon, 2008-08-04 at 17:40 -0600, Paul Walmsley wrote:
 Add the MT_MEMORY_STRONGLY_ORDERED memory type for ARM strongly ordered
 memory.
 
 This is used on OMAP3 for on-board SRAM.  On OMAP, SRAM is used for code 
 that changes the SDRAM controller's clock, temporarily blocking access to 
 SDRAM.  During this period, as code executes from SRAM, the ARM cache 
 controller can attempt to write dirty cache lines back to SDRAM to make 
 room for SRAM cache lines, causing the MPU subsystem to hang.  To avoid 
 this, we mark SRAM as strongly-ordered memory.

Why not use normal uncached memory? Strongly ordered is pretty
inefficient as it cannot do any reordering or write buffer merging (it's
like having a memory barrier before and after each instruction).
Speculative accesses are not allowed either. Strongly ordered memory is
not really meant for executing code from.

 + [MT_MEMORY_STRONGLY_ORDERED] = {
 + .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
 + PMD_SECT_UNCACHED,

You can add PMD_SECT_TEX(1) for normal uncached memory.
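
As a rough sketch of the difference between the two encodings: the macro
values below are assumptions modelled on the ARM first-level section
descriptor layout (TEX in bits 14:12, C in bit 3, B in bit 2, without TEX
remapping), and the two helper functions are hypothetical.  They are not
copied from the patch.

```c
#include <stdint.h>

/* Bit positions assumed from the ARM first-level section descriptor. */
#define PMD_TYPE_SECT        (2u << 0)
#define PMD_SECT_BUFFERABLE  (1u << 2)
#define PMD_SECT_CACHEABLE   (1u << 3)
#define PMD_SECT_AP_WRITE    (1u << 10)
#define PMD_SECT_TEX(x)      ((uint32_t)(x) << 12)
#define PMD_SECT_UNCACHED    (0u)

/* TEX=0, C=0, B=0: strongly ordered, as in the proposed mem_types entry. */
uint32_t sect_strongly_ordered(void)
{
    return PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_UNCACHED;
}

/* TEX=1, C=0, B=0: normal memory, non-cacheable. */
uint32_t sect_normal_uncached(void)
{
    return PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_UNCACHED |
           PMD_SECT_TEX(1);
}
```

Under these assumptions the two descriptors differ only in the TEX field; the
C and B bits are clear in both.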

-- 
Catalin



RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-06 Thread Catalin Marinas
On Tue, 2008-08-05 at 07:15 -0500, Woodruff, Richard wrote:
  Is the controller allowed to write dirty cache lines out at any time it
  likes? Surely a better fix is to drain the cache of the changes before
  changing the clock for the SDRAM?
 
 - Previously the SRAM was marked as cached.  _Execution_ using that
 attribute will generate line fetches to SRAM.  This will cause
 displacement write-outs of resident DDR lines.  Similarly,
 _load/store_ sequences in that code have the same effect.  These
 cast-outs deadlock the CPU and it can't fetch to progress.
[...]
 * Flushing the entire L1 & L2 cache frequently is very expensive and
 better not done if you don't have to.  Also, it is not sufficient for the
 context-restore path which needs to NOT live in DDR.
 
 The code needs to execute in a non-cached region.

I don't think there is any guarantee that a dirty cache line won't be
evicted to SDRAM even if your code uses uncached memory only. The CPU is
allowed to do speculative reads from the normal cached memory and these
reads may force a dirty cache line to be written back to memory. You may
need to do at least a cache clean operation (invalidate not necessary).
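
The displacement write-out described in the quoted text, and the effect of
cleaning the cache first, can be sketched with a toy direct-mapped write-back
cache.  Everything here is hypothetical (toy_cache, toy_access, toy_clean, a
four-line cache); it models only eviction, not speculative reads, so it shows
why a clean helps and does not show that a clean alone is sufficient.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_LINES 4   /* hypothetical cache size, in lines */

struct toy_line { bool valid; bool dirty; uint32_t tag; };

struct toy_cache {
    struct toy_line lines[N_LINES];
    unsigned writebacks;   /* number of dirty lines written back to memory */
};

/*
 * Access one line-sized address.  Returns true if a dirty line had to be
 * written back to make room, i.e. the access generated traffic to the
 * memory that held the evicted line.
 */
bool toy_access(struct toy_cache *c, uint32_t line_addr, bool is_write)
{
    struct toy_line *l = &c->lines[line_addr % N_LINES];
    bool wrote_back = false;

    if (!l->valid || l->tag != line_addr) {
        if (l->valid && l->dirty) {
            c->writebacks++;
            wrote_back = true;
        }
        l->valid = true;
        l->dirty = false;
        l->tag = line_addr;
    }
    if (is_write)
        l->dirty = true;
    return wrote_back;
}

/* Write back every dirty line but keep it valid (clean, no invalidate). */
void toy_clean(struct toy_cache *c)
{
    for (size_t i = 0; i < N_LINES; i++) {
        if (c->lines[i].valid && c->lines[i].dirty) {
            c->writebacks++;
            c->lines[i].dirty = false;
        }
    }
}
```

In the model, a store to line 0x100 followed by a fetch of line 0x204 (both
map to the same slot) forces a write-back at the time of the fetch.  Calling
toy_clean() between the two moves that write-back to a point where memory is
still accessible, and the later fetch then evicts a clean line silently.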

-- 
Catalin



RE: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-06 Thread Woodruff, Richard
 Why not use normal uncached memory? Strongly ordered is pretty
 inefficient as it cannot do any reordering or write buffer merging (it's
 like having a memory barrier before and after each instruction).
 Speculative accesses are not allowed either. Strongly ordered memory is
 not really meant for executing code from.

It could be.  This is a discussion we were having off line.

The code in question is a small bit of assembly interacting with hardware 
mainly and has not been audited for full pipeline/buffering correctness.

Most of the weak memory attributes in newer ARMs are not exploited today in 
tree.  I'll guess this was more a correctness and capability judgment from 
Russell.

Regards,
Richard W.



Re: [PATCH] ARM MMU: add strongly-ordered memory type

2008-08-05 Thread Ben Dooks
On Mon, Aug 04, 2008 at 05:40:57PM -0600, Paul Walmsley wrote:
 
 Add the MT_MEMORY_STRONGLY_ORDERED memory type for ARM strongly ordered
 memory.
 
 This is used on OMAP3 for on-board SRAM.  On OMAP, SRAM is used for code 
 that changes the SDRAM controller's clock, temporarily blocking access to 
 SDRAM.  During this period, as code executes from SRAM, the ARM cache 
 controller can attempt to write dirty cache lines back to SDRAM to make 
 room for SRAM cache lines, causing the MPU subsystem to hang.  To avoid 
 this, we mark SRAM as strongly-ordered memory.

Is the controller allowed to write dirty cache lines out at any time it
likes? Surely a better fix is to drain the cache of the changes before
changing the clock for the SDRAM?
 
 Problem noted by Richard Woodruff [EMAIL PROTECTED].  Fix derived
 from the TI CDP codebase.
 
 Signed-off-by: Paul Walmsley [EMAIL PROTECTED]
 ---
 
  arch/arm/mm/mmu.c          |    5 +++++
  include/asm-arm/mach/map.h |   13 +++++++------
  2 files changed, 12 insertions(+), 6 deletions(-)
 
 diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
 index 2d6d682..5b56539 100644
 --- a/arch/arm/mm/mmu.c
 +++ b/arch/arm/mm/mmu.c
 @@ -239,6 +239,11 @@ static struct mem_type mem_types[] = {
          .prot_sect = PMD_TYPE_SECT,
          .domain    = DOMAIN_KERNEL,
      },
 +    [MT_MEMORY_STRONGLY_ORDERED] = {
 +        .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
 +                     PMD_SECT_UNCACHED,
 +        .domain    = DOMAIN_KERNEL,
 +    },
  };
  
  const struct mem_type *get_mem_type(unsigned int type)
 diff --git a/include/asm-arm/mach/map.h b/include/asm-arm/mach/map.h
 index 7ef3c83..8cb46b7 100644
 --- a/include/asm-arm/mach/map.h
 +++ b/include/asm-arm/mach/map.h
 @@ -19,12 +19,13 @@ struct map_desc {
  };
  
  /* types 0-3 are defined in asm/io.h */
 -#define MT_CACHECLEAN     4
 -#define MT_MINICLEAN      5
 -#define MT_LOW_VECTORS    6
 -#define MT_HIGH_VECTORS   7
 -#define MT_MEMORY         8
 -#define MT_ROM            9
 +#define MT_CACHECLEAN                 4
 +#define MT_MINICLEAN                  5
 +#define MT_LOW_VECTORS                6
 +#define MT_HIGH_VECTORS               7
 +#define MT_MEMORY                     8
 +#define MT_ROM                        9
 +#define MT_MEMORY_STRONGLY_ORDERED    10
  
  #define MT_NONSHARED_DEVICE  MT_DEVICE_NONSHARED
  #define MT_IXP2000_DEVICE    MT_DEVICE_IXP2000
 
 
 ---
 List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
 FAQ:http://www.arm.linux.org.uk/mailinglists/faq.php
 Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

-- 
Ben

Q:  What's a light-year?
A:  One-third less calories than a regular year.
