[linux-sunxi] Re: [U-Boot] [PATCH v3 9/9] net/designware: Reduce DMA burst length

2014-04-28 Thread Alexey Brodkin
Hi Marek,

On Mon, 2014-04-28 at 07:55 +0200, Marek Vasut wrote:
> On Monday, April 28, 2014 at 07:51:49 AM, Chen-Yu Tsai wrote:
> > On Mon, Apr 28, 2014 at 2:08 AM, Marek Vasut  wrote:
> > > On Sunday, April 27, 2014 at 05:29:29 PM, Chen-Yu Tsai wrote:
> > >> On Sun, Apr 27, 2014 at 11:25 PM, Ian Campbell  
> > >> wrote:
> > >> > On Sat, 2014-04-26 at 20:28 +0200, Marek Vasut wrote:
> > >> >> On Friday, April 18, 2014 at 08:05:50 PM, Ian Campbell wrote:
> > >> >> > From: Jens Kuske 
> > >> >> > 
> > >> >> > The GMAC module in Allwinner sunxi SoCs seems to have problems with
> > >> >> > burst lengths > 8.
> > >> >> 
> > >> >> Is there any explanation for the problems please ?
> > >> > 
> > >> > Jens or Wens, can you answer this?
> > >> 
> > >> IIRC, with burst lengths > 8, GMAC doesn't work, no ping, no DHCP.
> > >> I don't remember if it was TX or RX that suffered, or even both.
> > >> 
> > >> Hope this clarifies things a bit.
> > > 
> > > No, it does not at all, sorry. What you describe are symptoms, but what I
> > > want to know is what is the root cause of those symptoms. You did not
> > > explain that.
> > 
> > I can not offer much more explanation. The value was hardcoded in
> > Allwinner's code. The datasheets don't offer much, except this line
> > might be related:
> > 
> >   (DMA) Descriptor architecture, allowing large blocks of data transfer
> >   with minimum CPU intervention; each descriptor can transfer up to
> >   4 KB data.
> > 
> > Also probably related:
> > 
> >   4KB TX FIFO for transmission packets and 16KB RX FIFO for reception
> >   packets.
> > 
> > I'm not an expert in hardware. We could ask Allwinner, but given past
> > inquiries from the linux-sunxi community, I'd say getting a reply on
> > hardware specifics is unlikely to happen.
> > 
> > So my guess is that this is limited by the DWMAC IP Allwinner licensed
> > from Synopsys.

Even though I'm not an expert in DW GMAC I may confirm that it has tons
of settings people may use for fine-tuning performance and I may assume
that there're corner cases when some settings may lead to real problems.

And IMHO if different boards may need different configurations why don't
we satisfy their needs. But it's good to keep others who used to use
existing settings happy too.

If you look in corresponding driver in Linux kernel
(https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/stmicro)
 you'll see much more complicated code compared to driver in U-Boot. And among 
other things you may see some GMAC parameters could be set per platform.

And Programmable Burst Length (PBL) is one of them. Look at
stmmac_init_dma_engine() in
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

You may see default PBL (DEFAULT_DMA_PBL = 8) is used if other value is
not provided by platform.

So I would propose to act this way:

1. Introduce driver-specific config value. Something like
"DW_GMAC_DEFAULT_DMA_PBL" and in "designware.h" set it to "BURST_8". I
hope driver in Linux was used a lot and this value could be treated as
safe.

2. Ask people to try this new setting on existing boards that use DW
GMAC. If everybody is happy there's nothing else to do here.

3. Otherwise if people report regressions add mentioned config option in
board configuration files for problematic boards.

I think with this approach everybody will be happy.

-Alexey

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[linux-sunxi] Re: [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-28 Thread Alexey Brodkin
Dear Ian,

On Sun, 2014-04-27 at 19:47 +0100, Ian Campbell wrote:

> This is the driver for one particular ARM cache controller and not the
> one used for the SoC. In any case it does "proper" start/end handling
> only for cache flush operations, not cache invalidate.
> 
> Cache invalidate is a potentially destructive operation (throwing away
> data in the caches), having it operate on anything more than the precise
> region requested would be very surprising to almost anyone I think.
...
> I think you are missing the important differences between a cache 
> flush and a cache invalidate.

IMHO cache invalidation and flush operations are sort of antipodes.

With invalidation you discard all data in corresponding line in cache
and replace it with freshly read data from memory.

With flush you move cache line to corresponding memory location
overriding previously existing values in memory.

So if you deal with 2 independent data fields which both share the same
one cache line it's potentially dangerous to do both flush and
invalidate of this cache line.

In case of MMU utilization we have a luxury of uncached access, so we
may safely access control structures in memory with granularity which is
available for this particular CPU. This is AFAIK drivers deal with
buffer descriptors in Linux kernel.

In case of U-Boot where we prefer to keep things simple we don't use
MMU. So no generic way for cache bypassing. Still some architectures
like ARC700 have special instructions for accessing memory bypassing
cache but I prever to not use them and keep sources
platform-independent.

And in this situation IMHO the only safe solution could be in proper
design of data layout. In other words we need to keep independent data
blocks aligned to cache line.

And as you may see from "designware.h" buffer descriptor structure is
aligned:
==
struct dmamacdescr {
u32 txrx_status;
u32 dmamac_cntl;
void *dmamac_addr;
struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);
==

Regards,
Alexey


-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[linux-sunxi] Re: [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-25 Thread Alexey Brodkin
Hi Ian,

On Thu, 2014-04-24 at 20:14 +0100, Ian Campbell wrote:
> On Thu, 2014-04-24 at 17:41 +0000, Alexey Brodkin wrote:
> 
> > 1. Don't invalidate "sizeof(struct dmamacdescr)" but only
> > "roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN))".
> 
> OK. (Although given the realities of the real world values of
> ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields
> involved this isn't actually buying you anything at all)

Well this particular structure is of size sizeof(uint32_t) * 4 = 16
bytes. And I may suppose that cache lines could be shorter than 16 bytes
even though it could be pretty rare situation. So definitely not a big
deal.

But since we're dealing with macros here all mentioned calculations will
be done by pre-processor and execution performance won't be affected.

> > 2. In the following lines implements rounding as well:
> 
> Will fix as well.
> 
> > 3. Check carefully if there're other instances of probably unaligned
> > cache operations.

I thought a bit more about this situation and now I'm not that sure if
we need to align addresses we pass to cache invalidate/flush functions.

Because IMHO drivers shouldn't care about specifics of particular
platform or architecture. Otherwise we'll need to patch each and every
driver only for cache invalidate/flush functions. I looked how these
functions are used in other drivers and see that in most of cases no
additional alignment precautions were implemented. People just pass
start and end addresses.

In its turn platform and architecture provides cache invalidate/flush
functions implement its functionality depending on hardware specifics.

For example on architectures that may only flush/invalidate with
granularity of 1 cache line cache invalidate/flush functions make sure
to start processing from the start of the cache line to which start
address falls and end processing when cache line where end address falls
is processed.

I may assume that there're architectures that automatically understand
from which cache line to start and at which line to stop processing.

But if your architecture requires cache line aligned addresses to be
used for start/end addresses you may look for examples in ARC
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),,
 MIPS 
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c),
 SH 
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),

and what's interesting even implementation you use have semi-proper
start/end addresses handling -
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

Here's your invalidation procedure:

/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
/* PL310 currently supports only 32 bytes cache line */
u32 pa, line_size = 32;

/*
 * If start address is not aligned to cache-line do not
 * invalidate the first cache-line
 */
if (start & (line_size - 1)) {
printf("ERROR: %s - start address is not aligned - 0x%08x\n",
__func__, start);
/* move to next cache line */
start = (start + line_size - 1) & ~(line_size - 1);
}

/*
 * If stop address is not aligned to cache-line do not
 * invalidate the last cache-line
 */
if (stop & (line_size - 1)) {
printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
__func__, stop);
/* align to the beginning of this cache line */
stop &= ~(line_size - 1);
}

for (pa = start; pa < stop; pa = pa + line_size)
writel(pa, &pl310->pl310_inv_line_pa);

pl310_cache_sync();
}


1. I don't understand why start from the next cache line if start
address is not aligned to cache line boundary? I'd say that you want to
invalidate cache line that contains unaligned start address. Otherwise
first bytes won't be invalidated, right?

2. Why do we throw _error_ message. I may understand if you emit
_warning_ message in case of debug build (with DEBUG defined). Well in
current implementation (see 1) it could be error because behavior is
really dangerous. But if you start from correct cache line only warning
in debug mode makes sense (IMHO).

3. Stop/end address in contrast might need to be extended depending on
HW implementation (see above comment).

And here's your flush procedure:
===
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
/* PL310 currently supports only 32 bytes cache line */
u32 pa, line_size = 32;

/*
 * Align to the beginning of cache-line -

[linux-sunxi] Re: [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-24 Thread Alexey Brodkin
Dear Ian,

On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
> - /* Invalidate only "status" field for the following check */
> - invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
> - (unsigned long)&desc_p->txrx_status +
> - sizeof(desc_p->txrx_status));
> + /* Strictly we only need to invalidate the "status" field for
> +  * the following check, but on some platforms we cannot
> +  * invalidate only 4 bytes, so invalidate the the whole thing
> +  * which is known to be DMA aligned. */
> + invalidate_dcache_range((unsigned long)desc_p,
> + (unsigned long)desc_p +
> + sizeof(struct dmamacdescr));
>  
>   /* Check if the descriptor is owned by CPU */
>   if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {

Unfortunately I cannot recall exactly why I wanted to invalidate only
"status" field.

Now looking at this code I may assume that I wanted to save some CPU
cycles. Because:

1. We don't care about all other fields except "status". GMAC only
changes "status" field when it resets "OWNED_BY_DMA" flag and all other
fields CPU writes but not reads while sending packets.

2. We may save quite a few CPU cycles if only invalidating minimum
amount of bytes (remember each read from external memory may cost 100s
of cycles).

So I would advise:

1. Don't invalidate "sizeof(struct dmamacdescr)" but only
"roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN))".

2. In the following lines implements rounding as well:

/* Flush data to be sent */
flush_dcache_range((unsigned long)desc_p->dmamac_addr,
   (unsigned long)desc_p->dmamac_addr + length);


We may be sure "desc_p->dmamac_addr" is properly aligned, but length
could be not-aligned. So I'd replace "length" with "roundup(length,
ARCH_DMA_MINALIGN)" as you did in 3rd patch.

3. Check carefully if there're other instances of probably unaligned
cache operations. I erroneously didn't care about alignment on cache
invalidation/flushing because my implementation of those cache
operations deals with non-aligned start/end internally within
invalidate/flush functions - which might be not that good even if it's
convenient for me.

4. Why don't you squeeze all 3 patches in 1 and name it like "fix
alignment issues with caches on some platforms"? Basically with all 3
patches you fix one and only issue and application of any one of those 3
patches doesn't solve your problem, right?

Regards,
Alexey


-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[linux-sunxi] Re: [PATCH 1/3] net/designware: ensure device private data is DMA aligned.

2014-04-24 Thread Alexey Brodkin
Dear Ian,

On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
> struct dw_eth_dev contains fields which are accessed via DMA, so make sure it
> is aligned to a dma boundary. Without this I see:
> ERROR: v7_dcache_inval_range - start address is not aligned - 0x7fb677e0
> 
> Signed-off-by: Ian Campbell 
> ---
>  drivers/net/designware.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/designware.c b/drivers/net/designware.c
> index 6ece479..1120f70 100644
> --- a/drivers/net/designware.c
> +++ b/drivers/net/designware.c
> @@ -412,7 +412,8 @@ int designware_initialize(ulong base_addr, u32 interface)
>* Since the priv structure contains the descriptors which need a strict
>* buswidth alignment, memalign is used to allocate memory
>*/
> - priv = (struct dw_eth_dev *) memalign(16, sizeof(struct dw_eth_dev));
> + priv = (struct dw_eth_dev *) memalign(ARCH_DMA_MINALIGN,
> +   sizeof(struct dw_eth_dev));
>   if (!priv) {
>   free(dev);
>   return -ENOMEM;

Thanks for this fix.
It was a left-over from initially submitted driver and I missed this
hard-coded item.

Still I haven't tried to execute this on the real board.
Hope to do it soon but I don't expect any issues.

Regards,
Alexey

Reviewed-by: Alexey Brodkin 

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.