Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread Joerg Sonnenberger
On Tue, Feb 24, 2015 at 12:57:04PM +, Zbigniew Bodek wrote:
>   In fact, the assumption that 'struct ip' is always 4-byte aligned
>   is definitely false, as we have no impact on data alignment of packet
>   stream received.

Pretty much all drivers do guarantee this by setting up buffers
accordingly. Why can't yours do the same?

Joerg
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread Adrian Chadd
Hi,

Erm, did you test this stuff on MIPS too?

We should be smarter about accessing fields in the IP header, rather
than trying to play alignment games like this.



-adrian


On 24 February 2015 at 04:57, Zbigniew Bodek  wrote:
> Author: zbb
> Date: Tue Feb 24 12:57:03 2015
> New Revision: 279236
> URL: https://svnweb.freebsd.org/changeset/base/279236
>
> Log:
>   Change struct attribute to avoid aligned operations mismatch
>
>   Previous __alignment(4) allowed compiler to assume that operations are
>   performed on aligned region. On ARM processor, this led to alignment fault
>   as shown below:
>   trapframe: 0xda9e5b10
>   FSR=0001, FAR=a67b680e, spsr=6113
>   r0 =, r1 =0068, r2 =007c, r3 =
>   r4 =a67b6826, r5 =a67b680e, r6 =0014, r7 =0068
>   r8 =0068, r9 =da9e5bd0, r10=0011, r11=da9e5c10
>   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
>   <...>
>   udp_input+0x264: ldmia r5, {r0-r3, r6}
>   udp_input+0x268: stmia r12, {r0-r3, r6}
>
>   This was due to instructions which do not support unaligned access,
>   whereas for __alignment(2) compiler replaced ldmia/stmia with some
>   logically equivalent memcpy operations.
>   In fact, the assumption that 'struct ip' is always 4-byte aligned
>   is definitely false, as we have no impact on data alignment of packet
>   stream received.
>
>   Another possible solution would be to explicitely perform memcpy()
>   on objects of 'struct ip' type, which, however, would suffer from
>   performance drop, and be merely a problem hiding.
>
>   Please, note that this has nothing to do with
>   ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to
>   compiler behaviour.
>
>   Submitted by:  Wojciech Macek 
>   Reviewed by:   glebius, ian
>   Obtained from: Semihalf
>
> Modified:
>   head/sys/netinet/ip.h
>
> Modified: head/sys/netinet/ip.h
> ==
> --- head/sys/netinet/ip.h   Tue Feb 24 12:31:08 2015(r279235)
> +++ head/sys/netinet/ip.h   Tue Feb 24 12:57:03 2015(r279236)
> @@ -67,7 +67,7 @@ struct ip {
> u_char  ip_p;   /* protocol */
> u_short ip_sum; /* checksum */
> struct  in_addr ip_src,ip_dst;  /* source and dest address */
> -} __packed __aligned(4);
> +} __packed __aligned(2);
>
>  #defineIP_MAXPACKET65535   /* maximum packet size */
>
>
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread Joerg Sonnenberger
On Tue, Feb 24, 2015 at 11:59:02AM -0700, Ian Lepore wrote:
> ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
> arbitrary boundary.

Isn't it generally more a case of "if the device's DMA engine isn't
extremely bad designed"? There is a strong correlation between devices
not supporting DMA to arbitrary locations and devices requiring a single
data segment. Heck, from a design point of few there is no justification
for not supporting skipping a few byte at the start of the receive. The
device can easily just add two bytes of noise to make it double word
aligned if necessary... VIA Rhine forever!

Joerg
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread John-Mark Gurney
Ian Lepore wrote this message on Tue, Feb 24, 2015 at 11:59 -0700:
> On Tue, 2015-02-24 at 09:34 -0800, John-Mark Gurney wrote:
> > Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +:
> > > Author: zbb
> > > Date: Tue Feb 24 12:57:03 2015
> > > New Revision: 279236
> > > URL: https://svnweb.freebsd.org/changeset/base/279236
> > > 
> > > Log:
> > >   Change struct attribute to avoid aligned operations mismatch
> > >   
> > >   Previous __alignment(4) allowed compiler to assume that operations are
> > >   performed on aligned region. On ARM processor, this led to alignment 
> > > fault
> > >   as shown below:
> > >   trapframe: 0xda9e5b10
> > >   FSR=0001, FAR=a67b680e, spsr=6113
> > >   r0 =, r1 =0068, r2 =007c, r3 =
> > >   r4 =a67b6826, r5 =a67b680e, r6 =0014, r7 =0068
> > >   r8 =0068, r9 =da9e5bd0, r10=0011, r11=da9e5c10
> > >   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
> > >   <...>
> > >   udp_input+0x264: ldmia r5, {r0-r3, r6}
> > >   udp_input+0x268: stmia r12, {r0-r3, r6}
> > >   
> > >   This was due to instructions which do not support unaligned access,
> > >   whereas for __alignment(2) compiler replaced ldmia/stmia with some
> > >   logically equivalent memcpy operations.
> > >   In fact, the assumption that 'struct ip' is always 4-byte aligned
> > >   is definitely false, as we have no impact on data alignment of packet
> > >   stream received.
> > 
> > So, the whole point of ETHER_ALIGN is to make struct ip aligned on
> > 4 byte offsets...  This will probably impact performance on arm for
> > properly aligned struct ip...
> > 
> 
> ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
> arbitrary boundary.  Of course, if you're on such a platform it can
> probably just access the word-sized values on halfword boundaries
> anyway.  For arm, the only solution at the driver level is to memcpy()
> every incoming packet to another buffer to realign it.  If you think
> that makes receive performance really bad, you'd be right.

Having working on the TS-7200 which had this restriction, I know very
well the issues...

> Many arm platforms can only DMA on a cacheline boundary.  The size of an

Are you sure?  Last time I looked at the various drivers, about a year
or two ago, I think I found only one ARM driver that didn't do at
least 2 byte DMA alignment..

Either that, or they haven't been declaring their restrictions properly
in bus_dma_tag_create...

> mbuf header is like 24 or 28 or something, definitely not cache aligned.
> So in addition to the extra copying the drivers do for ETHER_ALIGN,
> there could also be bounce-buffer copying involved due to the alignment
> in the busdma tag.

Can you list a few of these drivers for me?  I'd be curious to look
at them again...

> The latter issue could be fixed with an MD padding field at the end of
> the mbuf header to make the data portion start on a cache line boundary.
> When I experimented with that concept on imx6 I gained 10 MB/sec
> performance from the reduced copying.

ffec allows byte aligned rx dma, but it looks like 16 byte TX align...
If the ffec allows segments that are not a multiple of 16, there is/was
talk about supporting this, and then you'd only have to bounce a few
bytes instead of the entire packet, but I'm not sure if we can do that
w/ the existing bus_dma framework..

I can't find the define/var right now, but I remeber there being
an offset that is processed to try to put the packet at the correct
byte offset...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread Ian Lepore
On Tue, 2015-02-24 at 09:34 -0800, John-Mark Gurney wrote:
> Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +:
> > Author: zbb
> > Date: Tue Feb 24 12:57:03 2015
> > New Revision: 279236
> > URL: https://svnweb.freebsd.org/changeset/base/279236
> > 
> > Log:
> >   Change struct attribute to avoid aligned operations mismatch
> >   
> >   Previous __alignment(4) allowed compiler to assume that operations are
> >   performed on aligned region. On ARM processor, this led to alignment fault
> >   as shown below:
> >   trapframe: 0xda9e5b10
> >   FSR=0001, FAR=a67b680e, spsr=6113
> >   r0 =, r1 =0068, r2 =007c, r3 =
> >   r4 =a67b6826, r5 =a67b680e, r6 =0014, r7 =0068
> >   r8 =0068, r9 =da9e5bd0, r10=0011, r11=da9e5c10
> >   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
> >   <...>
> >   udp_input+0x264: ldmia r5, {r0-r3, r6}
> >   udp_input+0x268: stmia r12, {r0-r3, r6}
> >   
> >   This was due to instructions which do not support unaligned access,
> >   whereas for __alignment(2) compiler replaced ldmia/stmia with some
> >   logically equivalent memcpy operations.
> >   In fact, the assumption that 'struct ip' is always 4-byte aligned
> >   is definitely false, as we have no impact on data alignment of packet
> >   stream received.
> 
> So, the whole point of ETHER_ALIGN is to make struct ip aligned on
> 4 byte offsets...  This will probably impact performance on arm for
> properly aligned struct ip...
> 

ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
arbitrary boundary.  Of course, if you're on such a platform it can
probably just access the word-sized values on halfword boundaries
anyway.  For arm, the only solution at the driver level is to memcpy()
every incoming packet to another buffer to realign it.  If you think
that makes receive performance really bad, you'd be right.

Many arm platforms can only DMA on a cacheline boundary.  The size of an
mbuf header is like 24 or 28 or something, definitely not cache aligned.
So in addition to the extra copying the drivers do for ETHER_ALIGN,
there could also be bounce-buffer copying involved due to the alignment
in the busdma tag.

The latter issue could be fixed with an MD padding field at the end of
the mbuf header to make the data portion start on a cache line boundary.
When I experimented with that concept on imx6 I gained 10 MB/sec
performance from the reduced copying.

-- Ian


___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r279236 - head/sys/netinet

2015-02-24 Thread John-Mark Gurney
Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +:
> Author: zbb
> Date: Tue Feb 24 12:57:03 2015
> New Revision: 279236
> URL: https://svnweb.freebsd.org/changeset/base/279236
> 
> Log:
>   Change struct attribute to avoid aligned operations mismatch
>   
>   Previous __alignment(4) allowed compiler to assume that operations are
>   performed on aligned region. On ARM processor, this led to alignment fault
>   as shown below:
>   trapframe: 0xda9e5b10
>   FSR=0001, FAR=a67b680e, spsr=6113
>   r0 =, r1 =0068, r2 =007c, r3 =
>   r4 =a67b6826, r5 =a67b680e, r6 =0014, r7 =0068
>   r8 =0068, r9 =da9e5bd0, r10=0011, r11=da9e5c10
>   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
>   <...>
>   udp_input+0x264: ldmia r5, {r0-r3, r6}
>   udp_input+0x268: stmia r12, {r0-r3, r6}
>   
>   This was due to instructions which do not support unaligned access,
>   whereas for __alignment(2) compiler replaced ldmia/stmia with some
>   logically equivalent memcpy operations.
>   In fact, the assumption that 'struct ip' is always 4-byte aligned
>   is definitely false, as we have no impact on data alignment of packet
>   stream received.

So, the whole point of ETHER_ALIGN is to make struct ip aligned on
4 byte offsets...  This will probably impact performance on arm for
properly aligned struct ip...

>   Another possible solution would be to explicitely perform memcpy()
>   on objects of 'struct ip' type, which, however, would suffer from
>   performance drop, and be merely a problem hiding.

I have thought of this, and think that this is the way to go... do
a copy of struct ip into a stack variable, manipulate it from there,
and pull it out later, though I haven't done any performance
measurements, and for items that touch a field or two it'd probably
hurt..

Maybe have two versions of struct ip, one that is aligned to 4 bytes
and another that is aligned to 2 bytes?

Also, did you do the same adjustment for struct tcp and friends? If
struct ip isn't aligned, then struct tcp won't be either causing
similar issues...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"