Re: svn commit: r305814 - head/sys/boot/common

2016-09-14 Thread Bruce Evans

On Wed, 14 Sep 2016, Emmanuel Vadot wrote:


Bruce Evans  wrote:


On Wed, 14 Sep 2016, Emmanuel Vadot wrote:


Log:
 ufsread: Do not cast struct direct from void *
 This cause alignment problem on ARM (and possibly other archs), instead copy 
it.
...


This looks like a good pessimization for space.  boot2 on i386 has to
fit in 8192 bytes and has a negative number to spare (features are
already left out).


Do you have any suggestion on making the code better ?
This was the last patch for having EFI working on ARMv6 and this is
something that I want to be enabled by default at some point.


At least copy to a local variable.  Ifdefs for the space-constrained &&
non-strict-alignment arches work of course, but shouldn't be needed.

You will have to investigate the -ffreestanding and builtin situation.
-ffreestanding is set in 24 sub-Makefiles instead of correctly in 1
Makefile.inc.  I think it applies to all of boot/i386.  boot code
almost never recovers from this using __builtin_*.  Beware that some
builtins just call an extern function which might not exist in boot
code.  Even memcpy barely exists in small boot2's.  Some compilers
have buggy -ffreestanding and call memcpy() for struct copying, so it
is preferred to bcopy() in boot code.

Bruce
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r305814 - head/sys/boot/common

2016-09-14 Thread Emmanuel Vadot

 Hi Bruce,

On Thu, 15 Sep 2016 05:37:19 +1000 (EST)
Bruce Evans  wrote:

> On Wed, 14 Sep 2016, Emmanuel Vadot wrote:
> 
> > Log:
> >  ufsread: Do not cast struct direct from void *
> >  This cause alignment problem on ARM (and possibly other archs), instead 
> > copy it.
> >
> >  MFC after: 1 week
> >
> > Modified:
> >  head/sys/boot/common/ufsread.c
> >
> > Modified: head/sys/boot/common/ufsread.c
> > ==
> > --- head/sys/boot/common/ufsread.c  Wed Sep 14 16:47:17 2016
> > (r305813)
> > +++ head/sys/boot/common/ufsread.c  Wed Sep 14 17:43:32 2016
> > (r305814)
> > @@ -97,21 +97,21 @@ static __inline uint8_t
> > fsfind(const char *name, ufs_ino_t * ino)
> > {
> > static char buf[DEV_BSIZE];
> > -   struct direct *d;
> > +   static struct direct d;
> > char *s;
> > ssize_t n;
> 
> This looks like a good pessimization for space.  boot2 on i386 has to
> fit in 8192 bytes and has a negative number to spare (features are
> already left out).

 Do you have any suggestion on making the code better ?
 This was the last patch for having EFI working on ARMv6 and this is
something that I want to be enabled by default at some point.

> 
> With auto buffers, and also builtin memcpy, the compiler can optimize
> the memcpy to nothing on arches with no alignment, and should do this
> with -Os.  I think -ffreestanding in boot programs kills the builtin,
> and this is almost intentional since compilers have poor support for
> -Os so the inline memcpy is sometimes larger unless it is null.
> 
> >
> > fs_off = 0;
> > while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0)
> > for (s = buf; s < buf + DEV_BSIZE;) {
> > -   d = (void *)s;
> > +   memcpy(, s, sizeof(struct direct));
> > if (ls)
> > -   printf("%s ", d->d_name);
> > -   else if (!strcmp(name, d->d_name)) {
> > -   *ino = d->d_ino;
> > -   return d->d_type;
> > +   printf("%s ", d.d_name);
> > +   else if (!strcmp(name, d.d_name)) {
> > +   *ino = d.d_ino;
> > +   return d.d_type;
> > }
> > -   s += d->d_reclen;
> > +   s += d.d_reclen;
> > }
> > if (n != -1 && ls)
> > printf("\n");
> 
> The static buffer in the old code also looks like a pessimization for
> space.  It doesn't seem to be needed to preserver or return results.
> Compilers don't seem to be smart enough to see this and optimize it
> to auto or vice versa.
> 
> Testing shows that the static buffer is space optimization for gcc
> and a space pessimization for clang:
> 
> X #include 
> X #include 
> X 
> X int
> X foo(void)
> X {
> X static char buf[101];
> X static int buf1[3];
> X 
> X read(0, buf, sizeof(buf));
> X memcpy(buf1, [2], sizeof(buf1));
> X return (buf[0] + buf1[0]);
> X }
> 
> 2 static buffers give the best pessimizations for clang -Os -m32.
> Even with -Os, clang prefers to use large instructions to reduce the
> number of instructions.  The static buffers give largest instructions,
> and also prevent optimizing away the memcpy().  With auto buf1, it
> takes -ffreestanding to kill the optimization of memcpy() and with
> that optimization the size is reduced by almost a factor of 2.  With
> auto buf too, other pessimizations are applied to give a size reduction
> of just 1 byte.  gcc-4.2.1 never inlines the memcpy(), but it does
> the other pessimizations less well to give an in-between size.
> 
> Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc,
> 33 for clang with autos and builtins, 28 for tweaked output for clang
> with static buf, auto buf1 and builtins (null memcpy).
> 
> Probably the static buf is an intentional optimization for gcc.
> Compilers do too good a job of generating large code for large stack
> offsets.  In my test code, the offsets are about 101 which is less than
> 128 so it takes only 2 bytes.  DEV_BSIZE is 512 so it needs 5-byte
> offsets from the usual pessimizations for the base pointer (these
> are to point %ebp at the top of the variables and put all the scalar
> variables below the buffer so that the offsets are largest).  But
> clang also does good pessimizations for offsets from the static buffer:
> 
>   pushl   $foo.buf# 5 bytes
>   movsbl  foo.buf, %eax   # 7 bytes
>   addlfoo.buf+2, %eax # 6 bytes
> 
> Actually optimizing for space:
> 
>   movl$foo.buf,%ebx   # 5 bytes
>   pushl   %ebx# 1 byte
>   movsbl  (%ebx), %eax# 3 bytes
>   addl2(%ebx), %eax   # 3 bytes
> 
> Bruce

 Cheers,

-- 
Emmanuel Vadot
___
svn-src-all@freebsd.org mailing list

Re: svn commit: r305814 - head/sys/boot/common

2016-09-14 Thread Bruce Evans

On Wed, 14 Sep 2016, Emmanuel Vadot wrote:


Log:
 ufsread: Do not cast struct direct from void *
 This cause alignment problem on ARM (and possibly other archs), instead copy 
it.

 MFC after: 1 week

Modified:
 head/sys/boot/common/ufsread.c

Modified: head/sys/boot/common/ufsread.c
==
--- head/sys/boot/common/ufsread.c  Wed Sep 14 16:47:17 2016
(r305813)
+++ head/sys/boot/common/ufsread.c  Wed Sep 14 17:43:32 2016
(r305814)
@@ -97,21 +97,21 @@ static __inline uint8_t
fsfind(const char *name, ufs_ino_t * ino)
{
static char buf[DEV_BSIZE];
-   struct direct *d;
+   static struct direct d;
char *s;
ssize_t n;


This looks like a good pessimization for space.  boot2 on i386 has to
fit in 8192 bytes and has a negative number to spare (features are
already left out).

With auto buffers, and also builtin memcpy, the compiler can optimize
the memcpy to nothing on arches with no alignment, and should do this
with -Os.  I think -ffreestanding in boot programs kills the builtin,
and this is almost intentional since compilers have poor support for
-Os so the inline memcpy is sometimes larger unless it is null.



fs_off = 0;
while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0)
for (s = buf; s < buf + DEV_BSIZE;) {
-   d = (void *)s;
+   memcpy(, s, sizeof(struct direct));
if (ls)
-   printf("%s ", d->d_name);
-   else if (!strcmp(name, d->d_name)) {
-   *ino = d->d_ino;
-   return d->d_type;
+   printf("%s ", d.d_name);
+   else if (!strcmp(name, d.d_name)) {
+   *ino = d.d_ino;
+   return d.d_type;
}
-   s += d->d_reclen;
+   s += d.d_reclen;
}
if (n != -1 && ls)
printf("\n");


The static buffer in the old code also looks like a pessimization for
space.  It doesn't seem to be needed to preserver or return results.
Compilers don't seem to be smart enough to see this and optimize it
to auto or vice versa.

Testing shows that the static buffer is space optimization for gcc
and a space pessimization for clang:

X #include 
X #include 
X 
X int

X foo(void)
X {
X   static char buf[101];
X   static int buf1[3];
X 
X 	read(0, buf, sizeof(buf));

X   memcpy(buf1, [2], sizeof(buf1));
X   return (buf[0] + buf1[0]);
X }

2 static buffers give the best pessimizations for clang -Os -m32.
Even with -Os, clang prefers to use large instructions to reduce the
number of instructions.  The static buffers give largest instructions,
and also prevent optimizing away the memcpy().  With auto buf1, it
takes -ffreestanding to kill the optimization of memcpy() and with
that optimization the size is reduced by almost a factor of 2.  With
auto buf too, other pessimizations are applied to give a size reduction
of just 1 byte.  gcc-4.2.1 never inlines the memcpy(), but it does
the other pessimizations less well to give an in-between size.

Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc,
33 for clang with autos and builtins, 28 for tweaked output for clang
with static buf, auto buf1 and builtins (null memcpy).

Probably the static buf is an intentional optimization for gcc.
Compilers do too good a job of generating large code for large stack
offsets.  In my test code, the offsets are about 101 which is less than
128 so it takes only 2 bytes.  DEV_BSIZE is 512 so it needs 5-byte
offsets from the usual pessimizations for the base pointer (these
are to point %ebp at the top of the variables and put all the scalar
variables below the buffer so that the offsets are largest).  But
clang also does good pessimizations for offsets from the static buffer:

pushl   $foo.buf# 5 bytes
movsbl  foo.buf, %eax   # 7 bytes
addlfoo.buf+2, %eax # 6 bytes

Actually optimizing for space:

movl$foo.buf,%ebx   # 5 bytes
pushl   %ebx# 1 byte
movsbl  (%ebx), %eax# 3 bytes
addl2(%ebx), %eax   # 3 bytes

Bruce
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


svn commit: r305814 - head/sys/boot/common

2016-09-14 Thread Emmanuel Vadot
Author: manu
Date: Wed Sep 14 17:43:32 2016
New Revision: 305814
URL: https://svnweb.freebsd.org/changeset/base/305814

Log:
  ufsread: Do not cast struct direct from void *
  This cause alignment problem on ARM (and possibly other archs), instead copy 
it.
  
  MFC after:1 week

Modified:
  head/sys/boot/common/ufsread.c

Modified: head/sys/boot/common/ufsread.c
==
--- head/sys/boot/common/ufsread.c  Wed Sep 14 16:47:17 2016
(r305813)
+++ head/sys/boot/common/ufsread.c  Wed Sep 14 17:43:32 2016
(r305814)
@@ -97,21 +97,21 @@ static __inline uint8_t
 fsfind(const char *name, ufs_ino_t * ino)
 {
static char buf[DEV_BSIZE];
-   struct direct *d;
+   static struct direct d;
char *s;
ssize_t n;
 
fs_off = 0;
while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0)
for (s = buf; s < buf + DEV_BSIZE;) {
-   d = (void *)s;
+   memcpy(, s, sizeof(struct direct));
if (ls)
-   printf("%s ", d->d_name);
-   else if (!strcmp(name, d->d_name)) {
-   *ino = d->d_ino;
-   return d->d_type;
+   printf("%s ", d.d_name);
+   else if (!strcmp(name, d.d_name)) {
+   *ino = d.d_ino;
+   return d.d_type;
}
-   s += d->d_reclen;
+   s += d.d_reclen;
}
if (n != -1 && ls)
printf("\n");
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"