Re: odd behaviour of some programs on i386 cross-built from amd64

2020-04-27 Thread Greg A. Woods
At Wed, 22 Apr 2020 21:08:46 -0700, "Greg A. Woods" wrote:
Subject: odd behaviour of some programs on i386 cross-built from amd64
>
>   # od
>   od: "8/2  " %06o " "\n"": bad format
>   # file /usr/bin/od
>   /usr/bin/od: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for NetBSD 8.99.32, stripped

So at least the problems with 'awk' and 'od' seem to be related somehow
to the kernels I built.

I found this by installing a stock 9.0/i386 on real i386 hardware (after
repairing said hardware -- it had not run in years) and finding it could
run the static-linked 'od' I had previously built without problems.

Curious, I decided to cross-build the same source tree on the newly
installed 9.0 system and, perhaps unsurprisingly, it generated an
identical binary for, e.g., 'od':

$ cd $MY_DESTDIR
$ cmp /future/build/woods/future/current-i386-ppro-destdir/usr/bin/od usr/bin/od
-r-xr-xr-x  2 woods  wheel  244288 Apr 16 15:38 /future/build/woods/future/current-i386-ppro-destdir/usr/bin/od
-r-xr-xr-x  2 woods  wheel  244288 Apr 26 02:36 usr/bin/od
$

Similarly, 'awk' looks the same and works just fine.

One of the kernels seems to be the same too, assuming one takes into
account the obvious difference in vers.o:

    text    data     bss      dec     hex filename
20222422  515508  926144 21664074 14a914a sys/arch/i386/compile/MONOLITHIC/netbsd
20222382  515508  926144 21664034 14a9122 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/MONOLITHIC/netbsd

    text    data     bss      dec     hex filename
     795       0       0      795     31b sys/arch/i386/compile/MONOLITHIC/vers.o
     755       0       0      755     2f3 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/MONOLITHIC/vers.o



However the other kernel I've tested is somehow not the same:

    text    data     bss      dec     hex filename
 4776885   80232 1347584  6204701  5ead1d sys/arch/i386/compile/XEN3PAE_DOMU/netbsd
 4776585   80232 1347584  6204401  5eabf1 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/netbsd

    text    data     bss      dec     hex filename
     799       0       0      799     31f sys/arch/i386/compile/XEN3PAE_DOMU/vers.o
     791       0       0      791     317 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/vers.o
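
A quick arithmetic check on the size(1) figures above (a sketch in
Python; the dictionary and function names are made up for illustration):
if vers.o were the only object that changed, the difference in kernel
text size should equal the difference in vers.o text size.

```python
# Text-segment sizes copied from the size(1) output above, as
# (built on the i386 host, built on the amd64 host).
text_sizes = {
    "MONOLITHIC":   {"netbsd": (20222422, 20222382), "vers.o": (795, 755)},
    "XEN3PAE_DOMU": {"netbsd": (4776885, 4776585),   "vers.o": (799, 791)},
}

def unexplained_text_bytes(sizes):
    """Kernel text difference left over after subtracting the vers.o difference."""
    kernel_a, kernel_b = sizes["netbsd"]
    vers_a, vers_b = sizes["vers.o"]
    return (kernel_a - kernel_b) - (vers_a - vers_b)

for config, sizes in text_sizes.items():
    print(config, unexplained_text_bytes(sizes))
```

Under that assumption the MONOLITHIC kernels are fully accounted for (40
bytes in both the kernel and vers.o), while the XEN3PAE_DOMU kernels
leave 292 bytes of text that vers.o alone does not explain.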



Next I'll try testing these kernels on Xen and the Soekris box.

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 




Re: odd behaviour of some programs on i386 cross-built from amd64

2020-04-28 Thread Greg A. Woods
At Mon, 27 Apr 2020 21:17:04 -0700, "Greg A. Woods" wrote:
Subject: Re: odd behaviour of some programs on i386 cross-built from amd64
>
> One of the kernels seems to be the same too, assuming one takes into
> account the obvious difference in vers.o:
>
>     text    data     bss      dec     hex filename
> 20222422  515508  926144 21664074 14a914a sys/arch/i386/compile/MONOLITHIC/netbsd
> 20222382  515508  926144 21664034 14a9122 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/MONOLITHIC/netbsd
>
>     text    data     bss      dec     hex filename
>      795       0       0      795     31b sys/arch/i386/compile/MONOLITHIC/vers.o
>      755       0       0      755     2f3 /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/MONOLITHIC/vers.o

Indeed with a direct "cmp" of the .o files in each build, all are
identical except for vers.o and debugsyms.o -- the latter differs only
in one pathname to the compile directory, which is also to be expected
since the path prefixes differ between the two build environments.

Unfortunately this native-built kernel also behaves identically on the
Soekris hardware (as expected, since there are no differences in the
object code), i.e. it causes the same odd behaviour in a few odd binaries.

I guess now I'll have to dig into my local kernel changes to see what
might be incompatible with a 32-bit system.  Maybe I can also try an ARM
build to see if the same problems happen on an RPi or Beaglebone.


> However the other kernel I've tested is somehow not the same:

The differences are few, but inexplicable.  Here are all the object
files which differ in the XEN3PAE_DOMU builds:

/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/arc4.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/arc4.o differ: char 33, line 1
/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/bf_ecb.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/bf_ecb.o differ: char 33, line 1
/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/camellia-api.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/camellia-api.o differ: char 33, line 1
/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/debugsyms.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/debugsyms.o differ: char 180, line 1
/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/quota1_subr.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/quota1_subr.o differ: char 33, line 1
/build/woods/once.local/current-i386-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/vers.o /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/sys/arch/i386/compile/XEN3PAE_DOMU/vers.o differ: char 33, line 1

Again debugsyms.o and vers.o differences can be ignored, but the rest
are so far inexplicably different.
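
The object-by-object comparison above could be scripted roughly as
follows.  This is only a sketch in Python, not what was actually run
(the listings above come from cmp(1)); the function names are invented
for illustration, and offsets are reported 1-based to match cmp's
"char N".

```python
import os

def first_difference(path_a, path_b, chunk=65536):
    """Return the 1-based offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i + 1
                # One file is a prefix of the other.
                return offset + min(len(a), len(b)) + 1
            if not a:
                return None
            offset += len(a)

def differing_objects(dir_a, dir_b):
    """Map each .o name present in both directories to its first differing offset."""
    result = {}
    for name in sorted(os.listdir(dir_a)):
        other = os.path.join(dir_b, name)
        if name.endswith(".o") and os.path.isfile(other):
            where = first_difference(os.path.join(dir_a, name), other)
            if where is not None:
                result[name] = where
    return result
```

Calling differing_objects() on the two compile directories would return
only the objects worth disassembling, which keeps known-noisy files such
as vers.o easy to filter out afterwards.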

All are built from the same source tree, with the same build options.
The only difference is one build host ("future") is amd64 (running under
Xen, also built from the same source tree), and the other ("once") is
running on native i386 hardware (an old Dell PE2650, but running stock
9.0).

(this sdiff output shows best at 132 column width)

sys/arch/i386/compile/XEN3PAE_DOMU/arc4.o: file format elf3   | /future/build/woods/future/current-amd64-i386-ppro-obj/usr/src/


Disassembly of section .text:                                   Disassembly of section .text:

 :                                                               :
   0:   55                      push   %ebp                   |    0:   b8 08 04 00 00          mov    $0x408,%eax
   1:   b8 08 04 00 00          mov    $0x408,%eax            |    5:   c3                      ret
   6:   89 e5                   mov    %esp,%ebp              |    6:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
   8:   5d                      pop    %ebp                   |    d:   8d 76 00                lea    0x0(%esi),%esi
   9:   c3                      ret                           <
   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi         <

0010 :                                                          0010 :
  10:   55                      push   %ebp                      10:   55                      push   %ebp
  11:   31 c0                   xor    %eax,%eax                 11:   31 c0                   xor    %eax,%eax
  13:   89 e5                   mov    %esp,%ebp              |   13:   57

Re: odd behaviour of some programs on i386 cross-built from amd64

2020-05-03 Thread Greg A. Woods
At Tue, 28 Apr 2020 16:05:10 -0700, "Greg A. Woods" wrote:
Subject: Re: odd behaviour of some programs on i386 cross-built from amd64
>
> I guess now I'll have to dig into my local kernel changes to see what
> might be incompatible with a 32-bit system.  Maybe I can also try an ARM
> build to see if the same problems happen on an RPi or Beaglebone.

I'm still working on the evbarm build -- there were a number of weird
little botches I have had to fix to get the build to go all the way
through to a complete release, but it's nearly there.

In the meantime I finally remembered to try running old binaries from
previous releases, and they work A-OK, so this may give me a way forward
to debugging what's going wrong with the new binaries (and maybe make
the Soekris machine more usable in the meantime too):

# pwd
/building/build/woods/building/netbsd-5-i386-ppro-destdir
# ll usr/bin/od
368 -r-xr-xr-x  2 woods  wheel  187404 Jun  5  2016 usr/bin/od
# file usr/bin/od
usr/bin/od: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for NetBSD 5.2, stripped
# usr/bin/od
asdlkfj
0000000   071541  066144  063153  005152
0000010
# /usr/bin/od
od: "8/2  " %06o " "\n"": bad format


However, at the moment, while the old gdb (6.5 from 5.2_STABLE) works, it
won't read symbols from a new binary ("Dwarf Error"):


# file hexdump
hexdump: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for NetBSD 8.99.32, with debug_info, not stripped
# old-gdb hexdump
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /future/build/woods/future/current-amd64-i386-ppro-obj/more/work/woods/m-NetBSD-current/usr.bin/hexdump/hexdump]

(gdb) run
Starting program: /future/build/woods/future/current-amd64-i386-ppro-obj/more/work/woods/m-NetBSD-current/usr.bin/hexdump/hexdump
warning: shared library handler failed to enable breakpoint
hexdump: ""%07.7_ax " 8/2 "%04x " "\n"": bad format

Program exited with code 01.
(gdb) break add
No symbol table is loaded.  Use the "file" command.
(gdb)
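
The version that gdb complains about is stored near the start of each
compilation unit header in .debug_info: a 4-byte unit length (or the
escape value 0xffffffff followed by an 8-byte length in 64-bit DWARF),
then a 2-byte version.  A minimal Python sketch for reading it, assuming
the raw section contents have already been extracted by some other means
(the function name is hypothetical):

```python
import struct

def cu_header_version(debug_info, little_endian=True):
    """Return the DWARF version of the first compilation unit in raw .debug_info bytes."""
    order = "<" if little_endian else ">"
    (unit_length,) = struct.unpack_from(order + "I", debug_info, 0)
    offset = 4
    if unit_length == 0xFFFFFFFF:
        # 64-bit DWARF: the real length occupies the next 8 bytes.
        offset = 12
    (version,) = struct.unpack_from(order + "H", debug_info, offset)
    return version
```

A result of 4 for the new hexdump binary would be consistent with the
"is 4, should be 2" message from gdb 6.5 above.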





SOLVED (mostly) Re: odd behaviour of some programs on i386 cross-built from amd64

2020-05-09 Thread Greg A. Woods
So, I can now report I've been a victim of my own aging eyes and
clumsiness.  :-)

In summary, the problem was an errant character that I accidentally
typed into a source file while browsing it (sometime back in February).
Worse yet, I saved the file without knowing I had done so, and I had the
further bad luck that the character did not trigger any errors during
compilation.

The errant character was a tilde ('~'), and it landed at the beginning
of line #122 in src/lib/libc/gen/ctype_.c.  Obviously, but unfortunately
for me, this did not generate a syntax error, but instead just changed
the value of one entry in the _ctype_tab_ table (the one for the space
character).
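
To illustrate how a single '~' can silently change a table entry, here
is a small Python model of the C arithmetic.  The flag names and values
are hypothetical (they are not the ones from the NetBSD headers), and it
assumes the entry is written as an OR of flags and stored in a 16-bit
unsigned slot:

```python
# Hypothetical flag bits, for illustration only.
CT_DIGIT = 0x0001
CT_SPACE = 0x0002
CT_BLANK = 0x0004
ENTRY_MASK = 0xFFFF  # assumption: 16-bit unsigned table entry

def table_entry(expr_value):
    """Model storing a C int expression into an unsigned 16-bit table slot."""
    return expr_value & ENTRY_MASK

def has_flag(entry, flag):
    return bool(entry & flag)

intended = table_entry(CT_SPACE | CT_BLANK)
# As in C, unary '~' binds tighter than '|', so only the first flag is inverted.
with_tilde = table_entry(~CT_SPACE | CT_BLANK)

print(hex(intended), has_flag(intended, CT_DIGIT))
print(hex(with_tilde), has_flag(with_tilde, CT_DIGIT))
```

In this model the intended entry is 0x6, while the entry with the tilde
becomes 0xfffd: the space bit is cleared and every other bit, including
the digit bit, is set.  The expression is still valid C, so a compiler
has no syntax error to report.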

The long version of the story is that after all the previously mentioned
problems with debuggers, etc., I started debugging by inserting some
better error messages in usr.bin/hexdump/parse.c to see if I could
discover exactly what line the problem was occurring on, and sure enough
it seemed to be with the <ctype.h> macros.  Then I found I was able to
work around the problem by locally defining a naive version of isdigit().
This probably worked (I have not verified it) because the errant value
for the space character in _ctype_tab_ caused a space to be classified
as a digit, and my naive replacement avoided that.
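
One plausible way a mis-classified space could lead to "bad format" is
sketched below in Python.  It is not the code from parse.c: the scanner,
the flag values and the tables are all hypothetical stand-ins, chosen
only to show why a table-free isdigit() would sidestep a corrupted
table entry.

```python
# Hypothetical flag bits and tables; not the real _ctype_tab_ contents.
DIGIT, SPACE = 0x01, 0x02

good_table = {ch: DIGIT for ch in "0123456789"}
good_table[" "] = SPACE
bad_table = dict(good_table)
bad_table[" "] = ~SPACE & 0xFFFF   # errant '~': every bit except SPACE is set

def table_isdigit(table):
    """Build an isdigit() that trusts the given classification table."""
    return lambda ch: bool(table.get(ch, 0) & DIGIT)

def naive_isdigit(ch):
    """Table-free replacement: compare against the character range directly."""
    return "0" <= ch <= "9"

def scan_count(text, isdigit):
    """Read a leading count; it must be followed by a blank or '/'."""
    end = 0
    while end < len(text) and isdigit(text[end]):
        end += 1
    if end == 0 or end == len(text) or text[end] not in " /":
        raise ValueError("bad format")
    return int(text[:end])

piece = '2 "%04x "'
for name, isdigit in (("good table", table_isdigit(good_table)),
                      ("bad table", table_isdigit(bad_table)),
                      ("naive", naive_isdigit)):
    try:
        print(name, scan_count(piece, isdigit))
    except ValueError as err:
        print(name, err)
```

With the corrupted table the scanner runs past the blank after the
count and stops on the quote character, which it rejects; with the good
table or the naive test it stops on the blank and returns 2.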

The final mystery is why the affected programs work when run with either
a newer kernel, or on amd64.  Although I can reproduce the bug in
hexdump, I cannot seem to reproduce it exactly.  I.e. if I reproduce the
bug by locally defining _ctype_tab_ et al with the errant value, then
hexdump, when compiled for i386, exhibits the same problem on both i386
and amd64, with matching and newer kernels.  I.e. the reproduced bug does
not disappear in the scenarios where it disappeared before.  The old
buggy binary still exhibits the bug only on a real i386 with a
matching kernel, and of course it still works OK on both amd64 with a
matching kernel and on a real i386 with a newer kernel.  Keep in mind
this is a static-linked binary.

Here's the buggy version working fine on a real i386 with a newer kernel:

$ uname -a
NetBSD once.local 9.0 NetBSD 9.0 (GENERIC) #0: Fri Feb 14 00:06:28 UTC 2020  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/i386/compile/GENERIC i386
$ /more/home/more/woods/tmp/hexdump-
asdf
0000000 7361 6664 000a
0000005
$ file /more/home/more/woods/tmp/hexdump-
/more/home/more/woods/tmp/hexdump-: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for NetBSD 8.99.32, stripped

Here's the buggy version working fine on amd64 (with a matching kernel):

$ uname -a
NetBSD future 8.99.32 NetBSD 8.99.32 (XEN3_DOMU) #1: Thu Nov 28 18:31:36 PST 2019  woods@future:/build/woods/future/current-amd64-amd64-obj/more/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEN3_DOMU amd64
$ ~/tmp/hexdump-
asdf
0000000 7361 6664 000a
0000005

Here's the buggy version failing on a real i386 with a matching kernel:

$ uname -a
NetBSD lilbit 8.99.32 NetBSD 8.99.32 (NET5501) #3: Fri May  1 16:55:04 PDT 2020  woods@once.local:/build/woods/once.local/current-i386-i386-ppro-obj/more/work/woods/m-NetBSD-current/sys/arch/i386/compile/NET5501 i386
$ /more/home/more/woods/tmp/hexdump-
hexdump-: ""%07.7_ax " 8/2 "%04x " "\n"": bad format
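
For reference, the format quoted in the error message above asks for a
seven-digit hex offset ("%07.7_ax "), followed by up to eight two-byte
units per line ("8/2"), each printed with "%04x ".  A rough Python
emulation follows.  It is a sketch only: it assumes little-endian host
byte order (as on i386 and amd64), zero-pads an odd trailing byte, and
ignores hexdump's squeezing of duplicate lines; dump_words is a made-up
name.

```python
import struct

def dump_words(data, word_fmt="%04x", offset_fmt="%07x"):
    """Emulate an '8/2' dump: up to eight 2-byte little-endian words per line."""
    lines = []
    for start in range(0, len(data), 16):
        chunk = data[start:start + 16]
        if len(chunk) % 2:
            chunk += b"\0"  # pad an odd trailing byte with zero
        words = struct.unpack("<%dH" % (len(chunk) // 2), chunk)
        lines.append(offset_fmt % start + " " + " ".join(word_fmt % w for w in words))
    lines.append(offset_fmt % len(data))  # final line: total byte count
    return lines

for line in dump_words(b"asdf\n"):
    print(line)
```

For the input "asdf" followed by a newline this produces the line
0000000 7361 6664 000a and then the final offset 0000005.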


I guess the most interesting test would be to step instruction by
instruction through the execution on the real i386 with a newer kernel
and see if I can understand how it manages to work.  I don't think I
kept a copy of hexdump.debug though -- I may have to rebuild the whole
tree with the original error to make that less arduous to do.  Oh well,
I guess it only takes about 4 hours on my speediest build machine.


