Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Adam Heath
On Fri, 12 Jan 2001, Wichert Akkerman wrote:

 Previously Ben Collins wrote:
  My only concern now is, does 1.7.2 work if I recompile it against the
  current libc6-dev. If it does, then the thing to do is start checking
  the diff between these two versions for possible alignment issues.
 
 Run it on an alpha and you'll get alignment warnings in the kernel log.

The code where it is hitting the SIGBUS (I wish I had an strace) is not in
anything that I modified for the 1.8 series.  The actual file it is erroring
in has changed very little since 1.7.2.

BEGIN GEEK CODE BLOCK
Version: 3.12
GCS d- s: a-- c+++ UL P+ L !E W+ M o+ K- W--- !O M- !V PS--
PE++ Y+ PGP++ t* 5++ X+ tv b+ D++ G e h*! !r z?
-END GEEK CODE BLOCK-
BEGIN PGP INFO
Adam Heath [EMAIL PROTECTED]Finger Print | KeyID
67 01 42 93 CA 37 FB 1E63 C9 80 1D 08 CF 84 0A | DE656B05 PGP
AD46 C888 F587 F8A3 A6DA  3261 8A2C 7DC2 8BD4 A489 | 8BD4A489 GPG
-END PGP INFO-



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
On Fri, Jan 12, 2001 at 01:36:23AM -0600, Adam Heath wrote:
 On Fri, 12 Jan 2001, Wichert Akkerman wrote:
 
  Previously Ben Collins wrote:
   My only concern now is, does 1.7.2 work if I recompile it against the
   current libc6-dev. If it does, then the thing to do is start checking
   the diff between these two versions for possible alignment issues.
  
  Run it on an alpha and you'll get alignment warnings in the kernel log.
 
 The code where it is hitting the SIGBUS (I wish I had an strace) is not in
 anything that I modified for the 1.8 series.  The actual file it is erroring
 in has changed very little since 1.7.2.

The misalignment may be caused someplace else though. Memory alignment
is affected by the entire program, and may misalign parts used
elsewhere.
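
To illustrate how a choice made in one place can misalign memory used
somewhere else, here is a minimal sketch of a bump allocator that pads every
request to a fixed unit. It is not dpkg's nfmalloc; the names arena_alloc,
round_up, arena_reset and misalignment are made up for this illustration, and
the 256-byte arena is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: a static buffer handed out front to back. */
static _Alignas(16) unsigned char arena[256];
static size_t arena_used;

/* Round n up to the next multiple of unit (unit must be a power of two). */
static size_t round_up(size_t n, size_t unit)
{
    return (n + unit - 1) & ~(unit - 1);
}

/* Hand out size bytes, padding each request to a multiple of unit.
 * Returns NULL when the arena is exhausted. */
static void *arena_alloc(size_t size, size_t unit)
{
    size_t padded = round_up(size, unit);
    void *p;

    if (padded > sizeof arena - arena_used)
        return NULL;
    p = arena + arena_used;
    arena_used += padded;
    return p;
}

/* Forget all allocations so the arena can be reused. */
static void arena_reset(void)
{
    arena_used = 0;
}

/* Distance of p past the previous multiple of align (0 means aligned). */
static size_t misalignment(const void *p, size_t align)
{
    return (size_t)((uintptr_t)p % align);
}
```

With a unit of 4, a 12-byte request followed by an 8-byte request puts the
second block at offset 12, which is 4 bytes past an 8-byte boundary. With a
unit of 8 the same two requests put the second block at offset 16. The code
that ends up with the bad pointer can be far from the code that chose the unit.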

-- 
 ---===-=-==-=---==-=--
/  Ben Collins  --  ...on that fantastic voyage...  --  Debian GNU/Linux   \
`  [EMAIL PROTECTED]  --  [EMAIL PROTECTED]  --  [EMAIL PROTECTED]  '
 `---=--===-=-=-=-===-==---=--=---'



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
On Thu, Jan 11, 2001 at 10:59:03PM -0500, Ben Collins wrote:
 On Thu, Jan 11, 2001 at 10:24:52PM +0100, Tomas Berndtsson wrote:
  Ben Collins [EMAIL PROTECTED] writes:
  
   I originally thought it was a kernel issue from a user. Then when it
   happened to me, I thought it was a kernel issue. After trying out an
   older kernel, known to work well, I thought it was libc6, then I find
   out that dpkg 1.7.2 works perfectly well. So someone needs to figure out
   why this thing is giving a sigbus.
   
   Sparc users, keep your old dpkg on hold, don't upgrade it.
  
  Bus error, and only on the Sparc, sounds like a typical alignment
  problem, but I wouldn't know where to start looking for it in the
  sources.
 
 My only concern now is, does 1.7.2 work if I recompile it against the
 current libc6-dev. If it does, then the thing to do is start checking
 the diff between these two versions for possible alignment issues.

Well, I tested a recompile of 1.7.2, and it worked fine. I then tried
1.8.0, and it barfed. So the problem showed up there. I am now going
over the diffs to see what might have happened (there were a lot of
changes between those two versions).

Ben




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
Ok, I've narrowed down the offending code to this 40k patch. Note, it
has nothing to do with the zlib using code, since I already tried
compiling --without-zlib, and it still gives a sigbus. If I take this
patch and do a -R with it on a 1.8.0 tree, and compile, I get no sigbus.
I've spent way too much time messing with this already, so I'm turning
it over to someone (doogie?) else to work with.

Also Adam, I noticed in your stream/fd code, you have a return of type
ssize_t. You set this value with either read() or fread(), but note that
fread() returns size_t, not ssize_t like read(). You need to cast that
value or, better, handle the two cases separately.
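
A minimal sketch of what handling it could look like, assuming the goal is to
report a stdio read the same way read() does. The function name stream_read is
hypothetical and is not taken from the dpkg sources; it also assumes len never
exceeds what ssize_t can hold.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical wrapper: read up to len bytes from a stdio stream and report
 * the result like read(2): bytes read, 0 at end of file, -1 on error.
 * fread() never returns a negative value, so the error case has to be
 * detected with ferror() instead of by looking at the sign. */
static ssize_t stream_read(FILE *fp, void *buf, size_t len)
{
    size_t n = fread(buf, 1, len, fp);

    if (n == 0 && ferror(fp))
        return -1;
    return (ssize_t)n;
}
```

A bare cast of the fread() result would never produce -1, so a caller that
tests for a negative return would miss stream errors.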

Ben




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
damnit, here's the patch.



main-dpkg.diff.bz2
Description: Binary data


Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Tomas Berndtsson
Ben Collins [EMAIL PROTECTED] writes:

 Ok, I've narrowed down the offending code to this 40k patch. Note, it
 has nothing to do with the zlib using code, since I already tried
 compiling --without-zlib, and it still gives a sigbus. If I take this
 patch and do a -R with it on a 1.8.0 tree, and compile, I get no sigbus.
 I've spent way too much time messing with this already, so I'm turning
 it over to someone (doogie?) else to work with.

I couldn't find the error, but I found the exact place where it gets
the Bus error. In the file main/processarc.c, function
process_archive(), where it looks like this (the fprintf's are mine):


  debug(dbg_eachfile, "process_archive: checking %s for same files on "
        "upgrade/downgrade", fnamevb.buf);
  if (!lstat(fnamevb.buf, &oldfs) && !S_ISDIR(oldfs.st_mode)) {
    for (cfile = newfileslist; cfile; cfile = cfile->next) {
      if (!cfile->namenode->filestat) {
        cfile->namenode->filestat = (struct stat *) nfmalloc(sizeof(struct stat));
        fprintf(stderr, "%s(%d): cfile=%p\n", __FILE__, __LINE__, cfile);
        fprintf(stderr, "%s(%d): cfile->namenode=%p\n", __FILE__, __LINE__, cfile->namenode);
        fprintf(stderr, "%s(%d): cfile->namenode->filestat=%p\n", __FILE__, __LINE__, cfile->namenode->filestat);
        fprintf(stderr, "%s(%d): cfile->namenode->name=%p\n", __FILE__, __LINE__, cfile->namenode->name);
        fprintf(stderr, "%s(%d): cfile->namenode->name='%s'\n", __FILE__, __LINE__, cfile->namenode->name);
        if (lstat(cfile->namenode->name, cfile->namenode->filestat)) {
          fprintf(stderr, "%s(%d)\n", __FILE__, __LINE__);
          cfile->namenode->filestat= 0;
          continue;
        }
        fprintf(stderr, "%s(%d)\n", __FILE__, __LINE__);
      }


The printout of these fprintf's is:

/home/tomas/src/dpkg/main/processarc.c(603): cfile=0xadfc0
/home/tomas/src/dpkg/main/processarc.c(604): cfile->namenode=0x55086c
/home/tomas/src/dpkg/main/processarc.c(605): cfile->namenode->filestat=0x8496ac
/home/tomas/src/dpkg/main/processarc.c(606): cfile->namenode->name=0x550894
/home/tomas/src/dpkg/main/processarc.c(607): cfile->namenode->name='/.'
Bus error


If I change the offending line

if (lstat(cfile->namenode->name, cfile->namenode->filestat)) {

into

if (1 || lstat(cfile->namenode->name, cfile->namenode->filestat)) {

as if the lstat always fails, the installation will proceed, and
finish without crashes.

It always crashes the first time it reaches this point. I can't see
anything weird with the pointers printed out above. They all seem
properly aligned to me. Since it crashes in lstat(), could it be a
libc bug after all?
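
One way to double-check the addresses printed above is to test them against
several alignments rather than judging by eye. The sketch below is only an
illustration: addr_is_aligned is a made-up helper, and whether struct stat
needs 8-byte alignment on this platform is an assumption, not something the
output itself shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if addr is a multiple of align (align must be a power
 * of two). */
static int addr_is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}
```

Applied to the filestat pointer from the output, addr_is_aligned(0x8496ac, 4)
is true but addr_is_aligned(0x8496ac, 8) is false, since the address ends in
0xc. If lstat() stores any 8-byte field through that pointer with a
doubleword store, that would be enough for a bus error on SPARC.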

Well, I hope you guys understand more of this, than I do. :)


Greetings,

Tomas



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Adam Heath
The following patch (taken from suggestions in bug#74259) is the culprit (as
verified by Ben).

Index: lib/nfmalloc.c
===
RCS file: /cvs/dpkg/dpkg/lib/nfmalloc.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- lib/nfmalloc.c  1999/01/29 08:54:00 1.1
+++ lib/nfmalloc.c  2000/12/20 07:33:47 1.2
@@ -30,7 +30,7 @@
 #define UNIQUE  4096
 
 union maxalign {
-  long l; long double d;
+  long l;
   void *pv; char *pc; union maxalign *ps; void (*pf)(void);
 };
 
I am not sure if he did my suggestion of moving the long l to the last field
in the union, to see if that helps it.




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
On Fri, Jan 12, 2001 at 08:39:49PM +0100, Tomas Berndtsson wrote:
 Ben Collins [EMAIL PROTECTED] writes:
 
  Ok, I've narrowed down the offending code to this 40k patch. Note, it
  has nothing to do with the zlib using code, since I already tried
  compiling --without-zlib, and it still gives a sigbus. If I take this
  patch and do a -R with it on a 1.8.0 tree, and compile, I get no sigbus.
  I've spent way too much time messing with this already, so I'm turning
  it over to someone (doogie?) else to work with.
 
 I couldn't find the error, but I found the exact place where it gets
 the Bus error. In the file main/processarc.c, function
 process_archive(), where it looks like this (the fprintf's are mine):

Adam and I already found the error. He modified a union in nfmalloc.c
(oddly enough called maxalign), and that threw it off kilter. I want to
see if I can fix the problem, and still retain his change (it was part
of a memory savings patch).

Ben




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-12 Thread Ben Collins
On Fri, Jan 12, 2001 at 01:47:14PM -0600, Adam Heath wrote:
  
 I am not sure if he did my suggestion of moving the long l to the last field
 in the union, to see if that helps it.
 

Actually, it doesn't help. Union ordering isn't really significant; it's
just that the long double apparently sets the alignment for the union.
So you'll need to reverse this patch.
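
The point about ordering can be checked at compile time. The sketch below
assumes a C11 compiler; the three union names are labels invented here to
tell the variants apart, and only the member lists come from the patch. How
much stricter long double is than long or a pointer depends on the platform.

```c
/* The union as it stood in revision 1.1, with long double. */
union maxalign_r1_1 {
    long l; long double d;
    void *pv; char *pc; union maxalign_r1_1 *ps; void (*pf)(void);
};

/* The union after the patch (revision 1.2), without long double. */
union maxalign_r1_2 {
    long l;
    void *pv; char *pc; union maxalign_r1_2 *ps; void (*pf)(void);
};

/* Revision 1.2 with long l moved to the last field. */
union maxalign_reordered {
    void *pv; char *pc; union maxalign_reordered *ps; void (*pf)(void);
    long l;
};

/* Moving a member does not change the alignment of the union. */
_Static_assert(_Alignof(union maxalign_reordered) ==
               _Alignof(union maxalign_r1_2),
               "member order should not affect alignment");

/* A union is at least as strictly aligned as each of its members, so the
 * revision 1.1 union inherits the requirement of long double. */
_Static_assert(_Alignof(union maxalign_r1_1) >= _Alignof(long double),
               "union must satisfy long double alignment");
```

On a platform where long double is more strictly aligned than long and
pointers, dropping it lowers the alignment that the union guarantees, and no
reordering of the remaining members brings it back.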

IMO, dpkg needs to be converted to using malloc/alloc/obstack anyway.
These internal functions (varbuf, nfmalloc) are doing nothing but code
bloat. I think Ian had these things around back when libc wasn't as
stable/robust as it is now. They just aren't needed anymore.

Ben




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Wichert Akkerman
Previously Ben Collins wrote:
 Sparc users, keep your old dpkg on hold, don't upgrade it.

No, do upgrade it and try to figure out where exactly it goes wrong.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Tomas Berndtsson
Ben Collins [EMAIL PROTECTED] writes:

 I originally thought it was a kernel issue from a user. Then when it
 happened to me, I thought it was a kernel issue. After trying out an
 older kernel, known to work well, I thought it was libc6, then I find
 out that dpkg 1.7.2 works perfectly well. So someone needs to figure out
 why this thing is giving a sigbus.
 
 Sparc users, keep your old dpkg on hold, don't upgrade it.

Bus error, and only on the Sparc, sounds like a typical alignment
problem, but I wouldn't know where to start looking for it in the
sources.


Greetings,

Tomas



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Adam Heath
On Thu, 11 Jan 2001, Wichert Akkerman wrote:

 Previously Ben Collins wrote:
  Sparc users, keep your old dpkg on hold, don't upgrade it.
 
 No, do upgrade it and try to figure out where exactly it goes wrong.

Btw, I'll look at this when I get home later tonight.  This isn't saying I
know what is wrong tho.




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Tomas Berndtsson
Wichert Akkerman [EMAIL PROTECTED] writes:

 Previously Ben Collins wrote:
  Sparc users, keep your old dpkg on hold, don't upgrade it.
 
 No, do upgrade it and try to figure out where exactly it goes wrong.

See my original mail to debian-devel and debian-sparc for a piece of
the dpkg -D output. I sent it earlier today.


Tomas



Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Ben Collins
On Thu, Jan 11, 2001 at 10:13:24PM +0100, Wichert Akkerman wrote:
 Previously Ben Collins wrote:
  Sparc users, keep your old dpkg on hold, don't upgrade it.
 
 No, do upgrade it and try to figure out where exactly it goes wrong.

No, I don't want them to, because I will. I almost crapped up the sparc
buildd because dpkg died in the middle of the libc6 install. Don't
encourage users to install broken software.




Re: dpkg 1.8.1.2 gives SIGBUS on sparc

2001-01-11 Thread Ben Collins
On Thu, Jan 11, 2001 at 10:24:52PM +0100, Tomas Berndtsson wrote:
 Ben Collins [EMAIL PROTECTED] writes:
 
  I originally thought it was a kernel issue from a user. Then when it
  happened to me, I thought it was a kernel issue. After trying out an
  older kernel, known to work well, I thought it was libc6, then I find
  out that dpkg 1.7.2 works perfectly well. So someone needs to figure out
  why this thing is giving a sigbus.
  
  Sparc users, keep your old dpkg on hold, don't upgrade it.
 
 Bus error, and only on the Sparc, sounds like a typical alignment
 problem, but I wouldn't know where to start looking for it in the
 sources.

My only concern now is, does 1.7.2 work if I recompile it against the
current libc6-dev. If it does, then the thing to do is start checking
the diff between these two versions for possible alignment issues.
