Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Fri, 12 Jan 2001, Wichert Akkerman wrote:
> Previously Ben Collins wrote:
> > My only concern now is, does 1.7.2 work if I recompile it against the
> > current libc6-dev. If it does, then the thing to do is start checking
> > the diff between these two versions for possible alignment issues.
>
> Run it on an alpha and you'll get alignment warnings in the kernel log.

The part of the code where it gets the sigbus (I wish I had an strace) is not in any code that I modified for the 1.8 series. The actual file it is erroring in has had very few changes since 1.7.2.

-- 
Adam Heath
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Fri, Jan 12, 2001 at 01:36:23AM -0600, Adam Heath wrote:
> On Fri, 12 Jan 2001, Wichert Akkerman wrote:
> > Previously Ben Collins wrote:
> > > My only concern now is, does 1.7.2 work if I recompile it against
> > > the current libc6-dev. If it does, then the thing to do is start
> > > checking the diff between these two versions for possible alignment
> > > issues.
> >
> > Run it on an alpha and you'll get alignment warnings in the kernel log.
>
> The part of the code where it gets the sigbus (I wish I had an strace)
> is not in any code that I modified for the 1.8 series. The actual file
> it is erroring in has had very few changes since 1.7.2.

The misalignment may be caused someplace else, though. Memory alignment is affected by the entire program, and a change in one place may misalign data used elsewhere.

-- 
Ben Collins
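Ben's point that alignment depends on the whole program can be illustrated with a bump allocator, where the size of every earlier request decides where each later one starts. The sketch below is hypothetical: `struct arena`, `arena_alloc` and the `granule` field are names invented for illustration and are not taken from the dpkg sources.

```c
#include <stddef.h>

/* Hypothetical bump allocator over a caller-supplied buffer. */
struct arena {
    unsigned char *base; /* start of the buffer, assumed suitably aligned */
    size_t used;         /* bytes handed out so far */
    size_t size;         /* total capacity of the buffer */
    size_t granule;      /* requests are rounded up to a multiple of this
                            (must be a power of two) */
};

/* Hand out the next chunk. The returned address depends on the rounded
 * sizes of all earlier requests, not just on this one. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t rounded = (n + a->granule - 1) & ~(a->granule - 1);

    /* Reject wrapped sizes and requests that no longer fit. */
    if (rounded < n || rounded > a->size - a->used)
        return NULL;

    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}
```

With a granule of 4, a 12-byte request followed by an 8-byte request places the second chunk at offset 12, which is not a multiple of 8. With a granule of 8 the same two requests place it at offset 16. The code that receives the second pointer never changed, yet its alignment did.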
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Thu, Jan 11, 2001 at 10:59:03PM -0500, Ben Collins wrote:
> On Thu, Jan 11, 2001 at 10:24:52PM +0100, Tomas Berndtsson wrote:
> > Ben Collins writes:
> > > I originally thought it was a kernel issue from a user. Then when it
> > > happened to me, I thought it was a kernel issue. After trying out an
> > > older kernel, known to work well, I thought it was libc6, then I
> > > find out that dpkg 1.7.2 works perfectly well. So someone needs to
> > > figure out why this thing is giving a sigbus. Sparc users, keep your
> > > old dpkg on hold, don't upgrade it.
> >
> > Bus error, and only on the Sparc, sounds like a typical alignment
> > problem, but I wouldn't know where to start looking for it in the
> > sources.
>
> My only concern now is, does 1.7.2 work if I recompile it against the
> current libc6-dev. If it does, then the thing to do is start checking
> the diff between these two versions for possible alignment issues.

Well, I tested a recompile of 1.7.2, and it worked fine. I then tried 1.8.0, and it barfed. So the problem showed up there. I am now going over the diffs to see what might have happened (there were a lot of changes between those two versions).

Ben
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Ok, I've narrowed down the offending code to this 40k patch. Note, it has nothing to do with the zlib-using code, since I already tried compiling --without-zlib, and it still gives a sigbus. If I take this patch and apply it with -R to a 1.8.0 tree, and compile, I get no sigbus.

I've spent way too much time messing with this already, so I'm turning it over to someone else (doogie?) to work with.

Also Adam, I noticed that in your stream/fd code you have a return value of type ssize_t. You set this value with either read() or fread(), but note that fread() returns size_t, not ssize_t like read(). You need to cast that value, or better, handle it properly.

Ben
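The read()/fread() mismatch Ben describes could be handled along the following lines. This is only a sketch: `stream_read` and `fd_read` are hypothetical names, not functions from dpkg, and the code assumes `len` never exceeds what `ssize_t` can hold.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from a stdio stream and report
 * the result using the read() convention: byte count on success, 0 at end
 * of file, -1 on error. */
static ssize_t stream_read(FILE *fp, void *buf, size_t len)
{
    size_t got = fread(buf, 1, len, fp);

    /* fread() returns an unsigned count and can never be negative. A short
     * count means either end of file or an error; ferror() tells which. */
    if (got < len && ferror(fp))
        return -1;

    /* Safe only under the stated assumption that len fits in ssize_t. */
    return (ssize_t)got;
}

/* Hypothetical helper: the file-descriptor counterpart. read() already
 * follows the ssize_t convention, so it is passed through unchanged. */
static ssize_t fd_read(int fd, void *buf, size_t len)
{
    return read(fd, buf, len);
}
```

A caller can then test for a negative result in one place, whichever of the two sources the data comes from, instead of comparing an unsigned fread() count against -1.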
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Damnit, here's the patch.

-- 
Ben Collins

[Attachment: main-dpkg.diff.bz2, binary data]
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Ben Collins writes:
> Ok, I've narrowed down the offending code to this 40k patch. Note, it
> has nothing to do with the zlib using code, since I already tried
> compiling --without-zlib, and it still gives a sigbus. If I take this
> patch and do a -R with it on a 1.8.0 tree, and compile, I get no
> sigbus. I've spent way too much time messing with this already, so I'm
> turning it over to someone (doogie?) else to work with.

I couldn't find the error, but I found the exact place where it gets the bus error. It is in the file main/processarc.c, function process_archive(), where it looks like this (the fprintf's are mine):

    debug(dbg_eachfile, "process_archive: checking %s for same files on upgrade/downgrade", fnamevb.buf);
    if (!lstat(fnamevb.buf, &oldfs) && !S_ISDIR(oldfs.st_mode)) {
      for (cfile = newfileslist; cfile; cfile = cfile->next) {
        if (!cfile->namenode->filestat) {
          cfile->namenode->filestat = (struct stat *) nfmalloc(sizeof(struct stat));
          fprintf(stderr, "%s(%d): cfile=%p\n", __FILE__, __LINE__, cfile);
          fprintf(stderr, "%s(%d): cfile->namenode=%p\n", __FILE__, __LINE__, cfile->namenode);
          fprintf(stderr, "%s(%d): cfile->namenode->filestat=%p\n", __FILE__, __LINE__, cfile->namenode->filestat);
          fprintf(stderr, "%s(%d): cfile->namenode->name=%p\n", __FILE__, __LINE__, cfile->namenode->name);
          fprintf(stderr, "%s(%d): cfile->namenode->name='%s'\n", __FILE__, __LINE__, cfile->namenode->name);
          if (lstat(cfile->namenode->name, cfile->namenode->filestat)) {
            fprintf(stderr, "%s(%d)\n", __FILE__, __LINE__);
            cfile->namenode->filestat= 0;
            continue;
          }
          fprintf(stderr, "%s(%d)\n", __FILE__, __LINE__);
        }

The printout of these fprintf's is:

    /home/tomas/src/dpkg/main/processarc.c(603): cfile=0xadfc0
    /home/tomas/src/dpkg/main/processarc.c(604): cfile->namenode=0x55086c
    /home/tomas/src/dpkg/main/processarc.c(605): cfile->namenode->filestat=0x8496ac
    /home/tomas/src/dpkg/main/processarc.c(606): cfile->namenode->name=0x550894
    /home/tomas/src/dpkg/main/processarc.c(607): cfile->namenode->name='/.'
    Bus error

If I change the offending line

    if (lstat(cfile->namenode->name, cfile->namenode->filestat)) {

into

    if (1 || lstat(cfile->namenode->name, cfile->namenode->filestat)) {

as if the lstat always fails, the installation will proceed and finish without crashes. It always crashes the first time it reaches this point.

I can't see anything weird with the pointers printed out above. They all seem properly aligned to me. Since it crashes in lstat(), could it be a libc bug after all?

Well, I hope you guys understand more of this than I do. :)

Greetings,
Tomas
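The pointer values in the debug output can be checked mechanically. The sketch below assumes, and the thread does not state this, that something lstat() stores into struct stat needs 8-byte alignment on this platform. `misalignment` and `printed_addrs` are hypothetical names introduced for the check.

```c
#include <stddef.h>
#include <stdint.h>

/* The four pointer values from the debug output above. */
static const uintptr_t printed_addrs[] = {
    0xadfc0,  /* cfile */
    0x55086c, /* cfile->namenode */
    0x8496ac, /* cfile->namenode->filestat */
    0x550894, /* cfile->namenode->name */
};

/* Returns addr modulo align; 0 means the address is aligned.
 * align must be a power of two. */
static size_t misalignment(uintptr_t addr, size_t align)
{
    return (size_t)(addr & (uintptr_t)(align - 1));
}
```

For the filestat pointer, `misalignment(0x8496ac, 4)` is 0 but `misalignment(0x8496ac, 8)` is 4. The address is fine for 4-byte accesses and off by four for 8-byte ones, so it can look properly aligned at a glance and still fault if the assumption about struct stat holds.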
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
The following patch (taken from suggestions in bug #74259) is the culprit (as verified by Ben).

    Index: lib/nfmalloc.c
    ===
    RCS file: /cvs/dpkg/dpkg/lib/nfmalloc.c,v
    retrieving revision 1.1
    retrieving revision 1.2
    diff -u -r1.1 -r1.2
    --- lib/nfmalloc.c 1999/01/29 08:54:00 1.1
    +++ lib/nfmalloc.c 2000/12/20 07:33:47 1.2
    @@ -30,7 +30,7 @@
     #define UNIQUE 4096
     union maxalign {
    -  long l; long double d;
    +  long l;
       void *pv; char *pc; union maxalign *ps; void (*pf)(void);
     };

I am not sure whether he tried my suggestion of moving the long l to the last field in the union, to see if that helps.

-- 
Adam Heath
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Fri, Jan 12, 2001 at 08:39:49PM +0100, Tomas Berndtsson wrote:
> Ben Collins writes:
> > Ok, I've narrowed down the offending code to this 40k patch. Note, it
> > has nothing to do with the zlib using code, since I already tried
> > compiling --without-zlib, and it still gives a sigbus. If I take this
> > patch and do a -R with it on a 1.8.0 tree, and compile, I get no
> > sigbus. I've spent way too much time messing with this already, so
> > I'm turning it over to someone (doogie?) else to work with.
>
> I couldn't find the error, but I found the exact place where it gets
> the Bus error. In the file main/processarc.c, function
> process_archive(), where it looks like this (the fprintf's are mine):

Adam and I already found the error. He modified a union in nfmalloc.c (oddly enough called maxalign), and that threw it off kilter. I want to see if I can fix the problem and still retain his change (it was part of a memory savings patch).

Ben
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Fri, Jan 12, 2001 at 01:47:14PM -0600, Adam Heath wrote:
> I am not sure if he did my suggestion of moving the long l to the last
> field in the union, to see if that helps it.

Actually, it doesn't help. Union ordering isn't really significant; it's just that the long double apparently sets the alignment for the struct. So you'll need to reverse this patch.

IMO, dpkg needs to be converted to using malloc/alloc/obstack anyway. These internal functions (varbuf, nfmalloc) add nothing but code bloat. I think Ian had these things around back when libc wasn't as stable/robust as it is now. They just aren't needed anymore.

Ben
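Ben's remark about union ordering follows from how C computes a union's alignment: it is the strictest alignment among the members, wherever they are declared. A sketch that checks this at compile time is below. The three union names and `alignment_lost` are hypothetical; the member lists follow the two revisions shown in the nfmalloc.c diff, plus the reordering Adam suggested. C11 is assumed for `_Alignof` and `_Static_assert`.

```c
#include <stddef.h>

/* Member list as in revision 1.1 of the diff: includes long double. */
union maxalign_r1_1 {
    long l; long double d;
    void *pv; char *pc; union maxalign_r1_1 *ps; void (*pf)(void);
};

/* Member list as in revision 1.2: long double removed. */
union maxalign_r1_2 {
    long l;
    void *pv; char *pc; union maxalign_r1_2 *ps; void (*pf)(void);
};

/* Revision 1.2 with long l moved to the end (the suggested reordering). */
union maxalign_reordered {
    void *pv; char *pc; union maxalign_reordered *ps; void (*pf)(void);
    long l;
};

/* Moving a member around does not change the union's alignment. */
_Static_assert(_Alignof(union maxalign_reordered) ==
               _Alignof(union maxalign_r1_2),
               "member order does not affect a union's alignment");

/* With long double present, the union is at least as strict as it. */
_Static_assert(_Alignof(union maxalign_r1_1) >= _Alignof(long double),
               "union is at least as aligned as its strictest member");

/* Bytes of alignment given up by dropping long double. The value is
 * platform-dependent and may be 0 on some ABIs. */
static size_t alignment_lost(void)
{
    return _Alignof(union maxalign_r1_1) - _Alignof(union maxalign_r1_2);
}
```

On an ABI where long double is more strictly aligned than long and the pointer types, `alignment_lost()` is non-zero, and anything sized or rounded by the smaller union can end up at addresses the larger one would never have produced. Whether that is the case on 32-bit SPARC is not verified here; the thread's outcome suggests it is.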
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Previously Ben Collins wrote:
> Sparc users, keep your old dpkg on hold, don't upgrade it.

No, do upgrade it and try to figure out where exactly it goes wrong.

Wichert.
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Ben Collins writes:
> I originally thought it was a kernel issue from a user. Then when it
> happened to me, I thought it was a kernel issue. After trying out an
> older kernel, known to work well, I thought it was libc6, then I find
> out that dpkg 1.7.2 works perfectly well. So someone needs to figure
> out why this thing is giving a sigbus. Sparc users, keep your old dpkg
> on hold, don't upgrade it.

Bus error, and only on the Sparc, sounds like a typical alignment problem, but I wouldn't know where to start looking for it in the sources.

Greetings,
Tomas
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Thu, 11 Jan 2001, Wichert Akkerman wrote:
> Previously Ben Collins wrote:
> > Sparc users, keep your old dpkg on hold, don't upgrade it.
>
> No, do upgrade it and try to figure out where exactly it goes wrong.

Btw, I'll look at this when I get home later tonight. This isn't saying I know what is wrong tho.

-- 
Adam Heath
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
Wichert Akkerman writes:
> Previously Ben Collins wrote:
> > Sparc users, keep your old dpkg on hold, don't upgrade it.
>
> No, do upgrade it and try to figure out where exactly it goes wrong.

See my original mail to debian-devel and debian-sparc for a piece of the dpkg -D output. I sent it earlier today.

Tomas
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Thu, Jan 11, 2001 at 10:13:24PM +0100, Wichert Akkerman wrote:
> Previously Ben Collins wrote:
> > Sparc users, keep your old dpkg on hold, don't upgrade it.
>
> No, do upgrade it and try to figure out where exactly it goes wrong.

No, I don't want them to, because I will. I almost crapped up the sparc buildd because dpkg died in the middle of the libc6 install. Don't encourage users to install broken software.

-- 
Ben Collins
Re: dpkg 1.8.1.2 gives SIGBUS on sparc
On Thu, Jan 11, 2001 at 10:24:52PM +0100, Tomas Berndtsson wrote:
> Ben Collins writes:
> > I originally thought it was a kernel issue from a user. Then when it
> > happened to me, I thought it was a kernel issue. After trying out an
> > older kernel, known to work well, I thought it was libc6, then I find
> > out that dpkg 1.7.2 works perfectly well. So someone needs to figure
> > out why this thing is giving a sigbus. Sparc users, keep your old
> > dpkg on hold, don't upgrade it.
>
> Bus error, and only on the Sparc, sounds like a typical alignment
> problem, but I wouldn't know where to start looking for it in the
> sources.

My only concern now is, does 1.7.2 work if I recompile it against the current libc6-dev. If it does, then the thing to do is start checking the diff between these two versions for possible alignment issues.

-- 
Ben Collins