Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
Edmund Grimley Evans:
>> http://xcas.e.ujf-grenoble.fr/XCAS/viewtopic.php?p=8963#p8963
>>
>> Do you have any suggestions on how to move forward? The easiest option is
>> just to give the test two possible things to diff against, but this buries
>> the issue and does not really solve it.
>
> That looks worrying. It might be a real bug.
>
> If I've understood correctly, you get different behaviour with and
> without the patch on amd64. But the patch consists of a load of
> independent changes. So, if we can't think of anything else, there's
> the option here of doing a bisection search to find out which hunk of
> the patch causes the difference. (Though it could be more complicated
> I suppose: like whether an odd or even number of hunks are applied...)
>

Not amd64 but arm64 - the Debian name for aarch64 / armv8. But yes to the other parts of what you said.

Alright, thanks for the tips, I'll try the bisect when I get some time. Actually there was a paper posted to Hacker News yesterday:

https://www.st.cs.uni-saarland.de/publications/files/zeller-esec-1999.pdf

whose algorithm would be perfect for this sort of thing. Unfortunately I don't think it was released as a piece of executable software :(

> Is -1 cast to a pointer being used as a special value somewhere? That
> value would not survive being packed and unpacked.
>
>> Another thing now: your robopatch results in the following patch:
>>
>> https://anonscm.debian.org/cgit/debian-science/packages/giac.git/tree/debian/patches/fix-48-bit-addr.patch?id=075cd498f2590ed067e73da827a5cb07b4d1aa5b
>>
>> As you can see, it makes some changes to src/cocoa.cc that are not guarded
>> by #ifdef SMARTPTR64 conditions. Judging by your perl expression, I guess
>> this should also be unpatched?
>
> I think that code in cocoa.cc is wrong either way: (1<<31) is an
> overflow already, whatever you cast it to afterwards. It should
> probably be (1LL<<31), and then there's no need to convert it to
> longlong or ulonglong, i.e.: gen p(int((1LL<<31)-1))
>

I see, right: according to the C/C++ standards you shouldn't perform operations on an int that require more than 16 bits. But I think the existing results that we're getting probably wouldn't be affected, since they come from machines where ints do have >= 32 bits, so it wouldn't be overflowing in practice in these cases.

>> Similarly, src/ifactor.cc and the third hunk of src/vecteur.cc should
>> probably be reverted just for "neatness" purposes, but I don't think this
>> would have affected any of the results described.
>
> src/ifactor.cc looks like a false positive: the << is not a shift. So
> revert that.
>
> Third hunk of vecteur.cc should make no difference either way.
>
> So I'd recommend trying to discover which part of that patch changes
> the test result on amd64, and maybe it will then be possible to
> understand why...
>

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

-- 
debian-science-maintainers mailing list
debian-science-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-science-maintainers
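For reference, the (1LL<<31) suggestion quoted above can be checked in isolation. The following is only a sketch: the names kTwoPow31, kP and self_check are made up for illustration and are not taken from src/cocoa.cc, and it assumes int has at least 32 bits. The point is that the shift is carried out in long long, so 2^31 is representable, whereas (1<<31) shifts an int into its sign bit, which is not portable in C or in C++ before C++20.

```cpp
#include <cassert>
#include <climits>

// Assumption: int has at least 32 bits, as on the Debian architectures discussed.
static_assert(sizeof(int) * CHAR_BIT >= 32,
              "this sketch assumes a 32-bit (or wider) int");

// 1LL is a long long (at least 64 bits), so shifting it left by 31 cannot overflow.
constexpr long long kTwoPow31 = 1LL << 31;

// 2^31 - 1 fits in a 32-bit int, so the narrowing conversion preserves the value.
constexpr int kP = static_cast<int>(kTwoPow31 - 1);

// Hypothetical self-check, only here to make the sketch testable.
inline void self_check() {
  assert(kTwoPow31 == 2147483648LL);
  assert(kP == 2147483647);
}
```

With a 32-bit int, kP is the same value as INT_MAX, which is what gen p(int((1LL<<31)-1)) would pass on.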
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
On Thu, 29 Jun 2017 20:40:26 +0100 Edmund Grimley Evans wrote:
> This robopatch seems to fix the problem on arm64 with 48-bit addresses:
>
> perl -i -pe 's/longlong/ulonglong/g if /\(\s*longlong.*(<<|>>)/ &&
> !/gen\(longlong/;' src/*.cc
>
> The idea is to change the type whenever there seems to be a cast
> followed by a shift. The last condition is to avoid a couple of
> harmful false positives.
>
> [..]

Hey Edmund, thanks for all your help with this!

I've tested your robopatch and it works. However, now I'm experiencing this issue:

http://xcas.e.ujf-grenoble.fr/XCAS/viewtopic.php?p=8963#p8963

Do you have any suggestions on how to move forward? The easiest option is just to give the test two possible things to diff against, but this buries the issue and does not really solve it.

Another thing now: your robopatch results in the following patch:

https://anonscm.debian.org/cgit/debian-science/packages/giac.git/tree/debian/patches/fix-48-bit-addr.patch?id=075cd498f2590ed067e73da827a5cb07b4d1aa5b

As you can see, it makes some changes to src/cocoa.cc that are not guarded by #ifdef SMARTPTR64 conditions. Judging by your perl expression, I guess this should also be unpatched? I tried this and things still work, but unfortunately chk_fhan16 still fails. From what I understand of your explanation, it would be best to leave this part out of the patch. Is that right?

Similarly, src/ifactor.cc and the third hunk of src/vecteur.cc should probably be reverted just for "neatness" purposes, but I don't think this would have affected any of the results described.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
This robopatch seems to fix the problem on arm64 with 48-bit addresses:

perl -i -pe 's/longlong/ulonglong/g if /\(\s*longlong.*(<<|>>)/ && !/gen\(longlong/;' src/*.cc

The idea is to change the type whenever there seems to be a cast followed by a shift. The last condition is to avoid a couple of harmful false positives.

For easier maintenance you might want to move the code that packs and unpacks addresses into one place rather than have it scattered all over the place.
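As an illustration of the "one place" suggestion above, here is a minimal sketch of what a centralised pair of helpers might look like. Everything in it is hypothetical: PackedWord, pack_ptr, unpack_ptr and unpack_low are invented names, and the layout (address in the upper 48 bits, 16 bits of other data below it) is an assumption based on the ">> 16" seen in src/gen.cc, not a description of giac's real data structure. All shifts are done on unsigned types, which is the property the robopatch tries to restore.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit word: pointer in bits 16..63, other data in bits 0..15.
struct PackedWord {
  std::uint64_t bits;
};

// Pack a pointer and 16 bits of extra data. Assumes the address fits in 48 bits.
template <class T>
inline PackedWord pack_ptr(T* p, std::uint16_t low) {
  const std::uint64_t addr = reinterpret_cast<std::uintptr_t>(p);
  assert((addr >> 48) == 0);  // would not survive packing otherwise
  return PackedWord{(addr << 16) | low};
}

// Recover the pointer. The shift is on an unsigned value, so zeros are
// shifted in from the top and bit 47 of the address is not sign-extended.
template <class T>
inline T* unpack_ptr(PackedWord w) {
  return reinterpret_cast<T*>(static_cast<std::uintptr_t>(w.bits >> 16));
}

// Recover the low 16 bits of extra data.
inline std::uint16_t unpack_low(PackedWord w) {
  return static_cast<std::uint16_t>(w.bits & 0xffffu);
}
```

With helpers like these, a change of representation (or a fix such as longlong to ulonglong) would only need to be made once.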
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
So giac was supposed to be working now on arm64, but it failed on the buildd:

https://buildd.debian.org/status/package.php?p=giac=sid

Having recently seen something similar I think I can guess what's happening.

User virtual addresses on Linux arm64 may have 39, 42 or 48 bits, depending on how the kernel is configured:

https://www.kernel.org/doc/Documentation/arm64/memory.txt

It seems that giac now works with the smaller virtual addresses, but fails on the buildd, which uses 48-bit addresses.

According to the comment in src/gen.h, SMARTPTR64 should handle 48-bit addresses, but up to now it has probably only been tested on amd64, which uses 47-bit addresses. A problem with the top bit? Sign extension perhaps? Sure enough, in src/gen.cc there is code like this:

#ifdef SMARTPTR64
      (*((longlong *) ) >> 16)

I suspect that the fix will be to replace some of those longlong with ulonglong.
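The suspected sign-extension problem can be reproduced without giac. The sketch below is not giac code: pack, unpack_signed, unpack_unsigned and the example addresses are made up, and the layout (address shifted left by 16) is assumed from the ">> 16" above. It relies on the signed right shift being arithmetic, which is what GCC and Clang do and what C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: address in the upper 48 bits, 16 bits of other data below.
inline std::uint64_t pack(std::uint64_t addr, std::uint16_t low) {
  return (addr << 16) | low;
}

// Unpack through a signed type, as a cast to longlong would do.
// If bit 63 of the word is set, the arithmetic shift copies it into the top bits.
inline std::uint64_t unpack_signed(std::uint64_t word) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(word) >> 16);
}

// Unpack through an unsigned type: zeros are shifted in from the top.
inline std::uint64_t unpack_unsigned(std::uint64_t word) {
  return word >> 16;
}

inline void self_check() {
  const std::uint64_t addr47 = 0x00007fffdeadbee0ULL;  // bit 47 clear (amd64-like)
  const std::uint64_t addr48 = 0x0000ffffdeadbee0ULL;  // bit 47 set (48-bit arm64)

  // With 47-bit addresses both variants round-trip, which hides the bug.
  assert(unpack_signed(pack(addr47, 7)) == addr47);
  assert(unpack_unsigned(pack(addr47, 7)) == addr47);

  // With bit 47 set, only the unsigned variant recovers the address.
  assert(unpack_unsigned(pack(addr48, 7)) == addr48);
  assert(unpack_signed(pack(addr48, 7)) != addr48);
}
```

This matches the observation that the code works on amd64 and with smaller arm64 address spaces but fails once the top address bit is in use.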
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
I was able to build giac 1.2.3.25+dfsg1-3 on arm64 with this "patch":

perl -i -pe 's/^#ifdef __x86_64__$/#if 1/;' src/gen.h
perl -i -pe 's/^#ifndef __x86_64__$/#if 0/;' src/first.h

Obviously that change would break it on 32-bit architectures.

A proper fix might be to use something like ~(uintptr_t)3 in gen.h, avoiding the preprocessor, and in first.h something like:

#include
#if UINTPTR_MAX < 0x
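A sketch of the kind of UINTPTR_MAX test suggested above, for illustration only. The threshold 0xFFFFFFFF and the macro name EXAMPLE_WIDE_POINTERS are assumptions of this example, not the elided condition from the message and not anything that exists in first.h. The idea is to ask the compiler how wide pointers are instead of naming architectures.

```cpp
#include <cassert>
#include <cstdint>

// UINTPTR_MAX is usable in #if, so no list of architectures is needed.
// Assumption: "wide" here means pointers wider than 32 bits.
#if UINTPTR_MAX > 0xFFFFFFFFu
#define EXAMPLE_WIDE_POINTERS 1
#else
#define EXAMPLE_WIDE_POINTERS 0
#endif

// Hypothetical accessor so the result can also be inspected at run time.
inline bool wide_pointers() { return EXAMPLE_WIDE_POINTERS != 0; }
```

On amd64, arm64 and ppc64el this should select the wide branch; on 32-bit architectures and x32 it should select the other one.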
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
On arm64, if you run under GDB and look at the address that faulted it's clear that the address has been truncated to 32 bits. And there's some obvious code in src/gen.h that looks as if it's truncating addresses to 32 bits on any architecture that isn't x86_64.

However, I don't think gen.h is the only problem. I'd recommend grepping all the code for __x86_64__ to see if there are other places where it's assumed that all other architectures are 32-bit.

In src/gen.h you could just use something like (_ptr & (uintptr_t)-4): no need for the preprocessor.

Even if you do have to use the preprocessor I'd advise against listing all the 64-bit architectures you can think of. You could use UINTPTR_MAX from , for example.

It's encouraging that a statically linked icas was reported to have worked on arm64:

http://xcas.e.ujf-grenoble.fr/XCAS/viewtopic.php?f=4=1785

That suggests that pointer truncation is perhaps the only problem (and the pointers happen to be 32-bit with static linking on that system). So perhaps quite a small patch would make this package work on other 64-bit architectures.
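To make the masking suggestion concrete, here is a small sketch. The function names and the example value are hypothetical, and mask_32bit is only a guess at the kind of truncating mask described above, not a copy of what src/gen.h contains. It shows that a mask built from uintptr_t keeps the upper address bits on any pointer width, while a 32-bit constant silently clears them.

```cpp
#include <cassert>
#include <cstdint>

// Width-independent: clears the two low tag bits and nothing else.
inline std::uintptr_t clear_low_bits(std::uintptr_t p) {
  return p & ~static_cast<std::uintptr_t>(3);
}

// Hypothetical truncating variant: the constant is a 32-bit unsigned int, so
// after promotion its upper 32 bits are zero and the address loses them.
inline std::uint64_t mask_32bit(std::uint64_t p) {
  return p & 0xfffffffcu;
}

// The same mask done at full 64-bit width, for comparison.
inline std::uint64_t mask_full(std::uint64_t p) {
  return p & ~static_cast<std::uint64_t>(3);
}

inline void self_check() {
  // (uintptr_t)-4 and ~(uintptr_t)3 are the same mask.
  static_assert(static_cast<std::uintptr_t>(-4) == ~static_cast<std::uintptr_t>(3),
                "both spellings give the same mask");

  const std::uint64_t tagged = 0x0000007fb7d3b3e9ULL;  // made-up tagged address
  assert(mask_full(tagged) == 0x0000007fb7d3b3e8ULL);  // address preserved
  assert(mask_32bit(tagged) == 0x00000000b7d3b3e8ULL); // upper bits lost
}
```

A fault at an address whose upper 32 bits are zero, as seen under GDB, is what the truncating form would produce.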
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
On Mon, 13 Feb 2017 16:12:14 -0500 "Aaron M. Ucko" wrote:
> [..]
>
>   xvfb-run ../../src/icas "algo.tex"
>   ./algo.tex:4: Warning: Command not found: \textheight
>   /usr/share/hevea/hyperref.hva:65: Warning: Ignoring option: 'pdftex'
>   /usr/share/hevea/hyperref.hva:65: Warning: Ignoring option: 'colorlinks'
>   ./algo.tex:33: Warning: Application of '\~' on 'p' failed
>   Exclude comment 'comment'
>   // Using locale /usr/share/locale/
>   // C
>   // /usr/share/locale/
>   // giac
>   // UTF-8
>   // Maximum number of parallel threads 3
>   // Unable to find keyword file doc/en/keywords
>   Help file doc/en/aide_cas not found
>   Added 0 synonyms
>   Giac pdflatex and HTML5 output
>   Partly inspired from pgiac by Jean-Michel Sarlat
>   Segmentation fault
>   Makefile:648: recipe for target 'algo.pdf' failed
>   make[4]: *** [algo.pdf] Error 139
>   make[4]: *** Waiting for unfinished jobs
>
> [..]

I was unable to get to the bottom of this, however here are my findings so far:

Upstream attempts to optimise on space, defining SMARTPTR64 when it is possible to store pointers in less than 64 bits. From src/gen.h:

  /* Warning: the size of a gen depend on the architecture and of
     compile-time flags
     Define -DSMARTPTR64 on 64 bit CPU if the pointers allocated by new are 48 bits
     this will make sizeof(gen)==8 instead of 16
  [..]

This *appears* to be force-disabled on ppc64el. From src/first.h:

  #ifndef __x86_64__
  #ifdef SMARTPTR64
  #undef SMARTPTR64
  #endif // SMARTPTR64
  [..]

Further evidence that it is force-disabled:

(sid_ppc64el-dchroot)infinity0@plummer:~/giac$ uname -a
Linux plummer 3.16.0-4-powerpc64le #1 SMP Debian 3.16.39-1 (2016-12-30) ppc64le GNU/Linux
(sid_ppc64el-dchroot)infinity0@plummer:~/giac$ cat test.cc
#include "src/giac/giac.h"
#include
int main() { printf("%d\n", SMARTPTR64); }
(sid_ppc64el-dchroot)infinity0@plummer:~/giac$ g++ test.cc
test.cc: In function 'int main()':
test.cc:3:29: error: 'SMARTPTR64' was not declared in this scope
 int main() { printf("%d\n", SMARTPTR64); }
                             ^~
(sid_ppc64el-dchroot)infinity0@plummer:~/giac$ g++ -DSMARTPTR64 test.cc
test.cc: In function 'int main()':
test.cc:3:29: error: 'SMARTPTR64' was not declared in this scope
 int main() { printf("%d\n", SMARTPTR64); }
                             ^~
(sid_ppc64el-dchroot)infinity0@plummer:~/giac$ g++ -DSMARTPTR64=1 test.cc
test.cc: In function 'int main()':
test.cc:3:29: error: 'SMARTPTR64' was not declared in this scope
 int main() { printf("%d\n", SMARTPTR64); }
                             ^~

By editing src/icas one can run the failing build command in gdb:

(sid_ppc64el-dchroot)infinity0@plummer:~/giac/doc/fr$ diff -ru ../../src/icas{.orig,}
--- ../../src/icas.orig 2017-02-15 16:48:15.962658720 +
+++ ../../src/icas 2017-02-15 16:48:29.294839196 +
@@ -114,7 +114,7 @@
       $ECHO "icas:icas:$LINENO: newargv[0]: $progdir/$program" 1>&2
       func_lt_dump_args ${1+"$@"} 1>&2
     fi
-    exec "$progdir/$program" ${1+"$@"}
+    exec gdb -q -d ../../src "$progdir/$program" ${1+"$@"}
     $ECHO "$0: cannot exec $program $*" 1>&2
     exit 1

(sid_ppc64el-dchroot)infinity0@plummer:~/giac/doc/fr$ xvfb-run ../../src/icas
Reading symbols from /home/infinity0/giac/src/.libs/icas...done.
(gdb) run algo.tex
Starting program: /home/infinity0/giac/src/.libs/icas algo.tex
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
// Using locale /usr/share/locale/
// C
// /usr/share/locale/
// giac
// UTF-8
// Maximum number of parallel threads 16
// Unable to find keyword file doc/en/keywords
Help file doc/en/aide_cas not found
Added 0 synonyms
Giac pdflatex and HTML5 output
Partly inspired from pgiac by Jean-Michel Sarlat
[New Thread 0x3fffb4f7eaa0 (LWP 17730)]

Thread 2 "icas" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb4f7eaa0 (LWP 17730)]
giac::gen::in_eval (this=0x5e38b0b8, level=, evaled=..., contextptr=0x3fff59f8) at gen.cc:2105
2105        evaled=(*Sommet.ptr())(evaled,contextptr);
(gdb) bt
#0  giac::gen::in_eval (this=0x5e38b0b8, level=, evaled=..., contextptr=0x3fff59f8) at gen.cc:2105
#1  0x3fffb7d3b3e8 in giac::eval_VECT (g=..., evaled=..., subtype=, level=, contextptr=0x3fff59f8) at gen.cc:1755
#2  0x3fffb7d391cc in giac::in_eval_vect (g=..., evaled=..., level=25, contextptr=0x3fff59f8) at gen.cc:2025
#3  0x3fffb7d3aa34 in giac::gen::in_eval (this=0x5e38b4f0, level=25, evaled=..., contextptr=0x3fff59f8) at gen.cc:2046
#4  0x3fffb7d3adb8 in giac::gen::in_eval
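The test.cc check earlier in this message can only report by failing to compile. A variant using #ifdef builds either way and reports the result at run time. The sketch below is self-contained and does not include giac.h: the #ifndef block is a stand-in copied from the first.h excerpt quoted above, and smartptr64_enabled is a made-up name.

```cpp
#include <cassert>

// Stand-in for the quoted excerpt of src/first.h: on anything that is not
// x86_64, a SMARTPTR64 given on the command line is undefined again.
#ifndef __x86_64__
#ifdef SMARTPTR64
#undef SMARTPTR64
#endif // SMARTPTR64
#endif

// Hypothetical probe: true if SMARTPTR64 is still defined at this point.
constexpr bool smartptr64_enabled() {
#ifdef SMARTPTR64
  return true;
#else
  return false;
#endif
}
```

Built with g++ -DSMARTPTR64 on ppc64el, this probe would be expected to return false, consistent with the three compile errors shown above.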
Bug#855078: giac: FTBFS: [algo.pdf] Error 139 (Segmentation fault)
Source: giac
Version: 1.2.3.25+dfsg1-1
Severity: important
Justification: fails to build from source

On several architectures, icas failed with a segmentation fault while attempting to produce doc/fr/algo.pdf -- e.g., on arm64,

  xvfb-run ../../src/icas "algo.tex"
  ./algo.tex:4: Warning: Command not found: \textheight
  /usr/share/hevea/hyperref.hva:65: Warning: Ignoring option: 'pdftex'
  /usr/share/hevea/hyperref.hva:65: Warning: Ignoring option: 'colorlinks'
  ./algo.tex:33: Warning: Application of '\~' on 'p' failed
  Exclude comment 'comment'
  // Using locale /usr/share/locale/
  // C
  // /usr/share/locale/
  // giac
  // UTF-8
  // Maximum number of parallel threads 3
  // Unable to find keyword file doc/en/keywords
  Help file doc/en/aide_cas not found
  Added 0 synonyms
  Giac pdflatex and HTML5 output
  Partly inspired from pgiac by Jean-Michel Sarlat
  Segmentation fault
  Makefile:648: recipe for target 'algo.pdf' failed
  make[4]: *** [algo.pdf] Error 139
  make[4]: *** Waiting for unfinished jobs

I see this (general) failure mode on arm64, mips, mips64el, ppc64el, s390x, and the non-release architectures alpha, hppa, m68k, powerpc, ppc64, sparc64, and x32.

Could you please take a look?

Thanks!

NB: some of the above output may be from hevea, which ran in parallel with icas.

-- 
Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
http://www.mit.edu/~amu/ | http://stuff.mit.edu/cgi/finger/?a...@monk.mit.edu