Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Tue, 16 Apr 2013 14:43:58 +0200

> If we cannot make a configure test, we need to know if there is a release where the assembler can be trusted.

After some discussions with my Oracle contact, I think a configure test will actually be easy: the assembler on Solaris 10 emits well-formed version information.

For example, as -V gives:

  as: SunOS 5.10 118683-09 Patch 01/23/2013

So we can use that to detect if the proper fixes are installed.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
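A minimal sketch of how the banner quoted above could be turned into a yes/no answer in shell. The function names and the MIN_REV parameter are hypothetical; the thread does not say which revision of patch 118683 actually contains the needed fixes, so the cutoff is left to the caller.

```sh
#!/bin/sh
# Sketch only: the function names are illustrative, and the minimum
# revision is a parameter because the thread gives no verified cutoff.

# sun_as_patch_rev BANNER
# Prints NN from a banner of the form "as: SunOS 5.10 118683-NN Patch ...",
# or prints nothing if the banner does not have that shape.
sun_as_patch_rev() {
  printf '%s\n' "$1" |
    sed -n 's/^as: SunOS [0-9.]* 118683-\([0-9][0-9]*\) .*/\1/p'
}

# sun_as_is_fixed BANNER MIN_REV
# Succeeds when the banner reports patch 118683 at revision >= MIN_REV.
sun_as_is_fixed() {
  rev=$(sun_as_patch_rev "$1")
  [ -n "$rev" ] || return 1
  # Drop leading zeros so that "09" is compared as plain decimal 9.
  rev=$(printf '%s\n' "$rev" | sed 's/^0*\([0-9]\)/\1/')
  [ "$rev" -ge "$2" ]
}
```

A caller would pass in the first line printed by as -V. Whether that banner goes to stdout or stderr is not stated in the thread, so capturing both streams would be the cautious choice.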
Re: [PATCH] Improve and consolidate sparc PIC assembler.
David Miller da...@davemloft.net writes:

> From: Torbjorn Granlund t...@gmplib.org
> Date: Tue, 16 Apr 2013 14:43:58 +0200
>
> > If we cannot make a configure test, we need to know if there is a release where the assembler can be trusted.
>
> After some discussions with my Oracle contact, I think a configure test will actually be easy, the assembler on Solaris 10 emits well formed version information.
>
> For example as -V gives:
>
>   as: SunOS 5.10 118683-09 Patch 01/23/2013
>
> So we can use that to detect if the proper fixes are installed.

I have a slight preference for checking functionality over checking against a database of version numbers. My experience is that version number formatting changes back and forth, and that it is therefore fragile to detect all faulty ones.

The old as uses one format:

  as: Sun Compiler Common 10 Patch 05/06/2005

And the new one another:

  as: SunOS 5.10 118683-10 Patch 03/14/2013

And how about SunOS 9 with a patched assembler...?

The horror example is Mac OS X. Their compiler tools are buggier than all other tools put together, and the version numbers and apparent date stamps seem absolutely non-linear. I had to give up supporting most Xcode releases, and just tell people to try another Xcode release when they run into bugs with GMP.

But if a real feature/bug test is too hard, or hard to make reliable, version detection is what we have to do.

-- 
Torbjörn
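To illustrate the fragility concern, here is a small sketch that classifies the two banner formats quoted above and reports anything else as unknown. The function name and the three result labels are made up for this example; a real check would have to decide what to do with the "unknown" case, which is exactly the problem being described.

```sh
#!/bin/sh
# Sketch only: classify_as_banner and its result labels are hypothetical.

# classify_as_banner BANNER
# Prints one of: sunos-patch, compiler-common, unknown.
classify_as_banner() {
  case $1 in
    "as: SunOS "*" Patch "*)
      echo sunos-patch ;;        # e.g. "as: SunOS 5.10 118683-10 Patch ..."
    "as: Sun Compiler Common "*" Patch "*)
      echo compiler-common ;;    # e.g. "as: Sun Compiler Common 10 Patch ..."
    *)
      echo unknown ;;            # any format not seen in this thread
  esac
}
```

Only the first format carries a patch number at all; the older format has nothing but a date to compare against.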
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 17 Apr 2013 00:00:37 +0200

> But if a real feature/bug test is too hard, or hard to make reliable, version detection is what we have to do.

My plan is to shoot for a full functionality+bug test. If that's too hard, then when the assembler accepts the expressions I will fall back to a version check of some kind.
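The plan above can be sketched as a small decision function. Everything here is an assumption made for illustration: the function name, the yes/no/unknown encoding, and the idea that the three inputs have already been computed by separate tests.

```sh
#!/bin/sh
# Sketch only: names and the yes/no/unknown convention are hypothetical.

# decide_gotdata FUNCTIONAL ACCEPTS VERSION_OK
#   FUNCTIONAL  yes|no|unknown  result of a full functionality+bug test,
#                               "unknown" if no such test could be written
#   ACCEPTS     yes|no          assembler accepts the %gdop_* expressions
#   VERSION_OK  yes|no          verdict of a banner-based version check
# Prints yes or no.
decide_gotdata() {
  case $1 in
    yes|no)
      # A real functionality test, when available, settles the matter.
      echo "$1"
      return ;;
  esac
  # Fallback: require acceptance first, then trust the version check.
  if [ "$2" = yes ]; then
    echo "$3"
  else
    echo no
  fi
}
```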
Re: [PATCH] Improve and consolidate sparc PIC assembler.
Where to go from here?

If we want to clean up some old SPARC code, then we have learnt that we need to test the result on several key platforms. We also don't want to create slower code, unless the old code is clearly broken (in more than a hypothetical way).

For the 64-bit case, it is safe to assume that GMP's internal references are not 2GiB away from the code. GMP is not that bloated! :-) We therefore do not need to mess with 64-bit or even 44-bit offsets in PIC; doing that is just slower.

*External* references are a different story, and if we ever get the urge to refer to such symbols from assembly code, we need a slower/larger code sequence.

We may well put data in the text segment rather than rodata to allow for plainer code. (Incidentally, the same thing might be a bad idea on x86, where some processors refuse to keep a cache line in both I-cache and D-cache, and we might end up with a false sharing situation. That can happen as a result of speculative instruction prefetch, even if we align things to a cache line.)

64-bit static address generation is a pain. It adds a lot of overhead. I wonder if it is ever going to be used.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Mon, 15 Apr 2013 17:13:34 +0200

> Where to go from here?

Please run make -k in that tarball I posted for you last night, it's very important.

None of what's happening makes any sense, and we can't make wise decisions about how to proceed until we know exactly what the Solaris assembler and linker are doing with these symbols and expressions.

I bet in your libgmp.so on these machines, that .rodata object is in the TLS section, even with all my changes reverted. Wouldn't you like that fixed and understand why it happens?

So please get the information I need from that tarball, thanks.
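One way to check a guess like this is to look at the symbol's type in the object's symbol table. The sketch below assumes GNU readelf is available (on Solaris the native tool is elfdump, whose output format differs), and the function name is hypothetical. It reads `readelf -s` output on stdin and prints the Type column for one symbol; a thread-local symbol shows up there as TLS, an ordinary data table as OBJECT.

```sh
#!/bin/sh
# Sketch only: assumes the eight-column layout printed by GNU
# "readelf -s" (Num: Value Size Type Bind Vis Ndx Name).

# symbol_type NAME
# Reads readelf -s output on stdin and prints the Type field of NAME.
symbol_type() {
  awk -v name="$1" '$8 == name { print $4; exit }'
}
```

With that in place, something like `readelf -s mpn/.libs/gcd_1.o | symbol_type ctz_table` would show how the assembler marked the table.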
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: ni...@lysator.liu.se (Niels Möller)
Date: Mon, 15 Apr 2013 18:57:53 +0200

> Torbjorn Granlund t...@gmplib.org writes:
>
> > We may well put data in the text segment rather than rodata to allow for plainer code.
>
> At least in theory, there should be little difference. PC-relative offsets should be link-time constants anyway, right?

Yes, they would.
Re: [PATCH] Improve and consolidate sparc PIC assembler.
swift gmake -k
gcc -m64 -fPIC -c -o test1_shared.o test1.S
/usr/ccs/bin/as: /var/tmp//ccqorjdc.s: , approx line 18: internal error: pic_relocs(): hh reltype?
gmake: *** [test1_shared.o] Error 1
gcc -m64 -c -o test1_static.o test1.S
gcc -m64 -fPIC -c -o test2_shared.o test2.S
/usr/ccs/bin/as: /var/tmp//ccTj0fRw.s: , approx line 24: internal error: pic_relocs(): hh reltype?
gmake: *** [test2_shared.o] Error 1
gcc -m64 -c -o test2_static.o test2.S
gcc -m64 -fPIC -c -o test3_shared.o test3.S
/usr/ccs/bin/as: /var/tmp//ccz9VzNB.s: , approx line 20: internal error: pic_relocs(): hh reltype?
gmake: *** [test3_shared.o] Error 1
gcc -m64 -c -o test3_static.o test3.S
gmake: Target `all' not remade because of errors.

sol2_test.tar.bz2
Description: Binary data

The experiment seems to have failed. :-(

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
David Miller da...@davemloft.net writes:

> BTW, you traded one failure for another, now PIC is broke for ultrasparct3 builds, because now in invert_limb.asm we're back to:
>
> diff -r bd92f35223f8 mpn/sparc64/ultrasparct3/invert_limb.asm
> --- a/mpn/sparc64/ultrasparct3/invert_limb.asm Sun Apr 14 23:24:54 2013 +0200
> +++ b/mpn/sparc64/ultrasparct3/invert_limb.asm Mon Apr 15 10:24:55 2013 -0700
> @@ -31,13 +31,11 @@
>  ASM_START()
>          REGISTER(%g2,#scratch)
>          REGISTER(%g3,#scratch)
> -        LEA_THUNK(g3)
> -        TEXT
>  PROLOGUE(mpn_invert_limb)
>          srlx    d, 55, %g1
>          add     %g1, %g1, %g1
> -        LEA_LEAF(approx_tab,g2,g3)
> -        sub     %g2, 512, %g2
> +        sethi   %hi(approx_tab-512), %g2
> +        or      %g2, %lo(approx_tab-512), %g2
>          lduh    [%g2+%g1], %g3
>          srlx    d, 24, %g4
>          add     %g4, 1, %g4
>
> which will only work on 64-bit static builds.

I know. It was easier to go back to the previous state for all assembly files first.

The code is now correct in the repo, I think. I might have introduced other bugs, but I ran what I hope were adequate tests on both a Solaris and a GNU/Linux system.

I might have reverted some TYPE statements. These should be put back.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 19:21:36 +0200

> I tried some timing of call to a pc loading thunk versus an rdpc instruction. Approximate cycle counts:
>
>         rdpc  thunk
>   US2      5      2
>   US3      6      6
>   T1       6     10
>
> I assume US1=US2, US3=US4, and T1=T2. US1, US2 are the least relevant machines, and the only ones where I could see a slowdown for rdpc. T1 is also getting irrelevant, more so than US3,US4 I think.

Ok, good to know.

> T3 and T4 are of course quite relevant, so we should take these into account. If they run rdpc no slower than the thunk call, then we should use rdpc unconditionally.
>
> I used this test program:

I'll take a look at this.

> At http://docs.oracle.com/cd/E26502_01/html/E28387/gentextid-2583.html Oracle assumes one uses rdpc. They also seem to say that the gdop stuff is for the 64-bit ABI, and now we use it in sparc32.

They are using %pc reads for simplicity, not because it's the most performant thing to do. The SunPRO compiler uses PIC thunks.

It is also not true that gotdata relocs are for 64-bit only, GCC as well as SunPRO generate them for both 32-bit and 64-bit PIC code and have done so for years.
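As a quick way to read the cycle counts quoted above, the sketch below picks the cheaper method per CPU. The helper name is hypothetical; the numbers are the approximate counts from the quoted table.

```sh
#!/bin/sh
# Sketch only: cheaper_pic_method is an illustrative helper name.

# cheaper_pic_method RDPC_CYCLES THUNK_CYCLES
# Prints rdpc, thunk, or tie.
cheaper_pic_method() {
  if [ "$1" -lt "$2" ]; then
    echo rdpc
  elif [ "$1" -gt "$2" ]; then
    echo thunk
  else
    echo tie
  fi
}

# Columns: CPU, rdpc cycles, thunk cycles (from the quoted measurements).
while read -r cpu rdpc thunk; do
  printf '%s %s\n' "$cpu" "$(cheaper_pic_method "$rdpc" "$thunk")"
done <<EOF
US2 5 2
US3 6 6
T1 6 10
EOF
```

This prints thunk for US2, tie for US3 and rdpc for T1, which matches the remark that US1/US2 are the only machines where rdpc was seen to be slower.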
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 19:21:36 +0200

> T3 and T4 are of course quite relevant, so we should take these into account. If they run rdpc no slower than the thunk call, then we should use rdpc unconditionally.
>
> I used this test program:

Ok, on T4, %pc reads are definitely faster:

  call: 16sec
  rdpc:  3sec

On T3:

  call: 34sec
  rdpc: 41sec

I bet on T3 a rdpc makes the cpu strand unavailable the next cycle.

In all the tests above I changed the %g1 initialization to be that of the cpu in question's clock rate.

Since using rdpc avoids the whole issue of corrupting the return address stack, it seems pretty desirable to move over to it.
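To put the timings above in relative terms, this small sketch expresses the rdpc run time as an integer percentage of the call (thunk) run time. The function name is made up for illustration and the division truncates.

```sh
#!/bin/sh
# Sketch only: rdpc_percent is an illustrative helper name.

# rdpc_percent CALL_SECONDS RDPC_SECONDS
# Prints the rdpc time as a truncated integer percentage of the call time.
rdpc_percent() {
  echo $(( $2 * 100 / $1 ))
}

rdpc_percent 16 3    # T4
rdpc_percent 34 41   # T3
```

For the figures quoted this gives 18 for T4 and 120 for T3: rdpc takes under a fifth of the thunk time on T4 and about a fifth longer on T3.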
Re: [PATCH] Improve and consolidate sparc PIC assembler.
David Miller da...@davemloft.net writes:

> Since using rdpc avoids the whole issue of corrupting the return address stack, it seems pretty desirable to move over to it.

Let's do it. We'll see a slight slowdown for T3, but probably its general slowness will make this new slowdown almost unnoticeable.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 23:26:31 +0200

> David Miller da...@davemloft.net writes:
>
> > Sure, let's revert v9/sqr_diagonal.asm and sparc64/gcd_1.asm back to their previous state for now, and try to work from that. Here's a patch.
> >
> > 2013-04-14 David S. Miller da...@davemloft.net
> >
> > * mpn/sparc32/v9/sqr_diagonal.asm: Revert LEA and INT32 changes.
> > * mpn/sparc64/gcd_1.asm: Likewise.
>
> Applied, after making sure this is necessary and sufficient for getting us back to working Solaris 10 support.

Thanks.

I'd like to investigate what went on here in more detail, and I think I can do it if you build the test images in the attached tarball for me.

This will unpack into a directory named sol2_test, just 'cd' into there and run make on the Solaris machine that showed all of these problems.

After the target objects are all made please tar up the result and send it to me.

Thanks a lot!

sol2_test.tar.gz
Description: Binary data
Re: [PATCH] Improve and consolidate sparc PIC assembler.
Torbjorn Granlund t...@gmplib.org writes:

> Torbjorn Granlund t...@gmplib.org writes:
>
> > ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
> > ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
>
> There are also new check failures for a 32-bit sparc-solaris build:
> http://gmplib.org/devel/testmachines/check/failure/swift.nada.kth.se:32.txt

This is caused by the changes to sparc32/v9/sqr_diagonal.asm.

The old code used RDPC for PIC code, using the sequence,

  .Lpc:   rd      %pc,%o7
          ld      [%o7+.Lnoll-.Lpc],%f8

while the new code uses the longer sequence,

          sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
          call    __sparc_get_pc_thunk.l7
          or      %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
          sethi   %gdop_hix22(.Lnoll), %l0
          xor     %l0, %gdop_lox10(.Lnoll), %l0
          ld      [%l7 + %l0], %l0, %gdop(.Lnoll)
          ld      [%l0], %f8

where the call is to a local function:

  __sparc_get_pc_thunk.l7:
          retl
          add     %o7, %l7, %l7

Aside from the fact that the new sequence fails (for reasons unknown to me), it is not clear why it would be an improvement, had it worked. Or in general, why should we not always use RDPC for PIC?

I spotted a comment in gcc,

  ;; Even on V9 we use this call sequence with a stub, instead of rd %pc, ...
  ;; because the RDPC instruction is extremely expensive and incurs a complete
  ;; instruction pipeline flush.

which perhaps answers my question. But is that true in general or only for some sparcv9 implementations? It would be nice to avoid these long insn sequences where they can be avoided.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 15:40:38 +0200

> I spotted a comment in gcc,
>
>   ;; Even on V9 we use this call sequence with a stub, instead of rd %pc, ...
>   ;; because the RDPC instruction is extremely expensive and incurs a complete
>   ;; instruction pipeline flush.
>
> which perhaps answers my question. But is that true in general or for some sparcv9 implementations? It would be nice to avoid these long insns sequences where they can be avoided.

rd %pc is very expensive on every single chip I've tried it on.

It tends to flush the entire pipeline, which for example means a minimum of 9 cycles on Ultra12.
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 12:10:33 +0200

> David Miller da...@davemloft.net writes:
>
> > * mpn/sparc32/sparc-defs.m4 (LEA): Remove unused local label.
> > (LEA_LEAF): Likewise.
>
> This patch helped to get past the Slowlaris assembler, which can only cope with single-digit labels.
>
> But there are now errors when creating the shared library:
>
> /bin/bash ./libtool --tag=CC --mode=link gcc -std=gnu99 -O2 -pedantic \
>   -m64 -mptr64 -mcpu=ultrasparc3 -Wc,-m64 -version-info 11:1:1 -o \
>   libgmp.la -rpath /usr/local/lib assert.lo compat.lo ...
> gcc -std=gnu99 -shared -fPIC -DPIC -Wl,-z -Wl,text -Wl,-h \
>   -Wl,libgmp.so.10 -o .libs/libgmp.so.10.1.1 .libs/assert.o \
>   .libs/compat.o ... rand/.libs/randmui.o -lc -O2 -m64 -mptr64 \
>   -mcpu=ultrasparc3 -m64
> ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
> ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
>
> TLS? Thread local storage? Sun's tools give the worst diagnostics in the world.

Yes, that's what it means by TLS. And no I have no idea why it's complaining like this :-/

> Maybe because ctz_zero is in .rodata?

That shouldn't matter at all, gcc emits things like that all the time.

Is there a ctz_table in libc.so by chance? If so, then changing the name of the table should be sufficient to fix the problem.
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 21:10:35 +0200

> > > ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
> > > ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
> > >
> > > TLS? Thread local storage? Sun's tools give the worst diagnostics in the world.
> >
> > Yes, that's what it means by TLS.
>
> Which seems nonsensical.

I think I found the problem, from the GCC install notes:

  sparc-sun-solaris2.10

  There is a bug in older versions of the Sun assembler which breaks thread-local storage (TLS). A typical error message is

    ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//ccamPA1v.o: symbol unknown: bad symbol type SECT: symbol type must be TLS

  This bug is fixed in Sun patch 118683-03 or later.

From that patch:

  6728528 assembler does not handle __thread code correctly

We're probably hitting that bug.
Re: [PATCH] Improve and consolidate sparc PIC assembler.
David Miller da...@davemloft.net writes:

> > ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
> > ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
>
>   sparc-sun-solaris2.10
>
>   There is a bug in older versions of the Sun assembler which breaks thread-local storage (TLS). A typical error message is
>
>     ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//ccamPA1v.o: symbol unknown: bad symbol type SECT: symbol type must be TLS
>
>   This bug is fixed in Sun patch 118683-03 or later.
>
> We're probably hitting that bug.

Really? What does our case have to do with TLS? The example error message uses a TLS reloc, we don't.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 21:52:40 +0200

> Really? What does our case have to do with TLS? The example error message uses a TLS reloc, we don't.

Implicit section at the beginning of assembly?

Here, try these two things:

1) Build:

     static const char foo[] = { 1, 2, 3, 4, 5, 6 };
     const char *test(void) { return &foo[0]; }

   with gcc -m64 -O2 -fPIC -S -o test.s test.c, let me know what gcc emits.

2) Put ctz_table at the end of gcd_1.asm and see if that makes a difference.

We'll need to do these kinds of experiments anyways, because once we determine that it's a Solaris AS bug we'll need to know precisely how to work around it or add an acinclude.m4 test for the problem.

Thanks.
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: David Miller da...@davemloft.net
Date: Sat, 13 Apr 2013 15:58:44 -0400 (EDT)

> From: Torbjorn Granlund t...@gmplib.org
> Date: Sat, 13 Apr 2013 21:52:40 +0200
>
> > Really? What does our case have to do with TLS? The example error message uses a TLS reloc, we don't.
>
> Implicit section at the beginning of assembly?

BTW, I say this because the Solaris assembler has various section switching bugs, f.e. the one they hit in libgomp++:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29987
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: David Miller da...@davemloft.net
Date: Sat, 13 Apr 2013 15:59:49 -0400 (EDT)

> From: David Miller da...@davemloft.net
> Date: Sat, 13 Apr 2013 15:58:44 -0400 (EDT)
>
> > From: Torbjorn Granlund t...@gmplib.org
> > Date: Sat, 13 Apr 2013 21:52:40 +0200
> >
> > > Really? What does our case have to do with TLS? The example error message uses a TLS reloc, we don't.
> >
> > Implicit section at the beginning of assembly?
>
> BTW, I say this because the Solaris assembler has various section switching bugs, f.e. the one they hit in libgomp++:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29987

Finally, if you could grab the PIC gcd_1.o from one of those Solaris10 builds that I would find most useful.

Thanks again!
Re: [PATCH] Improve and consolidate sparc PIC assembler.
There are syntax errors for swift.nada.kth.se, a Solaris system. See http://gmplib.org/devel/tm-date.html.

The offending lines:

swift (ABI=64)
  99: sethi %gdop_hix22(ctz_table), %i5

swift-32 (ABI=32)
  99: sethi %gdop_hix22(.Lnoll), %l0

We need things to work on Solaris, *BSD.

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
I've done some more research into this.

I first made sure that we are using the same test that GCC uses to enable the use of gotdata relocations.

Then I read over the new m4 LEA macros a few times, the only thing I found was that I left around a local label that was only necessary for an earlier revision of my changes, patch below to delete it.

Next, I tried to reproduce the asm -- s file made for gcd_1.asm to try and double check the assembler output, I did this by configuring for ultrasparc3-linux and forcing HAVE_SHARED_THUNKS to no in the created config.m4.

The line numbers match up with your report and the assembler line looks fine as far as I can tell. Also the lines surrounding look ok too, just in case the line number reported by the assembler is not correct for some reason.

The last remaining possible difference I can come up with is that the build will pass -K PIC to the assembler (because of -fPIC in the gcc command line) but for the relocation test in acinclude.m4 we don't pass that option.

Could you try, on swift.nada.kth.se, a test file:

        .text
        sethi   %gdop_hix22(ctz_table), %i5
        xor     %i5, %gdop_lox10(ctz_table), %i5
        ldx     [%l7 + %i5], %i5, %gdop(ctz_table)

and then try to build it with:

  gcc -O2 -m64 -c -o test.o test.s

and then:

  gcc -O2 -m64 -fPIC -c -o test.o test.s

Finally, try to fetch the gcc command line used by the gotdata test in config.log.

Maybe we can include the config.log output in the build farm links just like config.h currently is? That would help diagnose things like this.

Thanks!

2013-04-11 David S. Miller da...@davemloft.net

        * mpn/sparc32/sparc-defs.m4 (LEA): Remove unused local label.
        (LEA_LEAF): Likewise.

diff -r ace68333a9dc mpn/sparc32/sparc-defs.m4
--- a/mpn/sparc32/sparc-defs.m4 Wed Apr 10 22:42:33 2013 +0200
+++ b/mpn/sparc32/sparc-defs.m4 Thu Apr 11 12:39:33 2013 -0700
@@ -50,7 +50,7 @@
         sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %`$3'
         call    __sparc_get_pc_thunk.`$3'
         or      %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:     sethi   %gdop_hix22(`$1'), %`$2'
+        sethi   %gdop_hix22(`$1'), %`$2'
         xor     %`$2', %gdop_lox10(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
         ldx     [%`$3' + %`$2'], %`$2', %gdop(`$1')',`
@@ -58,7 +58,7 @@
         sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %`$3'
         call    __sparc_get_pc_thunk.`$3'
         or      %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:     sethi   %hi(`$1'), %`$2'
+        sethi   %hi(`$1'), %`$2'
         or      %`$2', %lo(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
         ldx     [%`$3' + %`$2'], %`$2'',`
@@ -82,7 +82,7 @@
         mov     %o7, %`$2'
         call    __sparc_get_pc_thunk.`$3'
         or      %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:     mov     %`$2', %o7
+        mov     %`$2', %o7
         sethi   %gdop_hix22(`$1'), %`$2'
         xor     %`$2', %gdop_lox10(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
@@ -92,7 +92,7 @@
         mov     %o7, %`$2'
         call    __sparc_get_pc_thunk.`$3'
         or      %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:     mov     %`$2', %o7
+        mov     %`$2', %o7
         sethi   %hi(`$1'), %`$2'
         or      %`$2', %lo(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
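The experiment requested in this message (the same three-instruction file assembled once without and once with -fPIC) could be scripted roughly as follows. The function name, the CC variable and the temporary directory layout are assumptions for illustration; only the test file contents and the gcc flags come from the message.

```sh
#!/bin/sh
# Sketch only: CC and try_assemble are illustrative names.  Set CC to the
# compiler driver under test before calling try_assemble.
: "${CC:=gcc}"

# try_assemble FLAGS...
# Writes the gotdata test file, runs "$CC FLAGS -c" on it, and prints
# "accepted" or "rejected" depending on the exit status.
try_assemble() {
  dir=${TMPDIR:-/tmp}/gotdata.$$
  mkdir -p "$dir" || return 2
  cat > "$dir/test.s" <<'EOF'
	.text
	sethi	%gdop_hix22(ctz_table), %i5
	xor	%i5, %gdop_lox10(ctz_table), %i5
	ldx	[%l7 + %i5], %i5, %gdop(ctz_table)
EOF
  if $CC "$@" -c -o "$dir/test.o" "$dir/test.s" >/dev/null 2>&1; then
    result=accepted
  else
    result=rejected
  fi
  rm -rf "$dir"
  echo "$result"
}
```

Running `try_assemble -O2 -m64` and then `try_assemble -O2 -m64 -fPIC` on the affected machine would show whether the outcome depends on the PIC option reaching the assembler. The sketch discards the assembler's messages; for actual debugging they would be worth keeping.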
[PATCH] Improve and consolidate sparc PIC assembler.
This patch aims to:

1) Consolidate all of the address loading details of PIC vs. non-PIC into one place, via helper macros.

2) Add support for GOTDATA relocations when the tools support it.

3) When supported by the tools, use comdat et al. in order to have shared PIC thunks. All PIC thunks operating on the same PIC register get emitted with the same name, and the linker only retains one copy in the final image. The PIC thunk names are chosen to match the ones emitted by gcc, thus we'll have only one %l7 thunk for the entire libgmp shared library image.

As a side effect it also fixes ultrasparct3/invert_limb.asm on PIC.

For the non-PIC cases we use set for 32-bit and setx for 64-bit. We need a pic_reg for the PIC cases, so it was easy to accommodate the need for the temporary that setx requires.

A configure .bootstrap will need to be run after installing these changes.

I tested all of:

  sparc-unknown-linux (ABI=32)
  sparcv8-unknown-linux (ABI=32)
  sparcv9-unknown-linux (ABI=32)
  ultrasparct1-unknown-linux (ABI=32)
  sparc64-unknown-linux (ABI=64)
  ultrasparc1-unknown-linux (ABI=64)
  ultrasparct1-unknown-linux (ABI=64)
  ultrasparct3-unknown-linux (ABI=64)

Each with normal builds and one done with --disable-shared --enable-static (in order to force the testsuite to link against the static versions of all of these routines).

They all passed the testsuite.

A side note that the plain 32-bit sparc-unknown-linux (pre-v8) configuration takes hours to run the testsuite even on a modern cpu.

2013-04-10 David S. Miller da...@davemloft.net

        * acinclude.m4 (GMP_ASM_SPARC_GOTDATA, GMP_ASM_SPARC_SHARED_THUNKS): New feature tests.
        * configure.ac: Call GMP_ASM_SPARC_GOTDATA and GMP_ASM_SPARC_SHARED_THUNKS on sparc.
        * mpn/sparc32/sparc-defs.m4 (LOAD_SYMBOL, LOAD_SYMBOL_LEAF, LOAD_SYMBOL_THUNK): New macros.
        * mpn/sparc32/udiv.asm: Convert over to LOAD_SYMBOL, LOAD_SYMBOL_LEAF, and LOAD_SYMBOL_THUNK.
        * mpn/sparc32/v8/addmul_1.asm: Likewise.
        * mpn/sparc32/v8/mul_1.asm: Likewise.
        * mpn/sparc32/v8/supersparc/udiv.asm: Likewise.
        * mpn/sparc32/v8/udiv.asm: Likewise.
        * mpn/sparc64/gcd_1.asm: Likewise.
        * mpn/sparc64/ultrasparct3/dive_1.asm: Likewise.
        * mpn/sparc64/ultrasparct3/invert_limb.asm: Likewise.
        * mpn/sparc64/ultrasparct3/mode1o.asm: Likewise.
        * mpn/sparc32/v9/sqr_diagonal.asm: Likewise and use INT32.

diff -r a51d8e63e08e acinclude.m4
--- a/acinclude.m4 Tue Apr 09 15:05:39 2013 +0200
+++ b/acinclude.m4 Tue Apr 09 21:10:30 2013 -0700
@@ -3090,6 +3090,57 @@
 ])
+dnl GMP_ASM_SPARC_GOTDATA
+dnl --
+dnl Determine whether the assembler accepts gotdata relocations.
+dnl
+dnl See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_GOTDATA],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler accepts gotdata relocations],
+               gmp_cv_asm_sparc_gotdata,
+[GMP_TRY_ASSEMBLE(
+[       $gmp_cv_asm_text
+        .text
+        sethi   %gdop_hix22(symbol), %g1
+        or      %g1, %gdop_lox10(symbol), %g1
+],
+[gmp_cv_asm_sparc_gotdata=yes],
+[gmp_cv_asm_sparc_gotdata=no])])
+
+GMP_DEFINE_RAW([define(HAVE_GOTDATA,$gmp_cv_asm_sparc_gotdata)])
+])
+
+
+dnl GMP_ASM_SPARC_SHARED_THUNKS
+dnl --
+dnl Determine whether the assembler supports all of the features
+dnl necessary in order to emit shared PIC thunks on sparc.
+dnl
+dnl See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_SHARED_THUNKS],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler can support shared PIC thunks],
+               gmp_cv_asm_sparc_shared_thunks,
+[GMP_TRY_ASSEMBLE(
+[       $gmp_cv_asm_text
+        .section .text.__sparc_get_pc_thunk.l7,axG,@progbits,__sparc_get_pc_thunk.l7,comdat
+        .weak   __sparc_get_pc_thunk.l7
+        .hidden __sparc_get_pc_thunk.l7
+        .type   __sparc_get_pc_thunk.l7, #function
+__sparc_get_pc_thunk.l7:
+        jmp     %o7+8
+        add     %o7, %l7, %l7
+],
+[gmp_cv_asm_sparc_shared_thunks=yes],
+[gmp_cv_asm_sparc_shared_thunks=no])])
+
+GMP_DEFINE_RAW([define(HAVE_SHARED_THUNKS,$gmp_cv_asm_sparc_shared_thunks)])
+])
+
+
 dnl GMP_C_ATTRIBUTE_CONST
 dnl -
diff -r a51d8e63e08e configure.ac
--- a/configure.ac Tue Apr 09 15:05:39 2013 +0200
+++ b/configure.ac Tue Apr 09 21:10:30 2013 -0700
@@ -3483,12 +3483,14 @@
   power*-*-aix*)
     GMP_INCLUDE_MPN(powerpc32/aix.m4)
     ;;
-  sparcv9*-*-* | ultrasparc*-*-* | sparc64-*-*)
+  *sparc*-*-*)
     case $ABI in
       64)
         GMP_ASM_SPARC_REGISTER
         ;;
     esac
+    GMP_ASM_SPARC_GOTDATA
+    GMP_ASM_SPARC_SHARED_THUNKS
     ;;
   X86_PATTERN | X86_64_PATTERN)
     GMP_ASM_ALIGN_FILL_0x90
diff -r a51d8e63e08e mpn/sparc32/sparc-defs.m4
--- a/mpn/sparc32/sparc-defs.m4
Re: [PATCH] Improve and consolidate sparc PIC assembler.
Please use LEA* instead of LOAD_SYMBOL*, since that's what we use elsewhere. (OK, LEA might be a misnomer, but a well-established one in and outside of GMP.)

I assume your broad testing covers every modified file. Do you have an idea of whether that is true?

When testing shared libs, I have found that libtool sometimes prefers an installed version to the newly compiled version. That happens more often with 32-bit libs on 64-bit systems, since libtool doesn't set LD_32_LIBRARY_PATH. Please make sure the shared builds' libraries have actually been tested.

That patch looks good to me, apart from the LEA issue. Once you have addressed that, I would like to commit this to the main repo.

Thanks!

-- 
Torbjörn
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 10 Apr 2013 14:35:13 +0200

> Please use LEA* instead of LOAD_SYMBOL*, since that's what we use elsewhere. (OK, LEA might be a misnomer, but a well-established one in and outside of GMP.)

Ok.

> I assume your broad testing covers every modified file. Do you have an idea of whether that is true?

I rechecked everything and the one case I missed was supersparc-*

Even the current tree has a build problem of the supersparc target with current tools due to a combination of a bug in gcc specs handling and new binutils enforcements of setting the cpu ABI correctly.

The issue is that gcc doesn't specify at least v8 in the assembler invocations when -mcpu=supersparc is given so binutils complains when it sees integer multiply and divide instructions since it defaults to v7.

I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have this problem, and have had them for some time, so I think we should work around it.

A workaround that works is to pass -mcpu=v8 -mcpu=supersparc instead of just plain -mcpu=supersparc

For the sake of evaluating this LEA patch, I forced this CFLAGS by hand on the make command line to make sure my LEA patches didn't introduce any new problems.

> When testing shared libs, I have found that libtool sometimes prefers an installed version to the newly compiled version. That happens more often with 32-bit libs on 64-bit systems, since libtool doesn't set LD_32_LIBRARY_PATH. Please make sure the shared builds' libraries have actually been tested.

I've verified that this works as intended. LD_32_LIBRARY_PATH seems to be a FreeBSD invention.

> That patch looks good to me, apart from the LEA issue. Once you have addressed that, I would like to commit this to the main

Here is the new version, thanks:

2013-04-10 David S. Miller da...@davemloft.net

        * acinclude.m4 (GMP_ASM_SPARC_GOTDATA, GMP_ASM_SPARC_SHARED_THUNKS): New feature tests.
        * configure.ac: Call GMP_ASM_SPARC_GOTDATA and GMP_ASM_SPARC_SHARED_THUNKS on sparc.
        * mpn/sparc32/sparc-defs.m4 (LEA, LEA_LEAF, LEA_THUNK): New macros.
        * mpn/sparc32/udiv.asm: Convert over to LEA, LEA_LEAF, and LEA_THUNK.
        * mpn/sparc32/v8/addmul_1.asm: Likewise.
        * mpn/sparc32/v8/mul_1.asm: Likewise.
        * mpn/sparc32/v8/supersparc/udiv.asm: Likewise.
        * mpn/sparc32/v8/udiv.asm: Likewise.
        * mpn/sparc64/gcd_1.asm: Likewise.
        * mpn/sparc64/ultrasparct3/dive_1.asm: Likewise.
        * mpn/sparc64/ultrasparct3/invert_limb.asm: Likewise.
        * mpn/sparc64/ultrasparct3/mode1o.asm: Likewise.
        * mpn/sparc32/v9/sqr_diagonal.asm: Likewise and use INT32.

diff -r a51d8e63e08e acinclude.m4
--- a/acinclude.m4 Tue Apr 09 15:05:39 2013 +0200
+++ b/acinclude.m4 Wed Apr 10 10:01:13 2013 -0700
@@ -3090,6 +3090,57 @@
 ])
+dnl GMP_ASM_SPARC_GOTDATA
+dnl --
+dnl Determine whether the assembler accepts gotdata relocations.
+dnl
+dnl See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_GOTDATA],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler accepts gotdata relocations],
+               gmp_cv_asm_sparc_gotdata,
+[GMP_TRY_ASSEMBLE(
+[       $gmp_cv_asm_text
+        .text
+        sethi   %gdop_hix22(symbol), %g1
+        or      %g1, %gdop_lox10(symbol), %g1
+],
+[gmp_cv_asm_sparc_gotdata=yes],
+[gmp_cv_asm_sparc_gotdata=no])])
+
+GMP_DEFINE_RAW([define(HAVE_GOTDATA,$gmp_cv_asm_sparc_gotdata)])
+])
+
+
+dnl GMP_ASM_SPARC_SHARED_THUNKS
+dnl --
+dnl Determine whether the assembler supports all of the features
+dnl necessary in order to emit shared PIC thunks on sparc.
+dnl
+dnl See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_SHARED_THUNKS],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler can support shared PIC thunks],
+               gmp_cv_asm_sparc_shared_thunks,
+[GMP_TRY_ASSEMBLE(
+[       $gmp_cv_asm_text
+        .section .text.__sparc_get_pc_thunk.l7,axG,@progbits,__sparc_get_pc_thunk.l7,comdat
+        .weak   __sparc_get_pc_thunk.l7
+        .hidden __sparc_get_pc_thunk.l7
+        .type   __sparc_get_pc_thunk.l7, #function
+__sparc_get_pc_thunk.l7:
+        jmp     %o7+8
+        add     %o7, %l7, %l7
+],
+[gmp_cv_asm_sparc_shared_thunks=yes],
+[gmp_cv_asm_sparc_shared_thunks=no])])
+
+GMP_DEFINE_RAW([define(HAVE_SHARED_THUNKS,$gmp_cv_asm_sparc_shared_thunks)])
+])
+
+
 dnl GMP_C_ATTRIBUTE_CONST
 dnl -
diff -r a51d8e63e08e configure.ac
--- a/configure.ac Tue Apr 09 15:05:39 2013 +0200
+++ b/configure.ac Wed Apr 10 10:01:13 2013 -0700
@@ -3483,12 +3483,14 @@
   power*-*-aix*)
     GMP_INCLUDE_MPN(powerpc32/aix.m4)
     ;;
-  sparcv9*-*-* | ultrasparc*-*-* | sparc64-*-*)
+  *sparc*-*-*)
     case $ABI in
       64)
         GMP_ASM_SPARC_REGISTER
Re: [PATCH] Improve and consolidate sparc PIC assembler.
> > I assume your broad testing covers every modified file. Do you have an idea of whether that is true?
>
> I rechecked everything and the one case I missed was supersparc-*
>
> Even the current tree has a build problem of the supersparc target with current tools due to a combination of a bug in gcc specs handling and new binutils enforcements of setting the cpu ABI correctly.
>
> The issue is that gcc doesn't specify at least v8 in the assembler invocations when -mcpu=supersparc is given so binutils complains when it sees integer multiply and divide instructions since it defaults to v7.

Good that you found that!

> I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have this problem, and have had them for some time, so I think we should work around it.
>
> A workaround that works is to pass -mcpu=v8 -mcpu=supersparc instead of just plain -mcpu=supersparc

I thought -mcpu=foo -mcpu=bar would either be equivalent to just -mcpu=bar or just -mcpu=foo...

> > When testing shared libs, I have found that libtool sometimes prefers an installed version to the newly compiled version. That happens more often with 32-bit libs on 64-bit systems, since libtool doesn't set LD_32_LIBRARY_PATH. Please make sure the shared builds' libraries have actually been tested.
>
> I've verified that this works as intended. LD_32_LIBRARY_PATH seems to be a FreeBSD invention.

On Slowaris it is LD_LIBRARY_PATH_32... (But the semantics of these paths might not be the same.)

> > That patch looks good to me, apart from the LEA issue. Once you have addressed that, I would like to commit this to the main
>
> Here is the new version, thanks:

Thanks, will commit shortly after a quick read-through.

-- 
Torbjörn
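One way to confirm that a test binary really picked up the freshly built shared library, rather than an installed one, is to inspect what the dynamic linker resolves. The sketch below parses ldd output; the function names are hypothetical, and the "name => path" layout is assumed to match what ldd prints on the system in question.

```sh
#!/bin/sh
# Sketch only: helper names are illustrative, and the "=>" layout of the
# ldd output is an assumption about the platform.

# gmp_path_from_ldd
# Reads ldd output on stdin and prints the path that libgmp resolves to.
gmp_path_from_ldd() {
  awk '$1 ~ /^libgmp\.so/ && $2 == "=>" { print $3; exit }'
}

# uses_build_tree LDD_OUTPUT BUILD_DIR
# Succeeds when the resolved libgmp lives somewhere under BUILD_DIR.
uses_build_tree() {
  path=$(printf '%s\n' "$1" | gmp_path_from_ldd)
  case $path in
    "$2"/*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

A caller might run `uses_build_tree "$(ldd tests/some-test-binary)" "$PWD"`, where the binary path is only a placeholder. Note that libtool usually wraps uninstalled programs in a script, so the real executable may sit in a .libs directory.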
Re: [PATCH] Improve and consolidate sparc PIC assembler.
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 10 Apr 2013 20:07:52 +0200

> > I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have this problem, and have had them for some time, so I think we should work around it.
> >
> > A workaround that works is to pass -mcpu=v8 -mcpu=supersparc instead of just plain -mcpu=supersparc
>
> I thought -mcpu=foo -mcpu=bar would either be equivalent to just -mcpu=bar or just -mcpu=foo...

As per what the compiler decides to enable internally in the backend, that expression evaluates to the last -mcpu= specifier.

But as far as specs are concerned, it evaluates differently, and differently enough for the assembler option logic for -mcpu=v8 to kick in, in this case.
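The workaround described in this exchange amounts to putting -mcpu=v8 in front of -mcpu=supersparc in the flags. A minimal sketch of such a rewrite, with a hypothetical function name, could look like this; it assumes the flags contain no embedded whitespace or quoting.

```sh
#!/bin/sh
# Sketch only: add_v8_before_supersparc is an illustrative helper name.

# add_v8_before_supersparc "FLAGS"
# Prints FLAGS with -mcpu=v8 inserted before each -mcpu=supersparc that is
# not already preceded by it.
add_v8_before_supersparc() {
  out=
  prev=
  for flag in $1; do
    if [ "$flag" = -mcpu=supersparc ] && [ "$prev" != -mcpu=v8 ]; then
      out="$out -mcpu=v8"
    fi
    out="$out $flag"
    prev=$flag
  done
  printf '%s\n' "${out# }"
}
```

For example, `add_v8_before_supersparc "-O2 -mcpu=supersparc"` prints `-O2 -mcpu=v8 -mcpu=supersparc`: the backend still honours the last -mcpu=, while the earlier -mcpu=v8 is what makes the specs pass the right option to the assembler.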