Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-16 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Tue, 16 Apr 2013 14:43:58 +0200

 If we cannot make a configure test, we need to know if there is a
 release where the assembler can be trusted.

After some discussions with my Oracle contact, I think a configure
test will actually be easy: the assembler on Solaris 10 emits
well-formed version information.

For example, "as -V" gives:

as: SunOS 5.10 118683-09 Patch 01/23/2013

So we can use that to detect if the proper fixes are installed.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-16 Thread Torbjorn Granlund
David Miller da...@davemloft.net writes:

  From: Torbjorn Granlund t...@gmplib.org
  Date: Tue, 16 Apr 2013 14:43:58 +0200
  
   If we cannot make a configure test, we need to know if there is a
   release where the assembler can be trusted.
  
  After some discussions with my Oracle contact, I think a configure
  test will actually be easy: the assembler on Solaris 10 emits
  well-formed version information.
  
  For example, "as -V" gives:
  
  as: SunOS 5.10 118683-09 Patch 01/23/2013
  
  So we can use that to detect if the proper fixes are installed.
  
I have a slight preference for checking functionality over checking
against a database of version numbers.  My experience is that
version number formatting changes back and forth, so relying on it
to detect all faulty versions is fragile.

The old as uses one format:

as: Sun Compiler Common 10 Patch 05/06/2005

And the new one another:

as: SunOS 5.10 118683-10 Patch 03/14/2013

And how about SunOS 9 with a patched assembler...?

The horror example is Mac OS X.  Their compiler tools are buggier than
all other tools put together, and the version numbers and apparent date
stamps seem absolutely non-linear.  I had to give up supporting most
Xcode releases and just tell people to try another Xcode release when
they run into bugs with GMP.

But if a real feature/bug test is too hard, or hard to make reliable,
version detection is what we have to do.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-16 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 17 Apr 2013 00:00:37 +0200

 But if a real feature/bug test is too hard, or hard to make reliable,
 version detection is what we have to do.

My plan is to shoot for a full functionality+bug test and, if
that's too hard, to fall back to a version check of some kind
whenever the assembler accepts the expressions.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread Torbjorn Granlund
Where to go from here?  If we want to clean up some old SPARC code,
then we have learnt that we need to test the result on several key platforms.
We also don't want to create slower code, unless the old code is clearly
broken (in more than a hypothetical way).

For the 64-bit case, it is safe to assume that GMP's internal references
are not more than 2 GiB away from the code.  GMP is not that bloated!  :-)

We therefore do not need to mess with 64-bit or even 44-bit offsets in
PIC; doing that is just slower.  *External* references are a different
story, and if we ever get the urge to refer to such symbols from assembly
code, we need a slower/larger code sequence.

We may well put data in the text segment rather than rodata to allow for
plainer code.  (Incidentally, the same might be a bad idea on x86,
where some processors refuse to keep a cache line in both I-cache and
D-cache, and we might end up with a false sharing situation.  That can
happen as a result of speculative instruction prefetch, even if we align
things to a cache line.)

64-bit static address generation is a pain.  It adds a lot of overhead.
I wonder if it is ever going to be used.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Mon, 15 Apr 2013 17:13:34 +0200

 Where to go from here?

Please run "make -k" in the tarball I posted for you last night; it's
very important.

None of what's happening makes any sense, and we can't make wise decisions
about how to proceed until we know exactly what the Solaris assembler and
linker are doing with these symbols and expressions.

I bet that in your libgmp.so on these machines, that .rodata object is in
the TLS section, even with all my changes reverted.  Wouldn't you like to
have that fixed and to understand why it happens?

So please get the information I need from that tarball, thanks.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread David Miller
From: ni...@lysator.liu.se (Niels Möller)
Date: Mon, 15 Apr 2013 18:57:53 +0200

 Torbjorn Granlund t...@gmplib.org writes:
 
 We may well put data in the text segment rather than rodata to allow for
 plainer code.
 
 At least in theory, there should be little difference. PC-relative
 offsets should be linktime constants anyway, right?

Yes, they would.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread Torbjorn Granlund
swift gmake -k
gcc -m64 -fPIC -c -o test1_shared.o test1.S
/usr/ccs/bin/as: /var/tmp//ccqorjdc.s: , approx line 18: internal error: 
pic_relocs(): hh reltype? 
gmake: *** [test1_shared.o] Error 1
gcc -m64 -c -o test1_static.o test1.S
gcc -m64 -fPIC -c -o test2_shared.o test2.S
/usr/ccs/bin/as: /var/tmp//ccTj0fRw.s: , approx line 24: internal error: 
pic_relocs(): hh reltype? 
gmake: *** [test2_shared.o] Error 1
gcc -m64 -c -o test2_static.o test2.S
gcc -m64 -fPIC -c -o test3_shared.o test3.S
/usr/ccs/bin/as: /var/tmp//ccz9VzNB.s: , approx line 20: internal error: 
pic_relocs(): hh reltype? 
gmake: *** [test3_shared.o] Error 1
gcc -m64 -c -o test3_static.o test3.S
gmake: Target `all' not remade because of errors.



sol2_test.tar.bz2
Description: Binary data

The experiment seems to have failed.  :-(

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread Torbjorn Granlund
David Miller da...@davemloft.net writes:

  BTW, you traded one failure for another, now PIC is broke
  for ultrasparct3 builds, because now in invert_limb.asm we're
  back to:
  
  diff -r bd92f35223f8 mpn/sparc64/ultrasparct3/invert_limb.asm
  --- a/mpn/sparc64/ultrasparct3/invert_limb.asm  Sun Apr 14 23:24:54 2013 +0200
  +++ b/mpn/sparc64/ultrasparct3/invert_limb.asm  Mon Apr 15 10:24:55 2013 -0700
  @@ -31,13 +31,11 @@
   ASM_START()
      REGISTER(%g2,#scratch)
      REGISTER(%g3,#scratch)
  -   LEA_THUNK(g3)
  -   TEXT
   PROLOGUE(mpn_invert_limb)
      srlx    d, 55, %g1
      add     %g1, %g1, %g1
  -   LEA_LEAF(approx_tab,g2,g3)
  -   sub     %g2, 512, %g2
  +   sethi   %hi(approx_tab-512), %g2
  +   or      %g2, %lo(approx_tab-512), %g2
      lduh    [%g2+%g1], %g3
      srlx    d, 24, %g4
      add     %g4, 1, %g4
  
  which will only work on 64-bit static builds.
  
I know.  It was easier to go back to the previous state for all assembly
files first.

The code is now correct in the repo, I think.  I might have introduced
other bugs, but I ran what I hope were adequate tests on both a Solaris
and a GNU/Linux system.

I might have reverted some TYPE statements.  These should be put back.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-14 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 19:21:36 +0200

 I tried some timing of call to a pc loading thunk versus an rdpc
 instruction.  Approximate cycle counts:
 
            rdpc  thunk
 US2          5     2
 US3          6     6
 T1           6    10
 
 I assume US1=US2, US3=US4, and T1=T2.  US1, US2 are the least relevant
 machines, and the only ones where I could see a slowdown for rdpc.
 T1 is also getting irrelevant, more so than US3,US4 I think.

Ok, good to know.

 T3 and T4 are of course quite relevant, so we should take these into
 account.  If they run rdpc no slower than the thunk call, then we should
 use rdpc unconditionally.
 
 I used this test program:

I'll take a look at this.

 At http://docs.oracle.com/cd/E26502_01/html/E28387/gentextid-2583.html
 Oracle assumes one uses rdpc.  They also seem to say that the gdop stuff
 is for the 64-bit ABI, and now we use it in sparc32.

They are using %pc reads for simplicity, not because it's the most
performant thing to do.  The SunPRO compiler uses PIC thunks.

It is also not true that gotdata relocs are for 64-bit only; GCC as
well as SunPRO generate them for both 32-bit and 64-bit PIC code and
have done so for years.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-14 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 19:21:36 +0200

 T3 and T4 are of course quite relevant, so we should take these into
 account.  If they run rdpc no slower than the thunk call, then we should
 use rdpc unconditionally.
 
 I used this test program:

Ok, on T4, %pc reads are definitely faster:

call:   16sec
rdpc:   3sec

On T3:

call:   34sec
rdpc:   41sec

I bet that on T3 an rdpc makes the CPU strand unavailable for the next cycle.

In all the tests above I changed the %g1 initialization to the clock
rate of the CPU in question.

Since using rdpc avoids the whole issue of corrupting the return
address stack, it seems pretty desirable to move over to it.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-14 Thread Torbjorn Granlund
David Miller da...@davemloft.net writes:

  Since using rdpc avoids the whole issue of corrupting the return
  address stack, it seems pretty desirable to move over to it.
  
Let's do it.

We'll see a slight slowdown for T3, but its general slowness will
probably make this new slowdown almost unnoticeable.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-14 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sun, 14 Apr 2013 23:26:31 +0200

 David Miller da...@davemloft.net writes:
 
   Sure, let's revert v9/sqr_diagonal.asm and sparc64/gcd_1.asm back to
   their previous state for now, and try to work from that.  Here's a
   patch.
   
   2013-04-14  David S. Miller  da...@davemloft.net
   
   * mpn/sparc32/v9/sqr_diagonal.asm: Revert LEA and INT32 changes.
   * mpn/sparc64/gcd_1.asm: Likewise.
   
 Applied, after making sure this is necessary and sufficient for getting
 us back to working Solaris 10 support.

Thanks.

I'd like to investigate what went on here in more detail, and I think
I can do it if you build the test images in the attached tarball for
me.

This will unpack into a directory named sol2_test; just 'cd' into
there and run make on the Solaris machine that showed all of these
problems.  After the target objects are all made please tar up the
result and send it to me.

Thanks a lot!


sol2_test.tar.gz
Description: Binary data


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes:

  Torbjorn Granlund t...@gmplib.org writes:
  
ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file 
mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: 
symbol ctz_table: relocation illegal for TLS symbol

  There are also new check failures for a 32-bit sparc-solaris build:
  
  http://gmplib.org/devel/testmachines/check/failure/swift.nada.kth.se:32.txt

This is caused by the changes to sparc32/v9/sqr_diagonal.asm.

The old code used RDPC for PIC code, with the sequence,

.Lpc:   rd      %pc,%o7
        ld      [%o7+.Lnoll-.Lpc],%f8

while the new code uses the longer sequence,

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
        call    __sparc_get_pc_thunk.l7
         or     %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
        sethi   %gdop_hix22(.Lnoll), %l0
        xor     %l0, %gdop_lox10(.Lnoll), %l0
        ld      [%l7 + %l0], %l0, %gdop(.Lnoll)
        ld      [%l0], %f8

where the call is to a local function:

__sparc_get_pc_thunk.l7:
        retl
         add    %o7, %l7, %l7

Aside from the fact that the new sequence fails (for reasons unknown to
me), it is not clear why it would be an improvement, had it worked.

Or, more generally, why should we not always use RDPC for PIC?

I spotted a comment in gcc,

;; Even on V9 we use this call sequence with a stub, instead of rd %pc, ...
;; because the RDPC instruction is extremely expensive and incurs a complete
;; instruction pipeline flush.

which perhaps answers my question.  But is that true in general, or only
for some sparcv9 implementations?  It would be nice to avoid these long
insn sequences where possible.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 15:40:38 +0200

 I spotted a comment in gcc,
 
 ;; Even on V9 we use this call sequence with a stub, instead of rd %pc, ...
 ;; because the RDPC instruction is extremely expensive and incurs a complete
 ;; instruction pipeline flush.
 
 which perhaps answers my question.  But is that true in general, or only
 for some sparcv9 implementations?  It would be nice to avoid these long
 insn sequences where possible.

rd %pc is very expensive on every single chip I've tried it on.

It tends to flush the entire pipeline, which for example means a
minimum of 9 cycles on Ultra12.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 12:10:33 +0200

 David Miller da...@davemloft.net writes:
 
   * mpn/sparc32/sparc-defs.m4 (LEA): Remove unused local label.
   (LEA_LEAF): Likewise.
   
 This patch helped us get past the Slowlaris assembler, which
 can only cope with single-digit labels.
 
 But there are now errors when creating the shared library:
 
 /bin/bash ./libtool --tag=CC   --mode=link gcc -std=gnu99  -O2 -pedantic \
 -m64 -mptr64 -mcpu=ultrasparc3 -Wc,-m64  -version-info 11:1:1  -o \
 libgmp.la -rpath /usr/local/lib assert.lo compat.lo ...
 
 gcc -std=gnu99 -shared  -fPIC -DPIC -Wl,-z -Wl,text -Wl,-h \
 -Wl,libgmp.so.10 -o .libs/libgmp.so.10.1.1  .libs/assert.o \
 .libs/compat.o ... rand/.libs/randmui.o   -lc  -O2 -m64 -mptr64 \
 -mcpu=ultrasparc3 -m64
 
 ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file 
 mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
 ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: 
 symbol ctz_table: relocation illegal for TLS symbol
 
 TLS?  Thread local storage?

Sun's tools give the worst diagnostics in the world.  Yes, that's what it
means by TLS.

And no I have no idea why it's complaining like this :-/

Maybe because ctz_table is in .rodata?  That shouldn't matter at all; gcc
emits things like that all the time.

Is there a ctz_table in libc.so by chance?  If so, then changing the
name of the table should be sufficient to fix the problem.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 21:10:35 +0200

ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file 
 mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: 
 symbol ctz_table: relocation illegal for TLS symbol

TLS?  Thread local storage?
   
   Sun's tools give the worst diagnostics in the world.  Yes, that's what it
   means by TLS.
   
 Which seems nonsensical.

I think I found the problem, from the GCC install notes:


sparc-sun-solaris2.10

There is a bug in older versions of the Sun assembler which breaks thread-local 
storage (TLS). A typical error message is

 ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file 
/var/tmp//ccamPA1v.o:
   symbol unknown: bad symbol type SECT: symbol type must be TLS
This bug is fixed in Sun patch 118683-03 or later.



From that patch:

6728528 assembler does not handle __thread code correctly 

We're probably hitting that bug.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread Torbjorn Granlund
David Miller da...@davemloft.net writes:

  ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file 
mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
  ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file 
mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol

  
  
  sparc-sun-solaris2.10
  
  There is a bug in older versions of the Sun assembler which breaks 
thread-local storage (TLS). A typical error message is
  
   ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file 
/var/tmp//ccamPA1v.o:
 symbol unknown: bad symbol type SECT: symbol type must be TLS
  This bug is fixed in Sun patch 118683-03 or later.
  
  
  

  We're probably hitting that bug.
  
Really?  What does our case have to do with TLS?  The example error
message uses a TLS reloc, we don't.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Sat, 13 Apr 2013 21:52:40 +0200

 Really?  What does our case have to do with TLS?  The example error
 message uses a TLS reloc, we don't.

Implicit section at the beginning of assembly?

Here, try these two things:

1) Build:

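static const char foo[] = { 1, 2, 3, 4, 5, 6 };

const char *test(void)
{
    /* Return the address of the static table, so gcc must materialize
       the address of a .rodata object under -fPIC. */
    return &foo[0];
}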

  with gcc -m64 -O2 -fPIC -S -o test.s test.c, let me know what
  gcc emits.

2) Put ctz_table at the end of gcd_1.asm and see if that makes
   a difference.

We'll need to do these kinds of experiments anyway, because once
we determine that it's a Solaris as bug we'll need to know precisely
how to work around it or add an acinclude.m4 test for the problem.

Thanks.


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: David Miller da...@davemloft.net
Date: Sat, 13 Apr 2013 15:58:44 -0400 (EDT)

 From: Torbjorn Granlund t...@gmplib.org
 Date: Sat, 13 Apr 2013 21:52:40 +0200
 
 Really?  What does our case have to do with TLS?  The example error
 message uses a TLS reloc, we don't.
 
 Implicit section at the beginning of assembly?

BTW, I say this because the Solaris assembler has various
section switching bugs, e.g. the one they hit in libgomp++:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29987


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-13 Thread David Miller
From: David Miller da...@davemloft.net
Date: Sat, 13 Apr 2013 15:59:49 -0400 (EDT)

 From: David Miller da...@davemloft.net
 Date: Sat, 13 Apr 2013 15:58:44 -0400 (EDT)
 
 From: Torbjorn Granlund t...@gmplib.org
 Date: Sat, 13 Apr 2013 21:52:40 +0200
 
 Really?  What does our case have to do with TLS?  The example error
 message uses a TLS reloc, we don't.
 
 Implicit section at the beginning of assembly?
 
 BTW, I say this because the Solaris assembler has various
 section switching bugs, f.e. the one they hit in libgomp++:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29987

Finally, if you could grab the PIC gcd_1.o from one of those
Solaris 10 builds, I would find that most useful.

Thanks again!


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-11 Thread Torbjorn Granlund
There are syntax errors for swift.nada.kth.se, a Solaris system.  See
http://gmplib.org/devel/tm-date.html.

The offending lines:

swift (ABI=64)
99: sethi   %gdop_hix22(ctz_table), %i5
swift-32 (ABI=32)
99: sethi   %gdop_hix22(.Lnoll), %l0

We need things to work on Solaris and *BSD.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-11 Thread David Miller

I've done some more research into this.

I first made sure that we are using the same test that GCC uses to
enable the use of gotdata relocations.

Then I read over the new m4 LEA macros a few times.  The only thing
I found was that I had left behind a local label that was only needed
for an earlier revision of my changes; patch below to delete it.

Next, I tried to reproduce the .s file generated from gcd_1.asm, to
double-check the assembler output.  I did this by configuring
for ultrasparc3-linux and forcing HAVE_SHARED_THUNKS to no in the
generated config.m4.

The line numbers match up with your report, and the assembler line
looks fine as far as I can tell.  The surrounding lines also look
OK, just in case the line number reported by the assembler is
not correct for some reason.

The last remaining possible difference I can come up with is that
the build will pass -K PIC to the assembler (because of -fPIC
in the gcc command line) but for the relocation test in acinclude.m4
we don't pass that option.  Could you try, on swift.nada.kth.se,
a test file:

.text
sethi   %gdop_hix22(ctz_table), %i5
xor %i5, %gdop_lox10(ctz_table), %i5
ldx [%l7 + %i5], %i5, %gdop(ctz_table)

and then try to build it with:

gcc -O2 -m64 -c -o test.o test.s

and then:

gcc -O2 -m64 -fPIC -c -o test.o test.s

Finally, try to fetch the gcc command line used by the gotdata test from
config.log.  Maybe we can include the config.log output in the build
farm links, just like config.h currently is?  That would help diagnose
things like this.

Thanks!

2013-04-11  David S. Miller  da...@davemloft.net

* mpn/sparc32/sparc-defs.m4 (LEA): Remove unused local label.
(LEA_LEAF): Likewise.

diff -r ace68333a9dc mpn/sparc32/sparc-defs.m4
--- a/mpn/sparc32/sparc-defs.m4 Wed Apr 10 22:42:33 2013 +0200
+++ b/mpn/sparc32/sparc-defs.m4 Thu Apr 11 12:39:33 2013 -0700
@@ -50,7 +50,7 @@
sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %`$3'
call    __sparc_get_pc_thunk.`$3'
 or %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:sethi   %gdop_hix22(`$1'), %`$2'
+   sethi   %gdop_hix22(`$1'), %`$2'
xor %`$2', %gdop_lox10(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
ldx [%`$3' + %`$2'], %`$2', %gdop(`$1')',`
@@ -58,7 +58,7 @@
sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %`$3'
call    __sparc_get_pc_thunk.`$3'
 or %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:sethi   %hi(`$1'), %`$2'
+   sethi   %hi(`$1'), %`$2'
or  %`$2', %lo(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
ldx [%`$3' + %`$2'], %`$2'',`
@@ -82,7 +82,7 @@
mov %o7, %`$2'
call    __sparc_get_pc_thunk.`$3'
 or %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:mov %`$2', %o7
+   mov %`$2', %o7
sethi   %gdop_hix22(`$1'), %`$2'
xor %`$2', %gdop_lox10(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`
@@ -92,7 +92,7 @@
mov %o7, %`$2'
call    __sparc_get_pc_thunk.`$3'
 or %`$3', %lo(_GLOBAL_OFFSET_TABLE_+4), %`$3'
-99:mov %`$2', %o7
+   mov %`$2', %o7
sethi   %hi(`$1'), %`$2'
or  %`$2', %lo(`$1'), %`$2'
 ifdef(`HAVE_ABI_64',`


[PATCH] Improve and consolidate sparc PIC assembler.

2013-04-10 Thread David Miller

This patch aims to:

1) Consolidate all of the address loading details of PIC
   vs. non-PIC into one place, via helper macros.

2) Add support for GOTDATA relocations when the tools
   support it.

3) When supported by the tools, use comdat et al. in order to have
   shared PIC thunks.  All PIC thunks operating on the same PIC
   register get emitted with the same name, and the linker only
   retains one copy in the final image.

   The PIC thunk names are chosen to match the ones emitted by
   gcc, thus we'll have only one %l7 thunk for the entire libgmp
   shared library image.

As a side effect it also fixes ultrasparct3/invert_limb.asm on PIC.

For the non-PIC cases we use set for 32-bit and setx for 64-bit.
We need a pic_reg for the PIC cases, so it was easy to accommodate
the need for the temporary that setx requires.

A .bootstrap run (to regenerate configure) will be needed after
installing these changes.

I tested all of:

sparc-unknown-linux (ABI=32)
sparcv8-unknown-linux (ABI=32)
sparcv9-unknown-linux (ABI=32)
ultrasparct1-unknown-linux (ABI=32)
sparc64-unknown-linux (ABI=64)
ultrasparc1-unknown-linux (ABI=64)
ultrasparct1-unknown-linux (ABI=64)
ultrasparct3-unknown-linux (ABI=64)

Each with normal builds and one done with --disable-shared --enable-static
(in order to force the testsuite to link against the static versions of
all of these routines).

They all passed the testsuite.  A side note that the plain 32-bit
sparc-unknown-linux (pre-v8) configuration takes hours to run the
testsuite even on a modern cpu.

2013-04-10  David S. Miller  da...@davemloft.net

* acinclude.m4 (GMP_ASM_SPARC_GOTDATA,
GMP_ASM_SPARC_SHARED_THUNKS): New feature tests.
* configure.ac: Call GMP_ASM_SPARC_GOTDATA and
GMP_ASM_SPARC_SHARED_THUNKS on sparc.
* mpn/sparc32/sparc-defs.m4 (LOAD_SYMBOL, LOAD_SYMBOL_LEAF,
LOAD_SYMBOL_THUNK): New macros.
* mpn/sparc32/udiv.asm: Convert over to LOAD_SYMBOL,
LOAD_SYMBOL_LEAF, and LOAD_SYMBOL_THUNK.
* mpn/sparc32/v8/addmul_1.asm: Likewise.
* mpn/sparc32/v8/mul_1.asm: Likewise.
* mpn/sparc32/v8/supersparc/udiv.asm: Likewise.
* mpn/sparc32/v8/udiv.asm: Likewise.
* mpn/sparc64/gcd_1.asm: Likewise.
* mpn/sparc64/ultrasparct3/dive_1.asm: Likewise.
* mpn/sparc64/ultrasparct3/invert_limb.asm: Likewise.
* mpn/sparc64/ultrasparct3/mode1o.asm: Likewise.
* mpn/sparc32/v9/sqr_diagonal.asm: Likewise and use INT32.

diff -r a51d8e63e08e acinclude.m4
--- a/acinclude.m4  Tue Apr 09 15:05:39 2013 +0200
+++ b/acinclude.m4  Tue Apr 09 21:10:30 2013 -0700
@@ -3090,6 +3090,57 @@
 ])
 
 
+dnl  GMP_ASM_SPARC_GOTDATA
+dnl  --
+dnl  Determine whether the assembler accepts gotdata relocations.
+dnl
+dnl  See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_GOTDATA],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler accepts gotdata relocations],
+   gmp_cv_asm_sparc_gotdata,
+[GMP_TRY_ASSEMBLE(
+[  $gmp_cv_asm_text
+   .text
+   sethi   %gdop_hix22(symbol), %g1
+   or  %g1, %gdop_lox10(symbol), %g1
+],
+[gmp_cv_asm_sparc_gotdata=yes],
+[gmp_cv_asm_sparc_gotdata=no])])
+
+GMP_DEFINE_RAW([define(HAVE_GOTDATA,$gmp_cv_asm_sparc_gotdata)])
+])
+
+
+dnl  GMP_ASM_SPARC_SHARED_THUNKS
+dnl  --
+dnl  Determine whether the assembler supports all of the features
+dnl  necessary in order to emit shared PIC thunks on sparc.
+dnl
+dnl  See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_SHARED_THUNKS],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler can support shared PIC thunks],
+   gmp_cv_asm_sparc_shared_thunks,
+[GMP_TRY_ASSEMBLE(
+[  $gmp_cv_asm_text
+   .section    .text.__sparc_get_pc_thunk.l7,"axG",@progbits,__sparc_get_pc_thunk.l7,comdat
+   .weak   __sparc_get_pc_thunk.l7
+   .hidden __sparc_get_pc_thunk.l7
+   .type   __sparc_get_pc_thunk.l7, #function
+__sparc_get_pc_thunk.l7:
+   jmp     %o7+8
+    add    %o7, %l7, %l7
+],
+[gmp_cv_asm_sparc_shared_thunks=yes],
+[gmp_cv_asm_sparc_shared_thunks=no])])
+
+GMP_DEFINE_RAW([define(HAVE_SHARED_THUNKS,$gmp_cv_asm_sparc_shared_thunks)])
+])
+
+
 dnl  GMP_C_ATTRIBUTE_CONST
 dnl  -
 
diff -r a51d8e63e08e configure.ac
--- a/configure.ac  Tue Apr 09 15:05:39 2013 +0200
+++ b/configure.ac  Tue Apr 09 21:10:30 2013 -0700
@@ -3483,12 +3483,14 @@
 power*-*-aix*)
   GMP_INCLUDE_MPN(powerpc32/aix.m4)
   ;;
-sparcv9*-*-* | ultrasparc*-*-* | sparc64-*-*)
+*sparc*-*-*)
   case $ABI in
 64)
   GMP_ASM_SPARC_REGISTER
   ;;
   esac
+  GMP_ASM_SPARC_GOTDATA
+  GMP_ASM_SPARC_SHARED_THUNKS
   ;;
 X86_PATTERN | X86_64_PATTERN)
   GMP_ASM_ALIGN_FILL_0x90
diff -r a51d8e63e08e mpn/sparc32/sparc-defs.m4
--- a/mpn/sparc32/sparc-defs.m4 

Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-10 Thread Torbjorn Granlund
Please use LEA* instead of LOAD_SYMBOL*, since that's what we use
elsewhere.  (OK, LEA might be a misnomer, but a well-established one in
and outside of GMP.)

I assume your broad testing covers every modified file.  Do you have an
idea of whether that is true?

When testing shared libs, I have found that libtool sometimes prefers an
installed version to the newly compiled version.  That happens more often
with 32-bit libs on 64-bit systems, since libtool doesn't set
LD_32_LIBRARY_PATH.  Please make sure the shared builds' libraries have
actually been tested.

That patch looks good to me, apart from the LEA issue.

Once you have addressed that, I would like to commit this to the main
repo.

Thanks!

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-10 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 10 Apr 2013 14:35:13 +0200

 Please use LEA* instead of LOAD_SYMBOL*, since that's what we use
 elsewhere.  (OK, LEA might be a misnomer, but a well-established one in
 and outside of GMP.)

Ok.

 I assume your broad testing covers every modified file.  Do you have an
 idea of whether that is true?

I rechecked everything and the one case I missed was supersparc-*

Even the current tree fails to build for the supersparc target with
current tools, due to a combination of a bug in gcc's specs handling
and stricter binutils checking that the cpu ABI is set correctly.

The issue is that gcc doesn't specify at least v8 in the assembler
invocations when -mcpu=supersparc is given, so binutils, which defaults
to v7, complains when it sees integer multiply and divide
instructions.

I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have
this problem, and have had it for some time, so I think we should
work around it.  A workaround that works is to pass -mcpu=v8
-mcpu=supersparc instead of just plain -mcpu=supersparc.

For the sake of evaluating this LEA patch, I forced these CFLAGS by
hand on the make command line to make sure my LEA patches didn't
introduce any new problems.

 When testing shared libs, I have found that libtool sometimes prefers an
 installed version to the newly compiled version.  That happens more often
 with 32-bit libs on 64-bit systems, since libtool doesn't set
 LD_32_LIBRARY_PATH.  Please make sure the shared builds' libraries have
 actually been tested.

I've verified that this works as intended.  LD_32_LIBRARY_PATH seems
to be a FreeBSD invention.

 That patch looks good to me, apart from the LEA issue.
 
 Once you have addressed that, I would like to commit this to the main

Here is the new version, thanks:

2013-04-10  David S. Miller  da...@davemloft.net

* acinclude.m4 (GMP_ASM_SPARC_GOTDATA,
GMP_ASM_SPARC_SHARED_THUNKS): New feature tests.
* configure.ac: Call GMP_ASM_SPARC_GOTDATA and
GMP_ASM_SPARC_SHARED_THUNKS on sparc.
* mpn/sparc32/sparc-defs.m4 (LEA, LEA_LEAF, LEA_THUNK): New
macros.
* mpn/sparc32/udiv.asm: Convert over to LEA, LEA_LEAF, and
LEA_THUNK.
* mpn/sparc32/v8/addmul_1.asm: Likewise.
* mpn/sparc32/v8/mul_1.asm: Likewise.
* mpn/sparc32/v8/supersparc/udiv.asm: Likewise.
* mpn/sparc32/v8/udiv.asm: Likewise.
* mpn/sparc64/gcd_1.asm: Likewise.
* mpn/sparc64/ultrasparct3/dive_1.asm: Likewise.
* mpn/sparc64/ultrasparct3/invert_limb.asm: Likewise.
* mpn/sparc64/ultrasparct3/mode1o.asm: Likewise.
* mpn/sparc32/v9/sqr_diagonal.asm: Likewise and use INT32.

diff -r a51d8e63e08e acinclude.m4
--- a/acinclude.m4  Tue Apr 09 15:05:39 2013 +0200
+++ b/acinclude.m4  Wed Apr 10 10:01:13 2013 -0700
@@ -3090,6 +3090,57 @@
 ])
 
 
+dnl  GMP_ASM_SPARC_GOTDATA
+dnl  --
+dnl  Determine whether the assembler accepts gotdata relocations.
+dnl
+dnl  See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_GOTDATA],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler accepts gotdata relocations],
+   gmp_cv_asm_sparc_gotdata,
+[GMP_TRY_ASSEMBLE(
+[  $gmp_cv_asm_text
+   .text
+   sethi   %gdop_hix22(symbol), %g1
+   or  %g1, %gdop_lox10(symbol), %g1
+],
+[gmp_cv_asm_sparc_gotdata=yes],
+[gmp_cv_asm_sparc_gotdata=no])])
+
+GMP_DEFINE_RAW([define(HAVE_GOTDATA,$gmp_cv_asm_sparc_gotdata)])
+])
+
+
+dnl  GMP_ASM_SPARC_SHARED_THUNKS
+dnl  --
+dnl  Determine whether the assembler supports all of the features
+dnl  necessary in order to emit shared PIC thunks on sparc.
+dnl
+dnl  See also mpn/sparc32/sparc-defs.m4 which uses the result of this test.
+
+AC_DEFUN([GMP_ASM_SPARC_SHARED_THUNKS],
+[AC_REQUIRE([GMP_ASM_TEXT])
+AC_CACHE_CHECK([if the assembler can support shared PIC thunks],
+   gmp_cv_asm_sparc_shared_thunks,
+[GMP_TRY_ASSEMBLE(
+[  $gmp_cv_asm_text
+   .section    .text.__sparc_get_pc_thunk.l7,"axG",@progbits,__sparc_get_pc_thunk.l7,comdat
+   .weak   __sparc_get_pc_thunk.l7
+   .hidden __sparc_get_pc_thunk.l7
+   .type   __sparc_get_pc_thunk.l7, #function
+__sparc_get_pc_thunk.l7:
+   jmp %o7+8
+    add %o7, %l7, %l7
+],
+[gmp_cv_asm_sparc_shared_thunks=yes],
+[gmp_cv_asm_sparc_shared_thunks=no])])
+
+GMP_DEFINE_RAW([define(HAVE_SHARED_THUNKS,$gmp_cv_asm_sparc_shared_thunks)])
+])
+
+
 dnl  GMP_C_ATTRIBUTE_CONST
 dnl  -
 
diff -r a51d8e63e08e configure.ac
--- a/configure.ac  Tue Apr 09 15:05:39 2013 +0200
+++ b/configure.ac  Wed Apr 10 10:01:13 2013 -0700
@@ -3483,12 +3483,14 @@
 power*-*-aix*)
   GMP_INCLUDE_MPN(powerpc32/aix.m4)
   ;;
-sparcv9*-*-* | ultrasparc*-*-* | sparc64-*-*)
+*sparc*-*-*)
   case $ABI in
 64)
   GMP_ASM_SPARC_REGISTER
   
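To check the same thing by hand on a given system, a rough shell sketch along the lines of GMP_ASM_SPARC_GOTDATA might look like this. It assumes a plain as (overridable through AS), whereas the real GMP_TRY_ASSEMBLE goes through configure's own assembler settings; probe_as is a hypothetical name. On a non-SPARC host, or without an assembler that knows gotdata relocations, it simply reports no:

```shell
# Hypothetical helper (not in GMP): try to assemble a snippet and report
# yes/no, roughly like the GMP_TRY_ASSEMBLE checks in acinclude.m4.
probe_as () {
  tmp=${TMPDIR:-/tmp}/gmp-probe.$$
  printf '%s\n' "$1" > "$tmp.s"
  if ${AS:-as} -o "$tmp.o" "$tmp.s" >/dev/null 2>&1; then
    result=yes
  else
    result=no
  fi
  rm -f "$tmp.s" "$tmp.o"
  echo "$result"
}

# Same instructions as the GMP_ASM_SPARC_GOTDATA test in the patch.
gotdata=$(probe_as '    .text
    sethi   %gdop_hix22(symbol), %g1
    or      %g1, %gdop_lox10(symbol), %g1')
echo "assembler accepts gotdata relocations: $gotdata"
```

The thunk test could be probed the same way by passing the .section/.weak/.hidden/.type snippet from GMP_ASM_SPARC_SHARED_THUNKS to probe_as.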

Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-10 Thread Torbjorn Granlund

   I assume your broad testing covers every modified file.  Do you have an
   idea of whether that is true?
  
  I rechecked everything and the one case I missed was supersparc-*
  
  Even the current tree has a build problem with the supersparc target
  and current tools, due to a combination of a bug in gcc's specs handling
  and new binutils enforcement of setting the CPU ABI correctly.
  
  The issue is that gcc doesn't specify at least v8 in the assembler
  invocations when -mcpu=supersparc is given so binutils complains when
  it sees integer multiply and divide instructions since it defaults to
  v7.
  
Good that you found that!

  I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have
  this problem, and have had it for some time, so I think we should
  work around it.  A workaround that works is to pass -mcpu=v8
  -mcpu=supersparc instead of just plain -mcpu=supersparc.
  
I thought -mcpu=foo -mcpu=bar would either be equivalent to just
-mcpu=bar or just -mcpu=foo...

   When testing shared libs, I have found that libtool sometimes prefers an
   installed version to the newly compiled version.  That happens more often
   with 32-bit libs on 64-bit systems, since libtool doesn't set
   LD_32_LIBRARY_PATH.  Please make sure the shared builds' libraries have
   actually been tested.
  
  I've verified that this works as intended.  LD_32_LIBRARY_PATH seems
  to be a FreeBSD invention.
  
On Slowaris it is LD_LIBRARY_PATH_32...
(But the semantics of these paths might not be the same.)
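A sketch of pointing the runtime linker at the freshly built 32-bit libraries, using the variable names mentioned in this thread (LD_32_LIBRARY_PATH on FreeBSD, LD_LIBRARY_PATH_32 on Solaris) and assuming plain LD_LIBRARY_PATH elsewhere. lib32_path_var is a hypothetical helper, and, as noted above, the exact semantics of these variables may differ between systems:

```shell
# Hypothetical helper (not part of libtool or GMP): name of the variable
# the runtime linker consults for 32-bit libraries, keyed on `uname -s`.
lib32_path_var () {
  case $1 in
    FreeBSD) echo LD_32_LIBRARY_PATH ;;
    SunOS)   echo LD_LIBRARY_PATH_32 ;;
    *)       echo LD_LIBRARY_PATH ;;   # assumption for other systems
  esac
}

var=$(lib32_path_var "$(uname -s)")
# Point the runtime linker at libtool's build directory so the new
# library is used instead of an installed copy (overwrites any old value).
export "$var=$PWD/.libs"
echo "$var=$PWD/.libs"
```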

   That patch looks good to me, apart from the LEA issue.
   
   Once you have addressed that, I would like to commit this to the main
  
  Here is the new version, thanks:
  
Thanks, will commit shortly after a quick read-through.

-- 
Torbjörn


Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-10 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org
Date: Wed, 10 Apr 2013 20:07:52 +0200

   I'll get those bugs sorted out, but at least gcc-4.6 and gcc-4.7 have
   this problem, and have had it for some time, so I think we should
   work around it.  A workaround that works is to pass -mcpu=v8
   -mcpu=supersparc instead of just plain -mcpu=supersparc.
   
 I thought -mcpu=foo -mcpu=bar would either be equivalent to just
 -mcpu=bar or just -mcpu=foo...

As per what the compiler decides to enable internally in the backend,
that expression evaluates to the last -mcpu= specifier.

But as far as specs are concerned, it evaluates differently, and in
this case differently enough for the -mcpu=v8 assembler-option logic
in the specs to kick in.
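A toy shell model of this distinction (not GCC's actual option or specs machinery): the backend effectively keeps only the last -mcpu=, while a spec rule keyed on -mcpu=v8 (assumed here, for illustration, to be one that hands -Av8 to the assembler) fires if that switch appears anywhere on the command line. The helper names are hypothetical:

```shell
# Toy model, not GCC's real implementation.
# Backend view: the last -mcpu= on the command line wins.
backend_cpu () {
  cpu=
  for opt in "$@"; do
    case $opt in
      -mcpu=*) cpu=${opt#-mcpu=} ;;
    esac
  done
  echo "$cpu"
}

# Specs view (assumed rule of the form %{mcpu=v8:-Av8}): fires if
# -mcpu=v8 occurs anywhere, regardless of later -mcpu= options.
spec_passes_av8 () {
  for opt in "$@"; do
    if [ "$opt" = -mcpu=v8 ]; then
      echo yes
      return
    fi
  done
  echo no
}

backend_cpu -mcpu=v8 -mcpu=supersparc       # supersparc
spec_passes_av8 -mcpu=v8 -mcpu=supersparc   # yes
spec_passes_av8 -mcpu=supersparc            # no
```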