Re: RFR: 8207851: Implement JEP 352

2019-06-05 Thread Gustavo Romero

Hi Andrew,

On 05/24/2019 07:06 AM, Andrew Dinn wrote:

Ping!

Any takers for a review?


I found some trailing space in v5 and it seems they are in v6 as well.

You might want to check the followings:

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
+// data cache line writeback
+StubRoutines::_data_cache_writeback = generate_data_cache_writeback();
+StubRoutines::_data_cache_writeback_sync = 
generate_data_cache_writeback_sync();
+  <-- here


src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
+if (_dcpop || UsePOCForPOP) {
+  _data_cache_line_flush_size = dcache_line;
+}
+  }
+  <-- here


src/hotspot/cpu/x86/macroAssembler_x86.cpp
+// no need for fence when using CLFLUSH
+clflush(line);
+  }   <-- here


src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
in generate_data_cache_writeback():
+__ ret(0);
+
+return start;<-- here

in generate_data_cache_writeback_sync():
+__ jcc(Assembler::notEqual, skip); <-- here

and
+// data cache line writeback
+StubRoutines::_data_cache_writeback = generate_data_cache_writeback();
+StubRoutines::_data_cache_writeback_sync = 
generate_data_cache_writeback_sync();
+<-- here (like in aarch64 file)


src/java.base/share/classes/java/nio/MappedByteBuffer.java
+private final boolean isSync;
+<-- here

and
+private boolean isSync() {
+return isSync;
+}
+<-- here


src/java.base/share/classes/jdk/internal/misc/Unsafe.java
+ *that must be guaranteed written back to memory.
+ *<-- here

and here:
+ * @since 13
+ */
+<-- here

+checkSize(length);
+}
+<-- here

+ * calling method {@link writeback #writeback} if it is disabled).
+ * <-- here

+throw new RuntimeException("writebackMemory not enabled!");
+}
+}
+<-- here

+// following memory writes
+@HotSpotIntrinsicCandidate
+private native void writebackPostSync0();
+<-- here


src/java.base/share/classes/sun/nio/ch/FileChannelImpl.java
+incrementStats();
+}
+<-- here

 totalCapacity -= cap;
 }
 }
 }
+<-- here

+super(address, size, cap, fd);
+incrementStats();
+}
+<-- here

+totalCapacity -= cap;
+}
+}
+}
+<-- here


src/java.base/unix/native/libnio/ch/FileChannelImpl.c
+// should never be called with map_sync and prot == PRIVATE
+assert((prot != sun_nio_ch_FileChannelImpl_MAP_PV) ||! map_sync);
+<-- here

+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE 0x03
+#endif
+<-- here

test/jdk/java/nio/MappedByteBuffer/PmemTest.java
+import com.sun.nio.file.ExtendedMapMode;
+<-- here

+public static final int NUM_KBS = 16;
+<-- here

Best regards,
Gustavo



Re: RFR: 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace

2018-12-12 Thread Gustavo Romero

Hi Martin,

On 12/12/2018 05:10 AM, Doerr, Martin wrote:

thanks for your improvements. You can also remove the trailing \t from the opto 
assembly.
Note that there's no need to re-push newer version to jdk/submit when only PPC 
files were changed. jdk/submit doesn't look at them.


Got it.

Pushed to jdk/jdk:
http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860

Thank you.

Best regards,
Gustavo
 

Best regards,
Martin


-Original Message-----
From: Gustavo Romero 
Sent: Mittwoch, 12. Dezember 2018 03:08
To: Michihiro Horie 
Cc: core-libs-dev@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; Doerr, Martin 
; Roger Riggs ; Vladimir Kozlov 
; Simonis, Volker 
Subject: Re: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

Hi Michi,

On 12/11/2018 11:12 PM, Michihiro Horie wrote:

Thank you for finding the issue on Power8. You do not need a check with 
has_darn in the ppc.ad. It is better to add a check in vm_versoin_ppc.


I agree.


I uploaded webrev.08 based on your webrev.07. (Thanks for the enhancement of 
opto assembly and removing trailing spaces!)
http://cr.openjdk.java.net/~mhorie/8213754/webrev.08/ 
<http://cr.openjdk.java.net/%7Emhorie/8213754/webrev.08/>


Thanks for the updated webrev. Looks good!
I've just pushed webrev.08 to jdk/submit expecting no failures as .07 passed 
fine.

Once I get the jdk/submit results tomorrow I'll push.

Best regards,
Gustavo





Re: RFR: 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace

2018-12-11 Thread Gustavo Romero

Hi Michi,

On 12/11/2018 11:12 PM, Michihiro Horie wrote:

Thank you for finding the issue on Power8. You do not need a check with 
has_darn in the ppc.ad. It is better to add a check in vm_versoin_ppc.


I agree.


I uploaded webrev.08 based on your webrev.07. (Thanks for the enhancement of 
opto assembly and removing trailing spaces!)
http://cr.openjdk.java.net/~mhorie/8213754/webrev.08/ 
<http://cr.openjdk.java.net/%7Emhorie/8213754/webrev.08/>


Thanks for the updated webrev. Looks good!
I've just pushed webrev.08 to jdk/submit expecting no failures as .07 passed 
fine.

Once I get the jdk/submit results tomorrow I'll push.

Best regards,
Gustavo



Best regards,
--
Michihiro,
IBM Research - Tokyo

Inactive hide details for "Gustavo Romero" ---2018/12/12 07:57:21---From: "Gustavo Romero"  To: 
"Do"Gustavo Romero" ---2018/12/12 07:57:21---From: "Gustavo Romero"  To: "Doerr, 
Martin" , Michihiro Horie/Japan/IBM@IBMJP

From: "Gustavo Romero" 
To: "Doerr, Martin" , Michihiro Horie/Japan/IBM@IBMJP
Cc: "core-libs-dev@openjdk.java.net" , "hotspot-compiler-...@openjdk.java.net" 
, "Roger Riggs" , "Vladimir Kozlov" 
, "Simonis, Volker" 
Date: 2018/12/12 07:57
Subject: Re: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

--



Hi Michi and Martin,

On 12/11/2018 01:30 PM, Doerr, Martin wrote:
 > thanks for updating. I think it can get shipped when Gustavo is fine with it 
and jdk/submit tests have passed.

webrev.06 removed has_darn from match_rule_supported and the JVM now crashes
with SIGILL (cmprb) on CPUs <= POWER8. I think that what we want is both
UseCharacterCompareIntrinsics and has_darn in the predicate, so:

diff -r 01b519fcb8a8 -r f3bd7a422a6c src/hotspot/cpu/ppc/ppc.ad
--- a/src/hotspot/cpu/ppc/ppc.ad        Tue Dec 11 11:01:02 2018 -0500
+++ b/src/hotspot/cpu/ppc/ppc.ad        Tue Dec 11 12:29:41 2018 -0500
@@ -2257,6 +2257,11 @@
      return SuperwordUseVSX;
    case Op_PopCountVI:
      return (SuperwordUseVSX && UsePopCountInstruction);
+  case Op_Digit:
+  case Op_LowerCase:
+  case Op_UpperCase:
+  case Op_Whitespace:
+    return (UseCharacterCompareIntrinsics && VM_Version::has_darn());
    }

Martin, is what you meant on your last comment about it?

I tested the change on POWER9 and it looks good (from webrev.06 and also with
webrev.06 + the fix above)!

I think that the Opto assembly could be improved a little bit around 'done:'
label, like:

diff -r 89aab62d10cd src/hotspot/cpu/ppc/ppc.ad
--- a/src/hotspot/cpu/ppc/ppc.ad        Tue Dec 11 12:29:41 2018 -0500
+++ b/src/hotspot/cpu/ppc/ppc.ad        Tue Dec 11 15:55:34 2018 -0500
@@ -12440,7 +12440,8 @@
              "LIS     $src2, (signed short)0xAAB5\n\t"
              "ORI     $src2, $src2, 0xBABA\n\t"
              "INSRDI  $src2, $src2, 32, 0\n\t"
-            "CMPEQB  $crx, 1, $src1, $src2\n\t"
+            "CMPEQB  $crx, 1, $src1, $src2\n"
+            "done:\n\t"
              "SETB    $dst, $crx" %}

    size(48);
@@ -12484,7 +12485,8 @@
              "BGT     $crx, done\n\t"
              "LIS     $src2, (signed short)0xD6C0\n\t"
              "ORI     $src2, $src2, 0xDED8\n\t"
-            "CMPRB   $crx, 1, $src1, $src2\n\t"
+            "CMPRB   $crx, 1, $src1, $src2\n"
+            "done:\n\t"
              "SETB    $dst, $crx" %}

so the output would be:

024   B2: #     B5 B3 <- B1  Freq: 1
024     LI      R14, 0x5A41
         CMPRB   CCR6, 0, R3, R14
         BGT     CCR6, done
         LIS     R14, (signed short)0xD6C0
         ORI     R14, R14, 0xDED8
         CMPRB   CCR6, 1, R3, R14
done:
         SETB    R15, CCR6
040     CMPWI   CCR6, R15, #0
044     Beq     CCR6, B5  P=0.00 C=5377.00

ins

Re: RFR: 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace

2018-12-11 Thread Gustavo Romero

Hi Michi and Martin,

On 12/11/2018 01:30 PM, Doerr, Martin wrote:

thanks for updating. I think it can get shipped when Gustavo is fine with it 
and jdk/submit tests have passed.


webrev.06 removed has_darn from match_rule_supported and the JVM now crashes
with SIGILL (cmprb) on CPUs <= POWER8. I think that what we want is both
UseCharacterCompareIntrinsics and has_darn in the predicate, so:

diff -r 01b519fcb8a8 -r f3bd7a422a6c src/hotspot/cpu/ppc/ppc.ad
--- a/src/hotspot/cpu/ppc/ppc.adTue Dec 11 11:01:02 2018 -0500
+++ b/src/hotspot/cpu/ppc/ppc.adTue Dec 11 12:29:41 2018 -0500
@@ -2257,6 +2257,11 @@
 return SuperwordUseVSX;
   case Op_PopCountVI:
 return (SuperwordUseVSX && UsePopCountInstruction);
+  case Op_Digit:
+  case Op_LowerCase:
+  case Op_UpperCase:
+  case Op_Whitespace:
+return (UseCharacterCompareIntrinsics && VM_Version::has_darn());
   }
  
Martin, is what you meant on your last comment about it?


I tested the change on POWER9 and it looks good (from webrev.06 and also with
webrev.06 + the fix above)!

I think that the Opto assembly could be improved a little bit around 'done:'
label, like:

diff -r 89aab62d10cd src/hotspot/cpu/ppc/ppc.ad
--- a/src/hotspot/cpu/ppc/ppc.adTue Dec 11 12:29:41 2018 -0500
+++ b/src/hotspot/cpu/ppc/ppc.adTue Dec 11 15:55:34 2018 -0500
@@ -12440,7 +12440,8 @@
 "LIS $src2, (signed short)0xAAB5\n\t"
 "ORI $src2, $src2, 0xBABA\n\t"
 "INSRDI  $src2, $src2, 32, 0\n\t"
-"CMPEQB  $crx, 1, $src1, $src2\n\t"
+"CMPEQB  $crx, 1, $src1, $src2\n"
+"done:\n\t"
 "SETB$dst, $crx" %}
 
   size(48);

@@ -12484,7 +12485,8 @@
 "BGT $crx, done\n\t"
 "LIS $src2, (signed short)0xD6C0\n\t"
 "ORI $src2, $src2, 0xDED8\n\t"
-"CMPRB   $crx, 1, $src1, $src2\n\t"
+"CMPRB   $crx, 1, $src1, $src2\n"
+"done:\n\t"
 "SETB$dst, $crx" %}

so the output would be:
 
024   B2: # B5 B3 <- B1  Freq: 1

024 LI  R14, 0x5A41
CMPRB   CCR6, 0, R3, R14
BGT CCR6, done
LIS R14, (signed short)0xD6C0
ORI R14, R14, 0xDED8
CMPRB   CCR6, 1, R3, R14
done:
SETBR15, CCR6
040 CMPWI   CCR6, R15, #0
044 Beq CCR6, B5  P=0.00 C=5377.00

instead of:

024   B2: # B5 B3 <- B1  Freq: 1
024 LI  R14, 0x5A41
CMPRB   CCR6, 0, R3, R14
BGT CCR6, done
LIS R14, (signed short)0xD6C0
ORI R14, R14, 0xDED8
CMPRB   CCR6, 1, R3, R14
SETBR15, CCR6
040 CMPWI   CCR6, R15, #0
044 Beq CCR6, B5  P=0.00 C=5379.00

If nobody opposes to that tiny enhancement I would like to keep it.

... and a nit: several trailing spaces here and there :)

I generated a webrev with has_darn in match_rule_supported, with the change to
the Opto asm, and with the trailing spaces removed. I uploaded it to:

http://cr.openjdk.java.net/~gromero/8213754/webrev.07/

I pushed that same version to jdk/submit [1] not expecting any failure and once
I get the fine results and if you, Martin are ok with the version in webrev.07
I'll push it to jdk/jdk.

I'll update this thread a soon as I get the results back from jdk/submit repo.
I think we have time until Thurday 16:00 UTC, even taking into account the
3+ different timezones working on that change :)

Thank you and best regards,
Gustavo

[1] 
http://mail.openjdk.java.net/pipermail/jdk-submit-changes/2018-December/004418.html


Note that changes pushed before Thursday 16:00 UTC will go into jdk12.

Best regards,

Martin

*From:*Michihiro Horie 
*Sent:* Dienstag, 11. Dezember 2018 14:08
*To:* Doerr, Martin 
*Cc:* core-libs-dev@openjdk.java.net; Gustavo Romero ; 
hotspot-compiler-...@openjdk.java.net; Roger Riggs ; Vladimir Kozlov 
; Simonis, Volker 
*Subject:* RE: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

Hi Martin,

Thank you so much for your review and pointing out the issue on flag enablement 
on PPC64.

Latest webrev enables the flag on PPC64 using has_darn in vm_version_ppc and it 
is used in match_rule_supported in the ad file.
http://cr.openjdk.java.net/~mhorie/8213754/webrev.06/ 
<http://cr.openjdk.java.net/%7Emhorie/8213754/webrev.06/>


Best regards,
--
Michihiro,
IBM Research - Tokyo

Inactive hide details for "Doerr, Martin" ---2018/12/11 20:31:40---Hi Michihiro, the 
shared code looks good to me, now."Doerr, Martin" ---2018/12/11 20:31:40---Hi Michihiro, 
the shared code looks good to me, now.

From: "Doerr, Martin" mailto:martin.do...@sap.com>>
To: Vladimir Kozlov mailto:vladimir.koz...@oracle.com>>

Re: RFR: 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace

2018-12-11 Thread Gustavo Romero

Hi Martin,

On 12/11/2018 01:30 PM, Doerr, Martin wrote:

thanks for updating. I think it can get shipped when Gustavo is fine with it 
and jdk/submit tests have passed.


Yep, I'm working on that right now. I'll update soon.
So I have to push to jdk/submit, wait the test results and if it's fine I'll 
push to jdk/jdk after, right?



Note that changes pushed before Thursday 16:00 UTC will go into jdk12.


Got it.


Thank you.

Best regards,
Gustavo


Best regards,

Martin

*From:*Michihiro Horie 
*Sent:* Dienstag, 11. Dezember 2018 14:08
*To:* Doerr, Martin 
*Cc:* core-libs-dev@openjdk.java.net; Gustavo Romero ; 
hotspot-compiler-...@openjdk.java.net; Roger Riggs ; Vladimir Kozlov 
; Simonis, Volker 
*Subject:* RE: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

Hi Martin,

Thank you so much for your review and pointing out the issue on flag enablement 
on PPC64.

Latest webrev enables the flag on PPC64 using has_darn in vm_version_ppc and it 
is used in match_rule_supported in the ad file.
http://cr.openjdk.java.net/~mhorie/8213754/webrev.06/ 
<http://cr.openjdk.java.net/%7Emhorie/8213754/webrev.06/>


Best regards,
--
Michihiro,
IBM Research - Tokyo

Inactive hide details for "Doerr, Martin" ---2018/12/11 20:31:40---Hi Michihiro, the 
shared code looks good to me, now."Doerr, Martin" ---2018/12/11 20:31:40---Hi Michihiro, 
the shared code looks good to me, now.

From: "Doerr, Martin" mailto:martin.do...@sap.com>>
To: Vladimir Kozlov mailto:vladimir.koz...@oracle.com>>, Michihiro Horie 
mailto:ho...@jp.ibm.com>>, Gustavo Romero mailto:grom...@linux.vnet.ibm.com>>
Cc: "core-libs-dev@openjdk.java.net <mailto:core-libs-dev@openjdk.java.net>" mailto:core-libs-dev@openjdk.java.net>>, "hotspot-compiler-...@openjdk.java.net <mailto:hotspot-compiler-...@openjdk.java.net>" 
mailto:hotspot-compiler-...@openjdk.java.net>>, Roger Riggs mailto:roger.ri...@oracle.com>>, "Simonis, Volker" mailto:volker.simo...@sap.com>>
Date: 2018/12/11 20:31
Subject: RE: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

--




Hi Michihiro,

the shared code looks good to me, now.

Looks like the PPC64 is not consistent any more.
Where do you enable UseCharacterCompareIntrinsics on PPC64?
Why aren't you using it for match_rule_supported in the ad file?
I think has_darn could be used to enable it in vm_version_ppc.

Best regards,
Martin


-Original Message-
From: Vladimir Kozlov mailto:vladimir.koz...@oracle.com>>
Sent: Montag, 10. Dezember 2018 20:33
To: Michihiro Horie mailto:ho...@jp.ibm.com>>; Gustavo Romero 
mailto:grom...@linux.vnet.ibm.com>>
Cc: core-libs-dev@openjdk.java.net <mailto:core-libs-dev@openjdk.java.net>; hotspot-compiler-...@openjdk.java.net 
<mailto:hotspot-compiler-...@openjdk.java.net>; Doerr, Martin mailto:martin.do...@sap.com>>; Roger Riggs mailto:roger.ri...@oracle.com>>; 
Simonis, Volker mailto:volker.simo...@sap.com>>
Subject: Re: RFR: 8213754: PPC64: Add Intrinsics for 
isDigit/isLowerCase/isUpperCase/isWhitespace

On 12/9/18 5:28 PM, Michihiro Horie wrote:

Hi Vladimir,

Thanks a lot for your review. I also fixed the copyright in intrinsicnode.hpp.

 >Did you look on code generated by C2 with your latest changes?
Yes,I usually check generated code with Oprofile and they were as expected like:
:
90 0.0905 : 3fff64c27654: rlwinm r12,r9,24,8,31
12 0.0121 : 3fff64c27658: li r11,14640
15 0.0151 : 3fff64c2765c: cmprb cr5,0,r31,r11
5527 5.5587 : 3fff64c27660: setb r11,cr5
36010 36.2164 : 3fff64c27664: stb r11,16(r15)


Good.


:

I found a CSR FAQ that mentions adding a diagnostic flag do not need CSR 
process. This time I do not need CSR.
https://wiki.openjdk.java.net/display/csr/CSR+FAQs


Thank is correct.



Hi Gustavo,
Could I ask you to sponsor the latest 

Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-14 Thread Gustavo Romero

Hi Goetz and Severin,

On 09/14/2018 07:12 AM, Lindenmaier, Goetz wrote:

Hi,

We are only talking about reducing O3 to O2 for the compilation
step of fdlibm, right?


Yes.



As I know, this is being replaced by Java coding, anyways.
Therefore I don't have a strong opinion for any of the
choices.


I totally agree with Goetz considering OpenJDK 12 and 11u. I was actually more
concerned about a possible future backport of that change, for instance: it gets
into 8u, but Java implementations never get. Maybe I forgot to mention my
concern about a 8u backport earlier.

Severin, anyway, as Goetz is also fine with either option, I'm fine with what
you and Andrew decide as the best option.

Thanks bearing with the discussion on ppc64.


Best regards,
Gustavo


Best regards,
   Goetz.


-Original Message-
From: core-libs-dev  On Behalf
Of Gustavo Romero
Sent: Freitag, 14. September 2018 01:30
To: Severin Gehwolf ; David Holmes
; Erik Joelsson ;
build-dev ; core-libs-dev 
Subject: Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-
optimized compilation

Hello Severin,

On 09/12/2018 04:48 AM, Severin Gehwolf wrote:

Hi David,

On Wed, 2018-09-12 at 13:57 +1000, David Holmes wrote:

Hi Severin,

Sorry I'm a bit confused now.

Originally for ppc/s390x/aarch64 the optimization level for fdlibm was
HIGH. But now it will be LOW due to:

 45 ifneq ($(FDLIBM_CFLAGS), )
 46   BUILD_LIBFDLIBM_OPTIMIZATION := LOW

as those platforms will use gcc/clang with support for -ffp-contract and
so FDLIBM_CFLAGS will not be empty.

??


Correct. That was done based on Andrew Haley's comment here:
http://mail.openjdk.java.net/pipermail/build-dev/2018-

September/023160.html


I've done some measurements with -O2 and -O3. -O2 is sufficient for a
2.75 speedup for sin/cos on ppc64le as compared to the -O0 baseline. On
the other hand the speedup of -O3 as compared to -O2 is only 1.05 for
sin/cos on ppc64le.

Since the difference between them were not huge, I've used -O2, a.k.a
LOW. See also:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-

8210416/microbenchmarks_results/


To me it seems -O2 is sufficiently good. Note that HIGH == -O3 and LOW
== -O2 on gcc arches. Does that clear things up?

FWIW, I'm happy to change it back to HIGH if there are concerns. See
also Gustavo's reply here for some more data points:
http://mail.openjdk.java.net/pipermail/build-dev/2018-

September/023193.html

I'm wondering why the possible issue of using -O3 on fdlibm would not affect
other parts of the JVM code (like the Opto code in libjvm.so) that afaics
are also built using -O3. Also, the gap of 1.05 for sin/cos, for instance,
still sounds like a regression to me.

Maybe a better approach to this would be to figure out a way to test -O3 and
-O2 with the real world case in which -O3 can have a bad impact. I'm not
sure if SPECjbb would qualify for that.


Best regards,
Gustavo


Thanks,
Severin


David

On 12/09/2018 2:14 AM, Severin Gehwolf wrote:

Hi Erik,

Thanks for the review!

On Tue, 2018-09-11 at 08:57 -0700, Erik Joelsson wrote:

Hello Severin,

Even if using the macro, I still think you need to add a condition on
the compiler types where the switch can be reasonably expected to

exist.


How about this?
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-

8210416/webrev.05/


Thanks,
Severin


On 2018-09-11 05:02, Severin Gehwolf wrote:

On Mon, 2018-09-10 at 09:29 -0700, Erik Joelsson wrote:

I see. I was not aware of that issue, but we clearly need to file a bug
for it and fix it. In this case I think it's fine to us the macro however.


OK. Update webrev, which now uses

FLAGS_COMPILER_CHECK_ARGUMENTS.


http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-

8210416/webrev.04/


Micro-benchmark results from running [1] for x86_64 and ppc64le are
here (-O2 is sufficient it seems):

http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-

8210416/microbenchmarks_results/


More thoughts?

Thanks,
Severin

[1] https://github.com/gromero/strictmath/












Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-13 Thread Gustavo Romero

Hello Severin,

On 09/12/2018 04:48 AM, Severin Gehwolf wrote:

Hi David,

On Wed, 2018-09-12 at 13:57 +1000, David Holmes wrote:

Hi Severin,

Sorry I'm a bit confused now.

Originally for ppc/s390x/aarch64 the optimization level for fdlibm was
HIGH. But now it will be LOW due to:

45 ifneq ($(FDLIBM_CFLAGS), )
46   BUILD_LIBFDLIBM_OPTIMIZATION := LOW

as those platforms will use gcc/clang with support for -ffp-contract and
so FDLIBM_CFLAGS will not be empty.

??


Correct. That was done based on Andrew Haley's comment here:
http://mail.openjdk.java.net/pipermail/build-dev/2018-September/023160.html

I've done some measurements with -O2 and -O3. -O2 is sufficient for a
2.75 speedup for sin/cos on ppc64le as compared to the -O0 baseline. On
the other hand the speedup of -O3 as compared to -O2 is only 1.05 for
sin/cos on ppc64le.

Since the difference between them were not huge, I've used -O2, a.k.a
LOW. See also:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/microbenchmarks_results/

To me it seems -O2 is sufficiently good. Note that HIGH == -O3 and LOW
== -O2 on gcc arches. Does that clear things up?

FWIW, I'm happy to change it back to HIGH if there are concerns. See
also Gustavo's reply here for some more data points:
http://mail.openjdk.java.net/pipermail/build-dev/2018-September/023193.html


I'm wondering why the possible issue of using -O3 on fdlibm would not affect
other parts of the JVM code (like the Opto code in libjvm.so) that afaics
are also built using -O3. Also, the gap of 1.05 for sin/cos, for instance,
still sounds like a regression to me.

Maybe a better approach to this would be to figure out a way to test -O3 and
-O2 with the real world case in which -O3 can have a bad impact. I'm not
sure if SPECjbb would qualify for that.


Best regards,
Gustavo


Thanks,
Severin


David

On 12/09/2018 2:14 AM, Severin Gehwolf wrote:

Hi Erik,

Thanks for the review!

On Tue, 2018-09-11 at 08:57 -0700, Erik Joelsson wrote:

Hello Severin,

Even if using the macro, I still think you need to add a condition on
the compiler types where the switch can be reasonably expected to exist.


How about this?
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/webrev.05/

Thanks,
Severin


On 2018-09-11 05:02, Severin Gehwolf wrote:

On Mon, 2018-09-10 at 09:29 -0700, Erik Joelsson wrote:

I see. I was not aware of that issue, but we clearly need to file a bug
for it and fix it. In this case I think it's fine to us the macro however.


OK. Update webrev, which now uses FLAGS_COMPILER_CHECK_ARGUMENTS.

http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/webrev.04/

Micro-benchmark results from running [1] for x86_64 and ppc64le are
here (-O2 is sufficient it seems):

http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/microbenchmarks_results/

More thoughts?

Thanks,
Severin

[1] https://github.com/gromero/strictmath/










Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-12 Thread Gustavo Romero

Hi Andrew,

On 09/12/2018 12:51 AM, Gustavo Romero wrote:


Maybe Andrew can enlight us.


I was not sure if 'enlight us' was the correct form here, so I did some search
and I found with horror today that 'enlighten us' can also be considered an
insult (!?).

That's really not the case here. So if it's really possible to interpret it in
a non-positive / non-cordial way I sincerely apologize for the wrong use.

I was really only asking for an advice. :)


Regards,
Gustavo



Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-11 Thread Gustavo Romero

Hi Severin,

On 09/11/2018 09:02 AM, Severin Gehwolf wrote:

Micro-benchmark results from running [1] for x86_64 and ppc64le are
here (-O2 is sufficient it seems):

http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/microbenchmarks_results/

More thoughts?


Thanks a lot for checking it on PPC64.

I did additional tests on a POWER8 using the same microbench. I share the
results below. The numbers are the arithmetic mean of 10 runs. Unit is second.

  1 O3   |  O2
  2  MathStrictMath  |   MathStrictMath
  3  |
  4 acos:3.0728  3.0728  |  acos:3.07325 3.07375
  5 asin:3.0933  3.0916  |  asin:3.09337 3.0915
  6 atan2:   10.9743 10.9759 |  atan2:   11.2994 11.2977
  7 atan:7.653   7.6602  |  atan:7.48463 7.48225
  8 cbrt:13.444  13.4433 |  cbrt:13.4535 13.4527
  9 cos: 30.1561 33.9689 |  cos: 30.157  35.1162
 10 cosh:3.0911  3.1074  |  cosh:3.09975 3.09088
 11 exp: 1.413   3.0572  |  exp: 1.41313 3.09737
 12 expm1:   3.0281  3.0349  |  expm1:   2.98813 2.98913
 13 hypot:   33.7668 33.7471 |  hypot:   27.9664 27.9675
 14 log10:   4.2213  10.0832 |  log10:   4.221   10.1109
 15 log1p:   9.2886  9.2888  |  log1p:   9.08713 9.08075
 16 log: 3.1362  7.6327  |  log: 3.136   7.633
 17 pow: 13.7787 17.4739 |  pow: 13.7829 17.483
 18 sin: 30.216  33.9733 |  sin: 30.2163 35.1684
 19 sinh:3.1478  3.152   |  sinh:3.18638 3.18375
 20 sqrt:0.665   0.6714  |  sqrt:0.665   0.671
 21 tan: 31.6409 36.461  |  tan: 31.6385 37.7166
 22 tanh:3.0753  3.0925  |  tanh:3.11088 3.10675


Based upon that, it looks like that O3 is better for most of the functions,
specially for sin, cos, and tan.

On the other hand, it looks like that O3 hurts hypot.

Thus it seems -O2 is not quite the same as -O3, but it's not straightforward
to decide which one is better too.

Maybe Andrew can enlight us.

I also CC:ed ppc-aix-port in case somebody there wishes to comment on that.

Thank you.


Best regards,
Gustavo


Thanks,
Severin

[1] https://github.com/gromero/strictmath/





Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-10 Thread Gustavo Romero

Hi Severin,

On 09/10/2018 06:27 AM, Severin Gehwolf wrote:

On Mon, 2018-09-10 at 10:05 +0100, Andrew Haley wrote:

On 09/05/2018 02:12 PM, Severin Gehwolf wrote:

Is there a good
reason to not use -O3 -ffp-contract=off everywhere?


Is there a good reason to use -O3 rather than -O2?


Not sure. I was following what JDK-8170153 did, which was using
OPTIMIZATION := HIGH corresponding to -O3. cc'ing Gustavo. Gustavo,
would you know why HIGH was chosen over, LOW?


I don't remember exactly, but at least for ppc64 I discussed that a bit with
the toolchain folks (also regarding the precision issue, etc) and they never
said anything against using -O3. Unfortunately it was long time ago so I
don't remember exactly the numbers on ppc64 for -O2 to check if it was
worse and so I selected -O3 instead.



-O3 can bloat the
code which can increase cache pressure, which is not always noticeable
in benchmarks but hurts real-world programs. Unless benchmarks are
significantly better at -O3, -O2 is a good default choice.


OK, thanks! I'll re-test and change to LOW (-O2) if it gives similar
results.


That's interesting. Andrew, do you mean bloat in the sense of final code size
(for instance, due to unrolling), right?

BTW (I just remembered that), on RISC the lack of optimization hurts way more
than the lack of optimization on CISC, so I recall that it puzzled me the fact
that turning on the optimization on x86_64 did not change much the scenario,
contrary to the conspicuous gains on on ppc64 when turning on the optimization.
I took me some time so to understand that the optimization flag was the culprit
(a much simpler case lucky), because I tried first to profile and optimize the
fdlibm code (after extracting it from JVM for detailed analysis) and only after
getting to a dead end I turned to look at simpler causes.

Are you checking the difference between -O2 and -O3 only on x86_64?


Best regards,
Gustavo



Re: RFR: 8210416: [linux] Poor StrictMath performance due to non-optimized compilation

2018-09-05 Thread Gustavo Romero

Hello,

On 09/05/2018 04:15 PM, joe darcy wrote:

Hello,


On 9/5/2018 6:12 AM, Severin Gehwolf wrote:

Hi,

Cross-posting this review-thread on core-libs-dev and build-dev as this
is a build change, but affects fdlibm which is core-libs.

With JDK-8170153 optimization for fdlibm code has been turned on for
ppc64 s390 and aarch64. This patch intends to turn it on on all arches
on Linux. I've not observed any precision issues. Is there a good
reason to not use -O3 -ffp-contract=off everywhere?

Bug:    https://bugs.openjdk.java.net/browse/JDK-8210416
webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8210416/webrev.01/

Testing: - java/lang/Math, java/lang/StrictMath tests (all pass).
  - Currently running through submit repo.

A simple micro benchmark from JDK-8170153[1] shows these numbers for
StrictMath:

Function  | before | after
--
sin   | 0m33.382s  | 0m18.731s
cos   | 0m31.562s  | 0m18.796s
tan   | 0m33.657s  | 0m21.093s
atan  | 0m5.714s   | 0m4.902s
log   | 0m6.212s   | 0m4.439s
log10 | 0m7.946s   | 0m5.543s
sqrt  | 0m0.481s   | 0m0.449s
cbrt  | 0m5.295s   | 0m5.214s
tanh  | 0m1.404s   | 0m1.307s
log1p | 0m6.457s   | 0m5.131s
IEEEremainder | 0m10.629s  | 0m6.048s
atan2 | 0m8.037s   | 0m5.668s
hypot | 0m2.171s   | 0m2.147s




Note that pow (JDK-8134795), hypot (JDK-7130085), cbrt (JDK-8136799), and exp 
(JDK-8139688), have been ported to Java as of JDK 9. The sqrt method is 
commonly handled as an intrinsic.

Testing that was not geared toward finding precision/rounding issues would be 
unlikely to find them.

I don't see the sources of the microbenchmark in JDK-8170153.


The microbench can be found on the first email I sent discussing that issue on 
Power.
Check it at the end, but there is nothing special in there:

http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-November/002738.html

There is also some discussing on the formal RFR:

http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-November/002751.html

Severin, the precision issues were quite conspicuous on Power when -O3 was 
turned on.
There are some discussions about it in the threads above IIRC. SAP helped to 
test the
change against additional tests, like TCK.

HTH.


Best regards,
Gustavo
 

Cheers,

-Joe





Re: RFC: Add new JCA provider to support hardware RNGs

2018-06-21 Thread Gustavo Romero

Hi Bernd,

Thanks for your comments.

On 06/20/2018 05:05 PM, Bernd Eckenfels wrote:

Just a FYI under Linux when you read from urandom the Linux kernel will always 
XOR with random bytes generated with x64 rdrand instruction 
(arch_get_random_lomg() - if supported). Since it is a XOR it does not have to 
trust the quality of this black box hardware implementation.


Yes, I know to some extent how the use of rdrand on Intel by Linux
generated debates in the past.



I would not implement a generator which does not have its own software 
whitening. (And most likely there is no need for one different than urandom on 
Linux). If you do implement whitening I would use a DRBG construction and no 
longer use a (low state) SHA1PRNG.


My main concern to use 'darn' on Power was to get some additional speed and
throughput on obtaining true random numbers (or on cases where /dev/random
- which blocks - was preferred to /dev/urandom). That concern seems valid
specially when the code is not JITed yet by the C2 compiler (so for
Interpreted and C1 Compiler). Anyway, the idea to have a separated provider
and that the use of the intrinsic is by no means enabled by default is to
let the user to decide if he/she wants to use that kind of generator. I
think OpenSSL disabled by default the use of rdrand similarly but let the
user to enable it if desired?

The idea of DRBG looks valid. I took SHA1PRNG as a fallback. But if DRBG
would be used for whitening that would need a seed from /dev/urandom?
(I'm trying to think the impact on performance here).


Best regards,
Gustavo


Gruss
Bernd

Gruss
Bernd
--
http://bernd.eckenfels.net

From: core-libs-dev  on behalf of Gustavo 
Romero 
Sent: Wednesday, June 20, 2018 2:59:47 AM
To: core-libs-dev; security-dev
Cc: vladimir.koz...@oracle.com; Doerr, Martin
Subject: RFC: Add new JCA provider to support hardware RNGs

Hi,

Please, could I get comments on the following change?

Since it's related to security, I would be glad if security
experts could also comment on that.

webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v6_rebased/

It introduces a way to get benefits from hardware instructions in userspace
that can delivery a random number and so be used to speed up and avoid
blocking in some SecureRandom methods, notably generateSeed() and
nextBytes(), without loosing the random number quality. Particularly on
Power, the new POWER9 CPUs introduced a hardware instruction called 'darn'
that can be used for that purpose.

The main idea is to add a new JCA provider called "HWTRNG" (if no better
name is suggested), adding new helper methods that can then be intrinsified
and that will be used  by generateSeed() and nextBytes(). In that sense,
this change also adds the proper mechanisms in the Interpreter,
C1 Compiler, and C2 compiler (for PPC64, but also paving the way for other
platforms) to intrinsify these helper methods that will be used in the end
by generateSeed() and nextBytes() methods. The added helpers are:

byte[] getRandomSequence0(int)
byte[] getRandomSequence1(byte[])
long validRandomLong(void)
long randomLong(void)

The helpers also take care of checking if the returned random number is
valid and attempt 10 times (as recommended by ISA) get a valid random
number before falling back to a software alternative (HWTRNG is based
mostly on JCA provider NativePRNG and so the fall back mechanism will use
the exactly same methods found in NativePRNG and hence behave similarly.
Nonetheless, in my experience such a hardware failures (which are the main
cause of a returned invalid random number) are quite rare: in my tests I
was never able to exhaust the HW RNG and force it to generate an invalid
random number).

The user, besides having to specify explicitly the use of a non-default
provider (HWTRNG), will have to unlock the VM experimental options to
effectively use the hardware RNG as an unique random source by passing
options "-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics".

On Power, for the Interpreter and C1 Compiler, besides avoiding the
blocking effect of a low entropy on /dev/random, I was able to get about
126 Mbps (3x higher than the version without 'darn') on generaSeed() and
nextBytes(). The maximum theoretical value on a POWER9 machine is usually
128 Mbps.

I'll address the details about the file headers (Copyright, dates, authors,
versions, etc) after I get some feedback about this change.


Thanks in advance.

Best regards,
Gustavo





RFC: Add new JCA provider to support hardware RNGs

2018-06-19 Thread Gustavo Romero

Sorry for resending it. I missed a few MLs.

--

Hi,

Please, could I get comments on the following change?

Since it's related to security, I would be glad if security
experts could also comment on that.

webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v6_rebased/

It introduces a way to get benefits from hardware instructions in userspace
that can delivery a random number and so be used to speed up and avoid
blocking in some SecureRandom methods, notably generateSeed() and
nextBytes(), without loosing the random number quality. Particularly on
Power, the new POWER9 CPUs introduced a hardware instruction called 'darn'
that can be used for that purpose.

The main idea is to add a new JCA provider called "HWTRNG" (if no better
name is suggested), adding new helper methods that can then be intrinsified
and that will be used  by generateSeed() and nextBytes(). In that sense,
this change also adds the proper mechanisms in the Interpreter,
C1 Compiler, and C2 compiler (for PPC64, but also paving the way for other
platforms) to intrinsify these helper methods that will be used in the end
by generateSeed() and nextBytes() methods. The added helpers are:

   byte[] getRandomSequence0(int)
   byte[] getRandomSequence1(byte[])
   long validRandomLong(void)
   long randomLong(void)

The helpers also take care of checking if the returned random number is
valid and attempt 10 times (as recommended by ISA) get a valid random
number before falling back to a software alternative (HWTRNG is based
mostly on JCA provider NativePRNG and so the fall back mechanism will use
the exactly same methods found in NativePRNG and hence behave similarly.
Nonetheless, in my experience such a hardware failures (which are the main
cause of a returned invalid random number) are quite rare: in my tests I
was never able to exhaust the HW RNG and force it to generate an invalid
random number).

The user, besides having to specify explicitly the use of a non-default
provider (HWTRNG), will have to unlock the VM experimental options to
effectively use the hardware RNG as an unique random source by passing
options "-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics".

On Power, for the Interpreter and C1 Compiler, besides avoiding the
blocking effect of a low entropy on /dev/random, I was able to get about
126 Mbps (3x higher than the version without 'darn') on generaSeed() and
nextBytes(). The maximum theoretical value on a POWER9 machine is usually
128 Mbps.

I'll address the details about the file headers (Copyright, dates, authors,
versions, etc) after I get some feedback about this change.


Thanks in advance.

Best regards,
Gustavo



RFC: Add new JCA provider to support hardware RNGs

2018-06-19 Thread Gustavo Romero

Hi,

Please, could I get comments on the following change?

Since it's related to security, I would be glad if security
experts could also comment on that.

webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v6_rebased/

It introduces a way to get benefits from hardware instructions in userspace
that can delivery a random number and so be used to speed up and avoid
blocking in some SecureRandom methods, notably generateSeed() and
nextBytes(), without loosing the random number quality. Particularly on
Power, the new POWER9 CPUs introduced a hardware instruction called 'darn'
that can be used for that purpose.

The main idea is to add a new JCA provider called "HWTRNG" (if no better
name is suggested), adding new helper methods that can then be intrinsified
and that will be used  by generateSeed() and nextBytes(). In that sense,
this change also adds the proper mechanisms in the Interpreter,
C1 Compiler, and C2 compiler (for PPC64, but also paving the way for other
platforms) to intrinsify these helper methods that will be used in the end
by generateSeed() and nextBytes() methods. The added helpers are:

  byte[] getRandomSequence0(int)
  byte[] getRandomSequence1(byte[])
  long validRandomLong(void)
  long randomLong(void)

The helpers also take care of checking if the returned random number is
valid and attempt 10 times (as recommended by ISA) get a valid random
number before falling back to a software alternative (HWTRNG is based
mostly on JCA provider NativePRNG and so the fall back mechanism will use
the exactly same methods found in NativePRNG and hence behave similarly.
Nonetheless, in my experience such a hardware failures (which are the main
cause of a returned invalid random number) are quite rare: in my tests I
was never able to exhaust the HW RNG and force it to generate an invalid
random number).

The user, besides having to specify explicitly the use of a non-default
provider (HWTRNG), will have to unlock the VM experimental options to
effectively use the hardware RNG as an unique random source by passing
options "-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics".

On Power, for the Interpreter and C1 Compiler, besides avoiding the
blocking effect of a low entropy on /dev/random, I was able to get about
126 Mbps (3x higher than the version without 'darn') on generaSeed() and
nextBytes(). The maximum theoretical value on a POWER9 machine is usually
128 Mbps.

I'll address the details about the file headers (Copyright, dates, authors,
versions, etc) after I get some feedback about this change.


Thanks in advance.

Best regards,
Gustavo



Re: [aarch64-port-dev ] RFR(s) PPC64/s390x/aarch64: Poor StrictMath performance due to non-optimized compilation

2016-11-29 Thread Gustavo Romero
Hi Erik, Volker, Andrew

On 29-11-2016 16:15, Andrew Haley wrote:
> On 29/11/16 18:06, Volker Simonis wrote:
>> @Andrew: are you fine with Gustavos latest version of the change?
> 
> Sure.  The StrictMath versions all seem to give the same results.
> 
> Andrew.
> 

Thanks for reviewing the change!

I changed the "FC Extension Request" status to "reviewed".


Regards,
Gustavo



Re: [aarch64-port-dev ] RFR(s) PPC64/s390x/aarch64: Poor StrictMath performance due to non-optimized compilation

2016-11-29 Thread Gustavo Romero
Hi Andrew,

On 29-11-2016 13:35, Andrew Haley wrote:
> On 29/11/16 09:41, Volker Simonis wrote:
>> Thanks Gustavo,
>>
>> the change looks good.
>>
>> So now we're just waiting for another review from somebody of the aarch64 
>> folks.
>> Once we have that and the fc-request is approved I'll push the changes.
> 
> One thing I don't understand:
> 
> cos 0.17098435541865692 1m7.433s 0.1709843554185943 0m56.678s
> sin 1.7136493465700289 1m10.654s 1.7136493465700542 0m57.114s
> 
> Do you know what causes the lower digits to be different?  Is
> it that Math and StrictMath use different algorithms, not just
> different optimization levels?

I don't know exactly what's the root cause for that difference (in the result).
The difference is not present on x64, however on PPC64 even with -O0 (as it is
by now) that difference exists.

Math methods are intrisified, but StricMath are not. But I understand that Math
and StrictMath share the fdlibm code since I already changed some code in fdlibm
that reflected both on Math and StrictMath, so it's not clear to me where the
Math relaxation occurs on PPC64 (given that such a relaxation is allowed [1]).

For sure others much more experienced than I can comment about difference.


Regards,
Gustavo

[1] https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html



RFR(s) PPC64/s390x/aarch64: Poor StrictMath performance due to non-optimized compilation

2016-11-28 Thread Gustavo Romero
Hi all,

I'm re-sending due to JDK title update to include s390x and aarch64 archs.

Could the following webrev be reviewed, please?

webrev 1/2: http://cr.openjdk.java.net/~gromero/8170153/v2/
webrev 2/2: http://cr.openjdk.java.net/~gromero/8170153/v2/jdk/
bug:https://bugs.openjdk.java.net/browse/JDK-8170153

Thank you.


Regards,
Gustavo



Re: RFR(s) 8170153: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-28 Thread Gustavo Romero
Hi Volker,

Sorry for not replying earlier, it was day-off on Friday here...

On 25-11-2016 11:32, Volker Simonis wrote:
> Hi Gustavo,
> 
> we've realized that we have exactly the same problem on Linux/s390 so
> I hope you don't mind that I've updated the bug and the webrev to also
> include the fix for Linux/s390:
> 
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170153.top/
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170153.jdk/
> https://bugs.openjdk.java.net/browse/JDK-8170153
> 
> The top-level change stays the same (I've only added the current
> reviewers) and for the jdk change I've just added Linux/s390 as
> another platform which can compile fdlibm with HIGH optimization.

Actually, it's really cool to know that an analysis on PPC64 contributed
also to the s390 arch! :)

Thanks for providing the updated webrevs.


Regards,
Gustavo



Re: RFR(s) 8170153: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-23 Thread Gustavo Romero
Hi Erik,

On 23-11-2016 12:29, Erik Joelsson wrote:
> Build changes look ok.
> 
> In CoreLibraries.gmk, I think it would have been ok to keep the conditional 
> checking (OPENJDK_TARGET_CPU_ARCH, ppc), but this certainly works too.

Thanks a lot for reviewing the change.


Regards,
Gustavo



Re: RFR(s) 8170153: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-23 Thread Gustavo Romero
Hi Martin,

On 23-11-2016 12:38, Doerr, Martin wrote:
> Hi Gustavo,
> 
> thanks for providing the webrevs.
> 
> I have ran the StrictMath jck tests which fail when building with -O3 and 
> without -ffp-contract=off:
> FailedTests:
> api/java_lang/StrictMath/desc.html#acos 
> javasoft.sqe.tests.api.java.lang.StrictMath.acos_test
> api/java_lang/StrictMath/desc.html#asin 
> javasoft.sqe.tests.api.java.lang.StrictMath.asin_test
> api/java_lang/StrictMath/desc.html#atan 
> javasoft.sqe.tests.api.java.lang.StrictMath.atan_test
> api/java_lang/StrictMath/desc.html#atan2 
> javasoft.sqe.tests.api.java.lang.StrictMath.atan2_test
> api/java_lang/StrictMath/desc.html#cos 
> javasoft.sqe.tests.api.java.lang.StrictMath.cos_test
> api/java_lang/StrictMath/desc.html#exp 
> javasoft.sqe.tests.api.java.lang.StrictMath.exp_test
> api/java_lang/StrictMath/desc.html#log 
> javasoft.sqe.tests.api.java.lang.StrictMath.log_test
> api/java_lang/StrictMath/desc.html#sin 
> javasoft.sqe.tests.api.java.lang.StrictMath.sin_test
> api/java_lang/StrictMath/desc.html#tan 
> javasoft.sqe.tests.api.java.lang.StrictMath.tan_test
> api/java_lang/StrictMath/index.html#expm1 
> javasoft.sqe.tests.api.java.lang.StrictMath.expm1Tests -TestCaseID ALL
> api/java_lang/StrictMath/index.html#log10 
> javasoft.sqe.tests.api.java.lang.StrictMath.log10Tests -TestCaseID ALL
> api/java_lang/StrictMath/index.html#log1p 
> javasoft.sqe.tests.api.java.lang.StrictMath.log1pTests -TestCaseID ALL
> 
> All of them have passed when building with -O3 and -ffp-contract=off (on 
> linuxppc64le).

Thank you very much for running the additional StrictMath jck tests against the 
change!


Best regards,
Gustavo



Re: RFR(s) 8170153: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-23 Thread Gustavo Romero
Hi Volker,

On 23-11-2016 12:05, Volker Simonis wrote:
> thanks a lot for tracking this down!

Happy to contribute :)


> The change looks good and I a can sponsor it once you get another
> review from the build group and the FC Extension Request was approved.

Thanks a lot for sponsoring it!


> In general I'd advise to sign the OCTLA [1] to get access to the Java
> SE TCK [2] as this contains quite a lot of additional conformance
> tests which can be quite valuable for changes like this.
> 
> Regards,
> Volker
> 
> [1] http://openjdk.java.net/legal/octla-java-se-8.pdf
> [2] http://openjdk.java.net/groups/conformance/JckAccess/

Right. I'll check the documentation and find a way to get access to the TCK.


Best regards,
Gustavo



RFR(s) 8170153: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-21 Thread Gustavo Romero
Hi,

Could the following change be reviewed, please?

webrev 1/2: http://cr.openjdk.java.net/~gromero/8170153/
webrev 2/2: http://cr.openjdk.java.net/~gromero/8170153/jdk/
bug:https://bugs.openjdk.java.net/browse/JDK-8170153

It enables fdlibm optimization on Linux PPC64 LE & BE and hence speeds up the
StrictMath methods (in some cases up to 3x) on that platform.

On PPC64 fdlibm optimization can be done without precision issues if
floating-point expression contraction is disable, i.e. if the compiler does not
use floating-point multiply-add (FMA). For further details please refer to gcc
bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78386

No regression was observed on Math and StrictMath tests:

Passed: java/lang/Math/AbsPositiveZero.java
Passed: java/lang/Math/Atan2Tests.java
Passed: java/lang/Math/CeilAndFloorTests.java
Passed: java/lang/Math/CubeRootTests.java
Passed: java/lang/Math/DivModTests.java
Passed: java/lang/Math/ExactArithTests.java
Passed: java/lang/Math/Expm1Tests.java
Passed: java/lang/Math/FusedMultiplyAddTests.java
Passed: java/lang/Math/HyperbolicTests.java
Passed: java/lang/Math/HypotTests.java
Passed: java/lang/Math/IeeeRecommendedTests.java
Passed: java/lang/Math/Log10Tests.java
Passed: java/lang/Math/Log1pTests.java
Passed: java/lang/Math/MinMax.java
Passed: java/lang/Math/MultiplicationTests.java
Passed: java/lang/Math/PowTests.java
Passed: java/lang/Math/Rint.java
Passed: java/lang/Math/RoundTests.java
Passed: java/lang/Math/SinCosCornerCasesTests.java
Passed: java/lang/Math/TanTests.java
Passed: java/lang/Math/WorstCaseTests.java
Test results: passed: 21

Passed: java/lang/StrictMath/CubeRootTests.java
Passed: java/lang/StrictMath/ExactArithTests.java
Passed: java/lang/StrictMath/Expm1Tests.java
Passed: java/lang/StrictMath/HyperbolicTests.java
Passed: java/lang/StrictMath/HypotTests.java
Passed: java/lang/StrictMath/Log10Tests.java
Passed: java/lang/StrictMath/Log1pTests.java
Passed: java/lang/StrictMath/PowTests.java
Test results: passed: 8

and also on the following hotspot tests:

Passed: compiler/intrinsics/mathexact/sanity/AddExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/AddExactLongTest.java
Passed: compiler/intrinsics/mathexact/sanity/DecrementExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/DecrementExactLongTest.java
Passed: compiler/intrinsics/mathexact/sanity/IncrementExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/IncrementExactLongTest.java
Passed: compiler/intrinsics/mathexact/sanity/MultiplyExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/MultiplyExactLongTest.java
Passed: compiler/intrinsics/mathexact/sanity/NegateExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/NegateExactLongTest.java
Passed: compiler/intrinsics/mathexact/sanity/SubtractExactIntTest.java
Passed: compiler/intrinsics/mathexact/sanity/SubtractExactLongTest.java
Passed: compiler/intrinsics/mathexact/AddExactICondTest.java
Passed: compiler/intrinsics/mathexact/AddExactIConstantTest.java
Passed: compiler/intrinsics/mathexact/AddExactILoadTest.java
Passed: compiler/intrinsics/mathexact/AddExactILoopDependentTest.java
Passed: compiler/intrinsics/mathexact/AddExactINonConstantTest.java
Passed: compiler/intrinsics/mathexact/AddExactIRepeatTest.java
Passed: compiler/intrinsics/mathexact/AddExactLConstantTest.java
Passed: compiler/intrinsics/mathexact/AddExactLNonConstantTest.java
Passed: compiler/intrinsics/mathexact/CompareTest.java
Passed: compiler/intrinsics/mathexact/DecExactITest.java
Passed: compiler/intrinsics/mathexact/DecExactLTest.java
Passed: compiler/intrinsics/mathexact/GVNTest.java
Passed: compiler/intrinsics/mathexact/IncExactITest.java
Passed: compiler/intrinsics/mathexact/IncExactLTest.java
Passed: compiler/intrinsics/mathexact/MulExactICondTest.java
Passed: compiler/intrinsics/mathexact/MulExactIConstantTest.java
Passed: compiler/intrinsics/mathexact/MulExactILoadTest.java
Passed: compiler/intrinsics/mathexact/MulExactILoopDependentTest.java
Passed: compiler/intrinsics/mathexact/MulExactINonConstantTest.java
Passed: compiler/intrinsics/mathexact/MulExactIRepeatTest.java
Passed: compiler/intrinsics/mathexact/MulExactLConstantTest.java
Passed: compiler/intrinsics/mathexact/MulExactLNonConstantTest.java
Passed: compiler/intrinsics/mathexact/NegExactIConstantTest.java
Passed: compiler/intrinsics/mathexact/NegExactILoadTest.java
Passed: compiler/intrinsics/mathexact/NegExactILoopDependentTest.java
Passed: compiler/intrinsics/mathexact/NegExactINonConstantTest.java
Passed: compiler/intrinsics/mathexact/NegExactLConstantTest.java
Passed: compiler/intrinsics/mathexact/NegExactLNonConstantTest.java
Passed: compiler/intrinsics/mathexact/NestedMathExactTest.java
Passed: compiler/intrinsics/mathexact/SplitThruPhiTest.java
Passed: compiler/intrinsics/mathexact/SubExactICondTest.java
Passed: compiler/intrinsics/mathexact/SubExactIConstantTest.java
Passed: compiler/intrinsics/mathexact/SubExactILoadTest.java
Passed: c

Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-21 Thread Gustavo Romero
Hi Derek,

On 17-11-2016 20:47, White, Derek wrote:
> Hi Joe,
> 
> Although neither a floating point expert (as I think I've proven to you over 
> the years), or a gcc expert, I checked with our in-house gcc expert and got 
> this following answer:
> 
>   "Yes using -fno-strict-aliasing fixes the issues.  Also there are many 
> forks of fdlibm which has this fixed including the code inside glibc. "

I've tried on PPC64 -O3 and -fno-strict-aliasing but it didn't work. Disabling
the FMA fixed the issue although.

Do you know if the gap between Math and StrictMath is also huge on aarch64?

Thank you.


Regards,
Gustavo:



Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-21 Thread Gustavo Romero
Hi Chris,

On 17-11-2016 19:48, Chris Plummer wrote:
>> The fdlibm code relies on aliasing a two-element array of int with a double 
>> to do bit-level reads and writes of floating-point values. As I understand 
>> it, the C spec allows compilers to assume
>> values of different types don't overlap in memory. The compilation 
>> environment has to be configured in such a way that the C compiler disables 
>> code generation and optimization techniques that would
>> run afoul of these fdlibm coding practices.
> This is the strict aliasing issue right? It's a long standing problem with 
> fdlibm that kept getting worse as gcc got smarter. IIRC, compiling with 
> -fno-strict-aliasing fixes it, but it's been more
> than 12 years since I last dealt with fdlibm and compiler aliasing issues.

I've tested with -O3 and -fno-strict-aliasing as you suggested but it did not
fix the fp precision issue on PPC64.

After finding that -fno-expensive-optimizations solved the problem, we narrowed
down the problem to the FMA: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78386

Thank you.


Regards,
Gustavo



Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-21 Thread Gustavo Romero
Hi Joe,

On 17-11-2016 19:33, joe darcy wrote:
 Currently, optimization for building fdlibm is disabled, except for the
 "solaris" OS target [1].
>>> The reason for that is because historically the Solaris compilers have had 
>>> sufficient discipline and control regarding floating-point semantics and 
>>> compiler optimizations to still implement the
>>> Java-mandated results when optimization was enabled. The gcc family of 
>>> compilers, for example, has lacked such discipline.
>> oh, I see. Thanks for clarifying that. I was exactly wondering why fdlibm
>> optimization is off even for x86_x64 as it, AFAICS regarding gcc 5 only, does
>> not affect the precision, even if setting -O3 does not improve the 
>> performance
>> as much as on PPC64.
> 
> The fdlibm code relies on aliasing a two-element array of int with a double 
> to do bit-level reads and writes of floating-point values. As I understand 
> it, the C spec allows compilers to assume values
> of different types don't overlap in memory. The compilation environment has 
> to be configured in such a way that the C compiler disables code generation 
> and optimization techniques that would run afoul
> of these fdlibm coding practices.

On discussing with the Power toolchain folks we narrowed down the issue on PPC64
to the FMA. -fno-strict-aliasing has no effect and when used with an aggressive
optimization does not solve the issue on precision. Thus -ffp-contract=off is
the best options we have by now to optimize the fdlibm on PPC64.


>>> Methods in the Math class, such as pow, are often intrinsified and use a 
>>> different algorithm so a straight performance comparison may not be as fair 
>>> or meaningful in those cases.
>> I agree. It's just that the issue on StrictMath methods was first noted due 
>> to
>> that huge gap (Math vs StrictMath) on PPC64, which is not prominent on x64.
> 
> Depending on how Math.{sin, cos} is implemented on PPC64, compiling the 
> fdlibm sin/cos with more aggressive optimizations should not be expected to 
> close the performance gap. In particular, if
> Math.{sin, cos} is an intrinsic on PPC64 (I haven't checked the sources) that 
> used platform-specific feature (say fused multiply add instructions) then 
> just compiling fdlibm more aggressively wouldn't
> necessarily make up that gap.

In our case (PPC64) it does close the gap. Non-optimized code will suffer a lot,
for instance, from load-hit-store issues. Contrary to what happens on PPC64, the
gap on x64 seems to be quite small as you said.


> 
> To allow cross-platform and cross-release reproducibility, StrictMath is 
> specified to use the particular fdlibm algorithms, which precludes using 
> better algorithms developed more recently. If we were
> to start with a clean slate today, to get such reproducibility we would 
> specify correctly-rounded behavior of all those methods, but such an approach 
> was much less tractable technical 20+ years ago
> without benefit of the research that was been done in the interim, such as 
> the work of Prof. Muller and associates: 
> https://lipforge.ens-lyon.fr/projects/crlibm/.
> 
>>
>>
>>> Accumulating the the results of the functions and comparisons the sums is 
>>> not a sufficiently robust way of checking to see if the optimized versions 
>>> are indeed equivalent to the non-optimized ones.
>>> The specification of StrictMath requires a particular result for each set 
>>> of floating-point arguments and sums get round-away low-order bits that 
>>> differ.
>> That's really good point, thanks for letting me know about that. I'll 
>> re-test my
>> change under that perspective.
>>
>>
>>> Running the JDK math library regression tests and corresponding JCK tests 
>>> is recommended for work in this area.
>> Got it. By "the JDK math library regression tests" you mean exactly which 
>> test
>> suite? the jtreg tests?
> 
> Specifically, the regression tests under test/java/lang/Math and 
> test/java/lang/StrictMath in the jdk repository. There are some other math 
> library tests in the hotspot repo, but I don't know where
> they are offhand.
> 
> A note on methodologies, when I've been writing test for my port I've tried 
> to include test cases that exercise all the branches point in the code. Due 
> to the large input space (~2^64 for a
> single-argument method), random sampling alone is an inefficient way to try 
> to find differences in behavior.
>> For testing against JCK/TCK I'll need some help on that.
>>
> 
> I believe the JCK/TCK does have additional testcases relevant here.
> 
> HTH; thanks,
> 
> -Joe
> 

Thank you very much for the valuable comments.

I'll send a webrev accordingly for review.

I filed a bug: https://bugs.openjdk.java.net/browse/JDK-8170153


Best regards,
Gustavo



Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-17 Thread Gustavo Romero
Hi Joe,

Thanks a lot for your valuable comments.

On 17-11-2016 15:35, joe darcy wrote:
>> Currently, optimization for building fdlibm is disabled, except for the
>> "solaris" OS target [1].
> 
> The reason for that is because historically the Solaris compilers have had 
> sufficient discipline and control regarding floating-point semantics and 
> compiler optimizations to still implement the
> Java-mandated results when optimization was enabled. The gcc family of 
> compilers, for example, has lacked such discipline.

oh, I see. Thanks for clarifying that. I was exactly wondering why fdlibm
optimization is off even for x86_x64 as it, AFAICS regarding gcc 5 only, does
not affect the precision, even if setting -O3 does not improve the performance
as much as on PPC64.


>> As a consequence on PPC64 (Linux) StrictMath methods like, but not limited 
>> to,
>> sin(), cos(), and tan() perform verify poor in comparison to the same methods
>> in Math class [2]:
> 
> If you are doing your work against JDK 9, note that the pow, hypot, and cbrt 
> fdlibm methods required by StrictMath have been ported to Java (JDK-8134780: 
> Port fdlibm to Java). I have intentions to
> port the remaining methods to Java, but it is unclear whether or not this 
> will occur for JDK 9.

Yes, I'm doing my work against 9. So is there any problem if I proceed with my
change? I understand that there is no conflict as JDK-8134780 progresses and
replaces the StrictMath methods by their counterparts in Java. Please, advice.

Is it intended to downport JDK-8134780 to 8?


> Methods in the Math class, such as pow, are often intrinsified and use a 
> different algorithm so a straight performance comparison may not be as fair 
> or meaningful in those cases.

I agree. It's just that the issue on StrictMath methods was first noted due to
that huge gap (Math vs StrictMath) on PPC64, which is not prominent on x64.


> Accumulating the the results of the functions and comparisons the sums is not 
> a sufficiently robust way of checking to see if the optimized versions are 
> indeed equivalent to the non-optimized ones.
> The specification of StrictMath requires a particular result for each set of 
> floating-point arguments and sums get round-away low-order bits that differ.

That's really good point, thanks for letting me know about that. I'll re-test my
change under that perspective.


> Running the JDK math library regression tests and corresponding JCK tests is 
> recommended for work in this area.

Got it. By "the JDK math library regression tests" you mean exactly which test
suite? the jtreg tests?

For testing against JCK/TCK I'll need some help on that.

Thank you very much.


Regards,
Gustavo



Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-17 Thread Gustavo Romero
Hi Erik,

On 17-11-2016 07:17, Erik Joelsson wrote:
> Overall this looks reasonable to me. However, if we want to introduce a new 
> possible tuple for specifying compilation flags to SetupNativeCompilation, we 
> (the build team) would prefer if we used
> OPENJDK_TARGET_CPU instead of OPENJDK_TARGET_CPU_ARCH.

Got it. Thanks a lot for that info. I'll take that into account.

Regards,
Gustavo



Re: PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-17 Thread Gustavo Romero
Hi David,

On 17-11-2016 00:31, David Holmes wrote:
> Adding in build-dev as they need to scrutinize all build changes.

Thanks a lot.

Regards,
Gustavo



PPC64: Poor StrictMath performance due to non-optimized compilation

2016-11-16 Thread Gustavo Romero
Hi,

Currently, optimization for building fdlibm is disabled, except for the
"solaris" OS target [1].

As a consequence on PPC64 (Linux) StrictMath methods like, but not limited to,
sin(), cos(), and tan() perform verify poor in comparison to the same methods
in Math class [2]:

   Math  StrictMath
  =  ==
sin   0m29.984s   1m41.184s
cos   0m30.031s   1m41.200s
tan   0m31.772s   1m46.976s
asin   0m4.577s0m4.543s
acos   0m4.539s0m4.525s
atan  0m12.929s   0m12.896s
exp0m1.071s0m4.570s
log0m3.272s   0m14.239s
log10  0m4.362s   0m20.236s
sqrt   0m0.913s0m0.981s
cbrt  0m10.786s   0m10.808s
sinh   0m4.438s0m4.433s
cosh   0m4.496s0m4.478s
tanh   0m3.360s0m3.353s
expm1  0m4.076s0m4.094s
log1p  0m13.518s  0m13.527s
IEEEremainder  0m38.803s  0m38.909s
atan2  0m20.100s  0m20.057s
pow0m14.096s  0m19.938s
hypot  0m5.136s0m5.122s


Switching on the O3 optimization can damage precision of those methods,
nonetheless it's possible to avoid that side effect and yet get huge benefits of
the -O3 optimization on PPC64 if -fno-expensive-optimizations is passed in
addition to the -O3  optimization flag.

In that sense the following change is proposed to resolve the issue:

diff -r 81eb4bd34611 make/lib/CoreLibraries.gmk
--- a/make/lib/CoreLibraries.gmkWed Nov 09 13:37:19 2016 +0100
+++ b/make/lib/CoreLibraries.gmkWed Nov 16 19:11:11 2016 -0500
@@ -33,10 +33,16 @@
 # libfdlibm is statically linked with libjava below and not delivered into the
 # product on its own.

-BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+BUILD_LIBFDLIBM_OPTIMIZATION := NONE

-ifneq ($(OPENJDK_TARGET_OS), solaris)
-  BUILD_LIBFDLIBM_OPTIMIZATION := NONE
+ifeq ($(OPENJDK_TARGET_OS), solaris)
+  BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+endif
+
+ifeq ($(OPENJDK_TARGET_OS), linux)
+  ifeq ($(OPENJDK_TARGET_CPU_ARCH), ppc)
+BUILD_LIBFDLIBM_OPTIMIZATION := HIGH
+  endif
 endif

 LIBFDLIBM_SRC := $(JDK_TOPDIR)/src/java.base/share/native/libfdlibm
@@ -51,6 +57,7 @@
   CFLAGS := $(CFLAGS_JDKLIB) $(LIBFDLIBM_CFLAGS), \
   CFLAGS_windows_debug := -DLOGGING, \
   CFLAGS_aix := -qfloat=nomaf, \
+  CFLAGS_linux_ppc := -fno-expensive-optimizations, \
   DISABLED_WARNINGS_gcc := sign-compare, \
   DISABLED_WARNINGS_microsoft := 4146 4244 4018, \
   ARFLAGS := $(ARFLAGS), \


diff -r 2a1f97c0ad3d make/common/NativeCompilation.gmk
--- a/make/common/NativeCompilation.gmk Wed Nov 09 15:32:39 2016 +0100
+++ b/make/common/NativeCompilation.gmk Wed Nov 16 19:08:06 2016 -0500
@@ -569,16 +569,19 @@
   $1_ALL_OBJS := $$(sort $$($1_EXPECTED_OBJS) $$($1_EXTRA_OBJECT_FILES))

   # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent 
variables for CFLAGS.
-  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) 
$$($1_CFLAGS_$(OPENJDK_TARGET_OS))
+  $1_EXTRA_CFLAGS:=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)) 
$$($1_CFLAGS_$(OPENJDK_TARGET_OS)) \
+  $$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH))
   ifneq ($(DEBUG_LEVEL),release)
 # Pickup extra debug dependent variables for CFLAGS
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_debug)
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_debug)
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_debug)
+
$1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_debug)
   else
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_release)
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS_TYPE)_release)
 $1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_release)
+
$1_EXTRA_CFLAGS+=$$($1_CFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU_ARCH)_release)
   endif

   # Pickup extra OPENJDK_TARGET_OS_TYPE and/or OPENJDK_TARGET_OS dependent 
variables for CXXFLAGS.


After enabling the optimization it's possible to again up to 3x on performance
regarding the aforementioned methods without losing precision:

   StrictMath, original  StrictMath, 
optimized
      

sin1.7136493465700542 1m41.184s  1.7136493465700542  
0m33.895s
cos0.1709843554185943 1m41.200s  0.1709843554185943  
0m33.884s
tan -5.5500322522995315E7 1m46.976s   -5.5500322522995315E7  
0m36.461s
asin  NaN  0m4.543s NaN   
0m3.175s
acos  NaN  0m4.525s NaN   
0m3.211s
atan 1.5707961389886132E8 0m12.896s1.5707961389886132E8   
0m7.100s
exp  Infinity  0m4.570sInfinity   
0m3.187s
log  1.7420680845245087E9 0m14.239s1.7420680845245087E9   
0m7.170s
log10 7.565705562087342E8 0m20.236s 7.565705562087342E8   
0m9.610