Re: RFR: 8248238: Implementation of JEP: Windows AArch64 Support

2020-09-19 Thread Andrew Haley
On 18/09/2020 11:14, Monica Beckwith wrote:
> This is a continuation of 
> https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-August/009566.html

The diffs in assembler_aarch64.cpp are mostly spurious. Please try this.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
diff --git a/src/hotspot/cpu/aarch64/aarch64-asmtest.py b/src/hotspot/cpu/aarch64/aarch64-asmtest.py
index f5a5c6b5aee..43bac8e8142 100644
--- a/src/hotspot/cpu/aarch64/aarch64-asmtest.py
+++ b/src/hotspot/cpu/aarch64/aarch64-asmtest.py
@@ -12,8 +12,11 @@ class Operand(object):
 class Register(Operand):
 
     def generate(self):
-        self.number = random.randint(0, 30)
-        return self
+        while True:
+            self.number = random.randint(0, 30)
+            # r18 is used for TLS on Windows ABI.
+            if self.number != 18:
+                return self
 
     def astr(self, prefix):
         return prefix + str(self.number)
@@ -36,8 +39,10 @@ class GeneralRegister(Register):
 class GeneralRegisterOrZr(Register):
 
     def generate(self):
-        self.number = random.randint(0, 31)
-        return self
+        while True:
+            self.number = random.randint(0, 31)
+            if self.number != 18:
+                return self
 
     def astr(self, prefix = ""):
         if (self.number == 31):
@@ -53,8 +58,10 @@ class GeneralRegisterOrZr(Register):
 
 class GeneralRegisterOrSp(Register):
     def generate(self):
-        self.number = random.randint(0, 31)
-        return self
+        while True:
+            self.number = random.randint(0, 31)
+            if self.number != 18:
+                return self
 
     def astr(self, prefix = ""):
         if (self.number == 31):
@@ -1331,7 +1338,7 @@ generate(SpecialCases, [["ccmn",   "__ ccmn(zr, zr, 3u, Assembler::LE);",
 ["st1w",   "__ sve_st1w(z0, __ S, p1, Address(r0, 7));", "st1w\t{z0.s}, p1, [x0, #7, MUL VL]"],
 ["st1b",   "__ sve_st1b(z0, __ B, p2, Address(sp, r1));","st1b\t{z0.b}, p2, [sp, x1]"],
 ["st1h",   "__ sve_st1h(z0, __ H, p3, Address(sp, r8));","st1h\t{z0.h}, p3, [sp, x8, LSL #1]"],
-["st1d",   "__ sve_st1d(z0, __ D, p4, Address(r0, r18));",   "st1d\t{z0.d}, p4, [x0, x18, LSL #3]"],
+["st1d",   "__ sve_st1d(z0, __ D, p4, Address(r0, r17));",   "st1d\t{z0.d}, p4, [x0, x17, LSL #3]"],
 ["ldr","__ sve_ldr(z0, Address(sp));",   "ldr\tz0, [sp]"],
 ["ldr","__ sve_ldr(z31, Address(sp, -256));",   "ldr\tz31, [sp, #-256, MUL VL]"],
 ["str","__ sve_str(z8, Address(r8, 255));",   "str\tz8, [x8, #255, MUL VL]"],


Re: RFR: 8248238: Implementation of JEP: Windows AArch64 Support

2020-09-19 Thread Andrew Haley
On 18/09/2020 11:14, Monica Beckwith wrote:

> This is a continuation of
> https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-August/009566.html
>
> Changes since then:
> * We've improved the write barrier as suggested by Andrew [1]

It's still wrong, I'm afraid. This is not a full barrier:

+#define FULL_MEM_BARRIER atomic_thread_fence(std::memory_order_acq_rel);

it gives you only StoreStore|LoadStore|LoadLoad ordering, but a full
barrier needs StoreLoad as well. You might well get the same DMB ISH
instruction either way, but unless you ask for a StoreLoad barrier
here it's theoretically possible for a compiler to reorder accesses so
that a processor sees its own stores before other processors do. (And
it's confusing for the reader too.)

Use:

+#define FULL_MEM_BARRIER atomic_thread_fence(std::memory_order_seq_cst);

See here:

https://en.cppreference.com/w/cpp/atomic/memory_order

memory_order_seq_cst "...plus a single total order exists in which all
threads observe all modifications in the same order (see
Sequentially-consistent ordering below)"
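
To make the difference concrete, here is a minimal sketch of the
classic store-buffering test, written as standalone C++ rather than
HotSpot code. The function names and the function-like
FULL_MEM_BARRIER() spelling are made up for illustration. Each thread
stores to its own flag, issues the fence, then loads the other
thread's flag. With seq_cst fences the C++ memory model forbids both
loads returning 0; with an acq_rel fence that outcome is permitted,
even if a particular compiler and CPU never produce it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for the macro under discussion (illustration only).
#define FULL_MEM_BARRIER() std::atomic_thread_fence(std::memory_order_seq_cst)

// One round of the store-buffering test.
// Returns true if both threads loaded 0, which seq_cst fences forbid.
bool both_loads_saw_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        FULL_MEM_BARRIER();  // needs StoreLoad: the store above, the load below
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        FULL_MEM_BARRIER();
        r2 = x.load(std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so r1 and r2 are safe to read.
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_loads_saw_zero()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

count_forbidden_outcomes(1000) should return 0 on any conforming
implementation. Changing the fence to std::memory_order_acq_rel makes
a nonzero count legal, although a run on AArch64 may still not show
one if the fence happens to compile to the same DMB ISH.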

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671