Re: RFR: 8224974: Implement JEP 352

Andrew Dinn Wed, 31 Jul 2019 04:30:39 -0700

Hi Aleksey,

I have an update regarding the change to the computation of the
CPU_FLUSH flag. After posting the new webrev I built a debug version
with the change that tests the clflush bit on x86_64. It crashed when
the assert is reached in VM_Version::supports_cpuflush:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (src/hotspot/cpu/x86/vm_version_x86.hpp:978),
pid=29597, tid=29602
#  assert((_features & ((uint64_t)(0x20000000000ULL))) != 0) failed:
clflush should be available

So, it seems the clflush bit is not set on my x86_64 box even though I
am able to execute clflush quite happily.

  "Toto, it looks like we are no longer in Antarctica."

So, I will revert this change (in the next webrev).

regards,


Andrew Dinn
-----------

On 31/07/2019 11:55, Andrew Dinn wrote:
> Hi Aleksey
> 
> On 30/07/2019 17:00, Aleksey Shipilev wrote:
>> On 7/30/19 5:04 PM, Andrew Dinn wrote:
>>> JEP 352 has now been targeted for inclusion in JDK14. The latest webrev
>>> for the implementation JIRA has been rebased to apply to the current
>>> tree. Is it now ok to push this change set?
>>>
>>> JIRA:   https://bugs.openjdk.java.net/browse/JDK-8224974
>>> webrev: http://cr.openjdk.java.net/~adinn/8224974/webrev.09
>> Is it okay to have a late code review? Here it goes:
> 
> Sure, welcome to the party even if most of the booze has been drunk :-)
> 
> Thank you for a very thorough review. Comments (mostly in agreement) are
> inline below. They include a few refusals and a walk past some of the
> jumps pending confirmation they truly form part of the obstacle course.
> 
> A new webrev against the latest jdk/jdk dev tree including all
> uncontested changes is here:
> 
>   http://cr.openjdk.java.net/~adinn/8224974/webrev.10/
> 
>> === General:
>>
>> So if pre wbsync is no-op, why do we need to handle it everywhere? We seem 
>> to be falling through all
>> the way to the stub to do nothing there, maybe we should instead cut much 
>> earlier, e.g. when
>> inlining Unsafe.writeBackPresync0? Would it be better to not emit 
>> CacheWBPreSync at all?
> 
> Ah, I see you brought your own bottle ;-)
> 
> The pre sync is definitely not needed at present. However, I put it
> there because I didn't know for sure if some future port of this
> capability (e.g. to ppc) might need to sync prior writes before writing
> back cache lines. [Indeed, the old Intel documentation stated that
> pre-sync was needed on x86 for clflush to be safe but it is definitely
> not needed.]
> 
> Anyway, folding out the pre-sync as you suggest would mean taking what
> is still potentially an architecture-specific decision in generic code.
> That may well prove to be a redundant precaution but it also costs
> little -- every affected /compile/ will last a tad longer and be a tad
> fatter in its memory footprint. If you think change is important I will
> happily oblige.
> 
>> === General:
>>
>> IIRC, notproduct and PRODUCT defines are legacy, and should be avoided? 
>> develop and ASSERT must be
>> the substitutes for this. See some discussion here: 
>> https://bugs.openjdk.java.net/browse/JDK-8183287
> 
> Ok done.
> 
> I changed #ifndef PRODUCT to #ifdef ASSERT in library_call.cpp and
> unsafe.cpp. I also redefined global flag TraceMemoryWriteback using
> develop rather than notproduct.
> 
> I also removed the AArch64 /product/ compiler option UsePOCForPOP. This
> allowed the JIT to use a memory writeback instruction with weakened
> semantics if the full writeback is not available in a product build.
> product was chosen because this was needed to establish a benchmark for
> writeback on existing kit when using a pseudo-NVRAM memory device based
> on volatile memory. The point was to be able to do an apples to apples
> comparison with writeback to a disk device. Obviously, it's not much
> use doing that test with a a non-product build. This is only important
> during development as a one-off test once the OS support for
> pseudo-NVRAM devices arrives in Linux AArch64 (which should happen well
> before new kit with the POP writeback arrives). I can still perform it
> using a custom build so I dropped this from the patch.
> 
>> === src/hotspot/cpu/aarch64/aarch64.ad
>>     src/hotspot/cpu/x86/x86.ad
>>
>> This should probably be just "!VM_Version...". Braces around the statement 
>> would not hurt either.
>>
>> 2196       if (VM_Version::supports_data_cache_line_flush() == false)
>> 2197         ret_value = false;
> 
> Done.
> 
>> === src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
>>     src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>
>> Braces style mismatch, should be on the same line, as in the rest of the 
>> file:
>>
>> 5837 void MacroAssembler::cache_wb(Address line)
>> 5838 {
>>
>> 5856 void MacroAssembler::cache_wbsync(bool is_pre)
>> 5857 {
> 
> Done.
> 
>> === src/hotspot/cpu/x86/assembler_x86.cpp
>>
>> It feels like these comments are redundant, especially L8630 and L8646 which 
>> mention magic values
>> "6" and "7", not present in the code:
>>
>> 8624   // instruction prefix is 0x66
>>
>> 8627   // opcode family is 0x0f 0xAE
>>
>> 8630   // extended opcode byte is 7 == rdi
>>
>> 8640   // instruction prefix is 0x66
>>
>> 8643   // opcode family is 0x0f 0xAE
>>
>> 8646   // extended opcode byte is 6 == rsi
> 
> Well, I think the question of redundancy depends on how you look at it.
> The comments are not there to explain that these values are the ones
> being passed to the emit functions -- that is pretty obvious. They are
> there to explain what those values mean i.e. to explain what elements of
> the fully assembled instruction the emit calls assembling piecemeal.
> Perhaps that reading woudl be clearer if each such equation of terms was
> done in reverse order:
> 
> 8624   // 0x66 is instruction prefix
> 
> 8627   // 0x0f 0xAE is opcode family
> 
> 8630   // rdi == 7 is extended opcode byte
> . . .
> 
> Given that the code is simply stuffing numbers (whether supplied as
> literals or as symbolic constants) into a byte stream I think these
> comments are a help when it comes to cross-checking each specific
> assembly against the corresponding numbers declared in the Intel
> manuals. So, I don't really want to remove them. Would you prefer me to
> reverse the wording as above?
> 
>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>
>> These comments feel redundant too. Well, maybe they should instead talk 
>> about the choices the
>> subsequent code makes.
>>
>> 9915   // pick the correct implementation
>>
>> 9936   // pick the correct implementation
> 
> I agree it would help to expand the comment to explain the logic of the
> choice in the first case:
> 
>   // prefer clwb (potentially parallel writeback without evict)
>   // otherwise prefer clflushopt (potentially parallel writeback
>   // with evict)
>   // otherwise fallback on clflush (serial writeback with evict)
> 
> In the second case the comment is redundant because the need for an
> sfence is covered by the existing comment inside the if:
> 
>     // need an sfence for post flush when using clflushopt or clwb
>     // otherwise no no need for any synchroniaztion
> 
>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>    src/hotspot/cpu/x86/vm_version_x86.hpp
>>
>> The docs say "The CLFLUSH instruction was introduced with the SSE2 
>> extensions; however, because it
>> has its own CPUID feature flag, it can be implemented in IA-32 processors 
>> that do not include the
>> SSE2 extensions. Also, detecting the presence of the SSE2 extensions with 
>> the CPUID instruction does
>> not guarantee that the CLFLUSH instruction is implemented in the processor."
>>
>> Yet, we have:
>>
>> 9910   // 64 bit cpus always support clflush
>> 9911   assert(VM_Version::supports_clflush(), "should not reach here on 
>> 32-bit");
>>
>>  505 #ifdef _LP64
>>  506     // cpflush is always available on x86_64
>>  507     result |= CPU_FLUSH;
>>  508 #else
>>  509     if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0)
>>  510       result |= CPU_FLUSH;
>>  511 #endif
>>
>>  967   // 64 bit cpus always support clflush which writes back and evicts
>>  968   // on 32 bit cpus support is recorded via a feature flag
>>
>>  980   static bool supports_clflush() { return true; }
>>
>> I think the assert should just say "clflush should be available", and 
>> clflush cpu flag detected for
>> 64-bits too?
> 
> I think that documentation is only talking about x86_32. I remember
> reading somewhere that clflush is always guaranteed to be available on
> x86_64 only I cannot remember where (I think maybe it was in Intel's
> libpmem code?). Anyway, I do not take the above to imply that clflush
> has been, or even can be, omitted on a 64-bit processor. However, let us
> assume that it can  in principle be omitted from an x86_64
> implementation. If so then I'll suggest that such a beast is a thing
> only in the sense that the Kingdom of Antarctica is a thing (i.e. it is
> an unactualized possible). As a non-existence proof I'll note that the
> JVM  already relies implicitly (i.e. without checking any CPU bits) on
> being able to call clflush on x86_64 (see icache_x86.cpp).
> 
> Anyway, granted the possibility of said Magic Kingdom I have changed the
> assert to say "clflush should be available". I have also
> 
> i) modified setting of the CPU_FLUSH flag to test the relevant bit on
> both 32 and 64 bits
> 
> +    // cpflush should always be available on x86_64
> +    // test the flag anyway and assert later
> +    if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0)
> +      result |= CPU_FLUSH;
>      . . .
> 
> ii) added an assert that it is set to the 64-bit version of
> supports_clflush:
> 
> +#ifdef _LP64
> +  static bool supports_clflush() {
> +    assert((_features & CPU_FLUSH) != 0, "clflush should be available");
> +    return true;
> +  }
>    . . .
> 
>> === src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>
>> Accidental camelCase, while hotspot code expects snake_case:
>>
>> 1794   void cache_wbsync(bool isPre);
> 
> Fixed.
> 
>> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>>
>> Any reason to avoid inline here, e.g. "__ cache_wb(Address(src, 0))"?
>>
>> 2921     const Address line(src, 0);
>> 2922     __ cache_wb(line);
> 
> Well, I guess only if you're "not into that whole brevity thing" (c.f.
> The Big Lebowski). Condensed accordingly.
> 
>> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>>
>> Is it really "cmpl" here, not "cmpb"? I think aarch64 code tests for byte.
>>
>> 2942     __ cmpl(is_pre, 0);
> 
> This is a Java boolean input. I believe that means the value will be
> loaded into c_arg0 as an int so this test ought to be adequate.
> 
> The Arch64 code actually tests the input value as a long word:
> 
>     __ enter();
>     __ cbnz(is_pre, skip);
>     . . .
> 
> n.b. cbnz == compare and branch if non-zero (64 bit). This latter code
> could (arguably) employ a cbnzw (i.e. 32-bit variant). However, the
> argument will have been loaded into c_rarg0 using an AArch64 int load
> which guarantees to zero the top 32 bits. So, it doesn't really make any
> difference whether this uses cbnz or cbnzw.
> 
>> === src/hotspot/share/classfile/vmSymbols.hpp
>>
>> Excess space before "jdk_internal..." here:
>>
>> 1096   do_intrinsic(_writebackPostSync0,        jdk_internal_misc_Unsafe,
>> writebackPostSync0_name, void_method_signature , F_RN)   \
> 
> Fixed.
> 
>> === src/hotspot/share/opto/c2compiler.cpp
>>
>> Why inject new cases here, instead of at the end of switch? Saves sudden 
>> "break":
>>
>>  578     break;
>>  579   case vmIntrinsics::_writeback0:
>>  580     if (!Matcher::match_rule_supported(Op_CacheWB)) return false;
>>  581     break;
>>  582   case vmIntrinsics::_writebackPreSync0:
>>  583     if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false;
>>  584     break;
>>  585   case vmIntrinsics::_writebackPostSync0:
>>  586     if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return 
>> false;
>>  587     break;
> 
> I placed them here so they were close to the other Unsafe intrinsics. In
> particular they precede _allocateInstance, an ordering which is also the
> case in the declarations in vmSymbols.hpp.
> 
> In what sense do you mean that an extra 'break' is saved? That would be
> true as regards the textual layout. It wouldn't affect the logic of
> folding different ranges of values into branching range tests (which is
> only determined by the numeric values of the intrinsics). If you are
> concerned about the former then I would argue that placing the values in
> declaration order seems to me to be the more important concern.
> 
>> === src/hotspot/share/opto/library_call.cpp
>>
>> Accidental camelCase here, should be snake_case:
>>
>>   257   bool inline_unsafe_writebackSync0(bool isPre);
>>
>>  2870 bool LibraryCallKit::inline_unsafe_writebackSync0(bool isPre) {
> 
> Fixed.
> 
>> === src/hotspot/share/prims/unsafe.cpp
>>
>> Odd indenting near "CC":
>>
>> 1122     {CC "writeback0",         CC "(" "J" ")V",           
>> FN_PTR(Unsafe_WriteBack0)},
>> 1123     {CC "writebackPreSync0",     CC "()V",               
>> FN_PTR(Unsafe_WriteBackPreSync0)},
>> 1124     {CC "writebackPostSync0",     CC "()V",              
>> FN_PTR(Unsafe_WriteBackPostSync0)},
> 
> Fixed.
> 
>> camelCase:
>>   464 static void doWriteBackSync0(bool isPre)
> 
> Fixed.
> 
>> === src/hotspot/share/prims/unsafe.cpp
>>
>> Do we really need this function pointer mess?
>>
>>  457   void (*wb)(void *);
>>  458   void *a = addr_from_java(line);
>>  459   wb = (void (*)(void *)) StubRoutines::data_cache_writeback();
>>  460   assert(wb != NULL, "generate writeback stub!");
>>  461   (*wb)(a);
>>
>> Seems easier to:
>>
>>  assert(StubRoutines::data_cache_writeback() != NULL, "sanity");
>>  StubRoutines::data_cache_writeback()(addr_from_java(line));
> Hmm, "that whole brevity thing" again? Well, I guess you must now call
> me "El Duderino".
> 
> Are you sure that all the compilers used to build openJDK will happily
> eat the second line of your replacement? If you can guarantee that I'll
> happily remove the type declarations.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
> 

-- 
regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

Re: RFR: 8224974: Implement JEP 352

Reply via email to