Hi Aleksey, I have an update regarding the change to the computation of the CPU_FLUSH flag. After posting the new webrev I built a debug version with the change that tests the clflush bit on x86_64. It crashed when the assert is reached in VM_Version::supports_cpuflush:
# A fatal error has been detected by the Java Runtime Environment: # # Internal Error (src/hotspot/cpu/x86/vm_version_x86.hpp:978), pid=29597, tid=29602 # assert((_features & ((uint64_t)(0x20000000000ULL))) != 0) failed: clflush should be available So, it seems the clflush bit is not set on my x86_64 box even though I am able to execute clflush quite happily. "Toto, it looks like we are no longer in Antarctica." So, I will revert this change (in the next webrev). regards, Andrew Dinn ----------- On 31/07/2019 11:55, Andrew Dinn wrote: > Hi Aleksey > > On 30/07/2019 17:00, Aleksey Shipilev wrote: >> On 7/30/19 5:04 PM, Andrew Dinn wrote: >>> JEP 352 has now been targeted for inclusion in JDK14. The latest webrev >>> for the implementation JIRA has been rebased to apply to the current >>> tree. Is it now ok to push this change set? >>> >>> JIRA: https://bugs.openjdk.java.net/browse/JDK-8224974 >>> webrev: http://cr.openjdk.java.net/~adinn/8224974/webrev.09 >> Is it okay to have a late code review? Here it goes: > > Sure, welcome to the party even if most of the booze has been drunk :-) > > Thank you for a very thorough review. Comments (mostly in agreement) are > inline below. They include a few refusals and a walk past some of the > jumps pending confirmation they truly form part of the obstacle course. > > A new webrev against the latest jdk/jdk dev tree including all > uncontested changes is here: > > http://cr.openjdk.java.net/~adinn/8224974/webrev.10/ > >> === General: >> >> So if pre wbsync is no-op, why do we need to handle it everywhere? We seem >> to be falling through all >> the way to the stub to do nothing there, maybe we should instead cut much >> earlier, e.g. when >> inlining Unsafe.writeBackPresync0? Would it be better to not emit >> CacheWBPreSync at all? > > Ah, I see you brought your own bottle ;-) > > The pre sync is definitely not needed at present. However, I put it > there because I didn't know for sure if some future port of this > capability (e.g. to ppc) might need to sync prior writes before writing > back cache lines. [Indeed, the old Intel documentation stated that > pre-sync was needed on x86 for clflush to be safe but it is definitely > not needed.] > > Anyway, folding out the pre-sync as you suggest would mean taking what > is still potentially an architecture-specific decision in generic code. > That may well prove to be a redundant precaution but it also costs > little -- every affected /compile/ will last a tad longer and be a tad > fatter in its memory footprint. If you think change is important I will > happily oblige. > >> === General: >> >> IIRC, notproduct and PRODUCT defines are legacy, and should be avoided? >> develop and ASSERT must be >> the substitutes for this. See some discussion here: >> https://bugs.openjdk.java.net/browse/JDK-8183287 > > Ok done. > > I changed #ifndef PRODUCT to #ifdef ASSERT in library_call.cpp and > unsafe.cpp. I also redefined global flag TraceMemoryWriteback using > develop rather than notproduct. > > I also removed the AArch64 /product/ compiler option UsePOCForPOP. This > allowed the JIT to use a memory writeback instruction with weakened > semantics if the full writeback is not available in a product build. > product was chosen because this was needed to establish a benchmark for > writeback on existing kit when using a pseudo-NVRAM memory device based > on volatile memory. The point was to be able to do an apples to apples > comparison with writeback to a disk device. Obviously, it's not much > use doing that test with a a non-product build. This is only important > during development as a one-off test once the OS support for > pseudo-NVRAM devices arrives in Linux AArch64 (which should happen well > before new kit with the POP writeback arrives). I can still perform it > using a custom build so I dropped this from the patch. > >> === src/hotspot/cpu/aarch64/aarch64.ad >> src/hotspot/cpu/x86/x86.ad >> >> This should probably be just "!VM_Version...". Braces around the statement >> would not hurt either. >> >> 2196 if (VM_Version::supports_data_cache_line_flush() == false) >> 2197 ret_value = false; > > Done. > >> === src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> src/hotspot/cpu/x86/macroAssembler_x86.cpp >> >> Braces style mismatch, should be on the same line, as in the rest of the >> file: >> >> 5837 void MacroAssembler::cache_wb(Address line) >> 5838 { >> >> 5856 void MacroAssembler::cache_wbsync(bool is_pre) >> 5857 { > > Done. > >> === src/hotspot/cpu/x86/assembler_x86.cpp >> >> It feels like these comments are redundant, especially L8630 and L8646 which >> mention magic values >> "6" and "7", not present in the code: >> >> 8624 // instruction prefix is 0x66 >> >> 8627 // opcode family is 0x0f 0xAE >> >> 8630 // extended opcode byte is 7 == rdi >> >> 8640 // instruction prefix is 0x66 >> >> 8643 // opcode family is 0x0f 0xAE >> >> 8646 // extended opcode byte is 6 == rsi > > Well, I think the question of redundancy depends on how you look at it. > The comments are not there to explain that these values are the ones > being passed to the emit functions -- that is pretty obvious. They are > there to explain what those values mean i.e. to explain what elements of > the fully assembled instruction the emit calls assembling piecemeal. > Perhaps that reading woudl be clearer if each such equation of terms was > done in reverse order: > > 8624 // 0x66 is instruction prefix > > 8627 // 0x0f 0xAE is opcode family > > 8630 // rdi == 7 is extended opcode byte > . . . > > Given that the code is simply stuffing numbers (whether supplied as > literals or as symbolic constants) into a byte stream I think these > comments are a help when it comes to cross-checking each specific > assembly against the corresponding numbers declared in the Intel > manuals. So, I don't really want to remove them. Would you prefer me to > reverse the wording as above? > >> === src/hotspot/cpu/x86/macroAssembler_x86.cpp >> >> These comments feel redundant too. Well, maybe they should instead talk >> about the choices the >> subsequent code makes. >> >> 9915 // pick the correct implementation >> >> 9936 // pick the correct implementation > > I agree it would help to expand the comment to explain the logic of the > choice in the first case: > > // prefer clwb (potentially parallel writeback without evict) > // otherwise prefer clflushopt (potentially parallel writeback > // with evict) > // otherwise fallback on clflush (serial writeback with evict) > > In the second case the comment is redundant because the need for an > sfence is covered by the existing comment inside the if: > > // need an sfence for post flush when using clflushopt or clwb > // otherwise no no need for any synchroniaztion > >> === src/hotspot/cpu/x86/macroAssembler_x86.cpp >> src/hotspot/cpu/x86/vm_version_x86.hpp >> >> The docs say "The CLFLUSH instruction was introduced with the SSE2 >> extensions; however, because it >> has its own CPUID feature flag, it can be implemented in IA-32 processors >> that do not include the >> SSE2 extensions. Also, detecting the presence of the SSE2 extensions with >> the CPUID instruction does >> not guarantee that the CLFLUSH instruction is implemented in the processor." >> >> Yet, we have: >> >> 9910 // 64 bit cpus always support clflush >> 9911 assert(VM_Version::supports_clflush(), "should not reach here on >> 32-bit"); >> >> 505 #ifdef _LP64 >> 506 // cpflush is always available on x86_64 >> 507 result |= CPU_FLUSH; >> 508 #else >> 509 if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0) >> 510 result |= CPU_FLUSH; >> 511 #endif >> >> 967 // 64 bit cpus always support clflush which writes back and evicts >> 968 // on 32 bit cpus support is recorded via a feature flag >> >> 980 static bool supports_clflush() { return true; } >> >> I think the assert should just say "clflush should be available", and >> clflush cpu flag detected for >> 64-bits too? > > I think that documentation is only talking about x86_32. I remember > reading somewhere that clflush is always guaranteed to be available on > x86_64 only I cannot remember where (I think maybe it was in Intel's > libpmem code?). Anyway, I do not take the above to imply that clflush > has been, or even can be, omitted on a 64-bit processor. However, let us > assume that it can in principle be omitted from an x86_64 > implementation. If so then I'll suggest that such a beast is a thing > only in the sense that the Kingdom of Antarctica is a thing (i.e. it is > an unactualized possible). As a non-existence proof I'll note that the > JVM already relies implicitly (i.e. without checking any CPU bits) on > being able to call clflush on x86_64 (see icache_x86.cpp). > > Anyway, granted the possibility of said Magic Kingdom I have changed the > assert to say "clflush should be available". I have also > > i) modified setting of the CPU_FLUSH flag to test the relevant bit on > both 32 and 64 bits > > + // cpflush should always be available on x86_64 > + // test the flag anyway and assert later > + if (_cpuid_info.std_cpuid1_edx.bits.clflush != 0) > + result |= CPU_FLUSH; > . . . > > ii) added an assert that it is set to the 64-bit version of > supports_clflush: > > +#ifdef _LP64 > + static bool supports_clflush() { > + assert((_features & CPU_FLUSH) != 0, "clflush should be available"); > + return true; > + } > . . . > >> === src/hotspot/cpu/x86/macroAssembler_x86.hpp >> >> Accidental camelCase, while hotspot code expects snake_case: >> >> 1794 void cache_wbsync(bool isPre); > > Fixed. > >> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp >> >> Any reason to avoid inline here, e.g. "__ cache_wb(Address(src, 0))"? >> >> 2921 const Address line(src, 0); >> 2922 __ cache_wb(line); > > Well, I guess only if you're "not into that whole brevity thing" (c.f. > The Big Lebowski). Condensed accordingly. > >> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp >> >> Is it really "cmpl" here, not "cmpb"? I think aarch64 code tests for byte. >> >> 2942 __ cmpl(is_pre, 0); > > This is a Java boolean input. I believe that means the value will be > loaded into c_arg0 as an int so this test ought to be adequate. > > The Arch64 code actually tests the input value as a long word: > > __ enter(); > __ cbnz(is_pre, skip); > . . . > > n.b. cbnz == compare and branch if non-zero (64 bit). This latter code > could (arguably) employ a cbnzw (i.e. 32-bit variant). However, the > argument will have been loaded into c_rarg0 using an AArch64 int load > which guarantees to zero the top 32 bits. So, it doesn't really make any > difference whether this uses cbnz or cbnzw. > >> === src/hotspot/share/classfile/vmSymbols.hpp >> >> Excess space before "jdk_internal..." here: >> >> 1096 do_intrinsic(_writebackPostSync0, jdk_internal_misc_Unsafe, >> writebackPostSync0_name, void_method_signature , F_RN) \ > > Fixed. > >> === src/hotspot/share/opto/c2compiler.cpp >> >> Why inject new cases here, instead of at the end of switch? Saves sudden >> "break": >> >> 578 break; >> 579 case vmIntrinsics::_writeback0: >> 580 if (!Matcher::match_rule_supported(Op_CacheWB)) return false; >> 581 break; >> 582 case vmIntrinsics::_writebackPreSync0: >> 583 if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false; >> 584 break; >> 585 case vmIntrinsics::_writebackPostSync0: >> 586 if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return >> false; >> 587 break; > > I placed them here so they were close to the other Unsafe intrinsics. In > particular they precede _allocateInstance, an ordering which is also the > case in the declarations in vmSymbols.hpp. > > In what sense do you mean that an extra 'break' is saved? That would be > true as regards the textual layout. It wouldn't affect the logic of > folding different ranges of values into branching range tests (which is > only determined by the numeric values of the intrinsics). If you are > concerned about the former then I would argue that placing the values in > declaration order seems to me to be the more important concern. > >> === src/hotspot/share/opto/library_call.cpp >> >> Accidental camelCase here, should be snake_case: >> >> 257 bool inline_unsafe_writebackSync0(bool isPre); >> >> 2870 bool LibraryCallKit::inline_unsafe_writebackSync0(bool isPre) { > > Fixed. > >> === src/hotspot/share/prims/unsafe.cpp >> >> Odd indenting near "CC": >> >> 1122 {CC "writeback0", CC "(" "J" ")V", >> FN_PTR(Unsafe_WriteBack0)}, >> 1123 {CC "writebackPreSync0", CC "()V", >> FN_PTR(Unsafe_WriteBackPreSync0)}, >> 1124 {CC "writebackPostSync0", CC "()V", >> FN_PTR(Unsafe_WriteBackPostSync0)}, > > Fixed. > >> camelCase: >> 464 static void doWriteBackSync0(bool isPre) > > Fixed. > >> === src/hotspot/share/prims/unsafe.cpp >> >> Do we really need this function pointer mess? >> >> 457 void (*wb)(void *); >> 458 void *a = addr_from_java(line); >> 459 wb = (void (*)(void *)) StubRoutines::data_cache_writeback(); >> 460 assert(wb != NULL, "generate writeback stub!"); >> 461 (*wb)(a); >> >> Seems easier to: >> >> assert(StubRoutines::data_cache_writeback() != NULL, "sanity"); >> StubRoutines::data_cache_writeback()(addr_from_java(line)); > Hmm, "that whole brevity thing" again? Well, I guess you must now call > me "El Duderino". > > Are you sure that all the compilers used to build openJDK will happily > eat the second line of your replacement? If you can guarantee that I'll > happily remove the type declarations. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > -- regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander