Re: Confusion regarding 'mark-sweep-compact' naming
On 8/16/19 10:07 AM, Gil Tene wrote: > Classification terms evolve as the art evolves. What I want to emphasize here that this discussion reinvents the meaning of "sweep". That definition is not used in the way you describe in the sources I know of. Granted, definitions drift over time, but we have to be careful to separate what is the "agreed on" definition, and what is whatever definition we want to be the emergent one. > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote: > I am trying to understand what is your central argument here. This seems > to be it. Are you saying > that "sweeping" is when you visit dead objects, and non-"sweeping" is > when you do not? > > Yes. I think that's a very good summary. Would be helpful if we started from this next time around! > Is walking _dead_ objects the discriminator for "sweeping" then? So in > your book, if we take the > same Lisp2 collector, and compute-new-addresses and adjust-pointers steps > are walking the > self-parsable heap (visiting dead objects along the way), it is M-S-C? > But if it uses marking > bitmap > (thus only visiting live objects), it becomes M-C? [This would be a weird > flex, but okay]. > > > Exactly. "In my book" adding an efficient livemark bit vector with possible > summary layers would > covert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact (with > no sweep). So this is what irks me here. In my Epsilon's mark-compact toy collector (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to self-parsable heap traversal is about 5 lines of code. This does not feel like passing the bar for reclassification of the collector algo between major classes. It feels more like the implementation detail that can swing both ways, depending on circumstances. -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/3bc6ee45-67d2-076c-879a-4bbf487f63a4%40gmail.com. signature.asc Description: OpenPGP digital signature
Re: Confusion regarding 'mark-sweep-compact' naming
On 8/14/19 9:21 PM, Gil Tene wrote: > The fact that they visit and parse dead objects in order to identify > recycle-able memory is what makes them > sweepers. Other techniques that don't visit dead objects do not perform a > sweep. I am trying to understand what is your central argument here. This seems to be it. Are you saying that "sweeping" is when you visit dead objects, and non-"sweeping" is when you do not? >> I would leave "sweep" as something that reclaims the storage of the objects >> (i.e. "sweeps away" the >> garbage), which makes M-C and M-S classes more clearly defined and easier to >> reason about. > > When M-C implies "also does a full linear traversal that visits all dead > objects" (aka "a sweep"), that> terminology fails. Examples like C4, G1GC, > Shenandoah, and ZGC are all Mark-Compact (NO Sweep) > collectors, while ParallelGC and Serial GC are Mark-Compact (WITH Sweep) > collectors. Since the math > behind those varies dramatically in wasy that effect everyday choices, using > "M-C" as a classification > that covers both leads people to misunderstand the interactions between e.g. > heap size and GC work. M-C does not imply "full linear traversal that visits all dead objects". As the counter-example, it is very practical to use off-side marking bitmaps to walk only live objects of the heap. Careful design of such the marking bitmap would yield dramatically better results than using whatever self-parsable-heap walk. You can even make it multi-level and quite dense to skip over large chunks of sparse heap. When the marking phase is a separate phase, it stands to reason that the subsequent phases would have to process the marking results somehow. That would naturally do some sort of walk over the data structure that handles marks. If we call that walk "sweeping", then every Mark-* algorithm is necessarily "sweeping" too. So we have to break this somehow to make the definition viable. Is walking _dead_ objects the discriminator for "sweeping" then? So in your book, if we take the same Lisp2 collector, and compute-new-addresses and adjust-pointers steps are walking the self-parsable heap (visiting dead objects along the way), it is M-S-C? But if it uses marking bitmap (thus only visiting live objects), it becomes M-C? [This would be a weird flex, but okay]. -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/1de1d683-164c-7163-7138-ee2f12f66638%40gmail.com. signature.asc Description: OpenPGP digital signature
Re: Confusion regarding 'mark-sweep-compact' naming
On 8/6/19 7:38 AM, Peter Veentjer wrote: > There is still some confusion on my side. This time it is regarding the > algorithmic complexity of > mark-sweep-compact. > > The complexity of the mark phase is linear to the live set. > The complexity of the sweep phase is linear to the heap size. > The complexity of the compact phase is linear to the live set. Well... The multiplicative factors in either case might make one linearity perform better than the other on wildly different heap sizes and live sets. The adversarial example of this is N large byte arrays of size M. Compact might have to copy N*M bytes (especially when doing sliding compaction), and sweep might only need to visit N locations to check on object metadata. There are second-order heap size effects for both mark and compact. Take two heaps with the same live set, one very dense and one very sparse, and the mere fact you have to walk memory far away adds up the secondary effect of heap size. It would not linger for compacting collectors, though, as first few compaction cycles would collapse the heap to denser variant. That is all to say that algorithmic complexity is a seriously bad way to reason about (GC) performance. > Why would you ever want to include the a sweep phase because the complexity > of mark-sweep-compact is > now linear to the heap size; a lot worse compared to 'linear to the live > set'. Get rid of the sweep > and complexity goes down to 'linear to the live set'. What is the added value > of such an inefficient > GC implementation? I think m-s-c nomenclature is confusing, as the actual implementations are doing either m-s or m-c during their individual phases. The benefit of doing either is well covered. The hard naming question is how to properly label the collector where different phases employ different algos. For example, in CMS the young collection is copying, old collection is mark-sweep, fall-back full GC is mark-compact. How would you label CMS then? I believe marking it m-s-c is borderline tolerable, until users get confused thinking both sweep and compact happen in the same phase. Also with implementations that manage backing storage in fine-grained chunks (for example, free lists), "sweep" might mean dealing with them on per-dead-object basis. For example, this might mean preparing the storage for subsequent compaction by "sweeping" out old objects before compaction kicks in, which requires free space. Practical implementations I know opt to instead reconstruct the backing storage metadata after massive compaction moves, though. My $0.02c, anyway. -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/e1755e01-58d6-3db4-bb65-ee7bc72b0ead%40gmail.com. signature.asc Description: OpenPGP digital signature
Re: Release fence pollution in constructors and builders
On 7/17/19 9:06 AM, Andriy Plokhotnyuk wrote: > 1) Using of release (and sometime store-store) fences in all constructors in > GraalVM: > https://github.com/oracle/graal/blob/e9dbc9d6d3080daf22dec38ce8c234417e1a8e3c/compiler/src/org.graalvm.compiler.replacements/src/org/graalvm/compiler/replacements/DefaultJavaLoweringProvider.java#L925 I don't follow. It puts LoadStore for all constructors that store final fields (as expected), and StoreStore for everything else (as expected). StoreStore is needed to protect the recently allocated object metadata from the races. Note that you can see the "default" values of Java fields because publishing the reference to the object may appear "before" constructor stores are observed. But the same thing would be catastrophic for "metadata" stores like initial mark and klass words: in the _best_ case scenario, JVM would just crash. Graal is not alone in this, see e.g. https://bugs.openjdk.java.net/browse/JDK-8032481 > 2) Using of release fences in constructors or builders of immutable > collections in Scala 2.13: > https://github.com/scala/scala/search?q=releaseFence%28%29=Code I vaguely remember this being discussed, but cannot remember the conclusion. It tries to emulate final fields without declaring them final? It does not work completely without LoadLoad on the reader side (https://shipilev.net/blog/2014/all-fields-are-final/#_implementation_support), but it is safer than not doing the release stores at all. > The question is: Why all these fences cannot be moved to places which work > with concurrency > immediately: GCs, static field initializers, concurrent data structures and > APIs like futures, > actors, etc.? I don't follow this suggestion. These fences are about surviving races (this is what finals are for), and thus every follow-up access to the object is exposed on every path, including those that do not deal with concurrency "immediately". -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/3cca63d5-0085-2387-13c1-b5dfe2d26158%40gmail.com. signature.asc Description: OpenPGP digital signature
Re: JMM- conservative and formal point of view.
On 05/16/2018 09:19 PM, John Hening wrote: > By the way, in your transcription you said: > | > inta; > > volatilebooleanready =false; > > voidactor(IntResult1r){ > while(!ready){}; > r.r1 =a; > } > > voidobserver(){ > a =41; > a =42; > ready =true; > a =43; > } > | > > > You said that possible values for r1 are 42 and 43. I cannot understand why > 41 is impossible. What > does prevent write(a, 41) and write(a, 42) from being reordered? Formally, there is no plausible JMM execution that yields both read(ready):true and read(a):41, which means observing 41 is forbidden in any conforming implementation. Proof sketch starts from the realization that in all executions where we observed r(ready):true it is inevitable that: w(ready, true) --hb--> r(ready):true ...which transitively extends to: w(a, 41) --hb--> w(a, 42) --hb--> w(ready, true) --hb--> r(ready):true --hb--> r(a):? Happens-before consistency mandates that reads have to see either the latest writes in happens-before order, or any other write that does not connected by happens-before. So, in the execution above, you can only have r(a):42 [latest in HB], or r(a):43 [race]. But all of that is too low-level for day-to-day work. What you need is a more high-level abstraction: ready = true "released" the consistent view of stores in actor, and once observer read "ready == true", it "acquired" that consistent view. That's what safe publication (or release-acquire) basically is. Moreover, if there are no races, you cannot see inconsistent (racy) values, and then your program behaves as if it is sequentially-consistent (and with caveats, this is what SC-DRF says). This high-level understanding plays well when you retract the thinking to single-threaded case. What exactly prevents the single thread seeing 41 in this example? What exactly prevents the reordering of two stores? ;) void test() { a = 41; a = 42; r1 = a; } You can use the similar JMM proof to work out that only 42 is allowed. But on high level, it is the same "consistent view", and release-acquire just extends it to the multi-threaded case. > JMM is a weak model. It means that JMM is allowed to reorder all normal > memory operations provided > that reordered program is equivalent in one-threaded environment. In my > example I see nothing > prevents reordering. That's where the problem lies: once you think about the reorderings, you are slipped into thinking about the implementation, then barriers show up, and everything gets too confusing. The trick is to slap yourself every time "barrier" or "reordering" bubbles up :) Hardware and runtime would speak to each other to maintain the illusion that release-acquire works as you would expect. Coming back to the original slide you mentioned, roach motel semantics basically says that it is almost always safe for the implementation to move operations into the release-acquire. It is harder to prove that moving operations out is safe, but it is sometimes easily provable too. It is an interesting exercise to understand the implementation for mechanical-sympathetic reasons e.g. for performance modeling, but not so much for correctness proofs. -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: JMM- conservative and formal point of view.
On 05/16/2018 07:33 PM, John Hening wrote: > Subject: JMM- conservative and formal point of view. There is no "conservative" JMM point of view. There is JSR 133 Cookbook for Compiler Writers that describes the conservative implementation, but it is not the JMM itself. Reductio ad absurdum: suppose JSR 133 Fanfic for Concurrency Experts would say the easy way to be JMM conformant is to acquire lock before entering any thread -- basically implementing Global Interpreter Lock. Would you be willing to take the existence of GIL in some implementations as the guidance for writing reliable software? I hope not. > As I understand this slide points that JMM does not constitute that a > volatile write does not work > as a (Load/Store)Store barrier, and it doesn't in fact. Double negation is confusing here. That slide points out that roach motel semantics basically says that some transformations are easy (putting stuff "into" the acq/rel block), and others are still possible, but hard. There are cases where it is not hard, actually: for example, when we know the access is provably thread-local. > But, JVM is allowed to reorder a such situation only in special cases, when > the execution will be > *equivalent as if write(x,1) wouldn't be reorderd with release(g)*. In > another words, here that > reordering is allowed if and only if JVM is able to prove that it is > equivalent to the situation > without reordering. Caveat: when the *outcome* would be equivalent to some allowed JMM execution. It does not mean the actual implementation should issue machine instructions in any given order. > The question is: > > So, for considering *correctness* of program (and only *correctness*) can we > assume that volatile > store works as (Load/Store)Store barrier in fact? No. The fluff about barriers is implementation details, which gets confusing very quickly as implementations screw with your code. I ranted about this a week ago at GeeCON: https://shipilev.net/talks/geecon-May2018-jmm.pdf -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: JITed very simple method.
On 02/05/2018 09:58 PM, John Hening wrote: > 1. Why do you find it is a bug in tierred compilation machinery? It is not on > my eye. The > compilation level is: C2, level 4 Wait a minute, where exactly does it say "C2, level 4" for you? I am saying the disassembly you have provided in the original message is probably level 2/3, and if it is hot, that might be a bug in tiered compilation machinery: we are not supposed to spend a lot of time in methods with profiling enabled. But you haven't verified that disassembly is actually on the hot path. Level 4 code would be on hotpath, and it would be without profiling. > 2. I have still a doubt: why profiling counter is not synchronized. What if 2 > or more threads > executing a function That is a race, so profile is not very accurate. We (mostly) do not care about that: there is a tradeoff between profiling overhead and profile accuracy. (Yes, I know racy updates are quirky and potentially lose the unbounded number of updates). -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Is there a way to print Escape Analysis?
On 01/25/2018 03:44 AM, 'Carl Mastrangelo' via mechanical-sympathy wrote: > Consider the following code: > > public class Test { > static volatile Integer discard; > public static void main(String [] args) throws InterruptedException { > printMemory(); > System.gc(); > printMemory(); > int iterations = 1000; > int[] vals = new int[100_000_000]; > while (args.length == 0) { > printMemory(); > System.gc(); > Thread.sleep(200); > discard = iterations++; > } > } > > private static void printMemory() { > System.out.println(Runtime.getRuntime().totalMemory() - > Runtime.getRuntime().freeMemory()); > } > } > > I am surprised to see the memory used by this code starts at about 200MB, > goes up to 600MB, but > never seems to go back down. The large int array accounts for the memory > usage jump, but it never > seems to be garbage collected. Why? The variable is never read after it is > allocated. It cannot > be reordered by the compiler to be after the while loop, because I can see > the memory jump. I am > intentionally allocating memory in the loop by boxing the integer. Enabling > +PrintCompilation and > PrintInlining never shows main() being inlined; perhaps it is not being > compiled? This is not escape analysis. It is more about compiler able to figure out the reachability of local variable, and let GC act: https://shipilev.net/jvm-anatomy-park/8-local-var-reachability/ It is predicated on the condition that method is actually compiled and optimized accordingly. OSR would not cut it here, I think, because the OSR version of the method would still treat that variable alive. $ java Test 10548688 260648 410809368 410673344 400260544 400260544 400260544 400260544 400260544 400260544 It changes if you compile the method before entering it: $ java -Xcomp Test 73841024 296240 421393656 10580272 10580608 10580608 10580608 10580608 Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: JVM detection of thread at safepoint
On 12/05/2017 09:26 AM, Mark Price wrote: > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The > application in question > is also suffering from some fairly nasty I/O problems where latency-sensitive > threads are being > descheduled in uninterruptible sleep state due to needing a file-system lock. > > My question: can the JVM detect that a thread is in signal/interrupt-handler > code and thus treat it > as though it is at a safepoint (as I believe happens when a thread is in > native code via a JNI call)? > > For instance, given the stack trace below, will the JVM need to wait for the > thread to be scheduled > back on to CPU in order to come to a safepoint, or will it be treated as > "in-native"? > > 7fff81714cd9 __schedule ([kernel.kallsyms]) > 7fff817151e5 schedule ([kernel.kallsyms]) > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) > 7fff817172ad down_write ([kernel.kallsyms]) > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) > 7fff8122cc5d file_update_time ([kernel.kallsyms]) > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) > 7fff811c041b handle_mm_fault ([kernel.kallsyms]) > 7fff8106adbe __do_page_fault ([kernel.kallsyms]) > 7fff8106b0c0 do_page_fault ([kernel.kallsyms]) > 7fff8171af48 page_fault ([kernel.kallsyms]) > java stack trace ends here I am pretty sure out-of-band page fault in Java thread does not yield a safepoint. At least because safepoint polls happen at given location in the generated code, because we need the pointer map as the part of the machine state, and that is generated by Hotspot (only) around the safepoint polls. Page faulting on random read/write insns does not have that luxury. Even if JVM had intercepted that fault, there is not enough metadata to work on. The stacktrace above seems to say you have page faulted and this incurred disk I/O? This is swapping, I think, and all performance bets are off at that point. Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: allocation memory in Java
On 11/18/2017 01:45 PM, John Hening wrote: > Thanks for your replies :) > > @Aleksey, how to get that? > > |-17.12%0.00%org.openjdk.Allperf-31615.map-0x7faaa3b2d125-16.59%OptoRuntime::new_instance_C > -11.49%InstanceKlass::allocate_instance > 2.33%BlahBlahBlahCollectedHeap::mem_allocatepoint to GC 0.35%AllocTracer::send_allocation_outside_tlab_event| > > I mean, how to get a such stacktrace with profiling information? I see > perf-31615.map what indicates > on perf. With perf. OpenJDK might need to be configured with debug symbols: --with-native-debug-symbols=internal -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: allocation memory in Java
On 11/17/2017 11:46 PM, John Hening wrote: > For my eye, common objects in Java, like > > newString(); > > is a CHeapObj. I looked at an implementation of new operator in that class > and it looks like: In related news, there is javaClasses.cpp, and therefore Java class library is written in C++! /s > No, I have no idea why it is said: allocation in Java in very fast- it is > fater than C++/C Start from here: https://shipilev.net/jvm-anatomy-park/4-tlab-allocation/ Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Mystery: Why does the JVM show more latency for the same block of code after a busy spin pause?
On 04/29/2017 07:12 PM, Aleksey Shipilev wrote: > On 04/29/2017 06:56 PM, Aleksey Shipilev wrote: >> On 04/29/2017 04:56 PM, J Crawford wrote: >>> The mystery boils down to: >>> >>> "/The exact same block of code becomes slower after a busy spin pause./" >>> >>> I posted a short source code that *unequivocally* proves and demonstrates >>> the >>> problem: >>> http://stackoverflow.com/questions/43696948/why-does-the-jvm-show-more-latency-for-the-same-block-of-code-after-a-busy-spin >> >> I don't think this is much of the mystery, because wakeup from a long sleep >> causes gradual transition from low power/frequency/scheduling states to >> higher >> ones. >> >> This is clearly visible if you do several consecutive operations after the >> sleep, e.g. by adding this: >> >>while(count < results.length) { >> +if ((count & 7) > 0) interval = 0; >> double x = busyPause(interval); > > Ah, I misread the test. busyPause is not powering down anything. What's > weirder, > the first pair of nanoTime calls after exiting busyPause is hiccuping even > without calculations in between. Still flying in blind for fun, trying different things: http://cr.openjdk.java.net/~shade/scratch/JvmPauseLatency.java a) "Warming up" both branches in busyPause seems to help first hiccup. Notice that with small interval one of the loop branches would be non-taken, and C2 would put a trap there, blowing up on first access. So much for "avoiding the deoptimization"... b) Inlining busyPause seems to help to avoid every hiccup but the first. I wonder if this is actually the machine effect when e.g. the branch mispredict on loop exit that faces unusual code shape ("ret" from uninlined method?) makes the lag worse. When method is inlined, only the first trap would introduce lag, not any subsequent one. At this point you want to dive into the generated code, and there JMH perfasm is your friend. -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Concurrency question.
On 03/03/2017 10:51 AM, Grzegorz Gierlach wrote: > It passes on x86 however for some reason I'm not 100% sure it is a correct > implementation. Is there a better way of doing it? The discussion about "correct", "way of doing it" is irrelevant if you don't describe what is your intent with this test. But mechanically, this is a Dekker test in disguise: http://hg.openjdk.java.net/code-tools/jcstress/file/tip/tests-custom/src/main/java/org/openjdk/jcstress/tests/volatiles/DekkerTest.java The only forbidden state is (0, 0) in plain old Dekker. It stands to reason that gating the 0->1 transition with CAS makes (1, 1) forbidden too. So your test should pass, modulo runtime and hardware bugs. > @JCStressTest > > @State > @Outcome(id ="1, 0",expect =ACCEPTABLE,desc ="first executed") > @Outcome(id ="0, 1",expect =ACCEPTABLE,desc ="second executed") > @Outcome(id ="1, 1",expect =FORBIDDEN,desc ="both executed!") > @Outcome(id ="0, 0",expect =FORBIDDEN,desc ="none executed!") > publicclassVolatileConcurrencyTest{ > > volatileObjectreceiver; > > volatileObjectresolution; > > AtomicBooleanapplied =newAtomicBoolean(); > > @Actor > voidactor1(IntResult2r){ > receiver =newObject(); > if(resolution !=null&(false,true)){ > r.r1 =1; > }else{ > r.r1 =0; > } > } > > @Actor > voidactor2(IntResult2r){ > resolution =newObject(); > if(receiver !=null&(false,true)){ > r.r2 =1; > }else{ > r.r2 =0; > } > } > } -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: HotSpot code cache and instruction prefetching
On 02/03/2017 10:26 AM, Chris Newland wrote: > Do you think the HotSpot designers took this into account but found > empirically > that the simple algorithm is adequate (cost complexity outweighs gains and > hotimp > methods are generally JIT-compiled together). Let's ask another question: do you have the example where that matters? > Could there be any benefit in relocating blocks for hot call chains to match > the > call pattern once the program has reached a steady state? (assuming inlining > has > already succeeded as much as possible). Well, the "hot path" is supposed to be inlined and critical path laid out sequentially within the compilation unit, so it is not catastrophic. > Since tiered compilation became the default, do you think the many (possibly > unconnected) intermediate compilations have made prefetching worse? ...but yes, for tiered, there are versions of the code that are known to be temporary (e.g. compilations on level 1,2,3), while the final compilation stays around for longer (level 4). This is why Segmented Code Cache was implemented in JDK 9: http://openjdk.java.net/jeps/197 IIRC, there were improvements on torturous workloads, and improvements in nmethod scans. Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Futex flood
On 01/23/2017 05:42 AM, Дмитрий Пеньков wrote: > Hello everyone. I have following situation: multiple HotSpot JVMs with > multiple > GCs' (ParallelGC) threads on multiple CPUs. I've already seen recommendations > to > have min 2 cores per JVM, but in my case it's 80 CPU's and about 200 JVMs. > After > working for about an hour, I have >95% of time for futex syscall futex_wait. I > tried to fix the number of parallel GC threads in 4, but it didn't make > effect. > Mb GC changing can help? How to solve this situation? $ perf record -g java ... ...and then $ perf report ...to see where those futexes are called from. It might be relatively benign, like the awaits from the GC worker taskqueue. But maybe there are actual Java locks involved. Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Operation Reordering
(triggered again) On 01/18/2017 12:33 AM, Sergey Melnikov wrote: > mov (%rax), %rbx > cmp %rbx, %rdx > jxx Lxxx > > But if you schedule them this way > > mov (%rax), %rbx > cmp %rbx, %rdx > ... few instructions > jxx Lxxx ...doesn't this give up on macro-fusion, and effectively sets up for a better chance of a "bottleneck in instruction fetch/decode phases"? :) -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Operation Reordering
On 01/17/2017 12:55 PM, Vitaly Davidovich wrote: > Atomicity of values isn't something I'd assume happens automatically. Word > tearing isn't observable from single threaded code. On 01/17/2017 09:17 PM, Michael Barker wrote: > That was my understanding too. Normal load/stores on 32 bit JVMs would tear > 64 > bit values. Although, I think object references are guaranteed to be written > atomically. (Triggered by my pet peeve) Please don't confuse "access atomicity" and "word tearing". Access atomicity means reading/writing the value in full. Word tearing (at least per JLS 17.6) means that fields are considered distinct, and the read/write of one field cannot disturb neighbors. Speaking of access atomicity, the upcoming value types would probably be non-access-atomic by default too, because enforcing access atomicity for widths larger than a machine one is painful. volatile longs/doubles had it easy on 32-bit VMs with 64-bit FPUs, compared to what variable-len values would have to experience. Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Single writer counter: how expensive is a volatile read?
On 10/30/2016 05:55 AM, Peter Veentjer wrote: > Let me clarify. > The discussion is around removing the volatile read in the inc method. Ah, sorry I misinterpreted the question. It usually goes the other way: the reads vastly outnumber the writes. Well, since the writer is single-threaded, there is no need for a volatile/atomic reads/updates in the inc(). But I think it's useless to optimize the volatile reads there, because you have the impending volatile/release store, which would dominate the cost. So, having that in mind, the better strategy might be dispensing with Atomic on every inc()? The excess branch might be much less hassle than a volatile/release store. public class Counter { private long local; private volatile long publish; public void inc(){ long n = local++; if ((n & 0xFF) == 0) { publish = n; } } public long get() { return publish; } } ...with VarHandles, you can even collude the access modes: public class Counter { static final VH = ; private long local; public void inc(){ long n = VH.get(this); n++; if ((n & 0xFF) == 0) { VH.set(this, n); } else { VH.set{Volatile,Release,Opaque}(this, n); } } public long get(){ return VH.get{Volatile,Acquire,Opaque}(this); } } ...with some choice of the modes. > A counter could be incremented >100k times a second and we don't want a > counter to cause too much of a performance degradation. A counter will > always be incremented by the same thread. There is a subtle difference between "a single writer to Counter", and "each thread gets its own Counter". I would venture to guess that (false) sharing in your case should be the concern, not the cost of volatile ops. Having a referenced AtomicLong instead of padded out "local" field would set you up for false sharing very nicely :) Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature
Re: Single writer counter: how expensive is a volatile read?
On 10/29/2016 10:13 AM, Peter Veentjer wrote: > So you get something like this: > > public class Counter { > > private final AtomicLong c = new AtomicLong(); > > public void inc(){ > long newValue = c.get()+1; > c.lazySet(newValue); > } > > public long get(){ > return c.get(); > } > } Does not work. With non-atomic updates, you set yourself for losing a virtually unbounded number of updates. See e.g: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-no-sync > But in this case you still need to do a volatile read to read the actual > value. In theory one could optimize this to: > > public class Counter { > > private final AtomicLong c = new AtomicLong(); > private long local; > > public void inc(){ > long newValue = local+1; > this.local = newValue; > c.lazySet(newValue); > } > > public long get(){ > return c.get(); > } > } I guess get() was meant to have "return local"? If so, this does not work. With no ordering implied for the reads, you set yourself up for this trouble: Counter c = ...; while (c.get() < 100); // wait for it... forever! Because optimizers are free from hoisting your read before the loop, reading the counter once, and reducing predicated loop into the infinite one. I would *guess* the fastest (not sanest, mind you) way out of this is doing VarHandles: public class Counter { static final VarHandle VH = ; long counter; public void inc(){ VH.getAndAdd(this, 1); } public long get(){ return VH.getOpaque(this); } } Still, removing volatiles from get breaks whatever happens-before guarantees for the counter. Which means, for example, you cannot use it to "version" the objects: Counter cnt = new Counter(); Thread 1: // assume cnt is zero x = 1; cnt.inc(); Thread 2: if (cnt.get() == 1) { assert (x == 1); // oops, fails. } Thanks, -Aleksey -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. signature.asc Description: OpenPGP digital signature