AFAIU, the spec requires that two writes to different volatile variables
appear in program order. To guarantee that, we need at least read/write
barriers. I'm not sure whether the spec makes a stronger requirement that
would force us to lock the system bus. Anyway, using CMPXCHG8B with the
lock prefix seems worth trying, at least from a program-correctness point
of view.

Thanks
Evgueni

On 6/1/07, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
On 6/1/07, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> Do we plan to make objects 8-byte aligned in Q3?
> AFAIU this is the only way to avoid the lock prefix and the performance
> degradation, and it does not require big changes in GC: we need object
> sizes to be multiples of 8 and every memory area allocated by GC to be
> aligned by 8. Am I missing something here?
>
> It can be less work than making temporary workarounds in JIT instead of
> the simple XMM moves we already have.

Mikhail, aligning all objects at an 8-byte boundary is indeed an easy
solution, but it may cause some space overhead (compared to 4-byte
alignment). The space overhead may in turn lead to performance
degradation. This can be tested quickly to see whether it causes a
visible performance drop with representative workloads and benchmarks.

The other solution is to align only certain classes' instances at an
8-byte boundary, for example those with volatile long fields. But this
is not a small change in GC; it needs more time and thorough testing.

I can probably try the all-8-byte alignment first to help us make the
final decision.

Thanks,
xiaofeng

> On 6/1/07, Pavel Ozhdikhin <[EMAIL PROTECTED]> wrote:
> >
> > On 6/1/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
> > > On 31 May 2007 00:52:00 +0400, Egor Pasko <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On the 0x2E6 day of Apache Harmony Xiao-Feng Li wrote:
> > > > > On 5/30/07, George Timoshenko <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > I had a question in the JIRA about this issue: why don't
> > > > > > > we use the "lock" prefix for the atomic access?
> > > > > >
> > > > > > well...
> > > > > >
> > > > > > Originally we split every 64-bit memory access into two
> > > > > > 32-bit ones. It makes no sense to set the #LOCK prefix on
> > > > > > them (there is a gap between them).
> > > > > >
> > > > > > We can only set #LOCK on an instruction that reads/writes
> > > > > > the whole 64 bits.
> > > > > >
> > > > > > The bad thing is that the only instruction (according to
> > > > > > the IA32 spec) we can set #LOCK on is CMPXCHG8B (MOVQ,
> > > > > > MOVSD and the others cannot be used with #LOCK)
> > > > > >
> > > > > > This monster (CMPXCHG8B) requires 4 registers:
> > > > > >
> > > > > > EAX
> > > > > > EBX
> > > > > > ECX
> > > > > > EDX
> > > > > >
> > > > > > and (FLAGS) also.
> > > > > >
> > > > > > I am not sure CMPXCHG8B usage will be faster than making
> > > > > > volatile fields always synchronized (artificially)
> > > > >
> > > > > George, I believe it should be much faster than a synchronized
> > > > > block, since it is non-blocking with contended locks. To use
> > > > > cmpxchg, you need a loop that checks the result until it
> > > > > succeeds. With a synchronized block, the thread will go to
> > > > > sleep until it is woken up by the releasing thread.
> > > >
> > > > hm, if I am not mistaken most of the time that would be a spin
> > > > lock with the current thread manager. So, I cannot bet which way
> > > > is faster. Maybe some expert in TM can tell for sure?
> > >
> > >
> > > This kind of stuff is always empirical. The task is to build,
> > > measure, post the results. The wild cards are the workload and the
> > > hardware. Different combos will lead to different conclusions.
> > >
> > > Having said the above, my hunch is to go with CMPXCHG8B for right
> > > now. The main motivation is that this decouples register assignment
> > > from the jvm thread subsystem and thus makes things easier to debug.
> > > This is goodness. Also, running exhaustive studies of different
> > > workloads and different platforms is not something of high value
> > > for a JVM at such an early stage of development. In other words, do
> > > this analysis once we get real workloads like specjappserver
> > > running. As already noted, it should be easy to re-implement when
> > > the time is right.
> > >
> > > Interesting background material --- Jeremy Manson's "The Java
> > > Memory Model", POPL 2005, section 2.3 says, "In order to allow for
> > > non-blocking techniques that communicate between threads, we also
> > > want to allow the use of _volatile_ variables to synchronize
> > > information between threads. The properties of volatile variables
> > > arose from the need to provide a way to communicate between threads
> > > without the overhead of ensuring mutual exclusion." While this does
> > > not dictate a solution, it sort of suggests using opcodes (lockxxx)
> > > instead of bytecodes (monenter/exit).
> >
> > Adding a monenter/monexit pair in a place where the author of the
> > code did not intend to put them may lead to deadlock. So, I'm +1 for
> > prototyping with CMPXCHG8B first.
> >
> > Thanks,
> > Pavel
> >
> > > > Anyway, both implementations do not seem to be very hard, we
> > > > could try both ways...
> > > >
> > > > --
> > > > Egor Pasko
> > > >
> > >
> > > --
> > > Weldon Washburn
> >
>
> --
> Mikhail Fursov
>

--
http://xiao-feng.blogspot.com
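
For reference, the "loop until the compare-exchange succeeds" idea
discussed in the thread can be sketched in C11. This is only an
illustration, not Harmony JIT code: the names volatile_long,
volatile_long_load and volatile_long_store are hypothetical, and the
assumption that a 32-bit x86 compiler lowers the 64-bit compare-exchange
to LOCK CMPXCHG8B is typical but not guaranteed (some toolchains call a
library helper instead).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a Java "volatile long" field. */
typedef _Atomic uint64_t volatile_long;

/* Atomic 64-bit read built only from compare-exchange.
 * We try to swap 0 for 0. If the field holds 0, the swap succeeds and
 * changes nothing. Otherwise it fails and the current 64-bit value is
 * copied into `expected`. Either way `expected` ends up holding the
 * value that was in the field. */
uint64_t volatile_long_load(volatile_long *field)
{
    uint64_t expected = 0;
    atomic_compare_exchange_strong(field, &expected, 0);
    return expected;
}

/* Atomic 64-bit write built only from compare-exchange.
 * `observed` starts as a guess. Each failed attempt refreshes it with
 * the value actually found in the field, so the next attempt can
 * succeed unless another thread wrote in between (or the weak variant
 * failed spuriously, in which case we simply retry). */
void volatile_long_store(volatile_long *field, uint64_t value)
{
    uint64_t observed = 0;
    while (!atomic_compare_exchange_weak(field, &observed, value)) {
        /* retry with the refreshed `observed` */
    }
}
```

Both helpers are non-blocking in the sense used in the thread: a thread
that loses the race retries immediately instead of sleeping on a monitor.
Whether that is actually faster than a synchronized block still has to be
measured, as noted above.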
