AFAIU, the spec requires that two writes to different volatile variables
become visible in program order. To guarantee that, we need at least
read/write barriers. I'm not sure whether the spec makes a stronger
requirement that would force us to lock the system bus. Anyway, using
CMPXCHG8B with the lock prefix seems worth trying, at least from a
program correctness point of view.
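
To make the shape of the retry loop concrete, here is a rough Java-level
sketch, not the JIT code itself. The class and method names are made up,
and AtomicLong.compareAndSet only stands in for what LOCK CMPXCHG8B would
do in the generated code (expected value in EDX:EAX, new value in ECX:EBX).

```java
import java.util.concurrent.atomic.AtomicLong;

public class Cas64Sketch {
    // Hypothetical stand-in for a volatile long field. compareAndSet plays
    // the role of LOCK CMPXCHG8B in this illustration only.
    static final AtomicLong field = new AtomicLong();

    // A 64-bit store expressed as a compare-and-swap retry loop.
    static void atomicStore(long newValue) {
        long expected;
        do {
            // Value we expect to find in memory (EDX:EAX in the real instruction).
            expected = field.get();
            // Retry if another thread changed the field in between.
        } while (!field.compareAndSet(expected, newValue));
    }

    public static void main(String[] args) {
        atomicStore(0x1122334455667788L);
        System.out.println(Long.toHexString(field.get()));
    }
}
```

The loop never blocks, but it can spin under heavy contention, which is
why measuring it against the synchronized variant still makes sense.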

Thanks
Evgueni

On 6/1/07, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
On 6/1/07, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> Do we plan to make objects 8-byte aligned in Q3?
> AFAIU this is the only way to avoid the lock prefix and the performance degradation,
> and it does not require big changes in GC: object sizes need to be a
> multiple of 8, and every memory area allocated by GC needs to be 8-byte aligned. Am I
> missing something here?
>
> It may be less work than making temporary workarounds in the JIT in place of the
> simple XMM moves we already have.

Mikhail, aligning all objects on an 8-byte boundary is indeed an easy
solution, but it may cause some space overhead (compared to 4-byte
alignment). The space overhead may in turn lead to
performance degradation. We can quickly run an experiment to
see whether it causes a visible performance drop with representative
workloads and benchmarks.
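
As a back-of-the-envelope illustration of where the overhead comes from,
here is a small Java sketch. The object sizes are made up for the example
and are not measurements from any workload.

```java
public class AlignmentOverheadSketch {
    // Round size up to the next multiple of alignment (must be a power of two).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Extra bytes needed when the sizes are padded to 8 bytes instead of 4.
    static int extraPadding(int[] sizes) {
        int extra = 0;
        for (int s : sizes) {
            extra += alignUp(s, 8) - alignUp(s, 4);
        }
        return extra;
    }

    public static void main(String[] args) {
        int[] sizes = {12, 16, 20, 24, 28}; // hypothetical object sizes in bytes
        System.out.println("extra bytes with 8-byte alignment: " + extraPadding(sizes));
    }
}
```

Every object whose size is an odd multiple of 4 pays 4 extra bytes, so the
real cost depends on the size distribution of the workload.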

The other solution is to align only the instances of certain classes on an
8-byte boundary, for example, those with volatile long fields. But
this is not a small change in GC; it needs more time and thorough
testing.
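
Just to show what the per-class decision could look like, here is a
sketch written with plain Java reflection. The real check would live in
the class loader or GC and use the VM's own field metadata; the names
below are hypothetical, and including volatile double is my assumption,
since it is also a 64-bit type.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class VolatileLongCheckSketch {
    // True if instances of cls carry a volatile 64-bit field, including
    // fields inherited from superclasses. Static fields are skipped because
    // they are not stored in the instance.
    static boolean needs8ByteAlignment(Class<?> cls) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                boolean wide = f.getType() == long.class || f.getType() == double.class;
                if (wide && Modifier.isVolatile(m) && !Modifier.isStatic(m)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Hypothetical example classes.
    static class WithVolatileLong { volatile long counter; }
    static class Plain { long counter; int x; }
    static class Child extends WithVolatileLong { int y; }
}
```

Note that a subclass inherits the requirement from its parent, which is
part of why this is a larger change than aligning everything.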

I can probably try the all-8-byte alignment first to help
us make the final decision.

Thanks,
xiaofeng

> On 6/1/07, Pavel Ozhdikhin <[EMAIL PROTECTED]> wrote:
> >
> > On 6/1/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
> > > On 31 May 2007 00:52:00 +0400, Egor Pasko <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On the 0x2E6 day of Apache Harmony Xiao-Feng Li wrote:
> > > > > On 5/30/07, George Timoshenko <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > > I had a question in the JIRA about this issue: why don't we use
> > > > > > > > "lock" prefix for the atomic access?
> > > > > >
> > > > > > well...
> > > > > >
> > > > > > > Originally we split every 64-bit memory access into two 32-bit ones.
> > > > > > > It makes no sense to set the #LOCK prefix on them (there is a gap
> > > > > > > between the two accesses).
> > > > > >
> > > > > > > We can only set #LOCK on an instruction that reads/writes the whole
> > > > > > > 64 bits.
> > > > > >
> > > > > > > The bad thing is that the only such instruction (according to the IA32
> > > > > > > spec) we can set #LOCK on is CMPXCHG8B (MOVQ, MOVSD and the others
> > > > > > > cannot be used with #LOCK).
> > > > > >
> > > > > > This monster (CMPXCHG8B) requires 4 registers:
> > > > > >
> > > > > > EAX
> > > > > > EBX
> > > > > > ECX
> > > > > > EDX
> > > > > >
> > > > > > and (FLAGS) also.
> > > > > >
> > > > > > > I am not sure CMPXCHG8B usage will be faster than making volatile
> > > > > > > fields always synchronized (artificially)
> > > > >
> > > > > > George, I believe it should be much faster than a synchronized block,
> > > > > > since it is non-blocking under contention. To use cmpxchg, you
> > > > > > need a loop that checks the result and retries until it succeeds. With a
> > > > > > synchronized block, the thread goes to sleep until it is woken up by
> > > > > > the releasing thread.
> > > >
> > > > > hm, if I am not mistaken most of the time that would be a spin lock
> > > > > with the current thread manager. So, I cannot bet which way is
> > > > > faster. Maybe some expert in TM can tell for sure?
> > >
> > >
> > > > This kind of stuff is always empirical. The task is to build, measure, and
> > > > post the results. The wild cards are the workload and the hardware.
> > > > Different combos will lead to different conclusions.
> > >
> > > > Having said the above, my hunch is to go with CMPXCHG8B for right now. The
> > > > main motivation is that this decouples register assignment from the JVM
> > > > thread subsystem and thus makes things easier to debug. This is goodness.
> > > > Also, running exhaustive studies of different workloads and different
> > > > platforms is not something of high value for a JVM at such an early stage
> > > > of development. In other words, do this analysis once we get real workloads
> > > > like SPECjAppServer running. As already noted, it should be easy to
> > > > re-implement when the time is right.
> > >
> > > > Interesting background material --- From Jeremy Manson's "The Java Memory
> > > > Model", POPL 2005, section 2.3 it says, "In order to allow for non-blocking
> > > > techniques that communicate between threads, we also want to allow the use
> > > > of _volatile_ variables to synchronize information between threads.  The
> > > > properties of volatile variables arose from the need to provide a way to
> > > > communicate between threads without the overhead of ensuring mutual
> > > > exclusion."  While this does not dictate a solution, it sort of suggests
> > > > using opcodes (lockxxx) instead of bytecodes (monitorenter/monitorexit).
> >
> > > Adding a monitorenter/monitorexit pair in a place where the author of the
> > > code did not intend to put one may lead to deadlock. So, I'm +1 for
> > > prototyping with CMPXCHG8B first.
> >
> > Thanks,
> > Pavel
> >
> > >
> > >
> > > > > Anyway, both implementations do not seem to be very hard, we could try
> > > > > both ways...
> > > >
> > > > --
> > > > Egor Pasko
> > > >
> > > >
> > >
> > >
> > > --
> > > Weldon Washburn
> > >
> >
>
>
>
> --
> Mikhail Fursov
>


--
http://xiao-feng.blogspot.com
