Re: [OT] Performance tricks with multiple tomcat instances

2006-03-14 Thread Leon Rosenberg
Chuck, Darryl

I'd like to thank you both for the amazing insights into concurrency.
Learned a lot today :-)

thanx
leon

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [OT] Performance tricks with multiple tomcat instances

2006-03-14 Thread Caldarale, Charles R
> From: Darryl L. Miles [mailto:[EMAIL PROTECTED] 
> Subject: Re: [OT] Performance tricks with multiple tomcat instances
> 
> I had read some comments that might lead others to think 
> that "count++" on a java "int" type is in some way atomic.

Yes, there are many misinformed people out there, even in the Java
community :-)

> You miss the goal of programming to the lowest common 
> denominator with your point, write once run anywhere is
> something of value to me

Also to me, but my perspective is a bit different.  Since I get paid to
develop a JIT, I can take advantage of optimizations available on
whatever platform I'm targeting, without changing the application's
class files.

> I don't understand your contradiction "are implicitly atomic - but 
> there's no guarantee" if there is no guarantee then by my 
> book it is not atomic.

Not from a Java programmer's perspective, but definitely useful from the
point of view of a JVM implementor - which is what your original message
started out with.

> But atomic assignment is like saying the grass is green and 
> the sky is blue.

Not at all - which is why it's explicitly stated in the Java/JVM specs.
On platforms where the pointer size is larger than the indivisible data
path to memory, it can't be taken for granted.  Even on IA32, if your
32-bit address is not on a 4-byte boundary, the update is not atomic;
this constrains one from using packed data structures.

> There is no useful point to make here, it is a given 
> that memory load/store operations using native bit 
> width operations on any CPU are atomic

See the above for IA32 on non-32-bit boundaries.  (I'm only using IA32
due to its proliferation; there are many other platforms where there's
even less provision for atomic updates.)

> you are either loading or storing not both.

What happens when there are two components (CPUs, IO channels, whatever)
updating the same 32 or 64 bits?  Any off-boundary write turns into a
read-update-write sequence, and that can be interleaved with writes from
another component.  (I've been working on multiprocessor architectures
since the late 1960s - the off-boundary writes are always an issue.
When you start throwing 32 CPUs together with hundreds of IO processors,
the interactions are always "interesting".)

> So yes in the computing world atomic assignment is absolutely 
> critical but so is register addition.

How do you account for machines without registers?  The JVM spec defines
one such, but real commercial register-less CPUs exist (your bank is
probably using one).

> Are you saying that there are some CPUs that support 
> unaligned access to memory (like x86) that do not have
> atomic writes to memory.

See the Intel architecture specs for an example.  This is true of every
system I know of.

> I was under the impression that such accesses by the CPU 
> to the memory bus were atomic, because the CPU acquired 
> the memory bus, then did one or more writes in burst before
> releasing the memory bus.

Most updates do not go directly on the bus - they are cached (usually
multiple levels these days), and are eventually trickled out to the bus
as time permits.  When a write spans a cache line (or even worse, a page
boundary), lots of intervening events can occur.  Propagating a bus lock
under such conditions would bring the whole system to a halt.

> A kernel can not task switch an assembly instruction so under 
> single CPU the read-modify-write (with or without the LOCK) is
> always 100% successful

That depends on where the interrupt points within the instruction stream
can occur.  On highly microcoded CPUs, one could define an interrupt
point between the two memory accesses, but most platforms do not, as
long as the accesses are within a single cache line.  If the accesses
span cache lines, an asynchronous interrupt can occur resulting in task
switching.

> Sorry if this is sucking eggs here.

A bit - I've been doing this a long time.

> FYI - The Sun documentation that headlines the package talks in
> terms of "volatile" values.  LOL

That is quite appropriate.  If the data item is not volatile, then
atomic operations are not needed, by definition.  All synchronized
operations within the JVM must follow the rules for volatility, in terms
of the external visibility of writes (constrains the out-of-order
operations that many RISC systems employ).
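As an illustration of volatile being about the visibility of writes rather than atomicity, here is a minimal sketch. It assumes Java 8+ syntax, and the class and method names are hypothetical. One thread writes a flag and another reads it. A boolean assignment is already a single atomic store, so volatile alone is enough and no lock or atomic class is involved. Without volatile, the Java memory model would allow the JIT to hoist the read out of the loop, so the worker might never stop.

```java
/**
 * Hypothetical example: a volatile flag used only for visibility.
 * Assigning a boolean is already atomic; volatile guarantees that the
 * worker thread observes the write made by another thread.
 */
public class StopFlag {
    private volatile boolean running = true;
    private long iterations = 0;    // written only by the worker thread

    public void stop() {
        running = false;            // a single atomic, visible write
    }

    /** Starts a worker, stops it after a short delay, returns its loop count. */
    public static long demo() {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (flag.running) {  // re-reads the volatile field on every pass
                flag.iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            flag.stop();            // always ask the worker to finish
        }
        try {
            worker.join();          // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return flag.iterations;
    }

    public static void main(String[] args) {
        System.out.println("worker looped " + demo() + " times");
    }
}
```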

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you
received this in error, please contact the sender and delete the e-mail
and its attachments from all computers.




Re: [OT] Performance tricks with multiple tomcat instances

2006-03-14 Thread Darryl L. Miles

Caldarale, Charles R wrote:
> From: Darryl L. Miles [mailto:[EMAIL PROTECTED] 
> Subject: Re: Performance tricks with multiple tomcat instances
>
>> I would hope that JIT compiler engineers would as a minimum 
>> implicitly add the "lock" instruction before all operations 
>> on primitive types declared volatile that are in the platforms
>> native data width (32bit on i386, 32&64bit on i386_64. 64bit
>> on ia64).
>
> Not done, since the Java/JVM specs don't require atomic operations
> except for assignments.  If you want to ensure that an increment is
> atomic, you must surround it with a synchronized block.
>
> Volatile does not imply atomic.  Please read the specs.

I've never used volatile in any Java code I've written, so it's a
non-issue for me.


But the point I was making was food for thought.  I had read some
comments that might lead others to think that "count++" on a Java "int"
type is in some way atomic.  I was pointing out that even the assembler
instruction to increment a memory operand is not atomic on multi-CPU SMP
systems unless special care is taken to use the "lock" prefix, which is
only available on a subset of x86 instructions.  I've also read others'
comments suggesting that the Java "volatile" keyword was some way out of
their sticky situation.  For that to be true, and for volatile to be a
portable, high-performance locking primitive across JVMs on different
target hosts, it would need to restrict the allowed bit width to Java
"int" or "long", since that's all some CPUs will give you.  A recent
related issue in TC:
http://issues.apache.org/bugzilla/show_bug.cgi?id=37356
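A minimal sketch of why "count++" is not atomic. The Java increment is a read, an add, and a write. The example below does not start real threads. It hand-interleaves the three steps of two hypothetical threads A and B on a single thread to show one schedule that an SMP machine can produce. The class and method names are illustrative only.

```java
/** Hypothetical illustration: count++ is really read, add, write. */
public class LostUpdateDemo {
    static int count = 0;

    /** Replays one possible interleaving of two "count++" operations. */
    public static int interleavedIncrements() {
        count = 0;
        int a = count;   // thread A reads 0
        int b = count;   // thread B reads 0 before A has written back
        a = a + 1;       // A computes 1
        b = b + 1;       // B computes 1
        count = a;       // A writes 1
        count = b;       // B writes 1, overwriting A's update
        return count;    // two increments ran, but only one is visible
    }

    public static void main(String[] args) {
        System.out.println(interleavedIncrements()); // prints 1, not 2
    }
}
```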


If in doubt just synchronize, then if there is a real performance hit 
fix your application design to work around the new problem.



>> so if you want to write out a 64bit value you have to make 
>> two memory write cycles and this is done with two assembler
>> instructions, the i386 processor does not  provide any
>> hardware support to make that atomic, so you have to make
>> that synchronization / lock happen in software there is no
>> 2 ways about this).
>
> Actually, that's not true anymore.  Most newer chips have a 64-bit data
> path, and properly aligned 64-bit loads/stores are implicitly atomic -
> but there's no guarantee.  There's also a compare-and-exchange-8-bytes
> in the current IA32 instruction set, and that can be made atomic with
> the lock prefix.

You miss the goal of programming to the lowest common denominator with
your point.  Write once, run anywhere is something of value to me; if
this means I have to synchronize everything, that's fine by me.  If some
CPU has 64-bit paths and the JVM supports them, that is a good thing, and
a new set of classes should be introduced to give access to that
additional functionality, so that users who would find those features
useful have access to them.


But in most cases a single portable 32-bit or 64-bit integer type for
this purpose will do the trick; a feature-full API that works with many
types is unnecessary.  The goal is to provide the application programmer
with an API contract that works across a wide range of JVM
implementations and CPUs.


I don't understand the contradiction in "are implicitly atomic - but
there's no guarantee".  If there is no guarantee, then in my book it is
not atomic.

>> What is not said is that having atomic assignment is 
>> not useful on its own
>
> Sorry, but atomic assignment is absolutely critical.  Without it, all
> pointer updates and retrievals would have to be done under storage lock
> - and that would be prohibitively expensive.

But atomic assignment is like saying the grass is green and the sky is
blue.  There is no useful point to make here; it is a given that memory
load/store operations using native bit-width operations on any CPU are
atomic, since you are either loading or storing, not both.  It's only
when you start talking about read-modify-write operations that the word
atomic carries any significant meaning.


So yes, in the computing world atomic assignment is absolutely critical,
but so is register addition.

>> single operations are implicitly atomic
>
> Only when properly aligned and supported by the memory system.  Since
> the lowest common denominator of memory access these days is a byte, you
> cannot assume that anything larger is atomic without reference to the
> specs for the platform you happen to be running on.  (And there are a
> lot of platforms that are not IA32 - what's in your cell phone?)

Are you saying that there are some CPUs that support unaligned access to
memory (like x86) that do not have atomic writes to memory?  Please cite
references on this point; it interests me greatly.


I was under the impression that such accesses by the CPU to the memory
bus were atomic, because the CPU acquired the memory bus, then did one
or more writes in burst before releasing the memory bus.  No other CPU
can get in under this access, so even an unaligned load or store is
atomic.


Your reply above 

RE: [OT] Performance tricks with multiple tomcat instances

2006-03-14 Thread Caldarale, Charles R
> From: Darryl L. Miles [mailto:[EMAIL PROTECTED] 
> Subject: Re: Performance tricks with multiple tomcat instances
> 
> I would hope that JIT compiler engineers would as a minimum 
> implicitly add the "lock" instruction before all operations 
> on primitive types declared volatile that are in the platforms
> native data width (32bit on i386, 32&64bit on i386_64. 64bit
> on ia64).

Not done, since the Java/JVM specs don't require atomic operations
except for assignments.  If you want to ensure that an increment is
atomic, you must surround it with a synchronized block.

Volatile does not imply atomic.  Please read the specs.
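A minimal sketch of the difference Chuck describes, using Java 8+ lambda syntax. All names are hypothetical. Several threads bump two counters. The volatile one is always visible, but its "++" is still a read-modify-write, so updates can be lost on a multi-core machine. The one incremented inside a synchronized block always ends at the expected total.

```java
/**
 * Hypothetical demo: volatile gives visibility, not atomicity.
 * Each thread bumps two counters; only the lock-protected one is reliable.
 */
public class CounterDemo {
    static volatile int volatileCount;       // ++ is still read-modify-write
    static int lockedCount;                  // guarded by LOCK
    static final Object LOCK = new Object();

    static void hammer(int threads, int perThread) {
        volatileCount = 0;
        lockedCount = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCount++;         // NOT atomic: concurrent updates can be lost
                    synchronized (LOCK) {
                        lockedCount++;       // atomic with respect to other LOCK holders
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        hammer(4, 100_000);
        System.out.println("volatile:     " + volatileCount); // may be below 400000
        System.out.println("synchronized: " + lockedCount);   // always 400000
    }
}
```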

> So maybe the JVM specification should also adopt a new 
> keyword for a specific atomic integer data type and another
> one to mark accesses to it that must be atomic.

Very unlikely to happen, since the synchronized block covers the
required semantics.  Modern implementations of synchronized are very
fast.

> so if you want to write out a 64bit value you have to make 
> two memory write cycles and this is done with two assembler
> instructions, the i386 processor does not  provide any
> hardware support to make that atomic, so you have to make
> that synchronization / lock happen in software there is no
> 2 ways about this).

Actually, that's not true anymore.  Most newer chips have a 64-bit data
path, and properly aligned 64-bit loads/stores are implicitly atomic -
but there's no guarantee.  There's also a compare-and-exchange-8-bytes
in the current IA32 instruction set, and that can be made atomic with
the lock prefix.
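Java does not expose the IA32 8-byte compare-and-exchange directly, but two portable pieces cover the same ground. First, the Java Language Specification (section 17.7) allows a non-volatile long or double to be written as two 32-bit halves, while reads and writes of volatile long and double are atomic. Second, AtomicLong.compareAndSet offers a portable 64-bit CAS, which a JVM may map onto a native compare-and-exchange where the CPU has one. The sketch below assumes Java 5+ APIs, and the updateMax helper and field names are hypothetical. It shows the usual CAS retry loop.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LongUpdates {
    // A plain long may be written as two 32-bit halves (JLS 17.7);
    // declaring it volatile makes each individual read and write atomic.
    static volatile long lastTimestamp;

    /** Hypothetical helper: raise target to at least candidate using CAS. */
    static long updateMax(AtomicLong target, long candidate) {
        while (true) {
            long current = target.get();
            if (candidate <= current) {
                return current;                       // nothing to change
            }
            if (target.compareAndSet(current, candidate)) {
                return candidate;                     // our 64-bit swap succeeded
            }
            // another thread changed the value between get() and CAS; retry
        }
    }

    public static void main(String[] args) {
        AtomicLong max = new AtomicLong(10L);
        System.out.println(updateMax(max, 5L));   // prints 10
        System.out.println(updateMax(max, 42L));  // prints 42
        lastTimestamp = 1L << 40;                 // one atomic 64-bit write
        System.out.println(lastTimestamp);
    }
}
```

On Java 8 and later, max.accumulateAndGet(candidate, Math::max) performs the same retry loop internally.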

> What is not said is that having atomic assignment is 
> not useful on its own

Sorry, but atomic assignment is absolutely critical.  Without it, all
pointer updates and retrievals would have to be done under storage lock
- and that would be prohibitively expensive.
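Here is a minimal sketch of why atomic reference assignment matters. The JLS guarantees that reads and writes of references are always atomic. Because of that, readers can follow a shared pointer without taking any lock, and they see either the old object or the new one, never a half-written pointer. The ConfigHolder class and its key names are hypothetical. The copy-on-write pattern shown is one common way to exploit this guarantee, not the only one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical copy-on-write holder: lock-free reads via atomic reference swaps. */
public class ConfigHolder {
    // Reference reads/writes are atomic; volatile makes the new map visible.
    private static volatile Map<String, String> config = Collections.emptyMap();

    /** Readers take no lock: a single atomic read of the reference. */
    static String lookup(String key) {
        return config.get(key);
    }

    /** Writers build a fresh copy and publish it with one assignment. */
    static synchronized void put(String key, String value) {
        // synchronized only serialises writers against each other
        Map<String, String> copy = new HashMap<>(config);
        copy.put(key, value);
        config = Collections.unmodifiableMap(copy);
    }
}
```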

> single operations are implicitly atomic

Only when properly aligned and supported by the memory system.  Since
the lowest common denominator of memory access these days is a byte, you
cannot assume that anything larger is atomic without reference to the
specs for the platform you happen to be running on.  (And there are a
lot of platforms that are not IA32 - what's in your cell phone?)

> A nice clean way to do this in Java would be with a 
> class/method that the JIT would replace with optimized 
> versions; this would keep down the keyword pollution that my 
> naive example above incites.

Already done - see the java.util.concurrent.atomic package in JRE 5.
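Here is a minimal sketch of the java.util.concurrent.atomic approach applied to the count++ problem from this thread. AtomicInteger.incrementAndGet() and get() are real API, available since JRE 5. The HitCounter class and runThreads helper are hypothetical, and the lambda syntax requires Java 8 or later. The increment is a single atomic read-modify-write, so no updates are lost and no synchronized block is needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical counter built on AtomicInteger. */
public class HitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    /** Atomic read-modify-write: no lost updates, no explicit lock. */
    public int record() {
        return hits.incrementAndGet();
    }

    public int total() {
        return hits.get();
    }

    /** Runs several threads that each record perThread hits; returns the total. */
    public static int runThreads(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.record();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 100_000)); // prints 400000
    }
}
```

AtomicLong provides the same operations for 64-bit values, which covers the "portable single 32bit or 64bit integer type" asked for earlier in the thread.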

 - Chuck

