If immutable objects are being created and discarded frequently, the
builder and its immutable implementations need to live in the same
package, so they can pass references to internal state via
package-private methods. Neither the builder nor the immutable
implementations may let internal state escape, and the builder can
replace its own preset internal objects from client calls, making
defensive copies where required. The builder also needs access to a
protected static method on the immutable object's class that is used
during construction to generate the hashcode (which is final); it uses
this to determine whether an equal object already exists before
creating a new one. It must then be able to pass the objects to a
package-private method on the immutable object, taking the same
parameters that would normally go to a constructor, to determine
whether the pooled object is suitable. All constructors on the
immutable objects are package private.
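Roughly, the shape I have in mind looks like this (the names are
invented and the pool lookup itself is elided; it's only a sketch of
the access arrangement, not the real PermissionGrant code):

    final class Grant {
        private final String principal;
        private final int hash;

        // Package private: only the builder in this package constructs
        // instances, and the hashcode is computed once and is final.
        Grant(String principal) {
            this.principal = principal;
            this.hash = hash(principal);
        }

        // Static hash routine the builder can call *before* construction,
        // to see whether an equal object might already be pooled.
        static int hash(String principal) {
            return principal.hashCode();
        }

        // Package-private check taking the same parameters a constructor
        // would: is this pooled instance suitable in place of a new one?
        boolean matches(String principal) {
            return this.principal.equals(principal);
        }

        @Override
        public int hashCode() { return hash; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Grant && ((Grant) o).matches(principal);
        }
    }

    final class GrantBuilder {
        private String principal;

        GrantBuilder principal(String p) { this.principal = p; return this; }

        Grant build() {
            // Pool lookup omitted: the idea is to compute
            // Grant.hash(principal), fetch any pooled instance under that
            // hash, and call its package-private matches(principal) before
            // falling back to constructing a new Grant.
            return new Grant(principal);
        }
    }

Because GrantBuilder sits in the same package, it can call hash()
before construction and matches() on a pooled instance, without
anything public exposing the internal state.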
Then, if you're really keen and the immutable object needs to be
serialized, you can instead serialize the builder, which builds an
identical immutable object on the client platform. You must be careful
to defensively copy objects during unmarshalling; that way the
internal state of the immutable object is never published.
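As a hedged sketch (again with invented names), that ends up looking a
lot like the serialization proxy idiom: the immutable object writes
its builder to the stream, and the builder rebuilds an equivalent
immutable object on the receiving side, copying mutable state
defensively on the way in and on the way out:

    import java.io.InvalidObjectException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;
    import java.util.Date;

    final class Span implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;   // mutable type, so copies matter

        Span(Date start) {
            this.start = new Date(start.getTime());
        }

        Date start() {
            return new Date(start.getTime()); // never hand out internal state
        }

        // Serialize the builder in place of this object.
        private Object writeReplace() {
            return new SpanBuilder().start(start());
        }

        // Refuse direct deserialization of the immutable class.
        private void readObject(ObjectInputStream in)
                throws InvalidObjectException {
            throw new InvalidObjectException("Use SpanBuilder");
        }
    }

    final class SpanBuilder implements Serializable {
        private static final long serialVersionUID = 1L;
        private Date start;

        SpanBuilder start(Date d) {
            this.start = new Date(d.getTime()); // defensive copy on the way in
            return this;
        }

        Span build() {
            return new Span(start);
        }

        // After unmarshalling, build a fresh immutable object; Span's
        // constructor copies again, so tampered stream state can't escape.
        private Object readResolve() {
            return build();
        }
    }

Because the immutable class refuses to be deserialized directly, a
forged stream can't bypass the builder's defensive copies.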
If this seems like a lot of trouble to go to for immutability, it is, so
it has to be something that will experience widespread use.
I've implemented something similar for PermissionGrant, but without
the pooling or serialization; I don't think we'll have enough
PermissionGrants to make the complexity of pooling worthwhile. There
are five classes implementing PermissionGrant in this case.
Cheers,
Peter.
Peter Firmstone wrote:
Actually, a problem I have is that I don't have access to the
resources required to performance test some of these implementations
as they would be stressed in a massive cluster situation.
The other assumption I make is that something that performs well
today might not tomorrow, due to the multi-core revolution we have on
our hands. This may turn out to be a flawed assumption.
Generally, how I like to code (and this isn't related to your
situation) is: if it makes sense to do so, I make immutable-object
builders / factories that are not threadsafe, and I provide a method
on the immutable object for getting a new builder instance with the
state of the immutable object pre-set, which I can modify before
building a replacement immutable object.
I might use one builder to generate many immutable objects; the
builder object is accessed by only one thread.
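For example, something along these lines (made-up names, just a
sketch):

    final class Config {
        private final String host;
        private final int port;

        private Config(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // Builder pre-set with this instance's state; modify it, then
        // build a replacement immutable object.
        Builder asBuilder() {
            return new Builder(host, port);
        }

        // Not thread safe: intended to be confined to a single thread.
        static final class Builder {
            private String host = "localhost";
            private int port = 0;

            Builder() {}

            private Builder(String host, int port) {
                this.host = host;
                this.port = port;
            }

            Builder host(String h) { this.host = h; return this; }
            Builder port(int p)    { this.port = p; return this; }

            Config build() { return new Config(host, port); }
        }
    }

A replacement is then just
Config next = current.asBuilder().port(8080).build(); with the
builder confined to the calling thread.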
The builder might internally utilise a static concurrent weak-reference
hash pool of immutable objects. Because it knows the hashcode generator
the immutable object uses, it can pool the immutable objects, saving
memory; or it might create an immutable object, look up its hashcode in
the pool, find a duplicate, and if equals() matches, discard the new
object and return the pool copy. Pooling also speeds up equals(), since
equal objects usually end up being the same reference.
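A minimal sketch of the pool side of that, assuming the pooled type
has sensible hashCode() and equals() (names invented):

    import java.lang.ref.WeakReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Weak-reference pool of immutable objects, keyed by hashcode.
    final class WeakPool<T> {
        private final ConcurrentMap<Integer, WeakReference<T>> pool =
                new ConcurrentHashMap<Integer, WeakReference<T>>();

        // Create the candidate first, then look it up; if an equal object
        // is already pooled, discard the candidate and return the pooled
        // copy, otherwise pool the candidate.
        T intern(T candidate) {
            Integer key = Integer.valueOf(candidate.hashCode());
            WeakReference<T> ref = pool.get(key);
            T pooled = (ref == null) ? null : ref.get();
            if (pooled != null && pooled.equals(candidate)) {
                return pooled;
            }
            pool.put(key, new WeakReference<T>(candidate));
            return candidate;
        }
    }

A hash collision or a cleared WeakReference simply falls back to
keeping the new object; a real version would also purge cleared
entries from the map.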
The immutable objects then get used everywhere, without concern for
thread synchronization. These work well with AtomicReferences where
the new state depends on the old.
The immutability of the object could be easily abused by reflection,
but you can't be expected to protect against that! The immutable
object might be a container that holds some mutable objects that are
now effectively immutable.
The immutable object can be represented by an interface, because the
client doesn't depend on a constructor, in which case you can
internally have any number of polymorphic implementations, which all
appear as a single type to the client, giving a very compact API. The
pooling offsets memory consumption for immutable objects.
Cheers,
Peter.
Peter Firmstone wrote:
I have a similar mindset to Gregg: memory and disk are relatively
inexpensive these days. If I can avoid locks by using atomic
operations and immutable objects or concurrent utilities, I'm happy,
since it's one less possible deadlock or livelock bug I haven't
thought about.
If updated state doesn't depend on previous state, I'll go for an
immutable object behind a volatile reference. If the object is not
immutable but can be defensively copied, I copy it before updating
the volatile reference and defensively copy it again before
returning it to a caller.
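For instance, with a mutable java.util.Date behind a volatile
reference (just a sketch):

    import java.util.Date;

    final class LastSeen {
        private volatile Date lastSeen = new Date(0L);

        void update(Date when) {
            lastSeen = new Date(when.getTime());   // copy before publishing
        }

        Date get() {
            return new Date(lastSeen.getTime());   // copy before returning
        }
    }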
If updated state depends on previous state, I might use an immutable
object with an AtomicReference, where the update is only applied when
no other update has arrived in the interim. If I can, I try to make
objects effectively immutable, with defensive copying.
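The usual shape of that, as a sketch, is a compareAndSet retry loop,
here with a boxed Integer standing in for the immutable state:

    import java.util.concurrent.atomic.AtomicReference;

    final class Counter {
        private final AtomicReference<Integer> value =
                new AtomicReference<Integer>(Integer.valueOf(0));

        int increment() {
            while (true) {
                Integer old = value.get();
                Integer next = Integer.valueOf(old.intValue() + 1);
                if (value.compareAndSet(old, next)) {
                    return next.intValue();   // our update won
                }
                // Another update arrived in the interim; recompute from
                // the new state and try again.
            }
        }
    }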
If internal accessor methods don't need to concern themselves with a
reference update during a routine, I copy an object's reference into
a local rather than synchronize on it; the copy will still refer to
the old object when the volatile reference is updated. If the routine
is in a loop and I want to restart it when the reference is updated,
I'll use while (a == b) (or something similar), where b is a local
copy that keeps referring to the object a originally pointed at until
a is changed.
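A sketch of both habits (invented names): the first method works
against a snapshot of the reference, the second restarts when the
reference changes under it:

    import java.util.List;

    final class SnapshotUser {
        private volatile List<String> items;

        SnapshotUser(List<String> items) {
            this.items = items;
        }

        // Read the volatile reference once into a local; the routine keeps
        // working on that snapshot even if 'items' is replaced meanwhile.
        int countNonEmpty() {
            List<String> snapshot = items;
            int n = 0;
            for (String s : snapshot) {
                if (s.length() > 0) {
                    n++;
                }
            }
            return n;
        }

        // while (a == b): keep working while the field still refers to the
        // snapshot; if it's replaced mid-routine, restart with the new list.
        int sumLengths() {
            for (;;) {
                List<String> snapshot = items;
                int total = 0;
                int i = 0;
                while (snapshot == items && i < snapshot.size()) {
                    total += snapshot.get(i++).length();
                }
                if (snapshot == items) {
                    return total;   // finished against a stable reference
                }
                // reference was updated; loop around and restart
            }
        }
    }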
I try to keep synchronized blocks as small as possible, not so much
for performance as to avoid bugs, and not even necessarily my own
bugs but client code concurrency bugs. In a synchronized block, I
don't call objects which may be accessible from outside the object
I'm calling from. State that needs to be updated atomically I group
together under the same lock, and I also consider a ReadWriteLock if
reads will outnumber writes. If multiple objects must be updated
atomically, I might group them together into an encapsulating object
with the methods I need to make the update atomic. This is better
than holding multiple locks.
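For example (a sketch, invented names), two fields that must move
together behind a single ReadWriteLock rather than two separate
locks:

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    final class Window {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private long start;
        private long end;

        Window(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // Both fields move together under the single write lock.
        void shift(long delta) {
            lock.writeLock().lock();
            try {
                start += delta;
                end += delta;
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Readers never observe a half-shifted window.
        long length() {
            lock.readLock().lock();
            try {
                return end - start;
            } finally {
                lock.readLock().unlock();
            }
        }
    }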
On some occasions I find a simple class that isn't threadsafe at all
is the best approach, letting something else handle the concurrency
or ensuring it's only used by one thread.
For me it basically comes down to avoiding bugs first, followed by
scale.
Obviously memory consumption can be an impediment to scale, so there
are occasions where this is the wrong approach, but it's a
generalisation, to be taken with a grain of salt.
If memory is an issue, there usually isn't much concurrency to be
had; if that's the case, then good old-fashioned synchronization, or
none at all, might be the best way to go.
In that case, I might consider an interface, and separate
implementations for different platforms, one for memory, the other
for concurrency.
It's true that concurrency is harder; people often forget to check
the return value of putIfAbsent on ConcurrentMap, for example.
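The classic slip is ignoring the return value, because putIfAbsent
hands back the existing mapping (or null), not the value you passed
in, e.g.:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    final class Registry {
        private final ConcurrentMap<String, StringBuilder> map =
                new ConcurrentHashMap<String, StringBuilder>();

        StringBuilder bufferFor(String key) {
            StringBuilder fresh = new StringBuilder();
            StringBuilder existing = map.putIfAbsent(key, fresh);
            // If another thread won the race, use its buffer, not ours.
            return (existing != null) ? existing : fresh;
        }
    }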
Horses for courses, I suppose; everyone has their style, and you
don't have to adopt mine. I'm just happy to have some help. There's
plenty of code in River that uses synchronized and has no issues. You
probably have enough experience to avoid the locking bugs by now, and
I'm happy with your approach. It's probably more performant than
mine ;)
Some concurrency utilities can chew some memory.
Maybe it's a reflection of my debugging abilities ;)
Cheers,
Peter.
Patricia Shanahan wrote:
On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
...
When I write code of this nature, attempting to remove all
contention, I try to list every "step" that changes the "view" of the
world, and think about how that "view" can be made atomic by using
explicit ordering of statements rather than synchronized{} blocks. ...
I would like to discuss how to approach performance improvement, and
especially scaling improvement. We seem to have different
philosophies, and I'm interested in understanding other people's
approaches to programming.
I try to first find the really big wins, which are almost always
data structure and algorithm changes. That should result in code
that is efficient in terms of total CPU time and memory. During that
part of the process, I prefer to keep the concurrency design as
simple as possible, which in Java often means using synchronization
at a coarse level, such as synchronization on a TaskManager instance.
Once that is done, I review the performance. If it is fast and
scalable I stop there. If that is not the case, I look for the
bottlenecks, and consider whether parallelism, or some other
strategy, will best improve them. Any increase in concurrency
complication has to be justified by a demonstrated improvement in
performance.
My big picture objective is to find the simplest implementation that
meets the performance requirements (or cannot reasonably be made
significantly faster, if the requirement is just "make it fast"). I
value simplicity in concurrency design over simplicity in data
structures or algorithms for two reasons:
1. Making the code more parallel does nothing to reduce the total
resources it uses. Better algorithms, on the other hand, can
significantly reduce total resources.
2. Reasoning about data structures and algorithms is generally
easier than reasoning about concurrency.
It sounds as though you are advocating almost the opposite approach
- aim for maximum concurrency from the start, without analysis or
measurement to see what it gains, or even having a baseline
implementation for comparison. Is that accurate? If so, could you
explain the thinking and objectives behind your approach? Or maybe
I'm misunderstanding, and you can clarify a bit?
Thanks,
Patricia