> Sorry if I'm being daft. I waited a bit to see if other people would ask some 
> questions to help get my head around it. But no one took a bite, so I'm 
> having a go.

No worries, I am glad we are talking about it :)

Actually each queue will contain result lists (lists of DataRows), not 
individual objects. So yeah, the two proposals ((1) CayenneDataObject internal 
structure change and (2) concurrent selects) are generally unrelated. However, 
some implementations of (2) may potentially take advantage of (1).
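
To make that a bit more concrete, here is a rough Java sketch of such a 
per-context queue (class and method names are placeholders for illustration, 
not actual Cayenne API):

// Sketch only: each element is a whole select result (a list of DataRows),
// not an individual object. "ResultMergeQueue" and "MergeOp" are made-up names.
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.cayenne.DataRow;

class ResultMergeQueue {

    static class MergeOp {
        final String entityName;    // later used to compute worker "affinity"
        final List<DataRow> rows;   // raw rows fetched by one select

        MergeOp(String entityName, List<DataRow> rows) {
            this.entityName = entityName;
            this.rows = rows;
        }
    }

    // one such queue per DataContext
    private final BlockingQueue<MergeOp> queue = new LinkedBlockingQueue<MergeOp>();

    void offer(String entityName, List<DataRow> rows) {
        queue.offer(new MergeOp(entityName, rows));
    }

    MergeOp take() throws InterruptedException {
        return queue.take();
    }
}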

I started on (2) this weekend. So far I’ve implemented, in my local git repo, a 
separate DI module that allows users to create read-only DataContext 
subclasses. I also replaced all “synchronized” uses in the ObjectStore with 
ReentrantLock [1]. This resulted in more boilerplate code (lock / try / 
finally / unlock), but made it easy to turn all synchronization on and off in 
one place (rough sketch below). I guess the next step is experimenting with 
result processing queues.

[1] 
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html
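
For reference, the resulting pattern in the ObjectStore methods looks roughly 
like the sketch below (simplified; the class and method here are illustrative, 
not the actual ObjectStore code):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockingSketch {

    // the single place where synchronization can be switched on or off,
    // e.g. by substituting a no-op Lock implementation
    private final Lock lock = new ReentrantLock();

    // before: synchronized Object getNode(Object nodeId) { ... }
    Object getNode(Object nodeId) {
        lock.lock();
        try {
            // ... read from the internal map ...
            return null; // placeholder body
        }
        finally {
            lock.unlock();
        }
    }
}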

Andrus

On Nov 4, 2013, at 1:06 PM, Aristedes Maniatis <[email protected]> wrote:

> So then queries on the same table would be queued because you don't want to 
> return a mix of fresh and non-fresh data to the user in the same response. Is 
> that the problem you want to solve with object-level atomicity, and just 
> swapping out the Object[]?
> 
> With the queue approach, are you thinking that the queue is a list of every 
> object which has been fetched from the database and for which Cayenne has 
> already determined that the ObjectStore is out of date and needs updating? Or 
> just a list of every object fetched from the database, with freshness checking 
> happening as objects are taken off the queue for processing?
> 
> I'm still getting my head around your ideas, but there appear to be two 
> different things here:
> 
> 1. Swapping out the dataObject atomically to eliminate the lock on the 
> ObjectStore. This avoids holding the lock during the time it takes to update 
> the values in the objectMap. For example, here: synchronized ObjectDiff 
> registerDiff(Object nodeId, NodeDiff diff) {}. The code would then look like:
> 
> newObject = dataObject.clone();
> DataRowUtils.forceMergeWithSnapshot(context, descriptor, newObject, snapshot);
> dataObject = newObject;
> 
> Or something vaguely like that.
> 
> 
> 2. Creating a queue to allow a pool of workers to convert raw DataRows into 
> object properties, decide which records in the ObjectStore need updating, 
> create NodeDiff objects with those changes, etc.
> 
> 
> Sorry if I'm being daft. I waited a bit to see if other people would ask some 
> questions to help get my head around it. But no one took a bite, so I'm 
> having a go.
> 
> I'm not seeing how the two ideas relate to each other. They both seem 
> helpful, but they seem to solve different bottlenecks. What chaos would (1) 
> cause?
> 
> 
> Ari
> 
> 
> 
> On 4/11/2013 6:53pm, Andrus Adamchik wrote:
>> I am actually considering a read-only case here. So no modifications.
>> 
>> If the objects need to be modified, they have to be transferred to a peer 
>> ObjectContext using 'localObject'. Which sorta makes sense even now - 
>> contexts with local cache are often shared and hence de-facto have to be 
>> read-only, and contexts that track modifications are user- or request- or 
>> method- scoped.
>> 
>> A.
>> 
>> On Nov 4, 2013, at 10:42 AM, Aristedes Maniatis <[email protected]> wrote:
>> 
>>> On 26/10/2013 3:09am, Andrus Adamchik wrote:
>>> 
>>> 
>>>> 2. Queue based approach… Place each query result merge operation in an 
>>>> operation queue for a given DataContext. Polling end of the queue will 
>>>> categorize the operations by "affinity", and assign each op to a worker 
>>>> thread, selected from a thread pool based on the above "affinity". Ops 
>>>> that may potentially update the same objects are assigned to the same 
>>>> worker and are processed serially. Ops that have no chance of creating 
>>>> conflict between each other are assigned to separate workers and are 
>>>> processed in parallel. 
>>> 
>>> This queue needs to keep both SELECT and modify operations in some sort of 
>>> order? So let's imagine you get a queue like this:
>>> 
>>> 1. select table A
>>> 2. select table B
>>> 3. select table A
>>> 4. modify table B
>>> 5. select table B
>>> 6. select table A
>>> 
>>> Is the idea here that you would dispatch 1, 2, 3 and 6 to three worker 
>>> threads to be executed in parallel? But then 4 would be queued behind 2. And 
>>> 5 would also wait until 4 was complete.
>>> 
>>> Is that the idea?
>>> 
>>> 
>>> I can see some situations where this would result in worse behaviour than 
>>> we have now. If operations 1 and 3 were the same query, then today we get to 
>>> take advantage of a query cache.
>>> 
>>> 
>>> Am I getting the general idea right?
>>> 
>>> 
>>> Ari
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> -------------------------->
>>> Aristedes Maniatis
>>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>>> 
>> 
> 
> -- 
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> 
