Michael Dürig commented on JCR-2498:
------------------------------------

Some more numbers demonstrating the effect of applying JCR-2498-poc.patch. The 
'new/old time' row gives the ratios of the request times with the patch applied 
to the request times without the patch. The 'new/old rts' row gives the 
corresponding ratios of the numbers of network round trips. 

The first measurement includes all operations (getItem, getNode, getProperty 
and refresh) as above. 

Batch size:    24340  12170  6085  3043  1521  761  380  190   95   48   24   12    6    3    1
new/old time:    0.1    0.1   0.1   0.1   0.2  0.3  0.4  0.5  0.5  0.7  0.6  1.0  1.0  1.1  0.8
new/old rts:     2.1    2.8   1.8   2.4   1.8  1.4  1.3  1.2  1.0  1.1  1.0  1.0  0.9  1.0  0.9

Most obvious is the vast performance increase (up to a factor of 10) for reading 
items. However, this comes at the cost of an increased number of network round 
trips. Three things should be noted here: 

1. For realistic batch sizes the increase in the number of network round trips 
   is not very significant. 
2. The additional round trips are caused by the refresh operations. In the test 
   scenario the number of refresh operations is unrealistically high (every 
   fourth operation is a refresh). 
3. The items in the batches of the test case are not realistically distributed 
   across the items of the repository: they are chosen at random. In practice 
   the items in a batch would be related to each other by some locality 
   criterion. I assume this would further mitigate the observed effect. 

For completeness' sake, here is the same measurement as above but without the 
refresh operations: 

Batch size:    24340  12170  6085  3043  1521  761  380  190   95   48   24   12    6    3    1
new/old time:    0.2    0.0   0.0   0.1   0.1  0.2  0.4  0.4  0.6  0.6  0.7  1.0  1.0  1.0  1.1
new/old rts:     1.0    1.0   0.9   0.9   0.8  0.9  0.9  0.9  0.9  1.0  1.0  1.0  1.0  1.0  1.0
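
For reference, a minimal sketch of what such a measurement loop could look like 
against the JCR API. The path list, the fixed random seed and the timing harness 
are illustrative assumptions, not the actual test code: 

    import java.util.List;
    import java.util.Random;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    /**
     * Sketch of the read benchmark: random item reads interleaved with
     * refresh calls such that every fourth operation is a refresh.
     */
    public class ReadBenchmark {

        /** Runs the operation mix over the given paths; returns elapsed milliseconds. */
        public static long run(Session session, List<String> paths) throws RepositoryException {
            Random random = new Random(42);        // fixed seed for repeatable runs
            long start = System.nanoTime();
            for (int i = 0; i < paths.size(); i++) {
                if (i % 4 == 3) {
                    session.refresh(false);        // every fourth operation is a refresh
                } else {
                    String path = paths.get(random.nextInt(paths.size()));
                    session.getItem(path);         // items randomly chosen from the repository
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

The ratios above would then be the result of run(...) against the patched stack 
divided by the result against the unpatched stack, once per batch size. 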


> Implement caching mechanism for ItemInfo batches
> ------------------------------------------------
>
>                 Key: JCR-2498
>                 URL: https://issues.apache.org/jira/browse/JCR-2498
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-jcr2spi, jackrabbit-spi
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>         Attachments: JCR-2498-poc.patch
>
>
> Currently all ItemInfos returned by RepositoryService#getItemInfos are placed 
> into the hierarchy right away. For big batch sizes this is prohibitively 
> expensive. The overhead is so great (*) that it quickly outweighs the 
> overhead of the network round trips. Moreover, SPI implementations usually 
> choose the batch in a way determined by the backing persistence store, not by 
> the requirements of the consuming application on the JCR side. That is, many 
> of the items in the batch might never actually be needed. 
> I suggest implementing a cache for ItemInfo batches. Conceptually such a 
> cache would live inside jcr2spi, right above the SPI API. The actual 
> implementation would be provided by SPI implementations. This approach allows 
> for fine-tuning cache/batch sizes to a given persistence store and network 
> environment. It would also better separate concerns: the purpose of the 
> existing item cache is to optimize for the requirements of the consumer 
> of the JCR API ('the application'), whereas the new ItemInfo cache optimizes 
> for the specific network environment and backing persistence store. 
> (*) Numbers follow 
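
To make the proposal more concrete, a rough sketch of what such an ItemInfo 
cache could look like. The interface name and method signatures are 
illustrative assumptions, not the API from JCR-2498-poc.patch: 

    import java.util.Iterator;

    import org.apache.jackrabbit.spi.ItemInfo;
    import org.apache.jackrabbit.spi.NodeId;
    import org.apache.jackrabbit.spi.NodeInfo;
    import org.apache.jackrabbit.spi.PropertyId;
    import org.apache.jackrabbit.spi.PropertyInfo;

    /**
     * Hypothetical cache for ItemInfo batches, living inside jcr2spi right
     * above the SPI. Concrete implementations would come from the SPI
     * implementation, which can tune cache/batch sizes to its persistence
     * store and network environment.
     */
    public interface ItemInfoCache {

        /** Caches a whole batch as returned by RepositoryService#getItemInfos. */
        void put(Iterator<? extends ItemInfo> batch);

        /** Returns the cached NodeInfo, or null on a cache miss. */
        NodeInfo getNodeInfo(NodeId id);

        /** Returns the cached PropertyInfo, or null on a cache miss. */
        PropertyInfo getPropertyInfo(PropertyId id);

        /** Discards all entries, e.g. when the session is refreshed. */
        void clear();
    }

With something like this in place, jcr2spi would consult the cache on a miss in 
its hierarchy before going back to the RepositoryService, and items from a 
batch would only be entered into the hierarchy when actually requested. 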
