[ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834272#action_12834272 ]
Michael Dürig commented on JCR-2498:
------------------------------------

Some more numbers demonstrating the effect with JCR-2498-poc.patch applied. The 'new/old time' row gives the quotients of the request times with the patch applied vs. without the patch applied. The 'new/old rts' row gives the quotients of the network round trips with the patch applied vs. without the patch applied.

The first measurement includes all operations (getItem, getNode, getProperty and refresh) as above.

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
new/old time: 0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.6, 1, 1, 1.1, 0.8
new/old rts:  2.1, 2.8, 1.8, 2.4, 1.8, 1.4, 1.3, 1.2, 1, 1.1, 1, 1, 0.9, 1, 0.9

Most obvious is the vast performance increase (up to a factor of 10) for reading items. However, this comes with an increase in the number of network round trips. Three things should be noted here:

1. For realistic batch sizes the increase in the number of network round trips is not significant.

2. The increase in the number of network round trips is caused by the refresh operations. In the test scenario the number of refresh operations is unrealistically high (every fourth operation is a refresh).

3. The items in the batches of the test case are not realistically distributed across the items of the repository. That is, the items are randomly chosen from the repository. In practice, however, the items in a batch would be related to each other by some locality criterion. I assume that this would further mitigate the observed effect.
For completeness' sake, here is the same measurement as above but without refresh operations:

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
new/old time: 0.2, 0, 0, 0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.6, 0.7, 1, 1, 1, 1.1
new/old rts:  1, 1, 0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1

> Implement caching mechanism for ItemInfo batches
> ------------------------------------------------
>
>         Key: JCR-2498
>         URL: https://issues.apache.org/jira/browse/JCR-2498
>     Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-jcr2spi, jackrabbit-spi
>    Reporter: Michael Dürig
>    Assignee: Michael Dürig
> Attachments: JCR-2498-poc.patch
>
> Currently all ItemInfos returned by RepositoryService#getItemInfos are placed into the hierarchy right away. For big batch sizes this is prohibitively expensive. The overhead is so great (*) that it quickly outweighs the overhead of network round trips. Moreover, SPI implementations usually choose the batch in a way determined by the backing persistence store and not by the requirements of the consuming application on the JCR side. That is, many of the items in the batch might never actually be needed.
> I suggest implementing a cache for ItemInfo batches. Conceptually such a cache would live inside jcr2spi, right above the SPI API. The actual implementation would be provided by SPI implementations. This approach allows for fine-tuning cache/batch sizes to a given persistence store and network environment. It would also better separate different concerns: the purpose of the existing item cache is to optimize for the requirements of the consumer of the JCR API ('the application'), whereas the new ItemInfo cache is to optimize for the specific network environment and backing persistence store.
> (*) Numbers follow

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
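The proposal above could be sketched roughly as follows. This is an illustration only, not the API of JCR-2498-poc.patch: the class name, the String-keyed put/get/clear methods, and the LRU eviction policy are all assumptions chosen for brevity. A real implementation would key on ItemId, cache actual ItemInfo instances, and honour refresh semantics, with the concrete cache supplied by the SPI implementation so its size can be tuned to the persistence store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed ItemInfo cache. jcr2spi would consult
// it before resolving an item through the SPI, instead of placing every
// ItemInfo from a getItemInfos batch into the hierarchy right away.
class ItemInfoCache {

    private final int maxSize;
    private final Map<String, String> cache;

    ItemInfoCache(final int maxSize) {
        this.maxSize = maxSize;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > ItemInfoCache.this.maxSize;
            }
        };
    }

    // Called when a batch arrives from RepositoryService#getItemInfos:
    // infos are cached rather than inserted into the hierarchy eagerly.
    void put(String path, String itemInfo) {
        cache.put(path, itemInfo);
    }

    // Returns the cached info, or null to signal that the caller must
    // fall back to a network round trip via the SPI.
    String get(String path) {
        return cache.get(path);
    }

    // A refresh invalidates the cache so stale infos are never served;
    // this is the source of the extra round trips measured above.
    void clear() {
        cache.clear();
    }

    int size() {
        return cache.size();
    }
}
```

Under this sketch, items in a batch that are never requested simply age out of the cache instead of paying the cost of hierarchy insertion, which is where the measured speed-up would come from.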