[ 
https://issues.apache.org/jira/browse/OAK-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080494#comment-14080494
 ] 

Chetan Mehrotra commented on OAK-2007:
--------------------------------------

[~ppakulski] The purpose of {{getAllChunkIds}} is to get all the chunkIds, so it 
involves a full traversal of the blob collection. The lastModified condition 
is used to ignore recently created entries, which avoids a race condition 
where a chunk has been created but is not yet associated with any document in 
the {{nodes}} collection. In your case I believe the lastModified index holds 
fewer entries for the given condition, so Mongo picked it up, and that would be 
faster.
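
To illustrate the lastModified condition, here is a minimal in-memory sketch. 
It is not the actual {{MongoBlobStore}} code: the class name, the helper 
{{chunkIdsOlderThan}}, the map of chunk id to lastMod and the sample values are 
all hypothetical. It only shows the idea that chunks newer than a cutoff are 
skipped, so a chunk that is not yet referenced from {{nodes}} is not reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkIdFilterSketch {

    // Hypothetical helper (not Oak's API): returns the ids of all chunks whose
    // lastMod is at or before the cutoff, skipping more recent entries.
    public static List<String> chunkIdsOlderThan(Map<String, Long> lastModById,
                                                 long cutoffMillis) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastModById.entrySet()) {
            if (entry.getValue() <= cutoffMillis) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample data: chunk id -> lastMod in milliseconds.
        Map<String, Long> chunks = new LinkedHashMap<>();
        chunks.put("chunk-old", 1_000L);
        chunks.put("chunk-new", 9_500L);

        long now = 10_000L;
        long minAgeMillis = 2_000L; // assumed safety margin for the race
        System.out.println(chunkIdsOlderThan(chunks, now - minAgeMillis));
    }
}
```

In the real store the same filter would be a condition on the lastMod field of 
the blob collection, which is why the index that is used matters.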

As [~amitj_76] mentioned:

bq. I think the problem is that with the lastMod condition, the _id index does 
not cover the query any more and I am not sure how much an index on lastMod 
would help, because it would still not cover the query. It might be better to 
have a compound index of the form 

That should help. Given that blobIds are not modified often (the modified time 
is changed if the blob is referred to again, which should not be a frequent 
case), such an index should not cause much of an issue. We can also decrease 
the precision of modifiedTime (say, store it only up to minutes or hours), as 
long as we keep maxBlobGCTime greater than that precision. This should reduce 
the memory footprint of such an index.
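
A rough sketch of the reduced-precision idea, under my own assumptions: the 
class and method names below are hypothetical and not part of Oak. Timestamps 
are rounded down to a multiple of the chosen precision, so many entries share 
the same value, and a simple check confirms that the GC age stays greater than 
the precision.

```java
import java.util.concurrent.TimeUnit;

public class LastModPrecisionSketch {

    // Rounds a timestamp down to a multiple of the given precision.
    public static long roundDown(long timestampMillis, long precisionMillis) {
        if (precisionMillis <= 0) {
            throw new IllegalArgumentException("precision must be positive");
        }
        return timestampMillis - Math.floorMod(timestampMillis, precisionMillis);
    }

    // Rounding can make an entry look older by up to one precision unit, so
    // the GC age (assumed to be in milliseconds) must exceed the precision.
    public static boolean isPrecisionSafe(long maxBlobGcAgeMillis, long precisionMillis) {
        return maxBlobGcAgeMillis > precisionMillis;
    }

    public static void main(String[] args) {
        long minute = TimeUnit.MINUTES.toMillis(1);
        long hour = TimeUnit.HOURS.toMillis(1);
        long lastMod = 3_725_000L; // made-up sample value

        System.out.println(roundDown(lastMod, minute));
        System.out.println(roundDown(lastMod, hour));
        System.out.println(isPrecisionSafe(TimeUnit.HOURS.toMillis(24), hour));
    }
}
```

With hour precision, two updates within the same hour store the same value, 
which is where the smaller index footprint would come from.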






> MongoBlobStore improvements
> ---------------------------
>
>                 Key: OAK-2007
>                 URL: https://issues.apache.org/jira/browse/OAK-2007
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob, mongomk
>    Affects Versions: 1.0.2
>            Reporter: Przemo Pakulski
>         Attachments: OAK-2007.patch
>
>
> To collect all chunk identifiers (getAllChunkIds), a hint is used to force 
> "_id" index usage. 
> This index doesn't help because the query uses the "lastMod" field. The hint 
> also prevents the query from leveraging custom indexes, if any are created. 
> As a result, queries are pretty slow.
> Additionally, consider creating an index on the "lastMod" field to speed up 
> all queries that use this criterion.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
