[
https://issues.apache.org/jira/browse/OAK-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575320#comment-17575320
]
Stefan Egli commented on OAK-9880:
----------------------------------
An alternative approach could be to use the lowest {{_sdMaxRevTime}} (of all
sweepRevs and {{oldestRevTimeStamp}}). The effect is that rgc has to wait
longer until it can clean up garbage, and then likely does so in larger
chunks (which has its own performance issues).
So with the initial example it would look like this:
{noformat}
{
  "_sdType" : 70,
  "_sdMaxRevTime" : {
    "$lt" : NumberLong(1601010101)
  },
  "$or" : [
    {
      "_id" : /.*-1\/0/
    },
    {
      "_id" : /[^-]*/,
      "_path" : /.*-1\/0/
    },
    {
      "_id" : /.*-2\/0/
    },
    {
      "_id" : /[^-]*/,
      "_path" : /.*-2\/0/
    }
  ]
}
{noformat}
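
As a rough illustration of the alternative above, the following Python sketch builds such a single query from the lowest timestamp. It is not the Oak implementation (which is Java, in MongoVersionGCSupport); the function name {{lowest_cutoff_query}}, the {{sweep_rev_times}} mapping, and the use of {{$regex}} strings instead of shell regex literals are assumptions made for this example only.

```python
def lowest_cutoff_query(sweep_rev_times, oldest_rev_timestamp, sd_type=70):
    """Build one query that applies the lowest cutoff to all clusterNodeIds.

    sweep_rev_times is a hypothetical mapping of clusterNodeId to the
    sweep revision timestamp of that clusterNodeId.
    """
    # The lowest of all candidate timestamps satisfies every individual bound.
    cutoff = min([oldest_rev_timestamp, *sweep_rev_times.values()])

    id_or_path = []
    for cluster_id in sorted(sweep_rev_times):
        # Same pattern as the shell literal /.*-<clusterId>\/0/.
        suffix = {"$regex": ".*-%d/0" % cluster_id}
        id_or_path.append({"_id": suffix})
        id_or_path.append({"_id": {"$regex": "[^-]*"}, "_path": suffix})

    return {
        "_sdType": sd_type,
        "_sdMaxRevTime": {"$lt": cutoff},
        "$or": id_or_path,
    }


if __name__ == "__main__":
    # Timestamps taken from the initial example in the issue description.
    print(lowest_cutoff_query({1: 1602020202, 2: 1601010101}, 1603030303))
```

With the timestamps of the initial example, the single cutoff is the sweep revision time of clusterNodeId 2, so garbage of clusterNodeId 1 between the two sweep times is only picked up by a later run.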
> Simplify rgc query
> ------------------
>
> Key: OAK-9880
> URL: https://issues.apache.org/jira/browse/OAK-9880
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: mongomk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
>
> We have again seen long-running rgc *remove* operations, similar to
> what was described in OAK-8351.
> This time it happens with the query generated by
> [queryForDefaultNoBranch|https://github.com/apache/jackrabbit-oak/blob/99b250a05ffe490f66de67374125fabee17f6fda/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoVersionGCSupport.java#L213-L242]
> with a query shape similar to, for example:
> {noformat}
> {
>   "_sdType" : 70,
>   "_sdMaxRevTime" : {
>     "$lt" : NumberLong(1603030303)
>   },
>   "$or" : [
>     {
>       "$or" : [
>         {
>           "_id" : /.*-1\/0/
>         },
>         {
>           "_id" : /[^-]*/,
>           "_path" : /.*-1\/0/
>         }
>       ],
>       "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1602020202)
>       }
>     },
>     {
>       "$or" : [
>         {
>           "_id" : /.*-2\/0/
>         },
>         {
>           "_id" : /[^-]*/,
>           "_path" : /.*-2\/0/
>         }
>       ],
>       "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1601010101)
>       }
>     }
>   ]
> }
> {noformat}
> While setting an index filter with the query plan in mongodb is one option,
> we could additionally look into simplifying the above query by splitting it
> into multiple queries, e.g. one query per clusterNodeId, and then
> simplifying the {{_sdMaxRevTime}} condition accordingly, so that the above
> would translate into the following 2 queries (in the hope that mongodb finds
> the optimal query plan):
> {noformat}
> {
>   "_sdType" : 70,
>   "_sdMaxRevTime" : {
>     "$lt" : NumberLong(1602020202)
>   },
>   "$or" : [
>     {
>       "_id" : /.*-1\/0/
>     },
>     {
>       "_id" : /[^-]*/,
>       "_path" : /.*-1\/0/
>     }
>   ]
> }
> {noformat}
> and
> {noformat}
> {
>   "_sdType" : 70,
>   "_sdMaxRevTime" : {
>     "$lt" : NumberLong(1601010101)
>   },
>   "$or" : [
>     {
>       "_id" : /.*-2\/0/
>     },
>     {
>       "_id" : /[^-]*/,
>       "_path" : /.*-2\/0/
>     }
>   ]
> }
> {noformat}
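
The per-clusterNodeId split described in the issue could be sketched roughly as follows in Python. This is only an illustration under assumptions, not the code in MongoVersionGCSupport: {{per_cluster_queries}} and the {{sweep_rev_times}} mapping are hypothetical names, and regex literals are written as {{$regex}} strings.

```python
def per_cluster_queries(sweep_rev_times, oldest_rev_timestamp, sd_type=70):
    """Return one simplified query per clusterNodeId.

    sweep_rev_times is a hypothetical mapping of clusterNodeId to the
    sweep revision timestamp of that clusterNodeId.
    """
    queries = []
    for cluster_id, sweep_time in sorted(sweep_rev_times.items()):
        # Same pattern as the shell literal /.*-<clusterId>\/0/.
        suffix = {"$regex": ".*-%d/0" % cluster_id}
        queries.append({
            "_sdType": sd_type,
            # The nested query required both "$lt" bounds to hold,
            # so the smaller of the two is sufficient on its own.
            "_sdMaxRevTime": {"$lt": min(sweep_time, oldest_rev_timestamp)},
            "$or": [
                {"_id": suffix},
                {"_id": {"$regex": "[^-]*"}, "_path": suffix},
            ],
        })
    return queries


if __name__ == "__main__":
    # Timestamps taken from the example query in the issue description.
    for query in per_cluster_queries({1: 1602020202, 2: 1601010101}, 1603030303):
        print(query)
```

Each resulting query has a single {{_sdMaxRevTime}} bound and a flat {{$or}}, which is the shape the issue hopes mongodb can plan more reliably than the nested form.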