[jira] [Comment Edited] (OAK-6789) RDB: RevisionGC performance on Oracle

Julian Reschke (JIRA) Fri, 06 Oct 2017 04:52:22 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194484#comment-16194484
 ]


Julian Reschke edited comment on OAK-6789 at 10/6/17 11:50 AM:
---------------------------------------------------------------

Further tests show that it's indeed the condition on MODIFIED that triggers the 
behavior.

Removing it (breaking the API contract...) fixes the performance problem. So 
does adding an index on MODIFIED:

{noformat}
------------------------------------------------------------------------------------------------------

| Id  | Operation                             | Name         | Rows  | Bytes | 
Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT                      |              |     1 |   122 |  
20   (0)| 00:00:01 |

|   1 |  DELETE                               | NODES        |       |       |  
        |          |

|   2 |   INLIST ITERATOR                     |              |       |       |  
        |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NODES        |     1 |   122 |  
20   (0)| 00:00:01 |

|*  4 |     INDEX RANGE SCAN                  | MODIFIED_IDX |     8 |       |  
18   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11' AND

              "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AND "M
ODIFIED"=13 OR "ID"='x14'

              AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "ID"='x16' AN
D "MODIFIED"=16 OR

              "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='
x4' AND "MODIFIED"=4 OR

              "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='
x7' AND "MODIFIED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9)
   4 - access("MODIFIED"=1 OR "MODIFIED"=10 OR "MODIFIED"=11 OR "MODIFIED"=12 OR


              "MODIFIED"=13 OR "MODIFIED"=14 OR "MODIFIED"=15 OR "MODIFIED"=16 O
R "MODIFIED"=2 OR

              "MODIFIED"=3 OR "MODIFIED"=4 OR "MODIFIED"=5 OR "MODIFIED"=6 OR "M
ODIFIED"=7 OR "MODIFIED"=8

              OR "MODIFIED"=9)
{noformat}

Note that this switches step 4 to an index scan on MODIFIED_IDX rather than the 
primary key index. This fixes the performance in the test case, but might be 
undesirable in large repos where many rows have the same modified date.

It would be interesting to understand why Oracle prefers the MODIFIED_IDX over 
the primrary key index, when present. FWWI; it would be possible to add a hint 
to the query to force Oracle to use a specific index (not tried over JDBC, 
through).

Finally for completeness: the plan when we have a composite index on ID and 
MODIFIED:

{noformat}
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| 
Time     |

--------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT   |                 |     1 |   122 |    19   (0)| 
00:00:01 |

|   1 |  DELETE            | NODES           |       |       |            |     
|

|   2 |   INLIST ITERATOR  |                 |       |       |            |     
|

|*  3 |    INDEX RANGE SCAN| ID_MODIFIED_IDX |     1 |   122 |    19   (0)| 
00:00:01 |

--------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR
              "ID"='x11' AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "I
D"='x13'

              AND "MODIFIED"=13 OR "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AN
D

              "MODIFIED"=15 OR "ID"='x16' AND "MODIFIED"=16 OR "ID"='x2' AND "MO
DIFIED"=2

              OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID
"='x5' AND

              "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIF
IED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9))
{noformat}

which allows to process the whole condition with a single index lookup, but 
might have undesirable effects elsewhere.



was (Author: reschke):
Further tests show that it's indeed the condition on MODIFIED that triggers the 
behavior.

Removing it (breaking the API contract...) fixes the performance problem. So 
does adding an index on MODIFIED:

{noformat}
------------------------------------------------------------------------------------------------------

| Id  | Operation                             | Name         | Rows  | Bytes | 
Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT                      |              |     1 |   122 |  
20   (0)| 00:00:01 |

|   1 |  DELETE                               | NODES        |       |       |  
        |          |

|   2 |   INLIST ITERATOR                     |              |       |       |  
        |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NODES        |     1 |   122 |  
20   (0)| 00:00:01 |

|*  4 |     INDEX RANGE SCAN                  | MODIFIED_IDX |     8 |       |  
18   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11' AND

              "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AND "M
ODIFIED"=13 OR "ID"='x14'

              AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "ID"='x16' AN
D "MODIFIED"=16 OR

              "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='
x4' AND "MODIFIED"=4 OR

              "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='
x7' AND "MODIFIED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9)
   4 - access("MODIFIED"=1 OR "MODIFIED"=10 OR "MODIFIED"=11 OR "MODIFIED"=12 OR


              "MODIFIED"=13 OR "MODIFIED"=14 OR "MODIFIED"=15 OR "MODIFIED"=16 O
R "MODIFIED"=2 OR

              "MODIFIED"=3 OR "MODIFIED"=4 OR "MODIFIED"=5 OR "MODIFIED"=6 OR "M
ODIFIED"=7 OR "MODIFIED"=8

              OR "MODIFIED"=9)
{noformat}

Note that this switches step 4 to an index scan on MODIFIED_IDX rather than the 
primary key index. This fixes the performance in the test case, but might be 
undesirable in large repos where many rows have the same modified date.

It would be interesting to understand why Oracle prefers the MODIFIED_IDX over 
the primrary key index, when present. FWWI; it would be possible to add a hint 
to the query to force Oracle to use a specific index (not tried over JDBC, 
through).

Finally for completeness: the plan when we have a composite index on ID and 
MODIFIED:

{noformat}
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| 
Time     |

--------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT   |                 |     1 |   122 |    19   (0)| 
00:00:01 |

|   1 |  DELETE            | NODES           |       |       |            |     
|

|   2 |   INLIST ITERATOR  |                 |       |       |            |     
|

|*  3 |    INDEX RANGE SCAN| ID_MODIFIED_IDX |     1 |   122 |    19   (0)| 
00:00:01 |

--------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR
              "ID"='x11' AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "I
D"='x13'

              AND "MODIFIED"=13 OR "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AN
D

              "MODIFIED"=15 OR "ID"='x16' AND "MODIFIED"=16 OR "ID"='x2' AND "MO
DIFIED"=2

              OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID
"='x5' AND

              "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIF
IED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9))
{noformat}

which allows to process the whole condition with a single index lookup, but 
might hav undesirable effects elsewhere.


> RDB: RevisionGC performance on Oracle
> -------------------------------------
>
>                 Key: OAK-6789
>                 URL: https://issues.apache.org/jira/browse/OAK-6789
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>
> In RevisionGC on Oracle, performance of deletes is bad. Using RevisionGCTest:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, 
> deletedDocGCCount=92092 (of which leaf: 92000), updateResurrectedGCCount=0, 
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, 
> timeActive=7.429 min, timeToCollectDeletedDocs=1394 ms, 
> timeToCheckDeletedDocs=250.4 ms, timeToSortDocIds=374.0 us, 
> timeTakenToUpdateResurrectedDocs=17.00 us, timeTakenToDeleteDeletedDocs=7.399 
> min, timeTakenToCollectAndDeleteSplitDocs=97.95 ms}
> Performed RevisionGC in 7,429 min
> {noformat}
> Compared with DB2:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, 
> deletedDocGCCount=96096 (of which leaf: 96000), updateResurrectedGCCount=0, 
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, 
> timeActive=8.240 s, timeToCollectDeletedDocs=1780 ms, 
> timeToCheckDeletedDocs=259.7 ms, timeToSortDocIds=237.0 us, 
> timeTakenToUpdateResurrectedDocs=19.00 us, timeTakenToDeleteDeletedDocs=4.552 
> s, timeTakenToCollectAndDeleteSplitDocs=685.4 ms}
> Performed RevisionGC in 8,243 s
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Comment Edited] (OAK-6789) RDB: RevisionGC performance on Oracle

Reply via email to