[ 
https://issues.apache.org/jira/browse/OAK-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194472#comment-16194472
 ] 

Julian Reschke edited comment on OAK-6789 at 10/7/17 2:28 PM:
--------------------------------------------------------------

Further tests show that this is caused by the way we delete in batches, and the 
way Oracle handles these deletes.

The version garbage collection deletes in chunks of 450 documents, each 
identified by ID and with a condition on it's modification date (to prevent 
deletion of documents that have changed since the collection phase).

In {{RDBDocumentStore}} we break these down to chunks of 64 documents, and 
generate a single DELETE statement like this:
{noformat}
delete from nodes where id='x1' and modified=1 or id='x2' and modified=2  ...
{noformat}

Tests show that Oracle behaves sanely as long as we do not put more than 8 
conditions into a single statement. Checking the execution plan for 16 
conditions:
{noformat}
-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows  | Bytes | Cost 
(%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT              |               |     1 |   122 |    
25(0)| 00:00:01 |
|   1 |  DELETE                       | NODES         |       |       |   |     
     |
|   2 |   INLIST ITERATOR             |               |       |       |   |     
     |
|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    
25(0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     8 |       |    
19(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR 
"ID"='x11'

              AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' 
AND "MODIFIED"=13 OR

              "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR 
"ID"='x16' AND

              "MODIFIED"=16 OR "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND 
"MODIFIED"=3 OR "ID"='x4'

              AND "MODIFIED"=4 OR "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND 
"MODIFIED"=6 OR

              "ID"='x7' AND "MODIFIED"=7 OR "ID"='x8' AND "MODIFIED"=8 OR 
"ID"='x9' AND "MODIFIED"=9)

   4 - access("ID"='x1' OR "ID"='x10' OR "ID"='x11' OR "ID"='x12' OR "ID"='x13'
OR

              "ID"='x14' OR "ID"='x15' OR "ID"='x16' OR "ID"='x2' OR "ID"='x3' 
OR "ID"='x4' OR

              "ID"='x5' OR "ID"='x6' OR "ID"='x7' OR "ID"='x8' OR "ID"='x9')
{noformat}

where "SYS_C00175322" is the primary unique index on ID.

For 8 conditions:

{noformat}
-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows  | Bytes | Cost 
(%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT              |               |     1 |   122 |    
14(0)| 00:00:01 |
|   1 |  DELETE                       | NODES         |       |       |   |     
     |
|   2 |   INLIST ITERATOR             |               |       |       |   |     
     |
|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    
14(0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     4 |       |    
11(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x2' AND "MODIFIED"=2 OR 
"ID"='x3'

              AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID"='x5' AND 
"MODIFIED"=5 OR

              "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIFIED"=7 OR 
"ID"='x8' AND "MODIFIED"=8)

   4 - access("ID"='x1' OR "ID"='x2' OR "ID"='x3' OR "ID"='x4' OR "ID"='x5' OR
              "ID"='x6' OR "ID"='x7' OR "ID"='x8')

{noformat}



was (Author: reschke):
Further tests show that this is caused by the way we delete in batches, and the 
way Oracle handles these deletes.

The version garbage collection deletes in chunks of 450 documents, each 
identified by ID and with a condition on it's modification date (to prevent 
deletion of documents that have changed since the collection phase).

In {{RDBDocumentStore}} we break these down to chunks of 64 documents, and 
generate a single DELETE statement like this:
{noformat}
delete from nodes where id='x1' and modified=1 or id='x2' and modified=2  ...
{noformat}

Tests show that Oracle behaves sanely as long as we do not put more than 8 
conditions into a single statement. Checking the execution plan for 16 
conditions:
{noformat}
-----------------------------------------------------------------------------------------------

| Id  | Operation                     | Name          | Rows  | Bytes | Cost 
(%CPU)| Time     |

-----------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT              |               |     1 |   122 |    
25(0)| 00:00:01 |

|   1 |  DELETE                       | NODES         |       |       |   |     
     |

|   2 |   INLIST ITERATOR             |               |       |       |   |     
     |

|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    
25(0)| 00:00:01 |

|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     8 |       |    
19(0)| 00:00:01 |

-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11'

              AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AN
D "MODIFIED"=13 OR

              "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "I
D"='x16' AND

              "MODIFIED"=16 OR "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODI
FIED"=3 OR "ID"='x4'

              AND "MODIFIED"=4 OR "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "M
ODIFIED"=6 OR

              "ID"='x7' AND "MODIFIED"=7 OR "ID"='x8' AND "MODIFIED"=8 OR "ID"='
x9' AND "MODIFIED"=9)

   4 - access("ID"='x1' OR "ID"='x10' OR "ID"='x11' OR "ID"='x12' OR "ID"='x13'
OR

              "ID"='x14' OR "ID"='x15' OR "ID"='x16' OR "ID"='x2' OR "ID"='x3' O
R "ID"='x4' OR

              "ID"='x5' OR "ID"='x6' OR "ID"='x7' OR "ID"='x8' OR "ID"='x9')
{noformat}

where "SYS_C00175322" is the primary unique index on ID.

For 8 conditions:

{noformat}
-----------------------------------------------------------------------------------------------

| Id  | Operation                     | Name          | Rows  | Bytes | Cost 
(%CPU)| Time     |

-----------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT              |               |     1 |   122 |    
14(0)| 00:00:01 |

|   1 |  DELETE                       | NODES         |       |       |   |     
     |

|   2 |   INLIST ITERATOR             |               |       |       |   |     
     |

|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    
14(0)| 00:00:01 |

|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     4 |       |    
11(0)| 00:00:01 |

-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x2' AND "MODIFIED"=2 OR "ID"='
x3'

              AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID"='x5' AND "M
ODIFIED"=5 OR

              "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIFIED"=7 OR "ID"='
x8' AND "MODIFIED"=8)

   4 - access("ID"='x1' OR "ID"='x2' OR "ID"='x3' OR "ID"='x4' OR "ID"='x5' OR
              "ID"='x6' OR "ID"='x7' OR "ID"='x8')

{noformat}


> RDB: RevisionGC performance on Oracle
> -------------------------------------
>
>                 Key: OAK-6789
>                 URL: https://issues.apache.org/jira/browse/OAK-6789
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>         Attachments: OAK-6789.diff
>
>
> In RevisionGC on Oracle, performance of deletes is bad. Using RevisionGCTest:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, 
> deletedDocGCCount=92092 (of which leaf: 92000), updateResurrectedGCCount=0, 
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, 
> timeActive=7.429 min, timeToCollectDeletedDocs=1394 ms, 
> timeToCheckDeletedDocs=250.4 ms, timeToSortDocIds=374.0 us, 
> timeTakenToUpdateResurrectedDocs=17.00 us, timeTakenToDeleteDeletedDocs=7.399 
> min, timeTakenToCollectAndDeleteSplitDocs=97.95 ms}
> Performed RevisionGC in 7,429 min
> {noformat}
> Compared with DB2:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, 
> deletedDocGCCount=96096 (of which leaf: 96000), updateResurrectedGCCount=0, 
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, 
> timeActive=8.240 s, timeToCollectDeletedDocs=1780 ms, 
> timeToCheckDeletedDocs=259.7 ms, timeToSortDocIds=237.0 us, 
> timeTakenToUpdateResurrectedDocs=19.00 us, timeTakenToDeleteDeletedDocs=4.552 
> s, timeTakenToCollectAndDeleteSplitDocs=685.4 ms}
> Performed RevisionGC in 8,243 s
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to