[ https://issues.apache.org/jira/browse/CASSANDRA-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Dejanovski updated CASSANDRA-16406: --------------------------------------------- Attachment: 16406-trunk.txt > Debug logging affects repair performance > ---------------------------------------- > > Key: CASSANDRA-16406 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16406 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair > Reporter: Alexander Dejanovski > Assignee: Alexander Dejanovski > Priority: Normal > Fix For: 4.0-rc > > Attachments: 16406-trunk.txt, CASSANDRA-16406.png, > with_debug_logging.png, without_debug_logging.png > > > While working on the repair quality testing in CASSANDRA-16245, it appeared > that the node coordinating repairs on a 20GB per node dataset was generating > more than 2G of log with a total duration for the incremental repair > scenarios of ~2h40m: > https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/37/workflows/6a7a41c8-0fca-4080-b37e-3b38998b3fab/jobs/49/steps > ] > !with_debug_logging.png! > The logs showed a lot of messages from the MerkleTree class at high pace: > {noformat} > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) > Fully inconsistent range [#<TreeRange > (-6738651767434294419,-6738622715859497972] depth=11>, #<TreeRange > (-6738622715859497972,-6738593664284701525] depth=11>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (9) > Inconsistent digest on right sub-range #<TreeRange > (-6738593664284701525,-6738535561135108630] depth=10>: [#<OffHeapInner > -6738564612709905078 > hash=[b8efd3d684474276f316b1bc9f23b463cda4f8d620a4b5cf4d2c7dbb101bbe85] > children=[#<OffHeapLeaf > [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]> > #<OffHeapLeaf > [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>]>, > #<OffHeapInner -6738564612709905078 > hash=[95334327a0b50b7d3c51d6588a5f3d57701e0b978f6ab28e85cda3cb5a094eb5] > children=[#<OffHeapLeaf > [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]> > #<OffHeapLeaf > [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:262 - (10) > Hashing sub-ranges [#<TreeRange (-6738593664284701525,-6738564612709905078] > depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>] > for #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10> divided > by midpoint -6738564612709905078 > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:272 - (10) > Inconsistent digest on left sub-range #<TreeRange > (-6738593664284701525,-6738564612709905078] depth=11>: [#<OffHeapLeaf > [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>, > #<OffHeapLeaf > [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (10) > Inconsistent digest on right sub-range #<TreeRange > (-6738564612709905078,-6738535561135108630] depth=11>: [#<OffHeapLeaf > [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>, > #<OffHeapLeaf > [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) > Fully inconsistent range [#<TreeRange > (-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange > (-6738564612709905078,-6738535561135108630] depth=11>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (9) > Fully inconsistent range [#<TreeRange > (-6738651767434294419,-6738593664284701525] depth=10>, #<TreeRange > (-6738593664284701525,-6738535561135108630] depth=10>] > DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (8) > Inconsistent digest on right sub-range #<TreeRange > (-6738535561135108630,-6738419354835922841] depth=9>: [#<OffHeapInner > -6738477457985515736 > hash=[806ede1a35783bf5fafe8b8ccefe4d3ff48e8f0e1314f8a9ce4b23f13fed4bf9] > children=[#<OffHeapInner -6738506509560312183 > hash=[e6d133afbd8041266f8a1cfe456ff07c9e7debe8ff54279b579b252f2af78b6b] > children=[#]> #<OffHeapInner -6738448406410719289 > hash=[66bfedb588f87ad3957497728b91bd436af364e6ec40df3299d006de151ac092] > children=[#]>]>, #<OffHeapInner -6738477457985515736 > hash=[bf128431ddf7ad72417ce853f90abd0bc7a2a38088d5fbec864c02c516ed6458] > children=[#<OffHeapInner -6738506509560312183 > hash=[12a71b62ceb096369c4797f56edd91a16dd9803348b4f4cfe1454027d7dae3d2] > children=[#]> #<OffHeapInner -6738448406410719289 > hash=[adb59f5313473b44dd3b7fa697d72caaaa7b23b3c0610f23670942e2c137878a] > children=[#]>]>]{noformat} > When disabling debug logging, the duration dropped to ~2h05m with decent log > sizes: > [https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/38/workflows/c91e6ceb-438b-4f00-b38a-d670f9afb4c3/jobs/51] > !without_debug_logging.png! > There's apparently too much logging for each inconsistency found in the > Merkle tree comparisons and we should move this to TRACE level if we still > want to allow debug logging to be turned on by default. > I'll prepare a patch for the MerkleTree class and run the repair testing > scenarios again to verify their duration. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org