[ 
https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010900#comment-17010900
 ] 

Benedict Elliott Smith commented on CASSANDRA-15389:
----------------------------------------------------

Thanks, the patch is looking really good.  Just some minor suggestions to 
consider.

AbstractRow
 * If we introduce a {{BiFunction}} variant of {{apply}}, in which we can 
supply the second argument, then we can go garbage-free for e.g. {{digest}} by 
calling {{apply(ColumnData::digest, hasher)}}

BTreeRow
 * {{hasComplexDeletion}} does not need the {{if (isSimple)}} branch anymore, I 
think?
 * {{hasComplexDeletion}} seems to be testing the wrong sentinel return value 
from {{accumulate}}?
 * Does {{HAS_INVALID_DELETIONS}} need to test {{cd != null}}?

Row
 * {{collectStats}} could delegate to a new {{ComplexColumnData.accumulate}}, 
and pass itself as the parameter (though we would need to test {{!(cd 
instanceof ComplexColumnData)}} instead of {{cd.column.isSimple()}}
 * This might mean we can remove the {{hasCells()}} test, as we will be 
garbage-free and can let {{accumulate}} make the right decision for us

BTree
 * Having been playing with a lot of BTree methods recently, I personally find 
it aids clarity to have totally separate bits of code implementing behavior for 
leaves and branches, with a guard at-or-near the start of the method switching 
behaviour. It could invoke a {{methodLeaf}} variant, or do it inline, but I 
think it could make it somewhat easier to follow the logic in {{accumulate}}. 
It’s only a small amount of code to duplicate, but the reader doesn’t have to 
try and play out the switches while following the flow.
 * I also find that when iterating branches, it is most clear to simply do one 
key/child pair per loop, and a final (or initial) unrolled copy of a child 
visit. Again, a small amount of duplication but I think easier for a reader to 
digest (at least, I found this helped when reading my own methods). This 
_should_ also lead to fewer executed instructions, and perhaps better branch 
predictor behaviour (hard to say, many of them might be well predicted anyway).

nits:
 * {{AbstractRow.validateData}} should probably use method reference for 
clarity, i.e. {{apply(ColumnData::validate)}}
 * {{BTreeRow.HAS_INVALID_DELETIONS}} can be declared using lambda syntax
 * {{BTreeRow}} unused imports
 * {{BTreeRow:345}} unused {{ALWAYS_TRUE}}
 * {{Row}} unused imports
 * {{Rows}} unused {{addColumn}}
 * {{UnfilteredSerializer}} unused import
 * {{SerializationHeader}} unused imports
 * {{BTree}} unused import

> Minimize BTree iterator allocations
> -----------------------------------
>
>                 Key: CASSANDRA-15389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0
>
>
> Allocations of BTree iterators contribute a lot amount of garbage to the 
> compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
>  • using Row#apply where appropriate on frequently called methods 
> (Row#digest, Row#validateData
>  • adding BTree accumulate method. Like the apply method, this method walks 
> the btree with a function that takes and returns a long argument, this 
> eliminates iterator allocations without adding helper object allocations 
> (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize, 
> BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats, 
> UnfilteredSerializer#serializedRowBodySize) as well as eliminating the 
> allocation of helper objects in places where apply was used previously^[1]^.
>  • Create map of columns in SerializationHeader, this lets us avoid 
> allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%
>  
> [1] the memory test does measure memory allocated by lambdas capturing objects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to