[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-09-10 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193628#comment-17193628
 ] 

Marcus Eriksson commented on CASSANDRA-15393:
-

+1 assuming clean test run and Calebs last nits fixed

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-09-09 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193067#comment-17193067
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

Read through the last raft of commits, and things are looking pretty good. Some 
closing nits...

 - {{ByteArrayAccessor#toString(byte[], Charset)}} doesn't look like it needs 
to have {{CharacterCodingException}} in its {{throws}} clause, although if you 
want to keep it for uniformity with the interface, I suppose that's fine.
 - Many of the tests in {{ByteArrayUtilTest}} seem to fail because we removed 
{{ByteArrayUtil#ensureCapacity()}}, which produces a more detailed error 
message. Not sure if we want to keep that around or leave it out (performance?) 
and adjust the tests to expect the simpler error.
 - {{ValueAccessor#createArray()}} and {{compare()}} have a dangling JavaDoc 
{{@return}}.
 - {{ValueAccessor#write()}} typo: "...into the a ByteBuffer"
 - {{ValueAccessor#slice()}} typo: "...@prarm offset..."
 - {{ValueAccessor#compareByteArrayTo()}} and {{compareByteBufferTo()}} typo: 
"...\{@parame }..."
- {{TupleType#split()}} still seems like it would be a pretty easy place to 
revert to {{ByteBuffer}} only to we don't have to worry about having introduced 
any offset-related bugs? (i.e. Literally its only usage is an overload that 
passes the {{ByteBufferAccessor}} singleton.)
- The declaration of {{Unfiltered.Kind}} doesn't need a semicolon following it.
- Might still be nice to make a note in the code around why {{Unfiltered}} has 
to extend {{Clusterable}} without a type bound.
- In {{ByteArrayUtil#writeWithShortLength()}}, in the {{assert}}, {{0 <= 
length}} is always true.
- Are the FIXMEs in {{DurationSerializer}} still live?

bq. I also didn't convert the randomized tests to QT.

That's fine w/ me. It's a pretty low-value conversion.

The above notwithstanding, +1 from me overall at this point, assuming clean 
test runs, etc.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-09-08 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192414#comment-17192414
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

Pushed up a new branch 
[here|https://github.com/bdeggleston/cassandra/tree/15393-rebase-09-02], this 
was recently rebased onto trunk and I think I've addressed all review feedback, 
including reverting several buffer to accessor conversions, with the exceptions 
below:

* I didn't convert ConterContext.clearAllLocal to byte[] only because all the 
tests are written with byte buffers. 
* For ModificationStatement and *CBuilder, I started attempting to unwind this 
and remembered why I'd done it in the first place. Basically, the places where 
you can be explicit (ie: ModificationStatement) are tangentially linked to the 
places where we can't be (ie: Cell) through classes used in the coordinator and 
replica read paths. Things like slices and filters and stuff. Using  across 
the board avoids having to deal with places where you need to figure out how to 
use Comparator> from ModificaionStatement with 
Clustering from a Row.
* The lack of a native accessor is no worse than the current state of things, 
and would be straightforward to add in the future.
* I didn't remove version from the collection serializers
* I also didn't convert the randomized tests to QT. I'm not opposed to the 
idea, but I don't have the time to devote to the conversion right now.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-09-02 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189237#comment-17189237
 ] 

Marcus Eriksson commented on CASSANDRA-15393:
-

First pass done, LGTM in general, I'll try to get a second pass done this week.

*Issues*

{{CounterContext.java}}
 * {{hasLegacyShards(..)}} {{totalCount}} looks wrong, should probably be {{int 
totalCount = (accessor.size(context) - headerLength(context, accessor)) / 
STEP_LENGTH;}}

{{ByteBufferAccessor.java}}
 * {{digest(..)}} needs to add {{value.position}} to {{offset}}

{{DynamicCompositeType.java}}
 * in {{validateComparator(..)}}, adds {{1}} to {{offset}} after reading a 
{{short}}

*Nits*
 General stuff
 * Slightly inconsistent use of {{TypeSizes.INT_SIZE}} / 
{{TypeSize.sizeof(int)}} / hard coded {{4}}, I'd probably prefer being explicit 
and use X_SIZE, mostly since we represent shorts etc in ints
 * A few places added the {{left}} / {{right}} / {{accessorL}} / {{accessorR}} 
parameters, but then have {{offset1/offset2/size1/size2}} - maybe rename these 
to {{offsetL/offsetR}} etc

{{CQL3Type.java}}
 * {{generateXCQLLiteral}}: no need to return the offset
 * unused import ByteBufferUtil

{{AbstractClusteringPrefix.java}}
 * maybe bring up {{size()}} and {{get(i)}} here (as {{getRawValues().length}} 
and {{getRawValues()[i]}}), and only leave them overridden in 
{{NativeClustering}}

{{ClusteringPrefix.java}}
 * {{isBottom}} & {{isTop}} can use {{size()}} instead of 
{{getRawValues().length}}

{{ArrayClusteringBound.java}} + {{ArrayClusteringBoundary.java}} + 
{{BufferClusteringBound.java}} + {{BufferClusteringBoundary.java}}
 * unsharedHeapSizeExcludingData unused

{{AbstractType.java}}
 * {{readArray}} unused

{{ByteBufferAccessor.java}}
 * in {{compare(..)}} - instead of casting to a {{ByteBuffer}} maybe use 
{{accessor.toBuffer(right)}} (same in {{ByteArrayAccessor}})

{{DynamicCompositeType.java}}
 * no need to {{getComparatorSize}} in {{getComparator(int i, VL left, 
ValueAccessor accessorL, VR right, ValueAccessor accessorR, int 
offset1, int offset2)}}

{{MapType.java}}
 * don't use {{V}} as the type parameter in {{referencesUserType}} - quite 
confusing as it clashes with the {{MapType}} type param

{{ValueAccessor.java}}
 * why have {{compare}} / {{compareUnsigned}} with the same parameters?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-28 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186875#comment-17186875
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

[~bdeggleston] There are the [items 
above|https://issues.apache.org/jira/browse/CASSANDRA-15393?focusedCommentId=17185489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17185489]
 to resolve, of course, but here's my full feedback on the latest raft of 
commits...

*Complexity*

- The semantics of {{ValueAccessor#size()}} are a little odd that it always 
indicates the length of the array in the byte array case, but the number of 
bytes left in a byte buffer. Some of the usages are just checking if the value 
is empty, which makes me wonder if having an isEmpty() get rid of some of the 
awkwardness. On the other hand, most usages occur right before we read.
- There are some switch statements we might be able to avoid in the two 
accessor implementations if we encode the {{copy()}} and {{compare}} strategies 
in {{BackingKind}}.
- How valuable is {{ValueAccessor.copy()}} when we can make calls like 
{{srcAccessor.copyTo(src, srcOffset, dst, dstAccessor, dstOffset, size)}} call 
directly from {{CounterContext}}? Same for {{compareUnsigned()}}. Want to make 
sure we aren't cluttering the accessor interface.
- This is really an extension of the conversation above, but I'm worried that a 
handful of places where we only use {{ByteBuffer}} being changed to use the 
more abstract value/accessor combination are not only problematic for 
readability, but are adding more surface area for testing. 
{{TupleType#split()}}, for instance, might not be covered too well, but we 
could probably avoid additional tests and get to the immediate goal faster here 
if we simply stay in {{ByteBuffer}} space and avoid the new offset logic in the 
first place. Other examples of this could be 
{{CounterContext#clearAllLocal()}}, {{ClusteringPrefix.Deserializer}} (byte 
array only), and {{Clustering.Serializer#deserialize()}}.

*Formatting*

- Horizontal alignment in {{ClusteringIndexNamesFilter#forPaging()}} on the 
ternary.
- Might have to line wrap some parameter lists in 
{{RangeTombstoneBoundaryMarker}}.

*Documentation*

- Brief JavaDoc for {{ByteBufferAccessor#copyTo()}} might be helpful. 
- JavaDoc for {{Unfiltered#clustering()}}, perhaps just to explain that we 
couldn't just use the {{clustering()}} signature in {{Clusterable}}?
- What level of JavaDoc do we want for {{ValueAccessor}} and {{ObjectFactory}} 
at the interface and method level? Certainly a lot of these methods are pretty 
straightforward, but it's also a pretty critical public interface.

*Miscellaneous*

- A number of places that call {{serializeBuffer()}} with both arguments could 
just call the version with one argument.
- There are some cases, like in {{CassandraIndex.Indexer#indexCell()}}, where 
it would be nice to bind {{Clustering}} and {{Cell}} to the same type, but I 
couldn't find a clean way to do it.
- {{BufferClusteringBoundOrBoundary#exclusiveCloseInclusiveOpen()}} looks 
unused.
- {{CollectionSerializer}} has quite a few methods that don't do anything w/ 
{{version}}. (Perhaps there will be another serialized form later, so I won't 
argue hard to remove.)
- {{SinglePartitionSliceBuilder#sliceBuilder}} looks like it can be {{final}}.
- The {{Relationship}} enum in {{CounterContext}} doesn't need to have a 
{{static}} modifier?
- {{ByteArrayUtil#writeWithLength()}} is unused.
- What about renaming {{ComparatorSet}} to {{ValueComparators}} and the field 
in {{AbstractType}} to {{comparators}} (which I think we do in one other place 
already)?
- What are the {{// FIXME: value input buffer}} comments in 
{{DurationSerializer}} about?
- {{putByte()}}, {{putDouble()}}, {{putFloat()}}, {{putLong()}} are neither 
used nor tested. (The underlying {{ByteArrayUtil}} stuff lingering even if 
unused seems less an issue.)
- {{import java.nio.ByteBuffer}} use unused in {{RangeTombstoneMarker}}.

*Testing*

- Did we want to make {{CQL3TypeLiteralTest}} deterministic with the fixed zero 
seed?
- {{ByteArrayUtilTest}} is empty?
- {{logger}} is unused in {{ReadCommandTest}}
- The new randomized tests here (ex. {{ValueAccessorTest}}) might be able to 
reuse some of the infrastructure [~dcapwell] created in CASSANDRA-16064 
(generating random primitives, etc.) It might also just be nice to make QT the 
standard way we do randomization for true unit tests (free seed printing, etc.) 
I'm not making a hard suggestion that we re-write, since the new tests look 
pretty straightforward as is.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
> 

[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186177#comment-17186177
 ] 

Josh McKenzie commented on CASSANDRA-15393:
---

I think it was the "perf improvements are allowed up until release as per the 
agreed upon release lifecycle" so no reason to block beta1 on it. We can 
discuss on ML.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186021#comment-17186021
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

That's reasonable, I'll ping the dev list.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186017#comment-17186017
 ] 

Michael Semb Wever commented on CASSANDRA-15393:


[~bdeggleston], a patch with 208 files and ~3.5k LOC, [~snazy] expressing 
concern, and not seeing any comment of why this was placed in 4.0-beta in May. 
That should be enough that our normal community response is to double check 
with a broader base. Maybe [~JoshuaMcKenzie] you can provide support and the 
reasoning why this was, and should be, in 4.0-beta?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186013#comment-17186013
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

[~mck], this ticket was been triaged to 4.0-beta back in May. Performance 
improvements aren't restricted by the feature freeze. Do you have concerns 
about this ticket that aren't related to the feature freeze?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185997#comment-17185997
 ] 

Michael Semb Wever commented on CASSANDRA-15393:


{quote}Is there's a big difference between allowing performance improvements, 
especially ones that address real operational problems, into {{beta}} vs. 
{{4.0.x?}}
{quote}
The choice isn't binary, 4.1 is more appropriate. Was just my way of stating my 
concerns about this going into 4.0-beta when so much effort has been put into 
the feature freeze. 4.0.x would still require a waiver IMO, but with the 
QA/testing lifecycle guidelines from 4.0-beta better landed, and the stress and 
anxiety of getting 4.0 out after so long, it may then be a much easier 
discussion to have.

Again, can we please take this to the ML, and leave the ticket for the 
technical?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185886#comment-17185886
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

{quote}The 4.0.x line is going to be around for a while…
{quote}
{quote}that's not an argument in itself to waiver this patch into 4.0-beta, and 
sets the precedence for feature creep
{quote}
That was an argument for making an incremental change now rather than waiting 
for hard Java 11 support/varargs/etc., not necessarily an argument for this 
being in 4.0-beta.
{quote}I would definitely vote for the incremental approach, but starting in 
4.0.x
{quote}
Is there's a big difference between allowing performance improvements, 
especially ones that address real operational problems, into {{beta}} vs. 
{{4.0.x?}} If anything, allowing it into the former would mean we find problems 
earlier. (Assuming there is a statutory prohibition on this kind of thing at 
this point, I suppose that means I'm arguing for a waiver...)

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-27 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185732#comment-17185732
 ] 

Michael Semb Wever commented on CASSANDRA-15393:


We have been strict on the 4.0 feature freeze and are now in the 4.0-beta phase.

With the agreed upon release lifecycle: I struggle with the justification for 
putting this into 4.0.0, regardless of the improvements the patch brings.

If it met the criteria to go into 3.11.x then 4.0-beta is obvious. But it's not 
a bug, and neither a stability/performance improvement on a 4.0 introduced 
feature. Isn't this how we should be triaging all tickets from here to 4.0-rc.
{quote}The C* project has a problem with favoring rewrites in favor of 
incremental improvements.
{quote}
Totally agree with you [~bdeggleston]. I would definitely vote for the 
incremental approach, but starting in 4.0.x or 4.1. Especially after we have a 
better QA/test process spec'd out during the 4.0-beta phase.
{quote}The 4.0.x line is going to be around for a while…
{quote}
[~maedhroz], that's not an argument in itself to waiver this patch into 
4.0-beta, and sets the precedence for feature creep.

Can I suggest this needs to be raised on dev ML to get a waiver for 4.0-beta?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185510#comment-17185510
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

bq. The idea of combining the byte array util and the accessor has also been 
floating around, and I'd prefer we didn't for both the reason above, and 
because people are pretty used to having ByteBufferUtil handy. Having 
ByteArrayUtil handy would make the appearance of byte[] everywhere a little 
less jarring (hopefully).

bq. NativeCell prevents moving most of that into AbstractCell. Given the 
simplicity of the parts that can be factored, I'd lean towards keeping the 
class hierarchy simple over minimizing the size of the array and buffer cell 
implementations.

Agreed on both points.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185506#comment-17185506
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

bq. I'd done something sort of along these lines (albeit in the opposite 
direction) by adding the `ValueAware` class, so that AbstractType et al can 
operate directly on Cell objects without having to know what a cell is. 
Although I seem to default to keeping type information on the serializer / type 
side of things, there are benefits to each which we should discuss. And 
possibly a better name for ValueAware.

Unless we have a concrete plan for {{ValueAware}}, why don't we just remove it 
and use {{Cell}} directly? There might not be a real benefit to hiding the 
concept of a cell from the types and serializers. My comments here (ex. 
{{getLong()}}) are driving toward trying to eliminate as many places where we 
actually have to explicitly pair value and accessor as possible. 

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185499#comment-17185499
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

bq. I'm inclined to go the other way, and make underlying type information 
opaque everywhere except the serializers themselves. There are very few places 
where we need to see it, and making it explicit anywhere it doesn't need to be 
just makes it more difficult to make changes later on.

I suppose my starting disposition is wondering why we wouldn't _leave_ it 
explicit (i.e. as it is now) if the change hurts readability/clarity for the 
benefit of future changes we might not actually make (or if we do make them, 
might not look exactly the way we imagine they will here).

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185489#comment-17185489
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

bq. I'd prefer that we keep the type/serializers consistent wrt type. In other 
words, I'd prefer some methods "unneccesarily" switch to using an accessor than 
having some parts of a class use accessors and other parts use byte buffers 
directly

I'm not sure. There is a clear penalty in terms of readability in places like 
{{CompositeType#decompose()}}, and the main/public version of 
{{AbstractType#decompose()}} already explicitly _does_ return a {{ByteBuffer}}. 
This is also a case where remaining explicit reduces the sheer size of the 
patch.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-24 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183659#comment-17183659
 ] 

David Capwell commented on CASSANDRA-15393:
---

bq. It might have been less of an issue, if the whole code base would have 
proper unit testing for all production code paths with all potential inputs or 
even fuzzy testing, but that's unfortunately not the case.

If it helps, CASSANDRA-16064 (Caleb linked this JIRA with that one as they both 
created some of the same classes) provides the ability to generate random types 
(primitives, all collections, UDTs, reversed, etc.) and data for arbitrary 
schemas.  Also added logic to make it a bit easier to mock out part of our code 
such as Schema class; the intent of CASSANDRA-16064 is to start adding such 
tests you talk about and lower the bar to add more.

If there are sections you feel need closer testing, I can take a look if 
CASSANDRA-16064 offers enough for those sections, or if anything else is 
desired.  Let me know.


> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-24 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183621#comment-17183621
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

Thanks for the excellent review [~maedhroz]. I've pushed up some commits 
cleaning up a bunch of stuff (excluding docs for now) and addressing most of 
the points, with the exception of the comments below.

Let me know your thoughts on them:
{quote}Looks like AbstractType#decompose() is never used without 
ByteBufferAccessor? We could probably remove the type parameter and just make 
the ByteBuffer binding explicit.
{quote}
and
{quote}TypeSerializer#toCQLLiteral() is only used w/ ByteBuffer, so it doesn't 
look like it needs to be parameterized.
{quote}
I'd prefer that we keep the type/serializers consistent wrt type. In other 
words, I'd prefer some methods "unneccesarily" switch to using an accessor than 
having some parts of a class use accessors and other parts use byte buffers 
directly
{quote}ModificationStatement looks like it's dealing exclusively with 
ByteBuffer. Should the type parameters reflect that?
{quote}
and
{quote}Trying to propagate more typing information from 
ClusteringBoundOrBoundary.Serializer upward to its users for Slices and 
UnfilteredSerializer might help clarify some things
{quote}
I'm inclined to go the other way, and make underlying type information opaque 
everywhere except the serializers themselves. There are very few places where 
we need to see it, and making it explicit anywhere it doesn't need to be just 
makes it more difficult to make changes later on.
{quote}We might benefit in terms of usability/developer ergonomics if we push 
some capabilities of the accessor into Cell. (ex. methods like getLong()). 
Similar thing going on with ClusteringPrefix, where we could perhaps stop 
making calls like 
builder.add(prefix.get[image:3144E42A-756C-4044-BF0B-371E31605DF4-68435-0002F6486CD1F067/information.png],
 prefix.accessor()) and use 
bufferAt[image:4A0500E8-9C94-4DC7-919E-D17260E81A8C-68435-0002F6486CE35ECB/information.png]
 in CP itself. I suppose ArrayBackedBuilder might need to support something 
like add(ClusteringPrefix, int) if we still want to do that lazily, after the 
isDone() check.
{quote}
and
{quote}AbstractType#writeValue() could be implemented in Cell, given the latter 
knows both its value and accessor already, and ColumnSpecification already 
knows the column type?
{quote}
I'd done something sort of along these lines (albeit in the opposite direction) 
by adding the `ValueAware` class, so that AbstractType et al can operate 
directly on Cell objects without having to know what a cell is. Although I seem 
to default to keeping type information on the serializer / type side of things, 
there are benefits to each which we should discuss. And possibly a better name 
for ValueAware.
{quote}UNSET_BYTE_ARRAY, getChar(), putBoolean(), putChar(), writeWithLength() 
are unused in ByteArrayUtil. What if we just fold these methods into 
ByteArrayAccessor.
{quote}
These are all copied from java.io.Bits, so I assume the logic is correct and 
well tested. I'd like to leave them unused if you don't mind. ValueAccessor use 
is restricted to reading for the most part, but as it gets expanded to cover 
writing things, I think having known good implementations ready to go would be 
more beneficial than starting with a minimal implementation.

The idea of combining the byte array util and the accessor has also been 
floating around, and I'd prefer we didn't for both the reason above, and 
because people are pretty used to having ByteBufferUtil handy. Having 
ByteArrayUtil handy would make the appearance of byte[] everywhere a little 
less jarring (hopefully).
{quote}We might be able to factor our some common elements of ArrayCell and 
BufferCell.
{quote}
NativeCell prevents moving most of that into AbstractCell. Given the simplicity 
of the parts that can be factored, I'd lean towards keeping the class hierarchy 
simple over minimizing the size of the array and buffer cell implementations.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these 

[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-12 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176410#comment-17176410
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

bq. I disagree with the idea that we should delay improving compaction 
allocations because we could implement a better solution at some point in the 
future.

Agreed. The 4.0.x line is going to be around for a while, so I lean toward 
trying to make the best of the tools at our disposal in Java 8. I think we'd 
have to make the argument that we'd be doing real damage to the codebase by 
continuing.

bq. I don't personally think there is an issue with being able to invoke in a 
contrived example where we deliberately elide the type information, but we 
should be sure the code doesn't do that today.

Agreed. This seems like something we should be able to catch in review pretty 
easily, although perhaps in the past we've not always been very strict about 
making sure we either eliminate compiler warnings or clearly document why we 
can't.

bq. The most complex parts are probably the collection serializers and other 
places where we're now having to do offset bookkeeping.

I know I'm going to be spending more time on both of these things in my second 
pass at review (there are already enough failing tests and other things that 
need cleanup first), and we almost certainly don't have sufficient test 
coverage, but serializers and offset manipulation logic aren't _that_ tricky?

bq. It might have been less of an issue, if the whole code base would have 
proper unit testing for all production code paths with all potential inputs or 
even fuzzy testing, but that's unfortunately not the case.

We do have the entire CASSANDRA-15536 epic (including fuzz testing w/ Harry) in 
progress, which will subject anything we do here to quite a bit of additional 
scrutiny, even after we get the testing within this issue itself to the bar.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175656#comment-17175656
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

bq. I'm sorry, if my objections sound harsh. But my point is that it's better 
to fix the whole heap-pressure-nightmare with (de)serialization 
(reads/writes/re-serializations/compaction/etc) in the next major release.

It’s fine, we should be talking about these things. I disagree with the idea 
that we should delay improving compaction allocations because we could 
implement a better solution at some point in the future. It's a textbook 
example of “letting perfect be the enemy of good enough”. The C* project has a 
problem with favoring rewrites in favor of incremental improvements. Compaction 
heap pressure is a real operational problem that causes a lot of headaches, and 
there is real value in mitigating it as part of 4.0, even if it can be further 
improved in the future. 

There's certainly risk here, but I think it's being a bit overstated. The 
changes here are wide, but not particularly deep. The most complex parts are 
probably the collection serializers and other places where we're now having to 
do offset bookkeeping. These should be carefully reviewed, but they're hardly 
unverifiable.

bq. Unrelatedly,  [Blake 
Eggleston](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bdeggleston)
 , is there a reason you didn't go the whole hog and just get rid of the 
ByteBuffer versions of everything?

1) That would have been a much larger change, and I wanted to limit scope. 
IIRC, replacing partition keys would have been a lot of work for a 
comparatively small gc win.
2) bytebuffers are still useful in some places. Specifically in places where 
we're using allocators
3) I've been working on a flyweight reader in my free time that reduces another 
95% of garbage and uses bytebuffers. This will be ready for 4.next, but using 
it should be optional.
4) There's value in decoupling data format from data logic. For instance, this 
would allow us to compare native and bytebuffer values without requiring the 
allocation of an intermediate buffer.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175390#comment-17175390
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


{quote}This one compiles without errors just by omitting the generic type 
information: This one compiles without errors just by omitting the generic type 
information
{quote}
This seems like a problem that can be resolved - I agree that some of the type 
information isn't properly propagated in some places, but I suspect that might 
be because the patch was awaiting feedback on its general structure.
{quote}Another source of errors is that positions & offsets are manually 
handled now.
{quote}
I am also not a _huge_ fan of this, but we have compromises to make. I don't 
think this is particularly error-prone, however, since we already do this a 
great deal - it's pretty integral to how our serialization/deserialization 
works, so this is not a new category of problem. That's not to say there isn't 
room for suggestions that might lead to improvement here, I don't know, but it 
doesn't seem at all disqualifying to me.

 

Unrelatedly, [~bdeggleston], is there a reason you didn't go the whole hog and 
just get rid of the {{ByteBuffer}} versions of everything?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-10 Thread Robert Stupp (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175291#comment-17175291
 ] 

Robert Stupp commented on CASSANDRA-15393:
--

This one compiles without errors just by omitting the generic type information:
{code}
ValueAccessor acc = ByteArrayAccessor.instance;
CounterContext.headerLength(ByteBuffer.allocate(10), acc);
{code}
^ This is a verbose example e.g. of the method signatures 
{{o.a.c.db.rows.RangeTombstoneMarker#openBound}} + {{closeBound}} / 
{{o.a.c.db.Clusterable#clustering}} - and actual production code like the above 
(e.g. in {{o.a.c.db.MutableDeletionInfo.Builder#add}}). It seems that the 
existing production code base including the unchanged parts with all call sites 
need a full re-review to validate that this change is safe.

Another source of errors is that positions & offsets are manually handled now. 
For example 
[here|https://github.com/apache/cassandra/pull/712/files#diff-a9c2759ec4c4663dd4a24d5e0f89358fR225]
 or 
[here|https://github.com/apache/cassandra/pull/712/files#diff-1e045cbe2fa513c1d2e7ac63c1b764ccR58].
 Just think about merging a change from the 3.x line into trunk but not adding 
the necessary position/offsets arithmetics.

As I said, there's definitive value in reducing heap pressure in critical hot 
paths. But my concern that the approach is error prone due to the issues 
mentioned here still stands.
It might have been less of an issue, if the whole code base would have proper 
unit testing for all production code paths with all potential inputs or even 
fuzzy testing, but that's unfortunately not the case.

The risks are:
* Mis-computing offsets+positions can lead to potentially unnoticed data 
corruption, overflows, etc that initially go unnoticed
* Spurious new runtime errors in rare situations (barely tested ones) leading 
to ClassCastExceptions

I'm sorry, if my objections sound harsh. But my point is that it's better to 
fix the whole heap-pressure-nightmare with (de)serialization 
(reads/writes/re-serializations/compaction/etc) in the next major release.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-10 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175221#comment-17175221
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

In terms of the higher level question of whether a change like this belongs in 
4.0, I'm generally of the opinion that it does, given that it benefits both 
compaction and local reads. There are things to clean up, but once that 
happens, it really should be pretty hard to make the kind of errors mentioned 
[above|https://issues.apache.org/jira/browse/CASSANDRA-15393?focusedCommentId=17173009&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17173009].
 If I were arguing against including this in 4.0, I would probably point to the 
sheer size of the patch and the amount of typing information it adds to 
existing classes, although the new type information on classes and method 
signatures means the number of lines in the diff is kind of misleading as a 
measure of how risky the change is.

Unless we think the patch will both hamper future efforts to avoid 
materializing objects on heap in the first place during compaction, I don't see 
a strong reason not to move forward.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-10 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175217#comment-17175217
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

While we continue to hash out the more strategic discussion, here are the notes 
from my first pass at review:

(Note: These comments are based on things as they stand in [my 
rebase|https://github.com/apache/cassandra/pull/712].)

*Naming Ideas*

- {{ClusteringPrefix#getBuffer(int)}} -> {{bufferAt(int)}}
- {{ClusteringPrefix#getString(int)}} -> {{stringAt(int)}}
- {{ValueAccessor#toSafeBuffer()}} -> {{toMutableBuffer()}}

*Typing*

- It looks safe to throw {{@SafeVarargs}} on {{CompositeType#build()}}.
- {{TypeSerializer#validate()}} and {{AbstractType#writeValue()}} hide the 
class-level type parameter.
- Looks like {{AbstractType#decompose()}} is never used without 
{{ByteBufferAccessor}}? We could probably remove the type parameter and just 
make the {{ByteBuffer}} binding explicit.
- {{TypeSerializer#toCQLLiteral()}} is only used w/ {{ByteBuffer}}, so it 
doesn't look like it needs to be parameterized.
- {{ModificationStatement}} looks like it's dealing exclusively with 
{{ByteBuffer}}. Should the type parameters reflect that?
- Trying to propagate more typing information from 
{{ClusteringBoundOrBoundary.Serializer}} upward to its users for {{Slices}} and 
{{UnfilteredSerializer}} might help clarify some things. (i.e. Literally return 
{{ClusteringBoundOrBoundary}} from {{deserializeValues()}}. We could 
create an array-based version of TOP and BOTTOM.)
- The {{MultiCBuilder}} sub-classes seem to be used only w/ {{ByteBuffer}}. 
Make sure it's types are explicit about that?
- Assuming we don't find some creative ways to reduce the number of places 
where we need type 
  parameters, the following classes also need to be rounded out to avoid 
compile-time warnings:
-- {{BufferClusteringBoundOrBoundary}}, {{ClusteringPrefix}}, 
{{ClusteringBound}}, {{ClusteringBoundOrBoundary}}
-- {{Row}} (on Cell return types), {{Cell}}, {{Array}} / {{BufferCell}}, 
{{ColumnMetadata}}, {{BTreeRow}}, {{ComplexColumnData}}, {{Slice}}
-- {{CounterContext}} (ex. {{total()}})
-- {{RegularColumnIndex}}, {{IndexEntry}}, {{CollectionKeyIndex}}, 
{{CollectionKeyIndexBase}}, {{CassandraIndex}}
-- {{AbstractAllocator}}
-- {{CBuilder}} (and {{ArrayBackedBuilder}}...also {{ArrayBackedBuilder}} only 
builds {{BufferClustering}}? If so, that should flow up in 
   the type information to {{RegularColumnIndex}}, et al.)

*Safety Concerns*

- {{AbstractArrayClusteringPrefix}} references it's own sub-class in a static 
context...we might need to move the empty size to {{ArrayClustering}}?

*Code Organization*

- Would be nice to avoid the switch in {{ClusteringBoundary}} and 
{{ClusteringBound}}. If we know this only works with the {{ByteBuffer}} 
version, can we just cleanly separate the static factories from the interfaces?
- We might be able to factor our some common elements of {{ArrayCell}} and 
{{BufferCell}}.
- We might benefit in terms of usability/developer ergonomics if we push some 
capabilities of the accessor into {{Cell}}. (ex. methods like {{getLong()}}). 
Similar thing going on with {{ClusteringPrefix}}, where we could perhaps stop 
making calls like {{builder.add(prefix.get(i), prefix.accessor())}} and use 
{{bufferAt(i)}} in CP itself. I suppose {{ArrayBackedBuilder}} might need to 
support something like {{add(ClusteringPrefix, int)}} if we still want to do 
that lazily, after the {{isDone()}} check.
- {{AbstractType#writeValue()}} could be implemented in {{Cell}}, given the 
latter knows both its value and accessor already, and {{ColumnSpecification}} 
already knows the column type?

*TODOs and FIXMEs*

- Would {{NativeCell}} not having a specialized accessor be any worse than the 
existing codebase? (If not, I'd be okay with deferring it to another Jira...?)
- It looks like there are a few places where you wanted to avoid 
{{ClusteringPrefix#getBufferArray()}}. (ex. {{RangeTombstoneMarker.Merger}}). 
I'm not sure how many of those usages actually suffer from the wrapping 
overhead though. 
- {{validateCollectionMember}}, the logger, and {{componentsCount}} are unused 
in {{AbstractType}}.
- {{deserializeValuesWithoutSize()}} unused in {{ClusteringPrefix}} (and 
{{version}} is unused in the other overload?)
- Most of the static factories in {{ArrayClusteringBoundOrBoundary}} and 
{{ArrayClusteringBound}} are unused.
- {{UNSET_BYTE_ARRAY}}, {{getChar()}}, {{putBoolean()}}, {{putChar()}}, 
{{writeWithLength()}} are unused in {{ByteArrayUtil}}. What if we just fold 
these methods into {{ByteArrayAccessor}}.
- FIXME in {{AbstractClusteringPrefix}} ...modify {{Digest}} to take a 
value/accessor pair?
- Some items commented out near the top of {{BufferClusteringBound}}.
- TypeSerializer#serialize() and serializeBuffer() don't both need to exist?
- {{Array

[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-10 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175119#comment-17175119
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

bq. The split of ByteBuffer into a pair of accessor + container is error prone 
- e.g. accidentally passing a byte[] with the ByteBufferAccessor or vice versa 
leads to "interesting" results.

Could you elaborate on how this would happen [~snazy]? I've structured the 
classes to avoid this problem specifically. The value accessor's type parameter 
needs to be the same as the value's type, mixing them up should result in a 
compile error. Is there somewhere this doesn't work as intended?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-07 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173037#comment-17173037
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


{quote}But introducing an API that's error prone is dangerous. But introducing 
an API that's error prone is dangerous.
{quote}
This particular issue can be somewhat resolved by using generic parameters for 
methods that accept the object and accessor, including the constructor of any 
object holding them, e.g.
{code:java}
public interface Accessor {}

public class SomethingWithData
{
final Object data;
final Accessor accessor;
public  SomethingWithData(D data, Accessor accessor) { 
this.data = data; this.accessor = accessor; }
} {code}
This avoids polluting the codebase with type parameters, but ensures the first 
supplier of the Object+Accessor is safe, and later supplies are safe if they 
only propagate pairs produced in this way.

We could also probably generify the objects themselves, I don't think it would 
be too painful - but this at least is an easy and less invasive improvement.

 
{quote}The underlying issue IMO is not {{ByteBuffer}} itself but the vast 
amount of those intermediate and often tiny BB instances.
{quote}
I don't have a strong opinion on the overall structural impact of this change; 
I haven't thought through the various options.  However I don't think we 
produce all that many intermediate {{ByteBuffer}} specifically, and though we 
do produce a lot of intermediate objects in many cases, there are workloads 
where this isn't the issue, and object creation is still a problem.

When it comes to compaction, my preferred approach would be to avoid 
materialising on heap at all - but the change here helps both compaction and 
reads/writes, so it doesn't have to be either/or, and we should definitely be 
trying to reduce our heap footprint in general.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-07 Thread Robert Stupp (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173009#comment-17173009
 ] 

Robert Stupp commented on CASSANDRA-15393:
--

Thanks for tackling this! Heap pressure induced by {{ByteBuffer}} is definitely 
a pita. I'm not sure though whether this patch is suitable to be included in 
4.0 this late (beta stage) as it looks quite intrusive and is pretty big.
 Haven't thoroughly reviewed it, but here are a few notes:
 * The new *Accessor classes have constants for boolean (true/false) 
representations, which are modifiable. IIRC there's been an issue in the past 
when that (or a similar) constant was accidentally changed.
 * The split of {{ByteBuffer}} into a pair of {{accessor + container}} is error 
prone - e.g. accidentally passing a {{byte[]}} with the {{ByteBufferAccessor}} 
or vice versa leads to "interesting" results.

The underlying issue IMO is not {{ByteBuffer}} itself but the vast amount of 
those intermediate and often tiny BB instances. But introducing an API that's 
error prone is dangerous.

It feels both more efficient and much safer to have an API & implementation 
that actually _prevents_ (the vast majority of) these new objects. I.e. instead 
of creating a new {{ByteBuffer}} (or "just" a new {{byte[]}} as in the PR) to 
write some value (e.g. an {{int}}) to a target buffer, write that value 
_directly_ into the target buffer. _How_ that's implemented is a different 
topic, but I'd prefer to consider all the new developments in Java and make it 
safe & transparent to use.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-06 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172676#comment-17172676
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

There is a fair number of unit test 
[failures|https://app.circleci.com/pipelines/github/maedhroz/cassandra/94/workflows/9a0adbdc-56ae-48fd-91a7-1104afcba560/jobs/463],
 although they seem to mostly predate the rebase.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-06 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172612#comment-17172612
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

[~bdeggleston] I made a pass at a rebase, given the changes made in trunk 
around CASSANDRA-15400, CASSANDRA-15461, and CASSANDRA-14365 since we last 
touched this. You can find it 
[here|https://github.com/apache/cassandra/pull/712] (and a link to a CircleCI 
run is in the comments).

Notes:

- {{CompactionAllocationTest}} already exists on trunk, so I more or less took 
its modifications during conflict resolution.
- I've 
[preserved|https://github.com/apache/cassandra/pull/712/files#diff-e65bcd2d398327679a75a9da82473635R45]
 the FIXME in the digest update.
- There are numerous places where I've added proper type/generics information 
to method signatures. It probably inflated the diff a bit.
- I think I've implemented {{ClusteringPrefix#minimize()}} reasonably across 
the native, array-backed, and bb-backed classes, but that's an area for us to 
review.

With all of that cleaned up, I'll start on actually reviewing the patch :)

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171803#comment-17171803
 ] 

Caleb Rackliffe commented on CASSANDRA-15393:
-

[~marcuse] [~samt] I'm going to make a pass at review here if you two don't 
mind...

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-12-02 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986441#comment-16986441
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

[~benedict], I’d like your opinion on this. Instead of adding a generic 
{{Value}} wrapper, I tried using the same opaque value / value accessor pattern 
I’d used on CASSANDRA-15390 to generalize serialization logic. Code is 
[here|https://github.com/bdeggleston/cassandra/tree/15393-accessor].

Decoupling the serialization logic from the memory abstraction like this makes 
it possible to realize all the allocation improvements of using byte arrays for 
cell values without locking us into byte arrays or adding overhead of a wrapper 
object (which incurs a ~6% allocation penalty with these tests). We also avoid 
worrying about megamorphic call sites.

As a bonus, this pattern means we can convert clustering prefix to use byte 
arrays as well without a lot of extra work, which saves an additional 5% of 
allocations, on top of the existing ~18% from cells.

So that’s the good news, the bad news is that it touches a lot of serialization 
code, and is just a larger patch in general than simply converting cells to use 
byte arrays (although I’m not sure how well that would have worked with the 
slab allocator in practice). I still need to chase down a handful of failing 
tests and there are some rough edges around how value/accessors are handled via 
generic wildcards that need to be worked out.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-11-04 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967116#comment-16967116
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

Here's a rough initial pass at converting cells from buffers to arrays: 
[https://github.com/bdeggleston/cassandra/tree/15387-arrays]

There are {{FIXME}}s everywhere and I still need to clean up some ideas that 
didn't pan out and take another look at cell cloning, but +902/-402 is smaller 
than I was expecting.

I've also started playing around with converting from buffer to a {{Value}} 
type that can be backed by ByteBuffer, byte[], or native memory. It's nicer to 
look at, but is also a much deeper rabbit hole. There is also code in a lot of 
places that relies on ByteBuffers having mutable position that would need to 
reworked and verified.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964309#comment-16964309
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

{quote}Yes, remove it entirely from the API (and perhaps more widely from the 
codebase, although this could be done incrementally)
{quote}
That would probably be the most straightforward approach for trunk. There may 
be some sharp edges outside of the local read path but it's worth looking into. 
I've been working on a garbage free reader that relies on having some wrapper 
over the raw bytes, but that would be a post-4.0 project and we could go back 
to byte buffers if/when it materializes.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964284#comment-16964284
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


Thanks - I now see that it's right there in the spreadsheet if I had looked at 
the calculation

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964279#comment-16964279
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

The row for this patch measures the delta between 15393 and 15391 against the 
baseline established in 15388

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964278#comment-16964278
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


Yes, remove it entirely from the API (and perhaps more widely from the 
codebase, although this could be done incrementally)

> The linked spreadsheet has 3 tabs

Thanks, that's what I was missing.  How is this being measured, given that this 
patch builds upon other patches that reduce allocations?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964267#comment-16964267
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

> Would the local-read benefits be better served by removing ByteBuffer 
> entirely?

Could you clarify that? Do you mean remove ByteBuffer from the cell interface 
entirely and using byte arrays for everything?

The linked spreadsheet has 3 tabs, the "Tiny" tab you're probably looking at is 
just measuring the {{(k int primary key, v int)}} data model. The medium and 
wide tabs are workloads represent more common workloads.


> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964251#comment-16964251
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


Would the local-read benefits (as well as many others) be better served by 
removing {{ByteBuffer}} entirely?  It's more involved, but again has no 
negative repercussions we need to mitigate, and is a legacy we need to shed 
eventually.

Could you also clarify the improvement numbers?  The 
[spreadsheet|https://docs.google.com/spreadsheets/d/1_m2qoJYkyM3-W95Ex8yxqPtE48lJcKgeaUlsvHrInmA/edit#gid=0]
 you linked suggests this only improves ~1-2%?

We could also improve the local-read path in a similar garbage-free manner, but 
I would prefer to refactor more deeply to support that safely.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964231#comment-16964231
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

No, but the local read path also benefits from most of these changes.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964228#comment-16964228
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


This shouldn't ever be used for compaction I don't think?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964227#comment-16964227
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

The only exception I'm aware of is the reverse sstable iterator.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964215#comment-16964215
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


So first we would have to ensure we materialise to the heap anything we hold 
onto with unknown lifetime (e.g. {{ColumnIndex.firstClustering}}, much like 
with flushing).  I don't think there are many such places.

We could then use the fact that an {{Unfiltered}} is immediately discarded 
after being written, to keep in memory at most 2 {{Unfiltered}} - the one we're 
maybe using, and the one that might now have been peeked at for use in the 
{{MergeIterator}}.

I don't think anything more should be needed, but we'd have to look carefully.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964202#comment-16964202
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

I had not, this group of changes was developed on 3.0 then ported up to trunk. 
That would neatly fix the concerns with this ticket and CASSANDRA-15391 though. 
Something that's not immediately clear to me (and I'm not really familiar with 
off heap memtables at all), is how we'd prevent the amount of memory used off 
heap from growing too large. Reclaiming after the parent partition has been 
closed would be too long to wait in the case of huge partitions, and the 
lifecycle of Unfiltered instances rely on the garbage collector cleaning things 
up. Any thoughts there?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963950#comment-16963950
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


Have we considered, for compaction, materialising {{NativeCell}} from disk 
instead?  These have only 16 or 24 bytes of heap overhead, depending on the JVM 
settings, and avoid materialising any of the payload.  We would need to 
parameterise our {{Serializer}}, and for compaction associate it with a 
{{NativeAllocator}} for producing {{NativeCell}}, and reclaim this after each 
partition (or each row of a partition).  Even the {{NativeCell}} themselves 
could be reused in this scheme, giving zero `Cell` related garbage.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-10-31 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963872#comment-16963872
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


We have to be careful here, because this could have serious negative 
consequences for users employing off heap memtables, since all Cell operations 
will become megamorphic.  I don't think we can introduce this change without 
fairly extensive performance testing.

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org