[jira] [Comment Edited] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-30 Thread Ivan Senic (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17560942#comment-17560942
 ] 

Ivan Senic edited comment on CASSANDRA-6936 at 6/30/22 9:12 AM:


Do I understand correctly that this will first be available in the `4.2` release,
which is scheduled to go out in a year?


was (Author: JIRAUSER281556):
Do I understand good that this will be first available in the `4.2` release 
that is scheduled to go out in a year?

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
> Reporter: Benedict Elliott Smith
> Assignee: Branimir Lambov
> Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.2
>
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> This could be a painful change, but is necessary for implementing a 
> trie-based index, and settling for less would be suboptimal; it also should 
> make comparisons cheaper all-round, and since comparison operations are 
> pretty much the majority of C*'s business, this should be easily felt (see 
> CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
> major performance impacts). No copying/special casing/slicing should mean 
> fewer opportunities to introduce performance regressions as well.
> Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
> changes, hopefully this shouldn't be too much more of a burden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-03-23 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375771#comment-14375771
]

Benedict edited comment on CASSANDRA-6936 at 3/23/15 11:43 AM:
---

So, the more often I think of future storage changes, the more this becomes a
pain and a headache. I would like to reassess the possibility of making
everything byte-order comparable. How widely deployed are custom AbstractType
implementations where the comparator makes a difference? Because it seems
dropping support for just this (and having the user define an ASC/DESC order on
the fields for maps/sets/tables within a UDT instead, for instance) would give
us the ability to deliver it universally.

As far as I am aware, we're the only database that hamstrings ourselves with
this limitation (or permittance). I would like to byte-prefix compress our
index file (because as standard it takes up a significant proportion of the
data it indexes unnecessarily, inflating the number of disk accesses and
reducing the effective capacity of the key cache), but this isn't possible
without a majority of fields supporting this. Even then, special-casing those
that do not is a headache and adds code complexity. It also
pollutes the icache and branch predictors (not just with the inflation of
variances, but in the logic to select between them). This is not to be
understated: it's surprising how many icache misses you can get on a simple
in-memory stress workload, which is underrepresentative of the variation for a
normal deployment. vtune rates our utilisation of chips pretty poorly, and this
is a major contributor. The same is true for optimising merges (we get
significantly better algorithmic complexity with far fewer changes if the
comparable fields are byte-prefix comparable), and for compressing clustering
columns in data files on disk. I am certain I will encounter more scenarios
before long.
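
To illustrate the byte-prefix compression mentioned above, here is a minimal sketch, not Cassandra's actual index format. It assumes the keys are already sorted by their unsigned byte representation, which is what makes neighbouring keys likely to share long prefixes. The class `PrefixCompressionSketch`, its `Entry` type and the sample keys are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: front-codes a list of keys that is sorted in byte order.
final class PrefixCompressionSketch {

    // One compressed entry: how many leading bytes are shared with the
    // previous key, followed by the bytes that differ.
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Stores each key as (shared prefix length, remaining suffix).
    static List<Entry> compress(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefixLength(previous, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            previous = key;
        }
        return out;
    }

    // Rebuilds the original keys by re-attaching the prefix of the previous key.
    static List<byte[]> decompress(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(previous, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            previous = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"user:1000", "user:1001", "user:1002", "user:2000"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : compress(keys)) {
            System.out.println(e.sharedPrefixLength + " + "
                    + new String(e.suffix, StandardCharsets.UTF_8));
        }
    }
}
```

If the sort order were defined by a custom comparator rather than by the bytes themselves, adjacent keys would not necessarily share prefixes, and a scheme like this would save little.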

I think the cumulative performance wins here would be really _very_
significant, for all workloads (compaction, disk reads and in-memory reads all
have significant wins from this change).

CASSANDRA-8099, CASSANDRA-8731, CASSANDRA-8906 and CASSANDRA-8915 all help, but
none will help as significantly - and each adds its own complexity, whereas
this would _simplify_, which I think is important (for us as well as the CPU).

was (Author: benedict):
So, the more often I think of future storage changes, the more this becomes a
pain and a headache. I would like to reassess the possibility of making
everything byte-order comparable. How widely deployed are custom AbstractType
implementations where the comparator makes a difference? Because it seems
dropping support for just this (and having the user define an ASC/DESC order on
the fields for maps/sets/tables within a UDT instead, for instance) would give
us the ability to deliver it universally.

I think the cumulative performance wins here would be really _very_
significant, for all workloads (compaction, disk reads and in-memory reads all
have significant wins from this change).

Make all byte representations of types comparable by their unsigned byte
representation only

Key: CASSANDRA-6936
URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Labels: performance
Fix For: 3.0

This could be a painful change, but is necessary for

[jira] [Comment Edited] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Aleksey Yeschenko (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305367#comment-14305367
]

Aleksey Yeschenko edited comment on CASSANDRA-6936 at 2/4/15 4:03 PM:
--

Additionally, I wouldn't want to layer extra conversion logic on top of the
already happening CASSANDRA-8099. We will have bugs there (in back and forth
conversion of mutations and read commands). We are still catching bugs of this
kind from CASSANDRA-3237. You don't want to make things worse by having this on
top, in a single release.

was (Author: iamaleksey):
Additionally, I wouldn't want to layer extra conversion logic on top of the
already happening CASSANDRA-8099. We will have bugs there (in back and forth
convertion of mutations and read commands). We are still catching bugs of this
kind from CASSANDRA-3237. You don't want to make things worth by having this on
top, in a single release.

Make all byte representations of types comparable by their unsigned byte
representation only

Key: CASSANDRA-6936
URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Labels: performance
Fix For: 3.0

This could be a painful change, but is necessary for implementing a
trie-based index, and settling for less would be suboptimal; it also should
make comparisons cheaper all-round, and since comparison operations are
pretty much the majority of C*'s business, this should be easily felt (see
CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with
major performance impacts). No copying/special casing/slicing should mean
fewer opportunities to introduce performance regressions as well.
Since I have slated for 3.0 a lot of non-backwards-compatible sstable
changes, hopefully this shouldn't be too much more of a burden.


[jira] [Comment Edited] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305181#comment-14305181
]

Benedict edited comment on CASSANDRA-6936 at 2/4/15 2:49 PM:
-

bq. Maybe a time will come where comparisons are our main bottleneck but we're
not there atm and future storage changes will probably impact this as well.

We are there already. Speak to [~jblangs...@datastax.com] and [~jshook] for
instance, who've each been working with users for whom the CPU cost of
comparisons is bottlenecking performance. One of these customers is seeing a
blistering 4MB/s of compaction throughput with their CPUs maxed out. The other
had to stop using collections entirely. Comparisons are pretty much the main
time sink for C* when working with clustering columns, and especially
collections.

The big problem fields are int, bigint and timestamp. All of these are very
commonly used, and trivial to make byte-order comparable. The optimisations
made a little while back had a significant impact on CPU cost of merges, and
they all depend on byte-order comparability of every clustering column on the
table. For such small fields the cost of the virtual invocation is a
significant percentage of the time spent since the data will generally be in
cache, having just been read off disk. We can avoid multiple such virtual
invocations if all of the fields are byte-order comparable. It also improves
instruction cache occupancy for these common methods, since they all go through
the same codepath (at the time of making those optimisations, instruction cache
misses were actually a significant problem, and likely worse on a live server
with a more varied workload).
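
As an illustration of why fixed-width signed integers are easy to make byte-order comparable, here is a minimal sketch, not the encoding Cassandra ships. It assumes a big-endian two's-complement layout with the sign bit flipped, so that an unsigned lexicographic comparison of the bytes agrees with the numeric order. The class `ByteComparableInts` and its method names are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of Cassandra: illustrates the sign-bit flip only.
final class ByteComparableInts {

    // Big-endian bytes with the sign bit flipped, so negative values sort
    // before positive ones under unsigned byte comparison.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value ^ Integer.MIN_VALUE).array();
    }

    // Same idea for 64-bit values such as bigint or a millisecond timestamp.
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Flipping the sign bit again restores the original value.
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] values = {-5, 3, Integer.MIN_VALUE, 0, Integer.MAX_VALUE, -1};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encodeInt(values[i]);
        }
        // One generic comparator for every field: no per-type virtual call.
        Arrays.sort(encoded, (a, b) -> Arrays.compareUnsigned(a, b));
        for (byte[] e : encoded) {
            System.out.println(decodeInt(e)); // printed in ascending numeric order
        }
    }
}
```

With every field encoded this way, a single unsigned byte comparison can stand in for the per-type comparators, which is the single-codepath property described above.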

Future storage changes largely depend on it too for delivering the best
performance, as the binary trie is likely to be the most significant win.
Further, CASSANDRA-8731 can perhaps exploit the nature of these fields to reduce
costs of merging even further.

That all said, CASSANDRA-8731 may well help get some of the way there by
itself, depending on how things pan out.

was (Author: benedict):
bq. Maybe a time will come where comparisons are our main bottleneck but we're
not there atm and future storage changes will probably impact this as well.

We are there already. Speak to [~jblangs...@datastax.com], for instance, who's
been working with two users recently seeing CPU costs of comparison bottleneck
performance. One of these customers is seeing a blistering 4MB/s of compaction
throughput with their CPUs maxed out. Comparisons are pretty much the main time
sink for c* when working with clustering columns, and especially collections.

Make all byte representations of types comparable by their unsigned byte
representation only

Key: CASSANDRA-6936
URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Labels: performance
Fix For: 3.0
