[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611103#comment-14611103 ] Ariel Weisberg commented on CASSANDRA-9499: --- Rebased against trunk. Left out the OHC changes which I will do separately. https://github.com/apache/cassandra/commit/6adb9e3ca49007b1be7e3eda2173281f3e3bfc8e https://github.com/apache/cassandra/commit/5e9a7e4fa93a8452ccbec63cef918c8864c274d3 > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599680#comment-14599680 ] Robert Stupp commented on CASSANDRA-9499: - If the current state works for you, I'll release that and do the refactor later. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599675#comment-14599675 ] Ariel Weisberg commented on CASSANDRA-9499: --- Should we plan on that refactor for 3.0, or should we take what you have (which works for me) and refactor OHC later? I just want to nail down what will go into 3.0. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599672#comment-14599672 ] Robert Stupp commented on CASSANDRA-9499: - Then I'd like to refactor OHC's public API to make ByteBuffers first-class citizen and plumb methods using streams on that. Otherwise ("just" extending the current API) would make it look clumsy. But that's stuff for a different ticket. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599549#comment-14599549 ] Benedict commented on CASSANDRA-9499: - bq. I won't object to getting to fix them If they're existing failures, don't worry. It just looked like they were specific to the branches including -madness, as none of the other branches of yours had so many failures, but both branches with -madness did. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599534#comment-14599534 ] Benedict commented on CASSANDRA-9499: - Obviously this is out-of-scope, but the only real problem a stream solves is ensuring there is always enough room. That and being easily passed to JDK tools, and supporting functionality like vint encoding. However we could certainly explore a "lightweight stream" API that exposes the BB and just has ensureRemaining() exposed. vint coding can be done via static method calls. *if* it's workable, I would be strongly in favour of this, as right now the method invocation costs for writing/reading streams are really significant. It isn't a small undertaking, though. But nor is it that huge. It is definitely something we should explore, to see how viable it is. In a follow-up ticket, of course :) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599512#comment-14599512 ] Ariel Weisberg commented on CASSANDRA-9499: --- That works for me. Thinking aloud a lot of the pain we face is from using streams which pretty much inevitably are wrappers around ByteBuffers instead of having an ecosystem around ByteBuffers. I think throughout C* we end up with the worst of both worlds. OHC doesn't need to do that so it could define it's key and value serializers to accept ByteBuffers? You can go from a ByteBuffer to a stream pretty easily, but not the other way around. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599150#comment-14599150 ] Robert Stupp commented on CASSANDRA-9499: - [~aweisberg] OHC now supports putIfAbsentDirect and addOrReplaceDirect. It's not released yet - can you take a look at the [relevant commit|https://github.com/snazy/ohc/commit/e18330130a859735d105d112d57ab718aca18697] in {{develop}} branch whether this works for you? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598622#comment-14598622 ] Ariel Weisberg commented on CASSANDRA-9499: --- As far as I can tell the dtests are doing a similar thing on trunk http://cassci.datastax.com/view/trunk/job/trunk_dtest/ I won't object to getting to fix them (it's where I started anyways). It's probably going to be a while though. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598589#comment-14598589 ] Benedict commented on CASSANDRA-9499: - Looks like there _may_ be some dtest failures introduced by the -madness branch? First commit LGTM, and at a high level the madness branch looks OK too. The OHC stuff can wait to improve until we have a replace implementation, since that's where the real serialization work happens with us. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598357#comment-14598357 ] Ariel Weisberg commented on CASSANDRA-9499: --- I pushed a squashed version of both branches. C-9499 passed tests. C-9499-madness needed an additional commit/bug fix for the row cache and the tests still need to run. https://github.com/apache/cassandra/commit/22af3c73adf14d42c60b596b0b275d60198dd097 https://github.com/apache/cassandra/commit/1dcf4066f31265eecdc3919a875272436dfea05a > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598108#comment-14598108 ] Ariel Weisberg commented on CASSANDRA-9499: --- If your busy let me know and I can do it. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598102#comment-14598102 ] Robert Stupp commented on CASSANDRA-9499: - I'll look at OHC regarding new direct replace + putIfAbsent methods and check test coverage soon. See https://github.com/snazy/ohc/issues/11 + https://github.com/snazy/ohc/issues/12 > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597939#comment-14597939 ] Ariel Weisberg commented on CASSANDRA-9499: --- Unless we store the buffer in a container with a nullable field it will entail hitting the thread-local twice. Maybe if assertions are enabled? Regarding the OHC changes. I can make get and put direct, but I can't make replace, or putIfAbsent direct since OHC doesn't have methods for that. The value serializer will always contain the adapters because one has to be provided for construction, but they won't be used if the direct access method is used. I can create a separate ticket to add those to OHC. Is fixing up the rest of the access methods for OHC something I should do immediately? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597877#comment-14597877 ] Aleksey Yeschenko commented on CASSANDRA-9499: -- bq, Also are we doing this as two or one commits? Go with two. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597874#comment-14597874 ] Benedict commented on CASSANDRA-9499: - bq. That doesn't make the tempBuffer in DataOutputPlus any safer. If a DataOutputPlus ends up wrapping another DataOutputPlus it could go poorly. Well, that's kind of out-of-scope for this ticket since it's already there as a risk, and this has already scope-creeped. But yes, I agree - making that a Queue/List of temporary buffers *could* be sensible (or just nulling each time we extract, and failing if we double-use). bq. Also are we doing this as two or one commits? I've no opinion on that... > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597846#comment-14597846 ] Ariel Weisberg commented on CASSANDRA-9499: --- That doesn't make the tempBuffer in DataOutputPlus any safer. If a DataOutputPlus ends up wrapping another DataOutputPlus it could go poorly. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597838#comment-14597838 ] Benedict commented on CASSANDRA-9499: - bq. You can get github (and git can probably do this as well)... When I'm committing on behalf of somebody else (and I thought we were nits away), I expect a fully-formed commit with CHANGES.txt wired up, commit message provided, etc. bq. Once for varint and once when wrapping a bytebuffer VIntCoding wouldn't really be able to access the buffer internal to DOSP, so my expectation was (since we know how big this buffer needs to be, which is not very big at all) that we would introduce a {{tempBuffer}} in VIntCoding, and introduce an {{\@Inline}} annotated static method that takes the size and encodes into the {{tempBuffer}}, returning the byte array which callers must discard immediately after using. This is guaranteed not to be used more than once simultaneously, and can be invoked from both locations. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597794#comment-14597794 ] Ariel Weisberg commented on CASSANDRA-9499: --- I'm confused. Are we doing this as one commit or two? Previously I thought that adding varint encoding was going to be separate from removing Encoded streams. You can get github (and git can probably do this as well) to provide you with a squashed commit without destroying the commit history on the development branch. https://github.com/apache/cassandra/compare/trunk...aweisberg:C-9499.diff Using the temporary buffer isn't safe. You can construct scenarios where the buffer is retrieved and used twice. Once for varint and once when wrapping a bytebuffer at which point ensureRemaining() could clobber it. It's a dubious pattern in the first place. The only way to make it safe IMO is to have a list of buffers and return them when done. I could fix it by calling ensure remaining before doing the encoding in BDOSS but that still leaves me thinking it's data corruption waiting to happen. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597557#comment-14597557 ] Sylvain Lebresne commented on CASSANDRA-9499: - It's a minor detail, but since TypeSizes only has {{NATIVE}} now, it would felt cleaner to me to actually ditch the instance and make all methods static, which would save from passing the same object to every {{serializedSize}} method. Note that I don't mind if you prefer doing that as a separate ticket, but since it's a pretty simple and automatic change and a consequence of this, figure it's as simple to get that done here. Let's call it a suggestion :) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597477#comment-14597477 ] Benedict commented on CASSANDRA-9499: - Thanks [~snazy] [~aweisberg]: fancy tweaking the modified code to use that instead of the new Forwarding classes? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597470#comment-14597470 ] Robert Stupp commented on CASSANDRA-9499: - Heh - it's already in OHC. Just use the {{OHCache.getDirect(key).buffer()}} resp. {{OHCache.putDirect(key,valueLen).buffer()}} methods. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597432#comment-14597432 ] Benedict commented on CASSANDRA-9499: - [~aweisberg]: i've pushed a small further tweak [here|https://github.com/belliottsmith/cassandra/tree/C-9499-madness] to simplify the readMinimum logic, and reduce the number of operations (and size of code) on the normal path. I'm also ambivalent about not using the VInt methods in the switch for SerializingCache, but it's not super-important (since obviously the inner serialization will not until 8099, but 8099 is unlikely to touch this particular place) Otherwise, LGTM. If you can squash and otherwise prepare for commit, we can get it into mainline. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597435#comment-14597435 ] Benedict commented on CASSANDRA-9499: - [~snazy]: While reviewing this, I noticed that OHC provides a DataInput/DataOutput, which results in pretty inefficient serialization between the two (especially since we wrap, and do vint encoding). Could we update its interface to provide a ByteBuffer instead? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597441#comment-14597441 ] Robert Stupp commented on CASSANDRA-9499: - [~benedict] should be possible (BB wasn't necessary before) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596343#comment-14596343 ] Ariel Weisberg commented on CASSANDRA-9499: --- I am +1 on Benedict's changes. I merged everything in and rebased C-9499 and C-9499-madness and both are passing. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593527#comment-14593527 ] Benedict commented on CASSANDRA-9499: - OK, so I just pushed a change which reverses the byte order of the vint to big endian. This permits removing quite a few LOC, and I reckon it's now closer to optimal. Take a look and tell me what you think. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593500#comment-14593500 ] Benedict commented on CASSANDRA-9499: - and, just riffing here: it looks to me like it might be better to {{prepareReadPrimitive(1)}} and grab our {{firstByte}} before we do anything else, since that permits us to short-circuit more quickly. It would also be nice to simply generalise {{readMinimum}}, to choose when it throws EOF. We don't buy much from grabbing the limit inline, I don't think. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593474#comment-14593474 ] Benedict commented on CASSANDRA-9499: - Looking good. I've pushed some suggestions to the bit fiddling, for both clarity and performance [here|https://github.com/belliottsmith/cassandra/tree/C-9499-suggestions]. Including efficient size computation, some extraneous operation removal, and what seemed to me clearer names to methods (please feel free to dispute). I had to strip out the debugging statements to figure out what was going on, hope that isn't a problem for you. It's still more computationally involved than I would _like_, and I feel there may be some more efficiencies. So I will mull it over a little more, but wanted to upload what I had so we didn't tread too much on each other. Some further suggestions I did not implement: * I'm not sure why we need the try/finally block, and I'd prefer we removed it - there's nothing that should be able to throw an exception between our setting and unsetting the limit. * The extra reading in NIODataInputStream, I would prefer we extract to a separate method, and force it to not be inlined. Since it should be a rare occurrence that we need to do this, and we incur more expensive costs than this inner method invocation when this does happen, I would prefer to keep the rest of the method as tightly packed for icache behaviour as possible > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592307#comment-14592307 ] Ariel Weisberg commented on CASSANDRA-9499: --- I have the proposed encoding implemented and have the beginnings of an efficient implementation for BufferedDataOutputStreamPlus and NIOInputStream. There is probably still branches or that could be removed as well as making the bit fiddling more efficient and clearer. I am going to try and clean it up, but I could use feedback. I couldn't come up with an efficient implementation for computeUnsignedVIntSize (when you haven't encoded it yet). I also created the branch https://github.com/aweisberg/cassandra/commits/C-9499-madness where I removed Encoded*Stream and changed the serializers to use DataInputPlus which extends DataInput to add the varint decoding methods. I haven't rebased that on top of C-9499 yet. I am using 1s for the extension bits. I am also emitting the bytes in little endian order although it seems like I would need to do that at least for the first byte. I could emit the rest of the bytes in big endian order for the getLong(). Right now I have to reverse them because the ByteBuffer is big endian. l am wagering it is faster than swapping the settings. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589935#comment-14589935 ] Benedict commented on CASSANDRA-9499: - bq. Doesn't shifting the continuation bits to the first byte penalize the range of 1-byte values? No. You have to have a single continuation "bit" for either encoding. They're literally equivalent, just with their bit positions swapped around, except in the single byte case, in which case they are exactly identical. Run-length encoding means you count the number of set (or unset - this would actually be cleaner) contiguously at the top; the first that isn't (or is) tells you how many more bytes you need to read. i.e. if the first bit is unset, you're done, and the remaining 7 bits are value. If all 8 bits are set, we need to read a full long. This encoding gives us pretty ideal behaviour, of cheap operation over the single byte representation, followed by consistent behaviour across all the remaining possible values. It also lets us easily avoid wasting a whole byte for the final bit of a long. This is equivalent to Aleksey's characterisation. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589920#comment-14589920 ] Aleksey Yeschenko commented on CASSANDRA-9499: -- I was thinking that we'd just implement what UTF8 does, except without self-synchronization semantics, which we don't really care about (so using the full range of the subsequent bytes, instead of using up their first 2 bits with the '10' marker). They way you tell the difference between length bits and value bits in the first byte is by separating the length bits and value bits by a mandatory 0 bit. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589904#comment-14589904 ] Ariel Weisberg commented on CASSANDRA-9499: --- Doesn't shifting the continuation bits to the first byte penalize the range of 1-byte values? Supposedly that is the most common and so having 1 or 2 byte values with as much range as possible is a big advantage. Comments in the Google code make it seem like it's all about 1-byte values and that is the thing to optimize for. In practice if you only have to process one or two bytes the current implementation won't have to loop that many times. I don't understand how you can use a variable number of top order bits in the first bytes to indicate the number of bytes. How can you tell the difference between length bytes and value bytes? The way the existing code that doesn't use length extension works (and is simple) is that it drops the remaining bits of the first byte. That is something we want to get away from I thought. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470 ] Benedict commented on CASSANDRA-9499: - I'm confused as to why we need 10 bytes? Pretty much by definition a continuation bit encoding needs 9 bytes to represent 8 bytes. Let's not use Google's implementation. It looks pretty rubbish. The reason they use 10 bytes is they cannot be bothered to realise the last byte does not need a continuation bit. They also use a *terrible* implementation for deciding how long it needs to be. Here's a simple change which makes it 9 bytes, and easily optimised: the continuation bits are all shifted to the first byte, which effectively encodes the length in run-length encoding, i.e. the number of contiguous top order bits that are set to 1. i.e. {{length = Integer.numberOfLeadingZeros(firstByte ^ (byte) -1)}} This way we read the first byte, and if there are any more to read, we read a long, and quickly truncate. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589450#comment-14589450 ] Sylvain Lebresne commented on CASSANDRA-9499: - bq. breaking compatibility with persisted caches? We don't persist cache contents, we only persist the keys that are cached and reload them on startup, so this is a non-issue. More generally, I think we should just drop {{EncodedDataOutput}}/{{EncodedDataInput}} and just make sure we don't forget to update the serialization code from 8099 to use the new methods introduced here (which shouldn't take very long). > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589026#comment-14589026 ] Ariel Weisberg commented on CASSANDRA-9499: --- I pushed an implementation based on the protocol buffers varint scheme. If you have ideas for what an efficient implementation would look like let me know. On the read side it would be faster if I did the 10-bytes with padding thing. Maybe then copy to a byte array to avoid pulling a byte out of ByteBuffer? Using a long seems tricky since it can be up to 10 bytes. Could do the same thing on the serialization side so that we only go into ByteBuffer. I will have to get the size of the encoded integer to figure out how much space I will need first. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588924#comment-14588924 ] Benedict commented on CASSANDRA-9499: - I'm totally cool with letting caches not survive a major version upgrade, but I'm also not too fussed. We could deprecate them entirely, since we will use writeVInt wherever helpful in the normal serialization calls that they are backed by. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588857#comment-14588857 ] Ariel Weisberg commented on CASSANDRA-9499: --- Am I changing the implementation used by EncodedData{Output | Input} as well and breaking compatibility with persisted caches? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588787#comment-14588787 ] Benedict commented on CASSANDRA-9499: - I don't mind using sign extension, since that's what I assumed we used in the first place (and sounds like it's what Sylvain assumed we were using too). We could also very easily perform zigzag encoding in a method that _depends_ on {{writePosVInt}}, so no duplication (although we probably won't prevent the VM inlining it, which it will). It also has the benefit that implementation is very succinct, so less generated code. Although I don't think there's a way to write a branchless read, although we could probably still use getLong(), and efficiently truncate. We would need the option for a ninth byte, though, and would need to munge the resulting long quite a bit, so it would just complicate things significantly, probably to little benefit. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588727#comment-14588727 ] Ariel Weisberg commented on CASSANDRA-9499: --- I don't think the method we currently use is analogous to what protocol buffers is doing. https://developers.google.com/protocol-buffers/docs/encoding#varints The first description is of a method is just using the last bit of each byte as a continuation bit. We are doing a length prefix and not extension. https://github.com/google/protobuf/blob/master/java/src/main/java/com/google/protobuf/CodedOutputStream.java#L1213 The encoding on the wire is unsigned and an additional transform (zigzag) is applied for signed numbers. This sounds like it is closer to what everyone is looking for. Want a bias towards positive numbers? We could do unsigned with a small amount of additional code. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588647#comment-14588647 ] Ariel Weisberg commented on CASSANDRA-9499: --- I got the readVInt optimization where there is no loop working. What are we looking to accomplish with a different encoding? * Bias towards positive numbers * More efficient length extension (currently the first byte is dropped) for > 1 byte values * Simpler/faster implementation I just bent over backwards to remove branching, loops, instructions etc. If we change encodings are we going to lose that ground? Is what we would get from the first two attributes more important? Do we want to do just one, say more efficient length extension? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588343#comment-14588343 ] Benedict commented on CASSANDRA-9499: - When I said little endian, it was a typo. Typically we use big endian, so we want any method to assume this. I hadn't noticed that they chose to implement it was, effectively, little endian encoding. Which is another reason to switch: it's confusing to swap between different types of encoding, and breaks assumptions. However, if we wanted to implement this optimisation for the current encoding scheme, we would just modify if to a shift left instead of an &, followed by a Long.reverseBytes() (nb: admittedly without thinking about it too rigorously) That said, it is a shame we didn't standardise on little endian, given our main target platform is little endian. I know Java defaults to big endian, but the target platform is more important. That's a much scarier change than modifying this, though, so let's not go there. Probably ever. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588326#comment-14588326 ] Ariel Weisberg commented on CASSANDRA-9499: --- I am getting the wrong answer for Integer.MIN_VALUE trying to mask out the long that has the 8-byte value. I get -2357198848. I replaced {noformat} long i = 0; for (int idx = 0; idx < len; idx++) { byte b = buf.get(); i = i << 8; i = i | (b & 0xFF); } {noformat} with {noformat} long i = buf.getLong(buf.position()); i &= (-1L >>> (64 - (len * 8))); {noformat} I guess this comes down to the endianess issue. Right now NIODataInputStream doesn't handle little endian anything. The buffer is private and there is no constructor for providing one nor a method for changing the order of the underlying buffer. Is this a dead end, pick a different implementation depending on byte order? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588212#comment-14588212 ] Ariel Weisberg commented on CASSANDRA-9499: --- Another question, how do we know all the ByteBuffers are little endian? Is that the case right now? I that the DataOutput/Input are big endian? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588210#comment-14588210 ] Benedict commented on CASSANDRA-9499: - bq. Are we going to shoot for a different encoding? I think so, but it's still up for discussion. bq. There is no guarantee I can get extra space in the buffer for the writer to perform a putLong. Some implementations will not resize the buffer. Ah. CommitLog. Yes, that is annoying, and pretty much nothing can be done about it, other than padding the space we allocate for the CL record (which wouldn't actually be dreadful). OK, nevermind. Forget both optimisations (the read is a little ugly, and if we mostly expect small values, it's of dubious benefit). We can revisit at a later date. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588175#comment-14588175 ] Ariel Weisberg commented on CASSANDRA-9499: --- There is no guarantee I can get extra space in the buffer for the writer to perform a putLong. Some implementations will not resize the buffer. If I ask for the extra space it could throw an exception. For the reader I could write enough code to make it happen. I'll give it a shot now. Are we going to shoot for a different encoding? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588026#comment-14588026 ] Benedict commented on CASSANDRA-9499: - bq. I was, in fact, more thinking of the simpler scheme of using one bit as a "continuation bit" I'm not fixated on the exact specifics of the scheme I suggested, I was mostly suggesting a switch to a continutationbit*s* scheme (with these bits encoded upfront for simplicity), as well as being strongly in favour of one encoding scheme. We could, for instance, have a single continuation bit, that is expanded to, say, 3 bits + sign if it's set, so that we represent [0..127] in 1 byte, [-2048..2047] in 2 bytes, [-512K..512K] in 3 bytes etc.; or we could have two initial continuation bits, permitting up to 16K in 2 bytes. I would just prefer we settle on what we consider to be approximately the most generally useful scheme, and stick with that. I should note I'm not saying that it definitely would not help to have a proliferation of methods, but IMO the _likelihood_ of payoff is too low to spend time on (we have lots of things we know will payoff). I think it's somewhat more likely to harm than help, since a majority of values to be encoded will likely see the same size under the general scheme as they would under their more specific scheme, and there are negative costs associated with a proliferation of method calls. I'm hoping _in general_ on the project we can steadily prune the number of methods a typical workloads entails visiting. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587997#comment-14587997 ] Sylvain Lebresne commented on CASSANDRA-9499: - bq. the existing scheme (including a positive-only variant supporting 250 values) Again, I haven't really looked at the existing scheme so I really wasn't fixed on it. I was, in fact, more thinking of the simpler scheme of using one bit as a "continuation bit", which does only support 127 values in a single byte but degrades to 2 bytes properly. Anyway, doesn't matter. bq. the worse our instruction cache population becomes I'm unfortunately not smart enough to properly estimate how that kind of effect would play against other considerations (like saving bytes sometimes) on average in the grand scheme of everything. My point was merely to suggest that expressing our assumptions through different API calls that would be optimized based on these assumption could be an idea worth exploring, but I don't want to push it any more than that. I also don't have strongly objections to the scheme you're suggesting. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587967#comment-14587967 ] Benedict commented on CASSANDRA-9499: - It's worth noting, of course, that the existing scheme has a very narrow useful band around that unlikeliness threshold, after which it jumps up to 3 bytes, instead of 2. So the poorly behaved users are penalised more strongly. I kind of expect we'll have either everyone behaving well (so 1 byte in either scheme), or people behaving quite badly, and jumping up to 3 bytes. But either way, some boundary usecases win, some lose. It's always the way. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587944#comment-14587944 ] Benedict commented on CASSANDRA-9499: - bq. So I'd personally prefer multiple (simple) encodings optimized to certain assumption (obviously, within reason) With multiple encodings, I think it will get confusing API-wise, however I wasn't talking API, but implementation. It also incurs extra costs: the more encoding paths we take, the worse our instruction cache population becomes. Right now we are rarely IO constrained, but CPU constrained. I would much rather see our encoding schemes be as efficient CPU-wise as they can be, and lose the odd byte here or there. I'm pretty sure we expect columns (in a row), and the number of partitions in a result, to _typically_ be < 64, so the scheme I proposed would work just fine for the typical case. The other two numbers are encoded once per large result, so amortized cost is zero. As a result, it just shifts the "unlikeliness" boundary for bumping up into 2 bytes, and I'm not sure we should agonise over that. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587933#comment-14587933 ] Sylvain Lebresne commented on CASSANDRA-9499: - bq. with the current scheme this would mostly help values in the range of [128..250). I'm not sure if that's worth confusing everything for. I honestly don't see how that's confusing at all, having some {{writePositiveVint}} method on top of just {{writeVint}}, so I do happen to think this would be worth it. bq. we basically lose out a small amount for values in the range 64..128, and -256..-1. Everything else we gain. I typically see cases where that's not ideal. For instance, when serializing mutation/result sets, we'll write stuff like the number of columns contained, the number of partions, the number of rows in each partition, ... Those will most often be small positive numbers, and for those, being able to pack up to 250 in a single byte would be neat. bq. we can bias towards positive encodings, since they're more common We can and I'm not saying it's a bad idea per se, but for what it's worth I do prefer having us be explicit on our assumptions. More generally, we know when we use those methods what it is we write and I think in almost all case it's trivial to know if the number is positive or not, or even if it's very very likely small (< 100-200) or not. So I'd personally prefer multiple (simple) encodings optimized to certain assumption (obviously, within reason) over a single one that makes hidden assumptions that we'll forget most of the time. But with that said, if having multiple methods sounds awful for some reason, so be it. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587908#comment-14587908 ] Benedict commented on CASSANDRA-9499: - bq. I assumed that knowing a given integer can't be negative could lead to more efficient encoding Well, we could have two different encodings, but with the current scheme this would mostly help values in the range of [128..250). I'm not sure if that's worth confusing everything for. However if we change the encoding, we can bias towards positive encodings, since they're more common. I'm somewhat inclined to use a hybrid extending bits scheme. A starting suggestion: * first byte: 2bits of length; followed by, if any of the first bits are set, 1 sign bit; followed by, if all length bits are set, 2 more bits of length; the remainder (3-6 bits) are value bits * all remaining bytes contain value bits only This would lead to the following encoding sizes ||value range||suggested scheme size||existing scheme size|| |0..63|1|1| |64..8K|2|mostly 3, 64..128=1, 128..256=2| |-8192..0|2|mostly 3, -112..0=1, -256..-112=1| |8K..2M|3|mostly 4| etc. So, we basically lose out a small amount for values in the range 64..128, and -256..-1. Everything else we gain. If we wanted to further bias towards positive encoding, we could require that at least one sign bit is present for the signbit to be present, so that negative numbers cannot be encoded in less than 3 bytes. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587880#comment-14587880 ] Sylvain Lebresne commented on CASSANDRA-9499: - bq. Could you give an example of where this would be helpful? I assumed that knowing a given integer can't be negative could lead to more efficient encoding (either one that shoves more (positive) values in less bytes or one that is more efficient out of having less cases to consider). That said, I haven't really look at our vint encoding so if neither is true (or is negligible) then lets forget about it. But just to be clear, I'm not talking of dealing with unsigned ints/longs (hence my quotes on unsigned, it's the wrong word to use), just of potentially optimizing the encoding on the assumption the value is positive (as we have many case where we'll know it's the case). > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587843#comment-14587843 ] Benedict commented on CASSANDRA-9499: - bq. which means we still can change the encoding as it pleases us Perhaps we should then. That way the size indicator can be determined as (-120 - value), with the negative values giving [-6..0] and positive [0..6]. Then the {{vIntDecoding.vIntIsNegative}} method can vanish, and we can just use the signbit of the size, and can truncate the signbit for iteration. bq. Maybe it could also make sense to have an "unsigned" variant for when we know the value written is positive. Could you give an example of where this would be helpful? I'm guessing we'll be using this in places where we have either an int or long, in which case it will be decoded into something that is signed, and I can't see us using a long instead of an int, just for the extra bit (we may as well just use the whole long, if we're vint encoding it)? bq. And as a nit, I think it would be convenient in practice to have methods for int Agreed. We can simply provide default methods in the DataOutputStreamPlus and DataInputPlus, that just wrap the casts/calls to the long variants. If there are no other implementations, it's quite likely they will get inlined (although I don't have time to hunt through hotspot to confirm if inlining decisions occur before or after the vtable resolution of default methods, I hope they did it in a way that supported it) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587818#comment-14587818 ] Sylvain Lebresne commented on CASSANDRA-9499: - bq. It's a real shame we can't modify the encoding, or at least, it's not worth the effort. Do double-check cause I might be forgetting something but I believe we currently only use vint encoding for the (serializing) row cache, which means we still can change the encoding as it pleases us. And if doing so can make things better, I think we should (it'll be much harder later when we use this in sstables). Maybe it could also make sense to have an "unsigned" variant for when we know the value written is positive. And as a nit, I think it would be convenient in practice to have methods for {{int}} (at least as far as reading is concerned, though we'll want the writing counterpart for symmetry) for when you deserialize something that's going into a {{int}} anyway (for instance an array size). > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587785#comment-14587785 ] Benedict commented on CASSANDRA-9499: - One final nit: I'm not super keen on the assert in {{vIntDecoding.vIntIsNegative}}: it's possible the compiler will eliminate the unnecessary expressions, but {{assert (value < -120 || (value >= -112 && value < 0)) == value < -120 }} will fail in exactly the same situations as {{ assert value < -120 }} > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587776#comment-14587776 ] Benedict commented on CASSANDRA-9499: - bq. For the writer there is no such ugliness, so we should at least implement this approach there: we build the long value we want, put it without moving the position, and then shift the position forwards by only the populated length. It's worth pointing out this would be dangerous in the ByteOrder is flipped, so we should assert that the ByteOrder is LITTLE_ENDIAN _if we go with this approach_. Or we could remove the order() method for BufferedDataOutputPlus, and simply throw UnsupportedOperationException in writeVInt for SafeMemoryWriter (which is the only place that facility is used). > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587771#comment-14587771 ] Benedict commented on CASSANDRA-9499: - There would be no overconsumption: we would _read_ more bytes that we needed, but we would not move the buffer position until after. This method would work for both the writer and the reader, and I don't see where the problems you raise occur. The only ugliness would be, in the reader, _potentially_ extending limit() (in the rare case we don't have any real bytes present) and resetting it when done, but this would be a rare branch to take (although it could be done unconditionally, since it's cheap), i.e. set {{limit(position + 8)}} before; and set {{limit(previousLimit)}} after. For the writer there is no such ugliness, so we should at least implement this approach there: we build the long value we want, put it without moving the position, and then shift the position forwards by only the populated length. However, on the reader, since there _would_ be some ugliness, that may also hinder the performance benefit (it would depend on the value being large, whereas we probably expect it to be small, if we're using writevInt): we _know_ the value is at least two bytes long. So perhaps we should just call getShort() before entering the loop, in the expectation/hope that we often won't require the loop itself. We could even unroll the 6 iterations of the loop into a switch statement with fall-through, since this should cap the branches at 1: but it's probably not important to do so, given the method costs likely still dominate. We could, optionally, also have a fast route for when length == 8, since I would expect that typically we willl vint encode either very small values, or full long values. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587256#comment-14587256 ] Ariel Weisberg commented on CASSANDRA-9499: --- I did all the changes except for trying to make readVInt faster. It isn't so simple which is why I didn't do it. I can't leave any padding (fake reads), and I can't over consume (lost data). I would end up having to make things right after I have figured out how many bytes I have actually consumed or alternatively I would have to make a copy (which could easily be faster). This is much simpler, but if you think its necessary I'll do it. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587204#comment-14587204 ] Benedict commented on CASSANDRA-9499: - It's a real shame we can't modify the encoding, or at least, it's not worth the effort. The implementation of read, in particular, could have a much clearer and more efficient decoding of size if the negative value positions were inverted. While we're touching this code, it is worth cleaning up these methods a little: there's no point decoding the size of 1, and then deducting from it; we can just return immediately if firstByte}} >= MIN_BYTE_VALUE // == -112}}, and always decode the size to a value in the range [2..8] {vIntIsNegative}}: it has three conditional expressions, and only one of them is needed, the other two are always false (since we return those negative values up front; and we wouldn't want to invert them anyway, I would guess) and then {{vintDecodeSize can be made into just a ternary statement. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587084#comment-14587084 ] Benedict commented on CASSANDRA-9499: - Thanks. I know Cap'n Proto and our existing code uses a loop, but why not just use {{8 - Integer.numberOfLeadingZeros\(v\)/8}} ? The cyclic dependency between {{EncodedDIS}} and AbstractDIS is a bit confusing to me. I'd rather we simply marked {{EncodedDIS @Deprecated}}, and moved all of the implementation details somewhere that's acyclic. The read method we can make quite a bit more efficient, with a special version of {{prepareReadPrimitive}}: we want the result to be that {{length}} bytes are in the buffer for consumptions,but that we are also 8 bytes or more before the end of the buffer. This way we can just call {{buffer.getLong(buffer.position())}}, then advance its position by {{length}} and truncate the long with {{ & (-1L >>> (64 - (length * 8)) }} (where {{length}} here excludes the initial size byte) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564988#comment-14564988 ] Benedict commented on CASSANDRA-9499: - bq. because we have better control over which integer we actually encode This is essentially my goal, along with much better performance for the encoding/decoding In particular CASSANDRA-9498 seems to have been caused by a lack of access to these methods, so having them available in our standard toolkit to make conscious decisions about will likely lead to better, cleaner code, with less risk of these (admittedly rare to encounter) bugs, with a side order of improved performance. As discussed briefly on IRC, if there is a lack of time before 3.0 we can settle for the EncodedD(I/O)S, and just using write/readInt in CASSANDRA-9498. I completely agree we should eliminate EncodedD(I/O)S once we deliver this. TL;DR: Yep :) > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564854#comment-14564854 ] Sylvain Lebresne commented on CASSANDRA-9499: - For the records, the initial intent was to actually use the existing {{EncodedDataInputStream/EncodedDataOutputStream}} (but I haven't had time to wire it in 8099 yet), and that's an alternative option. That said, I do prefer the idea of adding explicit new methods for integer encoding (if only because we have better control over which integer we actually encode). But I believe this mean we'll also need some {{DataInputPlus}} interface with a {{readVInt}}. And if we do that, it'd be nice to replace the existing use of {{EncodedData*}} for the sake of not having multiple ways to do the same thing. > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562372#comment-14562372 ] Benedict commented on CASSANDRA-9499: - Ariel Weisberg: do you mind, and have the time to, take a look at this in conjunction with CASSANDRA-9500? > Introduce writeVInt method to DataOutputStreamPlus > -- > > Key: CASSANDRA-9499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9499 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Ariel Weisberg >Priority: Minor > Fix For: 3.0 beta 1 > > > CASSANDRA-8099 really could do with a writeVInt method, for both fixing > CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It > should be possible to make an especially efficient implementation against > BufferedDataOutputStreamPlus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)