[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-07-01 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611103#comment-14611103
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Rebased against trunk. Left out the OHC changes which I will do separately.

https://github.com/apache/cassandra/commit/6adb9e3ca49007b1be7e3eda2173281f3e3bfc8e
https://github.com/apache/cassandra/commit/5e9a7e4fa93a8452ccbec63cef918c8864c274d3


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599680#comment-14599680
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

If the current state works for you, I'll release that and do the refactor later.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599675#comment-14599675
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Should we plan on that refactor for 3.0, or should we take what you have (which 
works for me) and refactor OHC later? I just want to nail down what will go 
into 3.0.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599672#comment-14599672
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

Then I'd like to refactor OHC's public API to make ByteBuffers first-class 
citizen and plumb methods using streams on that. Otherwise ("just" extending 
the current API) would make it look clumsy. But that's stuff for a different 
ticket.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599549#comment-14599549
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. I won't object to getting to fix them 

If they're existing failures, don't worry. It just looked like they were 
specific to the branches including -madness, as none of the other branches of 
yours had so many failures, but both branches with -madness did.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599534#comment-14599534
 ] 

Benedict commented on CASSANDRA-9499:
-

Obviously this is out-of-scope, but the only real problem a stream solves is 
ensuring there is always enough room. That and being easily passed to JDK 
tools, and supporting functionality like vint encoding. However we could 
certainly explore a "lightweight stream" API that exposes the BB and just has 
ensureRemaining() exposed. vint coding can be done via static method calls. 
*if* it's workable, I would be strongly in favour of this, as right now the 
method invocation costs for writing/reading streams are really significant. It 
isn't a small undertaking, though. But nor is it that huge. 

It is definitely something we should explore, to see how viable it is. In a 
follow-up ticket, of course :)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599512#comment-14599512
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

That works for me.

Thinking aloud a lot of the pain we face is from using streams which pretty 
much inevitably are wrappers around ByteBuffers instead of having an ecosystem 
around ByteBuffers. I think throughout C* we end up with the worst of both 
worlds. OHC doesn't need to do that so it could define it's key and value 
serializers to accept ByteBuffers?

You can go from a ByteBuffer to a stream pretty easily, but not the other way 
around.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-24 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599150#comment-14599150
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

[~aweisberg] OHC now supports putIfAbsentDirect and addOrReplaceDirect. It's 
not released yet - can you take a look at the [relevant 
commit|https://github.com/snazy/ohc/commit/e18330130a859735d105d112d57ab718aca18697]
 in {{develop}} branch whether this works for you?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598622#comment-14598622
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

As far as I can tell the dtests are doing a similar thing on trunk 
http://cassci.datastax.com/view/trunk/job/trunk_dtest/ I won't object to 
getting to fix them (it's where I started anyways). It's probably going to be a 
while though.





> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598589#comment-14598589
 ] 

Benedict commented on CASSANDRA-9499:
-

Looks like there _may_ be some dtest failures introduced by the -madness branch?

First commit LGTM, and at a high level the madness branch looks OK too. The OHC 
stuff can wait to improve until we have a replace implementation, since that's 
where the real serialization work happens with us.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598357#comment-14598357
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I pushed a squashed version of both branches. C-9499 passed tests. 
C-9499-madness needed an additional commit/bug fix for the row cache and the 
tests still need to run.

https://github.com/apache/cassandra/commit/22af3c73adf14d42c60b596b0b275d60198dd097
https://github.com/apache/cassandra/commit/1dcf4066f31265eecdc3919a875272436dfea05a

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598108#comment-14598108
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

If your busy let me know and I can do it.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598102#comment-14598102
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

I'll look at OHC regarding new direct replace + putIfAbsent methods and check 
test coverage soon.
See https://github.com/snazy/ohc/issues/11 + 
https://github.com/snazy/ohc/issues/12

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597939#comment-14597939
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Unless we store the buffer in a container with a nullable field it will entail 
hitting the thread-local twice. Maybe if assertions are enabled?

Regarding the OHC changes. I can make get and put direct, but I can't make 
replace, or putIfAbsent direct since OHC doesn't have methods for that. The 
value serializer will always contain the adapters because one has to be 
provided for construction, but they won't be used if the direct access method 
is used. I can create a separate ticket to add those to OHC.

Is fixing up the rest of the access methods for OHC something I should do 
immediately?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597877#comment-14597877
 ] 

Aleksey Yeschenko commented on CASSANDRA-9499:
--

bq, Also are we doing this as two or one commits?

Go with two.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597874#comment-14597874
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. That doesn't make the tempBuffer in DataOutputPlus any safer. If a 
DataOutputPlus ends up wrapping another DataOutputPlus it could go poorly.

Well, that's kind of out-of-scope for this ticket since it's already there as a 
risk, and this has already scope-creeped. But yes, I agree - making that a 
Queue/List of temporary buffers *could* be sensible (or just nulling each time 
we extract, and failing if we double-use).

bq. Also are we doing this as two or one commits?

I've no opinion on that...

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597846#comment-14597846
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

That doesn't make the tempBuffer in DataOutputPlus any safer. If a 
DataOutputPlus ends up wrapping another DataOutputPlus it could go poorly.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597838#comment-14597838
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. You can get github (and git can probably do this as well)...

When I'm committing on behalf of somebody else (and I thought we were nits 
away), I expect a fully-formed commit with CHANGES.txt wired up, commit message 
provided, etc.

bq. Once for varint and once when wrapping a bytebuffer

VIntCoding wouldn't really be able to access the buffer internal to DOSP, so my 
expectation was (since we know how big this buffer needs to be, which is not 
very big at all) that we would introduce a {{tempBuffer}} in VIntCoding, and 
introduce an {{\@Inline}} annotated static method that takes the size and 
encodes into the {{tempBuffer}}, returning the byte array which callers must 
discard immediately after using. This is guaranteed not to be used more than 
once simultaneously, and can be invoked from both locations.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597794#comment-14597794
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I'm confused. Are we doing this as one commit or two? Previously I thought that 
adding varint encoding was going to be separate from removing Encoded streams.

You can get github (and git can probably do this as well) to provide you with a 
squashed commit without destroying the commit history on the development 
branch. 
https://github.com/apache/cassandra/compare/trunk...aweisberg:C-9499.diff

Using the temporary buffer isn't safe. You can construct scenarios where the 
buffer is retrieved and used twice. Once for varint and once when wrapping a 
bytebuffer at which point ensureRemaining() could clobber it. It's a dubious 
pattern in the first place. The only way to make it safe IMO is to have a list 
of buffers and return them when done.

I could fix it by calling ensure remaining before doing the encoding in BDOSS 
but that still leaves me thinking it's data corruption waiting to happen.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597557#comment-14597557
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

It's a minor detail, but since TypeSizes only has {{NATIVE}} now, it would felt 
cleaner to me to actually ditch the instance and make all methods static, which 
would save from passing the same object to every {{serializedSize}} method. 
Note that I don't mind if you prefer doing that as a separate ticket, but since 
it's a pretty simple and automatic change and a consequence of this, figure 
it's as simple to get that done here. Let's call it a suggestion :)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597477#comment-14597477
 ] 

Benedict commented on CASSANDRA-9499:
-

Thanks [~snazy]

[~aweisberg]: fancy tweaking the modified code to use that instead of the new 
Forwarding classes? 

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597470#comment-14597470
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

Heh - it's already in OHC. Just use the {{OHCache.getDirect(key).buffer()}} 
resp. {{OHCache.putDirect(key,valueLen).buffer()}} methods.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597432#comment-14597432
 ] 

Benedict commented on CASSANDRA-9499:
-

[~aweisberg]: i've pushed a small further tweak 
[here|https://github.com/belliottsmith/cassandra/tree/C-9499-madness] to 
simplify the readMinimum logic, and reduce the number of operations (and size 
of code) on the normal path.

I'm also ambivalent about not using the VInt methods in the switch for 
SerializingCache, but it's not super-important (since obviously the inner 
serialization will not until 8099, but 8099 is unlikely to touch this 
particular place)

Otherwise, LGTM. If you can squash and otherwise prepare for commit, we can get 
it into mainline.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597435#comment-14597435
 ] 

Benedict commented on CASSANDRA-9499:
-

[~snazy]: While reviewing this, I noticed that OHC provides a 
DataInput/DataOutput, which results in pretty inefficient serialization between 
the two (especially since we wrap, and do vint encoding). Could we update its 
interface to provide a ByteBuffer instead?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-23 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597441#comment-14597441
 ] 

Robert Stupp commented on CASSANDRA-9499:
-

[~benedict] should be possible (BB wasn't necessary before)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-22 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596343#comment-14596343
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I am +1 on Benedict's changes. I merged everything in and rebased C-9499 and 
C-9499-madness and both are passing.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593527#comment-14593527
 ] 

Benedict commented on CASSANDRA-9499:
-

OK, so I just pushed a change which reverses the byte order of the vint to big 
endian. This permits removing quite a few LOC, and I reckon it's now closer to 
optimal. Take a look and tell me what you think.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593500#comment-14593500
 ] 

Benedict commented on CASSANDRA-9499:
-

and, just riffing here: it looks to me like it might be better to 
{{prepareReadPrimitive(1)}} and grab our {{firstByte}} before we do anything 
else, since that permits us to short-circuit more quickly.

It would also be nice to simply generalise {{readMinimum}}, to choose when it 
throws EOF. We don't buy much from grabbing the limit inline, I don't think.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593474#comment-14593474
 ] 

Benedict commented on CASSANDRA-9499:
-

Looking good. I've pushed some suggestions to the bit fiddling, for both 
clarity and performance 
[here|https://github.com/belliottsmith/cassandra/tree/C-9499-suggestions]. 
Including efficient size computation, some extraneous operation removal, and 
what seemed to me clearer names to methods (please feel free to dispute). I had 
to strip out the debugging statements to figure out what was going on, hope 
that isn't a problem for you.

It's still more computationally involved than I would _like_, and I feel there 
may be some more efficiencies. So I will mull it over a little more, but wanted 
to upload what I had so we didn't tread too much on each other.

Some further suggestions I did not implement: 

* I'm not sure why we need the try/finally block, and I'd prefer we removed it 
- there's nothing that should be able to throw an exception between our setting 
and unsetting the limit.
* The extra reading in NIODataInputStream, I would prefer we extract to a 
separate method, and force it to not be inlined. Since it should be a rare 
occurrence that we need to do this, and we incur more expensive costs than this 
inner method invocation when this does happen, I would prefer to keep the rest 
of the method as tightly packed for icache behaviour as possible

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592307#comment-14592307
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I have the proposed encoding implemented and have the beginnings of an 
efficient implementation for BufferedDataOutputStreamPlus and NIOInputStream. 
There is probably still branches or that could be removed as well as making the 
bit fiddling more efficient and clearer. I am going to try and clean it up, but 
I could use feedback.

I couldn't come up with an efficient implementation for computeUnsignedVIntSize 
(when you haven't encoded it yet).
 
I also created the branch 
https://github.com/aweisberg/cassandra/commits/C-9499-madness where I removed 
Encoded*Stream and changed the serializers to use DataInputPlus which extends 
DataInput to add the varint decoding methods. I haven't rebased that on top of 
C-9499 yet.

I am using 1s for the extension bits. I am also emitting the bytes in little 
endian order although it seems like I would need to do that at least for the 
first byte. I could emit the rest of the bytes in big endian order for the 
getLong(). Right now I have to reverse them because the ByteBuffer is big 
endian. l am wagering it is faster than swapping the settings.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589935#comment-14589935
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. Doesn't shifting the continuation bits to the first byte penalize the range 
of 1-byte values? 

No. You have to have a single continuation "bit" for either encoding. They're 
literally equivalent, just with their bit positions swapped around, except in 
the single byte case, in which case they are exactly identical.

Run-length encoding means you count the number of set (or unset - this would 
actually be cleaner) contiguously at the top; the first that isn't (or is) 
tells you how many more bytes you need to read. i.e. if the first bit is unset, 
you're done, and the remaining 7 bits are value. If all 8 bits are set, we need 
to read a full long.

This encoding gives us pretty ideal behaviour, of cheap operation over the 
single byte representation, followed by consistent behaviour across all the 
remaining possible values. It also lets us easily avoid wasting a whole byte 
for the final bit of a long.

This is equivalent to Aleksey's characterisation.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-17 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589920#comment-14589920
 ] 

Aleksey Yeschenko commented on CASSANDRA-9499:
--

I was thinking that we'd just implement what UTF8 does, except without 
self-synchronization semantics, which we don't really care about (so using the 
full range of the subsequent bytes, instead of using up their first 2 bits  
with the '10' marker).

They way you tell the difference between length bits and value bits in the 
first byte is by separating the length bits and value bits by a mandatory 0 bit.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589904#comment-14589904
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Doesn't shifting the continuation bits to the first byte penalize the range of 
1-byte values? Supposedly that is the most common and so having 1 or 2 byte 
values with as much range as possible is a big advantage. Comments in the 
Google code make it seem like it's all about 1-byte values and that is the 
thing to optimize for.

In practice if you only have to process one or two bytes the current 
implementation won't have to loop that many times.

I don't understand how you can use a variable number of top order bits in the 
first bytes to indicate the number of bytes. How can you tell the difference 
between length bytes and value bytes? The way the existing code that doesn't 
use length extension works (and is simple) is that it drops the remaining bits 
of the first byte. That is something we want to get away from I thought. 


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470
 ] 

Benedict commented on CASSANDRA-9499:
-

I'm confused as to why we need 10 bytes? Pretty much by definition a 
continuation bit encoding needs 9 bytes to represent 8 bytes. Let's not use 
Google's implementation. It looks pretty rubbish. 

The reason they use 10 bytes is they cannot be bothered to realise the last 
byte does not need a continuation bit. They also use a *terrible* 
implementation for deciding how long it needs to be.

Here's  a simple change which makes it 9 bytes, and easily optimised: the 
continuation bits are all shifted to the first byte, which effectively encodes 
the length in run-length encoding, i.e. the number of contiguous top order bits 
that are set to 1. i.e. {{length = Integer.numberOfLeadingZeros(firstByte ^ 
(byte) -1)}}

This way we read the first byte, and if there are any more to read, we read a 
long, and quickly truncate.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589450#comment-14589450
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

bq. breaking compatibility with persisted caches?

We don't persist cache contents, we only persist the keys that are cached and 
reload them on startup, so this is a non-issue.

More generally, I think we should just drop 
{{EncodedDataOutput}}/{{EncodedDataInput}} and just make sure we don't forget 
to update the serialization code from 8099 to use the new methods introduced 
here (which shouldn't take very long).

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589026#comment-14589026
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I pushed an implementation based on the protocol buffers varint scheme. If you 
have ideas for what an efficient implementation would look like let me know.

On the read side it would be faster if I did the 10-bytes with padding thing. 
Maybe then copy to a byte array to avoid pulling a byte out of ByteBuffer? 
Using a long seems tricky since it can be up to 10 bytes.

Could do the same thing on the serialization side so that we only go into 
ByteBuffer. I will have to get the size of the encoded integer to figure out 
how much space I will need first.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588924#comment-14588924
 ] 

Benedict commented on CASSANDRA-9499:
-

I'm totally cool with letting caches not survive a major version upgrade, but 
I'm also not too fussed. We could deprecate them entirely, since we will use 
writeVInt wherever helpful in the normal serialization calls that they are 
backed by.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588857#comment-14588857
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Am I changing the implementation used by EncodedData{Output | Input} as well 
and breaking compatibility with persisted caches?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588787#comment-14588787
 ] 

Benedict commented on CASSANDRA-9499:
-

I don't mind using sign extension, since that's what I assumed we used in the 
first place (and sounds like it's what Sylvain assumed we were using too). We 
could also very easily perform zigzag encoding in a method that _depends_ on 
{{writePosVInt}}, so no duplication (although we probably won't prevent the VM 
inlining it, which it will).

It also has the benefit that implementation is very succinct, so less generated 
code. Although I don't think there's a way to write a branchless read, although 
we could probably still use getLong(), and efficiently truncate. We would need 
the option for a ninth byte, though, and would need to munge the resulting long 
quite a bit, so it would just complicate things significantly, probably to 
little benefit.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588727#comment-14588727
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I don't think the method we currently use is analogous to what protocol buffers 
is doing.

https://developers.google.com/protocol-buffers/docs/encoding#varints

The first description is of a method is just using the last bit of each byte as 
a continuation bit. We are doing a length prefix and not extension.

https://github.com/google/protobuf/blob/master/java/src/main/java/com/google/protobuf/CodedOutputStream.java#L1213

The encoding on the wire is unsigned and an additional transform (zigzag) is 
applied for signed numbers.

This sounds like it is closer to what everyone is looking for. Want a bias 
towards positive numbers? We could do unsigned with a small amount of 
additional code.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588647#comment-14588647
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I got the readVInt optimization where there is no loop working.

What are we looking to accomplish with a different encoding?

* Bias towards positive numbers
* More efficient length extension (currently the first byte is dropped) for > 1 
byte values
* Simpler/faster implementation

I just bent over backwards to remove branching, loops, instructions etc. If we 
change encodings are we going to lose that ground? Is what we would get from 
the first two attributes more important?

Do we want to do just one, say more efficient length extension?


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588343#comment-14588343
 ] 

Benedict commented on CASSANDRA-9499:
-

When I said little endian, it was a typo. Typically we use big endian, so we 
want any method to assume this. I hadn't noticed that they chose to implement 
it was, effectively, little endian encoding. Which is another reason to switch: 
it's confusing to swap between different types of encoding, and breaks 
assumptions. However, if we wanted to implement this optimisation for the 
current encoding scheme, we would just modify if to a shift left instead of an 
&, followed by a Long.reverseBytes() (nb: admittedly without thinking about it 
too rigorously)

That said, it is a shame we didn't standardise on little endian, given our main 
target platform is little endian. I know Java defaults to big endian, but the 
target platform is more important. That's a much scarier change than modifying 
this, though, so let's not go there. Probably ever. 

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588326#comment-14588326
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I am getting the wrong answer for Integer.MIN_VALUE trying to mask out the long 
that has the 8-byte value. I get -2357198848.

I replaced 
{noformat}
long i = 0;
for (int idx = 0; idx < len; idx++)
{
byte b = buf.get();
i = i << 8;
i = i | (b & 0xFF);
}
{noformat}

with

{noformat}
long i = buf.getLong(buf.position());
i &= (-1L >>> (64 - (len * 8)));
{noformat}

I guess this comes down to the endianess issue. Right now NIODataInputStream 
doesn't handle little endian anything. The buffer is private and there is no 
constructor for providing one nor a method for changing the order of the 
underlying buffer. Is this a dead end, pick a different implementation 
depending on byte order?



> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588212#comment-14588212
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

Another question, how do we know all the ByteBuffers are little endian? Is that 
the case right now? I that the DataOutput/Input are big endian?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588210#comment-14588210
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. Are we going to shoot for a different encoding?

I think so, but it's still up for discussion.

bq. There is no guarantee I can get extra space in the buffer for the writer to 
perform a putLong. Some implementations will not resize the buffer.

Ah. CommitLog. Yes, that is annoying, and pretty much nothing can be done about 
it, other than padding the space we allocate for the CL record (which wouldn't 
actually be dreadful). OK, nevermind. Forget both optimisations (the read is a 
little ugly, and if we mostly expect small values, it's of dubious benefit). We 
can revisit at a later date.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588175#comment-14588175
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

There is no guarantee I can get extra space in the buffer for the writer to 
perform a putLong. Some implementations will not resize the buffer. If I ask 
for the extra space it could throw an exception.

For the reader I could write enough code to make it happen. I'll give it a shot 
now.

Are we going to shoot for a different encoding?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588026#comment-14588026
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. I was, in fact, more thinking of the simpler scheme of using one bit as a 
"continuation bit"

I'm not fixated on the exact specifics of the scheme I suggested, I was mostly 
suggesting a switch to a continutationbit*s* scheme (with these bits encoded 
upfront for simplicity), as well as being strongly in favour of one encoding 
scheme.

We could, for instance, have a single continuation bit, that is expanded to, 
say, 3 bits + sign if it's set, so that we represent [0..127] in 1 byte, 
[-2048..2047] in 2 bytes, [-512K..512K] in 3 bytes etc.; or we could have two 
initial continuation bits, permitting up to 16K in 2 bytes. I would just prefer 
we settle on what we consider to be approximately the most generally useful 
scheme, and stick with that. 

I should note I'm not saying that it definitely would not help to have a 
proliferation of methods, but IMO the _likelihood_ of payoff is too low to 
spend time on (we have lots of things we know will payoff). I think it's 
somewhat more likely to harm than help, since a majority of values to be 
encoded will likely see the same size under the general scheme as they would 
under their more specific scheme, and there are negative costs associated with 
a proliferation of method calls. I'm hoping _in general_ on the project we can 
steadily prune the number of methods a typical workloads entails visiting.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587997#comment-14587997
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

bq. the existing scheme (including a positive-only variant supporting 250 
values)

Again, I haven't really looked at the existing scheme so I really wasn't fixed 
on it.  I was, in fact, more thinking of the simpler scheme of using one bit as 
a "continuation bit", which does only support 127 values in a single byte but 
degrades to 2 bytes properly. Anyway, doesn't matter.

bq. the worse our instruction cache population becomes

I'm unfortunately not smart enough to properly estimate how that kind of effect 
would play against other considerations (like saving bytes sometimes) on 
average in the grand scheme of everything. My point was merely to suggest that 
expressing our assumptions through different API calls that would be optimized 
based on these assumption could be an idea worth exploring, but I don't want to 
push it any more than that. I also don't have strongly objections to the scheme 
you're suggesting.


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587967#comment-14587967
 ] 

Benedict commented on CASSANDRA-9499:
-

It's worth noting, of course, that the existing scheme has a very narrow useful 
band around that unlikeliness threshold, after which it jumps up to 3 bytes, 
instead of 2. So the poorly behaved users are penalised more strongly. I kind 
of expect we'll have either everyone behaving well (so 1 byte in either 
scheme), or people behaving quite badly, and jumping up to 3 bytes. But either 
way, some boundary usecases win, some lose. It's always the way.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587944#comment-14587944
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. So I'd personally prefer multiple (simple) encodings optimized to certain 
assumption (obviously, within reason)

With multiple encodings, I think it will get confusing API-wise, however I 
wasn't talking API, but implementation. It also incurs extra costs: the more 
encoding paths we take, the worse our instruction cache population becomes. 
Right now we are rarely IO constrained, but CPU constrained. I would much 
rather see our encoding schemes be as efficient CPU-wise as they can be, and 
lose the odd byte here or there.

I'm pretty sure we expect columns (in a row), and the number of partitions in a 
result, to _typically_ be < 64, so the scheme I proposed would work just fine 
for the typical case. The other two numbers are encoded once per large result, 
so amortized cost is zero. As a result, it just shifts the "unlikeliness" 
boundary for bumping up into 2 bytes, and I'm not sure we should agonise over 
that. 

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587933#comment-14587933
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

bq. with the current scheme this would mostly help values in the range of 
[128..250). I'm not sure if that's worth confusing everything for.

I honestly don't see how that's confusing at all, having some 
{{writePositiveVint}} method on top of just {{writeVint}}, so I do happen to 
think this would be worth it.

bq. we basically lose out a small amount for values in the range 64..128, and 
-256..-1. Everything else we gain.

I typically see cases where that's not ideal. For instance, when serializing 
mutation/result sets, we'll write stuff like the number of columns contained, 
the number of partions, the number of rows in each partition, ... Those will 
most often be small positive numbers, and for those, being able to pack up to 
250 in a single byte would be neat.

bq. we can bias towards positive encodings, since they're more common

We can and I'm not saying it's a bad idea per se, but for what it's worth I do 
prefer having us be explicit on our assumptions. More generally, we know when 
we use those methods what it is we write and I think in almost all case it's 
trivial to know if the number is positive or not, or even if it's very very 
likely small (< 100-200) or not. So I'd personally prefer multiple (simple) 
encodings optimized to certain assumption (obviously, within reason) over a 
single one that makes hidden assumptions that we'll forget most of the time.

But with that said, if having multiple methods sounds awful for some reason, so 
be it.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587908#comment-14587908
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. I assumed that knowing a given integer can't be negative could lead to more 
efficient encoding

Well, we could have two different encodings, but with the current scheme this 
would mostly help values in the range of [128..250). I'm not sure if that's 
worth confusing everything for.

However if we change the encoding, we can bias towards positive encodings, 
since they're more common. I'm somewhat inclined to use a hybrid extending bits 
scheme. A starting suggestion:

* first byte: 2bits of length; followed by, if any of the first bits are set, 1 
sign bit; followed by, if all length bits are set, 2 more bits of length; the 
remainder (3-6 bits) are value bits
* all remaining bytes contain value bits only

This would lead to the following encoding sizes
||value range||suggested scheme size||existing scheme size||
|0..63|1|1|
|64..8K|2|mostly 3, 64..128=1, 128..256=2|
|-8192..0|2|mostly 3, -112..0=1, -256..-112=1|
|8K..2M|3|mostly 4|

etc.

So, we basically lose out a small amount for values in the range 64..128, and 
-256..-1. Everything else we gain. If we wanted to further bias towards 
positive encoding, we could require that at least one sign bit is present for 
the signbit to be present, so that negative numbers cannot be encoded in less 
than 3 bytes.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587880#comment-14587880
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

bq. Could you give an example of where this would be helpful?

I assumed that knowing a given integer can't be negative could lead to more 
efficient encoding (either one that shoves more (positive) values in less bytes 
or one that is more efficient out of having less cases to consider). That said, 
I haven't really look at our vint encoding so if neither is true (or is 
negligible) then lets forget about it. But just to be clear, I'm not talking of 
dealing with unsigned ints/longs (hence my quotes on unsigned, it's the wrong 
word to use), just of potentially optimizing the encoding on the assumption the 
value is positive (as we have many case where we'll know it's the case).


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587843#comment-14587843
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. which means we still can change the encoding as it pleases us

Perhaps we should then. That way the size indicator can be determined as (-120 
- value), with the negative values giving [-6..0] and positive [0..6]. Then the 
{{vIntDecoding.vIntIsNegative}} method can vanish, and we can just use the 
signbit of the size, and can truncate the signbit for iteration.

bq. Maybe it could also make sense to have an "unsigned" variant for when we 
know the value written is positive.

Could you give an example of where this would be helpful? I'm guessing we'll be 
using this in places where we have either an int or long, in which case it will 
be decoded into something that is signed, and I can't see us using a long 
instead of an int, just for the extra bit (we may as well just use the whole 
long, if we're vint encoding it)?

bq. And as a nit, I think it would be convenient in practice to have methods 
for int 

Agreed. We can simply provide default methods in the DataOutputStreamPlus and 
DataInputPlus, that just wrap the casts/calls to the long variants. If there 
are no other implementations, it's quite likely they will get inlined  
(although I don't have time to hunt through hotspot to confirm if inlining 
decisions occur before or after the vtable resolution of default methods, I 
hope they did it in a way that supported it)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587818#comment-14587818
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

bq. It's a real shame we can't modify the encoding, or at least, it's not worth 
the effort.

Do double-check cause I might be forgetting something but I believe we 
currently only use vint encoding for the (serializing) row cache, which means 
we still can change the encoding as it pleases us. And if doing so can make 
things better, I think we should (it'll be much harder later when we use this 
in sstables).

Maybe it could also make sense to have an "unsigned" variant for when we know 
the value written is positive.

And as a nit, I think it would be convenient in practice to have methods for 
{{int}} (at least as far as reading is concerned, though we'll want the writing 
counterpart for symmetry) for when you deserialize something that's going into 
a {{int}} anyway (for instance an array size).

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587785#comment-14587785
 ] 

Benedict commented on CASSANDRA-9499:
-

One final nit: I'm not super keen on the assert in 
{{vIntDecoding.vIntIsNegative}}: it's possible the compiler will eliminate the 
unnecessary expressions, but

{{assert (value < -120 || (value >= -112 && value < 0)) == value < -120 }}

will fail in exactly the same situations as

{{ assert value < -120 }}

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587776#comment-14587776
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. For the writer there is no such ugliness, so we should at least implement 
this approach there: we build the long value we want, put it without moving the 
position, and then shift the position forwards by only the populated length.

It's worth pointing out this would be dangerous in the ByteOrder is flipped, so 
we should assert that the ByteOrder is LITTLE_ENDIAN _if we go with this 
approach_. Or we could remove the order() method for BufferedDataOutputPlus, 
and simply throw UnsupportedOperationException in writeVInt for 
SafeMemoryWriter (which is the only place that facility is used).

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587771#comment-14587771
 ] 

Benedict commented on CASSANDRA-9499:
-

There would be no overconsumption: we would _read_ more bytes that we needed, 
but we would not move the buffer position until after. This method would work 
for both the writer and the reader, and I don't see where the problems you 
raise occur. The only ugliness would be, in the reader, _potentially_ extending 
limit() (in the rare case we don't have any real bytes present) and resetting 
it when done, but this would be a rare branch to take (although it could be 
done unconditionally, since it's cheap), i.e. set {{limit(position + 8)}} 
before; and set {{limit(previousLimit)}} after.

For the writer there is no such ugliness, so we should at least implement this 
approach there: we build the long value we want, put it without moving the 
position, and then shift the position forwards by only the populated length.

However, on the reader, since there _would_ be some ugliness, that may also 
hinder the performance benefit (it would depend on the value being large, 
whereas we probably expect it to be small, if we're using writevInt): we _know_ 
the value is at least two bytes long. So perhaps we should just call getShort() 
before entering the loop, in the expectation/hope that we often won't require 
the loop itself. We could even unroll the 6 iterations of the loop into a 
switch statement with fall-through, since this should cap the branches at 1: 
but it's probably not important to do so, given the method costs likely still 
dominate. 

We could, optionally, also have a fast route for when length == 8, since I 
would expect that typically we willl vint encode either very small values, or 
full long values.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587256#comment-14587256
 ] 

Ariel Weisberg commented on CASSANDRA-9499:
---

I did all the changes except for trying to make readVInt faster. 

It isn't so simple which is why I didn't do it. I can't leave any padding (fake 
reads), and I can't over consume (lost data). I would end up having to make 
things right after I have figured out how many bytes I have actually consumed 
or alternatively I would have to make a copy (which could easily be faster). 
This is much simpler, but if you think its necessary I'll do it.




> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587204#comment-14587204
 ] 

Benedict commented on CASSANDRA-9499:
-

It's a real shame we can't modify the encoding, or at least, it's not worth the 
effort.

The implementation of read, in particular, could have a much clearer and more 
efficient decoding of size if the negative value positions were inverted.

While we're touching this code, it is worth cleaning up these methods a little: 
there's no point decoding the size of 1, and then deducting from it; we can 
just return immediately if firstByte}} >= MIN_BYTE_VALUE // == -112}}, and 
always decode the size to a value in the range [2..8]

{vIntIsNegative}}: it has three conditional expressions, and only one of them 
is needed, the other two are always false (since we return those negative 
values up front; and we wouldn't want to invert them anyway, I would guess)

and then {{vintDecodeSize can be made into just a ternary statement.

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587084#comment-14587084
 ] 

Benedict commented on CASSANDRA-9499:
-

Thanks.

I know Cap'n Proto and our existing code uses a loop, but why not just use {{8 
- Integer.numberOfLeadingZeros\(v\)/8}} ?

The cyclic dependency between {{EncodedDIS}} and AbstractDIS is a bit confusing 
to me. I'd rather we simply marked {{EncodedDIS @Deprecated}}, and moved all of 
the implementation details somewhere that's acyclic.

The read method we can make quite a bit more efficient, with a special version 
of {{prepareReadPrimitive}}: we want the result to be that {{length}} bytes are 
in the buffer for consumptions,but that we are also 8 bytes or more before the 
end of the buffer. This way we can just call 
{{buffer.getLong(buffer.position())}}, then advance its position by {{length}} 
and truncate the long with {{ & (-1L >>> (64 - (length * 8)) }} (where 
{{length}} here excludes the initial size byte)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-05-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564988#comment-14564988
 ] 

Benedict commented on CASSANDRA-9499:
-

bq. because we have better control over which integer we actually encode

This is essentially my goal, along with much better performance for the 
encoding/decoding  In particular CASSANDRA-9498 seems to have been caused by a 
lack of access to these methods, so having them available in our standard 
toolkit to make conscious decisions about will likely lead to better, cleaner 
code, with less risk of these (admittedly rare to encounter) bugs, with a side 
order of improved performance.

As discussed briefly on IRC, if there is a lack of time before 3.0 we can 
settle for the EncodedD(I/O)S, and just using write/readInt in CASSANDRA-9498.

I completely agree we should eliminate EncodedD(I/O)S once we deliver this.

TL;DR: Yep :)

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-05-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564854#comment-14564854
 ] 

Sylvain Lebresne commented on CASSANDRA-9499:
-

For the records, the initial intent was to actually use the existing 
{{EncodedDataInputStream/EncodedDataOutputStream}} (but I haven't had time to 
wire it in 8099 yet), and that's an alternative option.

That said, I do prefer the idea of adding explicit new methods for integer 
encoding (if only because we have better control over which integer we actually 
encode). But I believe this mean we'll also need some {{DataInputPlus}} 
interface with a {{readVInt}}. And if we do that, it'd be nice to replace the 
existing use of {{EncodedData*}} for the sake of not having multiple ways to do 
the same thing.


> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus

2015-05-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562372#comment-14562372
 ] 

Benedict commented on CASSANDRA-9499:
-

Ariel Weisberg: do you mind, and have the time to, take a look at this in 
conjunction with CASSANDRA-9500?

> Introduce writeVInt method to DataOutputStreamPlus
> --
>
> Key: CASSANDRA-9499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)