[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-05-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986467#comment-13986467
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

bq.  introduced a large number of errors in the auth_test dtest (though fyi I 
still have one auth test failure on my box, for grant_revoke_cleanup_test, but 
that looks unrelated to this issue).

Yes, my bad, that was a small oversight (which was not auth related per-se so 
it might have broken a few other tests too btw). Anyway, I ninja-fixed it in 
commit 233761e.

And as you said, I did commited that but forgotten to close the issue, so doing 
it now. Thanks.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2

 Attachments: auth_test_dtest-2.1~git9872b74.node1.log, 
 auth_test_dtest-2.1~git9872b74.txt


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-05-01 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986621#comment-13986621
 ] 

Michael Shuler commented on CASSANDRA-6855:
---

There were a couple other tests that SpecificOptions.DEFAULT,3 indeed fixed. 
Thanks!

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2

 Attachments: auth_test_dtest-2.1~git9872b74.node1.log, 
 auth_test_dtest-2.1~git9872b74.txt


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright side, 
 the additions of this V3 are comparatively much more simple to implement for 
 a client that those of V2 (in fact, for clients that want to support UDT, it 
 will probably require less effort to add the changes for this new version 
 than to try to support UDT without it), so I do think we make good progress 
 on getting the protocol stabilized 



--
This message was sent by 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-30 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985792#comment-13985792
 ] 

Tyler Hobbs commented on CASSANDRA-6855:


+1

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright side, 
 the additions of this V3 are comparatively much more simple to implement for 
 a client that those of V2 (in fact, for clients that want to support UDT, it 
 will probably require less effort to add the changes for this new version 
 than to try to support UDT without it), so I do think we make good progress 
 on getting the protocol stabilized 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-29 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984564#comment-13984564
 ] 

Tyler Hobbs commented on CASSANDRA-6855:


bq. I don't think there is a point in allowing them. I did added that 
restriction to the protocol and so the code throw a ProtocolException (rather 
than an IRE); I think it makes more sense to force clients to validate on there 
side.

Sounds good.

bq. I did added some tests. I'm sure we could improve on those somewhat, and 
possibly extend to other stuffs, but honestly I think further efforts would be 
better spent (in the sense that we'd get a much better and realistic testing of 
this) adding support for the new protocol version to existing drivers and check 
that the change didn't break previous versions (and that's where I intent to 
spend my efforts).

I agree that we can get better tests using the drivers, there are just a few 
problems:
# The native protocol drivers aren't integrated into our workflow (no automated 
testing)
# The driver tests aren't specifically trying to exercise these parts of the 
Cassandra code, so they could easily miss things
# A driver has to be immediately updated (perhaps in some branch) to test the 
changes
# It's typically much slower to track down a bug with a dtest than with a unit 
test

Switching the dtests to use a native protocol driver would help to address #1 
and #2, but not #3 (which could get complicated).  In the long run, a balance 
of unit tests and functional tests is needed.  When we can easily use unit 
tests to exhaustively test something like message serialization, we should do 
so to make sure edge cases are handled properly.  With those in place, dtests 
don't usually need to be as thorough or convoluted.

With that said, the tests you added are probably fine for now.  Can you 
merge/rebase against the latest 2.1?  CASSANDRA-7068 seems to be breaking my 
tests of this.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984609#comment-13984609
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

bq.  Can you merge/rebase against the latest 2.1?

Sure, force-pushed that on the existing branch.

bq. In the long run, a balance of unit tests and functional tests is needed

For what it's worth, I was not really making any generic statement on unit 
testing, but merely acknowledging the lack of thoroughness of the tests I did 
added, but that it felt to me that, since some priority must be set, 
implementing this driver side was probably better at testing the overall new 
version of the protocol. In the long run, sure, let's unit test when it makes 
sense to unit test.


 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13982919#comment-13982919
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

Rebased and pushed an additional commit on the [same 
branch|https://github.com/pcmanus/cassandra/commits/proto-v3-2] to address the 
comments above up to the following remarks.

bq. I don't know if we want to allow negative timestamps (or if those would 
break things), but if not, we should explicitly check for that and throw an IRE.

I don't think there is a point in allowing them. I did added that restriction 
to the protocol and so the code throw a ProtocolException (rather than an IRE); 
I think it makes more sense to force clients to validate on there side.

bq. Can you add some unit tests for serialization/deserialization of UDTs, 
collections, and Events for both the v2 and v3 protocols?

I did added some tests. I'm sure we could improve on those somewhat, and 
possibly extend to other stuffs, but honestly I think further efforts would be 
better spent (in the sense that we'd get a much better and realistic testing of 
this) adding support for the new protocol version to existing drivers and check 
that the change didn't break previous versions (and that's where I intent to 
spend my efforts).


 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-25 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981464#comment-13981464
 ] 

Tyler Hobbs commented on CASSANDRA-6855:


bq. There is also the fact that currently we wouldn't have much use of this.

We'll be deprecating server-side timestamps in the near future, right?  That's 
the first case I can think of.  Other than that, your points are valid, so I'm 
fine with holding off on this until we see a more concrete need for it.

Some comments on the patch:
* -1 is being used as a no timestamp provided placeholder, so the checks look 
for a timestamp = 0.  I don't know if we want to allow negative timestamps (or 
if those would break things), but if not, we should explicitly check for that 
and throw an IRE.
* In BatchStatement.execute(), you have:
{code}
if (options.getConsistency() == null || options.getSerialConsistency() == null)
throw new InvalidRequestException(Invalid empty consistency level);
{code}
We should make the error message clear as to whether the normal consistency 
level or the serial consistency level is missing.
* Can you add some unit tests for serialization/deserialization of UDTs, 
collections, and Events for both the v2 and v3 protocols?

There are also a few minor typos in the spec docs:
* 0x40 flag: that have the same meaning than in QUERY requests
** than = as
* SCHEMA_CHANGE: options depends of the preceding target.
** of = on
* SCHEMA_CHANGE: target can be ... and describe what has been modified
** describe = describes
* SCHEMA_CHANGE: and the seconde one will be the name
** second = second
* Changes from v2: been added to thie documentation
** thei = the


 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-02 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957675#comment-13957675
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

While I'm not necessarily against in principle, I'm not entirely sold for 
practical reasons. First, we already had to handle a few minor deprecation and 
so far we've logged a warning in the log. And while it's not perfect, it 
doesn't feel too bad either to me especially since we shouldn't deprecate stuff 
all the time. This to say that while sending the deprecation warning to the 
client would be nice, it doesn't seem absolutely necessary to me. On the cons 
side of things, I don't see a very clean way to add this to all possible type 
of RESULT messages without adding some bytes even when no warning is sent 
(which will be by far the majority of case): it's trivial to add a new flag to 
Result.Rows  messages, because we have the flags and we can dedicate one bit 
for an optional warning message easily, but for Result.Void, we can't make it 
entirely optional (unless maybe we create a new Result.Void_with_warning 
type, but that feels rather ugly). Sure just adding a warning field to 
Result.Void, just having it be the empty string when there is no warning, is 
only 2 bytes of overhead, but it's 2 bytes that is sent all the time for 
something that is very rarely needed, and is really just a convenience when it 
is. There is also the fact that currently we wouldn't have much use of this.

So, well, that doesn't seem too useful as of now, and I'm not too sure how to 
add it cleanly/without adding overhead in general. That said, if you have a 
concrete proposition, I'm fine looking at it.

But *please*, lets not that slow down the review of what's already here. As 
long as we decide what we want to do for this warnings idea before the final 
release of 2.1, we can deal with it asynchronously from the bulk of this issue 
(at best this should be minor modification anyway). And I really think the 
sooner we have the main parts of this in, the sooner drivers can test that new 
version and hopefully shake the bugs out before the final.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-04-01 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13956919#comment-13956919
 ] 

Tyler Hobbs commented on CASSANDRA-6855:


I haven't gotten to review this in depth, but one thing I'd like to add support 
for is warnings.  These could be a field on RESULT messages (and possibly 
others).  Mainly we need these for deprecation warnings, but I'm sure there are 
other cases where these would be handy.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright side, 
 the additions of this V3 are comparatively much more simple to implement for 
 a client that those of V2 (in fact, for clients that want to support UDT, it 
 will probably require less effort to add the changes for this new version 
 than to try to support UDT without it), so I do think we make 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-03-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949219#comment-13949219
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

I've pushed a rebase of the previous branch at 
https://github.com/pcmanus/cassandra/commits/proto-v3-2 with one additional 
commit to slightly modify schema_change notifications events so that they 
handle user types properly. I'm growing more and more convinced that we should 
really ship this with 2.1 as it'll make it easier for driver implementor to 
handle user type properly with this. Feedback appreciated.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1 beta2


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright side, 
 the additions of this V3 are comparatively much more simple to implement for 
 a client that those of V2 (in fact, for clients that want to support 

[jira] [Commented] (CASSANDRA-6855) Native protocol V3

2014-03-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934969#comment-13934969
 ] 

Sylvain Lebresne commented on CASSANDRA-6855:
-

Pushed a branch for this at 
https://github.com/pcmanus/cassandra/commits/proto-v3. There is one commit for 
each of the features above.

In trust, this is currently not really tested, and I definitively intend to 
test this more thoroughly by implementing support in the java driver before 2.1 
is out, but wanted to check there isn't too much disagreement on this before 
spending time on that.

 Native protocol V3
 --

 Key: CASSANDRA-6855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6855
 Project: Cassandra
  Issue Type: New Feature
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.1


 I think we need a V3 of the protocol for 2.1. The things that this 
 could/should includes are:
 # Adding an optional Serial CL for protocol batches (like we have for QUERY 
 and EXECUTE). It was an oversight of V2 of not adding it, and now that we can 
 batch conditional updates, it's definitively missing.
 # Proper type codes for UDT. This is not *strictly* needed to be able to 
 support UDT since currently a UDT will be sent as a custom type with his 
 fully class name + arguments. But parsing that is no fun nor convenient for 
 clients. It's also not particular space efficient (though that's probably not 
 a huge concern since with prepared statement you can avoid sending the 
 ResultSet metadata every time).
 # Serialization format for collections. Currently the serialization format 
 only allow for 65K elements, each of 65K bytes size at most. While 
 collections are not meant to store large amount of data, having the 
 limitation in the protocol serialization format is the wrong way to deal with 
 that. Concretely, the current workaround for CASSANDRA-5428 is ugly. I'll 
 note that the current serialization format is also an obstacle to supporting 
 null inside collections (whether or not we want to support null there is a 
 good question, but here again I'm not sure being limited by the serialization 
 format is a good idea).
 # CASSANDRA-6178: I continue to believe that in many case it makes somewhat 
 more sense to have the default timestamp provided by the client (this is a 
 necessary condition for true idempotent retries in particular). I'm 
 absolutely fine making that optional and leaving server-side generated 
 timestamps by default, but since client can already provide timestamp in 
 query string anyway, I don't see a big deal in making it easier for client 
 driver to control that without messing with the query string.
 # Optional names for values in QUERY messages: it has been brought to my 
 attention that while V2 allows to send a query string with values for a 
 one-roundtrip bind-and-execute, a driver can't really support named bind 
 marker with that feature properly without parsing the query. The proposition 
 is thus to make it (optionally) possible to ship the name of the marker each 
 value is supposed to be bound to.
 I think that 1) and 2) are enough reason to make a V3 (even if there is 
 disagreement on the rest that is).
 3) is a little bit more involved tbh but I do think having the current 
 limitations bolted in the protocol serialization format is wrong in the long 
 run, and it turns out that due to UDT we will start storing serialized 
 collections internally so if we want to lift said limitation in the 
 serialization format, we should do it now and everywhere, as doing it 
 afterwards will be a lot more painful.
 4) and 5) are probably somewhat more minor, but at the same time, both are 
 completely optional (a driver won't have to support those if he doesn't 
 want). They are really just about making things more flexible for client 
 drivers and they are not particularly hard to support so I don't see too many 
 reasons not to include them.
 Last but not least, I know that some may find it wrong to do a new protocol 
 version with each major of C*, so let me state my view here: I fully agree 
 that we shouldn't make an habit of that in the long run and that's 
 definitively *not* my objective. However, it would be silly to expect that we 
 could get everything right and forget nothing in the very first version. It 
 shouldn't be surprising that we'll have to burn a few versions (and there 
 might be a few more yet) before getting something more stable and complete 
 and I think that delaying the addition of stuffs that are useful to create 
 some fake notion of stability would be even more silly. On the bright side, 
 the additions of this V3 are comparatively much more simple to implement for 
 a client that those of V2 (in fact, for clients that want to support UDT, it 
 will probably