[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

Robert Stupp (JIRA) Thu, 19 May 2016 06:49:51 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291106#comment-15291106
 ]


Robert Stupp edited comment on CASSANDRA-10786 at 5/19/16 1:48 PM:
-------------------------------------------------------------------

Oh, right. We invalidate a pstmt when one of its dependencies changes, so I 
was overthinking it.

Another possible way to solve the opt-in/long-hash problem would be to just add 
another identifier: a hash over the result set metadata. The current ID would 
stay as it is, and we would add a _fingerprint_ to the _Prepared_ response and 
the _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
    - <id> is [short bytes] representing the prepared query ID.
    - <fingerprint> is [short bytes] representing the metadata hash.
    - <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be 
{{<id><fingerprint><query_parameters>}}.
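
A minimal sketch of that layout, to make the proposal concrete. It only assumes 
the existing [short bytes] encoding (a 2-byte big-endian length followed by 
that many bytes). The hash function (MD5), the (name, type) input to the hash, 
and all function names are assumptions for illustration; none of this is taken 
from the spec or from a patch.

```python
import hashlib
import struct


def short_bytes(data: bytes) -> bytes:
    """Encode [short bytes]: 2-byte big-endian length, then the bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("value too long for [short bytes]")
    return struct.pack(">H", len(data)) + data


def read_short_bytes(buf: bytes, offset: int):
    """Decode one [short bytes] value; return (value, next_offset)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + length], start + length


def metadata_fingerprint(columns) -> bytes:
    """Hash the result set columns given as (name, type) string pairs.

    Assumption: MD5 over length-prefixed names and types. The proposal
    does not say which hash or which input layout would be used.
    """
    digest = hashlib.md5()
    for name, cql_type in columns:
        digest.update(short_bytes(name.encode("utf-8")))
        digest.update(short_bytes(cql_type.encode("utf-8")))
    return digest.digest()


def execute_body(stmt_id: bytes, fingerprint: bytes, query_parameters: bytes) -> bytes:
    """Build <id><fingerprint><query_parameters>.

    query_parameters is passed in already encoded and treated as opaque.
    """
    return short_bytes(stmt_id) + short_bytes(fingerprint) + query_parameters
```

Adding a column changes the fingerprint while the statement ID stays the same, 
which is the property the extra identifier is meant to provide.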

To handle the situation where the result set metadata fingerprint does not 
match, there are two options IMO.
# The coordinator could reply with a new error code (close to 0x2500, 
Unprepared) telling the client that the result set metadata no longer matches 
and the statement needs to be prepared again.
# We just send the result set metadata with the _Rows_ response whenever the 
metadata has changed, i.e. does not match the fingerprint.
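
The two options can be sketched from the coordinator's side as follows. This is 
an illustration only: the {{Rows}} class, the {{MetadataMismatch}} error, and 
the {{resend_metadata}} switch are hypothetical names, not existing Cassandra 
code or protocol elements.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rows:
    """Simplified Rows response; metadata is only set for stale clients."""
    rows: list
    metadata: Optional[Tuple[str, ...]] = None
    fingerprint: Optional[bytes] = None


class MetadataMismatch(Exception):
    """Hypothetical new error, in the spirit of Unprepared (0x2500)."""


def execute(current_columns, current_fp, client_fp, rows, resend_metadata):
    """Answer an Execute request carrying the client's fingerprint."""
    if client_fp == current_fp:
        # Client metadata is up to date: the result metadata can be skipped.
        return Rows(rows)
    if resend_metadata:
        # Option 2: attach the current metadata and fingerprint to the rows.
        return Rows(rows, metadata=current_columns, fingerprint=current_fp)
    # Option 1: make the client prepare the statement again.
    raise MetadataMismatch("result set metadata changed, prepare again")
```

With option 2 the client never sees an error; it just has to notice that a 
_Rows_ response carries metadata and replace its cached copy.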

The second option would also work around a race condition that could arise with 
a new error code during schema changes: some nodes may already use the new 
result set metadata while others still use the old one. It would also save one 
roundtrip. It probably makes the client code a bit more complex, but I think 
that price is worth paying to prevent this race condition (and a _prepare 
storm_).
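
To illustrate the race, here is a toy model of one client alternating between 
two nodes that disagree on the schema. Everything in it is an assumption made 
for the example: the {{Node}} class, the status strings, the MD5-based 
fingerprint, and the way round trips are counted.

```python
import hashlib


def fp(columns):
    """Toy fingerprint: MD5 over the joined column names (an assumption)."""
    return hashlib.md5(",".join(columns).encode("utf-8")).digest()


class Node:
    """A coordinator with its own, possibly outdated, view of the schema."""

    def __init__(self, columns):
        self.columns = tuple(columns)

    def prepare(self):
        return self.columns, fp(self.columns)

    def execute(self, client_fp, resend_metadata):
        """Return (status, columns, fingerprint)."""
        current = fp(self.columns)
        if client_fp == current:
            return "ROWS", None, None
        if resend_metadata:
            return "ROWS", self.columns, current   # option 2
        return "MISMATCH", None, None              # option 1


def run(nodes, requests, resend_metadata):
    """Send `requests` executions round-robin; return the round trip count."""
    columns, fingerprint = nodes[0].prepare()
    round_trips = 1
    for i in range(requests):
        node = nodes[i % len(nodes)]
        status, new_cols, new_fp = node.execute(fingerprint, resend_metadata)
        round_trips += 1
        if status == "MISMATCH":
            # Option 1: prepare again on this node, then retry the execute.
            columns, fingerprint = node.prepare()
            node.execute(fingerprint, resend_metadata)
            round_trips += 2
        elif new_cols is not None:
            # Option 2: adopt the metadata that came with the rows.
            columns, fingerprint = new_cols, new_fp
    return round_trips
```

In this toy model, {{run([Node(["b", "c"]), Node(["a", "b", "c"])], 4, False)}} 
needs 11 round trips because the client prepares again every time it switches 
nodes, while the same call with {{True}} needs 5. A real driver would behave 
differently in detail, but the repeated re-prepare is the pattern the second 
option avoids.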


> Include hash of result set metadata in prepared statement id
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-10786
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Olivier Michallat
>            Assignee: Alex Petrov
>            Priority: Minor
>              Labels: client-impacting, protocolv5
>             Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)