[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-07-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370466#comment-15370466
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

Alright, +1, but please mention the new table in the NEWS file on commit.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-07-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369330#comment-15369330
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

Was an oversight. It didn't invalidate on 
{{QueryProcessor.internalStatements}}. Fixed that and triggered a new CI run 
(above links work).

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-07-08 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368070#comment-15368070
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

Huh! That's strange - let me check.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-07-07 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365918#comment-15365918
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

Haven't dug into why, but there is quite a few unit test failures that look 
abnormal (queries that should be invalid that aren't anymore).

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-07-05 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362394#comment-15362394
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

I'm sorry for having dropped that review on the floor. On the patch (which 
needs rebasing but that probably won't change the patch much), mostly lgtm but 
a few small points:
* The unit test would be bit more useful if it was testing proper execution of 
the statements when they are reloaded (and that you get an exception when they 
are dropped) rather than entirely relying on {{preparedStatementsCount()}}.
* In {{QueryProcessor}}, should add {{@VisibleForTest}} on 
{{clearPreparedStatements}}
* In {{QueryProcessor.removeInvalidPreparedStatementsForFunction}}, I much 
prefer using {{Iterables.any(statement.statement.getFunctions(), 
matchesFunction)}} as was the case before this patch than 
{{StreamSupport.stream(pstmt.getValue().statement.getFunctions().spliterator(), 
false).anyMatch(matchesFunction::apply)}}
* Really a small nit, but I would add a {{MD5Digest.byteBuffer()}} method 
instead wrapping manually in {{SystemKeyspace}}.


> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2016-04-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262841#comment-15262841
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

[~slebresne] do you have bandwidth to review this?

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-12-04 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041853#comment-15041853
 ] 

Joshua McKenzie commented on CASSANDRA-8831:


Moving from Ready to Commit back to In Progress as I don't see +1 from reviewer 
and it was changed 2 days ago by Anonymous.

Not entirely sure what that's about.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-07-05 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614181#comment-14614181
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

[~iamaleksey] shall we consider this for 2.2 or for 3.0?

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, docs-impacting
> Fix For: 3.x
>
> Attachments: 8831-3.0-v1.txt, 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-04-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508432#comment-14508432
 ] 

Jonathan Ellis commented on CASSANDRA-8831:
---

No, see previous comments for discussion about that.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
>  Labels: client-impacting, doc-impacting
> Fix For: 3.0
>
> Attachments: 8831-3.0-v1.txt, 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-04-22 Thread Olivier Michallat (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507292#comment-14507292
 ] 

Olivier Michallat commented on CASSANDRA-8831:
--

Are there plans to backport this to the 2.0 and 2.1 branches?

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8831-3.0-v1.txt, 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-03-02 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344290#comment-14344290
 ] 

Aleksey Yeschenko commented on CASSANDRA-8831:
--

Mark as Patch Available?

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8831-3.0-v1.txt, 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-23 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333755#comment-14333755
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

bq. It's enough information to re-prepare the statement and avoid having to 
bother with batches.

Fully agree.

bq. prefer 3.0 for this

Alright - will prepare a patch for 3.0.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333591#comment-14333591
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

bq. For 2.1 the USE keyspace might not have any effect. But for 3.0 (UDFs) it 
has.

If by "has effect" your mean "for the preparation of the statement given a 
query string", then I don't disagree but that's *not* what I'm saying. I'm 
saying that _once prepared_, there is only one keyspace that matters, the 
statement keyspace, and that's the one I'm talking about. To put it another 
way, either the statement has a fully-qualified keyspace, and the {{USE}} one 
has no effect, or it doesn't have such FQ keyspace and then only the {{USE}} 
one matter. In both case, there is only one keyspace that matter and that's 
what I'm suggesting using. But anyway, we might want to change that schema 
anyway, see below.

bq. But "table_name" could become a more generic "table_names set", to 
properly handle batches.

That's a good point, and in fact, that's true of {{keyspace_name}} too. Which 
kind of suck but well, the whole point of having the keyspace and table was to 
be able to query for only the prepared statement of a given table, but since 
batches screw that, I suggest we keep it simpler and just have:
{noformat}
CREATE TABLE system.prepared_statements (
prepared_id blob PRIMARY KEY,
query_string text,
logged_keyspace text   // that one _is_ the "USE keyspace"
)
{noformat}
It's enough information to re-prepare the statement and avoid having to bother 
with batches.

bq. so long as we don't whitelist the table as accessible to everyone by default

Except that so far the assumption of this ticket is that drivers would check 
that table and so it would have to white-listed. Though I guess we could forget 
about drivers being able to check it: if they can assume a restarted node will 
have kept it's statement, then they can unconditionally skip re-preparing 
statements in that case. They'd still have to do it on newly joined nodes, but 
that's a rarer event (so still an improvement) and maybe later we can implement 
Robert's idea of having newly joined node query another node to get the 
prepared statement (mildly fan of the idea on principle, it feels hacky to me, 
but why not) so drivers can skip that case too.

bq. I'd rather this went into 3.0

To be clear, I only mentioned 2.1 to say that this felt simple enough that 2.1 
could be considered as not out of question, but I tend to prefer 3.0 for this 
too.


> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Attachments: 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
>

[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-23 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333200#comment-14333200
 ] 

Aleksey Yeschenko commented on CASSANDRA-8831:
--

Since DESCRIBE is a 3.0-ticket, it's no blocker for this in 2.1 (though I'd 
rather this went into 3.0, tbh).

Viewing data embedded in the INSERT statements is an issue, however - and those 
should ideally be filtered out on lack of SELECT permission on the 
tables-basis. That said, it's not really a blocker, so long as we don't 
whitelist the table as accessible to everyone by default, like schema, local, 
and peers currently are. So just ignore the security angle for now.

Now, I still haven't read the patch itself. But "table_name" could become a 
more generic "table_names set", to properly handle batches.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Attachments: 8831-v1.txt, 8831-v2.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332486#comment-14332486
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

bq. I implemented it as "persists changes to the table every minute".

I'd *really* prefer persisting statements when we prepare them (and remove them 
when they get evicted from the cache) instead of have a scheduled task. Not 
doing so fails user expactions for no good reason that I can see. In fact, 
since prepared statement are by design supposed to be prepared very rarely, 
writing them all every minute is a waste of good CPU time. I'll note that while 
we could do it asynchronously (but still only trigget it once when the 
statement is prepared), I really don't think it's worth adding complexity 
because statement preparation is not terribly performance sensitive and 
besides, provided we don't force-flush the table (which there is no reason to), 
the write will be pretty cheap (it's really just an almost surely uncontended 
write to the memtable).

bq. Table has the layout you've proposed

Well, almost, your patch don't include the {{table_name}} in the partition key. 
And unless you had a reason not to, I think it's a nice to have.

bq. except that I had to add the column {{use_keyspace_name}}

I don't think we need that, once we've parsed the statement we know to which 
keyspace it applies without ambiguity. Or to put it another way, the only times
{{use_keyspace_name}} would differ from {{keyspace}} is when the former *do 
not* influence the prepared statement and it's thus rather useless.

Other minor remarks on the patch:
* the code to actually query the table should go into SystemKeyspace for 
symetry with every other code that query a system table.
* let's maybe add a {{keyspace()}} and {{table()}} (which can be null) methods 
to {{CQLStatement}} rather than doing ugly instanceof tests.

bq. Just making a note that this does have security implications

Yes, but given the table layout, whatever mechanism will work for 
CASSANDRA-8163 will work here to (I even think that having a DESCRIBE 
permission applying to this is actually fair). Meaning that I don't reject your 
remark, but in our current state I don't think being able to list prepared 
statement is really a lot worth than being able to list tables (yes they could 
be scalar in prepared statements but I'm not sure this changes everything 
either) and so I feel this should just be added to the list of things to handle 
in CASSANDRA-8163 but not be a blocker per-se.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Attachments: 8831-v1.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only fo

[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332096#comment-14332096
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

bq. security implications
Good point. I don't see a good solution for that at the moment without adding 
much more compleity (esp. for 2.1).
# We could exclude all system-keyspaces in the patch, but that feels more like 
a dirty hack than a complete solution.
# Preventing access to {{system.prepared_statements}} feels dirty, too.
# Moving it to a new, _inaccessible/internal-only keyspace_ might be a solution 
- too heavy for 2.1?

Options 2 + 3 imply that clients can assume, that statements stay prepared 
after node restarts (i.e. {{if cassandraVersion >= "2.1.x"}})

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Attachments: 8831-v1.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-21 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331977#comment-14331977
 ] 

Aleksey Yeschenko commented on CASSANDRA-8831:
--

Haven't looked at the patch. Just making a note that this does have security 
implications (as in authz and permissions). For both CASSANDRA-8163 and just 
actual scalar data in prepared INSERT statements.

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Robert Stupp
> Attachments: 8831-v1.txt
>
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-21 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330124#comment-14330124
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

Fair enough. So if I understand it correctly:
- Newly prepared statements are written into {{system.prepared_statements}} and 
removed upon some _certain condition_: Is that when the pstmt is evicted or the 
pstmt has timed out (e.g. TTL'd)? I assume it won't stay in 
{{system.prepared_statements}} until table or ks is dropped ;)
- When a node starts up, it prepares all statements in 
{{system.prepared_statements}}
- When a client detects a node _restart_, it basically performs {{select 
keyspace_name, table_name, prepared_id from system.prepared_statements}} and 
checks for missing prepared statements.
- In the ideal case, there are no statements to re-prepare

I'd like to propose to write that table asynchronously - i.e. synchronize the 
contents of {{QueryProcessor.preparedStatements}} every N minutes so we don't 
add any overhead to CQL processing at all. That way we wouldn't have to deal 
with TTL or _synchronization of table with pstmt-eviction_ as described above.

Follow-up: we could add some more intelligent logic that allows a newly 
bootstrapped node to get a view of probably required prepared statements - so 
clients would not have to hammer a new node with prepare-statement-calls (not 
to be included in this ticket).

I assume, that we don't want to do that for 
{{QueryProcessor.thriftPreparedStatements}}.

(With _neighbors_ I meant neighbors in the _same DC_ - yes, depends on the node 
cfg, KS cfg and app)

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327249#comment-14327249
 ] 

Sylvain Lebresne commented on CASSANDRA-8831:
-

As I said, there could be many ways to solve this. So yes, we could technically 
do that, but the interesting question is what would be the advantages and 
inconvenients of doing it compared to letting clients do it. Perhaps more 
importantly, saying that nodes aks their neighbors doesn't say *how* they do 
it. And while we could use gossip or a custom verb handler for that, it feels 
more complex than worth it to me and in both case strongly imply we'd get this 
in a major release (which both means that 2.1 is out of question, and that if 
we don't want to push that far away in the future it has to make 3.0, and I'm 
not sure adding more stuff that absolutely needs to make 3.0 is a very popular 
idea). So I continue to think that exposing this through a table is the best 
way to go: it's easy, it's something user may be interested in anyway and as a 
bonus, once we have CASSANDRA-7622, it could be relatively straighforward to 
expose more stuff in that table, like the number of times a given statement has 
been executed.

Once we've done that, we can ask ourselves whether we want to continue letting 
clients deal with prepared statements all by themselves or if we prefer making 
node query their neighbors table on startup, but imo that question comes next.  
And my personal hunch is that it's not worth bothering: it's true that it would 
avoid the problem of multiple clients concurrently checking if statements are 
prepared on a node, but I'm not sure that problem is such a big deal in 
practice, and on the other side, it feels to me that which statement is 
prepared on which node is intrinsically something clients decide (for instance, 
some statement may only be needed in some DCs, but C* has no way to know that) 
and so I'm not sure blindy replicating all statements from neighbors is 
necessarily what we want.


> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements

2015-02-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327141#comment-14327141
 ] 

Robert Stupp commented on CASSANDRA-8831:
-

Couldn't a (re)started node not just ask its _neighbors_ for a view of their 
prepared statements?

> Create a system table to expose prepared statements
> ---
>
> Key: CASSANDRA-8831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8831
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>
> Because drivers abstract from users the handling of up/down nodes, they have 
> to deal with the fact that when a node is restarted (or join), it won't know 
> any prepared statement. Drivers could somewhat ignore that problem and wait 
> for a query to return an error (that the statement is unknown by the node) to 
> re-prepare the query on that node, but it's relatively inefficient because 
> every time a node comes back up, you'll get bad latency spikes due to some 
> queries first failing, then being re-prepared and then only being executed. 
> So instead, drivers (at least the java driver but I believe others do as 
> well) pro-actively re-prepare statements when a node comes up. It solves the 
> latency problem, but currently every driver instance blindly re-prepare all 
> statements, meaning that in a large cluster with many clients there is a lot 
> of duplication of work (it would be enough for a single client to prepare the 
> statements) and a bigger than necessary load on the node that started.
> An idea to solve this it to have a (cheap) way for clients to check if some 
> statements are prepared on the node. There is different options to provide 
> that but what I'd suggest is to add a system table to expose the (cached) 
> prepared statements because:
> # it's reasonably straightforward to implement: we just add a line to the 
> table when a statement is prepared and remove it when it's evicted (we 
> already have eviction listeners). We'd also truncate the table on startup but 
> that's easy enough). We can even switch it to a "virtual table" if/when 
> CASSANDRA-7622 lands but it's trivial to do with a normal table in the 
> meantime.
> # it doesn't require a change to the protocol or something like that. It 
> could even be done in 2.1 if we wish to.
> # exposing prepared statements feels like a genuinely useful information to 
> have (outside of the problem exposed here that is), if only for 
> debugging/educational purposes.
> The exposed table could look something like:
> {noformat}
> CREATE TABLE system.prepared_statements (
>keyspace_name text,
>table_name text,
>prepared_id blob,
>query_string text,
>PRIMARY KEY (keyspace_name, table_name, prepared_id)
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)