[jira] [Updated] (CASSANDRA-12167) Review JMX metrics test coverage
[ https://issues.apache.org/jira/browse/CASSANDRA-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12167: - Summary: Review JMX metrics test coverage (was: Review JMX metrics coverage) > Review JMX metrics test coverage > > > Key: CASSANDRA-12167 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12167 > Project: Cassandra > Issue Type: Bug >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest > > I just deleted the dtest that was meant to smoke test JMX metrics: > https://github.com/riptano/cassandra-dtest/pull/1085 > The idea was that you'd read JMX metrics, run stress, then make sure the > metrics went up, down, or stayed the same, as appropriate. This kind of > coverage would be good to have. > I don't think we have it anywhere in the dtests, and it probably isn't > appropriate in unit tests. We should check there's no coverage in the unit > tests, and add some coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
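The before/after comparison such a smoke test needs can be sketched as below. This is an illustrative sketch, not the deleted dtest: the metric name and the snapshot maps are assumptions, and a real test would populate them via JMX (e.g. through an MBeanServerConnection) before and after a stress run.

```java
import java.util.Map;

// Sketch of the delta-classification logic a JMX smoke test could use:
// snapshot metrics, run stress, snapshot again, then check each metric
// moved in the expected direction. Snapshots are plain maps here so the
// logic is self-contained; real values would come from JMX.
public class MetricDeltaCheck {
    public enum Direction { UP, DOWN, SAME }

    public static Direction classify(long before, long after) {
        if (after > before) return Direction.UP;
        if (after < before) return Direction.DOWN;
        return Direction.SAME;
    }

    // Verify each metric moved in the expected direction after the stress run.
    public static boolean verify(Map<String, Long> before,
                                 Map<String, Long> after,
                                 Map<String, Direction> expected) {
        for (Map.Entry<String, Direction> e : expected.entrySet()) {
            String metric = e.getKey();
            if (classify(before.get(metric), after.get(metric)) != e.getValue())
                return false;
        }
        return true;
    }
}
```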
[jira] [Updated] (CASSANDRA-12154) "SELECT * FROM foo LIMIT ;" does not error out
[ https://issues.apache.org/jira/browse/CASSANDRA-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12154: - Labels: lhf (was: ) > "SELECT * FROM foo LIMIT ;" does not error out > -- > > Key: CASSANDRA-12154 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12154 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Robert Stupp >Priority: Minor > Labels: lhf > > We found out that {{SELECT * FROM foo LIMIT ;}} is silently accepted and > executed, but it should not be. > I have not dug into why that is possible (it's not a big issue IMO) but it is > strange. It seems the grammar doesn't parse {{LIMIT}} as {{K_LIMIT}}, because otherwise it > would require an int argument. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
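The missing check can be illustrated with a toy validator. This is not Cassandra's ANTLR grammar, just a minimal sketch of the rule the parser should enforce: if the LIMIT keyword appears, an integer argument must follow before the statement terminator.

```java
// Toy sketch (not the real CQL parser) of the validation that would make
// "SELECT * FROM foo LIMIT ;" error out: a LIMIT keyword must be followed
// by an integer token.
public class LimitCheck {
    public static boolean hasValidLimit(String cql) {
        // Separate the terminator so it tokenizes on its own.
        String[] tokens = cql.replace(";", " ;").trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase("LIMIT"))
                return i + 1 < tokens.length && tokens[i + 1].matches("\\d+");
        }
        return true; // no LIMIT clause at all is fine
    }
}
```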
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375176#comment-15375176 ] Sylvain Lebresne commented on CASSANDRA-9318: - bq. At this point instead of adding more complexity to an approach that fundamentally doesn't solve that, why not back up and use an approach that does the right thing in all 3 cases instead? My understanding of the fundamentals of Sergio's approach is to: # maintain, on the coordinator, a state for each node that keeps track of how many in-flight queries we have for that node. # on a new write query, check the state of the replicas involved in that query to decide what to do (when to hint the node, when to start rate limiting, or when to start rejecting the queries to the client). In that sense, I don't think the approach is fundamentally wrong, but I feel the main question is the "what to do (and when)". And as I'm not sure there is a single perfect answer for that, I do also like the approach of a strategy, if only because it makes experimentation easier (though technically, instead of just having an {{apply()}} that potentially throws or sleeps, I think the strategy should take the replicas for the query, and return a list of nodes to query and a list to hint (preserving the ability to sleep or throw) to get more options on the "what to do", and to avoid making backpressure a node-per-node thing). In terms of the "default" back-pressure strategy we provide, I agree that we should mostly try to solve scenario 3: we should define some condition where we consider things overloaded and only apply back-pressure from there. I'm not sure what that exact condition is, btw, but I'm not convinced we can come up with a good one out of thin air; I think we need to experiment.
tl;dr, if we make the strategy a bit more generic as mentioned above so the decision is made from all replicas involved (maybe the strategy should also keep track of the replica state completely internally so we can implement basic strategies like a simple high watermark very easily), and we make sure not to throttle too quickly (typically, if a single replica is slow and we don't really need it, start by just hinting it), then I'd be happy moving to the "actually test this" phase and seeing how it goes. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
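The "more generic strategy" shape discussed in the comment could look roughly like the sketch below. All names here (the class, `Decision`, the watermark) are illustrative stand-ins, not Cassandra's actual API: the strategy tracks per-replica in-flight counts internally and, given all replicas for a write, splits them into nodes to query now and nodes to hint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical high-watermark backpressure strategy: replicas with too many
// in-flight requests are hinted instead of queried, but never so many that
// the required consistency level can no longer be met.
public class HighWatermarkStrategy {
    public static class Decision {
        public final List<String> toQuery = new ArrayList<>();
        public final List<String> toHint = new ArrayList<>();
    }

    private final Map<String, Integer> inFlight = new HashMap<>();
    private final int highWatermark;

    public HighWatermarkStrategy(int highWatermark) { this.highWatermark = highWatermark; }

    public void recordStart(String replica)  { inFlight.merge(replica, 1, Integer::sum); }
    public void recordFinish(String replica) { inFlight.merge(replica, -1, Integer::sum); }

    public Decision apply(List<String> replicas, int required) {
        Decision d = new Decision();
        for (String r : replicas) {
            if (inFlight.getOrDefault(r, 0) >= highWatermark)
                d.toHint.add(r);
            else
                d.toQuery.add(r);
        }
        // Never throttle below the consistency requirement: pull hinted
        // replicas back into the query set if too few are healthy.
        while (d.toQuery.size() < required && !d.toHint.isEmpty())
            d.toQuery.add(d.toHint.remove(0));
        return d;
    }
}
```

Note how this keeps backpressure a per-write decision over all replicas rather than a node-per-node `apply()` that only sleeps or throws.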
[jira] [Commented] (CASSANDRA-12110) How to refer internal table values in cassandra UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374493#comment-15374493 ] Sylvain Lebresne commented on CASSANDRA-12110: -- As said above, this is still a user-level question. It belongs on the mailing list; JIRA is for bug reports/feature requests only. > How to refer internal table values in cassandra UDF > --- > > Key: CASSANDRA-12110 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12110 > Project: Cassandra > Issue Type: Test > Components: CQL > Environment: Linux and java >Reporter: Raghavendra Pinninti >Priority: Minor > Labels: cassandra, cqlsh, java, triggers, udf > Fix For: 3.0.8 > > > How can I refer to column values in UDFs? For example, return a boolean > (true/false) for a given UserID provided it matches the value of the column > in a table. Do I need to write separate Java code as a text file to hold the > values in the result set object? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12144) Undeletable rows after upgrading from 2.2.4 to 3.0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12144: - Reviewer: Sylvain Lebresne > Undeletable rows after upgrading from 2.2.4 to 3.0.7 > > > Key: CASSANDRA-12144 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12144 > Project: Cassandra > Issue Type: Bug >Reporter: Stanislav Vishnevskiy >Assignee: Alex Petrov > > We upgraded our cluster today and now have some rows that refuse to delete. > Here are some example traces. > https://gist.github.com/vishnevskiy/36aa18c468344ea22d14f9fb9b99171d > Even weirder: updating the row and querying it back results in 2 rows, even though the id is > the clustering key. > {noformat} > user_id| id | since| type > ---++--+-- > 116138050710536192 | 153047019424972800 | null |0 > 116138050710536192 | 153047019424972800 | 2016-05-30 14:53:08+ |2 > {noformat} > And then deleting it again only removes the new one. > {noformat} > cqlsh:discord_relationships> DELETE FROM relationships WHERE user_id = > 116138050710536192 AND id = 153047019424972800; > cqlsh:discord_relationships> SELECT * FROM relationships WHERE user_id = > 116138050710536192 AND id = 153047019424972800; > user_id| id | since| type > ++--+-- > 116138050710536192 | 153047019424972800 | 2016-05-30 14:53:08+ |2 > {noformat} > We tried repairing, compacting, scrubbing. No luck. > Not sure what to do. Is anyone aware of this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370457#comment-15370457 ] Sylvain Lebresne commented on CASSANDRA-11424: -- Only had an admittedly quick look, but the patch seems to limit the new options to bind markers, and I don't think we want to do that. We should allow the query: {noformat} INSERT INTO t JSON '{"k":"v"}' DEFAULT NULL {noformat} while that's refused by the parser in the attached patch. > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher >Assignee: Oded Peer > Labels: client-impacting, cql > Fix For: 3.8 > > Attachments: 11424-trunk-V1.txt, 11424-trunk-V2.txt > > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. > {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone of every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11403) Serializer/Version mismatch during upgrades to C* 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-11403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370470#comment-15370470 ] Sylvain Lebresne commented on CASSANDRA-11403: -- There is a fair chance that this is fixed by CASSANDRA-11393 patch, though if you can reproduce this somewhat reliably and can test said patch, that would be awesome. > Serializer/Version mismatch during upgrades to C* 3.0 > - > > Key: CASSANDRA-11403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11403 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie > > The problem line seems to be: > {code} > MessageOut message = > readCommand.createMessage(MessagingService.instance().getVersion(endpoint)); > {code} > SinglePartitionReadCommand then picks the serializer based on the version: > {code} > return new MessageOut<>(MessagingService.Verb.READ, this, version < > MessagingService.VERSION_30 ? legacyReadCommandSerializer : serializer); > {code} > However, OutboundTcpConnectionPool will test the payload size vs the version > from its smallMessages connection: > {code} > return msg.payloadSize(smallMessages.getTargetVersion()) > > LARGE_MESSAGE_THRESHOLD > {code} > Which is set when the connection/pool is created: > {code} > targetVersion = MessagingService.instance().getVersion(pool.endPoint()); > {code} > During an upgrade, this state can change between these two calls leading the > 3.0 serializer being used on 2.x packets and the following stacktrace: > ERROR [OptionalTasks:1] 2016-03-07 19:53:06,445 CassandraDaemon.java:195 - > Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.AssertionError: null > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:632) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:536) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at 
org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:72) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:609) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:758) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:701) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRRWithFailure(MessagingService.java:684) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor.makeRequests(AbstractReadExecutor.java:110) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor.makeDataRequests(AbstractReadExecutor.java:85) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor$NeverSpeculatingReadExecutor.executeAsync(AbstractReadExecutor.java:214) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.doInitialQueries(StorageProxy.java:1699) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1654) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1601) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1520) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:918) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:251) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:77) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at >
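The race described in this report is a classic time-of-check/time-of-use problem: the endpoint version is read once to pick the serializer and again to size the payload. The sketch below models it with a single atomic counter; the class and method names are illustrative, not the real Cassandra classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of the race: if an upgrade bumps the endpoint's messaging
// version between the serializer-selection read and the payload-sizing read,
// a 3.0 serializer ends up sizing a 2.x-formatted message.
public class VersionRace {
    static final int VERSION_22 = 8, VERSION_30 = 10;

    final AtomicInteger endpointVersion = new AtomicInteger(VERSION_22);

    // The Runnable stands in for whatever may happen between the two reads
    // (e.g. the upgraded node completing its handshake).
    boolean sendUsesOneVersion(Runnable betweenReads) {
        int serializerVersion = endpointVersion.get(); // createMessage(...) step
        betweenReads.run();
        int sizingVersion = endpointVersion.get();     // payloadSize(...) step
        return serializerVersion == sizingVersion;
    }
}
```

The fix direction (as in CASSANDRA-11393) is to read the version once and use that single value for both decisions.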
[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370466#comment-15370466 ] Sylvain Lebresne commented on CASSANDRA-8831: - Alright, +1, but please mention the new table in the NEWS file on commit. > Create a system table to expose prepared statements > --- > > Key: CASSANDRA-8831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8831 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > Labels: client-impacting, docs-impacting > Fix For: 3.x > > > Because drivers abstract from users the handling of up/down nodes, they have > to deal with the fact that when a node is restarted (or joins), it won't know > any prepared statements. Drivers could somewhat ignore that problem and wait > for a query to return an error (that the statement is unknown by the node) to > re-prepare the query on that node, but it's relatively inefficient because > every time a node comes back up, you'll get bad latency spikes due to some > queries first failing, then being re-prepared and only then being executed. > So instead, drivers (at least the Java driver, but I believe others do as > well) pro-actively re-prepare statements when a node comes up. It solves the > latency problem, but currently every driver instance blindly re-prepares all > statements, meaning that in a large cluster with many clients there is a lot > of duplication of work (it would be enough for a single client to prepare the > statements) and a bigger than necessary load on the node that started. > An idea to solve this is to have a (cheap) way for clients to check if some > statements are prepared on the node. There are different options to provide > that, but what I'd suggest is to add a system table to expose the (cached) > prepared statements because: > # it's reasonably straightforward to implement: we just add a line to the > table when a statement is prepared and remove it when it's evicted (we > already have eviction listeners), and truncate the table on startup, which is > easy enough. We can even switch it to a "virtual table" if/when > CASSANDRA-7622 lands, but it's trivial to do with a normal table in the > meantime. > # it doesn't require a change to the protocol or anything like that. It > could even be done in 2.1 if we wish to. > # exposing prepared statements feels like genuinely useful information to > have (outside of the problem exposed here, that is), if only for > debugging/educational purposes. > The exposed table could look something like: > {noformat} > CREATE TABLE system.prepared_statements ( >keyspace_name text, >table_name text, >prepared_id blob, >query_string text, >PRIMARY KEY (keyspace_name, table_name, prepared_id) > ) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
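The bookkeeping in point 1 (add a row on prepare, remove it on eviction) can be sketched with a bounded cache and an eviction hook. This is a self-contained illustration, not Cassandra's implementation: the mirror map stands in for system.prepared_statements, and real code would issue table writes instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a tiny bounded prepared-statement cache whose eviction hook keeps
// a mirror "table" in sync, the way an eviction listener would keep
// system.prepared_statements in sync with the statement cache.
public class PreparedStatementMirror {
    final Map<String, String> mirrorTable = new HashMap<>(); // prepared_id -> query_string

    final LinkedHashMap<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > 2) {                        // tiny capacity for illustration
                    mirrorTable.remove(eldest.getKey()); // "eviction listener"
                    return true;
                }
                return false;
            }
        };

    public void prepare(String id, String query) {
        cache.put(id, query);
        mirrorTable.put(id, query); // add a line when a statement is prepared
    }
}
```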
[jira] [Commented] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366055#comment-15366055 ] Sylvain Lebresne commented on CASSANDRA-11424: -- To be clear, I'm not really against the {{IGNORE_OMITTED}} approach, but I just want to explore all the options. In fact, *if* our default had been to leave omitted columns unset, then I'd have insisted more on that column idea, since getting null for omitted values could then have been done with {{INSERT INTO t( *) JSON ...}}, which is kind of consistent; but as that's not the case and it's too late to change, a simple flag is probably the most pragmatic option. That said, to bikeshed on syntax, we don't use underscores for keywords in CQL, and having it after the value reads a bit better imo, so: {noformat} INSERT INTO t JSON '{"k":"v"}' IGNORE OMITTED {noformat} In fact, to bikeshed even further, an alternative would be to call it {{DEFAULT UNSET}} (as in, by default, columns are unset), and to also support {{DEFAULT NULL}}, which would be the default, but that you could add if you like explicitness. I have a slight preference for that latter option, but that's arguably totally subjective. Anyway, [~thobbs] might also have an opinion since he added the JSON support and so may have thought about this already. > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this, as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. 
> {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone of every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Resolution: Fixed Fix Version/s: (was: 3.0.x) (was: 3.x) 3.9 3.0.9 Status: Resolved (was: Patch Available) Committed, thanks. > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Sylvain Lebresne > Fix For: 3.0.9, 3.9 > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12143: - Resolution: Fixed Fix Version/s: (was: 2.2.x) 2.2.8 Status: Resolved (was: Patch Available) Committed, thanks. > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck >Priority: Critical > Fix For: 2.2.8 > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions were seen on a single node.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null-check the {{data}} variable first. This variable comes from > {{Row.cf}}, which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
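The shape of the defect and of the fix can be shown with simplified stand-ins (these are not the exact 2.2 classes): the read path post-processed each row's column family, but {{Row.cf}} may legitimately be null.

```java
// Simplified stand-ins for the 2.2 read path: purgeUnsafe mirrors the buggy
// commit (no null check on the column family), purgeSafe mirrors the fix.
public class PurgeFix {
    static class ColumnFamily {
        int tombstones = 1;
        void purgeTombstones() { tombstones = 0; }
    }

    static class Row {
        final ColumnFamily cf; // may be null when the CFS has no data for the key
        Row(ColumnFamily cf) { this.cf = cf; }
    }

    // Buggy version: throws NullPointerException when row.cf is null.
    static void purgeUnsafe(Row row) { row.cf.purgeTombstones(); }

    // Fixed version: null-check first, as the 12143 patch does.
    static void purgeSafe(Row row) {
        if (row.cf != null)
            row.cf.purgeTombstones();
    }
}
```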
[jira] [Updated] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11031: - Reviewer: Alex Petrov (was: Sylvain Lebresne) > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, ALLOW FILTERING only works for secondary index columns or > clustering columns, and it's slow, because Cassandra will read all data from > SSTables on disk into memory to filter. > But we can support ALLOW FILTERING on the partition key: as far as I know, > partition keys are in memory, so we can easily filter them and then read the > required data from SSTables. > This will be similar to {{SELECT * FROM table}}, which scans through the entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365918#comment-15365918 ] Sylvain Lebresne commented on CASSANDRA-8831: - I haven't dug into why, but there are quite a few unit test failures that look abnormal (queries that should be invalid no longer are). > Create a system table to expose prepared statements > --- > > Key: CASSANDRA-8831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8831 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > Labels: client-impacting, docs-impacting > Fix For: 3.x > > > Because drivers abstract from users the handling of up/down nodes, they have > to deal with the fact that when a node is restarted (or joins), it won't know > any prepared statements. Drivers could somewhat ignore that problem and wait > for a query to return an error (that the statement is unknown by the node) to > re-prepare the query on that node, but it's relatively inefficient because > every time a node comes back up, you'll get bad latency spikes due to some > queries first failing, then being re-prepared and only then being executed. > So instead, drivers (at least the Java driver, but I believe others do as > well) pro-actively re-prepare statements when a node comes up. It solves the > latency problem, but currently every driver instance blindly re-prepares all > statements, meaning that in a large cluster with many clients there is a lot > of duplication of work (it would be enough for a single client to prepare the > statements) and a bigger than necessary load on the node that started. > An idea to solve this is to have a (cheap) way for clients to check if some > statements are prepared on the node. There are different options to provide > that, but what I'd suggest is to add a system table to expose the (cached) > prepared statements because: > # it's reasonably straightforward to implement: we just add a line to the > table when a statement is prepared and remove it when it's evicted (we > already have eviction listeners), and truncate the table on startup, which is > easy enough. We can even switch it to a "virtual table" if/when > CASSANDRA-7622 lands, but it's trivial to do with a normal table in the > meantime. > # it doesn't require a change to the protocol or anything like that. It > could even be done in 2.1 if we wish to. > # exposing prepared statements feels like genuinely useful information to > have (outside of the problem exposed here, that is), if only for > debugging/educational purposes. > The exposed table could look something like: > {noformat} > CREATE TABLE system.prepared_statements ( >keyspace_name text, >table_name text, >prepared_id blob, >query_string text, >PRIMARY KEY (keyspace_name, table_name, prepared_id) > ) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8831) Create a system table to expose prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8831: Status: Open (was: Patch Available) > Create a system table to expose prepared statements > --- > > Key: CASSANDRA-8831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8831 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > Labels: client-impacting, docs-impacting > Fix For: 3.x > > > Because drivers abstract from users the handling of up/down nodes, they have > to deal with the fact that when a node is restarted (or joins), it won't know > any prepared statements. Drivers could somewhat ignore that problem and wait > for a query to return an error (that the statement is unknown by the node) to > re-prepare the query on that node, but it's relatively inefficient because > every time a node comes back up, you'll get bad latency spikes due to some > queries first failing, then being re-prepared and only then being executed. > So instead, drivers (at least the Java driver, but I believe others do as > well) pro-actively re-prepare statements when a node comes up. It solves the > latency problem, but currently every driver instance blindly re-prepares all > statements, meaning that in a large cluster with many clients there is a lot > of duplication of work (it would be enough for a single client to prepare the > statements) and a bigger than necessary load on the node that started. > An idea to solve this is to have a (cheap) way for clients to check if some > statements are prepared on the node. There are different options to provide > that, but what I'd suggest is to add a system table to expose the (cached) > prepared statements because: > # it's reasonably straightforward to implement: we just add a line to the > table when a statement is prepared and remove it when it's evicted (we > already have eviction listeners), and truncate the table on startup, which is > easy enough. We can even switch it to a "virtual table" if/when > CASSANDRA-7622 lands, but it's trivial to do with a normal table in the > meantime. > # it doesn't require a change to the protocol or anything like that. It > could even be done in 2.1 if we wish to. > # exposing prepared statements feels like genuinely useful information to > have (outside of the problem exposed here, that is), if only for > debugging/educational purposes. > The exposed table could look something like: > {noformat} > CREATE TABLE system.prepared_statements ( >keyspace_name text, >table_name text, >prepared_id blob, >query_string text, >PRIMARY KEY (keyspace_name, table_name, prepared_id) > ) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions
[ https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11828: - Status: Open (was: Patch Available) > Commit log needs to track unflushed intervals rather than positions > --- > > Key: CASSANDRA-11828 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11828 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > Fix For: 2.2.x, 3.0.x, 3.x > > > In CASSANDRA-11448, in an effort to handle flush errors more thoroughly, I introduced a possible correctness bug with the {{ignore}} disk failure policy when a flush fails with an error: > - we report the error but continue > - we correctly do not update the commit log with the flush position > - but we allow the post-flush executor to resume > - a successful later flush can thus move the log's clear position beyond the > data from the failed flush > - the log will then delete segment(s) that contain unflushed data. > After CASSANDRA-9669 it is relatively easy to fix this problem by making the > commit log track sets of intervals of unflushed data (as described in > CASSANDRA-8496). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
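The interval-based bookkeeping proposed above can be sketched as follows. This is an illustration of the idea only, not the CASSANDRA-9669/8496 implementation: instead of a single "flushed up to X" position, the log tracks the set of unflushed intervals, so a segment overlapped by a failed flush stays alive even after a later successful flush.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: track unflushed [start, end) commit-log intervals; a segment may
// be discarded only when no unflushed interval overlaps it. With a single
// position, a later successful flush could wrongly advance past an earlier
// failed one and let the log delete segments with unflushed data.
public class UnflushedIntervals {
    static class Interval {
        final long start, end; // half-open [start, end)
        Interval(long s, long e) { start = s; end = e; }
    }

    final List<Interval> unflushed = new ArrayList<>();

    void markDirty(long start, long end) { unflushed.add(new Interval(start, end)); }

    // Simplified: a successful flush clears intervals fully contained in it.
    void markFlushed(long start, long end) {
        unflushed.removeIf(i -> i.start >= start && i.end <= end);
    }

    boolean canDiscardSegment(long segStart, long segEnd) {
        return unflushed.stream().noneMatch(i -> i.start < segEnd && i.end > segStart);
    }
}
```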
[jira] [Updated] (CASSANDRA-11315) Upgrade from 2.2.6 to 3.0.5 Fails with AssertionError
[ https://issues.apache.org/jira/browse/CASSANDRA-11315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11315: - Status: Ready to Commit (was: Patch Available) > Upgrade from 2.2.6 to 3.0.5 Fails with AssertionError > - > > Key: CASSANDRA-11315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11315 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata > Environment: Ubuntu 14.04, Oracle Java 8, Apache Cassandra 2.2.5 -> > 3.0.3, Apache Cassandra 2.2.6 -> 3.0.5 >Reporter: Dominik Keil >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.x > > > Hi, > when trying to upgrade our development cluster from C* 2.2.5 to 3.0.3 > Cassandra fails during startup. > Here's the relevant log snippet: > {noformat} > [...] > INFO [main] 2016-03-08 11:42:01,291 ColumnFamilyStore.java:381 - > Initializing system.schema_triggers > INFO [main] 2016-03-08 11:42:01,302 ColumnFamilyStore.java:381 - > Initializing system.schema_usertypes > INFO [main] 2016-03-08 11:42:01,313 ColumnFamilyStore.java:381 - > Initializing system.schema_functions > INFO [main] 2016-03-08 11:42:01,324 ColumnFamilyStore.java:381 - > Initializing system.schema_aggregates > INFO [main] 2016-03-08 11:42:01,576 SystemKeyspace.java:1284 - Detected > version upgrade from 2.2.5 to 3.0.3, snapshotting system keyspace > WARN [main] 2016-03-08 11:42:01,911 CompressionParams.java:382 - The > sstable_compression option has been deprecated. You should use class instead > WARN [main] 2016-03-08 11:42:01,959 CompressionParams.java:333 - The > chunk_length_kb option has been deprecated. 
You should use chunk_length_in_kb > instead > ERROR [main] 2016-03-08 11:42:02,638 CassandraDaemon.java:692 - Exception > encountered during startup > java.lang.AssertionError: null > at > org.apache.cassandra.db.CompactTables.getCompactValueColumn(CompactTables.java:90) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.config.CFMetaData.rebuild(CFMetaData.java:315) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:291) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at org.apache.cassandra.config.CFMetaData.create(CFMetaData.java:367) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.decodeTableMetadata(LegacySchemaMigrator.java:337) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTableMetadata(LegacySchemaMigrator.java:273) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTable(LegacySchemaMigrator.java:244) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readTables$227(LegacySchemaMigrator.java:237) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at java.util.ArrayList.forEach(ArrayList.java:1249) ~[na:1.8.0_74] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTables(LegacySchemaMigrator.java:237) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readKeyspace(LegacySchemaMigrator.java:186) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readSchema$224(LegacySchemaMigrator.java:177) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at java.util.ArrayList.forEach(ArrayList.java:1249) ~[na:1.8.0_74] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readSchema(LegacySchemaMigrator.java:177) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.migrate(LegacySchemaMigrator.java:77) > 
~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:223) > [apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) > [apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) > [apache-cassandra-3.0.3.jar:3.0.3] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions
[ https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11828: - Status: Awaiting Feedback (was: Open) > Commit log needs to track unflushed intervals rather than positions > --- > > Key: CASSANDRA-11828 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11828 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > Fix For: 2.2.x, 3.0.x, 3.x > > > In CASSANDRA-11448 in an effort to give a more thorough handling of flush > errors I have introduced a possible correctness bug with disk failure policy > ignore if a flush fails with an error: > - we report the error but continue > - we correctly do not update the commit log with the flush position > - but we allow the post-flush executor to resume > - a successful later flush can thus move the log's clear position beyond the > data from the failed flush > - the log will then delete segment(s) that contain unflushed data. > After CASSANDRA-9669 it is relatively easy to fix this problem by making the > commit log track sets of intervals of unflushed data (as described in > CASSANDRA-8496). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11315) Upgrade from 2.2.6 to 3.0.5 Fails with AssertionError
[ https://issues.apache.org/jira/browse/CASSANDRA-11315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365906#comment-15365906 ] Sylvain Lebresne commented on CASSANDRA-11315: -- +1, with the very minor nit that I'd rename {{s/filterOutRedundantRows/filterOutRedundantRowForSparse}} to make it clearer why it can throw away what it does without having to look at the usage. > Upgrade from 2.2.6 to 3.0.5 Fails with AssertionError > - > > Key: CASSANDRA-11315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11315 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata > Environment: Ubuntu 14.04, Oracle Java 8, Apache Cassandra 2.2.5 -> > 3.0.3, Apache Cassandra 2.2.6 -> 3.0.5 >Reporter: Dominik Keil >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.x > > > Hi, > when trying to upgrade our development cluster from C* 2.2.5 to 3.0.3 > Cassandra fails during startup. > Here's the relevant log snippet: > {noformat} > [...] > INFO [main] 2016-03-08 11:42:01,291 ColumnFamilyStore.java:381 - > Initializing system.schema_triggers > INFO [main] 2016-03-08 11:42:01,302 ColumnFamilyStore.java:381 - > Initializing system.schema_usertypes > INFO [main] 2016-03-08 11:42:01,313 ColumnFamilyStore.java:381 - > Initializing system.schema_functions > INFO [main] 2016-03-08 11:42:01,324 ColumnFamilyStore.java:381 - > Initializing system.schema_aggregates > INFO [main] 2016-03-08 11:42:01,576 SystemKeyspace.java:1284 - Detected > version upgrade from 2.2.5 to 3.0.3, snapshotting system keyspace > WARN [main] 2016-03-08 11:42:01,911 CompressionParams.java:382 - The > sstable_compression option has been deprecated. You should use class instead > WARN [main] 2016-03-08 11:42:01,959 CompressionParams.java:333 - The > chunk_length_kb option has been deprecated. 
You should use chunk_length_in_kb > instead > ERROR [main] 2016-03-08 11:42:02,638 CassandraDaemon.java:692 - Exception > encountered during startup > java.lang.AssertionError: null > at > org.apache.cassandra.db.CompactTables.getCompactValueColumn(CompactTables.java:90) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.config.CFMetaData.rebuild(CFMetaData.java:315) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:291) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at org.apache.cassandra.config.CFMetaData.create(CFMetaData.java:367) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.decodeTableMetadata(LegacySchemaMigrator.java:337) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTableMetadata(LegacySchemaMigrator.java:273) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTable(LegacySchemaMigrator.java:244) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readTables$227(LegacySchemaMigrator.java:237) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at java.util.ArrayList.forEach(ArrayList.java:1249) ~[na:1.8.0_74] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readTables(LegacySchemaMigrator.java:237) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readKeyspace(LegacySchemaMigrator.java:186) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.lambda$readSchema$224(LegacySchemaMigrator.java:177) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at java.util.ArrayList.forEach(ArrayList.java:1249) ~[na:1.8.0_74] > at > org.apache.cassandra.schema.LegacySchemaMigrator.readSchema(LegacySchemaMigrator.java:177) > ~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.schema.LegacySchemaMigrator.migrate(LegacySchemaMigrator.java:77) > 
~[apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:223) > [apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) > [apache-cassandra-3.0.3.jar:3.0.3] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) > [apache-cassandra-3.0.3.jar:3.0.3] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365863#comment-15365863 ] Sylvain Lebresne commented on CASSANDRA-12143: -- Thanks for the test, I've pushed the branch on CI for good measure but will commit once I get the results | [12143-2.2|https://github.com/pcmanus/cassandra/commits/12143-2.2] | [utests|http://cassci.datastax.com/job/pcmanus-12143-2.2-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-12143-2.2-dtest] | > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck >Priority: Critical > Fix For: 2.2.x > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12143: - Priority: Critical (was: Major) > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck >Priority: Critical > Fix For: 2.2.x > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365836#comment-15365836 ] Sylvain Lebresne commented on CASSANDRA-12143: -- It's not: this is not committed, so this can't be in a released version. > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck > Fix For: 2.2.x > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12143: - Fix Version/s: (was: 2.2.7) 2.2.x > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck > Fix For: 2.2.x > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12147) Static thrift tables with non UTF8Type comparators can have column names converted incorrectly
[ https://issues.apache.org/jira/browse/CASSANDRA-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365832#comment-15365832 ] Sylvain Lebresne commented on CASSANDRA-12147: -- Would be great to have some kind of test (even though I know this kind of thing can be annoying to test), but +1 on the patch otherwise. > Static thrift tables with non UTF8Type comparators can have column names > converted incorrectly > -- > > Key: CASSANDRA-12147 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12147 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Fix For: 3.8, 3.0.x > > > {{CompactTables::columnDefinitionComparator()}} has been broken since > CASSANDRA-8099 for non-super columnfamilies, if the comparator is not > {{UTF8Type}}. This results in being unable to read some pre-existing 2.x data > post upgrade (it's not lost, but becomes inaccessible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365817#comment-15365817 ] Sylvain Lebresne commented on CASSANDRA-9318: - A first concern I have, which has already been brought up earlier in the conversation, is that this will create a lot more OverloadedExceptions than we're currently used to, with 2 concrete issues IMO: # the new OverloadedException has a different meaning than the old one. The old one meant the coordinator was overloaded (by too much hint writing), so client drivers will tend to try another coordinator when that happens. That's not a good idea here. # I think in many cases that's not what the user wants. If you write at CL.ONE, and 1 of your 3 replicas is struggling (but the other nodes are perfectly fine), having all your queries rejected (concretely breaking availability) until that one replica catches up (or dies) feels quite wrong. What I think you'd want is to consider the node dead for that query and hint that slow node (and if a coordinator gets overwhelmed by those hints, we already throw an OE), but proceed with the write. I'll note in particular that when a node dies, it's not detected right away (detection can actually take a few seconds), which might well mean a node that just died will be considered overloaded temporarily by some coordinators. So considering overloaded as roughly the same as dead makes some sense (in particular, you wouldn't want this mechanism to start failing writes because a node died, if you have enough nodes for your CL). bq. For trunk, we can enable it by default provided we can test it on a medium size cluster before committing We *should* test it on a medium sized cluster before committing, but even then I'm really reluctant to make it the default on commit. 
I don't think making new features that are untested in production and can have a huge impact the default right away is smart, even though we have done it numerous times (unsuccessfully most of the time, I'd say). I'd much rather leave it opt-in for a few releases so users can test it, and wait for more feedback before we make it the default (I know the counter-argument, that no-one will test it unless it's the default, but I doubt that's true if people are genuinely put off by the existing behavior). In general, as was already brought up earlier in the discussion, I suspect fine-tuning the parameters won't be trivial, and I think we'll need a fair amount of testing in different conditions to guarantee the defaults for those parameters are sane. bq. back_pressure_timeout_override: should we call it for what it is, the back_pressure_window_size and recommend that they increase the write timeout in the comments I concur that I don't love the name either, nor the concept of having an override for another setting. I'd also prefer splitting the double meaning, leaving {{write_request_timeout_in_ms}} to always be the timeout, and adding a {{window_size}} to the strategy parameters. We can additionally do 2 things to get roughly the same default as in the current patch: # make {{window_size}} equal to the write timeout by default. # make the {{*_request_timeout_in_ms}} options "silent" defaults (i.e. commented out by default), and make the default for the write one depend on whether back-pressure is on or not. To finish on the yaml options: if we move {{window_size}} to the strategy parameters, we could also get rid of {{back_pressure_enabled}} and instead have a {{NoBackPressure}} strategy (which we can totally special-case internally). I don't care terribly about it, but it's IMO slightly more in line with the other strategies. 
I'd also suggest abstracting the {{BackPressureState}} state as an interface and making the strategy responsible for creating a new one, since most of the details of the state are likely somewhat strategy-specific. As far as I can tell, all we'd need as an interface is: {noformat} interface BackPressureState { void onMessageSent(); void onResponseReceived(); double currentBackPressureRate(); } {noformat} This won't provide more info for custom implementations just yet, but at least it'll give them more flexibility, and it's IMO a bit clearer. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and
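The {{BackPressureState}} interface sketched in the comment above can be made concrete. The following is a minimal, hypothetical Java implementation: the interface is the one proposed in the comment, but the ratio-based bookkeeping, class names, and the absence of any windowing are illustrative assumptions, not Cassandra's actual back-pressure code.

```java
import java.util.concurrent.atomic.AtomicLong;

// The interface as proposed in the comment above.
interface BackPressureState {
    void onMessageSent();
    void onResponseReceived();
    double currentBackPressureRate();
}

// Toy strategy state (an assumption for illustration): the rate is the
// fraction of sent messages that have been acknowledged, so it stays at
// 1.0 when every message is acked and drops toward 0.0 as acks lag.
class RatioBackPressureState implements BackPressureState {
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong received = new AtomicLong();

    public void onMessageSent()      { sent.incrementAndGet(); }
    public void onResponseReceived() { received.incrementAndGet(); }

    public double currentBackPressureRate() {
        long s = sent.get();
        return s == 0 ? 1.0 : (double) received.get() / s;
    }
}

public class BackPressureSketch {
    public static void main(String[] args) {
        BackPressureState state = new RatioBackPressureState();
        for (int i = 0; i < 4; i++) state.onMessageSent();
        for (int i = 0; i < 3; i++) state.onResponseReceived();
        // 3 of 4 messages acked
        System.out.println(state.currentBackPressureRate()); // prints 0.75
    }
}
```

A real strategy would presumably decay or window these counters per replica; the point here is only that the three-method interface is enough for a strategy to both feed the state and read a rate from it.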
[jira] [Commented] (CASSANDRA-12142) Add "beta" version native protocol flag
[ https://issues.apache.org/jira/browse/CASSANDRA-12142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364339#comment-15364339 ] Sylvain Lebresne commented on CASSANDRA-12142: -- This isn't clearly stated in the description, but I think we should just use this ticket to introduce the v5 protocol, setting it in beta for now. That is, I see this ticket as mainly creating the {{native_protocol_v5.spec}} with support for that beta flag. As for the flag itself, we should probably add it to each message's "flags" byte (2nd byte of the header), since in practice the protocol version is more associated with messages than with the connection. Which also means I'm not sure we should expose this through the {{SUPPORTED}} message, since that message itself is conditioned by the protocol version. In other words, the way I'd implement this is that the server would know which version is beta and would just reject messages using a beta version unless the "I'm ok using a beta version" flag is set on the message. That way, drivers that want to test if a given version is supported as stable can just try it (without the beta flag) and will simply get an "unsupported" error if it's not. > Add "beta" version native protocol flag > --- > > Key: CASSANDRA-12142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12142 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Tyler Hobbs > Labels: protocolv5 > > As discussed in CASSANDRA-10786, we'd like to add a new flag to the native > protocol to allow drivers to connect using a "beta" native protocol version. > This would be used for native protocol versions that are still in development > and may not have all of the final features. Without the "beta" flag, drivers > will be prevented from using the protocol version. > This is primarily useful for driver authors to start work against a new > protocol version when the work on that spans multiple releases. 
Users would > not generally be expected to utilize this flag, although it could potentially > be used to offer early feedback on new protocol features. > It seems like the {{STARTUP}} message body is the best place for the new beta > flag. We may also consider adding protocol information to the > {{SUPPORTED}} message as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
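The server-side check described in the comment above (reject a beta-version message unless the client opts in via a bit in the message's flags byte) can be sketched in a few lines. The flag bit position, version numbers, and method names below are assumptions for illustration only; they are not taken from the actual v5 spec.

```java
public class BetaFlagSketch {
    // Hypothetical bit in the per-message flags byte; not the real spec value.
    static final int BETA_FLAG = 0x10;
    static final int NEWEST_STABLE_VERSION = 4;

    /**
     * Returns true if a message at the given protocol version may proceed.
     * Stable versions always pass; beta versions require the opt-in flag.
     */
    static boolean accept(int version, int flags) {
        if (version <= NEWEST_STABLE_VERSION)
            return true;                       // stable: no flag needed
        return (flags & BETA_FLAG) != 0;       // beta: explicit opt-in only
    }

    public static void main(String[] args) {
        System.out.println(accept(4, 0));         // prints true: v4 is stable
        System.out.println(accept(5, 0));         // prints false: beta, no opt-in
        System.out.println(accept(5, BETA_FLAG)); // prints true: opted in
    }
}
```

This matches the driver-probing behavior the comment describes: a driver trying v5 without the flag gets a rejection, which tells it v5 is not yet stable on that server.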
[jira] [Commented] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364321#comment-15364321 ] Sylvain Lebresne commented on CASSANDRA-11424: -- But to finish my train of thought and explore options, we could slightly extend the idea of allowing column declarations by saying that the columns you provide are the "mandatory" columns, which are set to {{null}} if absent from the value, but that other values can be provided in the JSON, in which case the equivalent of {{IGNORE_OMITTED}} would be: {noformat} INSERT INTO t() JSON '{"k": "v"}'; {noformat} but I admittedly worry that it will be confusing that the two following queries aren't equivalent: {noformat} INSERT INTO t() JSON '{"k": "v"}'; // Would leave any column outside of k unset INSERT INTO t JSON '{"k": "v"}'; // Would set any column outside of k to null {noformat} > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. > {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone for every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364306#comment-15364306 ] Sylvain Lebresne commented on CASSANDRA-11424: -- Actually, I didn't think straight. If you care about preparing the query, you probably don't know in advance which columns you are interested in, so it doesn't help that much. > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. > {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone for every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364285#comment-15364285 ] Sylvain Lebresne commented on CASSANDRA-12143: -- [~michaelsembwever] the fix looks obviously good, but would you have a few minutes to also add a unit test to demonstrate the failure? > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck > Fix For: 2.2.7 > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
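The class of bug described in this ticket (dereferencing {{Row.cf}} without a null check, where {{cf}} is legitimately null when the CFS has no data) can be illustrated with a small sketch. The types and method below are stand-ins for illustration, not Cassandra's actual {{Row}}/{{ColumnFamily}} classes or the fix in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class NullCheckSketch {
    // Stand-in types: cf may legitimately be null when there is no data.
    static class ColumnFamily { List<Long> tombstoneTimestamps = new ArrayList<>(); }
    static class Row { ColumnFamily cf; }

    // Counting purgeable tombstones; without the guard below, an empty row
    // (cf == null) would throw the NullPointerException from the report.
    static int purgeableTombstones(Row row, long gcBefore) {
        ColumnFamily data = row.cf;
        if (data == null)          // the missing null check described above
            return 0;
        int purgeable = 0;
        for (long ts : data.tombstoneTimestamps)
            if (ts < gcBefore)     // tombstone old enough to be purged
                purgeable++;
        return purgeable;
    }

    public static void main(String[] args) {
        Row empty = new Row();     // cf stays null, as when the CFS has no data
        System.out.println(purgeableTombstones(empty, 100)); // prints 0, no NPE
    }
}
```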
[jira] [Commented] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364283#comment-15364283 ] Sylvain Lebresne commented on CASSANDRA-11424: -- I believe we discussed that on the original JSON ticket but pushed it to "later". The idea we had however was to allow specifying the columns that you want to set, like for a normal statement. So, to allow: {noformat} INSERT INTO t(k) JSON '{"k": "v"}' {noformat} I see 2 advantages to that syntax over a new keyword: # it avoids adding new syntax, and reuses a known syntax. It feels possibly more intuitive as a result. # it's actually more flexible than a single flag, as you can have some columns left unset and others defaulted to null, depending on the column. For those reasons, this would have my personal preference. > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. > {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone for every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-12030) Range tombstones that are masked by row tombstones should not be written out
[ https://issues.apache.org/jira/browse/CASSANDRA-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12030. -- Resolution: Fixed Fix Version/s: 2.2.8 2.1.16 I can buy that range tombstones are painful enough in 2.1/2.2 that it's worth alleviating some of the pain, given this is pretty simple. I've run CI on the patch below to be sure, but as there don't seem to be any new failures, committed, thanks. | [12030-2.1|https://github.com/pcmanus/cassandra/commits/12030-2.1] | [utests|http://cassci.datastax.com/job/pcmanus-12030-2.1-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-12030-2.1-dtest] | | [12030-2.2|https://github.com/pcmanus/cassandra/commits/12030-2.2] | [utests|http://cassci.datastax.com/job/pcmanus-12030-2.2-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-12030-2.2-dtest] | > Range tombstones that are masked by row tombstones should not be written out > > > Key: CASSANDRA-12030 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12030 > Project: Cassandra > Issue Type: Improvement >Reporter: Nachiket Patil >Assignee: Nachiket Patil >Priority: Minor > Fix For: 2.1.16, 2.2.8 > > Attachments: cassandra-12030-2.1.diff, cassandra-12030-2.2.diff > > > During compaction, if a table has a range tombstone and a row delete with a > higher timestamp than the range tombstone, both are written out to disk. Some > problems seen because of this behavior: > 1. The range tombstone is expensive to maintain. > 2. Range queries see timeouts when there are too many range tombstones > present which may be masked by row tombstones. > This can be avoided with a simple optimization to not write out a range tombstone > if it is masked by a row delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
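The masking rule this ticket describes reduces to a timestamp comparison at compaction time. The sketch below is a simplified stand-in, not the actual 2.1/2.2 compaction code; in particular, treating equal timestamps as masked is an assumption made here for illustration.

```java
public class MaskedRangeTombstoneSketch {
    /**
     * Decide whether a range tombstone still needs to be written out during
     * compaction. If a row-level deletion is at least as new as the range
     * tombstone, the row delete shadows it entirely, so writing the range
     * tombstone would be redundant (the optimization from the ticket).
     */
    static boolean shouldWriteRangeTombstone(long rangeTombstoneTimestamp,
                                             long rowDeletionTimestamp) {
        return rangeTombstoneTimestamp > rowDeletionTimestamp;
    }

    public static void main(String[] args) {
        System.out.println(shouldWriteRangeTombstone(10, 20)); // prints false: masked by the row delete
        System.out.println(shouldWriteRangeTombstone(30, 20)); // prints true: range tombstone is newer
    }
}
```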
[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11427: - Fix Version/s: (was: 2.2.7) 2.2.6 > Range slice queries CL > ONE trigger read-repair of purgeable tombstones > > > Key: CASSANDRA-11427 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11427 > Project: Cassandra > Issue Type: Bug >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Minor > Fix For: 2.2.6 > > Attachments: 11427-2.1.patch, 11427-2.2_v2.patch > > > Range queries will trigger read repairs for purgeable tombstones on hosts > that have already compacted those tombstones. Clusters with periodic jobs for > scanning data ranges will likely see tombstones resurrected through RRs just > to have them compacted again later at the destination host. > Executing range queries (e.g. for reading token ranges) will compare the > actual data instead of using digests when executed with CL > ONE. Responses > will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the > result of {{RowDataResolver.resolveSuperset}} is used as the reference > version for the results. {{RowDataResolver.scheduleRepairs}} will then send > the superset to all nodes that returned a different result before. > Unfortunately this also involves cases where the superset is just made up > of purgeable tombstone(s) that have already been compacted on the other > nodes. In this case a read-repair will be triggered for transferring the > purgeable tombstones to all other nodes that returned an empty result. 
> The issue can be reproduced with the provided dtest or manually using the > following steps: > {noformat} > create keyspace test1 with replication = { 'class' : 'SimpleStrategy', > 'replication_factor' : 2 }; > use test1; > create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = > {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND > dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0; > delete from test1 where a = 'a'; > {noformat} > {noformat} > ccm flush; > ccm node2 compact; > {noformat} > {noformat} > use test1; > consistency all; > tracing on; > select * from test1; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12143) NPE when trying to remove purgable tombstones from result
[ https://issues.apache.org/jira/browse/CASSANDRA-12143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12143: - Reviewer: Sylvain Lebresne > NPE when trying to remove purgable tombstones from result > - > > Key: CASSANDRA-12143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12143 > Project: Cassandra > Issue Type: Bug >Reporter: mck >Assignee: mck > Fix For: 2.2.7 > > Attachments: 12143-2.2.txt > > > A cluster running 2.2.6 started throwing NPEs. > (500K exceptions on a node was seen.) > {noformat}WARN … AbstractLocalAwareExecutorService.java:169 - Uncaught > exception on thread Thread[SharedPool-Worker-5,5,main]: {} > java.lang.NullPointerException: null{noformat} > Bisecting this highlighted commit d3db33c008542c7044f3ed8c19f3a45679fcf52e as > the culprit, which was a fix for CASSANDRA-11427. > This commit added a line to "remove purgable tombstones from result" but > failed to null check the {{data}} variable first. This variable comes from > {{Row.cf}} which is permitted to be null where the CFS has no data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12102) Refactor and simplify Filtering-related StatementRestriction part
[ https://issues.apache.org/jira/browse/CASSANDRA-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364088#comment-15364088 ] Sylvain Lebresne commented on CASSANDRA-12102: -- We actually do make index decisions multiple times, on both the coordinator (since we do have some validation that depends on the index) and the replicas. And I agree that it would be nice to avoid that. Ideally, we'd establish some kind of "query plan" on the coordinator (that plan would neatly separate which parts are done by which index (it could even be relatively extensible for custom indexes so they can ship whatever info they want for use on the replicas), what is filtered, ...), and ship that plan with the {{ReadCommand}}. That would be nice. However, that definitely means a messaging change, so that's at best a 4.x patch. It also probably has a non-empty intersection with CASSANDRA-10765, so we should make sure to discuss this generally before rushing into any of those tickets. > Refactor and simplify Filtering-related StatementRestriction part > - > > Key: CASSANDRA-12102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12102 > Project: Cassandra > Issue Type: Improvement >Reporter: Alex Petrov > > The {{RestrictionSet}} hierarchy was significantly improved by [CASSANDRA-11354], > although filtering-related {{Restrictions}} are split with restrictions that > would go through the 2i already in {{RowFilter}}. > There's still a clear separation, although it's currently made in {{Index}} > implementations and {{RowFilter}}, by removing things that were handled by > {{Index}} for post-filtering like it's done > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ReadCommand.java#L374-L378]. 
> > The primary concern is that we've seen several times that there are many > corner-cases, so we may benefit from splitting columns that are handled by > the index from ones that are handled by post-filtering early in code and > possibly keeping them split for all parts of query. > I suggest to split {{ClusteringColumnRestrictions#addRowFilterTo}} into two > parts, {{addIndexRestrictions}} and {{addFilteringRestrictions}}. The change > should be quite simple and make the code simpler to understand and extend the > filtering / indexing rules. > -One problem, as noted by [~blerer] is that index decision is made on > replica, depending on cardinality, so splitting them too early might not > work.- the decision is actually made on the coordinator after > [CASSANDRA-10215], although that might involve a larger refactoring, although > might still help to keep code related to indexing / filtering together. > We can also validate that all restrictions have been respected (although we > can do that now as well). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-12045) Cassandra failure during write query at consistency LOCAL_QUORUM
[ https://issues.apache.org/jira/browse/CASSANDRA-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12045. -- Resolution: Not A Problem As the error says, the values you are trying to insert are too big (they can't fit in a single commit log segment, which they have to). You can bump {{commitlog_segment_size_in_mb}} *on all nodes* if you wish, but I wouldn't necessarily advise it as Cassandra isn't optimized for storing large values. The fact you store your xml file in a text column suggests to me that you're not compressing it, and given the size of your xml, I'd personally definitely compress it (client side). Otherwise, the traditional way to deal with large values is to manually chunk them. Overall, there is no bug here and this is a user question, so I'm going to close this ticket; please reach out on the user mailing list (see bottom of http://cassandra.apache.org/) if you need more info/help. > Cassandra failure during write query at consistency LOCAL_QUORUM > -- > > Key: CASSANDRA-12045 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12045 > Project: Cassandra > Issue Type: Bug > Components: CQL, Local Write-Read Paths > Environment: Eclipse java environment >Reporter: Raghavendra Pinninti > Fix For: 3.x > > Original Estimate: 12h > Remaining Estimate: 12h > > While I am writing an xml file into a Cassandra table column I am facing the following > exception. It's a 3 node cluster and all nodes are up. 
> com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure > during write query at consistency LOCAL_QUORUM (2 responses were required but > only 0 replica responded, 1 failed) at > com.datastax.driver.core.exceptions.WriteFailureException.copy(WriteFailureException.java:80) > at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) > at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55) > at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) > at DBConnection.oracle2Cassandra(DBConnection.java:267) at > DBConnection.main(DBConnection.java:292) Caused by: > com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure > during write query at consistency LOCAL_QUORUM (2 responses were required but > only 0 replica responded, 1 failed) at > com.datastax.driver.core.exceptions.WriteFailureException.copy(WriteFailureException.java:91) > at com.datastax.driver.core.Responses$Error.asException(Responses.java:119) > at > com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:180) > at > com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186) > at > com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44) > at > com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:754) > at > com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576) > It would be great if someone helps me out from this situation. Thanks > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
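The manual chunking suggested in the resolution above can be sketched as follows. The table layout and chunk size are illustrative assumptions (the right size depends on {{commitlog_segment_size_in_mb}} and your access pattern); each chunk would then be inserted as its own row through the driver, e.g. against a hypothetical table {{CREATE TABLE chunks (id text, seq int, data blob, PRIMARY KEY (id, seq))}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a large (ideally pre-compressed) value into chunks small
// enough to fit comfortably within a commit log segment, each chunk to be
// stored as its own row keyed by (id, seq). The chunk size is illustrative.
public final class BlobChunker {
    public static List<byte[]> split(byte[] value, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < value.length; off += chunkSize)
            chunks.add(Arrays.copyOfRange(value, off,
                                          Math.min(off + chunkSize, value.length)));
        return chunks;
    }
}
```

Reading the value back is then a single-partition scan over {{seq}}, concatenating the chunks in order.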
[jira] [Commented] (CASSANDRA-12031) "LEAK DETECTED" during incremental repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-12031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364071#comment-15364071 ] Sylvain Lebresne commented on CASSANDRA-12031: -- The message says that we hadn't released some memory that we should have, which is a bug, but the presence of the message implies said memory has just been freed, so no, you shouldn't be worried too much. Of course, we should still fix this. But there have been a few fixes related to leaked references recently, so I would suggest upgrading to 2.2.7 (which has just been voted on successfully so should be available later today or tomorrow) and seeing if you can reproduce there. If you can, maybe some additional log context for the error message could help track this down. > "LEAK DETECTED" during incremental repairs > -- > > Key: CASSANDRA-12031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12031 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6.6, x86_64, Cassandra 2.2.4 >Reporter: vin01 >Priority: Minor > > I encountered some errors during an incremental repair session which look > like :- > ERROR [Reference-Reaper:1] 2016-06-19 03:28:35,884 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@2ce0fab3) to class > org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1513857473:Memory@[7f2d462191f0..7f2d46219510) > was not released before the reference was garbage collected > Should I be worried about these? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12031) "LEAK DETECTED" during incremental repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-12031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12031: - Status: Awaiting Feedback (was: Open) > "LEAK DETECTED" during incremental repairs > -- > > Key: CASSANDRA-12031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12031 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6.6, x86_64, Cassandra 2.2.4 >Reporter: vin01 >Priority: Minor > > I encountered some errors during an incremental repair session which look > like :- > ERROR [Reference-Reaper:1] 2016-06-19 03:28:35,884 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@2ce0fab3) to class > org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1513857473:Memory@[7f2d462191f0..7f2d46219510) > was not released before the reference was garbage collected > Should i be worried about these? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12030) Range tombstones that are masked by row tombstones should not be written out
[ https://issues.apache.org/jira/browse/CASSANDRA-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12030: - Reviewer: Sylvain Lebresne > Range tombstones that are masked by row tombstones should not be written out > > > Key: CASSANDRA-12030 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12030 > Project: Cassandra > Issue Type: Improvement >Reporter: Nachiket Patil >Assignee: Nachiket Patil >Priority: Minor > Attachments: cassandra-12030-2.1.diff, cassandra-12030-2.2.diff > > > During compaction, if a table has range tombstone and a row delete with > higher timestamp than range tombstone, both are written out to disk. Some > problems seen because of this behavior: > 1. The Range tombstone is expensive to maintain. > 2. Range queries see timeouts when there are too many range tombstones > present which may be masked by row tombstones. > This can be avoided with simple optimization to not write out range tombstone > if it is masked by row delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12005) Out of memory error in MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-12005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364025#comment-15364025 ] Sylvain Lebresne commented on CASSANDRA-12005: -- This happens when deserializing a mutation on a replica, and more precisely it seems to OOM when allocating space for a value of that mutation. So in theory, it's possible that you are sometimes writing too many pretty large values in a short time span and that's too much for the nodes. That said, I'm not pretending the way C* reacts is terribly helpful (nor that I'm sure that this is what happens). At the very least, maybe we should intercept the error and log the size of the value we tried to allocate, to see if that's crazy or not. > Out of memory error in MessagingService > --- > > Key: CASSANDRA-12005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12005 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Ubuntu 14.04.4 LTS 3.13.0-79-generic #123-Ubuntu SMP > x86_64 > Cassandra ReleaseVersion: 2.2.5 > java version "1.8.0_31" > Java(TM) SE Runtime Environment (build 1.8.0_31-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode) >Reporter: Chris Powell > > I am periodically losing nodes due to the below OOM error. The nodes restart > perfectly fine. It appears intermittent and randomly affects nodes. There are > no other warnings or errors in the log files. 
> I am using the {{GCG1}} with the following options: > {quote} > JVM_OPTS="$JVM_OPTS -XX:+UseG1GC" > JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8" > JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1" > JVM_OPTS="$JVM_OPTS -XX:+UseTLAB" > JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch" > JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking" > JVM_OPTS="$JVM_OPTS -XX:+ResizeTLAB" > JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500" > JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=10" > JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25" > {quote} > ERROR [MessagingService-Incoming-/10.184.11.109] 2016-06-14 13:00:20,237 > CassandraDaemon.java:185 - Exception in thread > Thread[MessagingService-Incoming-/10.184.11.109,5,main] > java.lang.OutOfMemoryError: Java heap space > at > org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:322) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:126) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:109) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91) > ~[apache-cassandra-2.2.5.jar:2.2.5] > ERROR [MessagingService-Incoming-/10.184.11.109] 2016-06-14 13:00:20,239 > JVMStabilityInspector.java:117 - JVM state determined to be unstable. > Exiting forcefully due to: > java.lang.OutOfMemoryError: Java heap space > at > org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:322) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at > org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:126) > ~[apache-cassandra-2.2.5.jar:2.2.5] > at >
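The comment's suggestion to "intercept the error and log the size of the value we tried to allocate" could look roughly like this. This is a hedged sketch only: the real {{ByteBufferUtil.read}} code differs, and the cap shown is an arbitrary illustration, not a Cassandra default.

```java
import java.io.DataInput;
import java.io.IOException;

// Sketch: instead of letting `new byte[length]` throw a bare
// OutOfMemoryError during mutation deserialization, inspect the length
// prefix first and fail with a message recording the attempted size.
// SANE_MAX_VALUE_SIZE (256 MB) is a hypothetical threshold.
public final class GuardedValueRead {
    static final int SANE_MAX_VALUE_SIZE = 256 * 1024 * 1024;

    public static byte[] readWithLength(DataInput in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > SANE_MAX_VALUE_SIZE)
            throw new IOException("refusing to allocate " + length
                                  + " bytes for a single value; corrupt stream or oversized write?");
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return bytes;
    }
}
```

A guard like this turns a node-killing heap-space error into a loggable, per-message failure that tells you whether the requested allocation was plausible.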
[jira] [Resolved] (CASSANDRA-11985) 2.0.x leaks file handles (Again)
[ https://issues.apache.org/jira/browse/CASSANDRA-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-11985. -- Resolution: Cannot Reproduce As you said, 2.0 isn't supported anymore, and we did make changes to more recent versions regarding how we track open/used files, so this might actually be fixed in newer versions. It's also the kind of thing that is pretty hard to track without reproduction steps. So I'm afraid I'm going to close this as it's been only reproduced on a now unsupported version. If you can reproduce on newer versions, please, do feel free to re-open (but as said above, some more info on how we could reproduce would then be helpful). > 2.0.x leaks file handles (Again) > > > Key: CASSANDRA-11985 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11985 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Unix kernel-2.6.32-431.56.1.el6.x86_64, Cassandra 2.0.14 >Reporter: Amit Singh Chowdhery >Priority: Critical > Labels: Compaction > > We are running Cassandra 2.0.14 in a production environment and disk usage is > very high. On investigating it further we found that there are around 4-5 > files (~150 GB) in stuck mode. 
> Command Fired : lsof /var/lib/cassandra | grep -i deleted > Output : > java12158 cassandra 308r REG 8,16 34396638044 12727268 > /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db > (deleted) > java12158 cassandra 327r REG 8,16 101982374806 12715102 > /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db > (deleted) > java12158 cassandra 339r REG 8,16 12966304784 12714010 > /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db > (deleted) > java12158 cassandra 379r REG 8,16 15323318036 12714957 > /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db > (deleted) > we are not able to see these files in any directory. This is somewhat similar > to CASSANDRA-6275 which is fixed but still issue is there on higher version. > Also in logs no error related to compaction is reported. > so could any one of you please provide any suggestion how to counter this. > Restarting Cassandra is one solution but this issue keeps on occurring so we > cannot restart production machine is not recommended so frequently. > Also we know that this version is not supported but there is high probability > that it can occur in higher version too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12125) ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 CassandraDaemon.java:185 - Exception in thread Thread[MemtableFlushWriter:4,5,main] java.lang.RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12125: - Status: Awaiting Feedback (was: Open) > ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 > CassandraDaemon.java:185 - Exception in thread > Thread[MemtableFlushWriter:4,5,main] java.lang.RuntimeException: Last > written key DecoratedKey(.XX, X) >= current key DecoratedKey > > > Key: CASSANDRA-12125 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12125 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: RHEL-6.5 64-bit Apache Cassandra 2.2.5v >Reporter: Relish Chackochan > Fix For: 2.2.x > > > We are running on RHEL-6.5 64-bit with Apache Cassandra 2.2.5v on 4 node > cluster and getting the following error on multiple node while running the > repair job and when getting the error repair job is hang. > Can some one help to identify the issue. > {code} > ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 > CassandraDaemon.java:185 - Exception in thread > Thread[MemtableFlushWriter:4,5,main] > java.lang.RuntimeException: Last written key DecoratedKey(1467371986.8870, > 313436373337313938362e38383730) >= current key DecoratedKey(, > 313436373337323030312e38383730) writing into > /opt/cassandra/data/proddb/log_data1-0a5092a0a4fa11e5872fc1ce0a46dc27/.maxdatetimeindex_idx/tmp-la-470-big-Data.db > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12125) ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 CassandraDaemon.java:185 - Exception in thread Thread[MemtableFlushWriter:4,5,main] java.lang.RuntimeExcepti
[ https://issues.apache.org/jira/browse/CASSANDRA-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364010#comment-15364010 ] Sylvain Lebresne commented on CASSANDRA-12125: -- The message says some partitions are passed in the wrong order, and that seems to happen during flush, which is uncanny. That said, it's hard to say much more without more information. In particular, this happens on a 2ndary index table and it would help to have the index definition (including the schema of the table the index is on). A bit more of the log prior to that error could also add some useful context. Of course, if you have a way to reproduce, that's even better. > ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 > CassandraDaemon.java:185 - Exception in thread > Thread[MemtableFlushWriter:4,5,main] java.lang.RuntimeException: Last > written key DecoratedKey(.XX, X) >= current key DecoratedKey > > > Key: CASSANDRA-12125 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12125 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: RHEL-6.5 64-bit Apache Cassandra 2.2.5v >Reporter: Relish Chackochan > Fix For: 2.2.x > > > We are running on RHEL-6.5 64-bit with Apache Cassandra 2.2.5v on 4 node > cluster and getting the following error on multiple node while running the > repair job and when getting the error repair job is hang. > Can some one help to identify the issue. > {code} > ERROR [MemtableFlushWriter:4] 2016-07-01 06:20:41,137 > CassandraDaemon.java:185 - Exception in thread > Thread[MemtableFlushWriter:4,5,main] > java.lang.RuntimeException: Last written key DecoratedKey(1467371986.8870, > 313436373337313938362e38383730) >= current key DecoratedKey(, > 313436373337323030312e38383730) writing into > /opt/cassandra/data/proddb/log_data1-0a5092a0a4fa11e5872fc1ce0a46dc27/.maxdatetimeindex_idx/tmp-la-470-big-Data.db > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-12115) Command that returns the seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12115. -- Resolution: Duplicate The seeds are a property of the yaml and we've talked about exposing the/some yaml settings through CASSANDRA-7622, so that would be part of it. And hence I'm going to close this as a duplicate of CASSANDRA-7622. I'll also note that I'm not fond of adding something adhoc "in the meantime" because it's pretty easy to get that info otherwise (just do a {{grep 'seeds' cassandra.yaml}} on every node) so that would feel to me like feature creep. > Command that returns the seeds > -- > > Key: CASSANDRA-12115 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12115 > Project: Cassandra > Issue Type: Improvement >Reporter: Guillaume Labelle >Priority: Minor > Labels: lhf > Fix For: 2.1.x, 3.x > > > Would be nice to get a command that would return the list of seed nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-12051) JSON does not take functions
[ https://issues.apache.org/jira/browse/CASSANDRA-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12051. -- Resolution: Won't Fix I'm afraid we have no plan on supporting this. JSON support is meant as a convenience but we do not want to do anything too complex/esoteric with it. In particular, some things are not and will not be possible with JSON, like function calls, and you should either stick to normal CQL or transform your JSON client side if needed. More particularly, supporting functions-in-JSON-strings would be pretty messy: we'd have to try to parse every JSON string to see if it looks like function calls, and what if you genuinely wanted to add a string that happens to look like a function call? This would get messy pretty quickly. So sorry, but we won't support that. > JSON does not take functions > > > Key: CASSANDRA-12051 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12051 > Project: Cassandra > Issue Type: Improvement >Reporter: Tianshi Wang > > toTimestamp(now()) does not work in JSON format. > {code} > cqlsh:ops> create table test ( >... id int, >... ts timestamp, >... primary key(id) >... ); > cqlsh:ops> insert into test (id, ts) values (1, toTimestamp(now())); > cqlsh:ops> select * from test; > id | ts > +- > 1 | 2016-06-21 18:46:28.753000+ > (1 rows) > cqlsh:ops> insert into test JSON '{"id":2,"ts":toTimestamp(now())}'; > InvalidRequest: code=2200 [Invalid query] message="Could not decode JSON > string as a map: org.codehaus.jackson.JsonParseException: Unrecognized token > 'toTimestamp': was expecting > at [Source: java.io.StringReader@2da0329d; line: 1, column: 25]. 
(String > was: {"id":2,"ts":toTimestamp(now())})" > cqlsh:ops> insert into test JSON '{"id":2,"ts":"toTimestamp(now())"}'; > InvalidRequest: code=2200 [Invalid query] message="Error decoding JSON value > for ts: Unable to coerce 'toTimestamp(now())' to a formatted date (long)" > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions
[ https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362592#comment-15362592 ] Sylvain Lebresne commented on CASSANDRA-11828: -- While I like the general approach, I'd like to discuss where we should commit this, as this is a non-trivial patch touching critical parts of the code. As CASSANDRA-11448 created this bug, I assume 2.1 onwards is affected? But it's only easy to fix on 2.2+ due to CASSANDRA-9669, correct? Assuming my understanding is correct, wouldn't it be simpler/less risky to just revert CASSANDRA-11448 for 2.1 and 2.2? Getting your flush writer to die when you run out of space isn't ideal, but it feels somewhat minor compared to possibly losing data. Anyway, with that out of the way, some remarks on the patch: * In {{CommitLogSegment}}, the "semi-lack" of abstraction of {{IntegerIntervals}} is a bit confusing, and I'm not sure it's justified performance-wise. For instance, it takes more time than necessary to understand that {{cfDirty}} also actually holds an interval, not just a position. I'd rather create a true {{IntInterval}} class (which can use a {{long}} internally if it wants, but I'm not sure that's justified), with maybe a mutable {{AtomicIntInterval}} subclass having the {{expandToCover}} method ({{coverInMap}} imo belongs to {{CommitLogSegment}}). Of course, {{IntInterval.Set}} can very well continue to use arrays internally since we never iterate on the set except for the tests (so in practice we'll rarely create actual {{IntInterval}} objects). * I don't think {{IntegerIntervals.Set.add()}} needs to be synchronized (it's only called from a synchronized method in practice). It's not a big deal, but making the class thread-unsafe would imo be more expected in case of future reuse. * The new classes introduced ({{IntegerIntervals}} and {{ReplayIntervalSet}}) lack a minimum of javadoc, and could use a few more comments in general. 
* Variable naming in {{CommitLogSegment}} is now confusing. We should rename {{cleanPos}} to {{cleanInterval}} etc. Some comments should also be updated (at least on top of the {{cfDirty}}/{{cfClean}} declarations and in {{CommitLogReplayer}}) accordingly. * At the end of {{CommitLogTest.testOutOfOrderLogDiscard}}, I'd add a comment on the last assert saying something like "In the absence of error, this should be 0 because forceRecycleAllSegments would have cleaned all segments. Because we know there was an error, we want to have something to replay" (it took me a minute to figure out what that test was really testing). * I'm not sure I understand why {{CommitLogReplayer.firstNotCovered()}} uses the first range's {{getValue()}} instead of the {{getKey()}} (i.e. the {{lowerBound()}}). Also, leaving the {{ranges}} in {{ReplayIntervalSet}} private and using properly named accessors would be clearer imo. * We should fix the TODO in {{Tracker}} (I'm not sure I understand it fully, but I would suspect it's fine to not notify an invalidated CF). Nits: * there are a few uses of {{Integer}} in IntegerIntervals.Set where {{int}} would be fine. * there are a few inlined usages of {{getCurrentColumnFamilyStore()}} in {{CQLTester}}; can you replace them by calling the new method instead? * A few tests (in {{CommitLogTest.testUnwriteableFlushRecovery}} and {{IntegerIntervalsTest}}) don't use braces after {{if}}/{{for}} even though the body is multi-line. I find it a bit esoteric and inconsistent with the code base, which hurts readability imo. 
> Commit log needs to track unflushed intervals rather than positions > --- > > Key: CASSANDRA-11828 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11828 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > Fix For: 2.2.x, 3.0.x, 3.x > > > In CASSANDRA-11448 in an effort to give a more thorough handling of flush > errors I have introduced a possible correctness bug with disk failure policy > ignore if a flush fails with an error: > - we report the error but continue > - we correctly do not update the commit log with the flush position > - but we allow the post-flush executor to resume > - a successful later flush can thus move the log's clear position beyond the > data from the failed flush > - the log will then delete segment(s) that contain unflushed data. > After CASSANDRA-9669 it is relatively easy to fix this problem by making the > commit log track sets of intervals of unflushed data (as described in > CASSANDRA-8496). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
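The {{IntInterval}} abstraction suggested in the review could take roughly this shape: pack both bounds into a single {{long}} so the mutable {{expandToCover}} becomes one CAS loop. This is a hypothetical sketch mirroring the names used in the comment, not the code eventually committed for CASSANDRA-11828.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a mutable interval over ints, packing [lower, upper] into one
// long (upper in the high 32 bits) so that expansion is a single
// compare-and-set. API names follow the review comment; the rest is
// illustrative.
public final class AtomicIntInterval {
    private final AtomicLong bounds;

    public AtomicIntInterval(int lower, int upper) {
        bounds = new AtomicLong(pack(lower, upper));
    }

    private static long pack(int lower, int upper) {
        return ((long) upper << 32) | (lower & 0xFFFFFFFFL);
    }

    public int lower() { return (int) bounds.get(); }
    public int upper() { return (int) (bounds.get() >>> 32); }

    /** Atomically grow the interval so that it also covers {@code point}. */
    public void expandToCover(int point) {
        long cur, next;
        do {
            cur = bounds.get();
            int lo = Math.min((int) cur, point);
            int hi = Math.max((int) (cur >>> 32), point);
            next = pack(lo, hi);
        } while (next != cur && !bounds.compareAndSet(cur, next));
    }
}
```

Reading both bounds from one atomic long also means callers never observe a torn lower/upper pair, which is the main advantage over two separate fields.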
[jira] [Commented] (CASSANDRA-11943) Allow value > 64K for clustering columns
[ https://issues.apache.org/jira/browse/CASSANDRA-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362410#comment-15362410 ] Sylvain Lebresne commented on CASSANDRA-11943: -- This is actually meant to be a follow-up of CASSANDRA-11882. We probably validate that values > 64K are rejected for clustering values (since CASSANDRA-11882), but the reason we have this limitation in the first place (for 3.x) is only that we write the min/max clustering values in the sstable metadata and use the old serialization for that, which limits values to 64K. So the goal of this ticket is to lift that limitation by rewriting the sstable metadata component without that limitation baked in. This is thus a 4.0 ticket which will require an sstable major version bump. > Allow value > 64K for clustering columns > > > Key: CASSANDRA-11943 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11943 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Lerh Chuan Low >Assignee: Sylvain Lebresne > Fix For: 4.x > > > Setup: > I set this up with a 2 node cluster, but I think with a 1 node cluster it > would encounter the same issue. Use Cassandra 3. > {code} > CREATE KEYSPACE Blues WITH REPLICATION = { 'class' : 'SimpleStrategy', > 'replication_factor' : 2}; > CREATE TABLE test (a text, b text, PRIMARY KEY ((a), b)) > {code} > Do the following insert: > {code} > CONSISTENCY ALL; > "INSERT INTO %s (a, b) VALUES ('foo', ?)", ' 64k>') > {code} > Everything is fine and you can still run queries and so on, C* looks normal. 
> But if we restart C*, it never succeeds in starting up: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.AssertionError: Attempted serializing to buffer exceeded maximum of > 65535 bytes: 131082 > at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:50) > ~[main/:na] > at > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:372) > ~[main/:na] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.blockForWrites(CommitLogReplayer.java:257) > ~[main/:na] > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) > ~[main/:na] > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:168) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:312) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) > [main/:na] > Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: > Attempted serializing to buffer exceeded maximum of 65535 bytes: 131082 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > ~[na:1.8.0_40] > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > ~[na:1.8.0_40] > at > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:365) > ~[main/:na] > ... 
6 common frames omitted > Caused by: java.lang.AssertionError: Attempted serializing to buffer exceeded > maximum of 65535 bytes: 131082 > at > org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:309) > ~[main/:na] > at > org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.serialize(StatsMetadata.java:286) > ~[main/:na] > at > org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.serialize(StatsMetadata.java:235) > ~[main/:na] > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.serialize(MetadataSerializer.java:75) > ~[main/:na] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.writeMetadata(BigTableWriter.java:378) > ~[main/:na] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.access$300(BigTableWriter.java:51) > ~[main/:na] > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:342) > ~[main/:na] > at > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) > ~[main/:na] > at > org.apache.cassandra.io.sstable.format.SSTableWriter.prepareToCommit(SSTableWriter.java:280) > ~[main/:na] > at > org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.prepareToCommit(SimpleSSTableMultiWriter.java:101) > ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1145) > ~[main/:na] > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1095) > ~[main/:na] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_40] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_40] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40] > {code} > The same error as before can be reproduced if, instead of restarting C*, we call {{nodetool flush}} after the insert; it looks like while flushing Memtables and attempting to serialize {{SSTableMetadata}} it still expects clustering keys less than 64k. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
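The {{AssertionError}} in the trace originates in {{ByteBufferUtil.writeWithShortLength}}, which prefixes a value with an unsigned 16-bit length and therefore cannot describe more than 65535 bytes. A minimal sketch of that serialization scheme (hypothetical class, not the Cassandra implementation):

```java
import java.nio.ByteBuffer;

public class ShortLengthSketch {
    // An unsigned 16-bit length prefix can describe at most 65535 bytes,
    // which is exactly the limit the stack trace above trips over when a
    // clustering value of 131082 bytes reaches the sstable metadata writer.
    public static ByteBuffer writeWithShortLength(byte[] value) {
        if (value.length > 0xFFFF)
            throw new IllegalArgumentException(
                "Attempted serializing to buffer exceeded maximum of 65535 bytes: "
                + value.length);
        ByteBuffer buf = ByteBuffer.allocate(2 + value.length);
        buf.putShort((short) value.length); // 2-byte length prefix
        buf.put(value);
        buf.flip();
        return buf;
    }
}
```

Lifting the limit, as the ticket proposes, means replacing this short prefix with a wider (e.g. variable-length) encoding in the metadata component, hence the sstable format bump.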
[jira] [Updated] (CASSANDRA-11943) Allow value > 64K for clustering columns
[ https://issues.apache.org/jira/browse/CASSANDRA-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11943: - Fix Version/s: 4.x
[jira] [Updated] (CASSANDRA-11943) Allow value > 64K for clustering columns
[ https://issues.apache.org/jira/browse/CASSANDRA-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11943: - Summary: Allow value > 64K for clustering columns (was: >64k Clustering Keys cannot be flushed)
[jira] [Updated] (CASSANDRA-8831) Create a system table to expose prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8831: Status: Open (was: Patch Available) > Create a system table to expose prepared statements > --- > > Key: CASSANDRA-8831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8831 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Robert Stupp > Labels: client-impacting, docs-impacting > Fix For: 3.x > > > Because drivers abstract from users the handling of up/down nodes, they have > to deal with the fact that when a node is restarted (or joins), it won't know > any prepared statements. Drivers could somewhat ignore that problem and wait > for a query to return an error (that the statement is unknown by the node) to > re-prepare the query on that node, but it's relatively inefficient because > every time a node comes back up, you'll get bad latency spikes due to some > queries first failing, then being re-prepared and then only being executed. > So instead, drivers (at least the Java driver, but I believe others do as > well) pro-actively re-prepare statements when a node comes up. That solves the > latency problem, but currently every driver instance blindly re-prepares all > statements, meaning that in a large cluster with many clients there is a lot > of duplication of work (it would be enough for a single client to prepare the > statements) and a bigger than necessary load on the node that started. > An idea to solve this is to have a (cheap) way for clients to check if some > statements are prepared on the node. There are different options to provide > that, but what I'd suggest is to add a system table to expose the (cached) > prepared statements because: > # it's reasonably straightforward to implement: we just add a line to the > table when a statement is prepared and remove it when it's evicted (we > already have eviction listeners). 
(We'd also truncate the table on startup, but > that's easy enough.) We can even switch it to a "virtual table" if/when > CASSANDRA-7622 lands, but it's trivial to do with a normal table in the > meantime. > # it doesn't require a change to the protocol or something like that. It > could even be done in 2.1 if we wish to. > # exposing prepared statements feels like genuinely useful information to > have (outside of the problem exposed here, that is), if only for > debugging/educational purposes. > The exposed table could look something like: > {noformat} > CREATE TABLE system.prepared_statements ( >keyspace_name text, >table_name text, >prepared_id blob, >query_string text, >PRIMARY KEY (keyspace_name, table_name, prepared_id) > ) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8831) Create a system table to expose prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362394#comment-15362394 ] Sylvain Lebresne commented on CASSANDRA-8831: - I'm sorry for having dropped that review on the floor. On the patch (which needs rebasing, but that probably won't change the patch much), mostly lgtm, but a few small points: * The unit test would be a bit more useful if it tested proper execution of the statements when they are reloaded (and that you get an exception when they are dropped) rather than relying entirely on {{preparedStatementsCount()}}. * In {{QueryProcessor}}, we should add {{@VisibleForTesting}} on {{clearPreparedStatements}}. * In {{QueryProcessor.removeInvalidPreparedStatementsForFunction}}, I much prefer using {{Iterables.any(statement.statement.getFunctions(), matchesFunction)}}, as was the case before this patch, over {{StreamSupport.stream(pstmt.getValue().statement.getFunctions().spliterator(), false).anyMatch(matchesFunction::apply)}}. * Really a small nit, but I would add an {{MD5Digest.byteBuffer()}} method instead of wrapping manually in {{SystemKeyspace}}.
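For reference, the stream-based form the review suggests replacing is equivalent to the following helper; Guava's {{Iterables.any(functions, matchesFunction)}} expresses the same check more readably. This sketch uses only stdlib types and hypothetical names:

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.stream.StreamSupport;

public class AnyMatchSketch {
    // Equivalent of Guava's Iterables.any(items, pred): true if any element
    // of the Iterable satisfies the predicate. The spliterator/stream dance
    // is what the review found less readable than the Guava one-liner.
    public static <T> boolean any(Iterable<T> items, Predicate<T> pred) {
        return StreamSupport.stream(items.spliterator(), false).anyMatch(pred);
    }
}
```

Both forms short-circuit on the first match; the preference is purely about readability.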
[jira] [Commented] (CASSANDRA-12123) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_next_2_1_x_To_current_3_x.cql3_non_compound_range_tombstones_test
[ https://issues.apache.org/jira/browse/CASSANDRA-12123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362370#comment-15362370 ] Sylvain Lebresne commented on CASSANDRA-12123: -- Not sure about [~rhatch] upgrade test (the job seems to have failed but it's unclear to me why) but +1 on the patch otherwise. > dtest failure in > upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_next_2_1_x_To_current_3_x.cql3_non_compound_range_tombstones_test > --- > > Key: CASSANDRA-12123 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12123 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Tyler Hobbs > Labels: dtest > Fix For: 3.0.x, 3.x > > > example failure: > http://cassci.datastax.com/job/upgrade_tests-all-custom_branch_runs/37/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_Upgrade_next_2_1_x_To_current_3_x/cql3_non_compound_range_tombstones_test > Failed on CassCI build upgrade_tests-all-custom_branch_runs #37 > Failing here: > {code} > File "/home/automaton/cassandra-dtest/upgrade_tests/cql_tests.py", line > 1667, in cql3_non_compound_range_tombstones_test > self.assertEqual(6, len(row), row) > {code} > As we see, the row has more data returned. This implies that data isn't > properly being shadowed by the tombstone. As such, I'm filing this directly > as a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362279#comment-15362279 ] Sylvain Lebresne commented on CASSANDRA-12126: -- I "think" you are right that this can happen, and that committing an empty commit on SERIAL reads "should" fix it. Paxos is however subtle enough that I would feel more confident with this if we had a reproduction test first, if only so we can validate whatever patch we come up with. [~jkni] I believe you've spent some time on Jepsen-like tests for Paxos; do you think this is something we could use to try to reproduce something like that relatively consistently? > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug >Reporter: sankalp kohli > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS reads. Here is how it can happen with RF=3: > 1) You issue a CAS write and it fails in the propose phase. Machine A replies > true to the propose and saves the commit in its accepted field. The other two > machines, B and C, do not get to the accept phase. > The current state is that machine A has this commit in the paxos table as accepted > but not committed, and B and C do not. > 2) Issue a CAS read and it goes to only B and C. You won't be able to read the > value written in step 1. This step behaves as if nothing is in flight. > 3) Issue another CAS read and it goes to A and B. Now we will discover that > there is something in flight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about the value > written in step 1. > 4) Issue a CAS write that involves only B and C. This will succeed and > commit a different value than step 1. The step 1 value will never be seen again > and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 > talks about this issue, namely how learners can find out whether a majority of the > acceptors have accepted a proposal. > In step 3, it is correct that we propose the value again, since we don't know > whether it was accepted by a majority of acceptors. When we ask a majority of > acceptors, and one or more of them, but not a majority, have something in > flight, we have no way of knowing whether it was accepted by a majority of acceptors. > So this behavior is correct. > However, we need to fix step 2, since it causes reads to not be linearizable > with respect to writes and other reads. In this case, we know that a majority > of acceptors have no in-flight commit, which means no value was accepted by a > majority. I think we should run a propose step here > with an empty commit, and that will cause the write from step 1 to never be > visible afterwards. > With this fix, we will either see the data written in step 1 on the next serial read > or never see it, which is what we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
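The scenario in the description can be modeled with a toy simulation. This is a deliberate simplification that ignores ballots and the prepare phase entirely; it only shows how the quorum a SERIAL read happens to reach changes what it returns, which is the non-linearizable behavior being reported:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CasReadSketch {
    // Per-acceptor state: an accepted-but-uncommitted value (or absent).
    static Map<String, String> accepted = new HashMap<>();
    static String committed = null;

    // A SERIAL read contacts a quorum. It only completes an in-flight
    // proposal if some contacted acceptor reports one -- so whether the
    // step-1 value ever surfaces depends on which quorum the read reaches.
    static String casRead(List<String> quorum) {
        for (String node : quorum) {
            String v = accepted.get(node);
            if (v != null) {      // finish the in-flight proposal
                committed = v;
                accepted.clear();
            }
        }
        return committed;
    }
}
```

With the proposed fix, a read whose quorum sees nothing in flight would additionally commit an empty update, sealing out the step-1 value so no later read can resurrect it.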
[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval
[ https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11349: - Resolution: Fixed Assignee: Branimir Lambov (was: Stefan Podkowinski) Fix Version/s: (was: 2.2.x) (was: 2.1.x) 2.2.8 2.1.16 Status: Resolved (was: Patch Available) Perfect, thanks, committed. > MerkleTree mismatch when multiple range tombstones exists for the same > partition and interval > - > > Key: CASSANDRA-11349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11349 > Project: Cassandra > Issue Type: Bug >Reporter: Fabien Rousseau >Assignee: Branimir Lambov > Labels: repair > Fix For: 2.1.16, 2.2.8 > > Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, > 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch > > > We observed that repair, for some of our clusters, streamed a lot of data and > many partitions were "out of sync". > Moreover, the read repair mismatch ratio is around 3% on those clusters, > which is really high. > After investigation, it appears that, if two range tombstones exists for a > partition for the same range/interval, they're both included in the merkle > tree computation. > But, if for some reason, on another node, the two range tombstones were > already compacted into a single range tombstone, this will result in a merkle > tree difference. > Currently, this is clearly bad because MerkleTree differences are dependent > on compactions (and if a partition is deleted and created multiple times, the > only way to ensure that repair "works correctly"/"don't overstream data" is > to major compact before each repair... which is not really feasible). 
> Below is a list of steps to easily reproduce this case: > {noformat} > ccm create test -v 2.1.13 -n 2 -s > ccm node1 cqlsh > CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 2}; > USE test_rt; > CREATE TABLE IF NOT EXISTS table1 ( > c1 text, > c2 text, > c3 float, > c4 float, > PRIMARY KEY ((c1), c2) > ); > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > # now flush only one of the two nodes > ccm node1 flush > ccm node1 cqlsh > USE test_rt; > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > ccm node1 repair > # now grep the log and observe that some inconsistencies were detected > between nodes (while it shouldn't have detected any) > ccm node1 showlog | grep "out of sync" > {noformat} > Consequences of this are a costly repair, accumulating many small SSTables > (up to thousands for a rather short period of time when using VNodes, the > time for compaction to absorb those small files), but also an increased size > on disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
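The root cause is that the validation digest hashes each range tombstone individually, so two logically identical tombstones that have not yet been compacted into one digest differently from their compacted single equivalent. A minimal illustration of that effect (hypothetical sketch, not the actual MerkleTree code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class TombstoneDigestSketch {
    // Digest a partition by hashing each range tombstone it contains.
    // A node holding two identical, uncompacted tombstones for the same
    // interval produces a different digest than a node where compaction
    // already merged them into one -- so the Merkle trees disagree even
    // though both nodes hold logically identical data.
    public static byte[] digest(List<String> tombstones) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String rt : tombstones)
                md.update(rt.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always available
        }
    }
}
```

The fix direction is to merge equal/overlapping tombstones before they are fed to the digest, so both nodes hash the same canonical form.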
[jira] [Commented] (CASSANDRA-12130) SASI related tests failing since CASSANDRA-11820
[ https://issues.apache.org/jira/browse/CASSANDRA-12130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362246#comment-15362246 ] Sylvain Lebresne commented on CASSANDRA-12130: -- +1 (tests aren't fully clear, but the SASI tests are passing, so whatever the problem is, it's not this patch's fault) > SASI related tests failing since CASSANDRA-11820 > > > Key: CASSANDRA-12130 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12130 > Project: Cassandra > Issue Type: Test >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe > Fix For: 3.x > > > Since CASSANDRA-11820 was committed, a number of tests covering SASI have > been failing. In both {{SASIIndexTest}} and {{SSTableFlushObserverTest}}, > rows are built using an unsorted builder, which assumes that the columns are > added in clustering order. However, in both cases, this is not true and the > additional checks added to {{UnfilteredSerializer::serializeRowBody}} by > CASSANDRA-11820 now trigger assertion errors and, ultimately, failing tests. > In addition, {{SASIIndexTest}} reuses a single table in multiple tests and > performs its cleanup in the tear down method. When the assertion error is > triggered, the tear down is not run, leaving data in the table and causing > other failures in subsequent tests. > Patch to follow shortly... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7396) Allow selecting Map key, List index
[ https://issues.apache.org/jira/browse/CASSANDRA-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359122#comment-15359122 ] Sylvain Lebresne commented on CASSANDRA-7396: - Remarks on the patch: * As this basically uses terms in select clauses, this should be rebased on top of CASSANDRA-10783, rather than redoing its own thing. I'm in particular not at all a fan of the "dynamic" thing, especially when we already have the concept of {{Terminal}} and {{NonTerminal}} terms to deal with the same thing. * This only allows element/slice selection directly on a column name, and without nesting, which is imo overly restrictive (we don't have that restriction for field selection for instance). That does change a bit how we want this to work. * The way {{SelectStatement}} deals with {{ColumnFilter}} feels a bit hacky to me. I understand that we cannot always compute the {{ColumnFilter}} at preparation time anymore, and that we may want to avoid doing it at execution time if we can, but I think that could be more cleanly abstracted. * The patch seems to use {{null}} to handle the absence of {{from}} or {{to}} in the slice syntax. I'm not sure about that. I think we should refuse {{null}} but accept {{unset}} and make it equivalent to not having a value. That's more logical imo. * I'm not sure about passing the Cell to the {{ResultSetBuilder}}. First because having an {{Object}} array is somewhat ugly, but also because I think trying to push along this line in CASSANDRA-7826 will get complicated. I think it's simpler to serialize what we get from the storage blindly, and let selectors extract subselections from the serialized form afterwards (which they can do without deserializing, working directly on the serialized form). * It's a bit of an edge case, but {{SELECT m, m\['4'..'6'\] FROM ...}} wasn't working as expected, as the {{ColumnFilter}} only ended up querying the selected slice, ignoring the full column selection. 
* There is also a problem with {{SELECT m\[3..4\] FROM ...}} because the parser parses {{3.}} as a float and fails to recognize the slice syntax afterwards. More on that below. I took a shot at fixing those [here|https://github.com/pcmanus/cassandra/commits/7396-trunk], which ends up looking quite a bit different. I did however struggle with ANTLR, and there are currently still a few parsing issues that prevent this from being "patch available": # The problem with {{SELECT m\[3..4\] FROM ...}} where {{3.}} is lexed as a float. I tried to change the lexer using an ANTLR syntactic predicate to not lex {{3.}} as a float if it's followed by another {{.}}, but I must have gotten that wrong as it didn't work. I also tried fixing it in the parser, making it accept '\[' term '.' term '\]' and rejecting that afterwards if the left-most term isn't what it should be, but ANTLR ended up with crazy conflicts. Anyway, I'm currently a bit out of options. # For some weird reason, ANTLR also started complaining about {{DISTINCT}} and {{JSON}} not being reserved function names. That it complains isn't all that weird, since after all, something like {{SELECT DISTINCT (3, 4) FROM .. }} *is* ambiguous (it could either be a DISTINCT query on a tuple, or a function call), but what is weird is that it complains following the changes made by this patch, which ought to be unrelated. It should have complained in CASSANDRA-10783 where the ambiguity was created, but somehow didn't. I've currently resolved that by making the keywords reserved, which is strictly speaking a potential breaking change. That said, that's one problem I'm personally willing to live with: in hindsight it sounds like a bad idea to not have those be reserved, and there is an upgrade path for the few users that might use them as unreserved. # I wasn't able to make ANTLR accept the new syntax in its more general form. Basically, we only allow column names on the left-hand side of the new syntax. 
That is, we accept {{SELECT m\[3\]\['foo'..'bar'\] FROM}}, but not {{SELECT f(c)\[3\]}} for instance. I'd really rather avoid that limitation as we don't have it for UDT field selection, but I was unable to have ANTLR be reasonable. Anyway, the patch is currently "blocked" by those parsing issues and if someone knowledgeable with ANTLR feels like having a look, I certainly wouldn't mind. > Allow selecting Map key, List index > --- > > Key: CASSANDRA-7396 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7396 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Labels: cql, docs-impacting > Fix For: 3.x > > Attachments: 7396_unit_tests.txt > > > Allow "SELECT map['key]" and "SELECT list[index]." (Selecting a UDT subfield >
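For readers unfamiliar with why {{3..4}} trips the lexer: generated lexers (ANTLR included) use maximal munch, so a greedy FLOAT rule consumes {{3.}} before the parser ever gets a chance to see a {{..}} range token. A minimal sketch of that behaviour, using hypothetical simplified token rules (not the actual Cql.g grammar):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchDemo {
    // Hypothetical simplified rules; the real ANTLR grammar is more involved,
    // but the greedy FLOAT rule below reproduces the reported behaviour.
    private static final Pattern FLOAT = Pattern.compile("[0-9]+\\.[0-9]*");
    private static final Pattern INT = Pattern.compile("[0-9]+");

    public static List<String> lex(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            Matcher f = FLOAT.matcher(s).region(i, s.length());
            Matcher n = INT.matcher(s).region(i, s.length());
            if (f.lookingAt()) {          // maximal munch: FLOAT wins, eating "3."
                tokens.add(f.group());
                i = f.end();
            } else if (n.lookingAt()) {
                tokens.add(n.group());
                i = n.end();
            } else {                      // everything else as single-char tokens
                tokens.add(String.valueOf(s.charAt(i)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "3..4" never yields a ".." token: FLOAT consumes "3." first.
        System.out.println(lex("3..4")); // [3., ., 4]
    }
}
```

This is why the workarounds discussed above target either the lexer (a predicate stopping FLOAT before a second {{.}}) or the parser (accepting the mis-lexed shape and rejecting it later).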
[jira] [Resolved] (CASSANDRA-12110) Cassandra udf's calling in java code
[ https://issues.apache.org/jira/browse/CASSANDRA-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12110. -- Resolution: Invalid I think the reporter answered their own question, and questions about usage should go to the mailing list in the first place anyway, so closing. > Cassandra udf's calling in java code > > > Key: CASSANDRA-12110 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12110 > Project: Cassandra > Issue Type: Test > Components: CQL > Environment: Linux and java >Reporter: Raghavendra Pinninti >Priority: Minor > Labels: cassandra, cqlsh, java, triggers, udf > Fix For: 3.0.8 > > > I created two udf's and one trigger in Cassandra(3.2) cqlsh. How to check > existing udf's and triggers in cassandra? How can I call them through datastax > java driver in java code? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12107) Fix range scans for table with live static rows
[ https://issues.apache.org/jira/browse/CASSANDRA-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12107: - Status: Patch Available (was: Open) > Fix range scans for table with live static rows > --- > > Key: CASSANDRA-12107 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12107 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Sharvanath Pathak > Labels: patch-available > Fix For: 3.0.8 > > Attachments: repro > > > We were seeing some weird behaviour with limit based scan queries. In > particular, we see the following: > {noformat} > $ cqlsh -k sd -e "consistency local_quorum; SELECT uuid, token(uuid) FROM > files WHERE token(uuid) >= token('6b470c3e43ee06d1') limit 2" > Consistency level set to LOCAL_QUORUM. > uuid | system.token(uuid) > --+-- > 6b470c3e43ee06d1 | -9218823070349964862 > 484b091ca97803cd | -8954822859271125729 > (2 rows) > $ cqlsh -k sd -e "consistency local_quorum; SELECT uuid, token(uuid) FROM > files WHERE token(uuid) > token('6b470c3e43ee06d1') limit 1" > Consistency level set to LOCAL_QUORUM. > uuid | system.token(uuid) > --+-- > c348aaec2f1e4b85 | -9218781105444826588 > {noformat} > In the table uuid is partition key, and it has a clustering key as well. > So the uuid "c348aaec2f1e4b85" should be the second one in the limit query. > After some investigation, it seems to me like the issue is in the way > DataLimits handles static rows. Here is a patch for trunk > (https://github.com/sharvanath/cassandra/commit/9a460d40e55bd7e3604d987ed4df5c8c2e03ffdc) > which seems to fix it for me. Please take a look, seems like a pretty > critical issue to me. > I have forked the dtests for it as well. However, since trunk has some > failures already, I'm not fully sure how to infer the results. 
> http://cassci.datastax.com/view/Dev/view/sharvanath/job/sharvanath-fixScan-dtest/ > http://cassci.datastax.com/view/Dev/view/sharvanath/job/sharvanath-fixScan-testall/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12101) DESCRIBE INDEX: missing quotes for case-sensitive index name
[ https://issues.apache.org/jira/browse/CASSANDRA-12101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12101: - Assignee: Stefania > DESCRIBE INDEX: missing quotes for case-sensitive index name > > > Key: CASSANDRA-12101 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12101 > Project: Cassandra > Issue Type: Bug >Reporter: Julien >Assignee: Stefania >Priority: Minor > Labels: cqlsh, lhf > > Create a custom index with a case-sensitive name. > The result of the DESCRIBE INDEX command does not have quotes around the > index name. As a result, the index cannot be recreated with this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12101) DESCRIBE INDEX: missing quotes for case-sensitive index name
[ https://issues.apache.org/jira/browse/CASSANDRA-12101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12101: - Labels: cqlsh lhf (was: cqlsh) > DESCRIBE INDEX: missing quotes for case-sensitive index name > > > Key: CASSANDRA-12101 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12101 > Project: Cassandra > Issue Type: Bug >Reporter: Julien >Priority: Minor > Labels: cqlsh, lhf > > Create a custom index with a case-sensitive name. > The result of the DESCRIBE INDEX command does not have quotes around the > index name. As a result, the index cannot be recreated with this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
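The fix cqlsh needs here is the standard CQL identifier-quoting rule: emit the name bare only when it is already a valid unquoted (all-lowercase) identifier, and double-quote it otherwise, doubling any embedded quotes. A hedged sketch of that rule (the class and method names are invented; cqlsh has its own helper for this):

```java
public class CqlIdentifier {
    // Hypothetical helper mirroring the quoting rule DESCRIBE output should apply.
    public static String maybeQuote(String name) {
        // Unquoted identifiers are case-insensitive and stored lowercase,
        // so only names of this shape can safely be emitted without quotes.
        if (name.matches("[a-z_][a-z0-9_]*")) {
            return name;
        }
        // Quote everything else, escaping embedded double quotes by doubling.
        return '"' + name.replace("\"", "\"\"") + '"';
    }
}
```

With this rule, a case-sensitive index name like {{MyIndex}} round-trips: the DESCRIBE output {{"MyIndex"}} can be fed back to CREATE INDEX unchanged.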
[jira] [Commented] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358724#comment-15358724 ] Sylvain Lebresne commented on CASSANDRA-12075: -- bq. The only thing is that if we add an exception which we dont want good drivers to not retry, we also need to patch the driver right away for this to work. Well, you have to change the native protocol to add any exception and that require a driver change anyway. > Include whether or not the client should retry the request when throwing a > RequestExecutionException > > > Key: CASSANDRA-12075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12075 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > > Some requests that result in an error should not be retried by the client. > Right now if the client gets an error, it has no way of knowing whether or > not it should retry. We can include an extra field in each > {{RequestExecutionException}} that will indicate whether the client should > retry, retry on a different host, or not retry at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10433) Reduce contention in CompositeType instance interning
[ https://issues.apache.org/jira/browse/CASSANDRA-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-10433: - Resolution: Fixed Fix Version/s: (was: 2.2.x) 2.2.4 2.1.16 Status: Resolved (was: Patch Available) I'm not sure I agree this is critical for 2.1 at this point, but as it's simple enough and has been somewhat vetted on 2.2 by now, I'm not going to argue; committed. [~pauloricardomg] For info however, we don't re-open tickets for which something has been committed and released, which was the case here. Please open another ticket next time in that case. > Reduce contention in CompositeType instance interning > - > > Key: CASSANDRA-10433 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10433 > Project: Cassandra > Issue Type: Improvement > Environment: Cassandra 2.2.1 running on 6 AWS c3.4xlarge nodes, > CentOS 6.6 >Reporter: David Schlosnagle >Assignee: David Schlosnagle >Priority: Minor > Fix For: 2.1.16, 2.2.4 > > Attachments: > 0001-Avoid-contention-in-CompositeType-instance-interning.patch > > > While running some workload tests on Cassandra 2.2.1 and profiling with > flight recorder in a test environment, we have noticed significant contention > on the static synchronized > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) method. > We are seeing threads blocked for 22.828 seconds from a 60 second snapshot > while under a mix of reads and writes from a Thrift based client. > I would propose to reduce contention in > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) by using a > ConcurrentHashMap for the instances cache. 
> {code} > Contention Back Trace > org.apache.cassandra.db.marshal.CompositeType.getInstance(List) > > org.apache.cassandra.db.composites.AbstractCompoundCellNameType.asAbstractType() > org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, boolean) > org.apache.cassandra.db.SuperColumns.getComparatorFor(CFMetaData, > ByteBuffer) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnNames(CFMetaData, > ByteBuffer, Iterable) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnPath(CFMetaData, > ColumnPath) > > org.apache.cassandra.thrift.ThriftValidation.validateColumnOrSuperColumn(CFMetaData, > ByteBuffer, ColumnOrSuperColumn) > > org.apache.cassandra.thrift.ThriftValidation.validateMutation(CFMetaData, > ByteBuffer, Mutation) > > org.apache.cassandra.thrift.CassandraServer.createMutationList(ConsistencyLevel, > Map, boolean) > > org.apache.cassandra.thrift.CassandraServer.batch_mutate(Map, > ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra$Iface, > Cassandra$batch_mutate_args) > > org.apache.cassandra.thrift.ThriftValidation.validateRange(CFMetaData, > ColumnParent, SliceRange) > > org.apache.cassandra.thrift.ThriftValidation.validatePredicate(CFMetaData, > ColumnParent, SlicePredicate) > > org.apache.cassandra.thrift.CassandraServer.get_range_slices(ColumnParent, > SlicePredicate, KeyRange, ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra$Iface, > Cassandra$get_range_slices_args) > > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Object, > TBase) > org.apache.thrift.ProcessFunction.process(int, TProtocol, > TProtocol, Object) > org.apache.thrift.TBaseProcessor.process(TProtocol, > TProtocol) > > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run() > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) > 
java.util.concurrent.ThreadPoolExecutor$Worker.run() > > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(String, > List, ColumnParent, long, SlicePredicate, ConsistencyLevel, ClientState) > > org.apache.cassandra.thrift.CassandraServer.multiget_slice(List, > ColumnParent, SlicePredicate, ConsistencyLevel) > > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Cassandra$Iface, > Cassandra$multiget_slice_args) > > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Object, > TBase) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
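The proposed change boils down to replacing a static synchronized lookup with a lock-free read on a {{ConcurrentHashMap}}. A minimal sketch of that interning pattern, using a plain {{List<String>}} as a stand-in for the real {{List<AbstractType<?>>}} key (names and types here are illustrative, not the actual CompositeType code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CompositeTypeCache {
    // Stand-in for the CompositeType instances cache.
    private static final ConcurrentMap<List<String>, List<String>> instances =
            new ConcurrentHashMap<>();

    // The hot path is a lock-free get; only a cache miss pays for insertion.
    // With a static synchronized method, every call contends on the class lock.
    public static List<String> getInstance(List<String> types) {
        List<String> canonical = instances.get(types);
        if (canonical == null) {
            // Defensive immutable copy so the cached key can't be mutated.
            List<String> copy = Collections.unmodifiableList(
                    Arrays.asList(types.toArray(new String[0])));
            canonical = instances.putIfAbsent(copy, copy);
            if (canonical == null)
                canonical = copy;   // we won the race; our copy is canonical
        }
        return canonical;
    }
}
```

Interning still holds: two equal type lists always resolve to the same canonical instance, but concurrent readers no longer serialize behind a single monitor.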
[jira] [Updated] (CASSANDRA-12090) Digest mismatch if static column is NULL
[ https://issues.apache.org/jira/browse/CASSANDRA-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12090: - Resolution: Fixed Fix Version/s: 3.9 3.0.9 Reproduced In: 3.7, 3.0.7 (was: 3.0.7, 3.7) Status: Resolved (was: Patch Available) You're right, when serializing we ignore the column names if there is no static row so we get a different result pre and post deserialization, and fixing the digest is the most sensible approach. I'll note that in fact, it was misguided to include the column names in the digest in the first place, and I added a comment to that regard, but we'll have to wait on 4.0 and a new protocol version for that. Anyway, pushed CI on the patch and it looked "clean" (the failures were either also on the non-patched branches, or were fairly clearly unrelated and not reproducing locally): | [12090-3.0|https://github.com/pcmanus/cassandra/commits/12090-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-12090-3.0-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-12090-3.0-dtest] | So committed, thanks. > Digest mismatch if static column is NULL > > > Key: CASSANDRA-12090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12090 > Project: Cassandra > Issue Type: Bug >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 3.0.9, 3.9 > > Attachments: 12090.txt, trace.txt > > > If a table has a static column and this column has a null value for a > partition a SELECT on this partition will always trigger a digest mismatch, > but the following full data read will not trigger a read repair since there > is no mismatch in the data. 
> This can be recreated using a 3 node ccm cluster with the following commands: > {code:sql} > CREATE KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', > 'dc1': '3' }; > CREATE TABLE foo.foo ( key int, foo int, col int static, PRIMARY KEY (key, > foo) ); > CONSISTENCY QUORUM; > INSERT INTO foo.foo (key, foo) VALUES ( 1,1); > TRACING ON; > SELECT * FROM foo.foo WHERE key = 1 and foo =1; > {code} > I have added the trace in an attachment. In the trace you can see that digest > read is performed and that there is a digest mismatch, but the full data read > does not result in a mismatch. Repeating the SELECT statement will give the > same trace over and over. > The problem seems to be that the name of the static column is included when > the digest response is calculated even if the column has no value. When the > digest for the data response is calculated the column name is not included. > I think this can be solved by updating {{UnfilteredRowIterators.digest()}} so > it excludes the static column if it has no value. I have a patch that does this, > it merges to both 3.0 and trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
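The shape of the fix is easy to state: only mix the queried static column names into the digest when the static row actually carries data, so a replica that round-tripped the (empty) static row through serialization computes the same digest as one that never had it. A hedged sketch of that idea (not the actual {{UnfilteredRowIterators.digest()}} code; names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class StaticDigestSketch {
    // Simplified: the real digest also mixes in cells, deletion info,
    // row markers, etc. Only the static-row handling is shown.
    public static byte[] digest(List<String> queriedStaticColumns,
                                boolean staticRowHasLiveData) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Skip the column names entirely when the static row is empty,
            // matching what serialization effectively drops on the wire.
            if (staticRowHasLiveData) {
                for (String name : queriedStaticColumns)
                    md.update(name.getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }
}
```

With the guard in place, a NULL static column contributes nothing to the digest on either side of the wire, so the spurious mismatch (and the pointless full data read on every query) disappears.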
[jira] [Updated] (CASSANDRA-12090) Digest mismatch if static column is NULL
[ https://issues.apache.org/jira/browse/CASSANDRA-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12090: - Component/s: Streaming and Messaging > Digest mismatch if static column is NULL > > > Key: CASSANDRA-12090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12090 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 3.0.9, 3.9 > > Attachments: 12090.txt, trace.txt > > > If a table has a static column and this column has a null value for a > partition a SELECT on this partition will always trigger a digest mismatch, > but the following full data read will not trigger a read repair since there > is no mismatch in the data. > This can be recreated using a 3 node ccm cluster with the following commands: > {code:sql} > CREATE KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', > 'dc1': '3' }; > CREATE TABLE foo.foo ( key int, foo int, col int static, PRIMARY KEY (key, > foo) ); > CONSISTENCY QUORUM; > INSERT INTO foo.foo (key, foo) VALUES ( 1,1); > TRACING ON; > SELECT * FROM foo.foo WHERE key = 1 and foo =1; > {code} > I have added the trace in an attachment. In the trace you can see that digest > read is performed and that there is a digest mismatch, but the full data read > does not result in a mismatch. Repeating the SELECT statement will give the > same trace over and over. > The problem seems to be that the name of the static column is included when > the digest response is calculated even if the column has no value. When the > digest for the data response is calculated the column name is not included. > I think this can be solved by updating {{UnfilteredRowIterators.digest()}} so > it excludes the static column if it has no value. I have a patch that does this, > it merges to both 3.0 and trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357189#comment-15357189 ] Sylvain Lebresne commented on CASSANDRA-12075: -- bq. By driver you mean Java Driver, Python driver, etc and by client you mean application code talking to C* right? Generally yes, though I tend to use both interchangeably since it's kind of the same as far as the server is concerned. bq. Say we send out TombstoneOverwhelmingException to driver, it should definitely not retry on its own no matter what retry policy the client code provide. Sure, but it's easy enough to just document that in the spec. Shipping a boolean doesn't really achieve anything since a crappy driver implementation can still ignore whatever we send, and a good implementation likely won't get that wrong in the first place. bq. Same example can be made with some other type of throttles as well Well, that's not even in yet, so let's not anticipate too much. Besides, on a "throttle exception", I would disagree that there is a single "right" course of action. Maybe trying the next node is good for some clients, but others may prefer just throttling themselves as a result. bq. I am fine closing this JIRA and making these part of exceptions which driver can interpret and will not retry? Not sure I understand what you mean by that. But what I would personally do on that whole question is: # add to write timeout exceptions whether the query was idempotent or not, as that's important information regarding retries that would be useful to drivers but that they don't easily have. # improve the protocol spec (and/or maybe write a "recommendations for driver authors" doc) to specify, for each exception we can return, what can sensibly be done about it (and what isn't sensible). 
But shipping some kind of enum telling drivers what they *must* do will imo be too limiting for most exceptions, and we can't guarantee drivers will respect them anyway (so why ship it on the wire every time? Let's just document it). > Include whether or not the client should retry the request when throwing a > RequestExecutionException > > > Key: CASSANDRA-12075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12075 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > > Some requests that result in an error should not be retried by the client. > Right now if the client gets an error, it has no way of knowing whether or > not it should retry. We can include an extra field in each > {{RequestExecutionException}} that will indicate whether the client should > retry, retry on a different host, or not retry at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
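To make the "document it, don't ship it" position concrete: the server sends only the error code (plus, per the suggestion above, an idempotence flag on write timeouts), and the driver hard-codes the spec's recommendation per error. A sketch of what that driver-side mapping could look like (the enum and method names are invented for illustration; real drivers expose richer retry policies):

```java
public class RetryAdvisor {
    public enum Decision { RETRY_SAME_HOST, RETRY_NEXT_HOST, RETHROW }

    // Per-error recommendations a driver could derive from the protocol
    // spec, rather than reading a mandatory flag off the wire.
    public static Decision onWriteTimeout(boolean idempotent) {
        // Retrying a non-idempotent write after a timeout risks applying it twice.
        return idempotent ? Decision.RETRY_NEXT_HOST : Decision.RETHROW;
    }

    public static Decision onUnavailable() {
        // Another coordinator may see a different set of live replicas.
        return Decision.RETRY_NEXT_HOST;
    }

    public static Decision onTombstoneOverwhelmed() {
        // Deterministic server-side failure: retrying anywhere won't help.
        return Decision.RETHROW;
    }
}
```

The point of the sketch is that each decision lives in driver code where applications can override it, instead of being a wire-level mandate the server cannot enforce anyway.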
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Reviewer: Aleksey Yeschenko (was: Sylvain Lebresne) > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Sylvain Lebresne > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reassigned CASSANDRA-11733: Assignee: Sylvain Lebresne (was: Dave Brosius) > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Sylvain Lebresne > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Priority: Major (was: Trivial) > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Dave Brosius > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Component/s: (was: Core) Local Write-Read Paths > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Dave Brosius > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Issue Type: Bug (was: Improvement) > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Fix Version/s: 3.0.x > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Dave Brosius >Assignee: Dave Brosius > Fix For: 3.0.x, 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11733) SSTableReversedIterator ignores range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11733: - Summary: SSTableReversedIterator ignores range tombstones (was: cleanup unused tombstone collection in SSTableReversedIterator) > SSTableReversedIterator ignores range tombstones > > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11733) cleanup unused tombstone collection in SSTableReversedIterator
[ https://issues.apache.org/jira/browse/CASSANDRA-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357158#comment-15357158 ] Sylvain Lebresne commented on CASSANDRA-11733: -- It's definitely a problem that it's unused, but it's a bug rather than a cleanup, as it should be used. As it stands, reverse iteration just ignores range tombstones when sstables are hit. I'm attaching the (trivial) fix below with a simple unit test showing the issue. | [11733-3.0|https://github.com/pcmanus/cassandra/commits/11733-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-11733-3.0-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-11733-3.0-dtest] | | [11733-trunk|https://github.com/pcmanus/cassandra/commits/11733-trunk] | [utests|http://cassci.datastax.com/job/pcmanus-11733-trunk-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-11733-trunk-dtest] | > cleanup unused tombstone collection in SSTableReversedIterator > -- > > Key: CASSANDRA-11733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11733 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 3.x > > Attachments: remove_delete.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12112) Tombstone histogram not accounting for partition deletions
[ https://issues.apache.org/jira/browse/CASSANDRA-12112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356897#comment-15356897 ] Sylvain Lebresne commented on CASSANDRA-12112: -- Don't know if that qualifies as critical (since it's only for 2.1 and 2.2), but it's trivial enough that I'm fine with it. +1 assuming CI doesn't show a problem. > Tombstone histogram not accounting for partition deletions > -- > > Key: CASSANDRA-12112 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12112 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 2.1.x, 2.2.x > > > we need to account for top level deletions in the tombstone histogram -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356881#comment-15356881 ] Sylvain Lebresne commented on CASSANDRA-10707: -- Sorry for the long iterations on the reviews. I still have a bunch of remarks, though a lot are fairly minor. In general, I "think" the general logic is fine, but it's still a bit hard to wrap your head around so I'm also listing things that are unclear to me, for which some extra-commenting might be just what's missing. Anyway, here we go: * On {{GroupMaker.State}}: ** I'd promote it to a top-level {{GroupingState}} class since it's used in {{DataLimits}} too and would have a nice symmetry with {{PagingState}}. ** Any reason not to reuse the {{Clustering.Serializer}} in the serializer (it probably requires keeping the CFMetaData or ClusteringComparator around for the types but not a big deal)? In particular, the hand-rolled serialization doesn't handle nulls (and should). ** I think it would be useful to spell out in more detail what having or not having each component means. My understanding is that if {{partitionKey() != null}} and {{clustering == null}}, it means we haven't started counting the partition at all (meaning that seeing that partition in the previous page made us close a group and subsequently hit the page limit, hence stopping). But if {{clustering != null}}, it means we stopped the previous page in the middle of a group. But I'm not sure I'm fully correct (maybe I'm missing some edge cases in particular?) and documenting this would be nice. ** Nit: The comment on the {{clustering()}} method is incorrect (talks about "partition key"). * In general, the handling of the static column is a bit subtle, and it would be useful to have a comment somewhere that explains its general handling in detail. For instance, the first case of {{GroupByAwareCounter.applyToPartition()}} slightly contradicts my understanding of what {{state.partitionKey()}} means. 
That is, I thought that if reaching a new partition {{X}} makes us close a group and hit a page limit, then we'll stop and have {{X}} as the partitionKey in the state. However, on the next page, it seems we assume we've somehow already accounted for the static row and I'm unclear why. The code in {{applyToStatic}} is also bemusing to me: to start with, what is the subtle difference between the first condition ({{!row.isEmpty() && (assumeLiveData || row.hasLiveData(nowInSec))}}) and the second one ({{hasLiveStaticRow}})? * In {{DataResolver}}, why not just call {{rowCounted()}} unconditionally (rather than adding helper methods)? * In {{PkPrefixGroupMaker}}, instead of dealing with a ByteBuffer array, I'd just keep the {{Clustering}} of the last row seen (null initially), and the prefix size. We can then even delegate the check of whether the 2 clusterings belong to the same group to some {{Clustering}} method. On that front, we shouldn't use {{ByteBuffer.equals}} but rather the {{ClusteringComparator}} types. It happens that some types' equality does not align with byte equality (ok, one type, IntegerType, which allows as many zeroed bytes as a prefix without changing the actual value). * In {{DataLimits.CQLGroupByLimits}} the silent assumption that {{filter()}} should only be called on replicas, and {{newCounter()}} on the coordinator, feels pretty error-prone for the caller (I haven't even carefully checked it's the case tbh). One option would be to rename the methods to {{filterOnReplica()}} and {{newCounterOnCoordinator()}} but even that feels a bit weird/limiting. 
I would prefer removing the {{onReplica}} flag completely and handling that in deserialization: we can assume when deserializing a {{DataLimits}} that we are on the {{replica}} (it's arguably still a somewhat silent assumption, but one that feels safer to me; we have no reason to serialize a {{ReadCommand}} except to send it to replicas, while having a need to call {{filter()}} on the coordinator or {{newCounter}} on replicas in the future sounds very plausible), so we could just call some {{withoutClustering()}} method on the {{State}} if it's a range query. Another more explicit alternative could be to add some {{onReplica()}} method to {{ReadCommand}} that we'd call in {{ReadCommandVerbHandler}} (where we know for sure we are on a replica) that would return a modified {{ReadCommand}} that would have a modified {{DataLimits}}. That said, it's probably a bit more code just for that, so I think I'm personally fine with the deserializer option. * In {{GroupByAwareCounter}}: ** in {{applyToPartition}}, in the comment starting with {{partitionKey is the last key for which we're returned rows}}, I believe it should read {{state.partitionKey() is the last ...}}. ** at end of {{applyToPartition}}, why the {{!isDone()}} test? It seems that since you just changed the
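The {{PkPrefixGroupMaker}} suggestion above amounts to keeping the last clustering seen plus the prefix size, and comparing only the grouped prefix. A sketch with {{List<String>}} standing in for {{Clustering}} (real code must compare via the {{ClusteringComparator}} types rather than raw bytes, since for {{IntegerType}} byte-equality and value-equality differ):

```java
import java.util.List;

public class PrefixGroupMaker {
    private final int prefixSize;       // number of leading components that define a group
    private List<String> last;          // clustering of the last row seen, null initially

    public PrefixGroupMaker(int prefixSize) {
        this.prefixSize = prefixSize;
    }

    // Returns true when the given clustering starts a new GROUP BY group,
    // i.e. its grouped prefix differs from the previous row's prefix.
    public boolean isNewGroup(List<String> clustering) {
        boolean isNew = last == null
                || !clustering.subList(0, prefixSize).equals(last.subList(0, prefixSize));
        last = clustering;
        return isNew;
    }
}
```

With {{prefixSize = 1}}, rows {{(a,1)}}, {{(a,2)}}, {{(b,1)}} yield new-group answers true, false, true: the second row shares the {{a}} prefix, the third does not.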
[jira] [Updated] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-10707: - Status: Awaiting Feedback (was: Open) > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > Now that Cassandra support aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-10707: - Fix Version/s: 3.x Status: Open (was: Patch Available) > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11820) Altering a column's type causes EOF
[ https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11820: - Resolution: Fixed Fix Version/s: (was: 3.0.x) (was: 3.x) 3.9 3.0.9 Status: Resolved (was: Patch Available) Fixed the test logic and re-ran CI on both 3.0 and 3.9 branch (skipped trunk because it's == to 3.9 at this point): | [11820-3.0|https://github.com/pcmanus/cassandra/commits/11820-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-11820-3.0-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-11820-3.0-dtest] | | [11820-3.9|https://github.com/pcmanus/cassandra/commits/11820-3.9] | [utests|http://cassci.datastax.com/job/pcmanus-11820-3.9-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-11820-3.9-dtest] | Doesn't seem to have any failure related to the patch, so committed. Thanks. > Altering a column's type causes EOF > --- > > Key: CASSANDRA-11820 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11820 > Project: Cassandra > Issue Type: Bug >Reporter: Carl Yeksigian >Assignee: Sylvain Lebresne > Fix For: 3.0.9, 3.9 > > > While working on CASSANDRA-10309, I was testing altering columns' types. This > series of operations fails: > {code} > CREATE TABLE test (a int PRIMARY KEY, b int) > INSERT INTO test (a, b) VALUES (1, 1) > ALTER TABLE test ALTER b TYPE BLOB > SELECT * FROM test WHERE a = 1 > {code} > Tried this on 3.0 and trunk, both fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12044) Materialized view definition regression in clustering key
[ https://issues.apache.org/jira/browse/CASSANDRA-12044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12044: - Status: Ready to Commit (was: Patch Available) > Materialized view definition regression in clustering key > - > > Key: CASSANDRA-12044 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12044 > Project: Cassandra > Issue Type: Bug >Reporter: Michael Mior >Assignee: Carl Yeksigian > > This bug was reported on the > [users|https://mail-archives.apache.org/mod_mbox/cassandra-user/201606.mbox/%3CCAG0vsSJRtRjLJqKsd3M8X-8nXpPwRj7Q80mNkuy8sy%2B%2B%3D%2BocHA%40mail.gmail.com%3E] > mailing list. The following definitions work in 3.0.3 but fail in 3.0.7. > {code} > CREATE TABLE ks.pa ( > id bigint, > sub_id text, > name text, > class text, > r_id bigint, > k_id bigint, > created timestamp, > priority int, > updated timestamp, > value text, > PRIMARY KEY (id, sub_id, name) > ); > CREATE MATERIALIZED VIEW ks.mv_pa AS > SELECT k_id, name, value, sub_id, id, class, r_id > FROM ks.pa > WHERE k_id IS NOT NULL AND name IS NOT NULL AND value IS NOT NULL AND > sub_id IS NOT NULL AND id IS NOT NULL > PRIMARY KEY ((k_id, name), value, sub_id, id); > {code} > After running bisect, I've narrowed it down to commit > [86ba227|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=86ba227477b9f8595eb610ecaf950cfbc29dd36b] > from [CASSANDRA-11475|https://issues.apache.org/jira/browse/CASSANDRA-11475]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12044) Materialized view definition regression in clustering key
[ https://issues.apache.org/jira/browse/CASSANDRA-12044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356763#comment-15356763 ] Sylvain Lebresne commented on CASSANDRA-12044: -- +1 > Materialized view definition regression in clustering key > - > > Key: CASSANDRA-12044 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12044 > Project: Cassandra > Issue Type: Bug >Reporter: Michael Mior >Assignee: Carl Yeksigian > > This bug was reported on the > [users|https://mail-archives.apache.org/mod_mbox/cassandra-user/201606.mbox/%3CCAG0vsSJRtRjLJqKsd3M8X-8nXpPwRj7Q80mNkuy8sy%2B%2B%3D%2BocHA%40mail.gmail.com%3E] > mailing list. The following definitions work in 3.0.3 but fail in 3.0.7. > {code} > CREATE TABLE ks.pa ( > id bigint, > sub_id text, > name text, > class text, > r_id bigint, > k_id bigint, > created timestamp, > priority int, > updated timestamp, > value text, > PRIMARY KEY (id, sub_id, name) > ); > CREATE MATERIALIZED VIEW ks.mv_pa AS > SELECT k_id, name, value, sub_id, id, class, r_id > FROM ks.pa > WHERE k_id IS NOT NULL AND name IS NOT NULL AND value IS NOT NULL AND > sub_id IS NOT NULL AND id IS NOT NULL > PRIMARY KEY ((k_id, name), value, sub_id, id); > {code} > After running bisect, I've narrowed it down to commit > [86ba227|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=86ba227477b9f8595eb610ecaf950cfbc29dd36b] > from [CASSANDRA-11475|https://issues.apache.org/jira/browse/CASSANDRA-11475]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354742#comment-15354742 ] Sylvain Lebresne commented on CASSANDRA-12075: -- bq. If you get a Unavailable from co-ordinator, you should retry it on another host since this co-ordinator could be doing a long GC pause Well, no. You *should* not, you *may*. An unavailable exception can also mean that there are genuinely not enough live nodes to perform the query (that's even the true original intent, even though a long GC pause can be indistinguishable from a genuinely dead node), in which case it makes no particular sense to retry. And that's kind of my point: what to do in that kind of situation is and should be client dependent, so I'm not a fan of "dictating" a behavior server-side. So I'm all for adding more info to the exceptions so clients can have as much useful data as possible to make the decision (hence returning whether the query is {{idempotent}}, which is objective data), but I disagree that we should decide what should be done. And in the case of an unavailable exception, for instance, I don't see any more objective info we could send right now. bq. This will be a better approach than driver special casing which exception to retry or not. I disagree. I think it's exactly the responsibility of the driver to decide that kind of thing. Rather, as hinted above, I think it's the client that should decide what it prefers and the driver should provide enough flexibility for the client to do what it wants. I'll also note that whatever we send, the driver will be at liberty to ignore it, so we don't win much by including such a "recommendation" in the protocol itself. I'm happy however to improve the protocol spec to provide recommendations. 
> Include whether or not the client should retry the request when throwing a > RequestExecutionException > > > Key: CASSANDRA-12075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12075 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > > Some requests that result in an error should not be retried by the client. > Right now if the client gets an error, it has no way of knowing whether or > not it should retry. We can include an extra field in each > {{RequestExecutionException}} that will indicate whether the client should > retry, retry on a different host, or not retry at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
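The point made in the comment above, that the server should ship objective facts (such as an {{idempotent}} flag) while the client chooses the policy, can be sketched as a client-configurable retry policy. All names below are hypothetical, loosely modeled on the retry-policy pattern common in Cassandra drivers, not an actual driver API:

```python
from enum import Enum

class RetryDecision(Enum):
    RETRY_NEXT_HOST = "retry on another host"
    RETHROW = "surface the error to the application"

def on_unavailable(idempotent: bool, retry_count: int) -> RetryDecision:
    # Nothing was written, so one retry elsewhere is safe: the coordinator
    # may just have a stale view of the cluster (e.g. after a long GC pause),
    # but if nodes are genuinely down the retry will fail quickly anyway.
    return RetryDecision.RETRY_NEXT_HOST if retry_count == 0 else RetryDecision.RETHROW

def on_write_timeout(idempotent: bool, retry_count: int) -> RetryDecision:
    # A timed-out write may already have been applied; only retry it if the
    # application has declared the statement idempotent.
    if idempotent and retry_count == 0:
        return RetryDecision.RETRY_NEXT_HOST
    return RetryDecision.RETHROW
```

Because the policy lives in the client, an application that prefers to fail fast can swap in one that always rethrows, which is exactly the flexibility a server-side recommendation could not provide.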
[jira] [Resolved] (CASSANDRA-12043) Syncing most recent commit in CAS across replicas can cause all CAS queries in the CQL partition to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-12043. -- Resolution: Fixed Reviewer: Jason Brown Fix Version/s: 3.9 3.0.9 2.2.7 2.1.15 I still needed to merge the branch upwards and wait on CI results for all branches. This is now done and tests seem "fine" (no failure appears related) so committed. Thanks. | [2.1|https://github.com/pcmanus/cassandra/commits/12043-2.1] | [utests|http://cassci.datastax.com/job/pcmanus-12043-2.1-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-12043-2.1-dtest/] || | [2.2|https://github.com/pcmanus/cassandra/commits/12043-2.2] | [utests|http://cassci.datastax.com/job/pcmanus-12043-2.2-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-12043-2.2-dtest/] || | [3.0|https://github.com/pcmanus/cassandra/commits/12043-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-12043-3.0-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-12043-3.0-dtest/] || | [3.9|https://github.com/pcmanus/cassandra/commits/12043-3.9] | [utests|http://cassci.datastax.com/job/pcmanus-12043-3.9-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-12043-3.9-dtest/] || > Syncing most recent commit in CAS across replicas can cause all CAS queries > in the CQL partition to fail > > > Key: CASSANDRA-12043 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12043 > Project: Cassandra > Issue Type: Bug >Reporter: sankalp kohli >Assignee: Sylvain Lebresne > Fix For: 2.1.15, 2.2.7, 3.0.9, 3.9 > > > We update the most recent commit on requiredParticipant replicas if out of > sync during the prepare round in the beginAndRepairPaxos method. We keep doing > this in a loop till the requiredParticipant replicas have the same most > recent commit or we hit the timeout. > Say we have 3 machines A, B and C and gc grace on the table is 10 days. We do > a CAS write at time 0 and it went to A and B but not to C. 
C will get the > hint later but will not update the most recent commit in the paxos table. This is > how CAS hints work. > In the paxos table, whose gc_grace=0, most_recent_commit in A and B will be > inserted with timestamp 0 and with a TTL of 10 days. After 10 days, this > insert will become a tombstone at time 0 till it is compacted away since > gc_grace=0. > Do a CAS read after, say, 1 day on the same CQL partition and this time the prepare > phase involved A and C. most_recent_commit on C for this CQL partition is > empty. A sends the most_recent_commit to C with a timestamp of 0 and with a > TTL of 10 days. This most_recent_commit on C will expire on the 11th day since it > is inserted after 1 day. > most_recent_commit are now in sync on A, B and C, however A and B > most_recent_commit will expire on the 10th day whereas for C it will expire on the > 11th day since it was inserted one day later. > Do another CAS read after 10 days when most_recent_commit on A and B have > expired and are treated as tombstones till compacted. In this CAS read, say A > and C are involved in the prepare phase. most_recent_commit will not match > between them since it is expired in A and is still there on C. This will > cause most_recent_commit to be applied to A with a timestamp of 0 and TTL of > 10 days. If A has not compacted away the original most_recent_commit which > has expired, this new write to most_recent_commit won't be visible on reads > since there is a tombstone with the same timestamp (Delete wins over data with the > same timestamp). > Another round of prepare will follow and again A would say it does not know > about most_recent_write (covered by the original write which is not a tombstone) > and C will again try to send the write to A. This can keep going on till the > request times out or only A and B are involved in the prepare phase. > When A’s original most_recent_commit which is now a tombstone is compacted, > all the inserts which it was covering will come live. 
This will in turn again > get played to another replica. This ping pong can keep going on for a long > time. > The issue is that most_recent_commit is expiring at different times across > replicas. When it gets replayed to a replica to make it in sync, we again > set the TTL from that point. > During the CAS read which timed out, most_recent_commit was being sent to > another replica in a loop. Even in successful requests, it will try to loop > a couple of times if involving A and C, and then when the replicas which > respond are A and B, it will succeed. So this will have an impact on latencies > as well. > These timeouts get worse when a machine is down as no progress can be made >
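The expiry arithmetic in the scenario above can be sketched in a few lines. This is a toy model at day granularity, using the numbers from the report (a day-1 replay and a 10-day TTL), not Cassandra code:

```python
# Assumption from the report: replaying most_recent_commit sets a *fresh*
# TTL at replay time, while the commit's own timestamp stays at 0.
TTL_DAYS = 10  # gc_grace is 0, so an expired commit becomes a tombstone

def expiry_day(write_or_replay_day: int) -> int:
    """The commit expires TTL_DAYS after it was (re)written on a replica."""
    return write_or_replay_day + TTL_DAYS

expiry_a = expiry_b = expiry_day(0)  # original CAS write lands on A and B
expiry_c = expiry_day(1)             # prepare at day 1 replays it to C
# Between day 10 and day 11, A/B hold a tombstone while C still holds the
# commit, so each prepare round finds the replicas out of sync again.
divergence_window = expiry_c - expiry_a
```

Each replay restarts the clock for one replica only, which is why the mismatch window keeps reopening instead of converging.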
[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval
[ https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352747#comment-15352747 ] Sylvain Lebresne commented on CASSANDRA-11349: -- Had a look here, and I'm more comfortable with sticking to [~blambov]'s approach. For 2.1 and 2.2, we're now in "only critical bug fixes" and running things through RTL definitely changes things too much for my comfort. That implies I'm fine not fixing every possible problem if that gets us too far (especially since it's properly fixed in 3.0 and not that many people seem to have reported this). And Branimir's approach seems to be making a good enough impact in practice. So [~blambov], could you rebase your patch for 2.1 and 2.2 and run CI? After which, if tests are good, I'm +1 committing. > MerkleTree mismatch when multiple range tombstones exists for the same > partition and interval > - > > Key: CASSANDRA-11349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11349 > Project: Cassandra > Issue Type: Bug >Reporter: Fabien Rousseau >Assignee: Stefan Podkowinski > Labels: repair > Fix For: 2.1.x, 2.2.x > > Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, > 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch > > > We observed that repair, for some of our clusters, streamed a lot of data and > many partitions were "out of sync". > Moreover, the read repair mismatch ratio is around 3% on those clusters, > which is really high. > After investigation, it appears that, if two range tombstones exist for a > partition for the same range/interval, they're both included in the merkle > tree computation. > But, if for some reason, on another node, the two range tombstones were > already compacted into a single range tombstone, this will result in a merkle > tree difference. 
> Currently, this is clearly bad because MerkleTree differences are dependent > on compactions (and if a partition is deleted and created multiple times, the > only way to ensure that repair "works correctly"/"doesn't overstream data" is > to major compact before each repair... which is not really feasible). > Below is a list of steps allowing you to easily reproduce this case: > {noformat} > ccm create test -v 2.1.13 -n 2 -s > ccm node1 cqlsh > CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 2}; > USE test_rt; > CREATE TABLE IF NOT EXISTS table1 ( > c1 text, > c2 text, > c3 float, > c4 float, > PRIMARY KEY ((c1), c2) > ); > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > # now flush only one of the two nodes > ccm node1 flush > ccm node1 cqlsh > USE test_rt; > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > ccm node1 repair > # now grep the log and observe that there were some inconsistencies detected > between nodes (while it shouldn't have detected any) > ccm node1 showlog | grep "out of sync" > {noformat} > Consequences of this are a costly repair, accumulating many small SSTables > (up to thousands for a rather short period of time when using VNodes, the > time for compaction to absorb those small files), but also an increased size > on disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
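The root cause described above, hashing two identical un-compacted range tombstones versus their merged single-tombstone form, can be illustrated with a toy digest. This is an illustration only, not Cassandra's actual validation-compaction code:

```python
import hashlib

def merkle_leaf(range_tombstones):
    # Hash each range-tombstone fragment (start, end, timestamp) in turn,
    # the way a per-fragment digest during validation would see them.
    h = hashlib.md5()
    for start, end, timestamp in range_tombstones:
        h.update(f"{start}:{end}:{timestamp}".encode())
    return h.hexdigest()

# Node 1: the same deletion issued twice, sstables not yet compacted together.
node1 = merkle_leaf([("a", "b", 5), ("a", "b", 5)])
# Node 2: already compacted into a single range tombstone.
node2 = merkle_leaf([("a", "b", 5)])
assert node1 != node2  # logically identical data, mismatching Merkle leaves
```

Since both nodes hold the same logical deletion, any fix has to normalize (merge) equal tombstones before they are fed into the tree, which is what the attached patches aim at.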
[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo
[ https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351581#comment-15351581 ] Sylvain Lebresne commented on CASSANDRA-8700: - As we wanted to have the new doc (even if incomplete) for 3.8 and we're about to freeze 3.8, I committed the branch. I'm sure everyone will agree this is better than what we have. Note that this is just so that some doc is included with the 3.8 artifacts, but I'll only push the doc online next week (with the release of 3.8) and we can still add/improve things till then. So I suggest keeping this ticket open until next week, and if someone has updates to the doc he wants to suggest, he can just put it here and I'll include it directly. Once 3.8 is out, I'll close this ticket and we can start using new tickets for new contributions normally. > replace the wiki with docs in the git repo > -- > > Key: CASSANDRA-8700 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8700 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jon Haddad >Assignee: Sylvain Lebresne >Priority: Blocker > Fix For: 3.8 > > Attachments: TombstonesAndGcGrace.md, bloom_filters.md, > compression.md, contributing.zip, getting_started.zip, hardware.md > > > The wiki as it stands is pretty terrible. It takes several minutes to apply > a single update, and as a result, it's almost never updated. The information > there has very little context as to what version it applies to. Most people > I've talked to that try to use the information they find there find it is > more confusing than helpful. > I'd like to propose that instead of using the wiki, the doc directory in the > cassandra repo be used for docs (already used for CQL3 spec) in a format that > can be built to a variety of output formats like HTML / epub / etc. I won't > start the bikeshedding on which markup format is preferable - but there are > several options that can work perfectly fine. 
I've personally used sphinx w/ > restructured text, and markdown. Both can build easily and as an added bonus > be pushed to readthedocs (or something similar) automatically. For an > example, see cqlengine's documentation, which I think is already > significantly better than the wiki: > http://cqlengine.readthedocs.org/en/latest/ > In addition to being overall easier to maintain, putting the documentation in > the git repo adds context, since it evolves with the versions of Cassandra. > If the wiki were kept even remotely up to date, I wouldn't bother with this, > but not having at least some basic documentation in the repo, or anywhere > associated with the project, is frustrating. > For reference, the last 3 updates were: > 1/15/15 - updating committers list > 1/08/15 - updating contributors and how to contribute > 12/16/14 - added a link to CQL docs from wiki frontpage (by me) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11820) Altering a column's type causes EOF
[ https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11820: - Status: Patch Available (was: Open) Attaching patch for this below. All it does is make sure we use the "current" version of a ColumnDefinition when serializing a cell. That said, I'm not absolutely satisfied with this: while it's simple in principle, it's not all that easy to follow why the definition of the cell itself may not be up-to-date, and that probably makes things a bit error-prone (which, btw, is not particularly new to this patch). Not sure what a better alternative is right now, though; we probably need a cleaner handling of metadata changes in general. But it's not worth leaving that bug in while we think this through, so I suggest the patch below is good enough for now. || [3.0|https://github.com/pcmanus/cassandra/commits/11820-3.0] || [utests|http://cassci.datastax.com/job/pcmanus-11820-3.0-testall/] || [dtests|http://cassci.datastax.com/job/pcmanus-11820-3.0-dtest/] || (I'll push a trunk branch on CI later, but waiting on making sure tests pass on 3.0 first. Besides, I think it'll merge cleanly) > Altering a column's type causes EOF > --- > > Key: CASSANDRA-11820 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11820 > Project: Cassandra > Issue Type: Bug >Reporter: Carl Yeksigian >Assignee: Sylvain Lebresne > Fix For: 3.0.x, 3.x > > > While working on CASSANDRA-10309, I was testing altering columns' types. This > series of operations fails: > {code} > CREATE TABLE test (a int PRIMARY KEY, b int) > INSERT INTO test (a, b) VALUES (1, 1) > ALTER TABLE test ALTER b TYPE BLOB > SELECT * FROM test WHERE a = 1 > {code} > Tried this on 3.0 and trunk, both fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12090) Digest mismatch if static column is NULL
[ https://issues.apache.org/jira/browse/CASSANDRA-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12090: - Reproduced In: 3.0.7, 3.7 (was: 3.7, 3.0.7) Reviewer: Sylvain Lebresne > Digest mismatch if static column is NULL > > > Key: CASSANDRA-12090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12090 > Project: Cassandra > Issue Type: Bug >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Attachments: 12090.txt, trace.txt > > > If a table has a static column and this column has a null value for a > partition, a SELECT on this partition will always trigger a digest mismatch, > but the following full data read will not trigger a read repair since there > is no mismatch in the data. > This can be recreated using a 3 node ccm cluster with the following commands: > {code:sql} > CREATE KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', > 'dc1': '3' }; > CREATE TABLE foo.foo ( key int, foo int, col int static, PRIMARY KEY (key, > foo) ); > CONSISTENCY QUORUM; > INSERT INTO foo.foo (key, foo) VALUES ( 1,1); > TRACING ON; > SELECT * FROM foo.foo WHERE key = 1 and foo =1; > {code} > I have added the trace in an attachment. In the trace you can see that a digest > read is performed and that there is a digest mismatch, but the full data read > does not result in a mismatch. Repeating the SELECT statement will give the > same trace over and over. > The problem seems to be that the name of the static column is included when > the digest response is calculated even if the column has no value. When the > digest for the data response is calculated the column name is not included. > I think this can be solved by updating {{UnfilteredRowIterators.digest()}} so it > excludes the static column if it has no value. I have a patch that does this, > it merges to both 3.0 and trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
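The asymmetry described in the report can be illustrated with a toy digest computation. This is an assumption-laden sketch, not the actual {{UnfilteredRowIterators.digest()}} logic:

```python
import hashlib

def response_digest(columns, include_empty_names):
    # Toy model: the digest path hashes the static column's *name* even
    # when the column has no value, while the data path skips it entirely.
    h = hashlib.md5()
    for name, value in columns.items():
        if value is None:
            if include_empty_names:
                h.update(name.encode())  # digest response: name leaks in
            continue  # data response: column absent entirely
        h.update(name.encode())
        h.update(str(value).encode())
    return h.hexdigest()

row = {"key": 1, "foo": 1, "col": None}  # static column never written
digest_read = response_digest(row, include_empty_names=True)
data_read = response_digest(row, include_empty_names=False)
assert digest_read != data_read  # mismatch repeats on every SELECT
```

Because the two code paths disagree on whether the valueless static column contributes to the hash, the mismatch is permanent: no read repair can ever make the digests converge, matching the trace in the attachment.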
[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo
[ https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348556#comment-15348556 ] Sylvain Lebresne commented on CASSANDRA-8700: - Friday update: the [branch|https://github.com/pcmanus/cassandra/commits/doc_in_tree] has all the parts that have been submitted so far. I also (mostly) finished migrating the CQL doc (even though there are still some parts that I plan to improve) and I reorganized the doc slightly. Typically, having the whole CQL doc in the same file (be it source file or html output) was really unwieldy, and that was kind of true of the other top-level topics, so I've split things up a bit. The result can be (temporarily) seen [here|http://www.lebresne.net/~mcmanus/cassandra-doc-test/html/tools/index.html]. One thing I do want to point out again is that the current doc is for trunk. In a perfect world, we'd have the same doc but adapted to earlier versions, but the main difference is going to be the CQL doc, and trying to "rebuild" the CQL doc for earlier versions from the migrated version is going to be really painful and time-consuming and I'm not volunteering. Besides, we can keep the link to the existing CQL doc for those old versions. So basically I'm suggesting that we start publishing our new doc with 3.8, and from that point on, update it only for tick-tock releases. > replace the wiki with docs in the git repo > -- > > Key: CASSANDRA-8700 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8700 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jon Haddad >Assignee: Sylvain Lebresne >Priority: Blocker > Fix For: 3.8 > > Attachments: TombstonesAndGcGrace.md, bloom_filters.md, > compression.md, contributing.zip, getting_started.zip, hardware.md > > > The wiki as it stands is pretty terrible. It takes several minutes to apply > a single update, and as a result, it's almost never updated. 
The information > there has very little context as to what version it applies to. Most people > I've talked to that try to use the information they find there find it is > more confusing than helpful. > I'd like to propose that instead of using the wiki, the doc directory in the > cassandra repo be used for docs (already used for CQL3 spec) in a format that > can be built to a variety of output formats like HTML / epub / etc. I won't > start the bikeshedding on which markup format is preferable - but there are > several options that can work perfectly fine. I've personally used sphinx w/ > restructured text, and markdown. Both can build easily and as an added bonus > be pushed to readthedocs (or something similar) automatically. For an > example, see cqlengine's documentation, which I think is already > significantly better than the wiki: > http://cqlengine.readthedocs.org/en/latest/ > In addition to being overall easier to maintain, putting the documentation in > the git repo adds context, since it evolves with the versions of Cassandra. > If the wiki were kept even remotely up to date, I wouldn't bother with this, > but not having at least some basic documentation in the repo, or anywhere > associated with the project, is frustrating. > For reference, the last 3 updates were: > 1/15/15 - updating committers list > 1/08/15 - updating contributors and how to contribute > 12/16/14 - added a link to CQL docs from wiki frontpage (by me) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12039) Add a "post bootstrap task" to the index machinery
[ https://issues.apache.org/jira/browse/CASSANDRA-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12039: - Reviewer: Sam Tunnicliffe > Add a "post bootstrap task" to the index machinery > -- > > Key: CASSANDRA-12039 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12039 > Project: Cassandra > Issue Type: New Feature >Reporter: Sergio Bossa >Assignee: Sergio Bossa > > Custom index implementations might need to be notified when the node finishes > bootstrapping in order to execute some blocking tasks before the node itself > goes into NORMAL state. > This is a proposal to add such functionality, which should roughly require > the following: > 1) Add a {{getPostBootstrapTask}} callback to the {{Index}} interface. > 2) Add an {{executePostBootstrapBlockingTasks}} method to > {{SecondaryIndexManager}} calling into the previously mentioned callback. > 3) Hook that into {{StorageService#joinTokenRing}}. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12001) nodetool stopdaemon doesn't stop cassandra gracefully
[ https://issues.apache.org/jira/browse/CASSANDRA-12001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-12001: - Labels: lhf (was: ) > nodetool stopdaemon doesn't stop cassandra gracefully > > > Key: CASSANDRA-12001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12001 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Ubuntu: Linux 3.11.0-15-generic #25~precise1-Ubuntu SMP > Thu Jan 30 17:39:31 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux > Cassandra Version : > cassandra -v > 2.1.2 >Reporter: Anshu Vajpayee >Priority: Minor > Labels: lhf > > As per general opinion, nodetool stopdaemon should perform a graceful shutdown > rather than a crash kill of the cassandra daemon. > It doesn't flush the memtables and it also doesn't stop the thrift and CQL > connection interfaces before crashing/stopping the node. It directly sends > SIGTERM to the process, the same as kill -15/ctrl + c. > > 1. Created a table as below: > cqlsh:test_ks> create table t2(id1 int, id2 text, primary key(id1)); > cqlsh:test_ks> > cqlsh:test_ks> insert into t2(id1,id2) values (1,'a'); > cqlsh:test_ks> insert into t2(id1,id2) values (2,'a'); > cqlsh:test_ks> insert into t2(id1,id2) values (3,'a'); > cqlsh:test_ks> select * from t2; > id1 | id2 > -+- >1 | a >2 | a >3 | a > 2. Flush the memtable manually using nodetool flush > student@cascor:~/node1/apache-cassandra-2.1.2/bin$ nodetool flush > student@cascor:~/node1/apache-cassandra-2.1.2/bin$ cd > ../data/data/test_ks/t2-a671f6b0319a11e6a91ae3263299699d/ > student@cascor:~/node1/apache-cassandra-2.1.2/data/data/test_ks/t2-a671f6b0319a11e6a91ae3263299699d$ > ls -ltr > total 36 > -rw-rw-r-- 1 student student 16 Jun 13 12:14 test_ks-t2-ka-1-Filter.db > -rw-rw-r-- 1 student student 54 Jun 13 12:14 test_ks-t2-ka-1-Index.db > -rw-rw-r-- 1 student student 93 Jun 13 12:14 test_ks-t2-ka-1-Data.db > -rw-rw-r-- 1 student student 91 Jun 13 12:14 test_ks-t2-ka-1-TOC.txt > -rw-rw-r-- 1 
student student 80 Jun 13 12:14 test_ks-t2-ka-1-Summary.db > -rw-rw-r-- 1 student student 4442 Jun 13 12:14 test_ks-t2-ka-1-Statistics.db > -rw-rw-r-- 1 student student 10 Jun 13 12:14 test_ks-t2-ka-1-Digest.sha1 > -rw-rw-r-- 1 student student 43 Jun 13 12:14 > test_ks-t2-ka-1-CompressionInfo.db > 3. Make a few more changes to table t2 > cqlsh:test_ks> insert into t2(id1,id2) values (5,'a'); > cqlsh:test_ks> insert into t2(id1,id2) values (6,'a'); > cqlsh:test_ks> insert into t2(id1,id2) values (7,'a'); > cqlsh:test_ks> insert into t2(id1,id2) values (8,'a'); > cqlsh:test_ks> select * from t2; > id1 | id2 > -+- >5 | a >1 | a >8 | a >2 | a >7 | a >6 | a >3 | a > 4. Stopping the node using nodetool stopdaemon > student@cascor:~$ nodetool stopdaemon > Cassandra has shutdown. > error: Connection refused > -- StackTrace -- > java.net.ConnectException: Connection refused > 5. No new version of SSTables. Reason: stopdaemon doesn't run nodetool > flush/drain before actually stopping the daemon. > student@cascor:~/node1/apache-cassandra-2.1.2/data/data/test_ks/t2-a671f6b0319a11e6a91ae3263299699d$ > ls -ltr > total 36 > -rw-rw-r-- 1 student student 16 Jun 13 12:14 test_ks-t2-ka-1-Filter.db > -rw-rw-r-- 1 student student 54 Jun 13 12:14 test_ks-t2-ka-1-Index.db > -rw-rw-r-- 1 student student 93 Jun 13 12:14 test_ks-t2-ka-1-Data.db > -rw-rw-r-- 1 student student 91 Jun 13 12:14 test_ks-t2-ka-1-TOC.txt > -rw-rw-r-- 1 student student 80 Jun 13 12:14 test_ks-t2-ka-1-Summary.db > -rw-rw-r-- 1 student student 4442 Jun 13 12:14 test_ks-t2-ka-1-Statistics.db > -rw-rw-r-- 1 student student 10 Jun 13 12:14 test_ks-t2-ka-1-Digest.sha1 > -rw-rw-r-- 1 student student 43 Jun 13 12:14 > test_ks-t2-ka-1-CompressionInfo.db > student@cascor:~/node1/apache-cassandra-2.1.2/data/data/test_ks/t2-a671f6b0319a11e6a91ae3263299699d$ > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11996) SSTableSet.CANONICAL can miss sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11996: - Assignee: Marcus Eriksson > SSTableSet.CANONICAL can miss sstables > -- > > Key: CASSANDRA-11996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11996 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Critical > Fix For: 3.0.x, 3.x > > > There is a race where we might miss sstables in SSTableSet.CANONICAL when we > finish up a compaction. > Reproducing unit test pushed > [here|https://github.com/krummas/cassandra/commit/1292aaa61b89730cff0c022ed1262f45afd493e5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11978) StreamReader fails to write sstable if CF directory is symlink
[ https://issues.apache.org/jira/browse/CASSANDRA-11978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11978: - Labels: lhf (was: ) > StreamReader fails to write sstable if CF directory is symlink > -- > > Key: CASSANDRA-11978 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11978 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Michael Frisch > Labels: lhf > > I'm using Cassandra v2.2.6. If the CF is stored as a symlink in the keyspace > directory on disk then StreamReader.createWriter fails because > Descriptor.fromFilename is passed the actual path on disk instead of the path > with the symlink. > Example: > /path/to/data/dir/Keyspace/CFName -> /path/to/data/dir/AnotherDisk/CFName > Descriptor.fromFilename is passed "/path/to/data/dir/AnotherDisk/CFName" > instead of "/path/to/data/dir/Keyspace/CFName", then it concludes that the > keyspace name is "AnotherDisk", which is erroneous. I've temporarily worked > around this by using cfs.keyspace.getName() to get the keyspace name and > cfs.name to get the CF name as those are correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
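The path confusion described above is easy to reproduce with plain `java.nio.file`: resolving symlinks on a symlinked CF directory yields the link target's parent ("AnotherDisk"), not the keyspace directory the link lives in. A small sketch (directory names are illustrative; `resolvedParentName` mimics what a `Descriptor.fromFilename`-style parser would take for the keyspace name, it is not Cassandra's code):

```java
import java.nio.file.*;

public class SymlinkParent {
    // Parent directory name after resolving symlinks - what a path parser that
    // canonicalizes first would mistake for the keyspace name.
    static String resolvedParentName(Path cfDir) throws Exception {
        return cfDir.toRealPath().getParent().getFileName().toString();
    }

    public static void main(String[] args) throws Exception {
        Path data = Files.createTempDirectory("data");
        Path target = Files.createDirectories(data.resolve("AnotherDisk").resolve("CFName"));
        Path keyspace = Files.createDirectories(data.resolve("Keyspace"));
        Path cfLink = Files.createSymbolicLink(keyspace.resolve("CFName"), target);

        System.out.println(cfLink.getParent().getFileName()); // Keyspace
        System.out.println(resolvedParentName(cfLink));       // AnotherDisk
    }
}
```

This is why the workaround of reading `cfs.keyspace.getName()` directly is correct: it never goes through the canonicalized on-disk path at all.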
[jira] [Updated] (CASSANDRA-11973) Is MemoryUtil.getShort() supposed to return a sign-extended or non-sign-extended value?
[ https://issues.apache.org/jira/browse/CASSANDRA-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11973: - Reviewer: Stefania > Is MemoryUtil.getShort() supposed to return a sign-extended or > non-sign-extended value? > --- > > Key: CASSANDRA-11973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11973 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Rei Odaira >Assignee: Rei Odaira >Priority: Minor > Fix For: 2.2.x, 3.0.x, 3.x > > Attachments: 11973-2.2.txt > > > In org.apache.cassandra.utils.memory.MemoryUtil.getShort(), the returned > value of unsafe.getShort(address) is bit-wise AND'ed with 0xFFFF, while that > of getShortByByte(address) is not. This inconsistency results in different > returned values when the short integer is negative. Which is the preferred > behavior? Looking at NativeClustering and NativeCellTest, it seems like > non-sign-extension is assumed. > By the way, is there any reason MemoryUtil.getShort() and > MemoryUtil.getShortByByte() return "int", not "short"? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
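The difference the reporter describes is plain Java widening behavior: converting a `short` to an `int` sign-extends, while masking with 0xFFFF zero-extends. A minimal illustration of the two behaviors (not the Cassandra code itself):

```java
public class ShortWidening {
    // Mirrors the two code paths described: one masks, one implicitly widens.
    static int zeroExtended(short s) { return s & 0xFFFF; } // non-sign-extended
    static int signExtended(short s) { return s; }          // implicit widening

    public static void main(String[] args) {
        short negative = (short) 0xFFFE;            // -2 as a signed short
        System.out.println(zeroExtended(negative)); // 65534
        System.out.println(signExtended(negative)); // -2
        // For non-negative shorts the two agree, which is why the
        // inconsistency only shows up for negative values.
    }
}
```

The masked form is also why the methods return `int`: the full unsigned range 0..65535 does not fit in a `short`.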
[jira] [Commented] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347976#comment-15347976 ] Sylvain Lebresne commented on CASSANDRA-12075: -- I think the idea is to indicate whether the query was idempotent in the first place or not. So a timeout on a counter update, or a list append, would say it isn't. And I don't think that's crazy, since currently drivers often have to rely on the client declaring whether the query is idempotent, since they don't parse queries (and they kind of need to know, on timeout, for the purpose of retrying). That said, I think it's mostly useful for timeout exceptions, not all {{RequestExecutionException}}, as I believe the other exceptions are precise enough for the client to make its decision. I also don't think there are cases where we can meaningfully say it should be retried on a different host. But anyway, insofar as this ticket is about adding a boolean {{isIdempotent}} to timeout exceptions, I'm in favor of that. This is a protocol v5 thing though. > Include whether or not the client should retry the request when throwing a > RequestExecutionException > > > Key: CASSANDRA-12075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12075 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > > Some requests that result in an error should not be retried by the client. > Right now if the client gets an error, it has no way of knowing whether or > not it should retry. We can include an extra field in each > {{RequestExecutionException}} that will indicate whether the client should > retry, retry on a different host, or not retry at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
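The client-side decision this comment sketches — only timeouts consult an idempotency flag, while other failures are already decided by exception type — could look like the following. This is a hypothetical sketch: an `isIdempotent` flag on timeout exceptions is the proposed protocol v5 addition, not an existing field.

```java
// Hypothetical client-side retry policy for the proposed v5 flag.
public class RetryPolicy {
    enum Decision { RETRY_SAME_HOST, RETHROW }

    // Under the proposal, isIdempotent is carried on the timeout exception
    // itself; a counter update or list append would arrive with it false.
    static Decision onTimeout(boolean isIdempotent) {
        return isIdempotent ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }
}
```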
[jira] [Commented] (CASSANDRA-12060) Different failure format for failed LWT between 2.x and 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346432#comment-15346432 ] Sylvain Lebresne commented on CASSANDRA-12060: -- I didn't say that the inconsistency of this ticket was *due* to CASSANDRA-9842. I haven't checked closely, but it's almost surely due to the fact that in 3.0, the underlying paxos read queries only the static part, and so when it gets back an empty result, it really has no way to tell the difference between "the partition just doesn't exist" and "the partition actually exists, but has nothing for the static part (at least not for the part we have a condition on)". What I'm saying is that it's due to the same fact as in CASSANDRA-9842: because, ironically, 3.0 is able to query only the static part, it learns nothing about the liveness of the partition when it gets no results. I say ironically because it wasn't a problem in 2.0, due to 2.0 being somewhat imprecise: to query the static part we simply queried the whole partition (since the storage engine didn't know about static stuff) with a limit of 1. Thanks to that, though, if we got back no static part, we knew whether the partition was live (meaning we did get one row back, it was just not static) or not (we got nothing back). And my point is that we should probably try to get back to the 2.x behavior, because 1) it's more helpful (we get more info when there is no static part) and 2) it avoids breaking changes between 2.x and 3.x. So in other words, when we have conditions on a static column, we should probably internally query both the static part *and* the first live row of the partition. The last part is slightly inefficient, but necessary to make the distinction between a live and a dead partition when there is nothing static. And if we do that, then we should apply it to both this ticket and the case covered by CASSANDRA-9842. 
> Different failure format for failed LWT between 2.x and 3.x > --- > > Key: CASSANDRA-12060 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12060 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Petrov > > When executing the following CQL commands: > {code} > CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', > 'datacenter1': '1' }; > USE test; > CREATE TABLE testtable (a int, b int, s1 int static, s2 int static, v int, > PRIMARY KEY (a, b)); > INSERT INTO testtable (a,b,s1,s2,v) VALUES (2,2,2,null,2); > DELETE s1 FROM testtable WHERE a = 2 IF s2 IN (10,20,30); > {code} > The output is different between {{2.x}} and {{3.x}}: > 2.x: > {code} > cqlsh:test> DELETE s1 FROM testtable WHERE a = 2 IF s2 = 5; > [applied] | s2 > ---+-- > False | null > {code} > 3.x: > {code} > cqlsh:test> DELETE s1 FROM testtable WHERE a = 2 IF s2 = 5; > [applied] > --- > False > {code} > {{2.x}} would, however, return the same result if executed on a partition that > does not exist at all: > {code} > cqlsh:test> DELETE s1 FROM testtable WHERE a = 5 IF s2 = 5; > [applied] > --- > False > {code} > It _might_ be related to static column LWTs, as I could not reproduce the same > behaviour with non-static column LWTs. The most recent change was > [CASSANDRA-10532], which enabled LWT operations on static columns with > partition keys only. -Another possible relation is [CASSANDRA-9842], which > removed the distinction between {{null}} column and non-existing row.- (struck > through since the same happens on pre-[CASSANDRA-9842] code.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11820) Altering a column's type causes EOF
[ https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346152#comment-15346152 ] Sylvain Lebresne commented on CASSANDRA-11820: -- Had a look. The reason this happens is that when deserializing from the sstable, we use the proper type, but create the deserialized cell using a {{ColumnDefinition}} that still has the old type, so when we re-serialize later for intra-node communication, the wrong type is used and it breaks during deserialization. The good news is, nothing is corrupted; it just fails during the processing of the query (and not even when reading the sstable). I'll write a patch to fix it, but it'll probably only be early next week, as I have a few other things I want to finish this week (I'll make sure this gets in 3.0.8/3.8 in any case). > Altering a column's type causes EOF > --- > > Key: CASSANDRA-11820 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11820 > Project: Cassandra > Issue Type: Bug >Reporter: Carl Yeksigian > Fix For: 3.0.x, 3.x > > > While working on CASSANDRA-10309, I was testing altering columns' types. This > series of operations fails: > {code} > CREATE TABLE test (a int PRIMARY KEY, b int) > INSERT INTO test (a, b) VALUES (1, 1) > ALTER TABLE test ALTER b TYPE BLOB > SELECT * FROM test WHERE a = 1 > {code} > Tried this on 3.0 and trunk, both fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
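The failure shape described in the comment — bytes written under one type's serialization, read back under another's — can be reproduced with plain streams: a fixed 4-byte `int` payload read as if it were a length-prefixed blob runs off the end of the buffer. An illustrative sketch under that assumption, not Cassandra's serialization code:

```java
import java.io.*;

public class TypeMismatchDemo {
    // Write a value as a fixed 4-byte int, then read it back as if it were a
    // length-prefixed blob (4-byte length + payload): the framing doesn't
    // match, so the reader runs off the end of the stream.
    static boolean readBackAsBlobFails(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(value); // "int serializer": 4 bytes

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            int length = in.readInt();      // misread: payload taken as a length
            in.readFully(new byte[length]); // ...which the stream can't satisfy
            return false;
        } catch (EOFException e) {
            return true; // same failure mode as the reported EOF on the query path
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readBackAsBlobFails(1)); // true
    }
}
```

This also illustrates why nothing on disk is corrupted in the ticket: the sstable bytes are fine, and the error only appears when a reader applies the wrong framing.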
[jira] [Assigned] (CASSANDRA-11820) Altering a column's type causes EOF
[ https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reassigned CASSANDRA-11820: Assignee: Sylvain Lebresne > Altering a column's type causes EOF > --- > > Key: CASSANDRA-11820 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11820 > Project: Cassandra > Issue Type: Bug >Reporter: Carl Yeksigian >Assignee: Sylvain Lebresne > Fix For: 3.0.x, 3.x > > > While working on CASSANDRA-10309, I was testing altering columns' types. This > series of operations fails: > {code} > CREATE TABLE test (a int PRIMARY KEY, b int) > INSERT INTO test (a, b) VALUES (1, 1) > ALTER TABLE test ALTER b TYPE BLOB > SELECT * FROM test WHERE a = 1 > {code} > Tried this on 3.0 and trunk, both fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11327) Maintain a histogram of times when writes are blocked due to no available memory
[ https://issues.apache.org/jira/browse/CASSANDRA-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11327: - Resolution: Fixed Fix Version/s: 3.0.8 3.8 Status: Resolved (was: Ready to Commit) Committed, thanks. > Maintain a histogram of times when writes are blocked due to no available > memory > > > Key: CASSANDRA-11327 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11327 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 3.8, 3.0.8 > > > I have a theory that part of the reason C* is so sensitive to timeouts during > saturating write load is that throughput is basically a sawtooth with valleys > at zero. This is something I have observed and it gets worse as you add 2i to > a table or do anything that decreases the throughput of flushing. > I think the fix for this is to incrementally release memory pinned by > memtables and 2i during flushing instead of releasing it all at once. I know > that's not really possible, but we can fake it with memory accounting that > tracks how close to completion flushing is and releases permits for > additional memory. This will lead to a bit of a sawtooth in real memory > usage, but we can account for that so the peak footprint is the same. > I think the end result of this change will be a sawtooth, but the valley of > the sawtooth will not be zero; it will be the rate at which flushing > progresses. Optimizing the rate at which flushing progresses and its > fairness with other work can then be tackled separately. > Before we do this I think we should demonstrate that pinned memory due to > flushing is actually the issue by getting better visibility into the > distribution of instances of not having any memory by maintaining a histogram > of spans of time where no memory is available and a thread is blocked. 
> [MemtableAllocator$SubPool.allocate(long)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java#L186] > should be a relatively straightforward entry point for this. The first > thread to block can mark the start of memory starvation and the last thread > out can mark the end. Have a periodic task that tracks the amount of time > spent blocked per interval of time and, if it is greater than some threshold, > log with more details, possibly at debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11755) nodetool info should run with "readonly" jmx access
[ https://issues.apache.org/jira/browse/CASSANDRA-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11755: - Resolution: Fixed Fix Version/s: (was: 2.1.14) 3.0.8 3.8 2.1.15 Reproduced In: 3.5, 2.1.10 (was: 2.1.10, 3.5) Status: Resolved (was: Ready to Commit) Committed, thanks. > nodetool info should run with "readonly" jmx access > --- > > Key: CASSANDRA-11755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11755 > Project: Cassandra > Issue Type: Improvement > Components: Observability >Reporter: Jérôme Mainaud >Assignee: Jérôme Mainaud >Priority: Minor > Labels: security > Fix For: 2.1.15, 3.8, 3.0.8 > > Attachments: 11755-2.1.patch, > nodetool-info-exception-when-readonly.txt > > > nodetool info crashes when granted readonly JMX access > In the example given in the attachment, the jmxremote.access file gives readonly > access to the cassandra jmx role. > When the role is granted readwrite access, everything works. > The main reason is that node datacenter and rack info are fetched by an > operation invocation instead of by an attribute read. The former is not > allowed for a role with readonly access. > This is a security concern because nodetool info could be called by a > monitoring agent (Nagios for instance), and enterprise policies often don't > allow these agents to connect to JMX with higher privileges than "readonly". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11878) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_3_x_To_indev_3_x.select_key_in_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346114#comment-15346114 ] Sylvain Lebresne commented on CASSANDRA-11878: -- Is that something we only want to do on trunk? (I don't see a patch for other versions, hence the question) > dtest failure in > upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_3_x_To_indev_3_x.select_key_in_test > - > > Key: CASSANDRA-11878 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11878 > Project: Cassandra > Issue Type: Bug >Reporter: Sean McCarthy >Assignee: Marcus Eriksson > Labels: dtest > Fix For: 3.x > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log > > > example failure: > http://cassci.datastax.com/job/upgrade_tests-all/47/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_3_x_To_indev_3_x/select_key_in_test > Failed on CassCI build upgrade_tests-all #47 > Attached logs for test failure. > {code} > ERROR [CompactionExecutor:2] 2016-05-21 23:10:35,678 CassandraDaemon.java:195 > - Exception in thread Thread[CompactionExecutor:2,1,main] > java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut > down > at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61) > ~[apache-cassandra-3.5.jar:3.5] > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) > ~[na:1.8.0_51] > at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:165) > ~[apache-cassandra-3.5.jar:3.5] > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > ~[na:1.8.0_51] > at > org.apache.cassandra.db.compaction.CompactionManager.submitBackground(CompactionManager.java:184) > ~[apache-cassandra-3.5.jar:3.5] > at > 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:270) > ~[apache-cassandra-3.5.jar:3.5] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_51] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
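The exception at the top of that trace is the standard behavior of `java.util.concurrent.ThreadPoolExecutor` once it has been shut down: with the default rejection policy (AbortPolicy), any further submission throws. A minimal reproduction of just that mechanism, outside Cassandra:

```java
import java.util.concurrent.*;

public class RejectAfterShutdown {
    // Returns true if the pool rejects a task submitted after shutdown().
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown(); // stop accepting new tasks; already-queued ones still run
        try {
            pool.submit(() -> {});
            return false;
        } catch (RejectedExecutionException e) {
            return true; // default AbortPolicy: same exception as in the dtest log
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdownIsRejected()); // true
    }
}
```

In the dtest the window is node shutdown: a background compaction task is submitted after the CompactionExecutor pool has already been shut down, hence the error in the log rather than a hang.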
[jira] [Updated] (CASSANDRA-11882) Clustering Key with ByteBuffer size > 64k throws Assertion Error
[ https://issues.apache.org/jira/browse/CASSANDRA-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11882: - Assignee: Lerh Chuan Low > Clustering Key with ByteBuffer size > 64k throws Assertion Error > > > Key: CASSANDRA-11882 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11882 > Project: Cassandra > Issue Type: Bug > Components: CQL, Streaming and Messaging >Reporter: Lerh Chuan Low >Assignee: Lerh Chuan Low > Fix For: 2.1.15, 2.2.7, 3.8, 3.0.8 > > Attachments: 11882-2.1.txt, 11882-2.2.txt, 11882-3.X.txt > > > Setup: > {code} > CREATE KEYSPACE Blues WITH REPLICATION = { 'class' : 'SimpleStrategy', > 'replication_factor' : 2}; > CREATE TABLE test (a text, b text, PRIMARY KEY ((a), b)) > {code} > There currently doesn't seem to be an existing check for selecting clustering > keys that are larger than 64k. So if we proceed to do the following select: > {code} > CONSISTENCY ALL; > SELECT * FROM Blues.test WHERE a = 'foo' AND b = 'something larger than 64k'; > {code} > An AssertionError is thrown in `ByteBufferUtil` with just a number and an > error message detailing 'Coordinator node timed out waiting for replica nodes > responses'. Additionally, because AssertionError is an Error, not a > subclass of Exception, it's not caught, so the connection between the > coordinator node and the other nodes which have the replicas seems to be > 'stuck' until it's restarted. Any other subsequent queries, even if it's just > SELECT where a = 'foo' and b = 'bar', will always return the Coordinator > timing out waiting for replica nodes responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
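The 64k limit comes from length-prefixing values with an unsigned 16-bit short: writing a length of 65536 through a 2-byte prefix silently keeps only the low 16 bits, which is exactly the kind of corruption an up-front size check (or the assertion that fires here) exists to prevent. An illustrative sketch of the framing arithmetic, not Cassandra's `ByteBufferUtil` itself:

```java
import java.io.*;

public class ShortLengthFraming {
    static final int MAX_UNSIGNED_SHORT = 0xFFFF; // 65535: largest framable length

    // Round-trip a length through a 2-byte (unsigned short) prefix.
    static int roundTripLength(int length) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeShort(length); // keeps only the low 16 bits
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return in.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTripLength(65535)); // 65535: fits
        System.out.println(roundTripLength(65536)); // 0: silently truncated
    }
}
```

This is why the fix validates clustering-key sizes before serialization instead of letting the short-length write fail deep in the read path.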
[jira] [Updated] (CASSANDRA-11882) Clustering Key with ByteBuffer size > 64k throws Assertion Error
[ https://issues.apache.org/jira/browse/CASSANDRA-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-11882: - Resolution: Fixed Fix Version/s: (was: 2.2.x) (was: 2.1.x) 3.0.8 3.8 2.2.7 2.1.15 Status: Resolved (was: Ready to Commit) Committed, thanks. > Clustering Key with ByteBuffer size > 64k throws Assertion Error > > > Key: CASSANDRA-11882 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11882 > Project: Cassandra > Issue Type: Bug > Components: CQL, Streaming and Messaging >Reporter: Lerh Chuan Low > Fix For: 2.1.15, 2.2.7, 3.8, 3.0.8 > > Attachments: 11882-2.1.txt, 11882-2.2.txt, 11882-3.X.txt > > > Setup: > {code} > CREATE KEYSPACE Blues WITH REPLICATION = { 'class' : 'SimpleStrategy', > 'replication_factor' : 2}; > CREATE TABLE test (a text, b text, PRIMARY KEY ((a), b)) > {code} > There currently doesn't seem to be an existing check for selecting clustering > keys that are larger than 64k. So if we proceed to do the following select: > {code} > CONSISTENCY ALL; > SELECT * FROM Blues.test WHERE a = 'foo' AND b = 'something larger than 64k'; > {code} > An AssertionError is thrown in `ByteBufferUtil` with just a number and an > error message detailing 'Coordinator node timed out waiting for replica nodes > responses'. Additionally, because AssertionError is an Error, not a > subclass of Exception, it's not caught, so the connection between the > coordinator node and the other nodes which have the replicas seems to be > 'stuck' until it's restarted. Any other subsequent queries, even if it's just > SELECT where a = 'foo' and b = 'bar', will always return the Coordinator > timing out waiting for replica nodes responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11579) remove DatabaseDescriptor dependency from SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346086#comment-15346086 ] Sylvain Lebresne commented on CASSANDRA-11579: -- [~yukim] I believe this is waiting on you to commit. > remove DatabaseDescriptor dependency from SequentialWriter > -- > > Key: CASSANDRA-11579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11579 > Project: Cassandra > Issue Type: Sub-task >Reporter: Yuki Morishita >Assignee: Yuki Morishita >Priority: Minor > > {{SequentialWriter}} and its subclass is widely used in Cassandra, mainly > from SSTable. Removing dependency to {{DatabaseDescriptor}} improve > reusability of this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)