[jira] [Commented] (CASSANDRA-4967) config options have different bounds when set via different methods
[ https://issues.apache.org/jira/browse/CASSANDRA-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905876#comment-14905876 ] John Sumsion commented on CASSANDRA-4967: - I am part-way through revamping the validation / defaults logic for config. See this branch on github: - https://github.com/jdsumsion/cassandra/tree/4967-config-validation If I'm going the wrong direction, please let me know soon, as I want to wrap this up by the end of the summit. > config options have different bounds when set via different methods > --- > > Key: CASSANDRA-4967 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4967 > Project: Cassandra > Issue Type: Improvement > Components: Core >Affects Versions: 1.2.0 beta 2 >Reporter: Robert Coli >Priority: Minor > Labels: lhf > > (similar to some of the work done in > https://issues.apache.org/jira/browse/CASSANDRA-4479 > ) > If one sets a value in cassandra.yaml, that value might be subject to bounds > checking there. However, if one sets that same value via JMX, it doesn't get > set via a bounds-checking code path. > "./src/java/org/apache/cassandra/config/DatabaseDescriptor.java" (JMX set) > {noformat} > public static void setPhiConvictThreshold(double phiConvictThreshold) > { > conf.phi_convict_threshold = phiConvictThreshold; > } > {noformat} > Versus... > ./src/java/org/apache/cassandra/config/DatabaseDescriptor.java > (cassandra.yaml) > {noformat} > static void loadYaml() > ... > /* phi convict threshold for FailureDetector */ > if (conf.phi_convict_threshold < 5 || conf.phi_convict_threshold > 16) > { > throw new ConfigurationException("phi_convict_threshold must be between 5 and 16"); > } > {noformat} > This seems to create a confusing situation where the range of potential > values for a given configuration option is different when set by different > methods. > It's difficult to imagine a circumstance where you want bounds checking to > keep your node from starting if you set that value in cassandra.yaml, but > also want to allow circumvention of that bounds checking if you set via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
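For illustration, both paths quoted above could funnel through a single bounds check. A minimal hedged sketch, mirroring the names in the quoted snippets but not taken from the linked 4967-config-validation branch:
{code}
// Hedged sketch: one range check shared by the cassandra.yaml load path and
// the JMX setter, so both enforce the same 5..16 bounds. The shape of the fix
// is an assumption, not code from the linked branch.
public static void setPhiConvictThreshold(double phiConvictThreshold)
{
    if (phiConvictThreshold < 5 || phiConvictThreshold > 16)
        throw new IllegalArgumentException("phi_convict_threshold must be between 5 and 16");
    conf.phi_convict_threshold = phiConvictThreshold;
}
{code}
The YAML loader could then delegate to the same setter instead of duplicating the check.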
[jira] [Updated] (CASSANDRA-10214) Enable index selection to be overridden on a per query basis
[ https://issues.apache.org/jira/browse/CASSANDRA-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-10214: Fix Version/s: (was: 3.x) 3.0.0 rc2 > Enable index selection to be overridden on a per query basis > > > Key: CASSANDRA-10214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10214 > Project: Cassandra > Issue Type: New Feature >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe > Fix For: 3.0.0 rc2 > > > (Broken out of CASSANDRA-10124) > We could add a {{USING INDEX <index_name>}} clause to {{SELECT}} syntax to > force the choice of index and bypass the usual index selection mechanism. > {code} > CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k)); > CREATE INDEX v1_idx ON ks.t1(v1); > CREATE INDEX v2_idx ON ks.t1(v2); > CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING > 'com.foo.bar.CustomMultiColumnIndex'; > # Override internal index selection mechanism > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_idx; > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v2_idx; > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_v2_idx; > {code} > This is in some ways similar to [index > hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH] > in Oracle. > edit: fixed typos (missing INDEX in the USING clauses) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-10214) Enable index selection to be overridden on a per query basis
[ https://issues.apache.org/jira/browse/CASSANDRA-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe reassigned CASSANDRA-10214: --- Assignee: Sam Tunnicliffe > Enable index selection to be overridden on a per query basis > > > Key: CASSANDRA-10214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10214 > Project: Cassandra > Issue Type: New Feature >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe > Fix For: 3.x > > > (Broken out of CASSANDRA-10124) > We could add a {{USING INDEX <index_name>}} clause to {{SELECT}} syntax to > force the choice of index and bypass the usual index selection mechanism. > {code} > CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k)); > CREATE INDEX v1_idx ON ks.t1(v1); > CREATE INDEX v2_idx ON ks.t1(v2); > CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING > 'com.foo.bar.CustomMultiColumnIndex'; > # Override internal index selection mechanism > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_idx; > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v2_idx; > SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_v2_idx; > {code} > This is in some ways similar to [index > hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH] > in Oracle. > edit: fixed typos (missing INDEX in the USING clauses) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-10392: Description: It should be possible to use an external tracing solution in Cassandra by abstracting out the writing of tracing to system_traces tables in the tracing package to separate implementation classes, and leaving abstract classes in place that otherwise define the interface and behaviour of C* tracing. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit presentation. In addition this patch passes the custom payload through into the tracing session, allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. There are still a few todos and fixmes in the initial patch, but I'm submitting early to get feedback. was: It can be possible to use in external tracing solutions in Cassandra by abstracting out the tracing->system_traces tables in the tracing package to separate implementation classes. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit presentation. In addition this patch passes the custom payload through into the tracing session allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. There's still a few todos and fixmes in the initial patch but i'm submitting early to get feedback. > Allow Cassandra to trace to custom tracing implementations > --- > > Key: CASSANDRA-10392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10392 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: mck >Assignee: mck > > It should be possible to use an external tracing solution in Cassandra by > abstracting out the writing of tracing to system_traces tables in the tracing > package to separate implementation classes, and leaving abstract classes in > place that otherwise define the interface and behaviour of C* tracing. > Then via a system property "cassandra.custom_tracing_class" the Tracing class > implementation could be swapped out with something third party. > An example of this is adding Zipkin tracing into Cassandra in the Summit > presentation. > In addition this patch passes the custom payload through into the tracing > session, allowing a third party tracing solution like Zipkin to do full-stack > tracing from clients through and into Cassandra. > There are still a few todos and fixmes in the initial patch, but I'm submitting > early to get feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
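A hedged sketch of the swap-in mechanism described above. Only the {{cassandra.custom_tracing_class}} property name comes from the description; the surrounding names are assumptions:
{code}
// Hedged sketch of loading a pluggable Tracing implementation from the
// "cassandra.custom_tracing_class" system property. The default class name
// and the factory shape are assumptions, not code from the patch.
public static Tracing loadTracing()
{
    String customClass = System.getProperty("cassandra.custom_tracing_class");
    if (customClass == null)
        return new TraceKeyspaceTracing(); // hypothetical default that writes to system_traces
    try
    {
        return (Tracing) Class.forName(customClass).newInstance();
    }
    catch (Exception e)
    {
        throw new IllegalStateException("Unable to load custom tracing class " + customClass, e);
    }
}
{code}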
[jira] [Commented] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905645#comment-14905645 ] mck commented on CASSANDRA-10392: - patch coming soon… > Allow Cassandra to trace to custom tracing implementations > --- > > Key: CASSANDRA-10392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10392 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: mck >Assignee: mck > > It should be possible to use an external tracing solution in Cassandra by > abstracting out the tracing->system_traces tables in the tracing package to > separate implementation classes. > Then via a system property "cassandra.custom_tracing_class" the Tracing class > implementation could be swapped out with something third party. > An example of this is adding Zipkin tracing into Cassandra in the Summit > presentation. > In addition this patch passes the custom payload through into the tracing > session, allowing a third party tracing solution like Zipkin to do full-stack > tracing from clients through and into Cassandra. > There are still a few todos and fixmes in the initial patch, but I'm submitting > early to get feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations
mck created CASSANDRA-10392: --- Summary: Allow Cassandra to trace to custom tracing implementations Key: CASSANDRA-10392 URL: https://issues.apache.org/jira/browse/CASSANDRA-10392 Project: Cassandra Issue Type: Improvement Components: Core Reporter: mck Assignee: mck It should be possible to use an external tracing solution in Cassandra by abstracting out the tracing->system_traces tables in the tracing package to separate implementation classes. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit presentation. In addition this patch passes the custom payload through into the tracing session, allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. There are still a few todos and fixmes in the initial patch, but I'm submitting early to get feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10378) Make skipping more efficient
[ https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905564#comment-14905564 ] Sylvain Lebresne commented on CASSANDRA-10378: -- Marking this for RC2 as this is a very simple fix that gets us a fairly good improvement, and since it's a file format change it's probably best to get it into 3.0 proper if possible. > Make skipping more efficient > > > Key: CASSANDRA-10378 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10378 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > Following on from the impact of CASSANDRA-10322, we can improve the > efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our > performance to be in-and-around the same ballpark except for seeks into the > middle of a large partition, which suggests (possibly) that the higher > density of data we're storing may simply be resulting in a more significant > CPU burden as we have more data to skip over (and since CASSANDRA-10322 > improves performance here really dramatically, further improvements are > likely to be of similar benefit). > I propose doing our best to flatten the skipping of macro data items into as > few skip invocations as necessary. One way of doing this would be to > introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently > skip a number of unsigned vints. Almost the entire body of a cell and row > consist of vints now, each data component with their own special {{skipX}} > method that invokes {{readUnsignedVint}}. This would permit more efficient > despatch. > We could also potentially avoid the construction of a new {{Columns}} > instance for each row skip, since all we need is an iterator over the > columns, and share the temporary space used for storing them, which should > further reduce the GC burden for skipping many rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10378) Make skipping more efficient
[ https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905561#comment-14905561 ] Sylvain Lebresne commented on CASSANDRA-10378: -- I pushed a quick patch implementing the idea above [here|https://github.com/pcmanus/cassandra/commits/10378]. The result on point queries can be seen on [this graph|http://cstar.datastax.com/graph?stats=399e6124-616e-11e5-b8f9-42010af0688f&metric=op_rate&operation=3_user&smoothing=1&show_aggregates=true&xmin=0&xmax=152.68&ymin=0&ymax=110790.9]: basically, we get much closer to 2.2 on those queries. > Make skipping more efficient > > > Key: CASSANDRA-10378 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10378 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > Following on from the impact of CASSANDRA-10322, we can improve the > efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our > performance to be in-and-around the same ballpark except for seeks into the > middle of a large partition, which suggests (possibly) that the higher > density of data we're storing may simply be resulting in a more significant > CPU burden as we have more data to skip over (and since CASSANDRA-10322 > improves performance here really dramatically, further improvements are > likely to be of similar benefit). > I propose doing our best to flatten the skipping of macro data items into as > few skip invocations as necessary. One way of doing this would be to > introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently > skip a number of unsigned vints. Almost the entire body of a cell and row > consist of vints now, each data component with their own special {{skipX}} > method that invokes {{readUnsignedVint}}. This would permit more efficient > despatch. > We could also potentially avoid the construction of a new {{Columns}} > instance for each row skip, since all we need is an iterator over the > columns, and share the temporary space used for storing them, which should > further reduce the GC burden for skipping many rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
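For background on why a batched skip helps: Cassandra's unsigned vints put their length in the leading one-bits of the first byte, so a {{skipUnsignedVInts(int)}}-style method can hop over values without decoding them. A hedged sketch of the idea only; this is not code from the branch above:
{code}
import java.io.DataInput;
import java.io.IOException;

// Hedged sketch of batched vint skipping. The first byte of an unsigned vint
// encodes, via its leading one-bits, how many extra bytes follow, so each
// value can be skipped without being decoded. Not code from the linked branch.
static void skipUnsignedVInts(DataInput in, int count) throws IOException
{
    for (int i = 0; i < count; i++)
    {
        int firstByte = in.readByte(); // sign-extended: high bits mirror bit 7
        if (firstByte >= 0)
            continue; // high bit clear: a single-byte vint, nothing left to skip
        int extraBytes = Integer.numberOfLeadingZeros(~firstByte) - 24;
        in.skipBytes(extraBytes);
    }
}
{code}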
[jira] [Assigned] (CASSANDRA-10378) Make skipping more efficient
[ https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reassigned CASSANDRA-10378: Assignee: Sylvain Lebresne (was: Benedict) > Make skipping more efficient > > > Key: CASSANDRA-10378 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10378 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Sylvain Lebresne > Fix For: 3.x > > > Following on from the impact of CASSANDRA-10322, we can improve the > efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our > performance to be in-and-around the same ballpark except for seeks into the > middle of a large partition, which suggests (possibly) that the higher > density of data we're storing may simply be resulting in a more significant > CPU burden as we have more data to skip over (and since CASSANDRA-10322 > improves performance here really dramatically, further improvements are > likely to be of similar benefit). > I propose doing our best to flatten the skipping of macro data items into as > few skip invocations as necessary. One way of doing this would be to > introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently > skip a number of unsigned vints. Almost the entire body of a cell and row > consist of vints now, each data component with their own special {{skipX}} > method that invokes {{readUnsignedVint}}. This would permit more efficient > despatch. > We could also potentially avoid the construction of a new {{Columns}} > instance for each row skip, since all we need is an iterator over the > columns, and share the temporary space used for storing them, which should > further reduce the GC burden for skipping many rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian updated CASSANDRA-8844: -- Reviewer: Carl Yeksigian > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > processing > -- Be able to continuously "tail" the most recent logfile and get > low-latency(ms?) access to the data as it is written. > h2. Alternate approach > In order to make consuming a change log easy and efficient to do with low > latency, the following could supplement the approach outlined above > - Instead of writing to a logfile, by default, Cassandra could expose a > socket for a daemon to connect to, and from which it could pull each row. > - Cassandra would h
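To make the consumption contract above concrete, here is a hypothetical daemon loop under the stated assumptions (ordered logfile names, a checkpoint artifact in the CDC directory, client-side cleanup); every file name and layout detail is illustrative, not a committed format:
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical CDC consumer: process logfiles in name order, checkpoint
// progress as a file artifact, and delete consumed logs (the daemon's job
// under the proposal above).
public class CdcConsumer
{
    public static void main(String[] args) throws IOException
    {
        Path cdcDir = Paths.get(args[0]);
        Path checkpoint = cdcDir.resolve("consumer.checkpoint");
        String lastDone = Files.exists(checkpoint)
                        ? new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim()
                        : "";
        List<Path> logs;
        try (Stream<Path> files = Files.list(cdcDir))
        {
            logs = files.filter(p -> p.getFileName().toString().endsWith(".log"))
                        .sorted() // predictable naming schema: lexical order == write order
                        .filter(p -> p.getFileName().toString().compareTo(lastDone) > 0)
                        .collect(Collectors.toList());
        }
        for (Path log : logs)
        {
            process(log); // feed row-centric entries to the downstream system
            Files.write(checkpoint, log.getFileName().toString().getBytes(StandardCharsets.UTF_8));
            Files.delete(log); // cleanup is the consumer's responsibility
        }
    }

    private static void process(Path log)
    {
        // parse the row-centric log entries and forward them downstream
    }
}
{code}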
[jira] [Updated] (CASSANDRA-6096) Look into a Pig Macro to url encode URLs passed to CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Maynes updated CASSANDRA-6096: - Attachment: 0001-CASSANDRA-6069.patch Re-based the previous patch against Cassandra 2.1. > Look into a Pig Macro to url encode URLs passed to CqlStorage > - > > Key: CASSANDRA-6096 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6096 > Project: Cassandra > Issue Type: Bug > Components: Hadoop >Reporter: Jeremy Hanna >Priority: Minor > Labels: lhf > Attachments: 0001-CASSANDRA-6069.patch, trunk-6096.txt > > > In the evolution of CqlStorage, the URL went from non-encoded to encoded. It > would be great to somehow keep the URL readable, perhaps using the Pig macro > interface to do expansion: > http://pig.apache.org/docs/r0.9.2/cont.html#macros > See also CASSANDRA-6073 and CASSANDRA-5867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10382) nodetool info doesn't show the correct DC and RACK
[ https://issues.apache.org/jira/browse/CASSANDRA-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905453#comment-14905453 ] Nirmal Gupta commented on CASSANDRA-10382: -- Not able to reproduce using cassandra-2.2 head. [~rmarchei] Can you please attach snitch properties file and cassandra.yaml? > nodetool info doesn't show the correct DC and RACK > -- > > Key: CASSANDRA-10382 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10382 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 2.2.1 > GossipingPropertyFileSnitch >Reporter: Ruggero Marchei >Priority: Minor > Labels: lhf > > When running *nodetool info* cassandra returns UNKNOWN_DC and UNKNOWN_RACK: > {code} > # nodetool info > ID : b94f9ca0-f886-4111-a471-02f295573f37 > Gossip active : true > Thrift active : true > Native Transport active: true > Load : 44.97 MB > Generation No : 1442913138 > Uptime (seconds) : 5386 > Heap Memory (MB) : 429.07 / 3972.00 > Off Heap Memory (MB) : 0.08 > Data Center: UNKNOWN_DC > Rack : UNKNOWN_RACK > Exceptions : 1 > Key Cache : entries 642, size 58.16 KB, capacity 100 MB, 5580 > hits, 8320 requests, 0.671 recent hit rate, 14400 save period in seconds > Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 > requests, NaN recent hit rate, 0 save period in seconds > Counter Cache : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 > requests, NaN recent hit rate, 7200 save period in seconds > Token : (invoke with -T/--tokens to see all 256 tokens) > {code} > Correct DCs and RACKs are returned by *nodetool status* and *nodetool > gossipinfo* commands: > {code} > # nodetool gossipinfo|grep -E 'RACK|DC' > DC:POZ > RACK:RACK30 > DC:POZ > RACK:RACK30 > DC:SJC > RACK:RACK68 > DC:POZ > RACK:RACK30 > DC:SJC > RACK:RACK62 > DC:SJC > RACK:RACK62 > {code} > {code} > # nodetool status|grep Datacenter > Datacenter: SJC > Datacenter: POZ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Git Push Summary
Repository: cassandra Updated Branches: refs/heads/10378 [deleted] 525855d2f
[1/2] cassandra git commit: Write row size in sstable format for faster skipping
Repository: cassandra
Updated Branches:
  refs/heads/10378 [created] 525855d2f

Write row size in sstable format for faster skipping

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/424b59ad
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/424b59ad
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/424b59ad

Branch: refs/heads/10378
Commit: 424b59ad5aa72b25eab8995a2c248ab734d33177
Parents: 41731b8
Author: Sylvain Lebresne
Authored: Tue Sep 22 13:53:22 2015 -0700
Committer: Sylvain Lebresne
Committed: Tue Sep 22 14:04:06 2015 -0700

--
 src/java/org/apache/cassandra/db/Memtable.java  |  2 +-
 .../cassandra/db/SerializationHeader.java       | 26 --
 .../rows/UnfilteredRowIteratorSerializer.java   |  8 +-
 .../cassandra/db/rows/UnfilteredSerializer.java | 90 +---
 .../io/sstable/AbstractSSTableSimpleWriter.java |  2 +-
 .../io/sstable/SSTableSimpleUnsortedWriter.java |  2 +-
 .../apache/cassandra/db/RowIndexEntryTest.java  |  4 +-
 .../unit/org/apache/cassandra/db/ScrubTest.java |  3 +-
 .../db/compaction/AntiCompactionTest.java       |  2 +-
 .../io/sstable/BigTableWriterTest.java          |  2 +-
 .../io/sstable/SSTableRewriterTest.java         |  4 +-
 .../cassandra/io/sstable/SSTableUtils.java      |  2 +-
 12 files changed, 78 insertions(+), 69 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/424b59ad/src/java/org/apache/cassandra/db/Memtable.java
--
diff --git a/src/java/org/apache/cassandra/db/Memtable.java b/src/java/org/apache/cassandra/db/Memtable.java
index 7af65d1..ae982d3 100644
--- a/src/java/org/apache/cassandra/db/Memtable.java
+++ b/src/java/org/apache/cassandra/db/Memtable.java
@@ -428,7 +428,7 @@ public class Memtable implements Comparable<Memtable>
                                              (long)partitions.size(),
                                              ActiveRepairService.UNREPAIRED_SSTABLE,
                                              sstableMetadataCollector,
-                                             new SerializationHeader(cfs.metadata, columns, stats),
+                                             new SerializationHeader(true, cfs.metadata, columns, stats),
                                              txn));
         }
     }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/424b59ad/src/java/org/apache/cassandra/db/SerializationHeader.java
--
diff --git a/src/java/org/apache/cassandra/db/SerializationHeader.java b/src/java/org/apache/cassandra/db/SerializationHeader.java
index decac49..0706d06 100644
--- a/src/java/org/apache/cassandra/db/SerializationHeader.java
+++ b/src/java/org/apache/cassandra/db/SerializationHeader.java
@@ -45,6 +45,8 @@ public class SerializationHeader
 {
     public static final Serializer serializer = new Serializer();

+    private final boolean isForSSTable;
+
     private final AbstractType<?> keyType;
     private final List<AbstractType<?>> clusteringTypes;
@@ -53,12 +55,14 @@
     private final Map<ByteBuffer, AbstractType<?>> typeMap;

-    private SerializationHeader(AbstractType<?> keyType,
+    private SerializationHeader(boolean isForSSTable,
+                                AbstractType<?> keyType,
                                 List<AbstractType<?>> clusteringTypes,
                                 PartitionColumns columns,
                                 EncodingStats stats,
                                 Map<ByteBuffer, AbstractType<?>> typeMap)
     {
+        this.isForSSTable = isForSSTable;
         this.keyType = keyType;
         this.clusteringTypes = clusteringTypes;
         this.columns = columns;
@@ -77,7 +81,8 @@
         List<AbstractType<?>> clusteringTypes = new ArrayList<>(size);
         for (int i = 0; i < size; i++)
             clusteringTypes.add(BytesType.instance);
-        return new SerializationHeader(BytesType.instance,
+        return new SerializationHeader(false,
+                                       BytesType.instance,
                                        clusteringTypes,
                                        PartitionColumns.NONE,
                                        EncodingStats.NO_STATS,
@@ -108,14 +113,16 @@
         else
             columns.addAll(sstable.header.columns());
     }
-        return new SerializationHeader(metadata, columns.build(), stats.get());
+        return new SerializationHeader(true, metadata, columns.build(), stats.get());
     }

-    public SerializationHeader(CFMetaData metadata,
+    public
[2/2] cassandra git commit: Record previous row size
Record previous row size

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/525855d2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/525855d2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/525855d2

Branch: refs/heads/10378
Commit: 525855d2f37b2fe9376b4ce2dab9107d0d227f6a
Parents: 424b59a
Author: Sylvain Lebresne
Authored: Wed Sep 23 14:36:04 2015 -0700
Committer: Sylvain Lebresne
Committed: Wed Sep 23 14:36:04 2015 -0700

--
 .../org/apache/cassandra/db/ColumnIndex.java    | 10 ++-
 .../rows/UnfilteredRowIteratorSerializer.java   |  2 +
 .../cassandra/db/rows/UnfilteredSerializer.java | 69 +++-
 3 files changed, 60 insertions(+), 21 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java b/src/java/org/apache/cassandra/db/ColumnIndex.java
index add5fa7..ede3f79 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -76,6 +76,7 @@ public class ColumnIndex
     private long startPosition = -1;

     private int written;
+    private long previousRowStart;

     private ClusteringPrefix firstClustering;
     private ClusteringPrefix lastClustering;
@@ -99,7 +100,7 @@
         ByteBufferUtil.writeWithShortLength(iterator.partitionKey().getKey(), writer);
         DeletionTime.serializer.serialize(iterator.partitionLevelDeletion(), writer);
         if (header.hasStatic())
-            UnfilteredSerializer.serializer.serialize(iterator.staticRow(), header, writer, version);
+            UnfilteredSerializer.serializer.serializeStaticRow(iterator.staticRow(), header, writer, version);
     }

     public ColumnIndex build() throws IOException
@@ -131,15 +132,18 @@
     private void add(Unfiltered unfiltered) throws IOException
     {
+        long pos = currentPosition();
+
         if (firstClustering == null)
         {
             // Beginning of an index block. Remember the start and position
             firstClustering = unfiltered.clustering();
-            startPosition = currentPosition();
+            startPosition = pos;
         }

-        UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, version);
+        UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, pos - previousRowStart, version);
         lastClustering = unfiltered.clustering();
+        previousRowStart = pos;
         ++written;

         if (unfiltered.kind() == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
--
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
index 3c5cdbf..3a0558e 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
@@ -90,6 +90,8 @@ public class UnfilteredRowIteratorSerializer
     // Should only be used for the on-wire format.
     public void serialize(UnfilteredRowIterator iterator, SerializationHeader header, ColumnFilter selection, DataOutputPlus out, int version, int rowEstimate) throws IOException
     {
+        assert !header.isForSSTable();
+
         ByteBufferUtil.writeWithVIntLength(iterator.partitionKey().getKey(), out);
         int flags = 0;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
--
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
index 1f77529..fac8863 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
@@ -92,17 +92,31 @@ public class UnfilteredSerializer
     public void serialize(Unfiltered unfiltered, SerializationHeader header, DataOutputPlus out, int version) throws IOException
     {
+        assert !header.isForSSTable();
+        serialize(unfiltered, header, out, 0, version);
+    }
+
+    public void serialize(Unfiltered unfiltered, SerializationHeader header, DataOutputPl
[jira] [Assigned] (CASSANDRA-10298) Replaced dead node stayed in gossip forever
[ https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu reassigned CASSANDRA-10298: - Assignee: Dikang Gu > Replaced dead node stayed in gossip forever > --- > > Key: CASSANDRA-10298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10298 > Project: Cassandra > Issue Type: Bug >Reporter: Dikang Gu >Assignee: Dikang Gu > Attachments: CASSANDRA-10298.patch > > > The dead node stayed in the nodetool status: > DN 10.210.165.55 379.76 GB 256 ? null > And in the log, it throws NPE when trying to remove it. > {code} > 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread > Thread[GossipStage:1,5,main] > 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201) > > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1886) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1902) > 2015-09-10_06:41:22.92456 at > org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473) > > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085) > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > 2015-09-10_06:41:22.92459 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_45] > 2015-09-10_06:41:22.92460 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
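The attached CASSANDRA-10298.patch isn't quoted in this digest; as a hedged sketch, a null guard at the frame the trace points to might look like this (the guard is an assumption about the fix, while the API calls are existing ones):
{code}
// Hedged sketch of a null guard in HintedHandOffManager.deleteHintsForEndpoint,
// the frame the stack trace above points to. The guard is an assumption, not
// the attached patch.
public void deleteHintsForEndpoint(final InetAddress endpoint)
{
    UUID hostId = StorageService.instance.getTokenMetadata().getHostId(endpoint);
    if (hostId == null)
    {
        logger.warn("Not deleting hints for {}: no host ID in token metadata", endpoint);
        return; // previously fell through into UUIDGen.decompose(null) -> NPE
    }
    ByteBuffer hostIdBytes = ByteBuffer.wrap(UUIDGen.decompose(hostId));
    // ... existing deletion logic continues using hostIdBytes ...
}
{code}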
[jira] [Commented] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values
[ https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905226#comment-14905226 ] Steven Warren commented on CASSANDRA-4386: -- I see, that makes sense! > Allow cql to use the IN syntax on secondary index values > > > Key: CASSANDRA-4386 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4386 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Jeremy Hanna >Assignee: Benjamin Lerer >Priority: Minor > Labels: cql > > Currently CQL has a syntax for using IN to get a set of rows with a set of > keys. This would also be very helpful for use with columns with secondary > indexes on them. Such as: > {code} > select * from users where first_name in ('françois','frank'); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-10228: --- Assignee: Paul MacIntosh > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Paul MacIntosh > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905220#comment-14905220 ] Ariel Weisberg commented on CASSANDRA-10228: My version, turns out there is no merge pain. +1 on the contents. Waiting on CI. [2.1 branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.1?expand=1] [2.2 branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.2?expand=1] [3.0 branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.1?expand=1] > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905210#comment-14905210 ] Paul MacIntosh commented on CASSANDRA-10228: https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1 > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul MacIntosh updated CASSANDRA-10228: --- Comment: was deleted (was: https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1) > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905208#comment-14905208 ] Paul MacIntosh commented on CASSANDRA-10228: Implemented: https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1 > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
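The linked branch isn't reproduced in this digest. As a hedged sketch of the general technique the ticket asks for, a traversal over the full cause/suppressed graph could look like this; all names are illustrative, and the identity set guards against cycles in the cause chain:
{code}
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Predicate;

// Hedged sketch: inspect every exception reachable via getCause() and
// getSuppressed() instead of only the outer exception. Illustrative names,
// not code from the linked branches.
public final class ThrowableGraph
{
    private ThrowableGraph() {}

    public static boolean anyMatch(Throwable root, Predicate<Throwable> isUnstable)
    {
        if (root == null)
            return false;
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty())
        {
            Throwable t = pending.pop();
            if (!seen.add(t))
                continue; // already visited: protects against cause cycles
            if (isUnstable.test(t))
                return true;
            if (t.getCause() != null)
                pending.push(t.getCause());
            for (Throwable s : t.getSuppressed())
                pending.push(s);
        }
        return false;
    }
}
{code}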
[jira] [Updated] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-10228: --- Reviewer: Ariel Weisberg > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905182#comment-14905182 ] John Sumsion commented on CASSANDRA-5780: - Here is a branch on trunk: - https://github.com/jdsumsion/cassandra/tree/5780-decomission-truncate-system > nodetool status and ring report incorrect/stale information after decommission > -- > > Key: CASSANDRA-5780 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5780 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Peter Haggerty >Priority: Trivial > Labels: lhf, ponies, qa-resolved > Fix For: 2.1.x > > > Cassandra 1.2.6 ring of 12 instances, each with 256 tokens. > Decommission 3 of the 12 nodes, one after another, resulting in a 9-instance ring. > The 9 instances of cassandra that are in the ring all correctly report > nodetool status information for the ring and have the same data. > After the first node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > After the second node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > After the third node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > "nodetool status" on "decommissioned-3rd" reports 9 nodes > The storage load information is similarly stale on the various decommissioned > nodes. The nodetool status and ring commands continue to return information > as if they were part of a cluster and they appear to return the last > information that they saw. > In contrast the nodetool info command fails with an exception, which isn't > ideal but at least indicates that there was a failure rather than returning > stale information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905133#comment-14905133 ] John Sumsion commented on CASSANDRA-5780: - I'm working on this on trunk; to ease backporting, the patch will not be JDK 1.8-specific, since this ticket is open against 1.2, 2.x, and trunk. > nodetool status and ring report incorrect/stale information after decommission > -- > > Key: CASSANDRA-5780 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5780 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Peter Haggerty >Priority: Trivial > Labels: lhf, ponies, qa-resolved > Fix For: 2.1.x > > > Cassandra 1.2.6 ring of 12 instances, each with 256 tokens. > Decommission 3 of the 12 nodes, one after another, resulting in a 9-instance ring. > The 9 instances of cassandra that are in the ring all correctly report > nodetool status information for the ring and have the same data. > After the first node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > After the second node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > After the third node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > "nodetool status" on "decommissioned-3rd" reports 9 nodes > The storage load information is similarly stale on the various decommissioned > nodes. The nodetool status and ring commands continue to return information > as if they were part of a cluster and they appear to return the last > information that they saw. > In contrast the nodetool info command fails with an exception, which isn't > ideal but at least indicates that there was a failure rather than returning > stale information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9967) Determine if a Materialized View is finished building, without having to query each node
[ https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905074#comment-14905074 ] Carl Yeksigian commented on CASSANDRA-9967: --- A few ideas if someone wants to pick this up: - We should use the {{system_distributed}} keyspace for this, and I think the primary key should be: {{(table_id, host_id)}} - We should retry updating the table if we don't succeed, and make sure that on startup we have captured all of the builds that we have locally in the distributed table - We need to make sure that we handle node membership properly - We should make sure that we set the exit code if the view isn't built yet > Determine if a Materialized View is finished building, without having to > query each node > > > Key: CASSANDRA-9967 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9967 > Project: Cassandra > Issue Type: New Feature >Reporter: Alan Boudreault >Priority: Minor > Labels: lhf > Fix For: 3.x > > > Since MVs are eventually consistent with their base table, it would be nice if we > could easily know the state of the MV after its creation, so we could wait > until the MV is built before doing some operations. > // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
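A hedged sketch of the first idea in the comment above: each node records its own build completion in a distributed table keyed by {{(table_id, host_id)}}. The table name, columns, and helper class are assumptions, not a committed schema:
{code}
import java.util.Date;
import java.util.UUID;
import com.datastax.driver.core.Session;

// Hypothetical build-status tracking; only the (table_id, host_id) key comes
// from the comment above, everything else is illustrative.
public class ViewBuildStatus
{
    // Assumed table: system_distributed.view_build_status
    //   (table_id uuid, host_id uuid, finished_at timestamp,
    //    PRIMARY KEY (table_id, host_id))
    private static final String MARK_BUILT =
        "INSERT INTO system_distributed.view_build_status (table_id, host_id, finished_at) "
        + "VALUES (?, ?, ?)";

    private final Session session;

    public ViewBuildStatus(Session session)
    {
        this.session = session;
    }

    // Each node upserts its own row when its local build finishes; the view
    // can be considered built once every cluster member has a row, which also
    // gives tooling a natural exit-code check.
    public void markBuilt(UUID tableId, UUID localHostId)
    {
        session.execute(MARK_BUILT, tableId, localHostId, new Date());
    }
}
{code}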
[jira] [Commented] (CASSANDRA-10031) Name threads for improved ops/debugging
[ https://issues.apache.org/jira/browse/CASSANDRA-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904750#comment-14904750 ] clint martin commented on CASSANDRA-10031: -- Would it be better to include this sort of information in the slf4j MDC, rather than altering the thread name every time a thread enters some task? This way logging can be configured as needed per-class/task rather than forcing thread naming conventions. > Name threads for improved ops/debugging > --- > > Key: CASSANDRA-10031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10031 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani >Priority: Minor > Labels: lhf > Fix For: 3.x > > > We currently provide basic names for threads, like {{STREAM-IN-1}}, which > gives some basic information about what the job of the thread is. > When looking at a log statement or jstack it's helpful to have this context. > For our work stealing thread pool we share threads across all thread pools so > we lose this insight. > I'd like to propose we start using Thread.currentThread().setName("") > in different aspects of the code to improve insight as to what cassandra is > doing at any given moment. >* At a minimum in the start of each run() method. > Ideally for much finer grain things. >* In compaction include the partition name currently being worked on. >* In SP include the client ip > Etc... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
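For comparison, the MDC approach scopes context to log output instead of renaming pool threads. A minimal sketch using the real slf4j MDC API; the key name and wrapper are illustrative:
{code}
import org.slf4j.MDC;

// Minimal sketch of the MDC alternative: attach task context to the logging
// context for the duration of the work instead of renaming the thread.
public final class TaskContext
{
    private TaskContext() {}

    public static void runWithContext(String description, Runnable task)
    {
        MDC.put("task", description); // surfaced via %X{task} in the log pattern
        try
        {
            task.run();
        }
        finally
        {
            MDC.remove("task"); // shared pool threads must not leak stale context
        }
    }
}
{code}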
[jira] [Commented] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information
[ https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904720#comment-14904720 ] Philip Thompson commented on CASSANDRA-10074: - This is only in help text, so I didn't bother running CI on this. > cqlsh HELP SELECT_EXPR gives outdated incorrect information > --- > > Key: CASSANDRA-10074 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10074 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: 3.0.0-alpha1-SNAPSHOT >Reporter: Jim Meyer >Assignee: Philip Thompson >Priority: Trivial > Labels: cqlsh, lhf > Fix For: 3.x > > Attachments: 10074.txt > > > Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function > supported by CQL. > It is missing a description of the SUM, AVG, MIN, and MAX built-in functions. > It should probably also mention that user defined functions can be invoked > via SELECT. > The outdated text is in pylib/cqlshlib/helptopics.py under def > help_select_expr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information
[ https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson reassigned CASSANDRA-10074: --- Assignee: Philip Thompson > cqlsh HELP SELECT_EXPR gives outdated incorrect information > --- > > Key: CASSANDRA-10074 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10074 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: 3.0.0-alpha1-SNAPSHOT >Reporter: Jim Meyer >Assignee: Philip Thompson >Priority: Trivial > Labels: cqlsh, lhf > Fix For: 3.x > > > Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function > supported by CQL. > It is missing a description of the SUM, AVG, MIN, and MAX built-in functions. > It should probably also mention that user defined functions can be invoked > via SELECT. > The outdated text is in pylib/cqlshlib/helptopics.py under def > help_select_expr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information
[ https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-10074: Attachment: 10074.txt > cqlsh HELP SELECT_EXPR gives outdated incorrect information > --- > > Key: CASSANDRA-10074 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10074 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: 3.0.0-alpha1-SNAPSHOT >Reporter: Jim Meyer >Assignee: Philip Thompson >Priority: Trivial > Labels: cqlsh, lhf > Fix For: 3.x > > Attachments: 10074.txt > > > Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function > supported by CQL. > It is missing a description of the SUM, AVG, MIN, and MAX built-in functions. > It should probably also mention that user defined functions can be invoked > via SELECT. > The outdated text is in pylib/cqlshlib/helptopics.py under def > help_select_expr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10391) sstableloader fails with client SSL enabled with non-standard keystore/truststore location
Jon Moses created CASSANDRA-10391: - Summary: sstableloader fails with client SSL enabled with non-standard keystore/truststore location Key: CASSANDRA-10391 URL: https://issues.apache.org/jira/browse/CASSANDRA-10391 Project: Cassandra Issue Type: Bug Environment: [cqlsh 4.1.1 | Cassandra 2.0.14.425 | DSE 4.6.6 | CQL spec 3.1.1 | Thrift protocol 19.39.0] [cqlsh 5.0.1 | Cassandra 2.1.8.689 | DSE 4.7.3 | CQL spec 3.2.0 | Native protocol v3] Reporter: Jon Moses If client SSL is enabled, sstableloader is unable to access the keystore and truststore if they are not in the expected locations. I reproduced this issue both when providing {{-f /path/to/cassandra.yaml}} and when manually using the {{-ks}} flag with the proper path to the keystore. For example: {noformat} client_encryption_options: enabled: true keystore: /var/tmp/.keystore {noformat} {noformat} # sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml Keyspace1/Standard1/ Could not retrieve endpoint ranges: java.io.FileNotFoundException: /usr/share/dse/conf/.keystore Run with --debug to get full stack trace or --help to get help. # # sstableloader -d 172.31.2.240,172.31.2.241 -ks /var/tmp/.keystore Keyspace1/Standard1/ Could not retrieve endpoint ranges: java.io.FileNotFoundException: /usr/share/dse/conf/.keystore Run with --debug to get full stack trace or --help to get help. # {noformat} The full stack is: {noformat} # sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/ Could not retrieve endpoint ranges: java.io.FileNotFoundException: /usr/share/dse/conf/.keystore java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95) Caused by: java.io.FileNotFoundException: /usr/share/dse/conf/.keystore at com.datastax.bdp.transport.client.TClientSocketFactory.getSSLSocket(TClientSocketFactory.java:128) at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:114) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:186) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:120) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:111) at org.apache.cassandra.tools.BulkLoader$ExternalClient.createThriftClient(BulkLoader.java:302) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:254) ... 2 more root@ip-172-31-2-240:/tmp/foo# {noformat} If I copy the keystore to the expected location, I get the same error with the truststore. {noformat} # sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/ Could not retrieve endpoint ranges: java.io.FileNotFoundException: /usr/share/dse/conf/.truststore java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95) Caused by: java.io.FileNotFoundException: /usr/share/dse/conf/.truststore at com.datastax.bdp.transport.client.TClientSocketFactory.getSSLSocket(TClientSocketFactory.java:130) at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:114) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:186) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:120) at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:111) at org.apache.cassandra.tools.BulkLoader$ExternalClient.createThriftClient(BulkLoader.java:302) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:254) ... 2 more # {noformat} If I copy the truststore, it finds them both, but then fails to open them due to what I assume is a password error, even though it's present in the cassandra.yaml. {noformat} # sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/ Could not retrieve endpoint ranges: java.io.IOException: Failed to open transport to: 172.31.2.240:9160 java.lang.RuntimeException: Could not retrieve endpoint range
[jira] [Commented] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values
[ https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904696#comment-14904696 ] Benjamin Lerer commented on CASSANDRA-4386: --- {quote}Wouldn't the results come back in secondary index order though?{quote} I have not started working on this ticket, but what I would expect is this: if you have 2 index entries on the same node, for values "A" and "B", where "A" is in 3 rows with primary keys pk1, pk5 and pk8, and "B" is in 2 rows with primary keys pk2 and pk3, then what you will get back will probably be: pk1, pk5, pk8, pk2 and pk3 > Allow cql to use the IN syntax on secondary index values > > > Key: CASSANDRA-4386 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4386 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Jeremy Hanna >Assignee: Benjamin Lerer >Priority: Minor > Labels: cql > > Currently CQL has a syntax for using IN to get a set of rows with a set of > keys. This would also be very helpful for use with columns with secondary > indexes on them. Such as: > {code} > select * from users where first_name in ('françois','frank'); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
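To make the ordering described above concrete, here is a toy sketch (plain Java, not Cassandra's actual secondary index read path; the data structures are assumptions): rows are returned per matched index entry, so the overall result is grouped by indexed value rather than sorted by partition key.
{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexInOrderSketch
{
    // Concatenate the rows behind each IN value, in the order the index entries are visited.
    static List<String> queryIn(Map<String, List<String>> indexEntries, List<String> inValues)
    {
        List<String> result = new ArrayList<>();
        for (String value : inValues)
            result.addAll(indexEntries.getOrDefault(value, List.of()));
        return result;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> index = new LinkedHashMap<>();
        index.put("A", List.of("pk1", "pk5", "pk8"));
        index.put("B", List.of("pk2", "pk3"));
        // Prints [pk1, pk5, pk8, pk2, pk3] - grouped by index entry, not by partition key.
        System.out.println(queryIn(index, List.of("A", "B")));
    }
}
{code}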
[jira] [Updated] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs
[ https://issues.apache.org/jira/browse/CASSANDRA-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-10390: Assignee: Benjamin Lerer > inconsistent quoted identifier handling in UDTs > --- > > Key: CASSANDRA-10390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10390 > Project: Cassandra > Issue Type: Bug > Environment: 2.2.1 >Reporter: Jonathan Halliday >Assignee: Benjamin Lerer > Fix For: 2.2.x > > > > create keyspace test with replication = {'class': 'SimpleStrategy', > > 'replication_factor': 1 } ; > > create type if not exists mytype ("my.field" text); > > desc keyspace; -- observe that mytype is listed > > create table mytable (pk int primary key, myfield frozen<mytype>); > > desc keyspace; -- observe that mytype is listed, but mytable is not. > > select * from mytable; > ValueError: Type names and field names can only contain alphanumeric > characters and underscores: 'my.field' > create table myothertable (pk int primary key, "my.field" text); > select * from myothertable; -- valid > huh? It's valid to create a field of a table, or a field of a type, with a > quoted name containing non-alpha chars, but it's not valid to use such a > type in a table? I can just about live with that, though it seems > unnecessarily restrictive, but allowing creation of such a table and then > making it invisible/unusable definitely seems wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs
[ https://issues.apache.org/jira/browse/CASSANDRA-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-10390: Reproduced In: 2.2.1 Fix Version/s: 2.2.x > inconsistent quoted identifier handling in UDTs > --- > > Key: CASSANDRA-10390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10390 > Project: Cassandra > Issue Type: Bug > Environment: 2.2.1 >Reporter: Jonathan Halliday > Fix For: 2.2.x > > > > create keyspace test with replication = {'class': 'SimpleStrategy', > > 'replication_factor': 1 } ; > > create type if not exists mytype ("my.field" text); > > desc keyspace; -- observe that mytype is listed > > create table mytable (pk int primary key, myfield frozen<mytype>); > > desc keyspace; -- observe that mytype is listed, but mytable is not. > > select * from mytable; > ValueError: Type names and field names can only contain alphanumeric > characters and underscores: 'my.field' > create table myothertable (pk int primary key, "my.field" text); > select * from myothertable; -- valid > huh? It's valid to create a field of a table, or a field of a type, with a > quoted name containing non-alpha chars, but it's not valid to use such a > type in a table? I can just about live with that, though it seems > unnecessarily restrictive, but allowing creation of such a table and then > making it invisible/unusable definitely seems wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values
[ https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903314#comment-14903314 ] Steven Warren edited comment on CASSANDRA-4386 at 9/23/15 2:43 PM: --- I am fine with unordered rows for the IN clause; the current alternative with parallel queries also returns unordered results. I don't have a use case for the other operators, but I assume they would be fine versus the alternative of not being supported. EDIT: Wouldn't the results come back in secondary index order though? was (Author: swarren): I am fine with unordered rows for the IN clause, the current alternative with parallel queries also returns unordered results. I don't have a use case for the other operators, but assume that would be fine vs the alternative of not being supported. > Allow cql to use the IN syntax on secondary index values > > > Key: CASSANDRA-4386 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4386 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Jeremy Hanna >Assignee: Benjamin Lerer >Priority: Minor > Labels: cql > > Currently CQL has a syntax for using IN to get a set of rows with a set of > keys. This would also be very helpful for use with columns with secondary > indexes on them. Such as: > {code} > select * from users where first_name in ('françois','frank'); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904474#comment-14904474 ] Yuki Morishita commented on CASSANDRA-10389: What kind of error do you see on the replica nodes (cblade1 or other nodes that failed to validate)? > Repair session exception Validation failed > -- > > Key: CASSANDRA-10389 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10389 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax > compilation) >Reporter: Jędrzej Sieracki > > I'm running a repair on a ring of nodes that was recently extended from 3 to > 13 nodes. The extension was done two days ago; the repair was attempted > yesterday. > {quote} > [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace > perspectiv with repair options (parallelism: parallel, primary range: false, > incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], > hosts: [], # of ranges: 517) > [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 > for range (-5927186132136652665,-5917344746039874798] failed with error > [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on > perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] > Validation failed in cblade1.XXX/XXX (progress: 0%) > {quote} > BTW, I am ignoring the LEAK errors for now; that's outside the scope of > the main issue: > {quote} > ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big > was not released before the reference was garbage collected > {quote} > I scrubbed the sstable with failed validation on cblade1 with nodetool scrub > perspectiv stock_increment_agg: > {quote} > INFO [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 > - Scrubbing > BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') > (345466609 bytes) > INFO [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 > - Scrubbing > BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') > (60496378 bytes) > ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big > was not released before the reference was garbage collected > ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big > was not released before the reference was garbage collected > ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class > 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big > was not released before the reference was garbage collected > ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big > was not released before the reference was garbage collected > INFO [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 > - Scrub of > BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') > complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped > INFO [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 > - Scrub of > BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') > complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped > {quote} >
[jira] [Comment Edited] (CASSANDRA-10280) Make DTCS work well with old data
[ https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904458#comment-14904458 ] Antti Nissinen edited comment on CASSANDRA-10280 at 9/23/15 12:53 PM: -- I am also voting for discarding max_sstable_age_days and limiting the compaction window size in DTCS. If DTCS is going to have major modifications, then adopting some of the ideas from TWCS would be beneficial, as would taking into account the practical viewpoints presented in several Jira items: - limiting the window size in DTCS (this item, [CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280]) - using STCS in the newest window or if the number of files exceeds the max_threshold ([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276] , [CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666]) - while compacting a large number of files, start from small ones and progress towards larger ones (especially in the case of small sstables originating from repair operations) [CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597] - setting limits for the number of files compacted in one shot based on the sum of file sizes (not trying to compact several large files at once and running out of disk space during the operation) [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] - round-robin approach for selecting the compaction window inside which the next compaction will be executed. The target is to get rid of small files as soon as possible. At the moment TWCS and DTCS work on the newest windows and progress towards older ones when finished with the current one [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] Should we actually create a Jira item where we would collect the ideas for an "ultimate time series compaction strategy" for more detailed discussion? At the moment these ideas are scattered across different items, and the above list is probably missing many relevant points. Another important goal (our wish) for a time series database is to be able to wipe off data efficiently so that disk space is released as soon as possible. I tried to describe those ideas in [CASSANDRA-10306|https://issues.apache.org/jira/browse/CASSANDRA-10306], but there are no comments yet on that item. The main idea was to make it possible to split SSTables along a certain timeline on all nodes so that SSTables could be dropped (like with TTL in DTCS and TWCS) or archived on different media where they can be dug up some day if really needed. Deleting data efficiently on demand is presently one of the biggest obstacles for using C* in closed environments with fairly limited hardware resources for time series data collection. TTL is a working solution when you can predict data collection demands well beforehand and have additional resources available if the predictions don't match reality. What are the biggest obstacles in the present architecture for the scenario below? - Decide a timestamp for the data deletion / archiving - All existing SSTables on each node would be split into two files along the timeline if the SSTable covers data on both sides of it. 
- SSTables falling behind the timeline would be deactivated from the SSTable set (no longer participating in compactions or returning data on queries) - you can decide if you want to copy the files somewhere else or simply delete them - This tool could be used through nodetool with an external script was (Author: anissinen): I am also voting for discarding the max_sstable_age_days and limiting the compaction window size in DTCS. If the DTCS will have a major modifications then adopting the some of the ideas from TWCS would be beneficial and also trying to take into account the practical view points presented in several Jira items: - limiting the window size in DTCS (this item, [CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280]) - using STCS in the newest window or if the amount of files exceeds the max_threshold ([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276],[CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666]) - while compacting a large amount of files, start from small ones and progress towards larger ones (especially in the case of small sstables originated from repair operations) [CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597] - setting limits for number of files compacted in one shot based on the sum of files sizes (not trying to compact several large files at ones and running out of disk space during the operation) [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] - round-robin approach for the selection of compaction window inside which next compa
[jira] [Commented] (CASSANDRA-10280) Make DTCS work well with old data
[ https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904458#comment-14904458 ] Antti Nissinen commented on CASSANDRA-10280: I am also voting for discarding max_sstable_age_days and limiting the compaction window size in DTCS. If DTCS is going to have major modifications, then adopting some of the ideas from TWCS would be beneficial, as would taking into account the practical viewpoints presented in several Jira items: - limiting the window size in DTCS (this item, [CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280]) - using STCS in the newest window or if the number of files exceeds the max_threshold ([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276],[CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666]) - while compacting a large number of files, start from small ones and progress towards larger ones (especially in the case of small sstables originating from repair operations) [CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597] - setting limits for the number of files compacted in one shot based on the sum of file sizes (not trying to compact several large files at once and running out of disk space during the operation) [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] - round-robin approach for selecting the compaction window inside which the next compaction will be executed. The target is to get rid of small files as soon as possible. At the moment TWCS and DTCS work on the newest windows and progress towards older ones when finished with the current one [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] Should we actually create a Jira item where we would collect the ideas for an "ultimate time series compaction strategy" for more detailed discussion? At the moment these ideas are scattered across different items, and the above list is probably missing many relevant points. Another important goal (our wish) for a time series database is to be able to wipe off data efficiently so that disk space is released as soon as possible. I tried to describe those ideas in [CASSANDRA-10306|https://issues.apache.org/jira/browse/CASSANDRA-10306], but there are no comments yet on that item. The main idea was to make it possible to split SSTables along a certain timeline on all nodes so that SSTables could be dropped (like with TTL in DTCS and TWCS) or archived on different media where they can be dug up some day if really needed. Deleting data efficiently on demand is presently one of the biggest obstacles for using C* in closed environments with fairly limited hardware resources for time series data collection. TTL is a working solution when you can predict data collection demands well beforehand and have additional resources available if the predictions don't match reality. What are the biggest obstacles in the present architecture for the scenario below? - Decide a timestamp for the data deletion / archiving - All existing SSTables on each node would be split into two files along the timeline if the SSTable covers data on both sides of it. 
- SSTables falling behind the timeline would be deactivated from the SSTable set (no longer participating in compactions or returning data on queries) - you can decide if you want to copy the files somewhere else or simply delete them - This tool could be used through nodetool with an external script > Make DTCS work well with old data > - > > Key: CASSANDRA-10280 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10280 > Project: Cassandra > Issue Type: Sub-task > Components: Core >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 3.x, 2.1.x, 2.2.x > > > Operational tasks become incredibly expensive if you keep around a long > timespan of data with DTCS - with default settings and 1 year of data, the > oldest window covers about 180 days. Bootstrapping a node with vnodes with > this data layout will force cassandra to compact very many sstables in this > window. > We should probably put a cap on how big the biggest windows can get. We could > probably default this to something sane based on max_sstable_age (ie, say we > can reasonably handle 1000 sstables per node, then we can calculate how big > the windows should be to allow that) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
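As a concrete illustration of the "cap the biggest windows" idea from the issue description, here is a toy sketch of bucketing sstables into time windows that grow with the age of the data but never exceed a cap. This is plain Java under assumed parameter names (baseWindow, maxWindow), not actual DTCS code or options; it only shows the shape of the calculation being discussed.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CappedWindowSketch
{
    // Group sstables (represented by their min timestamps) into windows whose size
    // doubles with age, like DTCS, but is capped at maxWindow so the oldest data
    // still lands in bounded, separately compactable windows. baseWindow must be > 0.
    static Map<Long, List<Long>> bucketByWindow(List<Long> sstableMinTimestamps,
                                                long now, long baseWindow, long maxWindow)
    {
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : sstableMinTimestamps)
        {
            long age = Math.max(0, now - ts);
            long window = baseWindow;
            while (window * 2 <= Math.min(age, maxWindow))
                window *= 2;
            long bucketStart = (ts / window) * window;
            buckets.computeIfAbsent(bucketStart, k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }
}
{code}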
[jira] [Created] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs
Jonathan Halliday created CASSANDRA-10390: - Summary: inconsistent quoted identifier handling in UDTs Key: CASSANDRA-10390 URL: https://issues.apache.org/jira/browse/CASSANDRA-10390 Project: Cassandra Issue Type: Bug Environment: 2.2.1 Reporter: Jonathan Halliday > create keyspace test with replication = {'class': 'SimpleStrategy', > 'replication_factor': 1 } ; > create type if not exists mytype ("my.field" text); > desc keyspace; -- observe that mytype is listed > create table mytable (pk int primary key, myfield frozen<mytype>); > desc keyspace; -- observe that mytype is listed, but mytable is not. > select * from mytable; ValueError: Type names and field names can only contain alphanumeric characters and underscores: 'my.field' create table myothertable (pk int primary key, "my.field" text); select * from myothertable; -- valid huh? It's valid to create a field of a table, or a field of a type, with a quoted name containing non-alpha chars, but it's not valid to use such a type in a table? I can just about live with that, though it seems unnecessarily restrictive, but allowing creation of such a table and then making it invisible/unusable definitely seems wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
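The ValueError above appears to come from a strict name check in the cqlsh/driver layer rather than from the server. A hypothetical sketch of that kind of check (illustrative Java, not the actual driver code) shows why a quoted field name like "my.field" trips it even though CQL accepted the name at creation time:
{code}
import java.util.regex.Pattern;

public class FieldNameCheckSketch
{
    // Only alphanumerics and underscores pass; quoted identifiers with dots do not.
    private static final Pattern ALNUM_OR_UNDERSCORE = Pattern.compile("[A-Za-z0-9_]+");

    static void validate(String name)
    {
        if (!ALNUM_OR_UNDERSCORE.matcher(name).matches())
            throw new IllegalArgumentException(
                "Type names and field names can only contain alphanumeric characters and underscores: '" + name + "'");
    }

    public static void main(String[] args)
    {
        validate("myfield");   // passes
        validate("my.field");  // throws, mirroring the inconsistency described above
    }
}
{code}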
[jira] [Created] (CASSANDRA-10389) Repair session exception Validation failed
Jędrzej Sieracki created CASSANDRA-10389: Summary: Repair session exception Validation failed Key: CASSANDRA-10389 URL: https://issues.apache.org/jira/browse/CASSANDRA-10389 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax compilation) Reporter: Jędrzej Sieracki I'm running a repair on a ring of nodes that was recently extended from 3 to 13 nodes. The extension was done two days ago; the repair was attempted yesterday. {quote} [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace perspectiv with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 517) [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 for range (-5927186132136652665,-5917344746039874798] failed with error [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] Validation failed in cblade1.XXX/XXX (progress: 0%) {quote} BTW, I am ignoring the LEAK errors for now; that's outside the scope of the main issue: {quote} ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big was not released before the reference was garbage collected {quote} I scrubbed the sstable with failed validation on cblade1 with nodetool scrub perspectiv stock_increment_agg: {quote} INFO [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') (345466609 bytes) INFO [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') (60496378 bytes) ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big was not released before the reference was garbage collected ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big was not released before the reference was garbage collected INFO [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped INFO [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped {quote} Now, after scrubbing, another repair was attempted, it did finish, but with lots of errors from other nodes: {quote} [2015-09-22 12:01:18,020] Repair session db476b51-6110-11e5-b992-9f13fa8664c8 for range (5019296454787813261,5021512586040808168] failed with error [repair #db476b51-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (5019296454787813261,5021512586040808168]] Validation failed in /10.YYY (progress: 91%) [2015
[jira] [Comment Edited] (CASSANDRA-10212) cassandra-env.sh may be sourced twice by debian init script
[ https://issues.apache.org/jira/browse/CASSANDRA-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904220#comment-14904220 ] Stefan Podkowinski edited comment on CASSANDRA-10212 at 9/23/15 9:25 AM: - Sourcing cassandra-env.sh twice will execute the jvm with duplicate JVM_OPTS arguments. Removing sourcing of cassandra-env.sh in the init script should be safe as the init script will not directly use JVM_OPTS anyway. Edit: actually, the arguments are not strictly equal in the following case: {{-XX:CompileCommandFile=/hotspot_compiler -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler}} {{cassandra-env.sh}} expects {{CASSANDRA_CONF}} to be set for {{-XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"}} which is not the case when sourcing from {{/etc/init.d/cassandra}}. was (Author: spo...@gmail.com): Sourcing cassandra-env.sh twice will execute the jvm with duplicate JVM_OPTS arguments. Removing sourcing of cassandra-env.sh in the init script should be safe as the init script will not directly use JVM_OPTS anyway. > cassandra-env.sh may be sourced twice by debian init script > --- > > Key: CASSANDRA-10212 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10212 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Enrico Canzonieri >Assignee: Michael Shuler >Priority: Minor > > It seems that when cassandra is run as a service using the init script the > {{/etc/cassandra/cassandra-env.sh}} file is sourced twice. > This file is sourced the first time in the > [init|https://github.com/apache/cassandra/blob/trunk/debian/init] script. The > init script then executes > [{{/usr/sbin/cassandra}}|https://github.com/apache/cassandra/blob/trunk/bin/cassandra], > the latter eventually does source {{cassandra-env.sh}} as > {{$CASSANDRA_CONF/cassandra-env}}. > CASSANDRA_CONF is finally defined in > [{{cassandra.in.sh}}|https://github.com/apache/cassandra/blob/trunk/debian/cassandra.in.sh] > as {{/etc/cassandra}}. > I guess in this case the init script should not source {{cassandra-env}} at > all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10212) cassandra-env.sh may be sourced twice by debian init script
[ https://issues.apache.org/jira/browse/CASSANDRA-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904220#comment-14904220 ] Stefan Podkowinski commented on CASSANDRA-10212: Sourcing cassandra-env.sh twice will execute the jvm with duplicate JVM_OPTS arguments. Removing sourcing of cassandra-env.sh in the init script should be safe as the init script will not directly use JVM_OPTS anyway. > cassandra-env.sh may be sourced twice by debian init script > --- > > Key: CASSANDRA-10212 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10212 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Enrico Canzonieri >Assignee: Michael Shuler >Priority: Minor > > It seems that when cassandra is run as a service using the init script the > {{/etc/cassandra/cassandra-env.sh}} file is sourced twice. > This file is sourced the first time in the > [init|https://github.com/apache/cassandra/blob/trunk/debian/init] script. The > init script then executes > [{{/usr/sbin/cassandra}}|https://github.com/apache/cassandra/blob/trunk/bin/cassandra], > the latter eventually does source {{cassandra-env.sh}} as > {{$CASSANDRA_CONF/cassandra-env}}. > CASSANDRA_CONF is finally defined in > [{{cassandra.in.sh}}|https://github.com/apache/cassandra/blob/trunk/debian/cassandra.in.sh] > as {{/etc/cassandra}}. > I guess in this case the init script should not source {{cassandra-env}} at > all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)