[jira] [Commented] (CASSANDRA-4967) config options have different bounds when set via different methods

2015-09-23 Thread John Sumsion (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905876#comment-14905876
 ] 

John Sumsion commented on CASSANDRA-4967:
-

I am part-way through revamping the validation / defaults logic for config.  See 
this branch on github:
- https://github.com/jdsumsion/cassandra/tree/4967-config-validation

If I'm going the wrong direction, please let me know soon, as I want to wrap 
this up by the end of the summit.

> config options have different bounds when set via different methods
> ---
>
> Key: CASSANDRA-4967
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4967
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 2
>Reporter: Robert Coli
>Priority: Minor
>  Labels: lhf
>
> (similar to some of the work done in 
> https://issues.apache.org/jira/browse/CASSANDRA-4479
> )
> If one sets a value in cassandra.yaml, that value might be subject to bounds 
> checking there. However if one sets that same value via JMX, it doesn't get 
> set via a bounds-checking code path.
> "./src/java/org/apache/cassandra/config/DatabaseDescriptor.java" (JMX set)
> {noformat}
> public static void setPhiConvictThreshold(double phiConvictThreshold)
> {
>     conf.phi_convict_threshold = phiConvictThreshold;
> }
> {noformat}
> Versus..
> ./src/java/org/apache/cassandra/config/DatabaseDescriptor.java 
> (cassandra.yaml)
> {noformat}
> static void loadYaml()
> ...
>     /* phi convict threshold for FailureDetector */
>     if (conf.phi_convict_threshold < 5 || conf.phi_convict_threshold > 16)
>     {
>         throw new ConfigurationException("phi_convict_threshold must be between 5 and 16");
>     }
> {noformat}
> This seems to create a confusing situation where the range of potential 
> values for a given configuration option is different when set by different 
> methods. 
> It's difficult to imagine a circumstance where you want bounds checking to 
> keep your node from starting if you set that value in cassandra.yaml, but 
> also want to allow circumvention of that bounds checking if you set via JMX.
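A natural fix is to centralize the bounds check so that both the yaml-load path and the JMX setter go through it. The sketch below is illustrative only; the class and method names are hypothetical, not Cassandra's actual code:

```java
// Hypothetical sketch: one validation routine shared by both configuration paths.
public class PhiConfig {
    private double phiConvictThreshold = 8.0;

    // Single bounds check, reusable from loadYaml() and from the JMX setter.
    public static double validatePhiConvictThreshold(double value) {
        if (value < 5 || value > 16)
            throw new IllegalArgumentException(
                "phi_convict_threshold must be between 5 and 16, got " + value);
        return value;
    }

    // The JMX-style setter now enforces the same bounds as the yaml path.
    public void setPhiConvictThreshold(double value) {
        this.phiConvictThreshold = validatePhiConvictThreshold(value);
    }

    public double getPhiConvictThreshold() {
        return phiConvictThreshold;
    }
}
```

With this shape, the yaml loader would call the same validation method instead of inlining the range check, so the two paths cannot drift apart.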



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10214) Enable index selection to be overridden on a per query basis

2015-09-23 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-10214:

Fix Version/s: (was: 3.x)
   3.0.0 rc2

> Enable index selection to be overridden on a per query basis
> 
>
> Key: CASSANDRA-10214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10214
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.0 rc2
>
>
> (Broken out of CASSANDRA-10124)
> We could add a {{USING INDEX <index_name>}} clause to {{SELECT}} syntax to 
> force the choice of index and bypass the usual index selection mechanism.
> {code}
> CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k));
> CREATE INDEX v1_idx ON ks.t1(v1);
> CREATE INDEX v2_idx ON ks.t1(v2);
> CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING 
> 'com.foo.bar.CustomMultiColumnIndex';
> # Override internal index selection mechanism
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_idx;
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v2_idx;
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_v2_idx;
> {code}
> This is in some ways similar to [index 
> hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH]
>  in Oracle. 
> edit: fixed typos (missing INDEX in the USING clauses)





[jira] [Assigned] (CASSANDRA-10214) Enable index selection to be overridden on a per query basis

2015-09-23 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-10214:
---

Assignee: Sam Tunnicliffe

> Enable index selection to be overridden on a per query basis
> 
>
> Key: CASSANDRA-10214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10214
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Fix For: 3.x
>
>
> (Broken out of CASSANDRA-10124)
> We could add a {{USING INDEX <index_name>}} clause to {{SELECT}} syntax to 
> force the choice of index and bypass the usual index selection mechanism.
> {code}
> CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k));
> CREATE INDEX v1_idx ON ks.t1(v1);
> CREATE INDEX v2_idx ON ks.t1(v2);
> CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING 
> 'com.foo.bar.CustomMultiColumnIndex';
> # Override internal index selection mechanism
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_idx;
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v2_idx;
> SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING INDEX v1_v2_idx;
> {code}
> This is in some ways similar to [index 
> hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH]
>  in Oracle. 
> edit: fixed typos (missing INDEX in the USING clauses)





[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations

2015-09-23 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-10392:

Description: 
It should be possible to use an external tracing solution in Cassandra by 
abstracting the writing of traces to the system_traces tables out into 
separate implementation classes, leaving abstract classes in place that define 
the interface and overall behaviour of C* tracing.

Then via a system property "cassandra.custom_tracing_class" the Tracing class 
implementation could be swapped out with something third party.

An example of this is adding Zipkin tracing into Cassandra in the Summit 
presentation.

In addition this patch passes the custom payload through into the tracing 
session allowing a third party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.


There are still a few todos and fixmes in the initial patch, but I'm submitting 
early to get feedback.

  was:
It can be possible to use in external tracing solutions in Cassandra by 
abstracting out the tracing->system_traces tables in the tracing package to 
separate implementation classes.

Then via a system property "cassandra.custom_tracing_class" the Tracing class 
implementation could be swapped out with something third party.

An example of this is adding Zipkin tracing into Cassandra in the Summit 
presentation.

In addition this patch passes the custom payload through into the tracing 
session allowing a third party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.


There's still a few todos and fixmes in the initial patch but i'm submitting 
early to get feedback.


> Allow Cassandra to trace to custom tracing implementations 
> ---
>
> Key: CASSANDRA-10392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10392
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: mck
>Assignee: mck
>
> It should be possible to use an external tracing solution in Cassandra by 
> abstracting the writing of traces to the system_traces tables out into 
> separate implementation classes, leaving abstract classes in place that define 
> the interface and overall behaviour of C* tracing.
> Then via a system property "cassandra.custom_tracing_class" the Tracing class 
> implementation could be swapped out with something third party.
> An example of this is adding Zipkin tracing into Cassandra in the Summit 
> presentation.
> In addition this patch passes the custom payload through into the tracing 
> session allowing a third party tracing solution like Zipkin to do full-stack 
> tracing from clients through and into Cassandra.
> There are still a few todos and fixmes in the initial patch, but I'm submitting 
> early to get feedback.
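The pluggable-tracing idea described above, selecting a Tracing implementation from the "cassandra.custom_tracing_class" system property, could look roughly like the sketch below. The {{Tracing}} and {{DefaultTracing}} types here are illustrative stand-ins, not Cassandra's actual classes:

```java
// Illustrative sketch of property-driven implementation selection.
abstract class Tracing {
    public abstract void trace(String message);

    // Pick the implementation class from the system property, falling back
    // to the default (system_traces-backed) tracing when unset.
    static Tracing newInstance() {
        String cls = System.getProperty("cassandra.custom_tracing_class");
        if (cls == null)
            return new DefaultTracing();
        try {
            return (Tracing) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot load tracing class " + cls, e);
        }
    }
}

class DefaultTracing extends Tracing {
    @Override
    public void trace(String message) {
        // Default behaviour: write to the system_traces tables (elided here).
        System.out.println("system_traces <- " + message);
    }
}
```

A third party like Zipkin would then ship a Tracing subclass and set the property at startup, with no changes to Cassandra itself.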





[jira] [Commented] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations

2015-09-23 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905645#comment-14905645
 ] 

mck commented on CASSANDRA-10392:
-

patch coming soon…

> Allow Cassandra to trace to custom tracing implementations 
> ---
>
> Key: CASSANDRA-10392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10392
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: mck
>Assignee: mck
>
> It should be possible to use external tracing solutions in Cassandra by 
> abstracting out the tracing->system_traces table writes in the tracing package 
> into separate implementation classes.
> Then via a system property "cassandra.custom_tracing_class" the Tracing class 
> implementation could be swapped out with something third party.
> An example of this is adding Zipkin tracing into Cassandra in the Summit 
> presentation.
> In addition this patch passes the custom payload through into the tracing 
> session allowing a third party tracing solution like Zipkin to do full-stack 
> tracing from clients through and into Cassandra.
> There are still a few todos and fixmes in the initial patch, but I'm submitting 
> early to get feedback.





[jira] [Created] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations

2015-09-23 Thread mck (JIRA)
mck created CASSANDRA-10392:
---

 Summary: Allow Cassandra to trace to custom tracing 
implementations 
 Key: CASSANDRA-10392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10392
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: mck
Assignee: mck


It should be possible to use external tracing solutions in Cassandra by 
abstracting out the tracing->system_traces table writes in the tracing package 
into separate implementation classes.

Then via a system property "cassandra.custom_tracing_class" the Tracing class 
implementation could be swapped out with something third party.

An example of this is adding Zipkin tracing into Cassandra in the Summit 
presentation.

In addition this patch passes the custom payload through into the tracing 
session allowing a third party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.


There are still a few todos and fixmes in the initial patch, but I'm submitting 
early to get feedback.





[jira] [Commented] (CASSANDRA-10378) Make skipping more efficient

2015-09-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905564#comment-14905564
 ] 

Sylvain Lebresne commented on CASSANDRA-10378:
--

Marking this for RC2 as it is a very simple fix that gets us a fairly good 
improvement, and since it's a file format change it's preferable to get it into 
3.0 proper if possible.

> Make skipping more efficient
> 
>
> Key: CASSANDRA-10378
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10378
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> Following on from the impact of CASSANDRA-10322, we can improve the 
> efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our 
> performance to be in-and-around the same ballpark except for seeks into the 
> middle of a large partition, which suggests (possibly) that the higher 
> density of data we're storing may simply be resulting in a more significant 
> CPU burden as we have more data to skip over (and since CASSANDRA-10322 
> improves performance here really dramatically, further improvements are 
> likely to be of similar benefit).
> I propose doing our best to flatten the skipping of macro data items into as 
> few skip invocations as necessary. One way of doing this would be to 
> introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently 
> skip a number of unsigned vints. Almost the entire body of a cell and row 
> consist of vints now, each data component with their own special {{skipX}} 
> method that invokes {{readUnsignedVint}}. This would permit more efficient 
> despatch.
> We could also potentially avoid the construction of a new {{Columns}} 
> instance for each row skip, since all we need is an iterator over the 
> columns, and share the temporary space used for storing them, which should 
> further reduce the GC burden for skipping many rows.
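The proposed {{skipUnsignedVInts(int)}} could be as small as the sketch below, assuming the vint layout in which the count of leading 1-bits in the first byte gives the number of extra bytes that follow. The class name is illustrative, not Cassandra's actual code:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Minimal sketch of bulk vint skipping: inspect only each vint's first byte
// and jump over the payload bytes without decoding any values.
public class VIntSkipper {
    // Skip `count` unsigned vints from the stream.
    public static void skipUnsignedVInts(DataInputStream in, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            int first = in.readUnsignedByte();
            // Leading 1-bits of the first byte = number of extra bytes to consume.
            int extraBytes = Integer.numberOfLeadingZeros(~first & 0xff) - 24;
            in.skipBytes(extraBytes);
        }
    }
}
```

Skipping this way touches one byte per vint (plus a bulk skip), rather than reconstructing each value the way a full {{readUnsignedVInt}} would.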





[jira] [Commented] (CASSANDRA-10378) Make skipping more efficient

2015-09-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905561#comment-14905561
 ] 

Sylvain Lebresne commented on CASSANDRA-10378:
--

I pushed a quick patch implementing the idea above 
[here|https://github.com/pcmanus/cassandra/commits/10378]. The results on point 
queries can be seen on [this 
graph|http://cstar.datastax.com/graph?stats=399e6124-616e-11e5-b8f9-42010af0688f&metric=op_rate&operation=3_user&smoothing=1&show_aggregates=true&xmin=0&xmax=152.68&ymin=0&ymax=110790.9]:
 basically, we get much closer to 2.2 on those queries.

> Make skipping more efficient
> 
>
> Key: CASSANDRA-10378
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10378
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> Following on from the impact of CASSANDRA-10322, we can improve the 
> efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our 
> performance to be in-and-around the same ballpark except for seeks into the 
> middle of a large partition, which suggests (possibly) that the higher 
> density of data we're storing may simply be resulting in a more significant 
> CPU burden as we have more data to skip over (and since CASSANDRA-10322 
> improves performance here really dramatically, further improvements are 
> likely to be of similar benefit).
> I propose doing our best to flatten the skipping of macro data items into as 
> few skip invocations as necessary. One way of doing this would be to 
> introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently 
> skip a number of unsigned vints. Almost the entire body of a cell and row 
> consist of vints now, each data component with their own special {{skipX}} 
> method that invokes {{readUnsignedVint}}. This would permit more efficient 
> despatch.
> We could also potentially avoid the construction of a new {{Columns}} 
> instance for each row skip, since all we need is an iterator over the 
> columns, and share the temporary space used for storing them, which should 
> further reduce the GC burden for skipping many rows.





[jira] [Assigned] (CASSANDRA-10378) Make skipping more efficient

2015-09-23 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-10378:


Assignee: Sylvain Lebresne  (was: Benedict)

> Make skipping more efficient
> 
>
> Key: CASSANDRA-10378
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10378
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Sylvain Lebresne
> Fix For: 3.x
>
>
> Following on from the impact of CASSANDRA-10322, we can improve the 
> efficiency of our calls to skipping methods. CASSANDRA-10326 is showing our 
> performance to be in-and-around the same ballpark except for seeks into the 
> middle of a large partition, which suggests (possibly) that the higher 
> density of data we're storing may simply be resulting in a more significant 
> CPU burden as we have more data to skip over (and since CASSANDRA-10322 
> improves performance here really dramatically, further improvements are 
> likely to be of similar benefit).
> I propose doing our best to flatten the skipping of macro data items into as 
> few skip invocations as necessary. One way of doing this would be to 
> introduce a special {{skipUnsignedVInts(int)}} method, that can efficiently 
> skip a number of unsigned vints. Almost the entire body of a cell and row 
> consist of vints now, each data component with their own special {{skipX}} 
> method that invokes {{readUnsignedVint}}. This would permit more efficient 
> despatch.
> We could also potentially avoid the construction of a new {{Columns}} 
> instance for each row skip, since all we need is an iterator over the 
> columns, and share the temporary space used for storing them, which should 
> further reduce the GC burden for skipping many rows.





[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)

2015-09-23 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian updated CASSANDRA-8844:
--
Reviewer: Carl Yeksigian

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred to a subsequent release, in order to gauge 
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, 
> would make Cassandra a much more versatile feeder into other systems, and 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it 
> trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they 
> left off. This means they would have to leave some file artifact in the CDC 
> log's directory.
> - It should be possible to write a sophisticated daemon that could:
> -- Catch up, in written order, even when it is multiple logfiles behind in 
> processing
> -- Continuously "tail" the most recent logfile and get low-latency (ms?) 
> access to the data as it is written.
> h2. Alternate approach
> In order to make consuming the change log easy and efficient with low 
> latency, the following could supplement the approach outlined above:
> - Instead of writing to a logfile, by default, Cassandra could expose a 
> socket for a daemon to connect to, and from which it could pull each row.
> - Cassandra would h
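The checkpoint-and-resume behaviour expected of consumer daemons above (leave a file artifact in the CDC log's directory, resume from the recorded offset) might be sketched like this; the file layout and names are assumptions for illustration only:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative consumer-daemon skeleton: resume from a checkpoint file kept
// next to the CDC logfile, process rows from that offset, record progress.
public class CdcConsumer {
    private final Path logFile;
    private final Path checkpoint; // artifact left in the CDC log directory

    public CdcConsumer(Path logFile) {
        this.logFile = logFile;
        this.checkpoint = logFile.resolveSibling(logFile.getFileName() + ".checkpoint");
    }

    long loadCheckpoint() throws IOException {
        return Files.exists(checkpoint)
             ? Long.parseLong(Files.readString(checkpoint).trim())
             : 0L;
    }

    // Consume everything written since the last checkpoint, then record progress.
    public long consume(List<String> sink) throws IOException {
        long offset = loadCheckpoint();
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null)
                sink.add(line); // one logged row per line in this toy format
            offset = raf.getFilePointer();
        }
        Files.writeString(checkpoint, Long.toString(offset));
        return offset;
    }
}
```

A real daemon would additionally iterate over logfiles in name order to catch up when it is multiple files behind.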

[jira] [Updated] (CASSANDRA-6096) Look into a Pig Macro to url encode URLs passed to CqlStorage

2015-09-23 Thread Nathan Maynes (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Maynes updated CASSANDRA-6096:
-
Attachment: 0001-CASSANDRA-6069.patch

Rebased the previous patch against Cassandra 2.1.

> Look into a Pig Macro to url encode URLs passed to CqlStorage
> -
>
> Key: CASSANDRA-6096
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6096
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop
>Reporter: Jeremy Hanna
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-CASSANDRA-6069.patch, trunk-6096.txt
>
>
> In the evolution of CqlStorage, the URL went from non-encoded to encoded.  It 
> would be great to somehow keep the URL readable, perhaps using the Pig macro 
> interface to do expansion:
> http://pig.apache.org/docs/r0.9.2/cont.html#macros
> See also CASSANDRA-6073 and CASSANDRA-5867





[jira] [Commented] (CASSANDRA-10382) nodetool info doesn't show the correct DC and RACK

2015-09-23 Thread Nirmal Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905453#comment-14905453
 ] 

Nirmal Gupta commented on CASSANDRA-10382:
--

I am not able to reproduce this using the cassandra-2.2 head.
[~rmarchei] Can you please attach your snitch properties file and cassandra.yaml?


> nodetool info doesn't show the correct DC and RACK
> --
>
> Key: CASSANDRA-10382
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10382
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.2.1
> GossipingPropertyFileSnitch
>Reporter: Ruggero Marchei
>Priority: Minor
>  Labels: lhf
>
> When running *nodetool info* cassandra returns UNKNOWN_DC and UNKNOWN_RACK:
> {code}
> # nodetool info
> ID : b94f9ca0-f886-4111-a471-02f295573f37
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 44.97 MB
> Generation No  : 1442913138
> Uptime (seconds)   : 5386
> Heap Memory (MB)   : 429.07 / 3972.00
> Off Heap Memory (MB)   : 0.08
> Data Center: UNKNOWN_DC
> Rack   : UNKNOWN_RACK
> Exceptions : 1
> Key Cache  : entries 642, size 58.16 KB, capacity 100 MB, 5580 
> hits, 8320 requests, 0.671 recent hit rate, 14400 save period in seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Token  : (invoke with -T/--tokens to see all 256 tokens)
> {code}
> Correct DCs and RACKs are returned by *nodetool status* and *nodetool 
> gossipinfo* commands:
> {code}
> # nodetool gossipinfo|grep -E 'RACK|DC'
>   DC:POZ
>   RACK:RACK30
>   DC:POZ
>   RACK:RACK30
>   DC:SJC
>   RACK:RACK68
>   DC:POZ
>   RACK:RACK30
>   DC:SJC
>   RACK:RACK62
>   DC:SJC
>   RACK:RACK62
> {code}
> {code}
> # nodetool status|grep Datacenter
> Datacenter: SJC
> Datacenter: POZ
> {code}





Git Push Summary

2015-09-23 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/10378 [deleted] 525855d2f


[1/2] cassandra git commit: Write row size in sstable format for faster skipping

2015-09-23 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/10378 [created] 525855d2f


Write row size in sstable format for faster skipping


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/424b59ad
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/424b59ad
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/424b59ad

Branch: refs/heads/10378
Commit: 424b59ad5aa72b25eab8995a2c248ab734d33177
Parents: 41731b8
Author: Sylvain Lebresne 
Authored: Tue Sep 22 13:53:22 2015 -0700
Committer: Sylvain Lebresne 
Committed: Tue Sep 22 14:04:06 2015 -0700

--
 src/java/org/apache/cassandra/db/Memtable.java  |  2 +-
 .../cassandra/db/SerializationHeader.java   | 26 --
 .../rows/UnfilteredRowIteratorSerializer.java   |  8 +-
 .../cassandra/db/rows/UnfilteredSerializer.java | 90 +---
 .../io/sstable/AbstractSSTableSimpleWriter.java |  2 +-
 .../io/sstable/SSTableSimpleUnsortedWriter.java |  2 +-
 .../apache/cassandra/db/RowIndexEntryTest.java  |  4 +-
 .../unit/org/apache/cassandra/db/ScrubTest.java |  3 +-
 .../db/compaction/AntiCompactionTest.java   |  2 +-
 .../io/sstable/BigTableWriterTest.java  |  2 +-
 .../io/sstable/SSTableRewriterTest.java |  4 +-
 .../cassandra/io/sstable/SSTableUtils.java  |  2 +-
 12 files changed, 78 insertions(+), 69 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/424b59ad/src/java/org/apache/cassandra/db/Memtable.java
--
diff --git a/src/java/org/apache/cassandra/db/Memtable.java 
b/src/java/org/apache/cassandra/db/Memtable.java
index 7af65d1..ae982d3 100644
--- a/src/java/org/apache/cassandra/db/Memtable.java
+++ b/src/java/org/apache/cassandra/db/Memtable.java
@@ -428,7 +428,7 @@ public class Memtable implements Comparable<Memtable>
                                                  (long)partitions.size(),
                                                  ActiveRepairService.UNREPAIRED_SSTABLE,
                                                  sstableMetadataCollector,
-                                                 new SerializationHeader(cfs.metadata, columns, stats),
+                                                 new SerializationHeader(true, cfs.metadata, columns, stats),
                                                  txn));
         }
     }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/424b59ad/src/java/org/apache/cassandra/db/SerializationHeader.java
--
diff --git a/src/java/org/apache/cassandra/db/SerializationHeader.java 
b/src/java/org/apache/cassandra/db/SerializationHeader.java
index decac49..0706d06 100644
--- a/src/java/org/apache/cassandra/db/SerializationHeader.java
+++ b/src/java/org/apache/cassandra/db/SerializationHeader.java
@@ -45,6 +45,8 @@ public class SerializationHeader
 {
     public static final Serializer serializer = new Serializer();
 
+    private final boolean isForSSTable;
+
     private final AbstractType<?> keyType;
     private final List<AbstractType<?>> clusteringTypes;
 
@@ -53,12 +55,14 @@ public class SerializationHeader
 
     private final Map<ByteBuffer, AbstractType<?>> typeMap;
 
-    private SerializationHeader(AbstractType<?> keyType,
+    private SerializationHeader(boolean isForSSTable,
+                                AbstractType<?> keyType,
                                 List<AbstractType<?>> clusteringTypes,
                                 PartitionColumns columns,
                                 EncodingStats stats,
                                 Map<ByteBuffer, AbstractType<?>> typeMap)
     {
+        this.isForSSTable = isForSSTable;
         this.keyType = keyType;
         this.clusteringTypes = clusteringTypes;
         this.columns = columns;
@@ -77,7 +81,8 @@ public class SerializationHeader
         List<AbstractType<?>> clusteringTypes = new ArrayList<>(size);
         for (int i = 0; i < size; i++)
             clusteringTypes.add(BytesType.instance);
-        return new SerializationHeader(BytesType.instance,
+        return new SerializationHeader(false,
+                                       BytesType.instance,
                                        clusteringTypes,
                                        PartitionColumns.NONE,
                                        EncodingStats.NO_STATS,
@@ -108,14 +113,16 @@ public class SerializationHeader
             else
                 columns.addAll(sstable.header.columns());
         }
-        return new SerializationHeader(metadata, columns.build(), stats.get());
+        return new SerializationHeader(true, metadata, columns.build(), stats.get());
     }
 
-public SerializationHeader(CFMetaData metadata,
+public

[2/2] cassandra git commit: Record previous row size

2015-09-23 Thread slebresne
Record previous row size


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/525855d2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/525855d2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/525855d2

Branch: refs/heads/10378
Commit: 525855d2f37b2fe9376b4ce2dab9107d0d227f6a
Parents: 424b59a
Author: Sylvain Lebresne 
Authored: Wed Sep 23 14:36:04 2015 -0700
Committer: Sylvain Lebresne 
Committed: Wed Sep 23 14:36:04 2015 -0700

--
 .../org/apache/cassandra/db/ColumnIndex.java| 10 ++-
 .../rows/UnfilteredRowIteratorSerializer.java   |  2 +
 .../cassandra/db/rows/UnfilteredSerializer.java | 69 +++-
 3 files changed, 60 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/ColumnIndex.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java 
b/src/java/org/apache/cassandra/db/ColumnIndex.java
index add5fa7..ede3f79 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -76,6 +76,7 @@ public class ColumnIndex
         private long startPosition = -1;
 
         private int written;
+        private long previousRowStart;
 
         private ClusteringPrefix firstClustering;
         private ClusteringPrefix lastClustering;
@@ -99,7 +100,7 @@ public class ColumnIndex
             ByteBufferUtil.writeWithShortLength(iterator.partitionKey().getKey(), writer);
             DeletionTime.serializer.serialize(iterator.partitionLevelDeletion(), writer);
             if (header.hasStatic())
-                UnfilteredSerializer.serializer.serialize(iterator.staticRow(), header, writer, version);
+                UnfilteredSerializer.serializer.serializeStaticRow(iterator.staticRow(), header, writer, version);
         }
 
         public ColumnIndex build() throws IOException
@@ -131,15 +132,18 @@ public class ColumnIndex
 
 private void add(Unfiltered unfiltered) throws IOException
 {
+long pos = currentPosition();
+
 if (firstClustering == null)
 {
 // Beginning of an index block. Remember the start and position
 firstClustering = unfiltered.clustering();
-startPosition = currentPosition();
+startPosition = pos;
 }
 
-        UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, version);
+        UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, pos - previousRowStart, version);
 lastClustering = unfiltered.clustering();
+previousRowStart = pos;
 ++written;
 
 if (unfiltered.kind() == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
--
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
index 3c5cdbf..3a0558e 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
@@ -90,6 +90,8 @@ public class UnfilteredRowIteratorSerializer
 // Should only be used for the on-wire format.
 public void serialize(UnfilteredRowIterator iterator, SerializationHeader header, ColumnFilter selection, DataOutputPlus out, int version, int rowEstimate) throws IOException
 {
+assert !header.isForSSTable();
+
 ByteBufferUtil.writeWithVIntLength(iterator.partitionKey().getKey(), out);
 
 int flags = 0;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/525855d2/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
--
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
index 1f77529..fac8863 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
@@ -92,17 +92,31 @@ public class UnfilteredSerializer
 public void serialize(Unfiltered unfiltered, SerializationHeader header, DataOutputPlus out, int version)
 throws IOException
 {
+assert !header.isForSSTable();
+serialize(unfiltered, header, out, 0, version);
+}
+
+public void serialize(Unfiltered unfiltered, SerializationHeader header, 
DataOutputPl

[jira] [Assigned] (CASSANDRA-10298) Replaced dead node stayed in gossip forever

2015-09-23 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu reassigned CASSANDRA-10298:
-

Assignee: Dikang Gu

> Replaced dead node stayed in gossip forever
> ---
>
> Key: CASSANDRA-10298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10298
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Attachments: CASSANDRA-10298.patch
>
>
> The dead node stayed in the nodetool status,
> DN  10.210.165.55  379.76 GB  256  ?  null
> And in the log, it throws NPE when trying to remove it.
> {code}
> 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread Thread[GossipStage:1,5,main]
> 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null
> 2015-09-10_06:41:22.92455   at org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100)
> 2015-09-10_06:41:22.92455   at org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201)
> 2015-09-10_06:41:22.92455   at org.apache.cassandra.service.StorageService.excise(StorageService.java:1886)
> 2015-09-10_06:41:22.92455   at org.apache.cassandra.service.StorageService.excise(StorageService.java:1902)
> 2015-09-10_06:41:22.92456   at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805)
> 2015-09-10_06:41:22.92457   at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473)
> 2015-09-10_06:41:22.92457   at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099)
> 2015-09-10_06:41:22.92457   at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009)
> 2015-09-10_06:41:22.92458   at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085)
> 2015-09-10_06:41:22.92458   at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
> 2015-09-10_06:41:22.92458   at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
> 2015-09-10_06:41:22.92459   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45]
> 2015-09-10_06:41:22.92460   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_45]
> {code}
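The NPE above comes from {{UUIDGen.decompose}} being handed a null host id for the already-excised node when {{deleteHintsForEndpoint}} runs. A minimal, hypothetical Java sketch of the obvious defensive fix (all names here are stand-ins for illustration, not the real Cassandra classes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: guard against a missing host id before deleting hints,
// instead of letting UUIDGen.decompose(null) throw deep in the call chain.
public class HintCleanupSketch
{
    // stands in for TokenMetadata's endpoint -> host id mapping
    static final Map<String, UUID> hostIds = new HashMap<>();

    /** Returns true if hints were deleted, false if the node has no host id. */
    static boolean deleteHintsForEndpoint(String endpoint)
    {
        UUID hostId = hostIds.get(endpoint);
        if (hostId == null)
            return false; // dead/replaced node with no host id: nothing to delete
        // ... would delete the hint partitions keyed by hostId here ...
        return true;
    }

    public static void main(String[] args)
    {
        hostIds.put("10.0.0.1", UUID.randomUUID());
        System.out.println(deleteHintsForEndpoint("10.0.0.1"));      // known node
        System.out.println(deleteHintsForEndpoint("10.210.165.55")); // unknown node, no NPE
    }
}
```

The attached CASSANDRA-10298.patch may take a different approach; this only illustrates the null-guard idea.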



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values

2015-09-23 Thread Steven Warren (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905226#comment-14905226
 ] 

Steven Warren commented on CASSANDRA-4386:
--

I see, that makes sense!

> Allow cql to use the IN syntax on secondary index values
> 
>
> Key: CASSANDRA-4386
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4386
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremy Hanna
>Assignee: Benjamin Lerer
>Priority: Minor
>  Labels: cql
>
> Currently CQL has a syntax for using IN to get a set of rows with a set of 
> keys.  This would also be very helpful for use with columns with secondary 
> indexes on them.  Such as:
> {code}
> select * from users where first_name in ('françois','frank');
> {code}





[jira] [Updated] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-10228:
---
Assignee: Paul MacIntosh

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Paul MacIntosh
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.
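Checking "all of the exceptions in an exception graph" could look like the following: walk both causes and suppressed exceptions with an identity-based visited set so cyclic graphs terminate. {{anyMatch}} is a hypothetical name, not the actual JVMStabilityInspector API:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of traversing the whole exception graph (causes and suppressed
// exceptions), not just the outermost throwable, before deciding the
// JVM is still stable.
public class ExceptionGraph
{
    public static boolean anyMatch(Throwable root, Predicate<Throwable> isUnstable)
    {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty())
        {
            Throwable t = stack.pop();
            if (!seen.add(t))
                continue; // exception graphs can contain cycles
            if (isUnstable.test(t))
                return true;
            if (t.getCause() != null)
                stack.push(t.getCause());
            for (Throwable s : t.getSuppressed())
                stack.push(s);
        }
        return false;
    }

    public static void main(String[] args)
    {
        // An OutOfMemoryError hidden as the cause of a harmless wrapper
        Throwable wrapped = new RuntimeException(new OutOfMemoryError("oom"));
        System.out.println(anyMatch(wrapped, t -> t instanceof OutOfMemoryError));
        System.out.println(anyMatch(new RuntimeException(), t -> t instanceof OutOfMemoryError));
    }
}
```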





[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905220#comment-14905220
 ] 

Ariel Weisberg commented on CASSANDRA-10228:


My version, turns out there is no merge pain. +1 on the contents. Waiting on CI.
[2.1 
branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.1?expand=1]
[2.2 
branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.2?expand=1]
[3.0 
branch|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10228-2.1?expand=1]

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.





[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Paul MacIntosh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905210#comment-14905210
 ] 

Paul MacIntosh commented on CASSANDRA-10228:


https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.





[jira] [Issue Comment Deleted] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Paul MacIntosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul MacIntosh updated CASSANDRA-10228:
---
Comment: was deleted

(was: 
https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1)

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.





[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Paul MacIntosh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905208#comment-14905208
 ] 

Paul MacIntosh commented on CASSANDRA-10228:


Implemented: 
https://github.com/apache/cassandra/compare/trunk...macintoshio:CASSANDRA-10228?expand=1

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.





[jira] [Updated] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-23 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-10228:
---
Reviewer: Ariel Weisberg

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.





[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission

2015-09-23 Thread John Sumsion (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905182#comment-14905182
 ] 

John Sumsion commented on CASSANDRA-5780:
-

Here is a branch on trunk:
- https://github.com/jdsumsion/cassandra/tree/5780-decomission-truncate-system

> nodetool status and ring report incorrect/stale information after decommission
> --
>
> Key: CASSANDRA-5780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5780
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Peter Haggerty
>Priority: Trivial
>  Labels: lhf, ponies, qa-resolved
> Fix For: 2.1.x
>
>
> Cassandra 1.2.6 ring of 12 instances, each with 256 tokens.
> Decommission 3 of the 12 nodes, one after another, resulting in a 9 instance ring.
> The 9 instances of cassandra that are in the ring all correctly report 
> nodetool status information for the ring and have the same data.
> After the first node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> After the second node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> After the third node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> "nodetool status" on "decommissioned-3rd" reports 9 nodes
> The storage load information is similarly stale on the various decommissioned 
> nodes. The nodetool status and ring commands continue to return information 
> as if they were part of a cluster and they appear to return the last 
> information that they saw.
> In contrast the nodetool info command fails with an exception, which isn't 
> ideal but at least indicates that there was a failure rather than returning 
> stale information.





[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission

2015-09-23 Thread John Sumsion (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905133#comment-14905133
 ] 

John Sumsion commented on CASSANDRA-5780:
-

I'm working on this on trunk; the patch will not be JDK 1.8-specific, to ease 
backporting, since this is open for 1.2, 2.x, and trunk.

> nodetool status and ring report incorrect/stale information after decommission
> --
>
> Key: CASSANDRA-5780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5780
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Peter Haggerty
>Priority: Trivial
>  Labels: lhf, ponies, qa-resolved
> Fix For: 2.1.x
>
>
> Cassandra 1.2.6 ring of 12 instances, each with 256 tokens.
> Decommission 3 of the 12 nodes, one after another, resulting in a 9 instance ring.
> The 9 instances of cassandra that are in the ring all correctly report 
> nodetool status information for the ring and have the same data.
> After the first node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> After the second node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> After the third node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> "nodetool status" on "decommissioned-3rd" reports 9 nodes
> The storage load information is similarly stale on the various decommissioned 
> nodes. The nodetool status and ring commands continue to return information 
> as if they were part of a cluster and they appear to return the last 
> information that they saw.
> In contrast the nodetool info command fails with an exception, which isn't 
> ideal but at least indicates that there was a failure rather than returning 
> stale information.





[jira] [Commented] (CASSANDRA-9967) Determine if a Materialized View is finished building, without having to query each node

2015-09-23 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905074#comment-14905074
 ] 

Carl Yeksigian commented on CASSANDRA-9967:
---

A few ideas if someone wants to pick this up:

- We should use the {{system_distributed}} keyspace for this, and I think the 
primary key should be: {{(table_id, host_id)}}
- We should retry updating the table if we don't succeed, and make sure that on 
startup we have captured all of the builds that we have locally in the 
distributed table
- We need to make sure that we handle node membership properly
- We should make sure that we set the exit code if the view isn't built yet
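The "retry updating the table if we don't succeed" bullet above could be backed by a small retry helper. This is a hedged, stdlib-only sketch, not the eventual patch; the real code would wrap the write to the {{system_distributed}} table:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Minimal sketch of retrying an operation (e.g. the distributed-table update
// marking a view build complete) until it succeeds or attempts run out.
public class RetrySketch
{
    static <T> T retry(int attempts, Supplier<T> op)
    {
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                return op.get();
            }
            catch (RuntimeException e)
            {
                last = e; // a real implementation would also back off and log
            }
        }
        throw last;
    }

    public static void main(String[] args)
    {
        AtomicInteger calls = new AtomicInteger();
        // fails twice (e.g. a write timeout), succeeds on the third attempt
        String result = retry(5, () -> {
            if (calls.incrementAndGet() < 3)
                throw new RuntimeException("timeout");
            return "built";
        });
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```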

> Determine if a Materialized View is finished building, without having to 
> query each node
> 
>
> Key: CASSANDRA-9967
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9967
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Alan Boudreault
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> Since MVs are eventually consistent with their base table, it would be nice if we 
> could easily know the state of the MV after its creation, so we could wait 
> until the MV is built before doing some operations.
> // cc [~mbroecheler] [~tjake] [~carlyeks] [~enigmacurry]





[jira] [Commented] (CASSANDRA-10031) Name threads for improved ops/debugging

2015-09-23 Thread clint martin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904750#comment-14904750
 ] 

clint martin commented on CASSANDRA-10031:
--

Would it be better to include this sort of information in the slf4j MDC, rather 
than altering the thread name every time a thread enters some task? That way 
logging can be configured as needed per-class/task rather than forcing thread 
naming conventions.

> Name threads for improved ops/debugging
> ---
>
> Key: CASSANDRA-10031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10031
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> We currently provide basic names for threads, like {{STREAM-IN-1}}, which 
> gives some basic information about what the job of the thread is.  
> When looking at a log statement or jstack it's helpful to have this context.
> For our work stealing thread pool we share threads across all thread pools so 
> we lose this insight.  
> I'd like to propose we start using Thread.currentThread().setName("")
> in different aspects of the code to improve insight as to what cassandra is 
> doing at any given moment:
>* At a minimum, at the start of each run() method; ideally for much finer grain things.
>* In compaction, include the partition name currently being worked on.
>* In SP, include the client ip.
> Etc...
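The proposal above can be sketched as a wrapper that names the thread for the duration of a task and restores the previous name afterwards, so shared worker-pool threads don't keep stale names. This is a sketch only, with a made-up name format, not the committed implementation:

```java
// Set a descriptive thread name while a task runs, then restore the old one,
// so jstack and log output show useful context without leaking names between
// tasks on a shared pool.
public class NamedTask
{
    static Runnable named(String name, Runnable task)
    {
        return () -> {
            Thread t = Thread.currentThread();
            String previous = t.getName();
            t.setName(name);
            try
            {
                task.run();
            }
            finally
            {
                t.setName(previous); // never leave a stale name on a pooled thread
            }
        };
    }

    public static void main(String[] args)
    {
        String before = Thread.currentThread().getName();
        named("CompactionExecutor:partition=users:pk42",
              () -> System.out.println(Thread.currentThread().getName())).run();
        System.out.println(before.equals(Thread.currentThread().getName()));
    }
}
```

The MDC suggestion in the comment above is an alternative: the context would live in the logging framework instead of the thread name, at the cost of not showing up in jstack.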





[jira] [Commented] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information

2015-09-23 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904720#comment-14904720
 ] 

Philip Thompson commented on CASSANDRA-10074:
-

This is only in help text, so I didn't bother running CI on this.

> cqlsh HELP SELECT_EXPR gives outdated incorrect information
> ---
>
> Key: CASSANDRA-10074
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10074
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: 3.0.0-alpha1-SNAPSHOT
>Reporter: Jim Meyer
>Assignee: Philip Thompson
>Priority: Trivial
>  Labels: cqlsh, lhf
> Fix For: 3.x
>
> Attachments: 10074.txt
>
>
> Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function 
> supported by CQL.
> It is missing a description of the SUM, AVG, MIN, and MAX built in functions.
> It should probably also mention that user defined functions can be invoked 
> via SELECT.
> The outdated text is in pylib/cqlshlib/helptopics.py under def 
> help_select_expr





[jira] [Assigned] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information

2015-09-23 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson reassigned CASSANDRA-10074:
---

Assignee: Philip Thompson

> cqlsh HELP SELECT_EXPR gives outdated incorrect information
> ---
>
> Key: CASSANDRA-10074
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10074
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: 3.0.0-alpha1-SNAPSHOT
>Reporter: Jim Meyer
>Assignee: Philip Thompson
>Priority: Trivial
>  Labels: cqlsh, lhf
> Fix For: 3.x
>
>
> Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function 
> supported by CQL.
> It is missing a description of the SUM, AVG, MIN, and MAX built in functions.
> It should probably also mention that user defined functions can be invoked 
> via SELECT.
> The outdated text is in pylib/cqlshlib/helptopics.py under def 
> help_select_expr





[jira] [Updated] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information

2015-09-23 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10074:

Attachment: 10074.txt

> cqlsh HELP SELECT_EXPR gives outdated incorrect information
> ---
>
> Key: CASSANDRA-10074
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10074
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: 3.0.0-alpha1-SNAPSHOT
>Reporter: Jim Meyer
>Assignee: Philip Thompson
>Priority: Trivial
>  Labels: cqlsh, lhf
> Fix For: 3.x
>
> Attachments: 10074.txt
>
>
> Within cqlsh, the HELP SELECT_EXPR states that COUNT is the only function 
> supported by CQL.
> It is missing a description of the SUM, AVG, MIN, and MAX built in functions.
> It should probably also mention that user defined functions can be invoked 
> via SELECT.
> The outdated text is in pylib/cqlshlib/helptopics.py under def 
> help_select_expr





[jira] [Created] (CASSANDRA-10391) sstableloader fails with client SSL enabled with non-standard keystore/truststore location

2015-09-23 Thread Jon Moses (JIRA)
Jon Moses created CASSANDRA-10391:
-

 Summary: sstableloader fails with client SSL enabled with 
non-standard keystore/truststore location
 Key: CASSANDRA-10391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10391
 Project: Cassandra
  Issue Type: Bug
 Environment: [cqlsh 4.1.1 | Cassandra 2.0.14.425 | DSE 4.6.6 | CQL 
spec 3.1.1 | Thrift protocol 19.39.0]

[cqlsh 5.0.1 | Cassandra 2.1.8.689 | DSE 4.7.3 | CQL spec 3.2.0 | Native 
protocol v3]
Reporter: Jon Moses


If client SSL is enabled, sstableloader is unable to access the keystore and 
truststore if they are not in the expected locations.  I can reproduce this issue 
both by providing {{-f /path/to/cassandra.yaml}} and by manually using the {{-ks}} 
flag with the proper path to the keystore.

For example:

{noformat}
client_encryption_options:
enabled: true
keystore: /var/tmp/.keystore
{noformat}

{noformat}
# sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml Keyspace1/Standard1/
Could not retrieve endpoint ranges:
java.io.FileNotFoundException: /usr/share/dse/conf/.keystore
Run with --debug to get full stack trace or --help to get help.
#
# sstableloader -d 172.31.2.240,172.31.2.241 -ks /var/tmp/.keystore Keyspace1/Standard1/
Could not retrieve endpoint ranges:
java.io.FileNotFoundException: /usr/share/dse/conf/.keystore
Run with --debug to get full stack trace or --help to get help.
#
{noformat}

The full stack is:

{noformat}
# sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/
Could not retrieve endpoint ranges:
java.io.FileNotFoundException: /usr/share/dse/conf/.keystore
java.lang.RuntimeException: Could not retrieve endpoint ranges:
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
Caused by: java.io.FileNotFoundException: /usr/share/dse/conf/.keystore
        at com.datastax.bdp.transport.client.TClientSocketFactory.getSSLSocket(TClientSocketFactory.java:128)
        at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:114)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:186)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:120)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:111)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.createThriftClient(BulkLoader.java:302)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:254)
        ... 2 more
root@ip-172-31-2-240:/tmp/foo#
{noformat}

If I copy the keystore to the expected location, I get the same error with the 
truststore.

{noformat}
# sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/
Could not retrieve endpoint ranges:
java.io.FileNotFoundException: /usr/share/dse/conf/.truststore
java.lang.RuntimeException: Could not retrieve endpoint ranges:
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
Caused by: java.io.FileNotFoundException: /usr/share/dse/conf/.truststore
        at com.datastax.bdp.transport.client.TClientSocketFactory.getSSLSocket(TClientSocketFactory.java:130)
        at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:114)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:186)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:120)
        at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:111)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.createThriftClient(BulkLoader.java:302)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:254)
        ... 2 more
#
{noformat}

If I copy the truststore, it finds them both, but then fails to open them due 
to what I assume is a password error, even though it's present in the 
cassandra.yaml.

{noformat}
# sstableloader -d 172.31.2.240,172.31.2.241 -f /etc/dse/cassandra/cassandra.yaml --debug Keyspace1/Standard1/
Could not retrieve endpoint ranges:
java.io.IOException: Failed to open transport to: 172.31.2.240:9160
java.lang.RuntimeException: Could not retrieve endpoint range

[jira] [Commented] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values

2015-09-23 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904696#comment-14904696
 ] 

Benjamin Lerer commented on CASSANDRA-4386:
---

{quote}Wouldn't the results come back in secondary index order though?{quote}

I have not started working on this ticket but what I would expect is that:

If you have 2 index entries on the same node for values "A" and "B", where
"A" is in 3 rows with the primary keys pk1, pk5, pk8, and
"B" is in 2 rows with primary keys pk2 and pk3,
what you will get will probably be: pk1, pk5, pk8, pk2 and pk3.
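The expected ordering can be modelled with a toy in-memory index: keys come back grouped by indexed value, in index order, rather than in overall primary-key order. This is illustrative only, not Cassandra's actual read path:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: per node, a secondary index maps an indexed value to the
// partition keys containing it. An IN query that scans index entries
// value-by-value yields keys grouped by indexed value.
public class IndexInSketch
{
    static List<String> queryIn(Map<String, List<String>> index, List<String> values)
    {
        List<String> result = new ArrayList<>();
        for (String v : values)
            result.addAll(index.getOrDefault(v, List.of()));
        return result;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> index = new LinkedHashMap<>();
        index.put("A", List.of("pk1", "pk5", "pk8"));
        index.put("B", List.of("pk2", "pk3"));
        System.out.println(queryIn(index, List.of("A", "B")));
    }
}
```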

> Allow cql to use the IN syntax on secondary index values
> 
>
> Key: CASSANDRA-4386
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4386
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremy Hanna
>Assignee: Benjamin Lerer
>Priority: Minor
>  Labels: cql
>
> Currently CQL has a syntax for using IN to get a set of rows with a set of 
> keys.  This would also be very helpful for use with columns with secondary 
> indexes on them.  Such as:
> {code}
> select * from users where first_name in ('françois','frank');
> {code}





[jira] [Updated] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs

2015-09-23 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10390:

Assignee: Benjamin Lerer

> inconsistent quoted identifier handling in UDTs
> ---
>
> Key: CASSANDRA-10390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10390
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.2.1
>Reporter: Jonathan Halliday
>Assignee: Benjamin Lerer
> Fix For: 2.2.x
>
>
> > create keyspace test with replication = {'class': 'SimpleStrategy', 
> > 'replication_factor': 1 } ;
> > create type if not exists mytype ("my.field" text);
> > desc keyspace; -- observe that mytype is listed
> > create table mytable (pk int primary key, myfield frozen&lt;mytype&gt;);
> > desc keyspace; -- observe that mytype is listed, but mytable is not.
> > select * from mytable;
> ValueError: Type names and field names can only contain alphanumeric 
> characters and underscores: 'my.field'
> create table myothertable (pk int primary key, "my.field" text);
> select * from myothertable; -- valid
> huh? It's valid to create a field of a table, or a field of a type, with a 
> quoted name containing non-alpha chars, but it's not valid to use such a 
> type in a table?  I can just about live with that though it seems 
> unnecessarily restrictive, but allowing creation of such a table and then 
> making it invisible/unusable definitely seems wrong.





[jira] [Updated] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs

2015-09-23 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10390:

Reproduced In: 2.2.1
Fix Version/s: 2.2.x

> inconsistent quoted identifier handling in UDTs
> ---
>
> Key: CASSANDRA-10390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10390
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.2.1
>Reporter: Jonathan Halliday
> Fix For: 2.2.x
>
>
> > create keyspace test with replication = {'class': 'SimpleStrategy', 
> > 'replication_factor': 1 } ;
> > create type if not exists mytype ("my.field" text);
> > desc keyspace; -- observe that mytype is listed
> > create table mytable (pk int primary key, myfield frozen&lt;mytype&gt;);
> > desc keyspace; -- observe that mytype is listed, but mytable is not.
> > select * from mytable;
> ValueError: Type names and field names can only contain alphanumeric 
> characters and underscores: 'my.field'
> create table myothertable (pk int primary key, "my.field" text);
> select * from myothertable; -- valid
> huh? It's valid to create a field of a table, or a field of a type, with a 
> quoted name containing non-alpha chars, but it's not valid to use such a 
> type in a table? I can just about live with that, though it seems 
> unnecessarily restrictive, but allowing creation of such a table and then 
> making it invisible/unusable definitely seems wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-4386) Allow cql to use the IN syntax on secondary index values

2015-09-23 Thread Steven Warren (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903314#comment-14903314
 ] 

Steven Warren edited comment on CASSANDRA-4386 at 9/23/15 2:43 PM:
---

I am fine with unordered rows for the IN clause; the current alternative of 
issuing parallel queries also returns unordered results. I don't have a use 
case for the other operators, but assume that would be fine versus the 
alternative of not being supported at all.

EDIT: Wouldn't the results come back in secondary index order though?


was (Author: swarren):
I am fine with unordered rows for the IN clause, the current alternative with 
parallel queries also returns unordered results. I don't have a use case for 
the other operators, but assume that would be fine vs the alternative of not 
being supported.

> Allow cql to use the IN syntax on secondary index values
> 
>
> Key: CASSANDRA-4386
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4386
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremy Hanna
>Assignee: Benjamin Lerer
>Priority: Minor
>  Labels: cql
>
> Currently CQL has a syntax for using IN to get a set of rows with a set of 
> keys.  This would also be very helpful for use with columns with secondary 
> indexes on them.  Such as:
> {code}
> select * from users where first_name in ('françois','frank');
> {code}
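The parallel-query workaround mentioned in the comment above can be sketched as follows. This is a hypothetical illustration, not driver code: `fetch_by_first_name` stands in for a real per-value `SELECT ... WHERE first_name = ?` call, and the data is an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_by_first_name(rows, name):
    # Placeholder for: SELECT * FROM users WHERE first_name = ?
    # (an equality query against the secondary index, one per IN value)
    return [r for r in rows if r["first_name"] == name]

def select_where_in(rows, names):
    """Emulate WHERE first_name IN (...) with one parallel query per value."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch_by_first_name, rows, n) for n in names]
        out = []
        for f in futures:
            out.extend(f.result())
    # Rows come back grouped per queried value; there is no overall ordering
    # guarantee, matching the "unordered results" point in the comment above.
    return out
```

A server-side IN would replace the fan-out but, per the discussion, would likewise not promise a total ordering across the matched values.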



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2015-09-23 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904474#comment-14904474
 ] 

Yuki Morishita commented on CASSANDRA-10389:


What kind of error do you see on the replica nodes (on cblade1 or the other 
nodes that failed validation)?

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
>
> I'm running a repair on a ring of nodes that was recently extended from 3 to 
> 13 nodes. The extension was done two days ago; the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> 

[jira] [Comment Edited] (CASSANDRA-10280) Make DTCS work well with old data

2015-09-23 Thread Antti Nissinen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904458#comment-14904458
 ] 

Antti Nissinen edited comment on CASSANDRA-10280 at 9/23/15 12:53 PM:
--

I am also voting for discarding max_sstable_age_days and limiting the 
compaction window size in DTCS. If DTCS is going to receive major 
modifications, then it would be beneficial to adopt some of the ideas from 
TWCS and also to take into account the practical viewpoints presented in 
several Jira items:

- limiting the window size in DTCS (this item, 
[CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280])

- using STCS in the newest window, or when the number of files exceeds 
max_threshold 
([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276], 
[CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666])

- while compacting a large number of files, start from small ones and progress 
towards larger ones (especially in the case of small sstables originating from 
repair operations) 
[CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597]

- setting limits on the number of files compacted in one shot, based on the 
sum of file sizes (to avoid compacting several large files at once and running 
out of disk space during the operation) 
[CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]

- a round-robin approach for selecting the compaction window inside which the 
next compaction will be executed. The target is to get rid of small files as 
soon as possible. At the moment TWCS and DTCS work on newer windows and only 
progress towards the history when finished with the current one 
[CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]

Should we actually create a Jira item where we would collect the ideas for an 
"ultimate time series compaction strategy" for more detailed discussion? At 
the moment these ideas are scattered across different items, and the above 
list is probably missing many relevant points.

Another important goal (our wish) for a time series database is to be able to 
wipe data efficiently so that disk space is released as soon as possible. I 
tried to describe those ideas in 
[CASSANDRA-10306|https://issues.apache.org/jira/browse/CASSANDRA-10306], but 
there are no comments yet on that item. The main idea was to have the 
possibility to split SSTables along a certain timeline on all nodes, so that 
SSTables could be dropped (as with TTL in DTCS and TWCS) or archived on 
different media where they can be dug up some day if really needed. Deleting 
data efficiently on demand is presently one of the biggest obstacles to using 
C* for time series data collection in closed environments with fairly limited 
hardware resources. TTL is a working solution when you can predict data 
collection demands well beforehand and have additional resources available if 
predictions don't match reality. 

What are the biggest obstacles in the present architecture for the scenario 
below?
- Decide a timestamp for the data deletion / archiving.
- All existing SSTables on each node would be split into two files along that 
timeline if the SSTable covers data on both sides of it.
- SSTables falling behind the timeline would be deactivated from the SSTable 
set (no longer participating in compactions or serving queries).
- You can decide whether to copy the files somewhere else or simply delete 
them.
- This tool could be driven through nodetool with an external script.


was (Author: anissinen):
I am also voting for discarding the max_sstable_age_days and limiting the 
compaction window size in DTCS. If the DTCS will have a major modifications 
then adopting the some of the ideas from TWCS would be beneficial and also 
trying to take into account the practical view points presented in several Jira 
items:

- limiting the window size in DTCS (this item, 
[CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280])

- using STCS in the newest window or if the amount of files exceeds the 
max_threshold 
([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276],[CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666])

- while compacting a large amount of files, start from small ones and progress 
towards larger ones (especially in the case of small sstables originated from 
repair operations) 
[CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597]

- setting limits for number of files compacted in one shot based on the sum of 
files sizes (not trying to compact several large files at ones and running out 
of disk space during the operation) 
[CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]

- round-robin approach for the selection of compaction window inside which next 
compa

[jira] [Commented] (CASSANDRA-10280) Make DTCS work well with old data

2015-09-23 Thread Antti Nissinen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904458#comment-14904458
 ] 

Antti Nissinen commented on CASSANDRA-10280:


I am also voting for discarding max_sstable_age_days and limiting the 
compaction window size in DTCS. If DTCS is going to receive major 
modifications, then it would be beneficial to adopt some of the ideas from 
TWCS and also to take into account the practical viewpoints presented in 
several Jira items:

- limiting the window size in DTCS (this item, 
[CASSANDRA-10280|https://issues.apache.org/jira/browse/CASSANDRA-10280])

- using STCS in the newest window, or when the number of files exceeds 
max_threshold 
([CASSANDRA-10276|https://issues.apache.org/jira/browse/CASSANDRA-10276],[CASSANDRA-9666|https://issues.apache.org/jira/browse/CASSANDRA-9666])

- while compacting a large number of files, start from small ones and progress 
towards larger ones (especially in the case of small sstables originating from 
repair operations) 
[CASSANDRA-9597|https://issues.apache.org/jira/browse/CASSANDRA-9597]

- setting limits on the number of files compacted in one shot, based on the 
sum of file sizes (to avoid compacting several large files at once and running 
out of disk space during the operation) 
[CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]

- a round-robin approach for selecting the compaction window inside which the 
next compaction will be executed. The target is to get rid of small files as 
soon as possible. At the moment TWCS and DTCS work on newer windows and only 
progress towards the history when finished with the current one 
[CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]

Should we actually create a Jira item where we would collect the ideas for an 
"ultimate time series compaction strategy" for more detailed discussion? At 
the moment these ideas are scattered across different items, and the above 
list is probably missing many relevant points.

Another important goal (our wish) for a time series database is to be able to 
wipe data efficiently so that disk space is released as soon as possible. I 
tried to describe those ideas in 
[CASSANDRA-10306|https://issues.apache.org/jira/browse/CASSANDRA-10306], but 
there are no comments yet on that item. The main idea was to have the 
possibility to split SSTables along a certain timeline on all nodes, so that 
SSTables could be dropped (as with TTL in DTCS and TWCS) or archived on 
different media where they can be dug up some day if really needed. Deleting 
data efficiently on demand is presently one of the biggest obstacles to using 
C* for time series data collection in closed environments with fairly limited 
hardware resources. TTL is a working solution when you can predict data 
collection demands well beforehand and have additional resources available if 
predictions don't match reality. 

What are the biggest obstacles in the present architecture for the scenario 
below?
- Decide a timestamp for the data deletion / archiving.
- All existing SSTables on each node would be split into two files along that 
timeline if the SSTable covers data on both sides of it.
- SSTables falling behind the timeline would be deactivated from the SSTable 
set (no longer participating in compactions or serving queries).
- You can decide whether to copy the files somewhere else or simply delete 
them.
- This tool could be driven through nodetool with an external script.

> Make DTCS work well with old data
> -
>
> Key: CASSANDRA-10280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10280
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> Operational tasks become incredibly expensive if you keep around a long 
> timespan of data with DTCS - with default settings and 1 year of data, the 
> oldest window covers about 180 days. Bootstrapping a node with vnodes with 
> this data layout will force cassandra to compact very many sstables in this 
> window.
> We should probably put a cap on how big the biggest windows can get. We 
> could probably default this to something sane based on max_sstable_age 
> (i.e., say we can reasonably handle 1000 sstables per node; then we can 
> calculate how big the windows should be to allow that).
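The sizing idea in the description above can be sketched with a back-of-the-envelope calculation: given a retention span and a budget of sstables a node can reasonably handle, derive a lower bound on the window size. The function and parameter names are illustrative assumptions, not Cassandra configuration options.

```python
SECONDS_PER_DAY = 86400

def capped_window_seconds(retention_days, max_sstables_per_node):
    """Smallest window size (in seconds) that keeps the window count, and
    hence roughly the steady-state sstable count, within budget."""
    retention_s = retention_days * SECONDS_PER_DAY
    # With about one sstable per fully-compacted window, covering the whole
    # retention span with at most `max_sstables_per_node` windows requires
    # windows of at least retention / budget seconds.
    return max(1, retention_s // max_sstables_per_node)

# One year of data and a 1000-sstable budget allow windows of ~8.8 hours:
capped_window_seconds(365, 1000)  # -> 31536 seconds
```

Under this rough model the cap grows linearly with retention, which is the opposite of the unbounded doubling that produces the 180-day oldest window described above.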



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10390) inconsistent quoted identifier handling in UDTs

2015-09-23 Thread Jonathan Halliday (JIRA)
Jonathan Halliday created CASSANDRA-10390:
-

 Summary: inconsistent quoted identifier handling in UDTs
 Key: CASSANDRA-10390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10390
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.2.1
Reporter: Jonathan Halliday


> create keyspace test with replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1 } ;
> create type if not exists mytype ("my.field" text);
> desc keyspace; -- observe that mytype is listed
> create table mytable (pk int primary key, myfield frozen);
> desc keyspace; -- observe that mytype is listed, but mytable is not.
> select * from mytable;
ValueError: Type names and field names can only contain alphanumeric characters 
and underscores: 'my.field'
create table myothertable (pk int primary key, "my.field" text);
select * from myothertable; -- valid

huh? It's valid to create a field of a table, or a field of a type, with a 
quoted name containing non-alpha chars, but it's not valid to use such a type 
in a table? I can just about live with that, though it seems unnecessarily 
restrictive, but allowing creation of such a table and then making it 
invisible/unusable definitely seems wrong.
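The ValueError quoted above is a client-side name check. A minimal sketch of that kind of check, under the assumption that it is a simple alphanumeric/underscore rule (the regex and function name here are illustrative, not the actual cqlsh/driver source):

```python
import re

# Assumed validation rule: identifiers must be alphanumeric or underscore.
VALID_NAME = re.compile(r'^[A-Za-z0-9_]+$')

def check_name(name):
    """Accept only alphanumeric/underscore type and field names."""
    if not VALID_NAME.match(name):
        raise ValueError(
            "Type names and field names can only contain alphanumeric "
            "characters and underscores: %r" % name)
    return name
```

Here `check_name("myfield")` passes while `check_name("my.field")` raises, which matches the reported behaviour: the server accepts the quoted field name at creation time, but a client applying a check like this rejects the type when reading the table back, leaving the table invisible/unusable.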



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10389) Repair session exception Validation failed

2015-09-23 Thread JIRA
Jędrzej Sieracki created CASSANDRA-10389:


 Summary: Repair session exception Validation failed
 Key: CASSANDRA-10389
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
compilation)
Reporter: Jędrzej Sieracki


I'm running a repair on a ring of nodes that was recently extended from 3 to 
13 nodes. The extension was done two days ago; the repair was attempted 
yesterday.

{quote}
[2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
perspectiv with repair options (parallelism: parallel, primary range: false, 
incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: 
[], # of ranges: 517)
[2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
for range (-5927186132136652665,-5917344746039874798] failed with error [repair 
#1f7c50c0-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.XXX/XXX (progress: 0%)
{quote}

BTW, I am ignoring the LEAK errors for now, that's outside of the scope of the 
main issue:
{quote}
ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
 was not released before the reference was garbage collected
{quote}

I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
perspectiv stock_increment_agg:

{quote}
INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 - 
Scrubbing 
BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
 (345466609 bytes)
INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 - 
Scrubbing 
BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
 (60496378 bytes)
ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) 
to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
 was not released before the reference was garbage collected
INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 - 
Scrub of 
BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
 complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 - 
Scrub of 
BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
 complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
{quote}

Now, after scrubbing, another repair was attempted; it did finish, but with 
lots of errors from other nodes:
{quote}
[2015-09-22 12:01:18,020] Repair session db476b51-6110-11e5-b992-9f13fa8664c8 
for range (5019296454787813261,5021512586040808168] failed with error [repair 
#db476b51-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(5019296454787813261,5021512586040808168]] Validation failed in /10.YYY 
(progress: 91%)
[2015

[jira] [Comment Edited] (CASSANDRA-10212) cassandra-env.sh may be sourced twice by debian init script

2015-09-23 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904220#comment-14904220
 ] 

Stefan Podkowinski edited comment on CASSANDRA-10212 at 9/23/15 9:25 AM:
-

Sourcing cassandra-env.sh twice will execute the JVM with duplicate JVM_OPTS 
arguments. Removing the sourcing of cassandra-env.sh from the init script 
should be safe, as the init script does not use JVM_OPTS directly anyway. 

Edit: actually, the arguments are not strictly equal in the following case:
{{-XX:CompileCommandFile=/hotspot_compiler 
-XX:CompileCommandFile=/etc/cassandra/hotspot_compiler}}

{{cassandra-env.sh}} expects {{CASSANDRA_CONF}} to be set for 
{{-XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler}}, which is not the 
case when sourcing from {{/etc/init.d/cassandra}}. 


was (Author: spo...@gmail.com):
Sourcing cassandra-env.sh twice will execute the jvm with duplicate JVM_OPTS 
arguments. Removing sourcing of cassandra-env.sh in the init script should be 
safe as the init script will not directly use JVM_OPTS anyway. 

> cassandra-env.sh may be sourced twice by debian init script
> ---
>
> Key: CASSANDRA-10212
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10212
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Enrico Canzonieri
>Assignee: Michael Shuler
>Priority: Minor
>
> It seems that when cassandra is run as a service using the init script the 
> {{/etc/cassandra/cassandra-env.sh}} file is sourced twice.
> This file is sourced the first time in the 
> [init|https://github.com/apache/cassandra/blob/trunk/debian/init] script. The 
> init script then executes 
> [{{/usr/sbin/cassandra}}|https://github.com/apache/cassandra/blob/trunk/bin/cassandra],
>  the latter eventually does source {{cassandra-env.sh}} as 
> {{$CASSANDRA_CONF/cassandra-env}}.
> CASSANDRA_CONF is finally defined in 
> [{{cassandra.in.sh}}|https://github.com/apache/cassandra/blob/trunk/debian/cassandra.in.sh]
>  as {{/etc/cassandra}}. 
> I guess in this case the init script should not source {{cassandra-env}} at 
> all.
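Besides dropping the source line from the init script, one alternative fix for the double-sourcing described above would be an include guard inside cassandra-env.sh itself, so that JVM_OPTS is appended only once no matter how many times the file is sourced. This is a sketch; the guard variable name is an assumption, not part of the shipped script:

```shell
#!/bin/sh
# Demo: write a guarded env file, source it twice, and show that the
# JVM_OPTS entry is appended only once.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# Include guard: a second sourcing returns immediately.
[ -n "$CASSANDRA_ENV_SOURCED" ] && return 0
CASSANDRA_ENV_SOURCED=1
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
EOF

CASSANDRA_CONF=/etc/cassandra
JVM_OPTS=""
. "$demo"
. "$demo"    # second sourcing is a no-op thanks to the guard
echo "$JVM_OPTS"    # the CompileCommandFile option appears once, not twice
rm -f "$demo"
```

Note the demo also sets CASSANDRA_CONF before sourcing, since (as the edited comment above points out) the option is wrong when that variable is unset.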



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10212) cassandra-env.sh may be sourced twice by debian init script

2015-09-23 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904220#comment-14904220
 ] 

Stefan Podkowinski commented on CASSANDRA-10212:


Sourcing cassandra-env.sh twice will execute the JVM with duplicate JVM_OPTS 
arguments. Removing the sourcing of cassandra-env.sh from the init script 
should be safe, as the init script does not use JVM_OPTS directly anyway. 

> cassandra-env.sh may be sourced twice by debian init script
> ---
>
> Key: CASSANDRA-10212
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10212
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Enrico Canzonieri
>Assignee: Michael Shuler
>Priority: Minor
>
> It seems that when cassandra is run as a service using the init script the 
> {{/etc/cassandra/cassandra-env.sh}} file is sourced twice.
> This file is sourced the first time in the 
> [init|https://github.com/apache/cassandra/blob/trunk/debian/init] script. The 
> init script then executes 
> [{{/usr/sbin/cassandra}}|https://github.com/apache/cassandra/blob/trunk/bin/cassandra],
>  the latter eventually does source {{cassandra-env.sh}} as 
> {{$CASSANDRA_CONF/cassandra-env}}.
> CASSANDRA_CONF is finally defined in 
> [{{cassandra.in.sh}}|https://github.com/apache/cassandra/blob/trunk/debian/cassandra.in.sh]
>  as {{/etc/cassandra}}. 
> I guess in this case the init script should not source {{cassandra-env}} at 
> all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)