[jira] [Created] (CASSANDRA-11976) cqlsh tab completion doesn't work in 2.1

2016-06-08 Thread Yusuke Takata (JIRA)
Yusuke Takata created CASSANDRA-11976:
-

 Summary: cqlsh tab completion doesn't work in 2.1
 Key: CASSANDRA-11976
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11976
 Project: Cassandra
  Issue Type: Bug
  Components: CQL
Reporter: Yusuke Takata
Priority: Minor


cqlsh tab completion doesn't work when there are two tables with the same 
prefix.
I found a similar completion issue in CASSANDRA-10733, but my problem is not 
fixed by that change in 2.1.
{code}
cqlsh> CREATE KEYSPACE sample_ks WITH replication = {'class': 'SimpleStrategy' 
, 'replication_factor': 1 };
cqlsh> CREATE TABLE sample_ks.tbl_a ( key text PRIMARY KEY );
cqlsh> CREATE TABLE sample_ks.tbl_b ( key text PRIMARY KEY, value int );

// works correctly
cqlsh> INSERT INTO sample_ks.tb
cqlsh> INSERT INTO sample_ks.tbl_

// fix required
cqlsh> INSERT INTO samp
cqlsh> INSERT INTO sample_ks.tbl_( 
{code}

Also, completion doesn't work with a single-column table.
{code}
cqlsh> CREATE KEYSPACE sample_ks WITH replication = {'class': 'SimpleStrategy' 
, 'replication_factor': 1 };
cqlsh> CREATE TABLE sample_ks.tbl_a ( key text PRIMARY KEY );
cqlsh> CREATE TABLE sample_ks.tbl_b ( key text PRIMARY KEY, value int );

// fix required (unnecessary comma)
cqlsh> INSERT INTO sample_ks.tbl_a
cqlsh> INSERT INTO sample_ks.tbl_a (key,
// fix required (no reaction)
cqlsh> INSERT INTO sample_ks.tbl_a (key) VALU
cqlsh> INSERT INTO sample_ks.tbl_a (key) VALU
// fix required (I can't insert only a key.)
cqlsh> INSERT INTO sample_ks.tbl_b
cqlsh> INSERT INTO sample_ks.tbl_b (key, value
{code}
I fixed the completion problem on the 2.1 branch. Could someone review the 
attached patch?
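For reference, a minimal sketch of the expected behavior (illustrative Python, not cqlsh's actual completer code): with two tables sharing a prefix, completion should first extend the typed text to the longest common prefix, then offer both candidates.

```python
import os.path

def complete(text, candidates):
    """Return the candidate names that start with the typed text."""
    return sorted(c for c in candidates if c.startswith(text))

tables = ["tbl_a", "tbl_b"]
matches = complete("tb", tables)        # both tables match "tb"
common = os.path.commonprefix(matches)  # the shell should extend input to "tbl_"
```

With input "tb" both tables match, so the shell should extend the buffer to "tbl_" before listing alternatives, rather than doing nothing.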



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11976) cqlsh tab completion doesn't work in 2.1

2016-06-08 Thread Yusuke Takata (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yusuke Takata updated CASSANDRA-11976:
--
Attachment: CASSANDRA-11976-2.patch
CASSANDRA-11976-1.patch

> cqlsh tab completion doesn't work in 2.1
> 
>
> Key: CASSANDRA-11976
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11976
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Yusuke Takata
>Priority: Minor
>  Labels: cqlsh
> Attachments: CASSANDRA-11976-1.patch, CASSANDRA-11976-2.patch
>
>





[jira] [Created] (CASSANDRA-11977) Replace stream source selection should be deterministic

2016-06-08 Thread Thom Valley (JIRA)
Thom Valley created CASSANDRA-11977:
---

 Summary: Replace stream source selection should be deterministic
 Key: CASSANDRA-11977
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11977
 Project: Cassandra
  Issue Type: Improvement
  Components: Streaming and Messaging
 Environment: 2.1.14 / 42 Nodes / 5 DCs
Reporter: Thom Valley


The current method for dealing with the impact of inter-DC latency on bootstrap 
and replace-node operations is to increase the ring_delay and "hope" that 
gossip settles appropriately.

Even with a ring_delay of 5 minutes, we are seeing remote DCs being used as 
sources for node replacements.  We are also seeing a variable number of nodes 
being used for different replace operations.

For example, in a multiple-replace test run, we have seen a node take as little 
as 3 hours (when the local DC is utilized) to as much as 9 hours to complete its 
replacement process at ~500GB of data / node on LCS.  That's a 3X variation.

Replacing a node (or bootstrapping a new node) should be done in as 
deterministic and efficient a manner as possible.  Repeatable results at a 
given topology / data density are important for operational planning.

This has a significant impact on operational planning in large environments, 
especially when globally distributed data centers are in play.

The promise of Cassandra is that it is operationally simple and highly fault 
tolerant.  How long it takes to recover a node is a critical aspect of 
maintaining that fault tolerance and reducing risk.

Remote DC links are also not just slower, but generally more expensive from a 
transport-cost standpoint, and frequently bandwidth constrained.  Using a 
remote DC unnecessarily for a bootstrap / repair operation adds risk to other 
users of that resource.

Replacing a node and/or bootstrapping a node should ideally:
- Always use the local_dc if it is available
- Stream from as many nodes as possible (to reduce total time to complete)
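A sketch of the proposed selection policy (illustrative names, not Cassandra's actual streaming code): prefer every available local-DC replica, and fall back to remote replicas only when the local DC has none.

```python
def choose_stream_sources(hosts, local_dc):
    """Pick stream sources deterministically.

    hosts: list of (address, dc) tuples for candidate replicas.
    Prefer all local-DC replicas (maximizing parallelism); use remote
    replicas only if no local-DC replica is available."""
    local = [addr for addr, dc in hosts if dc == local_dc]
    return local if local else [addr for addr, _ in hosts]

hosts = [("10.0.0.1", "dc1"), ("10.0.0.2", "dc1"), ("10.1.0.1", "dc2")]
```

Running the same replace twice against the same topology then yields the same source set, which is the repeatability the ticket asks for.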








[jira] [Created] (CASSANDRA-11978) StreamReader fails to write sstable if CF directory is symlink

2016-06-08 Thread Michael Frisch (JIRA)
Michael Frisch created CASSANDRA-11978:
--

 Summary: StreamReader fails to write sstable if CF directory is 
symlink
 Key: CASSANDRA-11978
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11978
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
Reporter: Michael Frisch


I'm using Cassandra v2.2.6.  If the CF directory is a symlink in the keyspace 
directory on disk, then StreamReader.createWriter fails because 
Descriptor.fromFilename is passed the resolved path on disk instead of the 
symlinked path.

Example:
/path/to/data/dir/Keyspace/CFName -> /path/to/data/dir/AnotherDisk/CFName

Descriptor.fromFilename is passed "/path/to/data/dir/AnotherDisk/CFName" 
instead of "/path/to/data/dir/Keyspace/CFName", then it concludes that the 
keyspace name is "AnotherDisk" which is erroneous. I've temporarily worked 
around this by using cfs.keyspace.getName() to get the keyspace name and 
cfs.name to get the CF name as those are correct.
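The mis-parse can be illustrated with plain path handling (a simplification; Descriptor.fromFilename parses more than the keyspace name): deriving the keyspace from the resolved on-disk path yields the wrong directory component.

```python
import os.path

def keyspace_from_cf_path(path):
    # Simplified stand-in for deriving the keyspace name from a CF
    # directory path laid out as .../<Keyspace>/<CFName>
    return os.path.basename(os.path.dirname(path))

symlink_path = "/path/to/data/dir/Keyspace/CFName"
# What resolving the symlink (e.g. os.path.realpath) would return:
resolved_path = "/path/to/data/dir/AnotherDisk/CFName"
```

Parsing resolved_path yields "AnotherDisk" as the keyspace, which is why using cfs.keyspace.getName() and cfs.name directly avoids the problem.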





[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320612#comment-15320612
 ] 

Nate McCall commented on CASSANDRA-8700:


bq. Truly, I see 2 big advantages to having the doc in-tree:

I want to add what I consider to be the most important advantage: doing this 
will lower the barrier to entry for community participation and potentially 
allow us to expand the contributor base. We have some of the best operators 
around contributing to the community on the ML, in IRC, etc. and this will 
provide an avenue for those who can't/don't want to write Java to participate. 

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
>
> The wiki as it stands is pretty terrible.  It takes several minutes to apply 
> a single update, and as a result, it's almost never updated.  The information 
> there has very little context as to what version it applies to.  Most people 
> I've talked to that try to use the information they find there find it is 
> more confusing than helpful.
> I'd like to propose that instead of using the wiki, the doc directory in the 
> cassandra repo be used for docs (already used for CQL3 spec) in a format that 
> can be built to a variety of output formats like HTML / epub / etc.  I won't 
> start the bikeshedding on which markup format is preferable - but there are 
> several options that can work perfectly fine.  I've personally used Sphinx 
> with reStructuredText, and Markdown.  Both can build easily and as an added bonus 
> be pushed to readthedocs (or something similar) automatically.  For an 
> example, see cqlengine's documentation, which I think is already 
> significantly better than the wiki: 
> http://cqlengine.readthedocs.org/en/latest/
> In addition to being overall easier to maintain, putting the documentation in 
> the git repo adds context, since it evolves with the versions of Cassandra.
> If the wiki were kept even remotely up to date, I wouldn't bother with this, 
> but not having at least some basic documentation in the repo, or anywhere 
> associated with the project, is frustrating.
> For reference, the last 3 updates were:
> 1/15/15 - updating committers list
> 1/08/15 - updating contributors and how to contribute
> 12/16/14 - added a link to CQL docs from wiki frontpage (by me)





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320624#comment-15320624
 ] 

Nate McCall commented on CASSANDRA-7622:


bq. exposing metrics read-only is a viable first version and want to start with 
that.

Yes please to scoping for read-only. Also, {{SHOW VARIABLES}} would be a cause 
for celebration and cheer among our community of operations folks (would not 
even care about filtering for first a first version either, just that we could 
get it). 

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>
> There are a variety of reasons to want virtual tables, which would be any 
> table that would be backed by an API, rather than data explicitly managed and 
> stored as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320710#comment-15320710
 ] 

Sylvain Lebresne commented on CASSANDRA-7622:
-

bq. referring to this as "Exposing jmx via cql"

That's a slight simplification; we're really talking here of exposing _metrics_ 
through CQL. We are, in particular, not at all talking about using any mbeans. 
We could, maybe, create a custom Reporter to do this (I hadn't considered that, 
and it would have some convenience), but the other solution is to manually 
register each existing metric through whatever solution we come up with here. 
The advantage of the latter, manual solution is that we'd probably have more 
control over what the actual (virtual) tables look like, which I personally 
like (we'd be trading a bit of manual labor for a better user experience, 
which I'd say is a fair trade-off), but I'm certainly happy to consider the 
other options.






[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320714#comment-15320714
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

Initially the ticket was about exposing configuration over CQL, not metrics. 
FWIW I'm not convinced that exposing all metrics over virtual tables even makes 
sense.

bq. Yes please to scoping for read-only. Also, SHOW VARIABLES would be a cause 
for celebration and cheer among our community of operations folks (would not 
even care about filtering for a first version either, just that we could get 
it).

Yes to both. Let's start with {{SHOW VARIABLES}} and go from there.






[jira] [Created] (CASSANDRA-11979) cqlsh copyutil should get host metadata by connected address

2016-06-08 Thread Adam Holmberg (JIRA)
Adam Holmberg created CASSANDRA-11979:
-

 Summary: cqlsh copyutil should get host metadata by connected 
address
 Key: CASSANDRA-11979
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11979
 Project: Cassandra
  Issue Type: Bug
Reporter: Adam Holmberg
Priority: Minor
 Fix For: 2.2.x, 3.x


pylib.copyutil presently accesses cluster metadata using {{shell.hostname}}, 
which may be an unresolved hostname.
https://github.com/apache/cassandra/blob/58d3b9a90461806d44dd85bf4aa928e575d5fb6c/pylib/cqlshlib/copyutil.py#L207

Cluster metadata normally refers to hosts in terms of numeric host address, not 
hostname. This works in the current integration because the driver allows hosts 
with unresolved names into metadata during the initial control connection. In a 
future version of the driver, that anomaly is removed, and no duplicate 
hosts-by-name are present in the metadata.

We will need to update copyutil to refer to hosts by address when accessing 
metadata. This can be accomplished by one of two methods presently:

# shell.conn.control_connection.host (gives the current connected host address)
# scan metadata.all_hosts() for the one that {{is_up}} and use 
host.address/host.datacenter
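A sketch of the second method (Host here is a stand-in for the driver's host metadata object, which exposes address, datacenter, and is_up; the driver itself is not imported):

```python
from collections import namedtuple

# Stand-in for the Python driver's Host metadata object.
Host = namedtuple("Host", ["address", "datacenter", "is_up"])

def connected_host(all_hosts):
    """Scan metadata.all_hosts()-style entries and pick the host that
    is marked up, then use its address/datacenter instead of the
    possibly-unresolved shell.hostname."""
    for host in all_hosts:
        if host.is_up:
            return host
    raise RuntimeError("no host is up")

hosts = [Host("10.0.0.1", "dc1", False), Host("10.0.0.2", "dc1", True)]
```

The returned host.address and host.datacenter are then safe keys into the cluster metadata, unlike a hostname that was never resolved.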





[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-06-08 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9666:
---
Fix Version/s: (was: 3.0.x)
   3.0.8

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.8, 3.0.8
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the max_sstable_age_days parameter - the result is very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever-increasing 
> sizes, this strategy allows an operator to choose the window size, compacts 
> with STCS within the first window of that size, and aggressively compacts 
> down to a single sstable once that window is no longer current. The window 
> size is a combination of unit (minutes, hours, days) and size (1, etc.), such 
> that an operator can expect all data in a block of that size to be compacted 
> together (that is, if your unit is hours and size is 6, you will create 
> roughly 4 sstables per day, each containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables; however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882
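The window bucketing described above can be sketched as flooring a write timestamp to the start of its configured window (illustrative Python, not the strategy's actual code):

```python
def window_start(timestamp_s, unit_s, size):
    """Floor a timestamp (in seconds) to the start of its compaction window.

    unit_s: seconds per unit (e.g. 3600 for hours); size: units per window.
    All sstables whose max timestamp falls in the same window are
    candidates to be compacted together."""
    window = unit_s * size
    return timestamp_s - (timestamp_s % window)

# With unit = hours and size = 6, a day of data falls into 4 windows.
```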





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320778#comment-15320778
 ] 

Chris Lohfink commented on CASSANDRA-7622:
--

Would CASSANDRA-3527 be a good place to discuss the metrics details? I have 
some ideas around it and would love to use it (I would be willing to attempt 
an implementation as well), but I consume a decent majority of the metrics, so 
hand-picked lists probably wouldn't be sufficient for me.






[jira] [Commented] (CASSANDRA-10869) paging_test.py:TestPagingWithDeletions.test_failure_threshold_deletions dtest fails on 2.1

2016-06-08 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320789#comment-15320789
 ] 

Jim Witschey commented on CASSANDRA-10869:
--

This has failed in the same way on 3.7:

http://cassci.datastax.com/job/cassandra-3.7-tentative_offheap_dtest/lastCompletedBuild/testReport/paging_test/TestPagingWithDeletions/test_failure_threshold_deletions/

> paging_test.py:TestPagingWithDeletions.test_failure_threshold_deletions dtest 
> fails on 2.1
> --
>
> Key: CASSANDRA-10869
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10869
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jim Witschey
>  Labels: dtest
> Fix For: 2.1.x
>
>
> This test is failing hard on 2.1. Here is its history on the JDK8 job for 
> cassandra-2.1:
> http://cassci.datastax.com/job/cassandra-2.1_dtest_jdk8/lastCompletedBuild/testReport/paging_test/TestPagingWithDeletions/test_failure_threshold_deletions/history/
> and on the JDK7 job:
> http://cassci.datastax.com/job/cassandra-2.1_dtest/lastCompletedBuild/testReport/paging_test/TestPagingWithDeletions/test_failure_threshold_deletions/history/
> It fails because a read times out after ~1.5 minutes. If this is a test 
> error, it's specific to 2.1, because the test passes consistently on newer 
> versions.





[jira] [Assigned] (CASSANDRA-11979) cqlsh copyutil should get host metadata by connected address

2016-06-08 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania reassigned CASSANDRA-11979:


Assignee: Stefania

> cqlsh copyutil should get host metadata by connected address
> 
>
> Key: CASSANDRA-11979
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11979
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Holmberg
>Assignee: Stefania
>Priority: Minor
> Fix For: 2.2.x, 3.x
>
>





[jira] [Created] (CASSANDRA-11980) Reads at EACH_QUORUM not respecting the level with read repair or speculative retry active

2016-06-08 Thread Aleksey Yeschenko (JIRA)
Aleksey Yeschenko created CASSANDRA-11980:
-

 Summary: Reads at EACH_QUORUM not respecting the level with read 
repair or speculative retry active
 Key: CASSANDRA-11980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11980
 Project: Cassandra
  Issue Type: Bug
  Components: Coordination
Reporter: Aleksey Yeschenko


{{ReadCallback::waitingFor()}} is not sophisticated enough to correctly count 
replies from replicas towards {{blockFor}}, and can return to the client before 
getting an actual quorum in each of the DCs.

Assume DC1: n1, n2, n3; DC2: n4, n5, n6; blockFor in this case would be 4. 
{{ReadCallback}} does not count replies from different DCs separately, however, 
so if the replies return in order of n1, n2, n3, n4, the request will still 
succeed, having achieved 4, despite not getting a quorum from DC2.

The bug potentially manifests itself if RR.GLOBAL, RR.LOCAL, or any speculative 
retry triggers.

The easiest fix would be to temporarily disable read repair and speculative 
retry on EACH_QUORUM reads.
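A toy model of the counting problem (illustrative Python, not {{ReadCallback}}'s actual code): a flat blockFor counter is satisfied by replies n1..n4 even though DC2 has only one ack, while a per-DC counter correctly reports that the EACH_QUORUM level has not been met.

```python
def flat_quorum_met(replies, block_for):
    """The current behavior: count all replies against one threshold."""
    return len(replies) >= block_for

def each_quorum_met(replies, topology):
    """Per-DC counting: replies is a list of (node, dc);
    topology maps dc -> replication factor; quorum per DC is rf//2 + 1."""
    for dc, rf in topology.items():
        acks = sum(1 for _, reply_dc in replies if reply_dc == dc)
        if acks < rf // 2 + 1:
            return False
    return True

topology = {"DC1": 3, "DC2": 3}  # blockFor = 2 + 2 = 4
replies = [("n1", "DC1"), ("n2", "DC1"), ("n3", "DC1"), ("n4", "DC2")]
```

Four replies satisfy the flat count of 4, but DC2 has only one ack, so a correct EACH_QUORUM implementation must keep waiting for a second DC2 reply.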





[jira] [Updated] (CASSANDRA-11980) Reads at EACH_QUORUM not respecting the level with read repair or speculative retry active

2016-06-08 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11980:
--
Since Version: 3.0.0 rc2

> Reads at EACH_QUORUM not respecting the level with read repair or speculative 
> retry active
> --
>
> Key: CASSANDRA-11980
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11980
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320830#comment-15320830
 ] 

Jeremiah Jordan commented on CASSANDRA-7622:


bq. Hand picked lists

I don't think that's the right way to view it: hand-coded, not hand-picked. I 
think we would expose everything that is in the metric registry, just not 
necessarily using the registry. 






[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320844#comment-15320844
 ] 

Robert Stupp commented on CASSANDRA-7622:
-

I think we need some "routing" ability for statements against virtual tables, 
whether that's implemented as a {{WHERE node='1.2.3.4'}} at one extreme or 
via some sort of {{Statement.setTargetNode()}} method in the drivers. 
Otherwise we have absolutely no control over which coordinator executes a 
statement, and for metrics/configs/etc. it is important to know which node 
executes the statement. IMO, a where-clause looks best.

+1 on starting with {{SHOW VARIABLES}}, and just with that, to see how it 
works, especially with the explicit statement-routing problem. Metrics could 
come next; they have their own pitfalls. Then we'll have some experience with 
this "distributed administration via CQL" and can probably properly tackle 
configuration changes and administrative commands.






[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320846#comment-15320846
 ] 

 Brian Hess commented on CASSANDRA-7622:


I'm still curious how you would get the JMX values (SHOW VARIABLES or whatever) 
for a particular node. That is, how would this look syntactically (WHERE 
clause? Special CQL?)? And if it's something like a WHERE clause, then how will 
the driver route the query correctly? Would it be through a custom 
LoadBalancingPolicy? If so, then it could be hard to mix normal CQL with these 
(as you'd want a different LBP for the normal CQL queries). 



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320856#comment-15320856
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

bq. I'm still curious how you would get the JMX values (SHOW VARIABLES or 
whatever) for a particular node.

I expect {{SHOW VARIABLES}} in particular to be mostly used from cqlsh, in 
which case you know exactly the node you are connecting to.



[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)

2016-06-08 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320861#comment-15320861
 ] 

Carl Yeksigian commented on CASSANDRA-8844:
---

I've been testing this using the process from CASSANDRA-11575. Everything seems 
to be working.

One thing that is pathologically bad is when someone mixes writes to 
slow-flushing and fast-flushing tables. There probably needs to be some 
backpressure between the commitlogs (especially those which are counting 
against the CDC total) and the memtables -- that should be part of a follow-on 
ticket, though.

I'm still reviewing the patch.

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to.
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred for a subsequent release, and to gauge 
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, 
> would make Cassandra a much more versatile feeder into other systems, and 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it 
> trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they 
> left off. This means they would have to leave some file artifact in the CDC 
> log's directory.
> - A sophisticated daemon should be able to be written that could 
> -- Catch up, in written-order, even when it is multiple l
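
The checkpoint-and-resume behaviour sketched in the bullets above could look 
roughly like this. A minimal sketch only: the line-per-mutation format, the 
lexicographic file naming, and the checkpoint file are assumptions for 
illustration, not Cassandra's actual CDC on-disk format.

```python
# Sketch of a CDC consumer daemon that checkpoints its progress in the CDC
# log directory, so a restart resumes where the previous run left off.
import os

CHECKPOINT = "consumer.checkpoint"  # hypothetical artifact left in the CDC dir

def consume(cdc_dir, handle_mutation):
    """Process CDC log files in name order, resuming after the last checkpoint."""
    ckpt_path = os.path.join(cdc_dir, CHECKPOINT)
    done = ""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            done = f.read().strip()  # name of the last fully consumed log file
    # Predictable naming lets us sort lexicographically to recover write order.
    for name in sorted(os.listdir(cdc_dir)):
        if name == CHECKPOINT or name <= done:
            continue  # already consumed, or the checkpoint file itself
        with open(os.path.join(cdc_dir, name)) as f:
            for line in f:
                handle_mutation(line.rstrip("\n"))
        # Checkpoint after each file so a crash re-reads at most one file.
        with open(ckpt_path, "w") as f:
            f.write(name)
```

Checkpointing per file keeps the daemon at-least-once: after a crash mid-file, 
that file is re-delivered in full on the next run.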

[jira] [Assigned] (CASSANDRA-11980) Reads at EACH_QUORUM not respecting the level with read repair or speculative retry active

2016-06-08 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian reassigned CASSANDRA-11980:
--

Assignee: Carl Yeksigian

> Reads at EACH_QUORUM not respecting the level with read repair or speculative 
> retry active
> --
>
> Key: CASSANDRA-11980
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11980
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Carl Yeksigian
>
> {{ReadCallback::waitingFor()}} is not sophisticated enough to correctly count 
> replies from replicas towards {{blockFor}}, and can return to the client 
> before getting an actual quorum in each of the DCs.
> Assume DC1: n1, n2, n3; DC2: n4, n5, n6; blockFor in this case would be 4. 
> {{ReadCallback}} does not count replies from different DCs separately, 
> however, so if the replies return in order of n1, n2, n3, n4, the request 
> will still succeed, having achieved 4, despite not getting a quorum from DC2.
> The bug potentially manifests itself if RR.GLOBAL, RR.LOCAL, or any 
> speculative retry triggers.
> The easiest fix would be to temporarily disable RR and speculative retry on 
> EACH_QUORUM reads.
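
The miscount can be modelled in a few lines. Illustrative only - these are not 
the actual {{ReadCallback}} internals, just the counting logic described above:

```python
# Model of the EACH_QUORUM miscount: a single global counter reaches
# blockFor = 4 on replies n1..n4 even though DC2 has only one reply.
def global_count_satisfied(replies, block_for):
    """The buggy behaviour: one combined counter across all DCs."""
    return len(replies) >= block_for

def each_quorum_satisfied(replies, dc_of, quorum_per_dc):
    """Correct behaviour: require a quorum within every DC separately."""
    counts = {}
    for replica in replies:
        dc = dc_of[replica]
        counts[dc] = counts.get(dc, 0) + 1
    return all(counts.get(dc, 0) >= q for dc, q in quorum_per_dc.items())

dc_of = {"n1": "DC1", "n2": "DC1", "n3": "DC1",
         "n4": "DC2", "n5": "DC2", "n6": "DC2"}
quorum = {"DC1": 2, "DC2": 2}  # RF=3 per DC -> quorum of 2 in each
replies = ["n1", "n2", "n3", "n4"]

print(global_count_satisfied(replies, block_for=4))   # True: returns too early
print(each_quorum_satisfied(replies, dc_of, quorum))  # False: DC2 has 1 reply
```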





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320874#comment-15320874
 ] 

 Brian Hess commented on CASSANDRA-7622:


If the plan is to enable writing to these virtual tables, then we should think 
more about a SELECT-type syntax. That would make the future UPDATE operations 
make a lot more sense. And if we are thinking that eventually we will enable 
user-defined VTs, then INSERT and DELETE operations may make sense. There really 
is no need for special syntax here, so I'd caution against that. 

For that reason, I'd suggest we stick with SELECT syntax even in this first use 
case - in a forward-thinking way. 



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320876#comment-15320876
 ] 

 Brian Hess commented on CASSANDRA-7622:


Jinx



[jira] [Updated] (CASSANDRA-11884) dtest failure in secondary_indexes_test.TestSecondaryIndexesOnCollections.test_tuple_indexes

2016-06-08 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-11884:

   Resolution: Fixed
Fix Version/s: (was: 3.0.x)
   (was: 2.2.x)
   (was: 3.x)
   3.0.7
   3.7
   2.2.7
   Status: Resolved  (was: Patch Available)

CI across all affected branches looks good since the last CASSANDRA-9669 patch, 
so we're good to close this now.

> dtest failure in 
> secondary_indexes_test.TestSecondaryIndexesOnCollections.test_tuple_indexes
> 
>
> Key: CASSANDRA-11884
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11884
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Assignee: Branimir Lambov
>  Labels: dtest
> Fix For: 2.2.7, 3.7, 3.0.7
>
> Attachments: node1.log, node1_debug.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1234/testReport/secondary_indexes_test/TestSecondaryIndexesOnCollections/test_tuple_indexes
> Failed on CassCI build trunk_dtest #1234
> Logs are attached.





[jira] [Commented] (CASSANDRA-11954) Generalize SASI indexes

2016-06-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320885#comment-15320885
 ] 

Andrés de la Peña commented on CASSANDRA-11954:
---

Yes, that's exactly what I mean. Refactoring the SASI code to decouple the 
general SSTable-attached stuff from the specific index data structure and its 
associated functionalities.

> Generalize SASI indexes
> ---
>
> Key: CASSANDRA-11954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11954
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths, sasi
>Reporter: Andrés de la Peña
>Priority: Minor
>  Labels: 2i, sasi, secondary_index
> Fix For: 3.x
>
>
> It would be great to decouple SASI indexes from their underlying index 
> structure and query syntax. This way it would be easy to create new custom 
> SSTable-attached index implementations for specific use cases. 
> The API could consist of two interfaces, one for in-memory indexes and 
> another for on-disk indexes, implemented by users and invoked by Cassandra 
> when there are row writes, SSTable flushes, compactions, etc.
> As an example, the API could be used to build an efficient SASI geospatial 
> index based on R-trees:
> {code}
> CREATE TABLE locations (
> id text,
> date timeuuid,
> location tuple,
> PRIMARY KEY (id, date)
> );
> CREATE CUSTOM INDEX idx ON locations () USING '...' WITH OPTIONS = {...};
> INSERT INTO locations (id, date, location) VALUES ('alice', now(), (-0.18676, 
> 51.66870));
> SELECT * FROM locations WHERE expr(idx, 'POLYGON((-0.25 51.76, -0.25 51.54, 
> -0.027 51.65, -0.25 51.76))');
> {code}
> Also, custom SASI indexes predicates could be combined with regular SASI 
> indexes predicates in the same query, which would be very nice.
> What do you think? Does it make any sense?





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320887#comment-15320887
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

Sure, but {{SHOW}} is a common enough command in DB world (postgres and mysql 
both have it) that I wouldn't mind having it implemented in C* - paired with 
{{SET}} for the write side.

Once we have that, we can consider whether or not we actually have use cases 
for general purpose virtual tables.



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320891#comment-15320891
 ] 

Robert Stupp commented on CASSANDRA-7622:
-

I'm with you that something like SHOW VARIABLES would be used from cqlsh. But 
the node is just added as the initial contact point. When that node fails, the 
statement would be silently executed against another node. For SHOW VARIABLES 
this is not super dramatic - but for an ALTER SYSTEM DECOMMISSION it is. That's 
why I proposed {{WHERE node='1.2.3.4'}} as part of the syntax. At least then the 
coordinator can check whether it finds its own IP in the WHERE clause.



[jira] [Commented] (CASSANDRA-11979) cqlsh copyutil should get host metadata by connected address

2016-06-08 Thread Adam Holmberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320895#comment-15320895
 ] 

Adam Holmberg commented on CASSANDRA-11979:
---

We will provide an API for this in the next release:
https://datastax-oss.atlassian.net/browse/PYTHON-583

> cqlsh copyutil should get host metadata by connected address
> 
>
> Key: CASSANDRA-11979
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11979
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Holmberg
>Assignee: Stefania
>Priority: Minor
> Fix For: 2.2.x, 3.x
>
>
> pylib.copyutil presently accesses cluster metadata using {{shell.hostname}} 
> which could be an unresolved hostname.
> https://github.com/apache/cassandra/blob/58d3b9a90461806d44dd85bf4aa928e575d5fb6c/pylib/cqlshlib/copyutil.py#L207
> Cluster metadata normally refers to hosts in terms of numeric host address, 
> not hostname. This works in the current integration because the driver allows 
> hosts with unresolved names into metadata during the initial control 
> connection. In a future version of the driver, that anomaly is removed, and 
> no duplicate hosts-by-name are present in the metadata.
> We will need to update copyutil to refer to hosts by address when accessing 
> metadata. This can be accomplished by one of two methods presently:
> # shell.conn.control_connection.host (gives the current connected host 
> address)
> # scan metadata.all_hosts() for the one that {{is_up}} and use 
> host.address/host.datacenter
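
The second method above could be sketched as follows, using stand-in objects in 
place of the driver's metadata types ({{is_up}}, {{address}}, and 
{{datacenter}} follow the attribute names in the description; the rest is a 
hypothetical illustration, not the driver API):

```python
# Sketch of option 2: scan host metadata for the host marked up, and use its
# numeric address rather than a possibly-unresolved hostname.
class Host:
    """Stand-in for a driver metadata host entry."""
    def __init__(self, address, datacenter, is_up):
        self.address = address
        self.datacenter = datacenter
        self.is_up = is_up

def connected_host(all_hosts):
    """Return (address, datacenter) of the first host marked up."""
    for host in all_hosts:
        if host.is_up:
            return host.address, host.datacenter
    raise RuntimeError("no host marked up in metadata")

hosts = [Host("10.0.0.1", "dc1", False), Host("10.0.0.2", "dc1", True)]
print(connected_host(hosts))  # ('10.0.0.2', 'dc1')
```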





[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320913#comment-15320913
 ] 

 Brian Hess commented on CASSANDRA-7622:


I think this is a bit narrow thinking. I can see many use cases for 
programmatic access to these values. If SHOW VARIABLES is useful in cqlsh, it 
could have special cqlsh commands that convert into this. 
If you want to mix regular CQL and these, you'd need different Sessions, as 
you'd need separate Cluster objects (so you can have a WhiteListPolicy for the 
one node of interest). Or if you want to look at the metrics for multiple nodes, 
you would need one Cluster object per node.  



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320933#comment-15320933
 ] 

 Brian Hess commented on CASSANDRA-7622:


For metrics, Postgres appears to use a SELECT syntax. For system settings they 
use SHOW/SET. 



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320939#comment-15320939
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

bq. But the node just added as the initial contact point. When that node fails, 
the statement would be silently executed against another node.

If true this is something that needs to be changed, as it affects reading 
system tables.



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320954#comment-15320954
 ] 

Sylvain Lebresne commented on CASSANDRA-7622:
-

Drivers, at least the Java one, _have_ everything they need to allow forcing 
which node queries get routed to (it may not be exposed too conveniently so 
far, but we can let them do better there). And given that system tables are 
already local, and we're only talking about adding more system tables (just 
with a different backing implementation), I don't think we need to invent 
anything new here.



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320961#comment-15320961
 ] 

Sylvain Lebresne commented on CASSANDRA-7622:
-

bq. Sure, but SHOW is a common enough command in DB world

Imo, the general mechanism of virtual tables should be just an implementation 
detail for exposing a table. That means even in the case of {{SHOW VARIABLES}}, 
we'd really be adding a new system table with those variables, and we'd be 
able to query it through a normal select. That certainly doesn't prevent us 
from supporting {{SHOW VARIABLES}} as syntactic sugar for the equivalent select 
statement. This might even make sense as a cqlsh thing (rather than a CQL one).



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320968#comment-15320968
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

bq. That certainly doesn't prevent us for supporting SHOW VARIABLES as 
syntactic sugar for the equivalent select statement. This might even make sense 
as a cqlsh thing (rather than a CQL one).

Fair enough. I'm just questioning whether there are any important use cases 
outside of configuration for which we actually need a generalized virtual table 
mechanism at all. If we don't, then we might as well just add special-purpose 
{{SHOW}}/{{SET}} only and call it a day. If we do, then {{SHOW}}/{{SET}} in 
cqlsh on top of a virtual table works for me.



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320979#comment-15320979
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

My overall concern after reading through the comments again is premature 
overgeneralisation, which is something I'm strongly against. We need more 
specific use cases before we generalise.



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320976#comment-15320976
 ] 

 Brian Hess commented on CASSANDRA-7622:


+1



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320996#comment-15320996
 ] 

Jeremiah Jordan commented on CASSANDRA-7622:


Another thing to keep in mind: if we are not going to do the CQL-based virtual 
table approach and only do a system version, we need to make sure that things 
like custom secondary indexes can expose data through this; the current 
CassandraMetricsRegistry lets them do that fairly easily.

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>


[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321005#comment-15321005
 ] 

Robert Stupp commented on CASSANDRA-7622:
-

bq. If true this is something that needs to be changed, as it affects reading 
system tables.

Nothing needs to be fixed. There's a whitelist policy on the cluster object (I 
had wrongly recalled that it was just an initial contact point).
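
The whitelist idea is simple enough to sketch: the client only ever sends 
requests to hosts on the whitelist, no matter which hosts are live. A minimal 
Python sketch of the concept follows; the class and method names here are 
illustrative, not the actual driver API.

```python
# Minimal sketch of a whitelist load-balancing policy: candidate coordinators
# are the live hosts restricted to the whitelist. Names are illustrative.

class WhiteListPolicy:
    def __init__(self, allowed_hosts):
        self.allowed = set(allowed_hosts)

    def query_plan(self, live_hosts):
        # Only whitelisted hosts may be used as coordinators.
        return [h for h in live_hosts if h in self.allowed]

policy = WhiteListPolicy(["10.0.0.1"])
print(policy.query_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))  # → ['10.0.0.1']
```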

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>


[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-08 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321103#comment-15321103
 ] 

Chris Lohfink commented on CASSANDRA-7622:
--

There is no {{WHERE node='1.2.3.4'}} for the system.compaction_history table 
and people get along just fine. Why have routing at all? It also adds a lot of 
complexity and possibly instability. A local-only table would still be easy to 
access from cqlsh or the driver (i.e. the driver can check load or latency to 
weight coordinators in its load balancing policy) or by a careful user. The 
main use case is viewing (possibly setting) configs and viewing metrics, so a 
system-like thing seems right to me.

We could take the metric attributes from the registry to create the table, 
with partition/composite keys with meaningful names. What I envisioned was:

{code}
cqlsh:system_metrics> describe keyspaces;

testks  system_schema  system_auth  system  system_distributed  system_traces  
system_metrics

cqlsh:system_metrics> describe tables;  # use the type attrib as the name, 
camelcase -> lowercased, underscored

thread_pools  table  storage  index dropped_message ...

cqlsh:system_metrics> expand on;
cqlsh:system_metrics> select * from table where keyspace = 'system' and scope = 
'paxos';

# considering just the metrics
# 
org.apache.cassandra.metrics:type=Table,keyspace=system,scope=paxos,name=BytesFlushed
# 
org.apache.cassandra.metrics:type=Table,keyspace=system,scope=paxos,name=WriteLatency

@ Row 1
-+--
 keyspace| system   # partition key
 scope   | paxos# composite
 bytes_flushed   | {'Count': 0}
 write_latency   | {'Count': 1, '999thPercentile': 1302, ...}
...

{code}

Metrics that only have a scope will have that as the partition key alone (i.e. 
the client_request table would have 'read' as a partition key and failures, 
latency, timeouts, unavailable, total_latency as fields).

{code}
# org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency

cqlsh:system_metrics> select * from client_requests where scope = 'read';

@ Row 1
-+--
 scope | read# partition key
 failures  | {'Count': 0}
 latency   | {'Count': 1, '999thPercentile': 1302, ...}
...
{code}
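
The naming scheme sketched above could be derived mechanically from the JMX 
metric name: take the type attribute, lower-case and underscore it for the 
table name, and treat the remaining key properties (keyspace, scope, ...) as 
the row key. A rough Python sketch, with hypothetical function names rather 
than actual Cassandra code:

```python
import re

def camel_to_snake(name):
    # "ThreadPools" -> "thread_pools", "ClientRequest" -> "client_request"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def metric_to_row(metric):
    # Split "org.apache.cassandra.metrics:type=...,scope=...,name=..." into
    # a table name, the key properties, and a column name.
    _domain, props = metric.split(":", 1)
    kv = dict(p.split("=", 1) for p in props.split(","))
    table = camel_to_snake(kv.pop("type"))
    column = camel_to_snake(kv.pop("name"))
    return table, kv, column  # kv holds the partition/clustering key parts

m = "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"
print(metric_to_row(m))  # → ('client_request', {'scope': 'Read'}, 'latency')
```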

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>


[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-06-08 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321127#comment-15321127
 ] 

Paulo Motta commented on CASSANDRA-9935:


Not yet; I will try to find out and update here later.

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Paulo Motta
> Fix For: 2.1.15, 3.6, 3.0.6, 2.2.7
>
> Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range 
> (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range 
> (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range 
> (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - 
> Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range 
> (5765414319217852786,5781018794516851576] failed with error 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
> [na:1.7.0_80]
> at java.util.concurrent.FutureTask.get(FutureTask.java:188) 
> [na:1.7.0_80]
> at 
> org.apache.cassandra.service

[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321131#comment-15321131
 ] 

Jonathan Ellis commented on CASSANDRA-8700:
---

I've mirrored the outline above to a gdoc so we can coordinate work more 
easily.  Just put your name down by a section you plan to tackle, like (Alex), 
then post a patch.  Let's standardize on Markdown for now (Sylvain's choice 
after being disenchanted with Textile for the CQL doc); we can convert to other 
formats easily enough later if necessary.

I'm slightly leery of posting a world-editable gdoc here where it can get 
crawled, but I've posted it to IRC.  Ping me or Sylvain if you need the link.  
Or you can just post a comment here saying what section you want to work on.

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
>
> The wiki as it stands is pretty terrible.  It takes several minutes to apply 
> a single update, and as a result, it's almost never updated.  The information 
> there has very little context as to what version it applies to.  Most people 
> I've talked to that try to use the information they find there find it is 
> more confusing than helpful.
> I'd like to propose that instead of using the wiki, the doc directory in the 
> cassandra repo be used for docs (already used for CQL3 spec) in a format that 
> can be built to a variety of output formats like HTML / epub / etc.  I won't 
> start the bikeshedding on which markup format is preferable - but there are 
> several options that can work perfectly fine.  I've personally used Sphinx w/ 
> reStructuredText, and Markdown.  Both can build easily and as an added bonus 
> be pushed to readthedocs (or something similar) automatically.  For an 
> example, see cqlengine's documentation, which I think is already 
> significantly better than the wiki: 
> http://cqlengine.readthedocs.org/en/latest/
> In addition to being overall easier to maintain, putting the documentation in 
> the git repo adds context, since it evolves with the versions of Cassandra.
> If the wiki were kept even remotely up to date, I wouldn't bother with this, 
> but not having at least some basic documentation in the repo, or anywhere 
> associated with the project, is frustrating.
> For reference, the last 3 updates were:
> 1/15/15 - updating committers list
> 1/08/15 - updating contributers and how to contribute
> 12/16/14 - added a link to CQL docs from wiki frontpage (by me)





[jira] [Assigned] (CASSANDRA-11962) Non-fatal NPE during concurrent repair check

2016-06-08 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta reassigned CASSANDRA-11962:
---

Assignee: Paulo Motta

> Non-fatal NPE during concurrent repair check
> 
>
> Key: CASSANDRA-11962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11962
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Paulo Motta
>Priority: Minor
>
> "Usual" checks for multiple repairs result in this exception in the log file:
> {code}
> ERROR [ValidationExecutor:6] 2016-06-06 07:56:23,530 CassandraDaemon.java:222 
> - Exception in thread Thread[ValidationExecutor:6,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> at 
> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1325)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1215)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:81)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$11.call(CompactionManager.java:844)
>  ~[main/:na]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
> {code}
> However, I saw this one:
> {code}
> ERROR [ValidationExecutor:6] 2016-06-06 07:56:25,002 CassandraDaemon.java:222 
> - Exception in thread Thread[ValidationExecutor:6,1,main]
> java.lang.NullPointerException: null
> at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.getActiveSSTables(ActiveRepairService.java:495)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.access$300(ActiveRepairService.java:451)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.ActiveRepairService.currentlyRepairing(ActiveRepairService.java:338)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1320)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1215)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:81)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$11.call(CompactionManager.java:844)
>  ~[main/:na]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
> {code}
> Looks like there is no entry for {{cfId}} in {{getActiveSStables}} at {{for 
> (SSTableReader sstable : columnFamilyStores.get(cfId).getSSTables())}}.
> (against trunk)
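
The failure mode is easiest to see in miniature: a concurrent repair removes 
the {{cfId}} entry before {{getActiveSSTables}} runs, so the chained lookup 
dereferences a missing value. A toy Python sketch of the race and a defensive 
variant (Python raises {{KeyError}} where the Java code throws the NPE); names 
are illustrative, not Cassandra's actual code:

```python
# Toy reproduction of the race described above: the cfId entry has vanished,
# so the chained lookup fails. Names mirror the stack trace loosely.

class Store:
    def sstables(self):
        return ["sstable-1"]

column_family_stores = {}  # cfId -> Store; the entry was removed concurrently

def get_active_sstables_unsafe(cf_id):
    # Mirrors columnFamilyStores.get(cfId).getSSTables() with no null check.
    return column_family_stores[cf_id].sstables()

def get_active_sstables_safe(cf_id):
    # Defensive variant: treat a vanished table as "nothing left to validate".
    store = column_family_stores.get(cf_id)
    return store.sstables() if store is not None else []

print(get_active_sstables_safe("cf-1"))  # → []
```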





[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321144#comment-15321144
 ] 

Jon Haddad commented on CASSANDRA-8700:
---

I'd like to suggest we use Sphinx and reStructuredText instead of vanilla 
Markdown. Sphinx provides a lot of really useful features out of the box (like 
a table of contents), plus various themes; it can export a variety of formats 
(PDF / HTML / epub), and there are nice third-party plugins like network & 
block diagrams.

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321147#comment-15321147
 ] 

Sylvain Lebresne commented on CASSANDRA-8700:
-

bq.  Let's standardize on Markdown for now (Sylvain's choice after being 
disenchanted with Textile for the CQL doc); we can convert to other formats 
easily enough later if necessary.

To be clear, I'm in the process of evaluating different options (of which 
there are tons) and I'll update this ticket (probably next week) with my 
findings, after which we can decide what we want to use. But we shouldn't 
block writing the content on that, so let's just do Markdown for now; if we 
decide on a different format later, I'll convert everything we have.

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321161#comment-15321161
 ] 

Sylvain Lebresne commented on CASSANDRA-8700:
-

Btw, when you've written a chapter you wanted to write, please just attach the 
markdown file to this ticket for now. I'll also deal with massaging it all 
into a coherent form, reviewing it, and getting it committed later.

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
>


[jira] [Updated] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2016-06-08 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6936:
--
Assignee: Branimir Lambov

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: compaction, performance
> Fix For: 3.x
>
>
> This could be a painful change, but is necessary for implementing a 
> trie-based index, and settling for less would be suboptimal; it also should 
> make comparisons cheaper all-round, and since comparison operations are 
> pretty much the majority of C*'s business, this should be easily felt (see 
> CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
> major performance impacts). No copying/special casing/slicing should mean 
> fewer opportunities to introduce performance regressions as well.
> Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
> changes, hopefully this shouldn't be too much more of a burden.





[jira] [Updated] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2016-06-08 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-6936:
--
Fix Version/s: (was: 3.x)
   4.x

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: compaction, performance
> Fix For: 4.x
>
>


[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321332#comment-15321332
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

Alright, happy to finally be able to write this. :) I'm attaching a v1 diff 
containing Birch!

h4. Why is it named Birch?
B+ Tree -> Trees that start with the letter B -> Birch... get it? haha...

h4. Description
Birch is a B+ish/inspired tree aimed at improving the performance of the 
SSTable index in Cassandra (especially with large partitions).

The existing implementation scales poorly with the size of the index/row, as 
the entire index must be deserialized onto the heap even to look up a single 
element. This puts significant pressure on the heap, where one read of a large 
partition will cause at minimum a long, painful CMS GC pause or -- in the 
worst case -- an OOM.

The Birch implementation has a predictable fixed cost for reads, at the 
expense of additional on-disk overhead for the tree itself, with the same 
O(log(n)) complexity as the existing implementation. Every row added to the 
SSTable is also added to the primary index. If the size of the row is greater 
than 64KB we build an index (otherwise we just encode the position of that row 
in the sstable). All entries encoded into the index are page aligned and 
padded to the nearest boundary (4096 bytes by default). Every segment can be 
marked as either internally padded/aligned along a boundary or 
non-padded/aligned (up to 2GB). Birch indexes are aligned into 4096-byte nodes 
(both leaf and inner). Keys are encoded inside the node itself unless they 
exceed half the size of the node; in that case, the first node/2 bytes of the 
key are encoded into the node itself, along with the offset of the remaining 
bytes in the overflow page. This enables predictable, fixed performance of the 
tree while accommodating variable-length keys/elements.

h4. Notes on v1 of the diff (in no particular order)
 * I broke the changes into two logical parts: the first abstracts out the 
existing index implementation and adds no new logic; the second includes an 
IndexedEntry implementation backed by a Birch tree.
 * The attached v1 patch is written for 2.1. I have already started rebasing 
the patch onto trunk and hope to finish that shortly and post the trunk-based 
patch
 * There's some high level Javadoc documentation in BirchWriter and 
PageAlignedWriter on the layout of the tree on disk, serialization and 
deserialization paths, and higher level goals of the classes
 * The next steps are to start getting feedback from reviews and the community. 
I also have profiled the tree itself but profiling the tree integrated into the 
stack and optimizing non-performant code paths is next (after the immediate 
task to rebase the change onto trunk)
 * There are still a few TODOs I've left regarding backwards compatibility, 
parts of the code I expect might be non-performant, and things I'd like to 
discuss on the "correct" implementation/behavior, etc.
 * I have a few unit tests that still don't pass and still need to be 
root-caused... I've taken the approach this entire time that the unit tests 
shouldn't be touched to pass, so there are still a few behavioral regressions 
I've accidentally introduced. The current failing tests are: 
 ** AutoSavingCacheTest
 ** SecondaryIndexTest
 ** BatchlogManagerTest
 ** KeyCacheTest
 ** ScrubTest
 ** IndexSummaryManagerTest
 ** LegacySSTableTest
 ** MultiSliceTest
 * I need to write a unit test to test reading the legacy/existing primary 
index implementation
 * By the nature of the index's role in the database, the unit test coverage 
is actually pretty extensive, as any read or write touches the index in some 
capacity

I'll be giving a talk at NGCC tomorrow (Thursday the 9th) to go over the high 
level design I ended up with and considerations I had to take into account once 
I actually got deep inside this part of the code.

Looking forward to feedback!

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad on 
> endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it 
> will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot 
> of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-9754:

Status: Patch Available  (was: Open)

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
>





[jira] [Updated] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-8700:

Attachment: Installation.md

Attaching [^installation.md].

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: Installation.md
>
>
> The wiki as it stands is pretty terrible.  It takes several minutes to apply 
> a single update, and as a result, it's almost never updated.  The information 
> there has very little context as to which version it applies to.  Most people 
> I've talked to who try to use the information there find it more confusing 
> than helpful.
> I'd like to propose that instead of using the wiki, the doc directory in the 
> cassandra repo be used for docs (already used for CQL3 spec) in a format that 
> can be built to a variety of output formats like HTML / epub / etc.  I won't 
> start the bikeshedding on which markup format is preferable - but there are 
> several options that can work perfectly fine.  I've personally used sphinx w/ 
> restructured text, and markdown.  Both can build easily and as an added bonus 
> be pushed to readthedocs (or something similar) automatically.  For an 
> example, see cqlengine's documentation, which I think is already 
> significantly better than the wiki: 
> http://cqlengine.readthedocs.org/en/latest/
> In addition to being overall easier to maintain, putting the documentation in 
> the git repo adds context, since it evolves with the versions of Cassandra.
> If the wiki were kept even remotely up to date, I wouldn't bother with this, 
> but not having at least some basic documentation in the repo, or anywhere 
> associated with the project, is frustrating.
> For reference, the last 3 updates were:
> 1/15/15 - updating committers list
> 1/08/15 - updating contributors and how to contribute
> 12/16/14 - added a link to CQL docs from wiki frontpage (by me)





[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-9754:

Status: Open  (was: Patch Available)

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
>





[jira] [Comment Edited] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321335#comment-15321335
 ] 

Stefania edited comment on CASSANDRA-8700 at 6/8/16 8:01 PM:
-

Attaching [^Installation.md].


was (Author: stefania):
Attaching [^installation.md].

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: Installation.md
>
>





[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321338#comment-15321338
 ] 

Jonathan Ellis commented on CASSANDRA-9754:
---

Delighted to see this patch land, looking forward to getting it merged!

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
>





[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-9754:

Attachment: 9754_part1-v1.diff

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Attachments: 9754_part1-v1.diff
>
>





[jira] [Commented] (CASSANDRA-11913) BufferUnderFlowException in CompressorTest

2016-06-08 Thread Rei Odaira (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321348#comment-15321348
 ] 

Rei Odaira commented on CASSANDRA-11913:


Yes, I agree.  That would be enough.

> BufferUnderFlowException in CompressorTest
> --
>
> Key: CASSANDRA-11913
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11913
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
> Environment: Non-x86 environments
>Reporter: Rei Odaira
>Assignee: Rei Odaira
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11913-2.2.txt
>
>
> org.apache.cassandra.io.compress.CompressorTest causes 
> java.nio.BufferUnderflowException on environments where FastByteOperations 
> uses PureJavaOperations. The root cause is that 
> CompressorTest.testArrayUncompress() copies data from a ByteBuffer to a byte 
> array beyond the limit of the ByteBuffer.





[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321353#comment-15321353
 ] 

Michael Kjellman commented on CASSANDRA-9754:
-

TIL: Attempting to upload to Jira via the slow and overpriced Gogo in-flight 
wifi doesn't work... "Cannot attach file 9754_part2-v1.diff: Unable to 
communicate with JIRA." Working on it.. :) 

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Attachments: 9754_part1-v1.diff
>
>





[jira] [Updated] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-8700:

Attachment: drivers_list.md

Attaching [~drivers_list.md].

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: Installation.md, drivers_list.md
>
>





[jira] [Comment Edited] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321424#comment-15321424
 ] 

Stefania edited comment on CASSANDRA-8700 at 6/8/16 8:54 PM:
-

Attaching [^drivers_list.md].


was (Author: stefania):
Attaching [~drivers_list.md].

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: Installation.md, drivers_list.md
>
>





[jira] [Updated] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-8700:

Attachment: installation.md

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: drivers_list.md, installation.md
>
>





[jira] [Updated] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-8700:

Attachment: (was: Installation.md)

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: drivers_list.md, installation.md
>
>





[jira] [Comment Edited] (CASSANDRA-8700) replace the wiki with docs in the git repo

2016-06-08 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321335#comment-15321335
 ] 

Stefania edited comment on CASSANDRA-8700 at 6/8/16 8:55 PM:
-

Attaching [^installation.md].


was (Author: stefania):
Attaching [^Installation.md].

> replace the wiki with docs in the git repo
> --
>
> Key: CASSANDRA-8700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8700
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Sylvain Lebresne
>Priority: Minor
> Attachments: drivers_list.md, installation.md
>
>





[jira] [Assigned] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2016-06-08 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov reassigned CASSANDRA-10857:
---

Assignee: Alex Petrov

> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
> Fix For: 3.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.





[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-06-08 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321446#comment-15321446
 ] 

Branimir Lambov commented on CASSANDRA-11349:
-

Does this really solve the problem with the test you mentioned? Putting the 
tombstones through {{RangeTombstoneList}} will normalize them, but they may not 
be issued in the right position, i.e. the RTL solution only works if the data 
contains only tombstones. For example, the 
{{\["b:d:\!","b:\!",1463656272792,"t",1463731877\]}} part from the test above 
gets issued before a potential token that may come before {{b:d:!}}.

The test needs to be extended to include live tokens, for example by adding 
each of
{code}
INSERT INTO table1 (c1, c2, c3, c4) VALUES ('b', 'b', 'a', 1)
{code}
or
{code}
INSERT INTO table1 (c1, c2, c3, c4) VALUES ('b', 'd', 'a', 1)
{code}
or
{code}
INSERT INTO table1 (c1, c2, c3, c4) VALUES ('b', 'e', 'a', 1)
{code}
after the deletions.

The RTL solution will break (in different ways) for at least two of the above. 
It also has performance implications that I am not really happy to take. A 
proper solution is to either fully replicate what RTL does in the tombstone 
tracker (which may not be worth it so late in the lifespan of 2.1 and 2.2), 
or make the tombstone tracker wrap around an RTL (which may be inefficient and 
is still somewhat tricky).

If (as Fabien's testing seems to imply) doing the digest update as 
serialization solves the majority of the differences and repair pain, I would 
prefer to stop there.

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 
> 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that some inconsistencies were detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.





[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-08 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321454#comment-15321454
 ] 

Alex Petrov commented on CASSANDRA-11182:
-

I've looked at how this could possibly be implemented. {{CONTAINS}} itself is 
fairly simple as it has similar semantics to {{EQ}}. I've implemented a rough 
prototype that indexes and flushes collections. It touches a lot of SASI, but 
it's mostly a matter of adding the {{target}} everywhere it's not currently 
passed, moving away from reading the value from {{Cell#value}}, and picking up 
the right thing depending on the context.

The only possible performance implication is that we still do post-filtering. 
Since the index stores the partition key position, we check whether the row 
satisfies all conditions with {{Operation#localSatisfiedBy}}. Having to iterate 
through all the cells in the collection might be quite costly. After talking 
with [~beobal] briefly about it, one possible way to implement this is to add 
offsets to the concrete rows, although that would also mean a larger change.

cc [~xedin] 
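The post-filtering concern can be sketched as follows (an illustrative Python 
model with invented names, not SASI's actual {{Operation#localSatisfiedBy}} 
code): the term index only narrows the search to candidate partitions, so 
confirming {{CONTAINS}} still means scanning every collection cell in each 
candidate.

```python
def index_lookup(index, value):
    """A SASI-like term index maps an indexed value to candidate
    partition positions; it cannot point at individual rows."""
    return index.get(value, set())

def post_filter(partitions, candidates, value):
    """Re-check each candidate partition by iterating all collection
    cells -- the potentially costly step; per-row offsets in the index
    would let this skip non-matching rows entirely."""
    hits = []
    for pk in sorted(candidates):
        for row_key, collection in partitions[pk]:
            if value in collection:  # scans the whole collection cell set
                hits.append((pk, row_key))
    return hits

partitions = {1: [("r1", ["a", "b"]), ("r2", ["c"])]}
index = {"a": {1}, "b": {1}, "c": {1}}
matches = post_filter(partitions, index_lookup(index, "b"), "b")
```

The wider the collections, the more work the inner loop does per candidate, 
which is why per-row offsets (at the cost of a larger change) look attractive.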

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Right now it is possible with standard Cassandra 2nd index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEY 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]





[jira] [Created] (CASSANDRA-11981) Cassandra 2.2 -> 3.5 upgradesstables results in error: null

2016-06-08 Thread Victor Trac (JIRA)
Victor Trac created CASSANDRA-11981:
---

 Summary: Cassandra 2.2 -> 3.5 upgradesstables results in error: 
null 
 Key: CASSANDRA-11981
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11981
 Project: Cassandra
  Issue Type: Bug
 Environment: # java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

# uname -a
Linux cassandra-dfs-10-10-160-19 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 
22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa datastax-ddc
datastax-ddc-3.5.0-1.noarch

Reporter: Victor Trac


We upgraded from Cassandra 2.2.1 -> 3.5 on a 6-node cluster. I ran 'nodetool 
upgradesstables' on all nodes successfully except for one. It would take 2-3 
days ('nodetool netstats' showed continual progress), but then the server would 
error out with a java.io.EOFException. After trying upgradesstables several 
times, I tried a scrub. Again, it ran for a couple of days and then errored out 
with a very similar error:
{code}
# nodetool scrub
error: null
-- StackTrace --
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at 
sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:215)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown 
Source)
at 
javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
at 
javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
at com.sun.proxy.$Proxy7.scrub(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:248)
at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:280)
at org.apache.cassandra.tools.nodetool.Scrub.execute(Scrub.java:66)
at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:250)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:164)
{code}







[jira] [Updated] (CASSANDRA-11981) Cassandra 2.2 -> 3.5 upgradesstables results in error: null

2016-06-08 Thread Victor Trac (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Trac updated CASSANDRA-11981:

Environment: 
{code}
# java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

# uname -a
Linux cassandra-dfs-10-10-160-19 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 
22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa datastax-ddc
datastax-ddc-3.5.0-1.noarch
{code}

  was:
# java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

# uname -a
Linux cassandra-dfs-10-10-160-19 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 
22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa datastax-ddc
datastax-ddc-3.5.0-1.noarch



> Cassandra 2.2 -> 3.5 upgradesstables results in error: null 
> 
>
> Key: CASSANDRA-11981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11981
> Project: Cassandra
>  Issue Type: Bug
> Environment: {code}
> # java -version
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> # uname -a
> Linux cassandra-dfs-10-10-160-19 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 
> 22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> # rpm -qa datastax-ddc
> datastax-ddc-3.5.0-1.noarch
> {code}
>Reporter: Victor Trac
>
> We upgraded from Cassandra 2.2.1 -> 3.5 on a 6-node cluster. I ran 'nodetool 
> upgradesstables' on all nodes successfully except for one. It would take 2-3 
> days ('nodetool netstats' showed continual progress), but then the server 
> would error out with a java.io.EOFException. After trying upgradesstables 
> several times, I tried a scrub. Again, it ran for a couple of days and then 
> errored out with a very similar error:
> {code}
> # nodetool scrub
> error: null
> -- StackTrace --
> java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:267)
> at 
> sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:215)
> at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
> at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown 
> Source)
> at 
> javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
> at 
> javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
> at com.sun.proxy.$Proxy7.scrub(Unknown Source)
> at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:248)
> at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:280)
> at org.apache.cassandra.tools.nodetool.Scrub.execute(Scrub.java:66)
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:250)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:164)
> {code}





[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-06-08 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-9754:

Attachment: 9754_part2-v1.diff

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad on 
> endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it 
> will have 100K IndexInfo objects and 200K ByteBuffers. This creates a lot of 
> churn for GC. Can this be improved by not creating so many objects?
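A minimal sketch of one way to cut that churn (purely illustrative, not the attached patch): pack the per-entry offsets and widths into primitive arrays, so a 100K-entry partition index costs two arrays instead of hundreds of thousands of IndexInfo objects and ByteBuffers.

```java
// Hypothetical sketch: hold column index entries in primitive long arrays
// rather than one IndexInfo object (plus ByteBuffers) per entry, reducing
// the object count the GC has to trace for large partitions.
public class PrimitiveIndexInfo {
    private final long[] offsets;   // file offset of each index block
    private final long[] widths;    // serialized width of each index block

    public PrimitiveIndexInfo(long[] offsets, long[] widths) {
        this.offsets = offsets;
        this.widths = widths;
    }

    public int entryCount() { return offsets.length; }

    public long offsetAt(int i) { return offsets[i]; }

    public long widthAt(int i) { return widths[i]; }

    public static void main(String[] args) {
        PrimitiveIndexInfo idx = new PrimitiveIndexInfo(
                new long[]{0L, 4096L, 8192L}, new long[]{4096L, 4096L, 2048L});
        System.out.println(idx.entryCount() + " entries, last offset " + idx.offsetAt(2));
    }
}
```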





[jira] [Updated] (CASSANDRA-11933) Improve Repair performance

2016-06-08 Thread Mahdi Mohammadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahdi Mohammadi updated CASSANDRA-11933:

Status: Patch Available  (was: In Progress)

> Improve Repair performance
> --
>
> Key: CASSANDRA-11933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Cyril Scetbon
>Assignee: Mahdi Mohammadi
>
> During a full repair on a ~60-node cluster, I've been able to see that this 
> stage can be significant (up to 60 percent of the whole time):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> It's mainly caused by the fact that 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
>  calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call 
> takes more than 99% of the time. It takes 600ms when there is no load on the 
> cluster, and more if there is. So for 10k ranges, you can imagine that it 
> takes at least 1.5 hours just to compute ranges. 
> Underneath it calls 
> [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
>  which can get pretty inefficient ([~jbellis]'s 
> [words|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L165]).
> *ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend 
> hours on it.
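A minimal sketch of the caching idea, with hypothetical names (the real fix would sit near ActiveRepairService and would need to invalidate the cache whenever ring topology changes):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: memoize the expensive per-keyspace local-range
// computation and clear the cache on topology-change notifications, so
// repair pays the ~600ms cost once per keyspace instead of once per range.
public class LocalRangeCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // stands in for the expensive ss.getLocalRanges(keyspaceName) call
    private List<String> computeLocalRanges(String keyspace) {
        return Collections.singletonList("range-for-" + keyspace);
    }

    public List<String> getLocalRanges(String keyspace) {
        return cache.computeIfAbsent(keyspace, this::computeLocalRanges);
    }

    // call this from token/topology change notifications
    public void invalidate() { cache.clear(); }

    public static void main(String[] args) {
        LocalRangeCache c = new LocalRangeCache();
        c.getLocalRanges("ks1");                      // computes and caches
        System.out.println(c.getLocalRanges("ks1").get(0)); // served from cache
    }
}
```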





[jira] [Commented] (CASSANDRA-11933) Improve Repair performance

2016-06-08 Thread Mahdi Mohammadi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321471#comment-15321471
 ] 

Mahdi Mohammadi commented on CASSANDRA-11933:
-

[~pauloricardomg] Would you please set up CI for my branches?

> Improve Repair performance
> --
>
> Key: CASSANDRA-11933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Cyril Scetbon
>Assignee: Mahdi Mohammadi
>
> During a full repair on a ~60-node cluster, I've been able to see that this 
> stage can be significant (up to 60 percent of the whole time):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> It's mainly caused by the fact that 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
>  calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call 
> takes more than 99% of the time. It takes 600ms when there is no load on the 
> cluster, and more if there is. So for 10k ranges, you can imagine that it 
> takes at least 1.5 hours just to compute ranges. 
> Underneath it calls 
> [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
>  which can get pretty inefficient ([~jbellis]'s 
> [words|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L165]).
> *ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend 
> hours on it.





[jira] [Created] (CASSANDRA-11982) Cassandra 3.5 cluster join fails

2016-06-08 Thread Victor Trac (JIRA)
Victor Trac created CASSANDRA-11982:
---

 Summary: Cassandra 3.5 cluster join fails
 Key: CASSANDRA-11982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11982
 Project: Cassandra
  Issue Type: Bug
 Environment: # java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

# uname -a
Linux cassandra-dfs-10-10-160-19 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 
22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa datastax-ddc
datastax-ddc-3.5.0-1.noarch

Reporter: Victor Trac


In an effort to work around a failing upgradesstables/scrub 
(https://issues.apache.org/jira/browse/CASSANDRA-11981), I force-removed a node 
from the cluster, deleted its data directory, and tried to join a node to the 
cluster anew. After a few hours, the bootstrap thread failed:

cassandra.log
{code}
INFO  18:56:24 Redistributing index summaries
INFO  19:56:23 Saved KeyCache (17 items) in 386 ms
INFO  19:56:24 Redistributing index summaries
ERROR 20:40:27 [Stream #b1e4d290-2d91-11e6-8904-df8dbad02c2a] Remote peer 
10.10.160.18 failed stream session.
INFO  20:40:27 [Stream #b1e4d290-2d91-11e6-8904-df8dbad02c2a] Session with 
/10.10.160.18 is complete
WARN  20:40:27 [Stream #b1e4d290-2d91-11e6-8904-df8dbad02c2a] Stream failed
ERROR 20:40:27 Error while waiting on bootstrap to complete. Bootstrap will 
have to be restarted.
java.util.concurrent.ExecutionException: 
org.apache.cassandra.streaming.StreamException: Stream failed
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
 ~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
 ~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
~[guava-18.0.jar:na]
at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1387) 
[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:974)
 [apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:748) 
[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:613) 
[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:349) 
[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) 
[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:680) 
[apache-cassandra-3.5.0.jar:3.5.0]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at 
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
 ~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
 ~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) 
~[guava-18.0.jar:na]
at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
 ~[guava-18.0.jar:na]
at 
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:213)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:189)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:636)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:489)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
 ~[apache-cassandra-3.5.0.jar:3.5.0]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
Jun 08, 2016 8:40:27 PM com.google.common.util.concurrent.ExecutionList 
executeListener
SEVERE: RuntimeException while executing runnable 
com.google.common.util.concurrent.Futures$6@3592d1b5 with executor INSTANCE
java.lang.NullPointerException
at 
org.apache.cassandra.service.StorageService$2.onFailure(StorageService.java:1382)
at com.google.common.util.concurrent.Futures$

[jira] [Commented] (CASSANDRA-11719) Add bind variables to trace

2016-06-08 Thread Mahdi Mohammadi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321506#comment-15321506
 ] 

Mahdi Mohammadi commented on CASSANDRA-11719:
-

Branch for trunk: 
[branch|https://github.com/mm-binary/cassandra/tree/11719-trunk]


> Add bind variables to trace
> ---
>
> Key: CASSANDRA-11719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11719
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Mahdi Mohammadi
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 11719-trunk.patch
>
>
> {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a 
> _TODO_ saying "we don't have [typed] access to CQL bind variables here".
> In fact, we now have typed access to CQL bind variables there. So it is now 
> possible to show the bind variables in the trace.





[jira] [Updated] (CASSANDRA-11719) Add bind variables to trace

2016-06-08 Thread Mahdi Mohammadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahdi Mohammadi updated CASSANDRA-11719:

Status: Patch Available  (was: In Progress)

> Add bind variables to trace
> ---
>
> Key: CASSANDRA-11719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11719
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Mahdi Mohammadi
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 11719-trunk.patch
>
>
> {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a 
> _TODO_ saying "we don't have [typed] access to CQL bind variables here".
> In fact, we now have typed access to CQL bind variables there. So it is now 
> possible to show the bind variables in the trace.





[jira] [Updated] (CASSANDRA-11719) Add bind variables to trace

2016-06-08 Thread Mahdi Mohammadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahdi Mohammadi updated CASSANDRA-11719:

Attachment: (was: 11719-trunk.patch)

> Add bind variables to trace
> ---
>
> Key: CASSANDRA-11719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11719
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Mahdi Mohammadi
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a 
> _TODO_ saying "we don't have [typed] access to CQL bind variables here".
> In fact, we now have typed access to CQL bind variables there. So it is now 
> possible to show the bind variables in the trace.





[jira] [Created] (CASSANDRA-11983) Migration task failed to complete

2016-06-08 Thread Chris Love (JIRA)
Chris Love created CASSANDRA-11983:
--

 Summary: Migration task failed to complete
 Key: CASSANDRA-11983
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11983
 Project: Cassandra
  Issue Type: Bug
  Components: Lifecycle
 Environment: Docker / Kubernetes running
Linux cassandra-21 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) 
x86_64 GNU/Linux
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
Cassandra 3.5 installed from 
deb-src http://www.apache.org/dist/cassandra/debian 35x main
Reporter: Chris Love


When nodes are bootstrapping I am getting multiple errors: "Migration task 
failed to complete", from MigrationManager.java.

The errors increase as more nodes are added to the ring, as I am creating a 
ring of 1k nodes.

The cassandra.yaml is here: 
https://github.com/k8s-for-greeks/gpmr/blob/3d50ff91a139b9c4a7a26eda0fb4dcf9a008fbed/pet-race-devops/docker/cassandra-debian/files/cassandra.yaml





[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-08 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321703#comment-15321703
 ] 

Chen Shen commented on CASSANDRA-10862:
---

[~pauloricardomg]
I've done some investigation, and I find it might not be so easy to schedule a 
compaction on an L0 table on reception, as the only straightforward way to 
trigger a compaction is by submitting a task to 
CompactionManager.submitBackground, and 1) it's not guaranteed to be executed, 
to my knowledge, and 2) submitBackground needs a `ColumnFamilyStore` as input, 
so we would need to either create a new CFS or split the compaction strategy 
out of CompactionManager, each of which might need lots of work.
So instead I am trying a different approach: don't add tables to the CFS until 
the number of L0 sstables is smaller than a threshold, and subscribe to 
`SSTableListChangedNotification` so that the `OnCompletionRunnable` can sleep 
and wait on the notification. 
Is this the right direction? I have a commit here 
https://github.com/scv119/cassandra/commit/5e0c5b1da83ae7f2d2ccc382fd69c438637b2772
 if you want to take a look. I'm also planning to apply this patch to our 
production tier to see if it helps.
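A hedged sketch of the proposed wait-on-notification gating (hypothetical class and method names, not the linked commit): the completion runnable blocks until the observed L0 sstable count drops below a threshold, and a notification callback wakes it.

```java
// Hypothetical sketch: gate the "add streamed sstables to the CFS" step
// behind an L0-size threshold, waking waiters from an
// SSTableListChangedNotification-style callback.
public class L0Gate {
    private final Object lock = new Object();
    private int l0Count;
    private final int threshold;

    public L0Gate(int initialL0, int threshold) {
        this.l0Count = initialL0;
        this.threshold = threshold;
    }

    // called from the sstable-list-changed notification handler
    public void onSSTableListChanged(int newL0Count) {
        synchronized (lock) {
            l0Count = newL0Count;
            lock.notifyAll();
        }
    }

    // called by the completion runnable before adding streamed tables
    public void awaitBelowThreshold() throws InterruptedException {
        synchronized (lock) {
            while (l0Count >= threshold) lock.wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        L0Gate gate = new L0Gate(100, 32);
        Thread adder = new Thread(() -> {
            try {
                gate.awaitBelowThreshold();
                System.out.println("added to CFS");
            } catch (InterruptedException ignored) { }
        });
        adder.start();
        gate.onSSTableListChanged(10);  // compaction shrank L0 below threshold
        adder.join();
    }
}
```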
 

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.





[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-08 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321719#comment-15321719
 ] 

Pavel Yaskevich commented on CASSANDRA-11182:
-

I agree with [~beobal] on that. Effectively, the most important thing we need 
to enable indexing for collections and partition keys is a TokenTree that 
accepts variable-size keys (that would enable different partitioners, 
collections support, primary key indexing, etc.); once that's done, all of the 
changes are going to be pretty straightforward.

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Right now it is possible with standard Cassandra 2nd index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEY 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]





[jira] [Comment Edited] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-08 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321703#comment-15321703
 ] 

Chen Shen edited comment on CASSANDRA-10862 at 6/9/16 12:47 AM:


[~pauloricardomg]
I've done some investigation, and I find it might not be so easy to schedule a 
compaction on an L0 table on reception, as the only straightforward way to 
trigger a compaction is by submitting a task to 
CompactionManager.submitBackground, and 1) it's not guaranteed to be executed, 
to my knowledge, and 2) submitBackground needs a `ColumnFamilyStore` as input, 
so we would need to either create a new CFS or split the compaction strategy 
out of CompactionManager, each of which might need lots of work.

So instead I am trying a different approach: don't add tables to the CFS until 
the number of L0 sstables is smaller than a threshold, and subscribe to 
`SSTableListChangedNotification` so that the `OnCompletionRunnable` can sleep 
and wait on the notification. 

Is this the right direction? I have a commit here 
https://github.com/scv119/cassandra/commit/f49013897b1694e006e001df97c6f34399d016ae
 if you want to take a look. I'm also planning to apply this patch to our 
production tier to see if it helps.
 


was (Author: scv...@gmail.com):
[~pauloricardomg]
I've done some investigation and I find it might not so easy to schedule a 
compaction on L0 table on reception as the only straightforward way to trigger 
a compaction is by submitting a task to CompactionManager.submitBackground, and 
1) it's not guaranteed to be executed according to my knowledge 2) 
submitBackground need a `ColumnFamilyStore` as input, so we need either create 
a new CFS, or split the compaction strategy out of CompactionManager, each of 
which might need lots of work.
So instead I am doing a different tricky approach: Don't add tables to CFS 
until the number of L0 sstables is smaller than a threshold. And subscribe to 
`SSTableListChangedNotification` so that the `OnCompletionRunnable` could sleep 
and wait on notification. 
Is this a right direction? I have a commit here 
https://github.com/scv119/cassandra/commit/5e0c5b1da83ae7f2d2ccc382fd69c438637b2772
 if you want to take a look. I'm also planing to apply this patch to our 
production tier to see if this helps.
 

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.





[jira] [Created] (CASSANDRA-11984) StorageService shutdown hook should use a volatile variable

2016-06-08 Thread Edward Capriolo (JIRA)
Edward Capriolo created CASSANDRA-11984:
---

 Summary: StorageService shutdown hook should use a volatile 
variable
 Key: CASSANDRA-11984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11984
 Project: Cassandra
  Issue Type: Bug
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 3.8


In StorageService.java there is a variable accessed from other threads that is 
not marked volatile.

{noformat}
  private boolean inShutdownHook = false;

  public boolean isInShutdownHook()
  {
      return inShutdownHook;
  }

  drainOnShutdown = new Thread(new WrappedRunnable()
  {
      @Override
      public void runMayThrow() throws InterruptedException
      {
          inShutdownHook = true;
{noformat}

This is called from at least here:
{noformat}
./src/java/org/apache/cassandra/concurrent/DebuggableScheduledThreadPoolExecutor.java:
if (!StorageService.instance.isInShutdownHook())
{noformat}

This could cause issues in controlled shutdown like drain commands.
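A minimal sketch of the proposed fix (illustrative, not the linked branch): declaring the flag volatile establishes a happens-before edge, so a write from the shutdown-hook thread is guaranteed visible to readers polling from other threads.

```java
public class ShutdownFlagDemo {
    // volatile guarantees that a write from one thread (the shutdown hook)
    // is visible to reads on other threads, per the Java Memory Model;
    // without it, a reader thread may never observe the flag flipping.
    private static volatile boolean inShutdownHook = false;

    public static boolean isInShutdownHook() { return inShutdownHook; }

    public static void main(String[] args) throws InterruptedException {
        Thread hook = new Thread(() -> inShutdownHook = true);
        hook.start();
        hook.join();  // join only makes this demo deterministic; volatile is
                      // what matters when readers poll the flag concurrently
        System.out.println("inShutdownHook=" + isInShutdownHook());
    }
}
```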





[jira] [Commented] (CASSANDRA-11984) StorageService shutdown hook should use a volatile variable

2016-06-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321756#comment-15321756
 ] 

Edward Capriolo commented on CASSANDRA-11984:
-

https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:CASSANDRA-11984?expand=1

> StorageService shutdown hook should use a volatile variable
> ---
>
> Key: CASSANDRA-11984
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11984
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 3.8
>
>
> In StorageService.java there is a variable accessed from other threads that 
> is not marked volatile.
> {noformat}
>   private boolean inShutdownHook = false;
>
>   public boolean isInShutdownHook()
>   {
>       return inShutdownHook;
>   }
>
>   drainOnShutdown = new Thread(new WrappedRunnable()
>   {
>       @Override
>       public void runMayThrow() throws InterruptedException
>       {
>           inShutdownHook = true;
> {noformat}
> This is called from at least here:
> {noformat}
> ./src/java/org/apache/cassandra/concurrent/DebuggableScheduledThreadPoolExecutor.java:
> if (!StorageService.instance.isInShutdownHook())
> {noformat}
> This could cause issues in controlled shutdown like drain commands.





[jira] [Updated] (CASSANDRA-11984) StorageService shutdown hook should use a volatile variable

2016-06-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-11984:

Status: Patch Available  (was: Open)

> StorageService shutdown hook should use a volatile variable
> ---
>
> Key: CASSANDRA-11984
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11984
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 3.8
>
>
> In StorageService.java there is a variable accessed from other threads that 
> is not marked volatile.
> {noformat}
>   private boolean inShutdownHook = false;
>
>   public boolean isInShutdownHook()
>   {
>       return inShutdownHook;
>   }
>
>   drainOnShutdown = new Thread(new WrappedRunnable()
>   {
>       @Override
>       public void runMayThrow() throws InterruptedException
>       {
>           inShutdownHook = true;
> {noformat}
> This is called from at least here:
> {noformat}
> ./src/java/org/apache/cassandra/concurrent/DebuggableScheduledThreadPoolExecutor.java:
> if (!StorageService.instance.isInShutdownHook())
> {noformat}
> This could cause issues in controlled shutdown like drain commands.





[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load

2016-06-08 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321773#comment-15321773
 ] 

Dikang Gu commented on CASSANDRA-6908:
--

We had a similar problem, and we worked around it by setting 
dynamic_snitch_badness_threshold to 50, which dropped the P99 latency by 10x.
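For reference, the workaround above corresponds to a cassandra.yaml setting along these lines (50 is the value reported here; an appropriate threshold is cluster-specific and, at high values, approximates disabling the dynamic snitch):

```yaml
# cassandra.yaml -- example only; tune per cluster.
# Higher values make the dynamic snitch tolerate more score skew
# before routing reads away from the "natural" replica.
dynamic_snitch_badness_threshold: 50
```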


> Dynamic endpoint snitch destabilizes cluster under heavy load
> -
>
> Key: CASSANDRA-6908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Bartłomiej Romański
>Assignee: Brandon Williams
> Attachments: as-dynamic-snitch-disabled.png
>
>
> We observe that with dynamic snitch disabled our cluster is much more stable 
> than with dynamic snitch enabled.
> We've got a 15-node cluster with pretty powerful machines (2x E5-2620, 64 GB 
> RAM, 2x 480 GB SSD). We mostly do reads (about 300k/s).
> We use Astyanax on the client side with the TOKEN_AWARE option enabled. It 
> automatically directs read queries to one of the nodes responsible for the 
> given token.
> In that case, with dynamic snitch disabled, Cassandra always handles the read 
> locally. With dynamic snitch enabled, Cassandra very often decides to proxy 
> the read to some other node. This causes much higher CPU usage and produces 
> much more garbage, which results in more frequent GC pauses (the young 
> generation fills up more quickly). By "much higher" and "much more" I mean 1.5-2x.
> I'm aware that a higher dynamic_snitch_badness_threshold value should solve 
> that issue. The default value is 0.1. I've looked at the scores exposed in JMX 
> and the problem is that our values seemed to be completely random. They are 
> usually between 0.5 and 2.0, but change randomly every time I hit refresh.
> Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something 
> like that, but the result will be similar to simply disabling the dynamic 
> snitch entirely (which is what we did).
> I've tried to understand the logic behind these scores and I'm not 
> sure I get the idea...
> It's a sum (without any multipliers) of two components:
> - the ratio of the given node's recent latency to the recent average node latency
> - something called 'severity', which, if I analyzed the code correctly, is the 
> result of BackgroundActivityMonitor.getIOWait() - the ratio of "iowait" 
> CPU time to total CPU time as reported in /proc/stat (the ratio is 
> multiplied by 100)
> In our case the second value is something around 0-2% but varies quite 
> heavily every second.
> What's the idea behind simply adding these two values without any multipliers 
> (e.g. the second one is a percentage while the first one is not)? Are we sure 
> this is the best possible way of calculating the final score?
> Is there a way to force Cassandra to use (much) longer samples? In our case 
> we probably need that to get stable values. The 'severity' is calculated for 
> each second. The mean latency is calculated based on some magic, hardcoded 
> values (ALPHA = 0.75, WINDOW_SIZE = 100). 
> Am I right that there's no way to tune that without hacking the code?
> I'm aware that there's a dynamic_snitch_update_interval_in_ms property in the 
> config file, but that only determines how often the scores are recalculated, 
> not how long the samples are. Is that correct?
> To sum up, it would be really nice to have more control over dynamic snitch 
> behavior, or at least to have the official option to disable it described in the 
> default config file (it took me some time to discover that we can just 
> disable it instead of hacking with dynamic_snitch_badness_threshold=1000).
> Currently, for some scenarios (like ours - optimized cluster, token-aware 
> client, heavy load) it causes more harm than good.
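> To make the unit mismatch concrete, here is a hedged sketch of the scoring as described above (illustrative names, not the actual DynamicEndpointSnitch code):
> {noformat}
// Sketch of the score arithmetic: a dimensionless latency ratio (around 1.0)
// summed with 'severity' (an iowait percentage, 0-100) with no weighting.
public class SnitchScoreSketch
{
    static double score(double nodeLatency, double meanLatency, double iowaitFraction)
    {
        double latencyRatio = nodeLatency / meanLatency; // e.g. 1.2 = 20% slower than average
        double severity = iowaitFraction * 100;          // e.g. 0.02 iowait -> 2.0
        return latencyRatio + severity;                  // mixed units, as questioned above
    }

    public static void main(String[] args)
    {
        // A node 20% slower than average with just 2% iowait: the severity
        // term (2.0) dwarfs the latency signal (0.2 above baseline).
        System.out.println(score(1.2, 1.0, 0.02)); // ~3.2
    }
}
> {noformat}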



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-08 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321703#comment-15321703
 ] 

Chen Shen edited comment on CASSANDRA-10862 at 6/9/16 2:03 AM:
---

[~pauloricardomg]
I've done some investigation and found it might not be so easy to schedule a 
compaction on L0 tables on reception, since the only straightforward way to trigger 
a compaction is by submitting a task to CompactionManager.submitBackground, and 
1) to my knowledge it is not guaranteed to be executed, and 2) 
submitBackground needs a `ColumnFamilyStore` as input, so we would need to either 
create a new CFS or split the compaction strategy out of CompactionManager, 
either of which might take a lot of work.

So instead I am taking a different, admittedly tricky approach: don't add tables 
to the CFS until the number of L0 sstables drops below a threshold, and subscribe to 
`SSTableListChangedNotification` so that the `OnCompletionRunnable` can sleep 
and wait on the notification. 

Is this the right direction? I have a commit here 
https://github.com/scv119/cassandra/commit/3b48c092a7381d3074086476b12570db9b16dc16
 if you want to take a look. I'm also planning to apply this patch to our 
production tier to see if it helps.
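
A simplified, hedged sketch of the gating idea (all names here are illustrative stand-ins, not the actual patch; the real code hooks `SSTableListChangedNotification` inside Cassandra):
{noformat}
// Streamed sstables are held back while L0 already has too many tables;
// a compaction-finished notification wakes the waiting thread.
public class L0GateSketch
{
    private final int threshold;
    private int l0Count;

    L0GateSketch(int threshold, int initialL0)
    {
        this.threshold = threshold;
        this.l0Count = initialL0;
    }

    // Called from the streaming-completion path; blocks until L0 has room,
    // then makes the new table visible.
    synchronized void awaitRoomAndAdd() throws InterruptedException
    {
        while (l0Count >= threshold)
            wait(); // woken by onCompactionFinished()
        l0Count++;
    }

    // Analogue of receiving SSTableListChangedNotification after a
    // compaction removed some tables from L0.
    synchronized void onCompactionFinished(int removedFromL0)
    {
        l0Count -= removedFromL0;
        notifyAll();
    }

    synchronized int l0Count()
    {
        return l0Count;
    }
}
{noformat}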
 



> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, and with it the number of tables 
> referenced per query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query to a manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark the tables available, rather than marking them available as soon as the 
> file itself is complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-08 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321816#comment-15321816
 ] 

Alex Petrov commented on CASSANDRA-11182:
-

Should we do this in the scope of this ticket or just create another one?

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow-up ticket for the post-Cassandra-3.4 SASI integration.
> Right now it is possible with standard Cassandra 2nd index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEYS 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-08 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321981#comment-15321981
 ] 

Pavel Yaskevich commented on CASSANDRA-11182:
-

I think such a change deserves its own ticket :)

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow-up ticket for the post-Cassandra-3.4 SASI integration.
> Right now it is possible with standard Cassandra 2nd index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEYS 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11961) Nonfatal NPE in CompactionMetrics

2016-06-08 Thread Achal Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Achal Shah updated CASSANDRA-11961:
---
Status: Patch Available  (was: Open)

Hi, this is my first patch to Cassandra; apologies if I've messed up in any 
way. Please let me know if there are any improvements I can make!

I've pushed a branch with my changes to github: 
https://github.com/achals/cassandra/tree/11961-trunk

I looked for tests but there didn't seem to be any for this class.

> Nonfatal NPE in CompactionMetrics
> -
>
> Key: CASSANDRA-11961
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11961
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Priority: Minor
>  Labels: lhf
>
> Just saw the following NPE on trunk. It means that {{metaData}} from 
> {{CFMetaData metaData = compaction.getCompactionInfo().getCFMetaData();}} is 
> {{null}}. A simple {{if (metaData == null) continue;}} should fix this.
> {code}
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.cassandra.metrics.CompactionMetrics$2.getValue(CompactionMetrics.java:103)
>  ~[main/:na]
>   at 
> org.apache.cassandra.metrics.CompactionMetrics$2.getValue(CompactionMetrics.java:78)
>  ~[main/:na]
> {code}
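> A minimal sketch of the suggested one-line fix: skip entries whose metadata is null instead of dereferencing them. {{CFMetaData}} here is a stand-in stub; the real change belongs in the CompactionMetrics gauge.
> {code}
import java.util.Arrays;
import java.util.List;

public class NullCheckSketch
{
    // Stand-in for the real CFMetaData returned by getCFMetaData().
    static class CFMetaData
    {
        final String ksName;
        CFMetaData(String ksName) { this.ksName = ksName; }
    }

    static int countCompactionsForKeyspace(List<CFMetaData> metas, String keyspace)
    {
        int count = 0;
        for (CFMetaData metaData : metas)
        {
            if (metaData == null)
                continue; // the suggested fix: tolerate missing metadata
            if (metaData.ksName.equals(keyspace))
                count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        List<CFMetaData> metas = Arrays.asList(
            new CFMetaData("ks1"), null, new CFMetaData("ks1"));
        // No NPE despite the null entry.
        System.out.println(countCompactionsForKeyspace(metas, "ks1")); // prints 2
    }
}
> {code}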



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)