[jira] [Updated] (CASSANDRA-7902) Introduce the ability to ignore RR based on consistency

2015-03-07 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-7902:
--
Labels: docs-impacting  (was: )

 Introduce the ability to ignore RR based on consistency
 

 Key: CASSANDRA-7902
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7902
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
  Labels: docs-impacting

 There exists a case for LOCAL_* consistency levels where you really want them 
 *local only*.  This implies that you never want to do cross-DC RR for those 
 requests, while still doing it for other levels.
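
A minimal, self-contained sketch of the behaviour being asked for (illustrative only, not Cassandra's actual read-repair path; the trimmed ConsistencyLevel enum and the class/method names below are stand-ins, not real identifiers from the codebase):

{code}
public class ReadRepairGate
{
    // Stand-in for org.apache.cassandra.db.ConsistencyLevel, trimmed to the levels discussed here.
    enum ConsistencyLevel { ONE, QUORUM, ALL, LOCAL_ONE, LOCAL_QUORUM, LOCAL_SERIAL }

    static boolean isDatacenterLocal(ConsistencyLevel cl)
    {
        return cl == ConsistencyLevel.LOCAL_ONE
            || cl == ConsistencyLevel.LOCAL_QUORUM
            || cl == ConsistencyLevel.LOCAL_SERIAL;
    }

    // Read repair still goes to replicas in the local DC; cross-DC targets are
    // skipped whenever the request used a LOCAL_* consistency level.
    static boolean sendReadRepairTo(ConsistencyLevel cl, boolean replicaIsInLocalDc)
    {
        return replicaIsInLocalDc || !isDatacenterLocal(cl);
    }
}
{code}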



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8670) Large columns + NIO memory pooling causes excessive direct memory usage

2015-03-07 Thread Evin Callahan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351742#comment-14351742
 ] 

Evin Callahan commented on CASSANDRA-8670:
--

What's the path forward on this?

 Large columns + NIO memory pooling causes excessive direct memory usage
 ---

 Key: CASSANDRA-8670
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8670
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg

 If you provide a large byte array to NIO and ask it to populate the byte 
 array from a socket, it will allocate a thread-local byte buffer that is the 
 size of the requested read, no matter how large it is. Old IO wraps new IO for 
 sockets (but not files), so old IO is affected as well.
 Even if you are using Buffered{Input | Output}Stream you can end up passing a 
 large byte array to NIO: the byte array read method will pass the array to 
 NIO directly if it is larger than the internal buffer.  
 Passing large cells between nodes as part of intra-cluster messaging can 
 cause the NIO pooled buffers to quickly reach a high watermark and stay 
 there. This ends up costing 2x the largest cell size because there is a 
 buffer for input and output since they are different threads. This is further 
 multiplied by the number of nodes in the cluster - 1 since each has a 
 dedicated thread pair with separate thread locals.
 Anecdotally it appears that the cost is doubled beyond that although it isn't 
 clear why. Possibly the control connections or possibly there is some way in 
 which multiple 
 Need a workload in CI that tests the advertised limits of cells on a cluster. 
 It would be reasonable to ratchet down the max direct memory for the test to 
 trigger failures if a memory pooling issue is introduced. I don't think we 
 need to test concurrently pulling in a lot of them, but it should at least 
 work serially.
 The obvious fix to address this issue would be to read in smaller chunks when 
 dealing with large values. I think small should still be relatively large (4 
 megabytes) so that code that is reading from a disk can amortize the cost of 
 a seek. It can be hard to tell what the underlying thing being read from is 
 going to be in some of the contexts where we might choose to implement 
 switching to reading chunks.
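
A minimal sketch of the chunked-read mitigation described above (illustrative only, not the eventual patch; class and method names are hypothetical): cap each read handed to the socket layer at a few megabytes so the pooled direct buffer never grows to the size of the largest cell.

{code}
import java.io.DataInputStream;
import java.io.IOException;

public final class ChunkedReads
{
    // 4 MB keeps each read large enough to amortise a disk seek, but bounds how big
    // the socket layer's cached thread-local direct buffer can grow.
    private static final int CHUNK_SIZE = 4 * 1024 * 1024;

    // Fill a large destination array in bounded chunks instead of handing the whole
    // array to the underlying stream in a single call.
    public static void readFullyChunked(DataInputStream in, byte[] dest) throws IOException
    {
        int offset = 0;
        while (offset < dest.length)
        {
            int length = Math.min(CHUNK_SIZE, dest.length - offset);
            in.readFully(dest, offset, length);
            offset += length;
        }
    }
}
{code}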



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8880) Add metrics to monitor the amount of tombstones created

2015-03-07 Thread Lyuben Todorov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lyuben Todorov reassigned CASSANDRA-8880:
-

Assignee: Lyuben Todorov

 Add metrics to monitor the amount of tombstones created
 ---

 Key: CASSANDRA-8880
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8880
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michaël Figuière
Assignee: Lyuben Todorov
Priority: Minor

 AFAIK there's currently no way to monitor the number of tombstones created on 
 a Cassandra node. CASSANDRA-6057 has made it possible for users to figure out 
 how many tombstones are scanned at read time, but in write-mostly workloads 
 it may not be apparent that some inappropriate queries are generating too 
 many tombstones.
 Therefore the following additional metrics should be added:
 * {{writtenCells}}: number of cells that have been written
 * {{writtenTombstoneCells}}: number of tombstone cells that have been written
 Alternatively these could be exposed as a single gauge such as 
 {{writtenTombstoneCellsRatio}}.
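
A rough sketch of the two counters plus a derived ratio gauge, assuming the Codahale/Dropwizard Metrics library (metric names and registry wiring here are illustrative, not what an eventual patch would necessarily do):

{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class TombstoneWriteMetrics
{
    public final Counter writtenCells;
    public final Counter writtenTombstoneCells;

    public TombstoneWriteMetrics(MetricRegistry registry)
    {
        writtenCells = registry.counter("WrittenCells");
        writtenTombstoneCells = registry.counter("WrittenTombstoneCells");
        // Derived ratio exposed as a single gauge, per the alternative suggested above.
        registry.register("WrittenTombstoneCellsRatio", (Gauge<Double>) () -> {
            long total = writtenCells.getCount();
            return total == 0 ? 0.0 : (double) writtenTombstoneCells.getCount() / total;
        });
    }

    // Called from the write path for every cell; tombstones bump both counters.
    public void onCellWritten(boolean isTombstone)
    {
        writtenCells.inc();
        if (isTombstone)
            writtenTombstoneCells.inc();
    }
}
{code}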



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7902) Introduce the ability to ignore RR based on consistency

2015-03-07 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351588#comment-14351588
 ] 

Ryan Svihla commented on CASSANDRA-7902:


Agreed, keeping RR local for local_* is a demonstration of the principle of least 
surprise if there ever was one. 

This will have some interesting ramifications for one scenario: someone who uses 
local_* because of slow DC links, but also doesn't run nodetool repair because of 
those slow links. In theory, global read repair firing may have given them their 
only chance at cross-DC consistency (assuming the RR even succeeds). 

However, I'm not sure it's worth complicating everything for a scenario that 
involves lots of dubious decisions and will never work properly with or without 
global RR.

 Introduce the ability to ignore RR based on consistency
 

 Key: CASSANDRA-7902
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7902
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams

 There exists a case for LOCAL_* consistency levels where you really want them 
 *local only*.  This implies that you never want to do cross-DC RR for those 
 requests, while still doing it for other levels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7902) Introduce the ability to ignore RR based on consistency

2015-03-07 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351588#comment-14351588
 ] 

Ryan Svihla edited comment on CASSANDRA-7902 at 3/7/15 2:05 PM:


Agreed, keeping RR local for local_* is a demonstration of the principle of least 
surprise if there ever was one. 

This will have some interesting ramifications for one scenario: someone who uses 
local_* because of slow DC links, but also doesn't run nodetool repair because of 
those slow links. In theory, global read repair firing may have given them their 
only chance at cross-DC consistency (assuming the RR even succeeds). 

However, I'm not sure it's worth complicating everything for a scenario that 
involves lots of dubious decisions and will never work properly with or without 
global RR (assuming they even know how to turn it on).


was (Author: rssvihla):
Agreed, keeping RR local for local_* is a demonstration of the principle of least 
surprise if there ever was one. 

This will have some interesting ramifications for one scenario: someone who uses 
local_* because of slow DC links, but also doesn't run nodetool repair because of 
those slow links. In theory, global read repair firing may have given them their 
only chance at cross-DC consistency (assuming the RR even succeeds). 

However, I'm not sure it's worth complicating everything for a scenario that 
involves lots of dubious decisions and will never work properly with or without 
global RR.

 Introduce the ability to ignore RR based on consistency
 

 Key: CASSANDRA-7902
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7902
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams

 There exists a case for LOCAL_* consistency levels where you really want them 
 *local only*.  This implies that you never want to do cross-DC RR for those 
 requests, while still doing it for other levels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8413) Bloom filter false positive ratio is not honoured

2015-03-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351543#comment-14351543
 ] 

Benedict commented on CASSANDRA-8413:
-

It occurs to me that this may be more significant still for LCS, since we 
explicitly narrow the space over which we operate. Obviously it's a bounded 
problem for LCS, but still I think we should simply regularize the bits 
over the known min/max of the sstable we're writing.
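
A toy sketch of that "regularize the bits" idea (illustrative only, not the attached patch; it assumes the sstable's min/max token are known at write time, and the class name is hypothetical): stretch a token known to fall in the sstable's range back over the full signed 64-bit space before deriving the filter bits, so keys owned by a node with a narrow contiguous range stop sharing their top bits.

{code}
import java.math.BigInteger;

public class TokenRegularizer
{
    private static final BigInteger FULL_RANGE = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    public static long regularize(long token, long minToken, long maxToken)
    {
        if (minToken == maxToken)
            return token;                     // degenerate sstable, nothing to stretch
        BigInteger span = BigInteger.valueOf(maxToken).subtract(BigInteger.valueOf(minToken));
        BigInteger offset = BigInteger.valueOf(token).subtract(BigInteger.valueOf(minToken));
        // offset/span is in [0,1]; scale it to the unsigned 64-bit range, then shift back to signed.
        return offset.multiply(FULL_RANGE).divide(span)
                     .add(BigInteger.valueOf(Long.MIN_VALUE))
                     .longValue();
    }
}
{code}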

 Bloom filter false positive ratio is not honoured
 -

 Key: CASSANDRA-8413
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8413
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: Aleksey Yeschenko
 Fix For: 2.1.4

 Attachments: 8413.hack.txt


 Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a 
 problem with sabotaging our bloom filters when using the murmur3 partitioner. 
 I have performed a very quick test to confirm this risk is real.
 Since a typical cluster uses the same murmur3 hash for partitioning as we do 
 for bloom filter lookups, and we own a contiguous range, we can guarantee 
 that the top X bits collide for all keys on the node. This translates into 
 poor bloom filter distribution. I quickly hacked LongBloomFilterTest to 
 simulate the problem, and the result in these tests is _up to_ a doubling of 
 the actual false positive ratio. The actual change will depend on the key 
 distribution, the number of keys, the false positive ratio, the number of 
 nodes, the token distribution, etc. But it seems to be a real problem for 
 non-vnode clusters of at least ~128 nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8931) IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query

2015-03-07 Thread Benedict (JIRA)
Benedict created CASSANDRA-8931:
---

 Summary: IndexSummary (and Index) should store the token, and the 
minimal key to unambiguously direct a query
 Key: CASSANDRA-8931
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict


Since these files are likely sticking around a little longer, it is probably 
worth optimising them. A relatively simple change to Index and IndexSummary 
could reduce the amount of space required significantly, reduce the CPU burden 
of lookup, and hopefully bound the amount of space needed as key size grows. On 
writing, we first always store the token before the key (if it is different 
from the key); then we simply truncate the whole record to the minimum length 
necessary to answer an inequality search. Since the data file also contains the 
key, we can confirm we have the right key once we've looked it up. Since BFs 
are used to reduce unnecessary lookups, we don't save much by ruling the false 
positives out one step earlier. 

 A follow-up improvement would be to use a trie of shortest length to answer 
inequality lookups, as this would also ensure that very long keys with common 
prefixes do not significantly increase the size of the index or summary. This 
would translate to a trie index for the summary keying into a static trie page 
for the index.
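
As an illustration of the truncation step (a hypothetical helper, not code from this ticket): keep only the shortest prefix of a key that still sorts strictly after its predecessor, which is enough to answer an inequality search; the full key in the data file resolves any remaining ambiguity.

{code}
import java.util.Arrays;

public class IndexKeyTruncation
{
    // Shortest prefix of `key` that still sorts strictly after `previousKey` under
    // unsigned lexicographic comparison, so the truncated index entry still directs
    // an inequality search to the right place.
    static byte[] minimalSeparator(byte[] previousKey, byte[] key)
    {
        int i = 0;
        while (i < previousKey.length && i < key.length && previousKey[i] == key[i])
            i++;
        // Keep one differing byte (or one extra byte when key merely extends previousKey).
        return Arrays.copyOf(key, Math.min(i + 1, key.length));
    }
}
{code}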



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7220) Nodes hang with 100% CPU load

2015-03-07 Thread Mikhail Stepura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351775#comment-14351775
 ] 

Mikhail Stepura commented on CASSANDRA-7220:


I've seen similar behavior because of CASSANDRA-8485, and there is a ~20min delay 
in flushing in the logs:
{code}
 INFO [OptionalTasks:1] 2014-08-17 23:13:53,303 MeteredFlusher.java (line 58) 
flushing high-traffic column family CFS(Keyspace='services', 
ColumnFamily='service_request_count_per_minute') (estimated 16777216 bytes)
..
 INFO [OptionalTasks:1] 2014-08-17 23:34:26,217 MeteredFlusher.java (line 58) 
flushing high-traffic column family CFS(Keyspace='services', 
ColumnFamily='service_request_count') (estimated 12582912 bytes)
{code}

 Nodes hang with 100% CPU load
 -

 Key: CASSANDRA-7220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7220
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: C* 2.0.7
 4 nodes cluster on 12 core machines
Reporter: Robert Stupp
 Attachments: c-12-read-100perc-cpu.zip, system.log


 I've run a test that both reads and writes rows.
 After some time, all writes succeeded and all reads stopped.
 Two of the four nodes have 16 of 16 threads of the ReadStage thread pool 
 running. The number of pending tasks continuously grows on these nodes.
 I have attached the stack traces and some diagnostic output from 
 nodetool tpstats.
 nodetool status shows all nodes as UN.
 I had previously run that test without any issues with the same 
 configuration.
 Some specials from cassandra.yaml:
 - key_cache_size_in_mb: 1024
 - row_cache_size_in_mb: 8192
 The nodes running at 100% CPU are node2 and node3; node1 and node4 are fine.
 I'm not sure if it is reproducible, but it's definitely not good behaviour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8880) Add metrics to monitor the amount of tombstones created

2015-03-07 Thread Lyuben Todorov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lyuben Todorov updated CASSANDRA-8880:
--
Attachment: cassandra-2.1-8880.patch

 Add metrics to monitor the amount of tombstones created
 ---

 Key: CASSANDRA-8880
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8880
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michaël Figuière
Assignee: Lyuben Todorov
Priority: Minor
  Labels: metrics
 Fix For: 2.1.4

 Attachments: cassandra-2.1-8880.patch


 AFAIK there's currently no way to monitor the number of tombstones created on 
 a Cassandra node. CASSANDRA-6057 has made it possible for users to figure out 
 how many tombstones are scanned at read time, but in write-mostly workloads 
 it may not be apparent that some inappropriate queries are generating too 
 many tombstones.
 Therefore the following additional metrics should be added:
 * {{writtenCells}}: number of cells that have been written
 * {{writtenTombstoneCells}}: number of tombstone cells that have been written
 Alternatively these could be exposed as a single gauge such as 
 {{writtenTombstoneCellsRatio}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8670) Large columns + NIO memory pooling causes excessive direct memory usage

2015-03-07 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-8670:
-
Fix Version/s: 3.0

 Large columns + NIO memory pooling causes excessive direct memory usage
 ---

 Key: CASSANDRA-8670
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8670
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 3.0


 If you provide a large byte array to NIO and ask it to populate the byte 
 array from a socket, it will allocate a thread-local byte buffer that is the 
 size of the requested read, no matter how large it is. Old IO wraps new IO for 
 sockets (but not files), so old IO is affected as well.
 Even if you are using Buffered{Input | Output}Stream you can end up passing a 
 large byte array to NIO: the byte array read method will pass the array to 
 NIO directly if it is larger than the internal buffer.  
 Passing large cells between nodes as part of intra-cluster messaging can 
 cause the NIO pooled buffers to quickly reach a high watermark and stay 
 there. This ends up costing 2x the largest cell size because there is a 
 buffer for input and output since they are different threads. This is further 
 multiplied by the number of nodes in the cluster - 1 since each has a 
 dedicated thread pair with separate thread locals.
 Anecdotally it appears that the cost is doubled beyond that although it isn't 
 clear why. Possibly the control connections or possibly there is some way in 
 which multiple 
 Need a workload in CI that tests the advertised limits of cells on a cluster. 
 It would be reasonable to ratchet down the max direct memory for the test to 
 trigger failures if a memory pooling issue is introduced. I don't think we 
 need to test concurrently pulling in a lot of them, but it should at least 
 work serially.
 The obvious fix to address this issue would be to read in smaller chunks when 
 dealing with large values. I think small should still be relatively large (4 
 megabytes) so that code that is reading from a disk can amortize the cost of 
 a seek. It can be hard to tell what the underlying thing being read from is 
 going to be in some of the contexts where we might choose to implement 
 switching to reading chunks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-07 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351923#comment-14351923
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

Since [~benedict] pulled in CASSANDRA-8778 (thanks Benedict!), I've rebased to 
exclude that patch from this change set, added unit tests, and collapsed all 
requested changes into a single commit for easy merging. 

Find it attached, or online at: 
https://github.com/jeffjirsa/cassandra/compare/cassandra-5791.diff or 
https://github.com/jeffjirsa/cassandra/commit/89c1153def3f0ef0804d45883d12b09e04bb872d





 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791-patch-3.diff, cassandra-5791.patch-2


 Currently there is no nodetool command to validate all sstables on disk. The 
 only way to do this is to run a repair and see if it succeeds, but we cannot 
 repair the system keyspace. 
 We can also run upgradesstables, but that rewrites all the sstables. 
 This command should check the hash of all sstables and return whether all 
 data is readable or not. It should NOT care about consistency. 
 Compressed sstables do not have a hash, so it's not clear how this will work 
 for them.
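
One way the per-sstable check could look, as a standalone sketch (hypothetical, not the attached patch; how the expected digest is stored on disk varies by Cassandra version, so it is passed in here rather than parsed from the companion Digest file): stream a Data component, recompute its SHA-1, and report whether the bytes were readable and match.

{code}
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SSTableReadCheck
{
    public static boolean verify(Path dataComponent, String expectedSha1Hex)
            throws IOException, NoSuchAlgorithmException
    {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(dataComponent)))
        {
            byte[] buf = new byte[64 * 1024];
            int n;
            // Any IOException thrown here means the data is not fully readable.
            while ((n = in.read(buf)) != -1)
                sha1.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest())
            hex.append(String.format("%02x", b & 0xff));
        return hex.toString().equalsIgnoreCase(expectedSha1Hex);
    }
}
{code}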



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-07 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-5791:
--
Attachment: cassandra-5791-patch-3.diff

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791-patch-3.diff, cassandra-5791.patch-2


 Currently there is no nodetool command to validate all sstables on disk. The 
 only way to do this is to run a repair and see if it succeeds, but we cannot 
 repair the system keyspace. 
 We can also run upgradesstables, but that rewrites all the sstables. 
 This command should check the hash of all sstables and return whether all 
 data is readable or not. It should NOT care about consistency. 
 Compressed sstables do not have a hash, so it's not clear how this will work 
 for them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)