[jira] [Updated] (CASSANDRA-7871) Reduce compaction IO in LCS

2016-04-14 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-7871:
---
   Resolution: Duplicate
Fix Version/s: (was: 3.x)
   Status: Resolved  (was: Patch Available)

We have an updated patch for this in CASSANDRA-11550; tracking it there

> Reduce compaction IO in LCS 
> 
>
> Key: CASSANDRA-7871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7871
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Dan Hendry
>Assignee: Dan Hendry
> Attachments: LeveledCompactionImprovement-2.0.10.patch, 
> experiment.png, levelmultiplier.png, sstablesize.png
>
>
> I have found LCS to be superior to STCS in almost every way - except for the 
> fact that it requires significantly more IO (a well advertised property). In 
> leveled compaction, L~n+1~ is 10 times larger than L~n~, so generally 1+10 
> sstables need to be compacted to promote one sstable into the next level. For 
> certain workloads, this practically means only 1/(10+1)=9% of the IO, 
> specifically write IO, is doing ‘useful’ work. 
> But why is each level 10 times larger? Why 10? It's a pretty looking number 
> and all, but that's not a very good reason to choose it. If we chose 5 or even 
> 2 we could reduce the ‘wasted’ IO required to promote an sstable to the next 
> level - of course at the expense of requiring more levels. I have not been 
> able to find justification for this choice in either cassandra or leveldb 
> itself. I would like to introduce a new parameter, the leveling multiplier, 
> which controls the desired size difference between L~n~ and L~n+1~.
> First and foremost, a little math. Let's assume we have a CF of a fixed size 
> that is receiving continuous new data (i.e. data is expiring due to TTLs or is 
> being overwritten). I believe the number of levels required is approximately 
> (see note 1):
> {noformat}data size = (sstable size)*(leveling multiplier)^(level count){noformat}
> Which, when solving for the level count, becomes:
> {noformat}level count = log((data size)/(sstable size))/log(leveling multiplier){noformat}
> The amount of compaction write IO required over the lifetime of a particular 
> piece of data (excluding compactions in L0) is:
> {noformat}write IO = (flush IO) + (promotion IO)*(level count)
> write IO = 1 + (1 + (level multiplier))*log((data size)/(sstable size))/log(leveling multiplier){noformat}
> So ultimately, the relationship between write IO and the level multiplier 
> is f\(x) = (1 + x)/log\(x), which is optimal at 3.59, or 4 if we round to the 
> nearest integer. Also note that write IO is proportional to log((data 
> size)/(sstable size)), which suggests using larger sstables would also reduce 
> disk IO.
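> To make the optimum concrete, here is a minimal Java sketch (illustrative 
> only, not part of the attached patch) that evaluates this model; it 
> reproduces the predicted rates in the results table further down:
> {code}
> public class LevelingMultiplierModel
> {
>     // write IO = 1 + (1 + m) * log(data size / sstable size) / log(m)
>     static double writeIo(double m, double dataToSstableRatio)
>     {
>         return 1 + (1 + m) * Math.log(dataToSstableRatio) / Math.log(m);
>     }
>
>     public static void main(String[] args)
>     {
>         double ratio = 169e9 / 160e6; // the 169 GB dataset with 160 MB sstables
>         for (double m = 2; m <= 10; m += 0.5)
>             System.out.printf("m=%.1f -> %.1f MB written per MB flushed%n", m, writeIo(m, ratio));
>         // m=10.0 gives ~34.3 and m=4.0 gives ~26.1, matching the predictions
>         // below; the minimum of (1 + m)/log(m) sits near m = 3.59
>     }
> }
> {code}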
> As one final analytical step we can add the following term to approximate STCS 
> in L0 (which is not actually how it's implemented but should be close enough 
> for moderate sstable sizes):
> {noformat}L0 write IO = max(0, floor(log((sstable size)/(flush size))/log(4))){noformat}
> The following two graphs illustrate the predicted compaction requirements as 
> a function of the leveling multiplier and sstable size:
> !levelmultiplier.png!!sstablesize.png!
> In terms of empirically verifying the expected results, I set up three 
> cassandra nodes, node A having a leveling multiplier of 10 and sstable size 
> of 160 MB (current cassandra defaults), node B with multiplier 4 and size 160 
> MB, and node C with multiplier 4 and size 1024 MB. I used a simple write only 
> workload which inserted data having a TTL of 2 days at 1 MB/second (see note 
> 2). Compaction throttling was disabled and gc_grace was 60 seconds. All nodes 
> had dedicated data disks and IO measurements were for the data disks only.
> !experiment.png!
> ||Measure||Node A (10, 160MB)||Node B (4, 160MB)||Node C (4, 1024MB)||
> |Predicted IO Rate|34.4 MB/s|26.2 MB/s|20.5 MB/s|
> |Predicted Improvement|n/a|23.8%|40.4%|
> |Predicted Number of Levels (Expected Dataset of 169 GB)|3.0|5.0|3.7|
> |Experimental IO Rate|32.0 MB/s|28.0 MB/s|20.4 MB/s|
> |Experimental Improvement|n/a|12.4%|*36.3%*|
> |Experimental Number of Levels|~4.1|~6.1|~4.8|
> |Final Dataset Size (After 88 hours)|301 GB|261 GB|258 GB|
> These results indicate that Node A performed better than expected; I suspect 
> this was due to the fact that the data insertion rate was a little too 
> high and compaction periodically got backlogged, meaning the promotion from L0 
> to L1 was more efficient. Also note that the actual dataset size is larger 
> than that used in the analytical model - which is expected, as expired data 
> will not get purged immediately. The size difference between node A and the 
> others however seems suspicious to me.
> In summary, these results, both the

[jira] [Updated] (CASSANDRA-5977) Structure for cfstats output (JSON, YAML, or XML)

2016-04-14 Thread Shogo Hoshii (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shogo Hoshii updated CASSANDRA-5977:

Attachment: sample_result.zip
CASSANDRA-5977-trunk.patch

Thank you for the quick reply.

I fixed the source code following your advice and attached the patch again.
I also changed the json/yaml format so that property names are structured as 
lower-case characters joined by '_', like 'write_count'.
If there is a better format, please don't hesitate to tell me.
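
For reference, a fragment of what the structured output might look like under 
that naming scheme (the field names here are illustrative, extrapolated from 
the 'write_count' example above, not copied from the patch):

{noformat}
{
  "keyspace1": {
    "table1": {
      "read_count": 4242,
      "write_count": 1337,
      "space_used_live": 10485760
    }
  }
}
{noformat}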

> Structure for cfstats output (JSON, YAML, or XML)
> -
>
> Key: CASSANDRA-5977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5977
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Alyssa Kwan
>Assignee: Shogo Hoshii
>Priority: Minor
>  Labels: Tools
> Fix For: 3.x
>
> Attachments: CASSANDRA-5977-trunk.patch, CASSANDRA-5977-trunk.patch, 
> sample_result.zip, sample_result.zip, tablestats_sample_result.json, 
> tablestats_sample_result.txt, tablestats_sample_result.yaml, 
> trunk-tablestats.patch, trunk-tablestats.patch
>
>
> nodetool cfstats should take a --format arg that structures the output in 
> JSON, YAML, or XML.  This would be useful for piping into another script that 
> can easily parse this and act on it.  It would also help those of us who use 
> things like MCollective gather aggregate stats across clusters/nodes.
> Thoughts?  I can submit a patch.





[jira] [Created] (CASSANDRA-11573) cqlsh fails with undefined symbol: PyUnicodeUCS2_DecodeUTF8

2016-04-14 Thread Oli Schacher (JIRA)
Oli Schacher created CASSANDRA-11573:


 Summary: cqlsh fails with undefined symbol: 
PyUnicodeUCS2_DecodeUTF8
 Key: CASSANDRA-11573
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11573
 Project: Cassandra
  Issue Type: Bug
 Environment: centos 7, datastax ddc 3.5

installed according to 
http://docs.datastax.com/en/cassandra/3.x/cassandra/install/installRHEL.html

JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_77
Cassandra version: 3.5.0
Reporter: Oli Schacher


Trying to run cqlsh produces:

{quote}
cqlsh
Traceback (most recent call last):
  File "/usr/bin/cqlsh.py", line 170, in 
from cqlshlib.copyutil import ExportTask, ImportTask
ImportError: /usr/lib/python2.7/site-packages/cqlshlib/copyutil.so: undefined 
symbol: PyUnicodeUCS2_DecodeUTF8
{quote}

With 3.4 the error does not happen.






[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240789#comment-15240789
 ] 

Branimir Lambov commented on CASSANDRA-11452:
-

bq. For this to be a problem it would have to be a collision on all 
hashes/buckets, ...
I don't consider this a real problem for page caches, but more generally the 
initial hash is an int, and in a 24/7, 50k ops/sec operation I wouldn't rely on 
not getting _any_ clashes. They will be very rare, granted, but should the 
scenario below materialize, it would be quite a serious problem.

bq. Bear in mind also that - as opposed to LIRS which is an eviction strategy - 
LFU only prevents promotion to the LRU; the eviction strategy is still LRU, so 
it will not keep collisions, only fail to filter them before they hit the 
(main) cache.

This makes things even worse. Looking at the {{BoundedLocalCache}} code in 
detail, I can see that if a collision is to filter down to the head of the 
probation space queue, it will always be the eviction victim and (presuming the 
colliding entry remains hot) would thus win over every candidate coming from 
the eden space, at least in theory. As a result the cache is reduced to just 
the eden window, 1% of the space and plain LRU eviction. This _can_ happen in 
Caffeine; take a look at [this test|http://pastebin.com/dmxK9bFv].

There appears to be an easy fix, though. Changing the {{candidateFreq > 
victimFreq}} comparison in {{evictFromMain}} to non-strict to prefer the 
candidate on equality makes such stuck victims much easier to eject, as the 
range of the counters is small and should be easily reachable for a hot 
candidate. [~ben.manes], is there a reason for the strictness of this 
comparison (the paper is very elusive on these choices)?
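
For illustration, the shape of the proposed change (a sketch, not the actual 
{{BoundedLocalCache}} code):

{code}
// Hypothetical stand-in for the admission check in evictFromMain: with
// strict ">", a victim whose hash collides with a hot candidate wins every
// tie and is never ejected; with ">=" the candidate wins as soon as its
// frequency counter catches up, so the stuck victim gets pruned.
boolean admit(int candidateFreq, int victimFreq)
{
    return candidateFreq >= victimFreq; // was: candidateFreq > victimFreq
}
{code}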


> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

2016-04-14 Thread Daniel Sand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240798#comment-15240798
 ] 

Daniel Sand commented on CASSANDRA-3486:


[~pauloricardomg] +1 for the feature - really needed this when our 
cluster blew up this week :D

> Node Tool command to stop repair
> 
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
> Environment: JVM
>Reporter: Vijay
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: repair
> Fix For: 2.1.x
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.





[jira] [Resolved] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov resolved CASSANDRA-11452.
-
Resolution: Fixed

Regardless of the above comment, Caffeine does fulfill the objective of this 
ticket and is working pretty well for the page cache.

I don't think it makes sense for me to invest any more time to implement an 
alternative (and using a separate map per file does not make that much sense if 
the frequency sketch still needs a shared key), so the ticket is now resolved.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240823#comment-15240823
 ] 

Ben Manes commented on CASSANDRA-11452:
---

Neither the original nor revised papers are clear about that. At first I used 
{{>=}} assuming it would be better, but in small traces the change made a 
[large 
difference|https://github.com/ben-manes/caffeine/commit/3e83411c670ca61a859f0e1ed24e216b847ccd58].
 That was prior to the window, so it may be less impactful now. The small 
traces are also very specific patterns, whereas the larger ones reflect 
real-world use, so that change might be even less noticeable. Another option 
might be to recycle the victim in the probation space or degrade to {{>=}} if 
we detect the clash. So basically I'd need to analyze it and play with your 
test (thanks!) to figure out a good strategy.

An idea that I've wanted to try is if we can detect mispredictions using a 
bloom filter. When a candidate is rejected it could be added to this sketch. If 
rejected again within a shorter sample period, then we bypass TinyLFU. This 
might mean that the window is too small and could be dynamically sized based on 
the feedback. I think it would also help mitigate this collision.
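
A rough sketch of that idea (purely hypothetical, nothing like this exists in 
Caffeine today):

{code}
// Remember recently rejected candidates in a small bit sketch; a second
// rejection within the sample period is the signal to bypass TinyLFU.
final class RejectionDoorkeeper
{
    private final long[] bits = new long[1024]; // 65536-bit sketch
    private int additions;

    boolean wasRecentlyRejected(long keyHash)
    {
        return get(keyHash) && get(Long.reverse(keyHash));
    }

    void recordRejection(long keyHash)
    {
        set(keyHash);
        set(Long.reverse(keyHash));
        if (++additions == 4096) // sample period over: start fresh
        {
            java.util.Arrays.fill(bits, 0L);
            additions = 0;
        }
    }

    // two cheap probes derived from the key hash
    private boolean get(long h) { int i = (int) (h & 0xFFFF); return (bits[i >>> 6] & (1L << i)) != 0; }
    private void set(long h) { int i = (int) (h & 0xFFFF); bits[i >>> 6] |= (1L << i); }
}
{code}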

I tried to protect the most likely attack by using a random seed, since I saw 
that degradation was possible. But making it more resilient is definitely 
worthwhile and your help here is great.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240829#comment-15240829
 ] 

Ben Manes commented on CASSANDRA-11452:
---

If you are comfortable with Caffeine regardless of the above, then I'd really 
appreciate the help in getting CASSANDRA-10855 merged. I am very much willing 
to work together to resolve any issues the Cassandra team discovers.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240834#comment-15240834
 ] 

Benedict commented on CASSANDRA-11452:
--

Nice catch, that's a really pernicious property.  The paper not only doesn't 
mention it, it seems to me that by comparing against the eviction victim the 
TinyLFU is no longer an _admission_ policy as stated, but both an admission 
_and_ an eviction policy - debatably in opposition to the paper.  It seems that 
we could instead be comparing against a cohort of near-to-eviction candidates, 
or some other dynamic threshold.  The eviction candidate isn't particularly 
special, as far as I can tell; it's simply a proxy for "this is the threshold 
above which a value is likely to be reused".

It would also be great to support more than 32-bit hashes for seeding the 
sketch hashes, to reduce the incidence of this.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240837#comment-15240837
 ] 

Benedict commented on CASSANDRA-10855:
--

We should definitely address Branimir's 
[comment|https://issues.apache.org/jira/browse/CASSANDRA-11452?focusedCommentId=15240789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15240789]
 before merging.  As he says, this could theoretically completely eliminate the 
benefit of the cache in certain circumstances.

I think both supporting larger hashes, and not comparing _only_ against the 
main eviction candidate for admission, are reasonably easy and should solve the 
problem.

> Use Caffeine (W-TinyLFU) for on-heap caches
> ---
>
> Key: CASSANDRA-10855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10855
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Ben Manes
>  Labels: performance
>
> Cassandra currently uses 
> [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] 
> for performance critical caches (key, counter) and Guava's cache for 
> non-critical (auth, metrics, security). All of these usages have been 
> replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the 
> author of the previously mentioned libraries.
> The primary incentive is to switch from LRU policy to W-TinyLFU, which 
> provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] 
> hit rates. It performs particularly well in database and search traces, is 
> scan resistant, and adds a very small time/space overhead to LRU.
> Secondarily, Guava's caches never obtained similar 
> [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM 
> due to some optimizations not being ported over. This change results in 
> faster reads and not creating garbage as a side-effect.





[jira] [Comment Edited] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240837#comment-15240837
 ] 

Benedict edited comment on CASSANDRA-10855 at 4/14/16 9:12 AM:
---

We should definitely address Branimir's 
[comment|https://issues.apache.org/jira/browse/CASSANDRA-11452?focusedCommentId=15240789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15240789]
 before merging.  As he says, this could theoretically completely eliminate the 
benefit of the cache in certain circumstances.

I think both supporting larger hashes, and not comparing _only_ against the 
main eviction candidate for admission (e.g. comparing against a value with 
logarithmically random distance from the eviction candidate), are reasonably 
easy and should solve the problem.


was (Author: benedict):
We should definitely address Branimir's 
[comment|https://issues.apache.org/jira/browse/CASSANDRA-11452?focusedCommentId=15240789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15240789]
 before merging.  As he says, this could theoretically completely eliminate the 
benefit of the cache in certain circumstances.

I think both supporting larger hashes, and not comparing _only_ against the 
main eviction candidate for admission, are reasonably easy and should solve the 
problem.

> Use Caffeine (W-TinyLFU) for on-heap caches
> ---
>
> Key: CASSANDRA-10855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10855
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Ben Manes
>  Labels: performance
>
> Cassandra currently uses 
> [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] 
> for performance critical caches (key, counter) and Guava's cache for 
> non-critical (auth, metrics, security). All of these usages have been 
> replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the 
> author of the previously mentioned libraries.
> The primary incentive is to switch from LRU policy to W-TinyLFU, which 
> provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] 
> hit rates. It performs particularly well in database and search traces, is 
> scan resistant, and adds a very small time/space overhead to LRU.
> Secondarily, Guava's caches never obtained similar 
> [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM 
> due to some optimizations not being ported over. This change results in 
> faster reads and not creating garbage as a side-effect.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240841#comment-15240841
 ] 

Ben Manes commented on CASSANDRA-11452:
---

I did mention the attack vector during the revision, but that addition was 
rejected due to concern over the peer reviewer process. Unfortunately I never 
investigated it further and I'd appreciate help on that front.

How would you go about obtaining a larger hash?

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240856#comment-15240856
 ] 

Benedict commented on CASSANDRA-11452:
--

The simplest way would be to accept a hash function that at least operates over 
long values, which if not provided just returns the int, or possibly {{int | 
((long)int << 32)}}.  It would be nice if we could operate over larger values, 
but I don't think that's possible without negatively impacting Caffeine's 
footprint.

We can also protect against the attack by simply comparing against a random 
member of the LRU for frequency in the LFU on admission, perhaps preferring 
(exponentially) candidates that are near to eviction, i.e. generating a number 
via an extreme value/exponential/zipf RNG and walking this distance from the 
head of the eviction queue.  This at least provides a mechanism that will 
rapidly prune collisions without affecting behaviour meaningfully.
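
Sketched, under the assumption of a pluggable hash (this is not an API Caffeine 
exposes today):

{code}
// Keys that can supply 64 hash bits implement this; everything else falls
// back to widening the 32-bit hashCode as suggested above.
interface LongHashable
{
    long longHashCode();
}

static long seedHash(Object key)
{
    if (key instanceof LongHashable)
        return ((LongHashable) key).longHashCode();
    int h = key.hashCode();
    // mask the low word so sign extension doesn't flood it with ones
    return (h & 0xFFFFFFFFL) | ((long) h << 32);
}
{code}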


> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





cassandra git commit: Fix SelectStatement public API (Follow-up CASSANDRA-7017)

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/trunk 5288d434b -> d8036f936


Fix SelectStatement public API (Follow-up CASSANDRA-7017)

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-7017


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d8036f93
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d8036f93
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d8036f93

Branch: refs/heads/trunk
Commit: d8036f93617e318bf930885981cc75104bf523a2
Parents: 5288d43
Author: Alex Petrov 
Authored: Thu Apr 14 11:27:06 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 11:27:06 2016 +0200

--
 .../cql3/statements/SelectStatement.java| 31 +++-
 1 file changed, 24 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/d8036f93/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 9895d67..9745b05 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -204,8 +204,8 @@ public class SelectStatement implements CQLStatement
 cl.validateForRead(keyspace());
 
 int nowInSec = FBUtilities.nowInSeconds();
-int userLimit = getLimit(limit, options);
-int userPerPartitionLimit = getLimit(perPartitionLimit, options);
+int userLimit = getLimit(options);
+int userPerPartitionLimit = getPerPartitionLimit(options);
 ReadQuery query = getQuery(options, nowInSec, userLimit, 
userPerPartitionLimit);
 
 int pageSize = getPageSize(options);
@@ -232,7 +232,7 @@ public class SelectStatement implements CQLStatement
 
 public ReadQuery getQuery(QueryOptions options, int nowInSec) throws 
RequestValidationException
 {
-return getQuery(options, nowInSec, getLimit(limit, options), 
getLimit(perPartitionLimit, options));
+return getQuery(options, nowInSec, getLimit(options), 
getPerPartitionLimit(options));
 }
 
 public ReadQuery getQuery(QueryOptions options, int nowInSec, int 
userLimit, int perPartitionLimit) throws RequestValidationException
@@ -394,8 +394,8 @@ public class SelectStatement implements CQLStatement
 public ResultMessage.Rows executeInternal(QueryState state, QueryOptions 
options) throws RequestExecutionException, RequestValidationException
 {
 int nowInSec = FBUtilities.nowInSeconds();
-int userLimit = getLimit(limit, options);
-int userPerPartitionLimit = getLimit(perPartitionLimit, options);
+int userLimit = getLimit(options);
+int userPerPartitionLimit = getPerPartitionLimit(options);
 ReadQuery query = getQuery(options, nowInSec, userLimit, 
userPerPartitionLimit);
 int pageSize = getPageSize(options);
 
@@ -418,7 +418,7 @@ public class SelectStatement implements CQLStatement
 
 public ResultSet process(PartitionIterator partitions, int nowInSec) 
throws InvalidRequestException
 {
-return process(partitions, QueryOptions.DEFAULT, nowInSec, 
getLimit(limit, QueryOptions.DEFAULT));
+return process(partitions, QueryOptions.DEFAULT, nowInSec, 
getLimit(QueryOptions.DEFAULT));
 }
 
 public String keyspace()
@@ -615,7 +615,24 @@ public class SelectStatement implements CQLStatement
  * @return the limit specified by the user or 
DataLimits.NO_LIMIT if no value
 * has been specified.
  */
-public int getLimit(Term limit, QueryOptions options)
+public int getLimit(QueryOptions options)
+{
+return getLimit(limit, options);
+}
+
+/**
+ * Returns the per partition limit specified by the user.
+ * May be used by custom QueryHandler implementations
+ *
+ * @return the per partition limit specified by the user or 
DataLimits.NO_LIMIT if no value
+ * has been specified.
+ */
+public int getPerPartitionLimit(QueryOptions options)
+{
+return getLimit(perPartitionLimit, options);
+}
+
+private int getLimit(Term limit, QueryOptions options)
 {
 int userLimit = DataLimits.NO_LIMIT;
 



[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240860#comment-15240860
 ] 

Ben Manes commented on CASSANDRA-11452:
---

Yes, a custom interface to define a {{longHashCode()}} would work, but that 
doesn't seem to be a good general solution. I hope one day Java adds it onto 
Object.

I was playing with a [randomized 
W-TinyLFU|https://github.com/ben-manes/caffeine/commit/92f92f7a79a991d148cc88c9e691030dcebba22b]
 earlier today. It works well. A random walk in Caffeine (as is) would be a 
little concerning since that is a linked list traversal.

Cycling the victim through the probation passes Branimir's test and is my 
preference so far. I need to run more simulations to ensure it doesn't have any 
surprising results.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Updated] (CASSANDRA-7017) allow per-partition LIMIT clause in cql

2016-04-14 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7017:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Follow-up patch committed in trunk at d8036f93617e318bf930885981cc75104bf523a2
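
For reference, the committed syntax is PER PARTITION LIMIT rather than the 
staticlimit sketched in the quoted description below; against the scores table 
there, that reads as follows (example adapted from the ticket text, not taken 
from the commit itself):

{noformat}
-- get the top 3 teams in each league:
SELECT * FROM scores PER PARTITION LIMIT 3;
{noformat}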

> allow per-partition LIMIT clause in cql
> ---
>
> Key: CASSANDRA-7017
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7017
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Halliday
>Assignee: Alex Petrov
>  Labels: cql
> Fix For: 3.6
>
> Attachments: 0001-Allow-per-partition-limit-in-SELECT-queries.patch, 
> 0001-Allow-per-partition-limit-in-SELECT-queriesV2.patch, 
> 0001-CASSANDRA-7017.patch
>
>
> somewhat related to static columns (#6561) and slicing (#4851), it is 
> desirable to apply a LIMIT on a per-partition rather than per-query basis, 
> such as to retrieve the top (most recent, etc) N clustered values for each 
> partition key, e.g.
> -- for each league, keep a ranked list of users
> create table scores (league text, score int, player text, primary key(league, 
> score, player) );
> -- get the top 3 teams in each league:
> select * from scores staticlimit 3;
> this currently requires issuing one query per partition key, which is tedious 
> if all the partition key values are known and impossible if they aren't.





[jira] [Commented] (CASSANDRA-11560) dtest failure in user_types_test.TestUserTypes.udt_subfield_test

2016-04-14 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240866#comment-15240866
 ] 

Benjamin Lerer commented on CASSANDRA-11560:


[~thobbs] You are probably the best person to investigate that problem.

> dtest failure in user_types_test.TestUserTypes.udt_subfield_test
> 
>
> Key: CASSANDRA-11560
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11560
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: DS Test Eng
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1125/testReport/user_types_test/TestUserTypes/udt_subfield_test
> Failed on CassCI build trunk_dtest #1125
> Appears to be a test problem:
> {noformat}
> Error Message
> 'NoneType' object is not iterable
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-Kzg9Sk
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/tools.py", line 253, in wrapped
> f(obj)
>   File "/home/automaton/cassandra-dtest/user_types_test.py", line 767, in 
> udt_subfield_test
> self.assertEqual(listify(rows[0]), [[None]])
>   File "/home/automaton/cassandra-dtest/user_types_test.py", line 25, in 
> listify
> for i in item:
> "'NoneType' object is not iterable\n >> begin captured 
> logging << \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-Kzg9Sk\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> {noformat}





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240867#comment-15240867
 ] 

Benedict commented on CASSANDRA-11452:
--

Well, an RNG like I proposed would have a very short average walk length.  The 
nice thing about that is it protects against any number of collisions (or we 
can define a bound - 20 would probably be plenty, since you may not want to 
walk arbitrary distances, although a more general solution would be a skip list 
of two levels that permits travelling further along the list more rapidly) 
without impacting constant factors and having barely perceptible impact on 
outliers (definitely not perceptible given other causes of variance in 
Cassandra) - just offering slower protection the more collisions there are 
(which is acceptable, given they're also considerably less likely).
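
A sketch of such a bounded walk (parameters assumed, not from any patch):

{code}
// Distance 0 (the normal eviction victim) is most likely; each extra step
// down the probation queue halves the probability, capped at a bound of 20.
static int victimWalkDistance(java.util.Random rnd)
{
    int distance = 0;
    while (distance < 20 && rnd.nextBoolean())
        distance++;
    return distance; // expected length ~1, so constant factors barely move
}
{code}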

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





cassandra git commit: Fix PER PARTITION LIMIT for queries requiring post-query ordering

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/trunk d8036f936 -> 9a0eb9a31


Fix PER PARTITION LIMIT for queries requiring post-query ordering

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11556


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9a0eb9a3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9a0eb9a3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9a0eb9a3

Branch: refs/heads/trunk
Commit: 9a0eb9a31e71cfc43def6497907ce2ab3d091aa1
Parents: d8036f9
Author: Alex Petrov 
Authored: Thu Apr 14 11:53:29 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 11:53:29 2016 +0200

--
 CHANGES.txt |   1 +
 .../cql3/statements/SelectStatement.java|   9 +-
 .../validation/operations/SelectLimitTest.java  | 112 +++
 .../cql3/validation/operations/SelectTest.java  |  92 ---
 4 files changed, 119 insertions(+), 95 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a0eb9a3/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3b5f1b7..443c8bc 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.6
+ * Fix PER PARTITION LIMIT for queries requiring post-query ordering 
(CASSANDRA-11556)
  * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
  * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
  * Support for non-frozen user-defined types, updating

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a0eb9a3/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 9745b05..2f64b25 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -596,12 +596,12 @@ public class SelectStatement implements CQLStatement
 // Whenever we support GROUP BY, we'll have to add a new DataLimits 
kind that knows how things are grouped and is thus
 // able to apply the user limit properly.
 // If we do post ordering we need to get all the results sorted before 
we can trim them.
-if (!selection.isAggregate() && !needsPostQueryOrdering())
+if (!selection.isAggregate())
 {
-cqlRowLimit = userLimit;
+if (!needsPostQueryOrdering())
+cqlRowLimit = userLimit;
 cqlPerPartitionLimit = perPartitionLimit;
 }
-
 if (parameters.isDistinct)
 return cqlRowLimit == DataLimits.NO_LIMIT ? 
DataLimits.DISTINCT_NONE : DataLimits.distinctLimits(cqlRowLimit);
 
@@ -853,6 +853,9 @@ public class SelectStatement implements CQLStatement
 validateDistinctSelection(cfm, selection, restrictions);
 }
 
+checkFalse(selection.isAggregate() && perPartitionLimit != null,
+   "PER PARTITION LIMIT is not allowed with aggregate 
queries.");
+
 Comparator<List<ByteBuffer>> orderingComparator = null;
 boolean isReversed = false;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a0eb9a3/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
index cf028a1..1dffb0c 100644
--- 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
+++ 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
@@ -113,4 +113,116 @@ public class SelectLimitTest extends CQLTester
row(2, 2),
row(2, 3));
 }
+
+@Test
+public void testPerPartitionLimit() throws Throwable
+{
+perPartitionLimitTest(false);
+}
+
+@Test
+public void testPerPartitionLimitWithCompactStorage() throws Throwable
+{
+perPartitionLimitTest(true);
+}
+
+private void perPartitionLimitTest(boolean withCompactStorage) throws 
Throwable
+{
+String query = "CREATE TABLE %s (a int, b int, c int, PRIMARY KEY (a, 
b))";
+
+if (withCompactStorage)
+createTable(query + " WITH COMPACT STORAGE");
+else
+createTable(query);
+
+for (int i = 0; i < 5; i++)
+{
+for (int j = 0; j < 5; j++)
+{
+execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j,

[jira] [Updated] (CASSANDRA-11556) PER PARTITION LIMIT does not work properly for multi-partition query with ORDER BY

2016-04-14 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-11556:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed in trunk at 9a0eb9a31e71cfc43def6497907ce2ab3d091aa1

> PER PARTITION LIMIT does not work properly for multi-partition query with 
> ORDER BY
> --
>
> Key: CASSANDRA-11556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11556
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benjamin Lerer
>Assignee: Alex Petrov
> Fix For: 3.6
>
>
> Multi-partition queries with {{PER PARTITION LIMIT}} with {{ORDER BY}} do not 
> respect the {{PER PARTITION LIMIT}}.
> The problem can be reproduced with the following unit test:
> {code}
> @Test
> public void testPerPartitionLimitWithMultiPartitionQueryAndOrderBy() 
> throws Throwable
> {
> createTable("CREATE TABLE %s (a int, b int, c int, PRIMARY KEY (a, 
> b))");
> for (int i = 0; i < 5; i++)
> {
> for (int j = 0; j < 5; j++)
> {
> execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, j);
> }
> }
> assertRows(execute("SELECT * FROM %s WHERE a IN (2, 3) ORDER BY b 
> DESC PER PARTITION LIMIT ?", 2),
> row(2, 4, 4),
> row(3, 4, 4),
> row(2, 3, 3),
> row(3, 3, 3));
> }
> {code} 





[jira] [Commented] (CASSANDRA-11556) PER PARTITION LIMIT does not work properly for multi-partition query with ORDER BY

2016-04-14 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240891#comment-15240891
 ] 

Benjamin Lerer commented on CASSANDRA-11556:


+1

> PER PARTITION LIMIT does not work properly for multi-partition query with 
> ORDER BY
> --
>
> Key: CASSANDRA-11556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11556
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benjamin Lerer
>Assignee: Alex Petrov
> Fix For: 3.6
>
>
> Multi-partition queries with {{PER PARTITION LIMIT}} with {{ORDER BY}} do not 
> respect the {{PER PARTITION LIMIT}}.
> The problem can be reproduced with the following unit test:
> {code}
> @Test
> public void testPerPartitionLimitWithMultiPartitionQueryAndOrderBy() 
> throws Throwable
> {
> createTable("CREATE TABLE %s (a int, b int, c int, PRIMARY KEY (a, 
> b))");
> for (int i = 0; i < 5; i++)
> {
> for (int j = 0; j < 5; j++)
> {
> execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, j);
> }
> }
> assertRows(execute("SELECT * FROM %s WHERE a IN (2, 3) ORDER BY b 
> DESC PER PARTITION LIMIT ?", 2),
> row(2, 4, 4),
> row(3, 4, 4),
> row(2, 3, 3),
> row(3, 3, 3));
> }
> {code} 





[jira] [Commented] (CASSANDRA-11556) PER PARTITION LIMIT does not work properly for multi-partition query with ORDER BY

2016-04-14 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240895#comment-15240895
 ] 

Alex Petrov commented on CASSANDRA-11556:
-

Thank you!

> PER PARTITION LIMIT does not work properly for multi-partition query with 
> ORDER BY
> --
>
> Key: CASSANDRA-11556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11556
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benjamin Lerer
>Assignee: Alex Petrov
> Fix For: 3.6
>
>
> Multi-partition queries with {{PER PARTITION LIMIT}} with {{ORDER BY}} do not 
> respect the {{PER PARTITION LIMIT}}.
> The problem can be reproduced with the following unit test:
> {code}
> @Test
> public void testPerPartitionLimitWithMultiPartitionQueryAndOrderBy() 
> throws Throwable
> {
> createTable("CREATE TABLE %s (a int, b int, c int, PRIMARY KEY (a, 
> b))");
> for (int i = 0; i < 5; i++)
> {
> for (int j = 0; j < 5; j++)
> {
> execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, j);
> }
> }
> assertRows(execute("SELECT * FROM %s WHERE a IN (2, 3) ORDER BY b 
> DESC PER PARTITION LIMIT ?", 2),
> row(2, 4, 4),
> row(3, 4, 4),
> row(2, 3, 3),
> row(3, 3, 3));
> }
> {code} 





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240902#comment-15240902
 ] 

Ben Manes commented on CASSANDRA-11452:
---

The simple hack of recycling the victim when the hash codes are equal seems to 
work well. This would be done in {{evictFromMain}} at the end, after the 
candidate was evicted. Since a weighted cache might evict multiple entries, we 
have to reset the victim for the next loop.

{code}
// Recycle to guard against hash collision attacks
if (victimKey.hashCode() == candidateKey.hashCode()) {
  Node<K, V> nextVictim = victim.getNextInAccessOrder();
  accessOrderProbationDeque().moveToBack(victim);
  victim = nextVictim;
}
{code}

The LIRS paper's traces (short) indicate that the difference is noise.

{noformat}
multi1: 55.28 -> 55.40
multi2: 48.37 -> 48.42
multi3: 41.78 -> 42.00
gli: 34.15 -> 34.06
ps: 57.15 -> 57.17
sprite: 54.95 -> 55.33
cs: 30.19 -> 29.82
loop: 49.95 -> 49.90
2_pools: 52.02 -> 51.96
{noformat}

Tomorrow I'll check some of the ARC traces, clean up the patch, and convert 
Branimir's test into a unit test. Thoughts?

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Updated] (CASSANDRA-11570) Concurrent execution of prepared statement returns invalid JSON as result

2016-04-14 Thread Alexander Ryabets (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Ryabets updated CASSANDRA-11570:
--
Attachment: (was: CassandraPreparedStatementsTest.zip)

> Concurrent execution of prepared statement returns invalid JSON as result
> -
>
> Key: CASSANDRA-11570
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11570
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.2, C++ or C# driver
>Reporter: Alexander Ryabets
> Attachments: test_neptunao.cql
>
>
> When I use a prepared statement for async execution of multiple statements I 
> get JSON with broken data. Keys get totally corrupted while values seem to be 
> normal though.
> First I encountered this issue when I was performing stress testing of our 
> project using a custom script. We are using the DataStax C++ driver and execute 
> statements from different fibers.
> Then I tried to isolate the problem and wrote a simple C# program which starts 
> multiple Tasks in a loop. Each task uses the once-created prepared statement 
> to read data from the database. As you can see, the results are a total mess.
> I've attached an archive with a console C# project (1 cs file) which just prints 
> the resulting JSON to the user. 
> Here is the main part of the C# code.
> {noformat}
> static void Main(string[] args)
> {
>   const int task_count = 300;
>   using(var cluster = 
> Cluster.Builder().AddContactPoints("127.0.0.1").Build())
>   {
> using(var session = cluster.Connect())
> {
>   var prepared = session.Prepare("select json * from 
> test_neptunao.ubuntu");
>   var tasks = new Task[task_count];
>   for(int i = 0; i < task_count; i++)
>   {
> tasks[i] = Query(prepared, session);
>   }
>   Task.WaitAll(tasks);
> }
>   }
>   Console.ReadKey();
> }
> private static Task Query(PreparedStatement prepared, ISession session)
> {
>   var stmt = prepared.Bind();
>   stmt.SetConsistencyLevel(ConsistencyLevel.One);
>   return session.ExecuteAsync(stmt).ContinueWith(tr =>
>   {
> foreach(var row in tr.Result)
> {
>   var value = row.GetValue(0);
>   Console.WriteLine(value);
> }
>   });
> }
> {noformat}
> I also attached cql script with test DB schema.
> {noformat}
> CREATE KEYSPACE IF NOT EXISTS test_neptunao
> WITH replication = {
>   'class' : 'SimpleStrategy',
>   'replication_factor' : 3
> };
> use test_neptunao;
> create table if not exists ubuntu (
>   id timeuuid PRIMARY KEY,
>   precise_pangolin text,
>   trusty_tahr text,
>   wily_werewolf text, 
>   vivid_vervet text,
>   saucy_salamander text,
>   lucid_lynx text
> );
> {noformat}





[jira] [Commented] (CASSANDRA-11570) Concurrent execution of prepared statement returns invalid JSON as result

2016-04-14 Thread Alexander Ryabets (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240924#comment-15240924
 ] 

Alexander Ryabets commented on CASSANDRA-11570:
---

I've changed the sample a bit to fetch only one row per query.

I've also attached the broken and expected outputs. You can see an invalid result 
in the row with id {{516b00a2-01a7-11e6-8630-c04f49e62c6b}}, for example.

> Concurrent execution of prepared statement returns invalid JSON as result
> -
>
> Key: CASSANDRA-11570
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11570
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.2, C++ or C# driver
>Reporter: Alexander Ryabets
> Attachments: test_neptunao.cql
>
>
> When I use a prepared statement for async execution of multiple statements I 
> get JSON with broken data. Keys get totally corrupted while values seem to be 
> normal though.
> First I encountered this issue when I was performing stress testing of our 
> project using a custom script. We are using the DataStax C++ driver and execute 
> statements from different fibers.
> Then I tried to isolate the problem and wrote a simple C# program which starts 
> multiple Tasks in a loop. Each task uses the once-created prepared statement 
> to read data from the database. As you can see, the results are a total mess.
> I've attached an archive with a console C# project (1 cs file) which just prints 
> the resulting JSON to the user. 
> Here is the main part of the C# code.
> {noformat}
> static void Main(string[] args)
> {
>   const int task_count = 300;
>   using(var cluster = 
> Cluster.Builder().AddContactPoints("127.0.0.1").Build())
>   {
> using(var session = cluster.Connect())
> {
>   var prepared = session.Prepare("select json * from 
> test_neptunao.ubuntu");
>   var tasks = new Task[task_count];
>   for(int i = 0; i < task_count; i++)
>   {
> tasks[i] = Query(prepared, session);
>   }
>   Task.WaitAll(tasks);
> }
>   }
>   Console.ReadKey();
> }
> private static Task Query(PreparedStatement prepared, ISession session)
> {
>   var stmt = prepared.Bind();
>   stmt.SetConsistencyLevel(ConsistencyLevel.One);
>   return session.ExecuteAsync(stmt).ContinueWith(tr =>
>   {
> foreach(var row in tr.Result)
> {
>   var value = row.GetValue(0);
>   Console.WriteLine(value);
> }
>   });
> }
> {noformat}
> I also attached cql script with test DB schema.
> {noformat}
> CREATE KEYSPACE IF NOT EXISTS test_neptunao
> WITH replication = {
>   'class' : 'SimpleStrategy',
>   'replication_factor' : 3
> };
> use test_neptunao;
> create table if not exists ubuntu (
>   id timeuuid PRIMARY KEY,
>   precise_pangolin text,
>   trusty_tahr text,
>   wily_werewolf text, 
>   vivid_vervet text,
>   saucy_salamander text,
>   lucid_lynx text
> );
> {noformat}





[jira] [Updated] (CASSANDRA-11570) Concurrent execution of prepared statement returns invalid JSON as result

2016-04-14 Thread Alexander Ryabets (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Ryabets updated CASSANDRA-11570:
--
Attachment: valid_output.txt
broken_output.txt
CassandraPreparedStatementsTest.zip

> Concurrent execution of prepared statement returns invalid JSON as result
> -
>
> Key: CASSANDRA-11570
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11570
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.2, C++ or C# driver
>Reporter: Alexander Ryabets
> Attachments: CassandraPreparedStatementsTest.zip, broken_output.txt, 
> test_neptunao.cql, valid_output.txt
>
>
> When I use a prepared statement for async execution of multiple statements I 
> get JSON with broken data. Keys get totally corrupted while values seem to be 
> normal though.
> First I encountered this issue when I was performing stress testing of our 
> project using a custom script. We are using the DataStax C++ driver and execute 
> statements from different fibers.
> Then I tried to isolate the problem and wrote a simple C# program which starts 
> multiple Tasks in a loop. Each task uses the once-created prepared statement 
> to read data from the database. As you can see, the results are a total mess.
> I've attached an archive with a console C# project (1 cs file) which just prints 
> the resulting JSON to the user. 
> Here is the main part of the C# code.
> {noformat}
> static void Main(string[] args)
> {
>   const int task_count = 300;
>   using(var cluster = 
> Cluster.Builder().AddContactPoints("127.0.0.1").Build())
>   {
> using(var session = cluster.Connect())
> {
>   var prepared = session.Prepare("select json * from 
> test_neptunao.ubuntu");
>   var tasks = new Task[task_count];
>   for(int i = 0; i < task_count; i++)
>   {
> tasks[i] = Query(prepared, session);
>   }
>   Task.WaitAll(tasks);
> }
>   }
>   Console.ReadKey();
> }
> private static Task Query(PreparedStatement prepared, ISession session)
> {
>   var stmt = prepared.Bind();
>   stmt.SetConsistencyLevel(ConsistencyLevel.One);
>   return session.ExecuteAsync(stmt).ContinueWith(tr =>
>   {
> foreach(var row in tr.Result)
> {
>   var value = row.GetValue(0);
>   Console.WriteLine(value);
> }
>   });
> }
> {noformat}
> I also attached cql script with test DB schema.
> {noformat}
> CREATE KEYSPACE IF NOT EXISTS test_neptunao
> WITH replication = {
>   'class' : 'SimpleStrategy',
>   'replication_factor' : 3
> };
> use test_neptunao;
> create table if not exists ubuntu (
>   id timeuuid PRIMARY KEY,
>   precise_pangolin text,
>   trusty_tahr text,
>   wily_werewolf text, 
>   vivid_vervet text,
>   saucy_salamander text,
>   lucid_lynx text
> );
> {noformat}





cassandra git commit: Allow only DISTINCT queries with partition keys restrictions

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.2 19b4b637a -> 69edeaa46


Allow only DISTINCT queries with partition keys restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/69edeaa4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/69edeaa4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/69edeaa4

Branch: refs/heads/cassandra-2.2
Commit: 69edeaa46b78bb168f7e9d0b1c991c07b90f41ca
Parents: 19b4b63
Author: Alex Petrov 
Authored: Thu Apr 14 12:26:52 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:26:52 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 
 .../cql3/statements/SelectStatement.java|  3 ++
 .../cql3/validation/operations/SelectTest.java  | 45 
 4 files changed, 58 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 54013a3..c72b6cb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.6
+ * Allow only DISTINCT queries with partition keys restrictions 
(CASSANDRA-11339)
  * CqlConfigHelper no longer requires both a keystore and truststore to work 
(CASSANDRA-11532)
  * Make deprecated repair methods backward-compatible with previous 
notification service (CASSANDRA-11430)
  * IncomingStreamingConnection version check message wrong (CASSANDRA-11462)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java 
b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index e0cf743..3934f33 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -279,6 +279,15 @@ public final class StatementRestrictions
 }
 
 /**
+ * Checks if the restrictions contain any non-primary key restrictions
+ * @return true if the restrictions contain any non-primary key restrictions, false otherwise.
+ */
+public boolean hasNonPrimaryKeyRestrictions()
+{
+return !nonPrimaryKeyRestrictions.isEmpty();
+}
+
+/**
  * Returns the partition key components that are not restricted.
  * @return the partition key components that are not restricted.
  */

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 291e3e4..7bba330 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -885,6 +885,9 @@ public class SelectStatement implements CQLStatement
                                               StatementRestrictions restrictions)
                                               throws InvalidRequestException
     {
+        checkFalse(restrictions.hasClusteringColumnsRestriction() || restrictions.hasNonPrimaryKeyRestrictions(),
+                   "SELECT DISTINCT with WHERE clause only supports restriction by partition key.");
+
         Collection<ColumnDefinition> requestedColumns = selection.getColumns();
         for (ColumnDefinition def : requestedColumns)
             checkFalse(!def.isPartitionKey() && !def.isStatic(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index d8cd3c3..d444fde 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -1253,6 +1253,51 @@ public class SelectTest extends CQLTester
 Assert.assertEquals(9, rows.length);
 }
 
+@Test
+public void testSelectDistinctWithWhereClause() throws Throwable {
+createTable("CREATE TABLE %s (k int, a int, b int, PRIMARY KEY (k, a))");
+createInd
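
As a hedged illustration only, and not the committed test, the CQLTester-style
sketch below shows the behaviour the new check enforces; the table layout and
assertions are invented for this digest:

{noformat}
@Test
public void selectDistinctRestrictionSketch() throws Throwable
{
    createTable("CREATE TABLE %s (k int, a int, b int, PRIMARY KEY (k, a))");
    execute("INSERT INTO %s (k, a, b) VALUES (0, 0, 0)");

    // Restricting the partition key is still supported.
    execute("SELECT DISTINCT k FROM %s WHERE k = 0");

    // Restricting a clustering or regular column is now rejected
    // instead of being silently ignored (CASSANDRA-11339).
    assertInvalidMessage("SELECT DISTINCT with WHERE clause only supports restriction by partition key.",
                         "SELECT DISTINCT k FROM %s WHERE a = 0 ALLOW FILTERING");
    assertInvalidMessage("SELECT DISTINCT with WHERE clause only supports restriction by partition key.",
                         "SELECT DISTINCT k FROM %s WHERE b = 0 ALLOW FILTERING");
}
{noformat}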

[jira] [Commented] (CASSANDRA-11339) WHERE clause in SELECT DISTINCT can be ignored

2016-04-14 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240952#comment-15240952
 ] 

Benjamin Lerer commented on CASSANDRA-11339:


+1
Thanks for the patch.

> WHERE clause in SELECT DISTINCT can be ignored
> --
>
> Key: CASSANDRA-11339
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11339
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Philip Thompson
>Assignee: Alex Petrov
> Fix For: 2.2.x, 3.x
>
> Attachments: 
> 0001-Add-validation-for-distinct-queries-disallowing-quer.patch
>
>
> I've tested this out on 2.1-head. I'm not sure if it's the same behavior on 
> newer versions.
> For a given table t, with {{PRIMARY KEY (id, v)}} the following two queries 
> return the same result:
> {{SELECT DISTINCT id FROM t WHERE v > X ALLOW FILTERING}}
> {{SELECT DISTINCT id FROM t}}
> The WHERE clause in the former is silently ignored, and all ids are returned, 
> regardless of the value of v in any row. 
> It seems like this has been a known issue for a while:
> http://stackoverflow.com/questions/26548788/select-distinct-cql-ignores-where-clause
> However, if we don't support filtering on anything but the partition key, we 
> should reject the query rather than silently dropping the WHERE clause.





[2/2] cassandra git commit: Merge branch cassandra-2.2 into cassandra-3.0

2016-04-14 Thread blerer
Merge branch cassandra-2.2 into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0818e1b1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0818e1b1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0818e1b1

Branch: refs/heads/cassandra-3.0
Commit: 0818e1b16af36adb2fbbd3dffacdccc2ecf60a9a
Parents: fd24b7c 69edeaa
Author: Benjamin Lerer 
Authored: Thu Apr 14 12:32:56 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:33:05 2016 +0200

--

--




[1/2] cassandra git commit: Allow only DISTINCT queries with partition keys restrictions

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 fd24b7c0d -> 0818e1b16


Allow only DISTINCT queries with partition keys restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/69edeaa4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/69edeaa4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/69edeaa4

Branch: refs/heads/cassandra-3.0
Commit: 69edeaa46b78bb168f7e9d0b1c991c07b90f41ca
Parents: 19b4b63
Author: Alex Petrov 
Authored: Thu Apr 14 12:26:52 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:26:52 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 
 .../cql3/statements/SelectStatement.java|  3 ++
 .../cql3/validation/operations/SelectTest.java  | 45 
 4 files changed, 58 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 54013a3..c72b6cb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.6
+ * Allow only DISTINCT queries with partition keys restrictions 
(CASSANDRA-11339)
  * CqlConfigHelper no longer requires both a keystore and truststore to work 
(CASSANDRA-11532)
  * Make deprecated repair methods backward-compatible with previous 
notification service (CASSANDRA-11430)
  * IncomingStreamingConnection version check message wrong (CASSANDRA-11462)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java 
b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index e0cf743..3934f33 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -279,6 +279,15 @@ public final class StatementRestrictions
 }
 
 /**
+ * Checks if the restrictions contain any non-primary key restrictions
+ * @return true if the restrictions contain any non-primary key restrictions, false otherwise.
+ */
+public boolean hasNonPrimaryKeyRestrictions()
+{
+return !nonPrimaryKeyRestrictions.isEmpty();
+}
+
+/**
  * Returns the partition key components that are not restricted.
  * @return the partition key components that are not restricted.
  */

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 291e3e4..7bba330 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -885,6 +885,9 @@ public class SelectStatement implements CQLStatement
                                               StatementRestrictions restrictions)
                                               throws InvalidRequestException
     {
+        checkFalse(restrictions.hasClusteringColumnsRestriction() || restrictions.hasNonPrimaryKeyRestrictions(),
+                   "SELECT DISTINCT with WHERE clause only supports restriction by partition key.");
+
         Collection<ColumnDefinition> requestedColumns = selection.getColumns();
         for (ColumnDefinition def : requestedColumns)
             checkFalse(!def.isPartitionKey() && !def.isStatic(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index d8cd3c3..d444fde 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -1253,6 +1253,51 @@ public class SelectTest extends CQLTester
 Assert.assertEquals(9, rows.length);
 }
 
+@Test
+public void testSelectDistinctWithWhereClause() throws Throwable {
+createTable("CREATE TABLE %s (k int, a int, b int, PRIMARY KEY (k, a))");
+createInd

cassandra git commit: Allow only DISTINCT queries with partition keys or static columns restrictions

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 0818e1b16 -> 6ad874509


Allow only DISTINCT queries with partition keys or static columns restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6ad87450
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6ad87450
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6ad87450

Branch: refs/heads/cassandra-3.0
Commit: 6ad874509d6c7edd53bb3a4b897477d6a2753c19
Parents: 0818e1b
Author: Alex Petrov 
Authored: Thu Apr 14 12:35:07 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:35:07 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 +++
 .../cql3/statements/SelectStatement.java|  4 ++
 .../cql3/validation/operations/SelectTest.java  | 72 
 4 files changed, 86 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index ed4c412..3b4d473 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.6
+ * Allow only DISTINCT queries with partition keys or static columns 
restrictions (CASSANDRA-11339)
  * LogAwareFileLister should only use OLD sstable files in current folder to 
determine disk consistency (CASSANDRA-11470)
  * Notify indexers of expired rows during compaction (CASSANDRA-11329)
  * Properly respond with ProtocolError when a v1/v2 native protocol

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java 
b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index 797b8e4..763a7be 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -396,6 +396,15 @@ public final class StatementRestrictions
 }
 
 /**
+ * Checks if the restrictions contain any non-primary key restrictions
+ * @return true if the restrictions contain any non-primary key restrictions, false otherwise.
+ */
+public boolean hasNonPrimaryKeyRestrictions()
+{
+return !nonPrimaryKeyRestrictions.isEmpty();
+}
+
+/**
  * Returns the partition key components that are not restricted.
  * @return the partition key components that are not restricted.
  */

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 51d675b..b4215ac 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -896,6 +896,10 @@ public class SelectStatement implements CQLStatement
                                               StatementRestrictions restrictions)
                                               throws InvalidRequestException
     {
+        checkFalse(restrictions.hasClusteringColumnsRestriction() ||
+                   (restrictions.hasNonPrimaryKeyRestrictions() && !restrictions.nonPKRestrictedColumns(true).stream().allMatch(ColumnDefinition::isStatic)),
+                   "SELECT DISTINCT with WHERE clause only supports restriction by partition key and/or static columns.");
+
         Collection<ColumnDefinition> requestedColumns = selection.getColumns();
         for (ColumnDefinition def : requestedColumns)
             checkFalse(!def.isPartitionKey() && !def.isStatic(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index a7eeeb8..5c19e1b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -1253,6 +1253,78 @@ public class SelectTest extends CQLTester
 Assert.assertEquals(9, rows.length);
 }
 
+@Test
+public void 
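
As a hedged illustration only (again, not the committed test), this sketch shows
the extra allowance the 3.0 variant of the check makes for static columns; the
table layout and assertions are invented for this digest:

{noformat}
@Test
public void selectDistinctStaticRestrictionSketch() throws Throwable
{
    createTable("CREATE TABLE %s (k int, a int, s int static, b int, PRIMARY KEY (k, a))");
    execute("INSERT INTO %s (k, a, s, b) VALUES (0, 0, 0, 0)");

    // Restrictions on the partition key and/or static columns pass the check.
    execute("SELECT DISTINCT k, s FROM %s WHERE k = 0");
    execute("SELECT DISTINCT k, s FROM %s WHERE s = 0 ALLOW FILTERING");

    // Restrictions on clustering or regular columns are still rejected.
    assertInvalidMessage("SELECT DISTINCT with WHERE clause only supports restriction by partition key and/or static columns.",
                         "SELECT DISTINCT k, s FROM %s WHERE b = 0 ALLOW FILTERING");
}
{noformat}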

[3/4] cassandra git commit: Allow only DISTINCT queries with partition keys or static columns restrictions

2016-04-14 Thread blerer
Allow only DISTINCT queries with partition keys or static columns restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6ad87450
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6ad87450
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6ad87450

Branch: refs/heads/trunk
Commit: 6ad874509d6c7edd53bb3a4b897477d6a2753c19
Parents: 0818e1b
Author: Alex Petrov 
Authored: Thu Apr 14 12:35:07 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:35:07 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 +++
 .../cql3/statements/SelectStatement.java|  4 ++
 .../cql3/validation/operations/SelectTest.java  | 72 
 4 files changed, 86 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index ed4c412..3b4d473 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.6
+ * Allow only DISTINCT queries with partition keys or static columns 
restrictions (CASSANDRA-11339)
  * LogAwareFileLister should only use OLD sstable files in current folder to 
determine disk consistency (CASSANDRA-11470)
  * Notify indexers of expired rows during compaction (CASSANDRA-11329)
  * Properly respond with ProtocolError when a v1/v2 native protocol

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java 
b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index 797b8e4..763a7be 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -396,6 +396,15 @@ public final class StatementRestrictions
 }
 
 /**
+ * Checks if the restrictions contain any non-primary key restrictions
+ * @return true if the restrictions contain any non-primary key restrictions, false otherwise.
+ */
+public boolean hasNonPrimaryKeyRestrictions()
+{
+return !nonPrimaryKeyRestrictions.isEmpty();
+}
+
+/**
  * Returns the partition key components that are not restricted.
  * @return the partition key components that are not restricted.
  */

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 51d675b..b4215ac 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -896,6 +896,10 @@ public class SelectStatement implements CQLStatement
                                               StatementRestrictions restrictions)
                                               throws InvalidRequestException
     {
+        checkFalse(restrictions.hasClusteringColumnsRestriction() ||
+                   (restrictions.hasNonPrimaryKeyRestrictions() && !restrictions.nonPKRestrictedColumns(true).stream().allMatch(ColumnDefinition::isStatic)),
+                   "SELECT DISTINCT with WHERE clause only supports restriction by partition key and/or static columns.");
+
         Collection<ColumnDefinition> requestedColumns = selection.getColumns();
         for (ColumnDefinition def : requestedColumns)
             checkFalse(!def.isPartitionKey() && !def.isStatic(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ad87450/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index a7eeeb8..5c19e1b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -1253,6 +1253,78 @@ public class SelectTest extends CQLTester
 Assert.assertEquals(9, rows.length);
 }
 
+@Test
+public void testSelectDistinctWithWhereClause() throws Throwable {
+createTable("CREATE TABLE %s (k int,

[2/4] cassandra git commit: Merge branch cassandra-2.2 into cassandra-3.0

2016-04-14 Thread blerer
Merge branch cassandra-2.2 into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0818e1b1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0818e1b1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0818e1b1

Branch: refs/heads/trunk
Commit: 0818e1b16af36adb2fbbd3dffacdccc2ecf60a9a
Parents: fd24b7c 69edeaa
Author: Benjamin Lerer 
Authored: Thu Apr 14 12:32:56 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:33:05 2016 +0200

--

--




[4/4] cassandra git commit: Merge branch cassandra-3.0 into trunk

2016-04-14 Thread blerer
Merge branch cassandra-3.0 into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ccacf7d1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ccacf7d1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ccacf7d1

Branch: refs/heads/trunk
Commit: ccacf7d1a94875c2da10bacbe63f99b630030fdf
Parents: 9a0eb9a 6ad8745
Author: Benjamin Lerer 
Authored: Thu Apr 14 12:38:04 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:38:04 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 +++
 .../cql3/statements/SelectStatement.java|  4 ++
 .../cql3/validation/operations/SelectTest.java  | 72 
 4 files changed, 86 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ccacf7d1/CHANGES.txt
--
diff --cc CHANGES.txt
index 443c8bc,3b4d473..329e55c
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,48 -1,5 +1,49 @@@
 -3.0.6
 +3.6
 + * Fix PER PARTITION LIMIT for queries requiring post-query ordering 
(CASSANDRA-11556)
 + * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
 + * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
 + * Support for non-frozen user-defined types, updating
 +   individual fields of user-defined types (CASSANDRA-7423)
 + * Make LZ4 compression level configurable (CASSANDRA-11051)
 + * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017)
 + * Make custom filtering more extensible with UserExpression (CASSANDRA-11295)
 + * Improve field-checking and error reporting in cassandra.yaml 
(CASSANDRA-10649)
 + * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507)
 + * More user friendly error when providing an invalid token to nodetool 
(CASSANDRA-9348)
 + * Add static column support to SASI index (CASSANDRA-11183)
 + * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization 
(CASSANDRA-11434)
 + * Support LIKE operator in prepared statements (CASSANDRA-11456)
 + * Add a command to see if a Materialized View has finished building 
(CASSANDRA-9967)
 + * Log endpoint and port associated with streaming operation (CASSANDRA-8777)
 + * Print sensible units for all log messages (CASSANDRA-9692)
 + * Upgrade Netty to version 4.0.34 (CASSANDRA-11096)
 + * Break the CQL grammar into separate Parser and Lexer (CASSANDRA-11372)
 + * Compress only inter-dc traffic by default (CASSANDRA-)
 + * Add metrics to track write amplification (CASSANDRA-11420)
 + * cassandra-stress: cannot handle "value-less" tables (CASSANDRA-7739)
 + * Add/drop multiple columns in one ALTER TABLE statement (CASSANDRA-10411)
 + * Add require_endpoint_verification opt for internode encryption 
(CASSANDRA-9220)
 + * Add auto import java.util for UDF code block (CASSANDRA-11392)
 + * Add --hex-format option to nodetool getsstables (CASSANDRA-11337)
 + * sstablemetadata should print sstable min/max token (CASSANDRA-7159)
 + * Do not wrap CassandraException in TriggerExecutor (CASSANDRA-9421)
 + * COPY TO should have higher double precision (CASSANDRA-11255)
 + * Stress should exit with non-zero status after failure (CASSANDRA-10340)
 + * Add client to cqlsh SHOW_SESSION (CASSANDRA-8958)
 + * Fix nodetool tablestats keyspace level metrics (CASSANDRA-11226)
 + * Store repair options in parent_repair_history (CASSANDRA-11244)
 + * Print current leveling in sstableofflinerelevel (CASSANDRA-9588)
 + * Change repair message for keyspaces with RF 1 (CASSANDRA-11203)
 + * Remove hard-coded SSL cipher suites and protocols (CASSANDRA-10508)
 + * Improve concurrency in CompactionStrategyManager (CASSANDRA-10099)
 + * (cqlsh) interpret CQL type for formatting blobs (CASSANDRA-11274)
 + * Refuse to start and print txn log information in case of disk
 +   corruption (CASSANDRA-10112)
 + * Resolve some eclipse-warnings (CASSANDRA-11086)
 + * (cqlsh) Show static columns in a different color (CASSANDRA-11059)
 + * Allow to remove TTLs on table with default_time_to_live (CASSANDRA-11207)
 +Merged from 3.0:
+  * Allow only DISTINCT queries with partition keys or static columns 
restrictions (CASSANDRA-11339)
   * LogAwareFileLister should only use OLD sstable files in current folder to 
determine disk consistency (CASSANDRA-11470)
   * Notify indexers of expired rows during compaction (CASSANDRA-11329)
   * Properly respond with ProtocolError when a v1/v2 native protocol

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ccacf7d1/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ccacf7d1/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java

[1/4] cassandra git commit: Allow only DISTINCT queries with partition keys restrictions

2016-04-14 Thread blerer
Repository: cassandra
Updated Branches:
  refs/heads/trunk 9a0eb9a31 -> ccacf7d1a


Allow only DISTINCT queries with partition keys restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/69edeaa4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/69edeaa4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/69edeaa4

Branch: refs/heads/trunk
Commit: 69edeaa46b78bb168f7e9d0b1c991c07b90f41ca
Parents: 19b4b63
Author: Alex Petrov 
Authored: Thu Apr 14 12:26:52 2016 +0200
Committer: Benjamin Lerer 
Committed: Thu Apr 14 12:26:52 2016 +0200

--
 CHANGES.txt |  1 +
 .../restrictions/StatementRestrictions.java |  9 
 .../cql3/statements/SelectStatement.java|  3 ++
 .../cql3/validation/operations/SelectTest.java  | 45 
 4 files changed, 58 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 54013a3..c72b6cb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.6
+ * Allow only DISTINCT queries with partition keys restrictions 
(CASSANDRA-11339)
  * CqlConfigHelper no longer requires both a keystore and truststore to work 
(CASSANDRA-11532)
  * Make deprecated repair methods backward-compatible with previous 
notification service (CASSANDRA-11430)
  * IncomingStreamingConnection version check message wrong (CASSANDRA-11462)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java 
b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index e0cf743..3934f33 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -279,6 +279,15 @@ public final class StatementRestrictions
 }
 
 /**
+ * Checks if the restrictions contain any non-primary key restrictions
+ * @return true if the restrictions contain any non-primary key restrictions, false otherwise.
+ */
+public boolean hasNonPrimaryKeyRestrictions()
+{
+return !nonPrimaryKeyRestrictions.isEmpty();
+}
+
+/**
  * Returns the partition key components that are not restricted.
  * @return the partition key components that are not restricted.
  */

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 291e3e4..7bba330 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -885,6 +885,9 @@ public class SelectStatement implements CQLStatement
                                               StatementRestrictions restrictions)
                                               throws InvalidRequestException
     {
+        checkFalse(restrictions.hasClusteringColumnsRestriction() || restrictions.hasNonPrimaryKeyRestrictions(),
+                   "SELECT DISTINCT with WHERE clause only supports restriction by partition key.");
+
         Collection<ColumnDefinition> requestedColumns = selection.getColumns();
         for (ColumnDefinition def : requestedColumns)
             checkFalse(!def.isPartitionKey() && !def.isStatic(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/69edeaa4/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java 
b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index d8cd3c3..d444fde 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -1253,6 +1253,51 @@ public class SelectTest extends CQLTester
 Assert.assertEquals(9, rows.length);
 }
 
+@Test
+public void testSelectDistinctWithWhereClause() throws Throwable {
+createTable("CREATE TABLE %s (k int, a int, b int, PRIMARY KEY (k, a))");
+createIndex("CREATE INDEX

[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240956#comment-15240956
 ] 

Branimir Lambov commented on CASSANDRA-11452:
-

I'm sorry, I don't see how this helps. Once both the hot key and its collision 
are in the main area (this check is not enough to guarantee that won't happen, 
though it probably manages to do so for this specific test), this path is no 
longer triggered.

I think we should be looking for a way to eject an offending entry after it has 
entered. An ideal test would verify that CLASH is among the cached keys 
before "// Now run a repeating sequence ...", but no longer there after the loop 
has finished.
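
To make that test shape concrete, here is a hedged sketch; {{cache}}, {{keys()}},
{{CLASH}} and {{runRepeatingAccessSequence()}} are hypothetical stand-ins for
this illustration, not the API under review:

{noformat}
// All names here are hypothetical; only the shape of the assertions matters.
assertTrue(cache.keys().contains(CLASH));   // the colliding entry was admitted at first
runRepeatingAccessSequence(cache);          // the "// Now run a repeating sequence ..." loop
assertFalse(cache.keys().contains(CLASH));  // and has been ejected by the time the loop finishes
{noformat}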

Did you run traces with candidate preference on equality? Is it still bad?

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Updated] (CASSANDRA-11339) WHERE clause in SELECT DISTINCT can be ignored

2016-04-14 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-11339:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed in 2.2 at 69edeaa46b78bb168f7e9d0b1c991c07b90f41ca.
Committed in 3.0 at 6ad874509d6c7edd53bb3a4b897477d6a2753c19 and merged into 
trunk.

> WHERE clause in SELECT DISTINCT can be ignored
> --
>
> Key: CASSANDRA-11339
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11339
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Philip Thompson
>Assignee: Alex Petrov
> Fix For: 2.2.x, 3.x
>
> Attachments: 
> 0001-Add-validation-for-distinct-queries-disallowing-quer.patch
>
>
> I've tested this out on 2.1-head. I'm not sure if it's the same behavior on 
> newer versions.
> For a given table t, with {{PRIMARY KEY (id, v)}} the following two queries 
> return the same result:
> {{SELECT DISTINCT id FROM t WHERE v > X ALLOW FILTERING}}
> {{SELECT DISTINCT id FROM t}}
> The WHERE clause in the former is silently ignored, and all ids are returned, 
> regardless of the value of v in any row. 
> It seems like this has been a known issue for a while:
> http://stackoverflow.com/questions/26548788/select-distinct-cql-ignores-where-clause
> However, if we don't support filtering on anything but the partition key, we 
> should reject the query rather than silently dropping the WHERE clause.





[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241025#comment-15241025
 ] 

Benedict commented on CASSANDRA-11452:
--

Also, just to clarify, I'm not proposing a random _eviction_, just a random 
selection of who to compare against for _admission_ - the eviction candidate 
would still be the LRU.  Thus the collision would always be removed within a 
short number of steps after reaching the LRU spot, and ordinarily soon after.

It's also worth noting that an RNF whose average walk distance was only a little 
larger than 1 (so that it usually compared against the eviction candidate) 
would more than suffice - if the chance of each distance was 1/4 of the prior 
distance, the average walk length would only be 1.33, but it would still take 
only a few comparisons for the eviction to unblock, and a few more for multiple 
such collisions to be resolved.
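
The 1.33 figure is the mean of a geometric distribution: if each further step is
taken with probability 1/4, then P(L = k) = (3/4) * (1/4)^(k-1) and
E[L] = 1/(3/4) = 4/3, roughly 1.33. A self-contained Java sketch (not project
code) that checks this empirically:

{noformat}
import java.util.Random;

// Hedged demo, not project code: a walk that continues with probability 1/4
// at each step has mean length 4/3, roughly 1.33.
public class WalkLengthDemo
{
    public static void main(String[] args)
    {
        Random rnd = new Random(42);
        long total = 0;
        int trials = 1_000_000;
        for (int i = 0; i < trials; i++)
        {
            int length = 1;                 // always compare against at least one candidate
            while (rnd.nextDouble() < 0.25) // walk one entry further with probability 1/4
                length++;
            total += length;
        }
        System.out.printf("mean walk length = %.4f%n", (double) total / trials); // ~1.3333
    }
}
{noformat}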

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.





[jira] [Commented] (CASSANDRA-11522) batch_size_fail_threshold_in_kb shouldn't only apply to batch

2016-04-14 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241042#comment-15241042
 ] 

Paulo Motta commented on CASSANDRA-11522:
-

Yes, since this is an improvement it normally goes in trunk.

> batch_size_fail_threshold_in_kb shouldn't only apply to batch
> -
>
> Key: CASSANDRA-11522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11522
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: lhf
>
> I can buy that C* is not good at dealing with large (in bytes) inserts and 
> that it makes sense to provide a user-configurable protection against inserts 
> larger than a certain size, but it doesn't make sense to limit this to 
> batches. It's absolutely possible to insert a single very large row, and 
> internally a batch with a single statement is exactly the same as a single 
> similar insert, so rejecting the former and not the latter is confusing and, 
> well, wrong.
> Note that I get that batches are more likely to get big and that's where the 
> protection is most often useful, but limiting the option to batches is still 
> less useful (it's a hole in the protection) and it's going to confuse users 
> into thinking that batches to a single partition are different from single 
> inserts.
> Of course that also means that we should rename that option to 
> {{write_size_fail_threshold_in_kb}}. Which means we probably want to add this 
> new option and just deprecate {{batch_size_fail_threshold_in_kb}} for now 
> (with removal in 4.0).





[jira] [Updated] (CASSANDRA-11570) Concurrent execution of prepared statement returns invalid JSON as result

2016-04-14 Thread Alexander Ryabets (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Ryabets updated CASSANDRA-11570:
--
Description: 
When I use a prepared statement for async execution of multiple statements I get 
JSON with broken data. The keys are totally corrupted while the values seem to be 
normal.

I first encountered this issue while performing stress testing of our project 
using a custom script. We are using the DataStax C++ driver and execute 
statements from different fibers.

I then tried to isolate the problem and wrote a simple C# program which starts 
multiple Tasks in a loop. Each task uses a single prepared statement, created 
once, to read data from the database. As you can see, the results are a total mess.

I've attached an archive with a console C# project (one .cs file) which just 
prints the resulting JSON to the user.
Here is the main part of the C# code.

{noformat}
static void Main(string[] args)
{
  const int task_count = 300;

  using(var cluster = Cluster.Builder().AddContactPoints(/*contact points here*/).Build())
  {
    using(var session = cluster.Connect())
    {
      var prepared = session.Prepare("select json * from test_neptunao.ubuntu where id=?");
      var tasks = new Task[task_count];
      for(int i = 0; i < task_count; i++)
      {
        tasks[i] = Query(prepared, session);
      }
      Task.WaitAll(tasks);
    }
  }
  Console.ReadKey();
}

private static Task Query(PreparedStatement prepared, ISession session)
{
  string id = GetIdOfRandomRow();
  var stmt = prepared.Bind(id);
  stmt.SetConsistencyLevel(ConsistencyLevel.One);
  return session.ExecuteAsync(stmt).ContinueWith(tr =>
  {
    foreach(var row in tr.Result)
    {
      var value = row.GetValue<string>(0);
      //some kind of output
    }
  });
}
{noformat}

I also attached a cql script with the test DB schema.

{noformat}
CREATE KEYSPACE IF NOT EXISTS test_neptunao
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 3
};

use test_neptunao;

create table if not exists ubuntu (
id timeuuid PRIMARY KEY,
precise_pangolin text,
trusty_tahr text,
wily_werewolf text, 
vivid_vervet text,
saucy_salamander text,
lucid_lynx text
);
{noformat}

  was:
When I use a prepared statement for async execution of multiple statements I get 
JSON with broken data. The keys are totally corrupted while the values seem to be 
normal.

I first encountered this issue while performing stress testing of our project 
using a custom script. We are using the DataStax C++ driver and execute 
statements from different fibers.

I then tried to isolate the problem and wrote a simple C# program which starts 
multiple Tasks in a loop. Each task uses a single prepared statement, created 
once, to read data from the database. As you can see, the results are a total mess.

I've attached an archive with a console C# project (one .cs file) which just 
prints the resulting JSON to the user.
Here is the main part of the C# code.

{noformat}
static void Main(string[] args)
{
  const int task_count = 300;

  using(var cluster = Cluster.Builder().AddContactPoints("127.0.0.1").Build())
  {
    using(var session = cluster.Connect())
    {
      var prepared = session.Prepare("select json * from test_neptunao.ubuntu");
      var tasks = new Task[task_count];
      for(int i = 0; i < task_count; i++)
      {
        tasks[i] = Query(prepared, session);
      }
      Task.WaitAll(tasks);
    }
  }
  Console.ReadKey();
}

private static Task Query(PreparedStatement prepared, ISession session)
{
  var stmt = prepared.Bind();
  stmt.SetConsistencyLevel(ConsistencyLevel.One);
  return session.ExecuteAsync(stmt).ContinueWith(tr =>
  {
    foreach(var row in tr.Result)
    {
      var value = row.GetValue<string>(0);
      Console.WriteLine(value);
    }
  });
}
{noformat}

I also attached a cql script with the test DB schema.

{noformat}
CREATE KEYSPACE IF NOT EXISTS test_neptunao
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 3
};

use test_neptunao;

create table if not exists ubuntu (
id timeuuid PRIMARY KEY,
precise_pangolin text,
trusty_tahr text,
wily_werewolf text, 
vivid_vervet text,
saucy_salamander text,
lucid_lynx text
);
{noformat}


> Concurrent execution of prepared statement returns invalid JSON as result
> -
>
> Key: CASSANDRA-11570
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11570
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.2, C++ or C# driver
>Reporter: Alexander Ryabets
> Attachments: CassandraPreparedStatementsTest.zip, broken_output.txt

[jira] [Commented] (CASSANDRA-10853) deb package migration to dh_python2

2016-04-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241099#comment-15241099
 ] 

Igor Galić commented on CASSANDRA-10853:


Ansible has already fixed this with {{dh-python | python-support}} in their 
{{Depends}} and {{Build-Depends}}: 
https://github.com/ansible/ansible/pull/15031/files

> deb package migration to dh_python2
> ---
>
> Key: CASSANDRA-10853
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10853
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Michael Shuler
>Assignee: Michael Shuler
> Fix For: 3.0.x, 3.x
>
>
> I'm working on a deb job in jenkins, and I had forgotten to open a bug for 
> this. There is no urgent need, since {{python-support}} is in Jessie, but 
> this package is currently in transition to be removed.
> http://deb.li/dhs2p
> During deb build:
> {noformat}
> dh_pysupport: This program is deprecated, you should use dh_python2 instead. 
> Migration guide: http://deb.li/dhs2p
> {noformat}





[jira] [Commented] (CASSANDRA-11522) batch_size_fail_threshold_in_kb shouldn't only apply to batch

2016-04-14 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241105#comment-15241105
 ] 

Paulo Motta commented on CASSANDRA-11522:
-

I just noticed that CASSANDRA-10876 effectively removed this protection for 
single partition batches, given they do not have the same concerns as 
multi-partition batches (as discussed on CASSANDRA-8011). So I'm not sure we 
should introduce this limitation to single inserts.

> batch_size_fail_threshold_in_kb shouldn't only apply to batch
> -
>
> Key: CASSANDRA-11522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11522
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: lhf
>
> I can buy that C* is not good at dealing with large (in bytes) inserts and 
> that it makes sense to provide a user-configurable protection against inserts 
> larger than a certain size, but it doesn't make sense to limit this to 
> batches. It's absolutely possible to insert a single very large row, and 
> internally a batch with a single statement is exactly the same as a single 
> similar insert, so rejecting the former and not the latter is confusing and, 
> well, wrong.
> Note that I get that batches are more likely to get big and that's where the 
> protection is most often useful, but limiting the option to batches is still 
> less useful (it's a hole in the protection) and it's going to confuse users 
> into thinking that batches to a single partition are different from single 
> inserts.
> Of course that also means that we should rename that option to 
> {{write_size_fail_threshold_in_kb}}. Which means we probably want to add this 
> new option and just deprecate {{batch_size_fail_threshold_in_kb}} for now 
> (with removal in 4.0).





[jira] [Commented] (CASSANDRA-11474) cqlsh: COPY FROM should use regular inserts for single statement batches

2016-04-14 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241121#comment-15241121
 ] 

Paulo Motta commented on CASSANDRA-11474:
-

LGTM, but given that CASSANDRA-10876 removed this limitation for single 
partition batches on trunk, special-casing single mutation batches on COPY FROM 
doesn't bring much additional benefit, so I think we should include this only 
on 2.2 and 3.0 for code simplicity.

Could you provide a trunk patch without the single insert optimization, but 
only with the error report and empty chunk fixes?

> cqlsh: COPY FROM should use regular inserts for single statement batches
> 
>
> Key: CASSANDRA-11474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11474
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Stefania
>Assignee: Stefania
>Priority: Minor
>  Labels: lhf
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> I haven't reproduced it with a test yet but, from code inspection, if CQL 
> rows are larger than {{batch_size_fail_threshold_in_kb}} and this parameter 
> cannot be changed, then data import will fail.
> Users can control the batch size by setting MAXBATCHSIZE.
> If a batch contains a single statement, there is no need to use a batch and 
> we should use normal inserts instead or, alternatively, we should skip the 
> batch size check for unlogged batches with only one statement.





[jira] [Commented] (CASSANDRA-11522) batch_size_fail_threshold_in_kb shouldn't only apply to batch

2016-04-14 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241123#comment-15241123
 ] 

Paulo Motta commented on CASSANDRA-11522:
-

Perhaps we should rename the properties to 
{{multi_partition_batch_size_warn_threshold}} and 
{{multi_partition_batch_size_fail_threshold}} to avoid confusion?

> batch_size_fail_threshold_in_kb shouldn't only apply to batch
> -
>
> Key: CASSANDRA-11522
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11522
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: lhf
>
> I can buy that C* is not good at dealing with large (in bytes) inserts and 
> that it makes sense to provide a user-configurable protection against inserts 
> larger than a certain size, but it doesn't make sense to limit this to 
> batches. It's absolutely possible to insert a single very large row, and 
> internally a batch with a single statement is exactly the same as a single 
> similar insert, so rejecting the former and not the latter is confusing and, 
> well, wrong.
> Note that I get that batches are more likely to get big and that's where the 
> protection is most often useful, but limiting the option to batches is still 
> less useful (it's a hole in the protection) and it's going to confuse users 
> into thinking that batches to a single partition are different from single 
> inserts.
> Of course that also means that we should rename that option to 
> {{write_size_fail_threshold_in_kb}}. Which means we probably want to add this 
> new option and just deprecate {{batch_size_fail_threshold_in_kb}} for now 
> (with removal in 4.0).





[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241167#comment-15241167
 ] 

T Jake Luciani commented on CASSANDRA-9625:
---

[~ruoranwang] thanks for all the info.  This seems to be some kind of 
compaction bug which is affecting the graphite reporter.  Can you reproduce 
this? How are you calling repair?

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but 
> not on a 3-node (otherwise identical) staging cluster (maybe it takes a 
> certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Updated] (CASSANDRA-11258) Repair scheduling - Resource locking API

2016-04-14 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-11258:

Reviewer: Paulo Motta

> Repair scheduling - Resource locking API
> 
>
> Key: CASSANDRA-11258
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11258
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
>Priority: Minor
>
> Create a resource locking API & implementation that is able to lock a 
> resource in a specified data center. It should handle priorities to avoid 
> node starvation.





[jira] [Comment Edited] (CASSANDRA-9625) GraphiteReporter not reporting

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241167#comment-15241167
 ] 

T Jake Luciani edited comment on CASSANDRA-9625 at 4/14/16 1:41 PM:


[~ruoranwang] thanks for all the info.  This seems to be some kind of 
compaction bug which is affecting the graphite reporter.  Can you reproduce 
this? How are you calling repair? Can you attach logs from the node where this 
happens?


was (Author: tjake):
[~ruoranwang] thanks for all the info.  This seems to be some kind of 
compaction bug which is affecting the graphite reporter.  Can you reproduce 
this? How are you calling repair?

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but 
> not on a 3-node (otherwise identical) staging cluster (maybe it takes a 
> certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.





[jira] [Resolved] (CASSANDRA-11572) SStableloader does not stream data if the Cassandra table was altered to drop some column

2016-04-14 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita resolved CASSANDRA-11572.

Resolution: Duplicate

This should be fixed in 2.1.13 by CASSANDRA-10700.

> SStableloader does not stream data if the Cassandra table was altered to drop 
> some column
> -
>
> Key: CASSANDRA-11572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: manuj singh
>
> SSTableloader stops working whenever the Cassandra table is altered to drop 
> a column. 
> The following error shows:
> Error:
> Could not retrieve endpoint ranges:
> java.lang.IllegalArgumentException
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
> Caused by: java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:275)
> at 
> org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543)
> at 
> org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124)
> at 
> org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101)
> at 
> org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30)
> at 
> org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50)
> at 
> org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68)
> at 
> org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287)
> at 
> org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833)
> at 
> org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126)
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330)
> ... 2 more
> The only solution is then to drop the table and create it again. 





[jira] [Commented] (CASSANDRA-11562) "Could not retrieve endpoint ranges" for sstableloader

2016-04-14 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241177#comment-15241177
 ] 

Yuki Morishita commented on CASSANDRA-11562:


Can you try sstableloader from 2.1.13?
This should be fixed by CASSANDRA-10700.

> "Could not retrieve endpoint ranges" for sstableloader
> --
>
> Key: CASSANDRA-11562
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11562
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: $ uname -a
> Linux bigdb-100 3.2.0-99-virtual #139-Ubuntu SMP Mon Feb 1 23:52:21 UTC 2016 
> x86_64 x86_64 x86_64 GNU/Linux
> I am using Datastax Enterprise 4.7.5-1 which is based on 2.1.11.
>Reporter: Jens Rantil
>
> I am setting up a second datacenter and have a very slow and shaky VPN 
> connection to my old datacenter. To speed up the import process I am trying to 
> seed the new datacenter with a backup (that has been transferred encrypted 
> out of band, outside the VPN). When this is done I will issue a final 
> clusterwide repair.
> However...sstableloader crashes with the following:
> {noformat}
> sstableloader -v --nodes XXX --username MYUSERNAME --password MYPASSWORD 
> --ignore YYY,ZZZ ./backupdir/MYKEYSPACE/MYTABLE/
> Could not retrieve endpoint ranges:
> java.lang.IllegalArgumentException
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
> Caused by: java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:267)
> at 
> org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543)
> at 
> org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124)
> at 
> org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101)
> at 
> org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30)
> at 
> org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50)
> at 
> org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68)
> at 
> org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287)
> at 
> org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833)
> at 
> org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126)
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330)
> ... 2 more
> {noformat}
> (where YYY,ZZZ are nodes in the old DC)
> The files in ./backupdir/MYKEYSPACE/MYTABLE/ are an exact copy of a snapshot 
> from the older datacenter that has been taken with the exact same version of 
> Datastax Enterprise/Cassandra. The backup was taken 2-3 days ago.
> Question: ./backupdir/MYKEYSPACE/MYTABLE/ contains the non-"*.db" file  
> "manifest.json". Is that an issue?
> My workaround for my quest will probably be to copy the snapshot directories 
> out to the nodes of the new datacenter and do a DC-local repair+cleanup.
> Let me know if I can assist in debugging this further.
> References:
>  * This _might_ be a duplicate of 
> https://issues.apache.org/jira/browse/CASSANDRA-10629.
>  * http://stackoverflow.com/q/34757922/260805. 
> http://stackoverflow.com/a/35213418/260805 claims this could happen when 
> dropping a column, but I don't think I've ever dropped a column from this 
> table.
>  * http://stackoverflow.com/q/28632555/260805
>  * http://stackoverflow.com/q/34487567/260805



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)

2016-04-14 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241183#comment-15241183
 ] 

Branimir Lambov commented on CASSANDRA-8844:


First round of comments (I haven't looked at the read/replay part yet):

- I was a fan of the {{ReplayPosition}} name. It stands for a more general 
concept which happens to be the commit log position for us. Further to this, it 
should be a {{CommitLogPosition}} rather than {{..SegmentPosition}} as it does 
not just specify a position within a given segment but an overall position in 
the log (for a specific keyspace). I am also wondering whether it should now 
include a keyspace id / reference, since it is keyspace-specific, to be able to 
fail fast on mismatch.
- I'd prefer to throw the {{WriteTimeoutException}} directly from {{allocate}} 
(instead of catching null in {{CommitLog}} and doing the same). Doing the check 
inside the {{while}} loop will avoid the over-allocation and do less work in 
the common case.
- Do we really need separate buffer pools per manager? A shared pool (static or 
not) will offer slightly better cache locality, and it's better to block both 
commit logs if we're running beyond the allowed memory (we may want to double 
the default limit).
- [{{segmentManagers}} 
array|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8844_review#diff-05c1e4fd86fea19b8e0552b1f289be85R119]:
 An {{EnumMap}} (which boils down to the same thing) would be cleaner and 
should not have any performance impact.
- 
[{{shutdownBlocking}}|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8844_review#diff-05c1e4fd86fea19b8e0552b1f289be85R465]:
 Better to shut down in parallel, i.e. initiate and await termination 
separately.
- [{{reCalculating}} cas in 
{{maybeUpdateCDCSizeCounterAsync}}|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8844_review#diff-878dc31866184d5ef750ccd9befc8382R72]
 is fishy: it suggests the flag would clear on an exception in the running 
update, which isn't the case. The {{updateCDCDirectorySize}} body should be 
wrapped in {{try ... finally}} as well to do that (see the sketch after this 
list). 
- You could use a scheduled executor to avoid the explicit delays, or a 
{{RateLimiter}} instead of the delay (we'd prefer to update ASAP when 
triggered, but not too often).
- 
[{{updateCDCOverflowSize}}|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8844_review#diff-878dc31866184d5ef750ccd9befc8382R227]:
 use {{while (!reCalculating.compareAndSet(false, true)) {};}}. You should 
reset the value afterwards.
- I don't get the {{DirectorySizeCalculator}}. Why the {{alive}} and 
{{visited}} sets, the {{listFiles}} step? Either list the files and just loop 
through them, or do the {{walkFileTree}} operation -- you are now doing the 
same work twice. Use a plain long instead of the atomic as the class is still 
thread-unsafe.
- {{CDCSizeCalculator.calculateSize}} should return the size, and maybe be made 
synchronized for a bit of additional safety.
- [Scrubber 
change|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8844_review#diff-30afe7671ae9073cb81bb7c364d37f3fR327]
 should be reverted.
- "Permissible" changed to "permissable" at some places in the code; the latter 
is a misspelling.
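
To illustrate the CAS-guard points above, here is a minimal sketch of the 
suggested pattern. The class is invented for illustration; only the 
{{reCalculating}} flag and {{updateCDCDirectorySize}} names are taken from the 
review, so treat it as a sketch of the idea rather than the actual patch:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class CDCSizeTracker
{
    private final AtomicBoolean reCalculating = new AtomicBoolean(false);

    public void updateCDCOverflowSize()
    {
        // Spin until this thread owns the flag.
        while (!reCalculating.compareAndSet(false, true)) {}

        // try/finally guarantees the flag is reset even if the
        // recalculation throws; a bare reset at the end would not.
        try
        {
            updateCDCDirectorySize();
        }
        finally
        {
            reCalculating.set(false);
        }
    }

    private void updateCDCDirectorySize()
    {
        // walk the CDC directory and recompute its size (elided)
    }
}
{code}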

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To 

[jira] [Commented] (CASSANDRA-11264) Repair scheduling - Failure handling and retry

2016-04-14 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241220#comment-15241220
 ] 

Paulo Motta commented on CASSANDRA-11264:
-

After having a look at your original patch I saw that a failed task will be 
re-prioritized against other scheduled jobs/tasks with a high priority (given 
its last run time will not be updated), so that's already a retry mechanism in 
itself.

Rather than cluttering the scheduled repair mechanism with retry logic, I think 
that it's better to add a retry option to (non-scheduled) repair job, and do 
more fine grained retry on individual steps such as validation and sync, since 
this will be more effective against transient failures rather than retrying the 
whole task and potentially losing work of non-failed tasks.

We can of course log warns and gather statistics when a scheduled task fails, 
but I think we should add retry support to repair independently of this. WDYT?

> Repair scheduling - Failure handling and retry
> --
>
> Key: CASSANDRA-11264
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11264
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
>Priority: Minor
>
> Make it possible for repairs to be run again if they fail and clean up the 
> associated resources (validations and streaming sessions) before retrying. 
> Log a warning for each re-attempt and an error if it can't complete in X 
> times. The number of retries before considering the repair a failure could be 
> configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: upgrade netty to 4.0.36

2016-04-14 Thread jake
Repository: cassandra
Updated Branches:
  refs/heads/trunk ccacf7d1a -> a0d070764


upgrade netty to 4.0.36

patch by tjake; reviewed by Jason Brown for CASSANDRA-11567


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a0d07076
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a0d07076
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a0d07076

Branch: refs/heads/trunk
Commit: a0d070764ab9cf0a1eb16d7ffd7d57cbcefd2a82
Parents: ccacf7d
Author: T Jake Luciani 
Authored: Wed Apr 13 13:49:53 2016 -0400
Committer: T Jake Luciani 
Committed: Thu Apr 14 10:31:53 2016 -0400

--
 CHANGES.txt|   1 +
 build.xml  |   2 +-
 lib/netty-all-4.0.34.Final.jar | Bin 2144516 -> 0 bytes
 lib/netty-all-4.0.36.Final.jar | Bin 0 -> 2195921 bytes
 4 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a0d07076/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 329e55c..43d1c3c 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.6
+ * Update Netty to 4.0.36 (CASSANDRA-11567)
  * Fix PER PARTITION LIMIT for queries requiring post-query ordering 
(CASSANDRA-11556)
  * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
  * Support UDT in CQLSSTableWriter (CASSANDRA-10624)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a0d07076/build.xml
--
diff --git a/build.xml b/build.xml
index c6b2246..034fb29 100644
--- a/build.xml
+++ b/build.xml
@@ -411,7 +411,7 @@
-  <dependency groupId="io.netty" artifactId="netty-all" version="4.0.34.Final"/>
+  <dependency groupId="io.netty" artifactId="netty-all" version="4.0.36.Final"/>

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a0d07076/lib/netty-all-4.0.34.Final.jar
--
diff --git a/lib/netty-all-4.0.34.Final.jar b/lib/netty-all-4.0.34.Final.jar
deleted file mode 100644
index 590b429..000
Binary files a/lib/netty-all-4.0.34.Final.jar and /dev/null differ

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a0d07076/lib/netty-all-4.0.36.Final.jar
--
diff --git a/lib/netty-all-4.0.36.Final.jar b/lib/netty-all-4.0.36.Final.jar
new file mode 100644
index 000..5e278c4
Binary files /dev/null and b/lib/netty-all-4.0.36.Final.jar differ



[jira] [Created] (CASSANDRA-11574) COPY FROM command in cqlsh throws error

2016-04-14 Thread Mahafuzur Rahman (JIRA)
Mahafuzur Rahman created CASSANDRA-11574:


 Summary: COPY FROM command in cqlsh throws error
 Key: CASSANDRA-11574
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11574
 Project: Cassandra
  Issue Type: Bug
  Components: CQL
 Environment: Operating System: Ubuntu Server 14.04
JDK: Oracle JDK 8 update 77
Python: 2.7.6
Reporter: Mahafuzur Rahman
 Fix For: 3.0.6


Any COPY FROM command in cqlsh is throwing the following error:

"get_num_processes() takes no keyword arguments"

Example command: 

COPY inboxdata 
(to_user_id,to_user_network,created_time,attachments,from_user_id,from_user_name,from_user_network,id,message,to_user_name,updated_time)
 FROM 'inbox.csv';





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11567) Update netty version

2016-04-14 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani resolved CASSANDRA-11567.

Resolution: Fixed
  Reviewer: Jason Brown

committed {{a0d070764ab9cf0a1eb16d7ffd7d57cbcefd2a82}}

> Update netty version
> 
>
> Key: CASSANDRA-11567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11567
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>Priority: Minor
> Fix For: 3.6
>
>
> Mainly for prereq to CASSANDRA-11421. 
> Netty 4.0.34 -> 4.0.36.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11574) COPY FROM command in cqlsh throws error

2016-04-14 Thread Mahafuzur Rahman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahafuzur Rahman updated CASSANDRA-11574:
-
Description: 
Any COPY FROM command in cqlsh is throwing the following error:

"get_num_processes() takes no keyword arguments"

Example command: 

COPY inboxdata 
(to_user_id,to_user_network,created_time,attachments,from_user_id,from_user_name,from_user_network,id,message,to_user_name,updated_time)
 FROM 'inbox.csv';

Similar commands worked perfectly in previous versions such as 3.0.4.

  was:
Any COPY FROM command in cqlsh is throwing the following error:

"get_num_processes() takes no keyword arguments"

Example command: 

COPY inboxdata 
(to_user_id,to_user_network,created_time,attachments,from_user_id,from_user_name,from_user_network,id,message,to_user_name,updated_time)
 FROM 'inbox.csv';




> COPY FROM command in cqlsh throws error
> ---
>
> Key: CASSANDRA-11574
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11574
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Operating System: Ubuntu Server 14.04
> JDK: Oracle JDK 8 update 77
> Python: 2.7.6
>Reporter: Mahafuzur Rahman
> Fix For: 3.0.6
>
>
> Any COPY FROM command in cqlsh is throwing the following error:
> "get_num_processes() takes no keyword arguments"
> Example command: 
> COPY inboxdata 
> (to_user_id,to_user_network,created_time,attachments,from_user_id,from_user_name,from_user_network,id,message,to_user_name,updated_time)
>  FROM 'inbox.csv';
> Similar commands worked perfectly in previous versions such as 3.0.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Integrated JMX authn & authz

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241276#comment-15241276
 ] 

T Jake Luciani commented on CASSANDRA-10091:


Sorry didn't follow up. +1 assuming CI looks good

> Integrated JMX authn & authz
> 
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently build as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10988) isInclusive and boundsAsComposites take bounds in different order

2016-04-14 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-10988:

Summary: isInclusive and boundsAsComposites take bounds in different order  
(was: ClassCastException in SelectStatement)

> isInclusive and boundsAsComposites take bounds in different order
> -
>
> Key: CASSANDRA-10988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10988
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Vassil Hristov
>Assignee: Alex Petrov
> Fix For: 2.2.x
>
>
> After we've upgraded our cluster to version 2.1.11, we started getting the 
> below exceptions for some of our queries. The issue seems to be very similar 
> to CASSANDRA-7284.
> Code to reproduce:
> {code:java}
> createTable("CREATE TABLE %s (" +
> "a text," +
> "b int," +
> "PRIMARY KEY (a, b)" +
> ") WITH COMPACT STORAGE" +
> "AND CLUSTERING ORDER BY (b DESC)");
> execute("insert into %s (a, b) values ('a', 2)");
> execute("SELECT * FROM %s WHERE a = 'a' AND b > 0");
> {code}
> {code:java}
> java.lang.ClassCastException: 
> org.apache.cassandra.db.composites.Composites$EmptyComposite cannot be cast 
> to org.apache.cassandra.db.composites.CellName
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.cellFromByteBuffer(AbstractCellNameType.java:188)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractSimpleCellNameType.makeCellName(AbstractSimpleCellNameType.java:125)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.makeCellName(AbstractCellNameType.java:254)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.makeExclusiveSliceBound(SelectStatement.java:1197)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.applySliceRestriction(SelectStatement.java:1205)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1283)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1250)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:276)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
> at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-2.1.1

[jira] [Updated] (CASSANDRA-10988) isInclusive and boundsAsComposites in Restriction take bounds in different order

2016-04-14 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-10988:

Summary: isInclusive and boundsAsComposites in Restriction take bounds in 
different order  (was: isInclusive and boundsAsComposites take bounds in 
different order)

> isInclusive and boundsAsComposites in Restriction take bounds in different 
> order
> 
>
> Key: CASSANDRA-10988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10988
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Vassil Hristov
>Assignee: Alex Petrov
> Fix For: 2.2.x
>
>
> After we've upgraded our cluster to version 2.1.11, we started getting the 
> below exceptions for some of our queries. The issue seems to be very similar 
> to CASSANDRA-7284.
> Code to reproduce:
> {code:java}
> createTable("CREATE TABLE %s (" +
> "a text," +
> "b int," +
> "PRIMARY KEY (a, b)" +
> ") WITH COMPACT STORAGE" +
> "AND CLUSTERING ORDER BY (b DESC)");
> execute("insert into %s (a, b) values ('a', 2)");
> execute("SELECT * FROM %s WHERE a = 'a' AND b > 0");
> {code}
> {code:java}
> java.lang.ClassCastException: 
> org.apache.cassandra.db.composites.Composites$EmptyComposite cannot be cast 
> to org.apache.cassandra.db.composites.CellName
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.cellFromByteBuffer(AbstractCellNameType.java:188)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractSimpleCellNameType.makeCellName(AbstractSimpleCellNameType.java:125)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.makeCellName(AbstractCellNameType.java:254)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.makeExclusiveSliceBound(SelectStatement.java:1197)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.applySliceRestriction(SelectStatement.java:1205)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1283)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1250)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:276)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
> at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at org.apache.cass

[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241420#comment-15241420
 ] 

Ben Manes commented on CASSANDRA-11452:
---

For large traces the difference is marginal, with s3 showing a 2% loss. For 
small traces the difference can be substantial (hit rate %, original -> 
patched):
db: 51.29 -> 51.52
s3: 51.10 -> 49.12
oltp: 37.91 -> 38.10
multi1: 55.59 -> 50.50
gli: 34.16 -> 16.11
cs: 30.31 -> 26.74

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly marking compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11428) Eliminate Allocations

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241430#comment-15241430
 ] 

T Jake Luciani commented on CASSANDRA-11428:


Combined the sub-tickets into one patch and removed the copied netty util now 
that CASSANDRA-11567 is in. I also changed the ThreadLocals in CBUtil to be 
FastThreadLocals, since they are accessed from netty FastThreadLocalThreads; I 
can see a slight improvement in performance.

[branch| http://github.com/tjake/cassandra/tree/rm-allocations]
[testall| 
https://cassci.datastax.com/view/trunk/job/tjake-rm-allocations-testall]
[dtest| https://cassci.datastax.com/view/trunk/job/tjake-rm-allocations-dtest]
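
For reference, a minimal sketch of the ThreadLocal-to-FastThreadLocal change 
described above, using a thread-local CharsetEncoder as the example; the class 
and field names are illustrative, not the actual CBUtil code:

{code:java}
import io.netty.util.concurrent.FastThreadLocal;

import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public final class EncoderHolder
{
    // A plain JDK ThreadLocal resolves get() via a hash-map probe.
    // FastThreadLocal resolves to a constant-time indexed lookup when the
    // caller is a netty FastThreadLocalThread, which is what the native
    // transport threads are.
    private static final FastThreadLocal<CharsetEncoder> UTF8_ENCODER =
        new FastThreadLocal<CharsetEncoder>()
        {
            @Override
            protected CharsetEncoder initialValue()
            {
                return StandardCharsets.UTF_8.newEncoder();
            }
        };

    public static CharsetEncoder encoder()
    {
        return UTF8_ENCODER.get();
    }
}
{code}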

> Eliminate Allocations
> -
>
> Key: CASSANDRA-11428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11428
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Nitsan Wakart
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: benchmarks.tar.gz, pom.xml
>
>
> Linking relevant issues under this master ticket.  For small changes I'd like 
> to test and commit these in bulk 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11428) Eliminate Allocations

2016-04-14 Thread Nitsan Wakart (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241440#comment-15241440
 ] 

Nitsan Wakart commented on CASSANDRA-11428:
---

You seem to have dropped the Pair allocation change?
Also, in CBUtil, if we take the netty approach, the thread-local ByteBuffer and 
encoder are not needed.

> Eliminate Allocations
> -
>
> Key: CASSANDRA-11428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11428
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Nitsan Wakart
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: benchmarks.tar.gz, pom.xml
>
>
> Linking relevant issues under this master ticket.  For small changes I'd like 
> to test and commit these in bulk 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11428) Eliminate Allocations

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241443#comment-15241443
 ] 

T Jake Luciani commented on CASSANDRA-11428:


Ah, thanks for that. I force-pushed with those changes and will restart the 
tests.

> Eliminate Allocations
> -
>
> Key: CASSANDRA-11428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11428
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Nitsan Wakart
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: benchmarks.tar.gz, pom.xml
>
>
> Linking relevant issues under this master ticket.  For small changes I'd like 
> to test and commit these in bulk 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11560) dtest failure in user_types_test.TestUserTypes.udt_subfield_test

2016-04-14 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs reassigned CASSANDRA-11560:
---

Assignee: Tyler Hobbs  (was: DS Test Eng)

> dtest failure in user_types_test.TestUserTypes.udt_subfield_test
> 
>
> Key: CASSANDRA-11560
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11560
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: Tyler Hobbs
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1125/testReport/user_types_test/TestUserTypes/udt_subfield_test
> Failed on CassCI build trunk_dtest #1125
> Appears to be a test problem:
> {noformat}
> Error Message
> 'NoneType' object is not iterable
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-Kzg9Sk
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/tools.py", line 253, in wrapped
> f(obj)
>   File "/home/automaton/cassandra-dtest/user_types_test.py", line 767, in 
> udt_subfield_test
> self.assertEqual(listify(rows[0]), [[None]])
>   File "/home/automaton/cassandra-dtest/user_types_test.py", line 25, in 
> listify
> for i in item:
> "'NoneType' object is not iterable\n >> begin captured 
> logging << \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-Kzg9Sk\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11535) Add dtests for PER PARTITION LIMIT queries with paging

2016-04-14 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241473#comment-15241473
 ] 

Alex Petrov commented on CASSANDRA-11535:
-

Merged as 
[2e8b5f7b80ddf4c59bffb2f259fc992b79287028|https://github.com/riptano/cassandra-dtest/commit/2e8b5f7b80ddf4c59bffb2f259fc992b79287028]

> Add dtests for PER PARTITION LIMIT queries with paging
> --
>
> Key: CASSANDRA-11535
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11535
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Minor
>
> [#7017|https://issues.apache.org/jira/browse/CASSANDRA-7017] introduces {{PER 
> PARTITION LIMIT}} queries. In order to ensure they work with paging, with 
> partitions containing only static columns, we need to add {{dtests}} to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11535) Add dtests for PER PARTITION LIMIT queries with paging

2016-04-14 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-11535:

Status: Ready to Commit  (was: Patch Available)

> Add dtests for PER PARTITION LIMIT queries with paging
> --
>
> Key: CASSANDRA-11535
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11535
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Minor
>
> [#7017|https://issues.apache.org/jira/browse/CASSANDRA-7017] introduces {{PER 
> PARTITION LIMIT}} queries. In order to ensure they work with paging, with 
> partitions containing only static columns, we need to add {{dtests}} to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11535) Add dtests for PER PARTITION LIMIT queries with paging

2016-04-14 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-11535:

Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

> Add dtests for PER PARTITION LIMIT queries with paging
> --
>
> Key: CASSANDRA-11535
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11535
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Minor
>
> [#7017|https://issues.apache.org/jira/browse/CASSANDRA-7017] introduces {{PER 
> PARTITION LIMIT}} queries. In order to ensure they work with paging, with 
> partitions containing only static columns, we need to add {{dtests}} to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11428) Eliminate Allocations

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241487#comment-15241487
 ] 

T Jake Luciani commented on CASSANDRA-11428:


Actually, it looks like we can remove the changes you made to decodeString: the 
CASSANDRA-8101 workaround is no longer needed (fixed in netty 4.0.35).

> Eliminate Allocations
> -
>
> Key: CASSANDRA-11428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11428
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Nitsan Wakart
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: benchmarks.tar.gz, pom.xml
>
>
> Linking relevant issues under this master ticket.  For small changes I'd like 
> to test and commit these in bulk 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11416) No longer able to load backups into new cluster if there was a dropped column

2016-04-14 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241503#comment-15241503
 ] 

Aleksey Yeschenko commented on CASSANDRA-11416:
---

True. I'm looking into options, and not a fan of any of them tbh. The easiest 
would be to include {{ALTER TABLE DROP}} output in {{DESCRIBE}}, and have a 
variant of it that accepts the timestamp of the drop.

> No longer able to load backups into new cluster if there was a dropped column
> -
>
> Key: CASSANDRA-11416
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11416
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.x
>
>
> The following change to the sstableloader test works in 2.1/2.2 but fails in 
> 3.0+
> https://github.com/JeremiahDJordan/cassandra-dtest/commit/7dc66efb8d24239f0a488ec5a613240531aeb7db
> {code}
> CREATE TABLE test_drop (key text PRIMARY KEY, c1 text, c2 text, c3 text, c4 
> text)
> ...insert data...
> ALTER TABLE test_drop DROP c4
> ...insert more data...
> {code}
> Make a snapshot and save off a describe to backup table test_drop.
> Decide to restore the snapshot to a new cluster.   First restore the schema 
> from describe. (column c4 isn't there)
> {code}
> CREATE TABLE test_drop (key text PRIMARY KEY, c1 text, c2 text, c3 text)
> {code}
> sstableload the snapshot data.
> Works in 2.1/2.2.  Fails in 3.0+ with:
> {code}
> java.lang.RuntimeException: Unknown column c4 during deserialization
> java.lang.RuntimeException: Failed to list files in 
> /var/folders/t4/rlc2b6450qbg92762l9l4mt8gn/T/dtest-3eKv_g/test/node1/data1_copy/ks/drop_one-bcef5280f11b11e5825a43f0253f18b5
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
>   at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:544)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:76)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:165)
>   at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:104)
> Caused by: java.lang.RuntimeException: Unknown column c4 during 
> deserialization
>   at 
> org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.openForBatch(SSTableReader.java:430)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.lambda$openSSTables$193(SSTableLoader.java:121)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.lambda$innerList$184(LogAwareFileLister.java:75)
>   at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at 
> java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2965)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:77)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:49)
>   ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)

2016-04-14 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241510#comment-15241510
 ] 

Carl Yeksigian commented on CASSANDRA-8844:
---

While working on a separate issue related to being over the CDC limit, I 
realized that currently a keyspace could have CDC-enabled tables in some DCs 
and have {{durable_writes=false}}. This would mean that we would not be writing 
to the CDC logs in any of our DCs. We can either:
# Add the CDC local-DC check in {{Mutation#apply()}}, where we currently only 
check whether the keyspace has durable writes
# Validate that CDC isn't used with {{durable_writes=false}} keyspaces

Option 1 seems more in line with CDC - allowing the performance cost to affect 
operations in a single datacenter only. However, we would also probably have to 
replay the CDC logs on startup even though {{durable_writes=false}}; otherwise 
there would be data in the CDC log that doesn't exist in the cluster.
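
A rough sketch of what option 1 amounts to; the helper names are hypothetical 
stand-ins for the real metadata lookups, not the actual patch:

{code:java}
public final class CdcWriteCheck
{
    // Hypothetical view of the keyspace metadata consulted in Mutation#apply().
    interface KeyspaceMeta
    {
        boolean durableWrites();
        boolean hasCdcEnabledTableInLocalDC();
    }

    /**
     * Option 1: write the commit log (and hence the CDC log) when the
     * keyspace is durable OR an affected table has CDC enabled in the
     * local DC, even if durable_writes=false.
     */
    static boolean shouldWriteCommitLog(KeyspaceMeta ks)
    {
        return ks.durableWrites() || ks.hasCdcEnabledTableInLocalDC();
    }
}
{code}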

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non JVM languages
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred for a subsequent release, and to guage 
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters 
> would make Casandra a much more versatile feeder into other systems, and 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responibility
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it 
> triivial t

[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241521#comment-15241521
 ] 

Ben Manes commented on CASSANDRA-11452:
---

I assumed that it would be acceptable to reduce the penalty when a clash was 
detected. The current version ejects the victim so that the candidates flow 
through the probation space. I think that should be similar to your >= 
approach, without reducing the hit rate in the small traces. Can you review the 
[patch|https://github.com/ben-manes/caffeine/commit/22ce6339ec91fd7eadfb462fcb176aac69aeb47f]?

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly marking compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241524#comment-15241524
 ] 

Ben Manes commented on CASSANDRA-11452:
---

Sorry if I'm being a bit obtuse. If you write a short snippet I can try 
applying that approach.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly marking compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11264) Repair scheduling - Failure handling and retry

2016-04-14 Thread Marcus Olsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241559#comment-15241559
 ] 

Marcus Olsson commented on CASSANDRA-11264:
---

bq. After having a look at your original patch I saw that a failed task will be 
re-prioritized against other scheduled jobs/tasks with a high priority (given 
its last run time will not be updated), so that's already a retry mechanism in 
itself.
While this is true, I believe that this part should probably be reworked a bit. 
If we have a scenario where one particular job will always fail, we will end up 
in a loop where that job gets retried constantly, which leads to starvation of 
other jobs. One option is to keep it simple and only run it once (by removing 
the retry logic) and also add a flag for the job which is used to determine 
when the job is allowed to run again. Something like:
{code}
execute()
{
  runTasks();
  if (allTasksWereSuccessful())
  {
    nextRun = -1;       // no back-off needed
    lastRunTime = now;  // completed, so its priority drops until next period
  }
  else
  {
    // leave lastRunTime untouched; just delay re-prioritization
    nextRun = now + defaultWaitTime;
  }
}
{code}
Then that flag would be used to avoid prioritizing the failing job against the 
other jobs until the {{defaultWaitTime}} has elapsed. This flag could also work 
nicely with the rejection policies (assuming that they estimate the time until 
the job can actually be run), especially if we would be able to reject repairs 
on a specific table rather than all tables. WDYT?

bq. Rather than cluttering the scheduled repair mechanism with retry logic, I 
think that it's better to add a retry option to (non-scheduled) repair job, and 
do more fine grained retry on individual steps such as validation and sync, 
since this will be more effective against transient failures rather than 
retrying the whole task and potentially losing work of non-failed tasks.
Great idea! If e.g. a validation fails on one node, would we clean up the 
resources on that node via CASSANDRA-11190 (which is specifically about 
cleaning up resources, so that we can safely retry) or would we need a separate 
way of doing that? 

bq. We can of course log warns and gather statistics when a scheduled task 
fails, but I think we should add retry support to repair independently of this. 
WDYT?
Sounds good!

> Repair scheduling - Failure handling and retry
> --
>
> Key: CASSANDRA-11264
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11264
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
>Priority: Minor
>
> Make it possible for repairs to be run again if they fail and clean up the 
> associated resources (validations and streaming sessions) before retrying. 
> Log a warning for each re-attempt and an error if it can't complete in X 
> times. The number of retries before considering the repair a failure could be 
> configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11416) No longer able to load backups into new cluster if there was a dropped column

2016-04-14 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241558#comment-15241558
 ] 

Jeremiah Jordan commented on CASSANDRA-11416:
-

Maybe we should just log a warning/error about the columns instead of throwing 
an exception, and then ignore them? I.e. assume they are there because someone 
dropped them in a previous life.

> No longer able to load backups into new cluster if there was a dropped column
> -
>
> Key: CASSANDRA-11416
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11416
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeremiah Jordan
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.x
>
>
> The following change to the sstableloader test works in 2.1/2.2 but fails in 
> 3.0+
> https://github.com/JeremiahDJordan/cassandra-dtest/commit/7dc66efb8d24239f0a488ec5a613240531aeb7db
> {code}
> CREATE TABLE test_drop (key text PRIMARY KEY, c1 text, c2 text, c3 text, c4 
> text)
> ...insert data...
> ALTER TABLE test_drop DROP c4
> ...insert more data...
> {code}
> Make a snapshot and save off a describe to backup table test_drop.
> Decide to restore the snapshot to a new cluster.   First restore the schema 
> from describe. (column c4 isn't there)
> {code}
> CREATE TABLE test_drop (key text PRIMARY KEY, c1 text, c2 text, c3 text)
> {code}
> sstableload the snapshot data.
> Works in 2.1/2.2.  Fails in 3.0+ with:
> {code}
> java.lang.RuntimeException: Unknown column c4 during deserialization
> java.lang.RuntimeException: Failed to list files in 
> /var/folders/t4/rlc2b6450qbg92762l9l4mt8gn/T/dtest-3eKv_g/test/node1/data1_copy/ks/drop_one-bcef5280f11b11e5825a43f0253f18b5
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
>   at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:544)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:76)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:165)
>   at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:104)
> Caused by: java.lang.RuntimeException: Unknown column c4 during 
> deserialization
>   at 
> org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.openForBatch(SSTableReader.java:430)
>   at 
> org.apache.cassandra.io.sstable.SSTableLoader.lambda$openSSTables$193(SSTableLoader.java:121)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.lambda$innerList$184(LogAwareFileLister.java:75)
>   at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at 
> java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2965)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:77)
>   at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:49)
>   ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11523) server side exception on secondary index query through thrift

2016-04-14 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-11523:

Status: Patch Available  (was: In Progress)

The problem is that when the indexed column is not covered by the Thrift 
query's {{slice_predicate}}, it isn't included in the data retrieved from the 
base table using the key obtained from the index lookup. The NPE is coming from 
the staleness check, which expects that data to be present in the base table. 
It only affects Thrift queries, not CQL queries against the same table. 

I've pushed branches off 3.0 & trunk with a simple fix to {{KeysSearcher}} 
that essentially mimics the pre-2.2 behaviour: {{ExtendedFilter}} added any 
columns required for the read to the filter, then pruned them from the results 
returned to the user. 

[~yngwiie], [this 
patch|https://github.com/beobal/cassandra/commit/517e2e78a618d0e9c6225f9b27ed837450bdcc80.patch]
 applies cleanly to 3.0.4. If possible, would you mind applying it and checking 
that it works for you?

Pull request to add a new dtest for this issue is 
[here|https://github.com/riptano/cassandra-dtest/pull/926]


||branch||testall||dtest||
|[11523-3.0|https://github.com/beobal/cassandra/tree/11523-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11523-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11523-3.0-dtest]|
|[11523-trunk|https://github.com/beobal/cassandra/tree/11523-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11523-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11523-trunk-dtest]|


> server side exception on secondary index query through thrift
> -
>
> Key: CASSANDRA-11523
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11523
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: linux opensuse 13.2, jdk8
>Reporter: Ivan Georgiev
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.x, 3.x
>
>
> Trying to upgrade from 2.x to 3.x, using 3.0.4 for the purpose. We are using 
> the thrift interface for the time being. Everything works fine except for 
> secondary index queries. 
> When doing a get_range_slices call with row_filter set in the KeyRange we get 
> a server side exception. Here is a trace of the exception:
> INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.401 [Thrift:12] DEBUG 
> o.a.cassandra.service.ReadCallback - Failed; received 0 of 1 responses
> INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.401 [SharedPool-Worker-1] 
> WARN  o.a.c.c.AbstractLocalAwareExecutorService - Uncaught exception on 
> thread Thread[SharedPool-Worker-1,5,main]: {}
> INFO   | jvm 1| 2016/04/07 14:56:35 | java.lang.RuntimeException: 
> java.lang.NullPointerException
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2450)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> INFO   | jvm 1| 2016/04/07 14:56:35 | Caused by: 
> java.lang.NullPointerException: null
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSearcher.java:155)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.index.internal.keys.KeysSearcher.access$300(KeysSearcher.java:36)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSearcher.java:104)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher.java:70)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> INFO   | jvm 1| 2016/04/07 14:56:35 | at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitio

[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241560#comment-15241560
 ] 

Benedict commented on CASSANDRA-11452:
--

Something like [this|https://github.com/belliottsmith/caffeine/tree/random-hack]

I haven't checked it works as I haven't time to get it all compiling etc, but 
it should clearly demonstrate what I'm talking about.

It looks like Caffeine has changed a great deal since I last looked at it! I'll 
have to have a poke around when I have time.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11422) Eliminate temporary object[] allocations in ColumnDefinition::hashCode

2016-04-14 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-11422:
---
Status: Patch Available  (was: Open)

> Eliminate temporary object[] allocations in ColumnDefinition::hashCode
> --
>
> Key: CASSANDRA-11422
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11422
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
>
> ColumnDefinition::hashCode currently calls Objects.hashCode(Object...).
> This triggers the allocation of a short-lived Object[] which is not 
> eliminated by escape analysis. I have implemented a fix by inlining the 
> hashCode logic and also adding a cached hashCode field. This improved 
> performance on the read workload.
> Fix is available here:
> https://github.com/nitsanw/cassandra/tree/objects-hashcode-fix
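
A minimal sketch of the kind of change described, with hypothetical class and 
field names (the actual fix is in the branch above):

{code}
// Sketch only: replaces Objects.hashCode(Object...) with inlined hash logic
// plus a cached field, avoiding the Object[] varargs allocation per call.
public final class HashCacheExample
{
    private final Object a;
    private final Object b;
    private volatile int hash; // 0 doubles as "not yet computed"

    public HashCacheExample(Object a, Object b)
    {
        this.a = a;
        this.b = b;
    }

    @Override
    public int hashCode()
    {
        int h = hash;
        if (h == 0)
        {
            // Inlined equivalent of Objects.hash(a, b), without the varargs array
            h = 31 * (31 + (a == null ? 0 : a.hashCode())) + (b == null ? 0 : b.hashCode());
            hash = h; // benign race: all threads compute the same value
        }
        return h;
    }
}
{code}

With the cached field, the inlined computation runs at most once per instance 
on the hot read path.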



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11423) Eliminate Pair allocations for default DataType conversions

2016-04-14 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-11423:
---
Status: Patch Available  (was: Open)

> Eliminate Pair allocations for default DataType conversions
> ---
>
> Key: CASSANDRA-11423
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11423
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
>
> The method DataType::fromType returns a Pair. The common path through the 
> method is:
> {code}
> DataType dt = dataTypeMap.get(type);
> return new Pair(dt, null);
> {code}
> This results in many redundant allocations and is easy to fix by adding a 
> DataType field to cache this result per DataType and replacing the last line 
> with:
> {code}
> return dt.pair;
> {code}
> see fix:
> https://github.com/nitsanw/cassandra/tree/data-type-dafault-pair
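
A self-contained sketch of the caching idea, with invented names (the actual 
fix is in the branch above):

{code}
// Sketch of the fix: cache the common (dt, null) Pair once per DataType
// instead of allocating a new Pair on every fromType() call.
// All names here are illustrative, not the driver's actual API.
enum ExampleDataType
{
    INT, TEXT;

    static final class Pair<L, R>
    {
        final L left;
        final R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
    }

    // Allocated once per enum constant
    final Pair<ExampleDataType, Object> pair = new Pair<>(this, null);

    static Pair<ExampleDataType, Object> fromType(ExampleDataType type)
    {
        return type.pair; // common path: no allocation
    }
}
{code}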



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables

2016-04-14 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-9830:
---
Status: Open  (was: Patch Available)

> Option to disable bloom filter in highest level of LCS sstables
> ---
>
> Key: CASSANDRA-9830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Jonathan Ellis
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: performance
> Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully 
> populated series.  (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually 
> been inserted, the bloom filter on the highest level only helps reject 
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level 
> sstables.  This will dramatically reduce memory usage for LCS and may even 
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11555) Make prepared statement cache size configurable

2016-04-14 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp reassigned CASSANDRA-11555:


Assignee: Robert Stupp

> Make prepared statement cache size configurable
> ---
>
> Key: CASSANDRA-11555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11555
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
>
> The prepared statement caches in {{org.apache.cassandra.cql3.QueryProcessor}} 
> are configured using the formula {{Runtime.getRuntime().maxMemory() / 256}}. 
> Sometimes applications may need more than that. The proposal is to make that 
> value configurable - probably also distinguishing thrift and native CQL3 
> queries (new applications don't need the thrift stuff).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11555) Make prepared statement cache size configurable

2016-04-14 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11555:
-
Status: Patch Available  (was: Open)

[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11555-pstmt-cache-config-trunk]
[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11555-pstmt-cache-config-trunk-testall/lastBuild/]
[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11555-pstmt-cache-config-trunk-dtest/lastBuild/]
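
For context, a minimal sketch of how such a setting could be wired up; the 
property name below is hypothetical, not the option introduced by the patch:

{code}
// Sketch only: fall back to the current formula when no explicit size is set.
public class PstmtCacheSizeExample
{
    public static void main(String[] args)
    {
        long defaultBytes = Runtime.getRuntime().maxMemory() / 256;
        // Property name invented for illustration
        long capacityBytes = Long.getLong("example.prepared_statements_cache_size_bytes",
                                          defaultBytes);
        System.out.println("prepared statement cache capacity: " + capacityBytes + " bytes");
    }
}
{code}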


> Make prepared statement cache size configurable
> ---
>
> Key: CASSANDRA-11555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11555
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
>
> The prepared statement caches in {{org.apache.cassandra.cql3.QueryProcessor}} 
> are configured using the formula {{Runtime.getRuntime().maxMemory() / 256}}. 
> Sometimes applications may need more than that. The proposal is to make that 
> value configurable - probably also distinguishing thrift and native CQL3 
> queries (new applications don't need the thrift stuff).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11552) Reduce amount of logging calls from ColumnFamilyStore.selectAndReference

2016-04-14 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp reassigned CASSANDRA-11552:


Assignee: Robert Stupp

> Reduce amount of logging calls from ColumnFamilyStore.selectAndReference
> 
>
> Key: CASSANDRA-11552
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11552
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>
> {{org.apache.cassandra.db.ColumnFamilyStore#selectAndReference}} logs two 
> messages at _info_ level "as fast as it can" if it waits for more than 100ms.
> The following code is executed in a while-true fashion in this case:
> {code}
> logger.info("Spinning trying to capture released readers {}", 
> released);
> logger.info("Spinning trying to capture all readers {}", 
> view.sstables);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11552) Reduce amount of logging calls from ColumnFamilyStore.selectAndReference

2016-04-14 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11552:
-
Status: Patch Available  (was: Open)

Changed the code to use {{NoSpamLogger}}. The patch is against 2.1 and merges 
cleanly up to trunk.
Normally this piece of code is completely irrelevant, but if referencing the 
sstables fails, it can log many MB per second, effectively rotating the 
root-cause messages out of the log files.

||branch||testall||dtest||
|[2.1|https://github.com/apache/cassandra/compare/cassandra-2.1...snazy:11552-selectAndRef-spin-spam-2.1?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-2.1-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-2.1-dtest/lastBuild/]
|[2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...snazy:11552-selectAndRef-spin-spam-2.2?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-2.2-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-2.2-dtest/lastBuild/]
|[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:11552-selectAndRef-spin-spam-3.0?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-3.0-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-3.0-dtest/lastBuild/]
|[trunk|https://github.com/apache/cassandra/compare/trunk...snazy:11552-selectAndRef-spin-spam-trunk?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-trunk-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11552-selectAndRef-spin-spam-trunk-dtest/lastBuild/]
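
For illustration, a sketch of the replacement pattern, assuming the static 
{{NoSpamLogger.log}} API as it appears in the Cassandra tree (treat the exact 
signature as an assumption):

{code}
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.cassandra.utils.NoSpamLogger;

public class SpinLogExample
{
    private static final Logger logger = LoggerFactory.getLogger(SpinLogExample.class);

    // Called from the retry loop: emits at most one of each message per
    // second no matter how fast the loop spins.
    void logSpin(Object released, Object sstables)
    {
        NoSpamLogger.log(logger, NoSpamLogger.Level.INFO, 1, TimeUnit.SECONDS,
                         "Spinning trying to capture released readers {}", released);
        NoSpamLogger.log(logger, NoSpamLogger.Level.INFO, 1, TimeUnit.SECONDS,
                         "Spinning trying to capture all readers {}", sstables);
    }
}
{code}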


> Reduce amount of logging calls from ColumnFamilyStore.selectAndReference
> 
>
> Key: CASSANDRA-11552
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11552
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>
> {{org.apache.cassandra.db.ColumnFamilyStore#selectAndReference}} logs two 
> messages at _info_ level "as fast as it can" if it waits for more than 100ms.
> The following code is executed in a while-true fashion in this case:
> {code}
> logger.info("Spinning trying to capture released readers {}", 
> released);
> logger.info("Spinning trying to capture all readers {}", 
> view.sstables);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7186) alter table add column not always propogating

2016-04-14 Thread Uttam Phalnikar (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241725#comment-15241725
 ] 

Uttam Phalnikar commented on CASSANDRA-7186:


We are experiencing a similar issue intermittently. It usually happens when the 
table has some data (m+ records).

Steps to reproduce:
- Alter table add column
- nodetool describecluster to verify the nodes are in sync
- desc table from any node to verify column is added to the table
- select * from table limit 1 doesn't show the column
- insert into table (id, ...) values ('some-id', ...

> alter table add column not always propogating
> -
>
> Key: CASSANDRA-7186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7186
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Martin Meyer
>Assignee: Philip Thompson
> Fix For: 2.0.12
>
>
> I've seen many times in Cassandra 2.0.6 that adding columns to existing 
> tables seems to not fully propagate to our entire cluster. We add an extra 
> column to various tables maybe 0-2 times a week, and so far many of these 
> ALTERs have resulted in at least one node showing the old table description 
> for a pretty long time (~30 mins) after the original ALTER command was issued.
> We originally identified this issue when a connected client would complain 
> that a column it issued a SELECT for wasn't a known column, at which point we 
> have to ask each node to describe the most recently altered table. One of 
> them will not know about the newly added field. Issuing the original ALTER 
> statement on that node makes everything work correctly.
> We have seen this issue on multiple tables (we don't always alter the same 
> one). It has affected various nodes in the cluster (not always the same one 
> is not getting the mutation propagated). No new nodes have been added to the 
> cluster recently. All nodes are homogenous (hardware and software), running 
> 2.0.6. We don't see any particular errors or exceptions on the node that 
> didn't get the schema update, only the later error from a Java client about 
> asking for an unknown column in a SELECT. We have to check each node manually 
> to find the offender. The tables we have seen this on are under fairly heavy 
> read and write load, but we haven't altered any tables that are not, so that 
> might not be important.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10547) Updating a CQL List many times creates many tombstones

2016-04-14 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241742#comment-15241742
 ] 

Alex Petrov commented on CASSANDRA-10547:
-

You can upgrade to (at least) 2.1.13; the issue doesn't appear there anymore. 
I've run similar tests against 2.1.5 and 2.1.13.

2.1.5:
{code}
Read 1 live and 23 tombstoned cells [SharedPool-Worker-3] | 2016-04-14 
21:05:09.391000 | 127.0.0.1 |
{code}

2.1.13
{code}
Read 1 live and 0 tombstone cells [SharedPool-Worker-3] | 2016-04-14 
21:01:01.666000 | 127.0.0.1 |
{code}

Issue doesn't appear on {{3.x}} either. 

> Updating a CQL List many times creates many tombstones 
> ---
>
> Key: CASSANDRA-10547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10547
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.9, Java driver 2.1.5
>Reporter: James Bishop
>Assignee: Alex Petrov
> Attachments: tombstone.snippet
>
>
> We encountered a TombstoneOverwhelmingException in cassandra system.log which 
> caused some of our CQL queries to fail.
> We are able to reproduce this issue by updating a CQL List column many times. 
> The number of tombstones created seems to be related to (number of list items 
> * number of list updates). We update the entire list on each update using the 
> java driver. (see attached code for details)
> Running nodetool compact does not help, but nodetool flush does. It appears 
> that the tombstones are being accumulated in memory. 
> For example if we update a list of 100 items 1000 times, this creates more  
> than 100,000 tombstones and exceeds the default tombstone_failure_threshold.
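
A hypothetical reproduction sketch along the lines described (keyspace, table, 
and column names invented), using the 2.1 Java driver:

{code}
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ListTombstoneRepro
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build())
        {
            // Assumes: CREATE TABLE ks.t (id text PRIMARY KEY, vals list<int>)
            Session session = cluster.connect("ks");

            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < 100; i++)
                items.add(i);

            // 1000 full overwrites of a 100-item list: on affected versions this
            // accumulates ~100,000 in-memory tombstones until the memtable flushes.
            for (int i = 0; i < 1000; i++)
                session.execute("UPDATE t SET vals = ? WHERE id = ?", items, "some-id");
        }
    }
}
{code}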



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10547) Updating a CQL List many times creates many tombstones

2016-04-14 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241742#comment-15241742
 ] 

Alex Petrov edited comment on CASSANDRA-10547 at 4/14/16 7:08 PM:
--

You can upgrade to (at least) 2.1.13; the issue doesn't appear there anymore. 
I've run similar tests against 2.1.5 and 2.1.13.

2.1.5:
{code}
Read 1 live and 23 tombstoned cells [SharedPool-Worker-3] | 2016-04-14 
21:05:09.391000 | 127.0.0.1 |
{code}

2.1.13
{code}
Read 1 live and 0 tombstone cells [SharedPool-Worker-3] | 2016-04-14 
21:01:01.666000 | 127.0.0.1 |
{code}

Issue doesn't appear on {{3.x}} either. 


was (Author: ifesdjeen):
You can upgrade to (at least) 2.1.13, the issue doesn't appear on it anymore. 
I've ran similar tests against 2.1.5 and 2.1.13.

2.1.5:
{code}
Read 1 live and 23 tombstoned cells [SharedPool-Worker-3] | 2016-04-14 
21:05:09.391000 | 127.0.0.1 |
{code}

2.1.13
{code}
Read 1 live and 0 tombstone cells [SharedPool-Worker-3] | 2016-04-14 
21:01:01.666000 | 127.0.0.1 |
{code}

Issue doesn't appear on {3.x} either. 

> Updating a CQL List many times creates many tombstones 
> ---
>
> Key: CASSANDRA-10547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10547
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.9, Java driver 2.1.5
>Reporter: James Bishop
>Assignee: Alex Petrov
> Fix For: 2.1.13
>
> Attachments: tombstone.snippet
>
>
> We encountered a TombstoneOverwhelmingException in cassandra system.log which 
> caused some of our CQL queries to fail.
> We are able to reproduce this issue by updating a CQL List column many times. 
> The number of tombstones created seems to be related to (number of list items 
> * number of list updates). We update the entire list on each update using the 
> java driver. (see attached code for details)
> Running nodetool compact does not help, but nodetool flush does. It appears 
> that the tombstones are being accumulated in memory. 
> For example if we update a list of 100 items 1000 times, this creates more  
> than 100,000 tombstones and exceeds the default tombstone_failure_threshold.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-10547) Updating a CQL List many times creates many tombstones

2016-04-14 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov resolved CASSANDRA-10547.
-
   Resolution: Resolved
Fix Version/s: 2.1.13

> Updating a CQL List many times creates many tombstones 
> ---
>
> Key: CASSANDRA-10547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10547
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.9, Java driver 2.1.5
>Reporter: James Bishop
>Assignee: Alex Petrov
> Fix For: 2.1.13
>
> Attachments: tombstone.snippet
>
>
> We encountered a TombstoneOverwhelmingException in cassandra system.log which 
> caused some of our CQL queries to fail.
> We are able to reproduce this issue by updating a CQL List column many times. 
> The number of tombstones created seems to be related to (number of list items 
> * number of list updates). We update the entire list on each update using the 
> java driver. (see attached code for details)
> Running nodetool compact does not help, but nodetool flush does. It appears 
> that the tombstones are being accumulated in memory. 
> For example if we update a list of 100 items 1000 times, this creates more  
> than 100,000 tombstones and exceeds the default tombstone_failure_threshold.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5977) Structure for cfstats output (JSON, YAML, or XML)

2016-04-14 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241751#comment-15241751
 ] 

Yuki Morishita commented on CASSANDRA-5977:
---

Thanks, and I like the change to snake case.

I uploaded your patch and am running tests. If the tests are good, I will commit.

||branch||testall||dtest||
|[5977|https://github.com/yukim/cassandra/tree/5977]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-5977-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-5977-dtest/lastCompletedBuild/testReport/]|

I changed the code a bit for styling.
If we can upgrade jackson to 2.0 and have annotations, it will be much cleaner, 
but for now, the patch is sufficient.

> Structure for cfstats output (JSON, YAML, or XML)
> -
>
> Key: CASSANDRA-5977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5977
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Alyssa Kwan
>Assignee: Shogo Hoshii
>Priority: Minor
>  Labels: Tools
> Fix For: 3.x
>
> Attachments: CASSANDRA-5977-trunk.patch, CASSANDRA-5977-trunk.patch, 
> sample_result.zip, sample_result.zip, tablestats_sample_result.json, 
> tablestats_sample_result.txt, tablestats_sample_result.yaml, 
> trunk-tablestats.patch, trunk-tablestats.patch
>
>
> nodetool cfstats should take a --format arg that structures the output in 
> JSON, YAML, or XML.  This would be useful for piping into another script that 
> can easily parse this and act on it.  It would also help those of us who use 
> things like MCollective gather aggregate stats across clusters/nodes.
> Thoughts?  I can submit a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect

2016-04-14 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11553:

Status: Patch Available  (was: Open)

> hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
> ---
>
> Key: CASSANDRA-11553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11553
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Artem Aliev
>Assignee: Artem Aliev
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: CASSANDRA-11553-2.2.txt
>
>
> CASSANDRA-10058 added session and cluster close calls to all places in hadoop 
> except one place, on reconnection.
> The writer uses one connection per new cluster, so I added a cluster.close() 
> call to the sessionClose() method.
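
A simplified sketch of the shape of the fix (class and field names invented; 
the real change is in the attached patch):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch: the writer owns one Cluster per (re)connect, so closing only the
// Session leaks the Cluster's resources. Close both together.
final class WriterConnection
{
    private final Cluster cluster;
    private final Session session;

    WriterConnection(Cluster cluster, Session session)
    {
        this.cluster = cluster;
        this.session = session;
    }

    void close()
    {
        if (session != null)
            session.close();
        if (cluster != null)
            cluster.close(); // the previously missing call
    }
}
{code}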



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11485) ArithmeticException in avgFunctionForDecimal

2016-04-14 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11485:
-
 Assignee: Robert Stupp
Fix Version/s: 3.0.x
   Status: Patch Available  (was: Open)

1 divided by 3 - eh, yea.

I've changed the avg() for decimal to use RoundingMode.HALF_EVEN in the patch.

Since we cannot pass a "parameter" to an aggregation (the API doesn't support 
that at the moment), the way to use another rounding mode would be to implement 
a UDA.


||branch||testall||dtest||
|[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:11485-avg-decimal-round-3.0?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11485-avg-decimal-round-3.0-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11485-avg-decimal-round-3.0-dtest/lastBuild/]|
|[trunk|https://github.com/apache/cassandra/compare/trunk...snazy:11485-avg-decimal-round-trunk?expand=1]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11485-avg-decimal-round-trunk-testall/lastBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11485-avg-decimal-round-trunk-dtest/lastBuild/]|
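
For illustration, the failure mode and the effect of an explicit rounding mode 
(the scale here is arbitrary; the patch may choose precision differently):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class AvgDecimalExample
{
    public static void main(String[] args)
    {
        BigDecimal sum = BigDecimal.ONE;
        BigDecimal count = BigDecimal.valueOf(3);

        // Throws ArithmeticException: non-terminating decimal expansion
        // BigDecimal avg = sum.divide(count);

        // With an explicit scale and RoundingMode.HALF_EVEN it succeeds:
        BigDecimal avg = sum.divide(count, 16, RoundingMode.HALF_EVEN);
        System.out.println(avg); // 0.3333333333333333
    }
}
{code}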


> ArithmeticException in avgFunctionForDecimal
> 
>
> Key: CASSANDRA-11485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11485
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Nico Haller
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.0.x
>
>
> I am running into issues when using avg in queries on decimal values.
> It throws an ArithmeticException in 
> org/apache/cassandra/cql3/functions/AggregateFcts.java (Line 184).
> So whenever an exact representation of the quotient is not possible it will 
> throw that error and it never returns to the querying client.
> I am not so sure if this is intended behavior or a bug, but in my opinion if 
> an exact representation of the value is not possible, it should automatically 
> round the value.
> Specifying a rounding mode when calling the divide function should solve the 
> issue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11437) Make number of cores used for copy tasks visible

2016-04-14 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241796#comment-15241796
 ] 

Jim Witschey commented on CASSANDRA-11437:
--

Sorry for the delay, and thank you for the ping. I'm +1, looks great. I ran it 
locally and on a parameterized job here:

http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/67/

> Make number of cores used for copy tasks visible
> 
>
> Key: CASSANDRA-11437
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11437
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jim Witschey
>Assignee: Stefania
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> As per this conversation with [~Stefania]:
> https://github.com/riptano/cassandra-dtest/pull/869#issuecomment-200597829
> we don't currently have a way to verify that the test environment variable 
> {{CQLSH_COPY_TEST_NUM_CORES}} actually affects the behavior of {{COPY}} in 
> the intended way. If this were added, we could make our tests of the one-core 
> edge case a little stricter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241560#comment-15241560
 ] 

Benedict edited comment on CASSANDRA-11452 at 4/14/16 7:57 PM:
---

Something like [this|https://github.com/belliottsmith/caffeine/tree/random-hack]

edit: repushed to fix an obvious bug (still not run though)

I haven't checked it works as I haven't time to get it all compiling etc, but 
it should clearly demonstrate what I'm talking about.

It looks like Caffeine has changed a great deal since I last looked at it! I'll 
have to have a poke around when I have time.


was (Author: benedict):
Something like [this|https://github.com/belliottsmith/caffeine/tree/random-hack]

I haven't checked it works as I haven't time to get it all compiling etc, but 
it should clearly demonstrate what I'm talking about.

It looks like Caffeine has changed a great deal since I last looked at it! I'll 
have to have a poke around when I have time.

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11575) Add out-of-process testing for CDC

2016-04-14 Thread Carl Yeksigian (JIRA)
Carl Yeksigian created CASSANDRA-11575:
--

 Summary: Add out-of-process testing for CDC
 Key: CASSANDRA-11575
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11575
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Carl Yeksigian
Assignee: Carl Yeksigian


There are currently no dtests for the new cdc feature. We should have some, at 
least to ensure that the cdc files have a lifecycle that makes sense, and that 
things like a continually-cleaning daemon and a lazy daemon have the properties 
we expect; for this we don't need to actually process the files, only verify 
that they fit the characteristics we expect of them. A more complex daemon 
would need to be written in Java.

I already hit a problem where, if cdc is over capacity, it properly throws the 
WriteTimeoutException (WTE), but it does not reset after the overflow directory 
drops back under its size limit. It is supposed to correct the size within 
250ms and allow more writes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache

2016-04-14 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241872#comment-15241872
 ] 

Ben Manes commented on CASSANDRA-11452:
---

Thanks. I won't have the bandwidth to test this until the evening. Roy flew 
into SF for a conference (from Israel), so we're going to meet. If you have any 
questions for me to discuss with him, I'll proxy.

At a quick glance, your trick has a nice distribution. 1M iterations into a 
multiset showed:
[0 x 750485, 1 x 186958, 2 x 46910, 3 x 11731, 4 x 2901, 5 x 776, 6 x 171, 7 x 
49, 8 x 15, 9 x 3, 11]

I'd probably apply the jitter to the selection of the victim near the top of 
the loop and add a check to handle zero-weight entries. I'll take care of that 
part.

It seems like we'd need both your jitter and the hash check added in the prior 
commit. It does sound like the combination would be an effective guard against 
this type of attack. Do you think the random seed used by the sketch is still a 
good addition?
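
For reference, those counts match P(k) = (3/4)*(1/4)^k, which is exactly the 
distribution you get by halving the number of trailing zeros of a uniform 
random int; a sketch under that assumption (not checked against the linked 
branch):

{code}
import java.util.Random;

public class JitterDistribution
{
    public static void main(String[] args)
    {
        // P(k) = (3/4) * (1/4)^k: expect ~750k zeros, ~187k ones, ~47k twos...
        int[] counts = new int[12];
        Random rnd = new Random();
        for (int i = 0; i < 1_000_000; i++)
        {
            int k = Integer.numberOfTrailingZeros(rnd.nextInt()) >>> 1;
            counts[Math.min(k, counts.length - 1)]++;
        }
        for (int k = 0; k < counts.length; k++)
            System.out.println(k + " x " + counts[k]);
    }
}
{code}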

> Cache implementation using LIRS eviction for in-process page cache
> --
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11206) Support large partitions on the 3.0 sstable format

2016-04-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241925#comment-15241925
 ] 

T Jake Luciani commented on CASSANDRA-11206:


* You need to change the sstable version since this change alters the Index 
component.
* Please run dtests/unit tests with column_index_cache_size_in_kb: 0 
* Does the AutoSavingCache change require a step on the user's part, or will it 
naturally skip the saved cache on startup?
* The 0, 1, 2 magic bytes that encode what type of index entry this is should 
be made constants (see the sketch below).
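
On the last point, a sketch of what naming those bytes could look like; the 
constant names and meanings are guesses for illustration, only the values 0, 1, 
2 come from the review comment:

{code}
// Sketch only: named constants for the index-entry type byte.
final class IndexEntryTypes
{
    static final byte NOT_INDEXED     = 0; // partition small enough to need no IndexInfo
    static final byte INDEXED         = 1; // IndexInfo serialized inline
    static final byte SHALLOW_INDEXED = 2; // IndexInfo reached via the offset map
}
{code}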

> Support large partitions on the 3.0 sstable format
> --
>
> Key: CASSANDRA-11206
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11206
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Robert Stupp
> Fix For: 3.x
>
> Attachments: 11206-gc.png, trunk-gc.png
>
>
> Cassandra saves a sample of IndexInfo objects that store the offset within 
> each partition of every 64KB (by default) range of rows.  To find a row, we 
> binary search this sample, then scan the partition of the appropriate range.
> The problem is that this scales poorly as partitions grow: on a cache miss, 
> we deserialize the entire set of IndexInfo, which both creates a lot of GC 
> overhead (as noted in CASSANDRA-9754) and is non-negligible i/o activity 
> (relative to reading a single 64KB row range) as partitions get truly large.
> We introduced an "offset map" in CASSANDRA-10314 that allows us to perform 
> the IndexInfo bsearch while only deserializing IndexInfo that we need to 
> compare against, i.e. log(N) deserializations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11576) Add support for JNA mlockall(2) on POWER

2016-04-14 Thread Rei Odaira (JIRA)
Rei Odaira created CASSANDRA-11576:
--

 Summary: Add support for JNA mlockall(2) on POWER
 Key: CASSANDRA-11576
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11576
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
 Environment: POWER architecture
Reporter: Rei Odaira
Priority: Minor
 Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x


org.apache.cassandra.utils.CLibrary contains hard-coded C-macro values to be 
passed to system calls through JNA. These values are system-dependent, and as 
far as I have investigated, Linux and AIX on the IBM POWER architecture define 
{{MCL_CURRENT}} and {{MCL_FUTURE}} (for mlockall(2)) as different values from 
the current hard-coded ones. As a result, mlockall(2) fails on these 
platforms.
{code}
WARN  18:51:51 Unknown mlockall error 22
{code}
I am going to provide a patch to support JNA mlockall(2) on POWER.
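
A sketch of the direction such a patch could take; the POWER values below are 
taken from Linux's asm-powerpc mman.h and should be treated as assumptions to 
verify against each platform's headers:

{code}
// Sketch: choose mlockall() flag values per platform instead of hard-coding
// the Linux x86 ones. Values must match the platform's <sys/mman.h>.
final class MlockallFlags
{
    // Linux x86/x86_64
    private static final int MCL_CURRENT_DEFAULT = 1;
    private static final int MCL_FUTURE_DEFAULT  = 2;

    // Linux on POWER (asm-powerpc/mman.h) -- assumption, verify per platform
    private static final int MCL_CURRENT_PPC = 0x2000;
    private static final int MCL_FUTURE_PPC  = 0x4000;

    private static boolean onPower()
    {
        return System.getProperty("os.arch", "").toLowerCase().startsWith("ppc");
    }

    static int mclCurrent() { return onPower() ? MCL_CURRENT_PPC : MCL_CURRENT_DEFAULT; }
    static int mclFuture()  { return onPower() ? MCL_FUTURE_PPC : MCL_FUTURE_DEFAULT; }
}
{code}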



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow

2016-04-14 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-11117:
-

Assignee: Joel Knighton

> ColUpdateTimeDeltaHistogram histogram overflow
> --
>
> Key: CASSANDRA-11117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11117
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Assignee: Joel Knighton
>Priority: Minor
> Fix For: 3.0.x, 3.x
>
>
> {code}
> getting attribute Mean of 
> org.apache.cassandra.metrics:type=ColumnFamily,name=ColUpdateTimeDeltaHistogram
>  threw an exceptionjavax.management.RuntimeMBeanException: 
> java.lang.IllegalStateException: Unable to compute ceiling for max when 
> histogram overflowed
> {code}
> Although this histogram already has 164 buckets, I wonder if there is 
> something weird with the computation that's causing this to be so large? It 
> appears to be coming from updates to system.local
> {code}
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=local,name=ColUpdateTimeDeltaHistogram
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11577) Traces persist for longer than 24 hours

2016-04-14 Thread Josh Wickman (JIRA)
Josh Wickman created CASSANDRA-11577:


 Summary: Traces persist for longer than 24 hours
 Key: CASSANDRA-11577
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11577
 Project: Cassandra
  Issue Type: Bug
Reporter: Josh Wickman
Priority: Minor


My deployment currently has clusters on both Cassandra 1.2 (1.2.19) and 2.1 
(2.1.11) with tracing on.  On 2.1, the trace records persist for longer than 
the [documented 24 
hours|https://docs.datastax.com/en/cql/3.3/cql/cql_reference/tracing_r.html]:

{noformat}
cqlsh> select started_at from system_traces.sessions limit 10;

 started_at
--
 2016-03-11 23:28:40+
 2016-03-14 21:09:07+
 2016-03-14 16:42:25+
 2016-03-14 16:13:13+
 2016-03-14 19:12:11+
 2016-03-14 21:25:57+
 2016-03-29 22:45:28+
 2016-03-14 19:56:27+
 2016-03-09 23:31:41+
 2016-03-10 23:08:44+

(10 rows)
{noformat}

My systems on 1.2 do not exhibit this problem:

{noformat}
cqlsh> select started_at from system_traces.sessions limit 10;

 started_at
--
 2016-04-13 22:49:31+
 2016-04-14 18:06:45+
 2016-04-14 07:57:00+
 2016-04-14 04:35:05+
 2016-04-14 03:54:20+
 2016-04-14 10:54:38+
 2016-04-14 18:34:04+
 2016-04-14 12:56:57+
 2016-04-14 01:57:20+
 2016-04-13 21:36:01+
{noformat}

The event records also persist alongside the session records, for example:

{noformat}
cqlsh> select session_id, dateOf(event_id) from system_traces.events where 
session_id = fc8c1e80-e7e0-11e5-a2fb-1968ff3c067b;

 session_id   | dateOf(event_id)
--+--
 fc8c1e80-e7e0-11e5-a2fb-1968ff3c067b | 2016-03-11 23:28:40+
{noformat}

Between these versions, the table parameter {{default_time_to_live}} was 
introduced.  The {{system_traces}} tables report the default value of 0:

{noformat}
cqlsh> desc table system_traces.sessions

CREATE TABLE system_traces.sessions (
session_id uuid PRIMARY KEY,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = 'traced sessions'
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.SnappyCompressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
{noformat}

I suspect that {{default_time_to_live}} is superseding the mechanism used in 
1.2 to expire the trace records.  Evidently I cannot change this parameter for 
this table:

{noformat}
cqlsh> alter table system_traces.sessions with default_time_to_live = 86400;
Unauthorized: code=2100 [Unauthorized] message="Cannot ALTER "
{noformat}

I realize Cassandra 1.2 is no longer supported, but the problem is being 
manifested in Cassandra 2.1 for me (I included 1.2 only for comparison).  Since 
I couldn't find an existing ticket addressing this issue, I'm concerned that it 
may be present in more recent versions of Cassandra as well, but I have not 
tested these.

The persistent trace records are contributing to disk filling, and more 
importantly, making it more difficult to analyze the trace data.  Is there a 
workaround for this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11192) remove DatabaseDescriptor dependency from o.a.c.io.util package

2016-04-14 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-11192:
---
Issue Type: Improvement  (was: Sub-task)
Parent: (was: CASSANDRA-11191)

> remove DatabaseDescriptor dependency from o.a.c.io.util package
> ---
>
> Key: CASSANDRA-11192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11192
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuki Morishita
>
> DatabaseDescriptor is the source of all configuration in Cassandra, but 
> because of its static initialization from Config/cassandra.yaml, it is hard 
> to configure programmatically. Also, unless {{Config.setClientMode(true)}} is 
> set, DatabaseDescriptor creates/initializes tons of unnecessary things for 
> just reading SSTables.
> Since o.a.c.io.util is the core of accessing files, these classes should be 
> as independent as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11578) remove DatabaseDescriptor dependency from FileUtil

2016-04-14 Thread Yuki Morishita (JIRA)
Yuki Morishita created CASSANDRA-11578:
--

 Summary: remove DatabaseDescriptor dependency from FileUtil
 Key: CASSANDRA-11578
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11578
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor


{{FileUtil}} has dependencies on {{DatabaseDescriptor}} and other online-only 
classes like {{StorageService}} when handling FS errors.

This code is used for error handling in SSTables as well, so when one wants to 
use SSTableReader/Writer offline, it can end up initializing unnecessary stuff 
on error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11578) remove DatabaseDescriptor dependency from FileUtil

2016-04-14 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-11578:
---
Status: Patch Available  (was: Open)

||branch||testall||dtest||
|[11192-fileutil|https://github.com/yukim/cassandra/tree/11192-fileutil]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11192-fileutil-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11192-fileutil-dtest/lastCompletedBuild/testReport/]|

Patch to add {{FSErrorHandler}}, with a default implementation that accesses 
{{StorageService}} etc. only from {{CassandraDaemon}}.
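
A rough sketch of the shape of such a hook (interface and method names assumed 
for illustration, not copied from the patch):

{code}
// Sketch: route FS error handling through a pluggable handler so offline
// tools get a no-op default instead of dragging in StorageService.
interface FSErrorHandlerSketch
{
    void handleFSError(Throwable error, String path);
}

final class FileUtilsSketch
{
    // Offline default: do nothing beyond what the caller already does
    private static volatile FSErrorHandlerSketch handler = (error, path) -> {};

    static void setFSErrorHandler(FSErrorHandlerSketch h)
    {
        handler = h; // CassandraDaemon would install one that consults StorageService
    }

    static void handleFSError(Throwable error, String path)
    {
        handler.handleFSError(error, path);
    }
}
{code}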

> remove DatabaseDescriptor dependency from FileUtil
> --
>
> Key: CASSANDRA-11578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11578
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Minor
>
> {{FileUtil}} has dependencies on {{DatabaseDescriptor}} and other online-only 
> classes like {{StorageService}} when handling FS errors.
> This code is used for error handling in SSTables as well, so when one wants 
> to use SSTableReader/Writer offline, it can end up initializing unnecessary 
> stuff on error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

