[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close
[ https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909454#comment-16909454 ] Jon Meredith commented on CASSANDRA-15170:
--
I tracked the problem down to the StreamingInboundHandler shutdown code. I couldn't work out why it was failing the dtests, but perhaps the code called in {{session.messageReceived}} does not tolerate being interrupted. I've added a commit reverting the original change, and pushed a new one that re-implements the set tracking the active handlers, switching to weak references as you suggested.

Here's a clean(ish) [CircleCI run|https://circleci.com/workflow-run/5ed4f520-c378-4760-9e42-c000b5c1946b]; {{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} failed, but there were flaky-test comments on it too, so I'm not sure how reliable that test is. If you're happy with the change, could you please squash the two new commits into the merge commit when you push to origin?

> Reduce the time needed to release in-JVM dtest cluster resources after close
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest
> Reporter: Jon Meredith
> Assignee: Jon Meredith
> Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes this thread was still running 10s after the dtest cluster was closed. Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not answering, it has to wait for a timeout before it exits. Instead it should check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a try-with-resources construct and leaks a file handle for the directory. While this doesn't matter for normal usage, it leaks a file handle for each dtest Instance created.
> On trunk, Netty global event executor threads are still running and delay GC for the instance class loader.

-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
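The Files.list leak described in the ticket is a general java.nio pitfall: Files.list returns a Stream backed by an open directory handle, and that handle is released only when the stream is closed. A minimal illustrative sketch (not Cassandra's actual HintsCatalog code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListDirectory {
    // Leaky pattern: the stream (and its directory handle) is never closed.
    static List<Path> listLeaky(Path dir) throws IOException {
        return Files.list(dir).collect(Collectors.toList());
    }

    // Fixed pattern: try-with-resources closes the underlying handle.
    static List<Path> listClosed(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hints");
        Files.createFile(dir.resolve("a.hints"));
        System.out.println(listClosed(dir).size()); // prints 1
    }
}
```

In a long-lived process the leaky variant merely wastes descriptors slowly, but in the dtests each leaked handle also pins the per-Instance state, which matches the symptom described above.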
[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close
[ https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909356#comment-16909356 ] Jon Meredith commented on CASSANDRA-15170:
--
I've broken the trunk changes up into smaller commits; the issue is with the changes to StreamingInboundHandler, which affect only trunk. This commit breaks the python dtests: https://github.com/jonmeredith/cassandra/commit/da07569f65ae0eb248488295d5b0a70a8039ee6a
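The IsolatedExecutor change proposed in this ticket, a ThreadPoolExecutor with a core pool size of 0 so idle worker threads exit instead of lingering and pinning the instance class loader, can be sketched as follows (illustrative only; parameters such as the 5-second keep-alive are assumptions, not the actual patch):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShortLivedExecutor {
    public static void main(String[] args) throws Exception {
        // Core pool size 0: worker threads are created on demand and
        // terminate once the keep-alive expires, rather than staying
        // alive indefinitely as a core thread would.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                0, 1,                        // core 0, max 1 thread
                5, TimeUnit.SECONDS,         // idle workers die after 5s
                new LinkedBlockingQueue<>());

        executor.submit(() -> System.out.println("task ran"));
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(executor.getPoolSize()); // prints 0
    }
}
```

A single-thread executor with a non-zero core size keeps its worker alive until shutdown completes, which is consistent with the description's observation of the thread still running 10s after the cluster was closed.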
[jira] [Created] (CASSANDRA-15283) nodetool statushandoff does not list max hint window
DeepakVohra created CASSANDRA-15283:
---
Summary: nodetool statushandoff does not list max hint window
Key: CASSANDRA-15283
URL: https://issues.apache.org/jira/browse/CASSANDRA-15283
Project: Cassandra
Issue Type: Bug
Components: Tool/nodetool
Reporter: DeepakVohra

According to _CASSANDRA-13728_ (_Provide max hint window as part of nodetool_), nodetool statushandoff should list the max hint window. But with the latest 4.0 build the output is still the same as before:
{code:java}
nodetool statushandoff
Hinted handoff is running{code}
[jira] [Created] (CASSANDRA-15282) nodetool tablestats does not list bytes repaired/unrepaired
DeepakVohra created CASSANDRA-15282:
---
Summary: nodetool tablestats does not list bytes repaired/unrepaired
Key: CASSANDRA-15282
URL: https://issues.apache.org/jira/browse/CASSANDRA-15282
Project: Cassandra
Issue Type: Bug
Components: Tool/nodetool
Reporter: DeepakVohra

According to _CASSANDRA-13774_ (_add bytes repaired/unrepaired in nodetool tablestats_), nodetool tablestats should list the bytes repaired/unrepaired. But with the latest 4.0 build only the percent is listed, not the actual bytes:
{code:java}
nodetool tablestats --sort local_read_count --top 3
...
Percent repaired: 0.0
...{code}
[jira] [Created] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
DeepakVohra created CASSANDRA-15281:
---
Summary: help for clearsnapshot needs to be updated to indicate requirement for --all
Key: CASSANDRA-15281
URL: https://issues.apache.org/jira/browse/CASSANDRA-15281
Project: Cassandra
Issue Type: Bug
Components: CQL/Syntax
Reporter: DeepakVohra

According to _CASSANDRA-13391_ (_nodetool clearsnapshot should require --all to clear all snapshots_), clearsnapshot now requires --all to remove all snapshots, but its help text does not mention this:
{code:java}
[ec2-user@ip-10-0-2-238 ~]$ nodetool help clearsnapshot
NAME
nodetool clearsnapshot - Remove the snapshot with the given name from the given keyspaces. If no snapshotName is specified we will remove all snapshots{code}
The help for clearsnapshot needs to be updated to indicate the --all requirement for removing all snapshots.
[jira] [Commented] (CASSANDRA-10248) Document compatibilities between native specs and Cassandra versions
[ https://issues.apache.org/jira/browse/CASSANDRA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909334#comment-16909334 ] Tanaka commented on CASSANDRA-10248:
--
[~thibaultcha] Thank you very much for your feedback. I never looked at this issue from the perspective of the maintainers of client drivers, so thank you for pointing that out. I'm relatively new to the Apache Cassandra community, and my mentor advised me to look at (and possibly solve) documentation-related issues in order to learn more about this project. Given your feedback, I think I can create such a matrix and submit it for consideration. Take care.

> Document compatibilities between native specs and Cassandra versions
>
> Key: CASSANDRA-10248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10248
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Documentation and Website
> Reporter: Thibault Charbonnier
> Assignee: Tanaka
> Priority: Low
> Labels: documentation
>
> Nowhere in the native specs is it specified which Cassandra version each is compatible with. This was confusing to me when implementing a given protocol in a Lua driver, and has apparently been confusing other people [1].
> I remember seeing a table specifying which specs were compatible with which Cassandra version somewhere in the Python driver documentation but I am currently unable to find it.
> Proposed solution: maybe include a small table in each specification file describing the compatibilities between Cassandra and the current (and eventually older) specs.
> [1] http://mail-archives.apache.org/mod_mbox/cassandra-dev/201504.mbox/%3ca87729c9-fa6a-4b34-bb7b-b324e154c...@datastax.com%3E
[jira] [Commented] (CASSANDRA-10248) Document compatibilities between native specs and Cassandra versions
[ https://issues.apache.org/jira/browse/CASSANDRA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909244#comment-16909244 ] Thibault Charbonnier commented on CASSANDRA-10248:
--
[~TanakaRM] In my view, the need for a compatibility matrix between Cassandra and native protocol versions isn't driven by users of the client drivers, but by _developers of the client drivers_ themselves. Client drivers must document which versions of Cassandra they are compatible with (for users), and so should the binary protocol documentation (for maintainers). Given that some features are only available with specific versions of the binary protocol, it is useful to expose this information to users of the client drivers as well. Hence, this information should be easily accessible, in my opinion.

My personal experience (maintaining the Lua driver) was that figuring out which version of Cassandra I needed in order to test my implementations of the different versions of the binary protocol was somewhat of a pain point. In the end, I had to write a compatibility matrix myself, from information gathered in various other drivers' documentation and bits of the binary protocol specifications (https://thibaultcha.github.io/lua-cassandra/manual/README.md.html).

My suggestion in this ticket is to include a similar table in the binary protocol specifications ([https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec]) to give current and future client-driver maintainers and users a clear view, at a glance, of which version of the protocol is compatible with which version of Cassandra.
{quote}Do you think it makes sense to have the compatibility tables on Cassandra's official documents?
{quote}
Yes, I very much do think so. This is my feedback as a maintainer; thanks for taking the time to look at it!
[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys
[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909228#comment-16909228 ] Samuel Klock commented on CASSANDRA-14415:
--
Thanks for the feedback. I've tweaked the 3.11 patch accordingly. (Minor wrinkle: we don't end up deferring to {{seek()}} in the {{null}}-buffer case, as {{current()}}, which uses the buffer, is called first.)

> Performance regression in queries for distinct keys
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Samuel Klock
> Assignee: Samuel Klock
> Priority: Normal
> Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
> Running Cassandra 3.0.16, we observed a major performance regression affecting {{SELECT DISTINCT keys}}-style queries against certain tables. Based on some investigation (guided by some helpful feedback from Benjamin on the dev list), we tracked the regression down to two problems.
> * One is that Cassandra was reading more data from disk than was necessary to satisfy the query. This was fixed under CASSANDRA-10657 in a later 3.x release.
> * If the fix for CASSANDRA-10657 is incorporated, the other is this code snippet in {{RebufferingInputStream}}:
> {code:java}
> @Override
> public int skipBytes(int n) throws IOException
> {
>     if (n < 0)
>         return 0;
>     int requested = n;
>     int position = buffer.position(), limit = buffer.limit(), remaining;
>     while ((remaining = limit - position) < n)
>     {
>         n -= remaining;
>         buffer.position(limit);
>         reBuffer();
>         position = buffer.position();
>         limit = buffer.limit();
>         if (position == limit)
>             return requested - n;
>     }
>     buffer.position(position + n);
>     return requested;
> }
> {code}
> The gist of it is that to skip bytes, the stream needs to read those bytes into memory then throw them away.
In our tests, we were spending a lot of > time in this method, so it looked like the chief drag on performance.
> We noticed that the subclass of {{RebufferingInputStream}} in use for our queries, {{RandomAccessReader}} (over compressed sstables), implements a {{seek()}} method. Overriding {{skipBytes()}} in it to use {{seek()}} instead was sufficient to fix the performance regression.
> The performance difference is significant for tables with large values. It's straightforward to evaluate with very simple key-value tables, e.g.:
> {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}}
> We did some basic experimentation with the following variations (all in a single-node 3.11.2 cluster with off-the-shelf settings running on a dev workstation):
> * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, 10,000 entries), and much larger values (1 MB, 10,000 entries);
> * compressible data (a single byte repeated) and uncompressible data (output from {{openssl rand $bytes}}); and
> * with and without sstable compression. (With compression, we use Cassandra's defaults.)
> The difference is most conspicuous for tables with large, uncompressible data and sstable decompression (which happens to describe the use case that triggered our investigation). It is smaller but still readily apparent for tables with effective compression. For uncompressible data without compression enabled, there is no appreciable difference.
> Here's what the performance looks like without our patch for the 1-MB entries (times in seconds, five consecutive runs for each data set, all exhausting the results from a {{SELECT DISTINCT key FROM ...}} query with a page size of 24):
> {noformat}
> working on compressible
> 5.21180510521
> 5.10270500183
> 5.22311806679
> 4.6732840538
> 4.84219098091
> working on uncompressible_uncompressed
> 55.0423607826
> 0.769015073776
> 0.850513935089
> 0.713396072388
> 0.62596988678
> working on uncompressible
> 413.292617083
> 231.345913887
> 449.524993896
> 425.135111094
> 243.469946861
> {noformat}
> and with the fix:
> {noformat}
> working on compressible
> 2.86733293533
> 1.24895811081
> 1.108907938
> 1.12742400169
> 1.04647302628
> working on uncompressible_uncompressed
> 56.4146180153
> 0.895509958267
> 0.922824144363
> 0.772884130478
> 0.731923818588
> working on uncompressible
> 64.4587619305
> 1.81325793266
> 1.52577018738
> 1.41769099236
> 1.60442209244
> {noformat}
> The long initial runs for the uncompressible data presumably come from repeatedly hitting the disk. In contrast to the runs without the
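The core of the fix described in this ticket, skipping by moving the file pointer instead of reading bytes into memory and discarding them, can be illustrated outside Cassandra with a plain RandomAccessFile (an illustrative sketch; Cassandra's RandomAccessReader has its own buffering and decompression layers):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SkipVsSeek {
    // Read-and-discard skip, as in RebufferingInputStream: every skipped
    // byte is pulled into memory (and, for compressed sstables, decompressed
    // first), which is the cost the patch avoids.
    static long skipByReading(RandomAccessFile file, long n) throws IOException {
        byte[] scratch = new byte[8192];
        long skipped = 0;
        while (skipped < n) {
            int read = file.read(scratch, 0, (int) Math.min(scratch.length, n - skipped));
            if (read < 0) break; // hit EOF
            skipped += read;
        }
        return skipped;
    }

    // Seek-based skip: just move the file pointer, touching no data.
    static void skipBySeeking(RandomAccessFile file, long n) throws IOException {
        file.seek(Math.min(file.getFilePointer() + n, file.length()));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("skip", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            file.setLength(1 << 20);          // 1 MB of zeroes
            skipByReading(file, 512 * 1024);  // reads 512 KB into memory
            long afterRead = file.getFilePointer();
            skipBySeeking(file, 256 * 1024);  // moves the pointer only
            System.out.println(afterRead);               // prints 524288
            System.out.println(file.getFilePointer());   // prints 786432
        }
    }
}
```

Both approaches leave the pointer in the same place; the difference is purely in how much data is read (and decompressed) along the way, which matches the uncompressible-data timings above.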
[jira] [Commented] (CASSANDRA-15208) Listing the same data directory multiple times can result in an java.lang.AssertionError: null on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909212#comment-16909212 ] Jeremy Hanna commented on CASSANDRA-15208:
--
Just curious, why would you list the same data directory multiple times?

> Listing the same data directory multiple times can result in an java.lang.AssertionError: null on startup
>
> Key: CASSANDRA-15208
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15208
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Config
> Reporter: Damien Stevenson
> Priority: Normal
>
> Listing the same data directory multiple times in the yaml can result in an java.lang.AssertionError: null on startup.
> This error will only happen if Cassandra was stopped part way through an sstable operation (i.e. a compaction) and is restarted.
> Error:
> {noformat}
> Exception (java.lang.AssertionError) encountered during startup: null
> java.lang.AssertionError
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.addReplica(LogReplicaSet.java:63)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.addReplicas(LogReplicaSet.java:57)
> at org.apache.cassandra.db.lifecycle.LogFile.(LogFile.java:147)
> at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:95)
> at org.apache.cassandra.db.lifecycle.LogTransaction$LogFilesByName.removeUnfinishedLeftovers(LogTransaction.java:476)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.HashMap$EntrySpliterator.tryAdvance(HashMap.java:1717)
> at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
> at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
> at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:454)
> at org.apache.cassandra.db.lifecycle.LogTransaction$LogFilesByName.removeUnfinishedLeftovers(LogTransaction.java:471)
> at org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:438)
> at org.apache.cassandra.db.lifecycle.LogTransaction.removeUnfinishedLeftovers(LogTransaction.java:430)
> at org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:549)
> at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:658)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:275)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732)
> ERROR o.a.c.service.CassandraDaemon Exception encountered during startup
> java.lang.AssertionError: null
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.addReplica(LogReplicaSet.java:63) ~[apache-cassandra-3.11.4.jar:3.11.4]
> at java.util.ArrayList.forEach(ArrayList.java:1257) ~[na:1.8.0_171]
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.addReplicas(LogReplicaSet.java:57) ~[apache-cassandra-3.11.4.jar:3.11.4]
> at org.apache.cassandra.db.lifecycle.LogFile.(LogFile.java:147) ~[apache-cassandra-3.11.4.jar:3.11.4]
> at org.apache.cassandra.db.lifecycle.LogFile.make(LogFile.java:95) ~[apache-cassandra-3.11.4.jar:3.11.4]
> at org.apache.cassandra.db.lifecycle.LogTransaction$LogFilesByName.removeUnfinishedLeftovers(LogTransaction.java:476) ~[apache-cassandra-3.11>
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_171]
> at java.util.HashMap$EntrySpliterator.tryAdvance(HashMap.java:1717) ~[na:1.8.0_171]
> at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[na:1.8.0_171]
> at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498) ~[na:1.8.0_171]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485) ~[na:1.8.0_171]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_171]
> at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230) ~[na:1.8.0_171]
> at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196) ~[na:1.8.0_171]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.
[jira] [Commented] (CASSANDRA-13753) The documentation website can be fitted well on device width.
[ https://issues.apache.org/jira/browse/CASSANDRA-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909107#comment-16909107 ] Tanaka commented on CASSANDRA-13753:
--
[~ashish1269] Hi. I went through some of the issues raised on this ticket:
1) Viewed the provided link on different laptops with different resolutions (Mac & PC), and I did not have to scroll horizontally.
2) When I scroll to the bottom of the documentation pages, it doesn't shoot me back up to the top.
3) Spent some time going through the website on mobile. It's not the best mobile experience I've ever had, but I wouldn't say it is not mobile-friendly.
Granted, this ticket was created 2 years ago, so maybe changes were made in the meantime and this ticket wasn't updated?

> The documentation website can be fitted well on device width.
>
> Key: CASSANDRA-13753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13753
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Documentation and Website
> Environment: *Operating System:* Ubuntu
> *Browsers:*
> * Firefox
> * Google Chrome
> Reporter: Ashish Tomer
> Assignee: Ashish Tomer
> Priority: Low
> Labels: Responsive, UI, documentation, website
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> The following shortcomings/issues are noticed on the pages of the cassandra documentation website ([http://cassandra.apache.org/doc/latest/]):
> *1.* On a laptop screen with resolution 1366 × 768 the width of the webpage is more than the width of the screen. The content of the website is going left and the user has to scroll horizontally to read the lines. The horizontal scrollbar at the bottom needs to be removed.
> *2.* When some pages are scrolled down, the whole page fluctuates and jumps back to the top of the page. {color:red}Example link - {color}[http://cassandra.apache.org/doc/latest/architecture/overview.html]
> *3.* The website is not mobile friendly and can be made responsive.
[jira] [Comment Edited] (CASSANDRA-10248) Document compatibilities between native specs and Cassandra versions
[ https://issues.apache.org/jira/browse/CASSANDRA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908053#comment-16908053 ] Tanaka edited comment on CASSANDRA-10248 at 8/16/19 1:19 PM:
--
[~thibaultcha] [~benedict] In regards to this issue, every driver on the [Client Drivers|http://cassandra.apache.org/doc/latest/getting_started/drivers.html?highlight=python] page has documentation stating which versions of Apache Cassandra it is compatible with, but that documentation is hosted on the drivers' own websites/repos. Do you think it makes sense to have the compatibility tables in Cassandra's official documentation?

was (Author: tanakarm): [~thibaultcha] In regards to this issue, every driver on this page: [Client Drivers|[http://cassandra.apache.org/doc/latest/getting_started/drivers.html?highlight=python]] has documentation that states what version of Apache Cassandra is compatible with each driver but it is hosted on their own websites/repos. Do you think it makes sense to have the compatibility tables on Cassandra's official documents?
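One possible shape for the proposed per-spec table, in the plain-text style of the protocol spec files. The version pairings below are illustrative, recalled from memory, and would need verifying against NEWS.txt and the spec changelogs before publishing:

```text
Native protocol version | Introduced in Cassandra | Supported by
------------------------+-------------------------+--------------
v3                      | 2.1                     | 2.1 - 3.x
v4                      | 2.2                     | 2.2 - 4.0
v5                      | 4.0 (beta in late 3.x)  | 4.0+
```

Keeping a row per protocol version in each spec file would answer the maintainer question ("which Cassandra do I test against?") at a glance.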
[jira] [Commented] (CASSANDRA-15227) Remove StageManager
[ https://issues.apache.org/jira/browse/CASSANDRA-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908928#comment-16908928 ] Venkata Harikrishna Nukala commented on CASSANDRA-15227:
--
[~benedict] Made the changes and updated the branch (squashed commits). Made the _ExecutorServiceInitialiser_ the last param so the enum stays easy to read. I tried passing the executor itself as the constructor param, but passing jmxName, jmxType, etc. as params looked more consistent and clean. Removed jmxName as a member variable from Stage. Using thread-instance comparison for _Gossiper_, as it needs to check whether the code is being executed by the executor thread. Created a separate class (_JMXEnabledSingleThreadExecutor_) instead of using _JMXEnabledThreadPoolExecutor_, because with the latter somebody could change the core/max threads over JMX, which _JMXEnabledSingleThreadExecutor_ doesn't allow; now the Gossip stage literally uses only one thread. The _ANTI_ENTROPY_, _MIGRATION_ & _MISC_ stages also use a single thread and could use _JMXEnabledSingleThreadExecutor_, but I did not make that change because I think it is worth tracking separately, and there could be some debate around it.

> Remove StageManager
>
> Key: CASSANDRA-15227
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15227
> Project: Cassandra
> Issue Type: Task
> Components: Local/Other
> Reporter: Benedict
> Assignee: Venkata Harikrishna Nukala
> Priority: Normal
> Fix For: 4.0
>
> This is a minor cleanup; this class should not exist, but should be embedded in the Stage enum.
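The design being discussed, an enum whose constants each carry their own executor factory as the last constructor argument, can be sketched as follows. This is a hypothetical illustration, not the actual Cassandra Stage code; the constant names and pool sizes are assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical Stage-style enum: the executor factory is the last
// constructor argument, keeping the constant declarations readable,
// and the executor is created lazily rather than stored up front.
public enum Stage {
    GOSSIP(() -> Executors.newSingleThreadExecutor()), // strictly one thread
    MUTATION(() -> Executors.newFixedThreadPool(4));

    private final Supplier<ExecutorService> initialiser;
    private ExecutorService executor;

    Stage(Supplier<ExecutorService> initialiser) {
        this.initialiser = initialiser;
    }

    // Lazily create the per-stage executor on first use.
    public synchronized ExecutorService executor() {
        if (executor == null)
            executor = initialiser.get();
        return executor;
    }

    public static void main(String[] args) throws Exception {
        Future<Integer> result = Stage.GOSSIP.executor().submit(() -> 42);
        System.out.println(result.get()); // prints 42
        for (Stage stage : values())
            stage.executor().shutdown();
    }
}
```

Embedding the factory in the enum removes the need for a separate manager class, which is the point of the ticket; a single-thread factory like the GOSSIP one above is also how a "literally one thread" guarantee falls out naturally.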
[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908901#comment-16908901 ] Benedict commented on CASSANDRA-15274:
--
You may need to modify the code in {{SSTableExport}} to include the line {{metadata.compressionParameters.setCrcCheckChance(0);}} at the start of {{export}}.

> Multiple Corrupt datafiles across entire environment
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Phil O Conduin
> Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
> * 2 datacenters.
> * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
> * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
> * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site B.
> We also have 2 Reaper Nodes we use for repair. One reaper node in each datacenter, each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root 20G 2.2G 18G 11% /
> devtmpfs 63G 0 63G 0% /dev
> tmpfs 63G 0 63G 0% /dev/shm
> tmpfs 63G 4.1G 59G 7% /run
> tmpfs 63G 0 63G 0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd 1.5T 802G 688G 54% /data/ssd4
> /dev/sda 1.5T 798G 692G 54% /data/ssd1
> /dev/sdb 1.5T 681G 810G 46% /data/ssd2
> /dev/sdc 1.5T 558G 932G 38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s): 64
> Thread(s) per core: 2
> Core(s) per socket: 16
> Socket(s): 2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log files on a lot of instances. There seems to be no pattern to the corruption. It seems that the repair job is finding all the corrupted files for us. The repair will hang on the node where the corrupted file is found. To fix this we remove/rename the datafile and bounce the Cassandra instance. Our hardware/OS team have stated there is no problem on their side. I do not believe it is the repair causing the corruption.
>
> So let me give you an example of a corrupted file and maybe someone might be able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO 21:30:33 Writing Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365) ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340) ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382) ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366) ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 ca
[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908894#comment-16908894 ] Vladimir Vavro edited comment on CASSANDRA-15274 at 8/16/19 9:30 AM: - Since the affected version is 2.2.x there is no sstabledump available, but there is sstable2json. We tried to export one file and the attempt failed - but it looks like it again failed during the CRC check, based on this part of the error message: Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-10664-big-Data.db): corruption detected, chunk at 7392105638 of length 35173. at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:185) Is it possible that sstable2json uses the same code to read the data as Cassandra normally does? If that is true, is it different for the newer utilities sstableexport/sstabledump?
> Multiple Corrupt datafiles across entire environment > - > > Key: CASSANDRA-15274 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15274 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Phil O Conduin >Priority: Normal > > Cassandra Version: 2.2.13 > PRE-PROD environment. > * 2 datacenters. > * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_) > * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d) > * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site > B. > We also have 2 Reaper Nodes we use for repair. One reaper node in each > datacenter each running with its own Cassandra back end in a cluster together. > OS Details [Red Hat Linux] > cass_a@x 0 10:53:01 ~ $ uname -a > Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 > x86_64 x86_64 GNU/Linux > cass_a@x 0 10:57:31 ~ $ cat /etc/*release > NAME="Red Hat Enterprise Linux Server" > VERSION="7.6 (Maipo)" > ID="rhel" > Storage Layout > cass_a@xx 0 10:46:28 ~ $ df -h > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/vg01-lv_root 20G 2.2G 18G 11% / > devtmpfs 63G 0 63G 0% /dev > tmpfs 63G 0 63G 0% /dev/shm > tmpfs 63G 4.1G 59G 7% /run > tmpfs 63G 0 63G 0% /sys/fs/cgroup > >> 4 cassandra instances > /dev/sdd 1.5T 802G 688G 54% /data/ssd4 > /dev/sda 1.5T 798G 692G 54% /data/ssd1 > /dev/sdb 1.5T 681G 810G 46% /data/ssd2 > /dev/sdc 1.5T 558G 932G 38% /data/ssd3 > Cassandra load is about 200GB and the rest of the space is snapshots > CPU > cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\(' > CPU(s): 64 > Thread(s) per core: 2 > Core(s) per socket: 16 > Socket(s): 2 > *Description of problem:* > During repair of the cluster, we are seeing multiple corruptions in the log > files on a lot of instances. There seems to be no pattern to the corruption. > It seems that the repair job is finding all the corrupted files for us. 
The > repair will hang on the node where the corrupted file is found. To fix this > we remove/rename the datafile and bounce the Cassandra instance. Our > hardware/OS team have stated there is no problem on their side. I do not > believe it is the repair causing the corruption. > > So let me give you an example of a corrupted file and maybe someone might be > able to work through it with me? > When this corrupted file was reported in the log it looks like it was the > repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until > "2019-08-07 22:45:00" > Aug 07 22:30:33 cassandra[34611]: INFO 21:30:33 Writing > Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, > 0%/0% of on/off-heap limit) > Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle > tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, > (-1476350953672479093,-1474461 > Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread > Thread[ValidationExecutor:825,1,main] > Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /x/ssd2/data/KeyspaceMetadata/x-1e453cb0 > Aug 07 22:30:33 cassandra[34611]: at > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365) > ~[apache-cassandra-2.2.13.jar:2.2.13] > Aug 07 22:30:33 cassandra[34611]: at > org.apache.cassandra.
[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys
[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908884#comment-16908884 ] Benedict commented on CASSANDRA-14415: -- Late to the party, but I agree with Kurt that we should simply {{return 0}} for {{n < 0}}, and we should probably let {{seek}} handle the {{null}} buffer. Looks like a good simple patch. I don't see any blockers to this. > Performance regression in queries for distinct keys > --- > > Key: CASSANDRA-14415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14415 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Samuel Klock >Assignee: Samuel Klock >Priority: Normal > Labels: performance > Fix For: 3.0.x, 3.11.x, 4.x > > > Running Cassandra 3.0.16, we observed a major performance regression > affecting {{SELECT DISTINCT keys}}-style queries against certain tables. > Based on some investigation (guided by some helpful feedback from Benjamin on > the dev list), we tracked the regression down to two problems. > * One is that Cassandra was reading more data from disk than was necessary > to satisfy the query. This was fixed under CASSANDRA-10657 in a later 3.x > release. > * If the fix for CASSANDRA-10657 is incorporated, the other is this code > snippet in {{RebufferingInputStream}}: > {code:java} > @Override > public int skipBytes(int n) throws IOException > { > if (n < 0) > return 0; > int requested = n; > int position = buffer.position(), limit = buffer.limit(), remaining; > while ((remaining = limit - position) < n) > { > n -= remaining; > buffer.position(limit); > reBuffer(); > position = buffer.position(); > limit = buffer.limit(); > if (position == limit) > return requested - n; > } > buffer.position(position + n); > return requested; > } > {code} > The gist of it is that to skip bytes, the stream needs to read those bytes > into memory then throw them away. 
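The cost described in the quoted snippet can be made concrete with a toy model (the class, 64 KiB buffer, and 1 MB stream length here are invented for illustration, not Cassandra's actual implementation): a stream that can only refill a fixed buffer must pull every skipped byte through memory, one reBuffer() per buffer-length of data.

```java
import java.nio.ByteBuffer;

// Toy model of the skipBytes loop quoted above; sizes are invented.
public class RebufferSkipDemo {
    static final int BUFFER_SIZE = 64 * 1024;   // 64 KiB buffer
    static final long STREAM_LEN = 1_000_000;   // 1 MB "file"
    static ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    static long streamPos = 0;
    static int rebufferCount = 0;   // how often skipped data is "read" from disk

    // Simulates reading the next chunk of the file into the buffer.
    static void reBuffer() {
        rebufferCount++;
        int n = (int) Math.min(BUFFER_SIZE, STREAM_LEN - streamPos);
        buffer.clear();
        buffer.limit(n);            // pretend n bytes were read
        streamPos += n;
    }

    // Same shape as the quoted RebufferingInputStream.skipBytes.
    static int skipBytes(int n) {
        if (n < 0)
            return 0;
        int requested = n;
        int position = buffer.position(), limit = buffer.limit(), remaining;
        while ((remaining = limit - position) < n) {
            n -= remaining;
            buffer.position(limit);
            reBuffer();
            position = buffer.position();
            limit = buffer.limit();
            if (position == limit)
                return requested - n;
        }
        buffer.position(position + n);
        return requested;
    }

    public static void main(String[] args) {
        reBuffer();                        // initial fill
        System.out.println(skipBytes(900_000));
        System.out.println(rebufferCount); // 14: initial fill + 13 refills only to discard
    }
}
```

Skipping 900 KB forces 14 buffer fills even though none of that data is wanted, which matches the profiling observation that skipBytes dominated the query time.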
In our tests, we were spending a lot of > time in this method, so it looked like the chief drag on performance. > We noticed that the subclass of {{RebufferingInputStream}} in use for our > queries, {{RandomAccessReader}} (over compressed sstables), implements a > {{seek()}} method. Overriding {{skipBytes()}} in it to use {{seek()}} > instead was sufficient to fix the performance regression. > The performance difference is significant for tables with large values. It's > straightforward to evaluate with very simple key-value tables, e.g.: > {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}} > We did some basic experimentation with the following variations (all in a > single-node 3.11.2 cluster with off-the-shelf settings running on a dev > workstation): > * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, > 10,000 entries), and much larger values (1 MB, 10,000 entries); > * compressible data (a single byte repeated) and uncompressible data (output > from {{openssl rand $bytes}}); and > * with and without sstable compression. (With compression, we use > Cassandra's defaults.) > The difference is most conspicuous for tables with large, uncompressible data > and sstable decompression (which happens to describe the use case that > triggered our investigation). It is smaller but still readily apparent for > tables with effective compression. For uncompressible data without > compression enabled, there is no appreciable difference. 
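A minimal sketch of the seek-based approach described above, using the JDK's java.io.RandomAccessFile as a stand-in for Cassandra's RandomAccessReader (the method and variable names here are illustrative, not the actual patch):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekSkipDemo {
    // Seek-based skip: clamp to the bytes remaining, then move the file
    // pointer directly -- no skipped byte is ever read or decompressed.
    static int skipBytes(RandomAccessFile f, int n) throws IOException {
        if (n <= 0)
            return 0;
        long pos = f.getFilePointer();
        long skipped = Math.min(n, f.length() - pos);
        f.seek(pos + skipped);
        return (int) skipped;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seekskip", ".bin");
        Files.write(tmp, new byte[1_000_000]);         // 1 MB scratch file
        try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
            System.out.println(skipBytes(f, 999_000)); // skips 999000 bytes
            System.out.println(f.getFilePointer());    // pointer now at 999000
            System.out.println(skipBytes(f, 10_000));  // clamped to the 1000 left
        }
        Files.delete(tmp);
    }
}
```

Here the skip is O(1) regardless of n, which is consistent with the regression disappearing once the override was in place.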
> Here's what the performance looks like without our patch for the 1-MB entries > (times in seconds, five consecutive runs for each data set, all exhausting > the results from a {{SELECT DISTINCT key FROM ...}} query with a page size of > 24): > {noformat} > working on compressible > 5.21180510521 > 5.10270500183 > 5.22311806679 > 4.6732840538 > 4.84219098091 > working on uncompressible_uncompressed > 55.0423607826 > 0.769015073776 > 0.850513935089 > 0.713396072388 > 0.62596988678 > working on uncompressible > 413.292617083 > 231.345913887 > 449.524993896 > 425.135111094 > 243.469946861 > {noformat} > and with the fix: > {noformat} > working on compressible > 2.86733293533 > 1.24895811081 > 1.108907938 > 1.12742400169 > 1.04647302628 > working on uncompressible_uncompressed > 56.4146180153 > 0.895509958267 > 0.922824144363 > 0.772884130478 > 0.731923818588 > working on uncompressible > 64.4587619305 > 1.81325793266 > 1.52577018738 > 1.41769099236 > 1.60442209244 > {noformat} > The long initial runs for the uncompressible data presumably come from > repeatedly hitting the disk. In contrast to the runs without t