[jira] [Commented] (CASSANDRA-11920) bloom_filter_fp_chance needs to be validated up front
[ https://issues.apache.org/jira/browse/CASSANDRA-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339015#comment-15339015 ]

Arindam Gupta commented on CASSANDRA-11920:
-------------------------------------------

Thanks, Tyler Hobbs, for your help. Please let me know if any further action is required from my side.
One additional question: do we also need to update the documentation to reflect this change?

> bloom_filter_fp_chance needs to be validated up front
> -----------------------------------------------------
>
>                 Key: CASSANDRA-11920
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11920
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle, Local Write-Read Paths
>            Reporter: ADARSH KUMAR
>            Assignee: Arindam Gupta
>            Priority: Minor
>              Labels: lhf
>         Attachments: 11920-3.0.txt
>
>
> Hi,
> I was doing some benchmarking on bloom_filter_fp_chance values. Everything
> worked fine for the values .01 (default for STCS), .001, and .0001. But when I set
> bloom_filter_fp_chance = .1, I observed the following behaviour:
> 1). Reads and writes looked normal from cqlsh.
> 2). SSTables are never created.
> 3). It just creates two files (*-Data.db and *-index.db) of size 0 KB.
> 4). nodetool flush does not work and produces the following exception:
> java.lang.UnsupportedOperationException: Unable to satisfy 1.0E-5 with 20 buckets per element
>     at org.apache.cassandra.utils.BloomCalculations.computeBloomSpec(BloomCalculations.java:150)
> .
> I checked the BloomCalculations class, and the following lines are responsible for
> this exception:
> if (maxFalsePosProb < probs[maxBucketsPerElement][maxK]) {
>   throw new UnsupportedOperationException(String.format("Unable to satisfy %s with %s buckets per element",
>                                                         maxFalsePosProb,
>                                                         maxBucketsPerElement));
> }
> From the code it looks like a hard-coded validation (unless we can change
> the number of buckets).
> So, if this validation is hard-coded, why is it even possible to set a
> value of bloom_filter_fp_chance that can prevent SSTable generation?
> Please correct this issue.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
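As an illustration of what "validated up front" in CASSANDRA-11920 could look like, here is a minimal sketch that rejects an unsatisfiable bloom_filter_fp_chance at the time the value is set rather than at flush time. It is not the attached patch. The class and method names are hypothetical, the 20 buckets per element come from the exception text in the report, and the minimum is computed with the textbook estimate (1 - e^(-k/b))^k instead of Cassandra's precomputed probs table.

```java
final class BloomFpChanceValidator {
    // Assumption: bucket budget taken from "with 20 buckets per element" in the report.
    static final int MAX_BUCKETS_PER_ELEMENT = 20;

    /** Textbook false-positive estimate for b buckets per element and k hash functions. */
    static double falsePositiveRate(int bucketsPerElement, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k / bucketsPerElement), k);
    }

    /** Smallest rate reachable with the given bucket budget, scanning k = 1..bucketsPerElement. */
    static double minSupportedFpChance(int bucketsPerElement) {
        double best = 1.0;
        for (int k = 1; k <= bucketsPerElement; k++)
            best = Math.min(best, falsePositiveRate(bucketsPerElement, k));
        return best;
    }

    /** Hypothetical up-front check: fail when the table option is set, not when an SSTable is written. */
    static void validate(double fpChance) {
        double min = minSupportedFpChance(MAX_BUCKETS_PER_ELEMENT);
        // The negated form also rejects NaN.
        if (!(fpChance > min && fpChance <= 1.0))
            throw new IllegalArgumentException(String.format(
                "bloom_filter_fp_chance must be larger than %s and at most 1.0 (got %s)", min, fpChance));
    }

    public static void main(String[] args) {
        validate(0.01); // accepted
        try {
            validate(1.0E-5); // the value from the exception in the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the 1.0E-5 from the stack trace is refused immediately with a readable message, while .01, .001 and .0001 still pass.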
[jira] [Commented] (CASSANDRA-11870) Consider allocating direct buffers bypassing ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/CASSANDRA-11870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338935#comment-15338935 ]

Mahdi Mohammadi commented on CASSANDRA-11870:
---------------------------------------------

Shouldn't we set a default value for `max_direct_memory_in_mb` in Config.java?

> Consider allocating direct buffers bypassing ByteBuffer.allocateDirect
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11870
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11870
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 3.x
>
>
> As outlined in CASSANDRA-11818, {{ByteBuffer.allocateDirect}} uses
> {{Bits.reserveMemory}}, which is there to respect the JVM setting
> {{-XX:MaxDirectMemorySize=...}}.
> {{Bits.reserveMemory}} first tries an "optimistic" {{tryReserveMemory}} and
> exits immediately on success. However, if that somehow doesn't succeed, it
> triggers a {{System.gc()}}, which is bad IMO (however, kind of how direct
> buffers work in Java). After that GC it sleeps and tries to reserve the
> memory up to 9 times - up to 511 ms - and then throws
> {{OutOfMemoryError("Direct buffer memory")}}.
> This is unnecessary for us since we always immediately "free" direct buffers
> as soon as we no longer need them.
> Proposal: Manage direct-memory reservations in our own code and skip
> {{Bits.reserveMemory}} that way.
> (However, Netty direct buffers are not under our control.)
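The proposal in CASSANDRA-11870 (track direct-memory reservations in our own code) could be sketched roughly as below. This covers only the bookkeeping: a reservation either succeeds or fails immediately, with no {{System.gc()}} and no sleep loop. The class name and methods are hypothetical, and how the memory itself would be allocated without {{ByteBuffer.allocateDirect}} is deliberately left out because the ticket text does not describe it.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical reservation counter; not the code from the linked branch. */
final class DirectMemoryLimiter {
    private final long capacityBytes;
    private final AtomicLong reserved = new AtomicLong();

    DirectMemoryLimiter(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Reserves bytes if they fit under the limit; returns false right away otherwise. */
    boolean tryReserve(long bytes) {
        while (true) {
            long current = reserved.get();
            long next = current + bytes;
            if (next > capacityBytes)
                return false; // fail fast: no GC trigger, no back-off sleeps
            if (reserved.compareAndSet(current, next))
                return true;
        }
    }

    /** Returns a previous reservation, to be called as soon as the buffer is freed. */
    void release(long bytes) {
        reserved.addAndGet(-bytes);
    }

    long reservedBytes() {
        return reserved.get();
    }

    public static void main(String[] args) {
        // Assumption: the limit would come from a setting such as max_direct_memory_in_mb.
        DirectMemoryLimiter limiter = new DirectMemoryLimiter(64L * 1024 * 1024);
        System.out.println(limiter.tryReserve(32L * 1024 * 1024));
        System.out.println(limiter.tryReserve(48L * 1024 * 1024));
    }
}
```

The compare-and-set loop keeps the counter consistent under concurrent callers without a lock. This fits the ticket's observation that buffers are freed as soon as they are no longer needed, so waiting for a GC would not release anything.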
[jira] [Updated] (CASSANDRA-11537) Give clear error when certain nodetool commands are issued before server is ready
[ https://issues.apache.org/jira/browse/CASSANDRA-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anonymous updated CASSANDRA-11537:
----------------------------------
    Status: Ready to Commit  (was: Patch Available)

> Give clear error when certain nodetool commands are issued before server is ready
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11537
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11537
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>            Priority: Minor
>              Labels: lhf
>
>
> As an ops person upgrading and servicing Cassandra servers, I need a
> clearer message when I issue a nodetool command that the server is not yet
> ready for, so that I am not confused.
> Technical description:
> If you deploy a new binary, restart, and issue nodetool
> scrub/compact/upgradesstables etc., you get an unfriendly assertion. An exception
> would be easier to understand. Also, if a user has turned assertions off, it is
> unclear what might happen.
> {noformat}
> EC1: Throw exception to make it clear server is still in start up process.
> :~# nodetool upgradesstables
> error: null
> -- StackTrace --
> java.lang.AssertionError
>     at org.apache.cassandra.db.Keyspace.open(Keyspace.java:97)
>     at org.apache.cassandra.service.StorageService.getValidKeyspace(StorageService.java:2573)
>     at org.apache.cassandra.service.StorageService.getValidColumnFamilies(StorageService.java:2661)
>     at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2421)
> {noformat}
> EC1:
> Patch against 2.1 (branch)
> https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:exception-on-startup?expand=1
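The difference the CASSANDRA-11537 description is after (an explicit exception instead of a bare assertion) can be shown with a small sketch. The guard class below is hypothetical and is not taken from the linked patch. It only illustrates that an explicit check still fires when the JVM runs with assertions disabled, and that it can carry a message instead of "error: null".

```java
/** Hypothetical readiness guard; names are illustrative only. */
final class StartupGuard {
    private volatile boolean initialized;

    /** Called once the server has finished starting up. */
    void markInitialized() {
        initialized = true;
    }

    /** Unlike an assert, this check also runs when assertions are turned off. */
    void checkReady(String command) {
        if (!initialized)
            throw new IllegalStateException(
                "Cannot run '" + command + "': the server is still starting up, please retry later");
    }

    public static void main(String[] args) {
        StartupGuard guard = new StartupGuard();
        try {
            guard.checkReady("upgradesstables");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
        guard.markInitialized();
        guard.checkReady("upgradesstables"); // passes once startup has completed
    }
}
```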
[jira] [Commented] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338810#comment-15338810 ]

Alex Petrov commented on CASSANDRA-12010:
-----------------------------------------

After giving it yet another thought, we may actually still benefit from the patch, as it uses the {{beforeAndAfterFlush}} helper, as Joel noted above. I will reopen it.

> UserTypesTest# is failing on trunk
> ----------------------------------
>
>                 Key: CASSANDRA-12010
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12010
>             Project: Cassandra
>          Issue Type: Test
>          Components: Testing
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>             Fix For: 3.x
>
>
> Test failure:
> http://cassci.datastax.com/job/trunk_utest/1445/testReport/org.apache.cassandra.cql3.validation.entities/UserTypesTest/testAlteringUserTypeNestedWithinNonFrozenMap/
> This was caused by the merge after
> [11604|https://issues.apache.org/jira/browse/CASSANDRA-11604] which probably
> coincided with some other change, as this failure did not happen during the
> [test run on the
> branch|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11604-trunk-testall/lastCompletedBuild/testReport/].
[jira] [Updated] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-12010:
------------------------------------
    Status: Patch Available  (was: Reopened)
[jira] [Reopened] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov reopened CASSANDRA-12010:
-------------------------------------
[jira] [Updated] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-12010:
------------------------------------
    Status: Ready to Commit  (was: Patch Available)
[jira] [Commented] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X
[ https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338771#comment-15338771 ]

Alex Petrov commented on CASSANDRA-10857:
-----------------------------------------

We could add a default alias for the select queries, although I'm not sure it's desirable, since it would be special-casing. It might be better to wait until we have regular column renames and then just rename it.

I've separated out a commit that allows {{\"\"}} identifiers while disallowing them for MVs, UDTs, function names and arguments, and table names; it's definitely not a good idea to allow them there. It actually might be a good idea to allow empty identifiers through ANTLR and add user-friendly messages, as we currently get a {{required (...)+ loop did not match anything at input}} error message on an empty-string identifier.

> Allow dropping COMPACT STORAGE flag from tables in 3.X
> ------------------------------------------------------
>
>                 Key: CASSANDRA-10857
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL, Distributed Metadata
>            Reporter: Aleksey Yeschenko
>            Assignee: Alex Petrov
>             Fix For: 3.x
>
>
> Thrift allows users to define flexible mixed column families - where certain
> columns would have explicitly pre-defined names, potentially non-default
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
>     and default_validation_class = UTF8Type
>     and column_metadata = [
>         {column_name: bar, validation_class: Int32Type, index_type: KEYS},
>         {column_name: baz, validation_class: UUIDType, index_type: KEYS}
>     ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed
> dynamic/static column families. That said, it *shouldn't* be hard to allow
> users to drop the {{compact}} flag to expose the table as it is internally
> now, and be able to access all columns.
[jira] [Commented] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338610#comment-15338610 ]

Alex Petrov commented on CASSANDRA-12010:
-----------------------------------------

[~snazy] result is the same :) so all good. Thanks for taking care of it!
[jira] [Updated] (CASSANDRA-11870) Consider allocating direct buffers bypassing ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/CASSANDRA-11870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-11870:
-------------------------------------
    Status: Patch Available  (was: Open)

Patch for this:
||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11870-own-off-heap-space-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11870-own-off-heap-space-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11870-own-off-heap-space-trunk-dtest/lastSuccessfulBuild/]
[jira] [Commented] (CASSANDRA-11537) Give clear error when certain nodetool commands are issued before server is ready
[ https://issues.apache.org/jira/browse/CASSANDRA-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338599#comment-15338599 ]

Robert Stupp commented on CASSANDRA-11537:
------------------------------------------

Hm - now other tests fail: http://cassci.datastax.com/job/snazy-CASSANDRA-11537-2-testall/2/#showFailuresLink
[jira] [Updated] (CASSANDRA-11972) Use byte[] instead of object tree in Frame.Header
[ https://issues.apache.org/jira/browse/CASSANDRA-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-11972:
-------------------------------------
    Status: Patch Available  (was: In Progress)

Yea - maybe. Unsure after I've sorted out all the other stuff. Had the patch handy anyway. Wouldn't mind if it's considered not worth committing.

||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11972-bytes-in-header-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11972-bytes-in-header-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11972-bytes-in-header-trunk-dtest/lastSuccessfulBuild/]

> Use byte[] instead of object tree in Frame.Header
> -------------------------------------------------
>
>                 Key: CASSANDRA-11972
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11972
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 3.x
>
>
> Replacing the object tree/references in {{Frame.Header}} with {{byte[9]}}
> saves a couple of object allocations. Also, not allocating the 9 bytes for
> the header off-heap is less expensive.
> (will provide a patch soon)
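To make the {{byte[9]}} idea in CASSANDRA-11972 concrete, here is a rough sketch of a frame header kept as a plain on-heap byte array. The field layout is an assumption based on the native protocol v3/v4 frame format (1 byte version, 1 byte flags, 2 bytes stream id, 1 byte opcode, 4 bytes body length); the ticket itself does not spell it out. The class and accessor names are hypothetical and not taken from the linked branch.

```java
/** Hypothetical 9-byte header holder; layout assumed from native protocol v3/v4. */
final class HeaderBytes {
    static final int LENGTH = 9;

    /** Packs all header fields into one on-heap array instead of separate objects. */
    static byte[] encode(int version, int flags, int streamId, int opcode, int bodyLength) {
        byte[] h = new byte[LENGTH];
        h[0] = (byte) version;
        h[1] = (byte) flags;
        h[2] = (byte) (streamId >> 8);   // stream id, big-endian, 2 bytes
        h[3] = (byte) streamId;
        h[4] = (byte) opcode;
        h[5] = (byte) (bodyLength >>> 24); // body length, big-endian, 4 bytes
        h[6] = (byte) (bodyLength >>> 16);
        h[7] = (byte) (bodyLength >>> 8);
        h[8] = (byte) bodyLength;
        return h;
    }

    /** Reads the signed 16-bit stream id back out of the array. */
    static int streamId(byte[] h) {
        return (short) (((h[2] & 0xFF) << 8) | (h[3] & 0xFF));
    }

    static int opcode(byte[] h) {
        return h[4] & 0xFF;
    }

    static int bodyLength(byte[] h) {
        return ((h[5] & 0xFF) << 24) | ((h[6] & 0xFF) << 16) | ((h[7] & 0xFF) << 8) | (h[8] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] header = encode(4, 0, 258, 7, 65537);
        System.out.println(streamId(header) + " " + opcode(header) + " " + bodyLength(header));
    }
}
```

Fields are decoded on demand from the array, so a header costs a single allocation rather than one object per field.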
[jira] [Commented] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338583#comment-15338583 ]

Robert Stupp commented on CASSANDRA-12010:
------------------------------------------

Oops - sorry. Haven't seen this ticket.
[jira] [Updated] (CASSANDRA-11967) Export metrics for prometheus in its native format
[ https://issues.apache.org/jira/browse/CASSANDRA-11967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-11967:
-------------------------------------
    Status: Patch Available  (was: In Progress)

Patch for this:
||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11967-prometheus-exporter-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11967-prometheus-exporter-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11967-prometheus-exporter-trunk-dtest/lastSuccessfulBuild/]

Text from NEWS.txt:
bq. Support for alternative metrics exporters has been added. To use them, the appropriate libraries need to be placed in the lib directory. Cassandra will load the class given in the system property cassandra.metricsExporter and instantiate it by calling the constructor taking an instance of com.codahale.metrics.MetricRegistry. If the provided class implements java.io.Closeable, its close() method will be called on shutdown.

> Export metrics for prometheus in its native format
> --------------------------------------------------
>
>                 Key: CASSANDRA-11967
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11967
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 3.x
>
>
> https://github.com/snazy/prometheus-metrics-exporter allows exporting
> codahale metrics for prometheus.io. In order to integrate this, a minor
> change to C* is necessary to load the library.
> This eliminates the need to use the additional graphite-exporter tool and
> therefore also allows prometheus to track the up/down status of C*.
> (Will provide the patch soon)
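The loading mechanism described in the NEWS.txt text for CASSANDRA-11967 (class name from a system property, a constructor taking the registry, optional close() on shutdown) can be sketched with plain reflection. This is a guess at the shape, not the code from the branch. To stay free of extra jars, {{DemoRegistry}} stands in for com.codahale.metrics.MetricRegistry and {{DemoExporter}} is a made-up exporter; only the property name cassandra.metricsExporter comes from the message above.

```java
import java.io.Closeable;
import java.io.IOException;

final class MetricsExporterLoader {
    /** Instantiates className through its public constructor taking a single registryType argument. */
    static <R> Object load(String className, Class<R> registryType, R registry)
            throws ReflectiveOperationException {
        return Class.forName(className).getConstructor(registryType).newInstance(registry);
    }

    /** Calls close() only if the exporter implements java.io.Closeable; returns whether it did. */
    static boolean closeIfCloseable(Object exporter) throws IOException {
        if (exporter instanceof Closeable) {
            ((Closeable) exporter).close();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Falls back to the demo exporter when the property is not set.
        String name = System.getProperty("cassandra.metricsExporter", DemoExporter.class.getName());
        Object exporter = load(name, DemoRegistry.class, new DemoRegistry());
        System.out.println("loaded " + exporter.getClass().getName());
        closeIfCloseable(exporter); // would run from a shutdown hook in a real server
    }
}

/** Stand-in for com.codahale.metrics.MetricRegistry (assumption, for self-containment). */
final class DemoRegistry {}

/** Hypothetical exporter used only to exercise the loader. */
final class DemoExporter implements Closeable {
    final DemoRegistry registry;
    boolean closed;

    public DemoExporter(DemoRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```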
[jira] [Commented] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338578#comment-15338578 ]

Alex Petrov commented on CASSANDRA-12010:
-----------------------------------------

Ninja-fixed by [~snazy] in [trunk|https://github.com/apache/cassandra/commit/471552e0429267ae62c292487ddbb4d15a7daa49]
[jira] [Updated] (CASSANDRA-12010) UserTypesTest# is failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-12010:
------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Ready to Commit)
[jira] [Updated] (CASSANDRA-12034) Special handling for Netty's direct memory allocation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-12034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-12034:
-------------------------------------
    Fix Version/s: 3.x

> Special handling for Netty's direct memory allocation failure
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-12034
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12034
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.x
>
>
> With CASSANDRA-12032, Netty throws a
> {{io.netty.util.internal.OutOfDirectMemoryError}} if there's not enough
> off-heap memory for the response buffer. We can easily handle this situation
> and return an error. This is not a condition that destabilizes the system and
> should therefore not be passed to {{JVMStabilityInspector}}.
[jira] [Updated] (CASSANDRA-12034) Special handling for Netty's direct memory allocation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-12034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-12034:
-------------------------------------
    Status: Patch Available  (was: Open)

Trivial patch for this:
||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:12034-netty-oom-special-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-12034-netty-oom-special-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-12034-netty-oom-special-trunk-dtest/lastSuccessfulBuild/]
[jira] [Created] (CASSANDRA-12034) Special handling for Netty's direct memory allocation failure
Robert Stupp created CASSANDRA-12034:
----------------------------------------

             Summary: Special handling for Netty's direct memory allocation failure
                 Key: CASSANDRA-12034
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12034
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Robert Stupp
            Assignee: Robert Stupp

With CASSANDRA-12032, Netty throws a {{io.netty.util.internal.OutOfDirectMemoryError}} if there's not enough off-heap memory for the response buffer. We can easily handle this situation and return an error. This is not a condition that destabilizes the system and should therefore not be passed to {{JVMStabilityInspector}}.
[jira] [Resolved] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp resolved CASSANDRA-11818.
--------------------------------------
    Resolution: Done

The major cause of this behaviour is {{ByteBuffer.allocateDirect}} resp {{Bits.reserveMemory}}, which is addressed in CASSANDRA-12032 and CASSANDRA-11870.

> C* does neither recover nor trigger stability inspector on direct memory OOM
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11818
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Robert Stupp
>         Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - Unexpected exception during request; channel = [id: 0x1e02351b, L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_92]
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_92]
>     at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349) ~[main/:na]
>     at org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314) ~[main/:na]
>     at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445) ~[main/:na]
>     at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces
> and heap dump. Already increased {{-XX:MaxDirectMemorySize}} to {{2g}}.
> A {{nodetool flush}} causes the daemon to exit (as that direct-memory OOM is
> caught by {{JVMStabilityInspector}}).
[jira] [Updated] (CASSANDRA-11968) More metrics on native protocol requests & responses
[ https://issues.apache.org/jira/browse/CASSANDRA-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11968: - Status: Patch Available (was: In Progress) Patch to add these metrics: ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:19968-transport-metrics-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-19968-transport-metrics-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-19968-transport-metrics-trunk-dtest/lastSuccessfulBuild/] > More metrics on native protocol requests & responses > > > Key: CASSANDRA-11968 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11968 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > Proposal to add more metrics to the native protocol: > - number of requests per request-type > - number of responses by response-type > - size of request messages in bytes > - size of response messages in bytes > - number of in-flight requests (from request arrival to response) > (Will provide a patch soon) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
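The five proposed metrics can be sketched in plain Java as follows. This is only an illustration of the bookkeeping involved, not the attached patch: the class name `TransportMetricsSketch`, its method names, and the use of string type names are all assumptions made for the example, and a real implementation would presumably register with Cassandra's existing metrics registry instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; none of these names come from the Cassandra code base.
public final class TransportMetricsSketch {
    private final Map<String, LongAdder> requestsByType = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> responsesByType = new ConcurrentHashMap<>();
    private final LongAdder requestBytes = new LongAdder();
    private final LongAdder responseBytes = new LongAdder();
    private final AtomicLong inFlight = new AtomicLong();

    // Called when a request arrives: count it by type, add its size, mark it in flight.
    public void onRequest(String type, long sizeBytes) {
        requestsByType.computeIfAbsent(type, k -> new LongAdder()).increment();
        requestBytes.add(sizeBytes);
        inFlight.incrementAndGet();
    }

    // Called when the response is written: count it by type, add its size, clear in-flight.
    public void onResponse(String type, long sizeBytes) {
        responsesByType.computeIfAbsent(type, k -> new LongAdder()).increment();
        responseBytes.add(sizeBytes);
        inFlight.decrementAndGet();
    }

    public long requests(String type) {
        LongAdder adder = requestsByType.get(type);
        return adder == null ? 0 : adder.sum();
    }

    public long responses(String type) {
        LongAdder adder = responsesByType.get(type);
        return adder == null ? 0 : adder.sum();
    }

    public long requestBytes() { return requestBytes.sum(); }
    public long responseBytes() { return responseBytes.sum(); }
    public long inFlight() { return inFlight.get(); }
}
```

`LongAdder` is used for the counters because they are written from many threads and read rarely; the in-flight value uses `AtomicLong` because it goes both up and down and is read as a single current value.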
[jira] [Resolved] (CASSANDRA-12033) Use Netty's off-heap allocator instead of ByteBuffer.allocateDirect()
[ https://issues.apache.org/jira/browse/CASSANDRA-12033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp resolved CASSANDRA-12033. -- Resolution: Invalid Misunderstanding - Netty 4.0.37 already changes to use its own off-heap space management. No additional ticket needed. > Use Netty's off-heap allocator instead of ByteBuffer.allocateDirect() > - > > Key: CASSANDRA-12033 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12033 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp > > As outlined in CASSANDRA-11818, ByteBuffer.allocateDirect() has some major > issues. > This ticket configures Netty to use its own off-heap "space". Requires > Netty 4.0.37 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12032) Update to Netty 4.0.37
[ https://issues.apache.org/jira/browse/CASSANDRA-12032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-12032: - Status: Patch Available (was: Open) "trivial" patch: ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:12032-netty-4.0.37-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-12032-netty-4.0.37-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-12032-netty-4.0.37-trunk-dtest/lastSuccessfulBuild/] > Update to Netty 4.0.37 > -- > > Key: CASSANDRA-12032 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12032 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp > Fix For: 3.x > > > Update Netty to 4.0.37 > (no C* code changes in this ticket) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-11921) Upgrade to Netty 4.1 + PR5314
[ https://issues.apache.org/jira/browse/CASSANDRA-11921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp resolved CASSANDRA-11921. -- Resolution: Later Fix Version/s: (was: 3.x) 4.x Resolving this as later (4.x?). Using CASSANDRA-12032 + CASSANDRA-12033 + CASSANDRA-11870 for 3.x versions. > Upgrade to Netty 4.1 + PR5314 > - > > Key: CASSANDRA-11921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11921 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 4.x > > > Netty [PR5314|https://github.com/netty/netty/pull/5314] works around > {{Bits.reserveMemory}}+{{Cleaner}} and introduces an independent off-heap > memory pool. > Requirement for CASSANDRA-11870 > Local tests of Netty4.1+PR5314 against trunk were running fine. > Any incompatibilities or else to consider when upgrading from Netty 4.0 to > 4.1? > /cc [~norman] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11870) Consider allocating direct buffers bypassing ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/CASSANDRA-11870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338546#comment-15338546 ] Robert Stupp commented on CASSANDRA-11870: -- (removed link to CASSANDRA-11921) This code change by itself is independent, but related to CASSANDRA-12032 + CASSANDRA-12033. > Consider allocating direct buffers bypassing ByteBuffer.allocateDirect > -- > > Key: CASSANDRA-11870 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11870 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > As outlined in CASSANDRA-11818, {{ByteBuffer.allocateDirect}} uses > {{Bits.reserveMemory}}, which is there to respect the JVM setting > {{-XX:MaxDirectMemorySize=...}}. > {{Bits.reserveMemory}} first tries an "optimistic" {{tryReserveMemory}} and > exits immediately on success. However, if that somehow doesn't succeed, it > triggers a {{System.gc()}}, which is bad IMO (however, kind of how direct > buffers work in Java). After that GC it sleeps and tries to reserve the > memory up to 9 times - up to 511 ms - and then throws > {{OutOfMemoryError("Direct buffer memory")}}. > This is unnecessary for us since we always immediately "free" direct buffers > as soon as we no longer need them. > Proposal: Manage direct-memory reservations in our own code and skip > {{Bits.reserveMemory}} that way. > (However, Netty direct buffers are not under our control.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
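The proposal in the description above (track direct-memory reservations ourselves, without the GC-and-sleep fallback) can be sketched roughly like this. It is a hedged illustration only: `DirectMemoryBudget` and its methods are hypothetical names, not the patch for this ticket and not JDK code. The `totalBackoffMillis` helper assumes the sleep starts at 1 ms and doubles on every retry, which is consistent with the "9 times, up to 511 ms" figure quoted in the description.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of self-managed direct-memory accounting.
public final class DirectMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong reserved = new AtomicLong();

    public DirectMemoryBudget(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Optimistic, lock-free reservation: succeeds or fails immediately,
    // never triggers System.gc() and never sleeps.
    public boolean tryReserve(long bytes) {
        if (bytes < 0)
            throw new IllegalArgumentException("bytes must be >= 0");
        while (true) {
            long current = reserved.get();
            if (bytes > capacityBytes - current)
                return false;
            if (reserved.compareAndSet(current, current + bytes))
                return true;
        }
    }

    // Must be called as soon as the buffer is freed by its owner.
    public void release(long bytes) {
        reserved.addAndGet(-bytes);
    }

    public long reserved() {
        return reserved.get();
    }

    // Worst-case total sleep of a retry loop whose delay starts at 1 ms and doubles
    // (assumption; see lead-in): 1 + 2 + 4 + ... for the given number of retries.
    public static long totalBackoffMillis(int retries) {
        long total = 0;
        long sleep = 1;
        for (int i = 0; i < retries; i++) {
            total += sleep;
            sleep <<= 1;
        }
        return total;
    }
}
```

Because buffers are released eagerly by the code that owns them, a failed `tryReserve` can be reported to the caller right away instead of waiting for a garbage collection to run cleaners.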
[jira] [Assigned] (CASSANDRA-11870) Consider allocating direct buffers bypassing ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/CASSANDRA-11870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reassigned CASSANDRA-11870: Assignee: Robert Stupp > Consider allocating direct buffers bypassing ByteBuffer.allocateDirect > -- > > Key: CASSANDRA-11870 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11870 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > As outlined in CASSANDRA-11818, {{ByteBuffer.allocateDirect}} uses > {{Bits.reserveMemory}}, which is there to respect the JVM setting > {{-XX:MaxDirectMemorySize=...}}. > {{Bits.reserveMemory}} first tries an "optimistic" {{tryReserveMemory}} and > exits immediately on success. However, if that somehow doesn't succeed, it > triggers a {{System.gc()}}, which is bad IMO (however, kind of how direct > buffers work in Java). After that GC it sleeps and tries to reserve the > memory up to 9 times - up to 511 ms - and then throws > {{OutOfMemoryError("Direct buffer memory")}}. > This is unnecessary for us since we always immediately "free" direct buffers > as soon as we no longer need them. > Proposal: Manage direct-memory reservations in our own code and skip > {{Bits.reserveMemory}} that way. > (However, Netty direct buffers are not under our control.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12033) Use Netty's off-heap allocator instead of ByteBuffer.allocateDirect()
Robert Stupp created CASSANDRA-12033: Summary: Use Netty's off-heap allocator instead of ByteBuffer.allocateDirect() Key: CASSANDRA-12033 URL: https://issues.apache.org/jira/browse/CASSANDRA-12033 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp As outlined in CASSANDRA-11818, ByteBuffer.allocateDirect() has some major issues. This ticket configures Netty to use its own off-heap "space". Requires Netty 4.0.37 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12032) Update to Netty 4.0.37
Robert Stupp created CASSANDRA-12032: Summary: Update to Netty 4.0.37 Key: CASSANDRA-12032 URL: https://issues.apache.org/jira/browse/CASSANDRA-12032 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Update Netty to 4.0.37 (no C* code changes in this ticket) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11970) Reuse DataOutputBuffer from ColumnIndex
[ https://issues.apache.org/jira/browse/CASSANDRA-11970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11970: - Status: Patch Available (was: In Progress) Patch available: ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11970-reuse-DOB-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11970-reuse-DOB-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11970-reuse-DOB-trunk-dtest/lastSuccessfulBuild/] > Reuse DataOutputBuffer from ColumnIndex > --- > > Key: CASSANDRA-11970 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11970 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > With a simple change, the {{DataOutputBuffer}} used in {{ColumnIndex}} can be > reused. This saves a couple of (larger) object allocations. > (Will provide a patch soon) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11971) More uses of DataOutputBuffer.RECYCLER
[ https://issues.apache.org/jira/browse/CASSANDRA-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11971: - Status: Patch Available (was: In Progress) Patch uses recycled {{DataOutputBuffer}}s instead of allocating new ones. Also introduces {{DataOutputBuffer.asNewBuffer()}} to replace some {{ByteBuffer.wrap(out.getData(), 0, out.getLength())}}. ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11971-more-recycler-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11971-more-recycler-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11971-more-recycler-trunk-dtest/lastSuccessfulBuild/] > More uses of DataOutputBuffer.RECYCLER > -- > > Key: CASSANDRA-11971 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11971 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > There are a few more possible use cases for {{DataOutputBuffer.RECYCLER}}, > which prevents a couple of (larger) allocations. > (Will provide a patch soon) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
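The general idea of recycling an output buffer rather than allocating a new one per call can be illustrated with JDK classes alone. This sketch does not use Cassandra's `DataOutputBuffer` or its `RECYCLER`; the class `ReusableBufferSketch`, the per-thread scratch buffer, and the `serialize` method are assumptions chosen to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: one scratch buffer per thread, reset before each use.
public final class ReusableBufferSketch {
    private static final ThreadLocal<ByteArrayOutputStream> SCRATCH =
        ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1024));

    // Writes the payload into the recycled scratch buffer and returns a copy
    // of the written bytes as a new ByteBuffer, so the scratch buffer can be
    // reused immediately by the next call on this thread.
    public static ByteBuffer serialize(byte[] payload) {
        ByteArrayOutputStream out = SCRATCH.get();
        out.reset(); // keeps the backing array, discards the old contents
        out.write(payload, 0, payload.length);
        return ByteBuffer.wrap(out.toByteArray());
    }
}
```

The saving comes from the scratch buffer's backing array surviving across calls; only the final, exactly sized copy is allocated each time.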
[jira] [Updated] (CASSANDRA-11969) Prevent duplicate ctx.channel().attr() call
[ https://issues.apache.org/jira/browse/CASSANDRA-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11969: - Status: Patch Available (was: In Progress) Trivial patch - Netty's {{DefaultAttributeMap.attr()}} is not the cheapest method. ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11969-dup-ctx-ch-attr-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11969-dup-ctx-ch-attr-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11969-dup-ctx-ch-attr-trunk-dtest/lastSuccessfulBuild/] > Prevent duplicate ctx.channel().attr() call > --- > > Key: CASSANDRA-11969 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11969 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Trivial > Fix For: 3.x > > > In {{Frame}} we can save one call to > {{ctx.channel().attr(Connection.attributeKey)}}. > (Will provide a patch soon) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338535#comment-15338535 ] vin01 edited comment on CASSANDRA-11845 at 6/19/16 2:31 PM: After getting the exception :- ERROR [STREAM-OUT-/NODE_IN_DC_1] 2016-06-19 08:36:10,187 StreamSession.java:524 - [Stream #80b94bf0-3611-11e6-a89a-87602fd2948b] Streaming error occurred java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.8.0_72] at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_72] I went on to check network logs around that time. At the ASA firewall between DCs I can see a lot of deny messages for some packets :- %ASA-6-106015: Deny TCP (no connection) from [NODE_IN_DC_2]/7003 to [NODE_IN_DC_1]/45573 flags ACK on interface inside I think that's the reason for the failure. That deny message basically indicates an idle timeout, which led to an ACK being sent after the connection had already been removed from the connection pool by the firewall. Does Cassandra have something to handle such cases, such as a retry mechanism? was (Author: vin01): At ASA firewall between DCs i can see lot of deny messages for some packets :- %ASA-6-106015: Deny TCP (no connection) from [NODE_IN_DC_2]/7003 to [NODE_IN_DC_1]/45573 flags ACK on interface inside I think that's the reason for failure. That deny message basically indicates an idle timeout, which lead to an ACK to be sent after connection was already removed from connection pool by firewall. Does cassandra has something to handle such cases? some retry kind of mechanism? 
> Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > Attachments: cassandra-2.2.4.error.log > > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. > current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. > Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. 
> Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338535#comment-15338535 ] vin01 commented on CASSANDRA-11845: --- At the ASA firewall between DCs I can see a lot of deny messages for some packets :- %ASA-6-106015: Deny TCP (no connection) from [NODE_IN_DC_2]/7003 to [NODE_IN_DC_1]/45573 flags ACK on interface inside I think that's the reason for the failure. That deny message basically indicates an idle timeout, which led to an ACK being sent after the connection had already been removed from the connection pool by the firewall. Does Cassandra have something to handle such cases, such as a retry mechanism? > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > Attachments: cassandra-2.2.4.error.log > > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. 
> current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. > Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. 
Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_N
[jira] [Updated] (CASSANDRA-12031) "LEAK DETECTED" during incremental repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-12031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vin01 updated CASSANDRA-12031: -- Environment: Centos 6.6, x86_64, Cassandra 2.2.4 (was: Centos 6.6, x86_64) > "LEAK DETECTED" during incremental repairs > -- > > Key: CASSANDRA-12031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12031 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6.6, x86_64, Cassandra 2.2.4 >Reporter: vin01 >Priority: Minor > > I encountered some errors during an incremental repair session which look > like :- > ERROR [Reference-Reaper:1] 2016-06-19 03:28:35,884 Ref.java:187 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@2ce0fab3) to class > org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1513857473:Memory@[7f2d462191f0..7f2d46219510) > was not released before the reference was garbage collected > Should i be worried about these? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12031) "LEAK DETECTED" during incremental repairs
vin01 created CASSANDRA-12031: - Summary: "LEAK DETECTED" during incremental repairs Key: CASSANDRA-12031 URL: https://issues.apache.org/jira/browse/CASSANDRA-12031 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Environment: Centos 6.6, x86_64 Reporter: vin01 Priority: Minor I encountered some errors during an incremental repair session which look like :- ERROR [Reference-Reaper:1] 2016-06-19 03:28:35,884 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2ce0fab3) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1513857473:Memory@[7f2d462191f0..7f2d46219510) was not released before the reference was garbage collected Should i be worried about these? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11966) When SEPWorker assigned work, set thread name to match pool
[ https://issues.apache.org/jira/browse/CASSANDRA-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11966: - Resolution: Fixed Fix Version/s: 3.8 Status: Resolved (was: Patch Available) Looks good to me. +1 Committed as 7ede582fbd55dc959c3d3d26b4634b7472451e74 to trunk. Test results: ||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11966-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11966-trunk-testall/lastSuccessfulBuild/]|[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11966-trunk-dtest/lastSuccessfulBuild/] > When SEPWorker assigned work, set thread name to match pool > --- > > Key: CASSANDRA-11966 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11966 > Project: Cassandra > Issue Type: Improvement > Components: Observability >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Fix For: 3.8 > > Attachments: CASSANDRA-11966.patch, CASSANDRA-11966v3.patch > > > Currently in traces, logs, and stack traces you can't really associate the > thread name with the pool since it's just "SharedWorker-#". Calling setName > around the task could improve logging and tracing a little while being a > cheap operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
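A minimal standalone sketch of "calling setName around the task" is shown below. It is not the committed patch: the class `ThreadNamingSketch`, the `runNamed` helper, and the decision to restore the previous name afterwards are assumptions made for illustration. Only the `cassandra.set_sep_thread_name` system property and the `<pool name>-<worker id>` naming scheme are taken from the commit for this ticket.

```java
// Hypothetical sketch of naming a shared worker thread after the pool it serves.
public final class ThreadNamingSketch {
    // Opt-out switch, read once; defaults to enabled.
    private static final boolean SET_THREAD_NAME =
        Boolean.parseBoolean(System.getProperty("cassandra.set_sep_thread_name", "true"));

    // Runs the task on the current thread under a pool-specific name and
    // returns the name that was in effect while the task ran.
    public static String runNamed(String poolName, long workerId, Runnable task) {
        Thread thread = Thread.currentThread();
        String previous = thread.getName();
        if (SET_THREAD_NAME)
            thread.setName(poolName + "-" + workerId);
        try {
            task.run();
            return thread.getName();
        } finally {
            // Restoring the old name is a choice made for this sketch only.
            thread.setName(previous);
        }
    }
}
```

With the property left at its default, a log line or stack trace produced inside the task then shows a name such as `ReadStage-3` instead of a generic shared-worker name.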
[2/2] cassandra git commit: Ninja fix failing utest in trunk
Ninja fix failing utest in trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/471552e0 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/471552e0 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/471552e0 Branch: refs/heads/trunk Commit: 471552e0429267ae62c292487ddbb4d15a7daa49 Parents: 7ede582 Author: Robert Stupp Authored: Sat Jun 18 12:32:34 2016 +0200 Committer: Robert Stupp Committed: Sun Jun 19 13:18:28 2016 +0200 -- .../cassandra/cql3/validation/entities/UserTypesTest.java | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/471552e0/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java -- diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java index 9e9d0e2..7eae039 100644 --- a/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java +++ b/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java @@ -278,7 +278,7 @@ public class UserTypesTest extends CQLTester execute("INSERT INTO %s (x, y) VALUES(1, {'firstValue': {a: 1}})"); assertRows(execute("SELECT * FROM %s"), - row(1, map("firstValue", userType(1)))); + row(1, map("firstValue", userType("a", 1)))); flush(); @@ -286,14 +286,14 @@ public class UserTypesTest extends CQLTester execute("UPDATE %s SET y['secondValue'] = {a: 2, b: 2} WHERE x = 1"); assertRows(execute("SELECT * FROM %s"), - row(1, map("firstValue", userType(1), - "secondValue", userType(2, 2)))); + row(1, map("firstValue", userType("a", 1), + "secondValue", userType("a", 2, "b", 2)))); flush(); assertRows(execute("SELECT * FROM %s"), - row(1, map("firstValue", userType(1), - "secondValue", userType(2, 2)))); + row(1, map("firstValue", userType("a", 1), + "secondValue", userType("a", 2, "b", 2)))); } @Test
[1/2] cassandra git commit: When SEPWorker assigned work, set thread name to match pool
Repository: cassandra Updated Branches: refs/heads/trunk ea1739f72 -> 471552e04 When SEPWorker assigned work, set thread name to match pool patch by Chris Lohfink; reviewed by Robert Stupp for CASSANDRA-11966 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7ede582f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7ede582f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7ede582f Branch: refs/heads/trunk Commit: 7ede582fbd55dc959c3d3d26b4634b7472451e74 Parents: ea1739f Author: Chris Lohfink Authored: Sat Jun 18 12:25:05 2016 +0200 Committer: Robert Stupp Committed: Sun Jun 19 13:18:14 2016 +0200 -- CHANGES.txt | 1 + NEWS.txt | 2 ++ src/java/org/apache/cassandra/concurrent/SEPExecutor.java | 2 ++ src/java/org/apache/cassandra/concurrent/SEPWorker.java | 3 +++ 4 files changed, 8 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/7ede582f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index a74799c..3d8d511 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.8 + * When SEPWorker assigned work, set thread name to match pool (CASSANDRA-11966) * Add cross-DC latency metrics (CASSANDRA-11596) * Allow terms in selection clause (CASSANDRA-10783) * Add bind variables to trace (CASSANDRA-11719) http://git-wip-us.apache.org/repos/asf/cassandra/blob/7ede582f/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index aa2612d..d34613c 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -19,6 +19,8 @@ using the provided 'sstableupgrade' tool. New features + - Shared pool threads are now named according to the stage they are executing + tasks for. Thread names mentioned in traced queries change accordingly. - A new option has been added to cassandra-stress "-rate fixed={number}/s" that forces a scheduled rate of operations/sec over time. Using this, stress can accurately account for coordinated ommission from the stress process. 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/7ede582f/src/java/org/apache/cassandra/concurrent/SEPExecutor.java -- diff --git a/src/java/org/apache/cassandra/concurrent/SEPExecutor.java b/src/java/org/apache/cassandra/concurrent/SEPExecutor.java index 8b12b82..c87614b 100644 --- a/src/java/org/apache/cassandra/concurrent/SEPExecutor.java +++ b/src/java/org/apache/cassandra/concurrent/SEPExecutor.java @@ -34,6 +34,7 @@ public class SEPExecutor extends AbstractLocalAwareExecutorService private final SharedExecutorPool pool; public final int maxWorkers; +public final String name; private final int maxTasksQueued; private final SEPMetrics metrics; @@ -55,6 +56,7 @@ public class SEPExecutor extends AbstractLocalAwareExecutorService SEPExecutor(SharedExecutorPool pool, int maxWorkers, int maxTasksQueued, String jmxPath, String name) { this.pool = pool; +this.name = name; this.maxWorkers = maxWorkers; this.maxTasksQueued = maxTasksQueued; this.permits.set(combine(0, maxWorkers)); http://git-wip-us.apache.org/repos/asf/cassandra/blob/7ede582f/src/java/org/apache/cassandra/concurrent/SEPWorker.java -- diff --git a/src/java/org/apache/cassandra/concurrent/SEPWorker.java b/src/java/org/apache/cassandra/concurrent/SEPWorker.java index d7c21bc..b3f817a 100644 --- a/src/java/org/apache/cassandra/concurrent/SEPWorker.java +++ b/src/java/org/apache/cassandra/concurrent/SEPWorker.java @@ -30,6 +30,7 @@ import org.apache.cassandra.utils.JVMStabilityInspector; final class SEPWorker extends AtomicReference implements Runnable { private static final Logger logger = LoggerFactory.getLogger(SEPWorker.class); +private static final boolean SET_THREAD_NAME = Boolean.parseBoolean(System.getProperty("cassandra.set_sep_thread_name", "true")); final Long workerId; final Thread thread; @@ -89,6 +90,8 @@ final class SEPWorker extends AtomicReference implements Runnabl assigned = get().assigned; if (assigned == null) continue; +if (SET_THREAD_NAME) 
+Thread.currentThread().setName(assigned.name + "-" + workerId); task = assigned.tasks.poll(); // if we do have tasks assigned, nobody will change our state so we can simply set it to WORKING
[jira] [Commented] (CASSANDRA-11983) Migration task failed to complete
[ https://issues.apache.org/jira/browse/CASSANDRA-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338421#comment-15338421 ] Marcus Eriksson commented on CASSANDRA-11983: - Could you try with 3.0.x? > Migration task failed to complete > - > > Key: CASSANDRA-11983 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11983 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle > Environment: Docker / Kubernetes running > Linux cassandra-21 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) > x86_64 GNU/Linux > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) > Cassandra 3.5 installed from > deb-src http://www.apache.org/dist/cassandra/debian 35x main >Reporter: Chris Love > Attachments: cass.log > > > When nodes are bootstrapping I am getting multiple errors: "Migration task > failed to complete", from MigrationManager.java > The errors increase as more nodes are added to the ring, as I am creating a > ring of 1k nodes. > Cassandra yaml is here > https://github.com/k8s-for-greeks/gpmr/blob/3d50ff91a139b9c4a7a26eda0fb4dcf9a008fbed/pet-race-devops/docker/cassandra-debian/files/cassandra.yaml -- This message was sent by Atlassian JIRA (v6.3.4#6332)