[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-09-10 Thread Dimitar Dimitrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542961#comment-16542961
 ] 

Dimitar Dimitrov edited comment on CASSANDRA-13938 at 9/11/18 5:52 AM:
---

{quote}The problem is that when {{CompressedInputStream#position()}} is called, 
the new position might be in the middle of a buffer. We need to remember that 
offset, and subtract that value when updating {{current}} in 
{{#reBuffer(boolean)}}. The reason why is that those offset bytes get double 
counted on the first call to {{#reBuffer()}} after {{#position()}} as we add 
the {{buffer.position()}} to {{current}}. {{current}} already accounts for 
those offset bytes when {{#position()}} was called.
{quote}
[~jasobrown], isn't that equivalent (although a bit more complex) to just 
setting {{current}} to the last reached/read position in the stream when 
rebuffering? (i.e. {{current = streamOffset + buffer.position()}}).

I might be missing something, but the role of {{currentBufferOffset}} seems to 
be solely to "align" {{current}} and {{streamOffset}} the first time after a 
new section is started. Then {{current += buffer.position() - 
currentBufferOffset}} expands to {{current = -current- + buffer.position() + 
streamOffset - -current- }} which is the same as {{current = streamOffset + 
buffer.position()}}. After that first time, {{current}} naturally follows 
{{streamOffset}} without the need of any adjustment, but it seems more natural 
to express this as {{streamOffset + buffer.position()}} instead of the new 
expression or the old {{current + buffer.position()}}. To me, it's also a bit 
more intuitive and easier to understand (hopefully it's also right in addition 
to intuitive :)).

The equivalence above would hold true if {{current}} and {{streamOffset}} don't 
change their value in the meantime, but I think this is ensured by the 
well-ordered sequential fashion in which the decompressing and the offset 
bookkeeping functionality of {{CompressedInputStream}} happen in the thread 
running the corresponding {{StreamDeserializingTask}}.
 * The aforementioned well-ordered sequential fashion seems to be POSITION 
followed by 0-N times REBUFFER + DECOMPRESS, where the first REBUFFER might not 
update {{current}} with the above calculation in case {{current}} is already 
too far ahead (i.e. the new section is not starting within the current buffer).
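
The equivalence argued above can be checked with a tiny standalone model. The 
numbers and the class are hypothetical; the variable names mirror 
{{CompressedInputStream}}'s fields, but this is not the actual Cassandra code:

```java
// Hypothetical numbers modelling the two bookkeeping variants; the names
// mirror CompressedInputStream's fields, but this is not the actual class.
public class OffsetEquivalence {
    public static void main(String[] args) {
        long streamOffset = 4096;      // uncompressed offset of the buffer's start
        long current = 4196;           // position() landed 100 bytes into the buffer
        int bufferPosition = 250;      // bytes consumed when reBuffer() fires

        // Patch variant: remember the mid-buffer offset and subtract it.
        long currentBufferOffset = current - streamOffset;               // 100
        long patched = current + bufferPosition - currentBufferOffset;   // 4346

        // Proposed variant: re-derive the position from the stream offset.
        long simplified = streamOffset + bufferPosition;                 // 4346

        System.out.println(patched == simplified);   // prints true
    }
}
```

Both variants land on the same value, which is the equivalence claimed in the 
expansion above.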


was (Author: dimitarndimitrov):
{quote}The problem is that when {{CompressedInputStream#position()}} is called, 
the new position might be in the middle of a buffer. We need to remember that 
offset, and subtract that value when updating {{current}} in 
{{#reBuffer(boolean)}}. The reason why is that those offset bytes get double 
counted on the first call to {{#reBuffer()}} after {{#position()}} as we add 
the {{buffer.position()}} to {{current}}. {{current}} already accounts for 
those offset bytes when {{#position()}} was called.
{quote}
[~jasobrown], isn't that equivalent (although a bit more complex) to just 
setting {{current}} to the last reached/read position in the stream when 
rebuffering? (i.e. {{current = streamOffset + buffer.position()}}).

I might be missing something, but the role of {{currentBufferOffset}} seems to 
be solely to "align" {{current}} and {{streamOffset}} the first time after a 
new section is started. Then {{current += buffer.position() - 
currentBufferOffset}} expands to {{current = -current- + buffer.position() + 
streamOffset - -current- }} which is the same as {{current = streamOffset + 
buffer.position()}}. After that first time, {{current}} naturally follows 
{{streamOffset}} without the need of any adjustment, but it seems more natural 
to express this as {{streamOffset + buffer.position()}} instead of the new 
expression or the old {{current + buffer.position()}}. To me, it's also a bit 
more intuitive and easier to understand (hopefully it's also right in addition 
to intuitive :)).

The equivalence above would hold true if {{current}} and {{streamOffset}} don't 
change their value in the meantime, but I think this is ensured by the 
well-ordered sequential fashion in which the decompressing and the offset 
bookkeeping functionality of {{CompressedInputStream}} happen in the thread 
running the corresponding {{StreamDeserializingTask}}.
 * The aforementioned well-ordered sequential fashion seems to be POSITION 
followed by 0-N times REBUFFER + DECOMPRESS, where the first REBUFFER might not 
update {{current}} with the above calculation in case {{current}} is already 
too far ahead (i.e. the new section is not starting within the current buffer).

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938

[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client

2018-09-10 Thread Cameron Zemek (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610127#comment-16610127
 ] 

Cameron Zemek commented on CASSANDRA-14715:
---

I should also point out this means that the timeouts don't get captured in the 
read timeout metric either, because the timeout occurs on the close of the 
PartitionIterator returned by StorageProxy:read, where the timeouts are caught 
(see readRegular).

> Read repairs can result in bogus timeout errors to the client
> -
>
> Key: CASSANDRA-14715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Cameron Zemek
>Priority: Minor
>
> In RepairMergeListener:close() it does the following:
>  
> {code:java}
> try
> {
> FBUtilities.waitOnFutures(repairResults, 
> DatabaseDescriptor.getWriteRpcTimeout());
> }
> catch (TimeoutException ex)
> {
> // We got all responses, but timed out while repairing
> int blockFor = consistency.blockFor(keyspace);
> if (Tracing.isTracing())
> Tracing.trace("Timed out while read-repairing after receiving all {} 
> data and digest responses", blockFor);
> else
> logger.debug("Timeout while read-repairing after receiving all {} 
> data and digest responses", blockFor);
> throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
> }
> {code}
> This propagates up and gets sent to the client, and we have customers get 
> confused because they see timeouts for CL ALL requiring ALL replicas even 
> though they have read_repair_chance = 0 and are using a LOCAL_* CL.
> At minimum, I suggest that instead of using the consistency level of the 
> DataResolver (which is always ALL with read repairs) for the timeout, it use 
> repairResults.size(), i.e. blockFor = repairResults.size(). But saying it 
> received _blockFor - 1_ is still bogus. Fixing that would require more 
> changes. I was thinking maybe like so:
>  
> {code:java}
> public static void waitOnFutures(List<AsyncOneResponse> results, long ms, 
> MutableInt counter) throws TimeoutException
> {
> for (AsyncOneResponse result : results)
> {
> result.get(ms, TimeUnit.MILLISECONDS);
> counter.increment();
> }
> }
> {code}
> 
> Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says 
> _blockFor - 1_ for how many were received, which is also bogus.
>  
> Steps used to reproduce was modify RepairMergeListener:close() to always 
> throw timeout exception.  With schema:
> {noformat}
> CREATE KEYSPACE weather WITH replication = {'class': 
> 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'}  AND durable_writes = true;
> CREATE TABLE weather.city (
> cityid int PRIMARY KEY,
> name text
> ) WITH bloom_filter_fp_chance = 0.01
> AND dclocal_read_repair_chance = 0.0
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {noformat}
> Then using the following steps:
>  # ccm node1 cqlsh
>  # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
>  # exit;
>  # ccm node1 flush
>  # ccm node1 stop
>  # rm -rf 
> ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-*
>  # remove the sstable with the insert
>  # ccm node1 start
>  # ccm node1 cqlsh
>  # CONSISTENCY LOCAL_QUORUM;
>  # select * from weather.city where cityid = 1;
> You get result of:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
> for replica nodes' responses] message="Operation timed out - received only 5 
> responses." info={'received_responses': 5, 'required_responses': 6, 
> 'consistency': 'ALL'}{noformat}
> But was expecting:
> {noformat}
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
> for replica nodes' responses] message="Operation timed out - received only 1 
> responses." info={'received_responses': 1, 'required_responses': 2, 
> 'consistency': 'LOCAL_QUORUM'}{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14702) Cassandra Write failed even when the required nodes to Ack(consistency) are up.

2018-09-10 Thread Rohit Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610103#comment-16610103
 ] 

Rohit Singh commented on CASSANDRA-14702:
-

Any update?

> Cassandra Write failed even when the required nodes to Ack(consistency) are 
> up.
> ---
>
> Key: CASSANDRA-14702
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14702
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rohit Singh
>Priority: Major
>
> Hi,
> We have the following configuration in our project for Cassandra:
> Total nodes in cluster: 5
> Replication factor: 3
> Consistency: LOCAL_QUORUM
> We get a WriteTimeoutException from Cassandra even when 2 nodes are up. Why 
> does the stack trace say that 3 replicas were required when the consistency 
> level only requires 2?
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency LOCAL_QUORUM (3 replica were required but 
> only 2 acknowledged the write)
>  at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:59)
>  at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
>  at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:289)
>  at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:269)
>  at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
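
For reference, the number of acknowledgements a coordinator blocks for at 
LOCAL_QUORUM follows from the per-datacenter replication factor. A minimal 
sketch of that standard arithmetic (a simplified stand-in, not Cassandra's 
actual ConsistencyLevel code):

```java
// Standard quorum arithmetic as documented for Cassandra consistency levels;
// a simplified stand-in, not org.apache.cassandra.db.ConsistencyLevel itself.
public class QuorumMath {
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;   // integer division
    }

    public static void main(String[] args) {
        int localRf = 3;   // RF in the local datacenter, as in the report above
        System.out.println("LOCAL_QUORUM blocks for " + quorumFor(localRf));
        // prints: LOCAL_QUORUM blocks for 2
    }
}
```

With RF 3 per datacenter, LOCAL_QUORUM should block for 2 acknowledgements, 
which is why the "3 replica were required" in the driver message looks 
inconsistent with the configured consistency level.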






[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610105#comment-16610105
 ] 

Jason Brown commented on CASSANDRA-13938:
-

[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails; this is due to the need to 
reset some value, like the currentBufferOffset, after rebuffering). 
However, your observation about simplifying this patch (in particular, 
eliminating {{currentBufferOffset}}) made me reconsider the needs of this 
class. Basically, we just need to correctly track the streamOffset for the 
current buffer, and that's all. When I ported this class from 3.11, I 
over-complicated the offsets and counters in the first version of this class 
(committed with CASSANDRA-12229), and then confused it again (while resolving 
the error) with the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)
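
The simplification described above can be illustrated with a hypothetical 
reader that tracks only the uncompressed offset of the current buffer's start; 
any absolute position is then just streamOffset plus the buffer position. This 
is a sketch under that assumption, not the committed patch:

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the simplification described above: track only
// the stream offset of the current buffer's start. Not the actual
// CompressedInputStream from the patch.
public class SimplifiedReader {
    private long streamOffset;                        // uncompressed offset of buffer[0]
    private ByteBuffer buffer = ByteBuffer.allocate(0);

    long absolutePosition() {
        return streamOffset + buffer.position();      // no extra counters needed
    }

    // On rebuffer, the new buffer starts where the old one ended.
    void reBuffer(ByteBuffer next) {
        streamOffset += buffer.limit();
        buffer = next;
    }

    public static void main(String[] args) {
        SimplifiedReader r = new SimplifiedReader();
        r.reBuffer(ByteBuffer.allocate(64));          // first chunk: offsets 0..63
        r.buffer.position(10);                        // consume 10 bytes
        System.out.println(r.absolutePosition());     // prints 10
        r.reBuffer(ByteBuffer.allocate(64));          // next chunk starts at 64
        System.out.println(r.absolutePosition());     // prints 64
    }
}
```

Because the position is always re-derived from streamOffset, there is no 
mid-buffer offset to remember and nothing to "double count" after a rebuffer.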

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Critical
> Fix For: 4.x
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
> population: uniform(1..5000) # 50 million records available
>   - name: ts
> cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
> population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
> cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
> cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last 

[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610105#comment-16610105
 ] 

Jason Brown edited comment on CASSANDRA-13938 at 9/11/18 5:01 AM:
--

[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails; this is due to the need to 
reset some value, like the currentBufferOffset, after rebuffering). 
However, your observation about simplifying this patch (in particular, 
eliminating {{currentBufferOffset}}) made me reconsider the needs of this 
class. Basically, we just need to correctly track the streamOffset for the 
current buffer, and that's all. When I ported this class from 3.11, I 
over-complicated the offsets and counters in the first version of this class 
(committed with CASSANDRA-12229), and then confused it again (while resolving 
the error) with the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)


was (Author: jasobrown):
[~dimitarndimitrov], Thanks for your comments, and apologies for the late 
response.

While your proposed simplification indeed clarifies the logic, unfortunately it 
doesn't resolve the bug (my dtest still fails; this is due to the need to 
reset some value, like the currentBufferOffset, after rebuffering). 
However, your observation about simplifying this patch (in particular, 
eliminating {{currentBufferOffset}}) made me reconsider the needs of this 
class. Basically, we just need to correctly track the streamOffset for the 
current buffer, and that's all. When I ported this class from 3.11, I 
over-complicated the offsets and counters in the first version of this class 
(committed with CASSANDRA-12229), and then confused it again (while resolving 
the error) with the first patch.

In short: as long as I correctly calculate streamOffset, that should satisfy 
the needs for the class. Thus, I eliminated both {{current}} and 
{{currentBufferOffset}}, and the result is clearer and correct.

I've pushed a cleaned up branch (which has been rebased to trunk). Please note 
that, as with the first patch, the majority of this patch is refactoring to 
clean up the class in general. I've also updated my dtest patch as my version 
required a stress profile (based on [~zznate]'s original) to be committed, as 
well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as 
before, I'm unable to get that to fail on trunk.)

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Critical
> Fix For: 4.x
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   

[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610049#comment-16610049
 ] 

Jordan West commented on CASSANDRA-14714:
-

It would be nice to have a workaround for this that doesn’t involve needing 
Java 11 installed on the machine. Is that being tracked as part of 
CASSANDRA-14712? I came across this while trying to run {{mvn-install}}. FWIW, 
at least for {{mvn-install}}, removing this line fixes it: 
[https://github.com/apache/cassandra/blob/trunk/build.xml#L1069].

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
>  Labels: Java11
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Created] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client

2018-09-10 Thread Cameron Zemek (JIRA)
Cameron Zemek created CASSANDRA-14715:
-

 Summary: Read repairs can result in bogus timeout errors to the 
client
 Key: CASSANDRA-14715
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14715
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
Reporter: Cameron Zemek


In RepairMergeListener:close() it does the following:

 
{code:java}
try
{
FBUtilities.waitOnFutures(repairResults, 
DatabaseDescriptor.getWriteRpcTimeout());
}
catch (TimeoutException ex)
{
// We got all responses, but timed out while repairing
int blockFor = consistency.blockFor(keyspace);
if (Tracing.isTracing())
Tracing.trace("Timed out while read-repairing after receiving all {} 
data and digest responses", blockFor);
else
logger.debug("Timeout while read-repairing after receiving all {} data 
and digest responses", blockFor);

throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
}
{code}
This propagates up and gets sent to the client, and we have customers get 
confused because they see timeouts for CL ALL requiring ALL replicas even 
though they have read_repair_chance = 0 and are using a LOCAL_* CL.

At minimum, I suggest that instead of using the consistency level of the 
DataResolver (which is always ALL with read repairs) for the timeout, it use 
repairResults.size(), i.e. blockFor = repairResults.size(). But saying it 
received _blockFor - 1_ is still bogus. Fixing that would require more changes. 
I was thinking maybe like so:

 
{code:java}
public static void waitOnFutures(List<AsyncOneResponse> results, long ms, 
MutableInt counter) throws TimeoutException
{
for (AsyncOneResponse result : results)
{
result.get(ms, TimeUnit.MILLISECONDS);
counter.increment();
}
}
{code}
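
The proposal above can be sketched with standard java.util.concurrent types 
standing in for Cassandra's classes: {{CompletableFuture}} in place of 
{{AsyncOneResponse}}, and a plain {{int[]}} cell in place of Commons' 
{{MutableInt}}. The point of the design is that the count of completed futures 
survives the {{TimeoutException}}:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CountingWait {
    // Sketch of the proposed counting overload (hypothetical signature):
    // the caller's counter is incremented per completed future, so on timeout
    // it holds the number of responses actually received.
    static void waitOnFutures(List<CompletableFuture<Void>> results, long ms, int[] counter)
            throws TimeoutException, InterruptedException, ExecutionException {
        for (CompletableFuture<Void> result : results) {
            result.get(ms, TimeUnit.MILLISECONDS);
            counter[0]++;
        }
    }

    public static void main(String[] args) throws Exception {
        List<CompletableFuture<Void>> results = List.of(
                CompletableFuture.completedFuture(null),
                new CompletableFuture<>());           // never completes -> timeout
        int[] received = {0};
        try {
            waitOnFutures(results, 50, received);
        } catch (TimeoutException e) {
            // A ReadTimeoutException could now report received[0]
            // instead of the bogus blockFor - 1.
            System.out.println("received=" + received[0]);   // prints received=1
        }
    }
}
```

Passing a mutable counter (rather than a return value) is what lets the tally 
escape the method even though the timeout aborts it partway through the list.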
 

 

 

Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says 
_blockFor - 1_ for how many were received, which is also bogus.

 

Steps used to reproduce was modify RepairMergeListener:close() to always throw 
timeout exception.  With schema:
{noformat}
CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '3', 'dc2': '3'}  AND durable_writes = true;

CREATE TABLE weather.city (
cityid int PRIMARY KEY,
name text
) WITH bloom_filter_fp_chance = 0.01
AND dclocal_read_repair_chance = 0.0
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
{noformat}
Then using the following steps:
 # ccm node1 cqlsh
 # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
 # exit;
 # ccm node1 flush
 # ccm node1 stop
 # rm -rf 
~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-*
 # remove the sstable with the insert
 # ccm node1 start
 # ccm node1 cqlsh
 # CONSISTENCY LOCAL_QUORUM;
 # select * from weather.city where cityid = 1;

You get result of:
{noformat}
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
for replica nodes' responses] message="Operation timed out - received only 5 
responses." info={'received_responses': 5, 'required_responses': 6, 
'consistency': 'ALL'}{noformat}
But was expecting:
{noformat}
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
for replica nodes' responses] message="Operation timed out - received only 1 
responses." info={'received_responses': 1, 'required_responses': 2, 
'consistency': 'LOCAL_QUORUM'}{noformat}






[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609973#comment-16609973
 ] 

Jason Brown commented on CASSANDRA-14346:
-

Somehow this got marked as Ready to Commit; switched back to Patch Available.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: Patch Available  (was: Awaiting Feedback)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: Awaiting Feedback  (was: In Progress)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14346:

Status: In Progress  (was: Ready to Commit)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>






[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-09-10 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated CASSANDRA-14346:
--
Status: Ready to Commit  (was: Patch Available)

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.x
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>






[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread Dinesh Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-14503:
-
Reviewers: Dinesh Joshi

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race prone to me, in particular 
> on those two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will be actually 
> cancelled (as they might be already running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest to 
> refactor it into a single-thread model, where all connection state changing 
> actions are enqueued on a single threaded scheduler, so that state 
> transitions can be clearly defined and checked.
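The suggested single-threaded model could be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's actual {{OutboundMessagingConnection}} API: every state-changing action becomes a task on one single-threaded executor, so {{#finishHandshake()}} and {{#close()}} can never interleave, and a stale timeout or retry task simply observes the closed state and becomes a no-op.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: all connection state transitions run on a single
// thread, so close() and finishHandshake() cannot race. Names are
// illustrative only, not Cassandra's actual OutboundMessagingConnection API.
class SingleThreadedConnectionState
{
    enum State { CREATING, READY, CLOSED }

    private final ExecutorService stateExecutor = Executors.newSingleThreadExecutor();
    private State state = State.CREATING; // only read/written on stateExecutor

    void finishHandshake()
    {
        stateExecutor.execute(() -> {
            if (state == State.CLOSED)
                return; // close() already ran; nothing to null out, no NPE
            state = State.READY;
        });
    }

    void close()
    {
        stateExecutor.execute(() -> {
            state = State.CLOSED;
            // release the channel, clear the backlog, cancel timeouts -- all here
        });
        stateExecutor.shutdown();
    }

    // Synchronously observe the state (useful for tests/monitoring).
    State awaitState()
    {
        try { return stateExecutor.submit(() -> state).get(); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

Under this model {{connectionTimeoutFuture}} and {{connectionRetryFuture}} would not need to be cancelled reliably; a late-running task just checks the state on the same thread and exits.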






[jira] [Updated] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14714:

Labels: Java11  (was: )

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
>  Labels: Java11
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Updated] (CASSANDRA-14712) Cassandra 4.0 packaging support

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14712:

Labels: Java11  (was: )

> Cassandra 4.0 packaging support
> ---
>
> Key: CASSANDRA-14712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Stefan Podkowinski
>Priority: Major
>  Labels: Java11
> Fix For: 4.x
>
>
> Currently it's not possible to build any native packages (.deb/.rpm) for 
> trunk.
> cassandra-builds - docker/*-image.docker
>  * Add Java11 to debian+centos build image
>  * (packaged ant scripts won't work with Java 11 on centos, so we may have to 
> install ant from tarballs)
> cassandra-builds - docker/build-*.sh
>  * set JAVA8_HOME to Java8
>  * set JAVA_HOME to Java11 (4.0) or Java8 (<4.0)
> cassandra - redhat/cassandra.spec
>  * Check if patches still apply after CASSANDRA-14707
>  * Add fqltool as %files
> We may also have to change the version handling in build.xml or build-*.sh, 
> depending how we plan to release packages during beta, or if we plan to do so 
> at all before GA.






[jira] [Resolved] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Michael Shuler (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Shuler resolved CASSANDRA-14714.

Resolution: Not A Problem

Thanks for the Jira pointer.

Local fix and I can build tar.gz artifacts successfully:
{noformat}
export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
export JAVA8_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64{noformat}

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Michael Shuler (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609740#comment-16609740
 ] 

Michael Shuler commented on CASSANDRA-14714:


{noformat}
  - *Experimental* support for Java 11 has been added. JVM options that differ 
between or are 
specific for Java 8 and 11 have been moved from jvm.options into 
jvm8.options and jvm11.options. 
IMPORTANT: Running C* on Java 11 is *experimental* and do it at your own 
risk. 
Compilation recommendations: configure Java 11 SDK via JAVA_HOME and Java 8 
SDK via JAVA8_HOME. 
Release builds require Java 11 + Java 8. Development builds can use Java 8 
without 11.
{noformat}
We'll see what I can work out here locally with some env vars.

I found this issue when checking on linking to artifact builds in Jenkins. 
Basic Jenkins slave usage means only one JDK version is available.

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Stefan Podkowinski (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609722#comment-16609722
 ] 

Stefan Podkowinski commented on CASSANDRA-14714:


I've tried to wrap up some of the 4.0 related build/packaging issues in 
CASSANDRA-14712

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Michael Shuler (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609717#comment-16609717
 ] 

Michael Shuler commented on CASSANDRA-14714:


{noformat}
((6ba2fb9395...)|BISECTING)mshuler@hana:~/git/cassandra$ git bisect bad 
6ba2fb9395226491872b41312d978a169f36fcdb is the first bad commit 
commit 6ba2fb9395226491872b41312d978a169f36fcdb 
Author: Robert Stupp  
Date:   Tue Sep 12 20:04:30 2017 +0200 

   Make C* compile and run on Java 11 and Java 8 

   patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9608
{noformat}

> `ant artifacts` broken on trunk (4.0); creates no tar artifacts
> ---
>
> Key: CASSANDRA-14714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Priority: Blocker
> Fix For: 4.0
>
>
> `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
> Additionally, the target does not exit non-zero, so the result is:
> {noformat}
> <...>
> artifacts:
> BUILD SUCCESSFUL
> {noformat}






[jira] [Created] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts

2018-09-10 Thread Michael Shuler (JIRA)
Michael Shuler created CASSANDRA-14714:
--

 Summary: `ant artifacts` broken on trunk (4.0); creates no tar 
artifacts
 Key: CASSANDRA-14714
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14714
 Project: Cassandra
  Issue Type: Bug
Reporter: Michael Shuler
 Fix For: 4.0


`ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. 
Additionally, the target does not exit non-zero, so the result is:

{noformat}
<...>
artifacts:

BUILD SUCCESSFUL
{noformat}






[jira] [Commented] (CASSANDRA-14289) Document sstable tools

2018-09-10 Thread Valerie Parham-Thompson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609668#comment-16609668
 ] 

Valerie Parham-Thompson commented on CASSANDRA-14289:
-

I've completed these documents, and am getting peer review.

> Document sstable tools
> --
>
> Key: CASSANDRA-14289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14289
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Hannu Kröger
>Priority: Major
> Attachments: gen-sstable-docs.py, sstabledocs.tar.gz
>
>
> Following tools are missing in the documentation of cassandra tools on the 
> documentation site (http://cassandra.apache.org/doc/latest/tools/index.html):
>  * sstabledump
>  * sstableexpiredblockers
>  * sstablelevelreset
>  * sstableloader
>  * sstablemetadata
>  * sstableofflinerelevel
>  * sstablerepairedset
>  * sstablescrub
>  * sstablesplit
>  * sstableupgrade
>  * sstableutil
>  * sstableverify






[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation

2018-09-10 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609614#comment-16609614
 ] 

Marcus Eriksson commented on CASSANDRA-3200:


While reviewing CASSANDRA-14693 I realised that the dtests for this were never 
committed; could you have a quick look, [~bdeggleston]?
https://github.com/krummas/cassandra-dtest/commits/marcuse/3200
and circle run:
https://circleci.com/gh/krummas/cassandra/tree/marcuse%2Ffor_3200_dtests

> Repair: compare all trees together (for a given range/cf) instead of by pair 
> in isolation
> -
>
> Key: CASSANDRA-3200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: repair
> Fix For: 4.0
>
>
> Currently, repair compares merkle trees pairwise, in isolation from any other 
> tree. Concretely, that means if I have three nodes A, B and C 
> (RF=3) with A and B in sync, but C having some range r inconsistent with both 
> A and B (since those are consistent), we will do the following transfers of r: 
> A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know 
> which of A or C is more up to date. However, the transfer B -> C is 
> useless provided we do A -> C, if A and B are in sync. Not doing that transfer 
> would be a 25% improvement in that case. With RF=5 and only one node 
> inconsistent with all the others, that's almost a 40% improvement, etc...
> Given that this situation of one node not in sync while the others are is 
> probably fairly common (one node died, so it is behind), this could be a fair 
> improvement over what is transferred. In the case where we use repair to 
> completely rebuild a node, this will be a dramatic improvement, because it 
> will avoid the rebuilt node getting RF times the data it should get.
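The quoted percentages check out with a little arithmetic: with one node out of sync with the other RF-1 (mutually consistent) replicas, pairwise repair streams 2*(RF-1) times, while skipping the redundant transfers leaves 1 + (RF-1). A throwaway sketch of that calculation (not Cassandra code):

```java
// Throwaway arithmetic check, not Cassandra code: transfer counts when one
// replica is inconsistent with the other (mutually in-sync) RF-1 replicas.
class RepairTransferSavings
{
    // Pairwise repair: the odd node streams both ways with every other replica.
    static int pairwise(int rf) { return 2 * (rf - 1); }

    // Skipping redundant streams: receive from one in-sync replica, but still
    // send the odd node's own version to every other replica.
    static int optimized(int rf) { return 1 + (rf - 1); }

    static double savings(int rf) { return 1.0 - (double) optimized(rf) / pairwise(rf); }

    public static void main(String[] args)
    {
        System.out.printf("RF=3: %.1f%% saved%n", 100 * savings(3)); // 25.0% saved
        System.out.printf("RF=5: %.1f%% saved%n", 100 * savings(5)); // 37.5% saved
    }
}
```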






[jira] [Commented] (CASSANDRA-14705) ReplicaLayout follow-up

2018-09-10 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609615#comment-16609615
 ] 

Ariel Weisberg commented on CASSANDRA-14705:


[~ifesdjeen] that branch you linked to in your PR is the wrong one, it's 14705

> ReplicaLayout follow-up
> ---
>
> Key: CASSANDRA-14705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14705
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Benedict
>Priority: Major
>
> Clarify the new {{ReplicaLayout}} code, separating it into {{ReplicaPlan}} (for 
> what we want to do) and {{ReplicaLayout}} (for what we know about the 
> cluster), with well defined semantics (and comments in the rare cases those 
> semantics are weird)
> Found and fixed some bugs:
>   - {{commitPaxos}} was using only live nodes, when needed to include down
>   - We were not writing to pending transient replicas
>   - On write, we were not hinting to full nodes with transient 
> replication enabled (since we filtered to {{liveOnly}}, in order to include 
> our transient replicas above {{blockFor}})
> - If we speculated, in {{maybeSendAdditionalReads}} (in read repair) 
> we would only consult the same node we had speculated to.  This also applied 
> to {{maybeSendAdditionalWrites}} - and this issue was also true pre-TR.






[jira] [Commented] (CASSANDRA-14693) Follow-up: allow transient node to serve as repair coordinator

2018-09-10 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609471#comment-16609471
 ] 

Marcus Eriksson commented on CASSANDRA-14693:
-

The new class hierarchy looks great. Just a minor comment: we could remove 
the parameter to {{startSync}} and instead make {{private final 
List<Range<Token>> rangesToSync;}} protected and use that; it makes it a bit 
clearer, since we never call {{startSync}} with anything else.
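A minimal sketch of that suggestion, with simplified stand-in types rather than Cassandra's actual repair classes: the parameter disappears and subclasses read the protected field directly.

```java
import java.util.List;

// Hypothetical sketch of the suggested refactor: startSync() loses its
// parameter and subclasses read the protected rangesToSync field instead.
// Types are simplified stand-ins, not Cassandra's actual repair classes.
abstract class SyncTask
{
    protected final List<String> rangesToSync; // stand-in for the real range type

    SyncTask(List<String> rangesToSync) { this.rangesToSync = rangesToSync; }

    // Before: startSync(List<...> ranges) -- but it was only ever called with
    // rangesToSync. After: no parameter, so no other argument is possible.
    abstract void startSync();
}

class LoggingSyncTask extends SyncTask
{
    LoggingSyncTask(List<String> ranges) { super(ranges); }

    @Override
    void startSync()
    {
        System.out.println("syncing " + rangesToSync.size() + " ranges");
    }
}
```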

> Follow-up: allow transient node to serve as repair coordinator
> --
>
> Key: CASSANDRA-14693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14693
> Project: Cassandra
>  Issue Type: Task
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Minor
>
> Allow transient node to serve as a coordinator. 
> |[trunk|https://github.com/apache/cassandra/pull/257]|[utest|https://circleci.com/gh/ifesdjeen/cassandra/329]|[dtest|https://circleci.com/gh/ifesdjeen/cassandra/330]|[dtest-novnode|https://circleci.com/gh/ifesdjeen/cassandra/328]|






[jira] [Updated] (CASSANDRA-14549) Transient Replication: support logged batches

2018-09-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-14549:
---
Labels: pull-request-available  (was: )

> Transient Replication: support logged batches
> -
>
> Key: CASSANDRA-14549
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14549
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Blake Eggleston
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609190#comment-16609190
 ] 

Saurabh commented on CASSANDRA-14711:
-

[~jasobrown] - Thanks for your response. We are in the process of planning the 
upgrade, but as it is a production environment it will take time.

We have started seeing this issue just a few days back and are trying to fix it. 
There were no application code or DB changes.

As per the hs_err log file (attached), I can see a lot of threads in BLOCKED 
status and also 100% used HEAP regions. I have tried increasing the heap settings:

-Xms - 4G -> 8G -> 16G

-Xmx - 4G -> 8G -> 16G

but this did not help much; it just delayed the crash. Something is piling up 
in memory, but the Cassandra logs do not show any OOM errors either.

> Apache Cassandra 3.2 crashing with exception 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom
> 
>
> Key: CASSANDRA-14711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Saurabh
>Priority: Major
> Attachments: hs_err_pid32069.log
>
>
> Hi Team,
> I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12..
> Issue:
> Cassandra is continuously crashing with generating an HEAP dump log. There 
> are no errors reported in system.log OR Debug.log.
> Exception in hs_err_PID.log:
>  # Problematic frame:
>  # J 8283 C2 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
>  (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
> Java Threads: ( => current thread )
>  0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
> [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
>  0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
> [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
>  0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon 
> [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
>  0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
> [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
>  0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
>  :
>  :
>  lot of threads in BLOCKED status
> Other Threads:
>  0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
> [id=32098]
>  0x2b7d38fa9de0 WatcherThread [stack: 
> 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]
> VM state:not at safepoint (normal execution)
> VM Mutex/Monitor currently owned by a thread: None
> Heap:
>  garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
> 0x0003c0404000, 0x0007c000)
>  region size 4096K, 785 young (3215360K), 55 survivors (225280K)
>  Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
>  class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K
> Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
> HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
> PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
>  AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
> 100% used [0x0003c000, 0x0003c040)
>  AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c040, 0x0003c080)
>  AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c080, 0x0003c0c0)
>  AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
> 100% used [0x0003c0c0, 0x0003c100)
>  AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
> 100% used [0x0003c100, 0x0003c140)
>  AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
> 100% used [0x0003c140, 0x0003c180)
>  :
>  :
>  lot of such messages






[jira] [Resolved] (CASSANDRA-13348) Duplicate tokens after bootstrap

2018-09-10 Thread Stefan Podkowinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski resolved CASSANDRA-13348.

Resolution: Cannot Reproduce

> Duplicate tokens after bootstrap
> 
>
> Key: CASSANDRA-13348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13348
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom van der Woerdt
>Assignee: Dikang Gu
>Priority: Blocker
> Fix For: 3.0.x
>
>
> This one is a bit scary, and probably results in data loss. After a bootstrap 
> of a few new nodes into an existing cluster, two new nodes have chosen some 
> overlapping tokens.
> In fact, of the 256 tokens chosen, 51 tokens were already in use on the other 
> node.
> Node 1 log :
> {noformat}
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 
> StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 
> StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 
> StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 
> StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 
> StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 
> StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,564 
> TokenAllocation.java:61 - Selected tokens [, 2959334889475814712, 
> 3727103702384420083, 7183119311535804926, 6013900799616279548, 
> -1222135324851761575, 1645259890258332163, -1213352346686661387, 
> 7604192574911909354]
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 
> TokenAllocation.java:65 - Replicated node load in datacentre before 
> allocation max 1.00 min 1.00 stddev 0.
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 
> TokenAllocation.java:66 - Replicated node load in datacentre after allocation 
> max 1.00 min 1.00 stddev 0.
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 
> TokenAllocation.java:70 - Unexpected growth in standard deviation after 
> allocation.
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:44,150 
> StorageService.java:1160 - JOINING: sleeping 3 ms for pending range setup
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:43:14,151 
> StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> Node 2 log:
> {noformat}
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:51,937 
> StorageService.java:971 - Joining ring by operator request
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 
> StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 
> StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 
> StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 
> StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 
> StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 
> StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,630 
> TokenAllocation.java:61 - Selected tokens [.., 2890709530010722764, 
> -2416006722819773829, -5820248611267569511, -5990139574852472056, 
> 1645259890258332163, 9135021011763659240, -5451286144622276797, 
> 7604192574911909354]
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,794 
> TokenAllocation.java:65 - Replicated node load in datacentre before 
> allocation max 1.02 min 0.98 stddev 0.
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,795 
> TokenAllocation.java:66 - Replicated node load in datacentre after allocation 
> max 1.00 min 1.00 stddev 0.
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:53,149 
> StorageService.java:1160 - JOINING: sleeping 3 ms for pending range setup
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:56:23,149 
> StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> eg. 7604192574911909354 has been chosen by both.
> The joins were eight days apart, so I don't 

[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14503:

Fix Version/s: 4.0
   Status: Patch Available  (was: Open)

> Internode connection management is race-prone
> -
>
> Key: CASSANDRA-14503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14503
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Jason Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following CASSANDRA-8457, internode connection management has been rewritten 
> to rely on Netty, but the new implementation in 
> {{OutboundMessagingConnection}} seems quite race-prone to me, in particular 
> in these two cases:
> * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such a case the 
> former could run into an NPE if the latter nulls the {{channelWriter}} (but 
> this is just an example, other conflicts might happen).
> * Connection timeout and retry racing with state-changing methods: 
> {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when 
> handshaking or closing, but there's no guarantee those will actually be 
> cancelled (as they might already be running), so they might end up changing 
> the connection state concurrently with other methods (i.e. by unexpectedly 
> closing the channel or clearing the backlog).
> Overall, the thread safety of {{OutboundMessagingConnection}} is very 
> difficult to assess given the current implementation: I would suggest 
> refactoring it into a single-thread model, where all connection state-changing 
> actions are enqueued on a single-threaded scheduler, so that state 
> transitions can be clearly defined and checked.
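The suggested single-thread model could look roughly like this. This is a minimal hypothetical sketch, not Cassandra code: the class, the enum, and the method bodies are mine; only the method names mirror the ones mentioned above. The point is that both transitions run on the same single-threaded scheduler, so {{#finishHandshake()}} observes a completed {{#close()}} instead of racing with it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: every state-changing action runs as a task on one
// single-threaded scheduler, so transitions can never interleave mid-flight.
public class SingleThreadedConnectionSketch {
    public enum State { NOT_READY, HANDSHAKING, READY, CLOSED }

    private final ScheduledExecutorService stateLoop =
            Executors.newSingleThreadScheduledExecutor();
    // volatile only so peekState() sees the final value; mutated solely on stateLoop
    private volatile State state = State.NOT_READY;

    public void finishHandshake() {
        stateLoop.execute(() -> {
            if (state == State.CLOSED)   // close() may already have run; just bail
                return;
            state = State.READY;
        });
    }

    public void closeAndAwait() throws InterruptedException {
        stateLoop.execute(() -> state = State.CLOSED);
        stateLoop.shutdown();            // queued transitions still run before exit
        stateLoop.awaitTermination(5, TimeUnit.SECONDS);
    }

    public State peekState() {           // for inspection after the loop has drained
        return state;
    }
}
```

Because all transitions are serialized on one thread, no locking or null-checking dance is needed inside the tasks themselves.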



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609099#comment-16609099
 ] 

Jason Brown commented on CASSANDRA-14503:
-

Patch available here:

||14503||
|[branch|https://github.com/jasobrown/cassandra/tree/14503]|
|[utests  
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503]|
||

Additionally, I've [created a Pull 
Request|https://github.com/apache/cassandra/pull/264] for review, as well.

Note: this patch will need to be rebased when CASSANDRA-13630 is committed, and 
incorporate the changes to ChannelWriter for large messages, but that should not 
affect this patch much (I've been keeping that in mind as I worked on this).

- OutboundMessagingConnection changes 
-- All producer threads queue messages into the backlog, and messages are only 
consumed by a task on a fixed thread (the event loop). Producers will contend 
to schedule the consumer, but have no further involvement in sending a message 
(unlike the current implementation).
-- All netty-related activity (setting up a remote connection, 
connection-related callbacks and timeouts, consuming from the backlog, and 
writing to the channel and associated callbacks) is handled on the event loop. 
OutboundMessagingConnection gets a reference to an event loop in its 
constructor, and uses that for the duration of its lifetime.
-- Finally forward-ported the queue bounding functionality of CASSANDRA-13265. 
In short, we want to limit the size of queued messages in order to not OOM. 
Thus, we schedule a task for the consumer thread that examines the queue 
looking for elements to prune. Further, I've added a naive upper bound to the 
queue so that producers drop messages before enqueuing if the backlog is in a 
*really* bad state. @djoshi3 has recommended bounding by message size rather 
than by message count, which I agree with, but propose saving that for a 
followup ticket.
-- Cleaner, more documented, and better tested state machine to manage state 
transitions for the class.

- ChannelWriter and MessageOutHandler became much simpler, as we can control 
the flush behaviors from the OMC (instead of the previous complicated CW/MOH 
dance) because we're already on the event loop when consuming from the backlog 
and writing to the channel.

- I was able to clean up/remove a bunch of extra code due to this 
simplification as well (ExpiredException, OutboundMessagingParameters, 
MessageResult).

- Updated all the javadoc documentation for these changes (mostly OMC and 
ChannelWriter).
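The producer/consumer scheme described above (producers contend only to schedule the consumer; the event loop does all the dequeuing and writing) can be sketched as follows. This is an illustrative model, not the actual patch: the class name, the bound, and the String message type are mine.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: producers enqueue into a backlog and contend only to
// schedule the consumer task; all consuming happens on one event-loop thread.
public class BacklogSketch {
    private static final int MAX_BACKLOG = 1024;  // naive upper bound; bounding by bytes is the suggested followup

    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean consumerScheduled = new AtomicBoolean(false);
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private volatile int sent = 0;

    public boolean enqueue(String message) {
        if (backlog.size() >= MAX_BACKLOG)        // size() is O(n) here; real code would keep a counter
            return false;                         // drop before enqueuing when badly backed up
        backlog.add(message);
        if (consumerScheduled.compareAndSet(false, true))
            eventLoop.execute(this::consume);     // only one producer wins the CAS
        return true;
    }

    private void consume() {
        consumerScheduled.set(false);             // reset first so late producers can reschedule
        while (backlog.poll() != null)
            sent++;                               // stand-in for writing the message to the channel
    }

    public int drainAndCount() throws InterruptedException {
        eventLoop.shutdown();                     // already-queued consume tasks still run
        eventLoop.awaitTermination(5, TimeUnit.SECONDS);
        return sent;
    }
}
```

Resetting the scheduled flag before draining means a producer that enqueues mid-drain either gets picked up by the running pass or successfully reschedules, so no message is stranded.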




[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone

2018-09-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-14503:
---
Labels: pull-request-available  (was: )




[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients

2018-09-10 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609073#comment-16609073
 ] 

Benedict commented on CASSANDRA-14708:
--

Thanks.  It looks like we've at least introduced a bug wrt adding hours and 
seconds to a date/timestamp across leap second boundaries (and if we introduce 
TZ support, across DST boundaries), but that's an issue for another ticket.  
You brought up the issue of leap seconds in that discussion, I can see, so it's 
a shame this wasn't accounted for in the eventual solution.

On the topic of this ticket, I agree that making the type accept nanos 
exclusively is not the solution; that is a different type of duration.  It 
might have been nice to use the JDK or Joda time nomenclature for some 
consistency, and call it a period (and reserve duration for those operating 
exclusively on nanos/millis, much as in Go), but c'est la vie.
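The nomenclature point maps directly onto the JDK's java.time types, where {{Period}} is calendar-based (months/days whose length depends on the anchor date) and {{Duration}} is a fixed amount of machine time. An illustrative snippet (the class name is mine):

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Period;

public class PeriodVsDuration {
    public static void main(String[] args) {
        // A java.time.Period is calendar-based: "one month" depends on the anchor date.
        Period oneMonth = Period.ofMonths(1);
        LocalDate jan31 = LocalDate.of(2018, 1, 31);
        LocalDate apr30 = LocalDate.of(2018, 4, 30);
        System.out.println(jan31.plus(oneMonth)); // 2018-02-28 (day clamped to month length)
        System.out.println(apr30.plus(oneMonth)); // 2018-05-30

        // A java.time.Duration is an exact span of machine time, calendar-independent.
        Duration oneDay = Duration.ofNanos(86_400_000_000_000L);
        System.out.println(oneDay.toDays());      // 1
    }
}
```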

> protocol v5 duration wire format is overly complex and awkward to implement 
> for clients
> ---
>
> Key: CASSANDRA-14708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14708
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Bannister
>Priority: Major
>
> Protocol V5 defines the duration type to be on the wire as months, days and 
> nanoseconds. Days and months require a timezone to make sense of, and their 
> length varies depending on the point in time from which they are applied.
>  
> Go defines a [duration|https://golang.org/pkg/time/#Duration] type as 
> nanoseconds in an int64, which can represent ~290 years. Java's 
> [duration|https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html] 
> does not have a way to handle months.
>  
> I suggest that before 4.0 is released, the duration format is converted to 
> just be represented as nanoseconds.






[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609050#comment-16609050
 ] 

Jason Brown commented on CASSANDRA-14711:
-

So, the first thing to know is that 3.2 is an old, unsupported release; 3.11.3 
is the currently supported 3.x release.

> Apache Cassandra 3.2 crashing with exception 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom
> 
>
> Key: CASSANDRA-14711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Saurabh
>Priority: Major
> Attachments: hs_err_pid32069.log
>
>
> Hi Team,
> I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.
> Issue:
> Cassandra is continuously crashing with generating an HEAP dump log. There 
> are no errors reported in system.log OR Debug.log.
> Exception in hs_err_PID.log:
>  # Problematic frame:
>  # J 8283 C2 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
>  (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
> Java Threads: ( => current thread )
>  0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
> [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
>  0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
> [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
>  0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon 
> [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
>  0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
> [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
>  0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
>  :
>  :
>  lot of threads in BLOCKED status
> Other Threads:
>  0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
> [id=32098]
>  0x2b7d38fa9de0 WatcherThread [stack: 
> 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]
> VM state:not at safepoint (normal execution)
> VM Mutex/Monitor currently owned by a thread: None
> Heap:
>  garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
> 0x0003c0404000, 0x0007c000)
>  region size 4096K, 785 young (3215360K), 55 survivors (225280K)
>  Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
>  class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K
> Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
> HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
> PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
>  AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
> 100% used [0x0003c000, 0x0003c040)
>  AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c040, 0x0003c080)
>  AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c080, 0x0003c0c0)
>  AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
> 100% used [0x0003c0c0, 0x0003c100)
>  AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
> 100% used [0x0003c100, 0x0003c140)
>  AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
> 100% used [0x0003c140, 0x0003c180)
>  :
>  :
>  lot of such messages






[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients

2018-09-10 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609045#comment-16609045
 ] 

Sylvain Lebresne commented on CASSANDRA-14708:
--

bq. Do you have a link to the original discussions around its inclusion

Well, it's not exactly hard to find: CASSANDRA-11873.




[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients

2018-09-10 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609039#comment-16609039
 ] 

Benedict commented on CASSANDRA-14708:
--

{quote}I'll also note this all went in C* 3.10, so it's not like we can really 
change the goals of the duration type
{quote}
We can at least revisit if it turns out to not make enough sense, and I'm not 
sure that it does.  Do you have a link to the original discussions around its 
inclusion? It seems to treat the concept of durations in a confusing manner.  
At the very least, if it's accepting {{months}} and {{days}} as parameters, it 
should also accept {{hours}} and {{seconds}}, because these do not occupy a 
consistent number of nanos across all points in time.  Typically, a time 
library will offer facilities to work exclusively in millis/nanos, or in all 
date components, not mix the two half-heartedly.

This has me generally worried about how we handle time in Cassandra.




[jira] [Comment Edited] (CASSANDRA-14298) cqlshlib tests broken on b.a.o

2018-09-10 Thread Stefan Podkowinski (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608965#comment-16608965
 ] 

Stefan Podkowinski edited comment on CASSANDRA-14298 at 9/10/18 10:15 AM:
--

Thanks [~mkjellman]! Can you please attach your Dockerfile to CASSANDRA-14713? 
I'll then try to get rid of the ADDed resource dependencies.


was (Author: spo...@gmail.com):
Thanks [~mkjellman]! Can you please attach your Dockerimage file to 
CASSANDRA-14713? I'll then try to get rid of the ADDed resource dependencies.

> cqlshlib tests broken on b.a.o
> --
>
> Key: CASSANDRA-14298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14298
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build, Testing
>Reporter: Stefan Podkowinski
>Assignee: Patrick Bannister
>Priority: Major
>  Labels: cqlsh, dtest
> Attachments: CASSANDRA-14298-old.txt, CASSANDRA-14298.txt, 
> cqlsh_tests_notes.md
>
>
> It appears that cqlsh-tests on builds.apache.org on all branches stopped 
> working since we removed nosetests from the system environment. See e.g. 
> [here|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-cqlsh-tests/458/cython=no,jdk=JDK%201.8%20(latest),label=cassandra/console].
>  Looks like we either have to make nosetests available again or migrate to 
> pytest as we did with dtests. Giving pytest a quick try resulted in many 
> errors locally, but I haven't inspected them in detail yet. 






[jira] [Commented] (CASSANDRA-14298) cqlshlib tests broken on b.a.o

2018-09-10 Thread Stefan Podkowinski (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608965#comment-16608965
 ] 

Stefan Podkowinski commented on CASSANDRA-14298:


Thanks [~mkjellman]! Can you please attach your Dockerimage file to 
CASSANDRA-14713? I'll then try to get rid of the ADDed resource dependencies.




[jira] [Created] (CASSANDRA-14713) Add docker testing image to cassandra-builds

2018-09-10 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-14713:
--

 Summary: Add docker testing image to cassandra-builds
 Key: CASSANDRA-14713
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14713
 Project: Cassandra
  Issue Type: New Feature
  Components: Testing
Reporter: Stefan Podkowinski


Tests executed on builds.apache.org ({{docker/jenkins/jenkinscommand.sh}}) and 
circleCI ({{.circleci/config.yml}}) will currently use the same 
[cassandra-test|https://hub.docker.com/r/kjellman/cassandra-test/] docker image 
([github|https://github.com/mkjellman/cassandra-test-docker]) by [~mkjellman].

We should manage this image on our own as part of cassandra-builds, to keep it 
updated. There's also an [Apache user|https://hub.docker.com/u/apache/?page=1] 
on docker hub for publishing images.






[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients

2018-09-10 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608961#comment-16608961
 ] 

Sylvain Lebresne commented on CASSANDRA-14708:
--

The duration type has been added primarily for doing aggregations over time, 
and if you want to aggregate things by month, you don't want that to be all 
messed up because you have to provide a time in nanoseconds, which gives you no 
way to get proper month boundaries. Overall, we cannot use nanoseconds for 
durations in the way durations are currently implemented and used (including 
user-visible duration values like {{3m2d5s}}).

I just don't think our duration type and the similarly-named Golang one have 
the same purpose. It might be a shame they have the same name, but well... 

I'll also note this all went in C* 3.10, so it's not like we can really change 
the goals of the duration type now even if we agreed this was a good idea (I 
don't).
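The month-boundary point can be illustrated with the JDK's own date arithmetic: adding a calendar month always lands on a month boundary, while any fixed number of days (and hence any fixed nanosecond count) drifts. Illustrative sketch; the class name is mine.

```java
import java.time.LocalDate;

public class MonthBoundaries {
    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2018, 1, 1);

        // Adding calendar months lands on proper month boundaries...
        System.out.println(start.plusMonths(1)); // 2018-02-01
        System.out.println(start.plusMonths(2)); // 2018-03-01

        // ...whereas any fixed day (i.e. fixed nanosecond) count drifts,
        // because months are 28-31 days long.
        System.out.println(start.plusDays(30));  // 2018-01-31
        System.out.println(start.plusDays(60));  // 2018-03-02
    }
}
```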

 




[jira] [Created] (CASSANDRA-14712) Cassandra 4.0 packaging support

2018-09-10 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-14712:
--

 Summary: Cassandra 4.0 packaging support
 Key: CASSANDRA-14712
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14712
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
Reporter: Stefan Podkowinski
 Fix For: 4.x


Currently it's not possible to build any native packages (.deb/.rpm) for trunk.

cassandra-builds - docker/*-image.docker
 * Add Java11 to debian+centos build image
 * (packaged ant scripts won't work with Java 11 on centos, so we may have to 
install ant from tarballs)

cassandra-builds - docker/build-*.sh
 * set JAVA8_HOME to Java8
 * set JAVA_HOME to Java11 (4.0) or Java8 (<4.0)

cassandra - redhat/cassandra.spec
 * Check if patches still apply after CASSANDRA-14707
 * Add fqltool as %files

We may also have to change the version handling in build.xml or build-*.sh, 
depending on how we plan to release packages during beta, or if we plan to do 
so at all before GA.
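The JAVA_HOME arrangement above might look roughly like this in docker/build-*.sh. This is a hedged sketch: the JDK install paths and the CASSANDRA_MAJOR variable are assumptions, not the actual script.

```shell
# Hypothetical excerpt for cassandra-builds docker/build-*.sh; exact JDK paths
# depend on how the JDKs are installed in the build image.
export JAVA8_HOME=/usr/lib/jvm/java-8-openjdk-amd64

if [ "$CASSANDRA_MAJOR" = "4.0" ]; then
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64   # 4.0 builds with Java 11
else
    export JAVA_HOME="$JAVA8_HOME"                        # pre-4.0 stays on Java 8
fi
```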






[jira] [Updated] (CASSANDRA-14704) Validate transient status on query

2018-09-10 Thread Alex Petrov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-14704:

Status: Patch Available  (was: Open)

>  Validate transient status on query
> ---
>
> Key: CASSANDRA-14704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>
> Validate transient status on query:
> |[patch|https://github.com/apache/cassandra/pull/261]|[utest|https://circleci.com/gh/ifesdjeen/cassandra/393]|[dtest-novnode|https://circleci.com/gh/ifesdjeen/cassandra/394]|[dtest-vnode|https://circleci.com/gh/ifesdjeen/cassandra/392]|






[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608828#comment-16608828
 ] 

Saurabh commented on CASSANDRA-14711:
-

Cassandra config:

 

java -ea -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k 
-XX:StringTableSize=103 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking 
-XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem 
-Djava.net.preferIPv4Stack=true -Xms8G -Xmx16G -XX:+CMSClassUnloadingEnabled 
-XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 
-XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=10 
-XX:ConcGCThreads=3 -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure 
-XX:PrintFLSStatistics=1 -Xloggc:/data/cassandra/log/gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M 
-XX:CompileCommandFile=/data/tools/repository/apache-cassandra-3.2/conf/hotspot_compiler
 -javaagent:/data/tools/repository/apache-cassandra-3.2/lib/jamm-0.3.0.jar 
-Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 
-XX:+DisableExplicitGC 
-Djava.library.path=/data/tools/repository/apache-cassandra-3.2/lib/sigar-bin 
-Dcassandra.max_queued_native_transport_requests=4096 
-Dlogback.configurationFile=logback.xml 
-Dcassandra.logdir=/data/tools/repository/apache-cassandra-3.2/logs 
-Dcassandra.storagedir=/data/tools/repository/apache-cassandra-3.2/data -cp 
/data/tools/repository/apache-cassandra-3.2/conf

> Apache Cassandra 3.2 crashing with exception 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom
> 
>
> Key: CASSANDRA-14711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Saurabh
>Priority: Minor
> Attachments: hs_err_pid32069.log
>
>
> Hi Team,
> I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12..
> Issue:
> Cassandra is continuously crashing with generating an HEAP dump log. There 
> are no errors reported in system.log OR Debug.log.
> Exception in hs_err_PID.log:
>  # Problematic frame:
>  # J 8283 C2 
> org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
>  (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
> Java Threads: ( => current thread )
>  0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
> [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
>  0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
> [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
>  0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon 
> [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
>  0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
> [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
>  0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
>  :
>  :
>  lot of threads in BLOCKED status
> Other Threads:
>  0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
> [id=32098]
>  0x2b7d38fa9de0 WatcherThread [stack: 
> 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]
> VM state:not at safepoint (normal execution)
> VM Mutex/Monitor currently owned by a thread: None
> Heap:
>  garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
> 0x0003c0404000, 0x0007c000)
>  region size 4096K, 785 young (3215360K), 55 survivors (225280K)
>  Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
>  class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K
> Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
> HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
> PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
>  AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
> 100% used [0x0003c000, 0x0003c040)
>  AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c040, 0x0003c080)
>  AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
> 100% used [0x0003c080, 0x0003c0c0)
>  AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
> 100% used [0x0003c0c0, 0x0003c100)
>  AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
> 100% used [0x0003c100, 0x0003c140)
>  AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
> 100% used [0x0003c140, 0x0003c180)
>  :
>  :
>  lot 
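For context on the problematic frame: TimestampType.compareCustom ultimately compares the two values' 8-byte big-endian long encodings. The following is only an illustrative sketch of that comparison (not the actual Cassandra source; it ignores Cassandra's direct-buffer fast paths, and the empty-sorts-first rule shown here is an assumption based on how Cassandra orders absent values):

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a timestamp comparator in the spirit of
// TimestampType.compareCustom: each timestamp is an 8-byte big-endian
// long, and an empty buffer sorts before any non-empty value.
public class TimestampCompareSketch {
    static int compareTimestamps(ByteBuffer b1, ByteBuffer b2) {
        boolean empty1 = !b1.hasRemaining();
        boolean empty2 = !b2.hasRemaining();
        if (empty1 || empty2)
            return empty1 ? (empty2 ? 0 : -1) : 1;
        // Absolute gets: read the longs without mutating buffer positions.
        long t1 = b1.getLong(b1.position());
        long t2 = b2.getLong(b2.position());
        return Long.compare(t1, t2);
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocate(8).putLong(0, 1000L);
        ByteBuffer b = ByteBuffer.allocate(8).putLong(0, 2000L);
        System.out.println(compareTimestamps(a, b) < 0); // earlier timestamp sorts first
    }
}
```

The crash itself is a SIGSEGV inside the JIT-compiled version of this frame while dereferencing a ByteBuffer, which is why it surfaces in hs_err_pid32069.log rather than in system.log.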

[jira] [Updated] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh updated CASSANDRA-14711:

Priority: Major  (was: Minor)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh updated CASSANDRA-14711:

Description: 
Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue:

Cassandra is continuously crashing and generating a heap dump (hs_err) log. 
There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:
 # Problematic frame:
 # J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
 #--- P R O C E S S ---

Java Threads: ( => current thread )
 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
[_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
[_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon [_thread_blocked, 
id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
[_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
 :
 :
 lot of threads in BLOCKED status

Other Threads:
 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
[id=32098]
 0x2b7d38fa9de0 WatcherThread [stack: 
0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap:
 garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
0x0003c0404000, 0x0007c000)
 region size 4096K, 785 young (3215360K), 55 survivors (225280K)
 Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
 class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K

Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
 AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
100% used [0x0003c000, 0x0003c040)
 AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
100% used [0x0003c040, 0x0003c080)
 AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
100% used [0x0003c080, 0x0003c0c0)
 AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
100% used [0x0003c0c0, 0x0003c100)
 AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
100% used [0x0003c100, 0x0003c140)
 AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
100% used [0x0003c140, 0x0003c180)
 :
 :
 lot of such messages

  was:
Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue:

Cassandra is continuously crashing and generating a heap dump (hs_err) log. 
There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:

# Problematic frame:
# J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
#
--- T H R E A D ---
Current thread (0x2b7d3a1033e0): JavaThread "SharedPool-Worker-1" daemon 
[_thread_in_Java, id=32216, stack(0x2b7e4085f000,0x2b7e408a)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 
0x14914c69
Registers:
RAX=0x0001, RBX=0x, RCX=0x9f1fbef0, 
RDX=0x0004f8fdf798
RSP=0x2b7e4089e4b0, RBP=0x0001, RSI=0x14907800, 
RDI=0x
R8 =0xd469, R9 =0x, R10=0x0004a41764c8, 
R11=0x
R12=0x, R13=0x, R14=0xd469, 
R15=0x2b7d3a1033e0
RIP=0x2b7d3d417fb4, EFLAGS=0x00010283, CSGSFS=0x0033, 
ERR=0x0004
 TRAPNO=0x000e
[error occurred during error reporting (printing register info), id 0xb]
Stack: [0x2b7e4085f000,0x2b7e408a], sp=0x2b7e4089e4b0, free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
J 12970 C2 
org.apache.cassandra.db.Slice$Bound.compareTo(Lorg/apache/cassandra/db/ClusteringComparator;Ljava/util/List;)I
 (119 bytes) @ 0x2b7d3e0291c0 [0x2b7d3e028900+0x8c0]
J 16245 C2 

[jira] [Updated] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh updated CASSANDRA-14711:

Description: 
Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue:

Cassandra is continuously crashing and generating a heap dump (hs_err) log. 
There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:
 # Problematic frame:
 # J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]

Java Threads: ( => current thread )
 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
[_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
[_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon [_thread_blocked, 
id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
[_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
 :
 :
 lot of threads in BLOCKED status

Other Threads:
 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
[id=32098]
 0x2b7d38fa9de0 WatcherThread [stack: 
0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap:
 garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
0x0003c0404000, 0x0007c000)
 region size 4096K, 785 young (3215360K), 55 survivors (225280K)
 Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
 class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K

Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 
PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start)
 AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, 
100% used [0x0003c000, 0x0003c040)
 AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
100% used [0x0003c040, 0x0003c080)
 AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, 
100% used [0x0003c080, 0x0003c0c0)
 AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, 
100% used [0x0003c0c0, 0x0003c100)
 AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, 
100% used [0x0003c100, 0x0003c140)
 AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, 
100% used [0x0003c140, 0x0003c180)
 :
 :
 lot of such messages

  was:
Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue:

Cassandra is continuously crashing and generating a heap dump (hs_err) log. 
There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:
 # Problematic frame:
 # J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
 #--- P R O C E S S ---

Java Threads: ( => current thread )
 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
[_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
[_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon [_thread_blocked, 
id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
[_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
 :
 :
 lot of threads in BLOCKED status

Other Threads:
 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
[id=32098]
 0x2b7d38fa9de0 WatcherThread [stack: 
0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap:
 garbage-first heap total 8388608K, used 6791168K [0x0003c000, 
0x0003c0404000, 0x0007c000)
 region size 4096K, 785 young (3215360K), 55 survivors (225280K)
 Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K
 class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K

Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), 
HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, 

[jira] [Created] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom

2018-09-10 Thread Saurabh (JIRA)
Saurabh created CASSANDRA-14711:
---

 Summary: Apache Cassandra 3.2 crashing with exception 
org.apache.cassandra.db.marshal.TimestampType.compareCustom
 Key: CASSANDRA-14711
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
 Project: Cassandra
  Issue Type: Bug
Reporter: Saurabh
 Attachments: hs_err_pid32069.log

Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue:

Cassandra is continuously crashing and generating a heap dump (hs_err) log. 
There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:

# Problematic frame:
# J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
#
--- T H R E A D ---
Current thread (0x2b7d3a1033e0): JavaThread "SharedPool-Worker-1" daemon 
[_thread_in_Java, id=32216, stack(0x2b7e4085f000,0x2b7e408a)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 
0x14914c69
Registers:
RAX=0x0001, RBX=0x, RCX=0x9f1fbef0, 
RDX=0x0004f8fdf798
RSP=0x2b7e4089e4b0, RBP=0x0001, RSI=0x14907800, 
RDI=0x
R8 =0xd469, R9 =0x, R10=0x0004a41764c8, 
R11=0x
R12=0x, R13=0x, R14=0xd469, 
R15=0x2b7d3a1033e0
RIP=0x2b7d3d417fb4, EFLAGS=0x00010283, CSGSFS=0x0033, 
ERR=0x0004
 TRAPNO=0x000e
[error occurred during error reporting (printing register info), id 0xb]
Stack: [0x2b7e4085f000,0x2b7e408a], sp=0x2b7e4089e4b0, free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8283 C2 
org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
 (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
J 12970 C2 
org.apache.cassandra.db.Slice$Bound.compareTo(Lorg/apache/cassandra/db/ClusteringComparator;Ljava/util/List;)I
 (119 bytes) @ 0x2b7d3e0291c0 [0x2b7d3e028900+0x8c0]
J 16245 C2 
org.apache.cassandra.db.Slices$ArrayBackedSlices.intersects(Ljava/util/List;Ljava/util/List;)Z
 (46 bytes) @ 0x2b7d3e619cfc [0x2b7d3e619b20+0x1dc]
J 18878 C2 
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;Z)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;
 (822 bytes) @ 0x2b7d3ebcabf4 [0x2b7d3ebc7be0+0x3014]
J 9377 C2 
org.apache.cassandra.db.ReadCommand.executeLocally(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;
 (219 bytes) @ 0x2b7d3d80cde8 [0x2b7d3d80c0a0+0xd48]
J 14198 C2 
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(Lorg/apache/cassandra/net/MessageIn;I)V
 (328 bytes) @ 0x2b7d3c8bcbd0 [0x2b7d3c8bca20+0x1b0]
J 9731 C2 org.apache.cassandra.net.MessageDeliveryTask.run()V (187 bytes) @ 
0x2b7d3d158d60 [0x2b7d3d158bc0+0x1a0]
J 18999% C2 org.apache.cassandra.concurrent.SEPWorker.run()V (253 bytes) @ 
0x2b7d3eaa10ec [0x2b7d3eaa0960+0x78c]
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x695ae6] JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V [libjvm.so+0x695ff1] JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V [libjvm.so+0x696497] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, 
Symbol*, Symbol*, Thread*)+0x47
V [libjvm.so+0x731cb0] thread_entry(JavaThread*, Thread*)+0xa0
V [libjvm.so+0xa7eaa3] JavaThread::thread_main_inner()+0x103
V [libjvm.so+0xa7ebec] JavaThread::run()+0x11c
V [libjvm.so+0x92da28] java_start(Thread*)+0x108
C [libpthread.so.0+0x7e25] start_thread+0xc5


--- P R O C E S S ---

Java Threads: ( => current thread )
 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon 
[_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon 
[_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon [_thread_blocked, 
id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon 
[_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon 
:
:
lot of threads in BLOCKED status


Other Threads:
 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] 
[id=32098]
 0x2b7d38fa9de0 WatcherThread [stack: 
0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]

VM