[jira] [Updated] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-25 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15905:
--
Description: 
The cqlsh in trunk only displays the first page when running in batch mode, 
i.e. when using the {{\-\-execute}} or {{\-\-file}} option. 
 
It is a change of behavior. In 3.x branches, the cqlsh returns all rows. 
 
It can be reproduced in 3 steps.

{code:java}
 1. ccm create trunk -v git:trunk -n1 && ccm start
 2. tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"
 3. bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"
{code}


 
There are 1000 rows written. But the output in step 3 will only list 100 rows, 
which is the first page. 
 
The related change was introduced in 
https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
script no longer fetches all rows in the print_result method when not using a tty. 

  was:
The cqlsh in trunk only displays the first page when running in batch mode, 
i.e. when using the `--execute` or `--file` option. 
 
It is a change of behavior. In 3.x branches, the cqlsh returns all rows. 
 
It can be reproduced in 3 steps.
 # ccm create trunk -v git:trunk -n1 && ccm start
 # tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"
 # bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"

 
There are 1000 rows written. But the output in step 3 will only list 100 rows, 
which is the first page. 
 
The related change was introduced in 
https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
script no longer fetches all rows in the print_result method when not using a tty. 


> cqlsh not able to fetch all rows when in batch mode
> ---
>
> Key: CASSANDRA-15905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15905
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Yifan Cai
>Priority: Normal
>
> The cqlsh in trunk only displays the first page when running in batch 
> mode, i.e. when using the {{\-\-execute}} or {{\-\-file}} option. 
>  
> It is a change of behavior. In 3.x branches, the cqlsh returns all rows. 
>  
> It can be reproduced in 3 steps.
> {code:java}
>  1. ccm create trunk -v git:trunk -n1 && ccm start
>  2. tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"
>  3. bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"
> {code}
>  
> There are 1000 rows written. But the output in step 3 will only list 100 
> rows, which is the first page. 
>  
> The related change was introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
> script no longer fetches all rows in the print_result method when not using 
> a tty. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15905) cqlsh not able to fetch all rows when in batch mode

2020-06-25 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-15905:
-

 Summary: cqlsh not able to fetch all rows when in batch mode
 Key: CASSANDRA-15905
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15905
 Project: Cassandra
  Issue Type: Bug
  Components: Legacy/CQL
Reporter: Yifan Cai


The cqlsh in trunk only displays the first page when running in batch mode, 
i.e. when using the `--execute` or `--file` option. 
 
It is a change of behavior. In 3.x branches, the cqlsh returns all rows. 
 
It can be reproduced in 3 steps.
 # ccm create trunk -v git:trunk -n1 && ccm start
 # tools/bin/cassandra-stress write n=1k -schema keyspace="keyspace1"
 # bin/cqlsh -e "SELECT * FROM keyspace1.standard1;"

 
There are 1000 rows written. But the output in step 3 will only list 100 rows, 
which is the first page. 
 
The related change was introduced in 
https://issues.apache.org/jira/browse/CASSANDRA-11534, where the cqlsh.py 
script no longer fetches all rows in the print_result method when not using a tty. 
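
The difference between the two behaviours can be illustrated with a minimal, self-contained sketch. `FakeResultSet` is a hypothetical stand-in for a paged query result and is not the cqlsh.py code; the page size of 100 and the row count of 1000 are taken from the report, and the two helper functions are named here only for illustration.

```python
class FakeResultSet:
    """Hypothetical stand-in for a paged query result (not the real driver class)."""

    def __init__(self, rows, page_size=100):
        self._rows = list(rows)
        self._page_size = page_size
        self._offset = 0
        self.current_rows = []
        self.fetch_next_page()  # load the first page eagerly

    @property
    def has_more_pages(self):
        return self._offset < len(self._rows)

    def fetch_next_page(self):
        end = self._offset + self._page_size
        self.current_rows = self._rows[self._offset:end]
        self._offset = end


def first_page_only(result):
    """Behaviour reported for batch mode on trunk: only the first page is used."""
    return list(result.current_rows)


def all_pages(result):
    """Behaviour reported for 3.x: keep fetching until no pages remain."""
    rows = list(result.current_rows)
    while result.has_more_pages:
        result.fetch_next_page()
        rows.extend(result.current_rows)
    return rows


if __name__ == "__main__":
    print(len(first_page_only(FakeResultSet(range(1000)))))
    print(len(all_pages(FakeResultSet(range(1000)))))
```

With 1000 rows and a page size of 100, the first helper returns 100 rows and the second returns all 1000, which matches the symptom described in step 3.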






[jira] [Created] (CASSANDRA-15904) nodetool getendpoints man page improvements

2020-06-25 Thread Arvinder Singh (Jira)
Arvinder Singh created CASSANDRA-15904:
--

 Summary: nodetool getendpoints man page improvements
 Key: CASSANDRA-15904
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15904
 Project: Cassandra
  Issue Type: Improvement
Reporter: Arvinder Singh
Assignee: Erick Ramirez


Please include support for compound primary keys. Ex:
nodetool getendpoints keyspace1 table1 pk1:pk2:pk2

Thanks.






[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145765#comment-17145765
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15234:
-

Thank you [~dcapwell]!
 Please find my comments below in italics. I will also make another pass later, 
as I admit I was in a hurry to roll it out last night so that people in more 
time zones could see it before the end of the week and raise any bigger 
concerns that I can address, as this is a beta blocker.
 * Config. otc_coalescing_window_us_default was not renamed but the other otc* 
configs were. --> _will be corrected_
 * block_for_peers_timeout_in_secs and block_for_peers_in_remote_dcs are 
exposed in yaml, the comment says "No need of unit conversion as this parameter 
is not exposed in the yaml file", can you explain why this comment? --> _it is 
not exposed in the yaml for the users so default value and unit is used as per 
the Config class setup_
 * in database descriptor I see you use toString to compare, why not convert to 
the expected unit --> _I was testing and forgot to correct it, will make a pass 
to ensure it is corrected everywhere, thanks!_
 * we should add docs for Replaces and ReplacesList (also calling out that you 
shouldn't use ReplacesList directly) --> _noted_
 * for the new types, would be good to have a property test which validates that
{code:java}
type(value, unit).equals(type.parse(type(value, unit).toString())){code}
--> _noted_
 * LoadOldYAMLBackwardCompatibilityTest and ParseAndConvertUnitsTest look to be 
copy/paste. JUnit supports inheritance, so the copy could be removed by having 
LoadOldYAMLBackwardCompatibilityTest extend ParseAndConvertUnitsTest but use 
the other config --> _will definitely take care of that, it is left over from the 
previous version, apologies_

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.
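
As a rough illustration of the temporal suffixes proposed above, the following sketch parses a string such as `10ms` into a value and a unit. The suffix patterns are copied from the proposal; the function name `parse_duration`, the canonical unit names, and the choice to accept optional whitespace between number and suffix are assumptions made for this example, not part of any patch.

```python
import re

# Suffix patterns copied from the proposal. The canonical unit names on the
# left are an assumption made for this sketch.
_UNIT_PATTERNS = [
    ("MICROSECONDS", r"u|micros(econds?)?"),
    ("MILLISECONDS", r"ms|millis(econds?)?"),
    ("SECONDS", r"s(econds?)?"),
    ("MINUTES", r"m(inutes?)?"),
    ("HOURS", r"h(ours?)?"),
    ("DAYS", r"d(ays?)?"),
    ("MONTHS", r"mo(nths?)?"),
]

# A non-negative integer followed by an alphabetic suffix.
_VALUE = re.compile(r"\s*(\d+)\s*([A-Za-z]+)\s*")


def parse_duration(text):
    """Return (value, unit) for strings such as '10ms' or '5 minutes'."""
    match = _VALUE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    value, suffix = int(match.group(1)), match.group(2)
    for unit, pattern in _UNIT_PATTERNS:
        # fullmatch ensures 'mo' is not mistaken for minutes plus stray text
        if re.fullmatch(pattern, suffix):
            return value, unit
    raise ValueError(f"unknown unit suffix: {suffix!r}")


if __name__ == "__main__":
    for sample in ("10ms", "5 minutes", "2mo", "7u"):
        print(sample, "->", parse_duration(sample))
```

Matching each suffix with a full match rather than a prefix match keeps `m` (minutes), `ms` (milliseconds) and `mo` (months) unambiguous.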






[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-25 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145740#comment-17145740
 ] 

Caleb Rackliffe commented on CASSANDRA-15900:
-

+1

I think [~djoshi] might take a quick look as well.

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk 
> file into user-space off-heap buffer when SSL is enabled, because netty 
> doesn't support zero-copy with SSL.
> But there are two issues:
>  # file channel is not closed.
>  # a 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, 
> so every batch is allocated outside the pool, which causes a large number of 
> allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # close file channel when the last batch is loaded into off-heap bytebuffer. 
> I don't think we need to wait until buffer is flushed by netty.
>  # reduce the batch to 64kb which is more buffer pool friendly when streaming 
> entire sstable with SSL.
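
The two points of the patch can be sketched in a few lines. This is not the Java change in the pull request: a Python file object stands in for the file channel, and `read_in_batches` and `BATCH_SIZE` are hypothetical names. The sketch only shows the idea of reading in 64kb batches and closing the file as soon as the last batch is in memory, rather than when the consumer has finished with it.

```python
import os

BATCH_SIZE = 64 * 1024  # 64 KiB, the batch size proposed in the ticket


def read_in_batches(handle, batch_size=BATCH_SIZE):
    """Yield the contents of an open binary file in fixed-size batches.

    The generator takes ownership of `handle` and closes it as soon as the
    last batch has been read, before that batch is handed to the consumer.
    """
    try:
        remaining = os.fstat(handle.fileno()).st_size
        while remaining > 0:
            batch = handle.read(min(batch_size, remaining))
            if not batch:
                break  # file shrank underneath us; stop rather than spin
            remaining -= len(batch)
            if remaining == 0:
                handle.close()  # last batch is in memory; release the file now
            yield batch
    finally:
        handle.close()  # idempotent; also covers errors and empty files
```

For a 150 KiB file this yields two 64 KiB batches and one 22 KiB batch, and the file is already closed when the final batch is delivered.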






[jira] [Commented] (CASSANDRA-15874) Bootstrap completes Successfully without streaming all the data

2020-06-25 Thread Jai Bheemsen Rao Dhanwada (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145722#comment-17145722
 ] 

Jai Bheemsen Rao Dhanwada commented on CASSANDRA-15874:
---

[~brandon.williams] Thanks for the information. Before I upgrade the cluster to 
3.11.6, I would like to understand if there are any known issues with 3.11.6?

> Bootstrap completes Successfully without streaming all the data
> ---
>
> Key: CASSANDRA-15874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15874
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> I am seeing a strange issue where adding a new node with auto_bootstrap: 
> true does not stream all the data before it joins the cluster. I don't see any 
> information in the logs about bootstrap failures.
> Here is the sequence of logs
>  
> {code:java}
> INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: 
> schema complete, ready to bootstrap
> INFO [main] 2020-06-12 01:41:49,642 StorageService.java:1446 - JOINING: 
> waiting for pending range calculation
> INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: 
> calculation complete, ready to bootstrap
> INFO [main] 2020-06-12 01:41:49,643 StorageService.java:1446 - JOINING: 
> getting bootstrap token
> INFO [main] 2020-06-12 01:42:19,656 StorageService.java:1446 - JOINING: 
> Starting to bootstrap...
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId . If a table was just created, this is likely due to the schema 
> not being fully propagated. Please wait for schema agreement on table 
> creation.
> INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 
> StreamResultFuture.java:219 - [Stream #f4224f444-a55d-154a-23e3-867899486f5f] 
> All sessions completed
> INFO [StreamReceiveTask:1] 2020-06-12 02:29:51,892 
> StorageService.java:1505 - Bootstrap completed! for the tokens
> {code}
> Cassandra Version: 3.11.3
> I am not able to reproduce this issue all the time, but it has happened a couple of 
> times. Is there any race condition or corner case that could cause this issue?
>  






[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-25 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15900:
-
Test and Documentation Plan: 
[https://circleci.com/workflow-run/ba9f4692-da21-44e9-ac31-fe8d2e6215cb]  (was: 
[https://circleci.com/workflow-run/8d266871-2d78-4c67-80ec-3e817187af0c])

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk 
> file into user-space off-heap buffer when SSL is enabled, because netty 
> doesn't support zero-copy with SSL.
> But there are two issues:
>  # file channel is not closed.
>  # a 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, 
> so every batch is allocated outside the pool, which causes a large number of 
> allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # close file channel when the last batch is loaded into off-heap bytebuffer. 
> I don't think we need to wait until buffer is flushed by netty.
>  # reduce the batch to 64kb which is more buffer pool friendly when streaming 
> entire sstable with SSL.






[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-25 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15900:
-
Status: Review In Progress  (was: Changes Suggested)

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk 
> file into user-space off-heap buffer when SSL is enabled, because netty 
> doesn't support zero-copy with SSL.
> But there are two issues:
>  # file channel is not closed.
>  # a 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, 
> so every batch is allocated outside the pool, which causes a large number of 
> allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # close file channel when the last batch is loaded into off-heap bytebuffer. 
> I don't think we need to wait until buffer is flushed by netty.
>  # reduce the batch to 64kb which is more buffer pool friendly when streaming 
> entire sstable with SSL.






[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL

2020-06-25 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145693#comment-17145693
 ] 

ZhaoYang commented on CASSANDRA-15900:
--

bq. It might be worthwhile to have a test in AsyncStreamingOutputPlusTest that 
verifies AsyncStreamingOutputPlus#writeFileToChannel() closes the provided 
channel.

+1

bq. AsyncStreamingOutputPlus#writeFileToChannel(FileChannel, StreamRateLimiter, 
int) and AsyncStreamingOutputPlus#writeFileToChannelZeroCopy() may be better 
off at private visibility, given we're treating them as transport-level 
implementation details. (Perhaps writeFileToChannel would be easier to test at 
package-private though.)

I left them as public and marked them "@VisibleForTesting".

bq. The JavaDoc for writeFileToChannel(FileChannel, StreamRateLimiter) is 
slightly out-of date now, given we've lowered the batch size for the SSL case. 
(We should make sure to preserve the bit about the method taking ownership of 
the FileChannel.)

+1

> Close channel and reduce buffer allocation during entire sstable streaming 
> with SSL
> ---
>
> Key: CASSANDRA-15900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15900
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk 
> file into user-space off-heap buffer when SSL is enabled, because netty 
> doesn't support zero-copy with SSL.
> But there are two issues:
>  # file channel is not closed.
>  # a 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, 
> so every batch is allocated outside the pool, which causes a large number of 
> allocations.
> [Patch|https://github.com/apache/cassandra/pull/651]:
>  # close file channel when the last batch is loaded into off-heap bytebuffer. 
> I don't think we need to wait until buffer is flushed by netty.
>  # reduce the batch to 64kb which is more buffer pool friendly when streaming 
> entire sstable with SSL.






[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-25 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145118#comment-17145118
 ] 

David Capwell commented on CASSANDRA-15234:
---

This is a large patch, so will take a few attempts to review; below is my first 
pass (mostly skim).

* Config. otc_coalescing_window_us_default was not renamed but the other otc* 
configs were.  
https://github.com/apache/cassandra/compare/trunk...ekaterinadimitrova2:CASSANDRA-15234-new#diff-b66584c9ce7b64019b5db5a531deeda1R429
* block_for_peers_timeout_in_secs and block_for_peers_in_remote_dcs are exposed 
in yaml, the comment says "No need of unit conversion as this parameter is not 
exposed in the yaml file", can you explain why this comment?
*  
https://github.com/apache/cassandra/compare/trunk...ekaterinadimitrova2:CASSANDRA-15234-new#diff-76c34368f112a0a4e6da5b22c8b50300R353-R376.
 enums can use "abstract" methods, so you can do that rather than use a default 
which throws
* in database descriptor I see you use toString to compare, why not convert to 
the expected unit?  
https://github.com/apache/cassandra/compare/trunk...ekaterinadimitrova2:CASSANDRA-15234-new#diff-a8a9935b164cd23da473fd45784fd1ddR381.
 this could be {code}if (conf.commitlog_sync_period.toMillis() != 0){code}.  
This is true for all examples I see in this method
* we should add docs for Replaces and ReplacesList (also calling out that you 
shouldn't use ReplacesList directly)
* for the new types, would be good to have a property test which validates that 
{code}type(value, unit).equals(type.parse(type(value, unit).toString())){code}.
* LoadOldYAMLBackwardCompatibilityTest and ParseAndConvertUnitsTest look to be 
copy/paste.  JUnit supports inheritance, so the copy could be removed by 
having LoadOldYAMLBackwardCompatibilityTest extend ParseAndConvertUnitsTest but 
use the other config.
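
The round-trip property suggested in the list above (a value should equal the result of parsing its own string form) can be sketched as follows. The `Duration` class here is a hypothetical toy type with three units, not one of the new types in the patch, and the seeded random sampling is a stand-in for a real property-testing library.

```python
import random
from dataclasses import dataclass

# Hypothetical unit-to-suffix table used only by this sketch.
_SUFFIXES = {"MILLISECONDS": "ms", "SECONDS": "s", "MINUTES": "m"}
_UNITS = {suffix: unit for unit, suffix in _SUFFIXES.items()}


@dataclass(frozen=True)
class Duration:
    value: int
    unit: str

    def __str__(self):
        return f"{self.value}{_SUFFIXES[self.unit]}"

    @classmethod
    def parse(cls, text):
        # Strip the trailing alphabetic suffix to isolate the digits.
        digits = text.rstrip("abcdefghijklmnopqrstuvwxyz")
        return cls(int(digits), _UNITS[text[len(digits):]])


def round_trips(value, unit):
    """The property under test: parse(str(x)) == x."""
    original = Duration(value, unit)
    return original == Duration.parse(str(original))


def check_property(samples=200, seed=0):
    """Check the property on seeded pseudo-random (value, unit) pairs."""
    rng = random.Random(seed)
    units = sorted(_SUFFIXES)
    return all(
        round_trips(rng.randrange(0, 2**31), rng.choice(units))
        for _ in range(samples)
    )


if __name__ == "__main__":
    print(check_property())
```

Comparing parsed objects rather than their string forms also reflects the earlier point about converting to the expected unit instead of comparing via toString.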


Overall I am good with this patch, need to do a closer review of the details 
still.

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.
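
For the rate units, a minimal sketch of the stricter option (accept only the binary units listed above and reject ambiguous ones such as `KB/s` or `Mbps`) might look like this. The function name `parse_rate` and the decision to return bytes per second as an integer are assumptions made for the example.

```python
import re

# Only the binary units named in the proposal are accepted.
_MULTIPLIERS = {
    "B/s": 1,
    "KiB/s": 1024,
    "MiB/s": 1024 ** 2,
    "GiB/s": 1024 ** 3,
    "TiB/s": 1024 ** 4,
}

# A non-negative integer followed by a unit made of letters and '/'.
_RATE = re.compile(r"\s*(\d+)\s*([A-Za-z/]+)\s*")


def parse_rate(text):
    """Return the rate in bytes per second, rejecting ambiguous units."""
    match = _RATE.fullmatch(text)
    if match is None or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"unsupported rate: {text!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]


if __name__ == "__main__":
    print(parse_rate("24MiB/s"))
```

The alternative mentioned in the description, accepting the ambiguous units and logging the resolved value in bytes/s, would only require extra entries in the table plus a log statement.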






[jira] [Comment Edited] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145076#comment-17145076
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-13994 at 6/25/20, 4:35 PM:
---

Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. (and returning the KEYS index back 
of course)
If that is the case, I also agree with you that this one would even be doable in 
beta, as it will really be only the dead code removal. Yes, there was a bit of 
work done around the parser, but I also don't see it as breaking or a blocker. 




was (Author: e.dimitrova):
Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. (and returning the KEYS index back 
of course)



> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Comment Edited] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145076#comment-17145076
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-13994 at 6/25/20, 4:33 PM:
---

Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. (and returning the KEY index back 
of course)




was (Author: e.dimitrova):
Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. 



> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Comment Edited] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145076#comment-17145076
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-13994 at 6/25/20, 4:33 PM:
---

Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. (and returning the KEYS index back 
of course)




was (Author: e.dimitrova):
Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. (and returning the KEY index back 
of course)



> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145076#comment-17145076
 ] 

Ekaterina Dimitrova commented on CASSANDRA-13994:
-

Thanks [~slebresne] for spending the time on this. Appreciate it!
I also did a quick CCM test upgrading locally while still having the indexes 
and I didn't see any issues. 

If the community still wants the dead code removed, I can rebase and go through 
the points of your initial review next week. 



> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145056#comment-17145056
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

Getting back to the KEYS index question.

bq. I don't think user can upgrade to 4.0 at all if they still have KEYS index.

I was wrong.

I tested it now (see [this upgrade 
test|https://github.com/pcmanus/cassandra-dtest/commit/09a6a9888a73eb14613eaecb4dc7e5cba9a46765#diff-59d5ceaa3ba81b0a9a360dedcf3bed16R378]):
 if you create a KEYS index through thrift (in 2.x/3.x), run {{DROP COMPACT 
STORAGE}} on the base table and then upgrade from 3.11 to 4.0, this 'just works'™ 
(a rolling upgrade can be done while continuing to use the KEYS index before, 
during and after the upgrade).

I thought this would have broken because 4.0 crashes if it finds tables with 
compact storage "flags" when reading the schema system tables, but 2i metadata 
is not written there, so this passes. We still hit [this 
warning|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/TableMetadata.java#L140]
 (and we should remove the calls to {{isDense}} and {{isCompound}} in 
{{CassandraIndex}} to avoid that) but that is the only consequence.

So my opinion here is that we should keep KEYS index for now (so revert their 
removal by this ticket), as we currently imo don't have a good upgrade story 
for them otherwise. We can look at this more closely later, but for me, it's 
not urgent, as their code is not that complex and fairly isolated.

I'll create a followup ticket soonish to add that upgrade test mentioned above 
and discuss a few minor related points, but as far as this ticket goes, let's 
keep KEYS indexes.


> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-06-25 Thread Sergio Bossa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145048#comment-17145048
 ] 

Sergio Bossa commented on CASSANDRA-15700:
--

[~aleksey] apologies for this late reply. I've pushed the recommended changes, 
please have a look when you have a moment.

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Sergio Bossa
>Assignee: Sergio Bossa
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-beta
>
> Attachments: Oss40patchedvsOss311.png, Oss40vsOss311.png, oss40.gc, 
> oss40_nogc.tar.xz, oss40_system.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3-node, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached 
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> Turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge amount of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}} which 
> spends half of its time pruning. But only a tiny portion of such pruning time 
> is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc 
> "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to 
> understand: what's the reason to prune so often, rather than just when 
> polling the message during delivery? If there's a reason I'm missing, let's 
> talk about how to optimize pruning, otherwise let's get rid of that.
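To make the alternative raised in the question concrete, here is a minimal sketch of expiring messages only when they are polled for delivery, with no separate pruning pass. This is not the {{OutboundMessageQueue}} code: the class {{LazyExpiryQueueSketch}}, its {{Message}} type and the {{onExpired}} callback are hypothetical names chosen for illustration, and the sketch assumes single-threaded access with no locking.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

public class LazyExpiryQueueSketch {
    /** Hypothetical message: a payload plus an absolute expiry time in nanoseconds. */
    static final class Message {
        final String payload;
        final long expiresAtNanos;

        Message(String payload, long expiresAtNanos) {
            this.payload = payload;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ArrayDeque<Message> queue = new ArrayDeque<>();
    private final Consumer<Message> onExpired;

    LazyExpiryQueueSketch(Consumer<Message> onExpired) {
        this.onExpired = onExpired;
    }

    /** Enqueue only; nothing is pruned on this path. */
    void add(Message message) {
        queue.addLast(message);
    }

    /**
     * Returns the next message that is still live at nowNanos, or null if none is left.
     * Expired messages met on the way are reported to onExpired and dropped.
     */
    Message poll(long nowNanos) {
        Message message;
        while ((message = queue.pollFirst()) != null) {
            // Subtraction keeps the comparison valid if nanosecond clocks wrap around.
            if (message.expiresAtNanos - nowNanos > 0) {
                return message;
            }
            onExpired.accept(message);
        }
        return null;
    }

    public static void main(String[] args) {
        LazyExpiryQueueSketch q =
            new LazyExpiryQueueSketch(m -> System.out.println("expired: " + m.payload));
        q.add(new Message("a", 10));
        q.add(new Message("b", 100));
        Message live = q.poll(50);
        System.out.println("delivered: " + (live == null ? "none" : live.payload));
    }
}
```

The trade-off in this sketch is that an expired message stays in the queue, and keeps holding whatever capacity it reserved, until delivery reaches it. That may be one reason to prune more eagerly.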






[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Attachment: heap-mem-histo.txt

> OOM because repair session thread not closed when terminating repair
> 
>
> Key: CASSANDRA-15902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15902
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Swen Fuhrmann
>Assignee: Swen Fuhrmann
>Priority: Normal
> Attachments: heap-mem-histo.txt, repair-terminated.txt
>
>
> In our cluster, some nodes slowly run out of memory over time. On those 
> nodes we observed that Cassandra Reaper terminates repairs with a JMX 
> call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} because 
> the 30 min timeout was reached.
> In the heap dump we see that a lot of instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}} occupy most of the memory:
> {noformat}
> 119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 
> %) bytes. {noformat}
> In the thread dump we see a lot of repair threads:
> {noformat}
> grep "Repair#" threaddump.txt | wc -l
>   50 {noformat}
>  
> The repair jobs are waiting for the validation to finish:
> {noformat}
> "Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 
> nid=0x542a waiting on condition [0x7f81ee414000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007939bcfc8> (a 
> com.google.common.util.concurrent.AbstractFuture$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> at 
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:748) {noformat}
>  
> That's the line where the threads are stuck:
> {noformat}
> // Wait for validation to complete
> Futures.getUnchecked(validations); {noformat}
>  
> The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops 
> the thread pool executor. It looks like futures that are in progress are 
> therefore never completed, so the repair thread waits forever and never 
> finishes.
>  
> Environment:
> Cassandra version: 3.11.4
> Cassandra Reaper: 1.4.0
> JVM memory settings:
> {noformat}
> -Xms11771M -Xmx11771M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
> -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
> On another cluster with the same issue:
> {noformat}
> -Xms31744M -Xmx31744M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
> -XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
> Java Runtime:
> {noformat}
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) 
> {noformat}
>  
> The same issue is described in this comment: 
> https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973
> As suggested in the comments, I created this new specific ticket.






[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Description: 
In our cluster, some nodes slowly run out of memory over time. On those nodes 
we observed that Cassandra Reaper terminates repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 30 min 
timeout was reached.

In the heap dump we see that a lot of instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}} occupy most of the memory:
{noformat}
119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 %) 
bytes. {noformat}
In the thread dump we see a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown
 Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress are therefore 
never completed, so the repair thread waits forever and never finishes.
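
The suspected mechanism can be reproduced in isolation with plain JDK classes. The sketch below is only an analogy and not Cassandra's repair code: {{StuckFutureSketch}} and its method are hypothetical names, and it uses a queued task on a single-threaded pool to stand in for an in-progress validation. It shows that once an executor is stopped with {{shutdownNow()}}, the future of a task that never ran is neither completed nor cancelled, so an untimed wait on it would block forever. A timed {{get}} is used here so that the example terminates.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StuckFutureSketch {
    /** Returns true if waiting on a queued task's future times out after the pool is stopped. */
    static boolean queuedFutureNeverCompletes() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);

        // The first task occupies the only worker thread until it is interrupted.
        pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted by shutdownNow(); just return.
            }
        });

        // The second task waits in the queue behind the first one.
        Future<String> queued = pool.submit(() -> "validated");
        started.await();

        // shutdownNow() interrupts the running task and returns the queued tasks without running them.
        List<Runnable> neverRun = pool.shutdownNow();

        try {
            queued.get(200, TimeUnit.MILLISECONDS);
            return false; // the future completed, which is not what this sketch expects
        } catch (TimeoutException e) {
            // Nobody will ever complete this future; an untimed get() would hang here.
            return neverRun.size() == 1;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("queued future never completes: " + queuedFutureNeverCompletes());
    }
}
```

If the same thing happens to the validation futures, waiting with a timeout or cancelling the outstanding futures when sessions are terminated would let the repair thread exit. Whether that fits the repair code is an open question for this ticket.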

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

JVM memory settings:
{noformat}
-Xms11771M -Xmx11771M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
-XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
On another cluster with the same issue:
{noformat}
-Xms31744M -Xmx31744M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
-XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 

The same issue is described in this comment: 
https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973

As suggested in the comments, I created this new specific ticket.

  was:
In our cluster, some nodes slowly run out of memory over time. On those nodes 
we observed that Cassandra Reaper terminates repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 30 min 
timeout was reached.

In the heap dump we see that a lot of instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}} occupy most of the memory:
{noformat}
119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 %) 
bytes. {noformat}
In the thread dump we see a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000793

[jira] [Commented] (CASSANDRA-15901) Force jenkins tests to run on the private VPC IP

2020-06-25 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144997#comment-17144997
 ] 

Berenguer Blasi commented on CASSANDRA-15901:
-

[~mck] This is the POC I came up with. I tested it on a local jenkins and I can 
see it being effective:

{noformat}
16:56:29 forceDeviceListenAddress:
16:56:29  [exec] docker0: flags=4099  mtu 1500
16:56:29  [exec] inet 172.17.0.1  netmask 255.255.0.0  broadcast 
172.17.255.255
16:56:29  [exec] inet6 fe80::42:26ff:fe25:c1e2  prefixlen 64  
scopeid 0x20
16:56:29  [exec] ether 02:42:26:25:c1:e2  txqueuelen 0  (Ethernet)
16:56:29  [exec] RX packets 78645  bytes 5621831 (5.6 MB)
16:56:29  [exec] RX errors 0  dropped 0  overruns 0  frame 0
16:56:29  [exec] TX packets 85630  bytes 1329098560 (1.3 GB)
16:56:29  [exec] TX errors 0  dropped 0 overruns 0  carrier 0  
collisions 0
16:56:29  [exec] 
16:56:29  [exec] lo: flags=73  mtu 65536
16:56:29  [exec] inet 127.0.0.1  netmask 255.0.0.0
16:56:29  [exec] inet6 ::1  prefixlen 128  scopeid 0x10
16:56:29  [exec] loop  txqueuelen 1000  (Local Loopback)
16:56:29  [exec] RX packets 66296  bytes 21093416 (21.0 MB)
16:56:29  [exec] RX errors 0  dropped 0  overruns 0  frame 0
16:56:29  [exec] TX packets 66296  bytes 21093416 (21.0 MB)
16:56:29  [exec] TX errors 0  dropped 0 overruns 0  carrier 0  
collisions 0
16:56:29  [exec] 
16:56:29  [exec] wlp59s0: flags=4163  mtu 
1500
16:56:29  [exec] inet 192.168.1.131  netmask 255.255.255.0  
broadcast 192.168.1.255
16:56:29  [exec] inet6 fe80::9c8e:fcad:d881:ffda  prefixlen 64  
scopeid 0x20
16:56:29  [exec] ether 18:1d:ea:b1:51:48  txqueuelen 1000  
(Ethernet)
16:56:29  [exec] RX packets 6031123  bytes 7766582727 (7.7 GB)
16:56:29  [exec] RX errors 0  dropped 0  overruns 0  frame 0
16:56:29  [exec] TX packets 2398356  bytes 1840272042 (1.8 GB)
16:56:29  [exec] TX errors 0  dropped 0 overruns 0  carrier 0  
collisions 0
16:56:29  [exec] 
16:56:29  [echo] *** 192.168.1.131
{noformat}

You'll notice that a device is being forced when calling {{ifconfig}}. This is 
because many IPs can match the regexp. On AWS the private IP is on eth0. So I'd 
suggest this course of action:
- Take a look at what I did and see if it makes sense at all
- If it does, we'd need to run it against {{eth0}} on all agents and check that 
we indeed get what we expect and that tests run OK

wdyt?
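
For reference, the reason for forcing a device can be shown with a small sketch. This is not the ant target from the POC: {{DeviceAddressSketch}}, {{inetAddressOf}} and the regexp are hypothetical, and the sketch assumes raw {{ifconfig}} output (without the {{[exec]}} prefixes above) in which interface stanzas are separated by blank lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeviceAddressSketch {
    // Matches "inet a.b.c.d" but not "inet6 ...", since "inet" must be followed by whitespace.
    private static final Pattern INET =
        Pattern.compile("\\binet\\s+(\\d{1,3}(?:\\.\\d{1,3}){3})\\b");

    /** Returns the first IPv4 address listed under the given device, if that device is present. */
    static Optional<String> inetAddressOf(String ifconfigOutput, String device) {
        // One stanza per interface, separated by blank lines.
        for (String stanza : ifconfigOutput.split("\\n\\s*\\n")) {
            if (!stanza.trim().startsWith(device + ":")) {
                continue; // skip every interface except the one that was asked for
            }
            Matcher matcher = INET.matcher(stanza);
            if (matcher.find()) {
                return Optional.of(matcher.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample =
            "docker0: flags=4099  mtu 1500\n"
          + "        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255\n"
          + "\n"
          + "lo: flags=73  mtu 65536\n"
          + "        inet 127.0.0.1  netmask 255.0.0.0\n"
          + "\n"
          + "wlp59s0: flags=4163  mtu 1500\n"
          + "        inet 192.168.1.131  netmask 255.255.255.0  broadcast 192.168.1.255\n";
        System.out.println(inetAddressOf(sample, "wlp59s0").orElse("not found"));
    }
}
```

Without the device filter the same regexp would match the docker0 and lo addresses as well, which is why a single device has to be named.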


> Force jenkins tests to run on the private VPC IP
> 
>
> Key: CASSANDRA-15901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service 
> not known}}. CASSANDRA-15622 addressed some of these but many still remain. 
> Currently test C* nodes are either failing or listening on a public ip 
> depending on which agent they end up on.
> The idea behind this ticket is to make ant force the private VPC ip in the 
> cassandra yaml when building; this will force the nodes to listen on the 
> correct ip.






[jira] [Updated] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption

2020-06-25 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15903:
---
Description: As [~mck] point out, doc needs to be updated for 
CASSANDRA-15657  and CASSANDRA-15740.  (was: As [~mck]] point out, doc needs to 
be updated for CASSANDRA-15657  and CASSANDRA-15740.)

> Doc update: stream-entire-sstable supports all compaction strategies and 
> internode encryption
> -
>
> Key: CASSANDRA-15903
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15903
> Project: Cassandra
>  Issue Type: Task
>Reporter: ZhaoYang
>Priority: Normal
>
> As [~mck] point out, doc needs to be updated for CASSANDRA-15657  and 
> CASSANDRA-15740.






[jira] [Updated] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption

2020-06-25 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15903:
---
Description: As [~mck]] point out, doc needs to be updated for 
CASSANDRA-15657  and CASSANDRA-15740.  (was: As [~mck2] point out, doc needs to 
be updated for CASSANDRA-15657  and CASSANDRA-15740.)

> Doc update: stream-entire-sstable supports all compaction strategies and 
> internode encryption
> -
>
> Key: CASSANDRA-15903
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15903
> Project: Cassandra
>  Issue Type: Task
>Reporter: ZhaoYang
>Priority: Normal
>
> As [~mck]] point out, doc needs to be updated for CASSANDRA-15657  and 
> CASSANDRA-15740.






[jira] [Comment Edited] (CASSANDRA-15794) Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact tables in downgraded C* (3.11.4) introduces non-existent columns

2020-06-25 Thread Zhuqi Jin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134575#comment-17134575
 ] 

Zhuqi Jin edited comment on CASSANDRA-15794 at 6/25/20, 2:53 PM:
-

Hi, [~ifesdjeen].

I created patches for 3.0 and 3.11, according to the third method we discussed 
earlier.

[^CASSANDRA-15794-branch-3.0.patch]

 

I'd like to move on to the second point and make sure we don't write any commit 
log messages in such a case. Could you give me some pointers to where commit 
logs are generated in this scenario?


was (Author: zhuqi1108):
Hi, [~ifesdjeen].

I created patches for 3.0 and 3.11, according to the third method we discussed 
earlier.

[^CASSANDRA-15794-branch-3.0.patch]

I've attached the patches. Would you mind reviewing them? 

And I'd like to move on to the second method. Could you please do me a favor? 
We don't want to generate new commit logs before we hit the error in 4.x, so I 
need to know when and where the commit logs were written.

> Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact 
> tables in downgraded C* (3.11.4) introduces non-existent columns
> -
>
> Key: CASSANDRA-15794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Zhuqi Jin
>Priority: Normal
> Attachments: CASSANDRA-15794-branch-3.0.patch, 
> CASSANDRA-15794-branch-3.11.patch
>
>
> We tried to test upgrading a 3.11.4 C* cluster to 4.x and ran into the 
> following problems. 
>  * We started a single 3.11.4 C* node. 
>  * We ran cassandra-stress like this
> {code:java}
> ./cassandra-stress write n = 30 -rate threads = 10 -node  172.17.0.2 {code}
>  * We stopped this node, and started a C* node running C* compiled from trunk 
> (git commit: e394dc0bb32f612a476269010930c617dd1ed3cb)
>  * New C* failed to start with the following error message
> {code:java}
> ERROR [main] 2020-05-07 00:58:18,503 CassandraDaemon.java:245 - Error while loading schema: 
> java.lang.IllegalArgumentException: Compact Tables are not allowed in 
> Cassandra starting with 4.0 version. Use `ALTER ... DROP COMPACT STORAGE` 
> command supplied in 3.x/3.11 Cassandra in order to migrate off Compact Storage.
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:874)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:862)
>  at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:102)
>  at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:91)
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:241)
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:653)
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:770)
> Exception (java.lang.IllegalArgumentException) encountered during startup: 
> Compact Tables are not allowed in Cassandra starting with 4.0 version. Use 
> `ALTER ... DROP COMPACT STORAGE` command supplied in 3.x/3.11 Cassandra in 
> order to migrate off Compact Storage.
> ERROR [main] 2020-05-07 00:58:18,520 CassandraDaemon.java:792 - Exception encountered during startup
> java.lang.IllegalArgumentException: Compact Tables are not allowed in 
> Cassandra starting with 4.0 version. Use `ALTER ... DROP COMPACT STORAGE` 
> command supplied in 3.x/3.11 Cassandra in order to migrate off Compact Storage.
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:874)
>  at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:862)
>  at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:102)
>  at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:91)
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:241)
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:653)
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:770) {code}
>  * 

[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Description: 
In our cluster, some nodes slowly run out of memory over time. On those nodes 
we observed that Cassandra Reaper terminates repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 30 min 
timeout was reached.

In the heap dump we see that a lot of instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}} occupy most of the memory:
{noformat}
119 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x51a80" occupy 8.445.684.480 (93,96 %) 
bytes. {noformat}
In the thread dump we see a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown
 Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress are therefore 
never completed, so the repair thread waits forever and never finishes.

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

JVM memory settings:
{noformat}
-Xms11771M -Xmx11771M -XX:+UseG1GC -XX:MaxGCPauseMillis=100 
-XX:+ParallelRefProcEnabled -XX:MaxMetaspaceSize=100M {noformat}
Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 

The same issue is described in this comment: 
https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973

As suggested in the comments, I created this new specific ticket.

  was:
In our cluster, some nodes slowly run out of memory over time. On those nodes 
we observed that Cassandra Reaper cancels repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 30 min 
timeout was reached.

In the heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.

[jira] [Created] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption

2020-06-25 Thread ZhaoYang (Jira)
ZhaoYang created CASSANDRA-15903:


 Summary: Doc update: stream-entire-sstable supports all compaction 
strategies and internode encryption
 Key: CASSANDRA-15903
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15903
 Project: Cassandra
  Issue Type: Task
Reporter: ZhaoYang


As [~mck2] pointed out, the docs need to be updated for CASSANDRA-15657 and 
CASSANDRA-15740.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Description: 
In our cluster, some nodes slowly run out of memory after a while. On those 
nodes we observed that Cassandra Reaper cancels repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
30 min timeout was reached.
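
For readers unfamiliar with the JMX call mentioned above, here is a minimal sketch of how a client could invoke it. It assumes the usual Cassandra defaults (MBean name {{org.apache.cassandra.db:type=StorageService}}, JMX port 7199, no authentication); the class and method names are made up for illustration and this is not how Cassandra Reaper itself is implemented.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class TerminateRepairsSketch {

    // Assumed default MBean name of the storage service.
    static final String STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Connects to the node and invokes the no-argument operation.
    // Only call this against a node where terminating repairs is intended.
    static void terminateAllRepairSessions(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            mbs.invoke(new ObjectName(STORAGE_SERVICE),
                    "forceTerminateAllRepairSessions", new Object[0], new String[0]);
        }
    }

    public static void main(String[] args) {
        // Prints the target only; it does not connect to anything.
        System.out.println(serviceUrl("127.0.0.1", 7199));
    }
}
```

The sketch deliberately keeps the connection step separate from the URL construction so that the latter can be checked without a running node.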

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress will 
therefore never be completed, so the repair thread waits forever and never 
finishes.

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 

The same issue is described in this comment: 
https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973

As suggested in the comments I created this new specific ticket.

  was:
In our cluster, after a while some nodes running slowly out of memory. On that 
nodes we observed that Cassandra Reaper cancel repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because reaching 
timeout of 30 min.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Unin

[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Attachment: repair-terminated.txt

> OOM because repair session thread not closed when terminating repair
> 
>
> Key: CASSANDRA-15902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15902
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Swen Fuhrmann
>Assignee: Swen Fuhrmann
>Priority: Normal
> Attachments: repair-terminated.txt
>
>
> In our cluster, some nodes slowly run out of memory after a while. On 
> those nodes we observed that Cassandra Reaper cancels repairs with a JMX call 
> to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
> 30 min timeout was reached.
> In the memory heap dump we see >100 instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
> a lot of repair threads:
> {noformat}
> grep "Repair#" threaddump.txt | wc -l
>   50 {noformat}
>  
> The repair jobs are waiting for the validation to finish:
> {noformat}
> "Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748) {noformat}
>  
> That's the line where the threads are stuck:
> {noformat}
> // Wait for validation to complete
> Futures.getUnchecked(validations); {noformat}
>  
> The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops 
> the thread pool executor. It looks like futures that are in progress 
> will therefore never be completed, so the repair thread waits forever and 
> never finishes.
>  
> Environment:
> Cassandra version: 3.11.4
> Cassandra Reaper: 1.4.0
> Java Runtime:
> {noformat}
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) 
> {noformat}
>  
> Here is the same issue described in this comment: 
> https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973
> As suggested in the comments I created this new specific ticket.






[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Description: 
In our cluster, some nodes slowly run out of memory after a while. On those 
nodes we observed that Cassandra Reaper cancels repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
30 min timeout was reached.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress will 
therefore never be completed, so the repair thread waits forever and never 
finishes.

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 

Here is the same issue described in this comment: 
https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973

As suggested in the comments I created this new specific ticket.

  was:
In our cluster, after a while some nodes running slowly out of memory. On that 
nodes we observed that Cassandra Reaper cancel repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because reaching 
timeout of 30 min.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurr

[jira] [Updated] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann updated CASSANDRA-15902:
--
Description: 
In our cluster, some nodes slowly run out of memory after a while. On those 
nodes we observed that Cassandra Reaper cancels repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
30 min timeout was reached.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress will 
therefore never be completed, so the repair thread waits forever and never 
finishes.

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 

Here is the same issue described: 
https://issues.apache.org/jira/browse/CASSANDRA-14355?focusedCommentId=16992973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16992973

As suggested in the comments I created this new specific ticket.

  was:
In our cluster, after a while some nodes running slowly out of memory. On that 
nodes we observed that Cassandra Reaper cancel repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because reaching 
timeout of 30 min.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a 
waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a 
com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterrupti

[jira] [Assigned] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swen Fuhrmann reassigned CASSANDRA-15902:
-

Assignee: Swen Fuhrmann

> OOM because repair session thread not closed when terminating repair
> 
>
> Key: CASSANDRA-15902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15902
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Swen Fuhrmann
>Assignee: Swen Fuhrmann
>Priority: Normal
>
> In our cluster, some nodes slowly run out of memory after a while. On 
> those nodes we observed that Cassandra Reaper cancels repairs with a JMX call 
> to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
> 30 min timeout was reached.
> In the memory heap dump we see >100 instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
> a lot of repair threads:
> {noformat}
> grep "Repair#" threaddump.txt | wc -l
>   50 {noformat}
>  
> The repair jobs are waiting for the validation to finish:
> {noformat}
> "Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748) {noformat}
>  
> That's the line where the threads are stuck:
> {noformat}
> // Wait for validation to complete
> Futures.getUnchecked(validations); {noformat}
>  
> The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops 
> the thread pool executor. It looks like futures that are in progress 
> will therefore never be completed, so the repair thread waits forever and 
> never finishes.
>  
> Environment:
> Cassandra version: 3.11.4
> Cassandra Reaper: 1.4.0
> Java Runtime:
> {noformat}
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) 
> {noformat}
>  






[jira] [Created] (CASSANDRA-15902) OOM because repair session thread not closed when terminating repair

2020-06-25 Thread Swen Fuhrmann (Jira)
Swen Fuhrmann created CASSANDRA-15902:
-

 Summary: OOM because repair session thread not closed when 
terminating repair
 Key: CASSANDRA-15902
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15902
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Repair
Reporter: Swen Fuhrmann


In our cluster, some nodes slowly run out of memory after a while. On those 
nodes we observed that Cassandra Reaper cancels repairs with a JMX call to 
{{StorageServiceMBean.forceTerminateAllRepairSessions()}} because the 
30 min timeout was reached.

In the memory heap dump we see >100 instances of 
{{io.netty.util.concurrent.FastThreadLocalThread}}. In the thread dump we see 
a lot of repair threads:
{noformat}
grep "Repair#" threaddump.txt | wc -l
  50 {noformat}
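
The same count can be taken programmatically, for example when thread dumps are collected by a monitoring job. The following is only a sketch that mirrors the {{grep | wc -l}} pipeline above by counting lines that contain "Repair#"; the class name and the sample dump text are made up for illustration.

```java
import java.util.Arrays;

class RepairThreadCount {

    // Counts the lines containing "Repair#", like: grep "Repair#" threaddump.txt | wc -l
    static long countRepairLines(String threadDump) {
        return Arrays.stream(threadDump.split("\\R"))
                .filter(line -> line.contains("Repair#"))
                .count();
    }

    public static void main(String[] args) {
        // Made-up excerpt with two repair threads, for illustration only.
        String dump = String.join("\n",
                "\"Repair#152:1\" #96170 daemon prio=5",
                "   java.lang.Thread.State: WAITING (parking)",
                "\"Repair#153:1\" #96171 daemon prio=5",
                "   java.lang.Thread.State: WAITING (parking)",
                "\"main\" #1 prio=5");
        System.out.println(countRepairLines(dump)); // prints 2
    }
}
```

A count that keeps growing between dumps, as opposed to one that rises and falls with running repairs, would point to threads that never exit.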
 

The repair jobs are waiting for the validation to finish:
{noformat}
"Repair#152:1" #96170 daemon prio=5 os_prio=0 tid=0x12fc5000 nid=0x542a waiting on condition [0x7f81ee414000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007939bcfc8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$13/480490520.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748) {noformat}
 

That's the line where the threads are stuck:
{noformat}
// Wait for validation to complete
Futures.getUnchecked(validations); {noformat}
 

The call to {{StorageServiceMBean.forceTerminateAllRepairSessions()}} stops the 
thread pool executor. It looks like futures that are in progress will 
therefore never be completed, so the repair thread waits forever and never 
finishes.
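
The suspected mechanism can be reproduced in isolation with plain JDK classes. The sketch below is not Cassandra code and does not use Guava: it assumes a single-threaded executor whose queued task is the only thing that would complete a future, then calls {{shutdownNow()}} and waits on that future. All names are hypothetical. An untimed wait (as with {{Futures.getUnchecked}}) would park forever at that point; the sketch uses a timeout so that it terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AbandonedFutureSketch {

    // Returns true when the wait timed out, i.e. the future was never completed
    // because shutdownNow() discarded the task that would have completed it.
    static boolean waitTimesOutAfterShutdown(long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<String> validation = new CompletableFuture<>();
        CountDownLatch workerBusy = new CountDownLatch(1);
        try {
            // Occupy the only worker thread so that the next task stays queued.
            executor.execute(() -> {
                workerBusy.countDown();
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();

            // This queued task is the only code that would complete the future.
            executor.execute(() -> validation.complete("validated"));

            // Interrupts the running task and drops the queued one.
            executor.shutdownNow();

            // An untimed get() would park here forever; the timeout bounds the wait.
            validation.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wait timed out: " + waitTimesOutAfterShutdown(500));
    }
}
```

Whether the repair code should bound the wait or instead fail the pending futures when the executor is stopped is a design question for the actual fix; the sketch only illustrates why the waiting thread never wakes up.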

 

Environment:

Cassandra version: 3.11.4

Cassandra Reaper: 1.4.0

Java Runtime:
{noformat}
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode) {noformat}
 






[jira] [Updated] (CASSANDRA-15901) Force jenkins tests to run on the private VPC IP

2020-06-25 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-15901:

 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Force jenkins tests to run on the private VPC IP
> 
>
> Key: CASSANDRA-15901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Many of the ci-cassandra jenkins runs fail on {{ip-10-0-5-5: Name or service 
> not known}}. CASSANDRA-15622 addressed some of these but many still remain. 
> Currently test C* nodes are either failing or listening on a public IP 
> depending on which agent they end up on.
> The idea behind this ticket is to make ant force the private VPC IP in the 
> cassandra yaml when building; this will force the nodes to listen on the 
> correct IP.






[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144894#comment-17144894
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

bq. can someone please clarify if we want to move forward with this?

I vote yes (see additional context on the mailing list).

bq. This ticket was deemed too risky and would 'invalidate testing'

I don't know where those qualifications come from, and on what they were based, 
but they simply don't match what this ticket and the attached patch do. This 
ticket just removes dead code, not _that_ much of it, and it's hardly a 
surgical removal. There is no testing invalidation nor invasive changes 
involved. This is fake news.

I think people just got scared because this is compact storage and that has 
historically been messy. But the truth is that nearly all the complex and 
invasive removal of legacy code has been committed _years_ ago, mainly by 
CASSANDRA-5, CASSANDRA-12716 and CASSANDRA-10857. This ticket is just 
cleaning up 2 small leftovers, that's all.


> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[cassandra-builds] branch master updated: ninja-fix: github urls

2020-06-25 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ec9a12  ninja-fix: github urls
3ec9a12 is described below

commit 3ec9a12a5959ca9caa0d05fed7ed922f62df4a68
Author: Mick Semb Wever 
AuthorDate: Thu Jun 25 14:33:52 2020 +0200

ninja-fix: github urls
---
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/jenkins-dsl/cassandra_job_dsl_seed.groovy b/jenkins-dsl/cassandra_job_dsl_seed.groovy
index 7315c59..9c99d2d 100644
--- a/jenkins-dsl/cassandra_job_dsl_seed.groovy
+++ b/jenkins-dsl/cassandra_job_dsl_seed.groovy
@@ -22,11 +22,11 @@ if(binding.hasVariable("CASSANDRA_LARGE_SLAVE_LABEL")) {
 largeSlaveLabel = "${CASSANDRA_LARGE_SLAVE_LABEL}"
 }
 def mainRepo = "https://gitbox.apache.org/repos/asf/cassandra.git";
-def githubRepo = "https://github.com/apache/cassandra.git";
+def githubRepo = "https://github.com/apache/cassandra";
 if(binding.hasVariable("CASSANDRA_GIT_URL")) {
 mainRepo = "${CASSANDRA_GIT_URL}"
 // just presume custom repos are github, not critical if they are not
-githubRepo = "${mainRepo}"
+githubRepo = "${mainRepo}".minus(".git")
 }
 def buildsRepo = "https://gitbox.apache.org/repos/asf/cassandra-builds.git";
 if(binding.hasVariable("CASSANDRA_BUILDS_GIT_URL")) {
@@ -458,6 +458,9 @@ cassandraBranches.each {
 numToKeep(30)
 artifactNumToKeep(10)
 }
+properties {
+githubProjectUrl(githubRepo)
+}
 definition {
 cpsScm {
 scm {
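
As a rough illustration of the URL change in this commit (a Python sketch, not part of the commit itself): Groovy's `String.minus` removes the first occurrence of its argument, so a clone URL ending in `.git` becomes the browsable project URL. The function name `github_project_url` is hypothetical.

```python
def github_project_url(git_url):
    """Drop the first ".git" from a clone URL, mirroring Groovy's minus()."""
    # str.replace with count=1 removes only the first occurrence, which is
    # what "${mainRepo}".minus(".git") does in the seed script.
    return git_url.replace(".git", "", 1)


print(github_project_url("https://github.com/apache/cassandra.git"))
```

Because only the first occurrence is removed, a URL that contains `.git` earlier in its path would be altered there rather than at the end; the comment in the diff already notes that this value is not critical.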





[jira] [Updated] (CASSANDRA-15901) Force jenkins tests to run on the private VPC IP

2020-06-25 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-15901:

Fix Version/s: 4.0-rc

> Force jenkins tests to run on the private VPC IP
> 
>
> Key: CASSANDRA-15901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Many of the ci-cassandra Jenkins runs fail on {{ip-10-0-5-5: Name or service 
> not known}}. CASSANDRA-15622 addressed some of these, but many still remain. 
> Currently, test C* nodes either fail or listen on a public IP, depending on 
> which agent they end up on.
> The idea behind this ticket is to make ant force the private VPC IP in the 
> cassandra yaml when building, so that the nodes listen on the correct IP.






[jira] [Created] (CASSANDRA-15901) Force jenkins tests to run on the private VPC IP

2020-06-25 Thread Berenguer Blasi (Jira)
Berenguer Blasi created CASSANDRA-15901:
---

 Summary: Force jenkins tests to run on the private VPC IP
 Key: CASSANDRA-15901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15901
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest
Reporter: Berenguer Blasi
Assignee: Berenguer Blasi


Many of the ci-cassandra Jenkins runs fail on {{ip-10-0-5-5: Name or service 
not known}}. CASSANDRA-15622 addressed some of these, but many still remain. 
Currently, test C* nodes either fail or listen on a public IP, depending on 
which agent they end up on.

The idea behind this ticket is to make ant force the private VPC IP in the 
cassandra yaml when building, so that the nodes listen on the correct IP.
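
A minimal sketch of the idea, written in Python rather than ant. It assumes the relevant settings are the `listen_address` and `rpc_address` keys of cassandra.yaml; which keys the eventual patch touches is an assumption here, as are the function name and the sample IP (taken from the hostname in the error message).

```python
import re


def force_private_ip(yaml_text, private_ip, keys=("listen_address", "rpc_address")):
    """Return yaml_text with each top-level key in `keys` set to private_ip."""
    for key in keys:
        # Match an uncommented "key: value" line and replace only the value.
        pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
        yaml_text = pattern.sub(f"{key}: {private_ip}", yaml_text)
    return yaml_text


sample = "cluster_name: test\nlisten_address: localhost\nrpc_address: localhost\n"
print(force_private_ip(sample, "10.0.5.5"))
```

Commented-out lines such as `# listen_address: ...` are left alone, since the pattern only matches a key at the start of a line.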






[jira] [Comment Edited] (CASSANDRA-15406) Add command to show the progress of data streaming and index build

2020-06-25 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144825#comment-17144825
 ] 

Berenguer Blasi edited comment on CASSANDRA-15406 at 6/25/20, 10:53 AM:


[~stefan.miklosovic] I was looking at this and wondering the same thing you 
did: why not backport it? Also, for the 4.0 review it would be good to have a 
CI run. What do you think?

Edit: I just learned we don't do improvements on older branches. Apologies for 
the noise. 4.0 only it is :-)


was (Author: bereng):
[~stefan.miklosovic] I was looking at this and wondering the same thing you 
did: why not backport it? Also, for the 4.0 review it would be good to have a 
CI run. What do you think?

> Add command to show the progress of data streaming and index build 
> ---
>
> Key: CASSANDRA-15406
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15406
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Streaming, Legacy/Streaming and Messaging, 
> Tool/nodetool
>Reporter: maxwellguo
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0, 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we should supply a command to show the progress of streaming during 
> bootstrap/move/decommission/removenode operations. While data is streaming, 
> nobody knows which step the program is in, so I think a command to show the 
> progress of the joining/leaving node is needed.
>  
> PR [https://github.com/apache/cassandra/pull/558]






[jira] [Commented] (CASSANDRA-15406) Add command to show the progress of data streaming and index build

2020-06-25 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144825#comment-17144825
 ] 

Berenguer Blasi commented on CASSANDRA-15406:
-

[~stefan.miklosovic] I was looking at this and wondering the same thing you 
did: why not backport it? Also, for the 4.0 review it would be good to have a 
CI run. What do you think?

> Add command to show the progress of data streaming and index build 
> ---
>
> Key: CASSANDRA-15406
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15406
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Streaming, Legacy/Streaming and Messaging, 
> Tool/nodetool
>Reporter: maxwellguo
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0, 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we should supply a command to show the progress of streaming during 
> bootstrap/move/decommission/removenode operations. While data is streaming, 
> nobody knows which step the program is in, so I think a command to show the 
> progress of the joining/leaving node is needed.
>  
> PR [https://github.com/apache/cassandra/pull/558]






[jira] [Commented] (CASSANDRA-15885) replication_test.TestSnitchConfigurationUpdate.test_rf_expand_gossiping_property_file_snitch_multi_dc failure

2020-06-25 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144765#comment-17144765
 ] 

Berenguer Blasi commented on CASSANDRA-15885:
-

Great, thanks a lot [~adelapena]

> replication_test.TestSnitchConfigurationUpdate.test_rf_expand_gossiping_property_file_snitch_multi_dc
>  failure
> -
>
> Key: CASSANDRA-15885
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15885
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-alpha5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> replication_test.TestSnitchConfigurationUpdate.test_rf_expand_gossiping_property_file_snitch_multi_dc
>  has been failing for a 
> [while|https://ci-cassandra.apache.org/job/Cassandra-trunk/176/testReport/dtest-large.replication_test/TestSnitchConfigurationUpdate/test_rf_expand_gossiping_property_file_snitch_multi_dc/history/].
>  Also fails locally






[jira] [Commented] (CASSANDRA-15895) Code comment with issue number

2020-06-25 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144761#comment-17144761
 ] 

Benedict Elliott Smith commented on CASSANDRA-15895:


If this were inserting useful comments near where critical bugs were fixed, 
that could be valuable - or simply auditing those bug fixes above a certain 
complexity level and reporting the ones that have no such comment, so we could 
add them (and so they would survive refactors).  But I agree with Brandon that, 
as it stands, this seems inferior to git annotate, and pollutes the codebase.

> Code comment with issue number
> --
>
> Key: CASSANDRA-15895
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15895
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: ackelcn
>Priority: Normal
>
> When I read the code of cassandra, I find some comments with issue numbers. 
> One of them comes from JavaDriverClient.java:
> {code:java}
> public JavaDriverClient(StressSettings settings, String host, int port, EncryptionOptions encryptionOptions){
>     ...
>     //Always allow enough pending requests so every thread can have a request pending
>     //See https://issues.apache.org/jira/browse/CASSANDRA-7217
>     int requestsPerConnection = (maxThreadCount / connectionsPerHost) + connectionsPerHost;
>     maxPendingPerConnection = settings.mode.maxPendingPerConnection == null ? Math.max(128, requestsPerConnection) : settings.mode.maxPendingPerConnection;
> }{code}
> These comments are quite useful for other programmers and me to understand 
> the code, but I notice that not all issue numbers are written in code 
> comments. It can already be quite tedious to write them into commit messages 
> :)
>  
> To handle the problem, I implemented a tool that automatically inserts issue 
> numbers into code comments. I tried my tool on activemq, and the instrumented 
> version is [https://github.com/ackelcn/cassandra] 
>  
> To avoid confusion, if there is already an issue number in a code comment, my 
> tool ignores that issue number. All my generated comments start with //IC, so 
> it is easy to find them.
>  
> Would you please give some feedback on my tool? Please feel free to merge my 
> generated comments into your code, if you feel that some are useful.
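
To make the calculation in the quoted JavaDriverClient snippet concrete, here is a small worked example in Python. It mirrors the Java expression, including integer division; the thread and connection counts are made-up inputs, and the function name and the `configured` parameter are hypothetical stand-ins for the stress setting.

```python
def max_pending_per_connection(max_thread_count, connections_per_host, configured=None):
    """Mirror the quoted Java logic; `configured` stands in for the setting."""
    # Java divides two ints here, so use floor division.
    requests_per_connection = (max_thread_count // connections_per_host) + connections_per_host
    if configured is not None:
        return configured
    return max(128, requests_per_connection)


print(max_pending_per_connection(400, 8))   # 400 // 8 + 8 = 58, so the floor of 128 applies
print(max_pending_per_connection(2000, 8))  # 2000 // 8 + 8 = 258
```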






[cassandra-builds] branch master updated: In Jenkins add url to github repo, remove publishTestStabilityData from devbranch junit reports, and add the formatChanges function. In jenkinscommand.sh add

2020-06-25 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/master by this push:
 new 4ffc0ad  In Jenkins add url to github repo, remove 
publishTestStabilityData from devbranch junit reports, and add the 
formatChanges function. In jenkinscommand.sh add debug when container exits 
unsuccessfully.
4ffc0ad is described below

commit 4ffc0adf99c8a2175e526d29de301b65e7b6ac1e
Author: Mick Semb Wever 
AuthorDate: Thu Jun 11 22:06:12 2020 +0200

In Jenkins add url to github repo, remove publishTestStabilityData from 
devbranch junit reports, and add the formatChanges function.
In jenkinscommand.sh add debug when container exits unsuccessfully.
---
 docker/jenkins/jenkinscommand.sh  | 21 +++---
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 64 ---
 jenkins-dsl/cassandra_pipeline.groovy | 16 
 3 files changed, 72 insertions(+), 29 deletions(-)

diff --git a/docker/jenkins/jenkinscommand.sh b/docker/jenkins/jenkinscommand.sh
index caa97e2..ea6d764 100644
--- a/docker/jenkins/jenkinscommand.sh
+++ b/docker/jenkins/jenkinscommand.sh
@@ -23,14 +23,23 @@ status="$?"
 echo "$ID done (${status}), copying files"
 
 if [ "$status" -ne 0 ] ; then
+echo "$ID failed (${status}), debug…"
+docker inspect $ID
+echo "–––"
+docker logs $ID
+echo "–––"
 docker ps -a
+echo "–––"
 docker info
+echo "–––"
+dmesg
+else
+echo "$ID done (${status}), copying files"
+# pytest results
+docker cp $ID:/home/cassandra/cassandra/cassandra-dtest/nosetests.xml .
+# pytest logs
+docker cp $ID:/home/cassandra/cassandra/test_stdout.txt .
+docker cp $ID:/home/cassandra/cassandra/cassandra-dtest/ccm_logs.tar.xz .
 fi
 
-# pytest results
-docker cp $ID:/home/cassandra/cassandra/cassandra-dtest/nosetests.xml .
-# pytest logs
-docker cp $ID:/home/cassandra/cassandra/test_stdout.txt .
-docker cp $ID:/home/cassandra/cassandra/cassandra-dtest/ccm_logs.tar.xz .
-
 docker rm $ID
diff --git a/jenkins-dsl/cassandra_job_dsl_seed.groovy 
b/jenkins-dsl/cassandra_job_dsl_seed.groovy
index f006dae..7315c59 100644
--- a/jenkins-dsl/cassandra_job_dsl_seed.groovy
+++ b/jenkins-dsl/cassandra_job_dsl_seed.groovy
@@ -22,8 +22,11 @@ if(binding.hasVariable("CASSANDRA_LARGE_SLAVE_LABEL")) {
 largeSlaveLabel = "${CASSANDRA_LARGE_SLAVE_LABEL}"
 }
 def mainRepo = "https://gitbox.apache.org/repos/asf/cassandra.git";
+def githubRepo = "https://github.com/apache/cassandra.git";
 if(binding.hasVariable("CASSANDRA_GIT_URL")) {
 mainRepo = "${CASSANDRA_GIT_URL}"
+// just presume custom repos are github, not critical if they are not
+githubRepo = "${mainRepo}"
 }
 def buildsRepo = "https://gitbox.apache.org/repos/asf/cassandra-builds.git";
 if(binding.hasVariable("CASSANDRA_BUILDS_GIT_URL")) {
@@ -79,7 +82,7 @@ job('Cassandra-template-artifacts') {
 label(slaveLabel)
 compressBuildLog()
 logRotator {
-numToKeep(30)
+numToKeep(10)
 artifactNumToKeep(10)
 }
 wrappers {
@@ -88,6 +91,9 @@ job('Cassandra-template-artifacts') {
 }
 timestamps()
 }
+properties {
+githubProjectUrl(githubRepo)
+}
 scm {
 git {
 remote {
@@ -154,7 +160,7 @@ job('Cassandra-template-test') {
 label(slaveLabel)
 compressBuildLog()
 logRotator {
-numToKeep(30)
+numToKeep(10)
 artifactNumToKeep(10)
 }
 wrappers {
@@ -166,6 +172,9 @@ job('Cassandra-template-test') {
 throttleConcurrentBuilds {
 categories(['Cassandra'])
 }
+properties {
+githubProjectUrl(githubRepo)
+}
 scm {
 git {
 remote {
@@ -217,7 +226,7 @@ job('Cassandra-template-dtest') {
 label(slaveLabel)
 compressBuildLog()
 logRotator {
-numToKeep(30)
+numToKeep(10)
 artifactNumToKeep(10)
 }
 wrappers {
@@ -226,6 +235,9 @@ job('Cassandra-template-dtest') {
 }
 timestamps()
 }
+properties {
+githubProjectUrl(githubRepo)
+}
 scm {
 git {
 remote {
@@ -274,7 +286,7 @@ matrixJob('Cassandra-template-cqlsh-tests') {
 concurrentBuild()
 compressBuildLog()
 logRotator {
-numToKeep(30)
+numToKeep(10)
 artifactNumToKeep(10)
 }
 wrappers {
@@ -293,6 +305,9 @@ matrixJob('Cassandra-template-cqlsh-tests') {
 }
 // this should prevent long path expansion from the axis definitions
 childCustomWorkspace('.')
+properties {
+githubProjectUrl(githubRepo)
+}
 scm {
 git {
 remote {
@@ -481,7 +496,7 @@ job('Cassandra-devbranch-artifacts') {
 label(slaveLabel)
 compressBuildLog()
 logRotator {
-numToKeep(30)
+numToKeep(10)
 artifactNumToKeep(1

[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144711#comment-17144711
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-15234 at 6/25/20, 7:29 AM:
---

This patch lets end users attach unit suffixes to DataStorage, Duration, and 
BitRates parameters. New custom types were added for the three groups of 
parameters.

It renames certain parameters, with standardization as the main goal.

Backward compatibility is implemented at the parameter level using annotations.

[Branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]

This patch requires custom [ [DTests 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]
 which run successfully with the following patch of [CCM 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]

(The failed tests in CircleCI are failing because the CCM patch didn't apply 
due to a config issue. Also, one exclusion for the warning about using the old 
parameter names should be added. Upgrade tests are still running.)

DTests - requirements.txt should be updated back to the original one before 
commit!

[ [JAVA8 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/98c356fb-b16a-4508-bda8-ff877569b0f5]]
 [ [JAVA11 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/d49295a8-d8f5-4836-a18a-df25ff955219]]

*TO DO:* The in-JVM tests do not use snakeyaml, so I have to update them to use 
its converter for the tests to work with this patch. This is a feasible 
solution, agreed with Alex Petrov and [~dcapwell].

I will provide the additional patch/commit by the end of the week, but I didn't 
want to delay the review of the main patch further.

*WARNING:* Before committing, return to the default requirements.txt file.

*Order of commits:* 1) [ [CASSANDRA branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]] 2) 
[ [CCM branch 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]] 3) [ 
[DTests branch 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]


was (Author: e.dimitrova):
This patch lets end users attach unit suffixes to DataStorage, Duration, and 
BitRates parameters.

It renames certain parameters, with standardization as the main goal.

Backward compatibility is implemented at the parameter level using annotations.

[Branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]

This patch requires custom [ [DTests 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]
 which run successfully with the following patch of [CCM 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]

(The failed tests in CircleCI are failing because the CCM patch didn't apply 
due to a config issue. Also, one exclusion for the warning about using the old 
parameter names should be added. Upgrade tests are still running.)

DTests - requirements.txt should be updated back to the original one before 
commit!

[ [JAVA8 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/98c356fb-b16a-4508-bda8-ff877569b0f5]]
 [ [JAVA11 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/d49295a8-d8f5-4836-a18a-df25ff955219]]

*TO DO:* The in-JVM tests do not use snakeyaml, so I have to update them to use 
its converter for the tests to work with this patch. This is a feasible 
solution, agreed with Alex Petrov and [~dcapwell].

I will provide the additional patch/commit by the end of the week, but I didn't 
want to delay the review of the main patch further.

*WARNING:* Before committing, return to the default requirements.txt file.

*Order of commits:* 1) [ [CASSANDRA branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]] 2) 
[ [CCM branch 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]] 3) [ 
[DTests branch 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ v

[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144711#comment-17144711
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-15234 at 6/25/20, 7:27 AM:
---

This patch lets end users attach unit suffixes to DataStorage, Duration, and 
BitRates parameters.

It renames certain parameters, with standardization as the main goal.

Backward compatibility is implemented at the parameter level using annotations.

[Branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]

This patch requires custom [ [DTests 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]
 which run successfully with the following patch of [CCM 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]

(The failed tests in CircleCI are failing because the CCM patch didn't apply 
due to a config issue. Also, one exclusion for the warning about using the old 
parameter names should be added. Upgrade tests are still running.)

DTests - requirements.txt should be updated back to the original one before 
commit!

[ [JAVA8 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/98c356fb-b16a-4508-bda8-ff877569b0f5]]
 [ [JAVA11 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/d49295a8-d8f5-4836-a18a-df25ff955219]]

*TO DO:* The in-JVM tests do not use snakeyaml, so I have to update them to use 
its converter for the tests to work with this patch. This is a feasible 
solution, agreed with Alex Petrov and [~dcapwell].

I will provide the additional patch/commit by the end of the week, but I didn't 
want to delay the review of the main patch further.

*WARNING:* Before committing, return to the default requirements.txt file.

*Order of commits:* 1) [ [CASSANDRA branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]] 2) 
[ [CCM branch 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]] 3) [ 
[DTests branch 
|https://github.com/ekaterinadimitrova2/cassandra-dtest/tree/CASSANDRA-15234-new]]


was (Author: e.dimitrova):
This patch lets end users attach unit suffixes to DataStorage, Duration, and 
BitRates parameters.

It renames certain parameters, with standardization as the main goal.

Backward compatibility is implemented at the parameter level using annotations.

[Branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]

This patch requires custom [ [DTests 
|https://github.com/ekaterinadimitrova2/cassandra-dtest]] which run 
successfully with the following patch of [CCM 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]

(The failed tests in CircleCI are failing because the CCM patch didn't apply 
due to a config issue. Also, one exclusion for the warning about using the old 
parameter names should be added. Upgrade tests are still running.)

DTests - requirements.txt should be updated back to the original one before 
commit!

[ [JAVA8 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/98c356fb-b16a-4508-bda8-ff877569b0f5]]
 [ [JAVA11 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/d49295a8-d8f5-4836-a18a-df25ff955219]]

*TO DO:* The in-JVM tests do not use snakeyaml, so I have to update them to use 
its converter for the tests to work with this patch. This is a feasible 
solution, agreed with Alex Petrov and [~dcapwell].

I will provide the additional patch/commit by the end of the week, but I didn't 
want to delay the review of the main patch further.

*WARNING:* Before committing, return to the default requirements.txt file.

*Order of commits:* 1) [ [CASSANDRA branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]] 2) 
[ [CCM branch 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]] 3) [ 
[DTests branch |https://github.com/ekaterinadimitrova2/cassandra-dtest]]

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibili

[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144711#comment-17144711
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15234:
-

This patch lets end users attach unit suffixes to DataStorage, Duration, and 
BitRates parameters.

It renames certain parameters, with standardization as the main goal.

Backward compatibility is implemented at the parameter level using annotations.

[Branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]

This patch requires custom [ [DTests 
|https://github.com/ekaterinadimitrova2/cassandra-dtest]] which run 
successfully with the following patch of [CCM 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]

(The failed tests in CircleCI are failing because the CCM patch didn't apply 
due to a config issue. Also, one exclusion for the warning about using the old 
parameter names should be added. Upgrade tests are still running.)

DTests - requirements.txt should be updated back to the original one before 
commit!

[ [JAVA8 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/98c356fb-b16a-4508-bda8-ff877569b0f5]]
 [ [JAVA11 
|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/257/workflows/d49295a8-d8f5-4836-a18a-df25ff955219]]

*TO DO:* The in-JVM tests do not use snakeyaml, so I have to update them to use 
its converter for the tests to work with this patch. This is a feasible 
solution, agreed with Alex Petrov and [~dcapwell].

I will provide the additional patch/commit by the end of the week, but I didn't 
want to delay the review of the main patch further.

*WARNING:* Before committing, return to the default requirements.txt file.

*Order of commits:* 1) [ [CASSANDRA branch 
|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15234-new]] 2) 
[ [CCM branch 
|https://github.com/ekaterinadimitrova2/ccm/tree/CASSANDRA-15234-new]] 3) [ 
[DTests branch |https://github.com/ekaterinadimitrova2/cassandra-dtest]]

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.
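
The suffix proposal in the description above can be sketched roughly as follows. This is a Python illustration only, not the patch's implementation: the suffix patterns and the binary rate units are copied from the ticket, while the function names, the microsecond base unit, and the 30-day month are assumptions made for the sketch.

```python
import re

# Suffix patterns copied from the ticket; multipliers are in microseconds.
# Treating a month as 30 days is an assumption of this sketch.
DURATION_UNITS_MICROS = [
    (r"u|micros(econds?)?", 1),
    (r"ms|millis(econds?)?", 1_000),
    (r"s(econds?)?", 1_000_000),
    (r"m(inutes?)?", 60 * 1_000_000),
    (r"h(ours?)?", 3_600 * 1_000_000),
    (r"d(ays?)?", 86_400 * 1_000_000),
    (r"mo(nths?)?", 30 * 86_400 * 1_000_000),
]

# Binary rate units listed in the ticket, expressed in bytes per second.
RATE_UNITS_BYTES = {
    "B/s": 1,
    "KiB/s": 1024,
    "MiB/s": 1024 ** 2,
    "GiB/s": 1024 ** 3,
    "TiB/s": 1024 ** 4,
}


def parse_duration_micros(text):
    """Parse e.g. "10ms" or "5 minutes" into microseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    amount, suffix = int(match.group(1)), match.group(2)
    for pattern, micros in DURATION_UNITS_MICROS:
        # fullmatch keeps "m" (minutes) distinct from "ms" and "mo".
        if re.fullmatch(pattern, suffix):
            return amount * micros
    raise ValueError(f"unknown unit suffix: {suffix!r}")


def parse_rate_bytes_per_sec(text):
    """Parse e.g. "16MiB/s" into bytes per second; other units are rejected."""
    match = re.fullmatch(r"\s*(\d+)\s*(\S+)\s*", text)
    if match is None or match.group(2) not in RATE_UNITS_BYTES:
        raise ValueError(f"not a supported rate: {text!r}")
    return int(match.group(1)) * RATE_UNITS_BYTES[match.group(2)]


print(parse_duration_micros("10ms"))
print(parse_rate_bytes_per_sec("16MiB/s"))
```

This sketch takes the stricter of the two options raised in the description and rejects ambiguous units such as `KB/s` rather than accepting them and logging the value in bytes/s.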


