[jira] [Updated] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-25 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18673:
-
Source Control Link: https://github.com/apache/cassandra/pull/2498  (was: 
{color:red}colored text{color}https://github.com/apache/cassandra/pull/2498)

> Reduce size of per-SSTable index components
> ---
>
> Key: CASSANDRA-18673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18673
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Urgent
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The current per-SSTable index components are large because the primary keys 
> that are stored in them include the token as part of the byte comparable. The 
> byte comparable puts the token first meaning that we get very little prefix 
> compression from either the trie or the sorted terms store. 
> We can fix this by removing the token from the primary key serialization. 
> This would allow us to get the prefix compression from the trie and the 
> sorted terms store.
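
To see why a leading token defeats prefix compression, here is a minimal Java sketch. It is not the SAI serialization: the key layout (an 8-byte token followed by the key bytes), the `fakeToken` hash and the sample keys are all assumptions made for illustration. It sums the bytes that each key shares with its sorted predecessor, which is a rough proxy for what a trie or sorted terms store can avoid storing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixCompressionSketch {
    // Hypothetical stand-in for a partitioner token: a mixed 64-bit hash of the key.
    static long fakeToken(String key) {
        long h = key.hashCode();
        h *= 0x9E3779B97F4A7C15L; // odd multiplier spreads small differences into the high bits
        h ^= (h >>> 32);
        return h;
    }

    // Assumed layout: optional 8-byte token, then the UTF-8 key bytes.
    static byte[] encode(String key, boolean tokenFirst) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        if (!tokenFirst) return k;
        return ByteBuffer.allocate(Long.BYTES + k.length).putLong(fakeToken(key)).put(k).array();
    }

    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Total bytes shared between neighbours once the encoded keys are sorted.
    static int sharedBytes(List<String> keys, boolean tokenFirst) {
        List<byte[]> encoded = new ArrayList<>();
        for (String key : keys) encoded.add(encode(key, tokenFirst));
        encoded.sort(Arrays::compareUnsigned);
        int shared = 0;
        for (int i = 1; i < encoded.size(); i++)
            shared += commonPrefix(encoded.get(i - 1), encoded.get(i));
        return shared;
    }

    // Sample keys with a long common prefix, e.g. user-000000 .. user-000999.
    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) keys.add(String.format("user-%06d", i));
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(1000);
        System.out.println("shared bytes, token first:   " + sharedBytes(keys, true));
        System.out.println("shared bytes, token removed: " + sharedBytes(keys, false));
    }
}
```

With the token in front, neighbouring entries differ within the first few bytes, so the long common key prefix is never shared; without it, the sorted keys share most of their bytes.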



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-25 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18673:
-
Source Control Link: {color:red}colored 
text{color}https://github.com/apache/cassandra/pull/2498  (was: 
https://github.com/apache/cassandra/pull/2498)

> Reduce size of per-SSTable index components
> ---
>
> Key: CASSANDRA-18673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18673
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Urgent
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The current per-SSTable index components are large because the primary keys 
> that are stored in them include the token as part of the byte comparable. The 
> byte comparable puts the token first meaning that we get very little prefix 
> compression from either the trie or the sorted terms store. 
> We can fix this by removing the token from the primary key serialization. 
> This would allow us to get the prefix compression from the trie and the 
> sorted terms store.






[jira] [Updated] (CASSANDRA-18466) Paxos only repair is treated as an incremental repair

2023-07-19 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18466:
-
Test and Documentation Plan:   (was: Added test in 
LongLeveledCompactionStrategyTest: testValidationDuringConstruction)

> Paxos only repair is treated as an incremental repair
> -
>
> Key: CASSANDRA-18466
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18466
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Andrew
>Assignee: Ningzi Zhan
>Priority: Normal
>  Labels: lhf
> Fix For: 4.1.x, 5.x
>
>
> Paxos only repair tries to continue or is treated as an incremental repair. 
> This happened on 4.1.0 and 4.1.1 when trying to run repair in preparation for 
> enabling paxos_state_purging. The repair, which was in preparation mode, 
> triggered multiple anti-compactions on the nodes. Running the command with 
> --full behaves in the expected way, i.e. only the paxos data is repaired and 
> it's finished within a few seconds.
> {code:java}
> nodetool repair --paxos-only // Does not behave as expected: it does not complete quickly and seems to be waiting on anticompactions
> {code}
> {code:java}
> nodetool repair --full --paxos-only // Completes within a few seconds as expected
> {code}






[jira] [Commented] (CASSANDRA-18329) Upgrade jamm

2023-05-12 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722189#comment-17722189
 ] 

Matt Fleming commented on CASSANDRA-18329:
--

This ticket is blocked on [https://github.com/jbellis/jamm/pull/50], right?

> Upgrade jamm
> 
>
> Key: CASSANDRA-18329
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18329
> Project: Cassandra
>  Issue Type: Task
>  Components: Jamm
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> Jamm is currently undergoing maintenance that will fix JDK11 issues and enable 
> it to work with later JDK versions, up to JDK17.
> This ticket will serve as a placeholder for upgrading Jamm in Cassandra when 
> the new Jamm release is out. 






[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2023-05-04 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719534#comment-17719534
 ] 

Matt Fleming commented on CASSANDRA-16718:
--

Patch looks good to me. nb +1

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> Many container-based solutions function by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and changing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> The trace shows that the changing node tries to communicate with the existing 
> node but the response is never received. I assume it is because the existing 
> node attempts to communicate with the local address during the shadow round.
>  






[jira] [Updated] (CASSANDRA-18067) On-disk numeric index

2023-02-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18067:
-
Fix Version/s: NA
   (was: 4.x)

> On-disk numeric index
> -
>
> Key: CASSANDRA-18067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18067
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: NA
>
>
> An on-disk numeric index for all datatypes not supported by the on-disk 
> literal index (CASSANDRA-18062)






[jira] [Updated] (CASSANDRA-18062) On-disk string index with index building and on-disk query path

2023-02-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18062:
-
Fix Version/s: (was: 4.x)

> On-disk string index with index building and on-disk query path
> ---
>
> Key: CASSANDRA-18062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18062
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
>
> An on-disk index for string (literal) datatypes. This index is used for the 
> following datatypes:
>  * UTF8Type
>  * AsciiType
>  * CompositeType
>  * Frozen types
> This includes the ability to write the index to disk at index creation, by 
> specific index rebuild and during SSTable compaction. 
> Also the ability to query the on-disk index and combine the results with 
> those from the in-memory indexes created by CASSANDRA-18058.






[jira] [Updated] (CASSANDRA-18062) On-disk string index with index building and on-disk query path

2023-02-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18062:
-
Fix Version/s: NA

> On-disk string index with index building and on-disk query path
> ---
>
> Key: CASSANDRA-18062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18062
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: NA
>
>
> An on-disk index for string (literal) datatypes. This index is used for the 
> following datatypes:
>  * UTF8Type
>  * AsciiType
>  * CompositeType
>  * Frozen types
> This includes the ability to write the index to disk at index creation, by 
> specific index rebuild and during SSTable compaction. 
> Also the ability to query the on-disk index and combine the results with 
> those from the in-memory indexes created by CASSANDRA-18058.






[jira] [Commented] (CASSANDRA-18062) On-disk string index with index building and on-disk query path

2023-02-21 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691712#comment-17691712
 ] 

Matt Fleming commented on CASSANDRA-18062:
--

[~maedhroz] Sure can do. I was just monkeying about with the versions to create 
a simpler kanban board, but I'm happy to use NA here.

> On-disk string index with index building and on-disk query path
> ---
>
> Key: CASSANDRA-18062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18062
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 4.x
>
>
> An on-disk index for string (literal) datatypes. This index is used for the 
> following datatypes:
>  * UTF8Type
>  * AsciiType
>  * CompositeType
>  * Frozen types
> This includes the ability to write the index to disk at index creation, by 
> specific index rebuild and during SSTable compaction. 
> Also the ability to query the on-disk index and combine the results with 
> those from the in-memory indexes created by CASSANDRA-18058.






[jira] [Updated] (CASSANDRA-18067) On-disk numeric index

2023-02-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18067:
-
Fix Version/s: 4.x

> On-disk numeric index
> -
>
> Key: CASSANDRA-18067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18067
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 4.x
>
>
> An on-disk numeric index for all datatypes not supported by the on-disk 
> literal index (CASSANDRA-18062)






[jira] [Updated] (CASSANDRA-18062) On-disk string index with index building and on-disk query path

2023-02-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-18062:
-
Fix Version/s: 4.x

> On-disk string index with index building and on-disk query path
> ---
>
> Key: CASSANDRA-18062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18062
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 4.x
>
>
> An on-disk index for string (literal) datatypes. This index is used for the 
> following datatypes:
>  * UTF8Type
>  * AsciiType
>  * CompositeType
>  * Frozen types
> This includes the ability to write the index to disk at index creation, by 
> specific index rebuild and during SSTable compaction. 
> Also the ability to query the on-disk index and combine the results with 
> those from the in-memory indexes created by CASSANDRA-18058.






[jira] [Created] (CASSANDRA-17361) STCS documentation on website mentions LCS in title

2022-02-08 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-17361:


 Summary: STCS documentation on website mentions LCS in title
 Key: CASSANDRA-17361
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17361
 Project: Cassandra
  Issue Type: Bug
Reporter: Matt Fleming


The STCS page here, 
[https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/stcs.html],
 says "Leveled Compaction Strategy" in the title where it should say 
"Size-tiered Compaction Strategy".






[jira] [Commented] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-11-29 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450581#comment-17450581
 ] 

Matt Fleming commented on CASSANDRA-16840:
--

Hi Aleks :)

Yeah, I agree this should have a test. I'll add one.

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Commented] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-11-26 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449654#comment-17449654
 ] 

Matt Fleming commented on CASSANDRA-16840:
--

Is anyone available and interested in reviewing this patch?

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Commented] (CASSANDRA-17057) Refactor and support pluggable failure detection

2021-11-02 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437240#comment-17437240
 ] 

Matt Fleming commented on CASSANDRA-17057:
--

Ah, you're right. I completely missed that pluggable failure detection support 
was added as part of CEP-10. Sorry for the noise.

> Refactor and support pluggable failure detection
> 
>
> Key: CASSANDRA-17057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17057
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Matt Fleming
>Priority: Normal
>
> Making it possible to supply custom failure detectors enables supporting new 
> failure detection algorithms and makes testing using mocks much easier.
> The general idea is to introduce a new config parameter, such as 
> org.apache.cassandra.custom_failure_detector_class, that specifies the 
> failure detection class to use.






[jira] [Created] (CASSANDRA-17059) Support adding custom verbs at runtime

2021-10-22 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-17059:


 Summary: Support adding custom verbs at runtime
 Key: CASSANDRA-17059
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17059
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Fleming


Cassandra already has support for registering custom verbs at build time, but 
there's value in allowing verbs to be added at runtime since that enables new 
use cases where it's inconvenient or impossible to modify the Cassandra source.

Additionally, apps that want to register new verbs benefit from running custom 
code after the default verb handlers execute. This can be achieved with 
straightforward modifications to the Sink interface (e.g. adding a PostSink 
class).
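
As a rough illustration of the idea, the following Java sketch shows a registry that accepts verbs at runtime and runs post-handlers after the verb's own handler. Every name in it (`RuntimeVerbSketch`, `registerVerb`, `addPostSink`, `dispatch`) is hypothetical and chosen for this example; it does not reflect Cassandra's actual Verb or Sink code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch: not Cassandra's messaging API.
class RuntimeVerbSketch {
    // Verb name -> handler; concurrent so verbs can be added while messages flow.
    private final Map<String, Consumer<String>> handlers = new ConcurrentHashMap<>();
    // Callbacks run after the verb's own handler, receiving (verb, payload).
    private final List<BiConsumer<String, String>> postSinks = new CopyOnWriteArrayList<>();

    // Registers a new verb at runtime; rejects duplicates instead of overwriting.
    void registerVerb(String verb, Consumer<String> handler) {
        if (handlers.putIfAbsent(verb, handler) != null)
            throw new IllegalArgumentException("Verb already registered: " + verb);
    }

    void addPostSink(BiConsumer<String, String> sink) {
        postSinks.add(sink);
    }

    // Runs the handler first, then every post sink in registration order.
    void dispatch(String verb, String payload) {
        Consumer<String> handler = handlers.get(verb);
        if (handler == null)
            throw new IllegalArgumentException("Unknown verb: " + verb);
        handler.accept(payload);
        for (BiConsumer<String, String> sink : postSinks)
            sink.accept(verb, payload);
    }

    public static void main(String[] args) {
        RuntimeVerbSketch registry = new RuntimeVerbSketch();
        registry.registerVerb("CUSTOM_PING", p -> System.out.println("handled " + p));
        registry.addPostSink((verb, p) -> System.out.println("post-sink saw " + verb));
        registry.dispatch("CUSTOM_PING", "hello");
    }
}
```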






[jira] [Created] (CASSANDRA-17058) Refactor and support pluggable cluster membership

2021-10-22 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-17058:


 Summary: Refactor and support pluggable cluster membership
 Key: CASSANDRA-17058
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17058
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Fleming


Allowing users to specify a class that implements a new 
CustomTokenMetadataProvider class makes cluster membership pluggable 
(supporting custom code) and makes testing much easier.

Users could specify the cluster membership algorithm using a new config 
parameter such as org.apache.cassandra.token_metadata_provider_class.






[jira] [Created] (CASSANDRA-17057) Refactor and support pluggable failure detection

2021-10-22 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-17057:


 Summary: Refactor and support pluggable failure detection
 Key: CASSANDRA-17057
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17057
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Fleming


Making it possible to supply custom failure detectors enables supporting new 
failure detection algorithms and makes testing using mocks much easier.

The general idea is to introduce a new config parameter, such as 
org.apache.cassandra.custom_failure_detector_class, that specifies the failure 
detection class to use.
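
A minimal Java sketch of how such a parameter could be consumed is shown below. The property name comes from this ticket; everything else is an assumption for illustration: the `NodeLivenessDetector` interface and both implementations are hypothetical, and reading the parameter as a JVM system property is only one possible choice. Cassandra's real failure detector interface is different.

```java
// Hypothetical sketch of class-name based plug-in loading; not Cassandra code.
class PluggableDetectorSketch {
    /** Hypothetical minimal interface standing in for a failure detector. */
    interface NodeLivenessDetector {
        boolean isAlive(String endpoint);
    }

    /** Default used when no custom class is configured. */
    static class AlwaysAliveDetector implements NodeLivenessDetector {
        public boolean isAlive(String endpoint) { return true; }
    }

    /** A mock of the kind a test might plug in. */
    static class AlwaysDeadDetector implements NodeLivenessDetector {
        public boolean isAlive(String endpoint) { return false; }
    }

    // Parameter name suggested in the ticket; treating it as a system property is an assumption.
    static final String PROPERTY = "org.apache.cassandra.custom_failure_detector_class";

    static NodeLivenessDetector load() {
        String className = System.getProperty(PROPERTY);
        if (className == null)
            return new AlwaysAliveDetector();
        try {
            // asSubclass fails fast if the configured class does not implement the interface.
            return Class.forName(className)
                        .asSubclass(NodeLivenessDetector.class)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load failure detector " + className, e);
        }
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, AlwaysDeadDetector.class.getName());
        System.out.println("alive? " + load().isAlive("10.0.0.1"));
    }
}
```

Loading by class name keeps the default path untouched while letting tests substitute a mock without modifying the source.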






[jira] [Updated] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-09-16 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16840:
-
Test and Documentation Plan: Patch available here, 
https://github.com/mfleming/cassandra/commit/ff07793d04823d39735190d930260bfeea6df59f
 Status: Patch Available  (was: In Progress)

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Assigned] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-09-16 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming reassigned CASSANDRA-16840:


Assignee: Matt Fleming

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Updated] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-08-10 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16840:
-
Source Control Link: 
https://github.com/mfleming/cassandra/commit/ff07793d04823d39735190d930260bfeea6df59f

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Updated] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-08-10 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16840:
-
Source Control Link:   (was: 
https://github.com/mfleming/cassandra/commit/ff07793d04823d39735190d930260bfeea6df59f)

> Close native transport port before hint transfer during decommission
> 
>
> Key: CASSANDRA-16840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Hints
>Reporter: Matt Fleming
>Priority: Normal
> Fix For: 4.x
>
>
> New hints can be generated on a node when it's decommissioning which is a 
> problem if the node has already started hint transfer because any hints that 
> come in after the transfer has begun will remain on-disk and not be 
> transferred to a peer.
> You can work around this problem by manually closing the native transport 
> port before starting the decommission with {{nodetool disablebinary}} but it 
> feels like something we might want to do automatically.






[jira] [Created] (CASSANDRA-16840) Close native transport port before hint transfer during decommission

2021-08-10 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-16840:


 Summary: Close native transport port before hint transfer during 
decommission
 Key: CASSANDRA-16840
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16840
 Project: Cassandra
  Issue Type: Improvement
  Components: Consistency/Hints
Reporter: Matt Fleming


New hints can be generated on a node when it's decommissioning which is a 
problem if the node has already started hint transfer because any hints that 
come in after the transfer has begun will remain on-disk and not be transferred 
to a peer.

You can work around this problem by manually closing the native transport port 
before starting the decommission with {{nodetool disablebinary}} but it feels 
like something we might want to do automatically.






[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-21 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349349#comment-17349349
 ] 

Matt Fleming commented on CASSANDRA-16668:
--

[~adelapena] your changes look good! Thanks for doing that. +1

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was:
>  at org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe two or three times in every 100 invocations of the 
> unit test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-18 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346892#comment-17346892
 ] 

Matt Fleming commented on CASSANDRA-16668:
--

I think that failure is probably unrelated, because we saw similar failures for 
shutdownTest before this patch: 
[https://app.circleci.com/pipelines/github/adelapena/cassandra/441/workflows/bcf154ff-0b56-48ed-9f82-6b3e395f53ed/jobs/3880/tests#failed-test-0]

Btw, I've also written a new unit test to catch this bug in the future: 
[https://github.com/mfleming/cassandra/commit/b4f43608c9a8db23a622608804d95629616a66da]
 

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2 or 3 times in every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueuing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01






[jira] [Created] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-13 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-16668:


 Summary: Intermittent failure of 
SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
condition when shrinking maximum pool size to zero
 Key: CASSANDRA-16668
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
 Project: Cassandra
  Issue Type: Bug
Reporter: Matt Fleming


A difficult-to-hit race condition exists in 
changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool size 
from 0 -> 4 which results in the test failing like so:

{{junit.framework.AssertionFailedError: Test tasks did not hit max concurrency 
goal expected: but was:junit.framework.AssertionFailedError: Test 
tasks did not hit max concurrency goal expected: but was: at 
org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
 at 
org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}

I can hit this issue maybe 2 or 3 times in every 100 invocations of the unit test.

The issue that causes the failure is that if tasks are still enqueued when the 
maximum pool size is set to zero and if all of the SEPWorker threads enter the 
STOP state before the pool size is bumped to 4, then no SEPWorker threads will 
be spun up to service the task queue. This causes the above error.

Why don't we spin up SEPWorker threads when enqueuing tasks? Because of the 
guard logic in addTask: 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]

In this scenario taskPermits will not be zero (because we have tasks on the 
queue) so we never call {{maybeStartSpinningWorker()}}.

A trick to make this issue much easier to hit is to insert a 
{{Thread.sleep(500)}} immediately after setting the pool size to zero. This has 
the effect of guaranteeing that all SEPWorker threads will be STOP'd before 
enqueueing more work.

Here's a fix that attempts to spin up an SEPWorker whenever we grow the number 
of work permits: 
https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01
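
To make the sequence above concrete, here is a minimal, deterministic toy model 
in Python. It is only a sketch: ToyExecutor, kick_on_grow and run_scenario are 
hypothetical names, the workers never drain the queue, and none of this is the 
actual SEPExecutor code or the linked patch. It only mirrors the guard described 
above (wake a worker only when the queue was empty before the enqueue) and the 
idea of also trying to start a worker whenever the pool grows.

```python
class ToyExecutor:
    """Toy model of a permit-based pool. Not Cassandra's SEPExecutor."""

    def __init__(self, kick_on_grow):
        # Hypothetical switch standing in for the proposed fix.
        self.kick_on_grow = kick_on_grow
        self.max_workers = 0
        self.queued_tasks = 0    # stands in for taskPermits
        self.active_workers = 0

    def maybe_start_worker(self):
        # Start a worker only if there is work and spare capacity.
        if self.queued_tasks > 0 and self.active_workers < self.max_workers:
            self.active_workers += 1

    def add_task(self):
        had_queued = self.queued_tasks
        self.queued_tasks += 1
        # Guard: wake a worker only if the queue was empty before this task.
        if had_queued == 0:
            self.maybe_start_worker()

    def set_max_workers(self, n):
        grew = n > self.max_workers
        self.max_workers = n
        if n == 0:
            # All workers reach STOP; the queued tasks stay where they are.
            self.active_workers = 0
        elif grew and self.kick_on_grow:
            self.maybe_start_worker()


def run_scenario(kick_on_grow):
    ex = ToyExecutor(kick_on_grow)
    ex.set_max_workers(4)
    for _ in range(3):
        ex.add_task()          # first enqueue wakes one worker
    ex.set_max_workers(0)      # shrink to zero with tasks still queued
    ex.set_max_workers(4)      # grow again
    ex.add_task()              # queue is non-empty, so the guard skips the wake-up
    return ex.active_workers


print("guard only:", run_scenario(kick_on_grow=False))
print("kick on grow:", run_scenario(kick_on_grow=True))
```

With the guard alone, no worker is running after the pool grows back to 4, so 
the queued tasks are stranded. With the extra attempt on growth, one worker is 
started and can service the queue.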






[jira] [Updated] (CASSANDRA-16632) Add gossip tests from CASSANDRA-16588

2021-04-26 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16632:
-
Authors: Matt Fleming
Description: 
While working on CASSANDRA-16588 I had some tests that were very useful for 
getting the cluster gossip state into particular configurations and they even 
caught some oversights in the original suggestion for CASSANDRA-16588's fix.

Patch here: 
https://github.com/mfleming/cassandra-dtest/commit/f3eb50f33444da3ea599f2d51129b54f2024ead4

  was:While working on CASSANDRA-16588 I had some tests that were very useful 
for getting the cluster gossip state into particular configurations and they 
even caught some oversights in the original suggestion for CASSANDRA-16588's 
fix.


> Add gossip tests from CASSANDRA-16588
> -
>
> Key: CASSANDRA-16632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16632
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> While working on CASSANDRA-16588 I had some tests that were very useful for 
> getting the cluster gossip state into particular configurations and they even 
> caught some oversights in the original suggestion for CASSANDRA-16588's fix.
> Patch here: 
> https://github.com/mfleming/cassandra-dtest/commit/f3eb50f33444da3ea599f2d51129b54f2024ead4






[jira] [Updated] (CASSANDRA-16632) Add gossip tests from CASSANDRA-16588

2021-04-26 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16632:
-
Component/s: Test/dtest/python

> Add gossip tests from CASSANDRA-16588
> -
>
> Key: CASSANDRA-16632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16632
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> While working on CASSANDRA-16588 I had some tests that were very useful for 
> getting the cluster gossip state into particular configurations and they even 
> caught some oversights in the original suggestion for CASSANDRA-16588's fix.






[jira] [Created] (CASSANDRA-16632) Add gossip tests from CASSANDRA-16588

2021-04-26 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-16632:


 Summary: Add gossip tests from CASSANDRA-16588
 Key: CASSANDRA-16632
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16632
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Fleming


While working on CASSANDRA-16588 I had some tests that were very useful for 
getting the cluster gossip state into particular configurations and they even 
caught some oversights in the original suggestion for CASSANDRA-16588's fix.






[jira] [Commented] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-22 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329194#comment-17329194
 ] 

Matt Fleming commented on CASSANDRA-16623:
--

I think the big problem is that run_dtests.py doesn't actually provide any 
useful output (see the suspected issue with pipe buffering mentioned in the GH 
PR), which makes it a bad introduction for people with less experience. 
Regardless of whether the executed dtest passes or fails, nothing is displayed 
to the user after the "test session starts" line.

> Remove references to run_dtests from README
> ---
>
> Key: CASSANDRA-16623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Low
> Fix For: 4.0.x
>
>
> Newcomers to cassandra-dtest that look through README.md will see that the 
> run_dtests.py script is the quickest way to get started running tests. 
> Unfortunately, the script has a number of problems and I'm not sure it ever 
> worked properly after the move to the pytest framework.
> h2. Process stdout/stderr buffering
> Firstly, when I execute run_dtests.py I don't see any output after
> {{$ ./run_dtests.py --dtest-tests paging_test.py }}
> {{= test session starts 
> ==}}
> This looks likely to be because of the buffering that pytest does internally 
> for stdout and stderr and because of the way that it's executed by 
> run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
> line for most of the execution because there's no data available in the pipe 
> for stderr:
> {{stderr_output = sp.stderr.readline()}}
> See also [https://github.com/pytest-dev/pytest/issues/1886]
> h2. --pytest-options doesn't work
> Secondly, the options specified in --pytest-options aren't actually passed 
> through to pytest.
> h2. Most devs run pytest directly
> When I spoke to [~edimitrova] it seemed like most developers just run the 
> tests directly with pytest which would explain why run_dtests.py has 
> bitrotted.






[jira] [Updated] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16623:
-
Description: 
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering

Firstly, when I execute run_dtests.py I don't see any output after

{{$ ./run_dtests.py --dtest-tests paging_test.py }}
{{= test session starts 
==}}

This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

{{stderr_output = sp.stderr.readline()}}

See also [https://github.com/pytest-dev/pytest/issues/1886]
h2. --pytest-options doesn't work

Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly

When I spoke to [~edimitrova] it seemed like most developers just run the tests 
directly with pytest which would explain why run_dtests.py has bitrotted.

  was:
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering

Firstly, when I execute run_dtests.py I don't see any output after

{{$ ./run_dtests.py --dtest-tests paging_test.py }}
{{ = test session starts 
==}}


 This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

{{stderr_output = sp.stderr.readline()}}


 See also https://github.com/pytest-dev/pytest/issues/1886
h2. --pytest-options doesn't work

Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly

When I spoke to [~edimitrova] it seemed like most developers just run the tests 
directly with pytest which would explain why run_dtests.py has bitrotted.


> Remove references to run_dtests from README
> ---
>
> Key: CASSANDRA-16623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> Newcomers to cassandra-dtest that look through README.md will see that the 
> run_dtests.py script is the quickest way to get started running tests. 
> Unfortunately, the script has a number of problems and I'm not sure it ever 
> worked properly after the move to the pytest framework.
> h2. Process stdout/stderr buffering
> Firstly, when I execute run_dtests.py I don't see any output after
> {{$ ./run_dtests.py --dtest-tests paging_test.py }}
> {{= test session starts 
> ==}}
> This looks likely to be because of the buffering that pytest does internally 
> for stdout and stderr and because of the way that it's executed by 
> run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
> line for most of the execution because there's no data available in the pipe 
> for stderr:
> {{stderr_output = sp.stderr.readline()}}
> See also [https://github.com/pytest-dev/pytest/issues/1886]
> h2. --pytest-options doesn't work
> Secondly, the options specified in --pytest-options aren't actually passed 
> through to pytest.
> h2. Most devs run pytest directly
> When I spoke to [~edimitrova] it seemed like most developers just run the 
> tests directly with pytest which would explain why run_dtests.py has 
> bitrotted.






[jira] [Updated] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16623:
-
Description: 
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering

Firstly, when I execute run_dtests.py I don't see any output after

{{$ ./run_dtests.py --dtest-tests paging_test.py }}
{{ = test session starts 
==}}


 This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

{{stderr_output = sp.stderr.readline()}}


 See also https://github.com/pytest-dev/pytest/issues/1886
h2. --pytest-options doesn't work

Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly

When I spoke to [~edimitrova] it seemed like most developers just run the tests 
directly with pytest which would explain why run_dtests.py has bitrotted.

  was:
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering

Firstly, when I execute run_dtests.py I don't see any output after

$ ./run_dtests.py --dtest-tests paging_test.py 
 = test session starts 
==
 This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

stderr_output = sp.stderr.readline()
 See also pytest-dev/pytest#1886
h2. --pytest-options doesn't work

Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly

When I spoke to [~edimitrova] it seemed like most developers just run the tests 
directly with pytest which would explain why run_dtests.py has bitrotted.


> Remove references to run_dtests from README
> ---
>
> Key: CASSANDRA-16623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> Newcomers to cassandra-dtest that look through README.md will see that the 
> run_dtests.py script is the quickest way to get started running tests. 
> Unfortunately, the script has a number of problems and I'm not sure it ever 
> worked properly after the move to the pytest framework.
> h2. Process stdout/stderr buffering
> Firstly, when I execute run_dtests.py I don't see any output after
> {{$ ./run_dtests.py --dtest-tests paging_test.py }}
> {{ = test session starts 
> ==}}
>  This looks likely to be because of the buffering that pytest does internally 
> for stdout and stderr and because of the way that it's executed by 
> run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
> line for most of the execution because there's no data available in the pipe 
> for stderr:
> {{stderr_output = sp.stderr.readline()}}
>  See also https://github.com/pytest-dev/pytest/issues/1886
> h2. --pytest-options doesn't work
> Secondly, the options specified in --pytest-options aren't actually passed 
> through to pytest.
> h2. Most devs run pytest directly
> When I spoke to [~edimitrova] it seemed like most developers just run the 
> tests directly with pytest which would explain why run_dtests.py has 
> bitrotted.






[jira] [Updated] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Fleming updated CASSANDRA-16623:
-
Description: 
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering

Firstly, when I execute run_dtests.py I don't see any output after

$ ./run_dtests.py --dtest-tests paging_test.py 
 = test session starts 
==
 This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

stderr_output = sp.stderr.readline()
 See also pytest-dev/pytest#1886
h2. --pytest-options doesn't work

Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly

When I spoke to [~edimitrova] it seemed like most developers just run the tests 
directly with pytest which would explain why run_dtests.py has bitrotted.

  was:
Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering


Firstly, when I execute run_dtests.py I don't see any output after

$ ./run_dtests.py --dtest-tests paging_test.py 
= test session starts ==
This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

stderr_output = sp.stderr.readline()
See also pytest-dev/pytest#1886
h2. --pytest-options doesn't work


Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
h2. Most devs run pytest directly


When I spoke to @ekaterinadimitrova2 it seemed like most developers just run 
the tests directly with pytest which would explain why run_dtests.py has 
bitrotted.


> Remove references to run_dtests from README
> ---
>
> Key: CASSANDRA-16623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> Newcomers to cassandra-dtest that look through README.md will see that the 
> run_dtests.py script is the quickest way to get started running tests. 
> Unfortunately, the script has a number of problems and I'm not sure it ever 
> worked properly after the move to the pytest framework.
> h2. Process stdout/stderr buffering
> Firstly, when I execute run_dtests.py I don't see any output after
> $ ./run_dtests.py --dtest-tests paging_test.py 
>  = test session starts 
> ==
>  This looks likely to be because of the buffering that pytest does internally 
> for stdout and stderr and because of the way that it's executed by 
> run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
> line for most of the execution because there's no data available in the pipe 
> for stderr:
> stderr_output = sp.stderr.readline()
>  See also pytest-dev/pytest#1886
> h2. --pytest-options doesn't work
> Secondly, the options specified in --pytest-options aren't actually passed 
> through to pytest.
> h2. Most devs run pytest directly
> When I spoke to [~edimitrova] it seemed like most developers just run the 
> tests directly with pytest which would explain why run_dtests.py has 
> bitrotted.






[jira] [Commented] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326599#comment-17326599
 ] 

Matt Fleming commented on CASSANDRA-16623:
--

A patch to update README.md can be found here 
https://github.com/mfleming/cassandra-dtest/tree/run_dtests

> Remove references to run_dtests from README
> ---
>
> Key: CASSANDRA-16623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Matt Fleming
>Priority: Normal
>
> Newcomers to cassandra-dtest that look through README.md will see that the 
> run_dtests.py script is the quickest way to get started running tests. 
> Unfortunately, the script has a number of problems and I'm not sure it ever 
> worked properly after the move to the pytest framework.
> h2. Process stdout/stderr buffering
> Firstly, when I execute run_dtests.py I don't see any output after
> $ ./run_dtests.py --dtest-tests paging_test.py 
> = test session starts 
> ==
> This looks likely to be because of the buffering that pytest does internally 
> for stdout and stderr and because of the way that it's executed by 
> run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
> line for most of the execution because there's no data available in the pipe 
> for stderr:
> stderr_output = sp.stderr.readline()
> See also pytest-dev/pytest#1886
> h2. --pytest-options doesn't work
> Secondly, the options specified in --pytest-options aren't actually passed 
> through to pytest.
> h2. Most devs run pytest directly
> When I spoke to @ekaterinadimitrova2 it seemed like most developers just run 
> the tests directly with pytest which would explain why run_dtests.py has 
> bitrotted.






[jira] [Created] (CASSANDRA-16623) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-16623:


 Summary: Remove references to run_dtests from README
 Key: CASSANDRA-16623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16623
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest/python
Reporter: Matt Fleming


Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering


Firstly, when I execute run_dtests.py I don't see any output after

$ ./run_dtests.py --dtest-tests paging_test.py 
= test session starts ==
This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

stderr_output = sp.stderr.readline()
See also pytest-dev/pytest#1886
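
If the suspicion above is right, one way to avoid waiting on an idle stderr pipe 
is to merge stderr into stdout and read a single stream. The sketch below 
assumes that approach. stream_child and the stand-in child command are 
hypothetical, and this is not the code in run_dtests.py.

```python
import subprocess
import sys


def stream_child(cmd, on_line=None):
    """Run cmd with stderr merged into stdout and read one stream line by line.

    With a single pipe, the reader never blocks on an empty stderr while the
    child is busy writing to stdout.
    """
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
        text=True,
    ) as sp:
        for line in sp.stdout:
            line = line.rstrip("\n")
            lines.append(line)
            if on_line is not None:
                on_line(line)      # e.g. print, to show progress as it arrives
    # Leaving the with-block waits for the child, so returncode is set here.
    return sp.returncode, lines


# Stand-in child process (an assumption for the demo): writes to both streams.
# The -u flag asks the child interpreter not to buffer its output.
child = [
    sys.executable, "-u", "-c",
    "import sys; print('out'); print('err', file=sys.stderr)",
]
code, lines = stream_child(child, on_line=print)
```

Whether the child flushes its own output promptly is a separate question; the 
merge only removes the blocking read on a pipe that has no data.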
h2. --pytest-options doesn't work


Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest.
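
For illustration, forwarding such options could look roughly like the sketch 
below. build_pytest_argv is a hypothetical helper, and treating 
--pytest-options as a single shell-style string is an assumption; the real 
argument handling in run_dtests.py may differ.

```python
import shlex


def build_pytest_argv(pytest_options, tests):
    """Build a pytest command line, forwarding user-supplied options.

    pytest_options: one shell-style string, e.g. "-v -k 'paging and not slow'"
    tests: list of test files or node ids to run
    """
    argv = ["pytest"]
    # shlex.split honours quoting, so quoted expressions stay one argument.
    argv.extend(shlex.split(pytest_options or ""))
    argv.extend(tests)
    return argv


print(build_pytest_argv("-v -k 'paging and not slow'", ["paging_test.py"]))
```

Passing the result as a list to subprocess avoids a second round of shell 
parsing of the forwarded options.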
h2. Most devs run pytest directly


When I spoke to @ekaterinadimitrova2 it seemed like most developers just run 
the tests directly with pytest which would explain why run_dtests.py has 
bitrotted.






[jira] [Created] (CASSANDRA-16622) Remove references to run_dtests from README

2021-04-21 Thread Matt Fleming (Jira)
Matt Fleming created CASSANDRA-16622:


 Summary: Remove references to run_dtests from README
 Key: CASSANDRA-16622
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16622
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest/python
Reporter: Matt Fleming


Newcomers to cassandra-dtest that look through README.md will see that the 
run_dtests.py script is the quickest way to get started running tests. 
Unfortunately, the script has a number of problems and I'm not sure it ever 
worked properly after the move to the pytest framework.
h2. Process stdout/stderr buffering


Firstly, when I execute run_dtests.py I don't see any output after

$ ./run_dtests.py --dtest-tests paging_test.py 
= test session starts ==
This looks likely to be because of the buffering that pytest does internally 
for stdout and stderr and because of the way that it's executed by 
run_dtests.py, i.e. I suspect that run_dtests.py is blocked on the following 
line for most of the execution because there's no data available in the pipe 
for stderr:

stderr_output = sp.stderr.readline()
See also pytest-dev/pytest#1886
h2. --pytest-options doesn't work


Secondly, the options specified in --pytest-options aren't actually passed 
through to pytest. I ran into this when trying to get more output during the 
execution of run_dtests.py (see above).
h2. Most devs run pytest directly


When I spoke to @ekaterinadimitrova2 it seemed like most developers just run 
the tests directly with pytest which would explain why run_dtests.py has 
bitrotted.






[jira] [Commented] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-16 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323917#comment-17323917
 ] 

Matt Fleming commented on CASSANDRA-16588:
--

I think the patch from Sam is a bit too aggressive: it will incorrectly treat
gossip data for the local node that contains dead states ("left",
"removing", "hibernate", etc.) as the bad ACK that we're trying to detect to
avoid the NPE in isSafeForStartup. You should be able to trigger this by
assassinating a non-seed node in a cluster.

We should probably filter out deadStates because they won't trigger the NPE.

Something like this:
https://github.com/mfleming/cassandra/commit/e68602ae300e6a2567e1b59efa4229ff3456e521
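
As a toy model of the filtering idea (in Python rather than Java, and not the code in the commit linked above), the check could skip dead states before deciding that an ACK is bad. The dict layout, the looks_like_bad_ack name, and the "missing host_id" condition are all assumptions made for illustration; the set lists only the states named in this comment.

```python
# Illustrative subset only: this comment names "left", "removing", "hibernate".
DEAD_STATES = frozenset({"left", "removing", "hibernate"})


def looks_like_bad_ack(local_state):
    """Toy check: flag local-node gossip data that lacks a host ID.

    local_state is a hypothetical dict with optional "status" and "host_id".
    """
    status = local_state.get("status")
    if status in DEAD_STATES:
        # Dead states are legitimate and, per the comment, do not hit the NPE.
        return False
    # Assumed condition for a bad ACK: no host ID in the local node's data.
    return local_state.get("host_id") is None
```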

> NPE getting host_id in Gossiper.isSafeForStartup
> 
>
> Key: CASSANDRA-16588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>   at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is a GossipDigestAck has been queued to ack the 
> shutdown state from the node on the seed, but isn't actually sent until the 
> node has restarted and gone into shadow.  Since the ack contains the node's 
> IP, it assumes a host_id will be there but since this is not an actual shadow 
> response, it is not.


