[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326172#comment-17326172
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/21/21, 1:11 AM:
---

I am +1 on the latest version of the patch.

[~gianluca] and I were brainstorming about the usage of DEFAULT_MAX_CAPACITY. 
After looking at all the testing done around that part of the code in 4.0 
(with huge clusters and a lot of data), I think it is probably reasonable to 
keep the current limit/behavior.

[~jasonstack] already approved the latest 
[PR|https://github.com/grighetto/cassandra/pull/6/files] in GitHub. I will 
leave the patch uncommitted until tomorrow morning in case [~jasonstack], 
[~bereng] or anyone else has something to add based on CI or any of my 
statements. If nobody commits it before then and there is nothing else to add 
here, I can commit it tomorrow morning.

 


was (Author: e.dimitrova):
I am +1 on the latest version of the patch.

We were brainstorming with [~gianluca] around the usage of  
DEFAULT_MAX_CAPACITY, and after looking at the whole testing done around that 
part of the code in 4.0 (with huge clusters and a lot of data), I think it 
sounds probably reasonable to keep the current limit/behavior.

[~jasonstack] already approved the latest 
[PR|https://github.com/grighetto/cassandra/pull/6/files] in GitHub. I will 
leave the patch not committed until tomorrow morning if [~jasonstack] or 
[~bereng] has something to add, based on CI or some of my statements or 
anything and I think tomorrow morning I can commit it (if they don't do it and 
there is nothing else to be added here)

 

> Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing 
> with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException
> --
>
> Key: CASSANDRA-16524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16524
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Encryption
> Reporter: Alaykumar Barochia
> Assignee: Gianluca Righetto
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: system.log.ssl-error.txt
>
>
> Hi,
> We have an SSL-enabled cluster running on Apache Cassandra 3.11.10 and we are 
> trying to upgrade it to 4.0-beta4 as part of testing.
> The cluster size is 3x3 and it is deployed on Azure IaaS.
> {noformat}
> [cassandra@cass-521828978-1-1189299202 ~]$ nodetool status
> Datacenter: southcentral
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.74.31  85.61 KiB   16      32.2%             6db7a7ef-3490-4823-9ff3-c60a32165124  2
> UN  10.12.74.42  263.27 KiB  16      27.6%             7ad99ecf-7c7d-4780-872b-7c68b6b19849  1
> UN  10.12.74.34  85.61 KiB   16      37.8%             41ce16b7-2ab2-44ea-a810-8391f7f3caf2  0
> Datacenter: westus
> ==================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.90.11  90.63 KiB   16      38.9%             8d4cdb65-ff66-4bcd-8d4b-a4a0e893a728  2
> UN  10.12.90.6   85.61 KiB   16      34.5%             4f8007e9-fa3e-4e99-a9f9-f7bf9625      1
> UN  10.12.89.80  94.1 KiB    16      28.9%             11f86cb0-c86b-440e-848f-b160118f43d5  0
> {noformat}
> We placed the new 4.0-beta4 binary on the first seed node (10.12.74.31) and 
> started Cassandra.
> It started throwing the error below:
> {noformat}
> ERROR [Messaging-EventLoop-3-11] 2021-03-15 22:10:05,188 InboundConnectionInitiator.java:342 - Failed to properly handshake with peer /10.12.74.42:52356. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException: writerIndex(8560) + minWritableBytes(1977) exceeds maxCapacity(10240): BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at 
> 
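
The numbers in the exception message quoted above can be checked with a small standalone sketch. It does not use Netty or Cassandra classes; the class and method names (CapacityCheckSketch, fits, shortfall) are hypothetical and only mirror the comparison the message describes, so treat it as an illustration of the arithmetic rather than the actual buffer code.

```java
// Standalone sketch: no Netty or Cassandra classes are used here.
// The class and method names are hypothetical and only mirror the
// comparison described by the IndexOutOfBoundsException message.
final class CapacityCheckSketch {

    // True when writing minWritableBytes more bytes at writerIndex stays
    // within maxCapacity. Long arithmetic avoids int overflow.
    static boolean fits(int writerIndex, int minWritableBytes, int maxCapacity) {
        return (long) writerIndex + minWritableBytes <= maxCapacity;
    }

    // Number of bytes by which the requested write exceeds maxCapacity
    // (0 when the write fits).
    static long shortfall(int writerIndex, int minWritableBytes, int maxCapacity) {
        long required = (long) writerIndex + minWritableBytes;
        return Math.max(0L, required - maxCapacity);
    }

    public static void main(String[] args) {
        // Values copied from the log line in the report.
        int writerIndex = 8560;
        int minWritableBytes = 1977;
        int maxCapacity = 10240;

        System.out.println("fits: " + fits(writerIndex, minWritableBytes, maxCapacity));
        System.out.println("shortfall: " + shortfall(writerIndex, minWritableBytes, maxCapacity));
    }
}
```

With the values from the log, 8560 + 1977 = 10537, which is 297 bytes over the 10240-byte cap, so the write cannot fit without raising the limit or using a larger buffer.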

[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326172#comment-17326172
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/21/21, 12:23 AM:


I am +1 on the latest version of the patch.

[~gianluca] and I were brainstorming about the usage of DEFAULT_MAX_CAPACITY. 
After looking at all the testing done around that part of the code in 4.0 
(with huge clusters and a lot of data), I think it is probably reasonable to 
keep the current limit/behavior.

[~jasonstack] already approved the latest 
[PR|https://github.com/grighetto/cassandra/pull/6/files] in GitHub. I will 
leave the patch uncommitted until tomorrow morning in case [~jasonstack] or 
[~bereng] has something to add based on CI, any of my statements, or anything 
else. If nobody commits it before then and there is nothing else to add here, 
I can commit it tomorrow morning.

 


was (Author: e.dimitrova):
I am +1 on the latest version of the patch.

We were brainstorming with [~gianluca] around the usage of  
DEFAULT_MAX_CAPACITY, and after looking at the whole testing done around that 
part of the code in 4.0 (with huge clusters and a lot of data), I think it 
sounds probably reasonable to keep the current limit/behavior.

[~jasonstack] already approved the latest 
[PR|https://github.com/grighetto/cassandra/pull/6/files] in GitHub. I will 
leave the patch not committed until tomorrow morning if [~jasonstack] or 
[~bereng] has something to add, based on CI or some of my statements and I 
think tomorrow morning I can commit it (if they don't do it and there is 
nothing else to be added here)

 


[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326172#comment-17326172
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/21/21, 12:22 AM:


I am +1 on the latest version of the patch.

[~gianluca] and I were brainstorming about the usage of DEFAULT_MAX_CAPACITY. 
After looking at all the testing done around that part of the code in 4.0 
(with huge clusters and a lot of data), I think it is probably reasonable to 
keep the current limit/behavior.

[~jasonstack] already approved the latest 
[PR|https://github.com/grighetto/cassandra/pull/6/files] in GitHub. I will 
leave the patch uncommitted until tomorrow morning in case [~jasonstack] or 
[~bereng] has something to add based on CI or any of my statements. If nobody 
commits it before then and there is nothing else to add here, I can commit it 
tomorrow morning.

 


was (Author: e.dimitrova):
I am +1 on the latest version of the patch.

We were brainstorming with [~gianluca] around the usage of  
DEFAULT_MAX_CAPACITY, and after looking at the whole testing done around that 
part of the code (with huge clusters and a lot of data), I think it sounds 
probably reasonable to keep the current limit/behavior.

[~jasonstack] already approved the PR. I will leave this until tomorrow morning 
if he or [~bereng] has something to add, based on CI or some of my statements 
and I think tomorrow morning I can commit it (if they don't do it and there is 
nothing else to be added here)

 


[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326172#comment-17326172
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/21/21, 12:17 AM:


I am +1 on the latest version of the patch.

[~gianluca] and I were brainstorming about the usage of DEFAULT_MAX_CAPACITY. 
After looking at all the testing done around that part of the code (with huge 
clusters and a lot of data), I think it is probably reasonable to keep the 
current limit/behavior.

[~jasonstack] already approved the PR. I will leave this until tomorrow morning 
in case he or [~bereng] has something to add based on CI or any of my 
statements. If nobody commits it before then and there is nothing else to add 
here, I can commit it tomorrow morning.

 


was (Author: e.dimitrova):
I am +1 on the latest version of the patch.

We were brainstorming with [~gianluca] around the usage of  
DEFAULT_MAX_CAPACITY, and after looking at the whole testing done around that 
part of the code (with huge clusters and a lot of data), I think it sounds 
probably reasonable to keep the current limit/behavior.

[~jasonstack] already approved the PR. I will leave this until tomorrow morning 
if he or [~bereng] have something based on CI or some of my statements and I 
think tomorrow morning I can commit it (if they don't do it and there is 
nothing else to be added here)

 


[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326172#comment-17326172
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16524:
-

I am +1 on the latest version of the patch.

[~gianluca] and I were brainstorming about the usage of DEFAULT_MAX_CAPACITY. 
After looking at all the testing done around that part of the code (with huge 
clusters and a lot of data), I think it is probably reasonable to keep the 
current limit/behavior.

[~jasonstack] already approved the PR. I will leave this until tomorrow morning 
in case he or [~bereng] has something to add based on CI or any of my 
statements. If nobody commits it before then and there is nothing else to add 
here, I can commit it tomorrow morning.

 


[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326159#comment-17326159
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16524:
-

I think our messages crashed :D 

> Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing 
> with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException
> --
>
> Key: CASSANDRA-16524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16524
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Encryption
> Reporter: Alaykumar Barochia
> Assignee: Gianluca Righetto
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: system.log.ssl-error.txt
>
>
> Hi,
> We have an SSL-enabled cluster running on Apache Cassandra 3.11.10 and we are 
> trying to upgrade it to 4.0-beta4 as part of testing.
> The cluster size is 3x3 and it is deployed on Azure IaaS.
> {noformat}
> [cassandra@cass-521828978-1-1189299202 ~]$ nodetool status
> Datacenter: southcentral
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.74.31  85.61 KiB   16      32.2%             6db7a7ef-3490-4823-9ff3-c60a32165124  2
> UN  10.12.74.42  263.27 KiB  16      27.6%             7ad99ecf-7c7d-4780-872b-7c68b6b19849  1
> UN  10.12.74.34  85.61 KiB   16      37.8%             41ce16b7-2ab2-44ea-a810-8391f7f3caf2  0
> Datacenter: westus
> ==================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.90.11  90.63 KiB   16      38.9%             8d4cdb65-ff66-4bcd-8d4b-a4a0e893a728  2
> UN  10.12.90.6   85.61 KiB   16      34.5%             4f8007e9-fa3e-4e99-a9f9-f7bf9625      1
> UN  10.12.89.80  94.1 KiB    16      28.9%             11f86cb0-c86b-440e-848f-b160118f43d5  0
> {noformat}
> We placed the new 4.0-beta4 binary on the first seed node (10.12.74.31) and 
> started Cassandra.
> It started throwing the error below:
> {noformat}
> ERROR [Messaging-EventLoop-3-11] 2021-03-15 22:10:05,188 InboundConnectionInitiator.java:342 - Failed to properly handshake with peer /10.12.74.42:52356. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException: writerIndex(8560) + minWritableBytes(1977) exceeds maxCapacity(10240): BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException: writerIndex(8560) + minWritableBytes(1977) exceeds maxCapacity(10240): BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at io.netty.handler.ssl.OpenSslKeyMaterialManager.setKeyMaterial(OpenSslKeyMaterialManager.java:115)
>   at 
> 

[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326158#comment-17326158
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16524:
-

[~gianluca] just told me that the same test has failed again in the latest 
trunk build:

https://jenkins-cm4.apache.org/job/Cassandra-trunk/451/testReport/junit/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/


[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Gianluca Righetto (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326157#comment-17326157
 ] 

Gianluca Righetto commented on CASSANDRA-16524:
---

[~e.dimitrova] The third test recently failed on trunk too: 
[https://jenkins-cm4.apache.org/job/Cassandra-trunk/451/]

I agree they look unrelated to the patch.

> Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing 
> with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException
> --
>
> Key: CASSANDRA-16524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16524
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Alaykumar Barochia
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: system.log.ssl-error.txt
>
>
> Hi,
> We have an SSL-enabled cluster running on Apache Cassandra 3.11.10 and we are 
> trying to upgrade it to 4.0-beta4 as a part of testing.
> The cluster size is 3x3 and it is deployed on Azure IaaS.
> {noformat}
> [cassandra@cass-521828978-1-1189299202 ~]$ nodetool status
> Datacenter: southcentral
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>Rack
> UN  10.12.74.31  85.61 KiB  16   32.2% 
> 6db7a7ef-3490-4823-9ff3-c60a32165124  2
> UN  10.12.74.42  263.27 KiB  16   27.6% 
> 7ad99ecf-7c7d-4780-872b-7c68b6b19849  1
> UN  10.12.74.34  85.61 KiB  16   37.8% 
> 41ce16b7-2ab2-44ea-a810-8391f7f3caf2  0
> Datacenter: westus
> ==
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>Rack
> UN  10.12.90.11  90.63 KiB  16   38.9% 
> 8d4cdb65-ff66-4bcd-8d4b-a4a0e893a728  2
> UN  10.12.90.6   85.61 KiB  16   34.5% 
> 4f8007e9-fa3e-4e99-a9f9-f7bf9625  1
> UN  10.12.89.80  94.1 KiB   16   28.9% 
> 11f86cb0-c86b-440e-848f-b160118f43d5  0
> {noformat}
> We placed a new 4.0-beta4 binary on the first seed node (10.12.74.31) and 
> started Cassandra.
> It started throwing the below error:
> {noformat}
> ERROR [Messaging-EventLoop-3-11] 2021-03-15 22:10:05,188 
> InboundConnectionInitiator.java:342 - Failed to properly handshake with peer 
> /10.12.74.42:52356. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: 
> java.lang.IndexOutOfBoundsException: writerIndex(8560) + 
> minWritableBytes(1977) exceeds maxCapacity(10240): 
> BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException: 
> writerIndex(8560) + minWritableBytes(1977) exceeds maxCapacity(10240): 
> BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at 
> 
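
For readers following the trace above, the failure comes down to simple arithmetic: the bytes already written plus the bytes still needed exceed the buffer's hard cap. A minimal sketch of that bounds check follows. The class and method names (CapacityCheck, exceedsMaxCapacity) are hypothetical and only mirror the numbers in the log message; this is not Netty's or Cassandra's actual code.

```java
// Hypothetical helper that mirrors the bounds check reported in the log.
// It is an illustration only, not the Netty or Cassandra implementation.
final class CapacityCheck {
    /** Returns true when writing minWritableBytes at writerIndex would pass maxCapacity. */
    static boolean exceedsMaxCapacity(int writerIndex, int minWritableBytes, int maxCapacity) {
        // Widen to long so the sum cannot overflow an int.
        return (long) writerIndex + minWritableBytes > maxCapacity;
    }

    public static void main(String[] args) {
        int writerIndex = 8560;      // bytes already in the buffer (widx in the log)
        int minWritableBytes = 1977; // bytes the handler still wants to write
        int maxCapacity = 10240;     // hard cap of the wrapped buffer

        // 8560 + 1977 = 10537, which is 297 bytes over the 10240 cap.
        System.out.println("required = " + (writerIndex + minWritableBytes));
        System.out.println("overflow = "
                + exceedsMaxCapacity(writerIndex, minWritableBytes, maxCapacity));
    }
}
```

With the values from the log, the write needs 10537 bytes against a cap of 10240, so the check reports an overflow.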

[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326149#comment-17326149
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/20/21, 11:09 PM:


Jenkins finished with three failures which seem to me unrelated to the patch:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/]

Two of the tests seem flaky and have failed with the same issue before:

[https://ci-cassandra.apache.org/job/Cassandra-trunk/446/testReport/junit/org.apache.cassandra.net/AsyncPromiseTest/testFailure_cdc/]

[https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cass[…]ctionStrategyCQLTest/stressTestCompactionStrategyManager/|https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cassandra.db.compaction/LongLeveledCompactionStrategyCQLTest/stressTestCompactionStrategyManager/]

 

The third one looks to me like a CI env issue, WDYT?

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/Te[…]sandra_distributed_test_PreviewRepairCoordinatorFastTest/|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/]

 
{code:java}
ERROR 20:56:36 Repair e500b3d0-a21a-11eb-882f-cd9857e77aff failed:
java.lang.IllegalArgumentException: Unknown host specified 
thisreally.should.not.exist.apache.org
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:478)
 at 
org.apache.cassandra.repair.RepairRunnable.getNeighborsAndRanges(RepairRunnable.java:326)
 at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:265)
 at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:241)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 
thisreally.should.not.exist.apache.org: Name or service not known
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
 at java.net.InetAddress.getAllByName(InetAddress.java:1193)
 at java.net.InetAddress.getAllByName(InetAddress.java:1127)
 at java.net.InetAddress.getByName(InetAddress.java:1077)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByNameOverrideDefaults(InetAddressAndPort.java:228)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByName(InetAddressAndPort.java:213)
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:472)
 ... 11 common frames omitted{code}
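
The trace above shows a name lookup failing with UnknownHostException and surfacing to the caller as IllegalArgumentException. A minimal sketch of that wrapping pattern is below. HostLookup and resolve are hypothetical names, and the reserved .invalid TLD is swapped in for the host name from the test so that the lookup should fail regardless of how the environment's DNS behaves.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper; it only illustrates the exception wrapping seen in the trace.
final class HostLookup {
    /** Resolves a host name, rethrowing lookup failures as IllegalArgumentException. */
    static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            // Keep the original exception as the cause, as in the "Caused by" section.
            throw new IllegalArgumentException("Unknown host specified " + host, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Assumption: names under .invalid never resolve on a well-behaved resolver.
            resolve("thisreally.should.not.exist.invalid");
            System.out.println("Resolved: this environment answers for names that should not exist");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If a resolver in the environment answers for names that should not exist, the first branch is taken instead, which is one way a host-dependent test can behave differently on different CI agents.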
 


was (Author: e.dimitrova):
Jenkins finished with three failures which seem to me unrelated to the patch:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/]

Two of the tests seems flaky and have failed with the same issue before:

[https://ci-cassandra.apache.org/job/Cassandra-trunk/446/testReport/junit/org.apache.cassandra.net/AsyncPromiseTest/testFailure_cdc/]

[https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cass[…]ctionStrategyCQLTest/stressTestCompactionStrategyManager/|https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cassandra.db.compaction/LongLeveledCompactionStrategyCQLTest/stressTestCompactionStrategyManager/]

 

The third one looks to me as a CI env issue, WDYT?:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/Te[…]sandra_distributed_test_PreviewRepairCoordinatorFastTest/|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/]

 
{code:java}
ERROR 20:56:36 Repair e500b3d0-a21a-11eb-882f-cd9857e77aff failed:
java.lang.IllegalArgumentException: Unknown host specified 
thisreally.should.not.exist.apache.org
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:478)
 at 
org.apache.cassandra.repair.RepairRunnable.getNeighborsAndRanges(RepairRunnable.java:326)
 at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:265)
 at 

[jira] [Comment Edited] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326149#comment-17326149
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16524 at 4/20/21, 11:08 PM:


Jenkins finished with three failures which seem to me unrelated to the patch:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/]

Two of the tests seem flaky and have failed with the same issue before:

[https://ci-cassandra.apache.org/job/Cassandra-trunk/446/testReport/junit/org.apache.cassandra.net/AsyncPromiseTest/testFailure_cdc/]

[https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cass[…]ctionStrategyCQLTest/stressTestCompactionStrategyManager/|https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cassandra.db.compaction/LongLeveledCompactionStrategyCQLTest/stressTestCompactionStrategyManager/]

 

The third one looks to me like a CI env issue, WDYT?

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/Te[…]sandra_distributed_test_PreviewRepairCoordinatorFastTest/|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/]

 
{code:java}
ERROR 20:56:36 Repair e500b3d0-a21a-11eb-882f-cd9857e77aff failed:
java.lang.IllegalArgumentException: Unknown host specified 
thisreally.should.not.exist.apache.org
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:478)
 at 
org.apache.cassandra.repair.RepairRunnable.getNeighborsAndRanges(RepairRunnable.java:326)
 at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:265)
 at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:241)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 
thisreally.should.not.exist.apache.org: Name or service not known
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
 at java.net.InetAddress.getAllByName(InetAddress.java:1193)
 at java.net.InetAddress.getAllByName(InetAddress.java:1127)
 at java.net.InetAddress.getByName(InetAddress.java:1077)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByNameOverrideDefaults(InetAddressAndPort.java:228)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByName(InetAddressAndPort.java:213)
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:472)
 ... 11 common frames omitted{code}
 


was (Author: e.dimitrova):
Jenkins finished with three failures which seem unrelated to me:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/]

Two of the tests seems flaky and have failed with the same issue before:

[https://ci-cassandra.apache.org/job/Cassandra-trunk/446/testReport/junit/org.apache.cassandra.net/AsyncPromiseTest/testFailure_cdc/]

[https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cass[…]ctionStrategyCQLTest/stressTestCompactionStrategyManager/|https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cassandra.db.compaction/LongLeveledCompactionStrategyCQLTest/stressTestCompactionStrategyManager/]

 

The third one looks to me as a CI env issue, WDYT?:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/Te[…]sandra_distributed_test_PreviewRepairCoordinatorFastTest/|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/]

 
{code:java}
ERROR 20:56:36 Repair e500b3d0-a21a-11eb-882f-cd9857e77aff failed:
java.lang.IllegalArgumentException: Unknown host specified 
thisreally.should.not.exist.apache.org
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:478)
 at 
org.apache.cassandra.repair.RepairRunnable.getNeighborsAndRanges(RepairRunnable.java:326)
 at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:265)
 at 

[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326149#comment-17326149
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16524:
-

Jenkins finished with three failures which seem unrelated to me:

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/]

Two of the tests seem flaky and have failed with the same issue before:

[https://ci-cassandra.apache.org/job/Cassandra-trunk/446/testReport/junit/org.apache.cassandra.net/AsyncPromiseTest/testFailure_cdc/]

[https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cass[…]ctionStrategyCQLTest/stressTestCompactionStrategyManager/|https://ci-cassandra.apache.org/job/Cassandra-trunk/447/testReport/junit/org.apache.cassandra.db.compaction/LongLeveledCompactionStrategyCQLTest/stressTestCompactionStrategyManager/]

 

The third one looks to me like a CI env issue, WDYT?

[https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/Te[…]sandra_distributed_test_PreviewRepairCoordinatorFastTest/|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/testReport/junit.framework/TestSuite/org_apache_cassandra_distributed_test_PreviewRepairCoordinatorFastTest/]

 
{code:java}
ERROR 20:56:36 Repair e500b3d0-a21a-11eb-882f-cd9857e77aff failed:
java.lang.IllegalArgumentException: Unknown host specified 
thisreally.should.not.exist.apache.org
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:478)
 at 
org.apache.cassandra.repair.RepairRunnable.getNeighborsAndRanges(RepairRunnable.java:326)
 at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:265)
 at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:241)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 
thisreally.should.not.exist.apache.org: Name or service not known
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
 at java.net.InetAddress.getAllByName(InetAddress.java:1193)
 at java.net.InetAddress.getAllByName(InetAddress.java:1127)
 at java.net.InetAddress.getByName(InetAddress.java:1077)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByNameOverrideDefaults(InetAddressAndPort.java:228)
 at 
org.apache.cassandra.locator.InetAddressAndPort.getByName(InetAddressAndPort.java:213)
 at 
org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:472)
 ... 11 common frames omitted{code}
 

> Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing 
> with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException
> --
>
> Key: CASSANDRA-16524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16524
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Alaykumar Barochia
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: system.log.ssl-error.txt
>
>
> Hi,
> We have an SSL-enabled cluster running on Apache Cassandra 3.11.10 and we are 
> trying to upgrade it to 4.0-beta4 as a part of testing.
> The cluster size is 3x3 and it is deployed on Azure IaaS.
> {noformat}
> [cassandra@cass-521828978-1-1189299202 ~]$ nodetool status
> Datacenter: southcentral
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>Rack
> UN  10.12.74.31  85.61 KiB  16   32.2% 
> 6db7a7ef-3490-4823-9ff3-c60a32165124  2
> UN  10.12.74.42  263.27 KiB  16   27.6% 
> 7ad99ecf-7c7d-4780-872b-7c68b6b19849  1
> UN  10.12.74.34  85.61 KiB  16   37.8% 
> 41ce16b7-2ab2-44ea-a810-8391f7f3caf2  0
> Datacenter: westus
> ==
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns 

[jira] [Comment Edited] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326124#comment-17326124
 ] 

David Capwell edited comment on CASSANDRA-16620 at 4/20/21, 10:06 PM:
--

+1

tested trunk and saw the following

{code}
.build/build-rat.xml:65: Some files have missing or incorrect license 
information. Check RAT report in build/rat.txt for more details!
Unapproved licenses:

  src/java/org/apache/cassandra/utils/Mx4jTool.java

***
{code}


The patch only takes the top 5 lines, which is 3 files at most, but that 
should be a lot more helpful than the 0 we show now.


was (Author: dcapwell):
tested trunk and saw the following

{code}
.build/build-rat.xml:65: Some files have missing or incorrect license 
information. Check RAT report in build/rat.txt for more details!
Unapproved licenses:

  src/java/org/apache/cassandra/utils/Mx4jTool.java

***
{code}


The patch only takes the top 5 lines, which is 3 files at the most; but that 
should be a lot more helpful than 0 we do now.

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative, only referring to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.
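
The idea in the description can be sketched as follows: find the "Unapproved licenses:" header in the RAT report and append the first few entries after it to the failure message. This is only an illustration in Java under assumed names (RatSummary, firstOffenders); it is not how the patch itself does it in the build file, and the report layout is taken from the output quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real change lives in the build scripts, not in Java.
final class RatSummary {
    /** Returns the file entries found in the first maxLines lines after the header. */
    static List<String> firstOffenders(List<String> reportLines, int maxLines) {
        List<String> offenders = new ArrayList<>();
        int header = -1;
        for (int i = 0; i < reportLines.size(); i++) {
            if (reportLines.get(i).trim().equals("Unapproved licenses:")) {
                header = i;
                break;
            }
        }
        if (header < 0) return offenders; // no header, nothing to report

        int end = Math.min(reportLines.size(), header + 1 + maxLines);
        for (String line : reportLines.subList(header + 1, end)) {
            String trimmed = line.trim();
            // Skip blank lines and the asterisk separator that closes the section.
            if (!trimmed.isEmpty() && !trimmed.startsWith("*")) offenders.add(trimmed);
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> report = Arrays.asList(
                "Unapproved licenses:",
                "",
                "  src/java/org/apache/cassandra/utils/Mx4jTool.java",
                "",
                "***");
        System.out.println("Some files have missing or incorrect license information: "
                + firstOffenders(report, 5));
    }
}
```

Capping the window at a handful of lines keeps the failure message short in CI logs while still naming at least one offending file.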



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16620:
--
Reviewers: David Capwell
   Status: Review In Progress  (was: Patch Available)

tested trunk and saw the following

{code}
.build/build-rat.xml:65: Some files have missing or incorrect license 
information. Check RAT report in build/rat.txt for more details!
Unapproved licenses:

  src/java/org/apache/cassandra/utils/Mx4jTool.java

***
{code}


The patch only takes the top 5 lines, which is 3 files at most, but that 
should be a lot more helpful than the 0 we show now.

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative, only referring to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.






[jira] [Comment Edited] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326000#comment-17326000
 ] 

Brandon Williams edited comment on CASSANDRA-16588 at 4/20/21, 9:37 PM:


||Patch|CI||
|[3.11|https://github.com/driftx/cassandra/tree/CASSANDRA-16588]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/691/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/691/pipeline]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-16588-trunk]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/692/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/692/pipeline]|

We have to be careful in the test not to let StorageService get too far in 
joining the ring, otherwise all kinds of things start up that are not easily 
restartable, so instead we can test checkForEndpointCollision directly.



was (Author: brandon.williams):

||Patch|CI||
|[3.11|https://github.com/driftx/cassandra/tree/CASSANDRA-16588]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/691/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/691/pipeline]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-16588]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/692/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/692/pipeline]|

We have to be careful in the test not to let StorageService get too far in 
joining the ring otherwise all kinds of things start up that are not easily 
restartable, so instead we can test checkForEndpointCollision directly.


> NPE getting host_id in Gossiper.isSafeForStartup
> 
>
> Key: CASSANDRA-16588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>   at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is a GossipDigestAck has been queued to ack the 
> shutdown state from the node on the seed, but isn't actually sent until the 
> node has restarted and gone into shadow.  Since the ack contains the node's 
> IP, it assumes a host_id will be there but since this is not an actual shadow 
> response, it is not.






[jira] [Comment Edited] (CASSANDRA-16404) Provide a nodetool way of invalidating auth caches

2021-04-20 Thread Alexey Zotov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326113#comment-17326113
 ] 

Alexey Zotov edited comment on CASSANDRA-16404 at 4/20/21, 9:37 PM:


[~blerer] [~sumanth.pasupuleti]

Thanks for the feedback! I've addressed your comments. I'll push refactoring 
for {{NodeTool}} tests that are not related to the invalidation into a separate 
PR.

Basically I'm done with the changes I wanted to make. Could you please review 
them?

I read about [roles 
hierarchy|https://www.datastax.com/blog/role-based-access-control-cassandra] 
and I feel it would be rather complicated to support it. The problem is that 
we cache permissions (permissions, network permissions, JMX permissions) after 
we calculate them using the roles hierarchy. Permissions invalidation for a 
user (a role with "login" attribute = true) works fine with the current 
changes. However, permissions invalidation for a group role (a role that is 
granted to other roles) would require traversing the hierarchy in the reverse 
direction and invalidating the whole affected hierarchy in the cache. It is 
theoretically possible, but in practice the relation between roles is 
one-directional ({{RolesCache extends AuthCache>}}). I can look into this part 
further if you believe it needs to be addressed at this point. I'd also be 
glad to hear advice on the best way to tackle the hierarchical invalidation.
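
To make the reverse traversal concrete, here is a minimal sketch under stated assumptions. Roles are plain strings, the "grants" map stands in for the one-directional relation (role to the roles granted to it), and the cache is an ordinary map keyed by role name. RoleInvalidation, affectedRoles and invalidate are hypothetical names, not Cassandra APIs.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; it does not use or mirror the real AuthCache classes.
final class RoleInvalidation {
    /** Returns changedRole plus every role that holds it directly or transitively. */
    static Set<String> affectedRoles(Map<String, Set<String>> grants, String changedRole) {
        // Invert the relation: for each granted role, record which roles hold it.
        Map<String, Set<String>> holders = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : grants.entrySet())
            for (String granted : e.getValue())
                holders.computeIfAbsent(granted, k -> new HashSet<>()).add(e.getKey());

        // Breadth-first walk up the inverted relation; the visited set guards against cycles.
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changedRole);
        while (!queue.isEmpty()) {
            String role = queue.poll();
            if (affected.add(role))
                queue.addAll(holders.getOrDefault(role, Collections.<String>emptySet()));
        }
        return affected;
    }

    /** Drops the cached entries of all affected roles. */
    static void invalidate(Map<String, ?> cache, Set<String> roles) {
        cache.keySet().removeAll(roles);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> grants = new HashMap<>();
        grants.put("alice", Collections.singleton("analysts"));
        grants.put("analysts", Collections.singleton("readers"));
        grants.put("bob", Collections.singleton("admins"));

        Map<String, String> cache = new HashMap<>();
        for (String role : new String[] {"alice", "bob", "analysts", "readers"})
            cache.put(role, "cached permissions of " + role);

        invalidate(cache, affectedRoles(grants, "readers"));
        System.out.println(cache.keySet()); // only bob's entry is left
    }
}
```

The cost of this approach is the inverted map: it either has to be rebuilt on each invalidation, as here, or maintained alongside the cache.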


was (Author: azotcsit):
[~blerer] [~sumanth.pasupuleti]

Thanks for the feedback! I've addressed your comments. I'll push tests 
refactoring into a separate PR.

Basically I'm done with the changes I wanted to make. Could you, please, review 
the changes.

I read about [roles 
hierarchy|https://www.datastax.com/blog/role-based-access-control-cassandra] 
and I feel it would be complicated enough to support them. Basically the 
problem is related to the fact that we cache permissions (permissions, network 
permissions, jmx permissions) after we calculate them using roles hierarchy. 
Permissions invalidation for a user (a role with "login" attribute = true) 
works fine with the current changes. However, permissions invalidation for a 
group role (a role that is granted to other roles) would require tracing the 
hierarchy in a reverse direction and invalidating the whole affected hierarchy 
from cache. It is theoretically possible, but in practice there is one 
directional relation between roles ({{RolesCache extends 
AuthCache>}}). I can take a look to this part further 
if you believe that it needs to be address at this point, also I'd be glad to 
hear a piece of advice on the best way to tackle the hierarchical invalidation.

> Provide a nodetool way of invalidating auth caches
> --
>
> Key: CASSANDRA-16404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16404
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Authorization
>Reporter: Sumanth Pasupuleti
>Assignee: Alexey Zotov
>Priority: Normal
> Fix For: 4.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We currently have nodetool commands to invalidate certain caches like 
> KeyCache, RowCache and CounterCache. 
> Being able to invalidate auth caches as well can come in handy in situations 
> where critical backend auth changes may need to be in effect right away for 
> all the connections, especially in configurations where cache validity is 
> chosen to be for a longer duration. An example can be that an authenticated 
> user "User1" is no longer authorized to access a table resource "table1" and 
> it is vital that this change is reflected right away, without having to wait 
> for cache expiry/refresh to trigger.






[jira] [Commented] (CASSANDRA-16404) Provide a nodetool way of invalidating auth caches

2021-04-20 Thread Alexey Zotov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326113#comment-17326113
 ] 

Alexey Zotov commented on CASSANDRA-16404:
--

[~blerer] [~sumanth.pasupuleti]

Thanks for the feedback! I've addressed your comments. I'll push the test 
refactoring in a separate PR.

I'm essentially done with the changes I wanted to make. Could you please review 
them?

I read about the [role 
hierarchy|https://www.datastax.com/blog/role-based-access-control-cassandra] 
and I think supporting it would be fairly complicated. The problem is that we 
cache permissions (permissions, network permissions, JMX permissions) after 
calculating them from the role hierarchy. 
Permissions invalidation for a user (a role with the "login" attribute = true) 
works fine with the current changes. However, permissions invalidation for a 
group role (a role that is granted to other roles) would require traversing the 
hierarchy in the reverse direction and invalidating every affected entry 
in the cache. It is theoretically possible, but in practice the relation 
between roles is one-directional ({{RolesCache extends 
AuthCache>}}). I can look into this part further 
if you believe that it needs to be addressed at this point. I'd also be glad to 
hear advice on the best way to tackle the hierarchical invalidation.
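
To make the reverse traversal concrete, here is a minimal sketch; it is not the code in the patch. It assumes a hypothetical reverse index ({{membersOf}}, mapping a role to the roles it has been granted to), which the current caches do not maintain. The names {{HierarchicalInvalidation}}, {{grant}} and {{affectedBy}} are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Cassandra.
final class HierarchicalInvalidation
{
    // Assumed reverse index: role -> roles that were directly granted that role.
    private final Map<String, Set<String>> membersOf = new HashMap<>();

    // Records that `role` has been granted to `grantee`.
    void grant(String role, String grantee)
    {
        membersOf.computeIfAbsent(role, r -> new LinkedHashSet<>()).add(grantee);
    }

    // Returns the role itself plus every role that inherits from it,
    // directly or transitively. These are the cache keys to invalidate.
    Set<String> affectedBy(String role)
    {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(role);
        while (!pending.isEmpty())
        {
            String current = pending.poll();
            if (!affected.add(current))
                continue; // already visited, also guards against cycles
            pending.addAll(membersOf.getOrDefault(current, Collections.emptySet()));
        }
        return affected;
    }
}
```

With such an index, invalidating a group role would mean invalidating the cache entry of every role returned by {{affectedBy}}. The cost is keeping the index in sync with grants and revokes.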

> Provide a nodetool way of invalidating auth caches
> --
>
> Key: CASSANDRA-16404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16404
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Authorization
>Reporter: Sumanth Pasupuleti
>Assignee: Alexey Zotov
>Priority: Normal
> Fix For: 4.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We currently have nodetool commands to invalidate certain caches like 
> KeyCache, RowCache and CounterCache. 
> Being able to invalidate auth caches as well can come in handy in situations 
> where critical backend auth changes may need to be in effect right away for 
> all the connections, especially in configurations where cache validity is 
> chosen to be for a longer duration. An example can be that an authenticated 
> user "User1" is no longer authorized to access a table resource "table1" and 
> it is vital that this change is reflected right away, without having to wait 
> for cache expiry/refresh to trigger.






[jira] [Updated] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16620:
---
Change Category: Quality Assurance
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative; it only refers to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.






[jira] [Updated] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16620:
---
Test and Documentation Plan: manual
 Status: Patch Available  (was: Open)

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative; it only refers to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.






[jira] [Commented] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326077#comment-17326077
 ] 

Michael Semb Wever commented on CASSANDRA-16620:


Patch at 
https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/rat-better-fail-msg

Also adds some needed exclusions to .gitignore
(rat parses .gitignore with ant-style globs)

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative; it only refers to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.






[jira] [Updated] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16620:
---
Fix Version/s: 4.0.x
   3.11.x
   3.0.x
   2.2.x

> Improve failure message for rat plugin
> --
>
> Key: CASSANDRA-16620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x
>
>
> The RAT plugin's failure message isn't informative; it only refers to the file 
> (build/rat.txt) where the offending files can be found.
> In CI systems, without access to that file, it is cumbersome to figure out 
> what those offending files are.
> Add to the failure message a grep of the first few offending files.






[jira] [Created] (CASSANDRA-16620) Improve failure message for rat plugin

2021-04-20 Thread Michael Semb Wever (Jira)
Michael Semb Wever created CASSANDRA-16620:
--

 Summary: Improve failure message for rat plugin
 Key: CASSANDRA-16620
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16620
 Project: Cassandra
  Issue Type: Task
  Components: Build
Reporter: Michael Semb Wever
Assignee: Michael Semb Wever


The RAT plugin's failure message isn't informative; it only refers to the file 
(build/rat.txt) where the offending files can be found.

In CI systems, without access to that file, it is cumbersome to figure out what 
those offending files are.

Add to the failure message a grep of the first few offending files.
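
As a rough illustration of the idea (not the actual build.xml change), the sketch below picks the first few offending entries out of the report lines and folds them into the failure message. It assumes that offending entries are the report lines whose first non-blank character is `!`. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not the actual build change.
final class RatFailureMessage
{
    // Assumption: offending entries are lines whose first non-blank character is '!'.
    static List<String> firstOffenders(List<String> reportLines, int limit)
    {
        return reportLines.stream()
                          .map(String::trim)
                          .filter(line -> line.startsWith("!"))
                          .limit(limit)
                          .collect(Collectors.toList());
    }

    // Builds a failure message that names the first few offenders inline,
    // so CI logs are useful without access to the report file.
    static String failureMessage(List<String> reportLines, int limit)
    {
        return "RAT check failed; first offending files:\n"
               + String.join("\n", firstOffenders(reportLines, limit))
               + "\n(full report: build/rat.txt)";
    }
}
```

The report lines could be read with `Files.readAllLines(Paths.get("build/rat.txt"))` before being passed in.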






[jira] [Commented] (CASSANDRA-16524) Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException

2021-04-20 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326049#comment-17326049
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16524:
-

Jenkins is running 
[here|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/690/] for the 
latest version of the patch.


> Upgrading SSL enabled Cassandra cluster from 3.11.10 to 4.0-beta4 failing 
> with javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException
> --
>
> Key: CASSANDRA-16524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16524
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Alaykumar Barochia
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: system.log.ssl-error.txt
>
>
> Hi,
> We have an SSL-enabled cluster running Apache Cassandra 3.11.10 and we are 
> trying to upgrade it to 4.0-beta4 as part of testing.
> The cluster size is 3x3 and it is deployed on Azure IaaS.
> {noformat}
> [cassandra@cass-521828978-1-1189299202 ~]$ nodetool status
> Datacenter: southcentral
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.74.31  85.61 KiB   16      32.2%             6db7a7ef-3490-4823-9ff3-c60a32165124  2
> UN  10.12.74.42  263.27 KiB  16      27.6%             7ad99ecf-7c7d-4780-872b-7c68b6b19849  1
> UN  10.12.74.34  85.61 KiB   16      37.8%             41ce16b7-2ab2-44ea-a810-8391f7f3caf2  0
> Datacenter: westus
> ==================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.12.90.11  90.63 KiB   16      38.9%             8d4cdb65-ff66-4bcd-8d4b-a4a0e893a728  2
> UN  10.12.90.6   85.61 KiB   16      34.5%             4f8007e9-fa3e-4e99-a9f9-f7bf9625      1
> UN  10.12.89.80  94.1 KiB    16      28.9%             11f86cb0-c86b-440e-848f-b160118f43d5  0
> {noformat}
> We placed a new 4.0-beta4 binary on the first seed node (10.12.74.310) and 
> started Cassandra.
> It started throwing the error below:
> {noformat}
> ERROR [Messaging-EventLoop-3-11] 2021-03-15 22:10:05,188 
> InboundConnectionInitiator.java:342 - Failed to properly handshake with peer 
> /10.12.74.42:52356. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: 
> java.lang.IndexOutOfBoundsException: writerIndex(8560) + 
> minWritableBytes(1977) exceeds maxCapacity(10240): 
> BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLException: java.lang.IndexOutOfBoundsException: 
> writerIndex(8560) + minWritableBytes(1977) exceeds maxCapacity(10240): 
> BufferPoolAllocator$Wrapped(ridx: 0, widx: 8560, cap: 10240/10240)
>   at 
> 

[jira] [Updated] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-20 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16588:

Status: Ready to Commit  (was: Review In Progress)

bq. We should probably filter out deadStates because they won't trigger the NPE.
 
Thanks, good catch!

The patches look good with [~mfleming]'s fix and the new tests. 
+1 from me if CI looks healthy.

> NPE getting host_id in Gossiper.isSafeForStartup
> 
>
> Key: CASSANDRA-16588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>   at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is a GossipDigestAck has been queued to ack the 
> shutdown state from the node on the seed, but isn't actually sent until the 
> node has restarted and gone into shadow.  Since the ack contains the node's 
> IP, it assumes a host_id will be there but since this is not an actual shadow 
> response, it is not.
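
The failure mode described above comes down to dereferencing a host_id that may be absent from a peer's state. The sketch below shows a null-safe lookup under simplified, hypothetical types (plain maps instead of Cassandra's gossip classes). It is not the fix from the patch, only an illustration of guarding the lookup.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for gossip endpoint state; not Cassandra's actual classes.
final class HostIdLookup
{
    // Returns the host_id recorded for the endpoint, or empty if the peer's state
    // is missing or carries no host_id (as with a delayed, non-shadow ack).
    static Optional<String> hostId(Map<String, Map<String, String>> epStates, String endpoint)
    {
        Map<String, String> state = epStates.get(endpoint);
        if (state == null)
            return Optional.empty();
        return Optional.ofNullable(state.get("HOST_ID"));
    }
}
```

A caller can then skip endpoints for which the result is empty instead of assuming the value is present.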






[jira] [Updated] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-20 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16588:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> NPE getting host_id in Gossiper.isSafeForStartup
> 
>
> Key: CASSANDRA-16588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>   at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is a GossipDigestAck has been queued to ack the 
> shutdown state from the node on the seed, but isn't actually sent until the 
> node has restarted and gone into shadow.  Since the ack contains the node's 
> IP, it assumes a host_id will be there but since this is not an actual shadow 
> response, it is not.






[jira] [Comment Edited] (CASSANDRA-16595) Remove test parallelism from ant build.xml in all branches

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326020#comment-17326020
 ] 

David Capwell edited comment on CASSANDRA-16595 at 4/20/21, 6:15 PM:
-

[~bereng], the ci-test script isn't related to circle-ci; it is a wrapper 
around ant to match all CIs.

{code}
$ cat ci-test
#!/usr/bin/env bash

#set -o xtrace
set -o errexit
set -o pipefail
set -o nounset

usage() {
  if [[ $# -gt 0 ]]; then
    echo "$*" 1>&2
  fi
  cat <&2
  fi
  cat <

> Remove test parallelism from ant build.xml in all branches
> --
>
> Key: CASSANDRA-16595
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16595
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.20, 3.0.25, 3.11.11, 4.0-rc1, 4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cassandra's build.xml supports parallel test runners. This functionality is 
> available through {{`-Dtest.runners`}} and the {{`testparallel`}} ant macro.
> Not having been actively used, it has atrophied over time and breaks a number of 
> tests. The distributed in-jvm tests don't work at all with parallel runners 
> (currently they need `-Dtest.runners=1` specified to work). And there are 
> plenty of flaky tests, ranging from tests that use fixed ports 
> (StorageServiceServerTest), to byteman (e.g. BMUnitRunner), to conf files 
> on disk.
> This was raised on the dev ML, where the consensus was to remove it: 
> https://lists.apache.org/thread.html/r1ca3c72b90fa6c57c1cb7dcd02a44221dcca991fe7392abd8c29fe95%40%3Cdev.cassandra.apache.org%3E
> The idea is to then replace ant test parallelism with docker container 
> parallelism.






[jira] [Commented] (CASSANDRA-16595) Remove test parallelism from ant build.xml in all branches

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326020#comment-17326020
 ] 

David Capwell commented on CASSANDRA-16595:
---

The ci-test script isn't related to circle-ci; it is a wrapper around ant to 
match all CIs.

{code}
$ cat ci-test
#!/usr/bin/env bash

#set -o xtrace
set -o errexit
set -o pipefail
set -o nounset

usage() {
  if [[ $# -gt 0 ]]; then
    echo "$*" 1>&2
  fi
  cat <

> Remove test parallelism from ant build.xml in all branches
> --
>
> Key: CASSANDRA-16595
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16595
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 2.2.20, 3.0.25, 3.11.11, 4.0-rc1, 4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cassandra's build.xml supports parallel test runners. This functionality is 
> available through {{`-Dtest.runners`}} and the {{`testparallel`}} ant macro.
> Not having been actively used, it has atrophied over time and breaks a number of 
> tests. The distributed in-jvm tests don't work at all with parallel runners 
> (currently they need `-Dtest.runners=1` specified to work). And there are 
> plenty of flaky tests, ranging from tests that use fixed ports 
> (StorageServiceServerTest), to byteman (e.g. BMUnitRunner), to conf files 
> on disk.
> This was raised on the dev ML, where the consensus was to remove it: 
> https://lists.apache.org/thread.html/r1ca3c72b90fa6c57c1cb7dcd02a44221dcca991fe7392abd8c29fe95%40%3Cdev.cassandra.apache.org%3E
> The idea is to then replace ant test parallelism with docker container 
> parallelism.






[jira] [Updated] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16607:
--
  Fix Version/s: (was: 4.0-rc)
 4.0-beta
  Since Version: 3.11.0
Source Control Link: 
https://github.com/apache/cassandra/commit/60cf948f8bfdc23e1f718967fdd365fc3da7919d
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Fix flaky test testRequestResponse – 
> org.apache.cassandra.net.MockMessagingServiceTest
> --
>
> Key: CASSANDRA-16607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16607
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 3.11.11, 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/659/tests/
> {code}
> Error
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at 
> org.apache.cassandra.net.MockMessagingServiceTest.testRequestResponse(MockMessagingServiceTest.java:81)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Standard Output
> INFO  [main] 2021-04-15 08:22:46,838 YamlConfigurationLoader.java:93 - 
> Configuration location: 
> file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,840 YamlConfigurationLoader.java:112 - 
> Loading settings from file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,899 InternalLoggerFactory.java:63 - Using 
> SLF4J as the default logging framework
> DEBUG [main] 2021-04-15 08:22:46,911 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsaf
> ...[truncated 61235 chars]...
> te NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> DEBUG [main] 2021-04-15 08:22:49,840 StorageService.java:2674 - New node 
> /127.0.0.1:7069 at token a57d4b7f61f49471614b7ac41f16477e
> DEBUG [main] 2021-04-15 08:22:49,848 StorageService.java:2727 - Node 
> /127.0.0.1:7069 state NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> INFO  [main] 2021-04-15 08:22:49,848 StorageService.java:2730 - Node 
> /127.0.0.1:7069 state jump to NORMAL
> DEBUG [main] 2021-04-15 08:22:49,849 StorageService.java:1619 - NORMAL
> {code}






[jira] [Commented] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326006#comment-17326006
 ] 

Andres de la Peña commented on CASSANDRA-16607:
---

Committed to 3.11 as 
[60cf948f8bfdc23e1f718967fdd365fc3da7919d|https://github.com/apache/cassandra/commit/60cf948f8bfdc23e1f718967fdd365fc3da7919d]
 and [merged into 
trunk|https://github.com/apache/cassandra/commit/4e5bd273c640eb79c4947b22d56a68784b039c52].

> Fix flaky test testRequestResponse – 
> org.apache.cassandra.net.MockMessagingServiceTest
> --
>
> Key: CASSANDRA-16607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16607
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 3.11.11, 4.0-rc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/659/tests/
> {code}
> Error
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at 
> org.apache.cassandra.net.MockMessagingServiceTest.testRequestResponse(MockMessagingServiceTest.java:81)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Standard Output
> INFO  [main] 2021-04-15 08:22:46,838 YamlConfigurationLoader.java:93 - 
> Configuration location: 
> file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,840 YamlConfigurationLoader.java:112 - 
> Loading settings from file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,899 InternalLoggerFactory.java:63 - Using 
> SLF4J as the default logging framework
> DEBUG [main] 2021-04-15 08:22:46,911 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsaf
> ...[truncated 61235 chars]...
> te NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> DEBUG [main] 2021-04-15 08:22:49,840 StorageService.java:2674 - New node 
> /127.0.0.1:7069 at token a57d4b7f61f49471614b7ac41f16477e
> DEBUG [main] 2021-04-15 08:22:49,848 StorageService.java:2727 - Node 
> /127.0.0.1:7069 state NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> INFO  [main] 2021-04-15 08:22:49,848 StorageService.java:2730 - Node 
> /127.0.0.1:7069 state jump to NORMAL
> DEBUG [main] 2021-04-15 08:22:49,849 StorageService.java:1619 - NORMAL
> {code}






[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2021-04-20 Thread adelapena
This is an automated email from the ASF dual-hosted git repository.

adelapena pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 4e5bd273c640eb79c4947b22d56a68784b039c52
Merge: 6fa5300 60cf948
Author: Andrés de la Peña 
AuthorDate: Tue Apr 20 18:53:46 2021 +0100

Merge branch 'cassandra-3.11' into trunk

 .../org/apache/cassandra/gms/ShadowRoundTest.java|  6 +++---
 .../cassandra/net/MockMessagingServiceTest.java  |  6 +++---
 .../org/apache/cassandra/net/MockMessagingSpy.java   | 20 +++-
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --cc test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
index 2bcbc50,bc18813..2198821
--- a/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
+++ b/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
@@@ -111,10 -107,10 +111,10 @@@ public class ShadowRoundTes
  }
  
  // we expect one SYN for each seed during shadow round + additional SYNs after gossiper has been enabled
- assertTrue(spySyn.messagesIntercepted > noOfSeeds);
+ assertTrue(spySyn.messagesIntercepted() > noOfSeeds);
  
 -// we don't expect to emit any GOSSIP_DIGEST_ACK2 or MIGRATION_REQUEST messages
 +// we don't expect to emit any GOSSIP_DIGEST_ACK2 or SCHEMA_PULL messages
- assertEquals(0, spyAck2.messagesIntercepted);
- assertEquals(0, spyMigrationReq.messagesIntercepted);
+ assertEquals(0, spyAck2.messagesIntercepted());
+ assertEquals(0, spyMigrationReq.messagesIntercepted());
  }
  }
diff --cc test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
index e4787f7,ab97aaa..8023162
--- a/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
+++ b/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
@@@ -24,7 -25,9 +24,8 @@@ import org.junit.BeforeClass
  import org.junit.Test;
  
  import org.apache.cassandra.SchemaLoader;
+ import org.apache.cassandra.Util;
  import org.apache.cassandra.exceptions.ConfigurationException;
 -import org.apache.cassandra.gms.EchoMessage;
  import org.apache.cassandra.service.StorageService;
  import org.apache.cassandra.utils.FBUtilities;
  
@@@ -33,8 -35,7 +34,7 @@@ import static org.apache.cassandra.net.
  import static org.apache.cassandra.net.MockMessagingService.to;
  import static org.apache.cassandra.net.MockMessagingService.verb;
  import static org.junit.Assert.assertEquals;
 -import static org.junit.Assert.assertTrue;
 +import static org.junit.Assert.assertSame;
- import static org.junit.Assert.assertTrue;
  
  public class MockMessagingServiceTest
  {
@@@ -73,11 -86,11 +73,11 @@@
  });
  
  // we must have intercepted the outgoing message at this point
 -MessageOut msg = spy.captureMessageOut().get();
 +Message msg = spy.captureMessageOut().get();
- assertEquals(1, spy.messagesIntercepted);
+ assertEquals(1, spy.messagesIntercepted());
 -assertTrue(msg == echoMessageOut);
 +assertSame(echoMessage.payload, msg.payload);
  
  // and return a mocked response
- assertEquals(1, spy.mockedMessageResponses);
+ Util.spinAssertEquals(1, spy::mockedMessageResponses, 60);
  }
  }
diff --cc test/unit/org/apache/cassandra/net/MockMessagingSpy.java
index c61c301,2219c5a..2197787
--- a/test/unit/org/apache/cassandra/net/MockMessagingSpy.java
+++ b/test/unit/org/apache/cassandra/net/MockMessagingSpy.java
@@@ -41,11 -41,11 +42,11 @@@ public class MockMessagingSp
  {
  private static final Logger logger = LoggerFactory.getLogger(MockMessagingSpy.class);
  
- public int messagesIntercepted = 0;
- public int mockedMessageResponses = 0;
+ private final AtomicInteger messagesIntercepted = new AtomicInteger();
+ private final AtomicInteger mockedMessageResponses = new AtomicInteger();
  
 -private final BlockingQueue> interceptedMessages = new 
LinkedBlockingQueue<>();
 -private final BlockingQueue> deliveredResponses = new 
LinkedBlockingQueue<>();
 +private final BlockingQueue> interceptedMessages = new 
LinkedBlockingQueue<>();
 +private final BlockingQueue> deliveredResponses = new 
LinkedBlockingQueue<>();
  
  private static final Executor executor = Executors.newSingleThreadExecutor();
  
@@@ -131,16 -131,26 +132,26 @@@
  return ret;
  }
  
+ public int messagesIntercepted()
+ {
+ return messagesIntercepted.get();
+ }
+ 
+ public int mockedMessageResponses()
+ {
+ return mockedMessageResponses.get();
+ }
+ 
 -void matchingMessage(MessageOut message)
 +void matchingMessage(Message message)
  {
- messagesIntercepted++;
+ messagesIntercepted.incrementAndGet();
  logger.trace("Received matching message: {}", message);
  interceptedMessages.add(message);
  }
  
 -void matchingResponse(MessageIn response)
 +void 

[cassandra] branch cassandra-3.11 updated: Fix flaky MockMessagingServiceTest

2021-04-20 Thread adelapena
This is an automated email from the ASF dual-hosted git repository.

adelapena pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.11 by this push:
 new 60cf948  Fix flaky MockMessagingServiceTest
60cf948 is described below

commit 60cf948f8bfdc23e1f718967fdd365fc3da7919d
Author: Andrés de la Peña 
AuthorDate: Tue Apr 20 18:46:55 2021 +0100

Fix flaky MockMessagingServiceTest

patch by Andrés de la Peña; reviewed by Benjamin Lerer for CASSANDRA-16607
---
 .../org/apache/cassandra/gms/ShadowRoundTest.java|  6 +++---
 .../cassandra/net/MockMessagingServiceTest.java  |  5 +++--
 .../org/apache/cassandra/net/MockMessagingSpy.java   | 20 +++-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java 
b/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
index f8cc49c..bc18813 100644
--- a/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
+++ b/test/unit/org/apache/cassandra/gms/ShadowRoundTest.java
@@ -107,10 +107,10 @@ public class ShadowRoundTest
 }
 
 // we expect one SYN for each seed during shadow round + additional 
SYNs after gossiper has been enabled
-assertTrue(spySyn.messagesIntercepted > noOfSeeds);
+assertTrue(spySyn.messagesIntercepted() > noOfSeeds);
 
 // we don't expect to emit any GOSSIP_DIGEST_ACK2 or MIGRATION_REQUEST 
messages
-assertEquals(0, spyAck2.messagesIntercepted);
-assertEquals(0, spyMigrationReq.messagesIntercepted);
+assertEquals(0, spyAck2.messagesIntercepted());
+assertEquals(0, spyMigrationReq.messagesIntercepted());
 }
 }
diff --git a/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java 
b/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
index 3f6564e..ab97aaa 100644
--- a/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
+++ b/test/unit/org/apache/cassandra/net/MockMessagingServiceTest.java
@@ -25,6 +25,7 @@ import org.junit.BeforeClass;
 import org.junit.Test;
 
 import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.Util;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.gms.EchoMessage;
 import org.apache.cassandra.service.StorageService;
@@ -86,10 +87,10 @@ public class MockMessagingServiceTest
 
 // we must have intercepted the outgoing message at this point
 MessageOut msg = spy.captureMessageOut().get();
-assertEquals(1, spy.messagesIntercepted);
+assertEquals(1, spy.messagesIntercepted());
 assertTrue(msg == echoMessageOut);
 
 // and return a mocked response
-assertEquals(1, spy.mockedMessageResponses);
+Util.spinAssertEquals(1, spy::mockedMessageResponses, 60);
 }
 }
diff --git a/test/unit/org/apache/cassandra/net/MockMessagingSpy.java 
b/test/unit/org/apache/cassandra/net/MockMessagingSpy.java
index 80bdb39..2219c5a 100644
--- a/test/unit/org/apache/cassandra/net/MockMessagingSpy.java
+++ b/test/unit/org/apache/cassandra/net/MockMessagingSpy.java
@@ -24,6 +24,7 @@ import java.util.concurrent.Executor;
 import java.util.concurrent.Executors;
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import com.google.common.util.concurrent.AbstractFuture;
 import com.google.common.util.concurrent.Futures;
@@ -40,8 +41,8 @@ public class MockMessagingSpy
 {
 private static final Logger logger = 
LoggerFactory.getLogger(MockMessagingSpy.class);
 
-public int messagesIntercepted = 0;
-public int mockedMessageResponses = 0;
+private final AtomicInteger messagesIntercepted = new AtomicInteger();
+private final AtomicInteger mockedMessageResponses = new AtomicInteger();
 
 private final BlockingQueue> interceptedMessages = new 
LinkedBlockingQueue<>();
 private final BlockingQueue> deliveredResponses = new 
LinkedBlockingQueue<>();
@@ -130,21 +131,30 @@ public class MockMessagingSpy
 return ret;
 }
 
+public int messagesIntercepted()
+{
+return messagesIntercepted.get();
+}
+
+public int mockedMessageResponses()
+{
+return mockedMessageResponses.get();
+}
+
 void matchingMessage(MessageOut message)
 {
-messagesIntercepted++;
+messagesIntercepted.incrementAndGet();
 logger.trace("Received matching message: {}", message);
 interceptedMessages.add(message);
 }
 
 void matchingResponse(MessageIn response)
 {
-mockedMessageResponses++;
+mockedMessageResponses.incrementAndGet();
 logger.trace("Responding to intercepted message: {}", response);
 deliveredResponses.add(response);
 }
 
-
 private static class CapturedResultsFuture extends 
AbstractFuture> 
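
A minimal sketch of the pattern this commit appears to apply: the spy's counters are updated from a background executor thread, so a plain `int` read right after the call can race with the increment. The class below is a hypothetical stand-in for MockMessagingSpy, and `spinUntilEquals` is an assumed helper that only imitates the idea behind `Util.spinAssertEquals`; neither is the project's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical stand-in for MockMessagingSpy: only the counter handling is modelled.
class SpyCounterSketch
{
    private final AtomicInteger mockedMessageResponses = new AtomicInteger();

    // Thread-safe read, exposed as a method instead of a public field.
    int mockedMessageResponses()
    {
        return mockedMessageResponses.get();
    }

    // May be called from a background thread.
    void matchingResponse()
    {
        mockedMessageResponses.incrementAndGet();
    }

    // Assumed helper: poll until the supplier returns the expected value
    // or the timeout elapses. Returns false on timeout or interruption.
    static boolean spinUntilEquals(int expected, IntSupplier actual, long timeoutMillis)
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (actual.getAsInt() != expected)
        {
            if (System.nanoTime() - deadline >= 0)
                return false;
            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        SpyCounterSketch spy = new SpyCounterSketch();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(spy::matchingResponse); // asynchronous increment
        boolean seen = spinUntilEquals(1, spy::mockedMessageResponses, 5_000);
        executor.shutdown();
        System.out.println("response counted: " + seen);
    }
}
```

Polling with a timeout keeps the assertion strict about the final value while tolerating the delay before the background thread runs.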

[cassandra] branch trunk updated (6fa5300 -> 4e5bd27)

2021-04-20 Thread adelapena
This is an automated email from the ASF dual-hosted git repository.

adelapena pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 6fa5300  Harden internode message resource limit accounting against 
serialization failures
 new 60cf948  Fix flaky MockMessagingServiceTest
 new 4e5bd27  Merge branch 'cassandra-3.11' into trunk

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/cassandra/gms/ShadowRoundTest.java|  6 +++---
 .../cassandra/net/MockMessagingServiceTest.java  |  6 +++---
 .../org/apache/cassandra/net/MockMessagingSpy.java   | 20 +++-
 3 files changed, 21 insertions(+), 11 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-20 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16588:
-
Status: Patch Available  (was: In Progress)

> NPE getting host_id in Gossiper.isSafeForStartup
> 
>
> Key: CASSANDRA-16588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>   at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is that a GossipDigestAck has been queued to ack the 
> shutdown state from the node on the seed, but isn't actually sent until the 
> node has restarted and gone into shadow.  Since the ack contains the node's 
> IP, the startup check assumes a host_id will be there, but since this is not an actual shadow 
> response, it is not.
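
A minimal sketch of a null-safe host_id lookup, assuming the endpoint's application states can be modelled as a plain string map. The map layout, the "HOST_ID" key and the class name are hypothetical; this is not the Gossiper API, and what the caller should do when the host_id is absent is left to the actual patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical model: application states as a String -> String map.
class HostIdGuardSketch
{
    // Returns the host_id if the state carries one, empty otherwise,
    // instead of dereferencing a missing entry.
    static Optional<UUID> hostId(Map<String, String> applicationStates)
    {
        if (applicationStates == null)
            return Optional.empty();
        return Optional.ofNullable(applicationStates.get("HOST_ID")).map(UUID::fromString);
    }

    public static void main(String[] args)
    {
        // An ack that carries a status but no host_id, as in the scenario described above.
        Map<String, String> ackWithoutHostId = new HashMap<>();
        ackWithoutHostId.put("STATUS", "shutdown");
        System.out.println("host_id present: " + hostId(ackWithoutHostId).isPresent());
    }
}
```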



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

2021-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326000#comment-17326000
 ] 

Brandon Williams commented on CASSANDRA-16588:
--


||Patch|CI||
|[3.11|https://github.com/driftx/cassandra/tree/CASSANDRA-16588]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/691/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/691/pipeline]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-16588]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/692/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/692/pipeline]|

We have to be careful in the test not to let StorageService get too far in 
joining the ring; otherwise all kinds of things start up that are not easily 
restartable. Instead, we can test checkForEndpointCollision directly.





[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16616:
--
  Fix Version/s: (was: 4.0-rc)
 4.0
 4.0-rc1
  Since Version: 4.0-alpha1
Source Control Link: 
https://github.com/apache/cassandra/commit/3918a67e67d2de8064dc98beb5166a5491c80b1e
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc1, 4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and is unable to 
> correctly adjust the resource limits for an OutboundConnection, it affects 
> the other connection types sharing the same OutboundConnections so that any 
> of the connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to 
> re-initialize all of the connections with a correct limit, the effort to test 
> and maintain the recovery code seems too high for something that should 
> "never happen" (except it did once, which is why it needs hardening).  The 
> safer option is to kill the JVM and have whatever external monitoring is in 
> place restart the instance in a known good state.
> Additionally, the logging for dropping outbound messages that have expired or 
> are unserializable takes place after the recovery handling logic. If 
> there are problems with the recovery logic that throw an exception, the 
> message is never logged for future diagnosis. Logging should take place 
> first, followed by releasing capacity and handling the expiration/serialization failure.
> Discovered on a branch modified for testing that threw an exception in the 
> Verb.serializeSize method.
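
The two points above (fail hard when the accounting goes negative, and log before releasing capacity) can be sketched as follows. This is a simplified illustration under assumptions: `LimitSketch`, `UnrecoverableStateException` and `onDropped` are hypothetical names, and the code is not the actual ResourceLimits or OutboundConnection implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a concurrent resource limit.
class LimitSketch
{
    // Stand-in for an exception that signals the process should not continue.
    static class UnrecoverableStateException extends RuntimeException
    {
        UnrecoverableStateException(String message)
        {
            super(message);
        }
    }

    private final long limit;
    private final AtomicLong using = new AtomicLong();

    LimitSketch(long limit)
    {
        this.limit = limit;
    }

    // Reserve capacity if it fits under the limit.
    boolean tryAllocate(long amount)
    {
        while (true)
        {
            long current = using.get();
            long next = current + amount;
            if (next > limit)
                return false;
            if (using.compareAndSet(current, next))
                return true;
        }
    }

    // Release capacity; a negative balance means the accounting is corrupt.
    void release(long amount)
    {
        long now = using.addAndGet(-amount);
        if (now < 0)
            throw new UnrecoverableStateException("released more than allocated: using=" + now);
    }

    long using()
    {
        return using.get();
    }

    // Log first, so the drop is recorded even if the release below throws.
    static void onDropped(LimitSketch limits, String verb, long size, List<String> log)
    {
        log.add("dropping message of type " + verb);
        limits.release(size);
    }

    public static void main(String[] args)
    {
        LimitSketch limits = new LimitSketch(10);
        List<String> log = new ArrayList<>();
        limits.tryAllocate(10);
        onDropped(limits, "ECHO_REQ", 10, log);
        System.out.println("using=" + limits.using() + ", logged=" + log.size());
    }
}
```

An explicit exception, unlike an `assert`, fires whether or not assertions are enabled, which is one reason a hardening change might prefer it.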






[cassandra] branch trunk updated: Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 6fa5300  Harden internode message resource limit accounting against 
serialization failures
6fa5300 is described below

commit 6fa5300682fbfcbaaae9d4593a015c18ab34df1f
Author: Jon Meredith 
AuthorDate: Tue Apr 20 09:41:05 2021 -0700

Harden internode message resource limit accounting against serialization 
failures

patch by Jon Meredith; reviewed by Benjamin Lerer, David Capwell for 
CASSANDRA-16616
---
 CHANGES.txt|  1 +
 .../UnrecoverableIllegalStateException.java| 26 +
 .../apache/cassandra/net/OutboundConnection.java   |  4 +--
 .../org/apache/cassandra/net/ResourceLimits.java   | 14 -
 .../cassandra/utils/JVMStabilityInspector.java |  5 
 .../apache/cassandra/net/ResourceLimitsTest.java   | 34 +-
 6 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 6dd8bab..108a762 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-rc1
+ * Harden internode message resource limit accounting against serialization 
failures (CASSANDRA-16616)
  * Add back the source release of python driver in tree to avoid fetching from 
GitHub APIs (CASSANDRA-16599)
  * Fix false unavailable for queries due to cluster topology changes 
(CASSANDRA-16545)
  * Fixed a race condition issue in nodetool repair where we poll for the error 
before seeing the error notification, leading to a less meaningful message 
(CASSANDRA-16585)
diff --git 
a/src/java/org/apache/cassandra/exceptions/UnrecoverableIllegalStateException.java
 
b/src/java/org/apache/cassandra/exceptions/UnrecoverableIllegalStateException.java
new file mode 100644
index 000..2193630
--- /dev/null
+++ 
b/src/java/org/apache/cassandra/exceptions/UnrecoverableIllegalStateException.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.exceptions;
+
+public class UnrecoverableIllegalStateException extends RuntimeException
+{
+public UnrecoverableIllegalStateException(String message) {
+super(message);
+}
+}
diff --git a/src/java/org/apache/cassandra/net/OutboundConnection.java 
b/src/java/org/apache/cassandra/net/OutboundConnection.java
index 98c034c..82eb6ce 100644
--- a/src/java/org/apache/cassandra/net/OutboundConnection.java
+++ b/src/java/org/apache/cassandra/net/OutboundConnection.java
@@ -470,10 +470,10 @@ public class OutboundConnection
  */
 private boolean onExpired(Message message)
 {
+noSpamLogger.warn("{} dropping message of type {} whose timeout 
expired before reaching the network", id(), message.verb());
 releaseCapacity(1, canonicalSize(message));
 expiredCount += 1;
 expiredBytes += canonicalSize(message);
-noSpamLogger.warn("{} dropping message of type {} whose timeout 
expired before reaching the network", id(), message.verb());
 callbacks.onExpired(message, template.to);
 return true;
 }
@@ -485,11 +485,11 @@ public class OutboundConnection
  */
 private void onFailedSerialize(Message message, int messagingVersion, 
int bytesWrittenToNetwork, Throwable t)
 {
+logger.warn("{} dropping message of type {} due to error", id(), 
message.verb(), t);
 JVMStabilityInspector.inspectThrowable(t);
 releaseCapacity(1, canonicalSize(message));
 errorCount += 1;
 errorBytes += message.serializedSize(messagingVersion);
-logger.warn("{} dropping message of type {} due to error", id(), 
message.verb(), t);
 callbacks.onFailedSerialize(message, template.to, messagingVersion, 
bytesWrittenToNetwork, t);
 }
 
diff --git a/src/java/org/apache/cassandra/net/ResourceLimits.java 
b/src/java/org/apache/cassandra/net/ResourceLimits.java
index 7658d5f..8899040 100644
--- a/src/java/org/apache/cassandra/net/ResourceLimits.java
+++ b/src/java/org/apache/cassandra/net/ResourceLimits.java
@@ -17,6 +17,8 @@
  

[jira] [Commented] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325962#comment-17325962
 ] 

David Capwell commented on CASSANDRA-16616:
---

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-16616-trunk-82FED7F5-8760-4D6D-95B0-84E9D9393B22]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-16616-trunk-82FED7F5-8760-4D6D-95B0-84E9D9393B22]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/688/]|


> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Updated] (CASSANDRA-16566) Fix test testIsrepairedArg - org.apache.cassandra.tools.SSTableRepairedAtSetterTest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16566:
---
Resolution: Duplicate
Status: Resolved  (was: Open)

Closing for now. We will reopen if it reappears.

> Fix test testIsrepairedArg - 
> org.apache.cassandra.tools.SSTableRepairedAtSetterTest
> ---
>
> Key: CASSANDRA-16566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16566
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/unit
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/874/workflows/0b0a1e36-107a-43c7-815f-bf8e61d3028d/jobs/5227/tests
> {code}
> junit.framework.AssertionFailedError: 
> [org.apache.cassandra.tools.SSTableRepairedAtSetter,
> --really-set,
> --is-repaired,
> 
> /tmp/cassandra/build/test/cassandra/data/legacy_sstables/legacy_ma_simple/ma-1-big-Data.db]
> exited with code -1
> stderr:
> java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> /tmp/cassandra/build/test/cassandra/data/legacy_sstables/legacy_ma_simple/ma-1-big-Data.db
>   at 
> org.apache.cassandra.tools.ToolRunner.runClassAsTool(ToolRunner.java:102)
>   at org.apache.cassandra.tools.ToolRunner$2.get(ToolRunner.java:249)
>   at org.apache.cassandra.tools.ToolRunner$2.get(ToolRunner.java:245)
>   at 
> org.apache.cassandra.tools.ToolRunner.invokeSupplier(ToolRunner.java:305)
>   at 
> org.apache.cassandra.tools.ToolRunner.invokeClass(ToolRunner.java:253)
>   at 
> org.apache.cassandra.tools.ToolRunner.invokeClass(ToolRunner.java:235)
>   at 
> org.apache.cassandra.tools.SSTableRepairedAtSetterTest.testIsrepairedArg(SSTableRepairedAtSetterTest.java:81)
> Caused by: java.nio.file.NoSuchFileException: 
> /tmp/cassandra/build/test/cassandra/data/legacy_sstables/legacy_ma_simple/ma-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>   at 
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>   at java.nio.file.Files.readAttributes(Files.java:1737)
>   at java.nio.file.Files.getLastModifiedTime(Files.java:2266)
>   at 
> org.apache.cassandra.tools.SSTableRepairedAtSetter.main(SSTableRepairedAtSetter.java:90)
>   at 
> org.apache.cassandra.tools.ToolRunner.runClassAsTool(ToolRunner.java:82)
> stdout:
>   at 
> org.apache.cassandra.tools.ToolRunner$ToolResult.assertExitCode(ToolRunner.java:408)
>   at 
> org.apache.cassandra.tools.ToolRunner$ToolResult.assertOnExitCode(ToolRunner.java:402)
>   at 
> org.apache.cassandra.tools.ToolRunner$ToolResult.assertOnCleanExit(ToolRunner.java:450)
>   at 
> org.apache.cassandra.tools.SSTableRepairedAtSetterTest.testIsrepairedArg(SSTableRepairedAtSetterTest.java:85)
> {code}






[jira] [Updated] (CASSANDRA-16547) Prioritisation for sized-tier and TW compactions is based on outdated estimation

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16547:
---
Fix Version/s: (was: 4.0-rc)
   4.0.x
   3.11.x
   3.0.x

> Prioritisation for sized-tier and TW compactions is based on outdated 
> estimation
> 
>
> Key: CASSANDRA-16547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16547
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Due to the way getEstimatedRemainingTasks() works, it looks like compaction 
> prioritisation is based on outdated estimations (see 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionStrategyHolder.java#L109).
>  






[jira] [Commented] (CASSANDRA-16566) Fix test testIsrepairedArg - org.apache.cassandra.tools.SSTableRepairedAtSetterTest

2021-04-20 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325943#comment-17325943
 ] 

Adam Holmberg commented on CASSANDRA-16566:
---

If it was parallelism, I'm wondering if this became a non-issue following 
CASSANDRA-16595. Close for now?




[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16619:
---
Fix Version/s: (was: 4.0)
   4.0.x

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, its positions may cover regions that the 
> receiving node has not yet flushed, resulting in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
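
The replay rule in the description above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `SSTableInfo`, `intervalsForReplay`, and the start/end pair used for an interval are hypothetical stand-ins. Only the `originatingHostId` name and its "null if not known" meaning come from the ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration only; these are not Cassandra's real classes.
final class OriginatingHostSketch
{
    /** Minimal model of the sstable metadata that matters for this rule. */
    static final class SSTableInfo
    {
        final UUID originatingHostId; // null when the origin is not known
        final long intervalStart;     // start of the commit log interval covered
        final long intervalEnd;       // end of the commit log interval covered

        SSTableInfo(UUID originatingHostId, long intervalStart, long intervalEnd)
        {
            this.originatingHostId = originatingHostId;
            this.intervalStart = intervalStart;
            this.intervalEnd = intervalEnd;
        }
    }

    /** Keep only the intervals of sstables that were created on the local node. */
    static List<long[]> intervalsForReplay(UUID localHostId, List<SSTableInfo> sstables)
    {
        List<long[]> kept = new ArrayList<>();
        for (SSTableInfo s : sstables)
        {
            // A null or foreign originatingHostId means the positions describe another
            // node's commit log (or are unknown), so they must not filter local replay.
            if (localHostId.equals(s.originatingHostId))
                kept.add(new long[]{ s.intervalStart, s.intervalEnd });
        }
        return kept;
    }

    public static void main(String[] args)
    {
        UUID local = new UUID(0L, 1L);
        UUID other = new UUID(0L, 2L);
        List<SSTableInfo> sstables = Arrays.asList(
            new SSTableInfo(local, 0L, 100L),  // flushed locally
            new SSTableInfo(other, 0L, 500L),  // moved in from another node
            new SSTableInfo(null, 0L, 900L));  // origin unknown
        System.out.println(intervalsForReplay(local, sstables).size()
                           + " interval(s) used to filter commit log replay");
    }
}
```

With this rule, an sstable brought in from another node can no longer hide mutations that the receiving node has not flushed yet, because its intervals are simply ignored during replay.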






[jira] [Updated] (CASSANDRA-16199) cassandra.logdir undefined when CASSANDRA_LOG_DIR

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16199:
---
Fix Version/s: (was: 4.0-rc)
   4.0.x

> cassandra.logdir undefined when CASSANDRA_LOG_DIR
> -
>
> Key: CASSANDRA-16199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16199
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Cyril Scetbon
>Assignee: Brian Houser
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> When ${cassandra.logdir} is used in logback.xml, nodetool uses neither the 
> env variable CASSANDRA_LOG_DIR nor the default value, and complains:
> {noformat}
> 03:07:27,387 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[DEBUGLOG] - Failed to create parent directories for [/cassandra.logdir_IS_UNDEFINED/debug.log]
> 03:07:27,387 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[DEBUGLOG] - Failed to create parent directories for [/cassandra.logdir_IS_UNDEFINED/debug.log]
> 03:07:27,388 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[DEBUGLOG] - openFile(cassandra.logdir_IS_UNDEFINED/debug.log,true) call failed. java.io.FileNotFoundException: cassandra.logdir_IS_UNDEFINED/debug.log (No such file or directory)
>   at java.io.FileNotFoundException: cassandra.logdir_IS_UNDEFINED/debug.log (No such file or directory)
> ...{noformat}
> It’s different for the cassandra script, for instance 
> [https://github.com/apache/cassandra/blob/324267b3c0676ad31bd4f2fac0e2e673a9257a37/bin/cassandra#L186].
>  I feel like the same setting should be added to 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/bin/nodetool],
>  or that nodetool should call cassandra-env.sh
>  
> Seen on 3.11 and 4.0-beta1
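
The fallback order the report asks for (explicit setting, then CASSANDRA_LOG_DIR, then a default) can be sketched as below. This is only an illustration of the resolution order. Any real fix would presumably live in the `bin/nodetool` shell script, and `resolveLogDir` plus the `<home>/logs` default are assumptions made for this sketch, not values taken from the ticket.

```java
// Hypothetical helper for illustration; not part of Cassandra or nodetool.
final class LogDirSketch
{
    /**
     * Resolve the log directory in the order the report expects:
     * explicit system property, then environment variable, then a default.
     */
    static String resolveLogDir(String systemProperty, String envValue, String cassandraHome)
    {
        if (systemProperty != null && !systemProperty.isEmpty())
            return systemProperty;
        if (envValue != null && !envValue.isEmpty())
            return envValue;
        // Assumed default; the ticket only says "the default value".
        return cassandraHome + "/logs";
    }

    public static void main(String[] args)
    {
        String dir = resolveLogDir(System.getProperty("cassandra.logdir"),
                                   System.getenv("CASSANDRA_LOG_DIR"),
                                   System.getProperty("user.dir"));
        System.out.println("log directory: " + dir);
    }
}
```

If none of the three sources is consulted before logback starts, the `${cassandra.logdir}` placeholder stays unresolved, which matches the `cassandra.logdir_IS_UNDEFINED` paths in the error output quoted above.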






[jira] [Updated] (CASSANDRA-16592) The token function in where clause return incorrect data when using token equal condition and Specified a non-exist token value

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16592:
---
Fix Version/s: (was: 4.0-rc)
   4.0.x

> The token function in where clause return incorrect data when using token 
> equal condition and Specified a non-exist token value
> ---
>
> Key: CASSANDRA-16592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: cimon
>Assignee: cimon
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> I get an incorrect value when using a query like 'select 
> Token(pk1,pk2),pk1,pk2 from ks.table1 where token(pk1,pk2) = tokenValue'. The 
> returned token value does not match the where condition.
> This problem is reproduced in 3.11.3 and 4.0.
> Here is my schema and select statement:
> {code:java}
> // schema
> cqlsh> desc testprefix.cprefix_03 ;
> CREATE TABLE testprefix.cprefix_03 (
> pk1 int,
> pk2 int,
> ck1 text,
> ck2 text,
> t1 int,
> PRIMARY KEY ((pk1, pk2), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 ASC, ck2 ASC)
> AND additional_write_policy = '99p'
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND cdc = false
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '16', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND default_time_to_live = 0
> AND extensions = {}
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair = 'BLOCKING'
> AND speculative_retry = '99p';
> {code}
> execute cql query
> {code:java}
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2  from testprefix.cprefix_03 WHERE  
> token(pk1, pk2) =-9223372036854775808 LIMIT 2; 
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2) from testprefix.cprefix_03 where pk1 = 394560 
> and pk2 = 3394560 LIMIT 2; 
>  system.token(pk1, pk2)
> ------------------------
>    -9222849988925915479
>    -9222849988925915479
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2  from testprefix.cprefix_03 WHERE  
> token(pk1, pk2) =-9222849988925915479 LIMIT 2; 
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows)
> {code}
> We can see that the token value in the condition is inconsistent with the 
> values in the result.
> 
> I then reviewed the source code to look for the answer. 
> {code:java}
> private static void addRange(SSTableReader sstable,
>                              AbstractBounds<PartitionPosition> requested,
>                              List<AbstractBounds<PartitionPosition>> boundsList)
> {
>     if (requested instanceof Range && ((Range)requested).isWrapAround())
>     //  first condition
>     {
>         if (requested.right.compareTo(sstable.first) >= 0)
>         {
>             // since we wrap, we must contain the whole sstable prior to stopKey()
>             Boundary<PartitionPosition> left = new Boundary<PartitionPosition>(sstable.first, true);
>             Boundary<PartitionPosition> right;
>             right = requested.rightBoundary();
>             right = minRight(right, sstable.last, true);
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>         if (requested.left.compareTo(sstable.last) <= 0)
>         {
>             // since we wrap, we must contain the whole sstable after dataRange.startKey()
>             Boundary<PartitionPosition> right = new Boundary<PartitionPosition>(sstable.last, true);
>             Boundary<PartitionPosition> left;
>             left = requested.leftBoundary();
>             left = maxLeft(left, sstable.first, true); // second condition
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>     }
>     else
>     {
>         assert requested.left.compareTo(requested.right) <= 0 || requested.right.isMinimum();
>         Boundary<PartitionPosition> left, right;
>         left = requested.leftBoundary();
>         right = requested.rightBoundary();
>         left = maxLeft(left, sstable.first, true);
>         // apparently isWrapAround() doesn't count Bounds that extend to 

[jira] [Commented] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325941#comment-17325941
 ] 

Jon Meredith commented on CASSANDRA-16616:
--

Thanks for the speedy reviews.

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and is unable to 
> correctly adjust the resource limits for an OutboundConnection, it affects 
> the other connection types sharing the same OutboundConnections so that any 
> of the connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to 
> re-initialize all of the connections with a correct limit, the effort to test 
> and maintain the recovery code seems too high for something that should 
> "never happen" (except it did once, which is why it needs hardening).  The 
> safer option is to kill the JVM and have whatever external monitoring is in 
> place restart the instance in a known good state.
> Additionally, the logging for dropping outbound messages that have expired or 
> are unserializable messages takes place after the recovery handling logic. If 
> there are problems with the recovery logic that throw an exception, the 
> message is never logged for future diagnosis. Logging should take place 
> first, and then releasing capacity/handling the expiration/serialization.
> Discovered on a branch modified for testing that threw an exception in the 
> Verb.serializeSize method.
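
The ordering proposed in the description (log the drop first, release capacity afterwards, and treat a failed release as fatal) might look roughly like this. It is a sketch under stated assumptions: `CapacityReleaser`, `dropMessage`, and the `onUnrecoverable` callback are hypothetical names, and the callback merely stands in for "kill the JVM". None of this is the real `OutboundConnection` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Cassandra's messaging classes.
final class DropHandlingSketch
{
    /** Assumed abstraction over returning reserved bytes to the shared limits. */
    interface CapacityReleaser
    {
        void release(long bytes);
    }

    /** Returns true when capacity was released cleanly, false when the fatal path ran. */
    static boolean dropMessage(String reason, long bytes, List<String> log,
                               CapacityReleaser releaser, Runnable onUnrecoverable)
    {
        // 1. Log first, so the drop is recorded even if the recovery code throws.
        log.add("dropping " + reason + " (" + bytes + " bytes)");
        try
        {
            // 2. Only then release the capacity that was reserved for the message.
            releaser.release(bytes);
            return true;
        }
        catch (RuntimeException e)
        {
            // 3. The accounting can no longer be trusted, so hand over to the fatal
            //    handler (in the ticket's proposal this would stop the JVM).
            log.add("capacity release failed: " + e);
            onUnrecoverable.run();
            return false;
        }
    }

    public static void main(String[] args)
    {
        List<String> log = new ArrayList<>();
        dropMessage("expired message", 64L, log,
                    bytes -> { throw new IllegalStateException("limits corrupted"); },
                    () -> log.add("fatal handler invoked"));
        for (String line : log)
            System.out.println(line);
    }
}
```

Because the log entry is written before the release is attempted, a failure in the recovery path still leaves a record of which message was being dropped.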






[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16616:
--
Status: Ready to Commit  (was: Review In Progress)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (CASSANDRA-16596) Test org.apache.cassandra.net.AsyncPromiseTest FAILED

2021-04-20 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg reassigned CASSANDRA-16596:
-

Assignee: Adam Holmberg

> Test org.apache.cassandra.net.AsyncPromiseTest FAILED
> -
>
> Key: CASSANDRA-16596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16596
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
>
> Not seen in Jenkins but reported a few times from CircleCI, so it seems to be 
> a legitimate problem:
> [Test org.apache.cassandra.net.AsyncPromiseTest 
> FAILED|https://jenkins-cm4.apache.org/job/Cassandra-trunk/434/testReport/org.apache.cassandra.net/AsyncPromiseTest/]
> CircleCI 
> [failure|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/743/workflows/35cea1fd-6f38-4f09-9d7f-4673c34a9851/jobs/4104/parallel-runs/17]
> {noformat}
> Testcase: testFailure(org.apache.cassandra.net.AsyncPromiseTest):   FAILED
> 8
> junit.framework.AssertionFailedError: 8
> at 
> org.apache.cassandra.net.TestAbstractPromise$Async.verify(TestAbstractPromise.java:53)
> at 
> org.apache.cassandra.net.TestAbstractAsyncPromise.testOneFailure(TestAbstractAsyncPromise.java:178)
> at 
> org.apache.cassandra.net.AsyncPromiseTest.testFailure(AsyncPromiseTest.java:54)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Caused by: java.util.concurrent.TimeoutException
> at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
> at 
> org.apache.cassandra.net.TestAbstractPromise$Async.verify(TestAbstractPromise.java:49)
> Test org.apache.cassandra.net.AsyncPromiseTest FAILED
> {noformat}
> Also Caleb mentioned he is seeing it in another CI infra too
> CC [~maedhroz]






[jira] [Updated] (CASSANDRA-16596) Test org.apache.cassandra.net.AsyncPromiseTest FAILED

2021-04-20 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg updated CASSANDRA-16596:
--
Description: 
Not seen in Jenkins but reported a few times from CircleCI, so it seems to be 
a legitimate problem:

[Test org.apache.cassandra.net.AsyncPromiseTest 
FAILED|https://jenkins-cm4.apache.org/job/Cassandra-trunk/434/testReport/org.apache.cassandra.net/AsyncPromiseTest/]
CircleCI 
[failure|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/743/workflows/35cea1fd-6f38-4f09-9d7f-4673c34a9851/jobs/4104/parallel-runs/17]

{noformat}
Testcase: testFailure(org.apache.cassandra.net.AsyncPromiseTest):   FAILED
8
junit.framework.AssertionFailedError: 8
at 
org.apache.cassandra.net.TestAbstractPromise$Async.verify(TestAbstractPromise.java:53)
at 
org.apache.cassandra.net.TestAbstractAsyncPromise.testOneFailure(TestAbstractAsyncPromise.java:178)
at 
org.apache.cassandra.net.AsyncPromiseTest.testFailure(AsyncPromiseTest.java:54)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
at 
org.apache.cassandra.net.TestAbstractPromise$Async.verify(TestAbstractPromise.java:49)


Test org.apache.cassandra.net.AsyncPromiseTest FAILED
{noformat}


Also Caleb mentioned he is seeing it in another CI infra too

CC [~maedhroz]

  was:
Not seen in Jenkins but reported a few times from CircleCI, so it seems to be 
a legitimate problem:

[Test org.apache.cassandra.net.AsyncPromiseTest 
FAILED|https://jenkins-cm4.apache.org/job/Cassandra-trunk/434/testReport/org.apache.cassandra.net/AsyncPromiseTest/]
CircleCI 
[failure|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/743/workflows/35cea1fd-6f38-4f09-9d7f-4673c34a9851/jobs/4104/parallel-runs/17]

Also Caleb mentioned he is seeing it in another CI infra too

CC [~maedhroz]


> Test org.apache.cassandra.net.AsyncPromiseTest FAILED
> -
>
> Key: CASSANDRA-16596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16596
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
>






[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325939#comment-17325939
 ] 

David Capwell commented on CASSANDRA-16619:
---

bq. If an SSTable is moved between nodes

What method are you using to "move" SSTables?  Streaming and nodetool import 
are expected to remove this info; can you elaborate?

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>






[jira] [Updated] (CASSANDRA-16596) Test org.apache.cassandra.net.AsyncPromiseTest FAILED

2021-04-20 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16596:

Description: 
Not seen in Jenkins but reported a few times from CircleCI, so it seems to be 
a legitimate problem:

[Test org.apache.cassandra.net.AsyncPromiseTest 
FAILED|https://jenkins-cm4.apache.org/job/Cassandra-trunk/434/testReport/org.apache.cassandra.net/AsyncPromiseTest/]
CircleCI 
[failure|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/743/workflows/35cea1fd-6f38-4f09-9d7f-4673c34a9851/jobs/4104/parallel-runs/17]

Also Caleb mentioned he is seeing it in another CI infra too

CC [~maedhroz]

  was:
Not seen in Jenkins but reported a few times from CircleCI, so it seems to be 
a legitimate problem:

[Test org.apache.cassandra.net.AsyncPromiseTest 
FAILED|https://jenkins-cm4.apache.org/job/Cassandra-trunk/434/testReport/org.apache.cassandra.net/AsyncPromiseTest/]

 

CC [~maedhroz]


> Test org.apache.cassandra.net.AsyncPromiseTest FAILED
> -
>
> Key: CASSANDRA-16596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16596
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
>






[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16616:

Reviewers: Benjamin Lerer, David Capwell  (was: Benjamin Lerer, Caleb 
Rackliffe, David Capwell)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325933#comment-17325933
 ] 

David Capwell commented on CASSANDRA-16616:
---

+1

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16616:
--
Reviewers: Benjamin Lerer, Caleb Rackliffe, David Capwell  (was: Benjamin 
Lerer, Caleb Rackliffe)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CASSANDRA-16586) Fix flaky test testAvailabilityV30ToV4 - org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16586:
---
  Fix Version/s: (was: 4.0-rc)
 4.0-rc1
  Since Version: 4.0-rc1
Source Control Link: 
https://github.com/apache/cassandra/commit/27f1bdee5ecf37eda3dde6ea61a439bdda41ea0a
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed into trunk at 27f1bdee5ecf37eda3dde6ea61a439bdda41ea0a

> Fix flaky test testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> --
>
> Key: CASSANDRA-16586
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16586
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-rc1
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/881/workflows/8e477260-ac6a-4eab-b4be-cbc048199565/jobs/5269
> testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> {code}
> junit.framework.AssertionFailedError: Unexpected error in case QUORUM-QUORUM 
> with not upgraded coordinator and 1 nodes down
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:127)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$2(MixedModeAvailabilityTestBase.java:79)
>   at 
> org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:186)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:81)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:53)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test.testAvailabilityV30ToV4(MixedModeAvailabilityV30Test.java:39)
> Caused by: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 1 responses.
>   at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:209)
>   at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$5(IsolatedExecutor.java:109)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.executeWithResult(Coordinator.java:69)
>   at 
> org.apache.cassandra.distributed.api.ICoordinator.execute(ICoordinator.java:32)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.lambda$test$1(MixedModeAvailabilityTestBase.java:120)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.maybeFail(MixedModeAvailabilityTestBase.java:139)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:119)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation 
> timed out - received only 1 responses.
>   at 
> org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:136)
>   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:142)
>   at 
> org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
>   at 
> org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1831)
>   at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1780)
>   at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1718)
>   at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1627)
>   at 
> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1162)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:302)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:115)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.executeInternal(Coordinator.java:107)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:69)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> 

[jira] [Updated] (CASSANDRA-16586) Fix flaky test testAvailabilityV30ToV4 - org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16586:
---
Status: Ready to Commit  (was: Review In Progress)

> Fix flaky test testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> --
>
> Key: CASSANDRA-16586
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16586
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/881/workflows/8e477260-ac6a-4eab-b4be-cbc048199565/jobs/5269
> testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> {code}
> junit.framework.AssertionFailedError: Unexpected error in case QUORUM-QUORUM 
> with not upgraded coordinator and 1 nodes down
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:127)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$2(MixedModeAvailabilityTestBase.java:79)
>   at 
> org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:186)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:81)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:53)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test.testAvailabilityV30ToV4(MixedModeAvailabilityV30Test.java:39)
> Caused by: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 1 responses.
>   at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:209)
>   at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$5(IsolatedExecutor.java:109)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.executeWithResult(Coordinator.java:69)
>   at 
> org.apache.cassandra.distributed.api.ICoordinator.execute(ICoordinator.java:32)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.lambda$test$1(MixedModeAvailabilityTestBase.java:120)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.maybeFail(MixedModeAvailabilityTestBase.java:139)
>   at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:119)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation 
> timed out - received only 1 responses.
>   at 
> org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:136)
>   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:142)
>   at 
> org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
>   at 
> org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1831)
>   at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1780)
>   at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1718)
>   at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1627)
>   at 
> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1162)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:302)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:115)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.executeInternal(Coordinator.java:107)
>   at 
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:69)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CASSANDRA-16586) Fix flaky test testAvailabilityV30ToV4 - org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test

2021-04-20 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325918#comment-17325918
 ] 

Benjamin Lerer commented on CASSANDRA-16586:


The patch looks good to me.






[cassandra] branch trunk updated (1fbecbc -> 27f1bde)

2021-04-20 Thread blerer
This is an automated email from the ASF dual-hosted git repository.

blerer pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 1fbecbc  Merge branch 'cassandra-3.11' into trunk
 add 27f1bde  Fix MixedModeAvailabilityV30Test.testAvailabilityV30ToV4 
flakiness

No new revisions were added by this update.

Summary of changes:
 .../upgrade/MixedModeAvailabilityTestBase.java| 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16616:

Reviewers: Benjamin Lerer, Caleb Rackliffe  (was: Benjamin Lerer)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and is unable to 
> correctly adjust the resource limits for an OutboundConnection, it affects 
> the other connection types sharing the same OutboundConnections so that any 
> of the connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to 
> re-initialize all of the connections with a correct limit, the effort to test 
> and maintain the recovery code seems too high for something that should 
> "never happen" (except it did once, which is why it needs hardening).  The 
> safer option is to kill the JVM and have whatever external monitoring is in 
> place restart the instance in a known good state.
> Additionally, the logging for dropped outbound messages that have expired or 
> are unserializable takes place after the recovery handling logic. If the 
> recovery logic throws an exception, the message is never logged for future 
> diagnosis. Logging should take place first, followed by releasing capacity 
> and handling the expiration or serialization failure.
> Discovered on a branch modified for testing that threw an exception in the 
> Verb.serializeSize method.
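
The ordering argued for in the description above (log the drop first, then adjust the accounting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `Limit`, `onDropped` and the in-memory `LOG` list are hypothetical stand-ins and do not mirror Cassandra's `ResourceLimits` or `OutboundConnection` APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class LogBeforeRelease {
    /** Hypothetical stand-in for a shared capacity limit (not Cassandra's ResourceLimits). */
    static final class Limit {
        private final AtomicLong using = new AtomicLong();

        void allocate(long bytes) { using.addAndGet(bytes); }

        /** Fails loudly if more is released than was allocated; the counter is left negative. */
        void release(long bytes) {
            long after = using.addAndGet(-bytes);
            if (after < 0)
                throw new IllegalStateException("released more than allocated: " + after);
        }

        long using() { return using.get(); }
    }

    /** In-memory log used only to keep the sketch self-contained. */
    static final List<String> LOG = new ArrayList<>();

    /** Records the drop before touching the accounting, so the record survives a failure. */
    static void onDropped(Limit limit, String messageId, long bytes, String reason) {
        LOG.add("dropping " + messageId + ": " + reason);
        try {
            limit.release(bytes);
        } catch (RuntimeException e) {
            LOG.add("accounting failure for " + messageId + ": " + e.getMessage());
            throw e; // the caller decides what to do, e.g. stop the process
        }
    }

    public static void main(String[] args) {
        Limit limit = new Limit();
        limit.allocate(100);
        onDropped(limit, "m1", 100, "expired");
        try {
            onDropped(limit, "m2", 50, "unserializable"); // nothing left to release
        } catch (IllegalStateException e) {
            // the drop of m2 was already logged before the failure
        }
        LOG.forEach(System.out::println);
    }
}
```

Because the log entry is written before `release` is called, the drop of `m2` is still recorded even though the accounting step throws.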






[jira] [Commented] (CASSANDRA-16601) Flaky CassandraIndexTest

2021-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325834#comment-17325834
 ] 

Andres de la Peña commented on CASSANDRA-16601:
---

Last changes look good to me, +1

> Flaky CassandraIndexTest
> 
>
> Key: CASSANDRA-16601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16601
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> See failure 
> [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/436/testReport/junit/org.apache.cassandra.index.internal/CassandraIndexTest/indexCorrectlyMarkedAsBuildAndRemoved_cdc/]
> {noformat}
> Error Message
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at 
> org.apache.cassandra.index.internal.CassandraIndexTest.indexCorrectlyMarkedAsBuildAndRemoved(CassandraIndexTest.java:588)
> {noformat}






[jira] [Updated] (CASSANDRA-16614) Flaky test_pending_range

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16614:
--
Reviewers: Andres de la Peña, Andres de la Peña  (was: Andres de la Peña)
   Andres de la Peña, Andres de la Peña
   Status: Review In Progress  (was: Patch Available)

> Flaky test_pending_range
> 
>
> Key: CASSANDRA-16614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flaky 
> [test_pending_range|https://ci-cassandra.apache.org/job/Cassandra-trunk/445/testReport/junit/dtest-large-novnode.pending_range_test/TestPendingRangeMovements/test_pending_range/]
> {noformat}
> Error Message
> AssertionError: assert None is not None  +  where None =  0x7f29dfa83b80>('127\\.0\\.0\\.1.*?Down.*?Moving', '\nDatacenter: 
> datacenter1\n==\nAddress RackStatus State   Load  
>   Owns   ...   rack1   Up Normal  90.86 KiB   
> 40.00%  5534023222112865484 \n\n\n  ')  + 
>where  = re.search
> {noformat}






[jira] [Commented] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325744#comment-17325744
 ] 

Andres de la Peña commented on CASSANDRA-16607:
---

Thanks for the review, I'm running a final CI round with the {{AtomicInteger}} 
{{MockMessagingSpy.messagesIntercepted}}:

CI for 3.11:
* 
[CircleCI|https://app.circleci.com/pipelines/github/adelapena/cassandra/280/workflows/f808ffb1-4c06-4092-b891-212aad5c173b]
* 
[Jenkins|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/686/pipeline]

CI for trunk:
* [CircleCI 
j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/281/workflows/5d55912d-c9fc-44ae-bdd5-06343b2d7ee9]
* [CircleCI 
j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/281/workflows/88f596ba-4308-498d-8835-a50f6b659f0d]
* 
[Jenkins|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/687/pipeline]
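
As a rough illustration of why an {{AtomicInteger}} counter helps here: a plain `int` incremented from several threads can lose updates, whereas `AtomicInteger.incrementAndGet()` cannot. The sketch below is not the code of `MockMessagingSpy`; `InterceptCounter`, `onIntercept` and `countConcurrently` are hypothetical names used only for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class InterceptCounter {
    // Hypothetical stand-in for a spy's "messages intercepted" counter.
    private final AtomicInteger messagesIntercepted = new AtomicInteger();

    void onIntercept() { messagesIntercepted.incrementAndGet(); }

    int intercepted() { return messagesIntercepted.get(); }

    /** Increments one shared counter from several threads and returns the final count. */
    static int countConcurrently(int threads, int perThread) {
        InterceptCounter counter = new InterceptCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++)
                    counter.onIntercept();
                done.countDown();
            });
        }
        try {
            done.await(); // all increments happen-before this returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        return counter.intercepted();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```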

> Fix flaky test testRequestResponse – 
> org.apache.cassandra.net.MockMessagingServiceTest
> --
>
> Key: CASSANDRA-16607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16607
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 3.11.11, 4.0-rc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/659/tests/
> {code}
> Error
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at 
> org.apache.cassandra.net.MockMessagingServiceTest.testRequestResponse(MockMessagingServiceTest.java:81)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Standard Output
> INFO  [main] 2021-04-15 08:22:46,838 YamlConfigurationLoader.java:93 - 
> Configuration location: 
> file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,840 YamlConfigurationLoader.java:112 - 
> Loading settings from file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,899 InternalLoggerFactory.java:63 - Using 
> SLF4J as the default logging framework
> DEBUG [main] 2021-04-15 08:22:46,911 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsaf
> ...[truncated 61235 chars]...
> te NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> DEBUG [main] 2021-04-15 08:22:49,840 StorageService.java:2674 - New node 
> /127.0.0.1:7069 at token a57d4b7f61f49471614b7ac41f16477e
> DEBUG [main] 2021-04-15 08:22:49,848 StorageService.java:2727 - Node 
> /127.0.0.1:7069 state NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> INFO  [main] 2021-04-15 08:22:49,848 StorageService.java:2730 - Node 
> /127.0.0.1:7069 state jump to NORMAL
> DEBUG [main] 2021-04-15 08:22:49,849 StorageService.java:1619 - NORMAL
> {code}






[jira] [Assigned] (CASSANDRA-16598) Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer reassigned CASSANDRA-16598:
--

Assignee: Benjamin Lerer

> Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest
> -
>
> Key: CASSANDRA-16598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16598
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/java
>Reporter: David Capwell
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/882/workflows/8e34d09d-5908-495f-baac-5402e0c8e6ee/jobs/5276
> {code}
> junit.framework.AssertionFailedError: expected:<[0]> but was:<[2]>
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithStreamingFromTwoNodes(StreamingMetricsTest.java:88)
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithRepairAndStreamingFromTwoNodes(StreamingMetricsTest.java:48)
> {code}






[jira] [Commented] (CASSANDRA-16592) The token function in where clause return incorrect data when using token equal condition and Specified a non-exist token value

2021-04-20 Thread cimon (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325703#comment-17325703
 ] 

cimon commented on CASSANDRA-16592:
---

I will submit it as soon as possible, and the test cases are currently being 
validated. 

Thank you for your attention.

> The token function in where clause return incorrect data when using token 
> equal condition and Specified a non-exist token value
> ---
>
> Key: CASSANDRA-16592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: cimon
>Assignee: cimon
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-rc
>
>
> I get an incorrect value when I use a query like 'select Token(pk1,pk2),pk1,pk2 from 
> ks.table1 where token(pk1,pk2) = tokenValue'. The returned token value does 
> not match the where condition.
> This problem is reproduced in 3.11.3 and 4.0.
> Here are my schema and select statements:
> {code:java}
> // schema
> cqlsh> desc testprefix.cprefix_03 ;
> CREATE TABLE testprefix.cprefix_03 (
>     pk1 int,
>     pk2 int,
>     ck1 text,
>     ck2 text,
>     t1 int,
>     PRIMARY KEY ((pk1, pk2), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 ASC, ck2 ASC)
>     AND additional_write_policy = '99p'
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> {code}
> Executing the CQL queries:
> {code:java}
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2  from testprefix.cprefix_03 WHERE token(pk1, pk2) =-9223372036854775808 LIMIT 2;
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2) from testprefix.cprefix_03 where pk1 = 394560 and pk2 = 3394560 LIMIT 2;
>  system.token(pk1, pk2)
> ------------------------
>    -9222849988925915479
>    -9222849988925915479
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2  from testprefix.cprefix_03 WHERE token(pk1, pk2) =-9222849988925915479 LIMIT 2;
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows){code}
> We can see that the token value in the condition is inconsistent with the 
> values in the result.
> 
> I then reviewed the source code to look for the answer.
> {code:java}
> private static void addRange(SSTableReader sstable,
>                              AbstractBounds<PartitionPosition> requested,
>                              List<AbstractBounds<PartitionPosition>> boundsList)
> {
>     if (requested instanceof Range && ((Range)requested).isWrapAround()) // first condition
>     {
>         if (requested.right.compareTo(sstable.first) >= 0)
>         {
>             // since we wrap, we must contain the whole sstable prior to stopKey()
>             Boundary<PartitionPosition> left = new Boundary<PartitionPosition>(sstable.first, true);
>             Boundary<PartitionPosition> right;
>             right = requested.rightBoundary();
>             right = minRight(right, sstable.last, true);
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>         if (requested.left.compareTo(sstable.last) <= 0)
>         {
>             // since we wrap, we must contain the whole sstable after dataRange.startKey()
>             Boundary<PartitionPosition> right = new Boundary<PartitionPosition>(sstable.last, true);
>             Boundary<PartitionPosition> left;
>             left = requested.leftBoundary();
>             left = maxLeft(left, sstable.first, true); // second condition
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>     }
>     else
>     {
>         assert requested.left.compareTo(requested.right) <= 0 || requested.right.isMinimum();
>         Boundary<PartitionPosition> left, right;
>         left = requested.leftBoundary();
>         right = requested.rightBoundary();
>         left = maxLeft(left, 

[jira] [Assigned] (CASSANDRA-16598) Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest

2021-04-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña reassigned CASSANDRA-16598:
-

Assignee: (was: Andres de la Peña)

> Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest
> -
>
> Key: CASSANDRA-16598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16598
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/java
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/882/workflows/8e34d09d-5908-495f-baac-5402e0c8e6ee/jobs/5276
> {code}
> junit.framework.AssertionFailedError: expected:<[0]> but was:<[2]>
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithStreamingFromTwoNodes(StreamingMetricsTest.java:88)
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithRepairAndStreamingFromTwoNodes(StreamingMetricsTest.java:48)
> {code}






[jira] [Commented] (CASSANDRA-16598) Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest

2021-04-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325701#comment-17325701
 ] 

Andres de la Peña commented on CASSANDRA-16598:
---

[~bereng] I haven't been able to repro either locally or in the multiplexer. 
Of the two mentioned failures the one in line 88 is quite puzzling since it's 
just checking that rows haven't made it into a node that has been stopped while 
the others were being written, with hints disabled. I think [~blerer] was also 
looking into this, I'm unassigning myself to let you guys give it a go.

> Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest
> -
>
> Key: CASSANDRA-16598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16598
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/java
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/882/workflows/8e34d09d-5908-495f-baac-5402e0c8e6ee/jobs/5276
> {code}
> junit.framework.AssertionFailedError: expected:<[0]> but was:<[2]>
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithStreamingFromTwoNodes(StreamingMetricsTest.java:88)
>   at 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithRepairAndStreamingFromTwoNodes(StreamingMetricsTest.java:48)
> {code}






[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16619:
--
Fix Version/s: 3.11.x
   3.0.x
   4.0

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that 
> the receiving node has not yet flushed, which can result in valid data being 
> lost should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId is set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
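
The replay rule described above (honor an sstable's commit log intervals only when its originatingHostId matches the local host id) can be sketched roughly as below. This is an illustration under simplifying assumptions, not the patch itself: `SSTableMeta` and `intervalsToHonor` are hypothetical, and each sstable is reduced to a single interval expressed as two `long` positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

final class ReplayFilterSketch {
    /** Hypothetical, simplified sstable metadata: one commit log interval plus the originating host. */
    static final class SSTableMeta {
        final UUID originatingHostId; // null when the origin is not known
        final long intervalStart;
        final long intervalEnd;

        SSTableMeta(UUID originatingHostId, long intervalStart, long intervalEnd) {
            this.originatingHostId = originatingHostId;
            this.intervalStart = intervalStart;
            this.intervalEnd = intervalEnd;
        }
    }

    /** Returns the intervals of sstables written by the local node; others are ignored. */
    static List<long[]> intervalsToHonor(List<SSTableMeta> sstables, UUID localHostId) {
        List<long[]> result = new ArrayList<>();
        for (SSTableMeta meta : sstables) {
            // equals(null) is false, so sstables of unknown origin are skipped as well
            if (localHostId.equals(meta.originatingHostId))
                result.add(new long[] { meta.intervalStart, meta.intervalEnd });
        }
        return result;
    }

    public static void main(String[] args) {
        UUID local = UUID.randomUUID();
        UUID other = UUID.randomUUID();
        List<SSTableMeta> sstables = Arrays.asList(
            new SSTableMeta(local, 0, 100),    // flushed locally
            new SSTableMeta(other, 50, 500),   // ingested from another node
            new SSTableMeta(null, 10, 20));    // origin unknown
        for (long[] interval : intervalsToHonor(sstables, local))
            System.out.println(interval[0] + ".." + interval[1]);
    }
}
```

In this sketch only the locally flushed sstable contributes an interval, so mutations covered solely by the ingested sstable's positions would still be replayed from the local commit log.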






[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16619:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: Recoverable 
Corruption / Loss(12986)
   Complexity: Normal
  Component/s: Local/Commit Log
Discovered By: User Report
Reviewers: Jakub Zytka
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that
> the receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata),
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during Commit Log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]






[jira] [Commented] (CASSANDRA-16598) Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest

2021-04-20 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325687#comment-17325687
 ] 

Berenguer Blasi commented on CASSANDRA-16598:
-

[~adelapena] did you manage to reproduce it and are you onto something, or
should I give it a try?

> Fix flaky test testMetricsWithRepairAndStreamingFromTwoNodes - 
> org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest
> -
>
> Key: CASSANDRA-16598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16598
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/java
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/882/workflows/8e34d09d-5908-495f-baac-5402e0c8e6ee/jobs/5276
> {code}
> junit.framework.AssertionFailedError: expected:<[0]> but was:<[2]>
>   at org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithStreamingFromTwoNodes(StreamingMetricsTest.java:88)
>   at org.apache.cassandra.distributed.test.metrics.StreamingMetricsTest.testMetricsWithRepairAndStreamingFromTwoNodes(StreamingMetricsTest.java:48)
> {code}






[jira] [Comment Edited] (CASSANDRA-16614) Flaky test_pending_range

2021-04-20 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325608#comment-17325608
 ] 

Berenguer Blasi edited comment on CASSANDRA-16614 at 4/20/21, 10:19 AM:


The window of opportunity for the test to observe the node as Down/Moving is
too small: the node may have already gone to Down/Normal by the time we check.
It is better to check in the logs that all nodes saw the node moving instead.

Logs of failed run 
[here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-large-novnode/148/Cassandra-trunk-dtest-large-novnode/label=cassandra-dtest-large,split=3/].
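
A minimal sketch of why a log-based check is less timing-sensitive than a status snapshot. It is written in Java only for illustration (the dtest itself is Python) and is not the patch. The regex is the one from the failing assertion; the class name, method names and the log marker passed by the caller are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class MovingCheckSketch
{
    // Pattern copied from the failing assertion in the issue description.
    static final Pattern DOWN_MOVING = Pattern.compile("127\\.0\\.0\\.1.*?Down.*?Moving");

    /** Point-in-time check: true only if the status output shows Down/Moving at the moment it is sampled. */
    static boolean snapshotShowsMoving(String ringOutput)
    {
        return DOWN_MOVING.matcher(ringOutput).find();
    }

    /** History-based check: true if any log line ever recorded the (hypothetical) moving marker. */
    static boolean logShowsMoving(List<String> logLines, String marker)
    {
        return logLines.stream().anyMatch(line -> line.contains(marker));
    }
}
```

The snapshot check can miss a state the node has already left, while the log keeps a record of it, so the second check does not depend on when the test happens to look.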


was (Author: bereng):
The window of opportunity for the test to observe the node as Down/Moving is
too small: the node may have already gone to Down/Normal by the time we check.
It is better to check in the logs that all nodes saw the node moving instead.

> Flaky test_pending_range
> 
>
> Key: CASSANDRA-16614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flaky 
> [test_pending_range|https://ci-cassandra.apache.org/job/Cassandra-trunk/445/testReport/junit/dtest-large-novnode.pending_range_test/TestPendingRangeMovements/test_pending_range/]
> {noformat}
> Error Message
> AssertionError: assert None is not None  +  where None = <function search at 0x7f29dfa83b80>('127\\.0\\.0\\.1.*?Down.*?Moving', '\nDatacenter: 
> datacenter1\n==\nAddress RackStatus State   Load  
>   Owns   ...   rack1   Up Normal  90.86 KiB   
> 40.00%  5534023222112865484 \n\n\n  ')  + 
>    where <function search at 0x7f29dfa83b80> = re.search
> {noformat}






[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16619:
--
Description: 
SSTable metadata contains commit log positions of the sstable. These positions 
are used to filter out mutations from the commit log on restart and only make 
sense for the node on which the data was flushed.

If an SSTable is moved between nodes, these positions may cover regions that
the receiving node has not yet flushed, and result in valid data being lost
should these sections of the commit log need to be replayed.

Solution:
The chosen solution introduces a new sstable metadata field (StatsMetadata),
originatingHostId (UUID), which is the local host id of the node on which the
sstable was created, or null if not known. Commit log intervals from an sstable
are taken into account during Commit Log replay only when the originatingHostId
of the sstable matches the local node's hostId.

For new sstables the originatingHostId is set according to StorageService's
local hostId.
For compacted sstables the originatingHostId is set according to
StorageService's local hostId, and only commit log intervals from local
sstables are preserved in the resulting sstable.

Discovered by [~jakubzytka]


  was:
SSTable metadata contains commit log positions of the sstable. These positions 
are used to filter out mutations from the commit log on restart and only make 
sense for the node on which the data was flushed.

If an SSTable is moved between nodes, these positions may cover regions that
the receiving node has not yet flushed, and result in valid data being lost
should these sections of the commit log need to be replayed.

Solution:
The chosen solution introduces a new sstable metadata field (StatsMetadata),
originatingHostId (UUID), which is the local host id of the node on which the
sstable was created, or null if not known. Commit log intervals from an sstable
are taken into account during Commit Log replay only when the originatingHostId
of the sstable matches the local node's hostId.

For new sstables the originatingHostId is set according to StorageService's
local hostId.
For compacted sstables the originatingHostId is set according to
StorageService's local hostId, and only commit log intervals from local
sstables are preserved in the resulting sstable.



> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that
> the receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata),
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during Commit Log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]






[jira] [Created] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Jacek Lewandowski (Jira)
Jacek Lewandowski created CASSANDRA-16619:
-

 Summary: Loss of commit log data possible after sstable ingest
 Key: CASSANDRA-16619
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
 Project: Cassandra
  Issue Type: Bug
Reporter: Jacek Lewandowski


SSTable metadata contains commit log positions of the sstable. These positions 
are used to filter out mutations from the commit log on restart and only make 
sense for the node on which the data was flushed.

If an SSTable is moved between nodes, these positions may cover regions that
the receiving node has not yet flushed, and result in valid data being lost
should these sections of the commit log need to be replayed.

Solution:
The chosen solution introduces a new sstable metadata field (StatsMetadata),
originatingHostId (UUID), which is the local host id of the node on which the
sstable was created, or null if not known. Commit log intervals from an sstable
are taken into account during Commit Log replay only when the originatingHostId
of the sstable matches the local node's hostId.

For new sstables the originatingHostId is set according to StorageService's
local hostId.
For compacted sstables the originatingHostId is set according to
StorageService's local hostId, and only commit log intervals from local
sstables are preserved in the resulting sstable.
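
The replay rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: SSTableInfo and intervalsToSkipOnReplay are hypothetical names, and commit log positions are simplified to plain longs. Only the originatingHostId comparison comes from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class ReplayFilterSketch
{
    /** Hypothetical stand-in for an sstable's stats metadata; not Cassandra's StatsMetadata class. */
    static final class SSTableInfo
    {
        final UUID originatingHostId;   // null when the origin is not known
        final long[] commitLogInterval; // {start, end}, simplified to longs for the sketch

        SSTableInfo(UUID originatingHostId, long start, long end)
        {
            this.originatingHostId = originatingHostId;
            this.commitLogInterval = new long[]{ start, end };
        }
    }

    /** Returns only the intervals recorded by sstables that were created on the local node. */
    static List<long[]> intervalsToSkipOnReplay(List<SSTableInfo> sstables, UUID localHostId)
    {
        List<long[]> persisted = new ArrayList<>();
        for (SSTableInfo sstable : sstables)
        {
            // A null or foreign host id means the positions refer to another node's
            // commit log (or an unknown one), so they must not suppress replay here.
            if (localHostId.equals(sstable.originatingHostId))
                persisted.add(sstable.commitLogInterval);
        }
        return persisted;
    }
}
```

With this filter, an sstable streamed or copied in from another node contributes no intervals, so the receiving node replays its own commit log segments in full.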







[jira] [Assigned] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski reassigned CASSANDRA-16619:
-

Assignee: Jacek Lewandowski

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that
> the receiving node has not yet flushed, and result in valid data being lost
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata),
> originatingHostId (UUID), which is the local host id of the node on which the
> sstable was created, or null if not known. Commit log intervals from an
> sstable are taken into account during Commit Log replay only when the
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's
> local hostId.
> For compacted sstables the originatingHostId is set according to
> StorageService's local hostId, and only commit log intervals from local
> sstables are preserved in the resulting sstable.






[jira] [Commented] (CASSANDRA-16614) Flaky test_pending_range

2021-04-20 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325684#comment-17325684
 ] 

Berenguer Blasi commented on CASSANDRA-16614:
-

Multiplexing locally seems to work OK:

{noformat}
pytest --count 30 -rP --cassandra-dir=../16614 
pending_range_test.py::TestPendingRangeMovements::test_pending_range
30 passed in 2353.96 seconds
{noformat}

> Flaky test_pending_range
> 
>
> Key: CASSANDRA-16614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flaky 
> [test_pending_range|https://ci-cassandra.apache.org/job/Cassandra-trunk/445/testReport/junit/dtest-large-novnode.pending_range_test/TestPendingRangeMovements/test_pending_range/]
> {noformat}
> Error Message
> AssertionError: assert None is not None  +  where None = <function search at 0x7f29dfa83b80>('127\\.0\\.0\\.1.*?Down.*?Moving', '\nDatacenter: 
> datacenter1\n==\nAddress RackStatus State   Load  
>   Owns   ...   rack1   Up Normal  90.86 KiB   
> 40.00%  5534023222112865484 \n\n\n  ')  + 
>    where <function search at 0x7f29dfa83b80> = re.search
> {noformat}






[jira] [Updated] (CASSANDRA-16614) Flaky test_pending_range

2021-04-20 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-16614:

Test and Documentation Plan: See comments
 Status: Patch Available  (was: In Progress)

> Flaky test_pending_range
> 
>
> Key: CASSANDRA-16614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flaky 
> [test_pending_range|https://ci-cassandra.apache.org/job/Cassandra-trunk/445/testReport/junit/dtest-large-novnode.pending_range_test/TestPendingRangeMovements/test_pending_range/]
> {noformat}
> Error Message
> AssertionError: assert None is not None  +  where None = <function search at 0x7f29dfa83b80>('127\\.0\\.0\\.1.*?Down.*?Moving', '\nDatacenter: 
> datacenter1\n==\nAddress RackStatus State   Load  
>   Owns   ...   rack1   Up Normal  90.86 KiB   
> 40.00%  5534023222112865484 \n\n\n  ')  + 
>    where <function search at 0x7f29dfa83b80> = re.search
> {noformat}






[jira] [Commented] (CASSANDRA-16592) The token function in where clause return incorrect data when using token equal condition and Specified a non-exist token value

2021-04-20 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325669#comment-17325669
 ] 

Benjamin Lerer commented on CASSANDRA-16592:


[~cimon] If you do not have time to make a patch, I can take over if you
like.

> The token function in where clause return incorrect data when using token 
> equal condition and Specified a non-exist token value
> ---
>
> Key: CASSANDRA-16592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: cimon
>Assignee: cimon
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-rc
>
>
> I get an incorrect value when using a query like 'select
> Token(pk1,pk2),pk1,pk2 from ks.table1 where token(pk1,pk2) = tokenValue'. The
> returned token value does not match the where condition.
> This problem is reproducible in 3.11.3 and 4.0.
> Here are my schema and select statements:
> {code:java}
> // schema
> cqlsh> desc testprefix.cprefix_03;
> CREATE TABLE testprefix.cprefix_03 (
>     pk1 int,
>     pk2 int,
>     ck1 text,
>     ck2 text,
>     t1 int,
>     PRIMARY KEY ((pk1, pk2), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 ASC, ck2 ASC)
>     AND additional_write_policy = '99p'
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> {code}
> Executing the CQL queries:
> {code:java}
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2 from testprefix.cprefix_03 WHERE token(pk1, pk2) =-9223372036854775808 LIMIT 2;
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2) from testprefix.cprefix_03 where pk1 = 394560 and pk2 = 3394560 LIMIT 2;
>  system.token(pk1, pk2)
> ------------------------
>    -9222849988925915479
>    -9222849988925915479
> (2 rows)
> cqlsh> SELECT Token(pk1,pk2), pk1,pk2 from testprefix.cprefix_03 WHERE token(pk1, pk2) =-9222849988925915479 LIMIT 2;
>  system.token(pk1, pk2) | pk1    | pk2
> ------------------------+--------+---------
>    -9222849988925915479 | 394560 | 3394560
>    -9222849988925915479 | 394560 | 3394560
> (2 rows)
> {code}
> We can see that the token value in the condition is inconsistent with the
> values in the result.
> 
> I then reviewed the source code to look for the answer.
> {code:java}
> private static void addRange(SSTableReader sstable,
>                              AbstractBounds<PartitionPosition> requested,
>                              List<AbstractBounds<PartitionPosition>> boundsList)
> {
>     if (requested instanceof Range && ((Range)requested).isWrapAround()) // first condition
>     {
>         if (requested.right.compareTo(sstable.first) >= 0)
>         {
>             // since we wrap, we must contain the whole sstable prior to stopKey()
>             Boundary<PartitionPosition> left = new Boundary<PartitionPosition>(sstable.first, true);
>             Boundary<PartitionPosition> right;
>             right = requested.rightBoundary();
>             right = minRight(right, sstable.last, true);
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>         if (requested.left.compareTo(sstable.last) <= 0)
>         {
>             // since we wrap, we must contain the whole sstable after dataRange.startKey()
>             Boundary<PartitionPosition> right = new Boundary<PartitionPosition>(sstable.last, true);
>             Boundary<PartitionPosition> left;
>             left = requested.leftBoundary();
>             left = maxLeft(left, sstable.first, true); // second condition
>             if (!isEmpty(left, right))
>                 boundsList.add(AbstractBounds.bounds(left, right));
>         }
>     }
>     else
>     {
>         assert requested.left.compareTo(requested.right) <= 0 || requested.right.isMinimum();
>         Boundary<PartitionPosition> left, right;
>         left = requested.leftBoundary();
>         right = requested.rightBoundary();
>         left = maxLeft(left, sstable.first, true);
> 

[jira] [Updated] (CASSANDRA-16586) Fix flaky test testAvailabilityV30ToV4 - org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16586:
---
Reviewers: Benjamin Lerer, Ekaterina Dimitrova  (was: Ekaterina Dimitrova)

> Fix flaky test testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> --
>
> Key: CASSANDRA-16586
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16586
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-rc
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/881/workflows/8e477260-ac6a-4eab-b4be-cbc048199565/jobs/5269
> testAvailabilityV30ToV4 - 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test
> {code}
> junit.framework.AssertionFailedError: Unexpected error in case QUORUM-QUORUM with not upgraded coordinator and 1 nodes down
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:127)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$2(MixedModeAvailabilityTestBase.java:79)
>   at org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:186)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:81)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:53)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityV30Test.testAvailabilityV30ToV4(MixedModeAvailabilityV30Test.java:39)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
>   at org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:209)
>   at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$5(IsolatedExecutor.java:109)
>   at org.apache.cassandra.distributed.impl.Coordinator.executeWithResult(Coordinator.java:69)
>   at org.apache.cassandra.distributed.api.ICoordinator.execute(ICoordinator.java:32)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.lambda$test$1(MixedModeAvailabilityTestBase.java:120)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.maybeFail(MixedModeAvailabilityTestBase.java:139)
>   at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase$Tester.test(MixedModeAvailabilityTestBase.java:119)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
>   at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:136)
>   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:142)
>   at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
>   at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1831)
>   at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1780)
>   at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1718)
>   at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1627)
>   at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1162)
>   at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:302)
>   at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
>   at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:115)
>   at org.apache.cassandra.distributed.impl.Coordinator.executeInternal(Coordinator.java:107)
>   at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:69)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325649#comment-17325649
 ] 

Benjamin Lerer commented on CASSANDRA-16616:


The patch looks good to me, +1.

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
> Key: CASSANDRA-16616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and is unable to 
> correctly adjust the resource limits for an OutboundConnection, it affects 
> the other connection types sharing the same OutboundConnections so that any 
> of the connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to 
> re-initialize all of the connections with a correct limit, the effort to test 
> and maintain the recovery code seems too high for something that should 
> "never happen" (except it did once, which is why it needs hardening).  The 
> safer option is to kill the JVM and have whatever external monitoring is in 
> place restart the instance in a known good state.
> Additionally, the logging for dropped outbound messages that have expired or
> are unserializable takes place after the recovery handling logic. If
> there are problems with the recovery logic that throw an exception, the
> message is never logged for future diagnosis. Logging should take place
> first, followed by releasing capacity and handling the expiration or
> serialization failure.
> Discovered on a branch modified for testing that threw an exception in the 
> Verb.serializeSize method.
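
The ordering concern in the description can be illustrated with a small sketch. It assumes a recovery step that always throws, to mimic a bug in the accounting code. DropHandlingSketch and its methods are hypothetical and do not correspond to the OutboundConnection code.

```java
import java.util.ArrayList;
import java.util.List;

final class DropHandlingSketch
{
    // Stand-in for the real logger, so the effect of the ordering is observable.
    final List<String> log = new ArrayList<>();

    /** Hypothetical recovery step; it always fails here to mimic broken accounting. */
    void releaseCapacity(String message)
    {
        throw new IllegalStateException("accounting failure for " + message);
    }

    /** Ordering described as problematic: if recovery throws, the drop is never logged. */
    void dropRecoverThenLog(String message)
    {
        releaseCapacity(message);
        log.add("dropped " + message);
    }

    /** Ordering proposed in the issue: log first, then run the recovery logic. */
    void dropLogThenRecover(String message)
    {
        log.add("dropped " + message);
        releaseCapacity(message);
    }
}
```

In the second variant the log entry survives even when the recovery logic fails, which is the diagnostic benefit the description asks for.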






[jira] [Updated] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16607:
---
Status: Ready to Commit  (was: Review In Progress)

> Fix flaky test testRequestResponse – 
> org.apache.cassandra.net.MockMessagingServiceTest
> --
>
> Key: CASSANDRA-16607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16607
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 3.11.11, 4.0-rc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/659/tests/
> {code}
> Error
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at org.apache.cassandra.net.MockMessagingServiceTest.testRequestResponse(MockMessagingServiceTest.java:81)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Standard Output
> INFO  [main] 2021-04-15 08:22:46,838 YamlConfigurationLoader.java:93 - 
> Configuration location: 
> file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,840 YamlConfigurationLoader.java:112 - 
> Loading settings from file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,899 InternalLoggerFactory.java:63 - Using 
> SLF4J as the default logging framework
> DEBUG [main] 2021-04-15 08:22:46,911 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsaf
> ...[truncated 61235 chars]...
> te NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> DEBUG [main] 2021-04-15 08:22:49,840 StorageService.java:2674 - New node 
> /127.0.0.1:7069 at token a57d4b7f61f49471614b7ac41f16477e
> DEBUG [main] 2021-04-15 08:22:49,848 StorageService.java:2727 - Node 
> /127.0.0.1:7069 state NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> INFO  [main] 2021-04-15 08:22:49,848 StorageService.java:2730 - Node 
> /127.0.0.1:7069 state jump to NORMAL
> DEBUG [main] 2021-04-15 08:22:49,849 StorageService.java:1619 - NORMAL
> {code}






[jira] [Commented] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325644#comment-17325644
 ] 

Benjamin Lerer commented on CASSANDRA-16607:


The patches look good to me. I would also make
{{MockMessagingSpy.messagesIntercepted}} an {{AtomicInteger}} for extra safety,
in case future changes allow reads and writes from different threads, but that
change can be done on commit.
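
A minimal sketch of the suggestion, assuming a counter that is incremented by one thread and read by another. InterceptCounterSketch and its methods are hypothetical; only the field name mirrors the one mentioned above, and this is not the actual MockMessagingSpy code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class InterceptCounterSketch
{
    // Hypothetical stand-in for the spy's counter; the name mirrors the field in the comment.
    private final AtomicInteger messagesIntercepted = new AtomicInteger();

    /** Called by whichever thread intercepts a message. */
    void onIntercept()
    {
        messagesIntercepted.incrementAndGet(); // atomic read-modify-write
    }

    /** Safe to call from a different thread than the one that increments. */
    int interceptedCount()
    {
        return messagesIntercepted.get();
    }
}
```

With a plain int field, concurrent increments could be lost and a reader on another thread might not observe the latest value; the atomic variant avoids both.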

> Fix flaky test testRequestResponse – 
> org.apache.cassandra.net.MockMessagingServiceTest
> --
>
> Key: CASSANDRA-16607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16607
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 3.11.11, 4.0-rc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/659/tests/
> {code}
> Error
> expected:<1> but was:<0>
> Stacktrace
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at org.apache.cassandra.net.MockMessagingServiceTest.testRequestResponse(MockMessagingServiceTest.java:81)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Standard Output
> INFO  [main] 2021-04-15 08:22:46,838 YamlConfigurationLoader.java:93 - 
> Configuration location: 
> file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,840 YamlConfigurationLoader.java:112 - 
> Loading settings from file:/home/cassandra/cassandra/test/conf/cassandra.yaml
> DEBUG [main] 2021-04-15 08:22:46,899 InternalLoggerFactory.java:63 - Using 
> SLF4J as the default logging framework
> DEBUG [main] 2021-04-15 08:22:46,911 PlatformDependent0.java:417 - 
> -Dio.netty.noUnsaf
> ...[truncated 61235 chars]...
> te NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> DEBUG [main] 2021-04-15 08:22:49,840 StorageService.java:2674 - New node 
> /127.0.0.1:7069 at token a57d4b7f61f49471614b7ac41f16477e
> DEBUG [main] 2021-04-15 08:22:49,848 StorageService.java:2727 - Node 
> /127.0.0.1:7069 state NORMAL, token [a57d4b7f61f49471614b7ac41f16477e]
> INFO  [main] 2021-04-15 08:22:49,848 StorageService.java:2730 - Node 
> /127.0.0.1:7069 state jump to NORMAL
> DEBUG [main] 2021-04-15 08:22:49,849 StorageService.java:1619 - NORMAL
> {code}






[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16616:
---
    Reviewers: Benjamin Lerer, Benjamin Lerer  (was: Benjamin Lerer)
       Status: Review In Progress  (was: Patch Available)

> Harden internode message resource limit accounting against serialization 
> failures
> -
>
>                 Key: CASSANDRA-16616
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0-rc
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and is unable to 
> correctly adjust the resource limits for an OutboundConnection, it affects 
> the other connection types sharing the same OutboundConnections so that any 
> of the connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to 
> re-initialize all of the connections with a correct limit, the effort to test 
> and maintain the recovery code seems too high for something that should 
> "never happen" (except it did once, which is why it needs hardening).  The 
> safer option is to kill the JVM and have whatever external monitoring is in 
> place restart the instance in a known good state.
> Additionally, the logging for dropping outbound messages that have expired or 
> are unserializable messages takes place after the recovery handling logic. If 
> there are problems with the recovery logic that throw an exception, the 
> message is never logged for future diagnosis. Logging should take place 
> first, and then releasing capacity/handling the expiration/serialization.
> Discovered on a branch modified for testing that threw an exception in the 
> Verb.serializeSize method.
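
The ordering described above (log the drop first, then release capacity, and escalate if the release itself fails) can be sketched roughly as follows. This is not the Cassandra implementation: {{DropHandlingSketch}}, {{Capacity}}, and the callback-based fatal handler are hypothetical names used only for illustration. In a real node the fatal handler would terminate the JVM; here it only records the failure so the ordering can be observed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DropHandlingSketch
{
    /** Hypothetical capacity tracker that refuses to let usage go negative. */
    static final class Capacity
    {
        private long using;

        Capacity(long using)
        {
            this.using = using;
        }

        void release(long bytes)
        {
            if (using - bytes < 0)
                throw new IllegalStateException("released more than was in use");
            using -= bytes;
        }
    }

    private final Capacity capacity;
    private final Consumer<String> log;
    private final Consumer<Throwable> fatalHandler;

    DropHandlingSketch(Capacity capacity, Consumer<String> log, Consumer<Throwable> fatalHandler)
    {
        this.capacity = capacity;
        this.log = log;
        this.fatalHandler = fatalHandler;
    }

    void onDropped(String reason, long messageBytes)
    {
        // 1. Log first, so the drop is recorded even if recovery fails.
        log.accept("dropping message: " + reason);
        try
        {
            // 2. Only then release the capacity held by the message.
            capacity.release(messageBytes);
        }
        catch (RuntimeException e)
        {
            // 3. The accounting can no longer be trusted: escalate.
            fatalHandler.accept(e);
        }
    }

    /** Runs one successful and one failing drop and returns the recorded events. */
    static List<String> demo()
    {
        List<String> events = new ArrayList<>();
        DropHandlingSketch handler = new DropHandlingSketch(new Capacity(10),
                                                            events::add,
                                                            t -> events.add("fatal: " + t.getMessage()));
        handler.onDropped("expired", 4);        // succeeds, 6 bytes remain in use
        handler.onDropped("unserializable", 8); // would drive usage below zero
        return events;
    }

    public static void main(String[] args)
    {
        demo().forEach(System.out::println);
    }
}
```

In {{demo()}} the second drop still produces its log entry before the fatal handler fires, which is the property the description asks for.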






[jira] [Updated] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16616:
---
Reviewers: Benjamin Lerer







[jira] [Updated] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16607:
---
    Reviewers: Benjamin Lerer, Benjamin Lerer  (was: Benjamin Lerer)
       Status: Review In Progress  (was: Patch Available)







[jira] [Updated] (CASSANDRA-16607) Fix flaky test testRequestResponse – org.apache.cassandra.net.MockMessagingServiceTest

2021-04-20 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16607:
---
Reviewers: Benjamin Lerer







[jira] [Commented] (CASSANDRA-16614) Flaky test_pending_range

2021-04-20 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325608#comment-17325608
 ] 

Berenguer Blasi commented on CASSANDRA-16614:
-

The window in which the test can observe the node as Down/Moving is too 
small: the node may already have gone to Down/Normal by the time we check. It 
is better to check in the logs that all nodes saw the node moving instead.
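
As a rough illustration of why the log check is more robust than a status snapshot: a status query only shows the current state, while the log keeps every transition. The sketch below is in Java for consistency with the rest of this digest (the actual dtest is Python), and the class name, the pattern, and the sample log lines are all invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class SawMovingSketch
{
    /** True only if every node's log has at least one line matching the pattern. */
    static boolean allNodesLogged(Map<String, List<String>> logLinesByNode, Pattern expected)
    {
        if (logLinesByNode.isEmpty())
            return false;
        for (List<String> lines : logLinesByNode.values())
        {
            boolean seen = false;
            for (String line : lines)
            {
                if (expected.matcher(line).find())
                {
                    seen = true;
                    break;
                }
            }
            if (!seen)
                return false;
        }
        return true;
    }

    /** Invented sample data: node2 has already moved on to NORMAL, node3 has not. */
    static boolean demo()
    {
        Pattern moving = Pattern.compile("127\\.0\\.0\\.1.*state moving");
        Map<String, List<String>> logs = Map.of(
            "node2", List.of("Node /127.0.0.1:7000 state moving, new token 1",
                             "Node /127.0.0.1:7000 state NORMAL"),
            "node3", List.of("Node /127.0.0.1:7000 state moving, new token 1"));
        return allNodesLogged(logs, moving);
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

Even though node2's latest entry already shows NORMAL, the earlier "moving" line is still in its log, so the check passes regardless of when it runs.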

> Flaky test_pending_range
> 
>
>                 Key: CASSANDRA-16614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16614
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-rc
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flaky 
> [test_pending_range|https://ci-cassandra.apache.org/job/Cassandra-trunk/445/testReport/junit/dtest-large-novnode.pending_range_test/TestPendingRangeMovements/test_pending_range/]
> {noformat}
> Error Message
> AssertionError: assert None is not None
>  +  where None = <function search at 0x7f29dfa83b80>('127\\.0\\.0\\.1.*?Down.*?Moving', '\nDatacenter: datacenter1\n==\nAddress RackStatus State   Load    Owns   ...   rack1   Up Normal  90.86 KiB   40.00%  5534023222112865484 \n\n\n  ')
>  +    where <function search at 0x7f29dfa83b80> = re.search
> {noformat}






[jira] [Updated] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16618:
--
Source Control Link: https://github.com/apache/cassandra/pull/973

> IntelliJ configuration is broken after recent changes in build.xml
> --
>
>                 Key: CASSANDRA-16618
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16618
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>         Attachments: screenshot-1.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> IntelliJ configuration is broken after recent changes in build.xml
> In particular, it does not resolve {{build/test/lib/jars}} as library folder
>  !screenshot-1.png! 






[jira] [Updated] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16618:
--
    Test and Documentation Plan: manual testing
                         Status: Patch Available  (was: In Progress)







[jira] [Updated] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16618:
--
Description: 
IntelliJ configuration is broken after recent changes in build.xml

In particular, it does not resolve {{build/test/lib/jars}} as library folder

 !screenshot-1.png! 

  was:
IntelliJ configuration is broken after recent changes in build.xml

In particular, it does not resolve {{build/test/lib/jars}} as library folder


> IntelliJ configuration is broken after recent changes in build.xml
> --
>
>                 Key: CASSANDRA-16618
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16618
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>         Attachments: screenshot-1.png
>
>
> IntelliJ configuration is broken after recent changes in build.xml
> In particular, it does not resolve {{build/test/lib/jars}} as library folder
>  !screenshot-1.png! 






[jira] [Updated] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16618:
--
Attachment: screenshot-1.png

> IntelliJ configuration is broken after recent changes in build.xml
> --
>
>                 Key: CASSANDRA-16618
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16618
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>         Attachments: screenshot-1.png
>
>
> IntelliJ configuration is broken after recent changes in build.xml
> In particular, it does not resolve {{build/test/lib/jars}} as library folder






[jira] [Updated] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16618:
--
     Bug Category: Parent values: Packaging(13660)
       Complexity: Low Hanging Fruit
      Component/s: Build
    Discovered By: User Report
         Severity: Normal
           Status: Open  (was: Triage Needed)

> IntelliJ configuration is broken after recent changes in build.xml
> --
>
>                 Key: CASSANDRA-16618
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16618
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>
> IntelliJ configuration is broken after recent changes in build.xml
> In particular, it does not resolve {{build/test/lib/jars}} as library folder






[jira] [Created] (CASSANDRA-16618) IntelliJ configuration is broken after recent changes in build.xml

2021-04-20 Thread Jacek Lewandowski (Jira)
Jacek Lewandowski created CASSANDRA-16618:
-

             Summary: IntelliJ configuration is broken after recent changes in build.xml
                 Key: CASSANDRA-16618
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16618
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jacek Lewandowski
            Assignee: Jacek Lewandowski


IntelliJ configuration is broken after recent changes in build.xml

In particular, it does not resolve {{build/test/lib/jars}} as library folder






[jira] [Commented] (CASSANDRA-15669) LeveledCompactionStrategy compact last level throw an ArrayIndexOutOfBoundsException

2021-04-20 Thread sunhaihong (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325504#comment-17325504
 ] 

sunhaihong commented on CASSANDRA-15669:


Hi [~azotcsit], thank you for your reply. 

My problem is the second situation that you described: 2. *There is no 
proper handling of a situation when there is more data than supported.*

 

> LeveledCompactionStrategy compact last level throw an 
> ArrayIndexOutOfBoundsException
> 
>
>                 Key: CASSANDRA-15669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction/LCS
>            Reporter: sunhaihong
>            Assignee: Alexey Zotov
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
>         Attachments: cfs_compaction_info.png, error_info.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Cassandra throws an ArrayIndexOutOfBoundsException when compacting the last 
> level.
> My test is as follows:
>  # Create a table with LeveledCompactionStrategy and the params 
> 'enabled': 'true', 'fanout_size': '2', 'max_threshold': '32', 
> 'min_threshold': '4', 'sstable_size_in_mb': '2' (fanout_size and 
> sstable_size_in_mb are deliberately small to make the problem easier to 
> reproduce);
>  # Insert data into the table using stress;
>  # Cassandra throws an ArrayIndexOutOfBoundsException when compacting level 9 
> sstables (this level's score is bigger than 1.001):
> ERROR [CompactionExecutor:4] 2020-03-28 08:59:00,990 CassandraDaemon.java:442 
> - Exception in thread Thread[CompactionExecutor:4,1,main]
>  java.lang.ArrayIndexOutOfBoundsException: 9
>  at 
> org.apache.cassandra.db.compaction.LeveledManifest.getLevel(LeveledManifest.java:814)
>  at 
> org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:746)
>  at 
> org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:398)
>  at 
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:131)
>  at 
> org.apache.cassandra.db.compaction.CompactionStrategyHolder.lambda$getBackgroundTaskSuppliers$0(CompactionStrategyHolder.java:109)
>  at 
> org.apache.cassandra.db.compaction.AbstractStrategyHolder$TaskSupplier.getTask(AbstractStrategyHolder.java:66)
>  at 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:214)
>  at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:289)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>  at java.util.concurrent.FutureTask.run(FutureTask.java)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.lang.Thread.run(Thread.java:748)
> I tested it on Cassandra versions 3.11.3 and 4.0-alpha3; the exception 
> occurred on both.
> Once it triggers, compaction for level 1 through level n no longer works; 
> level 0 is still valid.
>  
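
To see why the small reproduction parameters reach the limit so quickly, here is a back-of-the-envelope sketch. It rests on two assumptions that should be checked against {{LeveledManifest}}: that a level's capacity for levels 1 and above is {{fanout^level * sstable size}}, and that there are nine level slots (indices 0 to 8), which is consistent with the {{ArrayIndexOutOfBoundsException: 9}} above. {{LevelCapacitySketch}} and its methods are hypothetical and ignore L0 and score-based candidate selection.

```java
final class LevelCapacitySketch
{
    // Assumed number of level slots; valid indices would then be 0..8.
    static final int LEVEL_SLOTS = 9;

    /** Assumed capacity rule for levels >= 1: fanout^level * sstable size. */
    static long maxBytesForLevel(int level, int fanout, long sstableBytes)
    {
        return (long) Math.pow(fanout, level) * sstableBytes;
    }

    /** Lowest level whose cumulative capacity (levels 1..level) can hold dataBytes. */
    static int deepestLevelNeeded(long dataBytes, int fanout, long sstableBytes)
    {
        long cumulative = 0;
        int level = 0;
        while (cumulative < dataBytes)
        {
            level++;
            cumulative += maxBytesForLevel(level, fanout, sstableBytes);
        }
        return level;
    }

    /** True if the data fits without needing a level index beyond the assumed slots. */
    static boolean fits(long dataBytes, int fanout, long sstableBytes)
    {
        return deepestLevelNeeded(dataBytes, fanout, sstableBytes) < LEVEL_SLOTS;
    }

    public static void main(String[] args)
    {
        long mib = 1024L * 1024L;
        // Reproduction settings from the report: fanout_size 2, sstable_size_in_mb 2.
        System.out.println(deepestLevelNeeded(1500 * mib, 2, 2 * mib));
        // Larger settings, chosen here only for contrast: fanout 10, 160 MiB sstables.
        System.out.println(deepestLevelNeeded(1500 * mib, 10, 160 * mib));
    }
}
```

Under these assumptions, levels 1 to 8 together hold only 1020 MiB with the reproduction settings, so about 1.5 GiB of data already calls for level 9, one past the last valid index. With a fanout of 10 and 160 MiB sstables the same data fits in level 1.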


