[ 
https://issues.apache.org/jira/browse/CASSANDRA-19483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886514#comment-17886514
 ] 

David Capwell commented on CASSANDRA-19483:
-------------------------------------------

been speaking with [~smiklosovic] in slack, trying to dump that conversation 
here.

The main patch is 
https://github.com/instaclustr/cassandra/commit/fe0de74de0cfa289a09cfb2d16dc622f212a5267#diff-1807ebb9d586d616b71765b94be9b6f981a018f43368416a9e294dfae1ecf402R84,
 which is a simplification of the PR linked.

The change would now log the following each time the protocol magic is found

{code}
noSpam1m.warn("Failed to properly handshake with peer " + 
channel.remoteAddress() + ". Closing the channel. Invalid legacy protocol 
magic. {}",
                        cause.getMessage());
{code}

this would be a regression to the internode_error_reporting_exclusions config 
as it would log regardless of what that config says.

I am honestly fine with a new config that lets you opt into this behavior, 
something like the following (100% open to names)

{code}
private void exceptionCaught(Channel channel, Throwable cause)
        {
            final SocketAddress remoteAddress = channel.remoteAddress();
            boolean reportingExclusion = 
DatabaseDescriptor.getInternodeErrorReportingExclusions().contains(remoteAddress);

            if (reportingExclusion)
                logger.debug("Excluding internode exception for {}; address 
contained in internode_error_reporting_exclusions", remoteAddress, cause);
            else if (cause instanceof Message.InvalidLegacyProtocolMagic && 
DatabaseDescriptor.getInvalidLegacyProtocolMagicNoSpamEnabled())
                noSpam1m.warn("Failed to properly handshake with peer " + 
channel.remoteAddress() + ". Closing the channel. Invalid legacy protocol 
magic. {}",
                        cause.getMessage());
            else
                logger.error("Failed to properly handshake with peer {}. 
Closing the channel.", remoteAddress, cause);
{code}

with the above we don't change the behavior unless you opt in, and it plays 
nicely with the existing internode_error_reporting_exclusions logic.

The patch doesn't offer any tests so it was hard to validate that this change 
is enough

> Catch InvalidLegacyProtocolMagic exceptions
> -------------------------------------------
>
>                 Key: CASSANDRA-19483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability/Logging
>            Reporter: Brad Schoening
>            Assignee: Ilya Zakharov
>            Priority: Low
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Similar to CASSANDRA-18839, we frequently see the exception 
> InvalidLegacyProtocolMagic with full stack traces flooding our logs. These 
> are due to Qualys vulnerability scans.
> It seem to be a simple check in Message.java and would be better handled by:
> a) returning a boolean from validateLegacyProtocolMagic() -> 
> hasValidLegacyProtocolMagic() instead of creating a custom exception class or
> b) adding a catch block in HandshakeProtocol.java and return null as is done 
> for messagingVersion
> {code:java}
>  static Initiate maybeDecode(ByteBuf buf) throws IOException
>         {
>             ...
>             try (DataInputBuffer in = new DataInputBuffer(nio, false))
>             {
>                 validateLegacyProtocolMagic(in.readInt());   // throws 
> exception
>                 int flags = in.readInt();
>                 // legacy pre40 messagingVersion flag
>                 if (getBits(flags, 8, 8) < VERSION_40)
>                     return null;
>                 int minMessagingVersion = getBits(flags, 16, 8);
>                 int maxMessagingVersion = getBits(flags, 24, 8);
>                 // 5.0+ does not support pre40
>                 if (maxMessagingVersion < MessagingService.VERSION_40)
>                     return null;
>                 ....
>             }
>             catch (EOFException e)
>             {
>                 return null;
>             }
>         }
> {code}
> {code:java}
> static void validateLegacyProtocolMagic(int magic) throws 
> InvalidLegacyProtocolMagic
> {
>     if (magic != PROTOCOL_MAGIC)
>             throw new InvalidLegacyProtocolMagic(magic);
> }
> {code}
> {code:java}
> {{2024-03-20 03:47:27,380 [ERROR] [Messaging-EventLoop-3-2] cluster_id=9 
> ip_address=10.0.0.1 InboundConnectionInitiator.java:360 - Failed to properly 
> handshake with peer /10.0.2:33356. Closing the channel.}}
> {{io.netty.handler.codec.DecoderException: 
> org.apache.cassandra.net.Message$InvalidLegacyProtocolMagic: Read 1431520594, 
> Expected -900387334}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478)}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:404)}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:371)}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)}}
> {{    at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)}}
> {{    at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)}}
> {{    at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)}}
> {{    at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)}}
> {{    at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)}}
> {{    at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)}}
> {{    at 
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)}}
> {{    at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:819)}}
> {{    at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)}}
> {{    at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)}}
> {{    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)}}
> {{    at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)}}
> {{    at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)}}
> {{    at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:834)}}
> {{Caused by: org.apache.cassandra.net.Message$InvalidLegacyProtocolMagic: 
> Read 1431520594, Expected -900387334}}
> {{    at 
> org.apache.cassandra.net.Message.validateLegacyProtocolMagic(Message.java:343)}}
> {{    at 
> org.apache.cassandra.net.HandshakeProtocol$Initiate.maybeDecode(HandshakeProtocol.java:167)}}
> {{    at 
> org.apache.cassandra.net.InboundConnectionInitiator$Handler.initiate(InboundConnectionInitiator.java:242)}}
> {{    at 
> org.apache.cassandra.net.InboundConnectionInitiator$Handler.decode(InboundConnectionInitiator.java:235)}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)}}
> {{    at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)}}
> {{    ... 18 common frames omitted}}
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to