[ 
https://issues.apache.org/jira/browse/GEODE-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440655#comment-17440655
 ] 

Dan Smith commented on GEODE-9050:
----------------------------------

I tracked this down in 1.14 so we can upgrade netty there. This bug exists in 
geode 1.14 but not in the latest geode 1.15 develop. In 1.14, we are changing 
the event loop group for a netty channel while threads may be writing to the 
channel in ExecutionHandlerContext.changeChannelEventLoopGroup. This leads to 
the assertion failure below with netty 4.1.68 and above. It is unknown what 
sort of problems this might cause with the earlier versions of netty that do 
not have the assertion.

This exception occurs when running 
PubSubIntegrationTest.ensureOrderingOfPublishedMessages after upgrading to 
netty 4.1.68 on support/1.14.


{noformat}
[warn 2021/10/27 22:34:47.657 GMT  <GeodeRedisServer-Command-105> tid=0x3d4] 
Failed to execute publish function java.lang.AssertionError
org.apache.geode.cache.execute.FunctionException: java.lang.AssertionError
        at 
org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.setException(LocalResultCollectorImpl.java:205)
        at 
org.apache.geode.internal.cache.execute.MemberFunctionResultSender.setException(MemberFunctionResultSender.java:233)
        at 
org.apache.geode.internal.cache.execute.AbstractExecution.handleException(AbstractExecution.java:504)
        at 
org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:353)
        at 
org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionOnLocalNode(AbstractExecution.java:307)
        at 
org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:136)
        at 
org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
        at 
org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:376)
        at 
org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:359)
        at 
org.apache.geode.redis.internal.pubsub.PubSubImpl.publish(PubSubImpl.java:76)
        at 
org.apache.geode.redis.internal.executor.pubsub.PublishExecutor.executeCommand(PublishExecutor.java:35)
        at 
org.apache.geode.redis.internal.RedisCommandType.executeCommand(RedisCommandType.java:335)
        at 
org.apache.geode.redis.internal.netty.Command.execute(Command.java:188)
        at 
org.apache.geode.redis.internal.netty.ExecutionHandlerContext.executeCommand(ExecutionHandlerContext.java:315)
        at 
org.apache.geode.redis.internal.netty.ExecutionHandlerContext.processCommandQueue(ExecutionHandlerContext.java:150)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError
        at 
io.netty.handler.timeout.WriteTimeoutHandler.addWriteTimeoutTask(WriteTimeoutHandler.java:144)
        at 
io.netty.handler.timeout.WriteTimeoutHandler.scheduleTimeout(WriteTimeoutHandler.java:136)
        at 
io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:110)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
        at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
        at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        ... 1 more {noformat}

Here is the full sequence of events with geode 1.14. 



1. A subscription is created and marked ready to publish
2. In another thread, a publish message comes in and starts writing to the 
channel of the subscriber
3. Netty uses the executor for the channel to perform the write (executor A)
4. The subscription thread changes the executor of the channel in 
changeChannelEventLoopGroup
5. The write eventually hits the assertion that the executor of the write 
matches the current executor of the channel. Because we changed the executor 
in step 4, it no longer matches.
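
The sequence above can be sketched without netty at all. This is only an illustration of the ordering problem, not the actual Geode or netty code: FakeExecutor, FakeChannel and writeSeesSameExecutor are hypothetical names made up for this sketch, and the "assertion" is reduced to a simple reference comparison. The interleaving is forced on a single thread so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

public class EventLoopRaceSketch {

    /** Hypothetical stand-in for an event loop: a named queue of tasks run on demand. */
    static final class FakeExecutor {
        final String name;
        final Queue<Runnable> tasks = new ArrayDeque<>();

        FakeExecutor(String name) {
            this.name = name;
        }

        void execute(Runnable task) {
            tasks.add(task);
        }

        void runAllTasks() {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        }
    }

    /** Hypothetical stand-in for a channel whose executor can be swapped. */
    static final class FakeChannel {
        final AtomicReference<FakeExecutor> executor = new AtomicReference<>();

        FakeChannel(FakeExecutor initial) {
            executor.set(initial);
        }
    }

    /**
     * Replays steps 2-5 in a fixed order and reports whether the queued write
     * still sees the executor it was scheduled on when it finally runs.
     */
    static boolean writeSeesSameExecutor(boolean swapBeforeRun) {
        FakeExecutor a = new FakeExecutor("A");
        FakeExecutor b = new FakeExecutor("B");
        FakeChannel channel = new FakeChannel(a);
        boolean[] matched = new boolean[1];

        // Steps 2-3: the publish side queues the write on the channel's
        // current executor (A).
        FakeExecutor writeExecutor = channel.executor.get();
        writeExecutor.execute(
            () -> matched[0] = (channel.executor.get() == writeExecutor));

        // Step 4: the subscription side swaps the channel's executor.
        if (swapBeforeRun) {
            channel.executor.set(b);
        }

        // Step 5: executor A runs the queued write, which compares its own
        // executor with the channel's current one.
        writeExecutor.runAllTasks();
        return matched[0];
    }

    public static void main(String[] args) {
        System.out.println("no swap, executors match: " + writeSeesSameExecutor(false));
        System.out.println("swap, executors match: " + writeSeesSameExecutor(true));
    }
}
```

With the swap in place the comparison fails, which corresponds to the point where the real write hits the assertion. In the real code the swap and the write happen on different threads, so the failure depends on timing.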

Since this is a hard-to-hit race condition and redis is experimental in 1.14, 
we are going to just change the test in 1.14 so that it does not hit this 
issue, and recommend that users move to 1.15 anyway.

> Redis test fails with Netty 4.1.60 and later
> --------------------------------------------
>
>                 Key: GEODE-9050
>                 URL: https://issues.apache.org/jira/browse/GEODE-9050
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Owen Nichols
>            Assignee: Jens Deppe
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> {{PubSubIntegrationTest > ensureOrderingOfPublishedMessages}} 
> [fails|http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-6153/test-results/integrationTest/1616031328/index.html]
>  reliably, on both Linux and Windows, if I [bump 
> Netty|https://github.com/apache/geode/pull/6153/commits/03b81f93b011377a5021a4b87acecacfa02b93a4]
>  from 4.1.59.Final to 4.1.60.Final.  It's important to keep up to date with 
> latest versions of our 3rd-party dependencies but breaking this out 
> separately so someone with redis knowledge can tackle it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
