[
https://issues.apache.org/jira/browse/GEODE-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440655#comment-17440655
]
Dan Smith commented on GEODE-9050:
----------------------------------
I tracked this down in 1.14 so we can upgrade netty there. This bug exists in
geode 1.14 but not in the latest geode 1.15 develop. In 1.14,
ExecutionHandlerContext.changeChannelEventLoopGroup changes the event loop
group for a netty channel while other threads may be writing to the channel.
This leads to the assertion failure below with netty 4.1.68 and above. It is
unknown what sort of problems this might cause with earlier versions of netty
that lack the assertion.

This exception occurs when running
PubSubIntegrationTest.ensureOrderingOfPublishedMessages after upgrading to
netty 4.1.68 on support/1.14:
{noformat}
[warn 2021/10/27 22:34:47.657 GMT <GeodeRedisServer-Command-105> tid=0x3d4] Failed to execute publish function java.lang.AssertionError
org.apache.geode.cache.execute.FunctionException: java.lang.AssertionError
	at org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.setException(LocalResultCollectorImpl.java:205)
	at org.apache.geode.internal.cache.execute.MemberFunctionResultSender.setException(MemberFunctionResultSender.java:233)
	at org.apache.geode.internal.cache.execute.AbstractExecution.handleException(AbstractExecution.java:504)
	at org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:353)
	at org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionOnLocalNode(AbstractExecution.java:307)
	at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:136)
	at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
	at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:376)
	at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:359)
	at org.apache.geode.redis.internal.pubsub.PubSubImpl.publish(PubSubImpl.java:76)
	at org.apache.geode.redis.internal.executor.pubsub.PublishExecutor.executeCommand(PublishExecutor.java:35)
	at org.apache.geode.redis.internal.RedisCommandType.executeCommand(RedisCommandType.java:335)
	at org.apache.geode.redis.internal.netty.Command.execute(Command.java:188)
	at org.apache.geode.redis.internal.netty.ExecutionHandlerContext.executeCommand(ExecutionHandlerContext.java:315)
	at org.apache.geode.redis.internal.netty.ExecutionHandlerContext.processCommandQueue(ExecutionHandlerContext.java:150)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError
	at io.netty.handler.timeout.WriteTimeoutHandler.addWriteTimeoutTask(WriteTimeoutHandler.java:144)
	at io.netty.handler.timeout.WriteTimeoutHandler.scheduleTimeout(WriteTimeoutHandler.java:136)
	at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:110)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
	at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	... 1 more
{noformat}
Here is the full sequence of events with geode 1.14.
1. A subscription is created and marked ready to publish.
2. In another thread, a publish message comes in and starts writing to the
channel of the subscriber.
3. Netty uses the executor for the channel to perform the write (executor A).
4. The subscription thread changes the executor of the channel in
changeChannelEventLoopGroup.
5. The write eventually hits the assertion that the executor of the write
matches the current executor of the channel. Because we changed the
executor, it no longer matches.
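
The interleaving in the steps above can be reproduced in miniature without
netty. The sketch below is only an illustration under stated assumptions:
FakeChannel, changeExecutor and writeSeesSameExecutor are hypothetical names,
plain java.util.concurrent executors stand in for netty event loops, and the
identity comparison is a simplified stand-in for the real assertion in
WriteTimeoutHandler. A latch forces the executor swap to happen after the
write is queued, so the outcome is deterministic rather than a race.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

public class ExecutorSwapRace {

  /** Hypothetical stand-in for a channel whose executor can be replaced. */
  static final class FakeChannel {
    private final AtomicReference<ExecutorService> executor;

    FakeChannel(ExecutorService initial) {
      this.executor = new AtomicReference<>(initial);
    }

    ExecutorService executor() {
      return executor.get();
    }

    void changeExecutor(ExecutorService next) {
      executor.set(next);
    }
  }

  /**
   * Queues a write on the channel's current executor (executor A) and
   * optionally swaps the channel's executor before the write runs.
   * Returns true if the write still sees the executor it was queued on.
   */
  public static boolean writeSeesSameExecutor(boolean swapBeforeWriteRuns)
      throws Exception {
    ExecutorService executorA = Executors.newSingleThreadExecutor();
    ExecutorService executorB = Executors.newSingleThreadExecutor();
    try {
      FakeChannel channel = new FakeChannel(executorA);
      CountDownLatch swapDecided = new CountDownLatch(1);

      // Steps 2 and 3: the publisher hands the write to executor A.
      ExecutorService captured = channel.executor();
      Future<Boolean> write = captured.submit(() -> {
        // Wait until the subscriber thread has (or has not) swapped.
        swapDecided.await();
        // Step 5: simplified version of the check that fails in 1.14.
        return channel.executor() == captured;
      });

      // Step 4: the subscription thread changes the channel's executor.
      if (swapBeforeWriteRuns) {
        channel.changeExecutor(executorB);
      }
      swapDecided.countDown();
      return write.get();
    } finally {
      executorA.shutdownNow();
      executorB.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("no swap: check passes = " + writeSeesSameExecutor(false));
    System.out.println("swap:    check passes = " + writeSeesSameExecutor(true));
  }
}
```

Running main prints true for the first case and false for the second: the
check only fails when the executor is replaced between queuing the write and
running it, which is the window described in steps 3 through 5.
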
Since this is a hard-to-hit race condition and redis is experimental in 1.14,
we are going to just change the test in 1.14 to not hit this issue and
recommend that users use 1.15 anyway.
> Redis test fails with Netty 4.1.60 and later
> --------------------------------------------
>
> Key: GEODE-9050
> URL: https://issues.apache.org/jira/browse/GEODE-9050
> Project: Geode
> Issue Type: Bug
> Components: redis
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Owen Nichols
> Assignee: Jens Deppe
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> {{PubSubIntegrationTest > ensureOrderingOfPublishedMessages}}
> [fails|http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-6153/test-results/integrationTest/1616031328/index.html]
> reliably, on both Linux and Windows, if I [bump
> Netty|https://github.com/apache/geode/pull/6153/commits/03b81f93b011377a5021a4b87acecacfa02b93a4]
> from 4.1.59.Final to 4.1.60.Final. It's important to keep up to date with
> the latest versions of our 3rd-party dependencies, but I am breaking this
> out separately so someone with redis knowledge can tackle it.