Github user nathanmarz commented on the pull request:
https://github.com/apache/storm/pull/349#issuecomment-87778672
Looking inside task->node+port needs to be done inside the read lock (as it
was done before this TaskDrainer class got inserted here). If that's done,
there's no possible way for the sender to get its hands on a connection that's
closed or going to be closed while it's using it.
To elaborate on that last point: once the write lock code is finished, all
entries in task->node+port are up to date and accurate, and no connections for
those node+ports will be closed. This means it's impossible for the code in
read-locked to get a to-be-closed connection since it gets everything by
looking at task->node+port first.
The write-locked code was carefully written this way to minimize the amount
of work that actually happens in the write lock. This is important because when
that code is active, tasks are prevented from sending messages. The way it is
now, it's just setting the value of a single in-memory atom â so basically
no work at all. If it were making/closing connections, it would be a completely
different story.
In short, the code in the write-lock is fine â it's the code in the read
lock that needs to be fixed. As part of that, the code looking up the node+port
for a task needs to be moved back to this function and not happen before the
tuple goes on the transfer queue.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---