Github user nathanmarz commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87778672
  
    Looking inside task->node+port needs to be done inside the read lock (as it 
was done before this TaskDrainer class got inserted here). If that's done, 
there's no possible way for the sender to get its hands on a connection that's 
closed or going to be closed while it's using it.
    
    To elaborate on that last point: once the write lock code is finished, all 
entries in task->node+port are up to date and accurate, and no connections for 
those node+ports will be closed. This means it's impossible for the code in 
read-locked to get a to-be-closed connection since it gets everything by 
looking at task->node+port first.
    
    The write-locked code was carefully written this way to minimize the amount 
of work that actually happens in the write lock. This is important because when 
that code is active, tasks are prevented from sending messages. The way it is 
now, it's just setting the value of a single in-memory atom – so basically 
no work at all. If it were making/closing connections, it would be a completely 
different story.
    
    In short, the code in the write-lock is fine – it's the code in the read 
lock that needs to be fixed. As part of that, the code looking up the node+port 
for a task needs to be moved back to this function and not happen before the 
tuple goes on the transfer queue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to