> On May 28, 2015, 7:10 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, lines 132-135
> > <https://reviews.apache.org/r/34734/diff/1/?file=973063#file973063line132>
> >
> > So, I guess the race condition is the following. The expiration thread
> > moves a TimerTaskEntry from one TimerTaskList to another during reinsert.
> > At the same time, another thread can complete an operation and try to
> > remove the entry from the list. Is that right?
> >
> > With the patch, it seems when TimerTask.cancel tries to re move an
> > entry from the list, the following can happen (1) line 133 tests that list
> > in entry is not null, (2) a reinsert process happens and the entry is
> > removed from list which sets list in the entry to null, (3) list.remove in
> > 134 is called and the entry is not removed since list is now null, (4) line
> > 133 is tested again and since list is now null, we quit the loop, (5) the
> > reinsert process adds the entry to a new list.
> >
> > At this point, a completed entry still exists in the list.
>
> Yasuhiro Matsuda wrote:
> You are right. It should be rare, but a completed entry can remain in the
> list until expiration. The completion flag in DelayedOperation prevents
> excess executions. So, it is not too bad.
>
> Jun Rao wrote:
> Thanks for the clarification. Thinking about this a bit more, could we
> hit a NullPointerException in step (3)? At that point, list could be null
> when we call remove.
>
> Onur Karaman wrote:
> Yeah that check-then-act should probably be done atomically. Maybe all
> changes/check-then-acts to TimerTaskEntry just need to be guarded by the
> TimerTaskEntry's intrinsic lock?
>
> Yasuhiro Matsuda wrote:
> Oops. The value of TimerTaskEntry.list should be save to a local variable.
>
> Jun Rao wrote:
> Ok, thanks. Perhaps we can also add some comments explaining why we need
> the while loop and the rare possibility of not removing a completed operation
> from the Timer.
>
> Guozhang Wang wrote:
> Could it happen that concurrent threads are calling
> TimerTaskList.add(entry) and TimerTaskList.remove(entry) on different lists
> for the same entry? Since we do not synchronize on the entry object this
> could cause race condition on next/prev/list reference manipulation.
I will add a flag to TimerTaskEntry for cancelation to so that a canceled task
won't be reinserted. This should reduce a chance of leaving completed task in a
list.
Also I will add sync on TimerTaskEntry to TimerTaskList.{add|remove}. This
makes one operation sync on both TimerTaskList and TimerTaskEntry. We have to
be careful in ordering them, otherwise it may cause a deadlock. I hope I did it
right. Please review my next patch carefully.
- Yasuhiro
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85596
-----------------------------------------------------------
On May 27, 2015, 9 p.m., Yasuhiro Matsuda wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34734/
> -----------------------------------------------------------
>
> (Updated May 27, 2015, 9 p.m.)
>
>
> Review request for kafka.
>
>
> Bugs: KAFKA-2226
> https://issues.apache.org/jira/browse/KAFKA-2226
>
>
> Repository: kafka
>
>
> Description
> -------
>
> fix a race condition in TimerTaskEntry.remove
>
>
> Diffs
> -----
>
> core/src/main/scala/kafka/utils/timer/TimerTaskList.scala
> e7a96570ddc2367583d6d5590628db7e7f6ba39b
>
> Diff: https://reviews.apache.org/r/34734/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Yasuhiro Matsuda
>
>