On Fri, Apr 8, 2016 at 12:14 AM, Sudheesh Katkam <sudhe...@apache.org>
wrote:

> I agree there is a problem. Nice catch! Is there a ticket for this?
>

Not yet, I wanted to confirm I didn't miss anything obvious first. Will
create a JIRA right away.


> The fragment executor is responsible for sending the final state, and in
> this case it's waiting forever, making the query hang. In any scenario where
> a thread other than the fragment executor is failing (or cancelling) a
> fragment, that thread should change the state, *and then* interrupt the
> fragment executor. There are so many ways to get to
> *FragmentExecutor.fail()*, and it looks like [1] is the scenario you
> mentioned, right?
>

Yes, this is the case I saw happening.
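
Just to make sure I understand the fix you're suggesting, here is a rough
sketch of the "change the state, then interrupt" pattern. This is only an
illustration; the class, fields and method bodies below are made up, not the
actual FragmentExecutor code:

import java.util.concurrent.atomic.AtomicReference;

// Sketch only: illustrates "record the failure first, then interrupt the
// executor thread". All names here are hypothetical, not Drill's actual code.
class FragmentExecutorSketch implements Runnable {

  enum State { RUNNING, FAILED, CANCELLED, FINISHED }

  private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
  private final AtomicReference<Thread> executorThread = new AtomicReference<>();

  @Override
  public void run() {
    executorThread.set(Thread.currentThread());
    try {
      // ... pull batches from the receiver; this is where the thread can block
    } finally {
      // whatever state we ended up in is what gets reported to the Foreman
      sendFinalStateToForeman(state.get());
    }
  }

  // called by another thread, e.g. the UserServer's channel-closed listener
  void fail(Throwable cause) {
    // 1. record the failure so the executor sees FAILED when it wakes up
    state.compareAndSet(State.RUNNING, State.FAILED);
    // 2. then interrupt the executor in case it is blocked in its receiver
    //    or waiting for an ack from the user client
    Thread t = executorThread.get();
    if (t != null) {
      t.interrupt();
    }
  }

  private void sendFinalStateToForeman(State finalState) {
    // placeholder: in Drill this is the final status report the Foreman waits on
  }
}

With something like this, even if the executor thread is blocked in its
receiver or waiting for an ack, the interrupt should wake it up and the final
FAILED state would still be sent to the Foreman.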


> Thank you,
> Sudheesh
>
> [1]
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java#L89
>
> On Thu, Apr 7, 2016 at 3:42 PM, Abdel Hakim Deneche <adene...@maprtech.com>
> wrote:
>
> > Thanks Sudheesh. So even after we fix DRILL-3714, it's still possible for
> > the root fragment to fail without being cancelled.
> >
> > Take a look at BaseRawBatchBuffer.enqueue() and you will see that, once a
> > fragment is in a failed state, this method will release the batch and send
> > back an OK ack to the sender.
> >
> > About your second question: when the UserServer calls
> > FragmentExecutor.fail(), it will just set its status to FAILED without
> > interrupting it. If the fragment thread is blocked in its receiver, it
> > will never send its status to the Foreman.
> >
> > On Thu, Apr 7, 2016 at 10:36 PM, Sudheesh Katkam <sudhe...@apache.org>
> > wrote:
> >
> > > I can answer one question myself. See inline.
> > >
> > > As you mentioned elsewhere, this issue will rarely happen (and will be
> > > even harder to reproduce) once DRILL-3714 is committed.
> > >
> > > On Thu, Apr 7, 2016 at 11:38 AM, Sudheesh Katkam <sudhe...@apache.org>
> > > wrote:
> > >
> > > > Hakim,
> > > >
> > > > Can you point me to where [3] happens?
> > > >
> > > > Two questions:
> > > >
> > > > + Why is the root fragment blocked? If the user channel is closed, the
> > > > query is cancelled [1], which should cancel and interrupt all running
> > > > fragments. This interruption happens regardless of the fragment failure
> > > > you pointed out when the user channel is closed [2]. Unless there is a
> > > > blocking call when the failure is handled through the channel-closed
> > > > listener, I don't see why cancellation is not triggered.
> > > >
> > >
> > > It is possible for the fragment failure to be fully processed before the
> > > Foreman cancels all running fragments, in which case the root fragment
> > > will not be interrupted (because it is not cancelled, see
> > > QueryManager#cancelExecutingFragments).
> > >
> > >
> > > > + Why does the Foreman wait forever? AFAIK failures are reported
> > > > immediately to the user. Is the root fragment not reported as FAILED
> > > > to the Foreman?
> > > >
> > > > Thank you,
> > > > Sudheesh
> > > >
> > > > [1]
> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java#L179
> > > > [2]
> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java#L92
> > > >
> > > > On Thu, Apr 7, 2016 at 6:29 AM, John Omernik <j...@omernik.com> wrote:
> > > >
> > > >> Abdel -
> > > >>
> > > >> I think I've seen this on a MapR cluster I run, especially on CTAS.
> > > >> For me, I have not brought it up because the cluster I am running on
> > > >> has some serious personal issues (like being hardware that's nearly 7
> > > >> years old; it's a test cluster), and given the "hard to reproduce"
> > > >> nature of the problem, I've been reluctant to create noise. Given what
> > > >> you've described, it seems very similar to CTAS hangs I've seen but
> > > >> couldn't accurately reproduce.
> > > >>
> > > >> This doesn't add much to your post, but I wanted to give you a +1 for
> > > >> outlining this potential problem. Once I move to more robust hardware
> > > >> and I am in similar situations, I will post more verbose details from
> > > >> my side.
> > > >>
> > > >> John
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Apr 7, 2016 at 2:29 AM, Abdel Hakim Deneche <
> > > >> adene...@maprtech.com>
> > > >> wrote:
> > > >>
> > > >> > So, we've been seeing some queries hang. I've come up with a
> > > >> > possible explanation, but so far it's really difficult to reproduce.
> > > >> > Let me know if you think this explanation doesn't hold up or if you
> > > >> > have any ideas how we can reproduce it. Thanks
> > > >> >
> > > >> > - generally it's a CTAS running on a large cluster (lots of writers
> > > >> > running in parallel)
> > > >> > - logs show that the user channel was closed and the UserServer
> > > >> > caused the root fragment to move to a FAILED state [1]
> > > >> > - jstack shows that the root fragment is blocked in its receiver,
> > > >> > waiting for data [2]
> > > >> > - jstack also shows that ALL other fragments are no longer running,
> > > >> > and the logs show that all of them succeeded [3]
> > > >> > - the foreman waits *forever* for the root fragment to finish
> > > >> >
> > > >> > [1] the only case I can think of is the user channel closing while
> > > >> > the fragment was waiting for an ack from the user client
> > > >> > [2] if a writer finishes earlier than the others, it will send a
> > > >> > data batch to the root fragment, which will be sent to the user. The
> > > >> > root will then immediately block on its receiver, waiting for the
> > > >> > remaining writers to finish
> > > >> > [3] once the root fragment moves to a failed state, the receiver
> > > >> > will immediately release any received batch and return an OK to the
> > > >> > sender, without putting the batch in its blocking queue.
> > > >> >
>
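
For anyone following along, here is a simplified sketch of the receiver
behavior described in [2] and [3] above. It is not the actual
BaseRawBatchBuffer code, just an illustration of why the root fragment can
block forever once it is marked FAILED without its executor thread being
interrupted:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified illustration of the behavior in [2]/[3] above; not the actual
// BaseRawBatchBuffer code.
class ReceiverBufferSketch<BATCH> {

  private final LinkedBlockingQueue<BATCH> queue = new LinkedBlockingQueue<>();
  private final AtomicBoolean failed = new AtomicBoolean(false);

  // called when the fragment moves to FAILED (e.g. the user channel closed
  // while the root was waiting for an ack); note that nothing interrupts the
  // executor thread here
  void markFailed() {
    failed.set(true);
  }

  // called on the RPC thread when a batch arrives from a sender
  void enqueue(BATCH batch) {
    if (failed.get()) {
      // once the fragment is FAILED the batch is released and an OK ack goes
      // back to the sender, so the sending fragments all finish successfully
      release(batch);
      return;
    }
    queue.add(batch);
  }

  // called by the root fragment's executor thread
  BATCH next() throws InterruptedException {
    // if the fragment was marked FAILED without an interrupt, and the
    // remaining writers' batches were all dropped above, this never returns
    return queue.take();
  }

  private void release(BATCH batch) {
    // placeholder for releasing the batch's buffers
  }
}

In the scenario above, markFailed() happens when the user channel closes, the
remaining writers' batches are then dropped in enqueue() (so every sender gets
an OK and reports success), and the root's executor thread stays stuck in
next() with no batch and no interrupt ever coming, which is why the Foreman
waits forever.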



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


