On Fri, Apr 8, 2016 at 12:14 AM, Sudheesh Katkam <[email protected]> wrote:

> I agree there is a problem. Nice catch! Is there a ticket for this?
>

Not yet, I wanted to confirm I didn't miss anything obvious first. Will
create a JIRA right away.

> The fragment executor is responsible for sending the final state, and in
> this case it's waiting forever, making the query hang. In any scenario
> where a thread other than the fragment executor is failing (or
> cancelling) a fragment, that thread should change the state, *and then*
> interrupt the fragment executor. There are so many ways to get to
> *FragmentExecutor.fail()*, and it looks like [1] is the scenario you
> have mentioned, right?
>

Yes, this is the case I saw happening.
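To make sure we're talking about the same ordering, here is a minimal
sketch of what I understand you're proposing. Illustrative only, not
Drill's actual code: the class, the State enum, and the two stubbed
methods are stand-ins.

import java.util.concurrent.atomic.AtomicReference;

// Sketch only -- not Drill's actual FragmentExecutor. Illustrates the
// ordering "publish the terminal state first, then interrupt the executor".
class FragmentExecutorSketch implements Runnable {

  enum State { RUNNING, FAILED, CANCELLED }

  private final AtomicReference<State> state =
      new AtomicReference<>(State.RUNNING);
  private volatile Thread myThread;

  // Called from some other thread, e.g. the rpc thread that noticed the
  // closed user channel.
  void fail(Throwable cause) {
    // 1. Publish FAILED first, so the executor sees it when it wakes up.
    if (state.compareAndSet(State.RUNNING, State.FAILED)) {
      // 2. Only then interrupt, to break the executor out of a blocking
      //    receive.
      Thread t = myThread;
      if (t != null) {
        t.interrupt();
      }
    }
  }

  @Override
  public void run() {
    myThread = Thread.currentThread();
    try {
      while (state.get() == State.RUNNING) {
        receiveAndProcessNextBatch();
      }
    } catch (InterruptedException e) {
      // woken up by fail(); the state was already set by the other thread
    } finally {
      // the executor thread itself always reports the terminal state
      reportFinalState(state.get());
    }
  }

  // Stand-in for blocking on the incoming batch queue.
  private void receiveAndProcessNextBatch() throws InterruptedException {
    Thread.sleep(100);
  }

  // Stand-in for sending the terminal state to the Foreman.
  private void reportFinalState(State finalState) { }
}

The order in fail() is the whole point: if we interrupted first and set
the state afterwards, the executor could wake up, still observe RUNNING,
and go right back to blocking in its receiver.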
> Thank you,
> Sudheesh
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java#L89
>
> On Thu, Apr 7, 2016 at 3:42 PM, Abdel Hakim Deneche <[email protected]>
> wrote:
>
> > Thanks Sudheesh. So even after we fix DRILL-3714, it's still possible
> > for the root fragment to fail without being cancelled.
> >
> > Take a look at BaseRawBatchBuffer.enqueue() and you will see that,
> > once a fragment is in a failed state, this method will release the
> > batch and send back an OK ack to the sender.
> >
> > About your second question: when the UserServer calls
> > FragmentExecutor.fail(), it will just set its status to FAILED without
> > interrupting it. If the fragment thread is blocked in its receiver, it
> > will never send its status to the Foreman.
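(For anyone following along, the enqueue behavior quoted above boils down
to something like this. Simplified sketch, not the actual
BaseRawBatchBuffer code; RawBatch and its two methods are stand-ins.)

import java.util.concurrent.LinkedBlockingDeque;

// Simplified sketch of the drop-and-ack pattern -- not the actual
// BaseRawBatchBuffer code.
class RawBatchBufferSketch {

  interface RawBatch {
    void release();  // free the batch's buffers
    void ackOk();    // send an OK ack back to the sender
  }

  private final LinkedBlockingDeque<RawBatch> queue =
      new LinkedBlockingDeque<>();
  private volatile boolean failed;  // set when the fragment moves to FAILED

  // Called when the fragment reaches a terminal (failed) state.
  void kill() {
    failed = true;
  }

  void enqueue(RawBatch batch) {
    if (failed) {
      batch.release();  // drop the data on the floor...
      batch.ackOk();    // ...but still ack, so the sender finishes cleanly
      return;           // note: the batch never reaches the queue
    }
    queue.add(batch);
  }

  // The fragment thread blocks here. If every sender was acked-and-dropped,
  // nothing is ever queued and this call never returns without an interrupt.
  RawBatch getNext() throws InterruptedException {
    return queue.take();
  }
}

This is why the logs can show every sender succeeding while the root sits
blocked in its receiver: from the senders' point of view, everything was
delivered and acked.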
> >
> > On Thu, Apr 7, 2016 at 10:36 PM, Sudheesh Katkam <[email protected]>
> > wrote:
> >
> > > I can answer one question myself. See inline.
> > >
> > > As you mentioned elsewhere, this issue will rarely happen (and will
> > > be even harder to reproduce) once DRILL-3714 is committed.
> > >
> > > On Thu, Apr 7, 2016 at 11:38 AM, Sudheesh Katkam <[email protected]>
> > > wrote:
> > >
> > > > Hakim,
> > > >
> > > > Can you point me to where [3] happens?
> > > >
> > > > Two questions:
> > > >
> > > > + Why is the root fragment blocked? If the user channel is closed,
> > > > the query is cancelled [1], which should cancel and interrupt all
> > > > running fragments. This interruption happens regardless of the
> > > > fragment failure that you have pointed out when the user channel
> > > > is closed [2]. Unless there is a blocking call when the failure is
> > > > handled through the channel closed listener, I don't see why
> > > > cancellation is not triggered.
> > >
> > > It is possible for the fragment failure to be fully processed before
> > > the Foreman cancels all running fragments, in which case the root
> > > fragment will not be interrupted (because it is not cancelled; see
> > > QueryManager#cancelExecutingFragments).
> > >
> > > > + Why does the Foreman wait forever? AFAIK failures are reported
> > > > immediately to the user. Is the root fragment not reported as
> > > > FAILED to the Foreman?
> > > >
> > > > Thank you,
> > > > Sudheesh
> > > >
> > > > [1]
> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java#L179
> > > > [2]
> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java#L92
> > > >
> > > > On Thu, Apr 7, 2016 at 6:29 AM, John Omernik <[email protected]>
> > > > wrote:
> > > >
> > > >> Abdel -
> > > >>
> > > >> I think I've seen this on a MapR cluster I run, especially on
> > > >> CTAS. For me, I have not brought it up because the cluster I am
> > > >> running on has some serious personal issues (like being hardware
> > > >> that's nearly 7 years old; it's a test cluster), and given the
> > > >> "hard to reproduce" nature of the problem, I've been reluctant to
> > > >> create noise. Given what you've described, it seems very similar
> > > >> to CTAS hangs I've seen but couldn't accurately reproduce.
> > > >>
> > > >> This didn't add much to your post, but I wanted to give you a +1
> > > >> for outlining this potential problem. Once I move to more robust
> > > >> hardware and I am in similar situations, I will post more verbose
> > > >> details from my side.
> > > >>
> > > >> John
> > > >>
> > > >> On Thu, Apr 7, 2016 at 2:29 AM, Abdel Hakim Deneche <
> > > >> [email protected]> wrote:
> > > >>
> > > >> > So, we've been seeing some queries hang, and I've come up with
> > > >> > a possible explanation, but so far it's really difficult to
> > > >> > reproduce. Let me know if you think this explanation doesn't
> > > >> > hold up or if you have any ideas how we can reproduce it. Thanks
> > > >> >
> > > >> > - generally it's a CTAS running on a large cluster (lots of
> > > >> > writers running in parallel)
> > > >> > - logs show that the user channel was closed and the UserServer
> > > >> > caused the root fragment to move to a FAILED state [1]
> > > >> > - jstack shows that the root fragment is blocked in its
> > > >> > receiver, waiting for data [2]
> > > >> > - jstack also shows that ALL other fragments are no longer
> > > >> > running, and the logs show that all of them succeeded [3]
> > > >> > - the foreman waits *forever* for the root fragment to finish
> > > >> >
> > > >> > [1] the only case I can think of is when the user channel
> > > >> > closed while the fragment was waiting for an ack from the user
> > > >> > client
> > > >> > [2] if a writer finishes earlier than the others, it will send
> > > >> > a data batch to the root fragment, which will be sent to the
> > > >> > user. The root will then immediately block on its receiver,
> > > >> > waiting for the remaining writers to finish
> > > >> > [3] once the root fragment moves to a failed state, the
> > > >> > receiver will immediately release any received batch and return
> > > >> > an OK to the sender without putting the batch in its blocking
> > > >> > queue.
> > > >> >
> > > >> > Abdelhakim Deneche
> > > >> > Software Engineer
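To tie the thread together: the hang also needs the race Sudheesh
described above, where the root's FAILED status is fully processed before
the Foreman's cancellation pass runs. A compact sketch of that pass,
under the same caveats as the sketches above (illustrative only, not
Drill's actual QueryManager; Fragment and its fields are stand-ins):

import java.util.List;

// Illustrative sketch of the cancellation race -- not Drill's actual
// QueryManager.
class CancellationRaceSketch {

  enum State { RUNNING, FAILED, CANCELLED, FINISHED }

  static class Fragment {
    volatile State state = State.RUNNING;
    volatile Thread executorThread;  // the fragment executor's thread
  }

  // Roughly what QueryManager#cancelExecutingFragments is described as
  // doing: only still-running fragments are cancelled and interrupted.
  static void cancelExecutingFragments(List<Fragment> fragments) {
    for (Fragment f : fragments) {
      if (f.state == State.RUNNING) {
        f.state = State.CANCELLED;
        Thread t = f.executorThread;
        if (t != null) {
          t.interrupt();  // wakes the executor out of a blocking receive
        }
      }
      // A fragment already marked FAILED is skipped: no interrupt. If it
      // is blocked in its receiver (see the enqueue sketch above), it
      // never reports a terminal state, and the Foreman waits forever.
    }
  }
}

All three pieces have to line up: a fail() that doesn't interrupt, a
cancellation pass that skips the already-failed fragment, and a receiver
that acks-and-drops incoming batches. Remove any one of them and the root
fragment either wakes up or eventually reports a terminal state.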