Hi Chaitanya,
The bug you mentioned is already fixed in the latest version. The fix
for Jira APEXCORE-130
<https://issues.apache.org/jira/browse/APEXCORE-130> handles
this issue as well.
Please try again with the latest changes from master.
The commit with the fix is 139a9cac6397948bb63a53ea80188f2ffd6e5da2
Thanks!
Isha
On Thu, Mar 3, 2016 at 5:26 AM, Chaitanya Chebolu <[email protected]> wrote:
> Thanks Isha for analyzing the issue.
>
> I am adding your analysis to the JIRA.
>
> I observed one more issue in THREAD_LOCAL.
>
> Let the DAG be as follows:
> A -> B -> C
>
> Where A, B, and C are operators, and B and C are THREAD_LOCAL.
>
>
> If the downstream operator (i.e. Operator C) throws an exception from the
> main thread, the application master catches the exception and kills the
> container. A new container is allocated for operators B and C. B is
> redeployed into the newly allocated container and its status is ACTIVE,
> but C is not redeployed.
>
> After redeployment of Operator B, the DAG is as follows:
> A -> B.
>
> I looked into the Stram logs and observed the following message:
> "INFO com.datatorrent.stram.StreamingContainerManager: Affected operators
> [PTOperator[id=2,name=B]]".
>
> I think this is the issue: Operator C is not among the affected
> operators.
>
> I created a sample application that reproduces this issue. It is here
> <https://github.com/chaithu14/AppThreadLocal/tree/theadBranch>.
>
> @Isha: Have you observed the same behavior?
>
> I am creating a JIRA for this issue.
>
> Regards,
> Chaitanya
>
> On Wed, Mar 2, 2016 at 9:34 AM, Sandeep Deshmukh <[email protected]>
> wrote:
>
> > Great finding, Isha.
> >
> > In general, it is always advisable to do things in the main thread. We
> > had some timing issues in dtIngest because we were emitting tuples in
> > the Reconciler thread. Once we moved all emit statements to the main
> > thread, no issues were observed.
> >
> > Issue: When tuples were emitted in the Reconciler thread, some of them
> > were emitted after endWindow but before checkpointing was done. Such
> > tuples are not guaranteed to reach the downstream operator in the same
> > window. The checkpoints of the two operators are then out of sync, which
> > could result in a few tuples being replayed wrongly from the
> > Reconciler-based operator.
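
A minimal sketch of the "emit only from the main thread" approach described
above, in plain Java. This is not the Apex or dtIngest code: the class
MainThreadEmitSketch, the method onWorkerResult, and the list standing in for
an output port are all hypothetical names used for illustration. The worker
thread only hands tuples over through a thread-safe queue, and the main
thread drains that queue inside its own endWindow-style callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for an operator; it does not use the Apex API.
class MainThreadEmitSketch {
    // Tuples finished by the worker thread, waiting to be emitted.
    private final Queue<String> ready = new ConcurrentLinkedQueue<>();
    // Stand-in for an output port; only the main thread touches it.
    private final List<String> emitted = new ArrayList<>();

    // Called from the worker thread: hand the tuple over, never emit here.
    void onWorkerResult(String tuple) {
        ready.add(tuple);
    }

    // Called from the main thread at the end of each window:
    // drain whatever the worker has finished and emit it now.
    void endWindow() {
        String tuple;
        while ((tuple = ready.poll()) != null) {
            emitted.add(tuple);
        }
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        MainThreadEmitSketch op = new MainThreadEmitSketch();
        Thread worker = new Thread(() -> {
            op.onWorkerResult("a");
            op.onWorkerResult("b");
        });
        worker.start();
        worker.join();
        op.endWindow(); // emission happens on the main thread only
        System.out.println(op.emitted());
    }
}
```

With this shape, nothing is emitted between endWindow and the checkpoint,
because the worker thread never emits on its own.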
> >
> > Regards,
> > Sandeep
> >
> > On Wed, Mar 2, 2016 at 8:57 AM, Isha Arkatkar <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I checked the application https://github.com/chaithu14/AppThreadLocal
> > >
> > > In this example, the exception from the downstream operator is thrown
> > > in a different thread in the AbstractReconciler operator, and the
> > > rethrow to the main operator thread is done in handleIdleTime. This
> > > function is not guaranteed to be invoked in every window. With
> > > THREAD_LOCAL locality, I checked that handleIdleTime did not get
> > > invoked, so the exception did not get rethrown.
> > >
> > > Exceptions thrown from a thread other than the main operator thread
> > > are not caught by the Application Master. We can probably add a note
> > > to the troubleshooting guide recommending a rethrow in the main thread.
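
A rough sketch of such a rethrow, in plain Java. It does not use the Apex
API; the class RethrowSketch and its methods startWorker and endWindow are
hypothetical names, and endWindow simply stands for any callback that the
main operator thread is guaranteed to run in every window. The worker thread
records its failure instead of losing it, and the main thread rethrows it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an operator; it does not use the Apex API.
class RethrowSketch {
    // First failure seen by the worker thread, or null if none.
    private final AtomicReference<Throwable> workerFailure = new AtomicReference<>();

    // Runs the task on a separate thread and records any failure.
    Thread startWorker(Runnable task) {
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                workerFailure.compareAndSet(null, t); // keep only the first failure
            }
        }, "worker");
        worker.start();
        return worker;
    }

    // Called from the main thread in every window: rethrow a recorded
    // worker failure so that it surfaces on the main thread.
    void endWindow() {
        Throwable t = workerFailure.get();
        if (t != null) {
            throw new RuntimeException("worker thread failed", t);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RethrowSketch op = new RethrowSketch();
        Thread worker = op.startWorker(() -> {
            throw new IllegalStateException("boom");
        });
        worker.join();
        try {
            op.endWindow();
        } catch (RuntimeException e) {
            System.out.println("main thread saw: " + e.getCause().getMessage());
        }
    }
}
```

The point of the sketch is only that the check runs in a per-window callback
on the main thread, rather than in a callback that may never be invoked.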
> > >
> > > I verified that if the downstream operator throws an exception in the
> > > main thread, it is caught appropriately by the application master,
> > > even in the thread-local case.
> > >
> > > Thanks,
> > > Isha
> > >
> > > On Thu, Feb 25, 2016 at 9:57 PM, Chaitanya Chebolu <
> > > [email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I created a sample application for the THREAD_LOCAL issue. The
> > > > application is here <https://github.com/chaithu14/AppThreadLocal>.
> > > > The application has the following DAG:
> > > >
> > > > RandomEventGenerator -> OutputOperator.
> > > >
> > > > Both operators are THREAD_LOCAL.
> > > >
> > > > OutputOperator throws an exception at every committed window, so the
> > > > AppMaster is supposed to kill the container at every committed window.
> > > > This is the expected behavior, but it is not happening with the
> > > > current Apex.
> > > >
> > > > One more observation: if the upstream operator throws an exception
> > > > at every committed window, then the AppMaster kills the container
> > > > continuously. This does not happen with the downstream operator.
> > > >
> > > > Created JIRA for this issue: APEXCORE-357
> > > >
> > > > Regards,
> > > > Chaitanya
> > > >
> > > > On Thu, Feb 25, 2016 at 12:36 PM, Chaitanya Chebolu <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi ,
> > > > >
> > > > > I am facing issues with THREAD_LOCAL. I have two operators that are
> > > > > thread local, and the downstream operator throws exceptions, but the
> > > > > AppMaster is not catching those exceptions. I was unable to figure
> > > > > out why the application is not working.
> > > > > If the two operators are deployed in different containers, then the
> > > > > container is killed continuously by the AppMaster. This is the
> > > > > expected behavior.
> > > > >
> > > > > For example, let the DAG be op1 -> op2, where op1 and op2 are two
> > > > > operators that are both thread local. When the downstream operator
> > > > > op2 throws an exception, the AppMaster does not catch it. I will
> > > > > create a JIRA for this issue. Could someone please help with this?
> > > > >
> > > > > Regards,
> > > > > Chaitanya
> > > > >
> > > >
> > >
> >
>