In this specific case it's a FunctionService thread. GFSH is executing a
management query function against a region value with circular references,
which causes TypedJson to recurse until the thread's stack overflows.
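One common fix for this failure mode is to have the recursive writer track the objects on its current path and emit a marker on a back-reference instead of recursing. The tracking should use identity rather than equals(). The sketch below assumes a hypothetical CycleSafeJson class that is not TypedJson's API. It only handles Maps, Lists, numbers, booleans and strings, while TypedJson walks arbitrary region values, and it escapes only quotes and backslashes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical cycle-safe serializer for Maps, Lists and scalars; not TypedJson's API. */
final class CycleSafeJson {

  static String toJson(Object value) {
    StringBuilder out = new StringBuilder();
    // Identity-based set of containers on the current path. Using equals()/hashCode() would be
    // wrong here: a HashMap that contains itself overflows the stack in its own hashCode().
    Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
    write(value, out, onPath);
    return out.toString();
  }

  private static void write(Object value, StringBuilder out, Set<Object> onPath) {
    if (value == null) {
      out.append("null");
    } else if (value instanceof Number || value instanceof Boolean) {
      out.append(value);
    } else if (value instanceof Map || value instanceof List) {
      if (!onPath.add(value)) {
        // Already being serialized further up the stack: emit a marker instead of recursing.
        out.append("\"<circular reference>\"");
        return;
      }
      if (value instanceof Map) {
        out.append('{');
        boolean first = true;
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
          if (!first) {
            out.append(',');
          }
          first = false;
          writeString(String.valueOf(entry.getKey()), out);
          out.append(':');
          write(entry.getValue(), out, onPath);
        }
        out.append('}');
      } else {
        out.append('[');
        boolean first = true;
        for (Object item : (List<?>) value) {
          if (!first) {
            out.append(',');
          }
          first = false;
          write(item, out, onPath);
        }
        out.append(']');
      }
      // Leaving this container: a later sibling may share it without forming a cycle.
      onPath.remove(value);
    } else {
      // Everything else is written as a JSON string of its toString().
      writeString(value.toString(), out);
    }
  }

  // Minimal escaping (quotes and backslashes only); control characters are not handled.
  private static void writeString(String s, StringBuilder out) {
    out.append('"');
    for (char c : s.toCharArray()) {
      if (c == '"' || c == '\\') {
        out.append('\\');
      }
      out.append(c);
    }
    out.append('"');
  }
}
```

Consider a LinkedHashMap holding "name" -> "a" and "self" -> itself. For that map, toJson returns {"name":"a","self":"<circular reference>"}. The writer removes each container from the set on the way out. As a result, an object shared by two siblings is written twice rather than flagged, because only a true cycle is still on the current path.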

I'll focus on the recursion bug in TypedJson and leave SystemFailure alone
for now.

Thanks,
Kirk

On Wed, Sep 7, 2016 at 10:41 AM, Anthony Baker <aba...@pivotal.io> wrote:

> I’m a little concerned about trying to continue on when a thread has
> died.  What if the thread was important?  Can we prevent the StackOverflow
> in the first place?
>
> Anthony
>
> > On Sep 7, 2016, at 10:00 AM, Kirk Lund <kl...@pivotal.io> wrote:
> >
> > Here's the current behavior: the stack trace for a StackOverflowError is
> > logged, the thread dies, and then Geode closes its Cache and
> > DistributedSystem. If it's a Server process, then the process exits.
> >
> > The proposal is that Geode will still log the stack trace and the thread
> > will still die. The change is that we will not close the Cache and
> > DistributedSystem, so the Server process does not exit.
> >
> > The only way we would hit this situation today that I know of involves
> > writing a query that exposes a recursive JSON parsing bug in the TypedJson
> > class. I would alter SystemFailure to not shut down for a
> > StackOverflowError AND also fix the underlying bug in TypedJson which
> > results in a StackOverflowError. The fix for TypedJson may involve removing
> > it in favor of using Jackson for JSON parsing. I'll work on writing tests
> > that expose both of these issues -- in the meantime please let me know if
> > anyone has any feedback or opinions.
> >
> > Thanks,
> > Kirk
> >
> >
> > On Wed, Sep 7, 2016 at 9:44 AM, Kirk Lund <kl...@pivotal.io> wrote:
> >
> >> I'd like to change SystemFailure and calling code to not shut down for a
> >> java.lang.StackOverflowError.
> >>
> >> The existing behavior would be unchanged for these VirtualMachineErrors:
> >>
> >> java.lang.InternalError
> >> java.lang.OutOfMemoryError
> >> java.lang.UnknownError
> >> java.util.zip.ZipError
> >>
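The proposed rule can be sketched as a small predicate applied at the catch site. FailurePolicy, shouldInitiateSystemFailure and runGuarded are hypothetical names. The Consumer parameter stands in for SystemFailure.initiateFailure so the sketch compiles on its own, and the sketch does not reflect how SystemFailure is actually structured. java.util.zip.ZipError extends InternalError, so a single StackOverflowError check leaves every error in the list above on the existing path.

```java
import java.util.function.Consumer;

final class FailurePolicy {

  /**
   * Hypothetical helper: StackOverflowError is exempted from the shutdown path; every other
   * VirtualMachineError (InternalError and its subclass java.util.zip.ZipError,
   * OutOfMemoryError, UnknownError) keeps the existing behavior.
   */
  static boolean shouldInitiateSystemFailure(VirtualMachineError e) {
    return !(e instanceof StackOverflowError);
  }

  /**
   * Hypothetical catch site. initiateFailure stands in for SystemFailure.initiateFailure(e);
   * logging of the stack trace is unchanged and omitted here.
   */
  static void runGuarded(Runnable task, Consumer<VirtualMachineError> initiateFailure) {
    try {
      task.run();
    } catch (VirtualMachineError e) {
      if (shouldInitiateSystemFailure(e)) {
        initiateFailure.accept(e); // existing path: close Cache and DistributedSystem
      }
      throw e; // rethrown either way, so the thread still dies
    }
  }
}
```

The error is rethrown in both branches, so under this proposal the thread that hit the StackOverflowError still dies. Only the Cache and DistributedSystem shutdown is skipped.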
> >> Thoughts or concerns?
> >>
> >> Thanks,
> >> Kirk
> >>
> >>
> >> On Fri, Sep 2, 2016 at 2:55 PM, Kirk Lund <kl...@apache.org> wrote:
> >>
> >>> The Geode codebase currently includes the component SystemFailure, which
> >>> is initiated when code catches any instance of VirtualMachineError:
> >>>
> >>>      try {
> >>>        // ... guarded work ...
> >>>      } catch (VirtualMachineError e) {
> >>>        SystemFailure.initiateFailure(e);
> >>>        throw e;
> >>>      }
> >>>
> >>> SystemFailure will ultimately react by closing the DistributedSystem and
> >>> Cache (i.e., shutting down the server). The original reason was to close
> >>> the Cache in the event of an OutOfMemoryError to prevent Cache
> >>> inconsistency from one member to another.
> >>>
> >>> There are additional types of VirtualMachineError besides
> >>> OutOfMemoryError. Does it really make sense to initiate SystemFailure for
> >>> all other types, including StackOverflowError?
> >>>
> >>> GFSH starts all processes with a flag indicating that OutOfMemoryError
> >>> should result in shutdown. It specifies
> >>> "-XX:OnOutOfMemoryError=taskkill /F /PID %p" for HotSpot on Windows,
> >>> "-XX:OnOutOfMemoryError=kill -KILL %p" for HotSpot on all other
> >>> platforms, "-Xcheck:memory" on J9, or "-XXexitOnOutOfMemory" on JRockit.
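The flag choice described above amounts to a small lookup. OomKillOption and VmFamily are hypothetical names, and the sketch deliberately leaves out detecting which VM is running. It also shows one practical detail. The HotSpot values contain spaces, so a launcher that builds the child's command line as a list, as ProcessBuilder does, must pass the whole option as a single element rather than splitting it on whitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class OomKillOption {

  /** VM families named in the thread; runtime detection is out of scope for this sketch. */
  enum VmFamily { HOTSPOT, J9, JROCKIT }

  /** Returns the exit-on-OutOfMemoryError option listed in the thread for a VM family and OS. */
  static String forVm(VmFamily vm, boolean windows) {
    switch (vm) {
      case HOTSPOT:
        // HotSpot substitutes the process id for %p when it runs the command.
        return windows
            ? "-XX:OnOutOfMemoryError=taskkill /F /PID %p"
            : "-XX:OnOutOfMemoryError=kill -KILL %p";
      case J9:
        return "-Xcheck:memory";
      case JROCKIT:
        return "-XXexitOnOutOfMemory";
      default:
        throw new IllegalArgumentException("unknown VM family: " + vm);
    }
  }

  /** Builds an argv list; the option with spaces stays ONE element. mainClass is a placeholder. */
  static List<String> commandLine(String javaExe, VmFamily vm, boolean windows, String mainClass) {
    List<String> cmd = new ArrayList<>();
    cmd.add(javaExe);
    cmd.add(forVm(vm, windows));
    cmd.add(mainClass);
    return cmd;
  }
}
```

Passing that list to new ProcessBuilder(...) hands "-XX:OnOutOfMemoryError=kill -KILL %p" to the child JVM as a single argument, which is what HotSpot expects.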
> >>>
> >>> Given that the above flag should terminate the process on
> >>> OutOfMemoryError, are we now able to remove SystemFailure from Geode?
> >>> Opinions?
> >>>
> >>> Thanks,
> >>> Kirk
> >>>
> >>>
> >>
>
>
