I’m a little concerned about trying to continue when a thread has died. What if the thread was important? Can we prevent the StackOverflowError in the first place?
Anthony

> On Sep 7, 2016, at 10:00 AM, Kirk Lund <kl...@pivotal.io> wrote:
>
> Here's the current behavior: the stack trace for a StackOverflowError is
> logged, the thread dies, and then Geode closes its Cache and
> DistributedSystem. If it's a Server process, then the process exits.
>
> The proposal is to still have Geode log the stack trace and the thread will
> die. The change is that we will not close the Cache and DistributedSystem
> so that the Server process does not exit.
>
> The only way we would hit this situation today that I know of involves
> writing a query that exposes a recursive json parsing bug in the TypedJson
> class. I would alter SystemFailure to not shutdown for a StackOverflowError
> AND also fix the underlying bug in TypedJson which results in a
> StackOverflowError. The fix for TypedJson may involve removing it in favor
> of using Jackson for json parsing. I'll work on writing tests that expose
> both of these issues -- in the meantime please let me know if anyone has
> any feedback or opinions.
>
> Thanks,
> Kirk
>
>
> On Wed, Sep 7, 2016 at 9:44 AM, Kirk Lund <kl...@pivotal.io> wrote:
>
>> I'd like to change SystemFailure and calling code to not shutdown for a
>> java.lang.StackOverflowError.
>>
>> The existing behavior would be unchanged for these VirtualMachineErrors:
>>
>> java.lang.InternalError
>> java.lang.OutOfMemoryError
>> java.lang.UnknownError
>> java.util.zip.ZipError
>>
>> Thoughts or concerns?
>>
>> Thanks,
>> Kirk
>>
>>
>> On Fri, Sep 2, 2016 at 2:55 PM, Kirk Lund <kl...@apache.org> wrote:
>>
>>> The Geode codebase currently includes the component SystemFailure which
>>> is initiated by any instance of VirtualMachineError:
>>>
>>>     } catch (VirtualMachineError e) {
>>>       SystemFailure.initiateFailure(e);
>>>       throw e;
>>>
>>> SystemFailure will ultimately react by closing the DistributedSystem and
>>> Cache (ie, shutdown the server). The original reason was to close the Cache
>>> in the event of an OutOfMemoryError to prevent Cache inconsistency from one
>>> member to another.
>>>
>>> There are additional types of VirtualMachineError besides
>>> OutOfMemoryError. Does it really make sense to initiate SystemFailure for
>>> all other types including StackOverflowError?
>>>
>>> GFSH starts all processes with a flag indicating that OutOfMemoryError
>>> should result in shutdown. It specifies "-XX:OnOutOfMemoryError=taskkill
>>> /F /PID %p" for HotSpot on Windows, "-XX:OnOutOfMemoryError=kill -KILL %p"
>>> for HotSpot on all other platforms, "-Xcheck:memory" on J9
>>> or "-XXexitOnOutOfMemory" on JRockit.
>>>
>>> Given that the above flag should terminate the process on
>>> OutOfMemoryError, are we now able to delete and remove SystemFailure from
>>> Geode? Opinions?
>>>
>>> Thanks,
>>> Kirk
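
To make the proposal quoted above concrete, here is a minimal Java sketch of treating StackOverflowError differently from the other VirtualMachineErrors. It is not Geode's actual code: FailurePolicySketch, requiresShutdown, guarded, and the local initiateFailure stand-in (which only records the error rather than closing the Cache and DistributedSystem) are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Geode codebase.
final class FailurePolicySketch {

    // Stand-in for SystemFailure.initiateFailure(e): records the error
    // instead of closing the Cache and DistributedSystem.
    static final List<VirtualMachineError> initiated = new ArrayList<>();

    static void initiateFailure(VirtualMachineError e) {
        initiated.add(e);
    }

    // Assumed policy from the proposal: every VirtualMachineError except
    // StackOverflowError still triggers shutdown.
    static boolean requiresShutdown(VirtualMachineError e) {
        return !(e instanceof StackOverflowError);
    }

    // Wraps a unit of work the way the existing catch block does, but
    // consults the policy before initiating failure.
    static void guarded(Runnable task) {
        try {
            task.run();
        } catch (VirtualMachineError e) {
            if (requiresShutdown(e)) {
                initiateFailure(e);
            } else {
                System.err.println("Logged without shutdown: " + e);
            }
            throw e; // the calling thread still dies, as in the proposal
        }
    }

    public static void main(String[] args) {
        try {
            guarded(() -> { throw new StackOverflowError("simulated"); });
        } catch (StackOverflowError expected) {
            System.out.println("shutdowns initiated: " + initiated.size());
        }
        try {
            guarded(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (OutOfMemoryError expected) {
            System.out.println("shutdowns initiated: " + initiated.size());
        }
    }
}
```

The error is rethrown in both branches, so this only changes whether shutdown is initiated; it does not address the concern about continuing after an important thread has died.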