Here's the current behavior: the stack trace for a StackOverflowError is
logged, the thread dies, and then Geode closes its Cache and
DistributedSystem. If it's a Server process, then the process exits.

The proposal is that Geode will still log the stack trace and the thread will
still die. The change is that we will not close the Cache and DistributedSystem,
so the Server process does not exit.
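A minimal sketch of the proposed policy, assuming a hypothetical helper `shouldInitiateFailure` that calling code would consult before invoking `SystemFailure.initiateFailure(e)`. The class name, the helper, and the `initiated` list that stands in for SystemFailure are illustrative only, not Geode's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipError;

/** Hypothetical illustration of the proposed StackOverflowError exemption. */
public class FailurePolicySketch {

  // Stand-in for SystemFailure.initiateFailure(e): records each error that
  // would have closed the Cache and DistributedSystem.
  static final List<VirtualMachineError> initiated = new ArrayList<>();

  /** Returns true if the error should still shut down the Cache and DistributedSystem. */
  static boolean shouldInitiateFailure(VirtualMachineError e) {
    // Under the proposal, a StackOverflowError is only logged and its thread
    // dies; every other VirtualMachineError keeps the existing behavior.
    return !(e instanceof StackOverflowError);
  }

  static void handle(VirtualMachineError e) {
    if (shouldInitiateFailure(e)) {
      initiated.add(e);
    }
  }

  public static void main(String[] args) {
    handle(new StackOverflowError());
    handle(new OutOfMemoryError());
    handle(new InternalError());
    handle(new UnknownError());
    handle(new ZipError("corrupt archive")); // ZipError extends InternalError
    System.out.println(initiated.size()); // prints 4
  }
}
```

In calling code, the existing `catch (VirtualMachineError e)` block would call `SystemFailure.initiateFailure(e)` only when the helper returns true and would still rethrow, so the thread dies in either case.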

The only way I know of to hit this situation today is a query that exposes a
recursive JSON parsing bug in the TypedJson class. I would change SystemFailure
to not shut down for a StackOverflowError AND also fix the underlying bug in
TypedJson that causes the StackOverflowError. The fix for TypedJson may involve
replacing it with Jackson for JSON parsing. I'll work on writing tests that
expose both of these issues; in the meantime, please let me know if anyone has
any feedback or opinions.
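To illustrate the class of bug (this is not TypedJson's code, and not necessarily the planned fix, which may be a switch to Jackson), here is a hypothetical recursive serializer for nested lists with an explicit depth limit. `DepthGuardSketch`, `toJson`, `nested`, and the `MAX_DEPTH` value of 1000 are assumptions. Without the check, deep enough nesting would overflow the stack; with it, the failure is an ordinary exception that a test can assert on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical depth-limited recursive serializer (illustration only). */
public class DepthGuardSketch {

  static final int MAX_DEPTH = 1000; // assumed limit, chosen arbitrarily

  static String toJson(Object value, int depth) {
    // Fail with a normal exception long before the thread stack is exhausted.
    if (depth > MAX_DEPTH) {
      throw new IllegalArgumentException("nesting deeper than " + MAX_DEPTH);
    }
    if (value instanceof List<?>) {
      List<?> list = (List<?>) value;
      StringBuilder sb = new StringBuilder("[");
      for (int i = 0; i < list.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(toJson(list.get(i), depth + 1));
      }
      return sb.append(']').toString();
    }
    return String.valueOf(value);
  }

  static String toJson(Object value) {
    return toJson(value, 0);
  }

  /** Builds a list nested n levels deep, e.g. nested(2) is [[[]]]. */
  static List<Object> nested(int n) {
    List<Object> root = new ArrayList<>();
    List<Object> current = root;
    for (int i = 0; i < n; i++) {
      List<Object> child = new ArrayList<>();
      current.add(child);
      current = child;
    }
    return root;
  }

  public static void main(String[] args) {
    System.out.println(toJson(List.of(1, List.of(2, 3)))); // prints [1,[2,3]]
    try {
      toJson(nested(100_000));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```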

Thanks,
Kirk


On Wed, Sep 7, 2016 at 9:44 AM, Kirk Lund <kl...@pivotal.io> wrote:

> I'd like to change SystemFailure and its calling code to not shut down for a
> java.lang.StackOverflowError.
>
> The existing behavior would be unchanged for these VirtualMachineErrors:
>
> java.lang.InternalError
> java.lang.OutOfMemoryError
> java.lang.UnknownError
> java.util.zip.ZipError
>
> Thoughts or concerns?
>
> Thanks,
> Kirk
>
>
> On Fri, Sep 2, 2016 at 2:55 PM, Kirk Lund <kl...@apache.org> wrote:
>
>> The Geode codebase currently includes the component SystemFailure, which
>> is invoked whenever a VirtualMachineError is caught:
>>
>>       try {
>>         // ... operation that may throw ...
>>       } catch (VirtualMachineError e) {
>>         SystemFailure.initiateFailure(e);
>>         throw e;
>>       }
>>
>> SystemFailure will ultimately react by closing the DistributedSystem and
>> Cache (i.e., shutting down the server). The original reason was to close the
>> Cache in the event of an OutOfMemoryError, to prevent the Cache from becoming
>> inconsistent from one member to another.
>>
>> There are other types of VirtualMachineError besides OutOfMemoryError.
>> Does it really make sense to initiate SystemFailure for all of them,
>> including StackOverflowError?
>>
>> GFSH starts all processes with a JVM flag specifying that an
>> OutOfMemoryError should terminate the process. It passes
>> "-XX:OnOutOfMemoryError=taskkill /F /PID %p" for HotSpot on Windows,
>> "-XX:OnOutOfMemoryError=kill -KILL %p" for HotSpot on all other platforms,
>> "-Xcheck:memory" on J9, or "-XXexitOnOutOfMemory" on JRockit.
>>
>> Given that these flags should terminate the process on OutOfMemoryError,
>> can we now remove SystemFailure from Geode entirely? Opinions?
>>
>> Thanks,
>> Kirk
>>
>>
>
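
The per-JVM OutOfMemoryError flags listed in the Sep 2 message could be selected along these lines. This is a hypothetical sketch, not GFSH's actual implementation: `OomFlagSketch`, the `Jvm` enum, and `oomFlag` are illustrative names, and only the flag strings themselves come from the message.

```java
import java.util.Locale;

/** Hypothetical selection of the OutOfMemoryError flag per JVM and OS. */
public class OomFlagSketch {

  enum Jvm { HOTSPOT, J9, JROCKIT }

  static String oomFlag(Jvm jvm, String osName) {
    switch (jvm) {
      case J9:
        return "-Xcheck:memory";
      case JROCKIT:
        return "-XXexitOnOutOfMemory";
      case HOTSPOT:
      default:
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        // HotSpot substitutes the process id for %p when it runs the command.
        return windows
            ? "-XX:OnOutOfMemoryError=taskkill /F /PID %p"
            : "-XX:OnOutOfMemoryError=kill -KILL %p";
    }
  }

  public static void main(String[] args) {
    System.out.println(oomFlag(Jvm.HOTSPOT, System.getProperty("os.name")));
  }
}
```

Passing each flag as a single element of the launched JVM's argument list (for example via `ProcessBuilder`) means the spaces inside the HotSpot values need no extra shell quoting.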
