Hi Juan,

Thanks for the detailed analysis.

I did some digging too and cannot see why the QUIT signal might be getting
lost. I'll do some more investigation...

Just for reference, you can still use the 'jstack' tool to get thread dumps
although they just show on the terminal where you run the command.

Although I haven't confirmed, I think there may be a difference in how
stdout is handled, WRT log files, when a server or locator is started using
the Server/LocatorLauncher API (as opposed to directly with gfsh).

--Jens

On Wed, Dec 20, 2017 at 4:12 AM, Juan José Ramos <jra...@pivotal.io> wrote:

> Hello again,
>
> Regarding my last question from the original email (unformatted jetty
> logging when enabling --*redirect-output*), it looks like the offending
> library within the *geode-pulse.war* is *commons-logging-x.x.jar.* The
> library is configured as *providedCompile* in *build.gradle* and, by
> definition, this means it should be included in the *compile
> classpath* but *not
> included in the packaged war file*. This happens because *commons-logging*
> is
> a transient dependency of another direct pulse dependency, not configured
> as *providedCompile; *the fix is quite simple and it just implies
> adding *exclude
> module: 'commons-logging'* to this direct dependencies.
> That said, two questions remain unanswered:
>   - Any comments about the suggested approach to fix GEODE-4101 (deprecate
> old system properties, add new flag and set an internal system properties
> to ignore the old ones)?.
>   - Anyone knows if Geode is, somehow, catching the QUIT signal and
> preventing the thread dump from going to the member's log file?. Can you
> point me to the relevant section of code or class?.
>
> Best regards.
>
> On Wed, Dec 20, 2017 at 6:33 AM, Juan José Ramos <jra...@pivotal.io>
> wrote:
>
> > Hello Kirk,
> >
> > Thanks for your reply. Please find my answers below, in line for an
> easier
> > follow-up.
> >
> > *>> By default Geode redirects stdout and stderr to the value of the
> > log-file property when the Cache starts. If log-file is set to "" then it
> > won't redirect them.*
> >
> > This doesn't seem to be true at the moment, I've tested the behavior
> > before sending the email and creating the JIRA. If you start a basic
> server
> > with the defaults + *log-file*, the output written from a
> > function/listener to stdout/stderr is lost, nothing is printed on the
> > member's logs.
> >
> >
> > *>> So we reluctantly made the code redirect stdout and stderr when
> > LogWriterAppender is registered (at the same point in Cache startup where
> > the behavior originally existed). If you specify a custom log4j2.xml then
> > this behavior does NOT happen.*
> >
> > Also tested this one and it doesn't seem to be true, the result is the
> same
> > as before: the output written to *stderr/stdout* is lost, nothing is
> > printed on the member's logs.
> >
> >
> > *>> So a user currently has two options:*
> > *>> 1) keep the old behavior,*
> > *>> 2) use a custom log4j2.xml which can be specified using the Log4J 2
> > system property "log4j.configurationFile".*
> >
> > The attached zip file contains a reproducible scenario (you just need to
> > modify *setenv.txt* and execute *reproduce.sh*) showing that both
> > approaches fail, the output from *System.out.println()* and
> > *System.err.println()* is lost, along with the thread dump generated by
> *kill
> > -3 PID*.
> > Hope this is clear now, hope you can answer the questions from my
> previous
> > email so I can proceed with the fix.
> > Best regards
> >
> >
> > On Tue, Dec 19, 2017 at 2:56 PM, Kirk Lund <kl...@apache.org> wrote:
> >
> >> By default Geode redirects stdout and stderr to the value of the
> log-file
> >> property when the Cache starts. If log-file is set to "" then it won't
> >> redirect them.
> >>
> >> When we changed GemFire to use Log4J2 internally, we tried to leave
> stdout
> >> and stderr alone such that the LogWriterAppender would append log
> >> statements to the log-file and it would leave the stdout appender
> >> registered and appending to stdout. This was a change in behavior from
> the
> >> previous version of GemFire, and the management at that time decided
> that
> >> it needed to maintain the old behavior, so we reluctantly made the code
> >> redirect stdout and stderr when LogWriterAppender is registered (at the
> >> same point in Cache startup where the behavior originally existed).
> >>
> >> If you specify a custom log4j2.xml then this behavior does NOT happen.
> So
> >> a
> >> User currently has two options: 1) keep the old behavior, 2) use a
> custom
> >> log4j2.xml which can be specified using the Log4J 2 system property
> >> "log4j.configurationFile".
> >>
> >> If you copy geode-core/src/main/resources/log4j2.xml for use as the
> >> starting basis for your log4j2.xml then you must NOT keep this line (or
> >> set
> >> it to false):
> >>
> >> <Property name="geode-default">true</Property>
> >>
> >> That is the line that Geode uses to determine that it is using its own
> >> default log4j2.xml and this enables redirecting of stdout and stderr
> when
> >> the LogWriterAppender starts up.
> >>
> >> On Tue, Dec 19, 2017 at 7:11 AM, Ju@N <jujora...@gmail.com> wrote:
> >>
> >> > Hello Geode devs,
> >> >
> >> > Currently GEODE is "swallowing" all output sent to stdout and stderr
> by
> >> > default and there's no way of changing this behavior when starting
> >> members
> >> > through *gfsh*.
> >> > This, between other things, prevents users, between other things, from
> >> > playing around with *System.out.println()* during development phases
> and
> >> > getting thread dumps by executing a plain *kill -3* or *kill -QUIT*
> >> using
> >> > the processId, which is critical in troubleshooting.
> >> > Currently there are two internal flags that can be used to prevent
> this
> >> > default behavior, both have to be used at the same time and both are
> >> very
> >> > counterintuitive: *gemfire.OSProcess.ENABLE_OUTPUT_REDIRECTION=true*
> >> and
> >> > *gemfire.OSProcess.DISABLE_OUTPUT_REDIRECTION=false*. These flags,
> >> > however,
> >> > don't work when starting members through *gfsh*, and that's because
> the
> >> > relevant commands wrongly assume that the flags are already part of
> the
> >> > system properties too early in the lifecycle execution of the command:
> >> >
> >> >
> >> > *StartXXXXXCommand.java*
> >> >
> >> > @CliCommand(value = CliStrings.START_XXXXX, help =
> >> > CliStrings.START_XXXXX__HELP)
> >> >
> >> > @CliMetaData(shellOnly = true, relatedTopic =
> >> > {CliStrings.TOPIC_GEODE_XXXXX, CliStrings.TOPIC_GEODE_LIFECYCLE})
> >> > public Result startXXXXX(...) throws Exception {
> >> >         (...)
> >> >         final boolean redirectOutput =
> >> > Boolean.getBoolean(OSProcess.ENABLE_OUTPUT_REDIRECTION_PROPERTY);
> >> >     XXXXXLauncher.Builder serverXXXXXBuilder =
> >> >         new XXXXXLauncher.Builder()
> >> >         .setRedirectOutput(redirectOutput)
> >> >         (...)
> >> >
> >> > }
> >> >
> >> > At this stage during the execution, the system properties used when
> >> > starting the members haven't been fully parsed yet and the flags are
> >> only
> >> > present within the sun.java.command system property, so
> >> > *Boolean.getBoolean(OSProcess.ENABLE_OUTPUT_REDIRECTION_PROPERTY)*
> will
> >> > always return *false*. There's a JIRA created with this same
> >> description,
> >> > and I've started to work on a fix for it: GEODE-4101
> >> > <https://issues.apache.org/jira/browse/GEODE-4101>.
> >> >
> >> > The proposal would be to add a new flag, *--redirect-ouput*, to the
> >> start
> >> > commands in GFSH and deprecate the properties
> >> > *OSProcess.DISABLE_OUTPUT_REDIRECTION* and
> >> > *OSProcess.ENABLE_OUTPUT_REDIRECTION*. To avoid major code changes
> the
> >> > start commands will have this new flag as a parameter and will also
> set
> >> as
> >> > *true* a new internal system property
> >> > *OSProcess.DISABLE_REDIRECTION_CONFIGURATION* which, as it names
> >> implies,
> >> > will disable the other two properties when set. In the next major
> >> release,
> >> > the three properties should be deleted without major changes. Do you
> see
> >> > any flaws here?.
> >> >
> >> > I've tested these changes and the output from *System.out.println()*
> >> (from
> >> > a function or listener, as an example) goes to the member's log file
> as
> >> > expected. However, no matter what I do, I can't get the output from
> >> *kill
> >> > -3 / kill -QUIT*, nor can I find a place within the source code where
> >> this
> >> > signal is caught to explain why the thread dump is not printed in the
> >> > member's log file. Am I missing something?.
> >> >
> >> > Last, but not least, when redirecting *stdout/stderr* within a locator
> >> with
> >> > pulse embedded, all of the deploy steps get logged using a different
> >> format
> >> > (*this was being swallowed before*):
> >> >
> >> > ...
> >> > [info 2017/12/19 11:12:12.123 ART locator1 <main> tid=0x1]
> >> > Initializing Spring root WebApplicationContext
> >> > Dec 19, 2017 11:12:12 AM org.springframework.web.
> context.ContextLoader
> >> > initWebApplicationContext
> >> > INFO: Root WebApplicationContext: initialization started
> >> > Dec 19, 2017 11:12:12 AM
> >> > org.springframework.web.context.support.XmlWebApplicationContext
> >> > prepareRefresh
> >> > INFO: Refreshing Root WebApplicationContext: startup date [Tue Dec 19
> >> > 11:12:12 ART 2017]; root of context hierarchy
> >> > Dec 19, 2017 11:12:12 AM
> >> > org.springframework.beans.factory.xml.XmlBeanDefinitionReader
> >> > loadBeanDefinitions
> >> > INFO: Loading XML bean definitions from ServletContext resource
> >> > [/WEB-INF/mvc-dispatcher-servlet.xml]
> >> > Dec 19, 2017 11:12:12 AM
> >> > org.springframework.beans.factory.xml.XmlBeanDefinitionReader
> >> > loadBeanDefinitions
> >> > INFO: Loading XML bean definitions from ServletContext resource
> >> > [/WEB-INF/spring-security.xml]
> >> > ...
> >> >
> >> > This probably happens because jetty uses *StdErrLog* by default and
> >> > *log4j2* gets
> >> > reconfigured using the *log4j2.xml* file from pulse (ignoring the
> format
> >> > and options defined by *org.apache.geode.internal.
> logging.LogService*).
> >> > What would be the recommended approach here?, add *geode-core* as a
> >> compile
> >> > dependency of *geode-pulse *and directly use *LogService* instead of
> the
> >> > default *LogManager*?, define a custom *LogService* in *geode-pulse
> *to
> >> > check (JMX maybe?) whether there's a parent context defined already
> and
> >> use
> >> > it instead of *LogManager*?, tweak *JettyHelper* to, somehow, threat
> >> > *pulse* differently
> >> > and disable this deploy logging as it happens today?, leave it as it
> is
> >> and
> >> > create a new JIRA to address this separately (maybe moving the
> internal
> >> > logging to a separate module)?.
> >> > Best regards
> >> >
> >>
> >
> >
> >
> > --
> > Juan José Ramos Cassella
> > Senior Technical Support Engineer
> > Email: jra...@pivotal.io
> > Office#: +353 21 4238611
> > Mobile#: +353 87 2074066
> > After Hours Contact#: +1 877 477 2269
> > Office Hours: Mon - Thu 08:30 - 17:00 GMT. Fri 08:30 - 16:00 GMT
> > How to upload artifacts: https://support.pivotal.io/hc/
> > en-us/articles/204369073
> > How to escalate a ticket: https://support.pivotal.io/hc/
> > en-us/articles/203809556
> >
> > [image: support] <https://support.pivotal.io/> [image: twitter]
> > <https://twitter.com/pivotal> [image: linkedin]
> > <https://www.linkedin.com/company/3048967> [image: facebook]
> > <https://www.facebook.com/pivotalsoftware> [image: google plus]
> > <https://plus.google.com/+Pivotal> [image: youtube]
> > <https://www.youtube.com/playlist?list=PLAdzTan_
> eSPScpj2J50ErtzR9ANSzv3kl>
> >
>
>
>
> --
> Juan José Ramos Cassella
> Senior Technical Support Engineer
> Email: jra...@pivotal.io
> Office#: +353 21 4238611
> Mobile#: +353 87 2074066
> After Hours Contact#: +1 877 477 2269
> Office Hours: Mon - Thu 08:30 - 17:00 GMT. Fri 08:30 - 16:00 GMT
> How to upload artifacts:
> https://support.pivotal.io/hc/en-us/articles/204369073
> How to escalate a ticket:
> https://support.pivotal.io/hc/en-us/articles/203809556
>
> [image: support] <https://support.pivotal.io/> [image: twitter]
> <https://twitter.com/pivotal> [image: linkedin]
> <https://www.linkedin.com/company/3048967> [image: facebook]
> <https://www.facebook.com/pivotalsoftware> [image: google plus]
> <https://plus.google.com/+Pivotal> [image: youtube]
> <https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>
>

Reply via email to