On Tue, Oct 31, 2017 at 12:44 AM, Jamo Luhrsen <[email protected]> wrote:

> On 10/30/2017 01:29 PM, Tom Pantelis wrote:
> > On Mon, Oct 30, 2017 at 4:25 PM, Sam Hague <[email protected] <mailto:
> [email protected]>> wrote:
> >     On Mon, Oct 30, 2017 at 3:02 PM, Tom Pantelis <[email protected]
> <mailto:[email protected]>> wrote:
> >         On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger <
> [email protected] <mailto:[email protected]>> wrote:
> >
> >             Hi Sam,
> >
> >             On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague <
> [email protected] <mailto:[email protected]>> wrote:
> >
> >                 Stephen, Michael, Tom,
> >
> >                 do you have any way to collect debug information when ODL
> >                 crashes in CSIT?
> >
> >
> >             JVMs (almost) never "just crash" without a word. Either some code
> >             calls java.lang.System.exit(), which you may remember we do
> >             somewhere in the CDS/Akka code, or there's a bug in the JVM
> >             implementation, in which case there should be one of those JVM
> >             crash logs: a file named something like hs_err_pid22607.log in the
> >             "current working" directory. Where would that be on these CSIT
> >             runs, and are the CSIT JJB jobs set up to preserve such JVM crash
> >             log files and copy them over to logs.opendaylight.org
> >             <http://logs.opendaylight.org>?
> >
> >
> >         Akka will do System.exit() if it encounters an error serious enough
> >         for that, but it doesn't do it silently. However, I believe we
> >         disabled the automatic exiting in Akka.
> >
> >     Should there be any logs in ODL for this? There is nothing in the karaf
> >     log when this happens. It literally just stops.
> >
> >     The karaf.console log does say the karaf process was killed:
> >
> >     /tmp/karaf-0.7.1-SNAPSHOT/bin/karaf: line 422: 11528 Killed ${KARAF_EXEC} "${JAVA}" ${JAVA_OPTS} "$NON_BLOCKING_PRNG"
> >     -Djava.endorsed.dirs="${JAVA_ENDORSED_DIRS}" -Djava.ext.dirs="${JAVA_EXT_DIRS}"
> >     -Dkaraf.instances="${KARAF_HOME}/instances" -Dkaraf.home="${KARAF_HOME}" -Dkaraf.base="${KARAF_BASE}"
> >     -Dkaraf.data="${KARAF_DATA}" -Dkaraf.etc="${KARAF_ETC}" -Dkaraf.restart.jvm.supported=true
> >     -Djava.io.tmpdir="${KARAF_DATA}/tmp" -Djava.util.logging.config.file="${KARAF_BASE}/etc/java.util.logging.properties"
> >     ${KARAF_SYSTEM_OPTS} ${KARAF_OPTS} ${OPTS} "$@" -classpath "${CLASSPATH}" ${MAIN}
> >
> >     In the CSIT robot files we can see the connection errors below, so ODL
> >     is not responding to new requests. This plus the above leads me to think
> >     ODL just died.
> >
> >     [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None))
> >     after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection
> >     object at 0x5ca2d50>: Failed to establish a new connection: [Errno 111] Connection refused',)'
> >
> >
> >
> > That would seem to indicate something did a kill -9. As Michael said, if
> > the JVM crashed there would be an hs_err_pid file and it would log a
> > message about it.
>
> Yeah, this is where my money is as well. The OS must be dumping it because
> it's misbehaving. I'll try to hack the job to start collecting OS-level log
> info (e.g. journalctl).
>

JamO, do make sure you collect not just the OS-level logs but also the JVM's
hs_err_*.log file (if any); my bet is on a JVM crash rather than an OS-level one...
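
To make the collection step concrete, here is a rough sketch of what a job step could do: look for hs_err_pid*.log files in a set of candidate directories and copy whatever it finds into an archive directory. The function name and the directories are hypothetical; I don't know where the JVM's working directory actually is on the CSIT nodes, nor how the jobs archive files today.

```python
import glob
import os
import shutil


def collect_jvm_crash_logs(search_dirs, archive_dir):
    """Copy any hs_err_pid*.log files found in search_dirs into archive_dir.

    Returns the list of destination paths; the list is empty when the JVM
    left no crash log behind (which would itself point away from a JVM crash).
    """
    os.makedirs(archive_dir, exist_ok=True)
    copied = []
    for directory in search_dirs:
        # HotSpot names its fatal error log hs_err_pid<pid>.log by default.
        pattern = os.path.join(directory, "hs_err_pid*.log")
        for src in sorted(glob.glob(pattern)):
            dst = os.path.join(archive_dir, os.path.basename(src))
            shutil.copy2(src, dst)  # copy2 keeps the file's timestamps
            copied.append(dst)
    return copied
```

A job could call it with something like collect_jvm_crash_logs(["/tmp/karaf-0.7.1-SNAPSHOT"], archive_dir), where the first path is only a guess based on the karaf.console output quoted above.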

BTW: The most common fix ;) for JVM crashes is simply upgrading to the
latest available patch version of OpenJDK. I'm guessing/hoping we run from
RPM and already have the latest. Or is this possibly running on an older JVM
package that was somehow "held back" via special dnf instructions, or
manually installed from a ZIP?
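
One quick way to answer the version question would be to have the job record what `java -version` reports on the node. A minimal sketch, assuming `java` is on the PATH of the CSIT node (the helper names are mine, not anything that exists in the jobs):

```python
import re
import subprocess


def parse_java_version(version_output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def installed_java_version(java_cmd="java"):
    """Run `java -version` and return the parsed version string.

    `java -version` prints to stderr, so both streams are captured and
    searched.
    """
    result = subprocess.run(
        [java_cmd, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_version(result.stderr + result.stdout)
```

Comparing that string against the newest OpenJDK package available from the distro would tell us whether the node is behind on patch releases.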


> JamO
>
>
> >
> > _______________________________________________
> > controller-dev mailing list
> > [email protected]
> > https://lists.opendaylight.org/mailman/listinfo/controller-dev
> >
