On Tue, Oct 31, 2017 at 12:44 AM, Jamo Luhrsen <[email protected]> wrote:
> On 10/30/2017 01:29 PM, Tom Pantelis wrote:
> > On Mon, Oct 30, 2017 at 4:25 PM, Sam Hague <[email protected]> wrote:
> > On Mon, Oct 30, 2017 at 3:02 PM, Tom Pantelis <[email protected]> wrote:
> > On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger <[email protected]> wrote:
> >
> > Hi Sam,
> >
> > On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague <[email protected]> wrote:
> >
> > Stephen, Michael, Tom,
> >
> > do you have any ways to collect debugs when ODL crashes in CSIT?
> >
> >
> > JVMs (almost) never "just crash" without a word... either some code does java.lang.System.exit(), which you may
> > remember we do in the CDS/Akka code somewhere, or there's a bug in the JVM implementation - in which case there
> > should be a one of those JVM crash logs type things - a file named something like hs_err_pid22607.log in the
> > "current working" directory. Where would that be on these CSIT runs, and are the CSIT JJB jobs set up to preserve
> > such JVM crash log files and copy them over to logs.opendaylight.org <http://logs.opendaylight.org> ?
> >
> >
> > Akka will do System.exit() if it encounters an error serious for that. But it doesn't do it silently. However I
> > believe we disabled the automatic exiting in akka.
> >
> > Should there be any logs in ODL for this? There is nothing in the karaf log when this happens. It literally just stops.
> >
> > The karaf.console log does say the karaf process was killed:
> >
> > /tmp/karaf-0.7.1-SNAPSHOT/bin/karaf: line 422: 11528 Killed ${KARAF_EXEC} "${JAVA}" ${JAVA_OPTS} "$NON_BLOCKING_PRNG"
> > -Djava.endorsed.dirs="${JAVA_ENDORSED_DIRS}" -Djava.ext.dirs="${JAVA_EXT_DIRS}"
> > -Dkaraf.instances="${KARAF_HOME}/instances" -Dkaraf.home="${KARAF_HOME}" -Dkaraf.base="${KARAF_BASE}"
> > -Dkaraf.data="${KARAF_DATA}" -Dkaraf.etc="${KARAF_ETC}" -Dkaraf.restart.jvm.supported=true
> > -Djava.io.tmpdir="${KARAF_DATA}/tmp" -Djava.util.logging.config.file="${KARAF_BASE}/etc/java.util.logging.properties"
> > ${KARAF_SYSTEM_OPTS} ${KARAF_OPTS} ${OPTS} "$@" -classpath "${CLASSPATH}" ${MAIN}
> >
> > In the CSIT robot files we can see the below connection errors so ODL is not responding to new requests. This plus the
> > above lead to think ODL just died.
> >
> > [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by
> >
> > 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x5ca2d50>: Failed to establish a new
> > connection: [Errno 111] Connection refused',)'
> >
> >
> >
> > That would seem to indicate something did a kill -9. As Michael said, if the JVM crashed there would be an hs_err_pid file
> > and it would log a message about it
>
> yeah, this is where my money is at as well. The OS must be dumping it because it's
> misbehaving. I'll try to hack the job to start collecting os level log info (e.g. journalctl, etc)

JamO, do make sure you collect not just the OS-level logs but also the JVM's
hs_err_*.log file (if any); my bet is on a JVM crash rather than an OS-level
kill...

BTW: the most common fix ;) for JVM crashes is often simply upgrading to the
latest available patch version of OpenJDK, but I'm guessing/hoping we run from
RPM and already have the latest. Or is this possibly running on an older JVM
package that was somehow "held back" via special dnf instructions, or manually
installed from a ZIP, that kind of thing?

> JamO
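To make the "collect both" request concrete, here is a minimal sketch of what such a collection step could look like. It is only an illustration, not what the CSIT jobs actually do: the function names, the output file name kernel-kill-evidence.txt and the directories to search are all hypothetical, and it assumes that the Linux OOM killer is one plausible (but unconfirmed) source of the "Killed" message.

```python
import shutil
import subprocess
from pathlib import Path

# Substrings the Linux kernel typically logs when its OOM killer acts.
# Assumption: the OOM killer is only one possible sender of the SIGKILL.
OOM_MARKERS = ("oom-killer", "Out of memory", "Killed process")


def find_hs_err_logs(roots):
    """Return sorted hs_err_pid*.log files found anywhere under the given directories."""
    found = []
    for root in map(Path, roots):
        if root.is_dir():  # silently skip directories that do not exist
            found.extend(root.rglob("hs_err_pid*.log"))
    return sorted(found)


def filter_kill_evidence(lines):
    """Keep only the log lines that contain one of the OOM markers."""
    return [line for line in lines if any(m in line for m in OOM_MARKERS)]


def kernel_log_lines():
    """Read kernel messages via journalctl, or return [] if it is not installed."""
    if shutil.which("journalctl") is None:
        return []
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()


def collect(roots, dest):
    """Copy JVM crash logs and any OOM-related kernel lines into dest (hypothetical layout)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = [shutil.copy2(log, dest / log.name) for log in find_hs_err_logs(roots)]
    evidence = filter_kill_evidence(kernel_log_lines())
    (dest / "kernel-kill-evidence.txt").write_text("\n".join(evidence))
    return copied, evidence
```

Something like collect(["/tmp/karaf-0.7.1-SNAPSHOT"], "crash-logs") would then cover both theories in one pass: an hs_err file points at the JVM, while kernel lines naming the java PID point at the OS.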
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
