On Mon, Oct 30, 2017 at 3:02 PM, Tom Pantelis <[email protected]> wrote:

>
>
> On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger <[email protected]>
> wrote:
>
>> Hi Sam,
>>
>> On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague <[email protected]> wrote:
>>
>>> Stephen, Michael, Tom,
>>>
>>> do you have any way to collect debug information when ODL crashes in CSIT?
>>>
>>
>> JVMs (almost) never "just crash" without a word. Either some code
>> calls java.lang.System.exit(), which, as you may remember, we do somewhere
>> in the CDS/Akka code, or there's a bug in the JVM implementation, in which
>> case there should be one of those JVM crash logs: a file named something
>> like hs_err_pid22607.log in the current working directory.
>> Where would that be on these CSIT runs, and are the CSIT JJB jobs set up to
>> preserve such JVM crash log files and copy them over to
>> logs.opendaylight.org?
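
A minimal sketch of a post-run step that would preserve such files. It makes two assumptions, neither confirmed for the current CSIT jobs. First, the directories to search are known: the JVM's working directory, or whatever path the HotSpot -XX:ErrorFile option points at. Second, the job uploads a hypothetical archive directory after the run.

```python
import glob
import os
import shutil


def collect_jvm_crash_logs(search_dirs, archive_dir):
    """Copy any HotSpot crash logs (hs_err_pid*.log) into archive_dir.

    search_dirs and archive_dir are hypothetical: point them at the JVM's
    working directory (or the -XX:ErrorFile location) and at a directory
    the CI job uploads after the run. Returns the copied paths.
    """
    os.makedirs(archive_dir, exist_ok=True)
    copied = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "hs_err_pid*.log")
        for path in sorted(glob.glob(pattern)):
            dest = os.path.join(archive_dir, os.path.basename(path))
            # copy2 keeps timestamps, which helps line the crash up with karaf.log
            shutil.copy2(path, dest)
            copied.append(dest)
    return copied
```

For example, a job could call collect_jvm_crash_logs([karaf_home], "archives/jvm-crash") unconditionally at the end of the run. The call is cheap when no crash log exists.
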
>>
>
> Akka will call System.exit() if it encounters a sufficiently serious
> error, but it doesn't do so silently. However, I believe we disabled the
> automatic exiting in Akka.
>
Should there be any logs in ODL for this? There is nothing in the karaf log
when this happens. It literally just stops.

The karaf.console log does say the karaf process was killed:

/tmp/karaf-0.7.1-SNAPSHOT/bin/karaf: line 422: 11528 Killed ${KARAF_EXEC}
"${JAVA}" ${JAVA_OPTS} "$NON_BLOCKING_PRNG"
-Djava.endorsed.dirs="${JAVA_ENDORSED_DIRS}"
-Djava.ext.dirs="${JAVA_EXT_DIRS}"
-Dkaraf.instances="${KARAF_HOME}/instances" -Dkaraf.home="${KARAF_HOME}"
-Dkaraf.base="${KARAF_BASE}" -Dkaraf.data="${KARAF_DATA}"
-Dkaraf.etc="${KARAF_ETC}" -Dkaraf.restart.jvm.supported=true
-Djava.io.tmpdir="${KARAF_DATA}/tmp"
-Djava.util.logging.config.file="${KARAF_BASE}/etc/java.util.logging.properties"
${KARAF_SYSTEM_OPTS} ${KARAF_OPTS} ${OPTS} "$@" -classpath "${CLASSPATH}"
${MAIN}
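
Bash prints "Killed" when a child process is terminated by SIGKILL (signal 9). SIGKILL cannot be caught or handled, so neither Karaf's logging nor the JVM's own fatal-error handler gets a chance to write anything. That would be consistent with an empty karaf.log and no hs_err file. A System.exit() call, by contrast, ends with an ordinary exit status.

The kernel OOM killer is one common sender of SIGKILL, and its action shows up in the kernel log (dmesg). That is only a hypothesis here, not something the output above confirms.

The sketch below decodes a process's termination the same way. It relies on Python's subprocess convention that a negative return code -N means "terminated by signal N" on POSIX. The function name is illustrative.

```python
import signal
import subprocess


def describe_termination(returncode: int) -> str:
    """Translate a process return code into a shell-style description (POSIX)."""
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # Shells report a signal-terminated child as 128 + N. A program could
        # also exit with such a value on purpose, hence "likely".
        try:
            name = signal.Signals(returncode - 128).name
            return f"likely killed by {name} (shell status {returncode})"
        except ValueError:
            pass
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Reproduce the CSIT symptom in miniature: a shell that SIGKILLs itself.
    proc = subprocess.run(["sh", "-c", "kill -9 $$"])
    print(describe_termination(proc.returncode))  # killed by SIGKILL
```
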

In the CSIT robot files we can see the connection errors below, so ODL is
not responding to new requests. This, plus the above, leads me to think ODL
just died.

[ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None,
status=None)) after connection broken by
'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection
object at 0x5ca2d50>: Failed to establish a new connection: [Errno 111]
Connection refused',)'
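
Errno 111 is ECONNREFUSED on Linux. The host answered, but nothing was listening on the port. That fits a dead process better than a hung one: a hung JVM whose listening socket is still open would more typically produce timeouts.

The sketch below is a probe that makes that distinction. The host and port you pass are assumptions. 8181 is the usual ODL RESTCONF port, but check the job's configuration.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP endpoint as 'listening', 'refused' or 'timeout'.

    Other socket errors (e.g. host unreachable) are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): nothing is bound to the port.
        return "refused"
    except socket.timeout:
        # No answer in time: firewalled, unreachable, or a wedged listener.
        return "timeout"
```

Polling probe("127.0.0.1", 8181) from the test VM during a run would show the moment the controller switches from "listening" to "refused".
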

>
>
>>
>> Tx,
>> M.
>> --
>> Michael Vorburger, Red Hat
>> [email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
>>
>>
>>
>>>
>>> We have a number of jobs [1] that have recently started to crash. ODL
>>> just goes away in the middle of the job, with no warnings or exceptions.
>>> This seems to happen only with Nitrogen and Oxygen, which leads me to
>>> believe it is caused by a recent patch in something core.
>>>
>>> Thanks, Sam
>>>
>>> [1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-nitrogen/319/odl1_karaf.log.gz
>>>
>>
>>
>
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
