On 07/02/2018 01:46 PM, Victor Pickard wrote:


On Mon, Jul 2, 2018 at 2:44 PM Tom Pantelis <tompante...@gmail.com> wrote:



    On Mon, Jul 2, 2018 at 2:15 PM, Victor Pickard <vpick...@redhat.com> wrote:

        Hi all,

        I'm looking at clustering stability. One of the jobs I've been looking at is
        controller clustering. This is a good CSIT, in that it stops and starts ODL
        several times during the run.

        In one of the failed test runs (sandbox; the logs from last week have been wiped,
        but I do have this particular karaf log archived locally), ODL is started, and
        REST calls fail during the test. Looking at the logs, I can see why. Karaf failed
        to start or, more precisely, took a really long time to start. From the snippet
        below, you can see about 7 minutes between when Karaf launched and when it did
        something, maybe restarted again. But the main thing is that karaf failed to
        start in a timely manner, taking over 7 minutes to begin starting up
        blueprints, etc.


        I ran a job that had karaf debug logging enabled with this setting:

        log4j.rootLogger=DEBUG


        This did not go very well. It generates far too much debug output, which was
        causing timeouts and various other errors in the CSIT run.


        So, my questions are:

        1. Has anyone seen this issue, where karaf seems to hang on startup (after a
        kill -9 on the karaf pid)? If so, is this a known issue?

        2. What debug logging would be needed to figure out why karaf was hanging? Note
        that the setting above generated a log file of ~768 MB in a very short timespan.


    Vic - does this happen if you gracefully shut it down? In years past with karaf I
    recall corruption could occur in the bundle cache under data if the karaf process
    was killed. I don't know if that potential issue is still present with karaf 4.
    Does it clean the data dir before restarting? If not, it would be good to do so to
    be safe.

Other than that, we probably need to get a thread dump.

I'm thinking that you want a thread dump at the point in time where this issue occurs? I'm not sure we can easily do this in CSIT. I see that thread dumps are (supposed to be) collected for jobs, but only at the very end of the test suite run. In the last runs this doesn't seem to be working, because the jstack command cannot be found.

This has just been sitting broken for some time now. It was never something
anyone cared about until now. We should try to get it fixed. I'll add it as
a subtask to your new bug (CONTROLLER-1845) and assign it to myself.

I'm thinking we would have to add some additional checking to see whether karaf started properly and in a timely manner, and dump the threads right then if a failure is detected.

Can you point me to a specific robot keyword that would fail in this
case? I can dig it up eventually, but hoping you have it quickly
available.

Jamo, do we have anything like this in CSIT now that you know of?

I think we can add some jstack dumping to certain keywords, or test
case teardowns (to use in the sandbox), and it shouldn't be hard
to do (once we get jstack working again). But no, we don't have anything
like it at this point. It's another subtask on my plate now.

Thanks
JamO
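
A minimal sketch of the check discussed above, in Python: poll a readiness probe with a timeout and, if karaf has not come up in time, capture a thread dump right then rather than at the end of the suite. The function names, the is_ready probe, the timeout values, and the output path are all hypothetical; nothing like this exists in CSIT today, and it assumes jstack is on the PATH of the test VM (which is currently not the case).

```python
import subprocess
import time
from typing import Callable


def wait_for_ready(is_ready: Callable[[], bool], timeout_s: float,
                   interval_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def dump_threads(pid: int, out_path: str, jstack: str = "jstack") -> bool:
    """Write a jstack thread dump for pid to out_path.

    Returns False if the jstack binary is missing, hangs, or exits non-zero.
    """
    try:
        result = subprocess.run([jstack, "-l", str(pid)],
                                capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    if result.returncode != 0:
        return False
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return True


def check_startup(is_ready: Callable[[], bool], pid: int, out_path: str,
                  timeout_s: float = 120.0, dump=dump_threads,
                  **wait_kwargs) -> bool:
    """Return True if karaf became ready in time; otherwise dump threads."""
    if wait_for_ready(is_ready, timeout_s, **wait_kwargs):
        return True
    dump(pid, out_path)  # capture state at the moment of failure
    return False
```

In practice is_ready would probably be a REST call against the restarted controller, and check_startup could be wrapped as a Robot keyword or called from a test case teardown; both of those are assumptions about how it would be wired in.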



        Thanks,

        Vic




        Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
        INFO: Installing and starting initial bundles
        Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
        INFO: All initial bundles installed and set to start
        Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
        INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
        Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
        INFO: Lock acquired
        Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
        INFO: Lock acquired. Setting startlevel to 100
        Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
        INFO: Installing and starting initial bundles
        Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
        INFO: All initial bundles installed and set to start
        Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
        INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
        Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
        INFO: Lock acquired
        Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
        INFO: Lock acquired. Setting startlevel to 100
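
The roughly seven-minute stall in the excerpt above can also be found mechanically, which may help when scanning an archived karaf log. The following is only a sketch: find_gaps, the 60-second threshold, and the SAMPLE string (a few lines copied from the excerpt) are illustrative choices, and the %b month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Matches timestamps such as "Jun 29, 2018 3:43:47 PM" anywhere in a line,
# so it still works when the mail client has wrapped two entries together.
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M")
FORMAT = "%b %d, %Y %I:%M:%S %p"


def find_gaps(log_text, threshold_s=60.0):
    """Return (before, after, seconds) for consecutive timestamps that are
    at least threshold_s apart."""
    stamps = [datetime.strptime(m, FORMAT) for m in TIMESTAMP.findall(log_text)]
    gaps = []
    for before, after in zip(stamps, stamps[1:]):
        delta = (after - before).total_seconds()
        if delta >= threshold_s:
            gaps.append((before, after, delta))
    return gaps


SAMPLE = """\
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles
Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
INFO: All initial bundles installed and set to start
"""

if __name__ == "__main__":
    for before, after, seconds in find_gaps(SAMPLE):
        print(f"{before} -> {after}: {seconds:.0f}s with no log output")
```

On the sample lines this reports a single gap of 421 seconds, between 3:43:47 PM and 3:50:48 PM.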



        _______________________________________________
        controller-dev mailing list
        controller-dev@lists.opendaylight.org 
<mailto:controller-dev@lists.opendaylight.org>
        https://lists.opendaylight.org/mailman/listinfo/controller-dev



