On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > Lately, we have been plagued by a lot of intermittent test failures.
> > >
> > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > These have not been resolved by the latest ntirpc pullup.
> > >
> > > Additionally, we see a lot of intermittent failures in the continuous
> > > integration.
> > >
> > > A big issue with the CentOS CI is that it seems to have a fragile
> > > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > > and then fires a Verified -1. This makes it hard to evaluate which
> > > patches are actually ready for integration.
> >
> > We can look into this, but it helps if you can provide a link to the
> > patch in GerritHub or the job in the CI.
>
> Here's one merged last week with a Gluster CI Verify -1:
>
> https://review.gerrithub.io/#/c/375463/
>
> And just to preserve it in case... here's the log:
>
> Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> [EnvInject] - Loading node environment variables.
> Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
> /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
> [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe /tmp/jenkins5031649144466335345.sh
> + set +x
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100  1735  100  1735    0     0   8723      0 --:--:-- --:--:-- --:--:--  8718
> Traceback (most recent call last):
>   File "bootstrap.py", line 33, in <module>
>     b=json.loads(dat)
>   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console : FAILED
> Build step 'Execute shell' marked build as failure
> Finished: FAILURE
>
> Which tells me not much about why it failed, though it looks like a
> failure that has nothing to do with Ganesha...
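The traceback above hides the one thing that would explain the failure: the raw body the server actually sent. A small defensive change in the parsing step would make such failures self-diagnosing. This is only a sketch of the idea, not the real bootstrap.py; the function and variable names are assumed:

```python
import json
import sys


def parse_response(dat):
    """Parse an API response, surfacing the raw body when it is not JSON."""
    try:
        return json.loads(dat)
    except ValueError:
        # The server may answer with a plain-text error instead of JSON
        # (e.g. a rate-limit message); echoing the body puts the real
        # cause of the failure into the CI console log.
        sys.stderr.write("non-JSON response from server: %r\n" % (dat,))
        raise
```

Catching `ValueError` covers both Python 2.7 (`json` raises it directly) and Python 3 (`json.JSONDecodeError` is a subclass of it).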
From #centos-devel on Freenode:

15:49 < ndevos> bstinson: is https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a known duffy problem? and how can the jobs work around this?
15:51 < bstinson> ndevos: you may be hitting the rate limit
15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen when a series of patches get sent
15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a failure?
15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over 5 minutes
15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be acceptable?
15:59 < ndevos> bstinson: is there a particular message returned by duffy when the rate limit is hit? the reply is not json, but maybe some error?
15:59 < ndevos> (in plain text format?)
15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain text error message
16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes'

I added retry logic which is now live, and should get applied for all
upcoming tests:
https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129

> > > An additional issue with the CentOS CI is that the failure logs often
> > > aren't preserved long enough to even diagnose the issue.
> >
> > That is something we can change. Some jobs do not delete the results,
> > but others seem to. How long (in days), or how many results would you
> > like to keep?
>
> I'd say they need to be kept at least a week; if we could have time-based
> retention rather than number-of-results retention, I think that would
> help.

Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not
really cost us anything, so I'll change it to 14 days. A screenshot of
these settings has been attached. It can be that I missed updating a job,
so let us know in case logs are deleted too early.
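For reference, the sleep-and-retry approach agreed on in the IRC exchange above can be sketched roughly as follows. The 5-minute wait and the plain-text quota message come from the conversation; everything else (function name, the injectable `fetch` and `sleep` callables) is a hypothetical illustration, not the actual ci-tests script:

```python
import json
import time


def request_duffy_node(fetch, max_attempts=5, wait_seconds=300, sleep=time.sleep):
    """Ask Duffy for a node, sleeping and retrying while over quota.

    `fetch` is any callable returning the raw response body as a string.
    Duffy normally replies with JSON; when the deployment quota is
    exceeded it returns a plain-text error ('Deployment rate over quota,
    try again in a few minutes'), so a failed JSON parse is treated as
    "over quota": wait and try again.
    """
    for attempt in range(max_attempts):
        body = fetch()
        try:
            return json.loads(body)
        except ValueError:
            if attempt < max_attempts - 1:
                sleep(wait_seconds)  # usage is measured over 5 minutes
    raise RuntimeError("Duffy still over quota after %d attempts" % max_attempts)
```

Making `fetch` and `sleep` parameters keeps the loop testable without real network calls or real 5-minute waits.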
> At least after a week, it's reasonable to expect folks to rebase their
> patches and re-submit, which would trigger a new run.
>
> > > The result is that honestly, I mostly ignore the CentOS CI results.
> > > They almost might as well not be run...
> >
> > This is definitely not what we want, so let's fix the problems.
>
> Yea, and thus my rant...

I really understand this; a CI should be helpful in identifying problems,
and not introduce problems of its own. Let's try hard to not have you
needing to rant about it much more :-)

> > > Let's talk about CI more on a near-term concall (it would help if
> > > Niels and Jiffin could join a call to talk about this; our next call
> > > might be too soon for that).
> >
> > Tuesdays tend to be very busy for me, and I am not sure I can join the
> > call next week. Arthy did some work on the jobs in the CentOS CI, she
> > could probably work with Jiffin to make any changes that improve the
> > experience for you. I'm happy to help out where I can too, of course :-)
>
> If we can figure out another time to have a CI call, that would be
> helpful. It would be good to pull in Patrice from CEA as well as anyone
> else who cares.
>
> It would really help if we could have someone with better time zone
> overlap with me who could manage the CI stuff, but that may not be
> realistic.

We can sign up anyone in the NFS-Ganesha community to do this. It takes a
little time to get familiar with the scripts and tools that are used, but
once that has settled it is relatively straightforward. Volunteers?

Cheers,
Niels

_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel