And now with screenshot! :)
Have a good weekend,
Niels
On Fri, Sep 08, 2017 at 05:41:30PM +0200, Niels de Vos wrote:
> On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > > Lately, we have been plagued by a lot of intermittent test failures.
> > > >
> > > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > > These have not been resolved by the latest ntirpc pullup.
> > > >
> > > > Additionally, we see a lot of intermittent failures in the continuous
> > > > integration.
> > > >
> > > > A big issue with the CentOS CI is that its setup seems fragile: it
> > > > sometimes doesn't even succeed in building Ganesha, and then fires a
> > > > Verified -1. This makes it hard to evaluate which patches are actually
> > > > ready for integration.
> > >
> > > We can look into this, but it helps if you can provide a link to the patch
> > > in GerritHub or the job in the CI.
> >
> > Here's one merged last week with a Gluster CI Verify -1:
> >
> > https://review.gerrithub.io/#/c/375463/
> >
> > And just to preserve it in case... here's the log:
> >
> > Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> > [EnvInject] - Loading node environment variables.
> > Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
> > /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
> > [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe
> > /tmp/jenkins5031649144466335345.sh
> > + set +x
> >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> >                                  Dload  Upload   Total   Spent    Left  Speed
> >   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> >   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> > 100  1735  100  1735    0     0   8723      0 --:--:-- --:--:-- --:--:--  8718
> > Traceback (most recent call last):
> > File "bootstrap.py", line 33, in <module>
> > b=json.loads(dat)
> > File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
> > return _default_decoder.decode(s)
> > File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
> > obj, end = self.raw_decode(s, idx=_w(s, 0).end())
> > File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
> > raise ValueError("No JSON object could be decoded")
> > ValueError: No JSON object could be decoded
> > https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console :
> > FAILED
> > Build step 'Execute shell' marked build as failure
> > Finished: FAILURE
> >
> > Which doesn't tell me much about why it failed, though it looks like a
> > failure that has nothing to do with Ganesha...
>
> From #centos-devel on Freenode:
>
> 15:49 < ndevos> bstinson: is
> https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a
> known duffy problem? and how can the jobs work around this?
> 15:51 < bstinson> ndevos: you may be hitting the rate limit
> 15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen
> when a series of patches get sent
> 15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a
> failure?
> 15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over
> 5 minutes
> 15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be
> acceptable?
> 15:59 < ndevos> bstinson: is there a particular message returned by duffy
> when the rate limit is hit? the reply is not json, but maybe some error?
> 15:59 < ndevos> (in plain text format?)
> 15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain
> text error message
> 16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes'
>
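> That plain-text reply is exactly what trips up bootstrap.py: feeding it to
> json.loads() reproduces the ValueError from the log above (a hypothetical
> Python 2.7 interpreter session, using the string bstinson quoted):
>
>     >>> import json
>     >>> json.loads('Deployment rate over quota, try again in a few minutes')
>     Traceback (most recent call last):
>       ...
>     ValueError: No JSON object could be decoded
>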
> I added retry logic, which is now live and should get applied to all
> upcoming tests:
>
> https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129
>
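> For reference, here is a minimal sketch of what such a sleep-and-retry loop
> can look like. This is not the content of the commit above; DUFFY_URL and
> MAX_RETRIES are hypothetical placeholders, only the 5-minute window matches
> what bstinson described:
>
>     # Python 2.7, matching the interpreter on the CI slaves.
>     import json
>     import time
>     import urllib2
>
>     DUFFY_URL = 'https://duffy.example/Node/get'  # hypothetical placeholder
>     MAX_RETRIES = 5       # hypothetical limit before giving up
>     RETRY_DELAY = 5 * 60  # Duffy measures usage over 5 minutes
>
>     def request_nodes():
>         for _ in range(MAX_RETRIES):
>             dat = urllib2.urlopen(DUFFY_URL).read()
>             try:
>                 # Normal case: Duffy answers with a JSON document.
>                 return json.loads(dat)
>             except ValueError:
>                 # Rate limited: Duffy answers in plain text, e.g.
>                 # 'Deployment rate over quota, try again in a few minutes'
>                 print('Duffy said "%s", sleeping %ds' % (dat.strip(), RETRY_DELAY))
>                 time.sleep(RETRY_DELAY)
>         raise RuntimeError('still over quota after %d attempts' % MAX_RETRIES)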
>
> > > > An additional issue with the CentOS CI is that the failure logs often
> > > > aren't preserved long enough to even diagnose the issue.
> > >
> > > That is something we can change. Some jobs do not delete the results, but
> > > others seem to. How long (in days), or how many results would you like to
> > > keep?
> >
> > I'd say they need to be kept at least a week; if we could have time-based
> > retention rather than retention by number of results, I think that would help.
>
> Some jobs seem to have been set to keep results for 7 days, with a maximum of
> 7 builds. It does not really cost us anything, so I'll change it to 14 days. A
> screenshot of these settings is attached. It could be that I missed updating a
> job, so let us know in case logs are deleted too early.
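>
> If we ever want to apply the same retention to every job at once instead of
> clicking through the UI, a script along these lines could do it. Just a
> sketch, assuming the python-jenkins module and an API token; it is not how
> the change above was made:
>
>     import xml.etree.ElementTree as ET
>     import jenkins
>
>     # Hypothetical credentials; the jobs live on ci.centos.org.
>     server = jenkins.Jenkins('https://ci.centos.org', username='user',
>                              password='api-token')
>
>     for job in server.get_jobs():
>         name = job['name']
>         if not name.startswith('nfs-ganesha'):
>             continue
>         root = ET.fromstring(server.get_job_config(name))
>         # Replace any existing log rotator with a fresh one.
>         old = root.find('logRotator')
>         if old is not None:
>             root.remove(old)
>         rotator = ET.SubElement(root, 'logRotator',
>                                 {'class': 'hudson.tasks.LogRotator'})
>         for tag, value in (('daysToKeep', '14'),   # keep logs for 14 days
>                            ('numToKeep', '-1'),    # no cap on number of builds
>                            ('artifactDaysToKeep', '-1'),
>                            ('artifactNumToKeep', '-1')):
>             ET.SubElement(rotator, tag).text = value
>         server.reconfig_job(name, ET.tostring(root))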
>
> > At least after a week, it's reasonable to expect folks to rebase their
> > patches and re-submit, which would trigger a new run.
> >
> > > > The result is that, honestly, I mostly ignore the CentOS CI results.
> > > > They might almost as well not be run...
> > >
> > > This is definitely not what we want, so let's fix the problems.
> >
> > Yea, and thus my rant...
>
> I really understand this; a CI should help identify problems, not introduce
> problems of its own. Let's try hard to make sure you don't need to rant about
> it much more :-)
>
> > > > Let's talk about CI more on an upcoming concall (it would help if
> > > > Niels and Jiffin could join a call to talk about this; our next call
> > > > might be too soon for that).
> > >
> > > Tuesdays tend to be very busy for me, and I am not sure I can join the
> > > call next week. Arthy did some work on the jobs in the CentOS CI; she could
> > > probably work with Jiffin to make any changes that improve the experience
> > > for you. I'm happy to help out where I can too, of course :-)
> >
> > If we can figure out another time to have a CI call, that would be helpful.
>
> > It would be good to pull in Patrice from CEA as well as anyone else who
> > cares.
> >
> > It would really help if we could have someone with better time zone overlap
> > with me who could manage the CI stuff, but that may not be realistic.
>
> We can sign up anyone in the NFS-Ganesha community to do this. It takes
> a little time to get familiar with the scripts and tools that are used,
> but once that has settled it is relatively straightforward.
>
> Volunteers?
>
> Cheers,
> Niels
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel