Re: [QGIS-Developer] QGIS repository management

2023-10-18 Thread Matthias Kuhn via QGIS-Developer
Yes, pretty much.
Interestingly, for the other job based on fedora 38 the cache does work,
see e.g.
working:
https://github.com/qgis/QGIS/actions/runs/6560633128/job/17818675001#step:7:117
not working:
https://github.com/qgis/QGIS/actions/runs/6560633128/job/17818674503#step:7:182

Matthias

On Wed, Oct 18, 2023 at 4:36 PM Julien Cabieces <
julien.cabie...@oslandia.com> wrote:

>
> > It looks like this is a docker container; could this possibly be cached
> in the ghcr?
>
> It's part of the qgis3-qt5-build-deps.dockerfile [0], which is built with a
> cache github action [1]. But looking at the log, it doesn't look like the
> cache is at work; everything seems to be downloaded and built inside the
> docker build.
>
> Is that what you are referring to, or is it something else?
>
> [0]
>
> https://github.com/qgis/QGIS/blob/master/.docker/qgis3-qt5-build-deps.dockerfile#L165
> [1]
> https://github.com/qgis/QGIS/blob/master/.github/workflows/run-tests.yml#L117
>
> > On Wed, Oct 18, 2023 at 9:11 AM Julien Cabieces <
> julien.cabie...@oslandia.com> wrote:
> >
> >  > Is there anything I've missed?
> >
> >  A missing external resource (OTB, oracle binaries...) that we get with
> curl. This one, for
> >  instance, recently:
> >
> https://github.com/qgis/QGIS/actions/runs/6549790513/job/17793674746?pr=54963
> >
> >  Maybe we could store those resources somewhere (a github repository, for
> >  instance?) to avoid those errors.
> >
> > It looks like this is a docker container; could this possibly be cached
> in the ghcr?
>
>
> --
>
> Julien Cabieces
> Senior Developer at Oslandia
> julien.cabie...@oslandia.com
>


Re: [QGIS-Developer] QGIS repository management

2023-10-18 Thread Julien Cabieces via QGIS-Developer

> It looks like this is a docker container; could this possibly be cached in
> the ghcr?

It's part of the qgis3-qt5-build-deps.dockerfile [0], which is built with a
cache github action [1]. But looking at the log, it doesn't look like the
cache is at work; everything seems to be downloaded and built inside the
docker build.

Is that what you are referring to, or is it something else?

[0]
https://github.com/qgis/QGIS/blob/master/.docker/qgis3-qt5-build-deps.dockerfile#L165
[1] 
https://github.com/qgis/QGIS/blob/master/.github/workflows/run-tests.yml#L117
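
If ghcr-backed caching is what you meant, this is roughly what I would try
(untested sketch; the image names, cache tags and surrounding step layout
are placeholders, not what run-tests.yml currently does):

  - uses: docker/setup-buildx-action@v3
  - uses: docker/login-action@v3
    with:
      registry: ghcr.io
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
  - uses: docker/build-push-action@v5
    with:
      context: .
      file: .docker/qgis3-qt5-build-deps.dockerfile
      push: true
      # the image and cache tags below are made-up examples
      tags: ghcr.io/qgis/qgis3-build-deps:latest
      cache-from: type=registry,ref=ghcr.io/qgis/qgis3-build-deps:buildcache
      cache-to: type=registry,ref=ghcr.io/qgis/qgis3-build-deps:buildcache,mode=max

With mode=max the intermediate layers end up in the registry as well, so a
rebuild should only re-run the stages whose inputs changed.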

> On Wed, Oct 18, 2023 at 9:11 AM Julien Cabieces 
>  wrote:
>
>  > Is there anything I've missed?
>
>  A missing external resource (OTB, oracle binaries...) that we get with
> curl. This one, for
>  instance, recently:
>  https://github.com/qgis/QGIS/actions/runs/6549790513/job/17793674746?pr=54963
>
>  Maybe we could store those resources somewhere (a github repository, for
>  instance?) to avoid those errors.
>
> It looks like this is a docker container; could this possibly be cached in
> the ghcr?


-- 

Julien Cabieces
Senior Developer at Oslandia
julien.cabie...@oslandia.com


Re: [QGIS-Developer] QGIS repository management

2023-10-18 Thread Matthias Kuhn via QGIS-Developer
On Wed, Oct 18, 2023 at 9:11 AM Julien Cabieces <
julien.cabie...@oslandia.com> wrote:

>
> > Is there anything I've missed?
>
>
> A missing external resource (OTB, oracle binaries...) that we get with
> curl. This one, for
> instance, recently:
>
> https://github.com/qgis/QGIS/actions/runs/6549790513/job/17793674746?pr=54963
>
> Maybe we could store those resources somewhere (a github repository, for
> instance?) to avoid those errors.
>

It looks like this is a docker container; could this possibly be cached in
the ghcr?


Re: [QGIS-Developer] QGIS repository management

2023-10-18 Thread Julien Cabieces via QGIS-Developer

> Is there anything I've missed?


A missing external resource (OTB, oracle binaries...) that we get with curl.
This one, for instance, recently:
https://github.com/qgis/QGIS/actions/runs/6549790513/job/17793674746?pr=54963

Maybe we could store those resources somewhere (a github repository, for
instance?) to avoid those errors.
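
In the meantime it might already help if the fetch steps preferred a mirror
under our control and only fell back to upstream. Rough sketch (the
repository name and URLs are invented, and the same curl fallback would work
wherever the download currently happens):

  - name: Fetch OTB package
    run: |
      # try a hypothetical mirror first, then fall back to the upstream URL
      curl --fail --location --retry 5 --retry-delay 10 -o otb.tar.gz \
        https://github.com/qgis/qgis-ci-resources/releases/download/otb/otb.tar.gz \
      || curl --fail --location --retry 5 -o otb.tar.gz "$OTB_UPSTREAM_URL"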

> network access manager test: depends on the badssl external website,
>  which goes down occasionally. We could potentially run this site
>  ourselves as part of the CI setup, but that's trading the dependence
>  on an external service for added CI maintenance burden on ourselves.
>

That's what we do for the webdav and minio tests:
https://github.com/qgis/QGIS/blob/master/.docker/docker-compose-testing.yml#L14

I don't know much about the responses badssl returns, but is it really that
complicated to configure ourselves?
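
If all the test really needs is an endpoint with an expired or self-signed
certificate, a small extra service in docker-compose-testing.yml might be
enough. Rough sketch (the image, paths and port are placeholders, not an
existing setup):

  badssl:
    image: nginx:stable
    volumes:
      # nginx config and deliberately broken certificates we would ship with
      # the test data
      - ./badssl/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./badssl/certs:/etc/nginx/certs:ro
    ports:
      - "8443:443"

The network access manager test would then have to point at that service
instead of badssl.com, of course.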

Regards,
Julien


> On Tue, Oct 17, 2023 at 9:19 PM Nyall Dawson via QGIS-Developer 
>  wrote:
>
>  >
>  > IMHO, the main issue here is that the CI is too often broken for
>  > reasons unrelated to the PR content. And most of the time recently, it's
>  > because of one of the mingw jobs.
>
>  I suggest we move this conversation into more practical areas and
>  focus instead on the actual CI issues.
>
>  In my experience there's a couple of main issues here:
>
>  1. The mingw build fails often with the error:
>
>  cd /__w/QGIS/QGIS/build_mingw64/resources && /usr/bin/cmake -E copy
>  /__w/QGIS/QGIS/resources/srs6.db
>  /__w/QGIS/QGIS/build_mingw64/resources/srs.db
>  make[2]: write error: stdout
>
>  It always succeeds if you restart the job though. I don't know what's
>  causing this... a disk space issue? (but then why would it succeed on
>  the following attempt?)
>
> mingw runs on ubuntu-latest within a fedora container. ubuntu images come 
> with plenty of bloat preinstalled that can be easily deleted.
> cf. 
> https://github.com/easimon/maximize-build-space/blob/master/action.yml#L114-L130
> I just don't know how to do that if a "container" is specified, as the delete
> action will need to happen on the base system.
>  
>  
>  2. The msys2 build can take HOURS (5+ in some cases). The qt 5/ qt6
>  builds also see occasional extremely slow builds.
>
>  I think the root cause of this is that we've exhausted all the
>  available space allowed for caching objects on github. All the builds
>  are trying to cache things to speed up the workflows, but the
>  different caches are randomly getting deleted because together they've
>  well exceeded the total allowed size. The solutions I could see here
>  are:
>
>  - approaching github to ask nicely for more space
>  - dropping one or more workflows. Maybe we could combine the msys2 and
>  mingw workflow into one to cut out one workflow. But that's blocked
>  because the msys2 workflow currently has a lot of things disabled
>  (including python), so it's currently not a suitable replacement for
>  the mingw builds which give end users an easy way to test PRs on
>  windows.
>
> What is the advantage of the msys2 workflow?
> It does "some build on windows" but IIUC does not replicate the actual 
> windows build using OSGEO4W. Is there something else that it brings to
> the mix?
>  
>  
>  3. Random test failures. Main culprits are:
>  - ept provider test (but I **think** I've fixed this with
>  improvements/fixes made over the 3.34 bug fixing period). At least, I
>  don't recall seeing this in the last week, when it used to be a daily
>  random fail.
>  - raster file writer test: randomly fails for an unknown reason. The
>  test passes valgrind/asan fine, so I suspect the random failures are
>  caused by some interplay with other tests modifying data in the test
>  data folder. More work is needed.
>  - raster analysis utils test: same as above.
>  - network access manager test: depends on the badssl external website,
>  which goes down occasionally. We could potentially run this site
>  ourselves as part of the CI setup, but that's trading the dependence
>  on an external service for added CI maintenance burden on ourselves.
>
>  4. Docker/ubuntu repository fails. This happens occasionally -- the
>  external repos will have transient connectivity issues. Not much we
>  can do here except manually re-running the jobs.
>
>  Is there anything I've missed?
>
> There is also the occasional "QGIS is not compatible with the bleeding edge 
> [sip, name your dependency] version that [rolling distro] has
> decided to start shipping today".
>
> Matthias
>  
>  
>  Nyall
>
>  >
>  > As proposed by Matthias, I would be very much in favor of modifying
>  > our CI jobs to rely on a non-rolling release distribution.
>  >
>  > Regards,
>  > Julien
>  >
>  >
>  > > In my experience, the peer reviews have proven to be an effective tool 
> to improve the code quality.
>  > > I think it can be explained with a four eyes principle. The first two 
> eyes are involved in writing the code, the second pair of eyes that
>  validates
>  > > needs to be a 

Re: [QGIS-Developer] QGIS repository management

2023-10-18 Thread Matthias Kuhn via QGIS-Developer
On Tue, Oct 17, 2023 at 9:19 PM Nyall Dawson via QGIS-Developer <
qgis-developer@lists.osgeo.org> wrote:

> >
> > IMHO, the main issue here is that the CI is too often broken for
> > reasons unrelated to the PR content. And most of the time recently, it's
> > because of one of the mingw jobs.
>
> I suggest we move this conversation into more practical areas and
> focus instead on the actual CI issues.
>
> In my experience there's a couple of main issues here:
>
> 1. The mingw build fails often with the error:
>
> cd /__w/QGIS/QGIS/build_mingw64/resources && /usr/bin/cmake -E copy
> /__w/QGIS/QGIS/resources/srs6.db
> /__w/QGIS/QGIS/build_mingw64/resources/srs.db
> make[2]: write error: stdout
>
> It always succeeds if you restart the job though. I don't know what's
> causing this... a disk space issue? (but then why would it succeed on
> the following attempt?)
>

mingw runs on ubuntu-latest within a fedora container. ubuntu images come
with plenty of bloat preinstalled that can be easily deleted.
cf.
https://github.com/easimon/maximize-build-space/blob/master/action.yml#L114-L130
I just don't know how to do that if a "container" is specified, as the
delete action will need to happen on the base system.
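
One possible way around that (untested, and the image tag and build script
below are only illustrative): drop the "container:" key, free the space on
the host first and then start the fedora container explicitly, e.g.

  - name: Free disk space on the host runner
    run: |
      # roughly the directories that easimon/maximize-build-space removes
      sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
      df -h /
  - name: Run the mingw build inside the container
    run: |
      docker run --rm -v "$GITHUB_WORKSPACE:/QGIS" -w /QGIS \
        fedora:38 ./ci/mingw-build.sh  # placeholder for the actual build script

Not pretty, but it keeps the cleanup on the base system where it has to
happen.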


>
> 2. The msys2 build can take HOURS (5+ in some cases). The qt 5/ qt6
> builds also see occasional extremely slow builds.
>
> I think the root cause of this is that we've exhausted all the
> available space allowed for caching objects on github. All the builds
> are trying to cache things to speed up the workflows, but the
> different caches are randomly getting deleted because together they've
> well exceeded the total allowed size. The solutions I could see here
> are:
>
> - approaching github to ask nicely for more space
> - dropping one or more workflows. Maybe we could combine the msys2 and
> mingw workflow into one to cut out one workflow. But that's blocked
> because the msys2 workflow currently has a lot of things disabled
> (including python), so it's currently not a suitable replacement for
> the mingw builds which give end users an easy way to test PRs on
> windows.
>

What is the advantage of the msys2 workflow?
It does "some build on windows" but IIUC does not replicate the actual
windows build using OSGEO4W. Is there something else that it brings to the
mix?


>
> 3. Random test failures. Main culprits are:
> - ept provider test (but I **think** I've fixed this with
> improvements/fixes made over the 3.34 bug fixing period). At least, I
> don't recall seeing this in the last week, when it used to be a daily
> random fail.
> - raster file writer test: randomly fails for an unknown reason. The
> test passes valgrind/asan fine, so I suspect the random failures are
> caused by some interplay with other tests modifying data in the test
> data folder. More work is needed.
> - raster analysis utils test: same as above.
> - network access manager test: depends on the badssl external website,
> which goes down occasionally. We could potentially run this site
> ourselves as part of the CI setup, but that's trading the dependence
> on an external service for added CI maintenance burden on ourselves.
>
> 4. Docker/ubuntu repository fails. This happens occasionally -- the
> external repos will have transient connectivity issues. Not much we
> can do here except manually re-running the jobs.
>
> Is there anything I've missed?
>

There is also the occasional "QGIS is not compatible with the bleeding edge
[sip, name your dependency] version that [rolling distro] has decided to
start shipping today".

Matthias


>
> Nyall
>
> >
> > As proposed by Matthias, I would be very much in favor of modifying
> > our CI jobs to rely on a non-rolling release distribution.
> >
> > Regards,
> > Julien
> >
> >
> > > In my experience, the peer reviews have proven to be an effective tool
> to improve the code quality.
> > > I think it can be explained with a four eyes principle. The first two
> eyes are involved in writing the code, the second pair of eyes that
> validates
> > > needs to be a trusted pair.
> > > For a code base of the maturity of QGIS, this seems to me to be
> justified and reasonable.
> > >
> > > As for who can do a review, a potential conflict of interest has been
> raised. This arises most noticeably within a single company, but can also
> > > very well be the case whenever a collaboration occurs between multiple
> individuals or if a well-connected client is involved. In my experience, in
> > > most cases people will first ask the person that they trust to be most
> qualified to review a specific area of the code, regardless of the organisation
> > > they work for. In some cases, this may be your colleagues. What I and
> others have done in the past in such cases, is to leave an additional review
> > > window of a couple of days before merging to give everyone the
> opportunity to jump in if they wanted to, just to be sure to play it fair.
> In my
> > > experience, this worked quite