Hi all,

Appreciate the context, Uwe!  In poking around a bit I see there's a
cleanup option in the job config called "Clean up this job's
workspaces from other slave nodes".  Seems like that might help a bit,
though I'd have to do a little more digging to be sure.  It doesn't
appear to be enabled on either the Lucene or Solr jobs, fwiw.
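
To make the orphaned-workspace problem a bit more concrete, below is a
rough sketch of the kind of audit script I have in mind for an agent.
It's only an illustration: the workspace root, the anonymous JSON API
access, and the flat job layout are assumptions on my part (our jobs
actually live under folders, e.g. the Lucene/ folder mentioned below,
which would need extra handling), so treat it as a starting point
rather than something to run as-is.

import json
import urllib.request
from pathlib import Path

# Assumed agent layout; the real workspace root may differ per node.
JENKINS_URL = "https://ci-builds.apache.org"
WORKSPACE_ROOT = Path("/home/jenkins/workspace")

def active_job_names(url):
    """Names of the top-level items the controller currently knows about."""
    with urllib.request.urlopen(url + "/api/json?tree=jobs[name]") as resp:
        data = json.load(resp)
    return {job["name"] for job in data.get("jobs", [])}

def orphaned_workspaces(root, jobs):
    """Workspace dirs whose names no longer match a configured job."""
    return [d for d in root.iterdir() if d.is_dir() and d.name not in jobs]

if __name__ == "__main__":
    jobs = active_job_names(JENKINS_URL)
    for ws in orphaned_workspaces(WORKSPACE_ROOT, jobs):
        print("possibly orphaned:", ws)  # report only; deletion left to a human

It only prints candidates; actually deleting anything is better left
to a human (or to the cleanup option above, if it does what it says).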

Does anyone have a pointer to the discussion around getting these
project-specific build nodes?  I searched around in JIRA and on
lists.apache.org, but couldn't find a request for the nodes or a
discussion about creating them.  Would love to understand the
rationale a bit better: when I talked to INFRA folks last week, they
suggested the main (only?) reason projects use project-specific VMs is
to avoid waiting in the general pool...but our average wait times
these days look much longer than anything in the general pool.

From the "outside" looking in, it feels like we're taking on more
maintenance/infra burden for worse results (at least as defined by
'uptime' and build-wait times).

Best,

Jason

On Tue, Oct 15, 2024 at 6:12 AM Uwe Schindler <u...@thetaphi.de> wrote:
>
> Hi,
>
> I have root access on both machines. I was not aware of any problems.
> The workspace name problem is a known one: if a node is down while the
> job is renamed, or if multiple nodes have the workspace, Jenkins can't
> delete it. In the case of multiple nodes, it only deletes the workspace
> on the node which ran the job most recently.
>
> As a general rule, before renaming a job, go to the job and prune its
> workspace from the web interface. But this has the same problem as
> described before: it only shows the workspace on the node which ran
> the job most recently.
>
> Uwe
>
> On 14.10.2024 at 22:01, Jason Gerlowski wrote:
> > Of course, happy to help - glad you got some 'green' builds.
> >
> > Both agents should be back online now.
> >
> > The root of the problem appears to be that Jenkins jobs use a static
> > workspace whose path is based on the name of the job.  This would work
> > great if job names never changed I guess.  But our job names *do*
> > drift - both Lucene and Solr tend to include version strings (e.g.
> > Solr-check-9.6, Lucene-check-9.12), which introduces some "drift" and
> > orphans a few workspaces a year.  That doesn't sound like much, but
> > each workspace contains a full Solr or Lucene checkout+build, so they
> > add up pretty quickly.  Anyway, that root problem remains and will
> > need to be addressed if our projects want to continue the specially
> > tagged agents.  But things are healthy for now!
> >
> > Best,
> >
> > Jason
> >
> > On Tue, Oct 8, 2024 at 3:10 AM Luca Cavanna <java...@apache.org> wrote:
> >> Thanks a lot Jason,
> >> this helps a lot. I see that the newly added jobs for 10x and 10.0 have 
> >> been built and it all looks pretty green now.
> >>
> >> Thanks
> >> Luca
> >>
> >> On Mon, Oct 7, 2024 at 11:27 PM Jason Gerlowski <gerlowsk...@gmail.com> 
> >> wrote:
> >>> Hi Luca,
> >>>
> >>> I suspect I'm chiming in here a little late to help with your
> >>> release-related question, but...
> >>>
> >>> I stopped into the "#askinfra Office Hours" this afternoon at
> >>> ApacheCon, and asked for some help on this.  Both workers seemed to
> >>> have disk-space issues, apparently due to orphaned workspaces.  I've
> >>> gotten one agent/worker back online (lucene-solr-2 I believe).  The
> >>> other one I'm hoping to get back online shortly, after a bit more
> >>> cleanup.
> >>>
> >>> (Getting the right permissions to clean things up was a bit of a
> >>> process; I'm hoping to document this and will share here when that's
> >>> ready.)
> >>>
> >>> There are still nightly jobs that run on the ASF Jenkins (for both
> >>> Lucene and Solr); on the Solr side at least these are quite useful.
> >>>
> >>> Best,
> >>>
> >>> Jason
> >>>
> >>>
> >>> On Wed, Oct 2, 2024 at 2:40 PM Luca Cavanna <java...@apache.org> wrote:
> >>>> Hi all,
> >>>> I created new CI jobs at https://ci-builds.apache.org/job/Lucene/
> >>>> yesterday to cover branch_10x and branch_10_0. Not a single build has
> >>>> started for them so far.
> >>>>
> >>>> Poking around, I noticed a message in the build history, "Pending - all
> >>>> nodes of label Lucene are offline", which looked suspicious. Are we
> >>>> still using this Jenkins? I used it successfully for a release I did in
> >>>> the past, but that was already some months ago. Creating the jobs is
> >>>> still part of the release wizard process anyway, so it felt right to do
> >>>> this step. I am not sure how to proceed from here; does anyone know? I
> >>>> also noticed a low disk space warning on one of the two agents.
> >>>>
> >>>> Thanks
> >>>> Luca
> >>>
> >
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
