On Tue, 14 Jan 2025 at 18:13, Niall Pemberton <[email protected]> wrote:
>
> On Sun, 12 Jan 2025 at 09:05, sebb <[email protected]> wrote:
>
> > On Sat, 11 Jan 2025 at 20:03, Niall Pemberton <[email protected]> wrote:
> > >
> > > Hi Attic Team,
> > >
> > > A number of retired projects are flagged on a monthly report (to the
> > > Privacy Committee) as contravening the ASF privacy policy due to their
> > > use of Google Analytics (mainly)[1].
> >
> > It's not just the analytics; fonts, images and scripts (etc.) must not
> > be loaded from non-permitted sites.
> >
> > > I'm starting with the assumption that it would be difficult/painful to
> > > fix this now that these projects are in the Attic, but I thought I would
> > > ask here whether there is any way to do this.
> >
> > Yes, it was difficult to prepare sites for the Attic.
> > In general, it's not possible (or desirable) to regenerate a website
> > from source.
> > So originally the Attic had to change every single HTML source file.
> > This was a lot of work, even if some of it could be automated.
> >
> > The Attic banner is now added by a server filter (written in Lua) that
> > inserts the banner text into the HTML pages as they are served.
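The real banner filter is written in Lua and runs in the web server. Purely to illustrate the idea, a Python sketch of the same kind of rewrite might look like this. It assumes the banner goes straight after the opening <body> tag, and the banner markup is a made-up placeholder, not the actual Attic text.

```python
import re

# Placeholder banner markup (hypothetical, not the real Attic banner).
BANNER = '<div class="attic-banner">This project has retired.</div>'

# Opening <body> tag, with or without attributes, in any letter case.
_BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def add_banner(html: str, banner: str = BANNER) -> str:
    """Return html with banner inserted right after the first <body> tag.

    Pages without a <body> tag are returned unchanged.
    """
    match = _BODY_OPEN.search(html)
    if match is None:
        return html
    return html[: match.end()] + banner + html[match.end():]
```

Because the page files themselves are never modified, this approach needs no write access to the site repository.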
> >
> > I think the first stage is to establish what effect the CSP (Content
> > Security Policy) will have on the behaviour of each site.
> > Tracking is not needed for Attic sites, so unless a site no longer
> > works properly, any failed tracking fetches can simply be ignored.
> >
> > For other assets such as fonts, images and scripts, failure to load is
> > likely to have an adverse effect, so that needs to be solved.
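One way to start that assessment is to list, for each page, the automatically loaded resources (scripts, images, iframes, and stylesheet/icon/preload links) that point at other hosts. Below is a minimal sketch using Python's standard html.parser. PERMITTED_HOSTS is an assumption standing in for whatever hosts the privacy policy actually allows. The sketch only sees URLs written in the markup, so resources requested by scripts at runtime (such as those loaded by the classic Google Analytics snippet) will not show up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: hosts whose resources are acceptable; adjust as needed.
PERMITTED_HOSTS = {"apache.org"}

# <link> rel values that make the browser fetch the target automatically.
AUTO_LINK_RELS = {"stylesheet", "icon", "preload"}


def is_permitted(host: str) -> bool:
    return any(host == h or host.endswith("." + h) for h in PERMITTED_HOSTS)


class ExternalResourceFinder(HTMLParser):
    """Collect absolute URLs of auto-loaded resources hosted elsewhere."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.external = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            rels = set((attrs.get("rel") or "").lower().split())
            if not rels & AUTO_LINK_RELS:
                return  # e.g. rel="canonical" is not fetched
            value = attrs.get("href")
        elif tag in ("script", "img", "iframe"):
            value = attrs.get("src")
        else:
            return  # <a href> and friends are only followed on click
        if not value:
            return
        url = urljoin(self.base_url, value)
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            return  # data: URLs and the like are not fetched from a host
        if not is_permitted(parsed.hostname or ""):
            self.external.append(url)


def find_external(html: str, base_url: str) -> list:
    finder = ExternalResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.external
```

Running find_external over each page in a checkout, with the page's public URL as base_url, gives a per-page list of loads that a strict CSP is likely to block.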
> >
> > In theory the server filter could be extended to change URLs to point
> > at a local copy, though it might be tricky to change only the relevant
> > URLs. (Only automatically loaded resources need to be changed, not
> > ordinary links.)
> > I suspect it might be necessary to edit the HTML files to do this properly.
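Whether this is done in a filter or as a one-off edit, the key point is to touch only attributes the browser fetches on its own. Below is a rough Python sketch. It assumes a hypothetical LOCAL_COPIES mapping from third-party URL prefixes to copies already saved under the site. Because it is regex-based, it will miss unusual markup (for example spaces around `=`), and its output should be checked by hand.

```python
import re

# Hypothetical mapping from third-party URL prefixes to local copies that
# would first have to be downloaded into the site.
LOCAL_COPIES = {
    "https://code.jquery.com/": "/attic-assets/jquery/",
    "https://cdnjs.cloudflare.com/ajax/libs/": "/attic-assets/cdnjs/",
}

# src= on <script>/<img> and href= on <link>; <a href> is not matched because
# plain links are not fetched until the user clicks them.
_AUTO_ATTR = re.compile(
    r"""(<(?:script|img)\b[^>]*?\bsrc=|<link\b[^>]*?\bhref=)(["'])(.*?)\2""",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_to_local(html: str, mapping: dict = LOCAL_COPIES) -> str:
    """Point auto-loaded resource URLs at local copies where a mapping exists."""

    def replace(match: re.Match) -> str:
        prefix, quote, url = match.groups()
        for remote, local in mapping.items():
            if url.startswith(remote):
                url = local + url[len(remote):]
                break
        return prefix + quote + url + quote

    return _AUTO_ATTR.sub(replace, html)
```

URLs with no entry in the mapping are left untouched, so the result can be re-scanned to see what still loads from third parties.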
> >
> > Some scripts must be loaded from the third party and cannot be
> > replaced with local copies.
> > That might mean rewriting entire pages if such a script is essential to
> > the site working.
> > Analytics references, on the other hand, can simply be removed or disabled.
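For the analytics case, a crude Python sketch of removal is to drop any <script> element that mentions a tracker host or the usual Google Analytics globals (`gtag(`, `_gaq`). The marker list is an assumption and not exhaustive. As with any regex over HTML, this is a heuristic, and its results need checking.

```python
import re

# Strings that mark a script as analytics (assumed, not exhaustive).
TRACKER_MARKERS = ("google-analytics.com", "googletagmanager.com", "gtag(", "_gaq")

# One <script ...>...</script> element; non-greedy so elements match one by one.
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_analytics(html: str) -> str:
    """Remove <script> elements that mention a tracker marker; keep all others."""

    def keep_or_drop(match: re.Match) -> str:
        element = match.group(0)
        lowered = element.lower()
        if any(marker in lowered for marker in TRACKER_MARKERS):
            return ""
        return element

    return _SCRIPT.sub(keep_or_drop, html)
```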
> >
> > For some references it should be possible to set up a central proxy
> > server to fetch the resource, and change the HTML to use the proxy.
> > If the proxy is set up properly, the user's PII (personally identifiable
> > information) is not passed on to the third party.
> > Alternatively, it might be possible to use server rewrites, provided those
> > can be guaranteed not to pass on any PII from the original request.
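Below is a minimal sketch of such a proxy in Python. It assumes a hypothetical `/proxy/<url>` path scheme and an ALLOWED_UPSTREAMS allowlist; neither is an existing ASF service. What it illustrates is that the upstream request is built from scratch, so the visitor's cookies, Referer and IP address never reach the third party. A real deployment would also want caching and more careful error handling.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler
from urllib.parse import unquote

# Hypothetical allowlist: only these upstream prefixes may be fetched,
# so the proxy cannot be used to reach arbitrary URLs.
ALLOWED_UPSTREAMS = ("https://code.jquery.com/",)

# Headers set explicitly on the upstream request. Nothing is copied from the
# visitor's request (no Cookie, Referer, User-Agent, X-Forwarded-For, ...);
# urllib itself only adds basics such as Host and Connection.
UPSTREAM_HEADERS = {"User-Agent": "attic-asset-proxy", "Accept": "*/*"}


def build_upstream_request(path: str) -> urllib.request.Request:
    """Turn '/proxy/<url>' into a fresh request for <url>, or raise ValueError."""
    prefix = "/proxy/"
    if not path.startswith(prefix):
        raise ValueError("not a proxy path")
    target = unquote(path[len(prefix):])
    if not target.startswith(ALLOWED_UPSTREAMS):
        raise ValueError("upstream not allowed")
    return urllib.request.Request(target, headers=dict(UPSTREAM_HEADERS))


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            upstream = build_upstream_request(self.path)
        except ValueError:
            self.send_error(404)
            return
        try:
            # The third party sees the proxy's address, not the visitor's.
            with urllib.request.urlopen(upstream, timeout=10) as resp:
                body = resp.read()
                ctype = resp.headers.get("Content-Type", "application/octet-stream")
        except OSError:  # URLError, HTTPError and timeouts all subclass OSError
            self.send_error(502)
            return
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Locally it could be run with `http.server.ThreadingHTTPServer(("127.0.0.1", 8081), ProxyHandler).serve_forever()`, and the HTML changed to reference `/proxy/https://code.jquery.com/...` instead of the third-party URL.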
> >
> > For testing, it should be possible to set up a Docker container with a
> > web server that has the appropriate CSP settings.
> > It can also include the Lua filter for experimenting with that.
> > The website source can then be checked out locally and mapped into
> > the container on startup.
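For quick checks without Docker, Python's built-in http.server can serve a checkout with a CSP header added. Pages opened in a browser will then report any blocked loads as CSP violations in the developer console. This is a lighter-weight substitute for the Docker setup, not the same thing: it cannot run the Lua filter, and TEST_CSP is a made-up policy rather than the one the ASF servers actually send.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical policy for testing; copy the real header from the live
# server's responses before relying on the results.
TEST_CSP = "default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline'"


class CSPHandler(SimpleHTTPRequestHandler):
    """Serve static files, adding a Content-Security-Policy header to each response."""

    csp = TEST_CSP

    def end_headers(self):
        self.send_header("Content-Security-Policy", self.csp)
        super().end_headers()


def make_server(site_dir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Bind a local server that serves site_dir with the test CSP."""
    handler = functools.partial(CSPHandler, directory=site_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, `make_server("/path/to/site-checkout").serve_forever()` and then browse to http://127.0.0.1:8000/.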
> >
>
> Thanks for this response, Seb.
>
> So when you say "website source can then be checked out locally", does that
> mean the Attic has a copy of the deployed websites somewhere in a repo that you
> can modify and redeploy if need be? Or are the original project websites
> effectively frozen at the point of retirement and no longer modifiable?

Project website repos are generally frozen at the point of retirement.
[The server filter that adds the Attic banner means write access is not needed.]

For testing the CSP and how to fix it, you can check out your own copy of the website.

If it turns out the website source needs to be amended (rather than
using a server filter), then Infra will need to grant write access.
Alternatively, a patch file could be created from the checkout and
provided to Infra so they can apply it.
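If the checkout is a Git or Subversion working copy, `git diff` or `svn diff` already produces such a patch. If instead the fixes are generated by a script into a separate copy of the site, a unified diff can be built with Python's difflib, as in the sketch below (make_patch is a hypothetical helper). It compares raw bytes, so old pages in any encoding or line-ending style are diffed exactly. The a/ and b/ prefixes match what `patch -p1` expects.

```python
import difflib
from pathlib import Path


def make_patch(original_root: str, modified_root: str, rel_paths: list) -> bytes:
    """Return one unified diff (git-style a/ and b/ prefixes) for rel_paths.

    Works on raw bytes; files lacking a final newline are not marked specially.
    """
    chunks = []
    for rel in rel_paths:
        before = Path(original_root, rel).read_bytes().splitlines(keepends=True)
        after = Path(modified_root, rel).read_bytes().splitlines(keepends=True)
        chunks.extend(
            difflib.diff_bytes(
                difflib.unified_diff,
                before,
                after,
                fromfile=f"a/{rel}".encode(),
                tofile=f"b/{rel}".encode(),
            )
        )
    return b"".join(chunks)
```

For example, `Path("attic-fix.patch").write_bytes(make_patch("site-orig", "site-fixed", ["index.html"]))`, where the directory and file names are placeholders.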

> Niall
>
> >
> > I worked on the Lua script and did some CSP testing, and I may still
> > have suitable Docker scripts.
> >
> > > Below is a list of websites that are being flagged. I am willing to do
> > > some work to fix this issue, so if you have any ideas on how this could
> > > be resolved, I would appreciate hearing them.
> >
> > > Thanks
> > >
> > > Niall
> > >
> > >
> > > apex.apache.org
> > > archiva.apache.org
> > > bahir.apache.org
> > > directmemory.apache.org
> > > eagle.apache.org
> > > hama.apache.org
> > > hawq.apache.org
> > > ibatis.apache.org
> > > marmotta.apache.org
> > > metron.apache.org
> > > mxnet.apache.org
> > > ode.apache.org
> > > polygene.apache.org
> > > reef.apache.org
> > > stdcxx.apache.org
> > > stratos.apache.org
> > > streams.apache.org
> > > tajo.apache.org
> > > trafodion.apache.org
> > > tuscany.apache.org
> > > twill.apache.org
> > > usergrid.apache.org
> > > wink.apache.org
> > >
> > > [1] https://privacy.apache.org/faq/committers.html#can-i-use-google-analytics
> >
