On Sun, 12 Jan 2025 at 09:05, sebb <seb...@gmail.com> wrote:

> On Sat, 11 Jan 2025 at 20:03, Niall Pemberton <niall.pember...@gmail.com>
> wrote:
> >
> > Hi Attic Team,
> >
> > A number of retired projects are flagged on a monthly report (to the
> > Privacy Committee) as contravening the ASF privacy policy due to their use
> > of Google Analytics (mainly)[1].
>
> It's not just the analytics; fonts, images and scripts (etc.) must not
> be loaded from non-permitted sites.
>
> > I'm starting with the assumption that it would be difficult/painful to fix
> > this now that these projects are in the Attic, but I thought I would ask
> > here whether there is any way to do this.
>
> Yes, it was difficult to prepare sites for the Attic.
> In general, it's not possible (or desirable) to regenerate the website
> from source.
> So originally the Attic had to change every single HTML file.
> This was a lot of work, even if some of it could be automated.
>
> The Attic banner is now added with a server filter (in Lua) that adds
> the banner text to the HTML files.
>
> I think the first stage is to establish what effect the Content Security
> Policy (CSP) will have on the behaviour of each site.
> Tracking is not needed for Attic sites, so any failed fetches for it can
> just be ignored, unless they stop the site from working properly.
>
> For other assets such as fonts, images and scripts, failure to load is
> likely to have an adverse effect, so it needs to be solved.
>
> In theory the server filter could be extended to change URLs to point
> to local copies, though it might be tricky to change only the relevant
> URLs.
> (Only automatically loaded resources need to be changed.)
> I suspect it might be necessary to edit the HTML files to do this properly.
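For illustration only, here is a rough Python sketch of that kind of rewriting (it is not the existing Lua filter). It uses regular expressions rather than a real HTML parser, and only touches quoted src/href values on <script>, <img>, <iframe> and stylesheet/icon <link> tags. The third-party prefixes and /attic-assets/ paths in LOCAL_COPIES are made-up examples:

```python
import re

# Hypothetical mapping from third-party URL prefixes to local copies.
# These URLs are assumptions, not taken from any actual Attic site.
LOCAL_COPIES = {
    "https://fonts.googleapis.com/": "/attic-assets/fonts/",
    "https://ajax.googleapis.com/ajax/libs/": "/attic-assets/libs/",
}

# Start tags whose src/href is fetched automatically when the page loads.
_TAG_RE = re.compile(r"<(script|img|iframe|link)\b[^>]*>", re.IGNORECASE)
# A quoted src= or href= attribute (group 2 is the quote, group 3 the URL).
_ATTR_RE = re.compile(r"""(\s(?:src|href)\s*=\s*)(["'])(.*?)\2""",
                      re.IGNORECASE | re.DOTALL)
_REL_RE = re.compile(r"""\srel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def _localise(url, mapping):
    """Swap a known third-party prefix for its local equivalent."""
    for prefix, local in mapping.items():
        if url.startswith(prefix):
            return local + url[len(prefix):]
    return url


def _rewrite_tag(match, mapping):
    tag = match.group(0)
    if match.group(1).lower() == "link":
        rel = _REL_RE.search(tag)
        # Only stylesheet/icon links load automatically; others are left alone.
        if not rel or not ({"stylesheet", "icon"} & set(rel.group(2).lower().split())):
            return tag
    return _ATTR_RE.sub(
        lambda m: m.group(1) + m.group(2) + _localise(m.group(3), mapping) + m.group(2),
        tag,
    )


def rewrite_auto_loaded_urls(html, mapping=LOCAL_COPIES):
    """Point automatically loaded third-party resources at local copies."""
    return _TAG_RE.sub(lambda m: _rewrite_tag(m, mapping), html)


if __name__ == "__main__":
    page = ('<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">'
            '<a href="https://fonts.googleapis.com/">fonts</a>')
    print(rewrite_auto_loaded_urls(page))
```

Ordinary <a href> links are deliberately left unchanged, since only automatically loaded resources matter. Unquoted attribute values, and attribute values containing ">", are not handled, so the output would still need checking.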
>
> Some scripts must be accessed from the 3rd party and cannot be
> replaced with local copies.
> That might mean rewriting entire pages if the script is essential to
> the site working.
> In the case of analytics, such references can just be removed/disabled.
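Similarly, a rough Python sketch (regex-based, illustration only) of removing the analytics references: it drops any <script> element that loads from, or mentions, one of the hosts in TRACKER_HOSTS, or that calls gtag() or uses _gaq. The host list is an assumption and would need checking against each site:

```python
import re

# Assumed list of hosts serving Google Analytics / Tag Manager scripts;
# extend it for any other trackers found on a given site.
TRACKER_HOSTS = ("google-analytics.com", "googletagmanager.com")

# A whole <script ...>...</script> element, matched non-greedily.
_SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def _is_tracker(script):
    """True for external tracker scripts and for inline gtag()/ga.js snippets."""
    lowered = script.lower()
    return (any(host in lowered for host in TRACKER_HOSTS)
            or "gtag(" in lowered
            or "_gaq" in lowered)


def strip_analytics(html):
    """Remove <script> elements that load or configure Google Analytics."""
    return _SCRIPT_RE.sub(lambda m: "" if _is_tracker(m.group(0)) else m.group(0), html)


if __name__ == "__main__":
    page = ('<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXX"></script>'
            '<script src="/js/site.js"></script>')
    print(strip_analytics(page))
```

A script containing a literal </script> inside a string would confuse the regex, so the results would need checking by hand.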
>
> For some references it should be possible to set up a central proxy
> server to fetch the resource, and change the HTML to use the proxy.
> If the proxy is set up properly, the user's PII is not passed on to the
> 3rd party.
> Or it might be possible to use server rewrites, if those can be
> guaranteed not to pass on any PII from the original request.
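As a minimal sketch of what "set up properly" might mean for the proxy (Python, illustration only): the proxy makes the outbound connection itself, so the client's IP address is not exposed unless the proxy adds something like X-Forwarded-For, and it forwards only an allow-list of request headers. The SAFE_HEADERS set below is an assumption, not an ASF policy:

```python
# Assumed allow-list of request headers that carry no user-identifying data.
# Everything else (Cookie, Referer, User-Agent, Accept-Language,
# X-Forwarded-For, ...) is dropped before the request goes to the 3rd party.
SAFE_HEADERS = {"accept", "if-none-match", "if-modified-since"}


def outbound_headers(incoming):
    """Return the subset of the client's request headers safe to forward."""
    return {name: value for name, value in incoming.items()
            if name.lower() in SAFE_HEADERS}


if __name__ == "__main__":
    client_headers = {
        "Cookie": "session=abc",
        "Referer": "https://tajo.apache.org/",
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/css",
    }
    print(outbound_headers(client_headers))  # {'Accept': 'text/css'}
```

An allow-list is used rather than a block-list so that any header nobody thought of is dropped by default.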
>
> For testing, it should be possible to set up a Docker container with a
> webserver having the appropriate CSP settings.
> It can also have the Lua filter for experiments with that.
> The website source can then be checked out locally and mapped into
> the container on startup.
>

Thanks for this response Seb.

So when you say "website source can then be checked out locally", does that
mean the Attic has a copy of the deployed websites somewhere in a repo that
you can modify/redeploy if need be? Or are the original project websites
effectively frozen at the point of retirement and no longer modifiable?

Niall



>
> I worked on the Lua script, and did some CSP testing, and may still
> have suitable Docker scripts.
>
> > Below is a list of websites that are being flagged. I am willing to do some
> > work to fix this issue, so if you have any ideas on how this could be
> > resolved, I would appreciate it.
>
> > Thanks
> >
> > Niall
> >
> >
> > apex.apache.org
> > archiva.apache.org
> > bahir.apache.org
> > directmemory.apache.org
> > eagle.apache.org
> > hama.apache.org
> > hawq.apache.org
> > ibatis.apache.org
> > marmotta.apache.org
> > metron.apache.org
> > mxnet.apache.org
> > ode.apache.org
> > polygene.apache.org
> > reef.apache.org
> > stdcxx.apache.org
> > stratos.apache.org
> > streams.apache.org
> > tajo.apache.org
> > trafodion.apache.org
> > tuscany.apache.org
> > twill.apache.org
> > usergrid.apache.org
> > wink.apache.org
> >
> > [1] https://privacy.apache.org/faq/committers.html#can-i-use-google-analytics
>
