On Sun, 12 Jan 2025 at 09:05, sebb <seb...@gmail.com> wrote:
> On Sat, 11 Jan 2025 at 20:03, Niall Pemberton <niall.pember...@gmail.com>
> wrote:
> >
> > Hi Attic Team,
> >
> > A number of retired projects are flagged on a monthly report (to the
> > Privacy Committee) as contravening the ASF privacy policy due to their use
> > of Google Analytics (mainly)[1].
>
> It's not just the analytics; fonts, images and scripts (etc.) must not
> be loaded from non-permitted sites.
>
> > I'm starting with the assumption that it would be difficult/painful to fix
> > this now that these projects are in the Attic, but I thought I would ask
> > here if there was any way to do this?
>
> Yes, it was difficult to prepare sites for the Attic.
> In general, it's not possible (or desirable) to regenerate the website
> from source.
> So originally it meant the Attic needed to change every single HTML source
> file.
> This was a lot of work, even if some could be automated.
>
> The Attic banner is now added with a server filter (in Lua) that adds
> the banner text to the HTML files.
>
> I think the first stage is to establish what effect the CSP will have
> on the behaviour of each site.
> In the case of tracking, that is not needed for Attic sites, so unless
> the site no longer works properly, any failed fetches can just be
> ignored.
>
> For other assets such as fonts, images and scripts, failure to load is
> likely to have an adverse effect, so it needs to be solved.
>
> In theory the server filter could be extended to change URLs to a
> local copy, though it might be tricky to change only the relevant
> URLs.
> (Only automatically loaded resources need to be changed.)
> I suspect it might be necessary to edit the HTML files to do this properly.
>
> Some scripts must be accessed from the 3rd party and cannot be
> replaced with local copies.
> That might mean rewriting entire pages if the script is essential to
> the site working.
> In the case of analytics, such references can just be removed/disabled.
>
> For some references it should be possible to set up a central proxy
> server to fetch the resource, and change the HTML to use the proxy.
> If the proxy is set up properly, the user's PII is not passed on to the
> 3rd party.
> Or it might be possible to use server rewrites if those can be
> guaranteed not to pass on any PII from the original request.
>
> For testing, it should be possible to set up a Docker container with a
> webserver having the appropriate CSP settings.
> It can also have the Lua filter for experiments with that.
> The website source can then be checked out locally, and mapped into
> the container on startup.
>
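One way to make a first pass at "establish what effect the CSP will have on the behaviour of each site", before any container is running, is a static scan of each page's markup. The following is a minimal Python sketch, assuming a policy that behaves roughly like `default-src 'self'` plus an optional list of permitted hosts. The names `find_blocked_resources`, `AUTO_LOADED` and `AUTO_LOADED_LINK_RELS` are hypothetical and are not part of any existing ASF or Attic tooling. Only automatically loaded resources (scripts, images, stylesheets and similar) are reported, because links the user has to click are not fetched on page load.

```python
"""Sketch: list resources on an HTML page that a strict CSP would block.

Assumption: the policy behaves roughly like "default-src 'self'", so any
automatically loaded http(s) resource on another host is reported.
Plain <a href> links are ignored: they are only fetched when clicked.
Names here (find_blocked_resources, AUTO_LOADED, ...) are hypothetical.
"""
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag -> attributes that make the browser fetch a resource on page load.
AUTO_LOADED = {
    "script": ("src",),
    "img": ("src", "srcset"),
    "iframe": ("src",),
    "source": ("src", "srcset"),
    "video": ("src", "poster"),
    "audio": ("src",),
    "embed": ("src",),
    "object": ("data",),
}
# <link href> is only fetched automatically for some rel values.
AUTO_LOADED_LINK_RELS = {"stylesheet", "icon", "preload"}


class _ResourceCollector(HTMLParser):
    """Collects absolute URLs of auto-loaded resources from static markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes (e.g. async) map to None
        names = AUTO_LOADED.get(tag, ())
        if tag == "link":
            rels = (attrs.get("rel") or "").lower().split()
            if AUTO_LOADED_LINK_RELS.intersection(rels):
                names = ("href",)
        for name in names:
            value = attrs.get(name)
            if not value:
                continue
            if name == "srcset":
                # "a.png 1x, b.png 2x" -> ["a.png", "b.png"]
                candidates = [p.split()[0] for p in value.split(",") if p.strip()]
            else:
                candidates = [value.strip()]
            # Resolve relative and protocol-relative URLs against the page URL.
            self.urls.extend(urljoin(self.base_url, c) for c in candidates)


def find_blocked_resources(html, page_url, allowed_hosts=()):
    """Return auto-loaded http(s) URLs whose host is neither the page's own
    host nor in allowed_hosts. Only the host is compared (not scheme/port)."""
    collector = _ResourceCollector(page_url)
    collector.feed(html)
    collector.close()
    permitted = {urlparse(page_url).hostname, *allowed_hosts}
    blocked = []
    for url in dict.fromkeys(collector.urls):  # de-duplicate, keep order
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.hostname not in permitted:
            blocked.append(url)
    return blocked


if __name__ == "__main__":
    # Hypothetical page; example.apache.org is a placeholder host.
    sample = """
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans">
    <link rel="stylesheet" href="/css/site.css">
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE"></script>
    <img src="images/logo.png">
    <a href="https://github.com/apache/example">source</a>
    """
    for url in find_blocked_resources(sample, "https://example.apache.org/index.html"):
        print(url)
```

For the sample page this prints the Google Fonts stylesheet URL and the gtag script URL; the local stylesheet, the local image and the GitHub link are not reported. The scan only sees static markup, so resources pulled in from CSS (`@import`, `url()`) or injected by inline scripts (which is how the classic analytics.js snippet loads its script) will not show up. A real browser pointed at the CSP-enabled test container is still needed to catch those.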

Thanks for this response, Seb.

So when you say "website source can then be checked out locally", does that
mean the Attic has a copy of the deployed websites somewhere in a repo that
you can modify and redeploy if need be? Or are the original project websites
effectively frozen at the point of retirement and no longer modifiable?

Niall

>
> I worked on the Lua script, and did some CSP testing, and may still
> have suitable Docker scripts.
>
> > Below is a list of websites that are being flagged. I am willing to do some
> > work to fix this issue, so if you have any ideas on how this could be
> > resolved, then I would appreciate it.
> >
> > Thanks
> >
> > Niall
> >
> >
> > apex.apache.org
> > archiva.apache.org
> > bahir.apache.org
> > directmemory.apache.org
> > eagle.apache.org
> > hama.apache.org
> > hawq.apache.org
> > ibatis.apache.org
> > marmotta.apache.org
> > metron.apache.org
> > mxnet.apache.org
> > ode.apache.org
> > polygene.apache.org
> > reef.apache.org
> > stdcxx.apache.org
> > stratos.apache.org
> > streams.apache.org
> > tajo.apache.org
> > trafodion.apache.org
> > tuscany.apache.org
> > twill.apache.org
> > usergrid.apache.org
> > wink.apache.org
> >
> > [1]
> > https://privacy.apache.org/faq/committers.html#can-i-use-google-analytics
>