[
https://issues.apache.org/jira/browse/PIG-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943956#comment-17943956
]
Rohini Palaniswamy edited comment on PIG-5468 at 4/13/25 10:09 PM:
-------------------------------------------------------------------
[~niallp],
I removed the svn external link and copied the skins directory locally. That is now committed.
{code:java}
svn propdel svn:externals author/src/documentation
rm -rf author/src/documentation/skins
svn copy https://svn.apache.org/repos/asf/hadoop/common/site/main/author/src/documentation/skins author/src/documentation/skins
svn commit -m "PIG-5468: Remove svn external link to hadoop skins and copy locally"
{code}
I tried to apply pig-ga.patch, but it was generated assuming the skins
directory did not exist. So I applied it separately and tried to diff and
reconcile the changes to site-to-xhtml.xsl. However, I could not find anything
in the patch that removes Google Analytics from site-to-xhtml.xsl, and the
file still has the reference. I also found two other XML files with references
to Google that were not part of the patch. Can you take a look?
{code}
find . -type f -not -path '*/\.*' | grep -v "./publish" | xargs grep -i google
./author/src/documentation/skins/hadoop-pelt/xslt/html/site-to-xhtml.xsl: ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
./author/src/documentation/skins/hadoop-pelt/xslt/html/site-to-xhtml.xsl: <form class="roundtopsmall" method="get" action="http://www.google.com/search">
./author/src/documentation/skins/hadoop-pelt/xslt/html/site-to-xhtml.xsl: <form method="get" action="http://www.google.com/search">
./author/src/documentation/skinconf.xml: <!-- To enable lucene search add provider="lucene" (default is google).
./author/src/documentation/skinconf.xml: no search box. @domain will enable sitesearch for the specific domain with google.
./author/src/documentation/skinconf.xml: In other words google will search the @domain for the query string.
./author/src/documentation/skinconf.xml: <search domain="pig.apache.org" provider="google"/>
./author/src/documentation/content/xdocs/privacypolicy.xml:<a href="http://www.google.com/analytics/">Google Analytics</a>
./author/src/documentation/content/xdocs/privacypolicy.xml:service and handled by Google as
./author/src/documentation/content/xdocs/privacypolicy.xml:described in their <a href="http://www.google.com/privacy.html">privacy policy</a>.
./author/src/documentation/content/xdocs/privacypolicy.xml:cookie if you prefer not to share this data with Google.</p>

grep -i google pig-ga.patch
+ ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ <form class="roundtopsmall" method="get" action="http://www.google.com/search">
+ <form method="get" action="http://www.google.com/search">
{code}
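One way to tell whether a patch removes the tracking snippet or only re-adds it is to count the added and removed lines that mention it. The following is a minimal sketch, assuming the patch is a unified diff; the function name count_ga_lines is hypothetical and is not part of pig-ga.patch or of any Pig tooling.

```shell
# Count the lines a unified diff adds or removes that mention Google Analytics.
# Usage: count_ga_lines <patch-file>
# Hypothetical helper for illustration only.
count_ga_lines() {
    patch_file=$1
    # Lines starting with '+++ ' or '--- ' are file headers, not content,
    # so they are filtered out before counting.
    added=$(grep -E '^\+' "$patch_file" | grep -Ev '^\+\+\+ ' | grep -ci 'google-analytics')
    removed=$(grep -E '^-' "$patch_file" | grep -Ev '^--- ' | grep -ci 'google-analytics')
    echo "added=$added removed=$removed"
}
```

Running `count_ga_lines pig-ga.patch` prints both counts. A patch that strips the snippet should report a non-zero `removed` count, while one generated against a missing skins directory would be expected to show the snippet only under `added`.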
> Remove Google Analytics from the Pig Website
> --------------------------------------------
>
> Key: PIG-5468
> URL: https://issues.apache.org/jira/browse/PIG-5468
> Project: Pig
> Issue Type: Task
> Reporter: Niall Pemberton
> Assignee: Niall Pemberton
> Priority: Major
> Attachments: pig-ga.patch
>
>
> Hi Pig Team
> The ASF {_}*Privacy Policy*{_}[1][2] does not permit the use of _*Google
> Analytics*_ on any ASF websites and the ASF Infra team will soon enforce a
> {_}*Content Security Policy*{_}(CSP) that will block access to external
> trackers:
> * [https://lists.apache.org/thread/w34sd92v4rz3j28hyddmt5tbprbdq6lc]
> Please could you remove the use of Google Analytics from the Pig website?
> * [https://lists.apache.org/thread/417v5034773lhoytoqs7h343vpjnwn1c]
> I would have submitted a patch to remove Google Analytics, but it seems that
> Pig is pulling in file _*site-to-xhtml.xsl*_ from the old Hadoop Subversion
> repository, via an svn:external link
> * [https://svn.apache.org/viewvc/hadoop/common/site/main/author/src/documentation/skins/hadoop-pelt/xslt/html/site-to-xhtml.xsl?view=markup]
> * [https://svn.apache.org/viewvc/pig/site/author/src/documentation/]
> Since Hadoop has now moved to GitHub, it would probably be a good idea
> to copy that folder (as it's no longer maintained) into your site and
> maintain it within the Pig project, or ask the Hadoop project to modify it.
>
> The ASF hosts its own _*Matomo*_ instance to provide projects with analytics
> and you can request a tracking id for your project by sending a mail to
> *privacy AT apache.org.*
> * [https://privacy.apache.org/faq/committers.html#can-i-use-web-analytics-matomo]
> Additionally I would recommend reviewing any external resources loaded by
> your website. The Content Security Policy will prevent any resources being
> loaded from 3rd Party providers that the ASF does not have a Data Processing
> Agreement (DPA) with. On the 1st February Infra will begin a temporary
> "brownout" when the CSP will be turned on for a short period. This will allow
> projects to check which parts, if any, of their websites will stop working.
> The Privacy FAQ answers a number of questions about which external providers
> are permitted or not:
> * [https://privacy.apache.org/faq/committers.html]
> Thanks
> Niall
> [1] [https://privacy.apache.org/policies/website-policy.html]
> [2] [https://privacy.apache.org/faq/committers.html#can-i-use-google-analytics]