I'm not trying to come across as anti-analytics; I'm not. But I feel a lot of those questions can be answered by the aggregate stats Apache already provides (presumably from the httpd access_log), without adding privacy-invading Google tracker JavaScript and cookies. So, while your answers are good, they don't justify Google Analytics in my eyes.
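As a rough illustration of what plain server logs can already provide, here is a minimal Python sketch that summarizes lines in Apache's "combined" log format. That log format is an assumption: the ASF's actual logging setup and the pipeline behind uls.apache.org may differ. The `summarize` function is hypothetical, and so is its definition of a "visitor" as a distinct (IP address, user agent) pair per day. Neither describes how Apache actually computes its numbers.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, top_n=10):
    """Hypothetical helper: per-day pageviews and "visitors", top pages, top referers."""
    pageviews = Counter()        # day -> number of counted GET requests
    visitors = defaultdict(set)  # day -> distinct (ip, user agent) pairs (a crude heuristic)
    pages = Counter()            # path -> number of counted GET requests
    referers = Counter()         # Referer header value -> count
    for line in lines:
        m = COMBINED.match(line)
        # Skip malformed lines, non-GET requests and 4xx/5xx errors.
        if not m or m["method"] != "GET" or int(m["status"]) >= 400:
            continue
        # Timestamps look like 03/Mar/2021:09:15:00 +0000 (English month names assumed).
        day = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
        pageviews[day] += 1
        visitors[day].add((m["ip"], m["agent"]))
        pages[m["path"]] += 1
        if m["referer"] != "-":  # "-" means no Referer header was sent
            referers[m["referer"]] += 1
    return {
        "pageviews": dict(pageviews),
        "visitors": {day: len(pairs) for day, pairs in visitors.items()},
        "top_pages": pages.most_common(top_n),
        "top_referers": referers.most_common(top_n),
    }
```

Nothing here needs cookies or JavaScript, so the numbers are the same whether or not someone blocks trackers. The trade-off is that (IP, user agent) pairs over-count people whose IP changes and under-count people sharing one address behind NAT.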
As an example, let's look at https://uls.apache.org/exports/lucene.apache.org.yaml and consider your list:

1. You can see a breakdown of pageviews and "visitors" by day. I don't know how they determine a unique "visitor", since it isn't cookie tracking. Maybe it's some combination of (IP address, TLS session ID, user agent), but whatever they have is good enough for me.
2. I can see the most popular pages and your 6.6 Ref Guide stuff.
3. Top referrers give you a rough idea of where people are coming from (including internal referrers), so you can see that people are clicking links on those pages.
4. See #1.
5. See #3. Google provides no additional magic here; this is the referer (sic) header either way.
6. I think the download process is actually hacked up/convoluted just to force some GA tracking. At least I know that if I disable JavaScript, the download buttons still work.
7. What is missing?

On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> I block any analytics I can find. I am with you on the overall positioning. And yes, the absolute numbers lie.
>
> At the same time, we can get a lot of relative numbers and trends that are valuable in other ways.
>
> For example:
> 1) Do the social media announcements of new releases drive people to download Solr?
> 2) Which Ref Guide pages (if we had GA there) are most popular, and why can't we convince users to use the latest version instead of 6.6 (looking at referrals)? My specific peeve is that I think the URPs page should be a lot more visible; I would love to check whether my assumptions are true by seeing whether people discover that page, relative to other pages.
> 3) What is the page flow on the website? Are there any pages that are completely invisible because of how we linked to them? Are there super popular pages that are completely out of date?
> 4) Do we see an increase or decrease in traffic matching specific events?
> 5) Is there a specific partner/agency site that is driving a lot of attention to Solr? Can we replicate that with others?
> 6) Do we even count downloads in GA? GA covers only HTML pages by default.
> 7) If any of this is valuable but we want to pull out GA anyway, this would help us know what tracking information we would like from Apache Infra.
>
> In general, these kinds of questions are the domain of a Developer Relations role. The Lucene/Solr project does not have one as such, which may explain why not many people understand the value of modern analytics solutions. I am offering my time to make the value of analytics concrete, so that we make the next decision based on reality rather than our collective imagination of what analytics actually does or does not do.
>
> Regards,
> Alex.
>
> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcm...@gmail.com> wrote:
>
>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msoko...@gmail.com> wrote:
>>
>>> Before you look, should we have a betting pool on the number of downloads/day? I will arrange for a bottle of some excellent liquid to be sent to the closest guess at the number of redirects to the mirror sites, as determined by Alexandre. Also, has it been increasing over the last year? Finally, if we can predict these trends using activity on the main Apache site, maybe we don't need to track independently.
>>
>> Why do we even care?
>>
>> How many users are downloading the Lucene tgz from the site versus using an artifact from Maven repositories (via Maven, Gradle, etc.)? How many users are downloading the Solr tgz from the site versus using the official Solr image from Docker Hub?
>>
>> I'm just asking these questions to try to understand the need for the Google tracking.