I am opposed, in principle, to letting a private company have access to the IP addresses and other personally identifiable information of our users.
On Wed, Mar 3, 2021 at 8:11 PM Robert Muir <rcm...@gmail.com> wrote:

> I'm not trying to come across as anti-analytics, i'm not. But I feel a lot
> of those questions can be answered by the aggregate stats already provided
> by apache (presumably from httpd access_log), without adding
> privacy-invading-google-tracker javascripts and cookies. So, while your
> answers are good, they don't justify google analytics in my eyes.
>
> As an example, lets look at
> https://uls.apache.org/exports/lucene.apache.org.yaml and consider your
> list
> 1. You can see breakdown of pageviews and "visitors" by day. I don't know
> how they determine unique "visitor" since it isn't cookie tracking: maybe
> some combo of (IP address, TLS session ID, user agent), but whatever they
> have is good enough for me.
> 2. I can see most popular pages and your 6.6 ref guide stuff
> 3. Top referrers gives you a rough idea of where people are coming from
> (including internal referrers). So people are clicking links on those
> pages.
> 4. see #1.
> 5. see #3. Google provides no additional magic here, this is referer (sic)
> header either way.
> 6. i think the download process is actually hacked up/convoluted just to
> force some GA tracking. At least i know if i disable javascript, the
> download buttons still work.
> 7. what is missing?
>
> On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> I block any analytics I can find. I am with you on the overall
>> positioning. And yes, the absolute numbers lie.
>>
>> At the same time, we can get a lot of relative numbers and trends that
>> are valuable in other ways.
>>
>> For example:
>> 1) Are the social media announcements of new releases drive people to
>> download Solr?
>> 2) Which Ref Guide pages (if we had GA there) are most popular and why
>> can't we convince users to use the latest version instead of 6.6 (looking
>> at referrals). My specific peeve is that I think URPs page should be a lot
>> more visible, I would love to see if my assumptions are true by seeing if
>> people discover that page, relative to other pages.
>> 3) What is the page flow on the website? Are there any pages that are
>> complete invisible because of how we linked to them? Are there super
>> popular pages that are completely out of date?
>> 4) Do we have increase or decrease in traffic matching specific events
>> 5) Is there a specific partner/agency site that is driving a lot of
>> attention to Solr; can we replicate that with others?
>> 6) Do we even count downloads in GA? Because GA is for HTML pages only by
>> default
>> 7) If any of this is valuable, but we want to pull out GA anyway, this
>> would help to know what tracking information we would like from Apache
>> Infra?
>>
>> In general, these kinds of questions are the domain of Developer
>> Relationships role. Lucene/Solr project does not have one as such, which
>> may explain why not many people understand the values of modern analytics
>> solutions. I am offering my time to make the value of analytics concrete,
>> so we are making the next decision based on reality rather than our
>> collective imagination of what analytics actually does or does not.
>>
>> Regards,
>> Alex.
>>
>> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcm...@gmail.com> wrote:
>>
>>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msoko...@gmail.com>
>>> wrote:
>>>
>>>> Before you look, should we have a betting pool on the number of
>>>> downloads/day? I will arrange for a bottle of some excellent liquid to
>>>> be sent to the closest guess at the number of redirects to the mirror
>>>> sites, as determined by Alexandre. Also, has it been increasing over
>>>> the last year? Finally, if we can predict these trends using activity
>>>> on the main apache site, maybe we don't need to track independently.
>>>>
>>>
>>> Why do we even care?
>>>
>>> How many users are downloading lucene tgz from the site versus using an
>>> artifact in maven repositories (via maven, gradle, etc)? How many users are
>>> downloading solr tgz from the site versus using solr official image from
>>> docker hub?
>>>
>>> I'm just asking these questions to try to understand the need for the
>>> google tracking.