+1. Telemetry helps us stay better informed and make better decisions. It's clear to me that all of this is free of PII and identity data. Good to see that other communities have gotten this approved and set up at the ASF.

Max

On Fri, 16 Jun 2023 at 13:51, Evan Rusackas <e...@preset.io.invalid> wrote:

> Hi all,
>
> I wanted to float a new proposal regarding gathering Superset telemetry.
> The Superset dev community is constantly asking a few key questions about
> the installations of Superset in the wild, and we’ve been looking for a
> low-cost/low-effort opportunity to gather this telemetry data. We seek to
> better address questions such as:
>
> • Which versions of Superset are people installing/using now (i.e. whether
>   people have largely made the jump from version X to version Y)
> • How many people are running off of various SHAs from the repo, rather
>   than official Apache releases
> • Knowing the potential fallout of security issues in older installations
>   of Superset that may or may not be taking place
> • Approximately how many people out there in the world are installing
>   Superset, and other related metadata.
>
> In order to address these and other questions, we’ve been looking
> at Scarf [1] as a potential solution. Scarf actually provides many ways to
> get these sorts of telemetry metrics, but in the case of Superset, we’re
> hoping to start with a couple of small moves to make the least
> objectionable low-impact implementation possible.
>
> Namely, we’d like to do two things:
>
> 1) Install the `scarf-js` npm package, which is added to package.json
>    files (e.g. for Superset and all of the superset-ui packages). When the
>    package is downloaded, metadata is recorded to the Scarf dashboard.
>
> 2) Utilize the Scarf Gateway, which we can use in front of PyPI and in
>    front of any docker containers. While it’s possible people go around
>    this gateway, we can probably configure and encourage most traffic to
>    go through these gateways.
>
> While Scarf does not store any personally identifying information from SDK
> telemetry data, it does send various bits of IP-derived information as
> outlined here [2]. This data should be made as transparent as possible by
> granting dashboard access to the Superset PMC and any other relevant means
> of sharing/surfacing it that we encounter (Town Hall, Slack, etc). For what
> it’s worth, this is not the first Apache use of Scarf - Apache Skywalking
> and DolphinScheduler have been using Scarf, and DevLake and APISIX are
> getting started here as well.
>
> If all this proves positive, there are also additional inroads we can take
> toward further telemetry collection, including utilizing Scarf’s API to
> manually send telemetry information (at install, for example), as well
> as the possibility to add “pixels” in docs/websites. But first, let’s
> tackle those first updates, and see how it pans out.
>
> In an attempt to be abundantly transparent, the PR will also include
> • Documentation on the Superset website about the packages and code's
>   existence and purpose,
> • Instructions/documentation to opt out of it by either configuration or
>   package removal [3].
> • Means to view the collected data wherever possible (most likely for PMC
>   members to start, since it can’t be wide-open public)
>
> I’m officially seeking lazy consensus to install and test the Scarf npm
> module on `master` and begin implementing/trialing the Scarf gateway
> opportunistically, in order to start testing (and sharing) the telemetry
> Scarf gathers.
>
> PRs for both the npm approach and gateway approach are already open for
> your consideration/reference [4, 5]
>
> Thanks,
>
> Evan Rusackas
> Preset | preset.io
> Apache Superset PMC
>
> [1] https://about.scarf.sh/
> [2] https://about.scarf.sh/package-sdks
> [3] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> [4] https://github.com/apache/superset/pull/24433
> [5] https://github.com/apache/superset/pull/24432