+1, telemetry lets us make better-informed decisions.
It's clear to me that all of this is "PII" & identity free. Good to see
other communities have gotten this approved and set up at the ASF.

Max

On Fri, 16 Jun 2023 at 13:51, Evan Rusackas <e...@preset.io.invalid> wrote:

> Hi all,
>
> I wanted to float a new proposal regarding gathering Superset telemetry.
> The Superset dev community is constantly asking a few key questions about
> the installations of Superset in the wild, and we’ve been looking for a
> low-cost/low-effort opportunity to gather this telemetry data. We seek to
> better address questions such as:
>
> • Which versions of Superset are people installing/using now (i.e. whether
> people have largely made the jump from version X to version Y)
> • How many people are running off of various SHAs from the repo, rather
> than official Apache releases
> • Knowing the potential fallout of security issues in older installations
> of Superset that may or may not be taking place
> • Approximately how many people out there in the world are installing
> Superset, and other related metadata.
>
> In order to address these and other questions, we’ve been looking
> at Scarf [1] as a potential solution. Scarf actually provides many ways to
> get these sorts of telemetry metrics, but in the case of Superset, we’re
> hoping to start with a couple of small moves to make the least
> objectionable low-impact implementation possible.
>
> Namely, we’d like to do two things:
>
> 1) Add the `scarf-js` npm package as a dependency in our package.json
> files (e.g. Superset, and all of the superset-ui packages). When the
> package is downloaded, metadata is recorded to the Scarf dashboard.
>
> 2) Utilize the Scarf Gateway, which we can use in front of PyPI and in
> front of any Docker containers. While it’s possible people go around this
> gateway, we can probably configure and encourage most traffic to go through
> these gateways.
>
> While Scarf does not store any personally identifying information from SDK
> telemetry data, it does send various bits of IP-derived information as
> outlined here [2]. This data should be made as transparent as possible by
> granting dashboard access to the Superset PMC and any other relevant means
> of sharing/surfacing it that we encounter (Town Hall, Slack, etc). For what
> it’s worth, this is not the first Apache use of Scarf - Apache SkyWalking
> and DolphinScheduler have been using Scarf, and DevLake and APISIX are
> getting started here as well.
>
> If all this proves positive, there are also additional inroads we can take
> toward further telemetry collection, including utilizing Scarf’s API to
> manually send telemetry information (at install, for example), as well
> as the possibility of adding “pixels” in docs/websites. But first, let’s
> tackle these initial updates, and see how it pans out.
>
> In an attempt to be abundantly transparent, the PR will also include:
> • Documentation on the Superset website about the packages and code's
> existence and purpose
> • Instructions/documentation to opt out of it by either configuration or
> package removal [3]
> • Means to view the collected data wherever possible (most likely for PMC
> members to start, since it can’t be wide-open public)
>
> I’m officially seeking lazy consensus to install and test the Scarf npm
> module on `master` and begin implementing/trialing the Scarf gateway
> opportunistically, in order to start testing (and sharing) the telemetry
> Scarf gathers.
>
> PRs for both the npm approach and gateway approach are already open for
> your consideration/reference [4, 5].
>
> Thanks,
>
> Evan Rusackas
> Preset | preset.io
> Apache Superset PMC
>
> [1] https://about.scarf.sh/
> [2] https://about.scarf.sh/package-sdks
> [3]
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> [4] https://github.com/apache/superset/pull/24433
> [5] https://github.com/apache/superset/pull/24432
