Hey all,

Getting more intel on the current installation base would be very welcome
for understanding where the community is feature- and security-wise. It's
great to hear other Apache projects have pre-validated this approach, but
let's make sure we're very transparent about what we're collecting and how
admins can control this.

Also, personally, the thing that I'd be most interested to understand is
which databases people are using. Having this info would make it much
easier for the active devs to prioritize db connectivity issues. If that's
something we could bake into this proposal at some point that'd be awesome!

Ville

On Fri 16. Jun 2023 at 23.51 Evan Rusackas <e...@preset.io.invalid> wrote:

> Hi all,
>
> I wanted to float a new proposal regarding gathering Superset telemetry.
> The Superset dev community is constantly asking a few key questions about
> the installations of Superset in the wild, and we’ve been looking for a
> low-cost/low-effort opportunity to gather this telemetry data. We seek to
> better address questions such as:
>
> • Which versions of Superset are people installing/using now (i.e. whether
> people have largely made the jump from version X to version Y)
> • How many people are running off of various SHAs from the repo, rather
> than official Apache releases
> • Knowing the potential fallout of security issues in older installations
> of Superset that may or may not be taking place
> • Approximately how many people out there in the world are installing
> Superset, and other related metadata.
>
> In order to address these and other questions, we’ve been looking
> at Scarf [1] as a potential solution. Scarf actually provides many ways to
> get these sorts of telemetry metrics, but in the case of Superset, we’re
> hoping to start with a couple of small moves to make the least
> objectionable low-impact implementation possible.
>
> Namely, we’d like to do two things:
>
> 1) Add the `scarf-js` npm package as a dependency in our package.json
> files (e.g. Superset, and all of the superset-ui packages). When the
> package is downloaded, metadata is recorded to the Scarf dashboard.
>
> 2) Utilize the Scarf Gateway, which we can use in front of PyPI and in
> front of any Docker containers. While it’s possible for people to go around
> this gateway, we can probably configure and encourage most traffic to go
> through it.
>
> While Scarf does not store any personally identifying information from SDK
> telemetry data, it does send various bits of IP-derived information as
> outlined here [2]. This data should be made as transparent as possible by
> granting dashboard access to the Superset PMC and any other relevant means
> of sharing/surfacing it that we encounter (Town Hall, Slack, etc). For what
> it’s worth, this is not the first Apache use of Scarf - Apache Skywalking
> and DolphinScheduler have been using Scarf, and DevLake and APISIX are
> getting started here as well.
>
> If all this proves positive, there are also additional inroads we can take
> toward further telemetry collection, including utilizing Scarf’s API to
> manually send telemetry information (at install, for example), as well
> as the possibility of adding “pixels” in docs/websites. But first, let’s
> tackle these initial updates, and see how it pans out.
>
> In an attempt to be abundantly transparent, the PR will also include:
> • Documentation on the Superset website about the packages and code's
> existence and purpose
> • Instructions/documentation to opt out of it by either configuration or
> package removal [3]
> • Means to view the collected data wherever possible (most likely for PMC
> members to start, since it can’t be wide-open public)
>
> I’m officially seeking lazy consensus to install and test the Scarf npm
> module on `master` and begin implementing/trialing the Scarf gateway
> opportunistically, in order to start testing (and sharing) the telemetry
> Scarf gathers.
>
> PRs for both the npm approach and gateway approach are already open for
> your consideration/reference [4, 5].
>
> Thanks,
>
> Evan Rusackas
> Preset | preset.io
> Apache Superset PMC
>
> [1] https://about.scarf.sh/
> [2] https://about.scarf.sh/package-sdks
> [3]
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> [4] https://github.com/apache/superset/pull/24433
> [5] https://github.com/apache/superset/pull/24432