+1

On Tue, Jun 20, 2023, 22:42 Ville Brofeldt <ville.v.brofe...@gmail.com>
wrote:

> Hey all,
>
> Getting more intel on the current installation base would be very welcome
> for understanding where the community stands, feature- and security-wise. It's
> great to hear other Apache projects have pre-validated this approach, but
> let's make sure we're very transparent about what we're collecting and how
> admins can control this.
>
> Also, personally, the thing I'd be most interested in understanding is
> which databases people are using. Having this info would make it much
> easier for the active devs to prioritize DB connectivity issues. If that's
> something we could bake into this proposal at some point, that'd be awesome!
>
> Ville
>
> On Fri 16. Jun 2023 at 23.51 Evan Rusackas <e...@preset.io.invalid> wrote:
>
> > Hi all,
> >
> > I wanted to float a new proposal regarding gathering Superset telemetry.
> > The Superset dev community is constantly asking a few key questions about
> > the installations of Superset in the wild, and we’ve been looking for a
> > low-cost/low-effort opportunity to gather this telemetry data. We seek to
> > better address questions such as:
> >
> > • Which versions of Superset are people installing/using now (i.e.
> > whether people have largely made the jump from version X to version Y)
> > • How many people are running off of various SHAs from the repo, rather
> > than official Apache releases
> > • Gauging the potential fallout (if any) of security issues in older
> > installations of Superset
> > • Approximately how many people out there in the world are installing
> > Superset, and other related metadata.
> >
> > In order to address these and other questions, we’ve been looking
> > at Scarf [1] as a potential solution. Scarf actually provides many ways to
> > get these sorts of telemetry metrics, but in the case of Superset, we’re
> > hoping to start with a couple of small moves to make the least
> > objectionable low-impact implementation possible.
> >
> > Namely, we’d like to do two things:
> >
> > 1) Add the `scarf-js` npm package as a dependency in our package.json
> > files (e.g. Superset and all of the superset-ui packages). When a package
> > is installed, metadata is recorded to the Scarf dashboard.
> >
> > 2) Use the Scarf Gateway, which we can put in front of PyPI and in front
> > of any Docker images. While it's possible for people to bypass this
> > gateway, we can probably configure things to steer most traffic through
> > these gateways and encourage their use.
> >
> > While Scarf does not store any personally identifying information from SDK
> > telemetry data, it does record various bits of IP-derived information, as
> > outlined here [2]. This data should be made as transparent as possible by
> > granting dashboard access to the Superset PMC and through any other
> > relevant means of sharing/surfacing it that we encounter (Town Hall, Slack,
> > etc.). For what it’s worth, this is not the first Apache use of Scarf:
> > Apache SkyWalking and DolphinScheduler have been using Scarf, and DevLake
> > and APISIX are getting started with it as well.
> >
> > If all this proves positive, there are additional inroads we can make
> > toward further telemetry collection, including using Scarf’s API to send
> > telemetry information explicitly (at install time, for example), as well
> > as the possibility of adding “pixels” to docs/websites. But first, let’s
> > tackle these initial updates and see how they pan out.
> >
> > In an attempt to be abundantly transparent, the PR will also include:
> > • Documentation on the Superset website about the existence and purpose
> > of these packages and code
> > • Instructions/documentation for opting out, either by configuration or
> > by package removal [3]
> > • Means to view the collected data wherever possible (most likely for PMC
> > members to start, since it can’t be wide-open public)
> >
> > I’m officially seeking lazy consensus to install and test the Scarf npm
> > module on `master` and begin implementing/trialing the Scarf gateway
> > opportunistically, in order to start testing (and sharing) the telemetry
> > Scarf gathers.
> >
> > PRs for both the npm approach and the gateway approach are already open
> > for your consideration/reference [4, 5].
> >
> > Thanks,
> >
> > Evan Rusackas
> > Preset | preset.io
> > Apache Superset PMC
> >
> > [1] https://about.scarf.sh/
> > [2] https://about.scarf.sh/package-sdks
> > [3] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > [4] https://github.com/apache/superset/pull/24433
> > [5] https://github.com/apache/superset/pull/24432
>