+1

On Tue, Jun 20, 2023, 22:42 Ville Brofeldt <ville.v.brofe...@gmail.com> wrote:
> Hey all,
>
> getting more intel on the current installation base would be very welcome
> for understanding where the community is feature- and security-wise. It's
> great to hear other Apache projects have pre-validated this approach, but
> let's make sure we're very transparent about what we're collecting and how
> admins can control this.
>
> Also, personally, the thing that I'd be most interested to understand is
> which databases people are using. Having this info would make it much
> easier for the active devs to prioritize db connectivity issues. If that's
> something we could bake into this proposal at some point that'd be awesome!
>
> Ville
>
> On Fri 16. Jun 2023 at 23.51 Evan Rusackas <e...@preset.io.invalid> wrote:
>
> > Hi all,
> >
> > I wanted to float a new proposal regarding gathering Superset telemetry.
> > The Superset dev community is constantly asking a few key questions about
> > the installations of Superset in the wild, and we’ve been looking for a
> > low-cost/low-effort opportunity to gather this telemetry data. We seek to
> > better address questions such as:
> >
> > • Which versions of Superset are people installing/using now (i.e. whether
> > people have largely made the jump from version X to version Y)
> > • How many people are running off of various SHAs from the repo, rather
> > than official Apache releases
> > • Knowing the potential fallout of security issues in older installations
> > of Superset that may or may not be taking place
> > • Approximately how many people out there in the world are installing
> > Superset, and other related metadata.
> >
> > In order to address these and other questions, we’ve been looking
> > at Scarf [1] as a potential solution. Scarf actually provides many ways to
> > get these sorts of telemetry metrics, but in the case of Superset, we’re
> > hoping to start with a couple of small moves to make the least
> > objectionable low-impact implementation possible.
> >
> > Namely, we’d like to do two things:
> >
> > 1) Install the `scarf-js` npm package, which is added as a dependency in
> > package.json files (e.g. Superset, and all of the superset-ui packages).
> > When the package is downloaded, metadata is recorded to the Scarf dashboard.
> >
> > 2) Utilize the Scarf Gateway, which we can use in front of PyPI and in
> > front of any Docker containers. While it’s possible people go around this
> > gateway, we can probably configure and encourage most traffic to go through
> > these gateways.
> >
> > While Scarf does not store any personally identifying information from SDK
> > telemetry data, it does send various bits of IP-derived information as
> > outlined here [2]. This data should be made as transparent as possible by
> > granting dashboard access to the Superset PMC and any other relevant means
> > of sharing/surfacing it that we encounter (Town Hall, Slack, etc.). For what
> > it’s worth, this is not the first Apache use of Scarf - Apache SkyWalking
> > and DolphinScheduler have been using Scarf, and DevLake and APISIX are
> > getting started here as well.
> >
> > If all this proves positive, there are also additional inroads we can take
> > toward further telemetry collection, including utilizing Scarf’s API to
> > manually send telemetry information (at install, for example), as well
> > as the possibility to add “pixels” in docs/websites. But first, let’s
> > tackle those first updates, and see how it pans out.
> >
> > In an attempt to be abundantly transparent, the PR will also include
> > • Documentation on the Superset website about the packages and code's
> > existence and purpose,
> > • Instructions/documentation to opt out of it by either configuration or
> > package removal [3].
> > • Means to view the collected data wherever possible (most likely for PMC
> > members to start, since it can’t be wide-open public)
> >
> > I’m officially seeking lazy consensus to install and test the Scarf npm
> > module on `master` and begin implementing/trialing the Scarf gateway
> > opportunistically, in order to start testing (and sharing) the telemetry
> > Scarf gathers.
> >
> > PRs for both the npm approach and gateway approach are already open for
> > your consideration/reference [4, 5]
> >
> > Thanks,
> >
> > Evan Rusackas
> > Preset | preset.io
> > Apache Superset PMC
> >
> > [1] https://about.scarf.sh/
> > [2] https://about.scarf.sh/package-sdks
> > [3] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > [4] https://github.com/apache/superset/pull/24433
> > [5] https://github.com/apache/superset/pull/24432