+1. All that sounds reasonable: there are precedents, and the ASF officially
supports Scarf. It would be great to have access to such telemetry data.
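
As a rough illustration of how little data the questions in the proposal
below actually require, here is a minimal Python sketch. It is not Scarf's or
Airflow's API: the function names and dictionary keys are hypothetical, and
honoring the `SCARF_ANALYTICS=false` opt-out described for scarf-js [10] from
Python code is an assumption made only for this example.

```python
import sys


def telemetry_enabled(env):
    """Return False when the user has opted out (hypothetical helper)."""
    # SCARF_ANALYTICS=false is the opt-out described for scarf-js [10];
    # reusing the same variable on the Python side is an assumption.
    return env.get("SCARF_ANALYTICS", "true").strip().lower() != "false"


def build_payload(airflow_version, db_backend, db_version, executor):
    """Collect only the coarse fields named in the proposal's questions."""
    return {
        "airflow_version": airflow_version,
        # Major.minor of the running interpreter, e.g. "3.11".
        "python_version": ".".join(str(p) for p in sys.version_info[:2]),
        "db": f"{db_backend} {db_version}",
        "executor": executor,
    }


if __name__ == "__main__":
    import os

    if telemetry_enabled(os.environ):
        # Example values only; a real integration would read them from
        # the running installation.
        print(build_payload("2.8.4", "postgres", "14", "CeleryExecutor"))
```

Nothing here is sent anywhere; the sketch only shows that the payload can be
limited to version strings and the executor name, and that an opt-out check
can run before anything is collected.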

On Sat, Mar 30, 2024 at 1:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:

> Hi all,
>
> I want to propose gathering telemetry for Airflow installations. As the
> Airflow community, we have been relying heavily on the yearly Airflow
> Survey and anecdotes to answer a few key questions about Airflow usage.
> Questions like the following:
>
>
>    - Which versions of Airflow are people installing/using now (i.e.
>    whether people have primarily made the jump from version X to version Y)?
>    - Which DB is used as the Metadata DB, and which version (e.g. Pg 14)?
>    - What Python version is being used?
>    - Which Executor is being used?
>    - Approximately how many people out there in the world are installing
>    Airflow?
>
>
> There is a solution that should help answer these questions: Scarf [1].
> Scarf is already approved by the ASF [2][3] and is already used by other
> ASF projects (Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
> DevLake, Skywalking), as it follows GDPR and other regulations.
>
> Similar to Superset, we probably can use it as follows:
>
>
>    1. Install the `scarf-js` npm package and bundle it in the Webserver.
>    When the package is downloaded and the Airflow webserver is opened,
>    metadata is recorded to the Scarf dashboard.
>    2. Utilize the Scarf Gateway [6], which we can put in front of Docker
>    containers. While people could bypass this gateway, we can probably
>    configure and encourage most traffic to go through it.
>
> While Scarf does not store any personally identifying information from SDK
> telemetry data, it does send various bits of IP-derived information as
> outlined here [7]. This data should be made as transparent as possible by
> granting dashboard access to the Airflow PMC and by sharing it through any
> other relevant channels (Town Hall, Slack, Newsletter, etc.).
>
> The following case studies are worth reading:
>
>    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
>    Maxime)
>    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>
> Similar to them, this could help in various ways that come with using data
> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> "what data is being collected" on the Airflow website, this can be
> beneficial to the entire community as we would be making more informed
> decisions.
>
> Regards,
> Kaxil
>
>
> [1] https://about.scarf.sh/
> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> [3] https://privacy.apache.org/faq/committers.html
> [4] https://github.com/apache/superset/issues/25639
> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> [6] https://about.scarf.sh/scarf-gateway
> [7] https://about.scarf.sh/privacy-policy
> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>
