+1 yeah, this sounds useful to me

--
Regards,
Aritra Basu

On Sat, Mar 30, 2024, 6:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> +1. All that sounds reasonable, there are precedents, and the ASF supports
> Scarf officially. It would be great to have access to such telemetry data.
>
> On Sat, Mar 30, 2024 at 1:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:
>
> > Hi all,
> >
> > I want to propose gathering telemetry for Airflow installations. As the
> > Airflow community, we have been relying heavily on the yearly Airflow
> > Survey and anecdotes to answer a few key questions about Airflow usage.
> > Questions like the following:
> >
> >
> >    - Which versions of Airflow are people installing/using now (i.e.
> >    whether people have primarily made the jump from version X to version
> >    Y)?
> >    - Which DB is used as the Metadata DB, and which version (e.g. Pg 14)?
> >    - What Python version is being used?
> >    - Which Executor is being used?
> >    - Approximately how many people worldwide are installing
> >    Airflow?
> >
> >
> > There is a solution that should help answer these questions: Scarf [1].
> > Scarf is already approved by the ASF [2][3] and is already used by other
> > ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
> > DevLake, and Skywalking, as it follows GDPR and other regulations.
> >
> > Similar to Superset, we probably can use it as follows:
> >
> >
> >    1. Install the `scarf-js` npm package and bundle it in the Webserver.
> >    When the package is downloaded and the Airflow webserver is opened,
> >    metadata is recorded to the Scarf dashboard.
> >    2. Utilize the Scarf Gateway [6], which we can put in front of Docker
> >    containers. While it’s possible for people to go around this gateway,
> >    we can probably configure and encourage most traffic to go through it.
> >
> > While Scarf does not store any personally identifying information from SDK
> > telemetry data, it does send various bits of IP-derived information as
> > outlined here [7]. This data should be made as transparent as possible by
> > granting dashboard access to the Airflow PMC and by any other relevant
> > means of sharing/surfacing it that we encounter (Town Hall, Slack,
> > Newsletter, etc.).
> >
> > The following case studies are worth reading:
> >
> >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
> >    Maxime)
> >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >
> > Similar to them, this could help in the various ways that come with using
> > data for decision-making. With clear guidelines on "how to opt out"
> > [8][9][10] and "what data is being collected" on the Airflow website, this
> > can be beneficial to the entire community, as we would be making more
> > informed decisions.
> >
> > Regards,
> > Kaxil
> >
> >
> > [1] https://about.scarf.sh/
> > [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > [3] https://privacy.apache.org/faq/committers.html
> > [4] https://github.com/apache/superset/issues/25639
> > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > [6] https://about.scarf.sh/scarf-gateway
> > [7] https://about.scarf.sh/privacy-policy
> > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >
>
