+1 I agree with this proposal.

On Sat, 30 Mar 2024, 11:34 Aritra Basu, <aritrabasu1...@gmail.com> wrote:

> +1 yeah, this sounds useful to me
>
> --
> Regards,
> Aritra Basu
>
> On Sat, Mar 30, 2024, 6:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > +1. All that sounds reasonable: there are precedents, and the ASF supports
> > Scarf officially. It would be great to have access to such telemetry data.
> >
> > On Sat, Mar 30, 2024 at 1:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I want to propose gathering telemetry for Airflow installations. As the
> > > Airflow community, we have been relying heavily on the yearly Airflow
> > > Survey and anecdotes to answer a few key questions about Airflow usage,
> > > such as the following:
> > >
> > >
> > >    - Which versions of Airflow are people installing/using now (i.e.
> > >    whether people have primarily made the jump from version X to version Y)?
> > >    - Which DB is used as the metadata DB, and which version (e.g. Pg 14)?
> > >    - What Python version is being used?
> > >    - Which Executor is being used?
> > >    - Approximately how many people around the world are installing
> > >    Airflow?
> > >
> > >
> > > There is a solution that should help answer these questions: Scarf [1].
> > > Scarf is already approved by the ASF [2][3] and is already used by other
> > > ASF projects, such as Superset [4], DolphinScheduler [5], Dubbo Kubernetes,
> > > DevLake, and SkyWalking, as it complies with GDPR and other regulations.
> > >
> > > Similar to Superset, we could probably use it as follows:
> > >
> > >
> > >    1. Install the `scarf js` npm package and bundle it in the Webserver.
> > >    When the package is downloaded and the Airflow webserver is opened,
> > >    metadata is recorded to the Scarf dashboard.
> > >    2. Use the Scarf Gateway [6] in front of our Docker containers. While
> > >    people can bypass this gateway, we can probably configure things,
> > >    and encourage users, so that most traffic goes through it.
> > >
> > > While Scarf does not store any personally identifying information from
> > > SDK telemetry data, it does record various IP-derived information, as
> > > outlined in [7]. This data should be made as transparent as possible by
> > > granting dashboard access to the Airflow PMC and through any other
> > > relevant means of sharing/surfacing it that we find (Town Hall, Slack,
> > > Newsletter, etc.).
> > >
> > > The following case studies are worth reading:
> > >
> > >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (from
> > >    Maxime)
> > >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > >
> > > Similar to those projects, this could help us in the various ways that
> > > come with data-driven decision-making. With clear guidelines on the
> > > Airflow website on "how to opt out" [8][9][10] and "what data is being
> > > collected", this can benefit the entire community, as we would be making
> > > more informed decisions.
> > >
> > > Regards,
> > > Kaxil
> > >
> > >
> > > [1] https://about.scarf.sh/
> > > [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > [3] https://privacy.apache.org/faq/committers.html
> > > [4] https://github.com/apache/superset/issues/25639
> > > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > [6] https://about.scarf.sh/scarf-gateway
> > > [7] https://about.scarf.sh/privacy-policy
> > > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > >
> >
>
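A minimal sketch of what honoring the opt-out guidelines in [8][9][10] could look like on the Python side. The helper name `telemetry_enabled` is hypothetical, and checking `SCARF_ANALYTICS` (modelled on the variable Superset documents for opting out) plus the informal `DO_NOT_TRACK` convention is an assumption, not an agreed Airflow design.

```python
import os

# Values of SCARF_ANALYTICS treated as "telemetry disabled".
# Assumption for illustration; not a documented Airflow setting.
_DISABLED_VALUES = {"0", "false", "no", "off"}


def telemetry_enabled(environ=None) -> bool:
    """Return False if the user has opted out of telemetry.

    Hypothetical helper: checks SCARF_ANALYTICS (as Superset does) and the
    DO_NOT_TRACK convention, where any value other than empty/"0"/"false"
    means "do not track".
    """
    env = os.environ if environ is None else environ

    # Explicit opt-out via SCARF_ANALYTICS=false (or similar).
    if env.get("SCARF_ANALYTICS", "").strip().lower() in _DISABLED_VALUES:
        return False

    # Generic opt-out via DO_NOT_TRACK=1 (or any truthy value).
    if env.get("DO_NOT_TRACK", "").strip().lower() not in ("", "0", "false"):
        return False

    # No opt-out signal found: telemetry stays on by default.
    return True
```

Accepting an optional mapping instead of always reading `os.environ` keeps the check easy to test, e.g. `telemetry_enabled({"SCARF_ANALYTICS": "false"})` returns `False`.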
