+1 I agree with this proposal.

On Sat, 30 Mar 2024, 11:34 Aritra Basu, <aritrabasu1...@gmail.com> wrote:

> +1 yeah, this sounds useful to me
>
> --
> Regards,
> Aritra Basu
>
> On Sat, Mar 30, 2024, 6:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > +1. All that sounds reasonable, there are precedents, ASF supports Scarf
> > officially. Would be great to have access to such telemetry data.
> >
> > On Sat, Mar 30, 2024 at 1:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I want to propose gathering telemetry for Airflow installations. As the
> > > Airflow community, we have been relying heavily on the yearly Airflow
> > > Survey and anecdotes to answer a few key questions about Airflow usage.
> > > Questions like the following:
> > >
> > >    - Which versions of Airflow are people installing/using now (i.e.
> > >    whether people have primarily made the jump from version X to version Y)
> > >    - Which DB is used as the Metadata DB and which version e.g Pg 14?
> > >    - What Python version is being used?
> > >    - Which Executor is being used?
> > >    - Approximately how many people out there in the world are installing
> > >    Airflow
> > >
> > > There is a solution that should help answer these questions: Scarf [1]. The
> > > ASF already approves Scarf [2][3] and is already used by other ASF
> > > projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> > > Skywalking as it follows GDPR and other regulations.
> > >
> > > Similar to Superset, we probably can use it as follows:
> > >
> > >    1. Install the `scarf js` npm package and bundle it in the Webserver.
> > >    When the package is downloaded & Airflow webserver is opened, metadata is
> > >    recorded to the Scarf dashboard.
> > >    2. Utilize the Scarf Gateway [6], which we can use in front of docker
> > >    containers. While it’s possible people go around this gateway, we can
> > >    probably configure and encourage most traffic to go through these
> > >    gateways.
> > >
> > > While Scarf does not store any personally identifying information from SDK
> > > telemetry data, it does send various bits of IP-derived information as
> > > outlined here [7]. This data should be made as transparent as possible by
> > > granting dashboard access to the Airflow PMC and any other relevant means
> > > of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> > > etc).
> > >
> > > The following case studies are worth reading:
> > >
> > >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
> > >    Maxime)
> > >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > >
> > > Similar to them, this could help in various ways that come with using data
> > > for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> > > "what data is being collected" on the Airflow website, this can be
> > > beneficial to the entire community as we would be making more informed
> > > decisions.
> > >
> > > Regards,
> > > Kaxil
> > >
> > > [1] https://about.scarf.sh/
> > > [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > [3] https://privacy.apache.org/faq/committers.html
> > > [4] https://github.com/apache/superset/issues/25639
> > > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > [6] https://about.scarf.sh/scarf-gateway
> > > [7] https://about.scarf.sh/privacy-policy
> > > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics