+1 yeah, this sounds useful to me

--
Regards,
Aritra Basu

On Sat, Mar 30, 2024, 6:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> +1. All that sounds reasonable, there are precedents, ASF supports Scarf
> officially. Would be great to have access to such telemetry data.
>
> On Sat, Mar 30, 2024 at 1:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:
>
> > Hi all,
> >
> > I want to propose gathering telemetry for Airflow installations. As the
> > Airflow community, we have been relying heavily on the yearly Airflow
> > Survey and anecdotes to answer a few key questions about Airflow usage.
> > Questions like the following:
> >
> >    - Which versions of Airflow are people installing/using now (i.e.
> >    whether people have primarily made the jump from version X to
> >    version Y)?
> >    - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
> >    - What Python version is being used?
> >    - Which Executor is being used?
> >    - Approximately how many people out there in the world are
> >    installing Airflow?
> >
> > There is a solution that should help answer these questions: Scarf [1].
> > The ASF already approves Scarf [2][3], and it is already used by other
> > ASF projects (Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
> > DevLake, Skywalking), as it follows GDPR and other regulations.
> >
> > Similar to Superset, we probably can use it as follows:
> >
> >    1. Install the `scarf-js` npm package and bundle it in the Webserver.
> >    When the package is downloaded & the Airflow webserver is opened,
> >    metadata is recorded to the Scarf dashboard.
> >    2. Utilize the Scarf Gateway [6], which we can use in front of docker
> >    containers. While it’s possible people go around this gateway, we can
> >    probably configure and encourage most traffic to go through these
> >    gateways.
> >
> > While Scarf does not store any personally identifying information from
> > SDK telemetry data, it does send various bits of IP-derived information
> > as outlined here [7]. This data should be made as transparent as
> > possible by granting dashboard access to the Airflow PMC and any other
> > relevant means of sharing/surfacing it that we encounter (Town Hall,
> > Slack, Newsletter etc).
> >
> > The following case studies are worth reading:
> >
> >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> >    (From Maxime)
> >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >
> > Similar to them, this could help in various ways that come with using
> > data for decision-making. With clear guidelines on "how to opt-out"
> > [8][9][10] & "what data is being collected" on the Airflow website,
> > this can be beneficial to the entire community as we would be making
> > more informed decisions.
> >
> > Regards,
> > Kaxil
> >
> > [1] https://about.scarf.sh/
> > [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > [3] https://privacy.apache.org/faq/committers.html
> > [4] https://github.com/apache/superset/issues/25639
> > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > [6] https://about.scarf.sh/scarf-gateway
> > [7] https://about.scarf.sh/privacy-policy
> > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics