> I'd like to propose that we start by collecting simple data with limited access: to all the PMC members. We can always expand it to Committers and then expand further to make it invite-only, or set up exporting it to a DB like Postgres <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly viewable dashboard.
Looks like a good plan; we can discuss the export format when we decide to do it.

On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yup, exactly.
>
>> I believe this would definitely help us take early and informed decisions.
>> E.g. Had we had this earlier, I believe it would have definitely helped us
>> more for our past discussions like whether we should continue supporting
>> MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> similarly about the DaskExecutor
>> (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>
> Btw clarifying my own stance on the below; and let me know what you think
> @Hussein Awala <huss...@awala.fr>: I'd like to propose that we start by
> collecting simple data with limited access: to all the PMC members. We can
> always expand it to Committers and then expand further to make it
> invite-only, or set up exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard. It would be similar to an iterative software
> development approach, since this will be the first time for us, as Airflow
> PMC, to add such telemetry. This is of course just my opinion though :)
>
>> Regarding the data: as I mentioned in the email (and I am glad others,
>> including you, are on the same page), the data will be shared with all
>> PMC members. The point about sharing it via website and newsletter was for
>> the community - Airflow users. I don’t think anyone in the community (apart
>> from the PMC members) would need raw data. And even if they need it, I’d
>> say they should put in the effort, contribute to the Airflow project and
>> become PMC members.
>>
>> To be clear: this telemetry data should help us, as Airflow PMC, to steer
>> some of the decision making based on this data, similar to how only the PMC
>> has a binding vote on the releases [1]; this is similar to how Apache
>> Superset does it too.
>> [1] >> https://www.apache.org/dev/pmc.html#what-is-a-pmc > > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io.invalid> > wrote: > >> +1 to introduce this. >> >> I believe this would definitely help us take early and informed decisions. >> E.g. Had we had this earlier, I believe it would have definitely helped us >> more for our past discussions like whether we should continue supporting >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4), >> similarly about the DaskExecutor ( >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc. >> >> >> Best regards, >> >> *Pankaj Koti* >> Senior Software Engineer (Airflow OSS Engineering team) >> Location: Pune, Maharashtra, India >> Timezone: Indian Standard Time (IST) >> Phone: +91 9730079985 >> >> >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote: >> >> > Yup, I had added a link to scarf docs in the original email that >> referenced >> > opting out and we should even add an Airflow config that puts all >> config in >> > a single place. Without it we can’t be compliant to all the policies >> even >> > if we collectively ignore or are unaware of the importance of it. >> > >> > Regarding the data, like I had mentioned in the email and I am glad >> others >> > including you are on the same page that the data will be shared with all >> > PMC members. The point about sharing it via website and newsletter was >> for >> > the community — Airflow users. I don’t think anyone in the community >> (apart >> > from the PMC members) would need raw data. And even if they need it, I’d >> > say they should put effort and contribute to the Airflow project and >> become >> > PMC members. >> > >> > To be clear: this telemetry data should help us, as Airflow PMC, to >> steer >> > some of the decision making based on this data similar to how only PMC >> has >> > a binding vote on the releases. [1] and this is similar to how Apache >> > Superset does it too. 
>> > >> > [1] >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc >> > >> > >> > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote: >> > >> > > I mentioned opting out just to confirm its importance, and after >> checking >> > > the Scarf documentation it appears to be supported natively by Scarf. >> For >> > > data accessibility, my point was more about raw data, not just >> aggregated >> > > information/insights shared via monthly newsletters, as we do for >> Airflow >> > > annual Survey for example: >> > > https://airflow.apache.org/survey vs >> > > >> > > >> > >> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics >> > > . >> > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com> >> wrote: >> > > >> > > > Agreed to both your points Hussein but both the points are already >> > > covered >> > > > in my original discussion post - both about opting out and providing >> > data >> > > > to all the PMC members and providing visibility via Monthly >> > newsletters. >> > > Is >> > > > there anything else you propose to discuss that isn’t covered? >> > > > >> > > > >> > > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr> >> wrote: >> > > > >> > > > > +1 for the idea in general, but there are two main points to >> discuss >> > > > before >> > > > > voting on this: >> > > > > >> > > > > 1. We should provide an option to disable Scarf: >> > > > > As Airflow is not a paid product, we cannot force companies to >> report >> > > > their >> > > > > use of this project. Otherwise, some may choose to create their >> own >> > > fork >> > > > > just to disable Scarf. >> > > > > >> > > > > 2. Concerning the exclusivity of access to data: >> > > > > The data collected must either be completely proprietary for use >> by >> > PMC >> > > > and >> > > > > ASF, or completely open. 
Since many companies offer Airflow as a >> > > product, >> > > > > it is imperative not to give one company more privileges than >> > others. I >> > > > > raise this point for the principle of equality of opportunity. >> > > > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia < >> sunank...@gmail.com >> > > >> > > > > wrote: >> > > > > >> > > > > > Big +1 for Scarf. >> > > > > > >> > > > > > Transparency is key, so it's important to be super clear about >> > opting >> > > > > > out and what's tracked to avoid spooking anyone about IP stuff. >> > > > > > >> > > > > > Regards >> > > > > > Ankit Chaurasia >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai < >> > > amoghdesai....@gmail.com> >> > > > > > wrote: >> > > > > > > >> > > > > > > +1 looks like a good tool which could be super helpful. >> > > > > > > >> > > > > > > * We should have some transparency into the data that is >> > collected >> > > or >> > > > > > sent >> > > > > > > * We should have an option to optionally opt-out >> > > > > > > >> > > > > > > Thanks & Regards, >> > > > > > > Amogh Desai >> > > > > > > >> > > > > > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee...@gmail.com> >> > > wrote: >> > > > > > > >> > > > > > > > +1 to this. It would be really useful. As long as we can opt >> > > out, I >> > > > > > think >> > > > > > > > we’re good. 
>> > > > > > > > >> > > > > > > > Best, >> > > > > > > > Wei >> > > > > > > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik < >> > kaxiln...@gmail.com> >> > > > > > wrote: >> > > > > > > > > >> > > > > > > > > Grammar Correction: >> > > > > > > > > >> > > > > > > > > We should assume that those who deploy and upgrade >> Airflow - >> > > > > actually >> > > > > > > > read >> > > > > > > > >> and take into account what is written in the release >> notes - >> > > > > > especially >> > > > > > > > if >> > > > > > > > >> they have security guys breathing their necks, similarly >> as >> > we >> > > > > have >> > > > > > to >> > > > > > > > >> assume they follow CVE announcements about security >> issues >> > > > fixed. >> > > > > > If we >> > > > > > > > >> are very straightforward and out-going about the change, >> > > inform >> > > > > very >> > > > > > > > >> clearly how to opt-out, I don't see a big problem with >> > > opt-out. >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > I couldn't agree more; even though we shouldn't collect >> any >> > > data >> > > > > that >> > > > > > > > > hamper security (and we should aim to do the same), most >> > > security >> > > > > > > > concerned >> > > > > > > > > folks don't just upgrade, and we can rely on them >> regarding >> > > > release >> > > > > > notes >> > > > > > > > > or announcements and we can make it very clear in our >> > > > announcements >> > > > > > too; >> > > > > > > > > and in our installation guides. 
>> > > > > > > > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik < >> > kaxiln...@gmail.com> >> > > > > > wrote: >> > > > > > > > > >> > > > > > > > >> Grammar crrection: >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik < >> > kaxiln...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > > >> >> > > > > > > > >>> Have this at the end of the email too: but if folks >> don't >> > > read >> > > > > > until >> > > > > > > > the >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]: >> > > > > > > > >>> >> > > > > > > > >>> "I think people often ask ‘how do I contribute to open >> > > > source?’, >> > > > > > ‘I've >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an >> > engineer.’ >> > > > > > Actually, >> > > > > > > > the >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my >> > > > organization >> > > > > > gets >> > > > > > > > real >> > > > > > > > >>> value from this piece of software.’ There are a bunch of >> > ways >> > > > to >> > > > > > let >> > > > > > > > the >> > > > > > > > >>> people know about it – and now Scarf is there. If your >> > > > > > organization is >> > > > > > > > >>> getting a lot of value from a piece of open source >> > software, >> > > > make >> > > > > > sure >> > > > > > > > the >> > > > > > > > >>> devs know about it." >> > > > > > > > >>> >> > > > > > > > >>> What kind of edge cases are you thinking about? I don't >> > think >> > > > it >> > > > > > makes >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to collect >> > data >> > > > for >> > > > > > most >> > > > > > > > >>> Airflow installations except for those that don't want >> to >> > > give >> > > > > > data, >> > > > > > > > then >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as we >> > don't >> > > > > > collect >> > > > > > > > any >> > > > > > > > >>> PII data, this is in-compliance as well. 
>> > > > > > > > >>> >> > > > > > > > >>> Imagine someone learning Airflow, if they have to opt-in >> > via >> > > a >> > > > > > config, >> > > > > > > > >>> they wouldn't even know or care about it, hence us >> losing >> > > most >> > > > of >> > > > > > the >> > > > > > > > data. >> > > > > > > > >>> I understand why some orgs & individuals may want to >> > opt-out. >> > > > > > > > >>> >> > > > > > > > >>> Scarf Provides tracking pixels (essentially an HTML >> image >> > > tag) >> > > > > > that you >> > > > > > > > >>> can place in your website or product to track visitors >> to >> > > that >> > > > > > URL. If >> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't have >> > > > approved >> > > > > > it >> > > > > > > > at all. >> > > > > > > > >>> >> > > > > > > > >>> A few key details to note about the pixel: >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> - No PII is tracked… Scarf does not capture/retain IP >> > > > > > information… >> > > > > > > > >>> this information is discarded by the platform upon >> > > > > > > > processing/aggregating >> > > > > > > > >>> - Scarf pixels respect the Do Not Track (DNT) >> settings of >> > > > > > browsers - >> > > > > > > > >>> these users will not be tracked whatsoever. >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> All the ASF projects I had listed (whether they use >> Scarf >> > > > gateway >> > > > > > or >> > > > > > > > >>> Scarf pixel in product) are using opt-out. >> > > > > > > > >>> >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this feature >> > with >> > > > > > users who >> > > > > > > > >>>> trust and if it works great - make it public. I think >> it's >> > > > wise >> > > > > to >> > > > > > > > handle >> > > > > > > > >>>> edge cases and configure collected data more >> accurately. 
>> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> It would be a pixel in the webserver, should affect >> nothing >> > > at >> > > > > all >> > > > > > even >> > > > > > > > >>> in an air-gapped environment. >> > > > > > > > >>> >> > > > > > > > >>>> 2. It should not affect anything if access to the >> internet >> > > is >> > > > > > > > restricted >> > > > > > > > >>>> which is default for many companies. >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> 100% agreed on the below: >> > > > > > > > >>> >> > > > > > > > >>>> I think we have a very good blueprint to follow >> including >> > at >> > > > > > least 5 >> > > > > > > > >>>> other >> > > > > > > > >>>> ASF projects that also passed the review of the >> > privacy@asf. >> > > > > And >> > > > > > > > while I >> > > > > > > > >>>> understand (and concur) the urge for opt-in by default >> > > coming >> > > > > from >> > > > > > > > >>>> consumer >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not a >> > > > consumer >> > > > > > > > >>>> software and is used in "corporate environment" which >> has >> > a >> > > > > little >> > > > > > > > >>>> different expectations and broad assumption that the >> > company >> > > > can >> > > > > > make >> > > > > > > > >>>> decisions on such telemetry on behalf of the employees >> > using >> > > > it. >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> Couldn't agree more; even though there shouldn't we >> collect >> > > > > hamper >> > > > > > > > >>> security (and we should aim to do the same), most >> security >> > > > > > concerned >> > > > > > > > folks >> > > > > > > > >>> don't just >> > > > > > > > >>> upgrade, and we can rely on them regarding release >> notes or >> > > > > > > > announcements >> > > > > > > > >>> and we can make it very clear in our announcements too; >> and >> > > in >> > > > > our >> > > > > > > > >>> installation guides. 
>> > > > > > > > >>> >> > > > > > > > >>> We should assume that those who deploy and upgrade >> Airflow >> > - >> > > > > > actually >> > > > > > > > read >> > > > > > > > >>>> and take into account what is written in the release >> > notes - >> > > > > > > > especially >> > > > > > > > >>>> if >> > > > > > > > >>>> they have security guys breathing their necks, >> similarly >> > as >> > > we >> > > > > > have to >> > > > > > > > >>>> assume they follow CVE announcements about security >> issues >> > > > > fixed. >> > > > > > If >> > > > > > > > we >> > > > > > > > >>>> are very straightforward and out-going about the >> change, >> > > > inform >> > > > > > very >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem with >> > > > opt-out. >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> To be clear, the collection of data, or at least the >> data >> > we >> > > > > should >> > > > > > > > >>> gather here should help all the consumers without >> violating >> > > > > > anything >> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case >> > doc >> > > > [1] >> > > > > > > > >>> >> > > > > > > > >>> "*Another Form of Contributing* >> > > > > > > > >>> “I think people often ask ‘how do I contribute to open >> > > > source?’, >> > > > > > ‘I've >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an >> > engineer.’ >> > > > > > Actually, >> > > > > > > > the >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my >> > > > organization >> > > > > > gets >> > > > > > > > real >> > > > > > > > >>> value from this piece of software.’ There are a bunch of >> > ways >> > > > to >> > > > > > let >> > > > > > > > the >> > > > > > > > >>> people know about it – and now Scarf is there. 
If your >> > > > > > organization is >> > > > > > > > >>> getting a lot of value from a piece of open source >> > software, >> > > > make >> > > > > > sure >> > > > > > > > the >> > > > > > > > >>> devs know about it.”" >> > > > > > > > >>> >> > > > > > > > >>> >> > > > > > > > >>> [1] >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset >> > > > > > > > >>> >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin < >> > > > > kxe...@apache.org> >> > > > > > > > wrote: >> > > > > > > > >>> >> > > > > > > > >>>> Hi Jarek! >> > > > > > > > >>>> >> > > > > > > > >>>> I understand the reasons for opt-out from a project >> view. >> > I >> > > > just >> > > > > > > > suddenly >> > > > > > > > >>>> imagined the situation when an upgrade happens and here >> > > comes >> > > > > the >> > > > > > > > data to >> > > > > > > > >>>> some third party service - that's a view from a user >> side >> > of >> > > > > some >> > > > > > big >> > > > > > > > >>>> company. >> > > > > > > > >>>> >> > > > > > > > >>>> There could be good alternatives to handle this: >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this >> feature >> > > with >> > > > > > users >> > > > > > > > who >> > > > > > > > >>>> trust and if it works great - make it public. I think >> it's >> > > > wise >> > > > > to >> > > > > > > > handle >> > > > > > > > >>>> edge cases and configure collected data more >> accurately. >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make >> this >> > > > > > feature not >> > > > > > > > >>>> get >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration. 
>> > > > > > > > >>>>
>> > > > > > > > >>>> Just some personal thoughts for discussion (:
>> > > > > > > > >>>>
>> > > > > > > > >>>> --
>> > > > > > > > >>>> ,,,^..^,,,
>> > > > > > > > >>>>
>> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> > > > > > > > >>>>
>> > > > > > > > >>>>> Hello everyone,
>> > > > > > > > >>>>>
>> > > > > > > > >>>>>> it has to be:
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>> 1. Opt-in by default to not trigger security guys about new unplanned
>> > > > > > > > >>>>>> activity after regular upgrade.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's a very good point about security triggering, Alexander, but I am
>> > > > > > > > >>>>> not so sure it means that we "have to" do opt-in. There are other ways of
>> > > > > > > > >>>>> communicating with the "deployment managers" who install and upgrade
>> > > > > > > > >>>>> Airflow - e.g. release notes, blogs, our social media, Slack
>> > > > > > > > >>>>> announcements etc. We have plenty of channels we can use to communicate
>> > > > > > > > >>>>> the change.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> I think we have a very good blueprint to follow, including at least 5
>> > > > > > > > >>>>> other ASF projects that also passed the review of the privacy@asf. And
>> > > > > > > > >>>>> while I understand (and concur with) the urge for opt-in by default coming
>> > > > > > > > >>>>> from the consumer market (where it makes perfect sense), Airflow is not
>> > > > > > > > >>>>> consumer software and is used in a "corporate environment", which has
>> > > > > > > > >>>>> slightly different expectations and a broad assumption that the company
>> > > > > > > > >>>>> can make decisions on such telemetry on behalf of the employees using it.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should assume that those who deploy and upgrade Airflow actually read
>> > > > > > > > >>>>> and take into account what is written in the release notes - especially
>> > > > > > > > >>>>> if they have security guys breathing down their necks - just as we have
>> > > > > > > > >>>>> to assume they follow CVE announcements about fixed security issues. If
>> > > > > > > > >>>>> we are very straightforward and upfront about the change, and explain
>> > > > > > > > >>>>> very clearly how to opt out, I don't see a big problem with opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good deal of
>> > > > > > > > >>>>> time reading the Superset and other use cases and explanations in detail
>> > > > > > > > >>>>> to make a better informed decision) - and it looks like they also went
>> > > > > > > > >>>>> the opt-out way and got cleared by privacy@a.o. And if we cannot reach
>> > > > > > > > >>>>> consensus, we should - as usual - make a voting decision on it (because
>> > > > > > > > >>>>> yes, it is an important decision), but - after reading and understanding
>> > > > > > > > >>>>> why others also did it - for me personally, opt-out is a good path.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> Also because it will increase the amount of data gathered, and in our
>> > > > > > > > >>>>> case - counterintuitively - it will be even better for privacy and
>> > > > > > > > >>>>> corporate anonymity, because the more data we get, the more difficult it
>> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated insight from it.
>> > > > > > > > >>>>> Imagine if only a few corporate users consciously enable it - then we
>> > > > > > > > >>>>> would be able to draw many more conclusions, if we find out who they
>> > > > > > > > >>>>> are, than if everyone has it enabled by default.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's my take on it - but again, it's up to us to vote. For me opt-in is
>> > > > > > > > >>>>> not a "has to", and I am rather for opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> J.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>>> Hi all,
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations.
>> > > > > > > > >>>>>>> As the Airflow community, we have been relying heavily on the yearly
>> > > > > > > > >>>>>>> Airflow Survey and anecdotes to answer a few key questions about Airflow
>> > > > > > > > >>>>>>> usage. Questions like the following:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> - Which versions of Airflow are people installing/using now (i.e.
>> > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to version Y)
>> > > > > > > > >>>>>>> - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
>> > > > > > > > >>>>>>> - What Python version is being used?
>> > > > > > > > >>>>>>> - Which Executor is being used?
>> > > > > > > > >>>>>>> - Approximately how many people out there in the world are installing
>> > > > > > > > >>>>>>>   Airflow
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> There is a solution that should help answer these questions: Scarf [1].
>> > > > > > > > >>>>>>> The ASF has already approved Scarf [2][3], and it is already used by
>> > > > > > > > >>>>>>> other ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo
>> > > > > > > > >>>>>>> Kubernetes, DevLake, Skywalking, as it follows GDPR and other regulations.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> 1. Install the `scarf js` npm package and bundle it in the Webserver.
>> > > > > > > > >>>>>>>    When the package is downloaded & Airflow webserver is opened, metadata
>> > > > > > > > >>>>>>>    is recorded to the Scarf dashboard.
>> > > > > > > > >>>>>>> 2. Utilize the Scarf Gateway [6], which we can use in front of docker
>> > > > > > > > >>>>>>>    containers. While it’s possible people go around this gateway, we can
>> > > > > > > > >>>>>>>    probably configure and encourage most traffic to go through these
>> > > > > > > > >>>>>>>    gateways.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> While Scarf does not store any personally identifying information from
>> > > > > > > > >>>>>>> SDK telemetry data, it does send various bits of IP-derived information
>> > > > > > > > >>>>>>> as outlined here [7]. This data should be made as transparent as possible
>> > > > > > > > >>>>>>> by granting dashboard access to the Airflow PMC and any other relevant
>> > > > > > > > >>>>>>> means of sharing/surfacing it that we encounter (Town Hall, Slack,
>> > > > > > > > >>>>>>> Newsletter etc).
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> The following case studies are worth reading:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> 1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
>> > > > > > > > >>>>>>>    Maxime)
>> > > > > > > > >>>>>>> 2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using
>> > > > > > > > >>>>>>> data for decision-making. With clear guidelines on "how to opt-out"
>> > > > > > > > >>>>>>> [8][9][10] & "what data is being collected" on the Airflow website, this
>> > > > > > > > >>>>>>> can be beneficial to the entire community as we would be making more
>> > > > > > > > >>>>>>> informed decisions.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Regards,
>> > > > > > > > >>>>>>> Kaxil
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
>> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
>> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
>> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
>> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
>> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>> > > > > > > >
>> > > > > > > > ---------------------------------------------------------------------
>> > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org
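To make the shape of the proposal concrete, here is a minimal Python sketch of the opt-out behaviour and of the non-PII fields discussed in this thread (Airflow version, metadata DB and version, Python version, executor). It is an illustration only, not how Scarf or Airflow implement it: the function names, the payload keys and the `AIRFLOW_TELEMETRY_OPT_OUT` and `DO_NOT_TRACK` environment variables are assumptions made up for this example. Nothing is sent anywhere; the sketch only shows what would or would not be collected.

```python
import os
import platform
from typing import Mapping, Optional

# Hypothetical opt-out switches. These names are assumptions for this sketch,
# not settings that Airflow or Scarf are confirmed to read.
OPT_OUT_ENV_VARS = ("AIRFLOW_TELEMETRY_OPT_OUT", "DO_NOT_TRACK")
TRUTHY = {"1", "true", "yes", "on"}


def telemetry_enabled(env: Mapping[str, str]) -> bool:
    """Opt-out semantics: collection is on unless a switch is explicitly set."""
    return not any(
        env.get(name, "").strip().lower() in TRUTHY for name in OPT_OUT_ENV_VARS
    )


def build_payload(
    airflow_version: str,
    db_backend: str,
    db_version: str,
    executor: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[dict]:
    """Return the coarse, non-PII fields to report, or None if opted out.

    Deliberately excluded: hostnames, user names, IP addresses, DAG names.
    """
    env = os.environ if env is None else env
    if not telemetry_enabled(env):
        return None
    # Report only major.minor of the interpreter to keep the data coarse.
    python_minor = ".".join(platform.python_version_tuple()[:2])
    return {
        "airflow_version": airflow_version,
        "python_version": python_minor,
        "metadata_db": db_backend,
        "metadata_db_version": db_version,
        "executor": executor,
    }


if __name__ == "__main__":
    # With no switch set, the payload is built; with a switch set, it is None.
    print(build_payload("2.9.0", "postgres", "14", "CeleryExecutor", env={}))
    print(build_payload("2.9.0", "postgres", "14", "CeleryExecutor",
                        env={"DO_NOT_TRACK": "1"}))
```

Keeping the decision in one small function (`telemetry_enabled`) matches the point made earlier in the thread that all opt-out configuration should live in a single, clearly documented place.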