My 2 cents: it must be possible to opt out, and preferably it should also be possible to deploy Airflow instances without bundling the telemetry library dependencies at all. Other than that, I don't mind it being, e.g., an optional provider.
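To make those two requirements concrete, here is a rough sketch of what I have in mind. It is only an illustration under stated assumptions: the environment variable `EXAMPLE_TELEMETRY_ENABLED`, the package `example_telemetry_provider`, and its `send` function are all hypothetical names, not existing Airflow or Scarf APIs.

```python
import importlib.util
import os

# Hypothetical names: neither this env var nor this package exists in Airflow.
TELEMETRY_ENV_VAR = "EXAMPLE_TELEMETRY_ENABLED"
TELEMETRY_PACKAGE = "example_telemetry_provider"

_FALSEY = {"0", "false", "no", "off"}


def telemetry_enabled(environ=None):
    """Return True only if the user has not opted out and the optional package is installed."""
    environ = os.environ if environ is None else environ
    # Opt-out model: telemetry is on unless it is explicitly switched off.
    if environ.get(TELEMETRY_ENV_VAR, "true").strip().lower() in _FALSEY:
        return False
    # If the optional dependency was not bundled, telemetry is simply unavailable.
    return importlib.util.find_spec(TELEMETRY_PACKAGE) is not None


def report_usage(payload, environ=None):
    """Send the payload if telemetry is active; return whether anything was sent."""
    if not telemetry_enabled(environ):
        return False
    module = importlib.import_module(TELEMETRY_PACKAGE)
    module.send(payload)  # hypothetical entry point of the optional package
    return True
```

With this shape, a deployment that never installs the optional package sends nothing, and one that does install it can still switch it off with a single setting.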
On Wed, 3 Apr 2024 at 22:42, Hussein Awala <huss...@awala.fr> wrote:

> > I'd like to propose, that we start with collecting simple data with
> > limited access: to all the PMC members. We can always expand it to
> > Committers and then expand further to make it invite-only or setup
> > exporting it to a DB like Postgres
> > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> > publicly viewable dashboard.
>
> Looks like a good plan; we can discuss the export format when we decide to
> do it.
>
> On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Yup, exactly.
> >
> >> I believe this would definitely help us take early and informed
> >> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >
> > Btw clarifying my own stance on the below; and let me know what you
> > think @Hussein Awala <huss...@awala.fr> : I'd like to propose, that we
> > start with collecting simple data with limited access: to all the PMC
> > members. We can always expand it to Committers and then expand further
> > to make it invite-only or setup exporting it to a DB like Postgres
> > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> > publicly viewable dashboard. It would be similar to an iterative software
> > development approach, since this will be the first time for us, as Airflow
> > PMC, to add such telemetry. This is of course just my opinion though :)
> >
> >> Regarding the data, like I had mentioned in the email and I am glad others
> >> including you are on the same page that the data will be shared with all
> >> PMC members.
> >> The point about sharing it via website and newsletter was for
> >> the community - Airflow users. I don’t think anyone in the community (apart
> >> from the PMC members) would need raw data. And even if they need it, I’d
> >> say they should put effort and contribute to the Airflow project and become
> >> PMC members.
> >>
> >> To be clear: this telemetry data should help us, as Airflow PMC, to steer
> >> some of the decision making based on this data similar to how only PMC has
> >> a binding vote on the releases. [1] and this is similar to how Apache
> >> Superset does it too.
> >>
> >> [1]
> >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >
> > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io.invalid>
> > wrote:
> >
> >> +1 to introduce this.
> >>
> >> I believe this would definitely help us take early and informed decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >> Best regards,
> >>
> >> *Pankaj Koti*
> >> Senior Software Engineer (Airflow OSS Engineering team)
> >> Location: Pune, Maharashtra, India
> >> Timezone: Indian Standard Time (IST)
> >> Phone: +91 9730079985
> >>
> >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >>
> >> > Yup, I had added a link to scarf docs in the original email that referenced
> >> > opting out and we should even add an Airflow config that puts all config in
> >> > a single place. Without it we can’t be compliant to all the policies even
> >> > if we collectively ignore or are unaware of the importance of it.
> >> > > >> > Regarding the data, like I had mentioned in the email and I am glad > >> others > >> > including you are on the same page that the data will be shared with > all > >> > PMC members. The point about sharing it via website and newsletter was > >> for > >> > the community — Airflow users. I don’t think anyone in the community > >> (apart > >> > from the PMC members) would need raw data. And even if they need it, > I’d > >> > say they should put effort and contribute to the Airflow project and > >> become > >> > PMC members. > >> > > >> > To be clear: this telemetry data should help us, as Airflow PMC, to > >> steer > >> > some of the decision making based on this data similar to how only PMC > >> has > >> > a binding vote on the releases. [1] and this is similar to how Apache > >> > Superset does it too. > >> > > >> > [1] > >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc > >> > > >> > > >> > > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote: > >> > > >> > > I mentioned opting out just to confirm its importance, and after > >> checking > >> > > the Scarf documentation it appears to be supported natively by > Scarf. > >> For > >> > > data accessibility, my point was more about raw data, not just > >> aggregated > >> > > information/insights shared via monthly newsletters, as we do for > >> Airflow > >> > > annual Survey for example: > >> > > https://airflow.apache.org/survey vs > >> > > > >> > > > >> > > >> > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics > >> > > . > >> > > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com> > >> wrote: > >> > > > >> > > > Agreed to both your points Hussein but both the points are already > >> > > covered > >> > > > in my original discussion post - both about opting out and > providing > >> > data > >> > > > to all the PMC members and providing visibility via Monthly > >> > newsletters. 
> >> > > Is > >> > > > there anything else you propose to discuss that isn’t covered? > >> > > > > >> > > > > >> > > > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr> > >> wrote: > >> > > > > >> > > > > +1 for the idea in general, but there are two main points to > >> discuss > >> > > > before > >> > > > > voting on this: > >> > > > > > >> > > > > 1. We should provide an option to disable Scarf: > >> > > > > As Airflow is not a paid product, we cannot force companies to > >> report > >> > > > their > >> > > > > use of this project. Otherwise, some may choose to create their > >> own > >> > > fork > >> > > > > just to disable Scarf. > >> > > > > > >> > > > > 2. Concerning the exclusivity of access to data: > >> > > > > The data collected must either be completely proprietary for use > >> by > >> > PMC > >> > > > and > >> > > > > ASF, or completely open. Since many companies offer Airflow as a > >> > > product, > >> > > > > it is imperative not to give one company more privileges than > >> > others. I > >> > > > > raise this point for the principle of equality of opportunity. > >> > > > > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia < > >> sunank...@gmail.com > >> > > > >> > > > > wrote: > >> > > > > > >> > > > > > Big +1 for Scarf. > >> > > > > > > >> > > > > > Transparency is key, so it's important to be super clear about > >> > opting > >> > > > > > out and what's tracked to avoid spooking anyone about IP > stuff. > >> > > > > > > >> > > > > > Regards > >> > > > > > Ankit Chaurasia > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai < > >> > > amoghdesai....@gmail.com> > >> > > > > > wrote: > >> > > > > > > > >> > > > > > > +1 looks like a good tool which could be super helpful. 
> >> > > > > > > > >> > > > > > > * We should have some transparency into the data that is > >> > collected > >> > > or > >> > > > > > sent > >> > > > > > > * We should have an option to optionally opt-out > >> > > > > > > > >> > > > > > > Thanks & Regards, > >> > > > > > > Amogh Desai > >> > > > > > > > >> > > > > > > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee < > weilee...@gmail.com> > >> > > wrote: > >> > > > > > > > >> > > > > > > > +1 to this. It would be really useful. As long as we can > opt > >> > > out, I > >> > > > > > think > >> > > > > > > > we’re good. > >> > > > > > > > > >> > > > > > > > Best, > >> > > > > > > > Wei > >> > > > > > > > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik < > >> > kaxiln...@gmail.com> > >> > > > > > wrote: > >> > > > > > > > > > >> > > > > > > > > Grammar Correction: > >> > > > > > > > > > >> > > > > > > > > We should assume that those who deploy and upgrade > >> Airflow - > >> > > > > actually > >> > > > > > > > read > >> > > > > > > > >> and take into account what is written in the release > >> notes - > >> > > > > > especially > >> > > > > > > > if > >> > > > > > > > >> they have security guys breathing their necks, > similarly > >> as > >> > we > >> > > > > have > >> > > > > > to > >> > > > > > > > >> assume they follow CVE announcements about security > >> issues > >> > > > fixed. > >> > > > > > If we > >> > > > > > > > >> are very straightforward and out-going about the > change, > >> > > inform > >> > > > > very > >> > > > > > > > >> clearly how to opt-out, I don't see a big problem with > >> > > opt-out. 
> >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > I couldn't agree more; even though we shouldn't collect > >> any > >> > > data > >> > > > > that > >> > > > > > > > > hamper security (and we should aim to do the same), most > >> > > security > >> > > > > > > > concerned > >> > > > > > > > > folks don't just upgrade, and we can rely on them > >> regarding > >> > > > release > >> > > > > > notes > >> > > > > > > > > or announcements and we can make it very clear in our > >> > > > announcements > >> > > > > > too; > >> > > > > > > > > and in our installation guides. > >> > > > > > > > > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik < > >> > kaxiln...@gmail.com> > >> > > > > > wrote: > >> > > > > > > > > > >> > > > > > > > >> Grammar crrection: > >> > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik < > >> > kaxiln...@gmail.com > >> > > > > >> > > > > > wrote: > >> > > > > > > > >> > >> > > > > > > > >>> Have this at the end of the email too: but if folks > >> don't > >> > > read > >> > > > > > until > >> > > > > > > > the > >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]: > >> > > > > > > > >>> > >> > > > > > > > >>> "I think people often ask ‘how do I contribute to open > >> > > > source?’, > >> > > > > > ‘I've > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an > >> > engineer.’ > >> > > > > > Actually, > >> > > > > > > > the > >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my > >> > > > organization > >> > > > > > gets > >> > > > > > > > real > >> > > > > > > > >>> value from this piece of software.’ There are a bunch > of > >> > ways > >> > > > to > >> > > > > > let > >> > > > > > > > the > >> > > > > > > > >>> people know about it – and now Scarf is there. 
If your > >> > > > > > organization is > >> > > > > > > > >>> getting a lot of value from a piece of open source > >> > software, > >> > > > make > >> > > > > > sure > >> > > > > > > > the > >> > > > > > > > >>> devs know about it." > >> > > > > > > > >>> > >> > > > > > > > >>> What kind of edge cases are you thinking about? I > don't > >> > think > >> > > > it > >> > > > > > makes > >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to > collect > >> > data > >> > > > for > >> > > > > > most > >> > > > > > > > >>> Airflow installations except for those that don't want > >> to > >> > > give > >> > > > > > data, > >> > > > > > > > then > >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as > we > >> > don't > >> > > > > > collect > >> > > > > > > > any > >> > > > > > > > >>> PII data, this is in-compliance as well. > >> > > > > > > > >>> > >> > > > > > > > >>> Imagine someone learning Airflow, if they have to > opt-in > >> > via > >> > > a > >> > > > > > config, > >> > > > > > > > >>> they wouldn't even know or care about it, hence us > >> losing > >> > > most > >> > > > of > >> > > > > > the > >> > > > > > > > data. > >> > > > > > > > >>> I understand why some orgs & individuals may want to > >> > opt-out. > >> > > > > > > > >>> > >> > > > > > > > >>> Scarf Provides tracking pixels (essentially an HTML > >> image > >> > > tag) > >> > > > > > that you > >> > > > > > > > >>> can place in your website or product to track visitors > >> to > >> > > that > >> > > > > > URL. If > >> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't > have > >> > > > approved > >> > > > > > it > >> > > > > > > > at all. 
> >> > > > > > > > >>> > >> > > > > > > > >>> A few key details to note about the pixel: > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> - No PII is tracked… Scarf does not capture/retain > IP > >> > > > > > information… > >> > > > > > > > >>> this information is discarded by the platform upon > >> > > > > > > > processing/aggregating > >> > > > > > > > >>> - Scarf pixels respect the Do Not Track (DNT) > >> settings of > >> > > > > > browsers - > >> > > > > > > > >>> these users will not be tracked whatsoever. > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> All the ASF projects I had listed (whether they use > >> Scarf > >> > > > gateway > >> > > > > > or > >> > > > > > > > >>> Scarf pixel in product) are using opt-out. > >> > > > > > > > >>> > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this > feature > >> > with > >> > > > > > users who > >> > > > > > > > >>>> trust and if it works great - make it public. I think > >> it's > >> > > > wise > >> > > > > to > >> > > > > > > > handle > >> > > > > > > > >>>> edge cases and configure collected data more > >> accurately. > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> It would be a pixel in the webserver, should affect > >> nothing > >> > > at > >> > > > > all > >> > > > > > even > >> > > > > > > > >>> in an air-gapped environment. > >> > > > > > > > >>> > >> > > > > > > > >>>> 2. It should not affect anything if access to the > >> internet > >> > > is > >> > > > > > > > restricted > >> > > > > > > > >>>> which is default for many companies. 
> >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> 100% agreed on the below: > >> > > > > > > > >>> > >> > > > > > > > >>>> I think we have a very good blueprint to follow > >> including > >> > at > >> > > > > > least 5 > >> > > > > > > > >>>> other > >> > > > > > > > >>>> ASF projects that also passed the review of the > >> > privacy@asf. > >> > > > > And > >> > > > > > > > while I > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by > default > >> > > coming > >> > > > > from > >> > > > > > > > >>>> consumer > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not > a > >> > > > consumer > >> > > > > > > > >>>> software and is used in "corporate environment" which > >> has > >> > a > >> > > > > little > >> > > > > > > > >>>> different expectations and broad assumption that the > >> > company > >> > > > can > >> > > > > > make > >> > > > > > > > >>>> decisions on such telemetry on behalf of the > employees > >> > using > >> > > > it. > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> Couldn't agree more; even though there shouldn't we > >> collect > >> > > > > hamper > >> > > > > > > > >>> security (and we should aim to do the same), most > >> security > >> > > > > > concerned > >> > > > > > > > folks > >> > > > > > > > >>> don't just > >> > > > > > > > >>> upgrade, and we can rely on them regarding release > >> notes or > >> > > > > > > > announcements > >> > > > > > > > >>> and we can make it very clear in our announcements > too; > >> and > >> > > in > >> > > > > our > >> > > > > > > > >>> installation guides. 
> >> > > > > > > > >>> > >> > > > > > > > >>> We should assume that those who deploy and upgrade > >> Airflow > >> > - > >> > > > > > actually > >> > > > > > > > read > >> > > > > > > > >>>> and take into account what is written in the release > >> > notes - > >> > > > > > > > especially > >> > > > > > > > >>>> if > >> > > > > > > > >>>> they have security guys breathing their necks, > >> similarly > >> > as > >> > > we > >> > > > > > have to > >> > > > > > > > >>>> assume they follow CVE announcements about security > >> issues > >> > > > > fixed. > >> > > > > > If > >> > > > > > > > we > >> > > > > > > > >>>> are very straightforward and out-going about the > >> change, > >> > > > inform > >> > > > > > very > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem > with > >> > > > opt-out. > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> To be clear, the collection of data, or at least the > >> data > >> > we > >> > > > > should > >> > > > > > > > >>> gather here should help all the consumers without > >> violating > >> > > > > > anything > >> > > > > > > > >>> regulations. I will quote Maxime's quote in the > use-case > >> > doc > >> > > > [1] > >> > > > > > > > >>> > >> > > > > > > > >>> "*Another Form of Contributing* > >> > > > > > > > >>> “I think people often ask ‘how do I contribute to open > >> > > > source?’, > >> > > > > > ‘I've > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an > >> > engineer.’ > >> > > > > > Actually, > >> > > > > > > > the > >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my > >> > > > organization > >> > > > > > gets > >> > > > > > > > real > >> > > > > > > > >>> value from this piece of software.’ There are a bunch > of > >> > ways > >> > > > to > >> > > > > > let > >> > > > > > > > the > >> > > > > > > > >>> people know about it – and now Scarf is there. 
If your > >> > > > > > organization is > >> > > > > > > > >>> getting a lot of value from a piece of open source > >> > software, > >> > > > make > >> > > > > > sure > >> > > > > > > > the > >> > > > > > > > >>> devs know about it.”" > >> > > > > > > > >>> > >> > > > > > > > >>> > >> > > > > > > > >>> [1] > >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset > >> > > > > > > > >>> > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin < > >> > > > > kxe...@apache.org> > >> > > > > > > > wrote: > >> > > > > > > > >>> > >> > > > > > > > >>>> Hi Jarek! > >> > > > > > > > >>>> > >> > > > > > > > >>>> I understand the reasons for opt-out from a project > >> view. > >> > I > >> > > > just > >> > > > > > > > suddenly > >> > > > > > > > >>>> imagined the situation when an upgrade happens and > here > >> > > comes > >> > > > > the > >> > > > > > > > data to > >> > > > > > > > >>>> some third party service - that's a view from a user > >> side > >> > of > >> > > > > some > >> > > > > > big > >> > > > > > > > >>>> company. > >> > > > > > > > >>>> > >> > > > > > > > >>>> There could be good alternatives to handle this: > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this > >> feature > >> > > with > >> > > > > > users > >> > > > > > > > who > >> > > > > > > > >>>> trust and if it works great - make it public. I think > >> it's > >> > > > wise > >> > > > > to > >> > > > > > > > handle > >> > > > > > > > >>>> edge cases and configure collected data more > >> accurately. > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make > >> this > >> > > > > > feature not > >> > > > > > > > >>>> get > >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration. 
> >> > > > > > > > >>>> > >> > > > > > > > >>>> Just a personal thoughts for discussion (: > >> > > > > > > > >>>> > >> > > > > > > > >>>> -- > >> > > > > > > > >>>> ,,,^..^,,, > >> > > > > > > > >>>> > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk < > >> > > > ja...@potiuk.com> > >> > > > > > > > wrote: > >> > > > > > > > >>>> > >> > > > > > > > >>>>> Hello everyone, > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> it has to be: > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys > >> about > >> > new > >> > > > > > unplanned > >> > > > > > > > >>>>>> activity after regular upgrade. > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> That's a very good point about security triggering > >> > > Alexander, > >> > > > > > but I > >> > > > > > > > am > >> > > > > > > > >>>> not > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There > >> are > >> > > other > >> > > > > > ways of > >> > > > > > > > >>>>> communicating with the "deployment managers" who > >> install > >> > > and > >> > > > > > upgrade > >> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social media of > >> > ours, > >> > > > > slack > >> > > > > > > > >>>>> announcements etc. We have plenty of channels we can > >> use > >> > to > >> > > > > > > > >>>> communicate the > >> > > > > > > > >>>>> change. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> I think we have a very good blueprint to follow > >> including > >> > > at > >> > > > > > least 5 > >> > > > > > > > >>>> other > >> > > > > > > > >>>>> ASF projects that also passed the review of the > >> > > privacy@asf. 
> >> > > > > And > >> > > > > > > > >>>> while I > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by > default > >> > > coming > >> > > > > > from > >> > > > > > > > >>>> consumer > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is > not a > >> > > > consumer > >> > > > > > > > >>>>> software and is used in "corporate environment" > which > >> > has a > >> > > > > > little > >> > > > > > > > >>>>> different expectations and broad assumption that the > >> > > company > >> > > > > can > >> > > > > > make > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the > employees > >> > > using > >> > > > > it. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> We should assume that those who deploy and upgrade > >> > Airflow > >> > > - > >> > > > > > actually > >> > > > > > > > >>>> read > >> > > > > > > > >>>>> and take into account what is written in the release > >> > notes > >> > > - > >> > > > > > > > >>>> especially if > >> > > > > > > > >>>>> they have security guys breathing their necks, > >> similarly > >> > as > >> > > > we > >> > > > > > have > >> > > > > > > > to > >> > > > > > > > >>>>> assume they follow CVE announcements about security > >> > issues > >> > > > > > fixed. If > >> > > > > > > > we > >> > > > > > > > >>>>> are very straightforward and out-going about the > >> change, > >> > > > inform > >> > > > > > very > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem > with > >> > > > opt-out. 
> >> > > > > > > > >>>>> > >> > > > > > > > >>>>> We should of course check with privacy@a.o (but I'v > >> > spend > >> > > a > >> > > > > good > >> > > > > > > > deal > >> > > > > > > > >>>> of > >> > > > > > > > >>>>> time reading the Superset and other use case and > >> > > explanation > >> > > > > in > >> > > > > > > > >>>> detail to > >> > > > > > > > >>>>> make a better informed decision) - and it looks like > >> they > >> > > > also > >> > > > > > went > >> > > > > > > > >>>> opt-out > >> > > > > > > > >>>>> way and got cleared by privacy@a.o. And if we > cannot > >> > > reach > >> > > > > > > > >>>> consensus, we > >> > > > > > > > >>>>> should - as usual - make a voting decision on it > >> (because > >> > > > yes, > >> > > > > > it is > >> > > > > > > > an > >> > > > > > > > >>>>> important decision), but - after reading and > >> > understanding > >> > > > why > >> > > > > > others > >> > > > > > > > >>>> also > >> > > > > > > > >>>>> did it - for me personally, opt-out is a good path. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> Also because it will rather increase the amount of > >> data > >> > to > >> > > > > > gather, > >> > > > > > > > and > >> > > > > > > > >>>> in > >> > > > > > > > >>>>> our case - counter intuitively - it will be even > >> better > >> > for > >> > > > > > privacy > >> > > > > > > > and > >> > > > > > > > >>>>> corporate anonymity, because the more data we get, > the > >> > more > >> > > > > > difficult > >> > > > > > > > >>>> it > >> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated > >> insight > >> > > > from > >> > > > > > it. 
> >> > > > > > > > >>>> Imagine > >> > > > > > > > >>>>> if only a few corporate users will enable it > >> consciously > >> > - > >> > > > then > >> > > > > > we > >> > > > > > > > >>>> will be > >> > > > > > > > >>>>> able to draw much more conclusions if we find out > who > >> > they > >> > > > are, > >> > > > > > than > >> > > > > > > > if > >> > > > > > > > >>>>> everyone has it enabled by default. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> That's my take on it - but again, it's up to us to > >> vote, > >> > > for > >> > > > me > >> > > > > > > > opt-in > >> > > > > > > > >>>> is > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>> J. > >> > > > > > > > >>>>> > >> > > > > > > > >>>>>> Hi all, > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow > >> > > > > > installations. > >> > > > > > > > >>>> As the > >> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on > >> the > >> > > > yearly > >> > > > > > > > >>>> Airflow > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions > >> > about > >> > > > > > Airflow > >> > > > > > > > >>>> usage. > >> > > > > > > > >>>>>>> Questions like the following: > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> - Which versions of Airflow are people > >> > installing/using > >> > > > now > >> > > > > > > > >>>> (i.e. > >> > > > > > > > >>>>>>> whether people have primarily made the jump from > >> > > version > >> > > > X > >> > > > > to > >> > > > > > > > >>>>> version > >> > > > > > > > >>>>>> Y) > >> > > > > > > > >>>>>>> - Which DB is used as the Metadata DB and which > >> > version > >> > > > e.g > >> > > > > > Pg > >> > > > > > > > >>>> 14? > >> > > > > > > > >>>>>>> - What Python version is being used? > >> > > > > > > > >>>>>>> - Which Executor is being used? 
> >> > > > > > > > >>>>>>> - Approximately how many people out there in the > >> > world > >> > > > are > >> > > > > > > > >>>>> installing > >> > > > > > > > >>>>>>> Airflow > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> There is a solution that should help answer these > >> > > > questions: > >> > > > > > Scarf > >> > > > > > > > >>>> [1]. > >> > > > > > > > >>>>>> The > >> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already > >> used > >> > by > >> > > > > other > >> > > > > > ASF > >> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], > Dubbo > >> > > > > > Kubernetes, > >> > > > > > > > >>>>> DevLake, > >> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other > regulations. > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as > >> follows: > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> 1. Install the `scarf js` npm package and bundle > >> it > >> > in > >> > > > the > >> > > > > > > > >>>>> Webserver. > >> > > > > > > > >>>>>>> When the package is downloaded & Airflow > >> webserver is > >> > > > > opened, > >> > > > > > > > >>>>> metadata > >> > > > > > > > >>>>>>> is > >> > > > > > > > >>>>>>> recorded to the Scarf dashboard. > >> > > > > > > > >>>>>>> 2. Utilize the Scarf Gateway [6], which we can > >> use in > >> > > > front > >> > > > > > of > >> > > > > > > > >>>>> docker > >> > > > > > > > >>>>>>> containers. While it’s possible people go around > >> this > >> > > > > > gateway, > >> > > > > > > > >>>> we > >> > > > > > > > >>>>> can > >> > > > > > > > >>>>>>> probably configure and encourage most traffic to > >> go > >> > > > through > >> > > > > > > > >>>> these > >> > > > > > > > >>>>>>> gateways. 
> >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> While Scarf does not store any personally > >> identifying > >> > > > > > information > >> > > > > > > > >>>> from > >> > > > > > > > >>>>>> SDK > >> > > > > > > > >>>>>>> telemetry data, it does send various bits of > >> IP-derived > >> > > > > > > > >>>> information as > >> > > > > > > > >>>>>>> outlined here [7]. This data should be made as > >> > > transparent > >> > > > as > >> > > > > > > > >>>> possible > >> > > > > > > > >>>>> by > >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and > any > >> > > other > >> > > > > > relevant > >> > > > > > > > >>>>> means > >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town > >> Hall, > >> > > > Slack, > >> > > > > > > > >>>> Newsletter > >> > > > > > > > >>>>>>> etc). > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> The following case studies are worth reading: > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> 1. > >> > > > > > https://about.scarf.sh/post/scarf-case-study-apache-superset > >> > > > > > > > >>>>> (From > >> > > > > > > > >>>>>>> Maxime) > >> > > > > > > > >>>>>>> 2. > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>> > >> > > > > > > > >>>> > >> > > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> Similar to them, this could help in various ways > >> that > >> > > come > >> > > > > with > >> > > > > > > > >>>> using > >> > > > > > > > >>>>>> data > >> > > > > > > > >>>>>>> for decision-making. 
With clear guidelines on "how > >> to > >> > > > > opt-out" > >> > > > > > > > >>>>>> [8][9][10] & > >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow > >> website, > >> > > this > >> > > > > > can be > >> > > > > > > > >>>>>>> beneficial to the entire community as we would be > >> > making > >> > > > more > >> > > > > > > > >>>> informed > >> > > > > > > > >>>>>>> decisions. > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> Regards, > >> > > > > > > > >>>>>>> Kaxil > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/ > >> > > > > > > > >>>>>>> [2] > >> > > > > > > https://privacy.apache.org/policies/privacy-policy-public.html > >> > > > > > > > >>>>>>> [3] > https://privacy.apache.org/faq/committers.html > >> > > > > > > > >>>>>>> [4] > https://github.com/apache/superset/issues/25639 > >> > > > > > > > >>>>>>> [5] > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>> > >> > > > > > > > >>>> > >> > > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy > >> > > > > > > > >>>>>>> [8] > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>> > >> > > > > > > > >>>> > >> > > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data > >> > > > > > > > >>>>>>> [9] > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>>> > >> > > > > > > > >>>>>> > >> > > > > > > > >>>>> > >> > > > > > > > >>>> > >> > > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > 
https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >> > > > > > > > >>>>>>> [10]
> >> > > > > > > > >>>>>>> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >> > > > > > > >
> >> > > > > > > > ---------------------------------------------------------------------
> >> > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> > > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org