The webserver is packaged after compiling, so that won't be possible, Michał.
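A minimal sketch of the single opt-out switch discussed in the messages below: Hussein's `SCARF_ANALYTICS=false` environment variable plus Kaxil's suggestion of one Airflow config option that keeps everything in a single place. The function name `telemetry_enabled`, the `config_enabled` parameter and the extra `DO_NOT_TRACK` check are assumptions for illustration only, not Airflow's or Scarf's actual API.

```python
import os
from typing import Mapping, Optional

# Spellings treated as "off" / "on" for boolean-like env vars (assumed for this sketch).
_FALSY = {"0", "false", "no", "off"}
_TRUTHY = {"1", "true", "yes", "on"}


def telemetry_enabled(config_enabled: bool = True,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether usage telemetry may be sent (opt-out model).

    config_enabled -- value of a hypothetical Airflow config option; it
                      defaults to True because the thread argues for opt-out.
    env            -- environment to inspect; defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Any single "off" switch wins; nothing here can re-enable telemetry.
    if not config_enabled:
        return False
    # SCARF_ANALYTICS=false is the variable mentioned in the thread.
    if env.get("SCARF_ANALYTICS", "").strip().lower() in _FALSY:
        return False
    # DO_NOT_TRACK=1 is an assumed extra convention, not confirmed by the thread.
    if env.get("DO_NOT_TRACK", "").strip().lower() in _TRUTHY:
        return False
    return True


if __name__ == "__main__":
    print(telemetry_enabled(env={}))                            # True
    print(telemetry_enabled(env={"SCARF_ANALYTICS": "false"}))  # False
    print(telemetry_enabled(config_enabled=False, env={}))      # False
```

A runtime switch like this only stops data from being sent; it does not address Michał's point about not packaging the dependency at all, which would need a build-time or plugin-based mechanism.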
On Tue, 9 Apr 2024 at 11:02, Michał Modras <michalmod...@google.com> wrote:

If it is packaged and installed by default, we add the dependency (and its dependencies) to Airflow's already-not-small dependency tree. If we make it installed and enabled by default, would there be an easy way not just to switch it off (e.g. through the env variable), but also not to package it at all? That's why I was suggesting a provider, but actually any other pluggable (and unpluggable) mechanism would work.

On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <huss...@awala.fr> wrote:

> Other than that I don't mind it being e.g. optional provider.

I don't think it is possible to implement it in a provider, because it is a JS package installed on the webserver; we could implement it as a plugin (Blueprint), but in that case the user must make an effort to register it.

It would be better to always install it and activate it by default, with the possibility of deactivating it via the environment variable `SCARF_ANALYTICS=false` (according to the documentation). If it is deactivated by default, many users will not activate it even if they don't mind reporting the metrics; if we enable it by default, only users who don't want to send metrics will disable it.

On Fri, Apr 5, 2024 at 6:19 PM Michał Modras <michalmod...@google.com.invalid> wrote:

My 2 cents: it must be possible to opt out, and preferably it should be possible to deploy Airflow instances without bundling the telemetry library dependencies. Other than that I don't mind it being e.g. an optional provider.

On Wed, 3 Apr 2024 at 22:42, Hussein Awala <huss...@awala.fr> wrote:

> I'd like to propose that we start with collecting simple data with limited access: to all the PMC members.
> We can always expand it to Committers and then expand further to make it invite-only or set up exporting it to a DB like Postgres <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly viewable dashboard.

Looks like a good plan; we can discuss the export format when we decide to do it.

On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Yup, exactly.

> I believe this would definitely help us take early and informed decisions. E.g. had we had this earlier, I believe it would have definitely helped us more in our past discussions, like whether we should continue supporting MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4), and similarly about the DaskExecutor (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.

Btw, clarifying my own stance on the below, and let me know what you think, @Hussein Awala <huss...@awala.fr>: I'd like to propose that we start with collecting simple data with limited access: to all the PMC members. We can always expand it to Committers and then expand further to make it invite-only or set up exporting it to a DB like Postgres <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly viewable dashboard. It would be similar to an iterative software development approach, since this will be the first time for us, as Airflow PMC, to add such telemetry. This is of course just my opinion though :)

> Regarding the data: as I had mentioned in the email, and I am glad others including you are on the same page, the data will be shared with all PMC members.
> The point about sharing it via website and newsletter was for the community, i.e. Airflow users. I don't think anyone in the community (apart from the PMC members) would need raw data. And even if they need it, I'd say they should put in the effort, contribute to the Airflow project and become PMC members.
>
> To be clear: this telemetry data should help us, as Airflow PMC, to steer some of the decision making based on this data, similar to how only the PMC has a binding vote on releases [1], and this is similar to how Apache Superset does it too.
>
> [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc

On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io.invalid> wrote:

+1 to introduce this.

I believe this would definitely help us take early and informed decisions. E.g. had we had this earlier, I believe it would have definitely helped us more in our past discussions, like whether we should continue supporting MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4), and similarly about the DaskExecutor (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.

Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)
Phone: +91 9730079985

On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Yup, I had added a link to the Scarf docs in the original email that referenced opting out, and we should even add an Airflow config that puts all the config in a single place. Without it we can't be compliant with all the policies, even if we collectively ignore or are unaware of their importance.

Regarding the data: as I had mentioned in the email, and I am glad others including you are on the same page, the data will be shared with all PMC members. The point about sharing it via website and newsletter was for the community, i.e. Airflow users. I don't think anyone in the community (apart from the PMC members) would need raw data. And even if they need it, I'd say they should put in the effort, contribute to the Airflow project and become PMC members.

To be clear: this telemetry data should help us, as Airflow PMC, to steer some of the decision making based on this data, similar to how only the PMC has a binding vote on releases [1], and this is similar to how Apache Superset does it too.

[1] https://www.apache.org/dev/pmc.html#what-is-a-pmc

On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote:

I mentioned opting out just to confirm its importance, and after checking the Scarf documentation it appears to be supported natively by Scarf. For data accessibility, my point was more about raw data, not just aggregated information/insights shared via monthly newsletters, as we do for the Airflow annual survey, for example: https://airflow.apache.org/survey vs https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics.

On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Agreed on both your points, Hussein, but both are already covered in my original discussion post: opting out, and providing data to all the PMC members with visibility via monthly newsletters. Is there anything else you propose to discuss that isn't covered?

On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr> wrote:

+1 for the idea in general, but there are two main points to discuss before voting on this:

1. We should provide an option to disable Scarf:
As Airflow is not a paid product, we cannot force companies to report their use of this project. Otherwise, some may choose to create their own fork just to disable Scarf.

2. Concerning the exclusivity of access to data:
The data collected must either be completely proprietary for use by the PMC and the ASF, or completely open. Since many companies offer Airflow as a product, it is imperative not to give one company more privileges than others. I raise this point for the principle of equality of opportunity.

On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank...@gmail.com> wrote:

Big +1 for Scarf.

Transparency is key, so it's important to be super clear about opting out and what's tracked, to avoid spooking anyone about IP stuff.

Regards
Ankit Chaurasia

On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

+1, looks like a good tool which could be super helpful.

* We should have some transparency into the data that is collected or sent
* We should have an option to opt out

Thanks & Regards,
Amogh Desai

On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee...@gmail.com> wrote:

+1 to this. It would be really useful. As long as we can opt out, I think we're good.

Best,
Wei

On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxiln...@gmail.com> wrote:

Grammar Correction:

> We should assume that those who deploy and upgrade Airflow actually read and take into account what is written in the release notes, especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed. If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

I couldn't agree more; even though we shouldn't collect any data that hampers security (and we should aim to do the same), most security-concerned folks don't just upgrade, and we can rely on them regarding release notes or announcements, and we can make it very clear in our announcements too, and in our installation guides.

On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxiln...@gmail.com> wrote:

Grammar correction:

On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxiln...@gmail.com> wrote:

Have this at the end of the email too, but in case folks don't read until the end, quoting Maxime from the use-case blog [1]:

"I think people often ask 'how do I contribute to open source?', 'I've got to get into the code', or 'I've got to be an engineer.' Actually, the very simplest thing that you can do is just say, 'my organization gets real value from this piece of software.' There are a bunch of ways to let the people know about it – and now Scarf is there. If your organization is getting a lot of value from a piece of open source software, make sure the devs know about it."

What kind of edge cases are you thinking about? I don't think it makes sense to have "opt-in" at all. As the goal is to collect data for most Airflow installations except for those that don't want to give data, "opt-out" is the only way to maximize it. As long as we don't collect any PII data, this is in compliance as well.

Imagine someone learning Airflow: if they have to opt in via a config, they wouldn't even know or care about it, hence us losing most of the data. I understand why some orgs & individuals may want to opt out.

Scarf provides tracking pixels (essentially an HTML image tag) that you can place in your website or product to track visitors to that URL.
If there were any concerns about privacy, the ASF wouldn't have approved it at all.

A few key details to note about the pixel:

- No PII is tracked… Scarf does not capture/retain IP information… this information is discarded by the platform upon processing/aggregating.
- Scarf pixels respect the Do Not Track (DNT) settings of browsers; these users will not be tracked whatsoever.

All the ASF projects I had listed (whether they use the Scarf gateway or the Scarf pixel in product) are using opt-out.

> 1. Short opt-in period before opt-out. Test this feature with users who trust, and if it works, great: make it public. I think it's wise to handle edge cases and configure collected data more accurately.

It would be a pixel in the webserver; it should affect nothing at all, even in an air-gapped environment.

> 2. It should not affect anything if access to the internet is restricted, which is the default for many companies.

100% agreed on the below:

> I think we have a very good blueprint to follow, including at least 5 other ASF projects that also passed the review of privacy@asf. And while I understand (and concur with) the urge for opt-in by default coming from the consumer market (where it makes perfect sense), Airflow is not consumer software and is used in a "corporate environment", which has slightly different expectations and a broad assumption that the company can make decisions on such telemetry on behalf of the employees using it.
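A minimal sketch of how a webserver page could emit the tracking pixel Kaxil describes (essentially an HTML `<img>` tag), carrying only the coarse, non-PII fields listed in the original proposal: Airflow version, Python version, metadata DB and executor. The base URL, the query-parameter names and `pixel_tag` itself are hypothetical placeholders, not Scarf's or Airflow's actual endpoint or API; DNT handling is left to Scarf, as described above.

```python
import html
import platform
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder: the real pixel URL would come from the Scarf dashboard.
PIXEL_BASE_URL = "https://example.invalid/pixel"


def pixel_tag(airflow_version: str, database: str, executor: str,
              enabled: bool = True,
              python_version: Optional[str] = None) -> str:
    """Return a 1x1 <img> tag reporting coarse, non-PII metadata, or "" if opted out."""
    if not enabled:
        # Opted out: emit no tag at all, so the browser makes no outbound request.
        return ""
    params = {
        "version": airflow_version,
        "python_version": python_version or platform.python_version(),
        "database": database,      # e.g. the metadata DB dialect
        "executor": executor,
    }
    src = f"{PIXEL_BASE_URL}?{urlencode(params)}"
    # Escape for use inside an HTML attribute ('&' becomes '&amp;').
    return (f'<img src="{html.escape(src, quote=True)}" '
            'width="1" height="1" alt="" referrerpolicy="no-referrer">')


if __name__ == "__main__":
    print(pixel_tag("2.9.0", "postgresql", "CeleryExecutor"))
    print(repr(pixel_tag("2.9.0", "postgresql", "CeleryExecutor", enabled=False)))
```

Because an `<img>` request is made by the viewer's browser rather than by the Airflow server, an unreachable pixel URL (for example in an air-gapped network) just fails to load and the page renders normally, which is consistent with the point above. The `enabled` flag would be fed by whatever opt-out switch the project settles on.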

Couldn't agree more; even though there shouldn't we collect hamper security (and we should aim to do the same), most security concerned folks don't just upgrade, and we can rely on them regarding release notes or announcements and we can make it very clear in our announcements too; and in our installation guides.

> We should assume that those who deploy and upgrade Airflow actually read and take into account what is written in the release notes, especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed. If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

To be clear, the collection of data, or at least the data we should gather here, should help all the consumers without violating any regulations. I will quote Maxime's quote in the use-case doc [1]:

"*Another Form of Contributing*
'I think people often ask "how do I contribute to open source?", "I've got to get into the code", or "I've got to be an engineer." Actually, the very simplest thing that you can do is just say, "my organization gets real value from this piece of software." There are a bunch of ways to let the people know about it – and now Scarf is there. If your organization is getting a lot of value from a piece of open source software, make sure the devs know about it.'"

[1] https://about.scarf.sh/post/scarf-case-study-apache-superset

On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxe...@apache.org> wrote:

Hi Jarek!

I understand the reasons for opt-out from a project view. I just suddenly imagined the situation where an upgrade happens and here comes the data to some third-party service; that's a view from the user side of some big company.

There could be good alternatives to handle this:
1. Short opt-in period before opt-out. Test this feature with users who trust, and if it works, great: make it public. I think it's wise to handle edge cases and configure collected data more accurately.
2. Explicitly warn about this feature somehow, so that it doesn't go unnoticed. Just to reduce possible frustration.

Just personal thoughts for discussion (:

--
,,,^..^,,,

On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Hello everyone,

> it has to be:
>
> 1. Opt-in by default to not trigger security guys about new unplanned activity after regular upgrade.

That's a very good point about security triggering, Alexander, but I am not so sure it means that we "have to" do opt-in. There are other ways of communicating with the "deployment managers" who install and upgrade Airflow, i.e. release notes, blogs, our social media, Slack announcements, etc. We have plenty of channels we can use to communicate the change.

I think we have a very good blueprint to follow, including at least 5 other ASF projects that also passed the review of privacy@asf.
And while I understand (and concur with) the urge for opt-in by default coming from the consumer market (where it makes perfect sense), Airflow is not consumer software and is used in a "corporate environment", which has slightly different expectations and a broad assumption that the company can make decisions on such telemetry on behalf of the employees using it.

We should assume that those who deploy and upgrade Airflow actually read and take into account what is written in the release notes, especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed. If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

We should of course check with privacy@a.o (but I've spent a good deal of time reading the Superset and other use cases and explanations in detail to make a better-informed decision), and it looks like they also went the opt-out way and got cleared by privacy@a.o. And if we cannot reach consensus, we should, as usual, make a voting decision on it (because yes, it is an important decision), but after reading and understanding why others also did it, for me personally, opt-out is a good path.

Also because it will rather increase the amount of data we gather, and in our case, counterintuitively, it will be even better for privacy and corporate anonymity, because the more data we get, the more difficult it will be to get any non-statistical/non-aggregated insight from it.
Imagine if only a few corporate users enable it consciously: then we will be able to draw many more conclusions if we find out who they are than if everyone has it enabled by default.

That's my take on it, but again, it's up to us to vote. For me, opt-in is not "has to", and I am rather for opt-out.

J.

> Hi all,
>
> I want to propose gathering telemetry for Airflow installations. As the Airflow community, we have been relying heavily on the yearly Airflow Survey and anecdotes to answer a few key questions about Airflow usage. Questions like the following:
>
> - Which versions of Airflow are people installing/using now (i.e. whether people have primarily made the jump from version X to version Y)
> - Which DB is used as the metadata DB, and which version, e.g. Pg 14?
> - What Python version is being used?
> - Which executor is being used?
> - Approximately how many people out there in the world are installing Airflow
>
> There is a solution that should help answer these questions: Scarf [1]. The ASF has already approved Scarf [2][3], and it is already used by other ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake, Skywalking, as it follows GDPR and other regulations.
>
> Similar to Superset, we can probably use it as follows:
>
> 1. Install the `scarf js` npm package and bundle it in the Webserver. When the package is downloaded & the Airflow webserver is opened, metadata is recorded to the Scarf dashboard.
> 2. Utilize the Scarf Gateway [6], which we can use in front of Docker containers. While it's possible people go around this gateway, we can probably configure and encourage most traffic to go through these gateways.
>
> While Scarf does not store any personally identifying information from SDK telemetry data, it does send various bits of IP-derived information as outlined here [7]. This data should be made as transparent as possible by granting dashboard access to the Airflow PMC and by any other relevant means of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter, etc.).
>
> The following case studies are worth reading:
>
> 1.
>> > > >> > > > > > >> > https://about.scarf.sh/post/scarf-case-study-apache-superset >> > > >> > > > > > > > >>>>> (From >> > > >> > > > > > > > >>>>>>> Maxime) >> > > >> > > > > > > > >>>>>>> 2. >> > > >> > > > > > > > >>>>>>> >> > > >> > > > > > > > >>>>>>> >> > > >> > > > > > > > >>>>>> >> > > >> > > > > > > > >>>>> >> > > >> > > > > > > > >>>> >> > > >> > > > > > > > >> > > >> > > > > > >> > > >> > > > > >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > >> > >> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding >> > > >> > > > > > > > >>>>>>> >> > > >> > > > > > > > >>>>>>> Similar to them, this could help in various >> ways >> > > >> that >> > > >> > > come >> > > >> > > > > with >> > > >> > > > > > > > >>>> using >> > > >> > > > > > > > >>>>>> data >> > > >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on >> > "how >> > > >> to >> > > >> > > > > opt-out" >> > > >> > > > > > > > >>>>>> [8][9][10] & >> > > >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow >> > > >> website, >> > > >> > > this >> > > >> > > > > > can be >> > > >> > > > > > > > >>>>>>> beneficial to the entire community as we >> would >> > be >> > > >> > making >> > > >> > > > more >> > > >> > > > > > > > >>>> informed >> > > >> > > > > > > > >>>>>>> decisions. 
>>
>> Regards,
>> Kaxil
>>
>> [1] https://about.scarf.sh/
>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> [3] https://privacy.apache.org/faq/committers.html
>> [4] https://github.com/apache/superset/issues/25639
>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> [6] https://about.scarf.sh/scarf-gateway
>> [7] https://about.scarf.sh/privacy-policy
>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org