Once we decide to go ahead with this, I think it would be worth checking whether we can also export metrics on which providers are used in user deployments. I believe that would help us understand the adoption of each provider, and it would also inform our decision when we discuss whether a provider is due for suspension.
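
To make the idea a bit more concrete, here is a rough sketch (not a proposal for the final implementation) of how the list of installed providers could be derived on the Airflow side. It assumes that "installed" is an acceptable proxy for "used", and it relies only on the `apache-airflow-providers-` naming of the provider packages. The function names are hypothetical, and how the result would actually be sent to Scarf is left open.

```python
from importlib import metadata

# Provider distributions are published as "apache-airflow-providers-<name>".
PROVIDER_PREFIX = "apache-airflow-providers-"


def provider_names(dist_names):
    """Return the sorted, de-duplicated short provider names found in dist_names."""
    found = set()
    for name in dist_names:
        # Tolerate missing names and underscore/case variations.
        normalized = (name or "").lower().replace("_", "-")
        if normalized.startswith(PROVIDER_PREFIX):
            found.add(normalized[len(PROVIDER_PREFIX):])
    return sorted(found)


def installed_providers():
    """List providers installed in the current Python environment."""
    names = (dist.metadata.get("Name", "") for dist in metadata.distributions())
    return provider_names(names)


if __name__ == "__main__":
    # Only short package names are produced: no hostnames, paths or DAG details.
    print(installed_providers())
```

Only the short provider names would need to leave the deployment, which keeps this in line with the "no PII" constraint discussed in the thread.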

On Tue, 9 Apr 2024, 20:51 Kaxil Naik, <kaxiln...@gmail.com> wrote:

The webserver is packaged after compiling, so that won't be possible, Michal.

On Tue, 9 Apr 2024 at 11:02, Michał Modras <michalmod...@google.com> wrote:

If it is packaged and installed by default, we add the dependency (and its dependencies) to Airflow's already-not-small dependency tree. If we make it installed and enabled by default, would there be an easy way to not just switch it off (e.g. through the env variable), but also not package it at all? That's why I was suggesting a provider, but actually any other pluggable (and unpluggable) mechanism would work.

On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <huss...@awala.fr> wrote:

> Other than that I don't mind it being e.g. an optional provider.

I don't think it is possible to implement it in a provider because it is a js package installed on the webserver; we could implement it as a plugin (Blueprint), but in this case, the user must make an effort to register it.

It would be better to always install it and activate it by default, with the possibility of deactivating it via the environment variable `SCARF_ANALYTICS=false` (according to the documentation). If it is deactivated by default, many users will not activate it even if they don't mind reporting the metrics, but if we enable it by default, only users who don't want to send metrics will disable it.

On Fri, Apr 5, 2024 at 6:19 PM Michał Modras <michalmod...@google.com.invalid> wrote:

My 2 cents: it must be possible to opt out, and preferably it should be possible to deploy Airflow instances without bundling the telemetry library dependencies. Other than that I don't mind it being e.g. an optional provider.

On Wed, 3 Apr 2024, 22:42, Hussein Awala <huss...@awala.fr> wrote:

> I'd like to propose that we start with collecting simple data with limited access: to all the PMC members. We can always expand it to Committers and then expand further to make it invite-only or set up exporting it to a DB like Postgres <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly viewable dashboard.

Looks like a good plan; we can discuss the export format when we decide to do it.

On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Yup, exactly.

> I believe this would definitely help us take early and informed decisions. E.g. had we had this earlier, I believe it would have definitely helped us more in our past discussions, like whether we should continue supporting MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4), and similarly about the DaskExecutor (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.

Btw, clarifying my own stance on the below; and let me know what you think @Hussein Awala <huss...@awala.fr>: I'd like to propose that we start with collecting simple data with limited access: to all the PMC members. We can always expand it to Committers and then expand further to make it invite-only or set up exporting it to a DB like Postgres <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly viewable dashboard.
It would be similar to an iterative software development approach, since this will be the first time for us, as Airflow PMC, to add such telemetry. This is of course just my opinion though :)

> Regarding the data, like I had mentioned in the email and I am glad others including you are on the same page that the data will be shared with all PMC members. The point about sharing it via website and newsletter was for the community (Airflow users). I don’t think anyone in the community (apart from the PMC members) would need raw data. And even if they need it, I’d say they should put effort and contribute to the Airflow project and become PMC members.
>
> To be clear: this telemetry data should help us, as Airflow PMC, to steer some of the decision making based on this data, similar to how only PMC has a binding vote on the releases [1]; this is similar to how Apache Superset does it too.
>
> [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc

On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io.invalid> wrote:

+1 to introduce this.

I believe this would definitely help us take early and informed decisions. E.g. had we had this earlier, I believe it would have definitely helped us more in our past discussions, like whether we should continue supporting MsSQL (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4), and similarly about the DaskExecutor (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.

Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)
Phone: +91 9730079985

On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Yup, I had added a link to scarf docs in the original email that referenced opting out and we should even add an Airflow config that puts all config in a single place. Without it we can’t be compliant to all the policies even if we collectively ignore or are unaware of the importance of it.

Regarding the data, like I had mentioned in the email and I am glad others including you are on the same page that the data will be shared with all PMC members. The point about sharing it via website and newsletter was for the community (Airflow users). I don’t think anyone in the community (apart from the PMC members) would need raw data. And even if they need it, I’d say they should put effort and contribute to the Airflow project and become PMC members.

To be clear: this telemetry data should help us, as Airflow PMC, to steer some of the decision making based on this data, similar to how only PMC has a binding vote on the releases [1]; this is similar to how Apache Superset does it too.

[1] https://www.apache.org/dev/pmc.html#what-is-a-pmc

On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote:

I mentioned opting out just to confirm its importance, and after checking the Scarf documentation it appears to be supported natively by Scarf. For data accessibility, my point was more about raw data, not just aggregated information/insights shared via monthly newsletters, as we do for Airflow annual Survey for example: https://airflow.apache.org/survey vs https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics.

On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

Agreed to both your points Hussein but both the points are already covered in my original discussion post - both about opting out and providing data to all the PMC members and providing visibility via Monthly newsletters. Is there anything else you propose to discuss that isn’t covered?

On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr> wrote:

+1 for the idea in general, but there are two main points to discuss before voting on this:

1. We should provide an option to disable Scarf:
As Airflow is not a paid product, we cannot force companies to report their use of this project. Otherwise, some may choose to create their own fork just to disable Scarf.

2. Concerning the exclusivity of access to data:
The data collected must either be completely proprietary for use by PMC and ASF, or completely open. Since many companies offer Airflow as a product, it is imperative not to give one company more privileges than others. I raise this point for the principle of equality of opportunity.

On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank...@gmail.com> wrote:

Big +1 for Scarf.

Transparency is key, so it's important to be super clear about opting out and what's tracked to avoid spooking anyone about IP stuff.

Regards
Ankit Chaurasia

On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

+1 looks like a good tool which could be super helpful.

* We should have some transparency into the data that is collected or sent
* We should have an option to opt out

Thanks & Regards,
Amogh Desai

On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee...@gmail.com> wrote:

+1 to this. It would be really useful. As long as we can opt out, I think we’re good.

Best,
Wei

On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxiln...@gmail.com> wrote:

Grammar Correction:

> We should assume that those who deploy and upgrade Airflow - actually read and take into account what is written in the release notes - especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed.
> If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

I couldn't agree more; even though we shouldn't collect any data that hampers security (and we should aim to do the same), most security-concerned folks don't just upgrade, and we can rely on them regarding release notes or announcements, and we can make it very clear in our announcements too, and in our installation guides.

On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxiln...@gmail.com> wrote:

Grammar correction:

On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxiln...@gmail.com> wrote:

I have this at the end of the email too, but in case folks don't read until the end, quoting Maxime from the use-case blog [1]:

"I think people often ask ‘how do I contribute to open source?’, ‘I've got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very simplest thing that you can do is just say, ‘my organization gets real value from this piece of software.’ There are a bunch of ways to let the people know about it – and now Scarf is there. If your organization is getting a lot of value from a piece of open source software, make sure the devs know about it."

What kind of edge cases are you thinking about?
I don't think it makes sense to have "opt-in" at all. As the goal is to collect data for most Airflow installations except for those that don't want to give data, "opt-out" is the only way to maximize it. As long as we don't collect any PII data, this is in compliance as well.

Imagine someone learning Airflow: if they have to opt in via a config, they wouldn't even know or care about it, hence us losing most of the data. I understand why some orgs & individuals may want to opt out.

Scarf provides tracking pixels (essentially an HTML image tag) that you can place in your website or product to track visitors to that URL. If there were any concerns about privacy, ASF wouldn't have approved it at all.

A few key details to note about the pixel:

- No PII is tracked… Scarf does not capture/retain IP information… this information is discarded by the platform upon processing/aggregating
- Scarf pixels respect the Do Not Track (DNT) settings of browsers - these users will not be tracked whatsoever.

All the ASF projects I had listed (whether they use Scarf gateway or Scarf pixel in product) are using opt-out.

> 1. Short opt-in period before opt-out. Test this feature with users who trust and if it works great - make it public. I think it's wise to handle edge cases and configure collected data more accurately.

It would be a pixel in the webserver, should affect nothing at all even in an air-gapped environment.

> 2. It should not affect anything if access to the internet is restricted which is default for many companies.

100% agreed on the below:

> I think we have a very good blueprint to follow including at least 5 other ASF projects that also passed the review of the privacy@asf. And while I understand (and concur) the urge for opt-in by default coming from consumer market (where it makes perfect sense) Airflow is not a consumer software and is used in "corporate environment" which has slightly different expectations and a broad assumption that the company can make decisions on such telemetry on behalf of the employees using it.

Couldn't agree more; even though there shouldn't we collect hamper security (and we should aim to do the same), most security concerned folks don't just upgrade, and we can rely on them regarding release notes or announcements and we can make it very clear in our announcements too; and in our installation guides.

> We should assume that those who deploy and upgrade Airflow - actually read and take into account what is written in the release notes - especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed. If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

To be clear, the collection of data, or at least the data we should gather here, should help all the consumers without violating any regulations. I will quote Maxime's quote in the use-case doc [1]:

"*Another Form of Contributing*
“I think people often ask ‘how do I contribute to open source?’, ‘I've got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very simplest thing that you can do is just say, ‘my organization gets real value from this piece of software.’ There are a bunch of ways to let the people know about it – and now Scarf is there.
> If > >> > your > >> > > >> > > > > > organization is > >> > > >> > > > > > > > >>> getting a lot of value from a piece of open > >> source > >> > > >> > software, > >> > > >> > > > make > >> > > >> > > > > > sure > >> > > >> > > > > > > > the > >> > > >> > > > > > > > >>> devs know about it.”" > >> > > >> > > > > > > > >>> > >> > > >> > > > > > > > >>> > >> > > >> > > > > > > > >>> [1] > >> > > >> > > > > https://about.scarf.sh/post/scarf-case-study-apache-superset > >> > > >> > > > > > > > >>> > >> > > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin > < > >> > > >> > > > > kxe...@apache.org> > >> > > >> > > > > > > > wrote: > >> > > >> > > > > > > > >>> > >> > > >> > > > > > > > >>>> Hi Jarek! > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>> I understand the reasons for opt-out from a > >> project > >> > > >> view. > >> > > >> > I > >> > > >> > > > just > >> > > >> > > > > > > > suddenly > >> > > >> > > > > > > > >>>> imagined the situation when an upgrade happens > >> and > >> > > here > >> > > >> > > comes > >> > > >> > > > > the > >> > > >> > > > > > > > data to > >> > > >> > > > > > > > >>>> some third party service - that's a view from > a > >> > user > >> > > >> side > >> > > >> > of > >> > > >> > > > > some > >> > > >> > > > > > big > >> > > >> > > > > > > > >>>> company. > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>> There could be good alternatives to handle > this: > >> > > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test > this > >> > > >> feature > >> > > >> > > with > >> > > >> > > > > > users > >> > > >> > > > > > > > who > >> > > >> > > > > > > > >>>> trust and if it works great - make it public. > I > >> > think > >> > > >> it's > >> > > >> > > > wise > >> > > >> > > > > to > >> > > >> > > > > > > > handle > >> > > >> > > > > > > > >>>> edge cases and configure collected data more > >> > > >> accurately. > >> > > >> > > > > > > > >>>> 2. 
Explicitly somehow warn about this feature > to > >> > make > >> > > >> this > >> > > >> > > > > > feature not > >> > > >> > > > > > > > >>>> get > >> > > >> > > > > > > > >>>> unnoticed. Just to reduce possible > frustration. > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>> Just a personal thoughts for discussion (: > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>> -- > >> > > >> > > > > > > > >>>> ,,,^..^,,, > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk < > >> > > >> > > > ja...@potiuk.com> > >> > > >> > > > > > > > wrote: > >> > > >> > > > > > > > >>>> > >> > > >> > > > > > > > >>>>> Hello everyone, > >> > > >> > > > > > > > >>>>> > >> > > >> > > > > > > > >>>>> it has to be: > >> > > >> > > > > > > > >>>>> > >> > > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security > >> guys > >> > > >> about > >> > > >> > new > >> > > >> > > > > > unplanned > >> > > >> > > > > > > > >>>>>> activity after regular upgrade. > >> > > >> > > > > > > > >>>>>> > >> > > >> > > > > > > > >>>>> > >> > > >> > > > > > > > >>>>> That's a very good point about security > >> triggering > >> > > >> > > Alexander, > >> > > >> > > > > > but I > >> > > >> > > > > > > > am > >> > > >> > > > > > > > >>>> not > >> > > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. > >> > There > >> > > >> are > >> > > >> > > other > >> > > >> > > > > > ways of > >> > > >> > > > > > > > >>>>> communicating with the "deployment managers" > >> who > >> > > >> install > >> > > >> > > and > >> > > >> > > > > > upgrade > >> > > >> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social > >> media > >> > of > >> > > >> > ours, > >> > > >> > > > > slack > >> > > >> > > > > > > > >>>>> announcements etc. We have plenty of channels > >> we > >> > can > >> > > >> use > >> > > >> > to > >> > > >> > > > > > > > >>>> communicate the > >> > > >> > > > > > > > >>>>> change. 

I think we have a very good blueprint to follow including at least 5 other ASF projects that also passed the review of the privacy@asf. And while I understand (and concur) the urge for opt-in by default coming from consumer market (where it makes perfect sense) Airflow is not a consumer software and is used in "corporate environment" which has slightly different expectations and a broad assumption that the company can make decisions on such telemetry on behalf of the employees using it.

We should assume that those who deploy and upgrade Airflow - actually read and take into account what is written in the release notes - especially if they have security guys breathing down their necks, similarly as we have to assume they follow CVE announcements about security issues fixed.
If we are very straightforward and outgoing about the change, and inform very clearly how to opt out, I don't see a big problem with opt-out.

We should of course check with privacy@a.o (but I've spent a good deal of time reading the Superset and other use cases and explanations in detail to make a better informed decision) - and it looks like they also went the opt-out way and got cleared by privacy@a.o. And if we cannot reach consensus, we should - as usual - make a voting decision on it (because yes, it is an important decision), but - after reading and understanding why others also did it - for me personally, opt-out is a good path.
>
> Also because it will increase the amount of data we gather, and in
> our case - counterintuitively - it will be even better for privacy
> and corporate anonymity, because the more data we get, the more
> difficult it will be to get any non-statistical/non-aggregated
> insight from it. Imagine if only a few corporate users enable it
> consciously - then we will be able to draw many more conclusions if
> we find out who they are than if everyone has it enabled by default.
>
> That's my take on it - but again, it's up to us to vote. For me,
> opt-in is not a "has to", and I am rather for opt-out.
>
> J.
>
>> Hi all,
>>
>>> I want to propose gathering telemetry for Airflow installations.
>>> As the Airflow community, we have been relying heavily on the
>>> yearly Airflow Survey and anecdotes to answer a few key questions
>>> about Airflow usage. Questions like the following:
>>>
>>> - Which versions of Airflow are people installing/using now (i.e.
>>>   whether people have primarily made the jump from version X to
>>>   version Y)
>>> - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
>>> - What Python version is being used?
>>> - Which Executor is being used?
>>> - Approximately how many people out there in the world are
>>>   installing Airflow
>>>
>>> There is a solution that should help answer these questions:
>>> Scarf [1]. The ASF already approves Scarf [2][3], and it is already
>>> used by other ASF projects: Superset [4], Dolphin Scheduler [5],
>>> Dubbo Kubernetes, DevLake, Skywalking, as it follows GDPR and other
>>> regulations.
>>>
>>> Similar to Superset, we probably can use it as follows:
>>>
>>> 1. Install the `scarf js` npm package and bundle it in the
>>>    Webserver. When the package is downloaded & the Airflow
>>>    webserver is opened, metadata is recorded to the Scarf dashboard.
>>> 2. Utilize the Scarf Gateway [6], which we can use in front of
>>>    docker containers. While it’s possible people go around this
>>>    gateway, we can probably configure and encourage most traffic to
>>>    go through these gateways.
>>>
>>> While Scarf does not store any personally identifying information
>>> from SDK telemetry data, it does send various bits of IP-derived
>>> information as outlined here [7]. This data should be made as
>>> transparent as possible by granting dashboard access to the Airflow
>>> PMC and any other relevant means of sharing/surfacing it that we
>>> encounter (Town Hall, Slack, Newsletter, etc.).
>>>
>>> The following case studies are worth reading:
>>>
>>> 1. https://about.scarf.sh/post/scarf-case-study-apache-superset
>>>    (From Maxime)
>>> 2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>>>
>>> Similar to them, this could help in various ways that come with
>>> using data for decision-making. With clear guidelines on "how to
>>> opt-out" [8][9][10] & "what data is being collected" on the Airflow
>>> website, this can be beneficial to the entire community as we would
>>> be making more informed decisions.
>>>
>>> Regards,
>>> Kaxil
>>>
>>> [1] https://about.scarf.sh/
>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>>> [3] https://privacy.apache.org/faq/committers.html
>>> [4] https://github.com/apache/superset/issues/25639
>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>>> [6] https://about.scarf.sh/scarf-gateway
>>> [7] https://about.scarf.sh/privacy-policy
>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics