My 2 cents: it must be possible to opt out, and preferably it should be
possible to deploy Airflow instances without bundling the telemetry library
dependencies. Other than that, I don't mind it being, e.g., an optional provider.
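
To illustrate the two requirements above (an explicit opt-out, and a deployment that works without the telemetry dependency installed), here is a minimal sketch. The environment variable name and the module name are hypothetical placeholders chosen for this example; they are not existing Airflow or Scarf settings.

```python
import importlib.util
import os

# Hypothetical names for illustration only; not real Airflow or Scarf identifiers.
OPT_OUT_ENV = "EXAMPLE_TELEMETRY_DISABLED"
TELEMETRY_MODULE = "example_telemetry_client"


def telemetry_enabled(environ=None, module=TELEMETRY_MODULE):
    """Return True only if the user has not opted out AND the optional
    telemetry dependency is actually installed."""
    env = os.environ if environ is None else environ
    # Explicit opt-out always wins.
    if env.get(OPT_OUT_ENV, "").strip().lower() in {"1", "true", "yes"}:
        return False
    # The dependency is optional: if it is not installed, telemetry is off
    # and nothing else in the deployment should be affected.
    return importlib.util.find_spec(module) is not None
```

With this shape, a deployment that never installs the optional package sends nothing, and one that does install it can still switch telemetry off with a single setting.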

On Wed, Apr 3, 2024, 22:42 Hussein Awala <huss...@awala.fr> wrote:

> > I'd like to propose, that we start with collecting simple data with
> limited access: to all the PMC members. We can always expand it to
> Committers and then expand further to make it invite-only or setup
> exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard.
>
> Looks like a good plan; we can discuss the export format when we decide to
> do it.
>
> On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Yup, exactly.
> >
> > I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >
> >
> > Btw clarifying my own stance on the below; and let me know what you
> think @Hussein
> > Awala <huss...@awala.fr> : I'd like to propose, that we start with
> > collecting simple data with limited access: to all the PMC members. We
> can
> > always expand it to Committers and then expand further to make it
> > invite-only or setup exporting it to a DB like Postgres
> > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> publicly
> > viewable dashboard. It would be similar to an iterative software
> > development approach, since this will be the first time for us, as
> Airflow
> > PMC, to add such telemetry. This is of course just my opinion though :)
> >
> > Regarding the data, like I had mentioned in the email and I am glad
> others
> >> including you are on the same page that the data will be shared with all
> >> PMC members. The point about sharing it via website and newsletter was
> for
> >> the community — Airflow users. I don’t think anyone in the community
> (apart
> >> from the PMC members) would need raw data. And even if they need it, I’d
> >> say they should put effort and contribute to the Airflow project and
> become
> >> PMC members.
> >> To be clear: this telemetry data should help us, as Airflow PMC, to
> steer
> >> some of the decision making based on this data similar to how only PMC
> has
> >> a binding vote on the releases. [1] and this is similar to how Apache
> >> Superset does it too.
> >> [1]
> >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >
> >
> > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io
> .invalid>
> > wrote:
> >
> >> +1 to introduce this.
> >>
> >> I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >>
> >> Best regards,
> >>
> >> *Pankaj Koti*
> >> Senior Software Engineer (Airflow OSS Engineering team)
> >> Location: Pune, Maharashtra, India
> >> Timezone: Indian Standard Time (IST)
> >> Phone: +91 9730079985
> >>
> >>
> >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >>
> >> > Yup, I had added a link to scarf docs in the original email that
> >> referenced
> >> > opting out and we should even add an Airflow config that puts all
> >> config in
> >> > a single place. Without it we can’t be compliant with all the policies
> >> even
> >> > if we collectively ignore or are unaware of the importance of it.
> >> >
> >> > Regarding the data, like I had mentioned in the email and I am glad
> >> others
> >> > including you are on the same page that the data will be shared with
> all
> >> > PMC members. The point about sharing it via website and newsletter was
> >> for
> >> > the community — Airflow users. I don’t think anyone in the community
> >> (apart
> >> > from the PMC members) would need raw data. And even if they need it,
> I’d
> >> > say they should put effort and contribute to the Airflow project and
> >> become
> >> > PMC members.
> >> >
> >> > To be clear: this telemetry data should help us, as Airflow PMC, to
> >> steer
> >> > some of the decision making based on this data similar to how only PMC
> >> has
> >> > a binding vote on the releases. [1] and this is similar to how Apache
> >> > Superset does it too.
> >> >
> >> > [1]
> >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >> >
> >> >
> >> >
> >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote:
> >> >
> >> > > I mentioned opting out just to confirm its importance, and after
> >> checking
> >> > > the Scarf documentation it appears to be supported natively by
> Scarf.
> >> For
> >> > > data accessibility, my point was more about raw data, not just
> >> aggregated
> >> > > information/insights shared via monthly newsletters, as we do for
> >> Airflow
> >> > > annual Survey for example:
> >> > > https://airflow.apache.org/survey vs
> >> > >
> >> > >
> >> >
> >>
> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> >> > > .
> >> > >
> >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Agreed to both your points Hussein but both the points are already
> >> > > covered
> >> > > > in my original discussion post - both about opting out and
> providing
> >> > data
> >> > > > to all the PMC members and providing visibility via Monthly
> >> > newsletters.
> >> > > Is
> >> > > > there anything else you propose to discuss that isn’t covered?
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr>
> >> wrote:
> >> > > >
> >> > > > > +1 for the idea in general, but there are two main points to
> >> discuss
> >> > > > before
> >> > > > > voting on this:
> >> > > > >
> >> > > > > 1. We should provide an option to disable Scarf:
> >> > > > > As Airflow is not a paid product, we cannot force companies to
> >> report
> >> > > > their
> >> > > > > use of this project. Otherwise, some may choose to create their
> >> own
> >> > > fork
> >> > > > > just to disable Scarf.
> >> > > > >
> >> > > > > 2. Concerning the exclusivity of access to data:
> >> > > > > The data collected must either be completely proprietary for use
> >> by
> >> > PMC
> >> > > > and
> >> > > > > ASF, or completely open. Since many companies offer Airflow as a
> >> > > product,
> >> > > > > it is imperative not to give one company more privileges than
> >> > others. I
> >> > > > > raise this point for the principle of equality of opportunity.
> >> > > > >
> >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
> >> sunank...@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Big +1 for Scarf.
> >> > > > > >
> >> > > > > > Transparency is key, so it's important to be super clear about
> >> > opting
> >> > > > > > out and what's tracked to avoid spooking anyone about IP
> stuff.
> >> > > > > >
> >> > > > > > Regards
> >> > > > > > Ankit Chaurasia
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> >> > > amoghdesai....@gmail.com>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > +1 looks like a good tool which could be super helpful.
> >> > > > > > >
> >> > > > > > > * We should have some transparency into the data that is
> >> > collected
> >> > > or
> >> > > > > > sent
> >> > > > > > > * We should have an option to opt out
> >> > > > > > >
> >> > > > > > > Thanks & Regards,
> >> > > > > > > Amogh Desai
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
> weilee...@gmail.com>
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > +1 to this. It would be really useful. As long as we can
> opt
> >> > > out, I
> >> > > > > > think
> >> > > > > > > > we’re good.
> >> > > > > > > >
> >> > > > > > > > Best,
> >> > > > > > > > Wei
> >> > > > > > > >
> >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
> >> > kaxiln...@gmail.com>
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > Grammar Correction:
> >> > > > > > > > >
> >> > > > > > > > > We should assume that those who deploy and upgrade
> >> Airflow -
> >> > > > > actually
> >> > > > > > > > read
> >> > > > > > > > >> and take into account what is written in the release
> >> notes -
> >> > > > > > especially
> >> > > > > > > > if
> >> > > > > > > > >> they have security guys breathing down their necks,
> similarly
> >> as
> >> > we
> >> > > > > have
> >> > > > > > to
> >> > > > > > > > >> assume they follow CVE announcements about security
> >> issues
> >> > > > fixed.
> >> > > > > > If we
> >> > > > > > > > >> are very straightforward and out-going about the
> change,
> >> > > inform
> >> > > > > very
> >> > > > > > > > >> clearly how to opt-out, I don't see a big problem with
> >> > > opt-out.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I couldn't agree more; even though we shouldn't collect
> >> any
> >> > > data
> >> > > > > that
> >> > > > > > > > > hamper security (and we should aim to do the same), most
> >> > > security
> >> > > > > > > > concerned
> >> > > > > > > > > folks don't just upgrade, and we can rely on them
> >> regarding
> >> > > > release
> >> > > > > > notes
> >> > > > > > > > > or announcements and we can make it very clear in our
> >> > > > announcements
> >> > > > > > too;
> >> > > > > > > > > and in our installation guides.
> >> > > > > > > > >
> >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
> >> > kaxiln...@gmail.com>
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > >> Grammar correction:
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
> >> > kaxiln...@gmail.com
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> Have this at the end of the email too: but if folks
> >> don't
> >> > > read
> >> > > > > > until
> >> > > > > > > > the
> >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> >> > > > > > > > >>>
> >> > > > > > > > >>> "I think people often ask ‘how do I contribute to open
> >> > > > source?’,
> >> > > > > > ‘I've
> >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> >> > engineer.’
> >> > > > > > Actually,
> >> > > > > > > > the
> >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> >> > > > organization
> >> > > > > > gets
> >> > > > > > > > real
> >> > > > > > > > >>> value from this piece of software.’ There are a bunch
> of
> >> > ways
> >> > > > to
> >> > > > > > let
> >> > > > > > > > the
> >> > > > > > > > >>> people know about it – and now Scarf is there. If your
> >> > > > > > organization is
> >> > > > > > > > >>> getting a lot of value from a piece of open source
> >> > software,
> >> > > > make
> >> > > > > > sure
> >> > > > > > > > the
> >> > > > > > > > >>> devs know about it."
> >> > > > > > > > >>>
> >> > > > > > > > >>> What kind of edge cases are you thinking about? I
> don't
> >> > think
> >> > > > it
> >> > > > > > makes
> >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
> collect
> >> > data
> >> > > > for
> >> > > > > > most
> >> > > > > > > > >>> Airflow installations except for those that don't want
> >> to
> >> > > give
> >> > > > > > data,
> >> > > > > > > > then
> >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as
> we
> >> > don't
> >> > > > > > collect
> >> > > > > > > > any
> >> > > > > > > > >>> PII data, this is in compliance as well.
> >> > > > > > > > >>>
> >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
> opt-in
> >> > via
> >> > > a
> >> > > > > > config,
> >> > > > > > > > >>> they wouldn't even know or care about it, hence us
> >> losing
> >> > > most
> >> > > > of
> >> > > > > > the
> >> > > > > > > > data.
> >> > > > > > > > >>> I understand why some orgs & individuals may want to
> >> > opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>> Scarf provides tracking pixels (essentially an HTML
> >> image
> >> > > tag)
> >> > > > > > that you
> >> > > > > > > > >>> can place in your website or product to track visitors
> >> to
> >> > > that
> >> > > > > > URL. If
> >> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't
> have
> >> > > > approved
> >> > > > > > it
> >> > > > > > > > at all.
> >> > > > > > > > >>>
> >> > > > > > > > >>> A few key details to note about the pixel:
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain
> IP
> >> > > > > > information…
> >> > > > > > > > >>>   this information is discarded by the platform upon
> >> > > > > > > > processing/aggregating
> >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
> >> settings of
> >> > > > > > browsers -
> >> > > > > > > > >>>   these users will not be tracked whatsoever.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> All the ASF projects I had listed (whether they use
> >> Scarf
> >> > > > gateway
> >> > > > > > or
> >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
> feature
> >> > with
> >> > > > > > users who
> >> > > > > > > > >>>> trust and if it works great - make it public. I think
> >> it's
> >> > > > wise
> >> > > > > to
> >> > > > > > > > handle
> >> > > > > > > > >>>> edge cases and configure collected data more
> >> accurately.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> It would be a pixel in the webserver, should affect
> >> nothing
> >> > > at
> >> > > > > all
> >> > > > > > even
> >> > > > > > > > >>> in an air-gapped environment.
> >> > > > > > > > >>>
> >> > > > > > > > >>>> 2. It should not affect anything if access to the
> >> internet
> >> > > is
> >> > > > > > > > restricted
> >> > > > > > > > >>>> which is default for many companies.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> 100% agreed on the below:
> >> > > > > > > > >>>
> >> > > > > > > > >>>> I think we have a very good blueprint to follow
> >> including
> >> > at
> >> > > > > > least 5
> >> > > > > > > > >>>> other
> >> > > > > > > > >>>> ASF projects that also passed the review of the
> >> > privacy@asf.
> >> > > > > And
> >> > > > > > > > while I
> >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
> default
> >> > > coming
> >> > > > > from
> >> > > > > > > > >>>> consumer
> >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not
> a
> >> > > > consumer
> >> > > > > > > > >>>> software and is used in "corporate environment" which
> >> has
> >> > a
> >> > > > > little
> >> > > > > > > > >>>> different expectations and broad assumption that the
> >> > company
> >> > > > can
> >> > > > > > make
> >> > > > > > > > >>>> decisions on such telemetry on behalf of the
> employees
> >> > using
> >> > > > it.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
> >> collect
> >> > > > > hamper
> >> > > > > > > > >>> security (and we should aim to do the same), most
> >> security
> >> > > > > > concerned
> >> > > > > > > > folks
> >> > > > > > > > >>> don't just
> >> > > > > > > > >>> upgrade, and we can rely on them regarding release
> >> notes or
> >> > > > > > > > announcements
> >> > > > > > > > >>> and we can make it very clear in our announcements
> too;
> >> and
> >> > > in
> >> > > > > our
> >> > > > > > > > >>> installation guides.
> >> > > > > > > > >>>
> >> > > > > > > > >>> We should assume that those who deploy and upgrade
> >> Airflow
> >> > -
> >> > > > > > actually
> >> > > > > > > > read
> >> > > > > > > > >>>> and take into account what is written in the release
> >> > notes -
> >> > > > > > > > especially
> >> > > > > > > > >>>> if
> >> > > > > > > > >>>> they have security guys breathing down their necks,
> >> similarly
> >> > as
> >> > > we
> >> > > > > > have to
> >> > > > > > > > >>>> assume they follow CVE announcements about security
> >> issues
> >> > > > > fixed.
> >> > > > > > If
> >> > > > > > > > we
> >> > > > > > > > >>>> are very straightforward and out-going about the
> >> change,
> >> > > > inform
> >> > > > > > very
> >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem
> with
> >> > > > opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> To be clear, the collection of data, or at least the
> >> data
> >> > we
> >> > > > > should
> >> > > > > > > > >>> gather here should help all the consumers without
> >> violating
> >> > > > > > any
> >> > > > > > > > >>> regulations. I will quote Maxime's quote in the
> use-case
> >> > doc
> >> > > > [1]
> >> > > > > > > > >>>
> >> > > > > > > > >>> "*Another Form of Contributing*
> >> > > > > > > > >>> “I think people often ask ‘how do I contribute to open
> >> > > > source?’,
> >> > > > > > ‘I've
> >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> >> > engineer.’
> >> > > > > > Actually,
> >> > > > > > > > the
> >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> >> > > > organization
> >> > > > > > gets
> >> > > > > > > > real
> >> > > > > > > > >>> value from this piece of software.’ There are a bunch
> of
> >> > ways
> >> > > > to
> >> > > > > > let
> >> > > > > > > > the
> >> > > > > > > > >>> people know about it – and now Scarf is there. If your
> >> > > > > > organization is
> >> > > > > > > > >>> getting a lot of value from a piece of open source
> >> > software,
> >> > > > make
> >> > > > > > sure
> >> > > > > > > > the
> >> > > > > > > > >>> devs know about it.”"
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> [1]
> >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> >> > > > > > > > >>>
> >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> >> > > > > kxe...@apache.org>
> >> > > > > > > > wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>>> Hi Jarek!
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> I understand the reasons for opt-out from a project
> >> view.
> >> > I
> >> > > > just
> >> > > > > > > > suddenly
> >> > > > > > > > >>>> imagined the situation when an upgrade happens and
> here
> >> > > comes
> >> > > > > the
> >> > > > > > > > data to
> >> > > > > > > > >>>> some third party service - that's a view from a user
> >> side
> >> > of
> >> > > > > some
> >> > > > > > big
> >> > > > > > > > >>>> company.
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> There could be good alternatives to handle this:
> >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
> >> feature
> >> > > with
> >> > > > > > users
> >> > > > > > > > who
> >> > > > > > > > >>>> trust and if it works great - make it public. I think
> >> it's
> >> > > > wise
> >> > > > > to
> >> > > > > > > > handle
> >> > > > > > > > >>>> edge cases and configure collected data more
> >> accurately.
> >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make
> >> this
> >> > > > > > feature not
> >> > > > > > > > >>>> get
> >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> Just some personal thoughts for discussion (:
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> --
> >> > > > > > > > >>>> ,,,^..^,,,
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> >> > > > ja...@potiuk.com>
> >> > > > > > > > wrote:
> >> > > > > > > > >>>>
> >> > > > > > > > >>>>> Hello everyone,
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> it has to be:
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys
> >> about
> >> > new
> >> > > > > > unplanned
> >> > > > > > > > >>>>>> activity after regular upgrade.
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> That's a very good point about security triggering
> >> > > Alexander,
> >> > > > > > but I
> >> > > > > > > > am
> >> > > > > > > > >>>> not
> >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There
> >> are
> >> > > other
> >> > > > > > ways of
> >> > > > > > > > >>>>> communicating with the "deployment managers" who
> >> install
> >> > > and
> >> > > > > > upgrade
> >> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social media of
> >> > ours,
> >> > > > > slack
> >> > > > > > > > >>>>> announcements etc. We have plenty of channels we can
> >> use
> >> > to
> >> > > > > > > > >>>> communicate the
> >> > > > > > > > >>>>> change.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> I think we have a very good blueprint to follow
> >> including
> >> > > at
> >> > > > > > least 5
> >> > > > > > > > >>>> other
> >> > > > > > > > >>>>> ASF projects that also passed the review of the
> >> > > privacy@asf.
> >> > > > > And
> >> > > > > > > > >>>> while I
> >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by
> default
> >> > > coming
> >> > > > > > from
> >> > > > > > > > >>>> consumer
> >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is
> not a
> >> > > > consumer
> >> > > > > > > > >>>>> software and is used in "corporate environment"
> which
> >> > has a
> >> > > > > > little
> >> > > > > > > > >>>>> different expectations and broad assumption that the
> >> > > company
> >> > > > > can
> >> > > > > > make
> >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
> employees
> >> > > using
> >> > > > > it.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> We should assume that those who deploy and upgrade
> >> > Airflow
> >> > > -
> >> > > > > > actually
> >> > > > > > > > >>>> read
> >> > > > > > > > >>>>> and take into account what is written in the release
> >> > notes
> >> > > -
> >> > > > > > > > >>>> especially if
> >> > > > > > > > >>>>> they have security guys breathing down their necks,
> >> similarly
> >> > as
> >> > > > we
> >> > > > > > have
> >> > > > > > > > to
> >> > > > > > > > >>>>> assume they follow CVE announcements about security
> >> > issues
> >> > > > > > fixed. If
> >> > > > > > > > we
> >> > > > > > > > >>>>> are very straightforward and out-going about the
> >> change,
> >> > > > inform
> >> > > > > > very
> >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem
> with
> >> > > > opt-out.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've
> >> > spent
> >> > > a
> >> > > > > good
> >> > > > > > > > deal
> >> > > > > > > > >>>> of
> >> > > > > > > > >>>>> time reading the Superset  and other use case and
> >> > > explanation
> >> > > > > in
> >> > > > > > > > >>>> detail to
> >> > > > > > > > >>>>> make a better informed decision) - and it looks like
> >> they
> >> > > > also
> >> > > > > > went
> >> > > > > > > > >>>> opt-out
> >> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we
> cannot
> >> > > reach
> >> > > > > > > > >>>> consensus, we
> >> > > > > > > > >>>>> should - as usual - make a voting decision on it
> >> (because
> >> > > > yes,
> >> > > > > > it is
> >> > > > > > > > an
> >> > > > > > > > >>>>> important decision), but - after reading and
> >> > understanding
> >> > > > why
> >> > > > > > others
> >> > > > > > > > >>>> also
> >> > > > > > > > >>>>> did it - for me personally, opt-out is a good path.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> Also because it will rather increase the amount of
> >> data
> >> > to
> >> > > > > > gather,
> >> > > > > > > > and
> >> > > > > > > > >>>> in
> >> > > > > > > > >>>>> our case - counter intuitively - it will be even
> >> better
> >> > for
> >> > > > > > privacy
> >> > > > > > > > and
> >> > > > > > > > >>>>> corporate anonymity, because the more data we get,
> the
> >> > more
> >> > > > > > difficult
> >> > > > > > > > >>>> it
> >> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated
> >> insight
> >> > > > from
> >> > > > > > it.
> >> > > > > > > > >>>> Imagine
> >> > > > > > > > >>>>> if only a few corporate users will enable it
> >> consciously
> >> > -
> >> > > > then
> >> > > > > > we
> >> > > > > > > > >>>> will be
> >> > > > > > > > >>>>> able to draw much more conclusions if we find out
> who
> >> > they
> >> > > > are,
> >> > > > > > than
> >> > > > > > > > if
> >> > > > > > > > >>>>> everyone has it enabled by default.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> That's my take on it - but again, it's up to us to
> >> vote,
> >> > > for
> >> > > > me
> >> > > > > > > > opt-in
> >> > > > > > > > >>>> is
> >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> J.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>>> Hi all,
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow
> >> > > > > > installations.
> >> > > > > > > > >>>> As the
> >> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on
> >> the
> >> > > > yearly
> >> > > > > > > > >>>> Airflow
> >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions
> >> > about
> >> > > > > > Airflow
> >> > > > > > > > >>>> usage.
> >> > > > > > > > >>>>>>> Questions like the following:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   - Which versions of Airflow are people
> >> > installing/using
> >> > > > now
> >> > > > > > > > >>>> (i.e.
> >> > > > > > > > >>>>>>>   whether people have primarily made the jump from
> >> > > version
> >> > > > X
> >> > > > > to
> >> > > > > > > > >>>>> version
> >> > > > > > > > >>>>>> Y)
> >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which
> >> > version
> >> > > > e.g
> >> > > > > > Pg
> >> > > > > > > > >>>> 14?
> >> > > > > > > > >>>>>>>   - What Python version is being used?
> >> > > > > > > > >>>>>>>   - Which Executor is being used?
> >> > > > > > > > >>>>>>>   - Approximately how many people out there in the
> >> > world
> >> > > > are
> >> > > > > > > > >>>>> installing
> >> > > > > > > > >>>>>>>   Airflow
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> There is a solution that should help answer these
> >> > > > questions:
> >> > > > > > Scarf
> >> > > > > > > > >>>> [1].
> >> > > > > > > > >>>>>> The
> >> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already
> >> used
> >> > by
> >> > > > > other
> >> > > > > > ASF
> >> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5],
> Dubbo
> >> > > > > > Kubernetes,
> >> > > > > > > > >>>>> DevLake,
> >> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other
> regulations.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as
> >> follows:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle
> >> it
> >> > in
> >> > > > the
> >> > > > > > > > >>>>> Webserver.
> >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
> >> webserver is
> >> > > > > opened,
> >> > > > > > > > >>>>> metadata
> >> > > > > > > > >>>>>>> is
> >> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can
> >> use in
> >> > > > front
> >> > > > > > of
> >> > > > > > > > >>>>> docker
> >> > > > > > > > >>>>>>>   containers. While it’s possible people go around
> >> this
> >> > > > > > gateway,
> >> > > > > > > > >>>> we
> >> > > > > > > > >>>>> can
> >> > > > > > > > >>>>>>>   probably configure and encourage most traffic to
> >> go
> >> > > > through
> >> > > > > > > > >>>> these
> >> > > > > > > > >>>>>>> gateways.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> While Scarf does not store any personally
> >> identifying
> >> > > > > > information
> >> > > > > > > > >>>> from
> >> > > > > > > > >>>>>> SDK
> >> > > > > > > > >>>>>>> telemetry data, it does send various bits of
> >> IP-derived
> >> > > > > > > > >>>> information as
> >> > > > > > > > >>>>>>> outlined here [7]. This data should be made as
> >> > > transparent
> >> > > > as
> >> > > > > > > > >>>> possible
> >> > > > > > > > >>>>> by
> >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and
> any
> >> > > other
> >> > > > > > relevant
> >> > > > > > > > >>>>> means
> >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town
> >> Hall,
> >> > > > Slack,
> >> > > > > > > > >>>> Newsletter
> >> > > > > > > > >>>>>>> etc).
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> The following case studies are worth reading:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   1.
> >> > > > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> >> > > > > > > > >>>>> (From
> >> > > > > > > > >>>>>>>   Maxime)
> >> > > > > > > > >>>>>>>   2.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Similar to them, this could help in various ways
> >> that
> >> > > come
> >> > > > > with
> >> > > > > > > > >>>> using
> >> > > > > > > > >>>>>> data
> >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how
> >> to
> >> > > > > opt-out"
> >> > > > > > > > >>>>>> [8][9][10] &
> >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow
> >> website,
> >> > > this
> >> > > > > > can be
> >> > > > > > > > >>>>>>> beneficial to the entire community as we would be
> >> > making
> >> > > > more
> >> > > > > > > > >>>> informed
> >> > > > > > > > >>>>>>> decisions.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Regards,
> >> > > > > > > > >>>>>>> Kaxil
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> >> > > > > > > > >>>>>>> [2]
> >> > > > > >
> https://privacy.apache.org/policies/privacy-policy-public.html
> >> > > > > > > > >>>>>>> [3]
> https://privacy.apache.org/faq/committers.html
> >> > > > > > > > >>>>>>> [4]
> https://github.com/apache/superset/issues/25639
> >> > > > > > > > >>>>>>> [5]
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >> > > > > > > > >>>>>>> [8]
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >> > > > > > > > >>>>>>> [9]
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >> > > > > > > > >>>>>>> [10]
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > > >>>
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > >
> >> ---------------------------------------------------------------------
> >> > > > > > > > To unsubscribe, e-mail:
> dev-unsubscr...@airflow.apache.org
> >> > > > > > > > For additional commands, e-mail:
> >> dev-h...@airflow.apache.org
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > >
> >> > > > > >
