> I'd like to propose that we start by collecting simple data with
> limited access: to all the PMC members. We can always expand it to
> Committers and then expand further to make it invite-only or set up
> exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard.

Looks like a good plan; we can discuss the export format when we decide to
do it.
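
As a rough illustration of the opt-out behaviour discussed in this thread, a
minimal sketch follows. The function name `telemetry_enabled` and the
`config_enabled` flag are hypothetical (no such Airflow option is defined in
this thread), and the `SCARF_ANALYTICS` and `DO_NOT_TRACK` environment variable
names are assumptions based on the Scarf opt-out docs linked in the original
proposal; they should be verified there before relying on them.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(
    config_enabled: bool = True,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Opt-out model: telemetry is on unless some opt-out signal is present."""
    env = os.environ if env is None else env
    # Hypothetical project-level switch (e.g. a single Airflow config option).
    if not config_enabled:
        return False
    # Assumed Scarf-style opt-out variable; exact semantics are in the Scarf docs.
    if env.get("SCARF_ANALYTICS", "").strip().lower() == "false":
        return False
    # Assumed generic Do Not Track convention.
    if env.get("DO_NOT_TRACK", "").strip().lower() in {"1", "true"}:
        return False
    return True
```

The point of the sketch is only that any single opt-out signal wins, so a
deployment can disable collection from one place without a fork.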

On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yup, exactly.
>
>> I believe this would definitely help us take early and informed decisions.
>> E.g. Had we had this earlier, I believe it would have definitely helped us
>> more for our past discussions like whether we should continue supporting
>> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> similarly about the DaskExecutor (
>> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>>
>
>
> Btw, clarifying my own stance on the below, and let me know what you think,
> @Hussein Awala <huss...@awala.fr>: I'd like to propose that we start by
> collecting simple data with limited access: to all the PMC members. We can
> always expand it to Committers and then expand further to make it
> invite-only or set up exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard. It would be similar to an iterative software
> development approach, since this will be the first time for us, as Airflow
> PMC, to add such telemetry. This is of course just my opinion though :)
>
>> Regarding the data, like I had mentioned in the email and I am glad others
>> including you are on the same page that the data will be shared with all
>> PMC members. The point about sharing it via website and newsletter was for
>> the community — Airflow users. I don’t think anyone in the community (apart
>> from the PMC members) would need raw data. And even if they need it, I’d
>> say they should put effort and contribute to the Airflow project and become
>> PMC members.
>> To be clear: this telemetry data should help us, as Airflow PMC, to steer
>> some of the decision making based on this data similar to how only PMC has
>> a binding vote on the releases. [1] and this is similar to how Apache
>> Superset does it too.
>> [1]
>> https://www.apache.org/dev/pmc.html#what-is-a-pmc
>
>
> On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io.invalid>
> wrote:
>
>> +1 to introduce this.
>>
>> I believe this would definitely help us take early and informed decisions.
>> E.g. Had we had this earlier, I believe it would have definitely helped us
>> more for our past discussions like whether we should continue supporting
>> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> similarly about the DaskExecutor (
>> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>>
>>
>> Best regards,
>>
>> *Pankaj Koti*
>> Senior Software Engineer (Airflow OSS Engineering team)
>> Location: Pune, Maharashtra, India
>> Timezone: Indian Standard Time (IST)
>> Phone: +91 9730079985
>>
>>
>> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>> > Yup, I had added a link to scarf docs in the original email that
>> referenced
>> > opting out and we should even add an Airflow config that puts all
>> config in
>> > a single place. Without it we can’t be compliant to all the policies
>> even
>> > if we collectively ignore or are unaware of the importance of it.
>> >
>> > Regarding the data, like I had mentioned in the email and I am glad
>> others
>> > including you are on the same page that the data will be shared with all
>> > PMC members. The point about sharing it via website and newsletter was
>> for
>> > the community — Airflow users. I don’t think anyone in the community
>> (apart
>> > from the PMC members) would need raw data. And even if they need it, I’d
>> > say they should put effort and contribute to the Airflow project and
>> become
>> > PMC members.
>> >
>> > To be clear: this telemetry data should help us, as Airflow PMC, to
>> steer
>> > some of the decision making based on this data similar to how only PMC
>> has
>> > a binding vote on the releases. [1] and this is similar to how Apache
>> > Superset does it too.
>> >
>> > [1]
>> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> >
>> >
>> >
>> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr> wrote:
>> >
>> > > I mentioned opting out just to confirm its importance, and after checking
>> > > the Scarf documentation it appears to be supported natively by Scarf. For
>> > > data accessibility, my point was more about raw data, not just aggregated
>> > > information/insights shared via monthly newsletters, as we do for Airflow
>> > > annual Survey for example:
>> > > https://airflow.apache.org/survey vs
>> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
>> > >
>> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com>
>> wrote:
>> > >
>> > > > Agreed to both your points Hussein but both the points are already
>> > > covered
>> > > > in my original discussion post - both about opting out and providing
>> > data
>> > > > to all the PMC members and providing visibility via Monthly
>> > newsletters.
>> > > Is
>> > > > there anything else you propose to discuss that isn’t covered?
>> > > >
>> > > >
>> > > >
>> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr>
>> wrote:
>> > > >
>> > > > > +1 for the idea in general, but there are two main points to
>> discuss
>> > > > before
>> > > > > voting on this:
>> > > > >
>> > > > > 1. We should provide an option to disable Scarf:
>> > > > > As Airflow is not a paid product, we cannot force companies to
>> report
>> > > > their
>> > > > > use of this project. Otherwise, some may choose to create their
>> own
>> > > fork
>> > > > > just to disable Scarf.
>> > > > >
>> > > > > 2. Concerning the exclusivity of access to data:
>> > > > > The data collected must either be completely proprietary for use
>> by
>> > PMC
>> > > > and
>> > > > > ASF, or completely open. Since many companies offer Airflow as a
>> > > product,
>> > > > > it is imperative not to give one company more privileges than
>> > others. I
>> > > > > raise this point for the principle of equality of opportunity.
>> > > > >
>> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
>> sunank...@gmail.com
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > Big +1 for Scarf.
>> > > > > >
>> > > > > > Transparency is key, so it's important to be super clear about
>> > opting
>> > > > > > out and what's tracked to avoid spooking anyone about IP stuff.
>> > > > > >
>> > > > > > Regards
>> > > > > > Ankit Chaurasia
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
>> > > amoghdesai....@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > +1 looks like a good tool which could be super helpful.
>> > > > > > >
>> > > > > > > * We should have some transparency into the data that is
>> > collected
>> > > or
>> > > > > > sent
>> > > > > > > * We should have an option to opt out
>> > > > > > >
>> > > > > > > Thanks & Regards,
>> > > > > > > Amogh Desai
>> > > > > > >
>> > > > > > >
>> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee...@gmail.com>
>> > > wrote:
>> > > > > > >
>> > > > > > > > +1 to this. It would be really useful. As long as we can opt
>> > > out, I
>> > > > > > think
>> > > > > > > > we’re good.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Wei
>> > > > > > > >
>> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
>> > kaxiln...@gmail.com>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Grammar Correction:
>> > > > > > > > >
>> > > > > > > > > We should assume that those who deploy and upgrade
>> Airflow -
>> > > > > actually
>> > > > > > > > read
>> > > > > > > > >> and take into account what is written in the release
>> notes -
>> > > > > > especially
>> > > > > > > > if
>> > > > > > > > >> they have security guys breathing down their necks, similarly
>> as
>> > we
>> > > > > have
>> > > > > > to
>> > > > > > > > >> assume they follow CVE announcements about security
>> issues
>> > > > fixed.
>> > > > > > If we
>> > > > > > > > >> are very straightforward and out-going about the change,
>> > > inform
>> > > > > very
>> > > > > > > > >> clearly how to opt-out, I don't see a big problem with
>> > > opt-out.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I couldn't agree more; even though we shouldn't collect
>> any
>> > > data
>> > > > > that
>> > > > > > > > > hamper security (and we should aim to do the same), most
>> > > security
>> > > > > > > > concerned
>> > > > > > > > > folks don't just upgrade, and we can rely on them
>> regarding
>> > > > release
>> > > > > > notes
>> > > > > > > > > or announcements and we can make it very clear in our
>> > > > announcements
>> > > > > > too;
>> > > > > > > > > and in our installation guides.
>> > > > > > > > >
>> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
>> > kaxiln...@gmail.com>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Grammar correction:
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
>> > kaxiln...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >>
>> > > > > > > > >>> Have this at the end of the email too: but if folks
>> don't
>> > > read
>> > > > > > until
>> > > > > > > > the
>> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
>> > > > > > > > >>>
>> > > > > > > > >>> "I think people often ask ‘how do I contribute to open
>> > > > source?’,
>> > > > > > ‘I've
>> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > engineer.’
>> > > > > > Actually,
>> > > > > > > > the
>> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
>> > > > organization
>> > > > > > gets
>> > > > > > > > real
>> > > > > > > > >>> value from this piece of software.’ There are a bunch of
>> > ways
>> > > > to
>> > > > > > let
>> > > > > > > > the
>> > > > > > > > >>> people know about it – and now Scarf is there. If your
>> > > > > > organization is
>> > > > > > > > >>> getting a lot of value from a piece of open source
>> > software,
>> > > > make
>> > > > > > sure
>> > > > > > > > the
>> > > > > > > > >>> devs know about it."
>> > > > > > > > >>>
>> > > > > > > > >>> What kind of edge cases are you thinking about? I don't
>> > think
>> > > > it
>> > > > > > makes
>> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to collect
>> > data
>> > > > for
>> > > > > > most
>> > > > > > > > >>> Airflow installations except for those that don't want
>> to
>> > > give
>> > > > > > data,
>> > > > > > > > then
>> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as we
>> > don't
>> > > > > > collect
>> > > > > > > > any
>> > > > > > > > >>> PII data, this is in compliance as well.
>> > > > > > > > >>>
>> > > > > > > > >>> Imagine someone learning Airflow, if they have to opt-in
>> > via
>> > > a
>> > > > > > config,
>> > > > > > > > >>> they wouldn't even know or care about it, hence us
>> losing
>> > > most
>> > > > of
>> > > > > > the
>> > > > > > > > data.
>> > > > > > > > >>> I understand why some orgs & individuals may want to
>> > opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>> Scarf provides tracking pixels (essentially an HTML
>> image
>> > > tag)
>> > > > > > that you
>> > > > > > > > >>> can place in your website or product to track visitors
>> to
>> > > that
>> > > > > > URL. If
>> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't have
>> > > > approved
>> > > > > > it
>> > > > > > > > at all.
>> > > > > > > > >>>
>> > > > > > > > >>> A few key details to note about the pixel:
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
>> > > > > > information…
>> > > > > > > > >>>   this information is discarded by the platform upon
>> > > > > > > > processing/aggregating
>> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
>> settings of
>> > > > > > browsers -
>> > > > > > > > >>>   these users will not be tracked whatsoever.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> All the ASF projects I had listed (whether they use
>> Scarf
>> > > > gateway
>> > > > > > or
>> > > > > > > > >>> Scarf pixel in product) are using opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this feature
>> > with
>> > > > > > users who
>> > > > > > > > >>>> trust and if it works great - make it public. I think
>> it's
>> > > > wise
>> > > > > to
>> > > > > > > > handle
>> > > > > > > > >>>> edge cases and configure collected data more
>> accurately.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> It would be a pixel in the webserver, should affect
>> nothing
>> > > at
>> > > > > all
>> > > > > > even
>> > > > > > > > >>> in an air-gapped environment.
>> > > > > > > > >>>
>> > > > > > > > >>>> 2. It should not affect anything if access to the
>> internet
>> > > is
>> > > > > > > > restricted
>> > > > > > > > >>>> which is default for many companies.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> 100% agreed on the below:
>> > > > > > > > >>>
>> > > > > > > > >>>> I think we have a very good blueprint to follow
>> including
>> > at
>> > > > > > least 5
>> > > > > > > > >>>> other
>> > > > > > > > >>>> ASF projects that also passed the review of the
>> > privacy@asf.
>> > > > > And
>> > > > > > > > while I
>> > > > > > > > >>>> understand (and concur) the urge for opt-in by default
>> > > coming
>> > > > > from
>> > > > > > > > >>>> consumer
>> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not a
>> > > > consumer
>> > > > > > > > >>>> software and is used in "corporate environment" which
>> has
>> > a
>> > > > > little
>> > > > > > > > >>>> different expectations and broad assumption that the
>> > company
>> > > > can
>> > > > > > make
>> > > > > > > > >>>> decisions on such telemetry on behalf of the employees
>> > using
>> > > > it.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
>> collect
>> > > > > hamper
>> > > > > > > > >>> security (and we should aim to do the same), most
>> security
>> > > > > > concerned
>> > > > > > > > folks
>> > > > > > > > >>> don't just
>> > > > > > > > >>> upgrade, and we can rely on them regarding release
>> notes or
>> > > > > > > > announcements
>> > > > > > > > >>> and we can make it very clear in our announcements too;
>> and
>> > > in
>> > > > > our
>> > > > > > > > >>> installation guides.
>> > > > > > > > >>>
>> > > > > > > > >>> We should assume that those who deploy and upgrade
>> Airflow
>> > -
>> > > > > > actually
>> > > > > > > > read
>> > > > > > > > >>>> and take into account what is written in the release
>> > notes -
>> > > > > > > > especially
>> > > > > > > > >>>> if
>> > > > > > > > >>>> they have security guys breathing down their necks,
>> similarly
>> > as
>> > > we
>> > > > > > have to
>> > > > > > > > >>>> assume they follow CVE announcements about security
>> issues
>> > > > > fixed.
>> > > > > > If
>> > > > > > > > we
>> > > > > > > > >>>> are very straightforward and out-going about the
>> change,
>> > > > inform
>> > > > > > very
>> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem with
>> > > > opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> To be clear, the collection of data, or at least the
>> data
>> > we
>> > > > > should
>> > > > > > > > >>> gather here should help all the consumers without
>> violating
>> > > > > > any
>> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case
>> > doc
>> > > > [1]
>> > > > > > > > >>>
>> > > > > > > > >>> "*Another Form of Contributing*
>> > > > > > > > >>> “I think people often ask ‘how do I contribute to open
>> > > > source?’,
>> > > > > > ‘I've
>> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > engineer.’
>> > > > > > Actually,
>> > > > > > > > the
>> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
>> > > > organization
>> > > > > > gets
>> > > > > > > > real
>> > > > > > > > >>> value from this piece of software.’ There are a bunch of
>> > ways
>> > > > to
>> > > > > > let
>> > > > > > > > the
>> > > > > > > > >>> people know about it – and now Scarf is there. If your
>> > > > > > organization is
>> > > > > > > > >>> getting a lot of value from a piece of open source
>> > software,
>> > > > make
>> > > > > > sure
>> > > > > > > > the
>> > > > > > > > >>> devs know about it.”"
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
>> > > > > > > > >>>
>> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
>> > > > > kxe...@apache.org>
>> > > > > > > > wrote:
>> > > > > > > > >>>
>> > > > > > > > >>>> Hi Jarek!
>> > > > > > > > >>>>
>> > > > > > > > >>>> I understand the reasons for opt-out from a project
>> view.
>> > I
>> > > > just
>> > > > > > > > suddenly
>> > > > > > > > >>>> imagined the situation when an upgrade happens and here
>> > > comes
>> > > > > the
>> > > > > > > > data to
>> > > > > > > > >>>> some third party service - that's a view from a user
>> side
>> > of
>> > > > > some
>> > > > > > big
>> > > > > > > > >>>> company.
>> > > > > > > > >>>>
>> > > > > > > > >>>> There could be good alternatives to handle this:
>> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
>> feature
>> > > with
>> > > > > > users
>> > > > > > > > who
>> > > > > > > > >>>> trust and if it works great - make it public. I think
>> it's
>> > > > wise
>> > > > > to
>> > > > > > > > handle
>> > > > > > > > >>>> edge cases and configure collected data more
>> accurately.
>> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make
>> this
>> > > > > > feature not
>> > > > > > > > >>>> get
>> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
>> > > > > > > > >>>>
>> > > > > > > > >>>> Just some personal thoughts for discussion (:
>> > > > > > > > >>>>
>> > > > > > > > >>>> --
>> > > > > > > > >>>> ,,,^..^,,,
>> > > > > > > > >>>>
>> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
>> > > > ja...@potiuk.com>
>> > > > > > > > wrote:
>> > > > > > > > >>>>
>> > > > > > > > >>>>> Hello everyone,
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> it has to be:
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys
>> about
>> > new
>> > > > > > unplanned
>> > > > > > > > >>>>>> activity after regular upgrade.
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's a very good point about security triggering
>> > > Alexander,
>> > > > > > but I
>> > > > > > > > am
>> > > > > > > > >>>> not
>> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There
>> are
>> > > other
>> > > > > > ways of
>> > > > > > > > >>>>> communicating with the "deployment managers" who
>> install
>> > > and
>> > > > > > upgrade
>> > > > > > > > >>>>> airflow - i.e. release notes, blogs, social media of
>> > ours,
>> > > > > slack
>> > > > > > > > >>>>> announcements etc. We have plenty of channels we can
>> use
>> > to
>> > > > > > > > >>>> communicate the
>> > > > > > > > >>>>> change.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> I think we have a very good blueprint to follow
>> including
>> > > at
>> > > > > > least 5
>> > > > > > > > >>>> other
>> > > > > > > > >>>>> ASF projects that also passed the review of the
>> > > privacy@asf.
>> > > > > And
>> > > > > > > > >>>> while I
>> > > > > > > > >>>>> understand (and concur) the urge for opt-in by default
>> > > coming
>> > > > > > from
>> > > > > > > > >>>> consumer
>> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is not a
>> > > > consumer
>> > > > > > > > >>>>> software and is used in "corporate environment" which
>> > has a
>> > > > > > little
>> > > > > > > > >>>>> different expectations and broad assumption that the
>> > > company
>> > > > > can
>> > > > > > make
>> > > > > > > > >>>>> decisions on such telemetry on behalf of the employees
>> > > using
>> > > > > it.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should assume that those who deploy and upgrade
>> > Airflow
>> > > -
>> > > > > > actually
>> > > > > > > > >>>> read
>> > > > > > > > >>>>> and take into account what is written in the release
>> > notes
>> > > -
>> > > > > > > > >>>> especially if
>> > > > > > > > >>>>> they have security guys breathing down their necks,
>> similarly
>> > as
>> > > > we
>> > > > > > have
>> > > > > > > > to
>> > > > > > > > >>>>> assume they follow CVE announcements about security
>> > issues
>> > > > > > fixed. If
>> > > > > > > > we
>> > > > > > > > >>>>> are very straightforward and out-going about the
>> change,
>> > > > inform
>> > > > > > very
>> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem with
>> > > > opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've
>> > spent
>> > > a
>> > > > > good
>> > > > > > > > deal
>> > > > > > > > >>>> of
>> > > > > > > > >>>>> time reading the Superset  and other use case and
>> > > explanation
>> > > > > in
>> > > > > > > > >>>> detail to
>> > > > > > > > >>>>> make a better informed decision) - and it looks like
>> they
>> > > > also
>> > > > > > went
>> > > > > > > > >>>> opt-out
>> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we cannot
>> > > reach
>> > > > > > > > >>>> consensus, we
>> > > > > > > > >>>>> should - as usual - make a voting decision on it
>> (because
>> > > > yes,
>> > > > > > it is
>> > > > > > > > an
>> > > > > > > > >>>>> important decision), but - after reading and
>> > understanding
>> > > > why
>> > > > > > others
>> > > > > > > > >>>> also
>> > > > > > > > >>>>> did it - for me personally, opt-out is a good path.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> Also because it will rather increase the amount of
>> data
>> > to
>> > > > > > gather,
>> > > > > > > > and
>> > > > > > > > >>>> in
>> > > > > > > > >>>>> our case - counter intuitively - it will be even
>> better
>> > for
>> > > > > > privacy
>> > > > > > > > and
>> > > > > > > > >>>>> corporate anonymity, because the more data we get, the
>> > more
>> > > > > > difficult
>> > > > > > > > >>>> it
>> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated
>> insight
>> > > > from
>> > > > > > it.
>> > > > > > > > >>>> Imagine
>> > > > > > > > >>>>> if only a few corporate users will enable it
>> consciously
>> > -
>> > > > then
>> > > > > > we
>> > > > > > > > >>>> will be
>> > > > > > > > >>>>> able to draw much more conclusions if we find out who
>> > they
>> > > > are,
>> > > > > > than
>> > > > > > > > if
>> > > > > > > > >>>>> everyone has it enabled by default.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's my take on it - but again, it's up to us to
>> vote,
>> > > for
>> > > > me
>> > > > > > > > opt-in
>> > > > > > > > >>>> is
>> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> J.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>>> Hi all,
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow
>> > > > > > installations.
>> > > > > > > > >>>> As the
>> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on
>> the
>> > > > yearly
>> > > > > > > > >>>> Airflow
>> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions
>> > about
>> > > > > > Airflow
>> > > > > > > > >>>> usage.
>> > > > > > > > >>>>>>> Questions like the following:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   - Which versions of Airflow are people
>> > installing/using
>> > > > now
>> > > > > > > > >>>> (i.e.
>> > > > > > > > >>>>>>>   whether people have primarily made the jump from
>> > > version
>> > > > X
>> > > > > to
>> > > > > > > > >>>>> version
>> > > > > > > > >>>>>> Y)
>> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which
>> > version
>> > > > e.g
>> > > > > > Pg
>> > > > > > > > >>>> 14?
>> > > > > > > > >>>>>>>   - What Python version is being used?
>> > > > > > > > >>>>>>>   - Which Executor is being used?
>> > > > > > > > >>>>>>>   - Approximately how many people out there in the
>> > world
>> > > > are
>> > > > > > > > >>>>> installing
>> > > > > > > > >>>>>>>   Airflow
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> There is a solution that should help answer these
>> > > > questions:
>> > > > > > Scarf
>> > > > > > > > >>>> [1].
>> > > > > > > > >>>>>> The
>> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already
>> used
>> > by
>> > > > > other
>> > > > > > ASF
>> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo
>> > > > > > Kubernetes,
>> > > > > > > > >>>>> DevLake,
>> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as
>> follows:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle
>> it
>> > in
>> > > > the
>> > > > > > > > >>>>> Webserver.
>> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
>> webserver is
>> > > > > opened,
>> > > > > > > > >>>>> metadata
>> > > > > > > > >>>>>>> is
>> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
>> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can
>> use in
>> > > > front
>> > > > > > of
>> > > > > > > > >>>>> docker
>> > > > > > > > >>>>>>>   containers. While it’s possible people go around
>> this
>> > > > > > gateway,
>> > > > > > > > >>>> we
>> > > > > > > > >>>>> can
>> > > > > > > > >>>>>>>   probably configure and encourage most traffic to
>> go
>> > > > through
>> > > > > > > > >>>> these
>> > > > > > > > >>>>>>> gateways.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> While Scarf does not store any personally
>> identifying
>> > > > > > information
>> > > > > > > > >>>> from
>> > > > > > > > >>>>>> SDK
>> > > > > > > > >>>>>>> telemetry data, it does send various bits of
>> IP-derived
>> > > > > > > > >>>> information as
>> > > > > > > > >>>>>>> outlined here [7]. This data should be made as
>> > > transparent
>> > > > as
>> > > > > > > > >>>> possible
>> > > > > > > > >>>>> by
>> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any
>> > > other
>> > > > > > relevant
>> > > > > > > > >>>>> means
>> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town
>> Hall,
>> > > > Slack,
>> > > > > > > > >>>> Newsletter
>> > > > > > > > >>>>>>> etc).
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> The following case studies are worth reading:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From Maxime)
>> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to them, this could help in various ways
>> that
>> > > come
>> > > > > with
>> > > > > > > > >>>> using
>> > > > > > > > >>>>>> data
>> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how
>> to
>> > > > > opt-out"
>> > > > > > > > >>>>>> [8][9][10] &
>> > > > > > > > >>>>>>> "what data is being collected" on the Airflow
>> website,
>> > > this
>> > > > > > can be
>> > > > > > > > >>>>>>> beneficial to the entire community as we would be
>> > making
>> > > > more
>> > > > > > > > >>>> informed
>> > > > > > > > >>>>>>> decisions.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Regards,
>> > > > > > > > >>>>>>> Kaxil
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
>> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
>> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
>> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
>> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
>> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>
>> > > > > > > > >>>>
>> > > > > > > > >>>
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > >
>> ---------------------------------------------------------------------
>> > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > > > > > > For additional commands, e-mail:
>> dev-h...@airflow.apache.org
>> > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
