The webserver is packaged after compiling, so that won't be possible, Michal.

On Tue, 9 Apr 2024 at 11:02, Michał Modras <michalmod...@google.com> wrote:

> If it is packaged and installed by default, we add the dependency (and its
> dependencies) to Airflow's already-not-small dependency tree. If we make it
> installed and enabled by default, would there be an easy way to not just
> switch it off (e.g. through the env variable), but also not package it at
> all? That's why I was suggesting a provider, but actually any other
> pluggable (and unpluggable) mechanism would work.
>
> On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <huss...@awala.fr> wrote:
>
>> > Other than that I don't mind it being e.g. an optional provider.
>>
>> I don't think it is possible to implement it in a provider because it is a
>> js package installed on the webserver; we could implement it as a plugin
>> (Blueprint), but in this case, the user must make an effort to register
>> it.
>>
>> It would be better to always install it and activate it by default, with
>> the possibility of deactivating it via the environment variable
>> `SCARF_ANALYTICS=false` (according to the documentation). If it is
>> deactivated by default, many users will not activate it even if they don't
>> mind reporting the metrics; if we enable it by default, only users who
>> don't want to send metrics will disable it.
>>
>>
>> On Fri, Apr 5, 2024 at 6:19 PM Michał Modras
>> <michalmod...@google.com.invalid> wrote:
>>
>> > My 2 cents: it must be possible to opt-out, preferably it should be
>> > possible to deploy Airflow instances without bundling the telemetry
>> library
>> > dependencies. Other than that I don't mind it being e.g. an optional
>> provider.
>> >
>> > On Wed, 3 Apr 2024 at 22:42, Hussein Awala <huss...@awala.fr> wrote:
>> >
>> > > > I'd like to propose, that we start with collecting simple data with
>> > > limited access: to all the PMC members. We can always expand it to
>> > > Committers and then expand further to make it invite-only or set up
>> > > exporting it to a DB like Postgres
>> > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
>> > publicly
>> > > viewable dashboard.
>> > >
>> > > Looks like a good plan; we can discuss the export format when we
>> decide
>> > to
>> > > do it.
>> > >
>> > > On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <kaxiln...@gmail.com>
>> wrote:
>> > >
>> > > > Yup, exactly.
>> > > >
>> > > > I believe this would definitely help us take early and informed
>> > > decisions.
>> > > >> E.g. Had we had this earlier, I believe it would have definitely
>> > helped
>> > > us
>> > > >> more for our past discussions like whether we should continue
>> > supporting
>> > > >> MsSQL(
>> > https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
>> > > ),
>> > > >> similarly about the DaskExecutor (
>> > > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
>> > etc.
>> > > >>
>> > > >
>> > > >
>> > > > Btw clarifying my own stance on the below; and let me know what you
>> > > think @Hussein
>> > > > Awala <huss...@awala.fr> : I'd like to propose, that we start with
>> > > > collecting simple data with limited access: to all the PMC members.
>> We
>> > > can
>> > > > always expand it to Committers and then expand further to make it
>> > > > invite-only or set up exporting it to a DB like Postgres
>> > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
>> > > publicly
>> > > > viewable dashboard. It would be similar to an iterative software
>> > > > development approach, since this will be the first time for us, as
>> > > Airflow
>> > > > PMC, to add such telemetry. This is of course just my opinion
>> though :)
>> > > >
>> > > > Regarding the data, like I had mentioned in the email and I am glad
>> > > others
>> > > >> including you are on the same page that the data will be shared
>> with
>> > all
>> > > >> PMC members. The point about sharing it via website and newsletter
>> was
>> > > for
>> > > >> the community — Airflow users. I don’t think anyone in the
>> community
>> > > (apart
>> > > >> from the PMC members) would need raw data. And even if they need
>> it,
>> > I’d
>> > > >> say they should put effort and contribute to the Airflow project
>> and
>> > > become
>> > > >> PMC members.
>> > > >> To be clear: this telemetry data should help us, as Airflow PMC, to
>> > > steer
>> > > >> some of the decision making based on this data similar to how only
>> PMC
>> > > has
>> > > >> a binding vote on the releases. [1] and this is similar to how
>> Apache
>> > > >> Superset does it too.
>> > > >> [1]
>> > > >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> > > >
>> > > >
>> > > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.k...@astronomer.io
>> > > .invalid>
>> > > > wrote:
>> > > >
>> > > >> +1 to introduce this.
>> > > >>
>> > > >> I believe this would definitely help us take early and informed
>> > > decisions.
>> > > >> E.g. Had we had this earlier, I believe it would have definitely
>> > helped
>> > > us
>> > > >> more for our past discussions like whether we should continue
>> > supporting
>> > > >> MsSQL(
>> > https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
>> > > ),
>> > > >> similarly about the DaskExecutor (
>> > > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
>> > etc.
>> > > >>
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >> *Pankaj Koti*
>> > > >> Senior Software Engineer (Airflow OSS Engineering team)
>> > > >> Location: Pune, Maharashtra, India
>> > > >> Timezone: Indian Standard Time (IST)
>> > > >> Phone: +91 9730079985
>> > > >>
>> > > >>
>> > > >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <kaxiln...@gmail.com>
>> > wrote:
>> > > >>
>> > > >> > Yup, I had added a link to scarf docs in the original email that
>> > > >> referenced
>> > > >> > opting out and we should even add an Airflow config that puts all
>> > > >> config in
>> > > >> > a single place. Without it we can’t be compliant with all the
>> policies
>> > > >> even
>> > > >> > if we collectively ignore or are unaware of the importance of it.
>> > > >> >
>> > > >> > Regarding the data, like I had mentioned in the email and I am
>> glad
>> > > >> others
>> > > >> > including you are on the same page that the data will be shared
>> with
>> > > all
>> > > >> > PMC members. The point about sharing it via website and
>> newsletter
>> > was
>> > > >> for
>> > > >> > the community — Airflow users. I don’t think anyone in the
>> community
>> > > >> (apart
>> > > >> > from the PMC members) would need raw data. And even if they need
>> it,
>> > > I’d
>> > > >> > say they should put effort and contribute to the Airflow project
>> and
>> > > >> become
>> > > >> > PMC members.
>> > > >> >
>> > > >> > To be clear: this telemetry data should help us, as Airflow PMC,
>> to
>> > > >> steer
>> > > >> > some of the decision making based on this data similar to how
>> only
>> > PMC
>> > > >> has
>> > > >> > a binding vote on the releases. [1] and this is similar to how
>> > Apache
>> > > >> > Superset does it too.
>> > > >> >
>> > > >> > [1]
>> > > >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <huss...@awala.fr>
>> > wrote:
>> > > >> >
>> > > >> > > I mentioned opting out just to confirm its importance, and
>> after
>> > > >> checking
>> > > >> > > the Scarf documentation it appears to be supported natively by
>> > > Scarf.
>> > > >> For
>> > > >> > > data accessibility, my point was more about raw data, not just
>> > > >> aggregated
>> > > >> > > information/insights shared via monthly newsletters, as we do
>> for
>> > > >> Airflow
>> > > >> > > annual Survey for example:
>> > > >> > > https://airflow.apache.org/survey vs
>> > > >> > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> >
>> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
>> > > >> > > .
>> > > >> > >
>> > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com
>> >
>> > > >> wrote:
>> > > >> > >
>> > > >> > > > Agreed to both your points Hussein but both the points are
>> > already
>> > > >> > > covered
>> > > >> > > > in my original discussion post - both about opting out and
>> > > providing
>> > > >> > data
>> > > >> > > > to all the PMC members and providing visibility via Monthly
>> > > >> > newsletters.
>> > > >> > > Is
>> > > >> > > > there anything else you propose to discuss that isn’t
>> covered?
>> > > >> > > >
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr
>> >
>> > > >> wrote:
>> > > >> > > >
>> > > >> > > > > +1 for the idea in general, but there are two main points
>> to
>> > > >> discuss
>> > > >> > > > before
>> > > >> > > > > voting on this:
>> > > >> > > > >
>> > > >> > > > > 1. We should provide an option to disable Scarf:
>> > > >> > > > > As Airflow is not a paid product, we cannot force
>> companies to
>> > > >> report
>> > > >> > > > their
>> > > >> > > > > use of this project. Otherwise, some may choose to create
>> > their
>> > > >> own
>> > > >> > > fork
>> > > >> > > > > just to disable Scarf.
>> > > >> > > > >
>> > > >> > > > > 2. Concerning the exclusivity of access to data:
>> > > >> > > > > The data collected must either be completely proprietary
>> for
>> > use
>> > > >> by
>> > > >> > PMC
>> > > >> > > > and
>> > > >> > > > > ASF, or completely open. Since many companies offer Airflow
>> > as a
>> > > >> > > product,
>> > > >> > > > > it is imperative not to give one company more privileges
>> than
>> > > >> > others. I
>> > > >> > > > > raise this point for the principle of equality of
>> opportunity.
>> > > >> > > > >
>> > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
>> > > >> sunank...@gmail.com
>> > > >> > >
>> > > >> > > > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Big +1 for Scarf.
>> > > >> > > > > >
>> > > >> > > > > > Transparency is key, so it's important to be super clear
>> > about
>> > > >> > opting
>> > > >> > > > > > out and what's tracked to avoid spooking anyone about IP
>> > > stuff.
>> > > >> > > > > >
>> > > >> > > > > > Regards
>> > > >> > > > > > Ankit Chaurasia
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
>> > > >> > > amoghdesai....@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > >
>> > > >> > > > > > > +1 looks like a good tool which could be super helpful.
>> > > >> > > > > > >
>> > > >> > > > > > > * We should have some transparency into the data that
>> is
>> > > >> > collected
>> > > >> > > or
>> > > >> > > > > > sent
>> > > >> > > > > > > * We should have an option to opt out
>> > > >> > > > > > >
>> > > >> > > > > > > Thanks & Regards,
>> > > >> > > > > > > Amogh Desai
>> > > >> > > > > > >
>> > > >> > > > > > >
>> > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
>> > > weilee...@gmail.com>
>> > > >> > > wrote:
>> > > >> > > > > > >
>> > > >> > > > > > > > +1 to this. It would be really useful. As long as we
>> can
>> > > opt
>> > > >> > > out, I
>> > > >> > > > > > think
>> > > >> > > > > > > > we’re good.
>> > > >> > > > > > > >
>> > > >> > > > > > > > Best,
>> > > >> > > > > > > > Wei
>> > > >> > > > > > > >
>> > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
>> > > >> > kaxiln...@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > Grammar Correction:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > We should assume that those who deploy and upgrade
>> > > >> Airflow -
>> > > >> > > > > actually
>> > > >> > > > > > > > read
>> > > >> > > > > > > > >> and take into account what is written in the
>> release
>> > > >> notes -
>> > > >> > > > > > especially
>> > > >> > > > > > > > if
>> > > >> > > > > > > > >> they have security guys breathing down their necks,
>> > > similarly
>> > > >> as
>> > > >> > we
>> > > >> > > > > have
>> > > >> > > > > > to
>> > > >> > > > > > > > >> assume they follow CVE announcements about
>> security
>> > > >> issues
>> > > >> > > > fixed.
>> > > >> > > > > > If we
>> > > >> > > > > > > > >> are very straightforward and out-going about the
>> > > change,
>> > > >> > > inform
>> > > >> > > > > very
>> > > >> > > > > > > > >> clearly how to opt-out, I don't see a big problem
>> > with
>> > > >> > > opt-out.
>> > > >> > > > > > > > >
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > I couldn't agree more; even though we shouldn't
>> > collect
>> > > >> any
>> > > >> > > data
>> > > >> > > > > that
>> > > >> > > > > > > > > hamper security (and we should aim to do the same),
>> > most
>> > > >> > > security
>> > > >> > > > > > > > concerned
>> > > >> > > > > > > > > folks don't just upgrade, and we can rely on them
>> > > >> regarding
>> > > >> > > > release
>> > > >> > > > > > notes
>> > > >> > > > > > > > > or announcements and we can make it very clear in
>> our
>> > > >> > > > announcements
>> > > >> > > > > > too;
>> > > >> > > > > > > > > and in our installation guides.
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
>> > > >> > kaxiln...@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > >> Grammar correction:
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
>> > > >> > kaxiln...@gmail.com
>> > > >> > > >
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >>> Have this at the end of the email too: but if
>> folks
>> > > >> don't
>> > > >> > > read
>> > > >> > > > > > until
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> "I think people often ask ‘how do I contribute to
>> > open
>> > > >> > > > source?’,
>> > > >> > > > > > ‘I've
>> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > > >> > engineer.’
>> > > >> > > > > > Actually,
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
>> ‘my
>> > > >> > > > organization
>> > > >> > > > > > gets
>> > > >> > > > > > > > real
>> > > >> > > > > > > > >>> value from this piece of software.’ There are a
>> > bunch
>> > > of
>> > > >> > ways
>> > > >> > > > to
>> > > >> > > > > > let
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
>> > your
>> > > >> > > > > > organization is
>> > > >> > > > > > > > >>> getting a lot of value from a piece of open
>> source
>> > > >> > software,
>> > > >> > > > make
>> > > >> > > > > > sure
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> devs know about it."
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> What kind of edge cases are you thinking about? I
>> > > don't
>> > > >> > think
>> > > >> > > > it
>> > > >> > > > > > makes
>> > > >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
>> > > collect
>> > > >> > data
>> > > >> > > > for
>> > > >> > > > > > most
>> > > >> > > > > > > > >>> Airflow installations except for those that don't
>> > want
>> > > >> to
>> > > >> > > give
>> > > >> > > > > > data,
>> > > >> > > > > > > > then
>> > > >> > > > > > > > >>> "opt-out" is the only way to maximize it. As
>> long as
>> > > we
>> > > >> > don't
>> > > >> > > > > > collect
>> > > >> > > > > > > > any
>> > > >> > > > > > > > >>> PII data, this is in compliance as well.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
>> > > opt-in
>> > > >> > via
>> > > >> > > a
>> > > >> > > > > > config,
>> > > >> > > > > > > > >>> they wouldn't even know or care about it, hence
>> us
>> > > >> losing
>> > > >> > > most
>> > > >> > > > of
>> > > >> > > > > > the
>> > > >> > > > > > > > data.
>> > > >> > > > > > > > >>> I understand why some orgs & individuals may
>> want to
>> > > >> > opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Scarf provides tracking pixels (essentially an
>> HTML
>> > > >> image
>> > > >> > > tag)
>> > > >> > > > > > that you
>> > > >> > > > > > > > >>> can place in your website or product to track
>> > visitors
>> > > >> to
>> > > >> > > that
>> > > >> > > > > > URL. If
>> > > >> > > > > > > > >>> there were any concerns about Privacy, ASF
>> wouldn't
>> > > have
>> > > >> > > > approved
>> > > >> > > > > > it
>> > > >> > > > > > > > at all.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> A few key details to note about the pixel:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>   - No PII is tracked… Scarf does not
>> capture/retain
>> > > IP
>> > > >> > > > > > information…
>> > > >> > > > > > > > >>>   this information is discarded by the platform
>> upon
>> > > >> > > > > > > > processing/aggregating
>> > > >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
>> > > >> settings of
>> > > >> > > > > > browsers -
>> > > >> > > > > > > > >>>   these users will not be tracked whatsoever.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> All the ASF projects I had listed (whether they
>> use
>> > > >> Scarf
>> > > >> > > > gateway
>> > > >> > > > > > or
>> > > >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
>> > > feature
>> > > >> > with
>> > > >> > > > > > users who
>> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
>> > think
>> > > >> it's
>> > > >> > > > wise
>> > > >> > > > > to
>> > > >> > > > > > > > handle
>> > > >> > > > > > > > >>>> edge cases and configure collected data more
>> > > >> accurately.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> It would be a pixel in the webserver, should
>> affect
>> > > >> nothing
>> > > >> > > at
>> > > >> > > > > all
>> > > >> > > > > > even
>> > > >> > > > > > > > >>> in an air-gapped environment.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> 2. It should not affect anything if access to
>> the
>> > > >> internet
>> > > >> > > is
>> > > >> > > > > > > > restricted
>> > > >> > > > > > > > >>>> which is default for many companies.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> 100% agreed on the below:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> I think we have a very good blueprint to follow
>> > > >> including
>> > > >> > at
>> > > >> > > > > > least 5
>> > > >> > > > > > > > >>>> other
>> > > >> > > > > > > > >>>> ASF projects that also passed the review of the
>> > > >> > privacy@asf.
>> > > >> > > > > And
>> > > >> > > > > > > > while I
>> > > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
>> > > default
>> > > >> > > coming
>> > > >> > > > > from
>> > > >> > > > > > > > >>>> consumer
>> > > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is
>> > not
>> > > a
>> > > >> > > > consumer
>> > > >> > > > > > > > >>>> software and is used in "corporate environment"
>> > which
>> > > >> has
>> > > >> > a
>> > > >> > > > > little
>> > > >> > > > > > > > >>>> different expectations and broad assumption that
>> > the
>> > > >> > company
>> > > >> > > > can
>> > > >> > > > > > make
>> > > >> > > > > > > > >>>> decisions on such telemetry on behalf of the
>> > > employees
>> > > >> > using
>> > > >> > > > it.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Couldn't agree more; even though there shouldn't
>> we
>> > > >> collect
>> > > >> > > > > hamper
>> > > >> > > > > > > > >>> security (and we should aim to do the same), most
>> > > >> security
>> > > >> > > > > > concerned
>> > > >> > > > > > > > folks
>> > > >> > > > > > > > >>> don't just
>> > > >> > > > > > > > >>> upgrade, and we can rely on them regarding
>> release
>> > > >> notes or
>> > > >> > > > > > > > announcements
>> > > >> > > > > > > > >>> and we can make it very clear in our
>> announcements
>> > > too;
>> > > >> and
>> > > >> > > in
>> > > >> > > > > our
>> > > >> > > > > > > > >>> installation guides.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> We should assume that those who deploy and
>> upgrade
>> > > >> Airflow
>> > > >> > -
>> > > >> > > > > > actually
>> > > >> > > > > > > > read
>> > > >> > > > > > > > >>>> and take into account what is written in the
>> > release
>> > > >> > notes -
>> > > >> > > > > > > > especially
>> > > >> > > > > > > > >>>> if
>> > > >> > > > > > > > >>>> they have security guys breathing down their necks,
>> > > >> similarly
>> > > >> > as
>> > > >> > > we
>> > > >> > > > > > have to
>> > > >> > > > > > > > >>>> assume they follow CVE announcements about
>> security
>> > > >> issues
>> > > >> > > > > fixed.
>> > > >> > > > > > If
>> > > >> > > > > > > > we
>> > > >> > > > > > > > >>>> are very straightforward and out-going about the
>> > > >> change,
>> > > >> > > > inform
>> > > >> > > > > > very
>> > > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big
>> problem
>> > > with
>> > > >> > > > opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> To be clear, the collection of data, or at least
>> the
>> > > >> data
>> > > >> > we
>> > > >> > > > > should
>> > > >> > > > > > > > >>> gather here should help all the consumers without
>> > > >> violating
>> > > >> > > > > > any
>> > > >> > > > > > > > >>> regulations. I will quote Maxime's quote in the
>> > > use-case
>> > > >> > doc
>> > > >> > > > [1]
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> "*Another Form of Contributing*
>> > > >> > > > > > > > >>> “I think people often ask ‘how do I contribute to
>> > open
>> > > >> > > > source?’,
>> > > >> > > > > > ‘I've
>> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > > >> > engineer.’
>> > > >> > > > > > Actually,
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
>> ‘my
>> > > >> > > > organization
>> > > >> > > > > > gets
>> > > >> > > > > > > > real
>> > > >> > > > > > > > >>> value from this piece of software.’ There are a
>> > bunch
>> > > of
>> > > >> > ways
>> > > >> > > > to
>> > > >> > > > > > let
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
>> > your
>> > > >> > > > > > organization is
>> > > >> > > > > > > > >>> getting a lot of value from a piece of open
>> source
>> > > >> > software,
>> > > >> > > > make
>> > > >> > > > > > sure
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> devs know about it.”"
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> [1]
>> > > >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
>> > > >> > > > > kxe...@apache.org>
>> > > >> > > > > > > > wrote:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> Hi Jarek!
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> I understand the reasons for opt-out from a
>> project
>> > > >> view.
>> > > >> > I
>> > > >> > > > just
>> > > >> > > > > > > > suddenly
>> > > >> > > > > > > > >>>> imagined the situation when an upgrade happens
>> and
>> > > here
>> > > >> > > comes
>> > > >> > > > > the
>> > > >> > > > > > > > data to
>> > > >> > > > > > > > >>>> some third party service - that's a view from a
>> > user
>> > > >> side
>> > > >> > of
>> > > >> > > > > some
>> > > >> > > > > > big
>> > > >> > > > > > > > >>>> company.
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> There could be good alternatives to handle this:
>> > > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
>> > > >> feature
>> > > >> > > with
>> > > >> > > > > > users
>> > > >> > > > > > > > who
>> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
>> > think
>> > > >> it's
>> > > >> > > > wise
>> > > >> > > > > to
>> > > >> > > > > > > > handle
>> > > >> > > > > > > > >>>> edge cases and configure collected data more
>> > > >> accurately.
>> > > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to
>> > make
>> > > >> this
>> > > >> > > > > > feature not
>> > > >> > > > > > > > >>>> get
>> > > >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> Just some personal thoughts for discussion (:
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> --
>> > > >> > > > > > > > >>>> ,,,^..^,,,
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
>> > > >> > > > ja...@potiuk.com>
>> > > >> > > > > > > > wrote:
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>>> Hello everyone,
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> it has to be:
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security
>> guys
>> > > >> about
>> > > >> > new
>> > > >> > > > > > unplanned
>> > > >> > > > > > > > >>>>>> activity after regular upgrade.
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> That's a very good point about security
>> triggering
>> > > >> > > Alexander,
>> > > >> > > > > > but I
>> > > >> > > > > > > > am
>> > > >> > > > > > > > >>>> not
>> > > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in.
>> > There
>> > > >> are
>> > > >> > > other
>> > > >> > > > > > ways of
>> > > >> > > > > > > > >>>>> communicating with the "deployment managers"
>> who
>> > > >> install
>> > > >> > > and
>> > > >> > > > > > upgrade
>> > > >> > > > > > > > >>>>> airflow - i.e. release notes, blogs, social
>> media
>> > of
>> > > >> > ours,
>> > > >> > > > > slack
>> > > >> > > > > > > > >>>>> announcements etc. We have plenty of channels
>> we
>> > can
>> > > >> use
>> > > >> > to
>> > > >> > > > > > > > >>>> communicate the
>> > > >> > > > > > > > >>>>> change.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> I think we have a very good blueprint to follow
>> > > >> including
>> > > >> > > at
>> > > >> > > > > > least 5
>> > > >> > > > > > > > >>>> other
>> > > >> > > > > > > > >>>>> ASF projects that also passed the review of the
>> > > >> > > privacy@asf.
>> > > >> > > > > And
>> > > >> > > > > > > > >>>> while I
>> > > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by
>> > > default
>> > > >> > > coming
>> > > >> > > > > > from
>> > > >> > > > > > > > >>>> consumer
>> > > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow
>> is
>> > > not a
>> > > >> > > > consumer
>> > > >> > > > > > > > >>>>> software and is used in "corporate environment"
>> > > which
>> > > >> > has a
>> > > >> > > > > > little
>> > > >> > > > > > > > >>>>> different expectations and broad assumption
>> that
>> > the
>> > > >> > > company
>> > > >> > > > > can
>> > > >> > > > > > make
>> > > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
>> > > employees
>> > > >> > > using
>> > > >> > > > > it.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> We should assume that those who deploy and
>> upgrade
>> > > >> > Airflow
>> > > >> > > -
>> > > >> > > > > > actually
>> > > >> > > > > > > > >>>> read
>> > > >> > > > > > > > >>>>> and take into account what is written in the
>> > release
>> > > >> > notes
>> > > >> > > -
>> > > >> > > > > > > > >>>> especially if
>> > > >> > > > > > > > >>>>> they have security guys breathing down their necks,
>> > > >> similarly
>> > > >> > as
>> > > >> > > > we
>> > > >> > > > > > have
>> > > >> > > > > > > > to
>> > > >> > > > > > > > >>>>> assume they follow CVE announcements about
>> > security
>> > > >> > issues
>> > > >> > > > > > fixed. If
>> > > >> > > > > > > > we
>> > > >> > > > > > > > >>>>> are very straightforward and out-going about
>> the
>> > > >> change,
>> > > >> > > > inform
>> > > >> > > > > > very
>> > > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big
>> problem
>> > > with
>> > > >> > > > opt-out.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> We should of course check with privacy@a.o
>> (but
>> > I've
>> > > >> > spent
>> > > >> > > a
>> > > >> > > > > good
>> > > >> > > > > > > > deal
>> > > >> > > > > > > > >>>> of
>> > > >> > > > > > > > >>>>> time reading the Superset and other use cases
>> and
>> > > explanations
>> > > >> > > > > in
>> > > >> > > > > > > > >>>> detail to
>> > > >> > > > > > > > >>>>> make a better informed decision) - and it looks
>> > like
>> > > >> they
>> > > >> > > > also
>> > > >> > > > > > went
>> > > >> > > > > > > > >>>> opt-out
>> > > >> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we
>> > > cannot
>> > > >> > > reach
>> > > >> > > > > > > > >>>> consensus, we
>> > > >> > > > > > > > >>>>> should - as usual - make a voting decision on
>> it
>> > > >> (because
>> > > >> > > > yes,
>> > > >> > > > > > it is
>> > > >> > > > > > > > an
>> > > >> > > > > > > > >>>>> important decision), but - after reading and
>> > > >> > understanding
>> > > >> > > > why
>> > > >> > > > > > others
>> > > >> > > > > > > > >>>> also
>> > > >> > > > > > > > >>>>> did it - for me personally, opt-out is a good
>> > path.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> Also because it will rather increase the
>> amount of
>> > > >> data
>> > > >> > to
>> > > >> > > > > > gather,
>> > > >> > > > > > > > and
>> > > >> > > > > > > > >>>> in
>> > > >> > > > > > > > >>>>> our case - counterintuitively - it will be
>> even
>> > > >> better
>> > > >> > for
>> > > >> > > > > > privacy
>> > > >> > > > > > > > and
>> > > >> > > > > > > > >>>>> corporate anonymity, because the more data we
>> get,
>> > > the
>> > > >> > more
>> > > >> > > > > > difficult
>> > > >> > > > > > > > >>>> it
>> > > >> > > > > > > > >>>>> will be to get any
>> non-statistical/non-aggregated
>> > > >> insight
>> > > >> > > > from
>> > > >> > > > > > it.
>> > > >> > > > > > > > >>>> Imagine
>> > > >> > > > > > > > >>>>> if only a few corporate users will enable it
>> > > >> consciously
>> > > >> > -
>> > > >> > > > then
>> > > >> > > > > > we
>> > > >> > > > > > > > >>>> will be
>> > > >> > > > > > > > >>>>> able to draw many more conclusions if we find
>> out
>> > > who
>> > > >> > they
>> > > >> > > > are,
>> > > >> > > > > > than
>> > > >> > > > > > > > if
>> > > >> > > > > > > > >>>>> everyone has it enabled by default.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> That's my take on it - but again, it's up to
>> us to
>> > > >> vote,
>> > > >> > > for
>> > > >> > > > me
>> > > >> > > > > > > > opt-in
>> > > >> > > > > > > > >>>> is
>> > > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> J.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>>> Hi all,
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for
>> > Airflow
>> > > >> > > > > > installations.
>> > > >> > > > > > > > >>>> As the
>> > > >> > > > > > > > >>>>>>> Airflow community, we have been relying
>> heavily
>> > on
>> > > >> the
>> > > >> > > > yearly
>> > > >> > > > > > > > >>>> Airflow
>> > > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key
>> > questions
>> > > >> > about
>> > > >> > > > > > Airflow
>> > > >> > > > > > > > >>>> usage.
>> > > >> > > > > > > > >>>>>>> Questions like the following:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
>> > > >> > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to
>> > > >> > > > > > > > >>>>>>>   version Y)
>> > > >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
>> > > >> > > > > > > > >>>>>>>   - What Python version is being used?
>> > > >> > > > > > > > >>>>>>>   - Which Executor is being used?
>> > > >> > > > > > > > >>>>>>>   - Approximately how many people out there in the world are
>> > > >> > > > > > > > >>>>>>>   installing Airflow
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> There is a solution that should help answer these questions:
>> > > >> > > > > > > > >>>>>>> Scarf [1]. The ASF already approves Scarf [2][3], and it is
>> > > >> > > > > > > > >>>>>>> already used by other ASF projects: Superset [4], Dolphin
>> > > >> > > > > > > > >>>>>>> Scheduler [5], Dubbo Kubernetes, DevLake, and Skywalking, as it
>> > > >> > > > > > > > >>>>>>> follows GDPR and other regulations.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
>> > > >> > > > > > > > >>>>>>>   Webserver. When the package is downloaded & Airflow webserver is
>> > > >> > > > > > > > >>>>>>>   opened, metadata is recorded to the Scarf dashboard.
>> > > >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of
>> > > >> > > > > > > > >>>>>>>   docker containers. While it’s possible people go around this
>> > > >> > > > > > > > >>>>>>>   gateway, we can probably configure and encourage most traffic to
>> > > >> > > > > > > > >>>>>>>   go through these gateways.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> While Scarf does not store any personally identifying information
>> > > >> > > > > > > > >>>>>>> from SDK telemetry data, it does send various bits of IP-derived
>> > > >> > > > > > > > >>>>>>> information as outlined here [7]. This data should be made as
>> > > >> > > > > > > > >>>>>>> transparent as possible by granting dashboard access to the
>> > > >> > > > > > > > >>>>>>> Airflow PMC and any other relevant means of sharing/surfacing it
>> > > >> > > > > > > > >>>>>>> that we encounter (Town Hall, Slack, Newsletter etc).
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> The following case studies are worth reading:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset
>> > > >> > > > > > > > >>>>>>>   (From Maxime)
>> > > >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with
>> > > >> > > > > > > > >>>>>>> using data for decision-making. With clear guidelines on "how to
>> > > >> > > > > > > > >>>>>>> opt-out" [8][9][10] & "what data is being collected" on the
>> > > >> > > > > > > > >>>>>>> Airflow website, this can be beneficial to the entire community
>> > > >> > > > > > > > >>>>>>> as we would be making more informed decisions.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Regards,
>> > > >> > > > > > > > >>>>>>> Kaxil
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
>> > > >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
>> > > >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
>> > > >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
>> > > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
>> > > >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > >
>> > > >> > > > > > > >
>> > > >> > > > > > > >
>> > > >> > > > > > > > ---------------------------------------------------------------------
>> > > >> > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > >> > > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org
>> > > >> > > > > > > >
>> > > >> > > > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>
