I mentioned opting out only to confirm its importance; after checking the
Scarf documentation, it appears that Scarf supports it natively. On data
accessibility, my point was about the raw data, not just the aggregated
information/insights shared via monthly newsletters, as we do for the
annual Airflow Survey, for example:
https://airflow.apache.org/survey vs
https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
.
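
To make the opt-out point concrete, here is a minimal sketch of how an
installation could decide whether telemetry may be emitted. It is only an
illustration: the function name is hypothetical, and the two environment
variables (SCARF_ANALYTICS and DO_NOT_TRACK) are what I understand scarf-js
to check, so please treat them as assumptions to verify against the Scarf
documentation. Any Airflow-side setting would still have to be defined by
the project.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when the environment asks for telemetry to be off.

    Hypothetical helper: the variable names below are assumptions, not a
    confirmed Airflow or Scarf contract.
    """
    env = os.environ if env is None else env

    # Assumed opt-out switch: SCARF_ANALYTICS=false
    if env.get("SCARF_ANALYTICS", "").strip().lower() == "false":
        return False

    # Assumed opt-out switch: DO_NOT_TRACK=1 (or "true")
    if env.get("DO_NOT_TRACK", "").strip().lower() in {"1", "true"}:
        return False

    # Opt-out semantics: telemetry stays on unless someone disables it.
    return True
```

The default of True is what makes this opt-out rather than opt-in: a
deployment that sets nothing keeps reporting, and a single variable turns
it off without a fork.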

On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I agree with both of your points, Hussein, but both are already covered
> in my original discussion post: opting out, providing the data to all
> the PMC members, and providing visibility via monthly newsletters. Is
> there anything else you propose to discuss that isn’t covered?
>
>
>
> On Mon, 1 Apr 2024 at 13:21, Hussein Awala <huss...@awala.fr> wrote:
>
> > +1 for the idea in general, but there are two main points to discuss
> > before voting on this:
> >
> > 1. We should provide an option to disable Scarf:
> > As Airflow is not a paid product, we cannot force companies to report
> > their use of this project. Otherwise, some may choose to create their own
> > fork just to disable Scarf.
> >
> > 2. Concerning the exclusivity of access to data:
> > The data collected must either be completely proprietary for use by PMC
> > and ASF, or completely open. Since many companies offer Airflow as a
> > product, it is imperative not to give one company more privileges than
> > others. I raise this point for the principle of equality of opportunity.
> >
> > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank...@gmail.com>
> > wrote:
> >
> > > Big +1 for Scarf.
> > >
> > > Transparency is key, so it's important to be super clear about opting
> > > out and what's tracked to avoid spooking anyone about IP stuff.
> > >
> > > Regards
> > > Ankit Chaurasia
> > >
> > >
> > >
> > >
> > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai....@gmail.com>
> > > wrote:
> > > >
> > > > +1 looks like a good tool which could be super helpful.
> > > >
> > > > * We should have some transparency into the data that is collected
> > > > or sent
> > > > * We should have an option to opt out
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
> > > >
> > > >
> > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee...@gmail.com> wrote:
> > > >
> > > > > +1 to this. It would be really useful. As long as we can opt out,
> > > > > I think we’re good.
> > > > >
> > > > > Best,
> > > > > Wei
> > > > >
> > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Grammar Correction:
> > > > > >
> > > > > > >> We should assume that those who deploy and upgrade Airflow -
> > > > > > >> actually read and take into account what is written in the
> > > > > > >> release notes - especially if they have security guys breathing
> > > > > > >> down their necks, similarly as we have to assume they follow CVE
> > > > > > >> announcements about security issues fixed. If we are very
> > > > > > >> straightforward and out-going about the change, inform very
> > > > > > >> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > >
> > > > > >
> > > > > > > I couldn't agree more; even though we shouldn't collect any data
> > > > > > > that hampers security (and we should aim to do the same), most
> > > > > > > security-concerned folks don't just upgrade, and we can rely on
> > > > > > > them regarding release notes or announcements; we can make it
> > > > > > > very clear in our announcements too, and in our installation
> > > > > > > guides.
> > > > > >
> > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > >> Grammar correction:
> > > > > >>
> > > > > >>
> > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > > >>
> > > > > > >>> I have this at the end of the email too, but in case folks don't
> > > > > > >>> read until the end, quoting Maxime from the use-case blog [1]:
> > > > > >>>
> > > > > >>> "I think people often ask ‘how do I contribute to open
> source?’,
> > > ‘I've
> > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > Actually,
> > > > > the
> > > > > >>> very simplest thing that you can do is just say, ‘my
> organization
> > > gets
> > > > > real
> > > > > >>> value from this piece of software.’ There are a bunch of ways
> to
> > > let
> > > > > the
> > > > > >>> people know about it – and now Scarf is there. If your
> > > organization is
> > > > > >>> getting a lot of value from a piece of open source software,
> make
> > > sure
> > > > > the
> > > > > >>> devs know about it."
> > > > > >>>
> > > > > > >>> What kind of edge cases are you thinking about? I don't think it
> > > > > > >>> makes sense to have "opt-in" at all. As the goal is to collect
> > > > > > >>> data for most Airflow installations except for those that don't
> > > > > > >>> want to give data, "opt-out" is the only way to maximize it. As
> > > > > > >>> long as we don't collect any PII data, this is in compliance as
> > > > > > >>> well.
> > > > > > >>>
> > > > > > >>> Imagine someone learning Airflow: if they had to opt in via a
> > > > > > >>> config, they wouldn't even know or care about it, and we would
> > > > > > >>> lose most of the data. I understand why some orgs & individuals
> > > > > > >>> may want to opt out.
> > > > > >>>
> > > > > > >>> Scarf provides tracking pixels (essentially an HTML image tag)
> > > > > > >>> that you can place in your website or product to track visitors
> > > > > > >>> to that URL. If there were any concerns about privacy, ASF
> > > > > > >>> wouldn't have approved it at all.
> > > > > >>>
> > > > > >>> A few key details to note about the pixel:
> > > > > >>>
> > > > > >>>
> > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
> > > > > > >>>   information… this information is discarded by the platform
> > > > > > >>>   upon processing/aggregating
> > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of
> > > > > > >>>   browsers - these users will not be tracked whatsoever.
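
As a rough illustration of the two pixel properties listed above, the
sketch below builds an image tag that carries only non-PII metadata and
shows the kind of Do Not Track check a collector could apply. Everything
named here is hypothetical: the endpoint URL is a placeholder (a real pixel
URL would come from Scarf), the function names are made up, and the
server-side check is only my assumption of what honouring DNT amounts to,
not Scarf's actual implementation.

```python
from html import escape
from typing import Mapping
from urllib.parse import urlencode

# Placeholder endpoint, NOT a real Scarf URL.
PIXEL_BASE_URL = "https://example.invalid/pixel"


def pixel_img_tag(params: Mapping[str, str]) -> str:
    """Build an <img> tag whose URL carries only non-PII metadata."""
    # Sort the parameters so the generated URL is deterministic.
    query = urlencode(sorted(params.items()))
    url = f"{PIXEL_BASE_URL}?{query}"
    # Escape the URL so "&" and quotes are safe inside the HTML attribute.
    return f'<img src="{escape(url, quote=True)}" alt="" />'


def browser_opted_out(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the header DNT: 1."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"
```

For example, pixel_img_tag({"version": "2.9.0"}) yields a single image tag
pointing at the placeholder endpoint, and a request with DNT: 1 would be
skipped by a collector that applies browser_opted_out.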
> > > > > >>>
> > > > > >>>
> > > > > > >>> All the ASF projects I had listed (whether they use Scarf
> > > > > > >>> gateway or Scarf pixel in product) are using opt-out.
> > > > > >>>
> > > > > >>> 1. Short opt-in period before opt-out. Test this feature with
> > > users who
> > > > > >>>> trust and if it works great - make it public. I think it's
> wise
> > to
> > > > > handle
> > > > > >>>> edge cases and configure collected data more accurately.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > > >>> It would be a pixel in the webserver, so it should affect nothing
> > > > > > >>> at all, even in an air-gapped environment.
> > > > > >>>
> > > > > >>>> 2. It should not affect anything if access to the internet is
> > > > > restricted
> > > > > >>>> which is default for many companies.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> 100% agreed on the below:
> > > > > >>>
> > > > > >>>> I think we have a very good blueprint to follow including at
> > > least 5
> > > > > >>>> other
> > > > > >>>> ASF projects that also passed the review of the privacy@asf.
> > And
> > > > > while I
> > > > > >>>> understand (and concur) the urge for opt-in by default coming
> > from
> > > > > >>>> consumer
> > > > > >>>> market (where it makes perfect sense) Airflow is not a
> consumer
> > > > > >>>> software and is used in "corporate environment" which has a
> > little
> > > > > >>>> different expectations and broad assumption that the company
> can
> > > make
> > > > > >>>> decisions on such telemetry on behalf of the employees using
> it.
> > > > > >>>
> > > > > >>>
> > > > > >>> Couldn't agree more; even though there shouldn't we collect
> > hamper
> > > > > >>> security (and we should aim to do the same), most security
> > > concerned
> > > > > folks
> > > > > >>> don't just
> > > > > >>> upgrade, and we can rely on them regarding release notes or
> > > > > announcements
> > > > > >>> and we can make it very clear in our announcements too; and in
> > our
> > > > > >>> installation guides.
> > > > > >>>
> > > > > >>> We should assume that those who deploy and upgrade Airflow -
> > > actually
> > > > > read
> > > > > >>>> and take into account what is written in the release notes -
> > > > > especially
> > > > > >>>> if
> > > > > >>>> they have security guys breathing their necks, similarly as we
> > > have to
> > > > > >>>> assume they follow CVE announcements about security issues
> > fixed.
> > > If
> > > > > we
> > > > > >>>> are very straightforward and out-going about the change,
> inform
> > > very
> > > > > >>>> clearly how to opt-out, I don't see a big problem with
> opt-out.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > > >>> To be clear, the collection of data, or at least the data we
> > > > > > >>> should gather here, should help all the consumers without
> > > > > > >>> violating any regulations. I will quote Maxime from the use-case
> > > > > > >>> doc [1]:
> > > > > >>>
> > > > > >>> "*Another Form of Contributing*
> > > > > >>> “I think people often ask ‘how do I contribute to open
> source?’,
> > > ‘I've
> > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > Actually,
> > > > > the
> > > > > >>> very simplest thing that you can do is just say, ‘my
> organization
> > > gets
> > > > > real
> > > > > >>> value from this piece of software.’ There are a bunch of ways
> to
> > > let
> > > > > the
> > > > > >>> people know about it – and now Scarf is there. If your
> > > organization is
> > > > > >>> getting a lot of value from a piece of open source software,
> make
> > > sure
> > > > > the
> > > > > >>> devs know about it.”"
> > > > > >>>
> > > > > >>>
> > > > > >>> [1]
> https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > >>>
> > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > kxe...@apache.org>
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Hi Jarek!
> > > > > >>>>
> > > > > > >>>> I understand the reasons for opt-out from a project view. I just
> > > > > > >>>> suddenly imagined the situation where an upgrade happens and
> > > > > > >>>> data starts flowing to some third-party service - that's the
> > > > > > >>>> view from the user side at some big company.
> > > > > > >>>>
> > > > > > >>>> There could be good alternatives to handle this:
> > > > > > >>>> 1. Short opt-in period before opt-out. Test this feature with
> > > > > > >>>> users who trust and if it works great - make it public. I think
> > > > > > >>>> it's wise to handle edge cases and configure collected data more
> > > > > > >>>> accurately.
> > > > > > >>>> 2. Explicitly warn about this feature somehow so that it does
> > > > > > >>>> not go unnoticed. Just to reduce possible frustration.
> > > > > > >>>>
> > > > > > >>>> Just some personal thoughts for discussion (:
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>> ,,,^..^,,,
> > > > > >>>>
> > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> ja...@potiuk.com>
> > > > > wrote:
> > > > > >>>>
> > > > > >>>>> Hello everyone,
> > > > > >>>>>
> > > > > >>>>> it has to be:
> > > > > >>>>>
> > > > > >>>>> 1. Opt-in by default to not trigger security guys about new
> > > unplanned
> > > > > >>>>>> activity after regular upgrade.
> > > > > >>>>>>
> > > > > >>>>>
> > > > > > >>>>> That's a very good point about security triggering, Alexander,
> > > > > > >>>>> but I am not so sure it means that we "have to" do opt-in. There
> > > > > > >>>>> are other ways of communicating with the "deployment managers"
> > > > > > >>>>> who install and upgrade Airflow - e.g. release notes, blogs,
> > > > > > >>>>> our social media, Slack announcements etc. We have plenty of
> > > > > > >>>>> channels we can use to communicate the change.
> > > > > >>>>>
> > > > > > >>>>> I think we have a very good blueprint to follow, including at
> > > > > > >>>>> least 5 other ASF projects that also passed the review of the
> > > > > > >>>>> privacy@asf. And while I understand (and concur with) the urge
> > > > > > >>>>> for opt-in by default coming from the consumer market (where it
> > > > > > >>>>> makes perfect sense), Airflow is not consumer software; it is
> > > > > > >>>>> used in a "corporate environment", which has slightly different
> > > > > > >>>>> expectations and a broad assumption that the company can make
> > > > > > >>>>> decisions on such telemetry on behalf of the employees using it.
> > > > > >>>>>
> > > > > > >>>>> We should assume that those who deploy and upgrade Airflow
> > > > > > >>>>> actually read and take into account what is written in the
> > > > > > >>>>> release notes - especially if they have security guys breathing
> > > > > > >>>>> down their necks - just as we have to assume they follow CVE
> > > > > > >>>>> announcements about fixed security issues. If we are very
> > > > > > >>>>> straightforward and outgoing about the change, and inform very
> > > > > > >>>>> clearly how to opt out, I don't see a big problem with opt-out.
> > > > > >>>>>
> > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a
> > > > > > >>>>> good deal of time reading the Superset and other use cases and
> > > > > > >>>>> explanations in detail to make a better informed decision) - and
> > > > > > >>>>> it looks like they also went the opt-out way and got cleared by
> > > > > > >>>>> privacy@a.o. And if we cannot reach consensus, we should - as
> > > > > > >>>>> usual - make a voting decision on it (because yes, it is an
> > > > > > >>>>> important decision), but - after reading and understanding why
> > > > > > >>>>> others also did it - for me personally, opt-out is a good path.
> > > > > >>>>>
> > > > > > >>>>> Also, opt-out will increase the amount of data we gather, and in
> > > > > > >>>>> our case - counterintuitively - it will be even better for
> > > > > > >>>>> privacy and corporate anonymity, because the more data we get,
> > > > > > >>>>> the more difficult it will be to get any
> > > > > > >>>>> non-statistical/non-aggregated insight from it. Imagine if only
> > > > > > >>>>> a few corporate users enable it consciously - then we would be
> > > > > > >>>>> able to draw many more conclusions if we find out who they are
> > > > > > >>>>> than if everyone has it enabled by default.
> > > > > >>>>>
> > > > > > >>>>> That's my take on it - but again, it's up to us to vote, for me
> > > > > > >>>>> opt-in is not "has to", and I am rather for opt-out.
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>>> Hi all,
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > > >>>>>>> I want to propose gathering telemetry for Airflow
> > > > > > >>>>>>> installations. As the Airflow community, we have been relying
> > > > > > >>>>>>> heavily on the yearly Airflow Survey and anecdotes to answer a
> > > > > > >>>>>>> few key questions about Airflow usage. Questions like the
> > > > > > >>>>>>> following:
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now
> > > > > > >>>>>>>   (i.e. whether people have primarily made the jump from
> > > > > > >>>>>>>   version X to version Y)
> > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version,
> > > > > > >>>>>>>   e.g. Pg 14?
> > > > > > >>>>>>>   - What Python version is being used?
> > > > > > >>>>>>>   - Which Executor is being used?
> > > > > > >>>>>>>   - Approximately how many people out there in the world are
> > > > > > >>>>>>>   installing Airflow
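
The questions above map naturally onto a small, coarse record with no
user, host or IP fields. The sketch below is one possible shape for it and
nothing more: the class, field and function names are hypothetical, and
truncating versions to major/minor is my own assumption about how much
detail is useful, not something decided in this thread.

```python
import platform
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageSnapshot:
    """Coarse, non-PII description of one installation (hypothetical)."""

    airflow_version: str
    python_version: str
    executor: str
    metadata_db: str
    metadata_db_version: str


def build_snapshot(
    airflow_version: str,
    executor: str,
    metadata_db: str,
    metadata_db_version: str,
) -> UsageSnapshot:
    """Assemble the snapshot from values the caller already knows."""
    # Keep only major.minor of the running interpreter, e.g. "3.11".
    python_version = ".".join(platform.python_version_tuple()[:2])
    # Keep only the major DB version, e.g. "14" for "14.11".
    db_major = metadata_db_version.split(".")[0]
    return UsageSnapshot(
        airflow_version=airflow_version,
        python_version=python_version,
        executor=executor,
        metadata_db=metadata_db,
        metadata_db_version=db_major,
    )
```

Calling asdict(build_snapshot("2.9.0", "CeleryExecutor", "postgresql",
"14.11")) gives a plain dict that could be published as-is in the
"what data is being collected" documentation.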
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> There is a solution that should help answer these questions:
> > > > > > >>>>>>> Scarf [1]. Scarf is already approved by the ASF [2][3] and is
> > > > > > >>>>>>> already used by other ASF projects (Superset [4], Dolphin
> > > > > > >>>>>>> Scheduler [5], Dubbo Kubernetes, DevLake, Skywalking), as it
> > > > > > >>>>>>> follows GDPR and other regulations.
> > > > > >>>>>>>
> > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
> > > > > > >>>>>>>   Webserver. When the package is downloaded & Airflow webserver
> > > > > > >>>>>>>   is opened, metadata is recorded to the Scarf dashboard.
> > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front
> > > > > > >>>>>>>   of docker containers. While it’s possible people go around
> > > > > > >>>>>>>   this gateway, we can probably configure and encourage most
> > > > > > >>>>>>>   traffic to go through these gateways.
> > > > > >>>>>>>
> > > > > > >>>>>>> While Scarf does not store any personally identifying
> > > > > > >>>>>>> information from SDK telemetry data, it does send various bits
> > > > > > >>>>>>> of IP-derived information as outlined here [7]. This data
> > > > > > >>>>>>> should be made as transparent as possible by granting dashboard
> > > > > > >>>>>>> access to the Airflow PMC and any other relevant means of
> > > > > > >>>>>>> sharing/surfacing it that we encounter (Town Hall, Slack,
> > > > > > >>>>>>> Newsletter etc).
> > > > > >>>>>>>
> > > > > >>>>>>> The following case studies are worth reading:
> > > > > >>>>>>>
> > > > > >>>>>>>   1.
> > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > >>>>> (From
> > > > > >>>>>>>   Maxime)
> > > > > >>>>>>>   2.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > > > >>>>>>>
> > > > > > >>>>>>> Similar to them, this could help in various ways that come
> > > > > > >>>>>>> with using data for decision-making. With clear guidelines on
> > > > > > >>>>>>> "how to opt-out" [8][9][10] & "what data is being collected" on
> > > > > > >>>>>>> the Airflow website, this can be beneficial to the entire
> > > > > > >>>>>>> community as we would be making more informed decisions.
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> Kaxil
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> [1] https://about.scarf.sh/
> > > > > >>>>>>> [2]
> > > https://privacy.apache.org/policies/privacy-policy-public.html
> > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > > > >>>>>>> [5]
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > > > >>>>>>> [8]
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > > >>>>>>> [9]
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > > >>>>>>> [10]
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > >
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > >
> >
>
