+1, this looks like a good tool that could be super helpful.

* We should have some transparency into the data that is collected or sent
* We should have an option to opt out
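
To make the two points above a bit more concrete, here is a minimal sketch of what an opt-out check plus a fully visible payload could look like. Everything in it is hypothetical: the `EXAMPLE_TELEMETRY_OPT_OUT` variable, the function names, and the sample values are assumptions for illustration only, not an existing Airflow or Scarf API. The payload fields simply mirror the questions from the original proposal (Airflow version, metadata DB, Python version, executor).

```python
import json
import os
import platform


def telemetry_enabled(env=None):
    """Return False when the (hypothetical) opt-out variable is set to a truthy value."""
    env = os.environ if env is None else env
    value = env.get("EXAMPLE_TELEMETRY_OPT_OUT", "").strip().lower()
    return value not in {"1", "true", "yes"}


def build_payload(airflow_version, executor, metadata_db):
    """Collect only coarse, non-personal deployment facts (illustrative field names)."""
    return {
        "airflow_version": airflow_version,
        "python_version": platform.python_version(),
        "executor": executor,
        "metadata_db": metadata_db,
    }


def describe_payload(payload, env=None):
    """Render exactly what would be sent, or a notice that nothing is sent."""
    if not telemetry_enabled(env):
        return "telemetry disabled: nothing is sent"
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Sample values are made up for the example.
    example = build_payload("2.9.0", "CeleryExecutor", "postgres 14")
    print(describe_payload(example))
```

The idea is that the same function that gates sending also shows the user the exact data, so transparency and opt-out live in one place.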

Thanks & Regards,
Amogh Desai


On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <[email protected]> wrote:

> +1 to this. It would be really useful. As long as we can opt out, I think
> we’re good.
>
> Best,
> Wei
>
> > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <[email protected]> wrote:
> >
> > Grammar Correction:
> >
> > We should assume that those who deploy and upgrade Airflow - actually
> read
> >> and take into account what is written in the release notes - especially
> if
> >> they have security guys breathing down their necks, similarly as we have to
> >> assume they follow CVE announcements about security issues fixed. If we
> >> are very straightforward and out-going about the change, inform very
> >> clearly how to opt-out, I don't see a big problem with opt-out.
> >
> >
> > I couldn't agree more; even though we shouldn't collect any data that
> > hampers security (and we should aim to do the same), most security-conscious
> > folks don't just upgrade, and we can rely on them to read release notes
> > or announcements. We can make it very clear in our announcements too,
> > and in our installation guides.
> >
> > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <[email protected]> wrote:
> >
> >> Grammar correction:
> >>
> >>
> >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <[email protected]> wrote:
> >>
> >>> I have this at the end of the email too, but in case folks don't read until
> >>> the end, quoting Maxime from the use-case blog [1]:
> >>>
> >>> "I think people often ask ‘how do I contribute to open source?’, ‘I've
> >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> the
> >>> very simplest thing that you can do is just say, ‘my organization gets
> real
> >>> value from this piece of software.’ There are a bunch of ways to let
> the
> >>> people know about it – and now Scarf is there. If your organization is
> >>> getting a lot of value from a piece of open source software, make sure
> the
> >>> devs know about it."
> >>>
> >>> What kind of edge cases are you thinking about? I don't think it makes
> >>> sense to have "opt-in" at all. As the goal is to collect data for most
> >>> Airflow installations except for those that don't want to give data,
> then
> >>> "opt-out" is the only way to maximize it. As long as we don't collect any
> >>> PII data, this is in compliance as well.
> >>>
> >>> Imagine someone learning Airflow, if they have to opt-in via a config,
> >>> they wouldn't even know or care about it, hence us losing most of the
> data.
> >>> I understand why some orgs & individuals may want to opt-out.
> >>>
> >>> Scarf provides tracking pixels (essentially an HTML image tag) that you
> >>> can place in your website or product to track visitors to that URL. If
> >>> there were any concerns about privacy, ASF wouldn't have approved it at all.
> >>>
> >>> A few key details to note about the pixel:
> >>>
> >>>
> >>>   - No PII is tracked… Scarf does not capture/retain IP information…
> >>>   this information is discarded by the platform upon
> processing/aggregating
> >>>   - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
> >>>   these users will not be tracked whatsoever.
> >>>
> >>>
> >>> All the ASF projects I had listed (whether they use Scarf gateway or
> >>> Scarf pixel in product) are using opt-out.
> >>>
> >>> 1. Short opt-in period before opt-out. Test this feature with users who
> >>>> trust and if it works great - make it public. I think it's wise to
> handle
> >>>> edge cases and configure collected data more accurately.
> >>>
> >>>
> >>>
> >>> It would be a pixel in the webserver, should affect nothing at all even
> >>> in an air-gapped environment.
> >>>
> >>>> 2. It should not affect anything if access to the internet is
> restricted
> >>>> which is default for many companies.
> >>>
> >>>
> >>>
> >>> 100% agreed on the below:
> >>>
> >>>> I think we have a very good blueprint to follow including at least 5
> >>>> other
> >>>> ASF projects that also passed the review of the privacy@asf. And
> while I
> >>>> understand (and concur) the urge for opt-in by default coming from
> >>>> consumer
> >>>> market (where it makes perfect sense) Airflow is not a consumer
> >>>> software and is used in "corporate environment" which has a little
> >>>> different expectations and broad assumption that the company can make
> >>>> decisions on such telemetry on behalf of the employees using it.
> >>>
> >>>
> >>> Couldn't agree more; even though we shouldn't collect any data that hampers
> >>> security (and we should aim to do the same), most security concerned folks
> >>> don't just upgrade, and we can rely on them regarding release notes or
> >>> announcements and we can make it very clear in our announcements too; and
> >>> in our installation guides.
> >>>
> >>> We should assume that those who deploy and upgrade Airflow - actually
> read
> >>>> and take into account what is written in the release notes -
> especially
> >>>> if
> >>>> they have security guys breathing down their necks, similarly as we have to
> >>>> assume they follow CVE announcements about security issues fixed. If
> we
> >>>> are very straightforward and out-going about the change, inform very
> >>>> clearly how to opt-out, I don't see a big problem with opt-out.
> >>>
> >>>
> >>>
> >>> To be clear, the collection of data, or at least the data we should
> >>> gather here, should help all the consumers without violating any
> >>> regulations. I will quote Maxime from the use-case doc [1]:
> >>>
> >>> "*Another Form of Contributing*
> >>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
> >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> the
> >>> very simplest thing that you can do is just say, ‘my organization gets
> real
> >>> value from this piece of software.’ There are a bunch of ways to let
> the
> >>> people know about it – and now Scarf is there. If your organization is
> >>> getting a lot of value from a piece of open source software, make sure
> the
> >>> devs know about it.”"
> >>>
> >>>
> >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>
> >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <[email protected]>
> wrote:
> >>>
> >>>> Hi Jarek!
> >>>>
> >>>> I understand the reasons for opt-out from a project view. I just
> suddenly
> >>>> imagined the situation when an upgrade happens and here comes the
> data to
> >>>> some third party service - that's a view from a user side of some big
> >>>> company.
> >>>>
> >>>> There could be good alternatives to handle this:
> >>>> 1. Short opt-in period before opt-out. Test this feature with users
> who
> >>>> trust and if it works great - make it public. I think it's wise to
> handle
> >>>> edge cases and configure collected data more accurately.
> >>>> 2. Explicitly warn about this feature somehow so that it does not go
> >>>> unnoticed. Just to reduce possible frustration.
> >>>>
> >>>> Just some personal thoughts for discussion (:
> >>>>
> >>>> --
> >>>> ,,,^..^,,,
> >>>>
> >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <[email protected]>
> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> it has to be:
> >>>>>
> >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> >>>>>> activity after regular upgrade.
> >>>>>>
> >>>>>
> >>>>> That's a very good point about triggering security, Alexander, but I am not
> >>>>> so sure it means that we "have to" do opt-in. There are other ways of
> >>>>> communicating with the "deployment managers" who install and upgrade
> >>>>> Airflow - i.e. release notes, blogs, social media of ours, Slack
> >>>>> announcements etc. We have plenty of channels we can use to
> >>>> communicate the
> >>>>> change.
> >>>>>
> >>>>> I think we have a very good blueprint to follow including at least 5
> >>>> other
> >>>>> ASF projects that also passed the review of the privacy@asf. And
> >>>> while I
> >>>>> understand (and concur) the urge for opt-in by default coming from
> >>>> consumer
> >>>>> market (where it makes perfect sense) Airflow is not a consumer
> >>>>> software and is used in "corporate environment" which has a little
> >>>>> different expectations and broad assumption that the company can make
> >>>>> decisions on such telemetry on behalf of the employees using it.
> >>>>>
> >>>>> We should assume that those who deploy and upgrade Airflow - actually
> >>>> read
> >>>>> and take into account what is written in the release notes -
> >>>> especially if
> >>>>> they have security guys breathing down their necks, similarly as we have to
> >>>>> assume they follow CVE announcements about security issues fixed. If
> we
> >>>>> are very straightforward and out-going about the change, inform very
> >>>>> clearly how to opt-out, I don't see a big problem with opt-out.
> >>>>>
> >>>>> We should of course check with [email protected] (but I've spent a good deal of
> >>>>> time reading the Superset and other use cases and explanations in detail to
> >>>>> make a better informed decision) - and it looks like they also went
> >>>> opt-out
> >>>>> way and got cleared by [email protected].  And if we cannot reach
> >>>> consensus, we
> >>>>> should - as usual - make a voting decision on it (because yes, it is
> an
> >>>>> important decision), but - after reading and understanding why others
> >>>> also
> >>>>> did it - for me personally, opt-out is a good path.
> >>>>>
> >>>>> Also because it will rather increase the amount of data to gather,
> and
> >>>> in
> >>>>> our case - counter intuitively - it will be even better for privacy
> and
> >>>>> corporate anonymity, because the more data we get, the more difficult
> >>>> it
> >>>>> will be to get any non-statistical/non-aggregated insight from it.
> >>>>> Imagine if only a few corporate users enable it consciously - then we will be
> >>>>> able to draw many more conclusions if we find out who they are, than if
> >>>>> everyone has it enabled by default.
> >>>>>
> >>>>> That's my take on it - but again, it's up to us to vote, for me
> opt-in
> >>>> is
> >>>>> not "has to", and I am rather for opt-out.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>>
> >>>>>>> I want to propose gathering telemetry for Airflow installations.
> >>>> As the
> >>>>>>> Airflow community, we have been relying heavily on the yearly
> >>>> Airflow
> >>>>>>> Survey and anecdotes to answer a few key questions about Airflow
> >>>> usage.
> >>>>>>> Questions like the following:
> >>>>>>>
> >>>>>>>
> >>>>>>>   - Which versions of Airflow are people installing/using now
> >>>> (i.e.
> >>>>>>>   whether people have primarily made the jump from version X to
> >>>>> version
> >>>>>> Y)
> >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg
> >>>> 14?
> >>>>>>>   - What Python version is being used?
> >>>>>>>   - Which Executor is being used?
> >>>>>>>   - Approximately how many people out there in the world are
> >>>>> installing
> >>>>>>>   Airflow
> >>>>>>>
> >>>>>>>
> >>>>>>> There is a solution that should help answer these questions: Scarf [1].
> >>>>>>> Scarf is already approved by the ASF [2][3] and is already used by other ASF
> >>>>>>> projects (Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> >>>>>>> Skywalking), as it follows GDPR and other regulations.
> >>>>>>>
> >>>>>>> Similar to Superset, we probably can use it as follows:
> >>>>>>>
> >>>>>>>
> >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
> >>>>> Webserver.
> >>>>>>>   When the package is downloaded & Airflow webserver is opened,
> >>>>> metadata
> >>>>>>> is
> >>>>>>>   recorded to the Scarf dashboard.
> >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of
> >>>>> docker
> >>>>>>>   containers. While it’s possible people go around this gateway,
> >>>> we
> >>>>> can
> >>>>>>>   probably configure and encourage most traffic to go through
> >>>> these
> >>>>>>> gateways.
> >>>>>>>
> >>>>>>> While Scarf does not store any personally identifying information
> >>>> from
> >>>>>> SDK
> >>>>>>> telemetry data, it does send various bits of IP-derived
> >>>> information as
> >>>>>>> outlined here [7]. This data should be made as transparent as
> >>>> possible
> >>>>> by
> >>>>>>> granting dashboard access to the Airflow PMC and any other relevant
> >>>>> means
> >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack,
> >>>> Newsletter
> >>>>>>> etc).
> >>>>>>>
> >>>>>>> The following case studies are worth reading:
> >>>>>>>
> >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>>> (From
> >>>>>>>   Maxime)
> >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >>>>>>>
> >>>>>>> Similar to them, this could help in various ways that come with
> >>>> using
> >>>>>> data
> >>>>>>> for decision-making. With clear guidelines on "how to opt-out"
> >>>>>> [8][9][10] &
> >>>>>>> "what data is being collected" on the Airflow website, this can be
> >>>>>>> beneficial to the entire community as we would be making more
> >>>> informed
> >>>>>>> decisions.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Kaxil
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://about.scarf.sh/
> >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> >>>>>>> [4] https://github.com/apache/superset/issues/25639
> >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
