Grammar correction:

On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxiln...@gmail.com> wrote:

> Have this at the end of the email too: but if folks don't read until the
> end and quoting Maxime from the use-case blog[1]:
>
> "I think people often ask ‘how do I contribute to open source?’, ‘I've got
> to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very
> simplest thing that you can do is just say, ‘my organization gets real
> value from this piece of software.’ There are a bunch of ways to let the
> people know about it – and now Scarf is there. If your organization is
> getting a lot of value from a piece of open source software, make sure the
> devs know about it."
>
> What kind of edge cases are you thinking about? I don't think it makes
> sense to have "opt-in" at all. Since the goal is to collect data from most
> Airflow installations, except those that don't want to share data,
> "opt-out" is the only way to maximize coverage. As long as we don't collect
> any PII, this is compliant as well.
>
> Imagine someone learning Airflow: if they had to opt in via a config,
> they wouldn't even know or care about it, so we would lose most of the data.
> I understand why some orgs & individuals may want to opt out.
>
> Scarf provides tracking pixels (essentially an HTML image tag) that you
> can place in your website or product to track visitors to that URL. If
> there were any concerns about privacy, ASF wouldn't have approved it at all.
>
> A few key details to note about the pixel:
>
>
>    - No PII is tracked… Scarf does not capture/retain IP information…
>    this information is discarded by the platform upon processing/aggregating
>    - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
>    these users will not be tracked whatsoever.
>
>
> All the ASF projects I had listed (whether they use Scarf gateway or Scarf
> pixel in product) are using opt-out.
>
>> 1. Short opt-in period before opt-out. Test this feature with users who
>> trust and if it works great - make it public. I think it's wise to handle
>> edge cases and configure collected data more accurately.
>
>
>
> It would be a pixel in the webserver, so it should affect nothing at all,
> even in an air-gapped environment.
>
>> 2. It should not affect anything if access to the internet is restricted,
>> which is the default for many companies.
>
>
>
> 100% agreed on the below:
>
>> I think we have a very good blueprint to follow including at least 5 other
>> ASF projects that also passed the review of the privacy@asf. And while I
>> understand (and concur with) the urge for opt-in by default coming from the
>> consumer market (where it makes perfect sense), Airflow is not consumer
>> software and is used in a "corporate environment", which has slightly
>> different expectations and a broad assumption that the company can make
>> decisions on such telemetry on behalf of the employees using it.
>
>
> Couldn't agree more; even though nothing we collect should hamper
> security (and we should aim for that), most security-conscious folks
> don't just upgrade blindly, and we can rely on them to read release notes
> or announcements. We can make it very clear in our announcements too, and
> in our installation guides.
>
>> We should assume that those who deploy and upgrade Airflow actually read
>> and take into account what is written in the release notes, especially if
>> they have security guys breathing down their necks, just as we have to
>> assume they follow CVE announcements about fixed security issues. If we
>> are very straightforward and upfront about the change and explain very
>> clearly how to opt out, I don't see a big problem with opt-out.
>
>
>
> To be clear, the data collection, or at least the data we should gather
> here, should help all the consumers without violating any regulations.
> I will quote Maxime from the use-case doc [1]:
>
> "*Another Form of Contributing*
> “I think people often ask ‘how do I contribute to open source?’, ‘I've got
> to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very
> simplest thing that you can do is just say, ‘my organization gets real
> value from this piece of software.’ There are a bunch of ways to let the
> people know about it – and now Scarf is there. If your organization is
> getting a lot of value from a piece of open source software, make sure the
> devs know about it.”"
>
>
> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
>
> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxe...@apache.org> wrote:
>
>> Hi Jarek!
>>
>> I understand the reasons for opt-out from a project view. I just imagined
>> the situation where an upgrade happens and data suddenly starts flowing to
>> some third-party service; that's the view from the user side at some big
>> company.
>>
>> There could be good alternatives to handle this:
>> 1. Short opt-in period before opt-out. Test this feature with users who
>> trust it and, if it works great, make it public. I think it's wise to handle
>> edge cases and configure collected data more accurately.
>> 2. Explicitly warn about this feature somehow so that it does not go
>> unnoticed. Just to reduce possible frustration.
>>
>> Just some personal thoughts for discussion (:
>>
>> --
>> ,,,^..^,,,
>>
>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> > Hello everyone,
>> >
>> > > it has to be:
>> > >
>> > > 1. Opt-in by default to not trigger security guys about new unplanned
>> > > activity after regular upgrade.
>> > >
>> >
>> > That's a very good point about triggering security, Alexander, but I am
>> > not so sure it means that we "have to" do opt-in. There are other ways of
>> > communicating with the "deployment managers" who install and upgrade
>> > Airflow, i.e. release notes, blogs, our social media, Slack
>> > announcements, etc. We have plenty of channels we can use to communicate
>> > the change.
>> >
>> > I think we have a very good blueprint to follow including at least 5
>> > other ASF projects that also passed the review of the privacy@asf. And
>> > while I understand (and concur with) the urge for opt-in by default coming
>> > from the consumer market (where it makes perfect sense), Airflow is not
>> > consumer software and is used in a "corporate environment", which has
>> > slightly different expectations and a broad assumption that the company
>> > can make decisions on such telemetry on behalf of the employees using it.
>> >
>> > We should assume that those who deploy and upgrade Airflow actually read
>> > and take into account what is written in the release notes, especially
>> > if they have security guys breathing down their necks, just as we have to
>> > assume they follow CVE announcements about fixed security issues. If we
>> > are very straightforward and upfront about the change and explain very
>> > clearly how to opt out, I don't see a big problem with opt-out.
>> >
>> > We should of course check with privacy@a.o (but I've spent a good deal of
>> > time reading the Superset and other use cases and explanations in detail
>> > to make a better-informed decision), and it looks like they also went the
>> > opt-out way and got cleared by privacy@a.o. And if we cannot reach
>> > consensus, we should, as usual, make a voting decision on it (because yes,
>> > it is an important decision), but after reading and understanding why
>> > others also did it, for me personally, opt-out is a good path.
>> >
>> > Also because it will increase the amount of data we gather, and in our
>> > case, counterintuitively, it will be even better for privacy and
>> > corporate anonymity, because the more data we get, the more difficult it
>> > will be to get any non-statistical/non-aggregated insight from it. Imagine
>> > if only a few corporate users enable it consciously: we would then be
>> > able to draw many more conclusions if we found out who they are than if
>> > everyone has it enabled by default.
>> >
>> > That's my take on it, but again, it's up to us to vote. For me opt-in is
>> > not a "has to", and I am rather for opt-out.
>> >
>> > J.
>> >
>> > > Hi all,
>> > >
>> > >
>> > > > I want to propose gathering telemetry for Airflow installations. As the
>> > > > Airflow community, we have been relying heavily on the yearly Airflow
>> > > > Survey and anecdotes to answer a few key questions about Airflow usage.
>> > > > Questions like the following:
>> > > >
>> > > >
>> > > >    - Which versions of Airflow are people installing/using now (i.e.
>> > > >    whether people have primarily made the jump from version X to
>> > > >    version Y)?
>> > > >    - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
>> > > >    - What Python version is being used?
>> > > >    - Which Executor is being used?
>> > > >    - Approximately how many people out there in the world are
>> > > >    installing Airflow?
>> > > >
>> > > >
>> > > > There is a solution that should help answer these questions: Scarf [1].
>> > > > Scarf is already approved by the ASF [2][3] and is already used by other
>> > > > ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
>> > > > DevLake, and Skywalking, as it follows GDPR and other regulations.
>> > > >
>> > > > Similar to Superset, we probably can use it as follows:
>> > > >
>> > > >
>> > > >    1. Install the `scarf js` npm package and bundle it in the Webserver.
>> > > >    When the package is downloaded & the Airflow webserver is opened,
>> > > >    metadata is recorded to the Scarf dashboard.
>> > > >    2. Utilize the Scarf Gateway [6], which we can use in front of Docker
>> > > >    containers. While it’s possible people go around this gateway, we can
>> > > >    probably configure and encourage most traffic to go through these
>> > > >    gateways.
>> > > >
>> > > > While Scarf does not store any personally identifying information from
>> > > > SDK telemetry data, it does send various bits of IP-derived information as
>> > > > outlined here [7]. This data should be made as transparent as possible by
>> > > > granting dashboard access to the Airflow PMC and any other relevant means
>> > > > of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter,
>> > > > etc.).
>> > > >
>> > > > The following case studies are worth reading:
>> > > >
>> > > >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
>> > > >    Maxime)
>> > > >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > >
>> > > > Similar to them, this could help in various ways that come with using
>> > > > data for decision-making. With clear guidelines on "how to opt-out"
>> > > > [8][9][10] & "what data is being collected" on the Airflow website, this
>> > > > can be beneficial to the entire community as we would be making more
>> > > > informed decisions.
>> > > >
>> > > > Regards,
>> > > > Kaxil
>> > > >
>> > > >
>> > > > [1] https://about.scarf.sh/
>> > > > [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > > [3] https://privacy.apache.org/faq/committers.html
>> > > > [4] https://github.com/apache/superset/issues/25639
>> > > > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > > [6] https://about.scarf.sh/scarf-gateway
>> > > > [7] https://about.scarf.sh/privacy-policy
>> > > > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
