+1 to this. It would be really useful. As long as we can opt out, I think we’re 
good.

Best,
Wei

> On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxiln...@gmail.com> wrote:
> 
> Grammar Correction:
> 
>> We should assume that those who deploy and upgrade Airflow actually read
>> and take into account what is written in the release notes, especially if
>> they have security guys breathing down their necks, just as we have to
>> assume they follow CVE announcements about fixed security issues. If we
>> are very straightforward and upfront about the change and explain very
>> clearly how to opt out, I don't see a big problem with opt-out.
> 
> 
> I couldn't agree more. We shouldn't collect any data that hampers
> security (and we should aim for exactly that); besides, most
> security-conscious folks don't just upgrade blindly, so we can rely on them
> reading the release notes or announcements, and we can make this very clear
> in our announcements and in our installation guides.
> 
> On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxiln...@gmail.com> wrote:
> 
>> Grammar correction:
>> 
>> 
>> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxiln...@gmail.com> wrote:
>> 
>>> I have this at the end of the email too, but in case folks don't read
>>> until the end, quoting Maxime from the use-case blog [1]:
>>> 
>>> "I think people often ask ‘how do I contribute to open source?’, ‘I've
>>> got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the
>>> very simplest thing that you can do is just say, ‘my organization gets real
>>> value from this piece of software.’ There are a bunch of ways to let the
>>> people know about it – and now Scarf is there. If your organization is
>>> getting a lot of value from a piece of open source software, make sure the
>>> devs know about it."
>>> 
>>> What kind of edge cases are you thinking about? I don't think it makes
>>> sense to have "opt-in" at all. Since the goal is to collect data from most
>>> Airflow installations, except those that don't want to share it,
>>> "opt-out" is the only way to maximize coverage. As long as we don't
>>> collect any PII, this is compliant as well.
>>> 
>>> Imagine someone learning Airflow: if they had to opt in via a config,
>>> they wouldn't even know or care about it, so we would lose most of the
>>> data. I understand why some orgs & individuals may want to opt out.
>>> 
>>> Scarf provides tracking pixels (essentially an HTML image tag) that you
>>> can place in your website or product to track visitors to that URL. If
>>> there were any concerns about privacy, the ASF wouldn't have approved it
>>> at all.
>>> 
>>> A few key details to note about the pixel:
>>> 
>>> 
>>>   - No PII is tracked… Scarf does not capture/retain IP information…
>>>   this information is discarded by the platform upon processing/aggregating
>>>   - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
>>>   these users will not be tracked whatsoever.
>>> 
>>> 
>>> All the ASF projects I had listed (whether they use Scarf gateway or
>>> Scarf pixel in product) are using opt-out.
>>> 
>>>> 1. Short opt-in period before opt-out. Test this feature with users who
>>>> trust it and, if it works well, make it public. I think it's wise to handle
>>>> edge cases and configure collected data more accurately.
>>> 
>>> 
>>> 
>>>> 2. It should not affect anything if access to the internet is restricted
>>>> which is the default for many companies.
>>>
>>> It would be a pixel in the webserver, so it should affect nothing at all,
>>> even in an air-gapped environment.
>>> 
>>> 
>>> 
>>> 100% agreed on the below:
>>> 
>>>> I think we have a very good blueprint to follow, including at least 5
>>>> other ASF projects that also passed the review of privacy@asf. And while
>>>> I understand (and concur with) the urge for opt-in by default coming from
>>>> the consumer market (where it makes perfect sense), Airflow is not
>>>> consumer software and is used in "corporate environments", which have
>>>> slightly different expectations and a broad assumption that the company
>>>> can make decisions on such telemetry on behalf of the employees using it.
>>> 
>>> 
>>> Couldn't agree more. We shouldn't collect any data that hampers
>>> security (and we should aim for exactly that); besides, most
>>> security-conscious folks don't just upgrade blindly, so we can rely on
>>> them reading the release notes or announcements, and we can make this
>>> very clear in our announcements and in our installation guides.
>>> 
>>>> We should assume that those who deploy and upgrade Airflow actually read
>>>> and take into account what is written in the release notes, especially if
>>>> they have security guys breathing down their necks, just as we have to
>>>> assume they follow CVE announcements about fixed security issues. If we
>>>> are very straightforward and upfront about the change and explain very
>>>> clearly how to opt out, I don't see a big problem with opt-out.
>>> 
>>> 
>>> 
>>> To be clear, the data we gather here should help all consumers without
>>> violating any regulations. I will quote Maxime from the use-case doc [1]:
>>> 
>>> "*Another Form of Contributing*
>>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
>>> got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the
>>> very simplest thing that you can do is just say, ‘my organization gets real
>>> value from this piece of software.’ There are a bunch of ways to let the
>>> people know about it – and now Scarf is there. If your organization is
>>> getting a lot of value from a piece of open source software, make sure the
>>> devs know about it.”"
>>> 
>>> 
>>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
>>> 
>>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxe...@apache.org> wrote:
>>> 
>>>> Hi Jarek!
>>>> 
>>>> I understand the reasons for opt-out from the project's point of view. I
>>>> just imagined the situation where an upgrade happens and data suddenly
>>>> starts flowing to some third-party service; that's the view from the user
>>>> side at some big company.
>>>> 
>>>> There could be good alternatives to handle this:
>>>> 1. Short opt-in period before opt-out. Test this feature with users who
>>>> trust it and, if it works well, make it public. I think it's wise to handle
>>>> edge cases and configure collected data more accurately.
>>>> 2. Explicitly warn about this feature so that it does not go unnoticed,
>>>> just to reduce possible frustration.
>>>> 
>>>> Just some personal thoughts for discussion (:
>>>> 
>>>> --
>>>> ,,,^..^,,,
>>>> 
>>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> it has to be:
>>>>> 
>>>>>> 1. Opt-in by default to not trigger security guys about new unplanned
>>>>>> activity after regular upgrade.
>>>>>> 
>>>>> 
>>>>> That's a very good point about triggering security, Alexander, but I am
>>>>> not so sure it means that we "have to" do opt-in. There are other ways of
>>>>> communicating with the "deployment managers" who install and upgrade
>>>>> Airflow, e.g. release notes, blogs, our social media, Slack
>>>>> announcements, etc. We have plenty of channels we can use to communicate
>>>>> the change.
>>>>> 
>>>>> I think we have a very good blueprint to follow, including at least 5
>>>>> other ASF projects that also passed the review of privacy@asf. And while
>>>>> I understand (and concur with) the urge for opt-in by default coming from
>>>>> the consumer market (where it makes perfect sense), Airflow is not
>>>>> consumer software and is used in "corporate environments", which have
>>>>> slightly different expectations and a broad assumption that the company
>>>>> can make decisions on such telemetry on behalf of the employees using it.
>>>>> 
>>>>> We should assume that those who deploy and upgrade Airflow actually read
>>>>> and take into account what is written in the release notes, especially if
>>>>> they have security guys breathing down their necks, just as we have to
>>>>> assume they follow CVE announcements about fixed security issues. If we
>>>>> are very straightforward and upfront about the change and explain very
>>>>> clearly how to opt out, I don't see a big problem with opt-out.
>>>>> 
>>>>> We should of course check with privacy@a.o (but I've spent a good deal of
>>>>> time reading the Superset and other use cases and explanations in detail
>>>>> to make a better-informed decision), and it looks like they also went the
>>>>> opt-out way and got cleared by privacy@a.o. And if we cannot reach
>>>>> consensus, we should, as usual, make a voting decision on it (because
>>>>> yes, it is an important decision), but after reading and understanding
>>>>> why others also did it, for me personally opt-out is a good path.
>>>>> 
>>>>> Also because it will increase the amount of data we gather, and in our
>>>>> case, counterintuitively, it will be even better for privacy and
>>>>> corporate anonymity, because the more data we get, the more difficult it
>>>>> will be to get any non-statistical/non-aggregated insight from it.
>>>>> Imagine if only a few corporate users enable it consciously: then we
>>>>> would be able to draw many more conclusions if we found out who they are
>>>>> than if everyone has it enabled by default.
>>>>> 
>>>>> That's my take on it, but again, it's up to us to vote. For me opt-in is
>>>>> not a "has to", and I am rather for opt-out.
>>>>> 
>>>>> J.
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> 
>>>>>>> I want to propose gathering telemetry for Airflow installations. As the
>>>>>>> Airflow community, we have been relying heavily on the yearly Airflow
>>>>>>> Survey and anecdotes to answer a few key questions about Airflow usage.
>>>>>>> Questions like the following:
>>>>>>> 
>>>>>>> 
>>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
>>>>>>>   whether people have primarily made the jump from version X to version
>>>>>>>   Y)
>>>>>>>   - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
>>>>>>>   - What Python version is being used?
>>>>>>>   - Which Executor is being used?
>>>>>>>   - Approximately how many people out there in the world are installing
>>>>>>>   Airflow
>>>>>>> 
>>>>>>> 
>>>>>>> There is a solution that should help answer these questions: Scarf [1].
>>>>>>> The ASF already approves Scarf [2][3], and it is already used by other
>>>>>>> ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
>>>>>>> DevLake, and Skywalking, as it follows GDPR and other regulations.
>>>>>>> 
>>>>>>> Similar to Superset, we probably can use it as follows:
>>>>>>> 
>>>>>>> 
>>>>>>>   1. Install the `scarf js` npm package and bundle it in the Webserver.
>>>>>>>   When the package is downloaded & Airflow webserver is opened, metadata
>>>>>>>   is recorded to the Scarf dashboard.
>>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of docker
>>>>>>>   containers. While it’s possible people go around this gateway, we can
>>>>>>>   probably configure and encourage most traffic to go through these
>>>>>>>   gateways.
>>>>>>> 
>>>>>>> While Scarf does not store any personally identifying information from
>>>>>>> SDK telemetry data, it does send various bits of IP-derived information
>>>>>>> as outlined here [7]. This data should be made as transparent as
>>>>>>> possible by granting dashboard access to the Airflow PMC and any other
>>>>>>> relevant means of sharing/surfacing it that we encounter (Town Hall,
>>>>>>> Slack, Newsletter etc).
>>>>>>> 
>>>>>>> The following case studies are worth reading:
>>>>>>> 
>>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
>>>>>>>   Maxime)
>>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>>>>>>> 
>>>>>>> Similar to them, this could help in various ways that come with using
>>>>>>> data for decision-making. With clear guidelines on "how to opt-out"
>>>>>>> [8][9][10] & "what data is being collected" on the Airflow website, this
>>>>>>> can be beneficial to the entire community as we would be making more
>>>>>>> informed decisions.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Kaxil
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://about.scarf.sh/
>>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>>>>>>> [3] https://privacy.apache.org/faq/committers.html
>>>>>>> [4] https://github.com/apache/superset/issues/25639
>>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>>>>>>> [6] https://about.scarf.sh/scarf-gateway
>>>>>>> [7] https://about.scarf.sh/privacy-policy
>>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
