+1, this looks like a good tool which could be super helpful.

* We should have some transparency into the data that is collected or sent
* We should have an option to opt out
Thanks & Regards,
Amogh Desai

On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <[email protected]> wrote:

> +1 to this. It would be really useful. As long as we can opt out, I think
> we’re good.
>
> Best,
> Wei
>
> > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <[email protected]> wrote:
> >
> > Grammar Correction:
> >
> >> We should assume that those who deploy and upgrade Airflow - actually
> >> read and take into account what is written in the release notes -
> >> especially if they have security guys breathing down their necks,
> >> similarly as we have to assume they follow CVE announcements about
> >> security issues fixed. If we are very straightforward and out-going
> >> about the change, inform very clearly how to opt-out, I don't see a
> >> big problem with opt-out.
> >
> > I couldn't agree more; even though we shouldn't collect any data that
> > hampers security (and we should aim to do the same), most
> > security-concerned folks don't just upgrade, and we can rely on them
> > regarding release notes or announcements and we can make it very clear
> > in our announcements too; and in our installation guides.
> >
> > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <[email protected]> wrote:
> >
> >> Grammar correction:
> >>
> >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <[email protected]> wrote:
> >>
> >>> Have this at the end of the email too: but if folks don't read until
> >>> the end and quoting Maxime from the use-case blog[1]:
> >>>
> >>> "I think people often ask ‘how do I contribute to open source?’,
> >>> ‘I've got to get into the code’, or ‘I’ve got to be an engineer.’
> >>> Actually, the very simplest thing that you can do is just say, ‘my
> >>> organization gets real value from this piece of software.’ There are
> >>> a bunch of ways to let the people know about it – and now Scarf is
> >>> there. If your organization is getting a lot of value from a piece of
> >>> open source software, make sure the devs know about it."
> >>>
> >>> What kind of edge cases are you thinking about? I don't think it
> >>> makes sense to have "opt-in" at all. As the goal is to collect data
> >>> for most Airflow installations except for those that don't want to
> >>> give data, "opt-out" is the only way to maximize it. As long as we
> >>> don't collect any PII data, this is in compliance as well.
> >>>
> >>> Imagine someone learning Airflow, if they have to opt-in via a
> >>> config, they wouldn't even know or care about it, hence us losing
> >>> most of the data. I understand why some orgs & individuals may want
> >>> to opt-out.
> >>>
> >>> Scarf provides tracking pixels (essentially an HTML image tag) that
> >>> you can place in your website or product to track visitors to that
> >>> URL. If there were any concerns about Privacy, ASF wouldn't have
> >>> approved it at all.
> >>>
> >>> A few key details to note about the pixel:
> >>>
> >>> - No PII is tracked… Scarf does not capture/retain IP information…
> >>>   this information is discarded by the platform upon
> >>>   processing/aggregating
> >>> - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
> >>>   these users will not be tracked whatsoever.
> >>>
> >>> All the ASF projects I had listed (whether they use Scarf gateway or
> >>> Scarf pixel in product) are using opt-out.
> >>>
> >>>> 1. Short opt-in period before opt-out. Test this feature with users
> >>>> who trust and if it works great - make it public. I think it's wise
> >>>> to handle edge cases and configure collected data more accurately.
> >>>
> >>> It would be a pixel in the webserver, should affect nothing at all
> >>> even in an air-gapped environment.
> >>>
> >>>> 2. It should not affect anything if access to the internet is
> >>>> restricted which is default for many companies.
> >>>
> >>> 100% agreed on the below:
> >>>
> >>>> I think we have a very good blueprint to follow including at least 5
> >>>> other ASF projects that also passed the review of the privacy@asf.
> >>>> And while I understand (and concur) the urge for opt-in by default
> >>>> coming from consumer market (where it makes perfect sense) Airflow
> >>>> is not a consumer software and is used in "corporate environment"
> >>>> which has slightly different expectations and broad assumption that
> >>>> the company can make decisions on such telemetry on behalf of the
> >>>> employees using it.
> >>>
> >>> Couldn't agree more; even though we shouldn't collect any data that
> >>> hampers security (and we should aim to do the same), most
> >>> security-concerned folks don't just upgrade, and we can rely on them
> >>> regarding release notes or announcements and we can make it very
> >>> clear in our announcements too; and in our installation guides.
> >>>
> >>>> We should assume that those who deploy and upgrade Airflow -
> >>>> actually read and take into account what is written in the release
> >>>> notes - especially if they have security guys breathing down their
> >>>> necks, similarly as we have to assume they follow CVE announcements
> >>>> about security issues fixed. If we are very straightforward and
> >>>> out-going about the change, inform very clearly how to opt-out, I
> >>>> don't see a big problem with opt-out.
> >>>
> >>> To be clear, the collection of data, or at least the data we should
> >>> gather here should help all the consumers without violating any
> >>> regulations.
> >>> I will quote Maxime's quote in the use-case doc [1]
> >>>
> >>> "*Another Form of Contributing*
> >>> “I think people often ask ‘how do I contribute to open source?’,
> >>> ‘I've got to get into the code’, or ‘I’ve got to be an engineer.’
> >>> Actually, the very simplest thing that you can do is just say, ‘my
> >>> organization gets real value from this piece of software.’ There are
> >>> a bunch of ways to let the people know about it – and now Scarf is
> >>> there. If your organization is getting a lot of value from a piece of
> >>> open source software, make sure the devs know about it.”"
> >>>
> >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>
> >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <[email protected]> wrote:
> >>>
> >>>> Hi Jarek!
> >>>>
> >>>> I understand the reasons for opt-out from a project view. I just
> >>>> suddenly imagined the situation when an upgrade happens and here
> >>>> comes the data to some third party service - that's a view from a
> >>>> user side of some big company.
> >>>>
> >>>> There could be good alternatives to handle this:
> >>>> 1. Short opt-in period before opt-out. Test this feature with users
> >>>> who trust and if it works great - make it public. I think it's wise
> >>>> to handle edge cases and configure collected data more accurately.
> >>>> 2. Explicitly warn about this feature somehow so that it does not go
> >>>> unnoticed. Just to reduce possible frustration.
> >>>>
> >>>> Just some personal thoughts for discussion (:
> >>>>
> >>>> --
> >>>> ,,,^..^,,,
> >>>>
> >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <[email protected]> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> it has to be:
> >>>>>
> >>>>>> 1. Opt-in by default to not trigger security guys about new
> >>>>>> unplanned activity after regular upgrade.
> >>>>>>
> >>>>>
> >>>>> That's a very good point about security triggering, Alexander, but
> >>>>> I am not so sure it means that we "have to" do opt-in. There are
> >>>>> other ways of communicating with the "deployment managers" who
> >>>>> install and upgrade airflow - i.e. release notes, blogs, social
> >>>>> media of ours, slack announcements etc. We have plenty of channels
> >>>>> we can use to communicate the change.
> >>>>>
> >>>>> I think we have a very good blueprint to follow including at least
> >>>>> 5 other ASF projects that also passed the review of the
> >>>>> privacy@asf. And while I understand (and concur) the urge for
> >>>>> opt-in by default coming from consumer market (where it makes
> >>>>> perfect sense) Airflow is not a consumer software and is used in
> >>>>> "corporate environment" which has slightly different expectations
> >>>>> and broad assumption that the company can make decisions on such
> >>>>> telemetry on behalf of the employees using it.
> >>>>>
> >>>>> We should assume that those who deploy and upgrade Airflow -
> >>>>> actually read and take into account what is written in the release
> >>>>> notes - especially if they have security guys breathing down their
> >>>>> necks, similarly as we have to assume they follow CVE announcements
> >>>>> about security issues fixed. If we are very straightforward and
> >>>>> out-going about the change, inform very clearly how to opt-out, I
> >>>>> don't see a big problem with opt-out.
> >>>>>
> >>>>> We should of course check with [email protected] (but I've spent a
> >>>>> good deal of time reading the Superset and other use case and
> >>>>> explanation in detail to make a better informed decision) - and it
> >>>>> looks like they also went opt-out way and got cleared by
> >>>>> [email protected].
> >>>>> And if we cannot reach consensus, we should - as usual - make a
> >>>>> voting decision on it (because yes, it is an important decision),
> >>>>> but - after reading and understanding why others also did it - for
> >>>>> me personally, opt-out is a good path.
> >>>>>
> >>>>> Also because it will rather increase the amount of data to gather,
> >>>>> and in our case - counterintuitively - it will be even better for
> >>>>> privacy and corporate anonymity, because the more data we get, the
> >>>>> more difficult it will be to get any non-statistical/non-aggregated
> >>>>> insight from it. Imagine if only a few corporate users enable it
> >>>>> consciously - then we will be able to draw many more conclusions if
> >>>>> we find out who they are, than if everyone has it enabled by
> >>>>> default.
> >>>>>
> >>>>> That's my take on it - but again, it's up to us to vote, for me
> >>>>> opt-in is not "has to", and I am rather for opt-out.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>>> I want to propose gathering telemetry for Airflow installations.
> >>>>>>> As the Airflow community, we have been relying heavily on the
> >>>>>>> yearly Airflow Survey and anecdotes to answer a few key questions
> >>>>>>> about Airflow usage. Questions like the following:
> >>>>>>>
> >>>>>>> - Which versions of Airflow are people installing/using now (i.e.
> >>>>>>>   whether people have primarily made the jump from version X to
> >>>>>>>   version Y)
> >>>>>>> - Which DB is used as the Metadata DB and which version e.g. Pg 14?
> >>>>>>> - What Python version is being used?
> >>>>>>> - Which Executor is being used?
> >>>>>>> - Approximately how many people out there in the world are
> >>>>>>>   installing Airflow
> >>>>>>>
> >>>>>>> There is a solution that should help answer these questions:
> >>>>>>> Scarf [1].
> >>>>>>> The ASF already approves Scarf [2][3], and it is already used by
> >>>>>>> other ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo
> >>>>>>> Kubernetes, DevLake, Skywalking, as it follows GDPR and other
> >>>>>>> regulations.
> >>>>>>>
> >>>>>>> Similar to Superset, we probably can use it as follows:
> >>>>>>>
> >>>>>>> 1. Install the `scarf js` npm package and bundle it in the
> >>>>>>>    Webserver. When the package is downloaded & Airflow webserver
> >>>>>>>    is opened, metadata is recorded to the Scarf dashboard.
> >>>>>>> 2. Utilize the Scarf Gateway [6], which we can use in front of
> >>>>>>>    docker containers. While it’s possible people go around this
> >>>>>>>    gateway, we can probably configure and encourage most traffic
> >>>>>>>    to go through these gateways.
> >>>>>>>
> >>>>>>> While Scarf does not store any personally identifying information
> >>>>>>> from SDK telemetry data, it does send various bits of IP-derived
> >>>>>>> information as outlined here [7]. This data should be made as
> >>>>>>> transparent as possible by granting dashboard access to the
> >>>>>>> Airflow PMC and any other relevant means of sharing/surfacing it
> >>>>>>> that we encounter (Town Hall, Slack, Newsletter etc).
> >>>>>>>
> >>>>>>> The following case studies are worth reading:
> >>>>>>>
> >>>>>>> 1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>>>>>    (From Maxime)
> >>>>>>> 2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >>>>>>>
> >>>>>>> Similar to them, this could help in various ways that come with
> >>>>>>> using data for decision-making.
> >>>>>>> With clear guidelines on "how to opt-out" [8][9][10] & "what data
> >>>>>>> is being collected" on the Airflow website, this can be
> >>>>>>> beneficial to the entire community as we would be making more
> >>>>>>> informed decisions.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Kaxil
> >>>>>>>
> >>>>>>> [1] https://about.scarf.sh/
> >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> >>>>>>> [4] https://github.com/apache/superset/issues/25639
> >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
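
To make the "transparency" and "opt-out" points in this thread a bit more concrete, here is a rough sketch of what opt-out semantics could look like: telemetry is on unless the deployment disables it, and the payload is limited to the questions listed in the proposal (Airflow version, Python version, executor, metadata DB). Nothing here is an agreed design or existing Airflow code. The function names, the `config_enabled` flag, and the payload keys are made up for illustration, and the environment variable names `SCARF_ANALYTICS` and `DO_NOT_TRACK` are assumptions modeled on the opt-out documentation linked as [10]; please check that page for the exact names.

```python
import platform
from typing import Dict, Mapping, Optional

# Assumed opt-out switches for this sketch (names not confirmed in this thread):
# each maps an environment variable to the values that disable telemetry.
OPT_OUT_VARS = {
    "SCARF_ANALYTICS": {"false", "0", "no"},
    "DO_NOT_TRACK": {"1", "true", "yes"},
}


def telemetry_enabled(env: Mapping[str, str], config_enabled: bool = True) -> bool:
    """Opt-out semantics: enabled unless the config flag or an env var says no."""
    if not config_enabled:  # hypothetical config switch
        return False
    for name, off_values in OPT_OUT_VARS.items():
        if env.get(name, "").strip().lower() in off_values:
            return False
    return True


def build_payload(airflow_version: str, executor: str,
                  db_engine: str, db_version: str) -> Dict[str, str]:
    """Collect only the non-PII facts the proposal asks about."""
    return {
        "airflow_version": airflow_version,
        "python_version": platform.python_version(),
        "executor": executor,
        "metadata_db": f"{db_engine} {db_version}",
    }


def collect(env: Mapping[str, str], **facts: str) -> Optional[Dict[str, str]]:
    """Return the payload that would be reported, or None when opted out."""
    if not telemetry_enabled(env):
        return None
    return build_payload(**facts)
```

Calling `collect(os.environ, airflow_version=..., executor=..., db_engine=..., db_version=...)` and printing the result is one way a deployment manager could see exactly what would be sent before anything leaves the machine; the sketch deliberately does no network I/O.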
