This is a whole area of research that is really cool. Something I read a while ago and still think about sometimes is "Explaining Outputs in Modern Data Analytics" (writeup here: https://blog.acolyer.org/2017/02/01/explaining-outputs-in-modern-data-analytics/ ).

Kenn

On Wed, Jul 7, 2021 at 8:18 AM Reuven Lax <[email protected]> wrote:

> One interesting area to explore is lineage - can every output record from
> a pipeline be tracked back to its source input record. This gets
> interesting with aggregations, where multiple input records combine to
> create a single output record.
>
> On Wed, Jul 7, 2021 at 6:11 AM Guillermo Rodríguez Cano <[email protected]>
> wrote:
>
>> Hello!
>>
>> I am wondering if there is anyone interested in exploring the topic of
>> privacy (and potentially security) in the Apache Beam unified programming
>> model.
>>
>> I have been a user of Apache Beam mostly via Tensorflow Transform but
>> also directly and followed its evolution and development early on.
>> However, given my research background, I have always wondered about the
>> topics of privacy and security when processing large amounts of data with,
>> for example, Apache Beam.
>>
>> There is some work on the topic of differential privacy and how to
>> achieve that practically.
>> But I would like to explore and go beyond as I think the problem is much
>> broader and requires a wider analysis to have it addressed in different
>> angles or directions.
>>
>> Is there anyone in this list interested to discuss the topic and explore
>> ideas? I would be happy to coordinate some special interest group if that
>> makes it easier.
>> Or maybe you know someone who would be interested or point me to where to
>> head :)
>>
>> /Guillermo
>>
>
