Hi,

A quick recap of the relevant NiFi concepts:

   - The Content Repository stores FlowFile (FF) contents.
   - Data Provenance events, which are used to check the lineage and history
   of FFs, only store pointers to FF contents (not the contents themselves).
   - So the data can be deleted while the lineage/data provenance history
   remains accessible.

There is a lot of in-depth material on the subject here, but the 3 points
above summarize it:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html


*DATA - content is only persisted in 2 scenarios:*

   - while the flow file is still being processed by your flow.
   - archived in the content repository for 12h by default (to allow access
   to the contents when inspecting data provenance/lineage).
   
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418


*PROVENANCE EVENTS (LINEAGE) OF DATA:*

   - contain only provenance attributes, the FF uuid, etc., but NO CONTENTS,
   and are available for 24h by default unless the retention is changed in
   the config files.
   
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties



So as you can see, by default both the archived contents and the provenance
events expire within a day, which is fast enough that I don't think GDPR is
a problem or that any action is needed.
One can always increase the retention of just the data provenance events to
months, 1 year or whatever suits, but the data itself is long gone anyway.
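
For reference, the two retention windows mentioned above are controlled in
nifi.properties. The fragment below is only a sketch: the property names are
the ones described in the administration guide linked above, the 12h and 24h
values are the defaults discussed in this mail, and the size limit is an
arbitrary illustrative value (the default differs between NiFi versions, so
check your own install).

```properties
# Content repository archive: how long FF contents stay retrievable
# after the FF has left the flow.
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours

# Provenance repository: how long lineage events (no contents) are kept.
# Raise this to e.g. "365 days" to keep lineage for a year.
nifi.provenance.repository.max.storage.time=24 hours

# Events are also aged off once this size is reached, so a longer
# storage time needs a matching size limit (illustrative value only).
nifi.provenance.repository.max.storage.size=50 GB
```

Note that raising only the provenance settings does not keep the contents
around any longer; those still age off with the content archive.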

Best Regards,
*Emanuel Oliveira*



On Thu, Jan 30, 2020 at 2:26 PM u...@moosheimer.com <u...@moosheimer.com>
wrote:

> Hi,
>
> > GDPR doesnt need milisecond realtime deletion right ?)
> right.
>
> > since inbound FFs have
> >    normally hundreds, thousands of records that will need to split,
> aggregate,
> >    in complex flow file, implementing a clean
> It depends on your application. Not everyone uses NiFi for IoT and
> therefore a single record may be included.
>
> > In my opinion your answer to business/management gate keepers is that
> data
> > will be stored on data provenance for 24h (default) which can be
> > configured, and that
>
> This is not necessarily the point of the Data Lineage, that the
> information is deleted after 24 hours (or whatever is configured).
> If Data Lineage is needed (revision, legal requirements etc.), then
> deleting the data after a defined time is not an option.
>
> This is the reason why Atlas supports it.
>
> Best Regards,
> Uwe
>
> Am 30.01.2020 um 15:06 schrieb Emanuel Oliveira:
> > Hi, dont think makes sense an api for atomic records:
> >
> >    1. one configure retention od data provenance (default 24h is "good
> >    enough" GDPR doesnt need milisecond realtime deletion right ?)
> >
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> >    2. even if there would be one api to delete FF's with an attribute =
> >    <some id>, that would normally be useless as well, since inbound FFs
> have
> >    normally hundreds, thousands of records that will need to split,
> aggregate,
> >    in complex flow file, implementing a clean up an nano atomic level
> would be
> >    to hard and extra effort not needed, since your target single record
> would
> >    surely be part of multiple FF UUIDs, some only holding your record,
> but mot
> >    surefly will have 100s, 100s of other records including your record
> >    somewhere on the middle.
> >
> >
> > In my opinion your answer to business/management gate keepers is that
> data
> > will be stored on data provenance for 24h (default) which can be
> > configured, and that
> >
> >
> > Best Regards,
> > *Emanuel Oliveira*
> >
> >
> >
> > On Thu, Jan 30, 2020 at 1:54 PM u...@moosheimer.com <u...@moosheimer.com>
> > wrote:
> >
> >> Dear NiFi developer team,
> >>
> >> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
> >> environment of NiFi, so there is often no need to use Atlas.
> >>
> >> When using NiFi with customer data a problem arises.
> >> The problem is the GDPR requirement that a user has the right to be
> >> forgotten. Unfortunately, I can't find any API call or information on
> >> how to delete individual user data from the NiFi Provenance Repository
> >> based on a user-defined attribute and its defined characteristics.
> >>
> >> A delete request like "delete all data and dependencies where the
> >> attribute XYZ has the value 123" is currently not possible to my
> knowledge.
> >>
> >> My questions are:
> >> Is this actually possible and how? And if not, is it planned?
> >>
> >> Thanks
> >> Uwe
> >>
>
>
