hudi-bot opened a new issue, #14653:
URL: https://github.com/apache/hudi/issues/14653

   Incremental Pull should also stop returning records from the historical dataset 
once they are deleted from the latest snapshot.
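
To make the requested behaviour concrete, here is a minimal sketch in plain Python. It is a toy model of a commit timeline, not Hudi's API or storage layout: the `Commit` class, the `incremental_pull` function and the `honor_later_deletes` flag are all hypothetical names introduced for illustration. It only shows the difference between an incremental pull that ignores later deletes and one that suppresses rows whose key was deleted by a later commit.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One commit on a toy timeline (hypothetical, not a Hudi class)."""
    instant: str                            # sortable commit time, e.g. "003"
    upserts: Dict[str, str]                 # record key -> payload
    deletes: FrozenSet[str] = frozenset()   # record keys deleted in this commit


def incremental_pull(
    timeline: List[Commit],
    begin: str,
    end: Optional[str] = None,
    honor_later_deletes: bool = False,
) -> List[Tuple[str, str, str]]:
    """Return (instant, key, payload) rows committed in the range (begin, end].

    With honor_later_deletes=True, a row is suppressed when its key is deleted
    by any commit later than the row's own commit, even one after `end`.
    A key that is re-inserted after its delete stays visible from then on.
    """
    # Latest delete instant per key, across the whole timeline.
    last_delete: Dict[str, str] = {}
    for commit in timeline:
        for key in commit.deletes:
            last_delete[key] = max(commit.instant, last_delete.get(key, ""))

    rows: List[Tuple[str, str, str]] = []
    for commit in sorted(timeline, key=lambda c: c.instant):
        if commit.instant <= begin:
            continue
        if end is not None and commit.instant > end:
            continue
        for key, payload in sorted(commit.upserts.items()):
            if honor_later_deletes and last_delete.get(key, "") > commit.instant:
                continue  # the key was forgotten after this row was written
            rows.append((commit.instant, key, payload))
    return rows


# Toy timeline: "alice" is written twice, then deleted in commit 003.
timeline = [
    Commit("001", {"alice": "v1", "bob": "v1"}),
    Commit("002", {"alice": "v2"}),
    Commit("003", {}, frozenset({"alice"})),
]

# Behaviour described as the problem: historical rows for "alice" still come back.
current = incremental_pull(timeline, begin="000", end="002")

# Behaviour being requested: the later delete hides "alice" in the old range too.
requested = incremental_pull(timeline, begin="000", end="002", honor_later_deletes=True)
```

In this toy model `current` still contains both historical rows for `alice`, while `requested` contains only the row for `bob`. How such a filter could be implemented efficiently on real commit files is left open here.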
   
    
   
   Context from the mailing list email:
   
    
   
   Hello,
   
   I am Siva's colleague and I am working on the problem below as well.
   
   I would like to describe what we are trying to achieve with Hudi, as well as 
our current way of working and our GDPR and "Right To Be Forgotten" (RTBF) 
compliance policies.
   
   Our requirements :
   - We wish to apply a strict interpretation of the RTBF.  In other words, 
when we remove a person's data, it should be throughout the historical data and 
not just the latest snapshot.
   - We wish to use Hudi to reduce our storage requirements using upserts and 
don't want to have duplicates between commits.
   - We wish to retain history for persons who have not requested to be 
forgotten and therefore we do not want to delete commit files from the history 
as some have proposed.
   
   We have tried a couple of solutions, but so far without success :
   - replay the data omitting the data of the persons who have requested to be 
forgotten.  We wanted to manipulate the commit times to rebuild the history.
   We found that we couldn't manipulate the commit times and retain the history.
   
   - replay the data omitting the data of the persons who have requested to be 
forgotten, but writing to a date-based partition folder using the 
"partitionpath" parameter.
   We found that upserts written across different partitionpath folders do 
not skip data that is unchanged between two commit dates, as happens with the 
default commit file system, so this technique would neither save storage nor 
speed up our processing.
   
   So basically we would like to find a way to apply a strict RTBF for GDPR 
compliance, maintain history and time travel (over a large history), and save 
storage space using Hudi.
   
   Can anyone see a way to achieve this?
   
   Kind Regards,
   David Rosalia
   
    
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1212
   - Type: Improvement
   - Affects version(s):
     - 0.9.0

