Hi Folks,
At LinkedIn we need to comply with GDPR for a large part of our data, and
an important part of that is being sure we have completely deleted the
data a user asked us to delete within a certain period of time.
The approach we have come up with so far is to:
1. Record the segment creation time somewhere (not decided yet, maybe the
index commit user data, maybe some other place outside of Lucene)
2. Create a new merge policy which delegates most operations to a normal MP,
like TieredMergePolicy, and then adds extra single-segment merges (merging 1
segment into 1 segment, which basically only purges the deleted documents)
if it finds any segment that is about to violate the GDPR time frame.
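
To make step 2 a bit more concrete, here is a rough sketch of the selection
logic only, written in plain Java without any Lucene types. Everything in it
is hypothetical (the SegmentAge class, the method name, the safety margin
parameter); in a real implementation this check would sit inside the wrapping
merge policy's findMerges and produce one single-segment merge per selected
segment.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class DeadlineMergeSketch {

    /** Hypothetical stand-in for per-segment metadata; not a Lucene class. */
    static final class SegmentAge {
        final String name;
        final long creationTimeMillis; // assumed to be recorded as in step 1
        final int deletedDocs;         // docs marked deleted but not yet purged

        SegmentAge(String name, long creationTimeMillis, int deletedDocs) {
            this.name = name;
            this.creationTimeMillis = creationTimeMillis;
            this.deletedDocs = deletedDocs;
        }
    }

    /**
     * Returns the names of segments that should get an extra 1-to-1 merge:
     * they still carry deletions and their age has reached
     * (deadline - safetyMargin). Segments without deletions are skipped
     * because rewriting them would purge nothing.
     */
    static List<String> segmentsNeedingRewrite(List<SegmentAge> segments,
                                               long nowMillis,
                                               Duration deadline,
                                               Duration safetyMargin) {
        long thresholdMillis = deadline.minus(safetyMargin).toMillis();
        List<String> result = new ArrayList<>();
        for (SegmentAge segment : segments) {
            long ageMillis = nowMillis - segment.creationTimeMillis;
            if (segment.deletedDocs > 0 && ageMillis >= thresholdMillis) {
                result.add(segment.name);
            }
        }
        return result;
    }
}
```

Using the segment age rather than the time of each delete is conservative: a
segment older than the threshold is rewritten as soon as it has any deletion,
so no deletion can stay in it longer than the deadline.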

So here are my questions:
1. Is there a better/existing way to do this?
2. I would like to contribute such a merge policy directly to Lucene,
since I think GDPR is a fairly common requirement. Do people feel this
would be useful to have?
3. It would also be nice if IndexWriter could store the segment creation
time in the index directly (maybe written to SegmentInfo?). I can try to do
that, but would like to ask whether there are any objections?

Best
Patrick
