Hi Folks,

At LinkedIn we need to comply with GDPR for a large part of our data. An important part of that is being sure that data a user has asked us to delete is completely removed within a certain period of time.

The approach we have come up with so far is:

1. Record the segment creation time somewhere (not decided yet; maybe the index commit user info, maybe some place outside of Lucene).
2. Create a new merge policy that delegates most operations to a normal MP, such as TieredMergePolicy, and adds extra single-segment merges (merging 1 segment into 1 segment, which basically only applies the deletions) whenever it finds a segment that is about to violate the GDPR time frame.
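
To make the check in step 2 concrete, here is a rough sketch of the "is this segment about to violate the time frame?" decision in plain Java, with no Lucene dependency. Everything in it is hypothetical: the names GdprDeadlineCheck, SegmentState and segmentsDue, the safety margin parameter, and the choice to skip segments that have no deletions are all assumptions made for illustration, not existing Lucene API or a settled design.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: picks segments that need a deletion-only rewrite. */
final class GdprDeadlineCheck {
    private GdprDeadlineCheck() {}

    /** Minimal stand-in for what the merge policy would know about a segment. */
    static final class SegmentState {
        final String name;
        final Instant createdAt; // the creation time recorded in step 1
        final int delCount;      // number of deleted documents still present in the segment

        SegmentState(String name, Instant createdAt, int delCount) {
            this.name = name;
            this.createdAt = createdAt;
            this.delCount = delCount;
        }
    }

    /**
     * Returns the names of segments that should get a single-segment merge now.
     *
     * @param maxAge the allowed time frame, measured from segment creation
     * @param margin assumed safety margin so the rewrite finishes before the deadline
     */
    static List<String> segmentsDue(List<SegmentState> segments, Instant now,
                                    Duration maxAge, Duration margin) {
        List<String> due = new ArrayList<>();
        for (SegmentState s : segments) {
            if (s.delCount == 0) {
                continue; // assumption: nothing to purge, so no rewrite is needed
            }
            Instant rewriteBy = s.createdAt.plus(maxAge).minus(margin);
            if (!now.isBefore(rewriteBy)) {
                due.add(s.name);
            }
        }
        return due;
    }
}
```

In the merge policy, each name returned here would become one extra single-segment merge on top of whatever the delegate MP selects. Measuring from segment creation is conservative, since any delete against a segment necessarily arrives after that segment was created.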

So here are my questions:

1. Is there a better or existing way to do this?
2. I would like to contribute such a merge policy directly to Lucene, since I think GDPR is a fairly common requirement. Do people feel this is necessary or not?
3. It would also be nice if IndexWriter could store the segment creation time in the index directly (maybe written to SegmentInfo?). I can try to do that, but would like to ask whether there are any objections.

Best,
Patrick