It's not that extreme; the time frame is several weeks. However, a big segment can stay around for quite a long time if there aren't enough updates for the merge policy to pick it up.

On Tue, Nov 28, 2023, 17:14 Dongyu Xu <[email protected]> wrote:

> What is the expected grace time for the data-deletion request to take
> place?
>
> I'm not an expert on the policy, but I think something like "I need my
> data to be gone in the next 2 seconds" is unreasonable.
>
> Tony X
>
> ------------------------------
> *From:* Robert Muir <[email protected]>
> *Sent:* Tuesday, November 28, 2023 11:52 AM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: GDPR compliance
>
> I don't think there's any problem with GDPR, and I don't think users
> should be running unnecessary "optimize". GDPR just says data should
> be erased without "undue" delay. Waiting for a merge to nuke the
> deleted docs isn't "undue"; there is a good reason for it.
>
> On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai <[email protected]> wrote:
> >
> > Hi Folks,
> > At LinkedIn we need to comply with GDPR for a large part of our data,
> > and an important part of it is that we need to be sure we have
> > completely deleted the data the user requested to delete within a
> > certain period of time.
> > The approach we have come up with so far is to:
> > 1. Record the segment creation time somewhere (not decided yet, maybe
> > index commit userinfo, maybe some other place outside of Lucene)
> > 2. Create a new merge policy which delegates most operations to a
> > normal MP, like TieredMergePolicy, and then adds extra single-segment
> > merges (merge from 1 segment to 1 segment, basically only doing
> > deletion) if it finds any segment is about to violate the GDPR time
> > frame.
> >
> > So here are my questions:
> > 1. Is there a better/existing way to do this?
> > 2. I would like to contribute such a merge policy directly to Lucene,
> > since I think GDPR is more or less a common concern. I would like to
> > know whether people feel it's necessary or not.
> > 3. It would also be nice if IndexWriter could store the segment
> > creation time in the index directly (maybe write it to SegmentInfo?).
> > I can try to do that, but would like to ask whether there are any
> > objections.
> >
> > Best
> > Patrick
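
The selection step from points 1 and 2 of the original proposal can be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene code: `DeletionDeadlineSelector`, `SegmentMeta`, `maxAge` and `safetyMargin` are hypothetical names, and it assumes segment age is an acceptable stand-in for how long deleted documents may have lingered. A real merge policy would delegate normal merge selection to something like TieredMergePolicy and only add single-segment merges for the names returned here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the selection logic only; not part of Lucene. */
public class DeletionDeadlineSelector {

    /** Minimal stand-in for per-segment metadata (all fields are assumptions). */
    public static final class SegmentMeta {
        final String name;
        final Instant createdAt; // recorded segment creation time (point 1)
        final int deletedDocs;   // docs marked deleted but not yet merged away

        public SegmentMeta(String name, Instant createdAt, int deletedDocs) {
            this.name = name;
            this.createdAt = createdAt;
            this.deletedDocs = deletedDocs;
        }
    }

    /**
     * Returns the names of segments that should get a 1-to-1 rewrite:
     * they still carry deleted docs and will exceed maxAge within safetyMargin.
     */
    public static List<String> segmentsNeedingRewrite(
            List<SegmentMeta> segments, Instant now,
            Duration maxAge, Duration safetyMargin) {
        // Segments created at or before this instant are close to the deadline.
        Instant cutoff = now.minus(maxAge).plus(safetyMargin);
        List<String> result = new ArrayList<>();
        for (SegmentMeta s : segments) {
            boolean hasDeletes = s.deletedDocs > 0;
            boolean nearDeadline = !s.createdAt.isAfter(cutoff);
            if (hasDeletes && nearDeadline) {
                result.add(s.name);
            }
        }
        return result;
    }
}
```

Segments without deletions are skipped even when old, since rewriting them would reclaim nothing; the safety margin leaves time for the rewrite itself to finish before the deadline.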
