Another way is to ensure that all documents get updated on a regular
cadence whether or not the underlying data has changed, or to
regenerate the index from scratch on a regular basis. Of course, these
approaches might be more costly for an index that has an intrinsically low
update rate, but they do keep the index fresh without the need for any
special tracking.

On Tue, Nov 28, 2023, 8:45 PM Patrick Zhai <zhai7...@gmail.com> wrote:

> It's not that insane; it's about several weeks. However, a big segment can
> stay there for quite a long time if there aren't enough updates for a merge
> policy to pick it up.
>
> On Tue, Nov 28, 2023, 17:14 Dongyu Xu <dongyu...@hotmail.com> wrote:
>
>> What is the expected grace time for the data-deletion request to take
>> place?
>>
>> I'm no expert on the policy, but I think something like "I need my
>> data to be gone in the next 2 seconds" is unreasonable.
>>
>> Tony X
>>
>> ------------------------------
>> *From:* Robert Muir <rcm...@gmail.com>
>> *Sent:* Tuesday, November 28, 2023 11:52 AM
>> *To:* dev@lucene.apache.org <dev@lucene.apache.org>
>> *Subject:* Re: GDPR compliance
>>
>> I don't think there's any problem with GDPR, and I don't think users
>> should be running unnecessary "optimize". GDPR just says data should
>> be erased without "undue" delay. Waiting for a merge to nuke the
>> deleted docs isn't "undue"; there is a good reason for it.
>>
>> On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>> >
>> > Hi Folks,
>> > At LinkedIn we need to comply with GDPR for a large part of our data,
>> > and an important part of that is being sure that we have completely
>> > deleted the data a user asked us to delete within a certain period of
>> > time.
>> > The approach we have come up with so far is to:
>> > 1. Record the segment creation time somewhere (not decided yet; maybe
>> > index commit userinfo, maybe some other place outside of Lucene).
>> > 2. Create a new merge policy that delegates most operations to a normal
>> > MP, like TieredMergePolicy, and then adds extra single-segment merges
>> > (merging 1 segment into 1 segment, which basically only applies
>> > deletions) if it finds that any segment is about to violate the GDPR
>> > time frame.
>> >
>> > So here are my questions:
>> > 1. Is there a better/existing way to do this?
>> > 2. I would like to contribute such a merge policy directly to Lucene,
>> > since I think GDPR is more or less a common concern. I would like to
>> > know whether people feel it's necessary or not.
>> > 3. It would also be nice if IndexWriter could store the segment creation
>> > time in the index directly (maybe written to SegmentInfo?). I can try to
>> > do that, but would like to ask whether there are any objections.
>> >
>> > Best
>> > Patrick
>>
>>
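
For concreteness, the selection step of the merge policy described in
Patrick's original message (force a single-segment merge once a segment
that still holds deletes is about to exceed the deadline) might look
roughly like the sketch below. It is plain Java with no Lucene
dependency: SegmentAge, selectForRewrite, and the safety margin are
hypothetical names made up for illustration, not Lucene APIs, and the
creation time is assumed to have been recorded somewhere as in step 1.
Using the segment creation time as the bound is conservative, since no
delete in a segment can be older than the segment itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class DeadlineMergeSketch {

    /** Hypothetical per-segment bookkeeping; this is not a Lucene class. */
    static final class SegmentAge {
        final String name;
        final Instant createdAt;   // assumed to be recorded at flush/merge time
        final int deletedDocs;     // deletes not yet purged by a merge

        SegmentAge(String name, Instant createdAt, int deletedDocs) {
            this.name = name;
            this.createdAt = createdAt;
            this.deletedDocs = deletedDocs;
        }
    }

    /**
     * Returns the names of segments that should get a single-segment
     * (1 -> 1) rewrite: those that still carry deletes and whose age is
     * within `margin` of the erasure deadline, or already past it.
     */
    static List<String> selectForRewrite(List<SegmentAge> segments,
                                         Instant now,
                                         Duration deadline,
                                         Duration margin) {
        // Segments created at or before this instant are too close to the deadline.
        Instant cutoff = now.minus(deadline).plus(margin);
        List<String> picked = new ArrayList<>();
        for (SegmentAge s : segments) {
            boolean hasDeletes = s.deletedDocs > 0;
            boolean tooOld = !s.createdAt.isAfter(cutoff);
            if (hasDeletes && tooOld) {
                picked.add(s.name);
            }
        }
        return picked;
    }
}
```

A real policy would still delegate everything else to the wrapped merge
policy and only add these extra merges on top. The rewritten segment
gets a fresh creation time, so it would not be picked again until it
ages past the cutoff once more.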
