To the valid point Robert makes above about the underlying data still on
the disk (old news):
https://news.sophos.com/en-us/2022/09/23/morgan-stanley-fined-millions-for-selling-off-devices-full-of-customer-pii/

On Wed, Nov 29, 2023 at 11:01 AM Michael Sokolov <msoko...@gmail.com> wrote:

> Another way is to ensure that all documents get updated on a regular
> cadence whether there are changes in the underlying data or not, or to
> regenerate the index from scratch on a regular schedule. Of course these
> approaches might be more costly for an index that has intrinsically low
> update rates, but they do keep the index fresh without the need for any
> special tracking.
>
> On Tue, Nov 28, 2023, 8:45 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>
>> It's not that extreme; it's on the order of several weeks. However, a big
>> segment can stay around for quite a long time if there aren't enough
>> updates for the merge policy to pick it up.
>>
>> On Tue, Nov 28, 2023, 17:14 Dongyu Xu <dongyu...@hotmail.com> wrote:
>>
>>> What is the expected grace time for the data-deletion request to take
>>> place?
>>>
>>> I'm not an expert on the policy, but I think something like "I need my
>>> data to be gone in the next 2 seconds" is unreasonable.
>>>
>>> Tony X
>>>
>>> ------------------------------
>>> *From:* Robert Muir <rcm...@gmail.com>
>>> *Sent:* Tuesday, November 28, 2023 11:52 AM
>>> *To:* dev@lucene.apache.org <dev@lucene.apache.org>
>>> *Subject:* Re: GDPR compliance
>>>
>>> I don't think there's any problem with GDPR, and I don't think users
>>> should be running unnecessary "optimize". GDPR just says data should
>>> be erased without "undue" delay. Waiting for a merge to nuke the
>>> deleted docs isn't "undue"; there is a good reason for it.
>>>
>>> On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>>> >
>>> > Hi Folks,
>>> > At LinkedIn we need to comply with GDPR for a large part of our data,
>>> and an important part of that is being sure we have completely
>>> deleted the data the user asked us to delete within a certain period of
>>> time.
>>> > The approach we have come up with so far is to:
>>> > 1. Record the segment creation time somewhere (not decided yet, maybe
>>> index commit userinfo, maybe some other place outside of Lucene)
>>> > 2. Create a new merge policy that delegates most operations to a
>>> normal MP, like TieredMergePolicy, and then adds extra single-segment
>>> merges (from 1 segment to 1 segment, basically only applying deletions) if
>>> it finds any segment that is about to violate the GDPR time frame.
>>> >
>>> > So here are my questions:
>>> > 1. Is there a better/existing way to do this?
>>> > 2. I would like to contribute such a merge policy directly to Lucene,
>>> since I think GDPR is more or less a common requirement. I would like to
>>> know whether people feel it's necessary or not.
>>> > 3. It would also be nice if we could store the segment creation time in
>>> the index directly via IndexWriter (maybe write it to SegmentInfo?). I can
>>> try to do that, but would like to ask whether there are any objections.
>>> >
>>> > Best
>>> > Patrick
>>>
>>>
>>>
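
For reference, the selection step in Patrick's original proposal (find segments that are about to exceed the deletion deadline and schedule a 1-to-1 rewrite for each) could look roughly like the sketch below. It is only an illustration of the idea and deliberately uses no Lucene classes: `SegmentAge`, `DeletionDeadlineSelector`, `segmentsToRewrite`, and the safety margin are all hypothetical names and parameters, and it assumes the segment creation time is recorded somewhere, as point 1 of the proposal suggests. A real implementation would sit inside a merge policy that delegates everything else to something like TieredMergePolicy.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for per-segment metadata; this is not a Lucene type.
final class SegmentAge {
    final String name;
    final Instant createdAt;   // assumed to be recorded at segment creation
    final int deletedDocs;     // number of deleted-but-not-yet-purged docs

    SegmentAge(String name, Instant createdAt, int deletedDocs) {
        this.name = name;
        this.createdAt = createdAt;
        this.deletedDocs = deletedDocs;
    }
}

// Picks segments that should get a single-segment (1-to-1) rewrite so that
// their deleted documents are physically dropped before the deadline.
final class DeletionDeadlineSelector {
    private final Duration deadline;      // maximum allowed age of a segment holding deletes
    private final Duration safetyMargin;  // act this long before the deadline is reached

    DeletionDeadlineSelector(Duration deadline, Duration safetyMargin) {
        this.deadline = deadline;
        this.safetyMargin = safetyMargin;
    }

    List<SegmentAge> segmentsToRewrite(List<SegmentAge> segments, Instant now) {
        // A segment is due once createdAt + deadline - safetyMargin <= now.
        Instant cutoff = now.minus(deadline).plus(safetyMargin);
        List<SegmentAge> due = new ArrayList<>();
        for (SegmentAge s : segments) {
            // Segments without deletions have nothing to purge; skip them.
            if (s.deletedDocs > 0 && !s.createdAt.isAfter(cutoff)) {
                due.add(s);
            }
        }
        return due;
    }
}
```

Using the segment creation time as the clock is conservative: any delete in a segment happened after the segment was created, so rewriting the segment within the deadline of its creation also purges every delete within the deadline of the delete itself.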
