Thanks.

I think a configurable purger that replaces pages (like HBase
compaction, as Eric suggested) should suffice, with the compaction
frequency left as a setting.
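
To make the idea concrete, here is a rough sketch in plain Python. It is
only a toy model under my own assumptions: Page, mark_deleted, read,
compact and purge are hypothetical names, not parquet-mr APIs, and the
"deleted" set stands in for whatever metadata ends up in the page header.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Page:
    """Toy stand-in for a data page; not the real Parquet page layout."""
    values: List[int]
    # Hypothetical header field: row offsets that are logically deleted.
    deleted: Set[int] = field(default_factory=set)


def mark_deleted(page: Page, row: int) -> None:
    """Logical delete: only the header changes, the stored values stay."""
    if not 0 <= row < len(page.values):
        raise IndexError(f"row {row} out of range")
    page.deleted.add(row)


def read(page: Page) -> List[int]:
    """Visibility check: skip rows the header marks as deleted."""
    return [v for i, v in enumerate(page.values) if i not in page.deleted]


def compact(page: Page) -> Page:
    """Physical delete: build a replacement page without the marked rows."""
    return Page(values=read(page))


def purge(pages: List[Page], min_deleted_fraction: float = 0.0) -> List[Page]:
    """Replace each page whose share of deleted rows is at least the threshold."""
    result = []
    for page in pages:
        fraction = len(page.deleted) / len(page.values) if page.values else 0.0
        if page.deleted and fraction >= min_deleted_fraction:
            result.append(compact(page))
        else:
            result.append(page)
    return result
```

How often purge runs is left to the caller, which is where the
configurable frequency would come in. Until it runs, the marked values
are still physically present in the page, which is the concern Wes
raised.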

Do we use the full page replacement technique to replace records in
any scenario today?

Regards,

Atri

On Tue, Dec 5, 2017 at 9:17 PM, lukas nalezenec <lu...@apache.org> wrote:
> Hi,
> I think that a delete marker is a good idea.
> I attended a basic GDPR training, and I think it meets the EU law requirements.
>
> Lukas
>
> 2017-12-05 11:37 GMT+01:00 Atri Sharma <atri.j...@gmail.com>:
>
>> Agreed.
>>
>> I have come up with a patch to add metadata to the page header marking
>> the tuples deleted. The visibility checks will need to consult page
>> header before returning the read results back.
>>
>> The pruning still needs to be implemented.
>>
>> On Tue, Dec 5, 2017 at 3:20 AM, Eric Owhadi <eric.owh...@esgyn.com> wrote:
>> > Maybe the EU requirement provides a deadline for the delete. So one could
>> implement a "logical delete" and, on a monthly basis (assuming
>> that is the EU deadline for compliance), perform a physical delete by
>> reloading the data without the logically deleted records? This is like
>> the HBase major compaction concept.
>> > Eric
>> >
>> >
>> > -----Original Message-----
>> > From: Wes McKinney [mailto:wesmck...@gmail.com]
>> > Sent: Monday, December 4, 2017 3:38 PM
>> > To: dev@parquet.apache.org
>> > Subject: Re: Regarding PARQUET-1155
>> >
>> > hi Atri -- even if we could, I am not sure this would meet the
>> requirements of the EU law, since the "deleted" data could still be read by
>> an adversary even if a Parquet implementation like parquet-mr did not
>> permit it
>> >
>> > - Wes
>> >
>> > On Mon, Dec 4, 2017 at 11:55 AM, Atri Sharma <atri.j...@gmail.com>
>> wrote:
>> >> I see, thanks.
>> >>
>> >> Could we not introduce the concept of a delete marker, where we mark
>> >> the deleted records in the page header?
>> >>
>> >> On Mon, Dec 4, 2017 at 10:23 PM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >>> I don't think this is possible due to the encoding and compression
>> schemes.
>> >>>
>> >>> For example, suppose that you had the following data
>> >>>
>> >>> 1
>> >>> 1
>> >>> 1
>> >>> 4
>> >>> 4
>> >>> 4
>> >>> 4
>> >>>
>> >>> This would be dictionary-encoded and compressed to semantically look
>> >>> like
>> >>>
>> >>> dictionary: 1, 4
>> >>> data page: (3, 0) (4, 1)
>> >>>
>> >>> The encoded data page (using the hybrid bit-packing / RLE encoding
>> >>> scheme) would furthermore be compressed. Editing records in general
>> >>> would change the size of the compressed and encoded data stream, so
>> >>> you could not edit the page without rewriting the file.
>> >>>
>> >>> - Wes
>> >>>
>> >>> On Mon, Dec 4, 2017 at 11:46 AM, Atri Sharma <atri.j...@gmail.com>
>> wrote:
>> >>>> Hi Wes,
>> >>>>
>> >>>> Thanks for your response.
>> >>>>
>> >>>> My main use case is that I want to introduce updatability to Parquet
>> >>>> records without going the route of replacing the entire page.
>> >>>>
>> >>>> Is that something that has already been discussed please?
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Atri
>> >>>>
>> >>>> On Mon, Dec 4, 2017 at 10:10 PM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >>>>> hi Atri,
>> >>>>>
>> >>>>> From a prior discussion on the mailing list, it is not clear that
>> >>>>> this is a problem that concerns either Parquet format or the
>> >>>>> implementations in the Apache Parquet project. If data must be
>> >>>>> edited or deleted, then the point-of-truth Parquet files must be
>> >>>>> scanned and overwritten with the offending records deleted.
>> >>>>> Modifying files in place is not feasible due to the compression and
>> >>>>> encoding schemes (dictionary, run-length encoding) used in the
>> >>>>> Parquet format. Let me know if I am misunderstanding the use case.
>> >>>>>
>> >>>>> Thanks
>> >>>>> Wes
>> >>>>>
>> >>>>> On Mon, Dec 4, 2017 at 11:30 AM, Atri Sharma <atri.j...@gmail.com>
>> wrote:
>> >>>>>> Hi Folks,
>> >>>>>>
>> >>>>>> Any update?
>> >>>>>>
>> >>>>>> On Fri, Dec 1, 2017 at 9:23 AM, Atri Sharma <atri.j...@gmail.com>
>> wrote:
>> >>>>>>> https://issues.apache.org/jira/browse/PARQUET-1155
>> >>>>>>>
>> >>>>>>> Anybody working on it? Can I take it up?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Regards,
>> >>>>>>
>> >>>>>> Atri
>> >>>>>> l'apprenant
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Regards,
>> >>>>
>> >>>> Atri
>> >>>> l'apprenant
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> Atri
>> >> l'apprenant
>>
>>
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>>



-- 
Regards,

Atri
l'apprenant
