Re: Proposal: Localized storage based on partition

Ryan Blue Fri, 02 Oct 2020 10:29:24 -0700

Thanks, Mick! I think the change to allow plugging in a dynamically loaded
implementation is a reasonable one. It would still be good to hear whether
other people would use the location provider you're building, too.
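
To make the idea concrete, here is a minimal Java sketch of a partition-aware
location provider that is loaded by class name at runtime. It is illustrative
only. `PartitionLocationProvider`, `RegionLocationProvider`,
`LocationProviderLoader`, the assumed `(Map, String)` constructor convention
and the bucket paths are all hypothetical. They are not Iceberg's actual
`LocationProvider` API or the code in the PR linked below.

```java
import java.util.Map;

// Hypothetical interface (an assumption for this sketch, not Iceberg's LocationProvider):
// maps a partition value plus a file name to the full path of a new data file.
interface PartitionLocationProvider {
  String newDataLocation(String partitionValue, String filename);
}

// Example user-supplied provider: looks up a storage root per partition value
// (e.g. a region code) and falls back to a default root for unmapped values.
class RegionLocationProvider implements PartitionLocationProvider {
  private final Map<String, String> rootsByRegion;
  private final String defaultRoot;

  // Assumed constructor convention that the loader below relies on: (Map, String).
  public RegionLocationProvider(Map<String, String> rootsByRegion, String defaultRoot) {
    this.rootsByRegion = Map.copyOf(rootsByRegion);
    this.defaultRoot = defaultRoot;
  }

  @Override
  public String newDataLocation(String partitionValue, String filename) {
    String root = rootsByRegion.getOrDefault(partitionValue, defaultRoot);
    return root + "/data/" + partitionValue + "/" + filename;
  }
}

// Loads a provider implementation by class name at runtime, so the routing
// logic can live entirely in user code rather than in the table format.
final class LocationProviderLoader {
  private LocationProviderLoader() {}

  static PartitionLocationProvider load(
      String className, Map<String, String> rootsByRegion, String defaultRoot) {
    try {
      return Class.forName(className)
          .asSubclass(PartitionLocationProvider.class)
          .getConstructor(Map.class, String.class)
          .newInstance(rootsByRegion, defaultRoot);
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load location provider: " + className, e);
    }
  }
}
```

Suppose `RegionLocationProvider.class.getName()` is loaded with an `"EU"` root of
`s3://eu-bucket/tbl`. Then `newDataLocation("EU", "f1.parquet")` resolves to
`s3://eu-bucket/tbl/data/EU/f1.parquet`, and unmapped partition values fall
back to the default root. Keeping the mapping in a user-supplied class, rather
than a built-in string lookup, keeps "country"/"locality" semantics out of the
table format itself.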


On Fri, Oct 2, 2020 at 8:44 AM Mick Jermsurawong
<[email protected]> wrote:

> The PR here takes a stab at a more general solution: a dynamically loaded
> implementation provided by the user:
> https://github.com/apache/iceberg/pull/1531
>
>
> On Wed, Sep 30, 2020 at 10:54 AM Mick Jermsurawong <
> [email protected]> wrote:
>
>> Hi, thank you all for the discussion today!
>>
>> There are questions around whether this localization is *sufficient for
>> most data localization requirements*.
>> - The proposed solution here does not localize column stats in the
>> metadata, so some values from PII columns will be exposed. It was
>> suggested that these stats can be turned off for specific columns.
>> - Whether centralized computation, as assumed in this proposal, is ever
>> acceptable. If not, this proposal may not help address data
>> localization. (Internally, we believe it is sufficient for at least one use
>> case we are working on.)
>>
>> That brings us to *usefulness outside of data localization.*
>> - One suggestion sees this localization mechanism as a possible solution
>> for data lifecycle management: a partition value can indicate the age of
>> the data, so data could be written to storage systems with different cost
>> and latency profiles.
>>   - Others noted that lifecycle policies could be managed in S3 itself, or
>> handled by doing a complete rewrite of the data.
>>
>> There was also helpful *feedback on the implementation*:
>> - Whether we are leaking the semantic meaning of "country"/"locality" into
>> the location provider.
>> - One suggestion is to make the location provider customizable enough
>> that this business logic can be left entirely to users, instead of
>> constraining it to the simple string lookup done in the proposed
>> solution.
>>
>> I'm happy to take more input from folks. One useful line of discussion
>> would be, if this is going to be an out-of-the-box abstraction:
>> - How customizable do we want it to be?
>> - Besides data localization and lifecycle management, what use cases
>> would benefit from custom data locations based on partitioning?
>>
>> I'm also happy to discuss how folks are solving specific data
>> localization problems under different regulatory requirements.
>>
>> Best,
>> Mick Jermsurawong
>>
>>
>> On Mon, Sep 28, 2020 at 7:03 PM Mick Jermsurawong <
>> [email protected]> wrote:
>>
>>> Hi Iceberg community,
>>>
>>> We are working on data localization: meeting legal requirements to store
>>> data in designated physical locations. We think that Iceberg can neatly
>>> solve this problem with its existing interfaces. Here's the current
>>> proposal explaining our motivation and approach:
>>> <https://docs.google.com/document/d/1ZluOiRZlmsfNnQJLSqTiBQg7-XeSE-gvEOn2e0y6E54/edit#heading=h.cuifpdpzmfqz>
>>>
>>> We would appreciate input on the following:
>>> - whether there are already similar ongoing features in the community
>>> - any non-Iceberg approaches that have been considered to solve data
>>> localization
>>> - how compatible this feature in our private fork would be with future
>>> directions
>>> - general feedback on the proposed solution
>>>
>>> Thank you in advance for any feedback here!
>>>
>>> Best,
>>> Mick Jermsurawong
>>>
>>

-- 
Ryan Blue
Software Engineer
Netflix