Thanks, Mick! I think the change to be able to plug in a dynamically loaded implementation is a reasonable one. It would still be good to hear whether other people would use the location provider you're building, too.

On Fri, Oct 2, 2020 at 8:44 AM Mick Jermsurawong <[email protected]> wrote:

> The PR here takes a stab at a more general solution: dynamically loaded
> impl provided by user
> https://github.com/apache/iceberg/pull/1531
>
>
> On Wed, Sep 30, 2020 at 10:54 AM Mick Jermsurawong <
> [email protected]> wrote:
>
>> Hi thank you all for the discussion today!
>>
>> There are questions around whether this localization is *sufficient for
>> most data localization requirements*.
>> - The proposed solution here does not localize stats in the metadata, and
>> in the PII columns some data will be exposed. It was suggested there are
>> ways that we can turn off these stats for specific columns.
>> - Whether centralized computation, as assumed in this proposal, is ever
>> acceptable. If not, this proposal might not be of value to address data
>> localization. (Internally, we believe it is sufficient for at least one use
>> case we are working on)
>>
>> That brings us to *usefulness outside of data localization.*
>> - One suggestion sees this data localization as a possible solution to
>> lifecycle data management, partition value can suggest age of data to be
>> written to different storage systems with cost and latency profiles.
>> - Others express that lifecycle policy management could be done from S3
>> itself, or do a complete rewrite.
>>
>> There is helpful *feedback on implementations*
>> - Whether we are leaking semantic meaning of "country"/"locality" into
>> the location provider.
>> - One suggestion is that this location provider can be customizable
>> enough that we can leave these business logic here complete to users,
>> instead of constraining it to simple string look-up as done in the proposed
>> solution.
>>
>> I'm happy to take more input from folks. One line of useful discussion
>> would be: if this is going to be off-the-box abstraction,
>> - how customizable do we want it to be
>> - what are use cases that this custom data location based on partitioning
>> would be helpful--besides data localization and lifecycle management
>>
>> Also I'm also happy to discuss how folks are solving specific problems of
>> data localization under different regulatory requirements.
>>
>> Best,
>> Mick Jermsurawong
>>
>>
>> On Mon, Sep 28, 2020 at 7:03 PM Mick Jermsurawong <
>> [email protected]> wrote:
>>
>>> Hi Iceberg community,
>>>
>>> We are solving data localization following legal requirements to store
>>> data in designated physical areas. We think that Iceberg can neatly solve
>>> this problem with the existing interfaces. Here's the current proposal
>>> <https://docs.google.com/document/d/1ZluOiRZlmsfNnQJLSqTiBQg7-XeSE-gvEOn2e0y6E54/edit#heading=h.cuifpdpzmfqz>
>>> explaining
>>> our motivation and approach.
>>>
>>> We would appreciate input for the followings:
>>> - if there are already similar on-going features from the community
>>> - any non-iceberg approaches that have been considered to solve data
>>> localization
>>> - how much would this feature in our private fork be compatible with
>>> future directions
>>> - general feedback on the proposed solution
>>>
>>> Thank you in advance for any feedback here!
>>>
>>> Best,
>>> Mick Jermsurawong
>>>
>>

--
Ryan Blue
Software Engineer
Netflix
