Re: Suggested S3 FileIO/Getting Started

2020-11-15 Thread Saisai Shao
Thanks a lot, Ryan, for your explanation; it was very helpful. Best regards, Saisai. On Sat, Nov 14, 2020 at 2:03 AM, Ryan Blue wrote: > Saisai, > > Iceberg's FileIO interface doesn't require guarantees as strict as a > Hadoop-compatible FileSystem. For S3, that allows us to avoid negative > caching that can cause

Re: Suggested S3 FileIO/Getting Started

2020-11-13 Thread Ryan Blue
Saisai, Iceberg's FileIO interface doesn't require guarantees as strict as a Hadoop-compatible FileSystem. For S3, that allows us to avoid negative caching that can cause problems when reading a table that has just been updated. (Specifically, S3A performs a HEAD request to check whether a path

Re: Suggested S3 FileIO/Getting Started

2020-11-12 Thread Saisai Shao
Hi all, Sorry to chime in, but I have the same concern about using Iceberg with object storage. > One of my concerns with S3FileIO is getting tied too much to a single > cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful > so that S3FileIO and (a future) GCSFileIO could share

Re: Suggested S3 FileIO/Getting Started

2020-11-12 Thread Daniel Weeks
Hey John, about the concerns around cloud provider dependency, I feel like the FileIO interface is actually the right level of abstraction already. That interface basically requires "open for read" and "open for write", where the implementation will diverge across different platforms. I guess
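
To illustrate the "open for read" / "open for write" point in the message above, here is a minimal sketch of what such a narrow storage abstraction could look like. It is not Iceberg's actual FileIO API: the names SimpleFileIO, InMemoryFileIO, open_for_read, and open_for_write are hypothetical, and the in-memory dictionary only stands in for a real object store such as S3 or GCS.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict


class SimpleFileIO(ABC):
    """Hypothetical two-operation storage abstraction (not Iceberg's real API)."""

    @abstractmethod
    def open_for_read(self, path: str) -> BinaryIO:
        """Return a readable binary stream for an existing object."""

    @abstractmethod
    def open_for_write(self, path: str) -> BinaryIO:
        """Return a writable binary stream that creates the object on close."""


class _CommitOnClose(io.BytesIO):
    """Buffer that publishes its bytes to the backing store when closed."""

    def __init__(self, store: Dict[str, bytes], path: str) -> None:
        super().__init__()
        self._store = store
        self._path = path

    def close(self) -> None:
        # Publish the whole object at once, the way an object store upload would.
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFileIO(SimpleFileIO):
    """Assumed stand-in for a provider-specific implementation."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def open_for_read(self, path: str) -> BinaryIO:
        # Raises KeyError if the object was never written.
        return io.BytesIO(self._objects[path])

    def open_for_write(self, path: str) -> BinaryIO:
        return _CommitOnClose(self._objects, path)


# Usage: callers depend only on the two operations, not on the backend.
file_io: SimpleFileIO = InMemoryFileIO()
with file_io.open_for_write("table/metadata/v1.json") as out:
    out.write(b'{"version": 1}')
with file_io.open_for_read("table/metadata/v1.json") as src:
    data = src.read()
```

Because callers only see the two operations, a provider-specific class could replace InMemoryFileIO without changing the calling code, which is the kind of divergence "across different platforms" the message describes.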

Re: Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
Update: I think I'm wrong about the listing part. I think it will only do the HEAD request. Also it seems like the consistency issue is probably not something my team would encounter with our current jobs. On 2020/11/12 02:17:10, John Clara wrote: > (Not sure if this is actually replying or

Re: Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
(Not sure if this is actually replying or just starting a new thread) Hi Daniel, Thanks for the response! It's very helpful and answers a lot of my questions. A couple of follow-ups: One of my concerns with S3FileIO is getting tied too much to a single cloud provider. I'm wondering if an

Re: Suggested S3 FileIO/Getting Started

2020-11-11 Thread Daniel Weeks
Hey John, I might be able to help answer some of your questions and provide some context around how you might want to go forward. So, one fundamental aspect of Iceberg is that it only relies on a few operations (as defined by the FileIO interface). This makes much of the functionality and

Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
Hello all, Thank you all for creating/continuing this great project! I am just starting to get comfortable with the fundamentals and I'm thinking that my team has been using Iceberg the wrong way at the FileIO level. I was wondering if people would be willing to share how they set up their