Re: Suggested S3 FileIO/Getting Started

Saisai Shao Sun, 15 Nov 2020 23:26:03 -0800

Thanks a lot Ryan for your explanation, greatly helpful.

Best regards,
Saisai


Ryan Blue <rb...@netflix.com.invalid> 于2020年11月14日周六 上午2:03写道：

> Saisai,
>
> Iceberg's FileIO interface doesn't require guarantees as strict as a
> Hadoop-compatible FileSystem. For S3, that allows us to avoid negative
> caching that can cause problems when reading a table that has just been
> updated. (Specifically, S3A performs a HEAD request to check whether a path
> is a directory even for overwrites.)
>
> It's up to users whether they want to use S3FileIO or S3A with
> HadoopFileIO, or even HadoopFileIO with a custom FileSystem. We are working
> to make it easy to switch between them, but there are notable problems like
> ORC not using the FileIO interface yet. I would recommend using S3FileIO to
> avoid negative caching, but S3A with S3Guard may be another way to avoid
> the problem. I think the default should be to use S3FileIO if it is in the
> classpath, and fall back to a Hadoop FileSystem if it isn't.
>
> For cloud providers, I think the question is whether there is a need for
> the relaxed guarantees. If there is no need, then using HadoopFileIO and a
> FileSystem works great and is probably the easiest way to maintain an
> implementation for the object store.
>
> On Thu, Nov 12, 2020 at 7:31 PM Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Sorry to chime in, I also have a same concern about using Iceberg with
>> Object storage.
>>
>> One of my concerns with S3FileIO is getting tied too much to a single
>>> cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
>>> so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
>>> looked deep enough into the S3FileIO to know how much logic is not s3
>>> specific. Maybe the FileIO interface is enough.
>>>
>>
>> Now we have a S3 specific FileIO implementation, is it recommended to use
>> this one instead of s3a like HCFS implementation? Also each public cloud
>> provider has its own HCFS implementation for its own object storage. Are we
>> going to suggest to create a specific FileIO implementation or use the
>> existing HCFS implementation?
>>
>> Thanks
>> Saisai
>>
>>
>> Daniel Weeks <dwe...@netflix.com.invalid> 于2020年11月13日周五 上午1:09写道：
>>
>>> Hey John, about the concerns around cloud provider dependency, I feel
>>> like the FileIO interface is actually the right level of abstraction
>>> already.
>>>
>>> That interface basically requires "open for read" and "open for write",
>>> where the implementation will diverge across different platforms.
>>>
>>> I guess you could think of it as S3FileIO is to FileIO what
>>> S3AFileSystem is to FileSystem (in Hadoop).  You can have many different
>>> implementations that coexist.
>>>
>>> In fact, recent changes to the Catalog allow for very flexible
>>> management of FIleIO and you could even have files within a table split
>>> across multiple cloud vendors.
>>>
>>> As to the consistency questions, the list operation can be inconsistent
>>> (e.g. if a new file is created and the implementation relies on list then
>>> read, it may not see newly created objects.  Iceberg does not list, so that
>>> should not be an issue).
>>>
>>> The stated read-after-write consistency is limited and does not include:
>>>  - Read after overwrite
>>>  - Read after delete
>>>  - Read after negative cache (e.g. a GET or HEAD that occurred before
>>> the object was created).
>>>
>>> Some of those inconsistencies have caused problems in certain cases when
>>> it comes to committing data (the negative cache being the main culprit).
>>>
>>> -Dan
>>>
>>>
>>> On Wed, Nov 11, 2020 at 6:49 PM John Clara <john.anthony.cl...@gmail.com>
>>> wrote:
>>>
>>>> Update: I think I'm wrong about the listing part. I think it will only
>>>> do the HEAD request. Also it seems like the consistency issue is
>>>> probably not something my team would encounter with our current jobs.
>>>>
>>>> On 2020/11/12 02:17:10, John Clara <j...@gmail.com> wrote:
>>>>  > (Not sure if this is actually replying or just starting a new
>>>> thread)>
>>>>  >
>>>>  > Hi Daniel,>
>>>>  >
>>>>  > Thanks for the response! It's very helpful and answers a lot my
>>>> questions.>
>>>>  >
>>>>  > A couple follow ups:>
>>>>  >
>>>>  > One of my concerns with S3FileIO is getting tied too much to a
>>>> single >
>>>>  > cloud provider. I'm wondering if an ObjectStoreFileIO would be
>>>> helpful >
>>>>  > so that S3FileIO and (a future) GCSFileIO could share logic? I
>>>> haven't >
>>>>  > looked deep enough into the S3FileIO to know how much logic is not
>>>> s3 >
>>>>  > specific. Maybe the FileIO interface is enough.>
>>>>  >
>>>>  > About consistency (no need to respond here):>
>>>>  > I'm seeing that during "getFileStatus" my version of s3a does some
>>>> list >
>>>>  > requests (but I'm not sure if that could fail from consistency
>>>> issues).>
>>>>  > I'm also confused about the read-after-(initial) write part:>
>>>>  > "Amazon S3 provides read-after-write consistency for PUTS of new
>>>> objects >
>>>>  > in your S3 bucket in all Regions with one caveat. The caveat is that
>>>> if >
>>>>  > you make a HEAD or GET request to a key name before the object is >
>>>>  > created, then create the object shortly after that, a subsequent GET
>>>> >
>>>>  > might not return the object due to eventual consistency. - >
>>>>  > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html";>
>>>>  >
>>>>  > When my version of s3a does a create, it first does a
>>>> getMetadataRequest >
>>>>  > (HEAD) to check if the object exists before creating the object. I
>>>> think >
>>>>  > this is talked about in this issue: >
>>>>  > https://github.com/apache/iceberg/issues/1398 and talked about in
>>>> the >
>>>>  > S3FileIO PR: https://github.com/apache/iceberg/pull/1573. I'll
>>>> follow
>>>> up >
>>>>  > in that issue for more info.>
>>>>  >
>>>>  > John>
>>>>  >
>>>>  >
>>>>  > On 2020/11/12 00:36:10, Daniel Weeks <d....@netflix.com.INVALID>
>>>> wrote:>
>>>>  > > Hey John, I might be able to help answer some of your questions
>>>> and >
>>>>  > provide>>
>>>>  > > some context around how you might want to go forward.>>
>>>>  > >>
>>>>  > > So, one fundamental aspect of Iceberg is that it only relies on a
>>>> few>>
>>>>  > > operations (as defined by the FileIO interface). This makes much
>>>> of
>>>> the>>
>>>>  > > functionality and complexity of full file system implementations>>
>>>>  > > unnecessary. You should not need features like S3Guard or
>>>> additional S3>>
>>>>  > > operations these implementations rely on in order to achieve file >
>>>>  > system>>
>>>>  > > contract behavior. Consistency issues should also not be a problem
>>>> >
>>>>  > since>>
>>>>  > > Iceberg does not overwrite or list and read-after-(initial)write
>>>> is
>>>> a>>
>>>>  > > guarantee provided by S3.>>
>>>>  > >>
>>>>  > > At Netflix, we use a custom FileSystem implementation (somewhat
>>>> like >
>>>>  > S3A),>>
>>>>  > > but with much of the contract behavior that drives additional >
>>>>  > operations>>
>>>>  > > against S3 disabled. However, we are transitioning to a more
>>>> native>>
>>>>  > > implementation of S3FileIO, which you'll see as part of the
>>>> ongoing >
>>>>  > work in>>
>>>>  > > Iceberg.>>
>>>>  > >>
>>>>  > > Per your specific questions:>>
>>>>  > >>
>>>>  > > 1) The S3FileIO implementation is very new, though internally we
>>>> have>>
>>>>  > > something very similar. There are features missing that we are >
>>>>  > working to>>
>>>>  > > add (e.g. progressive multipart upload for large files is likely
>>>> the >
>>>>  > most>>
>>>>  > > important).>>
>>>>  > > 2) You can use S3AFileSystem with the HadoopFileIO implementation,
>>>> >
>>>>  > though>>
>>>>  > > you may still see similar behavior with additional calls being
>>>> made
>>>> (I>>
>>>>  > > don't know if these can be disabled).>>
>>>>  > > 3) The PrestoS3FileSystem is tailored to Presto's use and is
>>>> likely >
>>>>  > not as>>
>>>>  > > complete as S3A, but seeing as it is using the Hadoop FileSystem
>>>> api, >
>>>>  > it>>
>>>>  > > would likely work for what HadoopFileIO exercises (as would the>>
>>>>  > > EMRFileSystem).>>
>>>>  > > 4) I would probably discourage you from writing your own file
>>>> system >
>>>>  > as the>>
>>>>  > > S3FileIO will likely be a more optimized implementation for what >
>>>>  > Iceberg>>
>>>>  > > needs.>>
>>>>  > >>
>>>>  > > If you want to contribute or have time to help contribute to >
>>>>  > S3FileIO, that>>
>>>>  > > is the path I would recommend. As for configuration, I would say a
>>>> >
>>>>  > lot of>>
>>>>  > > it comes down to how to configure the AWS S3 Client that you
>>>> provide >
>>>>  > to the>>
>>>>  > > S3FileIO implementation, but a lot of the defaults are reasonable
>>>> (you>>
>>>>  > > might want to tweak a few like max connections and maybe the retry
>>>> >
>>>>  > policy).>>
>>>>  > >>
>>>>  > > The recently committed work to dynamically load your FileIO should
>>>> >
>>>>  > make it>>
>>>>  > > relatively easy to test out and we'd love to have extra eyes and >
>>>>  > feedback>>
>>>>  > > on it.>>
>>>>  > >>
>>>>  > > Let me know if that helps,>>
>>>>  > > -Dan>>
>>>>  > >>
>>>>  > >>
>>>>  > >>
>>>>  > > On Wed, Nov 11, 2020 at 1:45 PM John Clara <jo...@gmail.com>>>
>>>>  > > wrote:>>
>>>>  > >>
>>>>  > > > Hello all,>>
>>>>  > > >>>
>>>>  > > > Thank you all for creating/continuing this great project! I am
>>>> just>>
>>>>  > > > starting to get comfortable with the fundamentals and I'm
>>>> thinking >
>>>>  > that my>>
>>>>  > > > team has been using Iceberg the wrong way at the FileIO level.>>
>>>>  > > >>>
>>>>  > > > I was wondering if people would be willing to share how they set
>>>> up >
>>>>  > their>>
>>>>  > > > FileIO/FileSystem with S3 and any customizations they had to
>>>> add.>>
>>>>  > > >>>
>>>>  > > > (Preferably from smaller teams. My team is small and cannot>>
>>>>  > > > realistically customize everything. If there's an up to date
>>>> thread>>
>>>>  > > > discussing this that I missed, please link me that instead.)>>
>>>>  > > >>>
>>>>  > > > ***** My team's specific problems/setup which you can ignore
>>>> ***>>
>>>>  > > >>>
>>>>  > > > My team has been using Hadoop FileIO with the S3AFileSystem.
>>>> Jars
>>>> are>>
>>>>  > > > provided by AWS EMR 5.23 which is on Hadoop 2.8.5. We use
>>>> DynamoDB >
>>>>  > for>>
>>>>  > > > atomic renames by implementing Iceberg's provided interfaces. We
>>>> >
>>>>  > read/write>>
>>>>  > > > from either Spark in EMR or on-prem JVM's in docker containers >
>>>>  > (managed by>>
>>>>  > > > k8s). Both use s3a, but the EMR clusters have HDFS (backed by
>>>> core >
>>>>  > nodes)>>
>>>>  > > > for the s3a buffered writes while the on-prem containers use the
>>>> >
>>>>  > docker>>
>>>>  > > > container's default file system which uses an overlay2 storage >
>>>>  > driver (that>>
>>>>  > > > I know nothing about).>>
>>>>  > > >>>
>>>>  > > > Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary get and
>>>> list>>
>>>>  > > > requests which is well known in the community (but not to my
>>>> team>>
>>>>  > > > unfortunately). There's also GET PUT GET inconsistency issues
>>>> with >
>>>>  > S3 that>>
>>>>  > > > have been talked about, but I don't yet understand how they
>>>> arise >
>>>>  > in the>>
>>>>  > > > 2.8.5 S3AFilesystem
>>>> (https://github.com/apache/iceberg/issues/1398).>>
>>>>  > > >>>
>>>>  > > > *** End of specific ***>>
>>>>  > > >>>
>>>>  > > >>>
>>>>  > > > The options I'm seeing are:>>
>>>>  > > >>>
>>>>  > > > 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?>>
>>>>  > > >>>
>>>>  > > > This still seems very new unless it is actually based on
>>>> Netflix's>>
>>>>  > > > prod implementation that they're releasing to the community?
>>>> (I'm >
>>>>  > wondering>>
>>>>  > > > if it's safe to start moving onto it in prod in the near term.
>>>> If >
>>>>  > Netflix>>
>>>>  > > > is using it (or rolling it out) that would be more than enough
>>>> for >
>>>>  > my team.)>>
>>>>  > > >>>
>>>>  > > > 2. Using a newer hadoop version and use the S3AFileSystem. Any>>
>>>>  > > > recommendations on a version and are you also using S3Guard?>>
>>>>  > > >>>
>>>>  > > > From a quick look, most gains compared to older versions seem to
>>>> be>>
>>>>  > > > from S3Guard. Are there substantial gains without it? (My team >
>>>>  > doesn't have>>
>>>>  > > > experience with S3Guard and Iceberg seems to not need it outside
>>>> of >
>>>>  > atomic>>
>>>>  > > > renames?)>>
>>>>  > > >>>
>>>>  > > > 3. Using an alternative hadoop file system. Any
>>>> recommendations?>>
>>>>  > > >>>
>>>>  > > > In the recent Iceberg S3 FileIO, the License states it was based
>>>> off>>
>>>>  > > > the Presto FileSystem. Has anyone used this file system as is
>>>> with >
>>>>  > Iceberg?>>
>>>>  > > > (https://github.com/apache/iceberg/blob/master/LICENSE#L251)>>
>>>>  > > >>>
>>>>  > > > 4. Roll our own hadoop file system. Anyone have stories/blogs
>>>> about>>
>>>>  > > > pitfalls or difficulties?>>
>>>>  > > >>>
>>>>  > > > rdblue hints that Netflix already done this:>>
>>>>  > > > >
>>>>  > https://github.com/apache/iceberg/issues/1398#issuecomment-682837392
>>>> .>>
>>>>  > > > (My team probably doesn't have the capacity for this)>>
>>>>  > > >>>
>>>>  > > >>>
>>>>  > > > Places where I tried looking for this info:>>
>>>>  > > >>>
>>>>  > > > - https://github.com/apache/iceberg/issues/761 (issue for
>>>> getting>>
>>>>  > > > started guide)>>
>>>>  > > > - https://iceberg.apache.org/spec/#file-system-operations>>
>>>>  > > >>>
>>>>  > > > Thanks everyone,>>
>>>>  > > >>>
>>>>  > > > John Clara>>
>>>>  > > >>>
>>>>  > >>
>>>>  >
>>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Suggested S3 FileIO/Getting Started

Reply via email to