(Not sure if this is actually replying or just starting a new thread)
Hi Daniel,
Thanks for the response! It's very helpful and answers a lot my questions.
A couple follow ups:
One of my concerns with S3FileIO is getting tied too much to a single
cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
looked deep enough into the S3FileIO to know how much logic is not s3
specific. Maybe the FileIO interface is enough.
About consistency (no need to respond here):
I'm seeing that during "getFileStatus" my version of s3a does some list
requests (but I'm not sure if that could fail from consistency issues).
I'm also confused about the read-after-(initial) write part:
"Amazon S3 provides read-after-write consistency for PUTS of new objects
in your S3 bucket in all Regions with one caveat. The caveat is that if
you make a HEAD or GET request to a key name before the object is
created, then create the object shortly after that, a subsequent GET
might not return the object due to eventual consistency. -
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html"
When my version of s3a does a create, it first does a getMetadataRequest
(HEAD) to check if the object exists before creating the object. I think
this is talked about in this issue:
https://github.com/apache/iceberg/issues/1398 and talked about in the
S3FileIO PR: https://github.com/apache/iceberg/pull/1573. I'll follow up
in that issue for more info.
John
On 2020/11/12 00:36:10, Daniel Weeks <d...@netflix.com.INVALID> wrote:
> Hey John, I might be able to help answer some of your questions and
provide>
> some context around how you might want to go forward.>
>
> So, one fundamental aspect of Iceberg is that it only relies on a few>
> operations (as defined by the FileIO interface). This makes much of the>
> functionality and complexity of full file system implementations>
> unnecessary. You should not need features like S3Guard or additional S3>
> operations these implementations rely on in order to achieve file
system>
> contract behavior. Consistency issues should also not be a problem
since>
> Iceberg does not overwrite or list and read-after-(initial)write is a>
> guarantee provided by S3.>
>
> At Netflix, we use a custom FileSystem implementation (somewhat like
S3A),>
> but with much of the contract behavior that drives additional
operations>
> against S3 disabled. However, we are transitioning to a more native>
> implementation of S3FileIO, which you'll see as part of the ongoing
work in>
> Iceberg.>
>
> Per your specific questions:>
>
> 1) The S3FileIO implementation is very new, though internally we have>
> something very similar. There are features missing that we are
working to>
> add (e.g. progressive multipart upload for large files is likely the
most>
> important).>
> 2) You can use S3AFileSystem with the HadoopFileIO implementation,
though>
> you may still see similar behavior with additional calls being made (I>
> don't know if these can be disabled).>
> 3) The PrestoS3FileSystem is tailored to Presto's use and is likely
not as>
> complete as S3A, but seeing as it is using the Hadoop FileSystem api,
it>
> would likely work for what HadoopFileIO exercises (as would the>
> EMRFileSystem).>
> 4) I would probably discourage you from writing your own file system
as the>
> S3FileIO will likely be a more optimized implementation for what
Iceberg>
> needs.>
>
> If you want to contribute or have time to help contribute to
S3FileIO, that>
> is the path I would recommend. As for configuration, I would say a
lot of>
> it comes down to how to configure the AWS S3 Client that you provide
to the>
> S3FileIO implementation, but a lot of the defaults are reasonable (you>
> might want to tweak a few like max connections and maybe the retry
policy).>
>
> The recently committed work to dynamically load your FileIO should
make it>
> relatively easy to test out and we'd love to have extra eyes and
feedback>
> on it.>
>
> Let me know if that helps,>
> -Dan>
>
>
>
> On Wed, Nov 11, 2020 at 1:45 PM John Clara <jo...@gmail.com>>
> wrote:>
>
> > Hello all,>
> >>
> > Thank you all for creating/continuing this great project! I am just>
> > starting to get comfortable with the fundamentals and I'm thinking
that my>
> > team has been using Iceberg the wrong way at the FileIO level.>
> >>
> > I was wondering if people would be willing to share how they set up
their>
> > FileIO/FileSystem with S3 and any customizations they had to add.>
> >>
> > (Preferably from smaller teams. My team is small and cannot>
> > realistically customize everything. If there's an up to date thread>
> > discussing this that I missed, please link me that instead.)>
> >>
> > ***** My team's specific problems/setup which you can ignore ***>
> >>
> > My team has been using Hadoop FileIO with the S3AFileSystem. Jars are>
> > provided by AWS EMR 5.23 which is on Hadoop 2.8.5. We use DynamoDB
for>
> > atomic renames by implementing Iceberg's provided interfaces. We
read/write>
> > from either Spark in EMR or on-prem JVM's in docker containers
(managed by>
> > k8s). Both use s3a, but the EMR clusters have HDFS (backed by core
nodes)>
> > for the s3a buffered writes while the on-prem containers use the
docker>
> > container's default file system which uses an overlay2 storage
driver (that>
> > I know nothing about).>
> >>
> > Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary get and list>
> > requests which is well known in the community (but not to my team>
> > unfortunately). There's also GET PUT GET inconsistency issues with
S3 that>
> > have been talked about, but I don't yet understand how they arise
in the>
> > 2.8.5 S3AFilesystem (https://github.com/apache/iceberg/issues/1398).>
> >>
> > *** End of specific ***>
> >>
> >>
> > The options I'm seeing are:>
> >>
> > 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?>
> >>
> > This still seems very new unless it is actually based on Netflix's>
> > prod implementation that they're releasing to the community? (I'm
wondering>
> > if it's safe to start moving onto it in prod in the near term. If
Netflix>
> > is using it (or rolling it out) that would be more than enough for
my team.)>
> >>
> > 2. Using a newer hadoop version and use the S3AFileSystem. Any>
> > recommendations on a version and are you also using S3Guard?>
> >>
> > From a quick look, most gains compared to older versions seem to be>
> > from S3Guard. Are there substantial gains without it? (My team
doesn't have>
> > experience with S3Guard and Iceberg seems to not need it outside of
atomic>
> > renames?)>
> >>
> > 3. Using an alternative hadoop file system. Any recommendations?>
> >>
> > In the recent Iceberg S3 FileIO, the License states it was based off>
> > the Presto FileSystem. Has anyone used this file system as is with
Iceberg?>
> > (https://github.com/apache/iceberg/blob/master/LICENSE#L251)>
> >>
> > 4. Roll our own hadoop file system. Anyone have stories/blogs about>
> > pitfalls or difficulties?>
> >>
> > rdblue hints that Netflix already done this:>
> >
https://github.com/apache/iceberg/issues/1398#issuecomment-682837392 .>
> > (My team probably doesn't have the capacity for this)>
> >>
> >>
> > Places where I tried looking for this info:>
> >>
> > - https://github.com/apache/iceberg/issues/761 (issue for getting>
> > started guide)>
> > - https://iceberg.apache.org/spec/#file-system-operations>
> >>
> > Thanks everyone,>
> >>
> > John Clara>
> >>
>