Hey John, about the concerns around cloud provider dependency: I feel like the FileIO interface is already the right level of abstraction.
That interface basically requires "open for read" and "open for write", and it's the implementation of those operations that diverges across platforms. You could think of it this way: S3FileIO is to FileIO what S3AFileSystem is to FileSystem (in Hadoop), and many different implementations can coexist. In fact, recent changes to the Catalog allow for very flexible management of FileIO, and you could even have files within a single table split across multiple cloud vendors.

As to the consistency questions: the list operation can be inconsistent (e.g. if a new file is created and an implementation relies on list-then-read, it may not see newly created objects), but Iceberg does not list, so that should not be an issue. The stated read-after-write consistency is also limited and does not include:

- Read after overwrite
- Read after delete
- Read after negative cache (e.g. a GET or HEAD that occurred before the object was created)

Some of those inconsistencies have caused problems in certain cases when it comes to committing data (the negative cache being the main culprit).
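To make that concrete: the sketch below is roughly the entire surface area an implementation has to cover. The GCSFileIO name is purely hypothetical (nothing like it exists in the project today) and the method bodies are elided, so treat it as an illustration of the shape a non-S3 implementation would take rather than working code:

    import org.apache.iceberg.io.FileIO;
    import org.apache.iceberg.io.InputFile;
    import org.apache.iceberg.io.OutputFile;

    // Hypothetical sketch only: a GCS-backed FileIO would implement the same
    // handful of operations that S3FileIO implements.
    public class GCSFileIO implements FileIO {

      @Override
      public InputFile newInputFile(String path) {
        // "open for read": return an InputFile backed by a GCS object
        throw new UnsupportedOperationException("sketch only");
      }

      @Override
      public OutputFile newOutputFile(String path) {
        // "open for write": return an OutputFile that creates a new GCS object
        throw new UnsupportedOperationException("sketch only");
      }

      @Override
      public void deleteFile(String path) {
        // delete a single object by its full location; no listing or renames
        throw new UnsupportedOperationException("sketch only");
      }
    }

Everything Iceberg reads and writes (data files, manifests, table metadata) goes through those calls; the atomic swap of the metadata pointer is the catalog's job, not the FileIO's, which is why listing, renames, and the rest of the file system contract never come into play.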
-Dan

On Wed, Nov 11, 2020 at 6:49 PM John Clara <john.anthony.cl...@gmail.com> wrote:

> Update: I think I'm wrong about the listing part. I think it will only do the HEAD request. Also it seems like the consistency issue is probably not something my team would encounter with our current jobs.
>
> On 2020/11/12 02:17:10, John Clara <j...@gmail.com> wrote:
> > (Not sure if this is actually replying or just starting a new thread)
> >
> > Hi Daniel,
> >
> > Thanks for the response! It's very helpful and answers a lot of my questions.
> >
> > A couple of follow-ups:
> >
> > One of my concerns with S3FileIO is getting tied too much to a single cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful so that S3FileIO and (a future) GCSFileIO could share logic? I haven't looked deep enough into the S3FileIO to know how much logic is not S3 specific. Maybe the FileIO interface is enough.
> >
> > About consistency (no need to respond here):
> > I'm seeing that during "getFileStatus" my version of s3a does some list requests (but I'm not sure if that could fail from consistency issues). I'm also confused about the read-after-(initial) write part:
> > "Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all Regions with one caveat. The caveat is that if you make a HEAD or GET request to a key name before the object is created, then create the object shortly after that, a subsequent GET might not return the object due to eventual consistency." - https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
> >
> > When my version of s3a does a create, it first does a getMetadataRequest (HEAD) to check if the object exists before creating the object. I think this is talked about in this issue: https://github.com/apache/iceberg/issues/1398 and in the S3FileIO PR: https://github.com/apache/iceberg/pull/1573. I'll follow up in that issue for more info.
> >
> > John
> >
> > On 2020/11/12 00:36:10, Daniel Weeks <d....@netflix.com.INVALID> wrote:
> > > Hey John, I might be able to help answer some of your questions and provide some context around how you might want to go forward.
> > >
> > > So, one fundamental aspect of Iceberg is that it only relies on a few operations (as defined by the FileIO interface). This makes much of the functionality and complexity of full file system implementations unnecessary. You should not need features like S3Guard or the additional S3 operations those implementations rely on in order to achieve file system contract behavior. Consistency issues should also not be a problem, since Iceberg does not overwrite or list, and read-after-(initial) write is a guarantee provided by S3.
> > >
> > > At Netflix, we use a custom FileSystem implementation (somewhat like S3A), but with much of the contract behavior that drives additional operations against S3 disabled. However, we are transitioning to a more native implementation of S3FileIO, which you'll see as part of the ongoing work in Iceberg.
> > >
> > > Per your specific questions:
> > >
> > > 1) The S3FileIO implementation is very new, though internally we have something very similar. There are features missing that we are working to add (e.g. progressive multipart upload for large files is likely the most important).
> > > 2) You can use S3AFileSystem with the HadoopFileIO implementation, though you may still see similar behavior with additional calls being made (I don't know if these can be disabled).
> > > 3) The PrestoS3FileSystem is tailored to Presto's use and is likely not as complete as S3A, but seeing as it uses the Hadoop FileSystem API, it would likely work for what HadoopFileIO exercises (as would the EMRFileSystem).
> > > 4) I would probably discourage you from writing your own file system, as S3FileIO will likely be a more optimized implementation for what Iceberg needs.
> > >
> > > If you want to contribute or have time to help contribute to S3FileIO, that is the path I would recommend. As for configuration, a lot of it comes down to how you configure the AWS S3 client that you provide to the S3FileIO implementation, but a lot of the defaults are reasonable (you might want to tweak a few, like max connections and maybe the retry policy).
> > >
> > > The recently committed work to dynamically load your FileIO should make it relatively easy to test out, and we'd love to have extra eyes and feedback on it.
> > >
> > > Let me know if that helps,
> > > -Dan
> > >
> > > On Wed, Nov 11, 2020 at 1:45 PM John Clara <jo...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > Thank you all for creating/continuing this great project! I am just starting to get comfortable with the fundamentals, and I'm thinking that my team has been using Iceberg the wrong way at the FileIO level.
> > > >
> > > > I was wondering if people would be willing to share how they set up their FileIO/FileSystem with S3 and any customizations they had to add.
> > > >
> > > > (Preferably from smaller teams. My team is small and cannot realistically customize everything. If there's an up-to-date thread discussing this that I missed, please link me to that instead.)
> > > >
> > > > *** My team's specific problems/setup, which you can ignore ***
> > > >
> > > > My team has been using Hadoop FileIO with the S3AFileSystem.
> > > > Jars are provided by AWS EMR 5.23, which is on Hadoop 2.8.5. We use DynamoDB for atomic renames by implementing Iceberg's provided interfaces. We read/write from either Spark in EMR or on-prem JVMs in Docker containers (managed by k8s). Both use s3a, but the EMR clusters have HDFS (backed by core nodes) for the s3a buffered writes, while the on-prem containers use the Docker container's default file system, which uses an overlay2 storage driver (that I know nothing about).
> > > >
> > > > Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary GET and LIST requests, which is well known in the community (but was not known to my team, unfortunately). There are also GET-PUT-GET inconsistency issues with S3 that have been talked about, but I don't yet understand how they arise in the 2.8.5 S3AFileSystem (https://github.com/apache/iceberg/issues/1398).
> > > >
> > > > *** End of specific ***
> > > >
> > > > The options I'm seeing are:
> > > >
> > > > 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?
> > > >
> > > > This still seems very new, unless it is actually based on Netflix's prod implementation that they're releasing to the community? (I'm wondering if it's safe to start moving onto it in prod in the near term. If Netflix is using it, or rolling it out, that would be more than enough for my team.)
> > > >
> > > > 2. Using a newer Hadoop version and the S3AFileSystem. Any recommendations on a version, and are you also using S3Guard?
> > > >
> > > > From a quick look, most gains compared to older versions seem to come from S3Guard. Are there substantial gains without it? (My team doesn't have experience with S3Guard, and Iceberg seems to not need it outside of atomic renames?)
> > > >
> > > > 3. Using an alternative Hadoop file system. Any recommendations?
> > > >
> > > > In the recent Iceberg S3 FileIO, the LICENSE states it was based off the Presto file system. Has anyone used that file system as-is with Iceberg? (https://github.com/apache/iceberg/blob/master/LICENSE#L251)
> > > >
> > > > 4. Rolling our own Hadoop file system. Anyone have stories/blogs about pitfalls or difficulties?
> > > >
> > > > rdblue hints that Netflix has already done this: https://github.com/apache/iceberg/issues/1398#issuecomment-682837392 (my team probably doesn't have the capacity for this).
> > > >
> > > > Places where I tried looking for this info:
> > > >
> > > > - https://github.com/apache/iceberg/issues/761 (issue for a getting started guide)
> > > > - https://iceberg.apache.org/spec/#file-system-operations
> > > >
> > > > Thanks everyone,
> > > >
> > > > John Clara
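P.S. On the client configuration point in my earlier reply: most of the knobs live on the AWS SDK client you hand to S3FileIO rather than on Iceberg itself. A rough, untested sketch of what I mean, assuming the supplier-style constructor from the S3FileIO PR linked above and the AWS SDK v2 (the specific values are placeholders, not recommendations):

    import org.apache.iceberg.aws.s3.S3FileIO;
    import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
    import software.amazon.awssdk.core.retry.RetryPolicy;
    import software.amazon.awssdk.http.apache.ApacheHttpClient;
    import software.amazon.awssdk.services.s3.S3Client;

    public class S3FileIOConfigSketch {
      public static void main(String[] args) {
        // Build an SDK v2 S3 client with a larger connection pool and an explicit
        // retry policy, then hand it to S3FileIO through a client supplier.
        S3FileIO io = new S3FileIO(() ->
            S3Client.builder()
                .httpClientBuilder(ApacheHttpClient.builder().maxConnections(100))
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                    .retryPolicy(RetryPolicy.builder().numRetries(5).build())
                    .build())
                .build());

        // The resulting FileIO handles all reads, writes, and deletes of data
        // and metadata files for the table.
      }
    }

And once you're on a build that includes the dynamic FileIO loading, you should be able to point a catalog at S3FileIO through the io-impl catalog property (org.apache.iceberg.aws.s3.S3FileIO) instead of wiring it up in code.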