I would agree with David's proposal as well.

Would it make sense to come up with some performance comparisons for the
different S3 implementations in the end? ...just to ensure that we're
improving things or (at least) don't make things worse. Or is there
something like that already somewhere?

A bit out of scope:
We noticed that the FileSystem contract is not well defined. The JavaDoc is
ambiguous (IMHO) for some operations. For instance, the return value of
delete [1] is true "if the operation was successful": It's unclear (at
least to me) what success means here. Is it about the processing (i.e. the
delete was performed on an existing file) or the outcome (i.e. success is
reached as well if the file didn't exist in the first place). Removing the
return type could help to make the contract clearer. In the end, only the
outcome (i.e. the file doesn't exist anymore) matters in my opinion. A
similar argument could be applied to mkdirs [2] and rename [3].

That said, I'm not suggesting you adapt the interface as part of your work.
But it would be good to collect other improvements as part of it. We could
consider improving the FileSystem interface as part of the 2.0 efforts as a
follow-up.

[1]
https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L695
[2]
https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L706
[3]
https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L773

On Tue, Oct 3, 2023 at 6:25 PM Martijn Visser <martijnvis...@apache.org>
wrote:

> +1 for David's suggestion. We should get away from the current
> approach with two abstractions and get to one rock solid one.
>
> On Mon, Oct 2, 2023 at 11:13 PM David Morávek <d...@apache.org> wrote:
> >
> > Hi Maomao,
> >
> > I wonder whether it would make sense to take a stab at consolidating the
> S3
> > filesystems instead and introduce a native one. The whole Hadoop wrapper
> > around the S3 client exists for legacy reasons, and it adds complexity
> and
> > probably an unnecessary performance penalty.
> >
> > If you take a look at the underlying presto implementation, it's actually
> > not too complex to adapt to Flink interfaces (since you're proposing to
> > maintain a copy of it anyway).
> >
> > Overall, the S3 FS is probably the most used one that we have so this
> could
> > be rather high impact. It would also eliminate user confusion when
> choosing
> > the implementation to use.
> >
> > WDYT?
> >
> > Best,
> > D.
> >
> > On Fri, Sep 29, 2023 at 2:41 PM Min, Maomao <mimao...@amazon.com.invalid
> >
> > wrote:
> >
> > > Hi Flink Dev,
> > >
> > > I’m Maomao, a developer from AWS EMR.
> > >
> > > Recently, our team is working on adding AWS SDK V2 support for Flink’s
> S3
> > > Filesystem. During development, we found out that our work was blocked
> by
> > > Presto. This is because that Presto still uses AWS SDK V1 and won’t add
> > > support for AWS SDK V2 in short term. To unblock, our team proposed
> several
> > > options and I’ve created a JIRA issue as here<
> > > https://issues.apache.org/jira/browse/FLINK-33157>.
> > >
> > > Since our team plans to contribute this work back to the community
> later,
> > > we’d like to collect feedback from the community about the options we
> > > proposed in the long term so that the community won’t need to duplicate
> > > this work in the future.
> > >
> > > Best,
> > > Maomao
> > >
> > >
>

Reply via email to