Just to add a bit more context to the performance test question: What I had
in mind was the exists call on a (non-existing) directories in a bucket
with a lot of objects. A comment from one of the SDK contributors about
that call was that it could be an expensive call in an object store if
implemented wrongly. I would imagine that this could be a valid concern
because the concept of directories is not really present in an object store
like S3, if I'm not mistaken?!

On Mon, Oct 9, 2023 at 6:49 PM Matthias Pohl <matthias.p...@aiven.io> wrote:

> I would agree with David's proposal as well.
>
> Would it make sense to come up with some performance comparisons for the
> different S3 implementations in the end? ...just to ensure that we're
> improving things or (at least) don't make things worse. Or is there
> something like that already somewhere?
>
> A bit out of scope:
> We noticed that the FileSystem contract is not well defined. The JavaDoc
> is ambiguous (IMHO) for some operations. For instance, the return value of
> delete [1] is true "if the operation was successful": It's unclear (at
> least to me) what success means here. Is it about the processing (i.e. the
> delete was performed on an existing file) or the outcome (i.e. success is
> reached as well if the file didn't exist in the first place). Removing the
> return type could help to make the contract clearer. In the end, only the
> outcome (i.e. the file doesn't exist anymore) matters in my opinion. A
> similar argument could be applied to mkdirs [2] and rename [3].
>
> That said, I'm not suggesting you adapt the interface as part of your
> work. But it would be good to collect other improvements as part of it. We
> could consider improving the FileSystem interface as part of the 2.0
> efforts as a follow-up.
>
> [1]
> https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L695
> [2]
> https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L706
> [3]
> https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L773
>
> On Tue, Oct 3, 2023 at 6:25 PM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
>> +1 for David's suggestion. We should get away from the current
>> approach with two abstractions and get to one rock solid one.
>>
>> On Mon, Oct 2, 2023 at 11:13 PM David Morávek <d...@apache.org> wrote:
>> >
>> > Hi Maomao,
>> >
>> > I wonder whether it would make sense to take a stab at consolidating
>> the S3
>> > filesystems instead and introduce a native one. The whole Hadoop wrapper
>> > around the S3 client exists for legacy reasons, and it adds complexity
>> and
>> > probably an unnecessary performance penalty.
>> >
>> > If you take a look at the underlying presto implementation, it's
>> actually
>> > not too complex to adapt to Flink interfaces (since you're proposing to
>> > maintain a copy of it anyway).
>> >
>> > Overall, the S3 FS is probably the most used one that we have so this
>> could
>> > be rather high impact. It would also eliminate user confusion when
>> choosing
>> > the implementation to use.
>> >
>> > WDYT?
>> >
>> > Best,
>> > D.
>> >
>> > On Fri, Sep 29, 2023 at 2:41 PM Min, Maomao <mimao...@amazon.com.invalid
>> >
>> > wrote:
>> >
>> > > Hi Flink Dev,
>> > >
>> > > I’m Maomao, a developer from AWS EMR.
>> > >
>> > > Recently, our team is working on adding AWS SDK V2 support for
>> Flink’s S3
>> > > Filesystem. During development, we found out that our work was
>> blocked by
>> > > Presto. This is because that Presto still uses AWS SDK V1 and won’t
>> add
>> > > support for AWS SDK V2 in short term. To unblock, our team proposed
>> several
>> > > options and I’ve created a JIRA issue as here<
>> > > https://issues.apache.org/jira/browse/FLINK-33157>.
>> > >
>> > > Since our team plans to contribute this work back to the community
>> later,
>> > > we’d like to collect feedback from the community about the options we
>> > > proposed in the long term so that the community won’t need to
>> duplicate
>> > > this work in the future.
>> > >
>> > > Best,
>> > > Maomao
>> > >
>> > >
>>
>

Reply via email to