Hi Piotr,

Thanks for the proposal.
How will adding s5cmd affect the memory footprint? Since it is a native
binary, its memory consumption will not be controlled by the JVM or Flink.
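
To make the concern concrete, here is a minimal sketch of how such a binary
might be launched from Java. It is an assumption for illustration only, not
what the FLIP specifies: the class name, method names, and example paths are
hypothetical, and only the s5cmd `cp` subcommand is taken as given. The point
is that the child process gets its own address space, so its memory sits
outside the JVM heap and outside Flink's managed memory.

```java
import java.util.List;

// Hypothetical sketch: NativeCopySketch and its methods are illustrative
// names, not part of Flink or of the FLIP.
class NativeCopySketch {

    // Builds the argument list for a single "s5cmd cp <source> <destination>" call.
    // s5cmdPath is the user-configured location of the binary (assumed).
    static List<String> buildCopyCommand(String s5cmdPath, String source, String destination) {
        return List.of(s5cmdPath, "cp", source, destination);
    }

    // Prepares (but does not start) the external process. Once started, the
    // process is a separate OS process: -Xmx and Flink's memory configuration
    // do not bound its memory usage.
    static ProcessBuilder prepare(String s5cmdPath, String source, String destination) {
        return new ProcessBuilder(buildCopyCommand(s5cmdPath, source, destination))
                .redirectErrorStream(true);
    }

    public static void main(String[] args) {
        // Example values only; nothing is executed here.
        ProcessBuilder pb = prepare("/usr/local/bin/s5cmd", "s3://bucket/chk-1/state", "/tmp/state");
        System.out.println(String.join(" ", pb.command()));
    }
}
```

With a launch like this, any limit on the copy's memory would have to come
from the container or OS level, or from options of the binary itself.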

Thanks,
Aleksandr

On Thu, 2 May 2024 at 11:12, Hong Liang <h...@apache.org> wrote:

> Hi Piotr,
>
> Thanks for the FLIP! Nice to see work to improve the filesystem
> performance. +1 to future work to improve the upload speed as well. This
> would be useful for jobs with large state and high async checkpointing
> times.
>
> Some thoughts on the configuration: it might be good for us to introduce two
> points of configurability for future-proofing:
> 1/ Configure the implementation of PathsCopyingFileSystem used, maybe by
> config, or by ServiceResources (this would allow us to use this for
> alternative clouds or implement S3 SDKv2 support if we want this in the
> future). This could also be used as a feature flag to determine whether we
> should be using this new native file copy support.
> 2/ Configure the location of the s5cmd binary (version control etc.), as
> you have mentioned in the FLIP.
>
> Regards,
> Hong
>
>
> On Thu, May 2, 2024 at 9:40 AM Muhammet Orazov
> <mor+fl...@morazow.com.invalid> wrote:
>
> > Hey Piotr,
> >
> > Thanks for the proposal! It would be a great improvement!
> >
> > Some questions from my side:
> >
> > > In order to configure s5cmd Flink’s user would need
> > > to specify path to the s5cmd binary.
> >
> > Could you please also add the configuration property
> > for this? An example showing how users would set this
> > parameter would be helpful.
> >
> > Would this affect any filesystem connectors that use
> > FileSystem[1][2] dependencies?
> >
> > Best,
> > Muhammet
> >
> > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/
> > [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
> >
> > On 2024-04-30 13:15, Piotr Nowojski wrote:
> > > Hi all!
> > >
> > > I would like to put under discussion:
> > >
> > > FLIP-444: Native file copy support
> > > https://cwiki.apache.org/confluence/x/rAn9EQ
> > >
> > > This proposal aims to speed up Flink recovery times by speeding up state
> > > download times. However, in the future, the same mechanism could also be
> > > used to speed up state uploading (checkpointing/savepointing).
> > >
> > > I'm curious to hear your thoughts.
> > >
> > > Best,
> > > Piotrek
> >
>
