Re: Flink File Source: File read strategy
Hi Kirti,

I think you can follow doc [1] and create a table on your S3 file system (put your S3 path in the `path` field), then submit jobs that write to and read from S3. If your jobs use the `DataStream` API, refer to [2].

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/

Best,
Shammon FY

On Mon, Sep 25, 2023 at 12:36 PM Kirti Dhar Upadhyay K <kirti.k.dhar.upadh...@ericsson.com> wrote:

> Thanks Shammon.
>
> Is there any way to verify that File Source reads files directly from S3?
>
> Regards,
> Kirti Dhar
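One rough way to approach the verification question is to watch the directory configured in `io.tmp.dirs` on a TaskManager host while the job is reading, and check whether files of roughly the same size as the S3 objects ever appear there. The sketch below is only an illustration under that assumption: the helper name `large_files` and the size threshold are hypothetical, and it uses nothing beyond the Python standard library. Flink may also keep other local data in its temporary directories, so a hit is not proof of a download, and an empty result is only consistent with direct streaming rather than a guarantee of it.

```python
import os
import tempfile


def large_files(tmp_dir, min_size_bytes):
    """Return sorted (path, size) pairs for files under tmp_dir
    whose size is at least min_size_bytes.

    Hypothetical helper: point tmp_dir at the directory you configured
    in io.tmp.dirs and pick min_size_bytes close to the size of the
    S3 objects the job reads.
    """
    found = []
    for root, _dirs, names in os.walk(tmp_dir):
        for name in names:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file disappeared (or is a broken link) between
                # listing and stat; skip it.
                continue
            if size >= min_size_bytes:
                found.append((path, size))
    return sorted(found)


if __name__ == "__main__":
    # Demo against a throwaway directory, not a real Flink tmp dir.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "small.bin"), "wb") as f:
            f.write(b"x" * 10)
        with open(os.path.join(demo_dir, "big.bin"), "wb") as f:
            f.write(b"x" * 2048)
        for path, size in large_files(demo_dir, 1024):
            print(path, size)
```

Running the helper repeatedly during a read-heavy phase of the job gives a better picture than a single snapshot, since any temporary files could be short-lived.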
RE: Flink File Source: File read strategy
Thanks Shammon.

Is there any way to verify that File Source reads files directly from S3?

Regards,
Kirti Dhar

From: Shammon FY
Sent: 25 September 2023 06:27
To: Kirti Dhar Upadhyay K
Cc: user@flink.apache.org
Subject: Re: Flink File Source: File read strategy

Hi Kirti,

I think the default file `Source` does not download files locally in Flink, but reads them directly from S3. However, Flink also supports configuring temporary directories through `io.tmp.dirs`. For a user-defined source, the temporary directory can be obtained from `FlinkS3FileSystem`. After the Flink job is completed, the directory will be cleaned up.

Best,
Shammon FY
Re: Flink File Source: File read strategy
Hi Kirti,

I think the default file `Source` does not download files locally in Flink, but reads them directly from S3. However, Flink also supports configuring temporary directories through `io.tmp.dirs`. For a user-defined source, the temporary directory can be obtained from `FlinkS3FileSystem`. After the Flink job is completed, the directory will be cleaned up.

Best,
Shammon FY

On Fri, Sep 22, 2023 at 3:11 PM Kirti Dhar Upadhyay K via user <user@flink.apache.org> wrote:

> Hi Community,
>
> I am using Flink File Source with Amazon S3.
> Please help me with the questions below:
>
> 1. When the Split Enumerator assigns a split to a Source Reader, does it download the file temporarily and then start reading/decoding the records from the file, or does it create a direct stream with S3?
> 2. If the file is downloaded locally, to which path? Is the path configurable?
> 3. Is this temporary file deleted automatically, or is explicit cleanup required?
>
> Regards,
> Kirti Dhar
Flink File Source: File read strategy
Hi Community,

I am using Flink File Source with Amazon S3.
Please help me with the questions below:

1. When the Split Enumerator assigns a split to a Source Reader, does it download the file temporarily and then start reading/decoding the records from the file, or does it create a direct stream with S3?
2. If the file is downloaded locally, to which path? Is the path configurable?
3. Is this temporary file deleted automatically, or is explicit cleanup required?

Regards,
Kirti Dhar