DataStream API in Batch Execution mode

2021-06-07 Thread Marco Villalobos
How do I use a hierarchical directory structure as a file source in S3 when
using the DataStream API in Batch Execution mode?

I have been trying to find out whether the API supports that. Currently our
data is organized by years, halves, quarters, and months, but before I launch
the job I flatten the file structure just to process the right set of files.


Re: DataStream API in Batch Execution mode

2021-06-07 Thread Guowei Ma
Hi, Marco

I think you could try `FileSource`; you can find an example in [1].
`FileSource` scans the files under the given directory recursively.
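
For example, a rough sketch of a batch job over an S3 prefix might look like
the following (the bucket path is just a placeholder, and this assumes Flink
1.13, where the text-line reader is `TextLineFormat`; the S3 filesystem plugin
also has to be available to the cluster):

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RecursiveS3BatchJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the DataStream program with batch semantics over bounded input.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // Point the source at the top-level prefix; nested "directories"
        // (years/halves/quarters/months) are enumerated recursively.
        FileSource<String> source =
                FileSource.forRecordStreamFormat(
                                new TextLineFormat(),
                                new Path("s3://my-bucket/data/")) // placeholder path
                        .build();

        DataStream<String> lines =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-file-source");

        lines.print();
        env.execute("recursive s3 batch read");
    }
}
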
Would you mind opening an issue about the missing documentation?

[1]
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/test/java/org/apache/flink/connector/file/src/FileSourceTextLinesITCase.java
Best,
Guowei




Re: DataStream API in Batch Execution mode

2021-06-09 Thread Marco Villalobos
That worked.  Thank you very much.
