Hi Darin,

Your comment "I think our use case is rather unique and I'm not sure who
else would benefit" makes me wonder whether we should actually support this
in Flink, given that you mention the use case is rather unique. Especially
since the SourceFunction has many flaws and has long been overdue for
removal in favor of the new Source API.

The logical use case I see for reading from something in the middle of the
stream is performing lookups, and for that I think we should already have
other solutions in place. Isn't that sufficient for your use case?

Best regards,

Martijn


On Wed, Jul 17, 2024 at 5:21 PM Darin Amos <darin.a...@instacart.com.invalid>
wrote:

> I wanted to bump this request from last year. As I understand it, the new
> FileSource is not suitable for my existing use case.
>
> The TL;DR is that we have extensively customized/extended the file splits
> and input formats so that a single ContinuousFileReaderOperator can handle
> non-homogeneous files and act as a source in the middle of our stream. The
> alternative is that we re-implement this operator ourselves, which seems
> like a reasonable alternative as long as access to the MailboxExecutor
> won't also be deprecated.
>
> If needed, I can create this as a ticket.
>
> Cheers
>
> Darin
>
> On Tue, Nov 21, 2023 at 1:30 PM Darin Amos <darin.a...@instacart.com>
> wrote:
>
> > Hi All!
> >
> > I posted on the community Slack channel and was referred to this mailing
> > list. I think it would be helpful if the ContinuousFileReaderOperator
> > were made a public class and not removed in Flink 2.0 (or if an
> > equivalent were created). I have a use case for it where FileSource
> > isn't sufficient, at least not to my knowledge.
> >
> > I think our use case is rather unique and I'm not sure who else would
> > benefit. Essentially this operator acts as a source in the middle of our
> > stream. Our application processes non-homogeneous files, which are
> > generally, but not limited to, CSV files. In our case each CSV file has
> > varying headers (both the values and the number of headers), delimiters,
> > and quote characters.
> >
> > Our application receives a Kafka message with sufficient metadata to
> > parse a file (path, delimiter, quote char, as configured by the
> > supplier) and uses an Async operator to pre-download the headers.
> > Afterwards we are able to generate custom file splits (which contain the
> > parsing instructions and headers), paired with a custom format class, to
> > create name-value-pair records with the ContinuousFileReaderOperator.
> >
> > I'm more than happy to share more details about our customization if
> > required.
> >
> > Thanks!
> >
> > Darin Amos
> >
> >
> >
>
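For readers following the thread, the per-file parsing setup Darin describes (a delimiter, quote character, and header list carried alongside each file split) can be sketched in plain Java, independent of any Flink API. The `FileParseConfig` class and `parseLine` method below are hypothetical names for illustration, not part of Flink:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VaryingCsv {

    // Hypothetical per-file parsing config, mirroring the metadata Darin
    // describes arriving via a Kafka message: delimiter, quote char, and
    // the pre-downloaded headers.
    static final class FileParseConfig {
        final char delimiter;
        final char quote;
        final List<String> headers;

        FileParseConfig(char delimiter, char quote, List<String> headers) {
            this.delimiter = delimiter;
            this.quote = quote;
            this.headers = headers;
        }
    }

    // Split one line into fields, honoring the per-file quote character,
    // then pair each field with its header to form a name-value-pair record.
    static Map<String, String> parseLine(FileParseConfig cfg, String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == cfg.quote) {
                inQuotes = !inQuotes;               // toggle quoted state
            } else if (c == cfg.delimiter && !inQuotes) {
                fields.add(cur.toString());         // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());                 // last field

        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < cfg.headers.size() && i < fields.size(); i++) {
            record.put(cfg.headers.get(i), fields.get(i));
        }
        return record;
    }

    public static void main(String[] args) {
        // A pipe-delimited file using single quotes, per its own metadata.
        FileParseConfig pipeFile = new FileParseConfig('|', '\'',
                Arrays.asList("sku", "name", "qty"));
        System.out.println(parseLine(pipeFile, "123|'wid|get'|4"));
    }
}
```

A custom input format along these lines would read the config out of its file split and emit one such map per line, which is presumably what the custom split/format pairing in the thread does inside ContinuousFileReaderOperator.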
