CSV join in batch mode

2022-02-21 Thread Killian GUIHEUX
Hello all,

I have to perform a join between two large CSV data sets that do not fit in
RAM. I process these two files in batch mode. I also need a side output to
catch CSV processing errors.
So my question is: what is the best way to do this kind of join operation? I
think I should use ValueState with a state backend, but would it still work if
my state grows larger than my RAM?

Regards.

Killian



Re: CSV join in batch mode

2022-02-23 Thread Guowei Ma
Hi, Killian
Sorry for responding late!
I don't think there is a simple built-in way to catch CSV processing errors,
which means you need to handle them yourself. (Correct me if I am missing
something.)
I think you could use the RocksDB State Backend [1], which spills data to
disk.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/state_backends/#rocksdb-state-backend-details
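
To make the "do it yourself" part concrete, here is a minimal plain-Java sketch
of the pattern, not Flink code and not a definitive implementation. It parses
each line by hand, sends anything malformed to a separate error list (standing
in for a side output), and joins only the rows that parsed cleanly. The class
name CsvErrorRouting, the Row and Result types, and the two-column
"key,value" layout are all assumptions made for illustration. The join uses an
in-memory HashMap purely to keep the example small; for inputs larger than RAM
that map is the piece a disk-backed keyed state would have to replace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Flink API): route bad CSV lines aside, join the rest.
class CsvErrorRouting {

    // Hypothetical record layout: a numeric join key plus one payload column.
    static final class Row {
        final long key;
        final String value;

        Row(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // One parsing pass yields a main output (rows) and a side output (errors).
    static final class Result {
        final List<Row> rows = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    // Parse "key,value" lines; malformed lines are kept verbatim in errors.
    static Result parse(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            String[] cols = line.split(",", -1);
            if (cols.length != 2) {
                result.errors.add(line); // wrong number of columns
                continue;
            }
            try {
                long key = Long.parseLong(cols[0].trim());
                result.rows.add(new Row(key, cols[1].trim()));
            } catch (NumberFormatException e) {
                result.errors.add(line); // key is not a number
            }
        }
        return result;
    }

    // Inner hash join on key. The index lives in memory here (toy sizes only).
    static List<String> join(List<Row> left, List<Row> right) {
        Map<Long, List<String>> index = new HashMap<>();
        for (Row r : right) {
            index.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
        }
        List<String> joined = new ArrayList<>();
        for (Row l : left) {
            List<String> matches = index.get(l.key);
            if (matches == null) {
                continue; // no partner on the right side
            }
            for (String m : matches) {
                joined.add(l.key + "," + l.value + "," + m);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Result left = parse(Arrays.asList("1,alice", "oops", "x,bob", "2,carol"));
        Result right = parse(Arrays.asList("1,paris", "2,rome", "3,oslo", "4"));
        System.out.println("joined: " + join(left.rows, right.rows));
        System.out.println("left errors: " + left.errors);
        System.out.println("right errors: " + right.errors);
    }
}
```

In a Flink job the same split would typically be done inside a ProcessFunction
that emits rejected lines to a side output through an OutputTag, while the
parsed rows continue to the keyed join.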

Best,
Guowei

