Re: Flatmap operator in an Asynchronous call

2022-03-25 Thread Diwakar Jha
I'm not able to use asyncIO because the file will not fit in memory. I thought that flatmap would allow me to enrich/process records while downloading instead of waiting for the whole file to download. The solution works but it's not scalable because I'm not able to use AsyncFunction in

Re: Flatmap operator in an Asynchronous call

2022-03-09 Thread Arvid Heise
You can use flatMap to flatten and have an asyncIO after it.
On Wed, Mar 9, 2022 at 8:08 AM Diwakar Jha wrote:
> Thanks Gen, I will look into customized Source and SplitEnumerator.
>
> On Mon, Mar 7, 2022 at 10:20 PM Gen Luo wrote:
>> Hi Diwakar,
>>
>> An asynchronous flatmap function
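As a rough illustration of the ordering suggested here (flatten first, then do the asynchronous work per record), the following is a minimal plain-Java sketch. It deliberately does not use the Flink API: `flatten`, `enrichAsync`, and the comma-separated "file" format are hypothetical stand-ins, and a `CompletableFuture` on a thread pool takes the place of an async I/O operator.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class FlattenThenEnrich {
    // Hypothetical: split one "file" (a comma-separated string here) into records.
    static List<String> flatten(String fileContents) {
        return List.of(fileContents.split(","));
    }

    // Hypothetical enrichment standing in for a remote lookup.
    static CompletableFuture<String> enrichAsync(String record, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> record.toUpperCase(), pool);
    }

    static List<String> run(List<String> files) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Step 1: flatten synchronously. Step 2: start one async call per record.
            List<CompletableFuture<String>> pending = files.stream()
                    .flatMap(f -> flatten(f).stream())
                    .map(r -> enrichAsync(r, pool))
                    .collect(Collectors.toList());
            // Wait for the results in submission order.
            return pending.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a,b", "c")));
    }
}
```

The point of the split is that the flattening step stays an ordinary synchronous function, while only the per-record enrichment is asynchronous.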

Re: Flatmap operator in an Asynchronous call

2022-03-08 Thread Diwakar Jha
Thanks Gen, I will look into customized Source and SplitEnumerator.
On Mon, Mar 7, 2022 at 10:20 PM Gen Luo wrote:
> Hi Diwakar,
>
> An asynchronous flatmap function without the support of the framework can
> be problematic. You should not call collector.collect outside the main
> thread of the

Re: Flatmap operator in an Asynchronous call

2022-03-07 Thread Gen Luo
Hi Diwakar, An asynchronous flatmap function without the support of the framework can be problematic. You should not call collector.collect outside the main thread of the task, i.e. outside the flatMap method. I'd suggest using a customized Source instead to process the files, which uses a
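To make the threading concern concrete, here is a minimal plain-Java sketch of one way to keep all emission on the calling thread. It is an analogy only, not Flink code and not a description of Flink internals: the `Consumer<String>` stands in for a collector, and `EmitOnCallerThread`, `process`, and the `"-enriched"` suffix are hypothetical names. Worker threads only hand finished results to a queue; the thread that called `process` is the only one that ever touches the collector.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class EmitOnCallerThread {
    // Runs the (hypothetical) enrichment on worker threads, but invokes the
    // collector only from the thread that called process().
    static void process(List<String> records, Consumer<String> collector) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingQueue<String> done = new LinkedBlockingQueue<>();
        try {
            for (String r : records) {
                // Workers never see the collector; they only enqueue results.
                pool.execute(() -> done.add(r + "-enriched"));
            }
            for (int i = 0; i < records.size(); i++) {
                // Emission happens here, on the caller's thread.
                collector.accept(done.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Results may arrive in any order, since workers finish independently; the sketch only guarantees which thread calls the collector.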

Flatmap operator in an Asynchronous call

2022-03-07 Thread Diwakar Jha
Hello Everyone, I'm running a streaming application using Flink 1.11 and EMR 6.01. My use case is to read files from an S3 bucket, filter the file contents (records, say), and enrich each record, then filter the records and output them to a sink. I'm reading 6k files per 15 minutes and the total number of records is 3