GitHub user raboof added a comment to the discussion: Reading a parquet file and writing to s3 using pekko connectors.
> I am using ByteString(outputStream.toByteArray) to convert the serialized Parquet data into a format that can be streamed to S3. I'm concerned that this approach could lead to OutOfMemory (OOM) issues, especially when processing large files, as ByteArrayOutputStream keeps everything in memory.

It looks like you're not creating a `ByteArrayOutputStream` for the whole file, but one per `GenericRecord`. As long as the individual `GenericRecord`s aren't big, it shouldn't be a problem to process files that are large only because they contain many `GenericRecord`s: each record's buffer can be garbage-collected as soon as its `ByteString` has been emitted downstream.

GitHub link: https://github.com/apache/pekko-connectors/discussions/857#discussioncomment-10857523
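For reference, a minimal sketch of the per-record pattern described above, assuming a pipeline that reads `GenericRecord`s with `AvroParquetSource`, serializes each one into its own small `ByteArrayOutputStream` as raw Avro binary (no container header — just to illustrate the buffering), and streams the resulting `ByteString`s to S3 via `S3.multipartUpload`. The bucket name, object key, and input path are placeholders, and S3 credentials/region are expected in the usual `pekko-connectors-s3` configuration:

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.connectors.avroparquet.scaladsl.AvroParquetSource
import org.apache.pekko.stream.connectors.s3.scaladsl.S3
import org.apache.pekko.util.ByteString

object ParquetToS3 extends App {
  implicit val system: ActorSystem = ActorSystem("parquet-to-s3")

  // One small buffer per record: the ByteArrayOutputStream only ever holds a
  // single serialized GenericRecord, so memory use is bounded by record size,
  // not by file size.
  def recordToByteString(record: GenericRecord): ByteString = {
    val out = new ByteArrayOutputStream()
    val writer = new GenericDatumWriter[GenericRecord](record.getSchema)
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(record, encoder)
    encoder.flush()
    ByteString(out.toByteArray)
  }

  // Hypothetical input path; any ParquetReader[GenericRecord] works here.
  val reader: ParquetReader[GenericRecord] =
    AvroParquetReader
      .builder[GenericRecord](
        HadoopInputFile.fromPath(new Path("/tmp/input.parquet"), new Configuration()))
      .build()

  // Records flow through one at a time; S3.multipartUpload accumulates only
  // up to its configured chunk size (5 MiB by default) before uploading each
  // part, so nothing holds the whole file in memory.
  AvroParquetSource(reader)
    .map(recordToByteString)
    .runWith(S3.multipartUpload("my-bucket", "records.bin"))
}
```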
