GitHub user raboof added a comment to the discussion: Reading a parquet file and writing to s3 using pekko connectors.
> I am using ByteString(outputStream.toByteArray) to convert the serialized Parquet data into a format that can be streamed to S3. I'm concerned that this approach could lead to OutOfMemory (OOM) issues, especially when processing large files, as ByteArrayOutputStream keeps everything in memory.

It looks like you're not creating a `ByteArrayOutputStream` for the whole file, but one per `GenericRecord`. As long as the individual `GenericRecord`s aren't big, it shouldn't be a problem to process files that are large only because they contain many `GenericRecord`s: each record's buffer can be garbage-collected as soon as its `ByteString` has been emitted downstream.

GitHub link: https://github.com/apache/pekko-connectors/discussions/857#discussioncomment-10857523
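For reference, a minimal sketch of the per-record pattern described above, assuming a pipeline that reads `GenericRecord`s with `AvroParquetSource`, serializes each one into its own small `ByteArrayOutputStream` as raw Avro binary (no container header — just to illustrate the buffering), and streams the resulting `ByteString`s to S3 via `S3.multipartUpload`. The bucket name, object key, and input path are placeholders, and S3 credentials/region are expected in the usual `pekko-connectors-s3` configuration:

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.connectors.avroparquet.scaladsl.AvroParquetSource
import org.apache.pekko.stream.connectors.s3.scaladsl.S3
import org.apache.pekko.util.ByteString

object ParquetToS3 extends App {
  implicit val system: ActorSystem = ActorSystem("parquet-to-s3")

  // One small buffer per record: the ByteArrayOutputStream only ever holds a
  // single serialized GenericRecord, so memory use is bounded by record size,
  // not by file size.
  def recordToByteString(record: GenericRecord): ByteString = {
    val out = new ByteArrayOutputStream()
    val writer = new GenericDatumWriter[GenericRecord](record.getSchema)
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(record, encoder)
    encoder.flush()
    ByteString(out.toByteArray)
  }

  // Hypothetical input path; any ParquetReader[GenericRecord] works here.
  val reader: ParquetReader[GenericRecord] =
    AvroParquetReader
      .builder[GenericRecord](
        HadoopInputFile.fromPath(new Path("/tmp/input.parquet"), new Configuration()))
      .build()

  // Records flow through one at a time; S3.multipartUpload accumulates only
  // up to its configured chunk size (5 MiB by default) before uploading each
  // part, so nothing holds the whole file in memory.
  AvroParquetSource(reader)
    .map(recordToByteString)
    .runWith(S3.multipartUpload("my-bucket", "records.bin"))
}
```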
