[
https://issues.apache.org/jira/browse/PARQUET-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reza Shiftehfar updated PARQUET-209:
------------------------------------
Description:
While using ParquetWriter, there is no way to check or estimate the size of the
output file before the writer is closed and its content is written out to disk.
Such an estimate would be useful when we want to close files and upload them
based on a size threshold.
Since ParquetWriter keeps everything in memory and only writes it out to disk
when the writer is closed, it is not possible to estimate the file size before
closing the writer.
According to the Parquet documentation, the data is written into an in-memory
object in its final format, meaning that the size of the object in memory is
the same as its final size on disk. It would be great if you could expose the
current size.
It is true that this size will differ from the final output size, because the
schema and other metadata are added at the end of the file, but it still gives
a close estimate of the output file size, which would be very useful when
reading/writing streams.
Labels: parquetWriter (was: )
> Enhance ParquetWriter with exposing in-memory size of writer object
> -------------------------------------------------------------------
>
> Key: PARQUET-209
> URL: https://issues.apache.org/jira/browse/PARQUET-209
> Project: Parquet
> Issue Type: Wish
> Reporter: Reza Shiftehfar
> Labels: parquetWriter
>
> While using ParquetWriter, there is no way to check or estimate the size of
> the output file before the writer is closed and its content is written out to
> disk. Such an estimate would be useful when we want to close files and upload
> them based on a size threshold.
> Since ParquetWriter keeps everything in memory and only writes it out to disk
> when the writer is closed, it is not possible to estimate the file size
> before closing the writer.
> According to the Parquet documentation, the data is written into an in-memory
> object in its final format, meaning that the size of the object in memory is
> the same as its final size on disk. It would be great if you could expose the
> current size.
> It is true that this size will differ from the final output size, because the
> schema and other metadata are added at the end of the file, but it still
> gives a close estimate of the output file size, which would be very useful
> when reading/writing streams.
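
To illustrate the use case described above, here is a minimal sketch of
size-based file rolling, assuming the writer could report how many bytes it is
currently buffering. Everything in it is hypothetical: SizeAwareWriter,
getBufferedSize(), CountingWriter and RollingWriter are names made up for this
example and are not part of the Parquet API. CountingWriter only counts bytes
and stands in for a real writer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of a writer that can report its buffered size. */
interface SizeAwareWriter {
    void write(byte[] record);
    long getBufferedSize(); // hypothetical accessor, the one this issue asks for
    void close();
}

/** Stand-in for a real writer: it only counts the bytes handed to it. */
final class CountingWriter implements SizeAwareWriter {
    private long buffered = 0;
    private boolean closed = false;

    @Override
    public void write(byte[] record) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        buffered += record.length;
    }

    @Override
    public long getBufferedSize() {
        return buffered;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Closes the current file and starts a new one once a size threshold is hit. */
final class RollingWriter {
    private final long thresholdBytes;
    private final List<Long> closedFileSizes = new ArrayList<>();
    private SizeAwareWriter current = new CountingWriter();

    RollingWriter(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    void write(byte[] record) {
        current.write(record);
        // Check the estimate after every record and roll when it is reached.
        if (current.getBufferedSize() >= thresholdBytes) {
            finishCurrent();
            current = new CountingWriter();
        }
    }

    /** Closes the last file; an empty file is not recorded. */
    void close() {
        finishCurrent();
    }

    /** Sizes of the files closed so far (a real version would upload them). */
    List<Long> closedFileSizes() {
        return closedFileSizes;
    }

    private void finishCurrent() {
        long size = current.getBufferedSize();
        current.close();
        if (size > 0) {
            closedFileSizes.add(size);
        }
    }
}
```

With a real writer the reported size would only be an estimate, since the
schema and other metadata are added when the file is closed, so a threshold
chosen this way should leave some headroom.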