[jira] [Updated] (PARQUET-209) Enhance ParquetWriter with exposing in-memory size of writer object

Reza Shiftehfar (JIRA) Fri, 06 Mar 2015 18:04:43 -0800

     [ https://issues.apache.org/jira/browse/PARQUET-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reza Shiftehfar updated PARQUET-209:
------------------------------------
    Description: 
While using ParquetWriter, there is no way to check or estimate the size of the
output file before closing the writer and writing its content out to disk.
Such an estimate is useful when we want to close files and upload them once
they reach a size threshold.
Since ParquetWriter keeps everything in memory and only writes it out to disk
at the end, when the writer is closed, it is not possible to estimate the file
size before closing the writer.

According to the Parquet documentation, data is written into the in-memory
object in its final format, which means the size of the object in memory is
the same as its final size on disk. It would be great if you could expose this
current size.

Admittedly, this size will differ from the final output size, because the
schema and other metadata are appended at the end of the file, but it still
gives a close estimate of the output file size, which would be very useful
when reading/writing streams.
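For a rough sense of that overhead: per the Parquet file format specification,
a file starts with the 4-byte magic `PAR1` and ends with the serialized footer
(the file metadata, including the schema), a 4-byte little-endian footer
length, and `PAR1` again. The hypothetical helper below adds that fixed framing
to a data-size figure. The footer size is an input the caller has to assume,
since it grows with the schema and with the number of row groups and columns.

```java
// Hypothetical helper: relates the data size to the final file size using the
// Parquet layout "PAR1" | data | footer | footer length (4 bytes) | "PAR1".
final class ParquetSizeEstimate {
    static final int MAGIC_BYTES = 4;          // "PAR1" at the start and at the end
    static final int FOOTER_LENGTH_BYTES = 4;  // little-endian int32 before the trailing magic

    private ParquetSizeEstimate() {}

    /** footerBytes is an assumed size of the serialized file metadata. */
    static long estimateFileSize(long dataBytes, long footerBytes) {
        if (dataBytes < 0 || footerBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return MAGIC_BYTES + dataBytes + footerBytes + FOOTER_LENGTH_BYTES + MAGIC_BYTES;
    }
}
```

With 1,000,000 bytes of data and an assumed 2,000-byte footer, the estimate is
1,002,012 bytes, so for large files the in-memory data size alone is already
close to the final size.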
         Labels: parquetWriter  (was: )

> Enhance ParquetWriter with exposing in-memory size of writer object
> -------------------------------------------------------------------
>
>                 Key: PARQUET-209
>                 URL: https://issues.apache.org/jira/browse/PARQUET-209
>             Project: Parquet
>          Issue Type: Wish
>            Reporter: Reza Shiftehfar
>              Labels: parquetWriter
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
