Manish Agarwal created PARQUET-156:
--------------------------------------
Summary: parquet should have automatic memory management to avoid
out of memory error.
Key: PARQUET-156
URL: https://issues.apache.org/jira/browse/PARQUET-156
Project: Parquet
Issue Type: Improvement
Reporter: Manish Agarwal
I sent a mail to the dev list, but I seem to have a problem with email there,
so I am opening a bug here.
I am on a multithreaded system with M threads, each thread creating an
independent Parquet writer and writing to HDFS in its own independent
file. I have a finite amount of RAM, say R.
When I created the Parquet writers using the default block and page sizes, I
got heap errors (out of memory) on my setup. So I reduced the block size and
page size to very low values; the out-of-memory errors stopped and the files
were written correctly. I am able to read these files correctly as well.
I should not have to lower these sizes myself; Parquet should automatically
make sure I do not get these errors.
But in case I do have to keep track of the memory, my question is this.
Keeping these values very low is not a recommended practice, as I would
lose performance. I am particularly concerned about write performance.
What formula do you recommend for finding the correct blockSize and
pageSize to pass to the ParquetWriter constructor to get the right
write performance? That is, how should I decide the right blockSize and
pageSize for a Parquet writer, given that I have M threads and the total
RAM available is R? I don't understand what dictionaryPageSize is needed
for; if I need to worry about that as well, kindly let me know, but I
have kept the enableDictionary flag as false.
I am using the constructor below.
public ParquetWriter(
    Path file,
    WriteSupport<T> writeSupport,
    CompressionCodecName compressionCodecName,
    int blockSize,
    int pageSize,
    int dictionaryPageSize,
    boolean enableDictionary,
    boolean validating,
    WriterVersion writerVersion,
    Configuration conf) throws IOException {
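
To make the question concrete, this is the kind of calculation I have in mind.
It is only a rough sketch: it assumes each open writer buffers about one block
(row group) in memory before flushing, so M writers need roughly
M * blockSize * overhead bytes. The helper name, the heap fraction and the
overhead factor are my own guesses and are not part of the Parquet API.

```java
class BlockSizeEstimate {
    // Hypothetical helper (not a Parquet API): largest blockSize in bytes such
    // that writerThreads writers, each assumed to buffer about one block times
    // overheadFactor, fit inside heapFraction of the total RAM.
    static int maxBlockSize(long totalRamBytes, int writerThreads,
                            double heapFraction, double overheadFactor) {
        if (writerThreads <= 0 || heapFraction <= 0 || overheadFactor <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        double budgetPerWriter = totalRamBytes * heapFraction / writerThreads;
        long blockSize = (long) (budgetPerWriter / overheadFactor);
        // The constructor takes blockSize as an int, so cap the result.
        return (int) Math.min(blockSize, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long r = 4L * 1024 * 1024 * 1024; // R = 4 GiB (example value)
        int m = 16;                       // M = 16 writer threads (example value)
        // Assumed: half of R is available to writers, 2x overhead per block.
        System.out.println(maxBlockSize(r, m, 0.5, 2.0)); // prints 67108864
    }
}
```

With these example numbers the result is 64 MiB per writer. Whether this is the
right way to reason about it, and what overhead factor is realistic, is exactly
what I would like to know.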
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)