Manish Agarwal created PARQUET-156:
--------------------------------------

             Summary: parquet should have automatic memory management to avoid 
out of memory error. 
                 Key: PARQUET-156
                 URL: https://issues.apache.org/jira/browse/PARQUET-156
             Project: Parquet
          Issue Type: Improvement
            Reporter: Manish Agarwal


I sent a mail to the dev list, but I seem to have a problem with email on that
list, so I am opening a bug here.

I am on a multithreaded system with M threads, each thread creating an
independent Parquet writer and writing to HDFS in its own independent file.
I have a finite amount of RAM, say R.

When I created the Parquet writers using the default block and page sizes, I
got heap errors (out of memory) on my setup. So I reduced my block size and
page size to very low values, and my system stopped giving me these
out-of-memory errors and started writing the files correctly. I am able to
read these files correctly as well.
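
To make the failure concrete, here is a rough sketch of the arithmetic, under
the assumption that each open writer buffers about one block (row group) in
memory before flushing it. The 128 MiB figure is the default block size in
parquet-mr's ParquetWriter; the thread count of 16 and the class and method
names are hypothetical, and real usage adds per-column and compression
overhead on top of this estimate.

```java
public class WriterMemoryEstimate {
    // Default block (row group) size in parquet-mr's ParquetWriter: 128 MiB.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    /**
     * Rough lower bound on the heap held by buffered row groups, assuming
     * every open writer holds about one block in memory at a time.
     */
    static long estimateBufferedBytes(int writers, long blockSize) {
        return writers * blockSize;
    }

    public static void main(String[] args) {
        int m = 16; // hypothetical number of writer threads (M)
        long bytes = estimateBufferedBytes(m, DEFAULT_BLOCK_SIZE);
        System.out.println(m + " writers may buffer about " + (bytes >> 20) + " MiB");
    }
}
```

With these assumed numbers the writers alone could hold about 2 GiB, which
would explain heap errors on a JVM configured with less than that.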

I should not have to lower these sizes by hand; Parquet should automatically
make sure I do not get these errors.

But in case I do have to keep track of the memory myself, my question is this.

Keeping these values very low is not a recommended practice, as I would lose
performance. I am particularly concerned about write performance. What formula
do you recommend that I use to find the correct blockSize and pageSize to pass
to the ParquetWriter constructor to get the right WRITE performance? That is,
how can I decide what the right blockSize and pageSize should be for a Parquet
writer, given that I have M threads and the total RAM available is R? I don't
understand what dictionaryPageSize is needed for; if I need to worry about
that as well, kindly let me know, but I have kept the enableDictionary flag as
false.
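
The kind of formula I have in mind could look like the sketch below. It is
only a guess, not a recommendation from the Parquet project: it assumes each
writer buffers roughly one block in memory, so blockSize is at most
(R * fraction) / M, capped at the 128 MiB default. The class name, the method
name and the fraction of the heap reserved for Parquet are all hypothetical,
and pageSize is not derived here.

```java
public class BlockSizeBudget {
    // Upper cap: the default block size of parquet-mr's ParquetWriter (128 MiB).
    static final int MAX_BLOCK_SIZE = 128 * 1024 * 1024;

    /**
     * Splits a fraction of the available heap evenly across the writers.
     *
     * @param heapBytes total memory available, R
     * @param writers   number of concurrent writers, M
     * @param fraction  share of R that the writers may use, in (0, 1]
     * @return a candidate blockSize in bytes, never above MAX_BLOCK_SIZE
     */
    static int blockSizeFor(long heapBytes, int writers, double fraction) {
        if (writers <= 0) {
            throw new IllegalArgumentException("writers must be positive");
        }
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        long perWriter = (long) (heapBytes * fraction) / writers;
        return (int) Math.min(MAX_BLOCK_SIZE, perWriter);
    }

    public static void main(String[] args) {
        long r = Runtime.getRuntime().maxMemory(); // R: the JVM's max heap
        int m = 16;                                // M: hypothetical thread count
        int blockSize = blockSizeFor(r, m, 0.5);   // assume half the heap for writers
        System.out.println("candidate blockSize = " + blockSize + " bytes");
    }
}
```

The resulting value would be passed as the blockSize argument of the writer's
constructor. Whether this simple even split gives good write performance is
exactly what I am asking about.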

I am using the constructor below.

public ParquetWriter(
    Path file,
    WriteSupport<T> writeSupport,
    CompressionCodecName compressionCodecName,
    int blockSize,
    int pageSize,
    int dictionaryPageSize,
    boolean enableDictionary,
    boolean validating,
    WriterVersion writerVersion,
    Configuration conf) throws IOException {




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
