Hi, I am on a multithreaded system with M threads. Each thread creates an independent Parquet writer and writes to HDFS in its own independent file. I have a finite amount of RAM, say R.
When I create the Parquet writers with the default block and page sizes, I get heap out-of-memory errors on my setup. So I reduced the block size and page size to very low values; the out-of-memory errors stopped, the files are written correctly, and I am able to read them back correctly as well.

However, keeping these values very low is not a recommended practice, as I would lose performance. I am particularly concerned about write performance. What technique do you recommend for finding the correct *blockSize* and *pageSize* to get good *WRITE* performance? That is, how should I decide the right *blockSize* and *pageSize* for a Parquet writer, given that I have M threads and the total RAM available is R?

I also don't understand what *dictionaryPageSize* is needed for. If I need to worry about it as well, kindly let me know; I have kept the enableDictionary flag set to false.

The constructor in question (parquet-hadoop 1.6.0rc3, ParquetWriter.java):

    public ParquetWriter(
        Path file,
        WriteSupport<T> writeSupport,
        CompressionCodecName compressionCodecName,
        int blockSize,
        int pageSize,
        int dictionaryPageSize,
        boolean enableDictionary,
        boolean validating,
        WriterVersion writerVersion,
        Configuration conf) throws IOException {

Thanks and Regards
Manish Agarwal
[email protected]
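
One rough way to frame the sizing question above is as a per-writer memory budget. The sketch below is only an illustration and does not call any Parquet API. It assumes that each open writer buffers roughly one block (row group) in memory before flushing, and that only a fraction of the heap R can be spent on those buffers. The class name, the method name, the thread count of 16, and the 0.5 fraction are all hypothetical.

```java
public class WriterMemoryBudget {
    private static final long MIB = 1L << 20;

    /**
     * Splits a share of the heap evenly across concurrent writers.
     * Assumption (not taken from the Parquet documentation): each writer
     * holds about one block in memory at a time.
     *
     * @param heapBytes      total heap available, R, in bytes
     * @param writers        number of concurrent writers, M
     * @param usableFraction share of the heap reserved for write buffers, in (0, 1]
     * @return a candidate block size in bytes, rounded down to a whole MiB
     *         (may be 0 if the budget is below 1 MiB per writer)
     */
    public static long blockSizePerWriter(long heapBytes, int writers, double usableFraction) {
        if (heapBytes <= 0 || writers <= 0) {
            throw new IllegalArgumentException("heapBytes and writers must be positive");
        }
        if (usableFraction <= 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("usableFraction must be in (0, 1]");
        }
        long budget = (long) (heapBytes * usableFraction); // bytes set aside for all writers
        long perWriter = budget / writers;                 // even split across the M writers
        return (perWriter / MIB) * MIB;                    // round down to a whole MiB
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // R: maximum heap of this JVM
        int threads = 16;                             // M: hypothetical thread count
        long blockSize = blockSizePerWriter(heap, threads, 0.5);
        System.out.println("Candidate blockSize per writer: " + blockSize + " bytes");
    }
}
```

The result could be passed as blockSize after a cast to int, since the constructor takes an int. It is only a starting point; the real limit still has to be confirmed by measuring heap usage and write throughput under load.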
