Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/826#discussion_r116895850

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ---
@@ -380,14 +384,21 @@ public void endRecord() throws IOException {
       // since ParquetFileWriter will overwrite empty output file (append is not supported)
       // we need to re-apply file permission
-      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE);
+      if (useConfiguredBlockSize) {
--- End diff --

What we are doing is creating the Parquet file as a single block without changing the file system's default block size. For example, the default Parquet block size is 512MB; if the file system block size is 128MB, we create a single file that spans 4 blocks on the file system, and those blocks can get distributed across different nodes, which is bad for performance. If instead we lower the Parquet block size to 128MB (to match the file system block size), then for the same amount of data we end up creating 4 files of one block each, which is not good either.

The JIRA asked for a single HDFS block per Parquet file that is larger than the file system block size, without changing the file system block size. They have the file system block size configured as 128MB. Lowering the Parquet block size from its default of 512MB to match the file system block size creates too many files for them, and for other reasons they are not able to change the file system block size.
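The block-count arithmetic above can be sketched as follows. This is just an illustration of the trade-off being described, not code from the Drill patch; `blocksPerFile` is a hypothetical helper:

```java
public class BlockSizeExample {

    // Number of HDFS blocks a single Parquet file of the given size would
    // span, rounding up (hypothetical helper for illustration only).
    static long blocksPerFile(long parquetBlockSize, long fsBlockSize) {
        return (parquetBlockSize + fsBlockSize - 1) / fsBlockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Default Parquet block size 512MB on a 128MB file system block:
        // one file spanning 4 HDFS blocks, possibly on different nodes.
        System.out.println(blocksPerFile(512 * mb, 128 * mb)); // 4

        // Matching the per-file HDFS block size to the Parquet block size
        // (what the change enables) yields a single block per file.
        System.out.println(blocksPerFile(512 * mb, 512 * mb)); // 1
    }
}
```

For background, HDFS does allow a per-file block size at creation time (e.g. the `FileSystem.create` overload that takes a `blockSize` argument), which is what makes a single 512MB block possible without touching the cluster-wide default.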