Done, thanks for the feedback: https://issues.apache.org/jira/browse/DRILL-5379
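The approach discussed in the thread below (and filed as DRILL-5379) is to pass the configured store.parquet.block-size through to the HDFS file-creation call, so each parquet file lands in exactly one HDFS block. A minimal sketch of that selection logic, using a plain map in place of Hadoop's Configuration object (names and structure here are illustrative, not Drill's actual code):

```java
import java.util.Map;

public class BlockSizeSketch {
    // Illustrative only (not Drill code): pick the HDFS block size to request
    // when creating a parquet file, preferring the configured parquet block
    // size so that each parquet file occupies exactly one HDFS block.
    static long fileBlockSize(Map<String, Long> conf, long dfsDefault) {
        Long parquetBlockSize = conf.get("store.parquet.block-size");
        return parquetBlockSize != null ? parquetBlockSize : dfsDefault;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // With store.parquet.block-size = 128 MB, request 128 MB HDFS blocks
        // for the new file, regardless of the cluster default.
        System.out.println(fileBlockSize(
                Map.of("store.parquet.block-size", 128 * mb), 256 * mb));
    }
}
```

In the real implementation this value would be set on the Hadoop Configuration (or passed to the create call) before the ParquetFileWriter is constructed, as the thread suggests.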
On Thu, Mar 23, 2017 at 4:29 PM, Kunal Khatua <kkha...@mapr.com> wrote:

> This seems like a reasonable feature request. It could also be expanded to
> detect the underlying block size for the location being written to.
>
> Could you file a JIRA for this?
>
> Thanks
> Kunal
>
> ________________________________
> From: François Méthot <fmetho...@gmail.com>
> Sent: Thursday, March 23, 2017 9:08:51 AM
> To: dev@drill.apache.org
> Subject: Re: Single Hdfs block per parquet file
>
> After further investigation, Drill uses the Hadoop ParquetFileWriter (
> https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileWriter.java
> ).
> This is where the file creation occurs, so it might be tricky after all.
>
> However, ParquetRecordWriter.java (
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
> )
> in Drill creates the ParquetFileWriter with a Hadoop Configuration object.
>
> Something to explore: could the block size be set as a property
> within the Configuration object before passing it to the ParquetFileWriter
> constructor?
>
> François
>
> On Wed, Mar 22, 2017 at 11:55 PM, Padma Penumarthy <ppenumar...@mapr.com>
> wrote:
>
> > Yes, it seems like it is possible to create files with different block
> > sizes.
> > We could potentially pass the configured store.parquet.block-size to the
> > create call.
> > I will try it out and see. Will let you know.
> >
> > Thanks,
> > Padma
> >
> > > On Mar 22, 2017, at 4:16 PM, François Méthot <fmetho...@gmail.com>
> > > wrote:
> > >
> > > Here are 2 links I could find:
> > >
> > > http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
> > >
> > > http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
> > >
> > > Francois
> > >
> > > On Wed, Mar 22, 2017 at 4:29 PM, Padma Penumarthy <ppenumar...@mapr.com>
> > > wrote:
> > >
> > >> I think we create one file for each parquet block.
> > >> If the underlying HDFS block size is 128 MB and the parquet block size
> > >> is greater than 128 MB, it will create more blocks on HDFS.
> > >> Can you let me know what HDFS API would allow you to
> > >> do otherwise?
> > >>
> > >> Thanks,
> > >> Padma
> > >>
> > >>> On Mar 22, 2017, at 11:54 AM, François Méthot <fmetho...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Is there a way to force Drill to store CTAS-generated parquet files
> > >>> as a single block when using HDFS? The Java HDFS API allows that:
> > >>> files could be created with the Parquet block size.
> > >>>
> > >>> We are using Drill on HDFS configured with a block size of 128 MB.
> > >>> Changing this size is not an option at this point.
> > >>>
> > >>> It would be ideal for us to have a single parquet file per HDFS
> > >>> block; setting store.parquet.block-size to 128 MB would fix our
> > >>> issue, but we end up with a lot more files to deal with.
> > >>>
> > >>> Thanks
> > >>> Francois
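For reference, the FileSystem.create overload linked above takes an explicit blockSize argument (the final long parameter), which is why a per-file block size is possible without changing the cluster-wide dfs.blocksize. The arithmetic below (plain Java, not Drill or Hadoop code) shows why matching the two sizes matters: a parquet block larger than the HDFS block size spans multiple HDFS blocks, while matching sizes give one block per file.

```java
public class BlockCountSketch {
    // Number of HDFS blocks a file of the given size occupies
    // (ceiling division; illustrative only, not Drill code).
    static long hdfsBlocks(long fileSize, long hdfsBlockSize) {
        return (fileSize + hdfsBlockSize - 1) / hdfsBlockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // A 512 MB parquet file on 128 MB HDFS blocks spans 4 blocks.
        System.out.println(hdfsBlocks(512 * mb, 128 * mb)); // 4
        // Matching the sizes yields exactly one HDFS block per file.
        System.out.println(hdfsBlocks(128 * mb, 128 * mb)); // 1
    }
}
```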