Here is the link I could find:

http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)

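For reference, here is a minimal sketch of using that five-argument create()
overload to set a per-file block size (the output path, sizes, and class name
below are made up for illustration; this is not how Drill itself writes files):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleBlockWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical output path and block size, chosen to match the
    // parquet block size rather than the cluster default.
    Path out = new Path("/tmp/example.parquet");
    long blockSize = 512L * 1024 * 1024;
    short replication = fs.getDefaultReplication(out);
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);

    // The five-argument create() takes a per-file block size, overriding
    // the cluster-wide default (128 MB in our case), so the whole file
    // can occupy a single HDFS block.
    FSDataOutputStream stream =
        fs.create(out, true, bufferSize, replication, blockSize);
    try {
      // ... write the parquet bytes to the stream ...
    } finally {
      stream.close();
      fs.close();
    }
  }
}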

Francois

On Wed, Mar 22, 2017 at 4:29 PM, Padma Penumarthy <ppenumar...@mapr.com>
wrote:

> I think we create one file for each parquet block.
> If the underlying HDFS block size is 128 MB and the parquet block size
> is > 128 MB, it will create more blocks on HDFS.
> Can you let me know which HDFS API would allow you to do otherwise?
>
> Thanks,
> Padma
>
>
> > On Mar 22, 2017, at 11:54 AM, François Méthot <fmetho...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Is there a way to force Drill to store a CTAS-generated parquet file as
> > a single block when using HDFS? The Java HDFS API allows this: files
> > could be created with the Parquet block size.
> >
> > We are using Drill on HDFS configured with a block size of 128 MB.
> > Changing this size is not an option at this point.
> >
> > It would be ideal for us to have a single parquet file per HDFS block.
> > Setting store.parquet.block-size to 128 MB would fix our issue, but we
> > would end up with a lot more files to deal with.
> >
> > Thanks
> > Francois
>
>
