On 04/13/2015 03:47 PM, Tianqi Tong wrote:
> Hi Ryan,
> Then back to the original topic: it should be okay if I break a Parquet file
> into multiple HDFS blocks, right?
> Because when I was querying via Impala, there's a warning like: Parquet file
> should not be split into multiple hdfs-blocks.
> Thanks!
> Tianqi

It is fine to write a Parquet file across multiple HDFS blocks, but for now
Impala will perform better if each file fits in a single block. This is
something that the Impala team is working on.
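
To check whether a given file would trigger that warning, you can compare its
size against the HDFS block size. This is only a rough sketch, not anything
taken from Impala or Parquet: the function names are made up for illustration,
and the 128 MiB block size is an assumption, so substitute your cluster's
actual dfs.blocksize.

```python
MIB = 1024 * 1024


def blocks_spanned(file_size_bytes: int, block_size_bytes: int) -> int:
    """Return how many HDFS blocks a file of the given size occupies.

    Hypothetical helper for illustration: plain ceiling division.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Ceiling division without floating point.
    return -(-file_size_bytes // block_size_bytes)


def fits_in_single_block(file_size_bytes: int, block_size_bytes: int) -> bool:
    """True if the file does not cross an HDFS block boundary."""
    return blocks_spanned(file_size_bytes, block_size_bytes) <= 1


if __name__ == "__main__":
    # Assumed block size; check dfs.blocksize on your own cluster.
    block_size = 128 * MIB
    for size_mib in (64, 128, 200):
        size = size_mib * MIB
        print(size_mib, "MiB ->", blocks_spanned(size, block_size), "block(s),",
              "single block" if fits_in_single_block(size, block_size) else "split")
```

If a file comes out as split, the usual options are to write smaller files or
to use a larger block size for those files.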
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.