You would need to add a new codec to the Impala source tree. The codecs are implemented in be/src/util/codec.h, be/src/util/compress.h and be/src/util/decompress.h, and there are a few other places you may need to change. I would just run "git grep -i gzip" and follow how the gzip codec is wired in.
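To give a rough idea of the shape of such a change, here is a self-contained toy sketch in C++. None of these names come from Impala: ToyCodec, ToyRleCodec and CreateToyCodec are hypothetical, the real base class in be/src/util/codec.h has a different and richer API, and the run-length encoding is only a stand-in for the calls a real codec would make into liblzma.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface, NOT Impala's Codec class. It only illustrates the
// pattern: one common interface, one subclass per compression format.
class ToyCodec {
 public:
  virtual ~ToyCodec() = default;
  virtual std::string name() const = 0;
  virtual bool Compress(const std::vector<uint8_t>& in,
                        std::vector<uint8_t>* out) = 0;
  virtual bool Decompress(const std::vector<uint8_t>& in,
                          std::vector<uint8_t>* out) = 0;
};

// Stand-in for the new codec. It uses byte-level run-length encoding,
// stored as (count, value) pairs, so the example needs no external library.
class ToyRleCodec : public ToyCodec {
 public:
  std::string name() const override { return "toy_rle"; }

  bool Compress(const std::vector<uint8_t>& in,
                std::vector<uint8_t>* out) override {
    out->clear();
    std::size_t i = 0;
    while (i < in.size()) {
      const uint8_t value = in[i];
      std::size_t run = 1;
      // Extend the run while the byte repeats; cap at 255 so it fits a byte.
      while (i + run < in.size() && in[i + run] == value && run < 255) ++run;
      out->push_back(static_cast<uint8_t>(run));
      out->push_back(value);
      i += run;
    }
    return true;
  }

  bool Decompress(const std::vector<uint8_t>& in,
                  std::vector<uint8_t>* out) override {
    // Valid input is a sequence of (count, value) pairs with count >= 1.
    if (in.size() % 2 != 0) return false;
    out->clear();
    for (std::size_t i = 0; i < in.size(); i += 2) {
      if (in[i] == 0) return false;
      out->insert(out->end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    }
    return true;
  }
};

// Hypothetical factory: the single place that maps a codec name to a class.
// Adding a codec means adding one subclass and one branch here.
std::unique_ptr<ToyCodec> CreateToyCodec(const std::string& name) {
  if (name == "toy_rle") return std::unique_ptr<ToyCodec>(new ToyRleCodec());
  return nullptr;  // unknown codec name
}
```

The point is only that a new format usually means one new subclass plus an entry wherever codecs are looked up by name or type, which is what the grep for gzip should show you in the real code.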
For compressed text files you would also need to add support to the frontend, e.g. in fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java.

I'm also not sure whether there are licensing issues here, since I believe at least parts of the XZ distribution are GPL licensed; that would need checking before bundling it.

On Sat, Jun 10, 2017 at 5:41 PM, 孙清孟 <sqm2...@gmail.com> wrote:
> I have added lzma codec (hadoop-xz) to parquet(modify the parquet-format
> and parquet-mr) for hive, and get a higher compression ratio.
>
> But how add a new codec for Impala?