Yanjia Gary Li has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14711 )

Change subject: IMPALA-8778: Support Apache Hudi Read Optimized Table
......................................................................


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14711/22/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14711/22/testdata/data/README@489
PS22, Line 489: `ca51fa17-681b-4497-85b7-4f68e7a63ee7-0_1-5-10_20200112194517.parquet`
              : `ca51fa17-681b-4497-85b7-4f68e7a63ee7-0` is the bloom index hash of this file
              : `20200112194517` is the timestamp of this version
> I agree with Csaba, and it seems we can easily make the file sizes smaller.
Thanks for pointing this out.
I definitely agree. Those Parquet files were generated by a test in Hudi, 
and bloom.num_entries was left at its default of 60000. I am not familiar with 
the indexing part of Hudi's code, so I am not sure whether it uses any built-in 
bloom filter feature of Parquet. But reducing this number to 100 shrinks each 
Parquet file to ~10KB. If that size is acceptable, I will update those files.
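
A rough sketch of why the entry count dominates the file size, assuming a 
textbook Bloom filter sized as m = -n * ln(p) / (ln 2)^2 bits. The false 
positive rate used here is an illustrative assumption, not a value taken from 
this change, and the helper name is hypothetical. How Hudi actually sizes and 
serializes its filter has not been checked, so this shows only the scaling, 
not the real on-disk size.

```python
import math


def bloom_filter_bytes(num_entries: int, fpp: float) -> int:
    """Size in bytes of a textbook Bloom filter for the given capacity.

    Uses the standard formula m = -n * ln(p) / (ln 2)^2 bits.
    This is NOT Hudi's implementation, only the generic sizing rule.
    """
    if num_entries <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("num_entries must be > 0 and fpp must be in (0, 1)")
    bits = math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2))
    return math.ceil(bits / 8)


# Assumed false positive rate, chosen for illustration only.
ASSUMED_FPP = 1e-9

for n in (60000, 100):
    print(f"num_entries={n}: ~{bloom_filter_bytes(n, ASSUMED_FPP)} bytes")
```

Under these assumptions the filter size grows linearly with the entry count, 
so going from 60000 to 100 entries cuts the filter by a factor of about 600.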



--
To view, visit http://gerrit.cloudera.org:8080/14711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Gerrit-Change-Number: 14711
Gerrit-PatchSet: 22
Gerrit-Owner: Yanjia Gary Li <yanjia.gary...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Yanjia Gary Li <yanjia.gary...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 21:48:56 +0000
Gerrit-HasComments: Yes
