Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22559 )

Change subject: IMPALA-11402: Add limit on files fetched by a single 
getPartialCatalogObject request
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG@28
PS1, Line 28: Choose 4000000 as the default value for this new flag to leave 
some
> Please clarify if the size of the file descriptor is constant in your assum
That's a good point. Most of the size of a file descriptor comes from the file
name. Block locations are just host and disk indexes stored as integers, so
their size is trivial. I'm using file names like
"part-00001-53ba34b7-2285-4f3a-8d99-492f87e1fedc-f724dc37-964d-4ea5-afde-7754fc758e39.txt",
which is the format of files generated by Spark. I think that name is already
long enough to be a pessimistic case. Files generated by Impala are named after
the query id plus a numeric string, e.g.
"cf4a7a47ca0b6b1c-6094b4b600000004_1614373804_data.0.parq", which is shorter.
Files generated by Hive have names like "bucket_00000_0", which are even
shorter. So I think a limit of 4M files is safe as long as the table doesn't
have incremental stats.
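As a rough sanity check of the reasoning above, the sketch below counts only the
raw bytes of the example file names at 4M files. It is a lower bound under
stated assumptions: block locations, the other file descriptor fields and any
serialization overhead are ignored, and the names FILE_LIMIT, EXAMPLE_NAMES and
name_bytes are hypothetical, not taken from the patch.

```python
# Hypothetical lower-bound estimate: bytes used by file names alone when a
# single response carries FILE_LIMIT files. Block locations, other file
# descriptor fields and serialization overhead are NOT counted.
FILE_LIMIT = 4_000_000  # proposed default from the commit message

# Example names quoted in the comment, keyed by the engine that wrote them.
EXAMPLE_NAMES = {
    "spark": "part-00001-53ba34b7-2285-4f3a-8d99-492f87e1fedc-f724dc37-964d-4ea5-afde-7754fc758e39.txt",
    "impala": "cf4a7a47ca0b6b1c-6094b4b600000004_1614373804_data.0.parq",
    "hive": "bucket_00000_0",
}


def name_bytes(name: str, num_files: int) -> int:
    """Total UTF-8 bytes for num_files names as long as `name`."""
    return len(name.encode("utf-8")) * num_files


for engine, name in EXAMPLE_NAMES.items():
    total = name_bytes(name, FILE_LIMIT)
    print(f"{engine:>6}: {len(name):3d} chars/name -> {total / 2**20:8.1f} MiB")
```

With the 88-character Spark-style names, the names alone come to about 336 MiB
at 4M files; the 56-character Impala names and 14-character Hive names need
proportionally less.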

To be more accurate, we also need to consider the size of the partition-level
tblproperties, which store the incremental stats and custom key-value pairs.

Regarding performance, at such a large scale the time seems to be dominated by
GC pauses. I tried to exclude the GC pause time. Here is the time spent on the
catalogd side for each response size:
* 371.71MB: 1s487ms
* 744.51MB: 4s035ms
* 1.09GB: 6s643ms
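To make the trend in these measurements explicit, the sketch below converts
each data point into an effective throughput. It assumes 1GB = 1024MB for the
last point; the names MEASUREMENTS and throughput_mb_per_s are hypothetical
and used only for this illustration.

```python
# Catalogd-side timings quoted above (GC pause time excluded).
# Assumption: 1GB = 1024MB when converting the last data point.
MEASUREMENTS = [
    (371.71, 1.487),       # 371.71MB: 1s487ms
    (744.51, 4.035),       # 744.51MB: 4s035ms
    (1.09 * 1024, 6.643),  # 1.09GB:   6s643ms
]


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective MB processed per second for one response."""
    return size_mb / seconds


for size_mb, seconds in MEASUREMENTS:
    rate = throughput_mb_per_s(size_mb, seconds)
    print(f"{size_mb:8.2f} MB in {seconds:.3f} s -> {rate:6.1f} MB/s")
```

The effective rate falls from roughly 250 MB/s to roughly 168 MB/s as the
response grows, i.e. roughly doubling the response size (371.71MB to 744.51MB)
makes it take about 2.7x as long.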

It seems a smaller response size is better, but it requires more round-trips
between impalad and catalogd. This needs more testing.



--
To view, visit http://gerrit.cloudera.org:8080/22559

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 28 Feb 2025 12:13:55 +0000
Gerrit-HasComments: Yes
