Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22559
to look at the new patch set (#9).
Change subject: IMPALA-11402: Add limit on files fetched by a single
getPartialCatalogObject request
......................................................................
IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject
request
For a table with a huge number of files (e.g. 6M), catalogd might hit an
OOM error by exceeding the JVM array size limit when serializing the
response of a getPartialCatalogObject request for all partitions (and
thus all files).
This patch adds a new flag, catalog_partial_fetch_max_files, to define
the max number of file descriptors allowed in a response of
getPartialCatalogObject. Catalogd will truncate the response at the
partition level when it's too big, and only return a subset of the
requested partitions. The coordinator should then send new requests to
fetch the remaining partitions.
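A minimal sketch of the truncate-and-refetch behavior described above,
written in Python for brevity. The names (fetch_partitions, fetch_all,
max_files) are hypothetical and are not the actual catalogd or
coordinator code. The sketch also assumes that a single partition larger
than the limit is still returned whole, so the caller always makes
progress.

```python
from typing import Dict, List, Tuple

def fetch_partitions(table: Dict[int, List[str]], requested: List[int],
                     max_files: int) -> Dict[int, List[str]]:
    """Hypothetical server side: return whole partitions until the file budget is used."""
    response: Dict[int, List[str]] = {}
    num_files = 0
    for part_id in requested:
        files = table[part_id]
        # Truncate at the partition level: a partition's file list is never split.
        # Assumption: the first partition is always returned, even if it alone
        # exceeds max_files, so every request makes progress.
        if response and num_files + len(files) > max_files:
            break
        response[part_id] = files
        num_files += len(files)
    return response

def fetch_all(table: Dict[int, List[str]], requested: List[int],
              max_files: int) -> Tuple[Dict[int, List[str]], int]:
    """Hypothetical client side: re-request whatever the previous response omitted."""
    result: Dict[int, List[str]] = {}
    remaining = list(requested)
    num_requests = 0
    while remaining:
        partial = fetch_partitions(table, remaining, max_files)
        num_requests += 1
        result.update(partial)
        remaining = [p for p in remaining if p not in partial]
    return result, num_requests
```

With a limit of 3 files, a table whose partitions hold 2, 3, and 1 files
would be fetched in three requests in this sketch, while a limit of 10
needs only one.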
Here are some measurements of the number of files in a single response,
with the corresponding serialized byte array size and duration:
* 1000000: 371.71MB, 1s487ms
* 2000000: 744.51MB, 4s035ms
* 3000000: 1.09GB, 6s643ms
* 4000000: 1.46GB, duration not measured due to GC pauses
* 5000000: 1.82GB, duration not measured due to GC pauses
* 6000000: >2GB (hit OOM)
This patch uses 1000000 as the default value for now. We can tune it in
the future.
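As a rough sanity check on the measurements above, the following sketch
estimates the per-file cost from the 1000000-file data point and
compares extrapolated sizes with the upper bound on a Java byte array
length (2^31 - 1). It assumes the reported MB/GB are binary units and
that the response size grows linearly with the file count; both are
assumptions of this sketch, not claims made by the measurements.

```python
JAVA_MAX_ARRAY_BYTES = 2**31 - 1  # Integer.MAX_VALUE, upper bound on a Java byte[] length

MIB = 1024 * 1024
# Measured: 1000000 files -> 371.71MB (treated here as MiB, an assumption).
bytes_per_file = 371.71 * MIB / 1_000_000

def estimated_response_bytes(num_files: int) -> float:
    """Linear extrapolation of the serialized response size (an estimate only)."""
    return num_files * bytes_per_file

def fits_in_java_array(num_files: int) -> bool:
    """True if the estimated response would fit in a single Java byte array."""
    return estimated_response_bytes(num_files) <= JAVA_MAX_ARRAY_BYTES
```

Under these assumptions each file descriptor costs roughly 390 bytes, so
an estimate for 5000000 files stays under the bound while 6000000 files
exceeds it, which is consistent with the OOM observed at 6000000.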
Tests:
- Added custom-cluster test
- Ran e2e tests in local-catalog mode with
catalog_partial_fetch_max_files=1000 so the new code paths are exercised.
Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 161 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/22559/9
--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>