[
https://issues.apache.org/jira/browse/IMPALA-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050056#comment-18050056
]
Arnab Karmakar commented on IMPALA-13491:
-----------------------------------------
Thanks [~stigahuang]
I just want one final clarification before I start working on this:
You mentioned that we already have the flag *num_metadata_loading_threads*
(default value 16), which limits the number of parallel threads in the
TableLoadingMgr thread pool.
Given that, I'll introduce a new flag *catalog_max_parallel_load_operations*
that caps the total number of concurrent load operations across ALL sources
(background loads, DDL/DML-triggered loads, and REFRESH commands).
What should be the default value for *catalog_max_parallel_load_operations*?
We could keep it at *16*, but then background loads and DDL/DML-triggered
loads may contend for the same slots. Setting it to *32* is more generous but
could consume more memory. It's configurable, of course.
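To make the proposal concrete, here is a minimal sketch of how such a cap could be enforced. It assumes a counting semaphore sized from the new flag. The class name LoadOperationLimiter and the method runLimited are hypothetical and are not existing Impala code; how the flag value would reach this class is also left out.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, not part of Impala: caps how many load/refresh
// operations may run at once, regardless of which code path started them.
final class LoadOperationLimiter {
  private final Semaphore permits;

  // maxParallelLoadOperations would come from the proposed
  // catalog_max_parallel_load_operations flag (assumption).
  LoadOperationLimiter(int maxParallelLoadOperations) {
    if (maxParallelLoadOperations <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    // Fair semaphore: waiting operations are admitted in arrival order.
    this.permits = new Semaphore(maxParallelLoadOperations, true);
  }

  // Blocks until a permit is free, runs the operation, then always
  // returns the permit, even if the operation throws.
  <T> T runLimited(Supplier<T> loadOp) {
    permits.acquireUninterruptibly();
    try {
      return loadOp.get();
    } finally {
      permits.release();
    }
  }

  // Number of operations that could start right now without waiting.
  int availablePermits() {
    return permits.availablePermits();
  }
}
```

With a shared limiter like this, background loads, DDL/DML-triggered loads, and REFRESH would all draw from the same pool of permits, which is where the contention concern about a default of 16 comes from. A real implementation would probably want an interruptible or timed acquire instead of acquireUninterruptibly.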
> Add config on catalogd for controlling the number of concurrent
> loading/refresh commands
> ----------------------------------------------------------------------------------------
>
> Key: IMPALA-13491
> URL: https://issues.apache.org/jira/browse/IMPALA-13491
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Manish Maheshwari
> Assignee: Arnab Karmakar
> Priority: Critical
>
> When running Table Loading or Refresh commands, catalogd requires working
> memory in proportion to the number of tables being refreshed. While we have a
> table-level lock, we don't have a config to control concurrent load/refresh
> operations.
> For customers that run refresh in parallel from multiple threads, the
> number of load/refresh commands can cause an OOM on the catalog because it
> runs out of working memory.