[jira] [Commented] (IMPALA-11805) Codegen cache size estimation is less than the actual allocation

ASF subversion and git services (Jira) Tue, 19 Dec 2023 12:48:04 -0800


    [ https://issues.apache.org/jira/browse/IMPALA-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798734#comment-17798734 ]


ASF subversion and git services commented on IMPALA-11805:
----------------------------------------------------------

Commit f93bd986214e390375a34199c205537e440e2b25 in impala's branch 
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f93bd9862 ]

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
which provides access to the compiled functions within the cached
engine. However, the ExecutionEngine uses far more memory than our
estimates, and its actual usage is very hard to predict.

This patch addresses the issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine holds only
one module. After the module is compiled, its compiled codegened
functions are made available through the execution engine so that
Impala can use them. If an ObjectCache is set on the execution
engine, the compiled codegened functions are also written into
the cache during compilation. If we keep the cache, then when the
same module (fragment) is revisited, we can reuse its ObjectCache
to load the precompiled codegened functions and save compilation
time.
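The C++ model below illustrates the caching flow described above. It is minimal and self-contained, and it does not link against LLVM. ObjectCacheLike, InMemoryObjectCache and ToyEngine are hypothetical stand-ins. A string key replaces the llvm::Module, and a std::string replaces the object buffer. The two virtual methods only mirror the shape of llvm::ObjectCache, whose real methods are notifyObjectCompiled(const Module*, MemoryBufferRef) and getObject(const Module*), returning std::unique_ptr<MemoryBuffer>.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for llvm::ObjectCache. A module is identified by a
// string key and its object code is a std::string (not LLVM types).
class ObjectCacheLike {
 public:
  virtual ~ObjectCacheLike() = default;
  // Called after a module has been compiled, with its object code.
  virtual void NotifyObjectCompiled(const std::string& module_key,
                                    const std::string& object) = 0;
  // Returns the cached object code, or nullptr if none is available.
  virtual std::unique_ptr<std::string> GetObject(
      const std::string& module_key) = 0;
};

// Hypothetical in-memory implementation keyed by the module's identity.
class InMemoryObjectCache : public ObjectCacheLike {
 public:
  void NotifyObjectCompiled(const std::string& module_key,
                            const std::string& object) override {
    objects_[module_key] = object;
  }
  std::unique_ptr<std::string> GetObject(
      const std::string& module_key) override {
    auto it = objects_.find(module_key);
    if (it == objects_.end()) return nullptr;
    return std::make_unique<std::string>(it->second);
  }

 private:
  std::map<std::string, std::string> objects_;
};

// Hypothetical engine owning exactly one module, mimicking the MCJIT flow:
// consult the cache first; on a miss, compile and hand the object to it.
class ToyEngine {
 public:
  ToyEngine(std::string module_key, ObjectCacheLike* cache)
      : module_key_(std::move(module_key)), cache_(cache) {}

  std::string Finalize() {
    if (cache_ != nullptr) {
      if (std::unique_ptr<std::string> obj = cache_->GetObject(module_key_)) {
        return *obj;  // Cache hit: load precompiled code, skip compilation.
      }
    }
    std::string obj = Compile();
    if (cache_ != nullptr) cache_->NotifyObjectCompiled(module_key_, obj);
    return obj;
  }

  static int compile_count;  // Counts real compilations across all engines.

 private:
  // Placeholder for LLVM code generation.
  std::string Compile() {
    ++compile_count;
    return "obj(" + module_key_ + ")";
  }

  std::string module_key_;
  ObjectCacheLike* cache_;
};

int ToyEngine::compile_count = 0;
```

When a second engine is built for the same fragment key, it returns the cached object without another compilation. That skipped compilation is the saving the patch relies on. In real LLVM code, the cache is attached with ExecutionEngine::setObjectCache() before the module is finalized.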

The TPC-H performance test shows no significant regression
compared to the previous ExecutionEngine-based caching. After the
change, the actual memory usage of each codegen cache entry is
notably reduced.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(1)  | parquet / none / none | 0.22    | -0.65%     | 0.20       | -0.75%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(1)  | TPCH-Q13 | parquet / none / none | 0.49   | 0.47        |   +2.80%   |   5.32%    |   5.07%        | 10    |   +1.22%       | 1.63    | 1.19  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.16   | 0.16        |   +3.51%   |   1.32%    | * 10.38% *     | 10    |   +0.06%       | 0.49    | 1.06  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.12   | 0.12        |   +1.39%   |   2.27%    |   2.24%        | 10    |   +1.50%       | 1.90    | 1.37  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.21   | 0.21        |   +1.56%   | * 10.02% * | * 11.42% *     | 10    |   +1.18%       | 0.57    | 0.32  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 0.27   | 0.27        |   +1.71%   |   6.46%    |   1.29%        | 10    |   -0.19%       | -1.19   | 0.81  |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.11   | 0.11        |   +0.79%   |   2.76%    |   2.15%        | 10    |   +0.10%       | 1.46    | 0.71  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.26   | 0.26        |   +0.71%   |   6.63%    |   6.18%        | 10    |   +0.04%       | 0.49    | 0.25  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 0.17   | 0.17        |   +0.41%   | * 14.66% * | * 13.01% *     | 10    |   +0.05%       | 0.40    | 0.07  |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.16   | 0.16        |   +0.19%   |   1.41%    |   1.39%        | 10    |   +0.25%       | 1.46    | 0.31  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.17   | 0.17        |   +0.22%   |   1.70%    |   1.77%        | 10    |   -0.05%       | -0.40   | 0.28  |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.16   | 0.16        |   -0.27%   |   0.54%    |   1.46%        | 10    |   +0.14%       | 0.93    | -0.54 |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.11   | 0.11        |   -0.38%   |   0.81%    |   2.06%        | 10    |   +0.03%       | 0.22    | -0.54 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.17   | 0.17        |   -0.38%   |   0.67%    |   1.58%        | 10    |   -0.01%       | -0.13   | -0.70 |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.27   | 0.27        |   -0.08%   |   1.24%    |   1.15%        | 10    |   -0.33%       | -1.37   | -0.15 |
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.16   | 0.16        |   -1.18%   | * 16.61% * | * 10.25% *     | 10    |   +0.33%       | 0.40    | -0.19 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.22   | 0.22        |   -1.67%   |   1.62%    |   7.45%        | 10    |   +0.43%       | 1.02    | -0.70 |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.22   | 0.22        |   -0.98%   |   0.22%    |   1.55%        | 10    |   -0.26%       | -2.16   | -1.97 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 0.48   | 0.49        |   -1.18%   |   3.58%    |   4.40%        | 10    |   -0.25%       | -1.19   | -0.66 |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.26   | 0.26        |   -1.93%   |   7.84%    |   6.24%        | 10    |   -0.14%       | -0.13   | -0.62 |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.18   | 0.19        |   -3.31%   | * 11.47% * | * 12.47% *     | 10    |   -0.25%       | -1.72   | -0.63 |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 0.34   | 0.35        |   -5.22%   |   6.87%    | * 10.03% *     | 10    |   -2.15%       | -1.28   | -1.38 |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.16   | 0.18        |   -11.00%  | * 16.07% * |   3.84%        | 10    |   -0.90%       | -1.81   | -2.35 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Since we no longer use ExecutionEngine for caching, we removed
the LlvmExecutionEngineWrapper class and introduced a new class,
CodeGenObjectCache, that implements llvm::ObjectCache.
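The commit message does not show the members of CodeGenObjectCache. The following is therefore only a hypothetical sketch of what a per-entry object cache might look like. It again uses std::string in place of LLVM's memory buffers, and EntryObjectCache and its methods are invented names. Because each entry serves a single module, it holds a single object buffer, and its size can be read directly from that buffer. The mutex rests on an assumption that one entry may be read by several fragments at once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical per-entry object cache (not the real CodeGenObjectCache).
class EntryObjectCache {
 public:
  // Counterpart of llvm::ObjectCache::notifyObjectCompiled: keep a copy of
  // the compiled object code for this entry's single module.
  void NotifyObjectCompiled(const std::string& object) {
    std::lock_guard<std::mutex> lock(mu_);
    object_ = object;
  }

  // Counterpart of llvm::ObjectCache::getObject: returns nullptr on a miss.
  // An empty buffer is treated as "nothing cached yet".
  std::unique_ptr<std::string> GetObject() const {
    std::lock_guard<std::mutex> lock(mu_);
    if (object_.empty()) return nullptr;
    return std::make_unique<std::string>(object_);
  }

  // Size of the cached object code, which is known exactly.
  std::size_t ObjectSize() const {
    std::lock_guard<std::mutex> lock(mu_);
    return object_.size();
  }

 private:
  mutable std::mutex mu_;  // Assumes entries may be shared across threads.
  std::string object_;
};
```

In this sketch, the cached state is only the compiled object bytes, which can be measured directly. A live ExecutionEngine is different: its total allocation is hard to predict.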

Testing:
Passed LlvmCodeGenCacheTest and custom_cluster/test_codegen_cache.py.

Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Reviewed-on: http://gerrit.cloudera.org:8080/20733
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Codegen cache size estimation is less than the actual allocation
> ----------------------------------------------------------------
>
>                 Key: IMPALA-11805
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11805
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.3.0
>            Reporter: Yida Wu
>            Assignee: Yida Wu
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> In IMPALA-11470, we implemented a cache for codegen functions. However,
> according to the data in the tcmalloc memory tracker, the expected size of a
> cache entry is much smaller than the actual allocation. This can lead to
> unexpected query failures when the memory tracker reaches its capacity.
> Currently, the memory consumption of a codegen cache entry, which is mainly
> the memory consumption of the llvm::ExecutionEngine stored in each entry, is
> estimated with the customized ImpalaMCJITMemoryManager
> [https://github.com/apache/impala/blob/f705496e34ac474e8e1c999619e3b928c5e39e0f/be/src/codegen/mcjit-mem-mgr.h#L60],
> which accumulates bytes whenever the execution engine allocates a code or
> data section. In practice, however, the execution engine can allocate far
> more bytes than that.
> In tests with TPC-H and TPC-DS queries in normal mode, the final consumption
> could reach 3 to 4 times the estimate. It would be worse in optimal mode,
> because the main discrepancy lies between memory_manager_->bytes_allocated()
> and the actual execution engine allocation, and in normal mode the estimate
> also includes the size of the key, which is accurate.
> When the execution engine exists only for a short period at runtime, the
> issue is not that bad. However, when it becomes part of a long-lived cache
> entry, it can cause more problems by consuming much more memory than
> estimated.
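The estimation gap described above can be shown with a toy model. CountingMemoryManager is a hypothetical simplification in the spirit of ImpalaMCJITMemoryManager. It counts only the bytes of the code and data sections it is asked for. LLVM's real section-allocation hooks on the MCJIT memory manager (allocateCodeSection and allocateDataSection) take extra alignment, section ID and name arguments, which are omitted here. ToyJitEngine is also hypothetical. It adds an internal buffer that the memory manager never sees, standing in for everything else a real engine allocates, such as the llvm::Module it owns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified byte-counting memory manager: only the sections
// the engine explicitly requests are counted.
class CountingMemoryManager {
 public:
  std::uint8_t* AllocateCodeSection(std::size_t size) { return Allocate(size); }
  std::uint8_t* AllocateDataSection(std::size_t size) { return Allocate(size); }
  std::size_t bytes_allocated() const { return bytes_allocated_; }

 private:
  std::uint8_t* Allocate(std::size_t size) {
    sections_.push_back(std::make_unique<std::uint8_t[]>(size));
    bytes_allocated_ += size;
    return sections_.back().get();
  }

  std::vector<std::unique_ptr<std::uint8_t[]>> sections_;
  std::size_t bytes_allocated_ = 0;
};

// Hypothetical engine: besides the sections obtained from the memory
// manager, it holds other state, modeled as an untracked buffer.
struct ToyJitEngine {
  CountingMemoryManager mm;
  std::vector<std::uint8_t> internal_state;

  // What an estimate based on the memory manager would report.
  std::size_t EstimatedBytes() const { return mm.bytes_allocated(); }
  // What this engine actually occupies in the toy model.
  std::size_t ActualBytes() const {
    return mm.bytes_allocated() + internal_state.size();
  }
};
```

The engine's true footprint is the section bytes plus the untracked state. An estimate built only from bytes_allocated() is therefore a lower bound, and a long-lived cache entry inherits that gap.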



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

