[VOTE][APE] Query Plan Cache

Glenn Justo Galvizo Thu, 07 Dec 2023 14:21:53 -0800

Every time a query is issued to AsterixDB, the query must undergo compilation. 
If the same query is run repeatedly, this query must be recompiled each and 
every time. A query plan cache can help AsterixDB achieve a lower floor on the 
end-to-end time by storing the job specifications for previously compiled 
queries, ultimately skipping the AST rewriting and Algebricks compilation of a 
previously executed query.


(APE copied from contributor Sushrut Borkor)

This APE is about adding a query plan cache to AsterixDB. More specifically, 
this query plan cache acts as a hash table that, on a hit, skips 1) the AST 
rewriting, 2) the Algebricks compilation, from plan translation through 
optimization, and 3) the Hyracks job generation. The keys of this hash table are:
    • AST String. We key on this instead of the original query string before 
parsing because it is resilient to minor changes in the query, such as added 
spaces or empty lines.
    • SessionConfig. For example, if the user runs a query, changes part of the 
session configuration (e.g. the preferred output format), and reruns the query, 
this prevents the second query from being served from the cache.
    • Config, to capture the effects of any SET statements used.
    • Active Dataverse, e.g., as defined in a USE statement.
    • Result Set ID, which distinguishes among queries in multi-statement 
requests.
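
As a rough illustration of the key described above, the five components could be 
bundled into one immutable object with value-based equality, so that a change to 
any single component yields a different cache entry. This is only a sketch: the 
class name PlanCacheKey, the field names, and the use of plain strings and a map 
for the configuration objects are assumptions for illustration, not the types 
used in the AsterixDB source.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key; names and field types are illustrative only.
final class PlanCacheKey {
    private final String astString;           // normalized AST text, not the raw query text
    private final String sessionConfig;       // e.g. the preferred output format
    private final Map<String, String> config; // effects of SET statements
    private final String activeDataverse;     // e.g. as defined in a USE statement
    private final int resultSetId;            // distinguishes statements in one request

    PlanCacheKey(String astString, String sessionConfig, Map<String, String> config,
            String activeDataverse, int resultSetId) {
        this.astString = astString;
        this.sessionConfig = sessionConfig;
        this.config = Map.copyOf(config); // immutable copy so the key cannot change later
        this.activeDataverse = activeDataverse;
        this.resultSetId = resultSetId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PlanCacheKey)) {
            return false;
        }
        PlanCacheKey k = (PlanCacheKey) o;
        return resultSetId == k.resultSetId
                && astString.equals(k.astString)
                && sessionConfig.equals(k.sessionConfig)
                && config.equals(k.config)
                && activeDataverse.equals(k.activeDataverse);
    }

    @Override
    public int hashCode() {
        return Objects.hash(astString, sessionConfig, config, activeDataverse, resultSetId);
    }
}
```

With such a key, rerunning the same query after changing only the output format 
produces an unequal key and therefore a cache miss.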

The values of each hash table entry are:
    • Hyracks Job Spec to be submitted to Hyracks.
    • Cached warnings. Since we skip compilation when serving queries from the 
cache, we cannot detect compile time warnings. To get around this, we cache 
warnings issued during rewriting and compilation, and then reissue them for 
cache hits. As a result, line numbers in warnings may be incorrect for queries 
answered using the cache.
    • Lock. Since running the same job from multiple threads does not work, we 
include a lock in the cache value. To use a cached job spec, a thread must 
acquire this lock, and then release it after the job has finished running. If 
the lock is held by another thread, we recompile the query instead of blocking.
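
The non-blocking lock behaviour described above can be sketched as follows. The 
names PlanCacheValue and PlanCacheLookup are hypothetical, a String stands in for 
the Hyracks job spec, and a single-permit Semaphore is used as the lock purely as 
an assumption of this sketch; the actual implementation may use a different 
primitive.

```java
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical cache value: job spec, cached warnings, and a lock.
final class PlanCacheValue {
    final String jobSpec;         // stand-in for the Hyracks job specification
    final List<String> warnings;  // warnings recorded during the original compilation
    private final Semaphore lock = new Semaphore(1); // one user of the job spec at a time

    PlanCacheValue(String jobSpec, List<String> warnings) {
        this.jobSpec = jobSpec;
        this.warnings = List.copyOf(warnings);
    }

    boolean tryAcquire() {
        return lock.tryAcquire(); // never blocks; false if another thread holds the lock
    }

    void release() {
        lock.release();
    }
}

final class PlanCacheLookup {
    // Runs the cached job spec if its lock is free; otherwise recompiles and runs that.
    static String run(PlanCacheValue cached, Supplier<String> recompile,
            Function<String, String> execute, List<String> issuedWarnings) {
        if (cached != null && cached.tryAcquire()) {
            try {
                issuedWarnings.addAll(cached.warnings); // reissue compile-time warnings
                return execute.apply(cached.jobSpec);
            } finally {
                cached.release(); // released only after the job has finished
            }
        }
        // Cache miss or lock held elsewhere: compile again instead of waiting.
        return execute.apply(recompile.get());
    }
}
```

Falling back to recompilation keeps a busy entry from serializing concurrent 
runs of the same query, at the cost of repeating the compilation work.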

The proposed changes are the following:

Interface:
We introduce two new statements for controlling cache access:
    • “SET `compiler.querycache.bypass` "true";” forces the current query to 
ignore the cache.
    • “SET `compiler.querycache.clear` "true";” clears all cache entries. The 
current query may still insert into the cache.
We also add a boolean HTTP API parameter bypass_cache which does the same thing 
as the first SET statement above. Finally, the parameter query.cache.capacity 
can be configured in the [cc] section of the cc.conf file to control the 
maximum cache size before replacement.
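
To make the controls above concrete, here is a minimal sketch of a 
capacity-bounded cache with bypass and clear operations. The replacement policy 
is not stated in this message, so the least-recently-used eviction below is an 
assumption, as are the class name BoundedPlanCache and its method names. The 
message also does not say whether a bypassing query skips insertion as well; this 
sketch only skips the lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache; LRU eviction is an assumption of this sketch.
final class BoundedPlanCache<K, V> {
    private final Map<K, V> entries;

    BoundedPlanCache(final int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the configured capacity is exceeded
            }
        };
    }

    // bypass = true models the bypass setting: the lookup is skipped entirely.
    synchronized V get(K key, boolean bypass) {
        return bypass ? null : entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    // Models the clear setting: all entries are dropped; later puts still succeed.
    synchronized void clear() {
        entries.clear();
    }

    synchronized int size() {
        return entries.size();
    }
}
```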

Changes:
    • Compilation logic is changed in the source code since we skip rewriting 
and compilation for cache hits.
    • Hints are now included in the AST string, so that queries differing only 
in their hints no longer map to the same cache entry.
    • A bug is fixed where the AST string of WINDOW expressions did not include 
FROM LAST or IGNORE NULLS.

See https://issues.apache.org/jira/projects/ASTERIXDB/issues/ASTERIXDB-3183 for 
the JIRA issue, as well as 
https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+2%3A+Query+Plan+Cache 
for more details.

Please vote on this APE. We will keep this open for 72 hours and pass with 
either 3 votes or a majority of positive votes.
