Every query issued to AsterixDB must be compiled, and a query that is run
repeatedly is recompiled every time. A query plan cache can lower the floor on
AsterixDB's end-to-end query time by storing the job specifications of
previously compiled queries, so that a repeated query skips AST rewriting and
Algebricks compilation entirely.
(APE copied from contributor Sushrut Borkor)
This APE is about adding a query plan cache to AsterixDB. More specifically,
the cache is a hash table, and a cache hit skips 1) the AST rewriting, 2) all
Algebricks steps from plan translation through optimization, and 3) the
Hyracks job generation. Each key of this hash table is composed of:
• AST String. We key on this instead of the original, unparsed query string
because it is resilient to minor changes in the query, such as added spaces
or empty lines.
• SessionConfig. For example, if the user runs a query, changes part of the
session configuration (e.g. the preferred output format), and reruns the query,
this prevents the second query from being served from the cache.
• Config, to capture the effects of any SET statements used.
• Active Dataverse, e.g., as defined in a USE statement.
• Result Set ID, which distinguishes among queries in multi-statement
requests.
The value of each hash table entry consists of:
• Hyracks Job Spec to be submitted to Hyracks.
• Cached warnings. Since we skip compilation when serving queries from the
cache, compile-time warnings are not regenerated. To get around this, we cache
the warnings issued during rewriting and compilation, and then reissue them
on cache hits. As a result, line numbers in warnings may be incorrect for
queries answered using the cache.
• Lock. Since running the same job from multiple threads concurrently does not
work, we include a lock in the cache value. To use a cached job spec, a thread
must acquire this lock, and then release it after the job has finished running.
If the lock is held by another thread, we recompile the query instead of
blocking.
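To make the key/value layout and the non-blocking lock rule concrete, here is a
minimal sketch in Java. It is not the actual AsterixDB implementation: every
class, record, and method name below is hypothetical, the key components are
simplified to strings and an int, the type parameter J stands in for the
Hyracks job spec, and a binary Semaphore plays the role of the per-entry lock.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical names throughout; these do not mirror real AsterixDB classes.
final class PlanCacheSketch<J> {

    // The five key components listed above, simplified for illustration.
    record Key(String astString, String sessionConfig, String config,
               String activeDataverse, int resultSetId) {}

    // Output of compilation: the job spec plus the warnings raised on the way.
    record Compiled<T>(T jobSpec, List<String> warnings) {}

    // Cache value: the compiled plan and a binary semaphore used as the lock.
    private static final class Entry<T> {
        final Compiled<T> compiled;
        final Semaphore lock = new Semaphore(1);

        Entry(Compiled<T> compiled) {
            this.compiled = compiled;
        }
    }

    private final ConcurrentHashMap<Key, Entry<J>> cache = new ConcurrentHashMap<>();

    // Runs a query: reuses the cached job spec when it is free, otherwise compiles.
    <R> R execute(Key key, Supplier<Compiled<J>> compiler,
                  BiFunction<J, List<String>, R> runner) {
        Entry<J> cached = cache.get(key);
        if (cached != null) {
            if (cached.lock.tryAcquire()) {
                try {
                    // Cache hit: skip compilation and reissue the cached warnings.
                    return runner.apply(cached.compiled.jobSpec(),
                                        cached.compiled.warnings());
                } finally {
                    cached.lock.release();
                }
            }
            // Entry is in use elsewhere: recompile instead of blocking.
            Compiled<J> privateCopy = compiler.get();
            return runner.apply(privateCopy.jobSpec(), privateCopy.warnings());
        }
        // Cache miss: compile, then hold the lock before publishing the entry
        // so that no other caller can run the same job spec at the same time.
        Entry<J> fresh = new Entry<>(compiler.get());
        fresh.lock.acquireUninterruptibly();
        try {
            cache.putIfAbsent(key, fresh);
            return runner.apply(fresh.compiled.jobSpec(), fresh.compiled.warnings());
        } finally {
            fresh.lock.release();
        }
    }
}
```

In this sketch, a second call with an equal Key reuses the stored job spec and
warnings without invoking the compiler, while a call that arrives when the entry
is already in use compiles its own private copy and leaves the cached entry
untouched.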
The proposed changes are the following:
Interface:
We introduce two new statements for controlling cache access:
• “SET `compiler.querycache.bypass` "true";” forces the current query to
ignore the cache.
• “SET `compiler.querycache.clear` "true";” clears all cache entries. The
current query may still insert into the cache.
We also add a boolean HTTP API parameter, bypass_cache, which has the same
effect as the first SET statement above. Finally, the parameter
query.cache.capacity can be configured in the [cc] section of the cc.conf file
to control the maximum cache size before entries are replaced.
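As an illustration of the HTTP parameter, the following Java sketch builds a
form-encoded request that sets bypass_cache. The class and method names are
hypothetical, and encoding the boolean as the text "true" or "false" is an
assumption. The endpoint is left to the caller, since it depends on the
deployment.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the parameter name bypass_cache comes from this APE.
final class BypassCacheRequestSketch {

    // Form-encodes the statement together with the bypass_cache flag.
    // Assumption: the boolean is sent as the text "true" or "false".
    static String formBody(String statement, boolean bypassCache) {
        return "statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8)
                + "&bypass_cache=" + bypassCache;
    }

    // Builds (but does not send) a POST request against the given query endpoint.
    static HttpRequest build(URI endpoint, String statement, boolean bypassCache) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(statement, bypassCache)))
                .build();
    }
}
```

The resulting request can be sent with java.net.http.HttpClient. Setting the
flag per request keeps the statement text itself unchanged, which may be more
convenient for clients than prepending the SET statement.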
Changes:
• Compilation logic is changed in the source code since we skip rewriting
and compilation for cache hits.
• Hints are now included in the AST string. Otherwise, queries that differ
only in their hints would produce the same key and be served the wrong plan.
• A bug is fixed where the AST string of WINDOW expressions did not include
FROM LAST or IGNORE NULLS.
See https://issues.apache.org/jira/projects/ASTERIXDB/issues/ASTERIXDB-3183 for
the JIRA issue, as well as
https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+2%3A+Query+Plan+Cache
for more details.
Please vote on this APE. We will keep this open for 72 hours and pass with
either 3 votes or a majority of positive votes.