[
https://issues.apache.org/jira/browse/PHOENIX-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated PHOENIX-6791:
----------------------------------
Fix Version/s: 5.4.0
(was: 5.3.0)
> WHERE optimizer redesign
> ------------------------
>
> Key: PHOENIX-6791
> URL: https://issues.apache.org/jira/browse/PHOENIX-6791
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Priority: Major
> Fix For: 5.4.0
>
>
> The WHERE optimizer in Phoenix derives the information about which row key
> ranges to be scanned from the primary key (PK) column expressions in a where
> clause. These key ranges are then used to determine the table regions to scan
> and generate a SkipScanFilter for each of these scans if applicable.
> The WHERE expression may include non-PK column (sub) expressions. After
> identifying the key ranges, the WHERE optimizer removes the nodes for PK
> columns from the expression tree if these nodes are fully used to determine
> the key ranges.
> Since the values in the WHERE expression are expressed by byte arrays, the
> key ranges are also expressed using byte arrays. KeyRange represents a range
> for a row key or any sub part of a row key key. A key range is composed of
> two pairs, one for each end of the range, lower and upper. The pair is formed
> from a byte array and a boolean value. The boolean value indicates if the end
> of the range specified by the byte array is inclusive or not. If the byte
> array is empty, it means that the corresponding end of the range is
> unbounded.
> KeySlot represents a key part and the list of key ranges for this key part
> where a key part can be any sub part of a PK, including leading, trailing, or
> middle part of the key. The number of columns in a key part is called span.
> For the terminal nodes (i..e, constant values) in the expression tree,
> KeySlot objects are created with a single key range. When KeySlot objects are
> rolled up in the expression tree, they can have multiple ranges. For example,
> a KeySlot object representing an IN expression will have a separate range for
> each member of the IN expression. Similarly the KeySlot object for an OR
> expression can have multiple ranges similarly. Please note an IN operator can
> be replaced by an equivalent OR expression.
> When the WHERE optimizer visits the nodes of the expression tree, it
> generates a KeySlots object. KeySlots is essentially a list of KeySlot
> objects (please note the difference between KeySlots vs KeySlot). There are
> two types of KeySlots: SingleKeySlot and MultiKeySlot. SingleKeySlot
> represents a single key slot whereas MultiKeySlot is a list of key slots the
> results of AND expression on SingleKeySlot or MultiKeySlot objects.
> The key slots are rolled into a MultiKeySlot object when processing an AND
> expression. The AND operation on two key slots starting their spans with the
> same PK columns is equivalent to taking intersection of their ranges. The OR
> operation implementation is limited and rather simple compared to the AND
> operation. The OR operation attempts to coalesce key slots if all of the key
> slots have the same starting PK column. If not, it generates a null KeySlots.
> When an expression node is used fully in generating a key slot, this
> expression node is removed from the expression tree.
> A row key for a given table can be composed of several PK columns. Without
> any restrictions imposed by predefined rules, intersection of key slots can
> lead to a large number of key slots, i.e., key ranges. For example, consider
> a row key composed of three integer columns, PK1, PK2, and PK3, and the
> expression (PK1, PK2) > (100, 25) AND PK3 = 5. The result would be a very
> large number of key slots and each key slot represents a point in the three
> dimensional space, including (100, 26, 5), (100, 27, 5), …, (100, 2147483647,
> 5), (101, 1, 5), (101, 2, 5), … .
> A simple expression (like the one given above) with a relatively small number
> of PK columns and a simple data type, e.g., integer, is sufficient to show
> that finding key ranges for an arbitrary expression is an intractable
> problem. Attempting to optimize the queries by enumerating the key ranges can
> lead to excessive memory allocation and long computation times and the
> optimization can defeat its purpose.
> The current implementation attempts to enumerate all possible key ranges in
> general. Because of this, the WHERE optimizer has caused out of memory
> issues, and query timeouts due to high CPU usage. The very recent bug fixes
> attempts to catch these cases and prevent them. However, these fixes do not
> attempt to cover all cases and are formulated based on known cases.
> In addition to inefficient resource utilization, there are known types of
> expressions, the current implementation still returns wrong results for them.
> For example, please see PHOENIX-6669 where if degenerate queries are caused
> by some conditions on non-leading PK columns, then Phoenix cannot catch this
> and can return wrong results.
> An example to show inconsistencies in the implementation is as follows. An
> RVC expression can be converted to an equivalent AND/OR expression. For
> example, (PK1, PK2) > (A, B) is equivalent to (PK1 > A) OR (PK1 = A AND PK2 >
> B). The implementation converts the first expression into a single key range
> and thus a scan with the start and stop rows keys without a filter. However,
> the implementation cannot do the same for the second expression and instead
> it generates a scan with a filter for the expression without generating a key
> range.
> Due to tens of possibly conflicting bug fixes over the years and not having a
> document that clearly describes the design, the current implementation has
> become hard to understand and maintain.
> The WHERE optimizer redesign will be formulated based on the following
> observations:
> # As described in the previous section, attempting to enumerate the PK
> ranges over arbitrary expression is an intractable problem due to key range
> explosion. Since identifying key ranges is just for the optimization but not
> for the correctness of queries, the cost of optimization should justify the
> gain from the optimization.
> # The optimization gain comes from first skipping table regions and then
> skipping rows within table regions. In practice, the most gain comes from the
> most significant leading PK columns. The optimization is not useful if the
> first leading PK column is not included in a WHERE expression. The value of
> the optimization decreases with the subsequent PK columns.
> The objectives of the redesign are as follows:
> # The space and time complexity of the WHERE optimizer should not be more
> than O(N2).
> # The redesign should be provably correct. This requires constructing a
> mathematical system with well defined elements and operations.
> # The redesign should generate the same result for the expressions that are
> logically equivalent.
> # The redesign should lead to significantly simpler implementation. This can
> be achieved using well defined and clearly separated operations and concepts.
> # The scope of the redesign will be limited to the WHERE optimizer and so
> the changes will mostly be limited to where optimizer and Expression classes.
> For example, this redesign does not attempt to change the skip scan filter
> design or the WHERE compiler.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)