[jira] [Updated] (PHOENIX-6791) WHERE optimizer redesign

Viraj Jasani (Jira) Thu, 02 Oct 2025 18:15:34 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated PHOENIX-6791:
----------------------------------
    Fix Version/s: 5.4.0
                       (was: 5.3.0)

> WHERE optimizer redesign
> ------------------------
>
>                 Key: PHOENIX-6791
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6791
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Priority: Major
>             Fix For: 5.4.0
>
>
> The WHERE optimizer in Phoenix derives the information about which row key 
> ranges to be scanned from the primary key (PK) column expressions in a where 
> clause. These key ranges are then used to determine the table regions to scan 
> and generate a SkipScanFilter for each of these scans if applicable. 
> The WHERE expression may include non-PK column (sub) expressions. After 
> identifying the key ranges, the WHERE optimizer removes the nodes for PK 
> columns from the expression tree if these nodes are fully used to determine 
> the key ranges.
> Since the values in the WHERE expression are expressed by byte arrays, the 
> key ranges are also expressed using byte arrays. KeyRange represents a range 
> for a row key or any sub part of a row key key. A key range is composed of 
> two pairs, one for each end of the range, lower and upper. The pair is formed 
> from a byte array and a boolean value. The boolean value indicates if the end 
> of the range specified by the byte array is inclusive or not. If the byte 
> array is empty, it means that the corresponding end of the range is 
> unbounded. 
> KeySlot represents a key part and the list of key ranges for this key part 
> where a key part can be any sub part of a PK, including leading, trailing, or 
> middle part of the key. The number of columns in a key part is called span. 
> For the terminal nodes (i..e, constant values) in the expression tree, 
> KeySlot objects are created with a single key range. When KeySlot objects are 
> rolled up in the expression tree, they can have multiple ranges. For example, 
> a KeySlot object representing an IN expression will have a separate range for 
> each member of the IN expression. Similarly the KeySlot object for an OR 
> expression can have multiple ranges similarly. Please note an IN operator can 
> be replaced by an equivalent OR expression. 
> When the WHERE optimizer visits the nodes of the expression tree, it 
> generates a KeySlots object. KeySlots is essentially a list of KeySlot 
> objects (please note the difference between KeySlots vs KeySlot). There are 
> two types of KeySlots: SingleKeySlot and MultiKeySlot. SingleKeySlot 
> represents a single key slot whereas MultiKeySlot is a list of key slots the 
> results of AND expression on SingleKeySlot or MultiKeySlot objects. 
> The key slots are rolled into a MultiKeySlot object when processing an AND 
> expression. The AND operation on two key slots starting their spans with the 
> same PK columns is equivalent to taking intersection of their ranges. The OR 
> operation implementation is limited and rather simple compared to the AND 
> operation. The OR operation attempts to coalesce key slots if all of the key 
> slots have the same starting PK column. If not, it generates a null KeySlots. 
> When an expression node is used fully in generating a key slot, this 
> expression node is removed from the expression tree.
> A row key for a given table can be composed of several PK columns. Without 
> any restrictions imposed by predefined rules, intersection of key slots can 
> lead to a large number of key slots, i.e., key ranges.  For example, consider 
> a row key composed of three integer columns, PK1, PK2, and PK3, and the 
> expression (PK1,  PK2) > (100, 25) AND PK3 = 5. The result would be a very 
> large number of key slots and each key slot represents a point in the three 
> dimensional space, including (100, 26, 5), (100, 27, 5), …, (100, 2147483647, 
> 5), (101, 1, 5), (101, 2, 5), … .
> A simple expression (like the one given above) with a relatively small number 
> of PK columns and a simple data type, e.g., integer, is sufficient to show 
> that finding key ranges for an arbitrary expression is an intractable 
> problem. Attempting to optimize the queries by enumerating the key ranges can 
> lead to excessive memory allocation and long computation times and the 
> optimization can defeat its purpose. 
> The current implementation attempts to enumerate all possible key ranges in 
> general. Because of this, the WHERE optimizer has caused out of memory 
> issues, and query timeouts due to high CPU usage. The very recent bug fixes 
> attempts to catch these cases and prevent them. However, these fixes do not 
> attempt to cover all cases and are formulated based on known cases.
> In addition to inefficient resource utilization, there are known types of 
> expressions, the current implementation still returns wrong results for them. 
>  For example, please see PHOENIX-6669 where if degenerate queries are caused 
> by some conditions on non-leading PK columns, then Phoenix cannot catch this 
> and can return wrong results.
> An example to show inconsistencies in the implementation is as follows. An 
> RVC expression can be converted to an equivalent AND/OR expression. For 
> example, (PK1, PK2) > (A, B) is equivalent to (PK1 > A) OR (PK1 = A AND PK2 > 
> B). The implementation converts the first expression into a single key range 
> and thus a scan with the start and stop rows keys without a filter. However, 
> the implementation cannot do the same for the second expression and instead 
> it generates a scan with a filter for the expression without generating a key 
> range.
> Due to tens of possibly conflicting bug fixes over the years and not having a 
> document that clearly describes the design, the current implementation has 
> become hard to understand and maintain. 
> The WHERE optimizer redesign will be formulated based on the following 
> observations:
>  # As described in the previous section, attempting to enumerate the PK 
> ranges over arbitrary expression is an intractable problem due to key range 
> explosion. Since identifying key ranges is just for the optimization but not 
> for the correctness of queries, the cost of optimization should justify the 
> gain from the optimization.
>  # The optimization gain comes from first skipping table regions and then 
> skipping rows within table regions. In practice, the most gain comes from the 
> most significant leading PK columns. The optimization is not useful if the 
> first leading PK column is not included in a WHERE expression. The value of 
> the optimization decreases with the subsequent PK columns. 
> The objectives of the redesign are as follows:
>  # The space and time complexity of the WHERE optimizer should not be more 
> than O(N2).
>  # The redesign should be provably correct. This requires constructing a 
> mathematical system with well defined elements and operations. 
>  # The redesign should generate the same result for the expressions that are 
> logically equivalent. 
>  # The redesign should lead to significantly simpler implementation. This can 
> be achieved using well defined and clearly separated operations and concepts.
>  # The scope of the redesign will be limited to the WHERE optimizer and so 
> the changes will mostly be limited to where optimizer and Expression classes. 
> For example, this redesign does not attempt to change the skip scan filter 
> design or the WHERE compiler. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-6791) WHERE optimizer redesign

Reply via email to