Re: [PR] PHOENIX-6791 WHERE optimizer redesign [phoenix]

via GitHub Tue, 05 May 2026 12:04:07 -0700


Copilot commented on code in PR #2428:
URL: https://github.com/apache/phoenix/pull/2428#discussion_r3190887178



##########
phoenix-core-client/src/main/java/org/apache/phoenix/compile/StatementContext.java:
##########
@@ -309,6 +310,20 @@ public void setScanRanges(ScanRanges scanRanges) {
     scanRanges.initializeScan(scan);
   }
 
+  /**
+   * V2-owned metadata attached by the V2 scan-construction path; null under 
the V1 path
+   * ({@code WHERE_OPTIMIZER_V2_ENABLED=false}). Consumers (currently the 
explain-plan
+   * formatter) prefer this when present; others are unaffected.
+   */
+  public org.apache.phoenix.compile.keyspace.scan.V2ScanArtifact 
getV2ScanArtifact() {
+    return this.v2ScanArtifact;
+  }
+
+  public void setV2ScanArtifact(
+    org.apache.phoenix.compile.keyspace.scan.V2ScanArtifact artifact) {
+    this.v2ScanArtifact = artifact;

Review Comment:
   scanRanges and v2ScanArtifact can now diverge because setScanRanges() never 
clears or updates the artifact. Any later setScanRanges(...) on the same 
StatementContext—such as switching to EVERYTHING/NOTHING or replacing the scan 
shape during planning—will leave ExplainTable formatting key ranges from stale 
V2 metadata.



##########
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##########
@@ -792,6 +792,76 @@ public int compare(QueryPlan plan1, QueryPlan plan2) {
     return stopAtBestPlan ? bestCandidates.subList(0, 1) : bestCandidates;
   }
 
+  /**
+   * Variant of {@link 
org.apache.phoenix.compile.ScanRanges#getBoundPkColumnCount} for the cost
+   * comparator: only honors the extra span reported by {@code slotSpan[i]} 
when the slot's ranges
+   * genuinely span multiple PK columns via byte-level encoding. A slot whose 
only ranges are
+   * non-point finite-bounded scalars (e.g. {@code BETWEEN a AND b} on a 
single PK col) under V2
+   * can carry an artificial {@code slotSpan[i] > 0} set by
+   * {@link 
org.apache.phoenix.compile.keyspace.KeyRangeExtractor#emitV1Projection} as a
+   * V1-compat marker for {@link org.apache.phoenix.filter.SkipScanFilter}'s 
per-region cursor
+   * stepping. That marker is needed for SkipScanFilter correctness on 
CDC-style scans but
+   * would inflate the PK-col count the cost comparator ranks plans by, 
letting a plan with a
+   * narrow bounded range on one PK col tie with a plan whose compound range 
genuinely covers
+   * multiple PK cols. Tie on primary key falls through to weaker tiebreakers 
that flip the
+   * plan choice (see CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3 
and its
+   * Upsert/Delete variants).
+   * <p>
+   * Rule: {@code slotSpan[i]} is honored unless every range in the slot is a 
non-point
+   * finite-bounded scalar. Point keys (single-key compound bytes encoding 
multiple PK cols,
+   * e.g. from RVC-IN or tuple equality) and ranges with an UNBOUND side 
(where trailing cols
+   * inherit the unbounded side's min/max at the byte level) both represent 
genuine multi-col
+   * narrowing and are counted as {@code slotSpan[i]+1}.
+   */
+  private static int effectiveBoundPkColumnCount(
+      org.apache.phoenix.compile.ScanRanges scanRanges) {
+    java.util.List<java.util.List<org.apache.phoenix.query.KeyRange>> ranges =
+      scanRanges.getRanges();
+    int[] slotSpan = scanRanges.getSlotSpans();
+    int count = 0;
+    boolean hasUnbound = false;
+    int nRanges = ranges.size();
+    for (int i = 0; i < nRanges && !hasUnbound; i++) {
+      java.util.List<org.apache.phoenix.query.KeyRange> orRanges = 
ranges.get(i);
+      boolean slotHasUnbound = false;
+      boolean slotAllBoundedNonPoint = true;
+      for (org.apache.phoenix.query.KeyRange range : orRanges) {
+        if (range == org.apache.phoenix.query.KeyRange.EVERYTHING_RANGE) {
+          return count;
+        }
+        if (range.isUnbound()) {
+          slotHasUnbound = true;
+          hasUnbound = true;
+        }
+        if (range.isSingleKey() || range.isUnbound()) {
+          slotAllBoundedNonPoint = false;
+        }
+      }
+      int span = (slotSpan != null && i < slotSpan.length) ? slotSpan[i] : 0;
+      // V2-only artificial extension: {@code 
KeyRangeExtractor.emitV1Projection} bumps
+      // {@code slotSpan[last]} to span trailing unconstrained PK cols for
+      // {@link SkipScanFilter} cursor stepping, even when the slot holds a 
single-column
+      // finite-bounded scalar range whose bytes don't actually encode 
trailing cols.
+      // For cost ranking we want to count only columns genuinely narrowed by 
the scan
+      // bounds, so discount that shape to 1.
+      //
+      // Discriminator: the extension path ALWAYS has >=2 slots (at least one 
leading
+      // concretely-narrowed slot plus the last bumped slot). A single-slot 
shape with
+      // {@code slotSpan > 0} is the compound-emission path — a genuine 
multi-col range
+      // whose bytes concatenate {@code span+1} columns — and keeps {@code 
span + 1}.
+      // See CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3 
(multi-slot shape
+      // must discount) vs. 
RowTimestampIT.testAutomaticallySettingRowTimestampWith*
+      // (single-slot compound must not discount).
+      boolean isArtificialExtension = nRanges > 1 && i == nRanges - 1;

Review Comment:
   The 'ALWAYS has >=2 slots' assumption is too strong. The V1-compat slotSpan 
extension can also appear when there is only one emitted range slot followed by 
unconstrained trailing PK columns, so single-slot plans can still be artificial 
here. Those scans will be over-counted as genuinely multi-column and can still 
skew plan ranking.
   



##########
phoenix-core-client/src/main/java/org/apache/phoenix/compile/keyspace/oracle/Oracle.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile.keyspace.oracle;
+
+/**
+ * Reference implementation (oracle) of the key-space model's key-range 
extraction
+ * algorithm. Given an {@link AbstractExpression} tree over a schema with 
{@code nPk}
+ * primary-key dimensions, produces the {@link AbstractKeySpaceList} the 
algorithm
+ * should emit.
+ * <p>
+ * The purpose is differential testing: we compare the oracle's output against 
the
+ * production {@code WhereOptimizerV2} implementation's {@code KeySpaceList} 
to detect
+ * divergences. Any difference is either a production bug or an oracle bug — 
the oracle
+ * being shorter and directly derived from the design, the default suspect is 
production.

Review Comment:
   This package is explicitly described as differential-test scaffolding, but 
it is being added under src/main/java, which ships it in the client artifact 
and turns the reference model into production package surface. These classes 
should live under test sources or a dedicated test jar instead of the runtime 
module.



##########
phoenix-core-client/src/main/java/org/apache/phoenix/compile/keyspace/KeySpace.java:
##########
@@ -0,0 +1,382 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile.keyspace;
+
+import java.util.Arrays;
+import java.util.Optional;
+
+import org.apache.phoenix.query.KeyRange;
+
+/**
+ * An N-dimensional key space over a table's primary key columns. Each 
dimension is a
+ * {@link KeyRange} over the encoded byte representation of a single PK 
column. An
+ * expression node is modeled as a list of {@code KeySpace} instances; see
+ * {@link KeySpaceList} for the list-level algebra.
+ * <p>
+ * Instances are immutable. {@link #and(KeySpace)} is the per-dimension 
intersection;
+ * {@link #unionIfMergeable(KeySpace)} returns the union when either (a) one 
space contains
+ * the other or (b) the two spaces agree on all but one dimension and the 
differing dim's
+ * ranges are non-disjoint.
+ * <p>
+ * A single-value predicate like {@code PK2 >= 3} on a 3-PK table is 
represented as
+ * {@code [(*,*), [3,*), (*,*)]} — a singleton {@link KeySpaceList} containing 
a single
+ * {@code KeySpace} where every dim not mentioned in the predicate holds
+ * {@link KeyRange#EVERYTHING_RANGE}. RVC inequalities are pre-normalized by
+ * {@link ExpressionNormalizer} into lexicographic AND/OR of scalar 
comparisons so that
+ * per-dim intersection composes correctly with every other predicate; this 
class therefore
+ * never needs to model compound-byte concatenation directly.
+ */
+public final class KeySpace {
+
+  private final KeyRange[] dims;
+  private final boolean empty;
+
+  private KeySpace(KeyRange[] dims, boolean empty) {
+    this.dims = dims;
+    this.empty = empty;
+  }
+
+  public static KeySpace everything(int n) {
+    KeyRange[] dims = new KeyRange[n];
+    Arrays.fill(dims, KeyRange.EVERYTHING_RANGE);
+    return new KeySpace(dims, false);
+  }
+
+  public static KeySpace empty(int n) {
+    KeyRange[] dims = new KeyRange[n];
+    Arrays.fill(dims, KeyRange.EMPTY_RANGE);
+    return new KeySpace(dims, true);
+  }
+
+  public static KeySpace single(int dim, KeyRange r, int n) {
+    if (r == KeyRange.EMPTY_RANGE) {
+      return empty(n);
+    }
+    KeyRange[] dims = new KeyRange[n];
+    Arrays.fill(dims, KeyRange.EVERYTHING_RANGE);
+    dims[dim] = r;
+    return new KeySpace(dims, false);
+  }
+
+  public static KeySpace of(KeyRange[] dims) {
+    boolean empty = false;
+    for (KeyRange r : dims) {
+      if (r == KeyRange.EMPTY_RANGE) {
+        empty = true;
+        break;
+      }
+    }
+    return new KeySpace(dims.clone(), empty);
+  }
+
+  public int nDims() {
+    return dims.length;
+  }
+
+  public KeyRange get(int dim) {
+    return dims[dim];
+  }
+
+  /**
+   * Returns a new {@link KeySpace} identical to this one except with dim 
{@code dim}
+   * replaced by {@code r}. Used by {@link KeySpaceList}'s widening path to 
drop a
+   * trailing dim (by replacing it with {@link KeyRange#EVERYTHING_RANGE}). 
Allocates a
+   * fresh dims array; original is unchanged.
+   */
+  public KeySpace withDimReplaced(int dim, KeyRange r) {
+    if (dims[dim].equals(r)) {
+      return this;
+    }
+    KeyRange[] newDims = dims.clone();
+    newDims[dim] = r;
+    if (r == KeyRange.EMPTY_RANGE) {
+      return empty(dims.length);
+    }
+    return new KeySpace(newDims, false);
+  }
+
+  public boolean isEmpty() {
+    return empty;
+  }
+
+  public boolean isEverything() {
+    if (empty) {
+      return false;
+    }
+    for (KeyRange r : dims) {
+      if (r != KeyRange.EVERYTHING_RANGE) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  /**
+   * Per-dimension intersection. If any dim collapses to {@link 
KeyRange#EMPTY_RANGE}, the
+   * result is {@link #empty(int)}.
+   */
+  public KeySpace and(KeySpace other) {
+    requireSameArity(other);
+    if (this.empty || other.empty) {
+      return empty(dims.length);
+    }
+    KeyRange[] newDims = new KeyRange[dims.length];
+    for (int i = 0; i < dims.length; i++) {
+      KeyRange a = this.dims[i];
+      KeyRange b = other.dims[i];
+      KeyRange inter = intersectRange(a, b);
+      if (inter == KeyRange.EMPTY_RANGE) {
+        return empty(dims.length);
+      }
+      newDims[i] = inter;
+    }
+    return new KeySpace(newDims, false);
+  }
+
+  /**
+   * Per-dim intersection that special-cases {@link KeyRange#EVERYTHING_RANGE} 
against
+   * {@link KeyRange#IS_NULL_RANGE} / {@link KeyRange#IS_NOT_NULL_RANGE}. Plain
+   * {@link KeyRange#intersect} treats EVERYTHING ∩ IS_NULL as EMPTY because 
IS_NULL uses
+   * an empty-byte-array sentinel that coincides with the EVERYTHING 
representation.
+   */
+  private static KeyRange intersectRange(KeyRange a, KeyRange b) {
+    if (a == KeyRange.EVERYTHING_RANGE) {
+      return b;
+    }
+    if (b == KeyRange.EVERYTHING_RANGE) {
+      return a;
+    }
+    return a.intersect(b);
+  }
+
+  /**
+   * Returns the union of {@code this} and {@code other} as a single {@code 
KeySpace} when
+   * one of the two merge rules applies; otherwise {@link Optional#empty()}.
+   * <ul>
+   * <li>Rule 1 (containment): one space is fully contained in the other; 
return the larger.</li>
+   * <li>Rule 2 (adjacent boxes): agreeing on all-but-one dim and the 
remaining dim's ranges
+   * overlap or are adjacent; return the space with the merged dim's 
range.</li>
+   * </ul>
+   */
+  public Optional<KeySpace> unionIfMergeable(KeySpace other) {
+    requireSameArity(other);
+    if (this.empty) {
+      return Optional.of(other);
+    }
+    if (other.empty) {
+      return Optional.of(this);
+    }
+    if (this.equals(other)) {
+      return Optional.of(this);
+    }
+    if (contains(other)) {
+      return Optional.of(this);
+    }
+    if (other.contains(this)) {
+      return Optional.of(other);
+    }
+    int diffDim = -1;
+    for (int i = 0; i < dims.length; i++) {
+      if (!this.dims[i].equals(other.dims[i])) {
+        if (diffDim != -1) {
+          return Optional.empty();
+        }
+        diffDim = i;
+      }
+    }
+    if (diffDim == -1) {
+      return Optional.of(this);
+    }
+    KeyRange a = this.dims[diffDim];
+    KeyRange b = other.dims[diffDim];
+    // Two distinct single-key points are disjoint and NOT adjacent (they're 
different
+    // values). KeyRange.intersect has a bug for inverted (DESC) singleton 
pairs where
+    // the intersection is computed as a non-empty backward range rather than
+    // EMPTY_RANGE, so the check below would incorrectly fall through to 
union. Detect
+    // this shape explicitly.
+    if (a.isSingleKey() && b.isSingleKey()
+      && !java.util.Arrays.equals(a.getLowerRange(), b.getLowerRange())) {
+      return Optional.empty();
+    }
+    if (a.intersect(b) == KeyRange.EMPTY_RANGE && !isAdjacent(a, b)) {
+      return Optional.empty();
+    }
+    KeyRange[] newDims = dims.clone();
+    newDims[diffDim] = a.union(b);
+    return Optional.of(new KeySpace(newDims, false));
+  }
+
+  /**
+   * Two 1-D ranges are adjacent when the upper bound of one equals the lower 
bound of the
+   * other and exactly one side is inclusive (so together they cover the 
shared endpoint
+   * exactly once).
+   */
+  private static boolean isAdjacent(KeyRange a, KeyRange b) {
+    return adjacentOneWay(a, b) || adjacentOneWay(b, a);
+  }
+
+  private static boolean adjacentOneWay(KeyRange a, KeyRange b) {
+    if (a.upperUnbound() || b.lowerUnbound()) {
+      return false;
+    }
+    if (!java.util.Arrays.equals(a.getUpperRange(), b.getLowerRange())) {
+      return false;
+    }
+    return a.isUpperInclusive() != b.isLowerInclusive();
+  }
+
+  /**
+   * {@code this} contains {@code other} iff every dim of {@code this} 
contains the dim of
+   * {@code other}.
+   */
+  public boolean contains(KeySpace other) {
+    requireSameArity(other);
+    if (other.empty) {
+      return true;
+    }
+    if (this.empty) {
+      return false;
+    }
+    for (int i = 0; i < dims.length; i++) {
+      KeyRange inter = this.dims[i].intersect(other.dims[i]);
+      if (!inter.equals(other.dims[i])) {
+        return false;
+      }
+    }

Review Comment:
   contains() still relies on raw KeyRange.intersect(), even though and() 
already had to special-case EVERYTHING with IS_NULL/IS_NOT_NULL because 
intersect() collapses that combination to EMPTY. A KeySpace containing 
EVERYTHING will therefore incorrectly report that it does not contain 
null/not-null ranges, which breaks OR-merge containment for those predicates.



##########
phoenix-core-client/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java:
##########
@@ -214,6 +216,9 @@ public class QueryServicesOptions {
   public static final boolean DEFAULT_IS_NAMESPACE_MAPPING_ENABLED = false;
   public static final boolean DEFAULT_IS_SYSTEM_TABLE_MAPPED_TO_NAMESPACE = 
true;
   public static final int DEFAULT_MAX_IN_LIST_SKIP_SCAN_SIZE = 50000;
+  public static final boolean DEFAULT_WHERE_OPTIMIZER_V2_ENABLED = true;

Review Comment:
   This makes the new optimizer opt-out for every deployment, but the same PR 
is still disabling a large number of existing tests for known V2 gaps (explain 
parity, ORDER BY/GROUP BY optimization, and index-selection behavior). Enabling 
it by default will expose those known regressions in production; the default 
should stay false until those cases are no longer being skipped.
   



##########
phoenix-core-client/src/main/java/org/apache/phoenix/compile/keyspace/ExpressionNormalizer.java:
##########
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile.keyspace;
+
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.hadoop.hbase.CompareOperator;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.expression.AndExpression;
+import org.apache.phoenix.expression.ComparisonExpression;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.InListExpression;
+import org.apache.phoenix.expression.OrExpression;
+import org.apache.phoenix.expression.RowValueConstructorExpression;
+
+/**
+ * Rewrites a WHERE {@link Expression} into the canonical AND/OR form the v2 
key-space model
+ * operates on. Two rewrites are applied bottom-up:
+ * <ul>
+ * <li><b>RVC inequality</b>: {@code (c1,...,cK) OP (v1,...,vK)} for OP in
+ * {@code <, ≤, >, ≥} expands to the lexicographic OR of ANDs. Example:
+ * {@code (c1,c2,c3) > (v1,v2,v3)} becomes
+ * {@code (c1>v1) OR (c1=v1 AND c2>v2) OR (c1=v1 AND c2=v2 AND c3>v3)}. 
Equality is already
+ * expanded upstream by {@link ComparisonExpression#create}.</li>
+ * <li><b>IN list on a scalar</b>: {@code a IN (v1,...,vK)} expands to
+ * {@code a=v1 OR ... OR a=vK}. IN with an RVC LHS is left intact; the visitor 
handles
+ * that shape directly by producing one KeySpace per row value.</li>
+ * </ul>

Review Comment:
   The class-level contract no longer matches the implementation. Scalar IN 
lists are now left intact because rewriteScalarInList() is a documented no-op, 
but the Javadoc still says they are rewritten into OR-of-equalities.



##########
phoenix-core-client/src/main/java/org/apache/phoenix/compile/keyspace/scan/V2ExplainFormatter.java:
##########
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile.keyspace.scan;
+
+import java.text.Format;
+
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.compile.StatementContext;
+import org.apache.phoenix.compile.keyspace.KeySpace;
+import org.apache.phoenix.compile.keyspace.KeySpaceList;
+import org.apache.phoenix.query.KeyRange;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.util.StringUtil;
+
+/**
+ * V2-owned explain-plan keyRanges formatter.
+ * <p>
+ * Reads from {@link V2ScanArtifact#list()} directly rather than re-decoding 
the byte-
+ * encoded {@code ScanRanges} that drives actual scan execution. The artifact 
carries the
+ * pre-encoding, mathematical form of the scan, so inclusive-upper displays as
+ * {@code [*, 1]} (matching V1) instead of {@code [*, 2)} (what V2's compound 
byte
+ * emission produces after {@code nextKey(1) = 2}).
+ * <p>
+ * Current scope: single-space KeySpaceList with point or range ranges per 
dim. For
+ * multi-space lists the caller falls back to the legacy byte-decoding path in
+ * {@code ExplainTable.appendKeyRanges}, which produces different but 
semantically
+ * equivalent output. A future extension will handle the multi-space case by 
computing
+ * per-dim unions and emitting SkipScanFilter-style displays.
+ */
+public final class V2ExplainFormatter {
+
+  private V2ExplainFormatter() {
+  }
+
+  /**
+   * Build the keyRanges display string for a scan whose V2 artifact is {@code 
artifact}.
+   * Returns {@code null} if this formatter does not handle the input shape — 
the caller
+   * should fall back to the legacy formatter.
+   */
+  public static String appendKeyRanges(StatementContext context, TableRef 
tableRef,
+    V2ScanArtifact artifact) {
+    KeySpaceList list = artifact.list();
+    if (list.isUnsatisfiable()) {
+      return "";
+    }
+    PTable table = tableRef.getTable();
+    int nPk = artifact.nPkColumns();
+    int prefixSlots = artifact.prefixSlots();
+
+    // KeySpaceList.isEverything() means no user-dim constraint, but when 
prefix slots
+    // (salt / viewIndexId / tenantId) are present the scan is still narrowed 
to the
+    // tenant partition — render those prefix values. Without this, 
tenant-only queries
+    // display as empty brackets while V1 shows {@code ['tenantId']}.
+    if (list.isEverything()) {
+      return renderPrefixOnly(context, table, prefixSlots);
+    }
+    if (list.size() != 1) {
+      // Multi-space path not implemented yet; fall back.
+      return null;
+    }
+    KeySpace space = list.spaces().get(0);
+    boolean isLocalIndex = 
org.apache.phoenix.util.ScanUtil.isLocalIndex(context.getScan());
+
+    // KeySpace indexing is by absolute PK position ({@link 
KeySpaceExpressionVisitor}
+    // emits ranges via {@code KeySpace.single(pkPos, ...)} with {@code pkPos} 
being the
+    // column's absolute index in the PK). So {@code space.get(d)} — where d 
is an
+    // absolute PK index — returns the dim for PK column d. Prefix dims 
(viewIndexId /
+    // tenantId / salt) are always EVERYTHING inside the KeySpaceList because 
the visitor
+    // doesn't know about them; prefix values are rendered from scanRanges 
below.
+    //
+    // Fall back when a dim's raw bytes don't match the PK column's full 
fixed-width size.
+    // Happens when a scalar function (SUBSTR, FLOOR, ...) creates a KeyRange 
with truncated
+    // bytes (e.g. 3-byte substr prefix on an 8-byte LONG column). V1's 
ExplainTable reads
+    // post-processing ScanRanges bytes which have been zero-padded to the 
field width; V2's
+    // artifact carries the raw (pre-processing) bytes. Rather than 
duplicating V1's padding
+    // logic, defer to the legacy byte-decoding formatter.
+    for (int d = prefixSlots; d < nPk && d < space.nDims(); d++) {
+      KeyRange kr = space.get(d);
+      if (kr == KeyRange.EVERYTHING_RANGE || kr == KeyRange.IS_NULL_RANGE
+        || kr == KeyRange.IS_NOT_NULL_RANGE) {
+        continue;
+      }
+      PColumn col = table.getPKColumns().get(d);
+      PDataType type = col.getDataType();
+      if (!type.isFixedWidth()) {
+        continue;
+      }
+      Integer maxLen = col.getMaxLength();
+      Integer typeSize = type.getByteSize();
+      if (maxLen == null && typeSize == null) {
+        continue;
+      }
+      int expected = maxLen != null ? maxLen : typeSize;
+      byte[] lb = kr.getRange(KeyRange.Bound.LOWER);
+      byte[] ub = kr.getRange(KeyRange.Bound.UPPER);
+      if ((lb != null && lb.length != 0 && lb.length != expected)
+        || (ub != null && ub.length != 0 && ub.length != expected)) {
+        return null;
+      }
+    }
+
+    // Last-dim-to-display index: highest dim with a non-EVERYTHING range. V1 
truncates
+    // at the first EVERYTHING past the prefix so the display doesn't end in 
`*,*,*`.
+    int lastConstrained = prefixSlots - 1;
+    for (int d = prefixSlots; d < nPk && d < space.nDims(); d++) {
+      KeyRange kr = space.get(d);
+      if (kr != KeyRange.EVERYTHING_RANGE) {
+        lastConstrained = d;
+      }
+    }
+    if (lastConstrained < prefixSlots) {
+      return "";
+    }
+
+    StringBuilder lower = new StringBuilder();
+    StringBuilder upper = new StringBuilder();
+    // Prefix columns (salt byte / viewIndexId / tenantId) aren't in the 
KeySpaceList —
+    // they're auto-populated slots. Read them from the ScanRanges's per-slot 
structure,
+    // which the caller already built. This keeps prefix display identical to 
V1's.
+    int prefixEmitted = 0;
+    if (context.getScanRanges() != null && 
!context.getScanRanges().getRanges().isEmpty()) {
+      java.util.List<java.util.List<KeyRange>> ranges = 
context.getScanRanges().getRanges();
+      for (int d = 0; d < prefixSlots && d < ranges.size(); d++) {
+        // Slot 0 on a salted table holds the full [0..nBuckets-1] salt-bucket 
range
+        // list after {@link ScanRanges}'s constructor populates it via
+        // {@link SaltingUtil#generateAllSaltingRanges}. Lower bound reads the 
first
+        // range, upper bound reads the last — so the scan displays as
+        // {@code [X'00', ...] - [X'03', ...]} for nBuckets=4. Reading the 
same entry
+        // for both bounds would display {@code [X'00', ...]} as both lower 
and upper,
+        // losing the span across salt buckets. Other prefix slots 
(viewIndexId, tenantId)
+        // have a single singleton range, so the choice doesn't matter there.
+        java.util.List<KeyRange> slot = ranges.get(d);
+        KeyRange lowerRange = slot.get(0);
+        KeyRange upperRange = slot.get(slot.size() - 1);
+        byte[] lb = lowerRange.getRange(KeyRange.Bound.LOWER);
+        byte[] ub = upperRange.getRange(KeyRange.Bound.UPPER);
+        // On a local-index scan, slot 0 holds the viewIndexId. Render it via 
V1's
+        // shifted-value format ({@code Short.MIN_VALUE + 1 → "1"}) so the 
explain-plan
+        // matches V1's output. Without this flag, the raw stored bytes 
display as
+        // {@code -32768}, diverging from V1 on every local-index explain 
assertion.
+        boolean changeViewIndexId = isLocalIndex && d == 0;
+        appendPKColumnValue(lower, context, table, lb, null, d, 
changeViewIndexId);

Review Comment:
   This assumes every local-index scan has the viewIndexId in prefix slot 0, 
but salted local indexes prepend the salt byte before it. On those tables the 
formatter will interpret the salt bucket as a viewIndexId and leave the real 
viewIndexId unshifted, so the explain output is wrong.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] PHOENIX-6791 WHERE optimizer redesign [phoenix]

Reply via email to