kadirozde commented on code in PR #2428:
URL: https://github.com/apache/phoenix/pull/2428#discussion_r3192229704
##########
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##########
@@ -792,6 +792,76 @@ public int compare(QueryPlan plan1, QueryPlan plan2) {
return stopAtBestPlan ? bestCandidates.subList(0, 1) : bestCandidates;
}
+ /**
+ * Variant of {@link
org.apache.phoenix.compile.ScanRanges#getBoundPkColumnCount} for the cost
+ * comparator: only honors the extra span reported by {@code slotSpan[i]}
when the slot's ranges
+ * genuinely span multiple PK columns via byte-level encoding. A slot whose
only ranges are
+ * non-point finite-bounded scalars (e.g. {@code BETWEEN a AND b} on a
single PK col) under V2
+ * can carry an artificial {@code slotSpan[i] > 0} set by
+ * {@link
org.apache.phoenix.compile.keyspace.KeyRangeExtractor#emitV1Projection} as a
+ * V1-compat marker for {@link org.apache.phoenix.filter.SkipScanFilter}'s
per-region cursor
+ * stepping. That marker is needed for SkipScanFilter correctness on
CDC-style scans but
+ * would inflate the PK-col count the cost comparator ranks plans by,
letting a plan with a
+ * narrow bounded range on one PK col tie with a plan whose compound range
genuinely covers
+ * multiple PK cols. Tie on primary key falls through to weaker tiebreakers
that flip the
+ * plan choice (see CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3
and its
+ * Upsert/Delete variants).
+ * <p>
+ * Rule: {@code slotSpan[i]} is honored unless every range in the slot is a
non-point
+ * finite-bounded scalar. Point keys (single-key compound bytes encoding
multiple PK cols,
+ * e.g. from RVC-IN or tuple equality) and ranges with an UNBOUND side
(where trailing cols
+ * inherit the unbounded side's min/max at the byte level) both represent
genuine multi-col
+ * narrowing and are counted as {@code slotSpan[i]+1}.
+ */
+ private static int effectiveBoundPkColumnCount(
+ org.apache.phoenix.compile.ScanRanges scanRanges) {
+ java.util.List<java.util.List<org.apache.phoenix.query.KeyRange>> ranges =
+ scanRanges.getRanges();
+ int[] slotSpan = scanRanges.getSlotSpans();
+ int count = 0;
+ boolean hasUnbound = false;
+ int nRanges = ranges.size();
+ for (int i = 0; i < nRanges && !hasUnbound; i++) {
+ java.util.List<org.apache.phoenix.query.KeyRange> orRanges =
ranges.get(i);
+ boolean slotHasUnbound = false;
+ boolean slotAllBoundedNonPoint = true;
+ for (org.apache.phoenix.query.KeyRange range : orRanges) {
+ if (range == org.apache.phoenix.query.KeyRange.EVERYTHING_RANGE) {
+ return count;
+ }
+ if (range.isUnbound()) {
+ slotHasUnbound = true;
+ hasUnbound = true;
+ }
+ if (range.isSingleKey() || range.isUnbound()) {
+ slotAllBoundedNonPoint = false;
+ }
+ }
+ int span = (slotSpan != null && i < slotSpan.length) ? slotSpan[i] : 0;
+ // V2-only artificial extension: {@code
KeyRangeExtractor.emitV1Projection} bumps
+ // {@code slotSpan[last]} to span trailing unconstrained PK cols for
+ // {@link SkipScanFilter} cursor stepping, even when the slot holds a
single-column
+ // finite-bounded scalar range whose bytes don't actually encode
trailing cols.
+ // For cost ranking we want to count only columns genuinely narrowed by
the scan
+ // bounds, so discount that shape to 1.
+ //
+ // Discriminator: the extension path ALWAYS has >=2 slots (at least one
leading
+ // concretely-narrowed slot plus the last bumped slot). A single-slot
shape with
+ // {@code slotSpan > 0} is the compound-emission path — a genuine
multi-col range
+ // whose bytes concatenate {@code span+1} columns — and keeps {@code
span + 1}.
+ // See CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3
(multi-slot shape
+ // must discount) vs.
RowTimestampIT.testAutomaticallySettingRowTimestampWith*
+ // (single-slot compound must not discount).
+ boolean isArtificialExtension = nRanges > 1 && i == nRanges - 1;
Review Comment:
Good catch on the single-slot extension case — the nRanges > 1 guard was too
coarse and would have over-counted it. But i == nRanges - 1 alone would
re-introduce a different regression: single-slot compound ranges
(RowTimestampIT.testAutomaticallySettingRowTimestampWith* — a PK1=v AND PK2
BETWEEN a AND b data-table plan emits one slot with slotSpan=[1] and a compound
range concatenating PK1's bytes + separator + PK2's bytes) are genuine 2-col
narrowing that must count as span+1, not 1.
The distinguishing signal is byte-vs-schema: a compound range's bytes decode
into span+1 PK fields; a V1-compat extension's bytes cover one field. I
switched the discriminator to a byte-length decode against the schema so both
cases resolve correctly:
Multi-slot extension
(CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3): bytes cover 1 field
→ discount to 1.
Single-slot extension (your new case): bytes cover 1 field → discount to 1.
Single-slot compound (RowTimestampIT): bytes cover span+1 fields → keep
span+1.
Point-key compound
(testQueryOptimizerShouldSelectThePlanWithMoreNumberOfPKColumns): gated out by
slotAllBoundedNonPoint=false, falls through to span+1.
Verified: RowTimestampIT 24/24, CostBasedDecisionIT 20/20,
WhereOptimizerTest 142/142, QueryOptimizerTest 48/48.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]