gortiz opened a new pull request, #14212:
URL: https://github.com/apache/pinot/pull/14212
This PR fixes two non-critical but annoying issues in the multi-stage query engine's explain plans:
## Issue 1
Plans from different servers were not merged correctly when the segments on
each server produced different plans. For example, in a colocated join, the
following query:
```sql
EXPLAIN PLAN FOR
SELECT DISTINCT deviceOS, groupUUID
FROM userAttributes AS a
JOIN userGroups AS g
ON a.userUUID = g.userUUID
WHERE g.groupUUID = 'group-1'
LIMIT 100
```
Produced:
```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect(repeated=[4])
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              IntermediateCombine
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect
                          SelectStreaming(segment=[userGroups_OFFLINE_0], table=[userGroups], totalDocs=[7])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                          SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterEmpty
                          SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterMatchEntireSegment(numDocs=[4])
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect(repeated=[4])
                          SelectStreaming(table=[userGroups], totalDocs=[2471])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[40000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
```
With these changes, both alternatives are merged, producing the following
explain plan:
```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userGroups])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userGroups], totalDocs=[2478])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[50000])
                            FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                      SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterEmpty
                      SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterMatchEntireSegment(numDocs=[4])
```
The merged plan is easier to read.
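One way to picture the merge is as a union of child plans keyed by their operator shape. The Python sketch below is illustrative only and is not Pinot's implementation: `Node`, `shape`, `merge_children`, and `select` are hypothetical names, and attribute merging is ignored entirely. It shows why the two `Alternative` branches in the first plan collapse into a single `StreamingCombineSelect` with three `SelectStreaming` children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical explain-plan node: an operator name plus child nodes."""
    op: str
    children: list = field(default_factory=list)


def shape(node):
    """Structural key: operator names down the subtree, attributes ignored."""
    return (node.op, tuple(shape(child) for child in node.children))


def merge_children(left, right):
    """Union of two child lists, keeping one representative per shape."""
    merged = {}
    for child in left + right:
        merged.setdefault(shape(child), child)
    return list(merged.values())


def select(filter_op):
    """Build SelectStreaming -> Project -> DocIdSet -> <filter>."""
    return Node("SelectStreaming",
                [Node("Project", [Node("DocIdSet", [Node(filter_op)])])])


# Children of StreamingCombineSelect in each Alternative of the first plan.
alt1 = [select("FilterInvertedIndex"),
        select("FilterEmpty"),
        select("FilterMatchEntireSegment")]
alt2 = [select("FilterInvertedIndex")]

merged = merge_children(alt1, alt2)
```

Here `merged` holds three children, one per distinct filter operator, which matches the shape of the merged plan above. In a real merge the surviving node would also combine the attributes of the plans it represents.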
## Issue 2
There was an error in how IDEMPOTENT and IGNORABLE attributes were merged,
which caused the `segment` attribute to be included in `SelectStreaming`
seemingly at random. The expected behavior is that this attribute appears only
if there is a single plan for that segment. Before this fix, the attribute was
removed when merging two plans that both had it with different values, but was
kept when merging a plan without the attribute with one that had it.
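The intended rule can be sketched in a few lines of Python. This is a simplified model, not the code in this PR: `merge_attrs` and the `IGNORABLE` set are hypothetical, and the handling of numeric attributes (summing) is an assumption inferred from the plans in Issue 1, where `totalDocs` 7 and 2471 become 2478.

```python
# Assumption: attributes in this set are dropped on any disagreement.
IGNORABLE = {"segment"}


def merge_attrs(left: dict, right: dict) -> dict:
    """Merge the attributes of two plan nodes that are being combined."""
    merged = {}
    for key in list(left) + [k for k in right if k not in left]:
        in_both = key in left and key in right
        if key in IGNORABLE:
            # Keep only when both sides carry the same value. A missing
            # value counts as a disagreement, which is the case that was
            # previously handled incorrectly.
            if in_both and left[key] == right[key]:
                merged[key] = left[key]
        elif not in_both:
            merged[key] = left.get(key, right.get(key))
        elif isinstance(left[key], int) and isinstance(right[key], int):
            # Assumed from the example plans: numeric counters are summed.
            merged[key] = left[key] + right[key]
        elif left[key] == right[key]:
            merged[key] = left[key]
        else:
            raise ValueError(f"cannot merge {key!r}: "
                             f"{left[key]!r} != {right[key]!r}")
    return merged


single = {"segment": "userGroups_OFFLINE_0", "table": "userGroups", "totalDocs": 7}
combined = {"table": "userGroups", "totalDocs": 2471}
result = merge_attrs(single, combined)
```

With this rule, `result` has no `segment` key regardless of argument order, so the attribute survives only on nodes that were never merged with a plan for a different segment.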