gortiz commented on PR #18741:
URL: https://github.com/apache/pinot/pull/18741#issuecomment-4780562091

   > It's pretty common to have a lookup join hint, but otherwise not 
particularly care about join order. Does 'no hinted joins' mean that we won't 
reorder queries that have any hinted joins?
   
   Hints can change the behavior of the join, which means CBO needs to 
understand them. We can do that on a follow-up. But lookups are a good example. 
You don't care about join reordering on lookup joins, as there is no reason to 
swap them (the lookup should always be the right side). In the future we could 
add support for that so even if we write the join in the other way around, cbo 
reorders the lookup join if needed.
   
   >> hint veto
   >
   > Can you elaborate more on what this means?
   
   This is explained in 
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/JoinReorderOptimizer.java,
 which contains extensive javadoc. Remember this is just _a_ rule. More rules 
could be added, but I wanted to keep the PR as simple as possible.
   
   >> upsert/dedup row counts marked LOW confidence (physical docs over-count 
logical rows), ... The planner treats LOW/UNKNOWN confidence as "no stats".
   >
   > Will we improve this once we have support for number of distinct values 
column stats consuming segments downgrade confidence.
   
   Keeping row counts for upserts/dedups would mean reproducing the 
upsert/dedup logic (and their expensive data structure) on the brokers. I don't 
think that makes much sense, but it could be done in a follow-up.
   
   > What does downgrade mean? What confidence values do we have?
   
   For committed and offline segments we already know how many rows we have. 
For realtime consuming segments we cannot know that (at least the broker cannot 
know that). That is why we have to estimate how many rows we have. How we do 
that is not very important right now. Probably, we can find better heuristics 
in the future.
   
   > Also, I vote for moving the independent fixes out to a seperate PR. Maybe 
we'll want to revert this PR, and it would be nice to not have to revert the 
other fixes too. I think it'd be nice to split this up a little bit more, it's 
harder to find and understand the logic amidst all the plumbing.
   
   They can _also_ be added as independent PRs, but they I needed them here to 
run the quickstarts. Anyway they are not very serious issues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to