gustavodemorais commented on PR #27508:
URL: https://github.com/apache/flink/pull/27508#issuecomment-4031911886

    > **As a result, may to create 3 Jira tickets :**
   > 
   > 1. Generic/Binary row data problem in MultiJoinStateViews
   > 2. Transform binary MultiJoin (with 2 inputs) bact to regular Join
   > 3. Predicate pushdown rules for MultiJoin case
   
   1. Sounds good, we can look into it in the next PR
   2. _I think_ the performance difference shouldn't be too considerable - 
we'll have the additional unnecessary joinKey layer in state. It's not 
necessary because since the commonKey = joinKey we'll have a state with only 
one key. That said, that may not cause any noticeable difference for most 
cases. In my experience testing it with rocksdb, the biggest chunk of time is 
used by the rocksdb API and not by the key serialization. Having a rule to 
convert the nodes back feels a bit weird so the performance gain would have to 
be considerable for us to justify doing that. There might be other ways of 
dealing with that. Also, the MultiJoin can be used with hints MULTI_JOIN(t1, 
t2), so if a user wants we can configure that already. I think what you 
mentioned on 3 is more interesting.
   3. This is a valid problem. As you mentioned, I think there are two valid 
strategies we could look into both with some pros and cons. In general, having 
the MultiJoin rule later makes so that we reuse the rules written for regular 
joins. If we have it earlier, as you mentioned we might have to reimplement the 
rules. Also, if we could implement an optimization which is valid for both 
regular joins and multijoins, that's even better. In the example you mentioned, 
if we could push the calc after the regular join so that it's also not there 
when they turn into multijoin, it's better since we're optimizing more use 
cases. That said, as we did in this ticket, these require research and some 
technical discussion. So go ahead with the ticket and any discussion can 
continue there :)
   
   I'd first create a ticket for 1 and 2. For **this PR**, let's keep scope 
focused on the original fix only. Thanks, @ldadima.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to