Re: [DISCUSS] Introduce Hash Lookup Join

Martijn Visser Tue, 28 Dec 2021 06:02:31 -0800

Hi Jing,

Thanks a lot for the explanation and the FLIP. I definitely learned
something when reading more about `use_hash`. My interpretation is
that the primary benefit of a hash lookup join is improved
performance, achieved by letting the user explicitly steer the planner.
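
As a rough illustration of where that gain could come from (the first
request in the quoted proposal below: hash-partitioning the left side of the
lookup join to raise the cache hit ratio), here is a toy Python simulation.
Everything in it is an assumption made for illustration and not Flink code:
the `LruCache` class, the `hit_ratio` helper, the round-robin routing used as
a stand-in for a non-hash distribution, and the numbers (100 keys, 4 subtasks,
25 cache entries per subtask).

```python
from collections import OrderedDict
import random


class LruCache:
    """Tiny LRU cache standing in for a per-subtask lookup cache (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        """Return True on a cache hit; on a miss, load the key and return False."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        self.entries[key] = True  # pretend the row was fetched from the external table
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return False


def hit_ratio(keys, parallelism, capacity, route):
    """Feed the keys to `parallelism` subtasks, each with its own cache."""
    caches = [LruCache(capacity) for _ in range(parallelism)]
    hits = 0
    for position, key in enumerate(keys):
        subtask = route(position, key, parallelism)
        hits += caches[subtask].lookup(key)
    return hits / len(keys)


def round_robin(position, key, parallelism):
    # Assumed baseline: records are spread without regard to the lookup key.
    return position % parallelism


def by_key_hash(position, key, parallelism):
    # Hash shuffle: the same lookup key always reaches the same subtask.
    return key % parallelism


rng = random.Random(42)
keys = [rng.randrange(100) for _ in range(10_000)]

rr_ratio = hit_ratio(keys, parallelism=4, capacity=25, route=round_robin)
hash_ratio = hit_ratio(keys, parallelism=4, capacity=25, route=by_key_hash)
print(f"round-robin hit ratio: {rr_ratio:.2f}")
print(f"hash-by-key hit ratio: {hash_ratio:.2f}")
```

Under these assumptions each subtask only ever sees 25 distinct keys when
routing by key, so they all fit in its cache and only the first access to a
key misses. With round-robin routing every subtask sees all 100 keys and keeps
evicting entries. Real workloads with skewed keys would behave differently.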


I have a couple of questions:

- When I was reading about this topic [1] I was wondering if this feature
would be more beneficial for bounded use cases and not so much for
unbounded use cases. What do you think?
- If I look at the current documentation for SQL Hints in Flink [2], I
notice that all of the hints there are located at the end of the SQL
statement. In the FLIP, the use_hash is defined directly after the 'SELECT'
keyword. Can we somehow make this consistent for the user? Or should the
user be able to specify hints anywhere in their SQL statement?

Best regards,

Martijn

[1] https://logicalread.com/oracle-11g-hash-joins-mc02/#.V5Wm4_mnoUI
[2] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sql/queries/hints/


On Tue, 28 Dec 2021 at 08:17, Jing Zhang wrote:

> Hi everyone,
> Lookup join [1] is a commonly used feature in Flink SQL. We have received
> many optimization requests for lookup join. For example:
> 1. Enforce hash partitioning on the left side of the lookup join to raise
> the cache hit ratio
> 2. Solve the data skew problem that arises once hash lookup join is
> introduced
> 3. Enable mini-batch optimization to reduce RPC calls
>
> We will address these problems one by one. First we focus on point 1;
> points 2 and 3 will be discussed later.
>
> There are many similar requests on the user mailing list and in JIRA about
> hash lookup join, for example:
> 1. FLINK-23687 <https://issues.apache.org/jira/browse/FLINK-23687> -
> Introduce partitioned lookup join to enforce input of LookupJoin to hash
> shuffle by lookup keys
> 2. FLINK-25396 <https://issues.apache.org/jira/browse/FLINK-25396> -
> lookupjoin source table for pre-partitioning
> 3. FLINK-25262 <https://issues.apache.org/jira/browse/FLINK-25262> -
> Support to send data to lookup table for KeyGroupStreamPartitioner way for
> SQL.
>
> In this FLIP, I would like to start a discussion about Hash Lookup Join.
> The core idea is to introduce a 'USE_HASH' hint in the query. This syntax is
> directly user-facing and therefore requires careful design.
> There are two ways to propagate this hint to LookupJoin in the optimizer,
> and we need further discussion to decide between them. Either way, the
> difference between the two solutions is purely a matter of internal
> implementation and has no impact on the user.
>
> For more detail on the proposal:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>
>
> Looking forward to your feedback, thanks.
>
> Best,
> Jing Zhang
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
>
