viirya commented on code in PR #7953:
URL: https://github.com/apache/arrow-datafusion/pull/7953#discussion_r1374907253


##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -78,29 +77,140 @@ use futures::{ready, Stream, StreamExt, TryStreamExt};
 
 type JoinLeftData = (JoinHashMap, RecordBatch, MemoryReservation);
 
-/// Join execution plan executes partitions in parallel and combines them into a set of
-/// partitions.
+/// Join execution plan: Evaluates equijoin predicates in parallel on multiple
+/// partitions using a hash table and an optional filter list to apply post
+/// join.
 ///
-/// Filter expression expected to contain non-equality predicates that can not be pushed
-/// down to any of join inputs.
-/// In case of outer join, filter applied to only matched rows.
+/// # Join Expressions
+///
+/// This implementation is optimized for evaluating equijoin predicates
+/// (`<col1> = <col2>` expressions), which are represented as a list of `Columns`
+/// in [`Self::on`].
+///
+/// Non-equality predicates, which cannot be pushed down to either join input
+/// (e.g. `<col1> != <col2>`), are known as "filter expressions" and are
+/// evaluated after the equijoin predicates.
+///
+/// # "Build Side" vs "Probe Side"
+///
+/// HashJoin takes two inputs, which are referred to as the "build" and the
+/// "probe". The build side is the first child, and the probe side is the second
+/// child.
+///
+/// The two inputs are treated differently and it is VERY important that the
+/// *smaller* input is placed on the build side to minimize the work of creating
+/// the hash table.
+///
+/// ```text
+///          ┌───────────┐
+///          │ HashJoin  │
+///          │           │
+///          └───────────┘
+///              │   │
+///        ┌─────┘   └─────┐
+///        ▼               ▼
+/// ┌────────────┐  ┌─────────────┐
+/// │   Input    │  │    Input    │
+/// │    [0]     │  │     [1]     │
+/// └────────────┘  └─────────────┘
+///
+///  "build side"    "probe side"
+/// ```
+///
+/// Execution proceeds in 2 stages:
+///
+/// 1. the **build phase** where a hash table is created from the tuples of the
+/// build side.
+///
+/// 2. the **probe phase** where the tuples of the probe side are streamed
+/// through, checking for matches of the join keys in the hash table.
+///
+/// ```text
+///                 ┌────────────────┐          ┌────────────────┐
+///                 │ ┌─────────┐    │          │ ┌─────────┐    │
+///                 │ │  Hash   │    │          │ │  Hash   │    │
+///                 │ │  Table  │    │          │ │  Table  │    │
+///                 │ │(keys are│    │          │ │(keys are│    │
+///                 │ │equi join│    │          │ │equi join│    │  Stage 2: batches from
+///  Stage 1: the   │ │columns) │    │          │ │columns) │    │    the probe side are
+/// *entire* build  │ │         │    │          │ │         │    │  streamed through, and
+///  side is read   │ └─────────┘    │          │ └─────────┘    │   checked against the
+/// into the hash   │      ▲         │          │          ▲     │   contents of the hash
+///     table       │       HashJoin │          │  HashJoin      │          table
+///                 └──────┼─────────┘          └──────────┼─────┘
+///             ─ ─ ─ ─ ─ ─                                 ─ ─ ─ ─ ─ ─ ─
+///            │                                                         │
+///
+///            │                                                         │
+///     ┌────────────┐                                            ┌────────────┐
+///     │RecordBatch │                                            │RecordBatch │
+///     └────────────┘                                            └────────────┘
+///     ┌────────────┐                                            ┌────────────┐
+///     │RecordBatch │                                            │RecordBatch │
+///     └────────────┘                                            └────────────┘
+///           ...                                                       ...
+///     ┌────────────┐                                            ┌────────────┐
+///     │RecordBatch │                                            │RecordBatch │
+///     └────────────┘                                            └────────────┘
+///
+///        build side                                                probe side
+///
+/// ```
+///
+/// # Example "Optimal" Plans
+///
+/// The difference in the inputs means that for a classic "Star Schema Query",
+/// the optimal plan will be a **"Right Deep Tree"**. A Star Schema Query is
+/// one where there is one large table and several smaller "dimension" tables,
+/// joined on `Foreign Key = Primary Key` predicates. with predicates.

Review Comment:
   ```suggestion
   /// joined on `Foreign Key = Primary Key` predicates.
   ```
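
   As a side note for readers of this thread, here is a minimal, hypothetical sketch (not part of this PR) of the kind of star-schema equijoin the new docs describe, using DataFusion's public `SessionContext` SQL API. The table names, column names, and CSV paths are made up for illustration, and a `tokio` runtime is assumed:

   ```rust
   use datafusion::error::Result;
   use datafusion::prelude::{CsvReadOptions, SessionContext};

   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();

       // Hypothetical star schema: a large "fact" table and a small "dimension" table.
       ctx.register_csv("sales", "sales.csv", CsvReadOptions::new())
           .await?;
       ctx.register_csv("customers", "customers.csv", CsvReadOptions::new())
           .await?;

       // Equijoin on `Foreign Key = Primary Key`. Given reasonable statistics, the
       // planner should place the smaller `customers` input on the build side of
       // the hash join and stream the larger `sales` input through as the probe side.
       let df = ctx
           .sql(
               "SELECT s.sale_id, c.name \
                FROM sales s JOIN customers c ON s.customer_id = c.customer_id",
           )
           .await?;
       df.show().await?;
       Ok(())
   }
   ```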


