[MariaDB developers] Re: Review of oldest 3 commits of full-outer-join

Dave Gosselin via developers Tue, 27 Jan 2026 07:49:20 -0800

Hi Monty,

Thanks for your thorough review.  I applied as many of your suggestions as 
possible; please see my responses inline below (I truncated much of the 
original email text for clarity).


Thanks,
Dave


> On Jan 16, 2026, at 10:03, Michael Widenius <[email protected]> 
> wrote:
> 
> Hi!
> 
> First, I really like that you have commented (almost) all your code
> throughly.  Very good!
Thanks.


> +  if (thd->lex->has_full_outer_join &&  // FULL JOIN not yet supported...
> +      !thd->lex->is_view_context_analysis() &&  // ...but allow VIEW
> creation...
> +      !thd->lex->describe)  // ...and limited EXPLAIN support during
> development
> +    my_error(ER_NOT_SUPPORTED_YET, MYF(0), "full join");
> +
> 
> Please move the above code to
> MDEV-37932: Parser support FULL OUTER JOIN syntax
> to the place where you added things in sql_select.cc later.
> 
> Makes it easier to review patches and ensure we can easier do rebase
> on other MariaDB versions.
> Note that we may have to add the full-outer-join feature on 'main' and
> 'oracle-compatibility' branch (based on 11.8)
We can’t really move this gate back in earlier patches because its location is 
based on what's supported at the point of that patch.  You’ll see in later 
patches that this check moves deeper into the code and takes on different 
restrictions as the implementation matures.


> **Recommendation:**
> Consider a more explicit check. After the main loop in
> `simplify_joins`, you could perform a final pass over the join list to
> check for any remaining `JOIN_TYPE_FULL` flags. This would decouple
> the error detection from the iteration logic.
> 
The AI’s assessment is mostly correct with the exception that this behavior is 
expected:  not every FULL JOIN can be rewritten to a LEFT/RIGHT/INNER JOIN.


> ##### 4. Missing `EXPLAIN` Output in Tests
> 
> The tests do an excellent job of verifying the correctness of query
> results. However, they don't confirm that the optimizer plan for a
> rewritten `FULL JOIN` is identical to its explicit `LEFT/RIGHT/INNER
> JOIN` counterpart.
This is intentional so that we see FULL JOIN and RIGHT JOIN equivalencies 
(RIGHT JOINs are rewritten to LEFT JOINs and are never seen via EXPLAIN).  At 
this point in the commit history, there's no FULL JOIN implementation so it's 
impossible for FULL JOIN queries to execute.  The only way results can be 
generated is by a rewrite.  Later, when we implement the algorithm, we'll 
update the tests accordingly.


>   DBUG_ASSERT(right_neighbor);
> -  context->first_name_resolution_table=
> -    right_neighbor->first_leaf_for_name_resolution();
> +  if (!is_full_outer_join(context, right_neighbor))
> +    context->first_name_resolution_table=
> +      right_neighbor->first_leaf_for_name_resolution();
> 
> Here we are calling in most case first_leaf_for_name_resolution() twice.
> This is an expensive function, so plese try to remove the double call.
This was the case before my change.  Due to a later suggestion, I removed the 
check is_full_outer_join so this code will be untouched by my changes.  If we 
want to make this function more efficient, then can we treat it as a separate 
problem and fix it in another patch (it’s not FULL JOIN related)?  Would make 
the git history clearer for future archaeology.


> I checked what happens if we remove the test for
> if (!is_full_outer_join(context, right_neighbor))
> 
> In essence the diff is:
> select * from x natural full join xsq where x.pk is not null and
> xsq.pk is not null;
> -pk     x       y       pk      x       y
> -6      0       0       6       0       0
> -7      1       1       7       1       1
> +pk     x       y
> +6      0       0
> +7      1       1
> 
> The new result is actually correct as with natural full outer join
> we should only have the common column names once.
> What is confusing is that the column value is COALESCE() of all the values.
> 
> From postgresql:
> 
> test=# select * from t1;
> a | b
> ---+---
> 1 | 1
> 2 | 2
> (2 rows)
> 
> test=# select * from t2;
> a | b
> ---+---
> 1 | 1
> 3 | 3
> 
> test=# select *, 'XXXX',  t1.a, t2.a from t1 natural full outer join t2;
> a | b | ?column? | a | a
> ---+---+----------+---+---
> 1 | 1 | XXXX     | 1 | 1
> 2 | 2 | XXXX     | 2 |
> 3 | 3 | XXXX     |   | 3
> 
> Another issue is that at the point when this code is called, we do not
> know yet if the full outer join can be changed to left, right or inner
> join.
> 
> In essense for the current patch, we should remove the is_full_outer_join()
> function.  I verified this with Sergei.
I don’t think this is the whole solution because the FULL NATURAL to RIGHT JOIN 
rewrite results are wrong.  I'll need to investigate this case further.


> /sql/sql_list.h
> 
> 
> +  T* swap_next()
> +  {
> +    if (!ref() || !*ref() || !peek())
> +      return nullptr;
> 
> I would like to suggest we add an DBUG_ASSERT() here as we should
> never try to do a swap when there is nothing to swap.
Callsites shouldn't have to check how many elements are in their list before 
calling this method, it should just handle the caller's request the best it 
can.  We have lots of places in our code where we return if the function can't 
help the caller (like `setup_wild` which will return if there are no elements 
in the fields parameter, or `make_range_rowid_filters` if the const_table_map 
doesn't equal the found_const_table_map, etc.  How is this new method a special 
case?).


> +
> +    /*
> +      Attempt to rewrite any FULL JOINs as LEFT or RIGHT JOINs.  Any 
> subsequent
> +      JOINs that could be further rewritten to INNER JOINs are done below.
> +    */
> +    conds= rewrite_full_outer_joins(join, conds, top, in_sj, &table,
> +                                    &li, &used_tables,
> +                                    &not_null_tables);
> 
> Shouldn't we check if thd->lex->has_full_outer_join is set here to
> avoid the call if it is not needed?
I made that change but I don’t fully agree (for the same reason given above 
regarding `setup_wild` or `make_range_rowid_filters`).


> There was some duplicate code testing left_has_nested and taking actions
> depending on it's value.
> 
> Here is patch that fixes it by setting nested_... variables depending
> on if left nested_join was used or not.
> (Makes the code notable shorter).
While I made this change, I don't like it because the variable names are wrong 
when the bits didn't come from nested join tables.  It's just extra details 
that a maintainer has to remember ("nested_used_tables is from a nested join, 
except when there's no nested join, then it's just the left_table's map" -- 
this makes it harder to reason through the code).  We also lose a bunch of 
comments about how the thing works by mapping non-nested table state onto 
nested table state via variable reuse.  I’ll come up with better variable names 
to address it.


> table.cc:
> 
>   /*
>      If the current nested join is a RIGHT JOIN, the operands in
>      'join_list' are in reverse order, thus the first operand is
>      already at the front of the list. Otherwise the first operand
>      is in the end of the list of join operands.
>    */
> -    if (!(cur_table_ref->outer_join & JOIN_TYPE_RIGHT))
> +    if (!(cur_table_ref->outer_join & JOIN_TYPE_RIGHT) ||
> +        (cur_table_ref->outer_join & JOIN_TYPE_FULL))
> 
> Is this because of the code change in sql_base.cc where we do not set
> context->first_name_resolution_table in case of full join?
It's unrelated.  Without this change, we don't correctly resolve names in some 
VIEW contexts (e.g., create view v1 as select t1.a as t1a, t2.a as t2a from t1 
full join t2 on t1.a = t2.a).

_______________________________________________
developers mailing list -- [email protected]
To unsubscribe send an email to [email protected]

[MariaDB developers] Re: Review of oldest 3 commits of full-outer-join

Reply via email to