Re: Pushing filters with unsafe operations (CAST/DIVISION) below inner joins

Julian Hyde Fri, 07 Oct 2022 17:05:17 -0700

Thanks for logging this. I agree that it’s not high priority. I added some 
commentary about related bugs.


> On Oct 7, 2022, at 9:07 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> I just logged https://issues.apache.org/jira/browse/CALCITE-5315 but I am
> not planning to work on this sometime soon.
> 
> Best,
> Stamatis
> 
> On Fri, Feb 18, 2022 at 10:56 PM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
> 
>> Thank you all for the feedback.
>> 
>> I will do a small research about the proper term to use for functions that
>> might throw / never throw and log a JIRA to move the discussion further.
>> 
>> @Viliam : Note that the problem you mention, although similar, is not
>> exactly the same as the one I brought up now.
>> 
>> The reordering of predicates in the WHERE clause has been brought up quite
>> a few times in the dev list and our stance [1] is that since the standard
>> leaves this decision to the implementor people should not rely on this (or
>> put the appropriate guards).
>> 
>> The case here is a bit different, at least the way I read the standard,
>> cause it defines the following:
>> 
>> "If all optional clauses are omitted, then the result of the <table
>> expression> is the same as the result of the
>> <from clause>. Otherwise, each specified clause is applied to the result
>> of the previously specified clause
>> and the result of the <table expression> is the result of the application
>> of the last specified clause."
>> 
>> and one of the optional clauses mentioned in the previous paragraph is the
>> <where clause>. There seems to be a clearly defined order between the <from
>> clause>, which includes inner joins, and the <where clause>.
>> 
>> Best,
>> Stamatis
>> 
>> [1] https://lists.apache.org/thread/mq44cnrohz19hh10btms126vbcoxl50w
>> 
>> On Fri, Feb 18, 2022 at 9:58 AM Viliam Durina <vil...@hazelcast.com>
>> wrote:
>> 
>>> I have observed this issue years ago in well-known databases. My case was
>>> much simpler:
>>> 
>>> data
>>> recordType:int  value:text
>>> -----           ------
>>> 0               1
>>> 1               a
>>> 
>>> SELECT *
>>> FROM data
>>> WHERE recordType='1' AND CAST(value AS INT)<10
>>> 
>>> 
>>> SQL is declarative, and unlike procedural languages, it doesn't prescribe
>>> short-circuit evaluation of the WHERE clause, or any specific evaluation
>>> order. If it was prescribed, the query would be perfectly safe. But
>>> prescribing the evaluation order would rule out many optimizations, or
>>> make
>>> them much harder, such as this half-null-half-error value.
>>> 
>>> For example, reordering additions might or might not lead to overflow:
>>> TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 -
>>> 90
>>> + 100` don't - imagine each value comes from a different table and we
>>> reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but
>>> what if they are BIGINTs?
>>> 
>>> My understanding was that any expression must not fail in any plausible
>>> execution plan. Therefore if I parse the column `value`, it must succeed
>>> for every possible scanned row. In my specific case I ended up
>>> implementing
>>> a custom TO_NUMBER function that returns null on parsing error, and that
>>> null never appeared in the result because of the other condition.
>>> 
>>> Viliam
>>> 
>>> On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote:
>>> 
>>>> You should definitely log a bug, even if there are no plans to fix it.
>>> It
>>>> is a violation of the standard, and therefore it is a bug, and
>>> therefore we
>>>> should document that it exists.
>>>> 
>>>> Can you do some research on the right terminology. You use the term
>>>> ‘unsafe’. I think the PL community uses the term ’strict’ for
>>> expressions
>>>> that must be evaluated (and therefore if they throw, the query is sure
>>> to
>>>> abort). We have other related concepts floating around in Calcite -
>>> dynamic
>>>> functions, deterministic operators - and standardizing terminology is
>>> key
>>>> if we are to keep the menagerie in order.
>>>> 
>>>> There might be a way to have our cake and eat it too. We could push down
>>>> the division, and if there is a division-by-zero we return a special
>>> value
>>>> such as NULL. If the special value is not eliminated by the join then
>>> it is
>>>> promoted to a throw. The “cake” here is the performance benefit of
>>> pushing
>>>> down a filter (that may never or rarely throw); the “eating it” is
>>> safety
>>>> on the occasion that the filter does throw.
>>>> 
>>>> Even if that strategy doesn’t work, maybe we could have a flag that says
>>>> whether to push down conditions that might throw.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Is it feasible to prevent the filter push down in cases like this (
>>>> detect
>>>>> potential division by zero) ? What other exceptions that could happen?
>>>>> 
>>>>> Should it only push down some filters and preserve the complex filter
>>>> above
>>>>> the join?
>>>>> 
>>>>> Regarding a Jira ticket, generating an exception when it should
>>> produce
>>>>> valid results is gotta be a Jira case IMHO.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Till today, I believed that pushing a filter below an inner join is
>>> not
>>>>>> strictly related to the operators (equals, plus, minus, cast,
>>> division)
>>>>>> present in the filter.
>>>>>> 
>>>>>> However, the query below will return some results if the filter is
>>>> executed
>>>>>> after the join or it will fail with an exception if the filter is
>>> pushed
>>>>>> below the join.
>>>>>> 
>>>>>> EMP [EMPNO, NAME, DEPTNO]
>>>>>> 0   | Alex | 0
>>>>>> 10 | Bob | 1
>>>>>> 
>>>>>> DEP [DEPTNO]
>>>>>> 1
>>>>>> 
>>>>>> SELECT e.name
>>>>>> FROM emp e
>>>>>> INNER JOIN dep d ON e.deptno = d.deptno
>>>>>> WHERE (10 / e.empno) = 1
>>>>>> 
>>>>>> It seems that SQL standard (Section 7.4 general rule 1) mandates that
>>>> WHERE
>>>>>> should be applied to the result of FROM so in the case above pushing
>>>>>> filters below the join seems to violate the standard.
>>>>>> 
>>>>>> Currently, in Calcite we are going to push the division, cast, and
>>>>>> basically any kind of operator below the join. Many well-known
>>> databases
>>>>>> e.g., Postgres, do that as well.
>>>>>> 
>>>>>> Should we log it as a bug and try to do something about it or let it
>>> be
>>>> as
>>>>>> it is?
>>>>>> 
>>>>>> Best,
>>>>>> Stamatis
>>>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> This message contains confidential information and is intended only for
>>> the
>>> individuals named. If you are not the named addressee you should not
>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>> immediately by e-mail if you have received this e-mail by mistake and
>>> delete this e-mail from your system. E-mail transmission cannot be
>>> guaranteed to be secure or error-free as information could be
>>> intercepted,
>>> corrupted, lost, destroyed, arrive late or incomplete, or contain
>>> viruses.
>>> The sender therefore does not accept liability for any errors or
>>> omissions
>>> in the contents of this message, which arise as a result of e-mail
>>> transmission. If verification is required, please request a hard-copy
>>> version. -Hazelcast
>>> 
>>

Re: Pushing filters with unsafe operations (CAST/DIVISION) below inner joins

Reply via email to