I just logged https://issues.apache.org/jira/browse/CALCITE-5315 but I am not planning to work on this sometime soon.
Best, Stamatis On Fri, Feb 18, 2022 at 10:56 PM Stamatis Zampetakis <zabe...@gmail.com> wrote: > Thank you all for the feedback. > > I will do a small research about the proper term to use for functions that > might throw / never throw and log a JIRA to move the discussion further. > > @Viliam : Note that the problem you mention, although similar, is not > exactly the same as the one I brought up now. > > The reordering of predicates in the WHERE clause has been brought up quite > a few times in the dev list and our stance [1] is that since the standard > leaves this decision to the implementor people should not rely on this (or > put the appropriate guards). > > The case here is a bit different, at least the way I read the standard, > cause it defines the following: > > "If all optional clauses are omitted, then the result of the <table > expression> is the same as the result of the > <from clause>. Otherwise, each specified clause is applied to the result > of the previously specified clause > and the result of the <table expression> is the result of the application > of the last specified clause." > > and one of the optional clauses mentioned in the previous paragraph is the > <where clause>. There seems to be a clearly defined order between the <from > clause>, which includes inner joins, and the <where clause>. > > Best, > Stamatis > > [1] https://lists.apache.org/thread/mq44cnrohz19hh10btms126vbcoxl50w > > On Fri, Feb 18, 2022 at 9:58 AM Viliam Durina <vil...@hazelcast.com> > wrote: > >> I have observed this issue years ago in well-known databases. My case was >> much simpler: >> >> data >> recordType:int value:text >> ----- ------ >> 0 1 >> 1 a >> >> SELECT * >> FROM data >> WHERE recordType='1' AND CAST(value AS INT)<10 >> >> >> SQL is declarative, and unlike procedural languages, it doesn't prescribe >> short-circuit evaluation of the WHERE clause, or any specific evaluation >> order. If it was prescribed, the query would be perfectly safe. But >> prescribing the evaluation order would rule out many optimizations, or >> make >> them much harder, such as this half-null-half-error value. >> >> For example, reordering additions might or might not lead to overflow: >> TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 - >> 90 >> + 100` don't - imagine each value comes from a different table and we >> reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but >> what if they are BIGINTs? >> >> My understanding was that any expression must not fail in any plausible >> execution plan. Therefore if I parse the column `value`, it must succeed >> for every possible scanned row. In my specific case I ended up >> implementing >> a custom TO_NUMBER function that returns null on parsing error, and that >> null never appeared in the result because of the other condition. >> >> Viliam >> >> On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote: >> >> > You should definitely log a bug, even if there are no plans to fix it. >> It >> > is a violation of the standard, and therefore it is a bug, and >> therefore we >> > should document that it exists. >> > >> > Can you do some research on the right terminology. You use the term >> > ‘unsafe’. I think the PL community uses the term ’strict’ for >> expressions >> > that must be evaluated (and therefore if they throw, the query is sure >> to >> > abort). We have other related concepts floating around in Calcite - >> dynamic >> > functions, deterministic operators - and standardizing terminology is >> key >> > if we are to keep the menagerie in order. >> > >> > There might be a way to have our cake and eat it too. We could push down >> > the division, and if there is a division-by-zero we return a special >> value >> > such as NULL. If the special value is not eliminated by the join then >> it is >> > promoted to a throw. The “cake” here is the performance benefit of >> pushing >> > down a filter (that may never or rarely throw); the “eating it” is >> safety >> > on the occasion that the filter does throw. >> > >> > Even if that strategy doesn’t work, maybe we could have a flag that says >> > whether to push down conditions that might throw. >> > >> > Julian >> > >> > >> > > On Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com> >> > wrote: >> > > >> > > Is it feasible to prevent the filter push down in cases like this ( >> > detect >> > > potential division by zero) ? What other exceptions that could happen? >> > > >> > > Should it only push down some filters and preserve the complex filter >> > above >> > > the join? >> > > >> > > Regarding a Jira ticket, generating an exception when it should >> produce >> > > valid results is gotta be a Jira case IMHO. >> > > >> > > >> > > >> > > >> > > On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com> >> > wrote: >> > > >> > >> Hi all, >> > >> >> > >> Till today, I believed that pushing a filter below an inner join is >> not >> > >> strictly related to the operators (equals, plus, minus, cast, >> division) >> > >> present in the filter. >> > >> >> > >> However, the query below will return some results if the filter is >> > executed >> > >> after the join or it will fail with an exception if the filter is >> pushed >> > >> below the join. >> > >> >> > >> EMP [EMPNO, NAME, DEPTNO] >> > >> 0 | Alex | 0 >> > >> 10 | Bob | 1 >> > >> >> > >> DEP [DEPTNO] >> > >> 1 >> > >> >> > >> SELECT e.name >> > >> FROM emp e >> > >> INNER JOIN dep d ON e.deptno = d.deptno >> > >> WHERE (10 / e.empno) = 1 >> > >> >> > >> It seems that SQL standard (Section 7.4 general rule 1) mandates that >> > WHERE >> > >> should be applied to the result of FROM so in the case above pushing >> > >> filters below the join seems to violate the standard. >> > >> >> > >> Currently, in Calcite we are going to push the division, cast, and >> > >> basically any kind of operator below the join. Many well-known >> databases >> > >> e.g., Postgres, do that as well. >> > >> >> > >> Should we log it as a bug and try to do something about it or let it >> be >> > as >> > >> it is? >> > >> >> > >> Best, >> > >> Stamatis >> > >> >> > >> > >> >> -- >> This message contains confidential information and is intended only for >> the >> individuals named. If you are not the named addressee you should not >> disseminate, distribute or copy this e-mail. Please notify the sender >> immediately by e-mail if you have received this e-mail by mistake and >> delete this e-mail from your system. E-mail transmission cannot be >> guaranteed to be secure or error-free as information could be >> intercepted, >> corrupted, lost, destroyed, arrive late or incomplete, or contain >> viruses. >> The sender therefore does not accept liability for any errors or >> omissions >> in the contents of this message, which arise as a result of e-mail >> transmission. If verification is required, please request a hard-copy >> version. -Hazelcast >> >