Hello, First of all, thanks Hongze for working on this!
CALCITE-525 asks for throwing exceptions in certain circumstances and I think the fix should focus exclusively on this. Regarding the discussion about additional actions in case of an error (omitting rows or logging) I believe it is useful but I think it could be achieved by other means such as overloading the operators (using SqlOperatorTable). Omitting rows or logging is not a very common policy to become part of the core project. I think that projects relying on Calcite and need this kind of functionality could provide their own operator overloads relatively easy. Providing additional built in functions, such as CATCH_ERROR, might become handy in some situations but again it is something that individual projects could easily plugin as a user defined function. Note that, I am not against the existing PR but I would like to highlight an alternative option. Best, Stamatis Στις Πέμ, 18 Οκτ 2018 στις 1:36 μ.μ., ο/η notify...@126.com < notify...@126.com> έγραψε: > Hi, > > Thanks for opening this discussion and thank you all for your replies. > > The reason why I would like to help solve CALCITE-525 is because I agree > that an function error should not easily break a large job, this "job" > could indicate a large ad-hoc query or a query on a stream, or like what > Julian said, a ETL job. > According to some tests I had run about Calcite streaming, a simple > division by zero error could break the whole query job. Say, If Calcite > does not provide a option to silence the error, users should spend extract > daily time on checking whether their streaming job is alive if their SQL > includes a "/" operator. > > The solutions of B and C are different: B is to ignore and drop a whole > row when error happens, and C is to make the failed call return null value. > In C, the major reason to introduce "CATCH_ERROR" function is to represent > the error handling on Rel, whether to enable the function in Parser is not > important. > At first I personally prefer C, but as C does need more code change than > B, I don't have a very strong inclination between B and C now. > > At last, I agree that making enough discussion on introducing new feature > is very important. If we don't really need such a feature now, I think I > can help whenever it is considered helpful in future. > > Thanks, > Hongze > > From: Jesus Camacho Rodriguez > Date: 2018-10-18 10:26 > To: dev@calcite.apache.org > Subject: Re: Exception-handling in built-in functions > I do not believe there is enough reason to block CALCITE-525. IMO, > CALCITE-525 describes a problem that some Calcite users are facing and a > reasonable plugable solution. We should not be vetoing such a feature > without providing viable alternatives. (Without having checked the specific > implementation details, I prefer approach B described below as it is less > intrusive. And A should be fixed in a different issue.) > I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or > Postgres, and it has always tried to provide flexibility to underlying > engines, one of the reasons for its wide adoption. In addition, systems are > not forced to use this feature, it is tagged as experimental and by default > we are still running in same mode. I believe that is sufficient. > Personally, I will not be happy if a developer feels compelled to fork > Calcite or stop contributing code because we do not accept features such as > the one described there. > > Thanks, > Jesús > > > On 10/17/18, 5:17 PM, "Michael Mior" <mm...@apache.org> wrote: > > My apologies for missing this thread a couple days ago. (Thanks for > pinging > it.) Here's my two cents: taking care of contributors to the project is > just as important (if not more important) than taking care of the > code. I'm > not saying we should merge terrible code just to keep each other > happy, but > I don't think that's the case here. If anyone writes some code which > you > disagree with, you should be free to voice your disagreement. However, > especially when the code is from a core contributor and the argument > focuses on potential future problems, I think it's important to > consider > that people who have shown dedication to the project over the years are > very likely to be around and willing to fix these problems as they > arise. > > Code which turns out to cause problems can always be deleted, reverted, > refactored, etc. It's much harder to back out when a contributor is > burned > out or interpersonal conflicts get heated. > > -- > Michael Mior > mm...@apache.org > > > Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit : > > > Vladimir, > > > > You’ve made your points. And I hear them. > > > > However I get the impression that you are not open to persuasion. > Which > > means that I am wasting my time trying to reach consensus with you. > Which > > means that people win arguments not on merit, but based upon who is > most > > persistent. > > > > Here is my point. Calcite's goal is not to re-create what Oracle or > > PostgreSQL did ten years later. It is a platform that allows people > to > > write their own data engine. If they want to redefine the “+” > operator such > > that 2 + 2 returns 5, the platform should allow it. > > > > Certainly if they want to engineer their own error-handling > strategy, we > > should let them do it. I didn’t have the energy to find an example > of a SQL > > engine that discards rows with divide-by-zero errors, but I believe > there > > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL > > engines that I have worked on that performed ETL-like tasks, all had > that > > capability. And all ETL tools have very flexible error-handling > strategies. > > They are not SQL-based, but Calcite is not exclusively for SQL > systems. > > > > I have been designing and building world-class data engines for 30 > years. > > Please take me on good faith that a flexible error-handing strategy > is a > > good idea. Don’t force me to bicker over email for hours and hours. > When a > > long discussion leads to the rejection of a contribution, I get > > considerably closer to burning out. > > > > Julian > > > > > > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov < > > sitnikov.vladi...@gmail.com> wrote: > > > > > > Juilian>Hey, folks. We need your input here. > > > > > > Here are my thoughts: > > > 1) I think the features we add should have at least some level of > > > consistency > > > 2) It is much safer to adopt well-known features rather than be > pioneers > > in > > > the field. I do not mean we must wait for someone else implement > and try > > > out a feature, however I would not rush for implementing a feature > that > > > no-one else explored. > > > > > > CALCITE-525 has two key points: > > > A) Current implementation of enumerable factors code like 0/0 to a > static > > > field of a generated code. It causes the generated code to fail at > load > > > time even before the query is executed. > > > Of course that is a bug, and I'm even inclined to remove that > "static > > > fields" > > > > > > B) Someone (Hongze? Juilan?) suggest to implement a mode to > silently > > ignore > > > the error (e.g. by ignoring the row or by returning default value). > > > First of all, I don't think "ignore the row" kind of processing > would do > > > any good to the user since it would not be possible to predict the > > output. > > > "ignore the row" is very tricky when join/semijoin/antijoin is > there. > > > > > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I > think > > we > > > should not rush for it. > > > > > > C) Hongze suggests CATCH_ERROR(1 / 0 EMPTY ON ERROR) or > CATCH_ERROR(1 / > > > 0) EMPTY ON ERROR kind of functions. > > > That enables to confine the scope of the error, however I don't > think it > > > would be used often (does that mean one would have to wrap each > > > expression?), and this "catch error" would be non-trivial to > propagate to > > > the downstream executors. > > > On top of that, we might end up inventing full-blown > > try-catch-catch-catch > > > syntax. > > > > > > I truly see no business value in implementing B/C, however I do > see the > > > pain it would introduce. It would complicate Calcite maintenance. > "B" > > could > > > silently produce wrong results, and I'm sure we don't want get > results > > out > > > of thin air. > > > > > > Vladimir > > > > > > >