Hello all,

The issue solved by CALCITE-6071 is that the compiler drops source-position 
information after the validation stage. The Rel representation has no knowledge 
about where any program constructs originated.

This is problematic in at least two cases that require error handling:

  * If one wants to write further validation besides the one provided by the
    existing validator
  * If one wants to report precise runtime errors

Consider a large SQL program where you get a runtime error that some addition 
caused an overflow. You have no idea which addition in the program caused the 
overflow. Worse, this addition could perhaps be inside some aggregate function.

The PR https://github.com/apache/calcite/pull/3506 attempts to solve this 
problem by adding and propagating source position information to only two kinds 
of nodes in the IR: RexCall, and AggregateCall. This is not perfect, but I 
figure that it would cover 95% of the use cases, and perhaps 100% of the use 
cases for runtime errors.
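
To illustrate the runtime-error case, here is a minimal sketch of how a
position carried on a call node could make an overflow report precise. The
classes SourcePos and AddCall are hypothetical stand-ins invented for this
example; they are not Calcite classes and do not reflect the code in the PR.

```java
// Hypothetical stand-ins for illustration; these are not Calcite classes.
final class SourcePos {
    final int line;
    final int column;

    SourcePos(int line, int column) {
        this.line = line;
        this.column = column;
    }

    @Override public String toString() {
        return "line " + line + ", column " + column;
    }
}

// A call node for integer addition that may carry its source position.
final class AddCall {
    final SourcePos pos; // null when no position was propagated

    AddCall(SourcePos pos) {
        this.pos = pos;
    }

    int eval(int left, int right) {
        try {
            // Math.addExact throws ArithmeticException on int overflow.
            return Math.addExact(left, right);
        } catch (ArithmeticException e) {
            String where = pos == null ? "unknown position" : pos.toString();
            throw new ArithmeticException("integer overflow in '+' at " + where);
        }
    }
}
```

With a position attached, the error names the exact addition in the program;
without one, the message can only say that some addition overflowed.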

As you can imagine, this is not a particularly nice PR, since it needs to 
modify all places where such nodes are constructed and insert the position 
information if it is available. The existing APIs, without position 
information, have been left unchanged, but for almost every constructor call 
there is now an additional variant that takes the position. However, this is 
also not an intrusive PR: since it leaves the functionality of Calcite 
completely unchanged, it should not affect the output of the compiler in any 
way. I don't see how the code could be beautified, but I think that the 
benefits (better errors) outweigh the downsides (slightly uglier code).
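
The general shape of such a change can be sketched as follows. This is only
an illustration of the overload pattern, under the assumption that the old
entry point delegates to the new one with an "unknown" position. NodePos,
CallNode, of and digest are hypothetical names, not the Calcite API and not
the signatures used in the PR.

```java
// Hypothetical stand-ins for illustration; these are not Calcite classes.
final class NodePos {
    // Placeholder used when the caller has no position to supply.
    static final NodePos UNKNOWN = new NodePos(0, 0);

    final int line;
    final int column;

    NodePos(int line, int column) {
        this.line = line;
        this.column = column;
    }
}

final class CallNode {
    final NodePos pos;
    final String operator;
    final String[] operands;

    private CallNode(NodePos pos, String operator, String[] operands) {
        this.pos = pos;
        this.operator = operator;
        this.operands = operands.clone();
    }

    // Pre-existing factory: same signature as before, so old callers compile
    // and behave as they did.
    static CallNode of(String operator, String... operands) {
        return of(NodePos.UNKNOWN, operator, operands);
    }

    // Added overload: builds the same node and records the position.
    static CallNode of(NodePos pos, String operator, String... operands) {
        return new CallNode(pos, operator, operands);
    }

    // The textual form ignores the position, so output is the same either way.
    String digest() {
        return operator + "(" + String.join(", ", operands) + ")";
    }
}
```

In this sketch CallNode.of("+", "x", "y") and
CallNode.of(new NodePos(3, 14), "+", "x", "y") produce the same digest; only
the second remembers where the call came from.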

I am sure that other projects based on Calcite can benefit from this 
information.

Thank you,
Mihai

