At the moment we support only ScalarFunction UDF, it's functions that
operate on row fields. In Calcite, there are 3 kinds of UDFs: aggregate
functions (that we already support), table macro and table functions. The
difference between table functions and macros is that macros expand to
relations, and table functions can refer to anything queryable, e.g.,
enumerables. But in the case of Beam SQL, given everything translates to
PTransforms, only table macros are relevant.

UDTF are in a way similar to external tables but don't require to specify a
schema explicitly. Instead, they can derive schema based on arguments. One
of the use-cases would be querying ranges of dataset partitions using a
helper function like:

SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start => '2017-01-01',
end => '2018-01-01'))

With BEAM-6133 <https://issues.apache.org/jira/browse/BEAM-6133> (
apache/beam/#7141 <https://github.com/apache/beam/pull/7141>) we would have
support for UDTF in Beam SQL.

[1] https://issues.apache.org/jira/browse/BEAM-6133
[2] https://github.com/apache/beam/pull/7141

Gleb

Reply via email to