morningman commented on issue #6746:
URL:
https://github.com/apache/incubator-doris/issues/6746#issuecomment-982366698
## Execution of Table Function Node
A Table Function Node (TFN) contains one or more Table Functions. Its main
job is to expand each row received from its child node into multiple rows by
applying the Table Functions, and to return the expanded rows to the upper
layer. The main execution process is as follows:
1. Get a row of data (the child row) from the child node.
2. Pass the child row into each table function. Each table function
calculates a result set: S1, S2,...
3. Compute the Cartesian product of the child row and all result sets, and
send it to the upper layer.
For example, suppose the child row has 3 columns, k1, v1, v2:
| k1 | v1 | v2 |
|---|---|---|
| 1 | "a,b,c" | "4,5,6" |
Two Table Functions, `explode_split(v1,',')` and `explode_split(v2,',')`,
respectively produce the following result sets:
| `explode_split(v1,',')` |
|---|
| "a" |
| "b" |
| "c" |
| `explode_split(v2,',')` |
|---|
| "4" |
| "5" |
| "6" |
The final Cartesian product result is:
| k1 | `explode_split(v1,',')` | `explode_split(v2,',')` |
|---|---|---|
| 1 | "a" | "4" |
| 1 | "a" | "5" |
| 1 | "a" | "6" |
| 1 | "b" | "4" |
| 1 | "b" | "5" |
| 1 | "b" | "6" |
| 1 | "c" | "4" |
| 1 | "c" | "5" |
| 1 | "c" | "6" |
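The three steps above can be sketched in a few lines. This is only an illustration of the Cartesian product, not the BE implementation: `expand_child_row` and the Python `explode_split` helper are hypothetical names, and the sketch assumes that `k1` is the only child column carried into the output, as in the example tables.

```python
from itertools import product


def explode_split(value, delimiter):
    """Illustrative stand-in for explode_split: one output value per piece."""
    return value.split(delimiter)


def expand_child_row(child_row, table_function_args):
    """Expand one child row into the Cartesian product of its result sets.

    child_row: dict mapping column name -> value.
    table_function_args: list of (column, delimiter) pairs, one per function.
    """
    # Step 2: each table function computes its own result set S1, S2, ...
    result_sets = [
        explode_split(child_row[column], delimiter)
        for column, delimiter in table_function_args
    ]
    # Step 3: pair the child row's k1 with every combination of values.
    # The last result set varies fastest, matching the table above.
    return [(child_row["k1"], *combo) for combo in product(*result_sets)]


rows = expand_child_row(
    {"k1": 1, "v1": "a,b,c", "v2": "4,5,6"},
    [("v1", ","), ("v2", ",")],
)
for row in rows:
    print(row)
```

In this sketch, a child row produces no output rows if any of its result sets is empty, which is simply how a Cartesian product behaves.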
### Table Function Interface Design
Doris does not currently support complex data types (such as Array), while a
Table Function is essentially an expression that returns an array type. This
implementation therefore gives Table Functions special treatment.
1. DummyTableFunctions
This is a placeholder class. Its main purpose is to generate the scalar
function signatures of the table functions on the BE side. This facilitates
query planning on the FE side and lets the BE reuse the existing scalar
function framework when it evaluates parameter expressions. In other words,
during the planning and execution preparation stages of a query, a Table
Function is treated as a scalar function.
2. TableFunctionFactory
The factory class for Table Functions. It returns real Table Function
instances based on the function name. Currently, functions can only be
matched by name.
3. TableFunction
The actual Table Function implementation class. It provides the following
interfaces:
1. prepare()/open()
Preparation work, such as evaluating constant expressions and allocating
memory for intermediate result sets.
2. process(row)
Calculate the Table Function result set from the incoming data (row).
3. reset()
Because multiple Table Functions are combined by Cartesian product, the
result set of a single function may be traversed multiple times. This method
moves the result set cursor back to the initial position so that the
traversal can start again.
4. get_value()
Get the value at the position the cursor currently points to.
5. forward()
Move the cursor forward; get_value() can then be called to get the next
value.
6. close()
Cleanup work after the function has finished executing.
The subclasses of TableFunction are concrete implementations of each
Table Function. The following three functions are implemented in this issue:
1. `explode_split(str, delimiter)`
Split str into multiple strings according to delimiter.
2. `explode_json_array_xxx(json_str)`
Split a JSON array. Depending on the type of the elements in the JSON
array, xxx can be string, int, or double.
3. `explode_bitmap(bitmap)`
Expand a bitmap and return the value of each element in the bitmap.
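To make the cursor-style interface and the name-based factory more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the class names, the `eos()` helper (which is not in the interface list above), the tuple-shaped `row` argument, and the `_REGISTRY` dictionary. The real BE classes will differ.

```python
import json


class TableFunction:
    """Hypothetical base class mirroring the interface listed above."""

    def __init__(self):
        self._values = []  # result set for the current row
        self._cursor = 0   # index of the value get_value() will return

    def prepare(self):
        """prepare()/open(): one-time setup; nothing to do in this sketch."""

    def process(self, row):
        """Compute the result set for one input row (subclasses override)."""
        raise NotImplementedError

    def reset(self):
        """Move the cursor back to the start for another traversal."""
        self._cursor = 0

    def eos(self):
        """Hypothetical helper: True once the cursor is past the last value."""
        return self._cursor >= len(self._values)

    def get_value(self):
        return self._values[self._cursor]

    def forward(self):
        self._cursor += 1

    def close(self):
        self._values = []
        self._cursor = 0


class ExplodeSplit(TableFunction):
    def process(self, row):
        text, delimiter = row  # assumed argument layout: (str, delimiter)
        self._values = text.split(delimiter)
        self._cursor = 0


class ExplodeJsonArrayInt(TableFunction):
    def process(self, row):
        (json_str,) = row  # assumed argument layout: (json_str,)
        self._values = [int(v) for v in json.loads(json_str)]
        self._cursor = 0


# Factory: functions are matched by name only, as described above.
_REGISTRY = {
    "explode_split": ExplodeSplit,
    "explode_json_array_int": ExplodeJsonArrayInt,
}


def get_table_function(name):
    if name not in _REGISTRY:
        raise ValueError(f"unknown table function: {name}")
    return _REGISTRY[name]()


fn = get_table_function("explode_split")
fn.prepare()
fn.process(("a,b,c", ","))
first_pass = []
while not fn.eos():
    first_pass.append(fn.get_value())
    fn.forward()
fn.reset()  # the same result set can now be traversed again
fn.close()
```

Keeping the cursor inside the function object is what makes reset() cheap: a second traversal only rewinds an index and does not recompute the result set.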
### Table Function Node Interface Design
Table Function Node inherits from Exec Node and has the following
interfaces:
1. init()
Initialization work, including obtaining the Table Function objects.
2. prepare()/open()
Preparation work, for example calling prepare()/open() on the expressions.
3. get_next()
Get a batch of results. This first calls get_next() on the child node to
fetch the child node data, then calculates the Table Function results, and
finally returns the rows produced by combining the two.
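One way get_next() could drive the cursors is an odometer-style loop: advance the last function's cursor, and when it runs off the end, reset it and carry into the previous function. The sketch below is an assumption about how the pieces fit together, not the actual node implementation. `SplitFunction`, `eos()`, `batch_size`, and the use of a plain Python list as the child node are all hypothetical, and only `k1` is copied from the child row.

```python
class SplitFunction:
    """Minimal stand-in for a table function with a cursor (hypothetical)."""

    def __init__(self, column, delimiter):
        self.column, self.delimiter = column, delimiter
        self.values, self.cursor = [], 0

    def process(self, row):
        self.values = row[self.column].split(self.delimiter)
        self.cursor = 0

    def reset(self):
        self.cursor = 0

    def eos(self):
        return self.cursor >= len(self.values)

    def get_value(self):
        return self.values[self.cursor]

    def forward(self):
        self.cursor += 1


class TableFunctionNode:
    def __init__(self, child_rows, functions, batch_size):
        self.child = iter(child_rows)  # stands in for the child node
        self.fns = functions
        self.batch_size = batch_size
        self.cur_row = None            # child row currently being expanded

    def _load_next_child_row(self):
        """Fetch child rows until one has no empty result set."""
        for row in self.child:
            for fn in self.fns:
                fn.process(row)
            if all(not fn.eos() for fn in self.fns):
                self.cur_row = row
                return True
        self.cur_row = None
        return False

    def _advance(self):
        """Step to the next combination; False when the row is exhausted."""
        for fn in reversed(self.fns):
            fn.forward()
            if not fn.eos():
                return True
            fn.reset()  # wrap this cursor and carry into the previous one
        return False

    def get_next(self):
        batch = []
        while len(batch) < self.batch_size:
            if self.cur_row is None and not self._load_next_child_row():
                break  # child node has no more rows
            values = [fn.get_value() for fn in self.fns]
            batch.append((self.cur_row["k1"], *values))
            if not self._advance():
                self.cur_row = None
        return batch


node = TableFunctionNode(
    [{"k1": 1, "v1": "a,b,c", "v2": "4,5,6"}],
    [SplitFunction("v1", ","), SplitFunction("v2", ",")],
    batch_size=4,
)
batches = []
while True:
    batch = node.get_next()
    if not batch:
        break
    batches.append(batch)
```

Because the cursors live in the function objects, a batch can end in the middle of a child row and the next get_next() call continues from the same combination.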