Hi IoTDB community,
I would like to discuss a possible feature for the IoTDB table model:
adding built-in FFT and DFT functions for time-series frequency-domain
analysis.
FFT stands for Fast Fourier Transform, and DFT stands for Discrete Fourier
Transform. The DFT transforms a finite sequence of time-domain samples
into frequency-domain coefficients. FFT is not a different transform but a
family of fast algorithms for computing the DFT (O(N log N) instead of the
O(N^2) of direct evaluation), so I think these two functions can be
designed together, sharing similar parameters, output schema, and test
cases.
For IoTDB, this could be useful for scenarios such as sensor vibration
analysis, dominant frequency detection, and periodic signal analysis.
Preliminary Analysis
After some preliminary analysis, I think FFT/DFT are more suitable as
table-valued functions (TVFs), rather than scalar functions or window
functions.
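The reason is that FFT/DFT do not behave like one-row-in, one-row-out
scalar functions such as abs(), sin(), or round(). They also do not
aggregate multiple rows into a single value like avg() or sum(). Window
functions do not fit either: they return one value per input row, whereas
FFT/DFT output rows are indexed by frequency rather than by the input
rows' timestamps.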
Instead, their semantics are:
a time-ordered sequence -> multiple frequency points
Possible SQL Form
SELECT *
FROM FFT(
    DATA => (
        SELECT time, device_id, value
        FROM sensor
    ) PARTITION BY device_id ORDER BY time,
    VALUE => 'value'
);
This means that the input table is partitioned by device_id, each
partition is ordered by time, and the value column of each partition is
transformed independently into frequency-domain results.
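Similarly, DFT could use the same form. In this example a single device is
selected with WHERE, so no PARTITION BY is needed and the whole input is
treated as one series:
SELECT *
FROM DFT(
    DATA => (
        SELECT time, value
        FROM sensor
        WHERE device_id = 'd1'
    ) ORDER BY time,
    VALUE => 'value'
);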
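As a rough illustration of the intended row semantics, the Python sketch below emulates what the partitioned FFT call above would return on a tiny input. fft_tvf and the output column names are hypothetical, it uses a naive DFT for brevity, and carrying device_id into each output row assumes the TVF would pass through its PARTITION BY columns.

```python
import cmath
import math
from collections import defaultdict


def dft(values):
    """Naive O(N^2) DFT, used here only to illustrate the row semantics."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_tvf(rows):
    """Hypothetical emulation of FFT(DATA => ... PARTITION BY device_id ORDER BY time).

    rows: iterable of (time, device_id, value) tuples.
    Returns one output row per (device_id, frequency_index).
    """
    # PARTITION BY device_id
    partitions = defaultdict(list)
    for time, device_id, value in rows:
        partitions[device_id].append((time, value))

    output = []
    for device_id in sorted(partitions):
        # ORDER BY time inside each partition, then keep only the values
        ordered = [value for _, value in sorted(partitions[device_id])]
        for k, coeff in enumerate(dft(ordered)):
            output.append({
                "device_id": device_id,  # assumed pass-through partition column
                "frequency_index": k,
                "real": coeff.real,
                "imag": coeff.imag,
            })
    return output


rows = [
    (3, "d1", -1.0), (1, "d1", 1.0), (0, "d1", 0.0), (2, "d1", 0.0),  # deliberately unordered
    (0, "d2", 1.0), (1, "d2", 1.0),
]
result = fft_tvf(rows)
```

Each output row is keyed by (device_id, frequency_index) rather than by an input timestamp: the four d1 samples become four frequency points, and the original time values do not appear in the output. This sequence-to-many-points shape is what makes a TVF a natural fit.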
Possible Output Schema
A possible output schema could be:
frequency_index, frequency (optional), real, imag, amplitude, phase
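Here, frequency_index, real, and imag are the core results of FFT/DFT.
The amplitude and phase columns can be derived from real and imag
(amplitude = sqrt(real^2 + imag^2), phase = atan2(imag, real)) and may be
useful for analysis.
The frequency column would require the user to provide a sample rate or
sample interval, since frequency = frequency_index * sample_rate / N,
where N is the number of samples in the series; otherwise, only
frequency_index can be returned.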
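The derivations of amplitude, phase, and frequency can be sketched in Python as follows. to_output_rows and the sample_rate_hz parameter are hypothetical names used only for this example, and the frequency formula assumes the samples are evenly spaced.

```python
import cmath


def to_output_rows(coefficients, sample_rate_hz=None):
    """Hypothetical helper: turn DFT coefficients into rows of the proposed schema.

    frequency is only filled in when a sample rate is supplied; it assumes the
    input samples were evenly spaced at 1 / sample_rate_hz seconds.
    """
    n = len(coefficients)
    rows = []
    for k, coeff in enumerate(coefficients):
        rows.append({
            "frequency_index": k,
            # frequency = k * sample_rate / N, or None without a sample rate
            "frequency": k * sample_rate_hz / n if sample_rate_hz is not None else None,
            "real": coeff.real,
            "imag": coeff.imag,
            "amplitude": abs(coeff),      # sqrt(real^2 + imag^2)
            "phase": cmath.phase(coeff),  # atan2(imag, real), in radians
        })
    return rows


coeffs = [0j, -2j, 0j, 2j]  # DFT of the samples [0, 1, 0, -1]
rows = to_output_rows(coeffs, sample_rate_hz=100.0)
```

With sample_rate_hz=100.0 and N = 4, frequency_index 1 maps to 25.0 Hz. For real-valued input, bins above N/2 mirror the lower bins (they correspond to negative frequencies), so the schema discussion may also want to decide whether to return all N bins or only the first N/2 + 1.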
Existing Related Work
I also noticed that IoTDB already has FFT-related UDF support in the
library-udf module. This proposal focuses on whether FFT/DFT should be
provided as built-in functions in the table model, and whether TVF is the
right abstraction.
Questions
I would appreciate your feedback on this direction, especially:
1. Whether FFT/DFT are suitable as built-in functions in the table model.
2. Whether TVF is the right function type for them.
3. What the expected parameters and output schema should be.
Best regards,
Bryan Yang (杨易达)