Xiangdong Huang created COMDEV-405:
--------------------------------------
Summary: Implement PISA index in Apache IoTDB
Key: COMDEV-405
URL: https://issues.apache.org/jira/browse/COMDEV-405
Project: Community Development
Issue Type: Task
Components: GSoC/Mentoring ideas
Reporter: Xiangdong Huang
Apache IoTDB is a highly efficient time series database that supports
high-speed query processing, including aggregation queries.
Currently, IoTDB pre-calculates aggregation information, also called summary
info (sum, count, max_time, min_time, max_value, min_value), for each page and
each chunk. This info helps with aggregation operations and some query
filters. For example, if the query filter is value > 10 and the max value of a
page is 9, we can skip the page. As another example, if the query is select
max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value) is
20.
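The two examples above can be sketched in Java as follows. This is only an illustration: the class PageSummary and its methods are hypothetical names and do not correspond to IoTDB's actual statistics classes.

```java
// Hypothetical stand-in for the per-page / per-chunk summary info
// (not IoTDB's real class; only the fields needed here are kept).
final class PageSummary {
    final long count;
    final double minValue;
    final double maxValue;

    PageSummary(long count, double minValue, double maxValue) {
        this.count = count;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    // For the filter "value > threshold": if even the largest value in the
    // page fails the filter, no point in the page can match.
    boolean canSkipForGreaterThan(double threshold) {
        return maxValue <= threshold;
    }

    // max(value) over several chunks is the maximum of the chunk maxima,
    // so no raw data point has to be read.
    static double maxOf(PageSummary[] chunks) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : chunks) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }
}
```

With a page whose max value is 9, canSkipForGreaterThan(10) returns true; with three chunks whose max values are 5, 10, and 20, maxOf returns 20.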
However, there are two drawbacks:
1. The summary info reduces the data that needs to be scanned to 1/k of the
original (supposing each page has k data points), but the time complexity is
still O(N). If we store a long history, e.g., 2 years of data sampled at
500 kHz, an aggregation operation may still be time-consuming. A tree-based
index that reduces the time complexity from O(N) to O(log N) is therefore a
good choice. Some basic ideas have been published in [1], but that work can
only handle data with a fixed frequency. So, improving it and implementing it
in IoTDB is worthwhile.
2. The summary info is of little help for evaluating a query such as where
value > 8 when the max value is 10: the page cannot be skipped, and the summary
does not say how many points match. If we enrich the summary info, e.g., by
storing a histogram of the data, we can use the histogram to estimate how many
points will be returned.
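As a rough illustration of the histogram idea in drawback 2, the sketch below keeps an equal-width histogram per page and derives lower and upper bounds on the number of points matching value > threshold. The class name PageHistogram, the equal-width bucketing, and the bucket count are all assumptions made for this example, not part of IoTDB.

```java
// Hypothetical equal-width histogram kept alongside a page's min/max values.
final class PageHistogram {
    final double min;
    final double max;
    final long[] counts;

    PageHistogram(double[] values, int buckets) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = lo;
        max = hi;
        counts = new long[buckets];
        for (double v : values) {
            counts[bucketOf(v)]++;
        }
    }

    // Maps a value to its bucket; the max value falls into the last bucket.
    int bucketOf(double v) {
        if (max == min) {
            return 0;
        }
        int i = (int) ((v - min) / (max - min) * counts.length);
        return Math.min(Math.max(i, 0), counts.length - 1);
    }

    // Returns {lowerBound, upperBound} on the number of points with
    // value > threshold, without reading the raw data.
    long[] countGreaterThan(double threshold) {
        if (threshold >= max) {
            return new long[] {0, 0};
        }
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (threshold < min) {
            return new long[] {total, total};
        }
        int b = bucketOf(threshold);
        long above = 0;
        for (int i = b + 1; i < counts.length; i++) {
            above += counts[i];   // buckets entirely above the threshold
        }
        // Points in bucket b may or may not match, hence the two bounds.
        return new long[] {above, above + counts[b]};
    }
}
```

The gap between the two bounds is at most the size of one bucket, so more buckets give tighter estimates at the cost of a larger summary and more work per insertion.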
This proposal is mainly about adding an index to speed up aggregation
queries. Making the summary info more useful as well would be a welcome
addition.
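To make the O(N) versus O(log N) difference concrete, here is a minimal sketch of a tree over per-page max values. Note that this is a generic segment tree, not the PISA structure from [1], and the class name MaxSegmentTree is made up for the example. It is built once over a fixed array of pages and does not address appends or irregular frequency, which is exactly the part this task would need to solve.

```java
// Generic iterative segment tree over per-page max values (illustration only).
final class MaxSegmentTree {
    private final int n;
    private final double[] tree;

    MaxSegmentTree(double[] pageMax) {
        n = pageMax.length;
        tree = new double[2 * n];
        // Leaves live in tree[n .. 2n-1]; inner node i covers nodes 2i and 2i+1.
        System.arraycopy(pageMax, 0, tree, n, n);
        for (int i = n - 1; i >= 1; i--) {
            tree[i] = Math.max(tree[2 * i], tree[2 * i + 1]);
        }
    }

    // Max over pages [from, to), visiting O(log n) nodes instead of every page.
    double max(int from, int to) {
        double result = Double.NEGATIVE_INFINITY;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) {
                result = Math.max(result, tree[l++]);
            }
            if ((r & 1) == 1) {
                result = Math.max(result, tree[--r]);
            }
        }
        return result;
    }
}
```

For page maxima {5, 10, 20, 7, 3}, max(0, 3) returns 20 and max(3, 5) returns 7. The same layout works for sum or count by replacing Math.max with addition and the initial value with 0.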
Note that the premise is that insertion speed must not be slowed down too
much!
By the way, IoTDB already provides an index framework, so the PISA index
should be compatible with it.
You should know:
• IoTDB query process
• TsFile structure and organization
• Basic index knowledge
• Java
difficulty: Major
mentors:
[email protected]
Reference:
[1] https://www.sciencedirect.com/science/article/pii/S0306437918305489