[ https://issues.apache.org/jira/browse/IOTDB-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jialin Qiao reopened IOTDB-544:
-------------------------------

> Apache IoTDB integration with more powerful aggregation index
> -------------------------------------------------------------
>
> Key: IOTDB-544
> URL: https://issues.apache.org/jira/browse/IOTDB-544
> Project: Apache IoTDB
> Issue Type: Wish
> Components: Core/Engine
> Reporter: Xiangdong Huang
> Assignee: Zesong Sun
> Priority: Major
> Labels: IoTDB, gsoc2020, mentor, pull-request-available
>
> IoTDB is a highly efficient time series database that supports high-speed
> query processing, including aggregation queries.
> Currently, IoTDB pre-calculates aggregation info, also called summary
> info (sum, count, max_time, min_time, max_value, min_value), for each page
> and each Chunk. This info helps with aggregation operations and some query
> filters. For example, if the query filter is value > 10 and the max value of
> a page is 9, we can skip the page. As another example, if the query is select
> max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value)
> is 20.
> However, there are two drawbacks:
> 1. The summary info reduces the data that needs to be scanned to 1/k
> (suppose each page has k data points). However, the time complexity is still
> O(N). If we store a long history, e.g., 2 years of data sampled at
> 500 kHz, an aggregation operation may still be time-consuming. So a
> tree-based index that reduces the time complexity from O(N) to O(log N) is a
> good choice. Some basic ideas have been published in [1], but it can only
> handle data with a fixed frequency. So improving it and implementing it in
> IoTDB is a good choice.
> 2. The summary info is of no help in evaluating a query such as where
> value > 8 if the max value = 10. If we enrich the summary info, e.g., by
> storing a data histogram, we can use the histogram to estimate how many
> points the query will return.
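To make the two mechanisms in the description concrete, here is a minimal Java sketch of (a) skipping a page from its summary and (b) a tree over page summaries that answers an aggregation over a range of pages in O(log N) merges. All names (`AggregationIndexSketch`, `Summary`, `SummaryTree`) are hypothetical and are not IoTDB classes. The tree is a plain in-memory segment tree, not the structure from [1], and it assumes queries are aligned to page boundaries; a real time-range query would still need to map timestamps to pages and scan partially covered pages.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these types exist in IoTDB. */
final class AggregationIndexSketch {

    /** Summary of one page (or of a merged range of pages). */
    static final class Summary {
        /** Identity element for merge(): an empty range. */
        static final Summary EMPTY =
            new Summary(0, 0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY);

        final long count;
        final double sum;
        final double min;
        final double max;

        Summary(long count, double sum, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.min = min;
            this.max = max;
        }

        /** Computes the summary of a page by scanning its values once. */
        static Summary ofPage(double[] values) {
            Summary s = EMPTY;
            for (double v : values) {
                s = s.merge(new Summary(1, v, v, v));
            }
            return s;
        }

        /** Combines two summaries without touching the raw data. */
        Summary merge(Summary o) {
            return new Summary(count + o.count, sum + o.sum,
                               Math.min(min, o.min), Math.max(max, o.max));
        }

        /** True if no value in this range can satisfy "value > threshold". */
        boolean canSkipGreaterThan(double threshold) {
            return max <= threshold;
        }
    }

    /** Segment tree whose leaves are page summaries. */
    static final class SummaryTree {
        private final int size;        // number of leaves, rounded up to a power of two
        private final Summary[] nodes; // nodes[1] is the root; leaves start at index size

        SummaryTree(Summary[] pages) {
            int n = 1;
            while (n < pages.length) {
                n <<= 1;
            }
            size = n;
            nodes = new Summary[2 * n];
            Arrays.fill(nodes, Summary.EMPTY);
            System.arraycopy(pages, 0, nodes, n, pages.length);
            for (int i = n - 1; i >= 1; i--) {
                nodes[i] = nodes[2 * i].merge(nodes[2 * i + 1]);
            }
        }

        /** Aggregates pages in [from, to) using O(log N) merges. */
        Summary query(int from, int to) {
            Summary left = Summary.EMPTY;
            Summary right = Summary.EMPTY;
            for (int l = from + size, r = to + size; l < r; l >>= 1, r >>= 1) {
                if ((l & 1) == 1) {
                    left = left.merge(nodes[l++]);
                }
                if ((r & 1) == 1) {
                    right = nodes[--r].merge(right);
                }
            }
            return left.merge(right);
        }
    }

    public static void main(String[] args) {
        Summary[] pages = {
            Summary.ofPage(new double[] {1, 5}),
            Summary.ofPage(new double[] {10, 2}),
            Summary.ofPage(new double[] {20, 3}),
            Summary.ofPage(new double[] {9, 4}),
        };
        SummaryTree tree = new SummaryTree(pages);
        System.out.println("max over pages 0..2: " + tree.query(0, 3).max);
        System.out.println("skip page 3 for value > 10: " + pages[3].canSkipGreaterThan(10));
        System.out.println("skip page 1 for value > 8: " + pages[1].canSkipGreaterThan(8));
    }
}
```

The last check in `main` mirrors drawback 2: a page whose max is 10 cannot be skipped for `value > 8`, and min/max alone say nothing about how many of its points match, which is the gap a histogram in the summary would fill.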
> This proposal is mainly about adding an index to speed up aggregation
> queries. Besides, if we can make the summary info more useful, that would be
> even better.
> Note that the premise is that insertion speed should not be slowed down
> too much!
> You should know:
> • IoTDB query process
> • TsFile structure and organization
> • Basic index knowledge
> • Java
> difficulty: Major
> mentors:
> h...@apache.org
> Reference:
> [1] [https://www.sciencedirect.com/science/article/pii/S0306437918305489]
>
>
>
--
This message was sent by Atlassian Jira (v8.20.7#820007)