Xiangdong Huang created IOTDB-544:
-------------------------------------

             Summary: Apache IoTDB integration with more powerful aggregation index
                 Key: IOTDB-544
                 URL: https://issues.apache.org/jira/browse/IOTDB-544
             Project: Apache IoTDB
          Issue Type: Wish
          Components: Core/Engine
            Reporter: Xiangdong Huang


IoTDB is a highly efficient time series database that supports fast query
processing, including aggregation queries.

Currently, IoTDB pre-calculates aggregation info, also called summary info
(sum, count, max_time, min_time, max_value, min_value), for each page and
each chunk. This info helps with aggregation operations and some query
filters. For example, if the query filter is value > 10 and the max value of a
page is 9, we can skip the page. As another example, if the query is select
max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value) is
20.
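
The two examples above can be sketched in Java as follows. This is only a
minimal illustration: the Summary class and its field names are hypothetical
stand-ins for the statistics listed above (the time fields are omitted), not
IoTDB's actual statistics classes.

```java
import java.util.Arrays;
import java.util.List;

final class SummarySketch {

    /** Hypothetical per-page or per-chunk summary; not IoTDB's real class. */
    static final class Summary {
        final long count;
        final double sum;
        final double minValue;
        final double maxValue;

        Summary(long count, double sum, double minValue, double maxValue) {
            this.count = count;
            this.sum = sum;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /** For the filter "value > threshold": if even the max fails, skip the page. */
    static boolean canSkip(Summary s, double threshold) {
        return s.maxValue <= threshold;
    }

    /** max(value) over several chunks is the largest of the per-chunk maxima. */
    static double maxOf(List<Summary> chunks) {
        double max = Double.NEGATIVE_INFINITY;
        for (Summary s : chunks) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }

    public static void main(String[] args) {
        // Filter value > 10 on a page whose max value is 9.
        Summary page = new Summary(100, 450.0, 1.0, 9.0);
        System.out.println("skip page: " + canSkip(page, 10.0));

        // Three chunks with max values 5, 10 and 20.
        List<Summary> chunks = Arrays.asList(
                new Summary(10, 30.0, 1.0, 5.0),
                new Summary(10, 60.0, 2.0, 10.0),
                new Summary(10, 90.0, 3.0, 20.0));
        System.out.println("max(value): " + maxOf(chunks));
    }
}
```

Neither method reads any raw data point; both work on the summaries alone.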

However, there are two drawbacks:

1. The summary info reduces the data that needs to be scanned to 1/k of the
original (suppose each page has k data points). However, the time complexity is
still O(N). If we store long historical data, e.g., 2 years of data sampled at
500 kHz, an aggregation operation may still be time-consuming. A tree-based
index that reduces the time complexity from O(N) to O(log N) is therefore a
good choice. Some basic ideas have been published in [1], but that approach can
only handle data with a fixed frequency. Improving it and implementing it in
IoTDB would be worthwhile.

2. The summary info is of little help in evaluating a query such as where
value > 8 when the max value is 10. If we enrich the summary info, e.g., by
storing a data histogram, we can use the histogram to estimate how many points
the query will return.

This proposal is mainly about adding an index to speed up aggregation
queries. In addition, making the summary info more useful would be even
better.

Note that the premise is that insertion speed must not slow down too much!
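
To make the O(N) versus O(log N) point concrete, here is a minimal sketch of
one possible tree-based index: a plain segment tree built over per-page sums.
It is a generic textbook structure, not the method of [1], and it indexes
pages by position rather than by timestamp; the class and method names are
hypothetical.

```java
final class SumSegmentTree {
    private final int n;         // number of pages
    private final double[] tree; // tree[n..2n) are leaves; tree[i] = tree[2i] + tree[2i+1]

    SumSegmentTree(double[] pageSums) {
        n = pageSums.length;
        tree = new double[2 * n];
        System.arraycopy(pageSums, 0, tree, n, n);
        for (int i = n - 1; i >= 1; i--) {
            tree[i] = tree[2 * i] + tree[2 * i + 1];
        }
    }

    /** Sum over pages [from, to), combining O(log n) tree nodes. */
    double sum(int from, int to) {
        double result = 0;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) {
                result += tree[l++];
            }
            if ((r & 1) == 1) {
                result += tree[--r];
            }
        }
        return result;
    }

    /** Replaces one page's sum and refreshes its O(log n) ancestors. */
    void update(int page, double newSum) {
        int i = page + n;
        tree[i] = newSum;
        for (i >>= 1; i >= 1; i >>= 1) {
            tree[i] = tree[2 * i] + tree[2 * i + 1];
        }
    }

    public static void main(String[] args) {
        SumSegmentTree t = new SumSegmentTree(new double[] {1, 2, 3, 4, 5});
        System.out.println(t.sum(1, 4)); // pages 1, 2 and 3
        t.update(2, 10);
        System.out.println(t.sum(1, 4));
    }
}
```

The same layout works for count, min, and max by swapping the combine step. A
real index would additionally have to handle appended pages, irregular
timestamps, and query ranges that cover only part of a page, and each insert
pays the O(log n) update cost.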

You should know:
 • IoTDB query process
 • TsFile structure and organization
 • Basic index knowledge
 • Java 

difficulty: Major
 mentors:
 h...@apache.org

Reference:

[1] https://www.sciencedirect.com/science/article/pii/S0306437918305489
  
  
  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
