hi all 

  Carbon will add min/max index for sort columns which used for better
filter query. So can we add more index for the sort column to make filter
faster. 

  This is one idea which I get from anther database design.
   For example this is one student, and the column: score in the student
table which will be sorted column. And the score range is from 1 to 100. 
The table as following: 

id      name    score
1       bill001 83
2       bill002 84
3       bill003 90
4       bill004 89
5       bill005 93
6       bill006 76
7       bill007 87
8       bill008 90
9       bill009 89
10      bill010 96
11      bill011 96
12      bill012 100
13      bill013 84
14      bill014 90
15      bill015 79
16      bill016 1
17      bill017 97
18      bill018 79
19      bill019 88
20      bill068 95
  
 After load the data into Cabron the score column will sort as following:
1       76      79      79      83      84      84      87      88      89      
89      90      90      90      93      95      96      96      97      100

the min/max index is 1/100.
So for the query as following will take all the block data. 
query1:select sum(score) from student when score score > 90 
query2:select sum(score) from student when score score > 60 and score < 70. 

Following two suggestion to reduce the block scan. 
Suggestion 1: according the score range to divide into multiple small range
for example 4: 
<http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n8891/index1.png>
 
0: meas this block has not the score rang value
1: meas this block has the score rang value
If add this index, for the query1 only need scan 1/4 data of the block and
query2 no need scan any data, directly sckip this block

Suggestion 2: record more min/max for the score, for example every 5 rows
record one min/max
<http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n8891/index2.png>
 
If add this index for query1 only need scan 1/2 data of the block and query2
only need scan 1/4 data of the block

this is the raw idea, please Jacky, Ravindra and liang correct it whether we
can add this feature. thanks 

Regards
Bill 




--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-add-more-index-for-sort-columns-tp8891.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

Reply via email to