Hi,

Currently, in the include and exclude filter cases, when a dimension column does not have an inverted index, the filter does a linear search. We can add a binary search when the data for that column is sorted. To get this information, we can check in the carbon table whether the user selected no inverted index for that column. If the user selected no inverted index when creating the column, the current code is fine. If the user did not select it, the data will be sorted, so we can add a binary search, which will improve performance.
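A rough sketch of the binary search idea, not the actual CarbonData change: it assumes the chunk is a flat byte array of fixed-width values already sorted in unsigned byte order, and that every filter key has the same width. The class and method names (SortedChunkFilter, compareRow, lowerBound, filterSorted) are made up for illustration, and the bits it sets are positions in the sorted chunk; mapping them back to original row ids is left out.

```java
import java.util.BitSet;

// Hypothetical helper, not part of CarbonData.
public class SortedChunkFilter {

  // Compares the fixed-width value stored at row `row` with `key`,
  // byte by byte, treating each byte as unsigned.
  static int compareRow(byte[] data, int row, int width, byte[] key) {
    int offset = row * width;
    for (int i = 0; i < width; i++) {
      int a = data[offset + i] & 0xFF;
      int b = key[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }

  // Returns the first row whose value is >= key, or numRows if there is none.
  static int lowerBound(byte[] data, int numRows, int width, byte[] key) {
    int low = 0;
    int high = numRows;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (compareRow(data, mid, width, key) < 0) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Sets a bit for every row equal to one of the filter keys.
  // Assumes `data` is sorted and each key is exactly `width` bytes long.
  static BitSet filterSorted(byte[] data, int numRows, int width, byte[][] filterValues) {
    BitSet bitSet = new BitSet(numRows);
    for (byte[] key : filterValues) {
      // Binary search to the first candidate row, then walk the run of equal values.
      int row = lowerBound(data, numRows, width, key);
      while (row < numRows && compareRow(data, row, width, key) == 0) {
        bitSet.set(row);
        row++;
      }
    }
    return bitSet;
  }
}
```

With n rows and m filter keys, this does about m * log(n) comparisons plus one per matching row, instead of the n * m comparisons of the nested loop.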
Please raise a Jira for this improvement.

-Regards
Kumar Vishal

On Fri, Mar 3, 2017 at 7:42 PM, 马云 <simafengyun1...@163.com> wrote:

> Hi Dev,
>
> I used carbondata version 0.2 on my local machine and found that the
> "between and" filter query is very slow.
> The root cause is the code below in IncludeFilterExecuterImpl.java.
> It takes about 20s in my test.
> The code's time complexity is O(n*m). I think it needs to be optimized,
> please confirm. Thanks
>
> private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
>     int numerOfRows) {
>   BitSet bitSet = new BitSet(numerOfRows);
>   if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
>     FixedLengthDimensionDataChunk fixedDimensionChunk =
>         (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
>     byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
>
>     long start = System.currentTimeMillis();
>     for (int k = 0; k < filterValues.length; k++) {
>       for (int j = 0; j < numerOfRows; j++) {
>         if (ByteUtil.UnsafeComparer.INSTANCE
>             .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
>                 filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
>           bitSet.set(j);
>         }
>       }
>     }
>     System.out.println("loop time: " + (System.currentTimeMillis() - start));
>   }
>
>   return bitSet;
> }