Hi Dominique,

Thanks for the update. Another alternative might be to sort the dictionary before building the indexes for ibis::category. This would ensure that the bitmaps corresponding to words with the same prefix are stored next to each other in memory. I will give this option a try in the next few hours and see how far I can get with it.
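A minimal C++ sketch of the idea follows. It assumes a plain std::vector<std::string> dictionary in which a word's code is simply its position. This is not the actual ibis::category or ibis::dictionary code, and `sortedDictionary` and `prefixRange` are hypothetical helpers. Once the words are sorted, all words sharing a prefix receive consecutive codes, so if their bitmaps are written out in code order they occupy one contiguous region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the FastBit API): sort and deduplicate the words so
// that a word's code, i.e. its index in the returned vector, follows
// lexicographic order. Writing bitmaps in code order then places words with a
// common prefix next to each other.
std::vector<std::string> sortedDictionary(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}

// Hypothetical helper: returns the half-open code range [first, last) of the
// words starting with `prefix`. In a sorted dictionary this is always one
// contiguous range, so a prefix query touches one run of adjacent bitmaps.
std::pair<std::size_t, std::size_t> prefixRange(const std::vector<std::string>& dict,
                                                const std::string& prefix) {
    assert(std::is_sorted(dict.begin(), dict.end()));  // precondition of the trick
    auto first = std::lower_bound(dict.begin(), dict.end(), prefix);
    auto last = first;
    // Advance while the word still begins with `prefix`.
    while (last != dict.end() && last->compare(0, prefix.size(), prefix) == 0) {
        ++last;
    }
    return {static_cast<std::size_t>(first - dict.begin()),
            static_cast<std::size_t>(last - dict.begin())};
}
```

For example, the words zebra, apple, applet, banana, apply and apple produce the sorted dictionary apple, applet, apply, banana, zebra, and prefixRange(dict, "appl") yields the codes [0, 3), i.e. three adjacent bitmaps. The sort costs O(n log n) string comparisons at index-build time, which is the price paid for the better locality.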
The function bitvector::do_cnt is used a lot, but its result is cached, so the actual counting should run only once for each bitvector object. The cost of this function should therefore be proportional to the total size (in bytes) of all bitvector objects read into memory. If the bitvectors are reconstituted from memory maps, the cost of the actual I/O may be attributed to this function, because it is often the first function called after a new bitvector object is initialized. The operation itself should be reasonably efficient, so I am inclined to guess that the I/O cost is the real problem here. Sorting the words in the dictionary has a decent chance of fixing the problem. As soon as I have the code working, I will let you know.

John

On 3/16/12 6:34 AM, Dominique Prunier wrote:
> Hey John,
>
> I tried the sort. It ends up speeding up some cases but slowing
> down others (I guess when there are many values, you are paying
> for the sort). In the end, on my large query test set, it turned
> out to be slower, so I reverted it.
>
> Like I said earlier, this exact same set is now 6 times slower. I
> think it is related to the fact that my bit vectors are much more
> complex (and more distributed) and more likely to match something
> (since the previous results were pure garbage), but I'm not sure
> why it makes such a difference (I'd assume the first N ones would
> be as dispersed as any set of N vectors). The hot spot is
> ibis::bitvector::do_cnt, and I have the impression that it is
> called quite often with a test like cnt() != 0 or > 0, which could
> probably be simplified. I'll take a look at that if I have time
> today.
>
> Thanks,

_______________________________________________
FastBit-users mailing list
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
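The caching behaviour described for do_cnt, and the suggestion that a test like cnt() != 0 could be simplified, can be sketched in C++ as follows. This is a hypothetical, uncompressed stand-in: `SimpleBitvector`, `count` and `any` are illustrative names, not the ibis::bitvector interface (which works on compressed words). The full count is computed once and cached, while the emptiness test stops at the first nonzero word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch (not ibis::bitvector): an uncompressed bit vector stored
// as 64-bit words, with a lazily computed, cached population count.
class SimpleBitvector {
public:
    explicit SimpleBitvector(std::vector<std::uint64_t> words)
        : words_(std::move(words)) {}

    // Full count: scans every word on the first call, then reuses the cache.
    std::size_t count() const {
        if (cached_ < 0) {
            std::size_t n = 0;
            for (std::uint64_t w : words_) n += popcount(w);
            cached_ = static_cast<long long>(n);
        }
        return static_cast<std::size_t>(cached_);
    }

    // Equivalent to count() != 0, but returns at the first nonzero word
    // instead of scanning (and counting) the whole vector.
    bool any() const {
        if (cached_ >= 0) return cached_ != 0;  // reuse a cached count if present
        for (std::uint64_t w : words_) {
            if (w != 0) return true;
        }
        return false;
    }

private:
    // Portable popcount: each iteration clears the lowest set bit.
    static std::size_t popcount(std::uint64_t w) {
        std::size_t n = 0;
        while (w != 0) {
            w &= w - 1;
            ++n;
        }
        return n;
    }

    std::vector<std::uint64_t> words_;
    mutable long long cached_ = -1;  // -1 means "not counted yet"
};
```

Once the count is cached, repeated count() calls are cheap, so any() mainly helps on the first test against a bitvector, where it avoids touching (and, with memory-mapped data, possibly paging in) every word just to learn that the result is nonzero.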
