Hi, Dominique,

Thanks for the update.  Another alternative might be to sort the
dictionary before building the indexes for ibis::category.  This will
ensure the bitmaps corresponding to the words with the same prefixes
are ordered together in memory.  I will give this option a try in the
next few hours and see how far I can go with it.
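
To make the idea concrete, here is a minimal sketch of the reordering
step. It is not FastBit code: the Bitmap alias and the function names
sortedOrder and reorderByWord are hypothetical, and the only assumption
is one bitmap per dictionary word, stored in the same order as the words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compressed bitmap; not the FastBit type.
using Bitmap = std::vector<unsigned>;

// Return the permutation that puts `words` in lexicographic order,
// so that words sharing a prefix become neighbors.
std::vector<std::size_t> sortedOrder(const std::vector<std::string>& words) {
    std::vector<std::size_t> order(words.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&words](std::size_t a, std::size_t b) {
                  return words[a] < words[b];
              });
    return order;
}

// Apply the same permutation to the words and to their bitmaps, so that
// bitmap i still belongs to word i after the dictionary is sorted.
void reorderByWord(std::vector<std::string>& words,
                   std::vector<Bitmap>& bitmaps) {
    assert(words.size() == bitmaps.size());
    const std::vector<std::size_t> order = sortedOrder(words);
    std::vector<std::string> sortedWords;
    std::vector<Bitmap> sortedBitmaps;
    sortedWords.reserve(words.size());
    sortedBitmaps.reserve(bitmaps.size());
    for (std::size_t i : order) {
        sortedWords.push_back(std::move(words[i]));
        sortedBitmaps.push_back(std::move(bitmaps[i]));
    }
    words.swap(sortedWords);
    bitmaps.swap(sortedBitmaps);
}
```

Sorting an index permutation rather than the words themselves keeps each
bitmap attached to its word; the bitmaps are then written out in the new
order.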

The function bitvector::do_cnt is used a lot, but its result is
cached, so it should actually run only once for every bitvector
object.  The cost of using this function should be proportional to the
total size (in number of bytes) of all bitvector objects read into
memory.  If the bitvectors are reconstituted with memory maps, then it
is possible that the cost of the actual I/O operations is attributed to
this function because it is often the first function called after a
new bitvector object is initialized.  The operation itself should be
reasonably efficient.
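
The caching pattern described above can be sketched roughly as follows.
This is an illustration only, not the actual ibis::bitvector code: the
class CachedBits, its uncompressed 32-bit word storage, and the scans()
counter are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, uncompressed stand-in for a bitvector; it only shows
// how a lazily computed bit count can be cached.
class CachedBits {
public:
    explicit CachedBits(std::vector<std::uint32_t> words)
        : words_(std::move(words)) {}

    // Number of set bits. The first call scans every word; later calls
    // return the cached value without touching the data again.
    std::size_t cnt() const {
        if (!counted_) {
            doCount();
        }
        return nset_;
    }

    // How many full scans have run (present only for illustration).
    std::size_t scans() const { return scans_; }

private:
    // Full scan: the cost is proportional to the number of words, and
    // with a memory-mapped buffer this first pass over the data is
    // where any page-in cost would show up in a profile.
    void doCount() const {
        std::size_t n = 0;
        for (std::uint32_t w : words_) {
            while (w != 0) {
                w &= w - 1;  // clear the lowest set bit
                ++n;
            }
        }
        nset_ = n;
        counted_ = true;
        ++scans_;
    }

    std::vector<std::uint32_t> words_;
    mutable std::size_t nset_ = 0;
    mutable bool counted_ = false;
    mutable std::size_t scans_ = 0;
};
```

With this pattern, calling cnt() repeatedly on the same object costs one
scan in total, so a profile dominated by the counting routine points at
the first touch of each object's data rather than at the repeated calls.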

I am inclined to guess that the I/O cost is the real problem here.
Sorting the words in the dictionary might have a decent chance of
fixing the problem.  As soon as I have the code working, I will let
you know.

John




On 3/16/12 6:34 AM, Dominique Prunier wrote:
> Hey John,
> 
> I tried the sort. It ends up speeding up some cases but slowing
> down some others (i guess when there are many values, you are
> paying the sort). At the end, on my large query test set, it ended
> up being slower so i reverted it.
> 
> Like i said earlier, this exact same set is now 6 times slower. I
> think it is related to the fact that my bit vectors are much more
> complex (and more distributed) and more likely to match something
> (since the previous results were pure garbage), but i'm not sure
> why it makes such a difference (i'd assume the first N ones would be
> as dispersed as any set of N vectors). The hot spot is
> ibis::bitvector::do_cnt and i have the impression that it is called
> quite often with a test like cnt() != 0 or > 0 which could probably
> be simplified. I'll take a look at that if i have time today.
> 
> Thanks,
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
