Hi Paul, 
not really an answer to your questions, I just thought you may find it useful 
as a confirmation that this packing of integers into (B or some other) Tree is 
good one. 

I have seen Integer set distributions that can profit hugely from the tree 
organization on top. 
 
have look at: http://www.iis.uni-stuttgart.de/intset/
not meant for on disk storage, but the idea is quite similar.
 
cheers, 
eks






________________________________
From: Paul Elschot <paul.elsc...@xs4all.nl>
To: java-dev@lucene.apache.org
Sent: Sunday, 18 January, 2009 23:51:36
Subject: Re: Filesystem based bitset

 
On Friday 09 January 2009 22:30:14 Marvin Humphrey wrote:
> On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote:
> 
> > SSD is pretty close to RAM when it comes to seeking. Wouldn't that 
> > mean that a bitset stored on an SSD would be more or less as fast as a 
> > bitset in RAM? 
> 
> Provided that your index can fit in the system i/o cache and stay there, you
> get the speed of RAM regardless of the underlying permanent storage type.
> There's no reason to wait on SSDs before implementing such a feature.
Since this started by thinking out loud, I'd like to continue doing that.
I've been thinking about how to add a decent skipTo() to something that
compresses better than an (Open)BitSet, and this turns out to be an
integer set implemented as a B plus tree (all leafs on the same level) of
only integers with key/data compression by a frame of reference for
every node (see LUCENE-1410).
I found a java implementation for a B plus tree  on sourceforge: BpLusDotNet
in the BplusJ package, see http://bplusdotnet.sourceforge.net/ .
This has nice transaction semantics on a file system and it has a BSD licence,
so it could be used as a starting point, but:
- it only has strings as index values, so it will need quite some simplification
to work on integers as keys and data, and
- it has no built in compression as far as I could see on first inspection.
The questions:
Would someone know of a better starting point for a B plus tree of integers
with node compression?
For example, how close is the current lucene code base to implementing
a b plus tree for the doc ids of a single term?
How valuable are transaction semantics for such an integer set? It is
tempting to try and implement such an integer set by starting from the
ground up, but I don't have any practical programming experience with
transaction semantics, so it may be better to start from something that
has transactions right from the start.
Regards,
Paul Elschot


      

Reply via email to