On Thu, 19 Sep 2002, Neal Richter wrote:

>    Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> and I have been working on this), but so far the merge is fairly complex
> and needs much more work and long term testing.  This is a decent 
> interim solution.

Obviously I'm more concerned with the mifluz merge and with figuring out the
lousy performance. But if you've seen that switching to zlib or to the newer
codec seems to solve the database bugs, then I'm happy with this as an
interim solution. We could use it for a 3.2.0b4 (which we need) and then
work on the mifluz merge for 3.2.0b5.

> WORD    DOCID   LOCATION
> affect  323    43  

So first off, I should point out that it's not quite as bad as this. Loic
and I worked on "key compression," which means that the database doesn't
actually store multiple keys when they're only slightly different. There's
also a rationale behind this system: it was faster to keep all these keys
than to change the length of the records, as in:

> affect  323    43, 53
> affect  336    14, 148, 155
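To make "key compression" a bit more concrete, here is a minimal sketch of front coding, one generic way to avoid storing nearly identical keys in full. It is an illustration only and is not necessarily the scheme mifluz actually uses. Each sorted key is stored as the number of leading bytes it shares with the previous key, plus the differing suffix. The space-separated text keys are a hypothetical stand-in for the real binary keys.

```python
def front_encode(keys):
    """Front-code a sorted list of byte-string keys into (shared_len, suffix) pairs."""
    out = []
    prev = b""
    for key in keys:
        # Count how many leading bytes this key shares with the previous one.
        n = 0
        limit = min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out


def front_decode(pairs):
    """Rebuild the full keys from (shared_len, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in pairs:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical text keys standing in for WORD/DOCID/LOCATION entries.
keys = [b"affect 323 43", b"affect 323 53", b"affect 336 14"]
encoded = front_encode(keys)
```

With these keys, the second entry stores only `(11, b"53")` instead of the whole key. Only the suffix after each shared prefix takes up space.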

> Value-field in BDB by making it a fixed width.
> 
> Ex: Let's say this LOCATION-value is 'Full' @ 32 characters.  Further
> locations of 'affect' in doc 400 get new rows

OK, so having a fixed width and multiple rows may be a reasonable idea,
but your description isn't very workable. For one, the keys need to be
unique. So you'd want something like:

Key: WORD   DOCID   ROW
Record: Location/Flags/Anchor designation list

The trick would be to come up with a compact binary representation of
these. Using characters to store integers is a bit inefficient. :-) More
on that later, perhaps.
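Here is one possible shape for such a compact binary layout, as a minimal sketch. Every field width is an assumption for illustration, as are the slot count and the helper names (`pack_key`, `pack_record`, `rows_for`). The key is the NUL-terminated word followed by a big-endian DOCID and ROW. Because the integers are big-endian, plain byte-wise comparison (Berkeley DB's default B-tree ordering) sorts keys by word, then document, then row. The record is a fixed-width list of hypothetical location/flags/anchor entries, padded out to a constant size.

```python
import struct

SLOTS_PER_ROW = 4                # hypothetical number of entries per fixed-width record
ENTRY = struct.Struct(">IHH")    # hypothetical: location (u32), flags (u16), anchor (u16)
COUNT = struct.Struct(">H")      # number of slots actually used in this record
RECORD_SIZE = COUNT.size + SLOTS_PER_ROW * ENTRY.size


def pack_key(word, docid, row):
    # NUL-terminated word, then big-endian docid (u32) and row (u16), so
    # byte-wise comparison orders keys by (word, docid, row).
    return word.encode("utf-8") + b"\x00" + struct.pack(">IH", docid, row)


def unpack_key(key):
    # The word itself never contains NUL, so the first NUL ends it.
    word, rest = key.split(b"\x00", 1)
    docid, row = struct.unpack(">IH", rest)
    return word.decode("utf-8"), docid, row


def pack_record(entries):
    if len(entries) > SLOTS_PER_ROW:
        raise ValueError("too many entries for one row")
    body = b"".join(ENTRY.pack(*e) for e in entries)
    # Pad unused slots with zeros so every record has the same length.
    padding = b"\x00" * (ENTRY.size * (SLOTS_PER_ROW - len(entries)))
    return COUNT.pack(len(entries)) + body + padding


def unpack_record(record):
    (count,) = COUNT.unpack_from(record, 0)
    return [ENTRY.unpack_from(record, COUNT.size + i * ENTRY.size) for i in range(count)]


def rows_for(word, docid, entries):
    """Split one document's entries for a word across as many rows as needed."""
    rows = []
    for row, start in enumerate(range(0, len(entries), SLOTS_PER_ROW)):
        chunk = entries[start:start + SLOTS_PER_ROW]
        rows.append((pack_key(word, docid, row), pack_record(chunk)))
    return rows
```

With these assumed widths, each record is 34 bytes. A word appearing six times in one document spills into a second row with the same WORD and DOCID, and the ROW number keeps the two keys unique.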

-Geoff

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev