Phooey! I knew I forgot to bring something in to work this morning. It
turns out I forgot to bring the list. I'll see if I can generate enough
of it from memory... These are in no particular order.

Goals                                                           Current Status
* Regex restrict/include/exclude                                Done
* Document DB keyed on DocID                                    Done
* Document Excerpts moved to separate DB                        Incomplete (need compression turned on)
* Word DB conversion                                            Incomplete (mostly in place, a few prob.)
* Regex fuzzy                                                   Incomplete
* Speling fuzzy                                                 Incomplete
* Transport rewrite                                             Incomplete
   ExternalTransport                                            Not begun (need API)
* Trigram fuzzy                                                 Not begun (short)
* Generate a list of all documents                              Not begun (very short)
* HtTools                                                       Not begun (medium)
* UTF-8/Unicode support                                         ?
* Character-Set translation                                     ?
* Detection of duplicate documents while indexing               Not begun (short)
* External Decoders                                             ?
* Documentation / Website changes                               ?
* Distributed queries / Database collections                    ?
* Configuration changes                                         ?
* URL weighting factors (e.g. server A gets 'boost')            ?
        indexing of URL text                                    ?
* Search 'similar'                                              ?
* Field-based searching                                         (requires incomplete code)
* Phrase matching                                               (requires incomplete code)
* Shared libraries for distinct functionalities                 ?

I'm going to reply to my own message in a minute with some commentary.

>  . Implement new index structure (one db entry per word occurrence, I can
>    provide extended help on that)

This is what my WordList changes did. It needs some changes, but much of
the code is already committed. (Unless you have big changes I don't know
about.)
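
To make the "one db entry per word occurrence" idea concrete, here is a
rough sketch of how such a layout could behave. It is not the WordList
code: the key fields (word, doc_id, location), the class name and the
method names are all assumptions for illustration, and a sorted
in-memory list stands in for the on-disk sorted database.

```python
import bisect


class OccurrenceIndex:
    """Toy stand-in for a sorted key/value database (hypothetical names)."""

    def __init__(self):
        # One key per word occurrence, kept in sorted order.
        self._keys = []  # list of (word, doc_id, location) tuples

    def add(self, word, doc_id, location):
        """Insert one occurrence, ignoring exact duplicates."""
        key = (word, doc_id, location)
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def occurrences(self, word):
        """Return every (doc_id, location) for a word via a range scan."""
        # (word,) sorts before any (word, doc_id, location) key.
        start = bisect.bisect_left(self._keys, (word,))
        result = []
        for w, doc_id, location in self._keys[start:]:
            if w != word:
                break
            result.append((doc_id, location))
        return result
```

Because the keys sort by word first, all occurrences of a word sit next
to each other, so a lookup is one seek plus a sequential scan. The same
ordering is what field-based and phrase matching would presumably build
on, though that is my guess rather than anything settled.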

>  . Implement db transparent compression (that's what I'm doing, first release
>    4 August, benchmark results 5 August)

Does this recognize already-compressed data? I didn't think about this
earlier, but the excerpts are *supposed* to be using the HtZlibCodec
since they're large enough to get significant benefit.
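
One way a transparent layer could avoid compressing data twice is to
peek at the header bytes. The sketch below assumes the excerpts are
stored as standard zlib streams (I have not checked that this is what
HtZlibCodec actually emits), and the function names are made up for
illustration. The check follows the two-byte zlib header rules from
RFC 1950.

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Heuristic: does data start with a valid zlib (RFC 1950) header?"""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    method_is_deflate = (cmf & 0x0F) == 8      # CM field must be 8
    window_ok = (cmf >> 4) <= 7                # CINFO above 7 is not allowed
    check_ok = ((cmf << 8) | flg) % 31 == 0    # FCHECK makes header a multiple of 31
    return method_is_deflate and window_ok and check_ok


def compress_once(data: bytes) -> bytes:
    """Compress data unless it already looks like a zlib stream."""
    if looks_like_zlib(data):
        return data
    return zlib.compress(data)
```

This is only a heuristic: uncompressed data that happens to begin with
two matching bytes would be skipped, so a real implementation would
probably want an explicit flag in the record instead of guessing.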

>  . Upgrade to db-2.7.5

I don't really consider this a goal. Part of a move towards a release
entails updating code from external sources to the latest version. This
includes a variety of files in htlib/ from glibc as well as the db code.
By the time we have a 3.2 release, the versions will likely be
different.

-Geoff
