On Thu, 25 Mar 2004, Jeff Kirby wrote:

> Here is a brief description of what I'm trying to accomplish:
> We have about 60,000 documents that we are indexing, most of them have
> statute numbers (similar to "356.47(b)(a)" )... you'll notice a problem
> right off the bat when looking at this... and that is the period.  Now
> if I include the period and open/close parentheses, then I'm going to be
> indexing invalid words as well...
> 
> So, I thought of two possible solutions, but I don't think they are
> implemented in ht://Dig.  One would be the ability to include a list of
> valid words to search and index (i.e. these would be recognized in a
> document before the removal of punctuation).  The second would be to
> have a regular expression that also searches for valid words.

You might take a look at htdig's external parser support. If you are
up for writing a bit of code, it should give you complete control
over what is passed to htdig for indexing. See the following for more
information on external parsers:

http://www.htdig.org/attrs.html#external_parsers
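
As a rough illustration of the idea, here is a minimal Python sketch of
such a parser. It pulls statute numbers like "356.47(b)(a)" out of the
text with a regular expression before any punctuation is stripped, and
prints one word record per token. Several things here are assumptions
rather than anything confirmed in this thread: the statute pattern is
only a guess at your numbering scheme, the function names are made up,
and the record layout (a "w" record with tab-separated word, location
and heading level) and the input file being the first argument are from
my recollection of the page above, so check them against it. I also
have not verified whether htdig still applies valid_punctuation to
words it receives from an external parser.

```python
import re
import sys

# Hypothetical pattern for statute numbers such as "356.47(b)(a)":
# digits, an optional ".digits" part, then zero or more "(x)" groups.
STATUTE_RE = re.compile(r"\d+(?:\.\d+)?(?:\([0-9A-Za-z]+\))*")
# Ordinary words: runs of ASCII letters only.
WORD_RE = re.compile(r"[A-Za-z]+")
# Try the statute pattern first so its punctuation is kept intact.
TOKEN_RE = re.compile(STATUTE_RE.pattern + "|" + WORD_RE.pattern)


def tokenize(text):
    """Return lowercased tokens in document order."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def word_records(text):
    """Format tokens as tab-separated word records.

    Assumed layout (verify against the htdig documentation):
    "w", the word, a relative location, and a heading level of 0.
    """
    tokens = tokenize(text)
    total = max(len(tokens), 1)
    records = []
    for index, token in enumerate(tokens):
        # Scale the position in the document to the range 0..999.
        location = index * 1000 // total
        records.append("w\t%s\t%d\t0" % (token, location))
    return records


def main(argv):
    # Assumption: the file to parse is passed as the first argument.
    with open(argv[1], encoding="utf-8", errors="replace") as handle:
        for record in word_records(handle.read()):
            print(record)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

A script along these lines would be registered for a content type
through the external_parsers attribute. The point of the design is that
the decision about what counts as a word is made in your code, so only
the statute numbers keep their periods and parentheses while all other
text is split on punctuation as usual.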


Jim

