According to Charles Dill:
> I'm creating a new site for a client who wishes to implement a search
> engine. The client's current site is running on an Apache server with
> htdig already installed. The new site will replace the old one on that
> server.
>  
> Since I've yet to use htdig, or implement a search engine for that
> matter, I was hoping I could get some help. Basically I'm wondering what
> I'll need to do, if anything, to the individual web pages in preparation
> for the search implementation and configuration.

In many, or even most, cases you don't need to do anything to the web
pages to prep them for indexing.  htdig parses standard HTML to get the
words into the index, and follows HTML links to other documents to get
at the whole linked collection of documents on your site.  Things to
look out for are:

1) htdig doesn't parse JavaScript code, so if your navigation is done
that way, you may need another way to tell htdig which documents to
parse.  See http://www.htdig.org/FAQ.html#q5.18

2) htdig uses the text between <title> and </title> tags in displaying
search results, and by default gives words in there a higher score in
the results.  Make sure your documents use these appropriately.

3) htdig can make use of meta keywords and meta description tags, so
you may want to use these too.

4) htdig can parse PDFs and other document types with external converters
or parsers.  See http://www.htdig.org/FAQ.html#q4.8 and 4.9 to see how
to set these up.
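
As a rough way to check points 1 to 3 before the first htdig run, here
is a minimal Python sketch.  It is not part of htdig, and the names
PageAudit and audit are made up for illustration.  It uses only the
standard html.parser module and flags pages with no <title>, no meta
keywords or description, or no links other than javascript: URLs.  It
will not catch navigation built by script event handlers.

```python
import sys
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect the page features mentioned in points 1 to 3 (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}      # lower-cased meta name -> content
        self.links = []     # href values of <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def audit(html_text):
    """Return a list of warnings for one HTML document."""
    parser = PageAudit()
    parser.feed(html_text)
    parser.close()
    warnings = []
    if not parser.title.strip():
        warnings.append("missing or empty <title>")
    for name in ("keywords", "description"):
        if not parser.meta.get(name, "").strip():
            warnings.append("no meta " + name)
    # A page whose only links are javascript: URLs gives a crawler
    # nothing it can follow.
    plain = [h for h in parser.links if not h.lower().startswith("javascript:")]
    if not plain:
        warnings.append("no plain HTML links to follow")
    return warnings


if __name__ == "__main__":
    # Usage (assumed file names): python audit.py index.html about.html
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for warning in audit(f.read()):
                print(path + ": " + warning)
```

A page that prints no warnings is not guaranteed to index well, but
one that does print them is worth a look before you point htdig at it.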

> Also will I need access to
> any files that may not reside on root in order to configure the
> searcher? The web hosting company wasn't much help and simply stated
> they want me to use htdig since they are already using it.

I'm not quite sure what you mean by this.  If you have to use htdig and
htsearch as the hosting company set them up, then you should ask for any
documentation they have that deals with the specifics of configuring it
on their site.  If they're simply recommending htdig, but you're going to
set up your own copy of it, then you can set it up so that nothing is
root-dependent.

The main thing that ties the package to a specific directory or set
of directories is htsearch's CONFIG_DIR, i.e. the directory in which
it looks for configuration files.  Apart from that, everything else can pretty
easily be defined or overridden on the command line or in your own
configuration file.  See http://www.htdig.org/FAQ.html#q5.30, 5.32 and
4.2 for more information.  Question 4.18 may also provide useful pointers.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
