According to [EMAIL PROTECTED]:
>Hi there,
>
>I am involved with a project to put 30-50 weekly newspapers online
>in Ireland.  We are currently using htdig for searching and it's
>working great, however I have a proposition..
>
>There are a few features that are missing in htdig that we would
>really like to have, these are (in no particular order):
>Exact phrase searching, The ability to index documents on the
>filesystem rather than through the web server, the ability to
>date documents by some tag rather than the date stamp, and
>a few other things that I've probably forgotten.

- Phrase searching is currently on the TODO list
- Indexing via file system is already implemented (see docs)
- Dating documents by something other than the time/date stamp
  can be done with a META tag.  However, that might not be
  what you want.  As a crawler, ht://Dig uses the Last-Modified
  information from the server.  If you use dynamic content,
  you can have the document generator code emit a Last-Modified
  timestamp taken from a value in your article database.  Alas,
  you'll have to go through the server in that case and will not
  be able to index via the file system.
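
As a rough illustration of the last point, the generator code could
format the article's date as an HTTP Last-Modified header.  This is
only a sketch in Python, not anything ht://Dig ships: the function name
last_modified_header and the example date are hypothetical, and how the
date is fetched from your article database depends on your setup.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(published: datetime) -> str:
    """Return a Last-Modified header line for the given publication time."""
    # Naive datetimes are assumed to be UTC already (an assumption of this sketch).
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    # HTTP dates are expressed in GMT, so convert before formatting.
    published = published.astimezone(timezone.utc)
    return "Last-Modified: " + format_datetime(published, usegmt=True)


if __name__ == "__main__":
    # Hypothetical publication date as it might come from the article database.
    print(last_modified_header(datetime(1999, 1, 5, 12, 0, 0)))
```

The generator would send this line along with its other response
headers, before the document body.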

How a site is best dated / ranked / indexed differs from site
to site.  Dynamic content cannot be indexed via the file system,
and in most cases tweaking the Last-Modified headers of the web
server amounts to having dynamic content anyway.

If you don't have dynamic content, you can still force a
Last-Modified header by using touch(1) on the hypertext documents.
If you run Apache, you can also use mod_headers to tweak the
Last-Modified header on a per-directory basis.
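
If the article dates live somewhere other than the files themselves,
the touch(1) step can be scripted.  The following is a minimal sketch
in Python that does roughly what touch(1) does with an explicit date;
the function name set_document_date is hypothetical, and naive dates
are assumed to be UTC.

```python
import os
from datetime import datetime, timezone


def set_document_date(path: str, published: datetime) -> None:
    """Set a file's modification time, like touch(1) with an explicit date."""
    # Naive datetimes are assumed to be UTC (an assumption of this sketch).
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    mtime = published.timestamp()
    # Keep the access time unchanged; only the modification time is replaced.
    atime = os.stat(path).st_atime
    os.utime(path, (atime, mtime))
```

A web server that derives Last-Modified from the file's modification
time would then report the article date on the next request.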

AFAIK the latter approach will probably suit you best, although
that depends on the structure of your web site.  Note that since
the headers are set by the web server, you'll have to index via
HTTP rather than via the file system.

Not being able to use file system indexing for dynamic content
does not really matter in a sophisticated ht://Dig setup for
your site.  Most of your site will probably not change much,
so you can do an update dig.  Combined with multiple databases
(which must still be merged in order to search) and alternate
work files, this will improve indexer performance more than
indexing via the file system ever could.


hth,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
