With the current state of the Internet, practical determination of
document dates is not as simple as it should be.
The possible sources of date information are:
1. The HTTP server Last-Modified: field
2. Meta information inside the document
3. The URL; for example, some sites put documents dated 02.05.2002
in the /2002/05/02 folder.
4. Dates in the document text
5. A persistent search engine database which, for every document,
keeps the date when the document first appeared in the database
and the date when its modification was detected during reindexing.
The document date from (1) is useless, as nowadays most documents
are generated dynamically and the (1) date is then always identical
to the request time. For static files, the HTTP server copies this
date from the file timestamp, which is also often unrelated to the
real date of the document's creation or modification. As I understand
it, ASPSeek uses dates from (1).
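
To illustrate the problem with (1), here is a minimal Python sketch
(not ASPSeek code) that treats a Last-Modified value as informative
only when it is clearly older than the Date header of the same
response. The function name and the 5-second tolerance are
assumptions made for this example.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def last_modified_is_informative(headers, tolerance=timedelta(seconds=5)):
    """Return True if Last-Modified is present and clearly older than Date.

    `headers` is a plain dict of response headers. The tolerance is an
    arbitrary assumption, not a value taken from any standard.
    """
    last_modified = headers.get("Last-Modified")
    served = headers.get("Date")
    if not last_modified or not served:
        return False
    try:
        modified_at = parsedate_to_datetime(last_modified)
        served_at = parsedate_to_datetime(served)
        # Mixing naive and aware datetimes raises TypeError here.
        age = served_at - modified_at
    except (TypeError, ValueError):
        # Unparseable or incomparable dates carry no usable information.
        return False
    return age > tolerance
```

A response whose Last-Modified equals its Date is most likely
generated on request, so its date says nothing about the document.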
The date from (2) is the most reliable and can be easily extracted;
however, most websites do not specify it, or specify it in an
arbitrary format.
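
As a rough sketch of extracting (2), the following Python code
collects the content of date-like meta tags using only the standard
library. The set of meta names is an assumption; real sites use many
other names, and the values are returned as raw strings precisely
because their format is arbitrary.

```python
from html.parser import HTMLParser

# Assumed meta names for this example; not an exhaustive or standard list.
DATE_META_NAMES = {"date", "dc.date", "last-modified"}


class MetaDateParser(HTMLParser):
    """Collect the content of <meta> tags whose name looks date-related."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or attributes.get("http-equiv") or "").lower()
        content = attributes.get("content")
        if name in DATE_META_NAMES and content:
            # Keep the first value seen for each name, unparsed.
            self.dates.setdefault(name, content.strip())


def extract_meta_dates(html):
    """Return a dict mapping lowercased meta name to its raw content string."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.dates
```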
The dates from (3) and (4) can be reliably extracted only for some
selected websites and types of documents.
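
For sites that do follow the /2002/05/02 convention, extracting (3)
could look like the Python sketch below. The /YYYY/MM/DD pattern is
the only layout it recognizes, which is an assumption; other sites
would need their own patterns.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Matches a /YYYY/MM/DD path segment sequence, e.g. /2002/05/02/.
_PATH_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url):
    """Return the date encoded in the URL path, or None if there is none."""
    match = _PATH_DATE.search(urlparse(url).path)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits that do not form a real calendar date, e.g. /2002/13/40/.
        return None
```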
This leaves (5) as the most reliable and universal source of
document date information. The drawback of this method is that
document dates cannot be earlier than the time the database was
created. However, it is usually most important to discriminate
between documents only within a relatively small interval of time
before the present. Therefore, a database that has existed for a
month can already be an important tool for document date information.
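
The bookkeeping behind (5) can be sketched in Python with sqlite3 as
follows. This is not the ASPSeek schema; the table name, the columns,
and the use of a content checksum to detect modification are all
assumptions made for illustration.

```python
import sqlite3


def open_date_db(path=":memory:"):
    """Open (or create) a hypothetical per-document date table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_dates ("
        " url TEXT PRIMARY KEY,"
        " checksum TEXT NOT NULL,"
        " first_seen TEXT NOT NULL,"
        " last_changed TEXT NOT NULL)"
    )
    return conn


def record_crawl(conn, url, checksum, crawled_at):
    """Store first-seen and last-changed dates for one indexing pass."""
    row = conn.execute(
        "SELECT checksum FROM doc_dates WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        # New document: both dates start at the time of this crawl.
        conn.execute(
            "INSERT INTO doc_dates VALUES (?, ?, ?, ?)",
            (url, checksum, crawled_at, crawled_at),
        )
    elif row[0] != checksum:
        # Content changed since the last pass: update last_changed only.
        conn.execute(
            "UPDATE doc_dates SET checksum = ?, last_changed = ? WHERE url = ?",
            (checksum, crawled_at, url),
        )
    conn.commit()


def document_dates(conn, url):
    """Return (first_seen, last_changed) for a URL, or None if unknown."""
    return conn.execute(
        "SELECT first_seen, last_changed FROM doc_dates WHERE url = ?", (url,)
    ).fetchone()
```

Note that last_changed is only as precise as the reindexing interval:
it records when the change was detected, not when it happened.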
If I understand it correctly, ASPSeek does not allow keeping a
persistent database for a long time, because obsolete documents
are not automatically removed. Therefore, from time to time the
database must be cleared and recreated.
I plan to modify the ASPSeek code to introduce automatic removal
of deleted documents. A document should not be removed as soon as
the server returns a "Not found" code, because the cause may
be that the server or the connection is down. Several requests
must be made over a predefined interval of time to ensure that
the document has indeed been removed from the server for good.
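
The removal rule described above might be sketched in Python as
follows. This is not ASPSeek code: the class name, the threshold of
three failures, and the one-week interval are placeholders chosen
for the example.

```python
from datetime import datetime, timedelta


class RemovalTracker:
    """Decide when a document has been missing long enough to remove it."""

    def __init__(self, min_failures=3, min_interval=timedelta(days=7)):
        self.min_failures = min_failures
        self.min_interval = min_interval
        self._failures = {}  # url -> (time of first failure, failure count)

    def record_success(self, url):
        """A successful fetch clears any accumulated failures."""
        self._failures.pop(url, None)

    def record_not_found(self, url, when):
        """Record a failed fetch; return True if the URL may now be removed."""
        first, count = self._failures.get(url, (when, 0))
        count += 1
        self._failures[url] = (first, count)
        # Require both enough failures and enough elapsed time between
        # the first and the latest failure.
        return count >= self.min_failures and when - first >= self.min_interval
```

Requiring both conditions means that a burst of failed requests
during a short outage never triggers removal on its own.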
Once database persistence is achieved, the next step is
to introduce gathering of date information from all available
sources, its judicious use in time ranking, and a nuanced
presentation of this information to search engine users.
Gregory Kozlovsky
Project Manager for Information Systems
International Relations and Security Network (ISN)
Center for Security Studies
Swiss Federal Institute of Technology (ETH)
Leonhardshalde 21, ETH-Zentrum / LEH
CH-8092 Zürich, Switzerland
Tel: +41 (0)1 632 63 70
Fax: +41 (0)1 632 14 13
Email: [EMAIL PROTECTED]
http://www.isn.ch
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:yesin@;sipria.msk.ru]
Sent: Monday, 11 November 2002 17:03
To: [EMAIL PROTECTED]
Subject: [aseek-users] Dose anybody in the world uses feature of search on a range of dates ?
Dose anybody in the world uses feature of search on a range of dates
? Can I get link to example ? I can't use this feature in ASPSeek
1.2.8...12.10
Thank you.
mailto:yesin@;sipria.msk.ru
http://www.sipria.ru