I am trying to index a local web server with the following structure:
/
-news1
--20021110
---1.html
---2.html
...
---k.html
--20021109
...
--20020101
-news2
-news3
I set the MODIFICATION date of the files manually with the command:
touch -m mmdd1111 *.html
After indexing I checked the MySQL database, and the dates there are correct.
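
Since the directory names in the tree above already carry the date, the touch step could also be scripted. The following is only a minimal sketch, assuming every date directory is named YYYYMMDD and that 11:11 local time is wanted, as in the command above; the function name is hypothetical.

```python
import os
from datetime import datetime
from pathlib import Path


def set_mtimes_from_dirnames(root):
    """Set each .html file's mtime from its YYYYMMDD parent directory name."""
    changed = 0
    for html in Path(root).rglob("*.html"):
        try:
            # Assumption: the parent directory is named YYYYMMDD, e.g. 20021110.
            day = datetime.strptime(html.parent.name, "%Y%m%d")
        except ValueError:
            continue  # not a date directory, leave the file alone
        # 11:11 local time, mirroring the hhmm part of the touch command.
        stamp = day.replace(hour=11, minute=11).timestamp()
        atime = html.stat().st_atime  # keep the access time, like touch -m
        os.utime(html, (atime, stamp))
        changed += 1
    return changed
```
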
Then I search and sort by date, and I get the documents sorted in this manner:
HTML documents from news1 sorted by date, then HTML documents from news2
sorted by date, then HTML documents from news3 sorted by date.
Why?
If I search for the last week I get, for example, 10 documents; if I
then search for the last week again, I get 10 documents too. The real
number of documents for this period is higher.
Why?
What can I do?


> With the current state of the Internet, practical determination of
> document dates is not as simple as it should be.

> The sources of timing information are:

> 1. The HTTP server Last-Modified: field
> 2. Meta information inside the document
> 3. URL, for example, some sites can put documents dated 02.05.2002
>    in the /2002/05/02 folder.
> 4. Dates in the document text
> 5. Persistent search engine database, which, for every document
>    keeps the date when the document first appeared in the database
>    and the date when its modification was detected during reindexing. 
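
Source (3) in the list above can be illustrated with a short sketch. It assumes the date appears as consecutive /YYYY/MM/DD path segments, as in the /2002/05/02 example; the function name is hypothetical, and sites using another layout would need their own pattern.

```python
import re
from datetime import date

# Assumption: the date appears as consecutive /YYYY/MM/DD path segments.
_URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url):
    """Return the date embedded in the URL path, or None if there is none."""
    match = _URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. /2002/13/45/ is not a calendar date
```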

> The date from (1) is useless, because nowadays most documents are
> generated dynamically and the date from (1) is then always identical
> to the request time. For static files, the HTTP server copies this
> date from the file timestamp, which is also often unrelated to the
> real date of the document's modification or creation. ASPSeek, as I
> understand it, uses dates from (1).
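
One way to spot a Last-Modified value that merely echoes the request time is to compare it with the Date header of the same response. This is only a heuristic sketch; the function name and the two-second tolerance are assumptions, and it says nothing about static files with misleading timestamps.

```python
from email.utils import parsedate_to_datetime


def last_modified_is_uninformative(headers, tolerance_seconds=2):
    """Guess whether Last-Modified merely echoes the time of the request."""
    last_modified = headers.get("Last-Modified")
    served_at = headers.get("Date")
    if not last_modified or not served_at:
        return True  # no usable modification date at all
    try:
        delta = parsedate_to_datetime(served_at) - parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True  # unparseable or incomparable dates carry no information
    return abs(delta.total_seconds()) <= tolerance_seconds
```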

> The date from (2) is the most reliable and is easily extracted, but
> most websites do not specify it, or specify it in an arbitrary format.
> The dates from (3) and (4) can be reliably extracted only for some
> selected websites and types of documents.
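
Collecting the raw values for source (2) might look like the sketch below. The set of meta names is an assumption (common conventions, not a standard list), and because the formats are arbitrary the values are returned as unparsed strings.

```python
from html.parser import HTMLParser

# Assumption: these meta names are common conventions, not a standard list.
DATE_META_NAMES = {"date", "last-modified", "dc.date", "dcterms.modified"}


class MetaDateFinder(HTMLParser):
    """Collect (name, content) pairs from date-like <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or attributes.get("http-equiv") or "").lower()
        content = attributes.get("content")
        if name in DATE_META_NAMES and content:
            self.found.append((name, content.strip()))


def meta_dates(html_text):
    """Return the raw date strings found in the document's meta tags."""
    finder = MetaDateFinder()
    finder.feed(html_text)
    return finder.found
```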

> This leaves (5) as the most reliable and universal source of
> document date information. The drawback of this method is that
> document dates cannot go back further than the creation of the
> database. However, it is usually most important to discriminate
> between documents only within a relatively small interval of time
> before the present. Therefore, a database that has existed for a
> month can already be an important tool for document date information.
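
The bookkeeping behind (5) can be sketched with a small table that keeps, per URL, the time the document was first seen and the time a change was last detected. This is not ASPSeek's schema; the table, the column names, and the use of a content checksum to detect modification are all assumptions made for illustration.

```python
import sqlite3


def open_date_store(path=":memory:"):
    """Open (or create) the hypothetical per-document date table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS doc_dates ("
        " url TEXT PRIMARY KEY,"
        " checksum TEXT NOT NULL,"
        " first_seen INTEGER NOT NULL,"
        " last_changed INTEGER NOT NULL)"
    )
    return db


def record_crawl(db, url, checksum, now):
    """Record one crawl of url; return (first_seen, last_changed)."""
    row = db.execute(
        "SELECT checksum, first_seen, last_changed FROM doc_dates WHERE url = ?",
        (url,),
    ).fetchone()
    if row is None:
        # First time this URL is indexed: both dates start at the crawl time.
        db.execute("INSERT INTO doc_dates VALUES (?, ?, ?, ?)", (url, checksum, now, now))
        first_seen, last_changed = now, now
    else:
        old_checksum, first_seen, last_changed = row
        if old_checksum != checksum:
            # Content differs from the stored copy: a modification was detected.
            last_changed = now
            db.execute(
                "UPDATE doc_dates SET checksum = ?, last_changed = ? WHERE url = ?",
                (checksum, now, url),
            )
    db.commit()
    return first_seen, last_changed
```

Reindexing an unchanged document leaves both dates alone, so the stored dates stay meaningful for as long as the database is kept.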

> If I understand correctly, ASPSeek does not allow keeping a
> persistent database for a long time, because obsolete documents
> are not removed automatically. Therefore the database must be
> cleared and recreated from time to time.

> I plan to modify the ASPSeek code to introduce automatic removal
> of deleted documents. A document should not be removed as soon as
> the server returns a "Not found" code, because the reason may
> be that the server or the connection is down. Several requests
> over a predefined interval of time must be made to ensure that
> the document is indeed removed from the server for good.
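
The removal rule described above can be sketched as follows. This is not the planned ASPSeek change itself; the class name and the default thresholds (three failures spread over at least a week) are assumptions chosen only to make the idea concrete.

```python
class RemovalTracker:
    """Decide when repeated fetch failures justify purging a document."""

    def __init__(self, min_failures=3, min_interval_seconds=7 * 24 * 3600):
        self.min_failures = min_failures
        self.min_interval = min_interval_seconds
        self._failures = {}  # url -> (time of first failure, failure count)

    def record(self, url, found, now):
        """Record one fetch result; return True when url should be purged."""
        if found:
            self._failures.pop(url, None)  # any success resets the history
            return False
        first, count = self._failures.get(url, (now, 0))
        count += 1
        self._failures[url] = (first, count)
        # Require both enough failures and enough elapsed time between them.
        return count >= self.min_failures and now - first >= self.min_interval
```

Requiring both a failure count and a minimum time span means a short outage, however many retries fall inside it, never triggers a purge.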

> Once database persistence is achieved, the next step is
> to introduce gathering of date information from all available
> sources, its judicious use in time ranking, and a nuanced
> presentation of this information to search engine users.

>         Gregory Kozlovsky

> Project Manager for Information Systems                 Tel: +41 (0)1 632 63 70
> International Relations and Security Network (ISN)      Fax: +41 (0)1 632 14 13
> Center for Security Studies                             Email: [EMAIL PROTECTED]
> Swiss Federal Institute of Technology (ETH)             http://www.isn.ch
> Leonhardshalde 21, ETH-Zentrum / LEH
> CH-8092 Zürich, Switzerland


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:yesin@;sipria.msk.ru]
> Sent: Montag, 11. November 2002 17:03
> To: [EMAIL PROTECTED]
> Subject: [aseek-users] Dose anybody in the world uses feature of search on a
> range of dates ?


> Does anybody in the world use the feature of searching on a range of
> dates? Can I get a link to an example? I can't use this feature in ASPSeek
> 1.2.8...12.10
> Thank you.
>  mailto:yesin@;sipria.msk.ru
>  http://www.sipria.ru 
 mailto:yesin@;sipria.msk.ru
 http://www.sipria.ru

