I started with 1.2.8 and noticed the following problems/questions:
(http://search.rz.uni-osnabrueck.de/cgi-bin/s.cgi for regression ...)
1. Order of search expressions
I got different results when searching for:
Elsner -Meyer -> OK
-Meyer Elsner -> No search expression(s)
+Elsner +Meyer -> No search expression(s)
Elsner +Meyer -> OK
A - or + prefixed to the first search expression should be possible.
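
To illustrate the behaviour I would expect, here is a minimal sketch of a query splitter that treats a leading + or - the same way on every term, including the first one. It is not taken from the ASPseek source; the function name parse_query and the (modifier, word) representation are my own assumptions.

```python
from typing import List, Tuple


def parse_query(query: str) -> List[Tuple[str, str]]:
    """Split a query into (modifier, word) pairs.

    modifier is '+' (required), '-' (excluded) or '' (plain).
    Hypothetical helper, not ASPseek's real parser.
    """
    terms = []
    for token in query.split():
        modifier = ""
        # A leading + or - is a modifier on any term, the first included.
        if len(token) > 1 and token[0] in "+-":
            modifier, token = token[0], token[1:]
        terms.append((modifier, token))
    return terms


for q in ("Elsner -Meyer", "-Meyer Elsner", "+Elsner +Meyer", "Elsner +Meyer"):
    print(q, "->", parse_query(q))
```

With such a splitter all four queries above yield search expressions, regardless of where the modifier appears.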
2. "Not indexed yet"
It is not possible to get ALL pages indexed, even when using index -s 0.
There are always about 160 pages left when I try to index.
A true "index what is left to index" would be helpful.
3. Delete entries
I changed my robots.txt file for a WWW server (www.rz.uni-osnabrueck.de),
so that some directories should not be indexed.
After re-indexing, these directories are still in the database.
I know that I can completely delete the database and re-build it,
but I would like to index into an existing database.
How can I instruct index to delete URLs which are now excluded by
robots.txt?
The same situation occurs for servers xyz which had a "Server xyz" once
and should now be excluded, "# Server xyz" in aspseek.conf.
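
As a workaround idea, the URLs that a changed robots.txt now excludes could at least be listed outside of ASPseek. The following sketch uses Python's standard urllib.robotparser; the robots.txt content, the paths /private/ and /tmp/, and the list of indexed URLs are made-up examples, and how to feed the result back to index for deletion is exactly what I do not know.

```python
from urllib.robotparser import RobotFileParser


def urls_now_excluded(robots_txt, indexed_urls, user_agent="*"):
    """Return the indexed URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in indexed_urls if not parser.can_fetch(user_agent, u)]


# Hypothetical robots.txt after the change (paths are examples only).
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

# Hypothetical URLs that are already in the database.
indexed = [
    "http://www.rz.uni-osnabrueck.de/index.html",
    "http://www.rz.uni-osnabrueck.de/private/notes.html",
    "http://www.rz.uni-osnabrueck.de/tmp/test.html",
]

for url in urls_now_excluded(robots_txt, indexed):
    print(url)
```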
4. option selected.
The feature <option value="..." selected="$dp"> as described on
page 33 of the 2002/02/18 manual (1.2.8) does not work, except for $ps.
It would be helpful if all Aspseek "display/sort/order/date" variables were
treated as cookie variables like $ps. So far I have identified these
candidates:
dp, dx, dm, dd, dy, db, de, fm, ps, c, s, np, gr, cs, ad, bd, ds, kw, tl
5. Disallow / Allow
Using Disallow and Allow is still hard work.
These are my main scenarios:
(a) Allow some known good extensions and Disallow everything else.
Allow \.htm$
Allow \.shtm$
Allow \.cfm$
Allow \.html$
Allow \.shtml$
Allow \.cfml$
Allow \.ps$
Allow \.pdf$
Disallow *
(b) Disallow some known bad extensions and Allow everything else.
DisallowNoMatch /$ | \.htm$ | \.shtm$ | \.cfm$
Disallow /cgi-bin/ /script \.cgi$ /nph
Disallow \?
Disallow \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
Disallow /[.]{1,2} /\%2e /\%2f
Disallow [^:]//
Disallow ///
Allow *
Does this index .htm, .shtm and .cfm only?
Has anybody created a list of "good" extensions for DisallowNoMatch?
What is the difference between Allow and DisallowNoMatch?
Shouldn't DisallowNoMatch \.htm$ be the same as Allow \.htm$?
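
To make the last question concrete, here is a sketch of how I would expect the two directives to differ IF rules are evaluated top to bottom and the first applicable rule decides. That evaluation order, the treatment of * as "matches everything", and the default of allowing a URL when no rule applies are my assumptions, not statements about the ASPseek source. The function decide and the example URLs are hypothetical.

```python
import re


def decide(rules, url, default=True):
    """Return True if url would be indexed under first-applicable-rule semantics.

    rules is an ordered list of (directive, [patterns]).
    Assumed semantics, not ASPseek's actual implementation.
    """
    for directive, patterns in rules:
        hit = any(p == "*" or re.search(p, url) for p in patterns)
        # A ...NoMatch directive applies when none of its patterns match.
        applies = (not hit) if directive.endswith("NoMatch") else hit
        if applies:
            return directive.startswith("Allow")
    return default


with_allow = [
    ("Allow", [r"\.htm$"]),
    ("Disallow", ["/cgi-bin/"]),
    ("Allow", ["*"]),
]
with_nomatch = [
    ("DisallowNoMatch", [r"\.htm$"]),
    ("Disallow", ["/cgi-bin/"]),
    ("Allow", ["*"]),
]

for url in (
    "http://www.example.org/doc.htm",
    "http://www.example.org/doc.pdf",
    "http://www.example.org/cgi-bin/page.htm",
):
    print(url, decide(with_allow, url), decide(with_nomatch, url))
```

Under these assumptions the two are not the same: Allow \.htm$ accepts a matching URL at once and leaves everything else to later rules, while DisallowNoMatch \.htm$ rejects every non-matching URL at once and leaves matching URLs to later rules.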
Regards Frank Elsner
#-------------------------------------------------------#
Dipl.-Math. Frank Elsner
Universitaet Osnabrueck (University of Osnabrueck)
- Rechenzentrum - (Computing Center)
Albrechtstrasse 28, AVZ
D-49076 Osnabrueck
Deutschland (Germany)
Tel. (Phone): ++49 (0)541/969-2343 Fax: -2470
E-Mail: [EMAIL PROTECTED]
#-------------------------------------------------------#