Hello,
my index process does not follow some of the links on a web page.
The relevant configuration lines are, for instance:
MaxHops 1000
MaxDocsPerServer 3500
Server http://www.webpage.com/as/
When I start index, it only displays:
...
Loading configuration from /usr/local/aspseek/etc/aspseek.conf
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/as/robots.txt
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/as/index.html
Ended thread: 0. Start: 1039083134.396. End: 1039083134.663-1039083134.669.
Duration: 0.267. URL: http://www.webpage.com/as/
Ended thread: 1. Start: 0.000. End: 0.000- 0.000.
Duration: 0.000. URL:
Saving real-time database ... done.
...
Here are the links in the index file:
<li><a href="2001/v34/n1/index.html">Vol. 34, no 1 (2001)</a></li>
<li><a href="2000/v33/n2/index.html">Vol. 33, no 2 (2000)</a></li>
<li><a href="2000/v33/n1/index.html">Vol. 33, no 1 (2000)</a></li>
<li><a href="1999/v32/n2/index.html">Vol. 32, no 2 (1999)</a></li>
<li><a href="1999/v32/n1/index.html">Vol. 32, no 1 (1999)</a></li>
These links should be followed and the corresponding web pages should be
indexed, shouldn't they?
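To make that expectation concrete, here is a minimal Python sketch that resolves two of the relative links against the start URL from the log and checks them against the Server prefix. It uses Python's standard URL resolution; whether ASPseek resolves links and matches Server prefixes in exactly the same way is an assumption, not something verified here.

```python
from urllib.parse import urljoin

# Start URL and Server prefix copied from the log and configuration above.
base = "http://www.webpage.com/as/index.html"
server_prefix = "http://www.webpage.com/as/"

# Relative links copied from the index page.
hrefs = [
    "2001/v34/n1/index.html",
    "2000/v33/n2/index.html",
]

# Resolve each relative link against the page it was found on.
resolved = [urljoin(base, h) for h in hrefs]

for url in resolved:
    # True means the resolved URL falls under the Server prefix.
    print(url, url.startswith(server_prefix))
```

Under standard resolution, both links end up below http://www.webpage.com/as/, so the Server line by itself should not exclude them.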
The robots.txt file is as follows:
#### Generated by a proxy - DeleGate/7.9.10 by [EMAIL PROTECTED]
User-agent: *
Disallow: /-_-
Disallow: /=@=
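As a rough cross-check, the following Python sketch feeds these rules to the standard library robots.txt parser and asks whether one of the volume pages may be fetched. The user-agent string "index" is a hypothetical placeholder, and Python's parser is only a stand-in: ASPseek's own robots.txt handling may differ.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt shown above (comment line omitted).
rules = [
    "User-agent: *",
    "Disallow: /-_-",
    "Disallow: /=@=",
]

parser = RobotFileParser()
parser.parse(rules)

# One of the links that is not being followed, resolved to an absolute URL.
page = "http://www.webpage.com/as/2001/v34/n1/index.html"

# "index" is a placeholder user-agent name, not necessarily what ASPseek sends.
allowed = parser.can_fetch("index", page)
blocked = parser.can_fetch("index", "http://www.webpage.com/-_-/anything")

print("volume page allowed:", allowed)
print("path under /-_- allowed:", blocked)
```

According to this parser, neither Disallow rule matches paths under /as/, so these rules alone would not explain the skipped links.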
I also tested another section of the same web site:
MaxHops 1000
MaxDocsPerServer 3500
Server http://www.webpage.com/circuit/
The index file is quite similar. Here are the links in the index file:
<li><a href="2001/v11/n3/index.html">Vol. 11, no 3 (2001)</a></li>
<li><a href="2000/v11/n2/index.html">Vol. 11, no 2 (2000)</a></li>
<li><a href="2000/v11/n1/index.html">Vol. 11, no 1 (2000)</a></li>
<li><a href="1999/v10/n2/index.html">Vol. 10, no 2 (1999)</a></li>
<li><a href="1999/v10/n1/index.html">Vol. 10, no 1 (1999)</a></li>
and robots.txt is the same.
But here index follows and indexes all the links, so this one works:
Loading configuration from /usr/local/aspseek/etc/aspseek.conf
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/circuit/robots.txt
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/circuit/
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/circuit/2001/v11/n3/index.html
( 0 1 1 1 0 0 0 2) Adding URL: http://www.webpage.com/circuit//2000/v11/n2/index.html
( 0 1 1 1 0 0 0 2) Adding URL: http://www.web.ca/revues/revues.html
No "Server" command for URL http://www.web.ca/revues/revues.html - deleted.
( 0 1 1 0 0 0 0 2) Adding URL: http://www.webpage.com/circuit/2000/v11/n1/index.html
Is something wrong?
What is the difference between the two cases?
Has anyone had the same problem before?
I tried to find something similar in the mailing list archive but couldn't
find anything :(
I hope this is clear.
Thanks a lot,
Luc.
