start_url: http://localhost/render.php3?op=1&oid=1&type=1&as=5
limit_urls_to: http://localhost/
exclude_urls: /cgi-bin/ .cgi .css
so I run ./htdig -vsi
and I get:
New server: localhost, 80
0:0:0:http://localhost/render.php3?op=1&oid=1&type=1&as=5: -+++++--+***++ size = 9168
1:1:1:http://localhost/: +* size = 328
2:2:1:http://localhost/usertools/: not found
3:3:1:http://localhost/search/help.html: -- size = 4654
4:4:1:http://localhost/about.html: -*------------------- size = 11749
5:5:1:http://localhost/license.html: -+*--***----- size = 15721
6:6:1:http://localhost/kb/render.php3?op=1=1=1=AdvancedSearch: size = 620
7:7:1:http://localhost/kb/render.php3?op=1.3=3=3=5.1: size = 63
8:8:1:http://localhost/kb/render.php3?op=1.4=4=3=5.1: size = 63
9:9:2:http://localhost/render.php3?op=1=1=1=5: size = 62
10:10:2:http://localhost/attribution.html: --- size = 1340
htdig: Run complete
htdig: 1 server seen:
htdig: localhost:80 11 documents
htdig: Errors to take note of:
Not found: http://localhost/usertools/ Ref: http://localhost/render.php3?op=1&oid=1&type=1&as=5
The error is fine, but what's really weird is all the URLs that have all
those "=" in them where it doesn't make sense. What I've figured out is
that htdig is dropping the CGI variable names (and the "&" in front of
them) in the URLs: op=1&oid=1&type=1&as=5 comes out as op=1=1=1=5. It's
also failing to index the entire site (which I assume has to do with the
varname issue).
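
A quick way to sanity-check that guess is sketched below in Python. It is
not htdig code; it only assumes that every "&name" run (an ampersand
followed by letters) gets thrown away, as might happen if something
mistook it for an unknown HTML entity. The function name and the regex
are made up for illustration.

```python
import re

# Assumption (not confirmed from htdig's source): each "&name" run is
# discarded, leaving only the "=value" part that followed it.
AMP_NAME = re.compile(r"&[A-Za-z]+")


def strip_amp_names(url: str) -> str:
    """Remove every '&name' sequence from a URL, keeping what follows it."""
    return AMP_NAME.sub("", url)


start_url = "http://localhost/render.php3?op=1&oid=1&type=1&as=5"
mangled = strip_amp_names(start_url)
print(mangled)  # http://localhost/render.php3?op=1=1=1=5
```

The result is the same URL that shows up in the 9:9:2 entry of the
output, so the guess fits what htdig logged, though it doesn't show
where in htdig the stripping would actually happen.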
The other odd thing is that the size htdig is reporting for pages is too
small. Those pages aren't 62-63 bytes long, they're close to 10K bytes
long.
Thoughts anyone???
--
Aaron Turner, Core Developer http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization http://linuxkb.org/
Because world domination requires quality open documentation.