That's the XML over. I've got a few more random things about searching in general, mostly theoretical stuff: how do people index truly dynamic sites? Here comes a mostly made-up example, but it illustrates my point quite well. WARNING: Perl and Bourne shell used in fair quantity in this e-mail.

First off, a really really simple CGI script:

<snip>
#!/bin/sh
# printf rather than echo -n, so the blank line after the header is portable
printf "Content-type: text/plain\n\n"
exec fortune
</snip>

And that's it. It prints out a random fortune cookie. How do you add this to the search engine? You'd only index on the words in the cookie at the time. Useless. You'd be _very_ unlikely to be able to search for the word "cookie". There's obviously something like my XML interface from the last e-mail, but that doesn't use the spider, so it isn't quite as unified as it could be. You could do it as an HTML document with meta tags, something more like:

<snip>
#!/opt/bin/perl
use CGI;

my $q = new CGI;
print $q->header;
print $q->start_html;
print "<META NAME=\"DESCRIPTION\" CONTENT=\"This produces nothing but fortune cookies\">\n";
print "<META NAME=\"KEYWORDS\" CONTENT=\"fortune,cookie\">\n";
open FORTUNE, "fortune |" or die "can't run fortune: $!";
while (<FORTUNE>) {
    # HTML-escape the cookie, and turn leading whitespace into
    # &nbsp; so the indentation survives
    s/&/&amp;/go;
    s/</&lt;/go;
    s/>/&gt;/go;
    s/\s+$//go;
    s{^(\s+)}{'&nbsp;' x length($1)}gem;
    print $_."<BR>\n";
}
close FORTUNE;
print $q->end_html;
exit 0;
</snip>

Right. But surely that produces _far_ more heavyweight HTML than is necessary? You're doubling the size of the page for a lot of the cookies you might see. And the spider still sees the fortune cookie itself and indexes it; if there were offensive words in there, that'd be _really_ bad. I know, put it between <!--UdmComment--> and <!--/UdmComment--> tags (or whatever it is exactly), but that's just making the HTML bigger again. And I know we're only talking a small amount of data in the grand scheme of things, but it does translate to bigger problems. So, this is actually what my fortune cookie program currently is:

<snip>
#!/opt/bin/perl
use CGI;

my $q = new CGI;
print $q->header;
print $q->start_html(-title   => "Second Spider Cloaking Test",
                     -BGCOLOR => '#FFFFCC');
if ( $ENV{"HTTP_USER_AGENT"} =~ /UdmSearch/ ) {
    # The spider only ever sees the metadata
    print "<META NAME=\"DESCRIPTION\" CONTENT=\"This produces nothing but fortune cookies, and UDM has indexed on some keywords, but they're not in the page if you go to it\">";
    print "<META NAME=\"KEYWORDS\" CONTENT=\"chunky,kibbles,fortune,magic,cookie,machine\">";
} else {
    # Everyone else gets an actual cookie
    open FORTUNE, "/home/gbriggs/bin/fortune |" or die "can't run fortune: $!";
    while (<FORTUNE>) {
        s/&/&amp;/go;
        s/</&lt;/go;
        s/>/&gt;/go;
        s/\s+$//go;
        s{^(\s+)}{'&nbsp;' x length($1)}gem;
        print $_."<BR>\n";
    }
    close FORTUNE;
}
print $q->end_html;
exit 0;
</snip>

It detects the user agent and only shows it things I want it to see (there's a quick way to check what the spider is served in the P.S. at the bottom). Interestingly, this has other uses; you can show it scraps of HTML of the form

<A HREF="somewhere">Crossword</A>

and it'll know to go there. Interesting way of showing it things.

And we're onto another topic: seed pages. How do you index, for example, all the unix manpages? This is something else that I find to be of practical use. I have a short CGI script, man.cgi, from here:

http://www.oac.uci.edu/indiv/ehood/man2html.html

It's nice and configurable, looks fairly reasonable, and does what it's meant to do. But how would you tell mnogosearch to index every manpage? It'd get mighty boring mighty quickly if you had to run "./indexer -u long-url" for every manpage; there are approximately 9600 of them on the host I run mnogo on. So, instead, I needed a "seed page" to point the indexer at. Well.
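Once that page exists, feeding it to the spider should presumably just be a matter of an indexer.conf entry, something along these lines -- I'm quoting the directive from memory, and the hostname and path are made up, so adjust for your own setup:

<snip>
# hypothetical indexer.conf fragment
Server http://myhost/manpages.html
</snip>

followed by a plain ./indexer run. Anyway, here's what generates the page: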
<snip>
#!/bin/sh
# Build a seed page listing every manpage on the system; the robots
# meta tag keeps the seed page itself out of the index.
echo "<HTML><BODY>" > ./manpages.html
echo "<meta name=\"robots\" content=\"noindex,follow\">" >> ./manpages.html
(for d in /home/gbriggs/man /opt/man /usr/man /usr/openwin/man \
          /usr/dt/man /opt/SUNWspro/man /opt/gnu/man; do
    find $d -type f
done) | sed 's/^.*\///g' | sort -u | \
    awk '{print "<A HREF=\"/cgi-bin/manpages/man.cgi?section=all&topic="$1"\">"$1"</A>";}' >> ./manpages.html
echo "</BODY></HTML>" >> ./manpages.html
</snip>

And then it'll index all the manpages, but will leave that one page out, thanks to the robots meta tag. And yes, I know my code could be improved greatly, but I needed something and I needed it fast [at the time]. Anyone else's experience on this or other similar things?

Thank-you very much,
Gary (-;
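P.S. If anyone wants to verify what the cloaking script above actually serves, the obvious test is to fetch the page twice with different user agents. A rough sketch, assuming LWP is installed -- the URL here is made up, so point it at wherever your copy lives:

<snip>
#!/opt/bin/perl
# Fetch the cloaked page once as a browser and once as the spider,
# and print both responses for comparison.
use LWP::UserAgent;
use HTTP::Request;

my $url = "http://myhost/cgi-bin/fortune.cgi";   # made-up URL

# Any agent string matching /UdmSearch/ triggers the cloaked branch
for my $agent ("Mozilla/4.0", "UdmSearch/3.1") {
    my $ua = LWP::UserAgent->new;
    $ua->agent($agent);
    my $res = $ua->request(HTTP::Request->new(GET => $url));
    print "--- Fetched as $agent ---\n";
    print $res->content, "\n";
}
</snip>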