A temporary "pre-mercial" advertisement, the lone chunk of JavaScript you saw, was additionally complicating matters (darned advertisements...)
I've added these attributes
noindex_start: <SCRIPT
noindex_end: </SCRIPT>
to the htdig.conf file, and it works splendidly. Many thanks!
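To illustrate what those two attributes do, here is a rough Python sketch. It is not htdig's actual parser: the function name strip_noindex is hypothetical, and the case-insensitive matching is an assumption. It drops everything from the noindex_start string through the next noindex_end string before any text would be indexed:

```python
import re


def strip_noindex(html: str, start: str = "<SCRIPT", end: str = "</SCRIPT>") -> str:
    """Replace each span from `start` through the next `end` with a space.

    Hypothetical approximation of htdig's noindex_start/noindex_end;
    case-insensitive matching is an assumption, not confirmed htdig behavior.
    An unterminated `start` is left untouched here (htdig may differ).
    """
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end),  # non-greedy: stop at the first end marker
        re.IGNORECASE | re.DOTALL,  # DOTALL lets a span cross line breaks
    )
    return pattern.sub(" ", html)


page = '<html><body><SCRIPT src="http://adserver.com/adscript"></SCRIPT></body></html>'
print(strip_noindex(page))
```

This also shows the point Gilles makes below: when a page's body is nothing but a script tag, nothing indexable is left once the script is excluded.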
--
Adam Powell
www.theonion.com
From: Gilles Detillieux <[EMAIL PROTECTED]>
Date: Wed, 13 Nov 2002 22:25:47 -0600 (CST)
To: [EMAIL PROTECTED] (Adam Powell)
Cc: [EMAIL PROTECTED] (ht://Dig mailing list)
Subject: Re: [htdig] Only one page indexed/ noindex_start
According to Gilles Detillieux:
> According to Adam Powell:
> > Thanks, Jim. It's possible that JavaScript is confusing the parser. I can't
> > hide them as usual due to syntax.
> >
> > So now I'm thinking that I should employ these tags: noindex_start,
> > noindex_end
> >
> > Because the archives contain conflicting information on noindex, could anyone who
> > has used them please confirm that this is the correct usage:
> >
> > These tags, <!--htdig_noindex--> and <!--/htdig_noindex-->, surround the
> > script, thusly:
> >
> > <!--htdig_noindex-->
> > <script language="JavaScript" type="text/javascript"
> > src="http://adserver.com/adscript"></script>
> > <!--/htdig_noindex-->
>
> OK, if you don't change the values of noindex_start and noindex_end,
> then the above would work to exclude the JavaScript. However, htdig
> doesn't have problems with external JavaScript files - it's only in-line
> JavaScript which makes htdig choke. If you're using in-line JavaScript,
> see http://www.htdig.org/FAQ.html#q4.26, and you may also be interested
> in this patch: ftp://ftp.ccsf.org/htdig-patches/3.1.6/JavaScript.0
>
> > And these tags go in the htdig.conf file, like so:
> > noindex_start: <SCRIPT
> > noindex_end: </SCRIPT>
> >
> > Is that right?
>
> You'd make that change only if you don't want to change the HTML code as
> above, and you haven't applied the patch above either. Again, this should
> only apply to in-line JavaScript code.
>
> If you could give us a working URL to one of the pages that htdig isn't
> indexing correctly, we might be able to make a more educated guess as to
> what the problem may be.
Never mind the last point. I reread earlier messages, and saw that
http://www.theonion.com/ is your start_url. The problem is the whole
body of that page is just the JavaScript tag listed above. htdig doesn't
understand JavaScript. See http://www.htdig.org/FAQ.html#q5.18
JavaScript != HTML
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

