According to Gilles Detillieux:
> According to Adam Powell:
> > Thanks, Jim. It's possible that JavaScript is confusing the parser. I can't
> > hide them as usual due to syntax.
> > 
> > So now I'm thinking that I should employ these tags: noindex_start,
> > noindex_end 
> > 
> > Because the archives contain conflicting information on noindex, anyone who
> > has used them please confirm that this is the correct usage:
> > 
> > these tags, <!--htdig_noindex--> and <!--/htdig_noindex-->, surround the
> > script, thusly:
> > 
> > <!--htdig_noindex-->
> > <script language="JavaScript" type="text/javascript"
> > src="http://adserver.com/adscript"></script>
> > <!--/htdig_noindex-->
> 
> OK, if you don't change the values of noindex_start and noindex_end,
> then the above would work to exclude the JavaScript.  However, htdig
> doesn't have problems with external JavaScript files - it's only in-line
> JavaScript which makes htdig choke.  If you're using in-line JavaScript,
> see http://www.htdig.org/FAQ.html#q4.26, and you may also be interested
> in this patch:  ftp://ftp.ccsf.org/htdig-patches/3.1.6/JavaScript.0
> 
> > And these tags go in the htdig.conf file, like so:
> > noindex_start: <SCRIPT
> > noindex_end: </SCRIPT>
> > 
> > Is that right?
> 
> You'd make that change only if you don't want to change the HTML code as
> above, and you haven't applied the patch above either.  Again, this should
> only apply to in-line JavaScript code.
> 
> If you could give us a working URL to one of the pages that htdig isn't
> indexing correctly, we might be able to make a more educated guess as to
> what the problem may be.

Never mind the last point.  I reread the earlier messages and saw that
http://www.theonion.com/ is your start_url.  The problem is that the whole
body of that page is just the JavaScript tag listed above, so there is no
HTML text or links in it for htdig to index.  htdig doesn't understand
JavaScript.  See http://www.htdig.org/FAQ.html#q5.18

JavaScript != HTML
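
To pull the two approaches discussed above into one place, here is a rough
sketch of the htdig.conf side.  The attribute names and values are the ones
from the quoted messages; the comments are only my summary, and none of
this has been tested against your configuration.

```
# htdig.conf sketch: use ONE of the two options below, not both.

# Option 1: leave noindex_start and noindex_end unset here, and wrap
# the in-line script in the HTML itself with
#   <!--htdig_noindex-->  ...  <!--/htdig_noindex-->

# Option 2: leave the HTML untouched and have htdig skip everything
# from an opening script tag through its closing tag.
noindex_start: <SCRIPT
noindex_end: </SCRIPT>
```

Either option only keeps in-line script text out of the index.  Neither one
makes htdig index content that a script would generate in a browser.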

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

