Title: Re: [htdig] Only one page indexed/ noindex_start
Mr. Detillieux,

A temporary "pre-mercial" advertisement, the lone chunk of JavaScript you saw, was additionally complicating matters (darned advertisements...)

I've added these tags
noindex_start: <SCRIPT
noindex_end: </SCRIPT>

to the htdig.conf file, and it works splendidly. Many thanks!
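
For anyone finding this in the archives, here is a rough Python sketch of what that pair of settings amounts to: everything from each start marker through the next end marker is dropped before the text is indexed. This is only an illustration and not htdig's actual parser; the function name strip_noindex, the case-insensitive matching, and the handling of a missing end marker are all my assumptions.

```python
def strip_noindex(html, start="<SCRIPT", end="</SCRIPT>"):
    """Drop everything from each start marker through the next end marker.

    Hypothetical helper for illustration only. Matching is done
    case-insensitively here, which is an assumption about the indexer.
    """
    lower, s, e = html.lower(), start.lower(), end.lower()
    kept = []
    pos = 0
    while True:
        i = lower.find(s, pos)
        if i == -1:
            kept.append(html[pos:])   # no more start markers: keep the rest
            break
        kept.append(html[pos:i])      # keep the text before the marker
        j = lower.find(e, i + len(s))
        if j == -1:
            break                     # assumed: no end marker, skip to the end
        pos = j + len(e)              # resume just after the end marker
    return "".join(kept)


page = ('<p>Headline</p>'
        '<script type="text/javascript">var ad = 1;</script>'
        '<p>Story</p>')
print(strip_noindex(page))  # prints <p>Headline</p><p>Story</p>
```

The same function called with "<!--htdig_noindex-->" and "<!--/htdig_noindex-->" as the markers mimics the comment-based approach discussed below.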

--
Adam Powell
www.theonion.com

From: Gilles Detillieux <[EMAIL PROTECTED]>
Date: Wed, 13 Nov 2002 22:25:47 -0600 (CST)
To: [EMAIL PROTECTED] (Adam Powell)
Cc: [EMAIL PROTECTED] (ht://Dig mailing list)
Subject: Re: [htdig] Only one page indexed/ noindex_start


According to Gilles Detillieux:
> According to Adam Powell:
> > Thanks, Jim. It's possible that JavaScript is confusing the parser. I can't
> > hide them as usual due to syntax.
> >
> > So now I'm thinking that I should employ these tags: noindex_start,
> > noindex_end
> >
> > Because the archives contain conflicting information on noindex, anyone who
> > has used them please confirm that this is the correct usage:
> >
> > these tags, <!--htdig_noindex--> and <!--/htdig_noindex-->   surround the
> > script, thusly:
> >
> > <!--htdig_noindex-->
> > <script language="JavaScript" type="text/javascript"
> > src="http://adserver.com/adscript"></script>
> > <!--/htdig_noindex-->
>
> OK, if you don't change the values of noindex_start and noindex_end,
> then the above would work to exclude the JavaScript.  However, htdig
> doesn't have problems with external JavaScript files - it's only in-line
> JavaScript which makes htdig choke.  If you're using in-line JavaScript,
> see http://www.htdig.org/FAQ.html#q4.26, and you may also be interested
> in this patch:  ftp://ftp.ccsf.org/htdig-patches/3.1.6/JavaScript.0
>
> > And these tags go in the htdig.conf file, like so:
> > noindex_start: <SCRIPT
> > noindex_end: </SCRIPT>
> >
> > Is that right?
>
> You'd make that change only if you don't want to change the HTML code as
> above, and you haven't applied the patch above either.  Again, this should
> only apply to in-line JavaScript code.
>
> If you could give us a working URL to one of the pages that htdig isn't
> indexing correctly, we might be able to make a more educated guess as to
> what the problem may be.

Never mind the last point.  I reread earlier messages, and saw that
http://www.theonion.com/ is your start_url.  The problem is the whole
body of that page is just the JavaScript tag listed above.  htdig doesn't
understand JavaScript.  See http://www.htdig.org/FAQ.html#q5.18

JavaScript != HTML

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
