Colin Viebrock wrote:
> 
> Thus spake Geoff Hutchison (at 01:37 PM 9/17/98 -0400) ...
> >I guess the "problem" is this: ht://Dig interprets JavaScript in HTML
> >files as text. So if we can take the code Muffin uses to strip JavaScript
> >and add it to a "remove JavaScript" pass over the HTML files before
> >ht://Dig begins the real indexing, we'd be set.
> 
> What about the "problem" of people using JS to pop up windows and other
> URLs and such?  If you simply strip all the JS code from a document, you'll
> lose these links (and the info in them).

And your problem with this is....   :-)   (Did I mention I don't like
JavaScript?)
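
Not that I'd shed a tear, but to be fair: a strip pass doesn't have to
throw the links away entirely.  Something like this (Python, purely
illustrative; the regexes and names are mine, not Muffin's) could salvage
literal URLs out of the script blocks before removing them:

import re

# Match <script>...</script> blocks, case-insensitively, across lines.
SCRIPT_RE = re.compile(r'<script\b[^>]*>.*?</script\s*>',
                       re.IGNORECASE | re.DOTALL)
# Crude heuristic for quoted string literals that look like absolute
# URLs or relative .html paths.
URL_RE = re.compile(r'["\'](https?://[^"\']+|[^"\':]+\.html?)["\']')

def strip_scripts(html):
    """Return (cleaned_html, urls): the HTML with all <script> blocks
    removed, plus any literal URLs salvaged from those blocks."""
    urls = []
    for block in SCRIPT_RE.findall(html):
        urls.extend(URL_RE.findall(block))
    return SCRIPT_RE.sub('', html), urls

Feed each page through that before the indexer sees it; the salvaged URLs
could be appended as ordinary <a href> links so at least the window.open()
targets aren't lost.  It only catches URLs that appear as literals,
though, which brings us to your next point.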

> And I haven't even mentioned JS that creates URL references on the fly, or
> based on other variables.  Good luck coding a parser for that!

Exactly.  This is definitely non-trivial.
For this reason, I don't know of a single search engine that can find any
page at http://www.htmlguru.com/ other than the front page...
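
Here's a two-line demonstration of why.  Given a script like this (the
fragment is invented, but it's the sort of thing Colin means), a
literal-URL scan finds nothing, because the URL doesn't exist until the
code actually runs:

import re

script = '''
var base = "/section" + Math.floor(Math.random() * 4);
location.href = base + "/page" + n + ".html";
'''

print(re.findall(r'["\'](https?://[^"\']+|/[^"\']+\.html)["\']', script))
# -> []  ("/section", "/page" and ".html" are only fragments)

To recover that link you'd have to evaluate the expression, with whatever
`n` happens to be at the time, and at that point you're not parsing any
more, you're executing.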

> The only complete solution I can see is to write a program that emulates a
> browser and follows every possible link, button, image map, etc. possible
> from that page.

There is that GPL'd JavaScript interpreter...   Believe me, I've thought
about it...
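
The idea would be to bind stub objects into the interpreter and just
record what the scripts try to do, instead of doing it.  A sketch of the
recording side (Python again; the binding to an actual JS engine is the
hypothetical part, so I've faked the evaluation step by hand):

class LinkRecorder:
    """Stand-in for the browser's window/location objects: instead of
    navigating, record every URL the script tries to reach."""

    def __init__(self):
        self.urls = []

    def open(self, url, *args):          # what window.open() binds to
        self.urls.append(url)

    def set_location(self, url):         # assignment to location.href
        self.urls.append(url)

recorder = LinkRecorder()

# A real embedded interpreter would evaluate the page's scripts with
# `recorder` wired in as window/location.  Faking that step by hand:
recorder.open("popup.html")                    # window.open(...)
recorder.set_location("/section2/page5.html")  # location.href = ...

print(recorder.urls)   # feed these back into the dig queue

The binding is the whole trick; everything after that is bookkeeping.
Whether it's worth dragging a JS interpreter into htdig is another
question.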

> [or do the digging on the server side ... but then what URL do you present
> to the user?]

Yup.
Just say "no" to JavaScript.  :-)

P.S.: The best part of all this JavaScript stuff is that marketing usually
wants all the fancy tricks on their web pages, but they *also* want all
their pages found by every search engine.  Try explaining that to them.
(What?  Me bitter?  Ha!)
-- 
Andrew Scherpbier <[EMAIL PROTECTED]>
Contigo Software <http://www.contigo.com/>