According to Cam Proctor:
> > Please think about this a little more.  In essence you will have to
> > proxy/filter the retrieved HTML page.  This means that the base URL will
> > be different so all relative links and URLs for any components
> > referenced from that page (images, activeX controls, applets, etc.) will
> > have to be modified.  The proxy will have to interpret the HTML and
> > modify the right tags.
> > Now think about the problems with CSS, Layers, Javascript, Frames,
> > etc...
> 
> 
> For the project this will be used in (at least one part of it), there
> will be a set of files (pure HTML, no scripts) that will be indexed
> (about 1.5 GB of data currently).  These files will be used only
> for this search engine.  This particular instance should be OK for this
> solution (once I get the spaces thing working right).

For pure HTML, a <base> tag should handle the problem with relative hrefs
(and relative image sources too).  It should be easy to generate this from
the document's URL and insert it into the output stream at the appropriate
spot (right after the <head> tag, I think).
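
As a rough illustration of that idea, here is a minimal Python sketch.
It is not ht://Dig code; the function name insert_base_tag and the
fallback for pages without a <head> tag are my own assumptions.  It
only shows building a <base> tag from the document's URL and splicing
it in right after the opening <head> tag.

```python
import html
import re

# Matches an opening <head> tag, with or without attributes, in any case.
# The \b keeps the pattern from matching tags such as <header>.
HEAD_TAG = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def insert_base_tag(page: str, doc_url: str) -> str:
    """Return page with a <base href=...> for doc_url added (hypothetical helper)."""
    # Escape &, <, > and quotes so the URL is safe inside the attribute value.
    base = '<base href="%s">' % html.escape(doc_url, quote=True)
    match = HEAD_TAG.search(page)
    if match is None:
        # Assumption: with no <head> tag, put the <base> tag at the very top.
        return base + "\n" + page
    end = match.end()
    return page[:end] + "\n" + base + page[end:]
```

With this in the served copy, a browser should resolve a link such as
href="next.html" against the original document's URL rather than the
URL of the proxy or CGI that delivered the page.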

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930