According to Ace Suares:
> But your answer confuses me !
> 
> > > 
> > > url_rewrite_rules:      https://(.*) http://\\1
> > > search_rewrite_rules:  http://(.*) https://\\1
> > ...
> > > It works one way around (I am using local files, but with https, I 
> > > changed that to http as per one of your mails)
> > > but in the search results, stuff doesn't get translated back!
> > 
> > In 3.1.x versions, htdig only allows http:// URLs, and it checks for that
> > well before it applies url_rewrite_rules, so it would never get around
> > to processing your https URLs and changing them to http.  However, if
> > your files are all accessible via HTTP, then you don't need to use the
> > url_rewrite_rules line.  Just use http:// URLs in start_url, and then
> > use search_rewrite_rules to rewrite them at search time to https:// URLs.
> 
> My pages are only accessible with https.

But you should be aware that url_rewrite_rules is applied to the URL
_before_ the document is fetched.  So, even if you could get htdig
3.1.6 to allow https:// URLs (which you can with a patch, see below),
with your rewrite rules above htdig would still be trying to fetch the
documents using HTTP, not HTTPS.
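
A rough Python model of that ordering may make it clearer. This is a hypothetical sketch, not htdig's code. fetch_scheme() and its ssl_patch flag are made-up names; the flag stands in for having the SSL patch applied. www.example.com is a placeholder. The sketch also assumes that the "\\1" in the config file reaches the regex engine as the back-reference \1.

```python
import re
from typing import Optional

# Hypothetical model of htdig's indexing-time order of operations
# (not htdig source code). The rule is the one from the quoted config;
# the config file's "\\1" is assumed to mean the regex back-reference \1.
URL_REWRITE_RULES = [(r"https://(.*)", r"http://\1")]


def fetch_scheme(url: str, ssl_patch: bool = False) -> Optional[str]:
    """Return the scheme the document would be fetched with, or None if rejected."""
    allowed = {"http", "https"} if ssl_patch else {"http"}
    # The scheme check comes first, so an unpatched 3.1.x never gets
    # as far as rewriting an https:// URL.
    if url.split("://", 1)[0] not in allowed:
        return None
    # url_rewrite_rules is applied before the document is fetched.
    for pattern, replacement in URL_REWRITE_RULES:
        url = re.sub(pattern, replacement, url)
    return url.split("://", 1)[0]


print(fetch_scheme("https://www.example.com/index.html"))                  # None
print(fetch_scheme("https://www.example.com/index.html", ssl_patch=True))  # http
```

In this model, the document is never fetched over HTTPS with those rules, whether or not the patch is applied.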

> So, I decided to search them locally.

If all of your documents are reachable via local_urls, then it doesn't
matter if you "fake up" http:// URLs for them - it will grab them via
the local filesystem and at search time change the fake URLs using
your search_rewrite_rules.  But, local_urls is pretty strict about
what it allows.  See http://www.htdig.org/attrs.html#local_urls for
the allowed file types.  Also, you can't index "bare" directories,
i.e. without an index.html file, via local_urls.
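
Here is a hypothetical Python sketch of the "fake http:// URL" approach. A local_urls-style prefix mapping decides whether a URL can be read from the filesystem. The prefix, the directory, the local_path() name and ALLOWED_SUFFIXES are all illustrative assumptions, not htdig's real list of file types. The local_urls documentation has the authoritative list. The search-time rewrite back to https:// is a separate step and is not modelled here.

```python
from typing import Optional, Set

# Hypothetical model of a local_urls lookup (not htdig source code).
# The mapping mimics a "URL=path" pair in htdig.conf; values are made up.
LOCAL_URLS = {"http://www.example.com/": "/home/www/"}
ALLOWED_SUFFIXES = (".html", ".htm", ".txt")  # illustrative subset only


def local_path(url: str, existing_files: Set[str]) -> Optional[str]:
    """Map a URL to a local file, or return None if local_urls can't serve it."""
    for prefix, directory in LOCAL_URLS.items():
        if not url.startswith(prefix):
            continue
        path = directory + url[len(prefix):]
        if path.endswith("/"):
            # A "bare" directory only works if it has an index.html file.
            path += "index.html"
        if path.endswith(ALLOWED_SUFFIXES) and path in existing_files:
            return path
        return None
    return None  # URL not covered by local_urls at all


files = {"/home/www/a.html", "/home/www/docs/index.html"}
print(local_path("http://www.example.com/docs/", files))   # /home/www/docs/index.html
print(local_path("http://www.example.com/empty/", files))  # None
```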

> However, each URL in the documents I am searching contains https 
> URLs. I followed your answer found in google and use http in 
> start_url and search_rewrite_rules (but, probably, the other way 
> around: 
> search_rewrite_rules:  https://(.*) http://\\1

No, that would change https:// URLs, assuming you got htdig to accept
them, into http:// URLs at search time, so the links in search results
won't work if the pages are only accessible via https.
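
You can check the effect of each rule direction with Python's re module. Here re stands in for htdig's own regex handling, which is an approximation. The host name is a placeholder, and rewrite() is an illustrative helper.

```python
import re
from typing import Tuple

# Hypothetical illustration of the two search_rewrite_rules directions,
# using Python's re as an approximation of htdig's regex matching.
REVERSED_RULE = (r"https://(.*)", r"http://\1")   # the rule from the question
SUGGESTED_RULE = (r"http://(.*)", r"https://\1")  # http:// in the index -> https:// in results


def rewrite(url: str, rule: Tuple[str, str]) -> str:
    pattern, replacement = rule
    return re.sub(pattern, replacement, url)


# https:// URLs in the index become http:// links in the results.
print(rewrite("https://www.example.com/a.html", REVERSED_RULE))  # http://www.example.com/a.html
# Fake http:// URLs in the index become https:// links in the results.
print(rewrite("http://www.example.com/a.html", SUGGESTED_RULE))  # https://www.example.com/a.html
```

Note that, in this model, the reversed rule matches nothing when the index holds http:// URLs, so those URLs come out of htsearch unchanged.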

> The search and merge work fine. But the output of a search contains 
> http urls, not https urls. I thought that with url_rewrite_rules I 
> could convert them back, and that I didn't have to have two config
> files.

If by "The search and merge" you mean building the index with htdig and
htmerge, then I don't know how you got it to work fine with an unpatched
3.1.x version of htdig.  On the other hand, if you have applied the
ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.9 patch and got that working,
then you should be all set.  Of course, in this case, you don't want to
mess with any URL rewriting, either at indexing time or searching time,
as you want to stick to https:// URLs throughout both indexing and
(ht)search phases.

If you didn't apply that patch, then how did you get the "search and merge"
to work fine?

Note the distinction I'm drawing between indexing and searching above:
indexing is building the database, with htdig/htmerge, while searching
is querying the database with htsearch.  The "spidering" of documents
is part of indexing, so we don't refer to that as searching.  When you
refer to "search and merge", you seem to be implying the indexing phase,
so I just want to make sure I'm understanding you correctly.
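
The two phases can be modelled as two functions, one per group of programs. This is a hypothetical sketch, not htdig's implementation. It assumes that the URL left after url_rewrite_rules is the one stored in the database, and index_phase() and search_phase() are illustrative names, not htdig APIs. The example shows the patched setup: neither rule list is used, so https:// URLs pass through both phases unchanged.

```python
import re
from typing import List, Tuple

# Hypothetical two-phase model: indexing (htdig/htmerge) stores URLs,
# searching (htsearch) rewrites the stored URLs for display.
Rule = Tuple[str, str]


def apply_rules(url: str, rules: List[Rule]) -> str:
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


def index_phase(start_urls: List[str], url_rewrite_rules: List[Rule]) -> List[str]:
    """htdig: rules are applied before fetching; the result is assumed to be stored."""
    return [apply_rules(u, url_rewrite_rules) for u in start_urls]


def search_phase(stored_urls: List[str], search_rewrite_rules: List[Rule]) -> List[str]:
    """htsearch: rules only change the URLs shown in search results."""
    return [apply_rules(u, search_rewrite_rules) for u in stored_urls]


# Patched htdig: https:// throughout, no rewriting in either phase.
stored = index_phase(["https://www.example.com/"], [])
print(search_phase(stored, []))  # ['https://www.example.com/']
```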

> If you feel this message has ripened enough, I would very much 
> appreciate your answer, to find out what is my mistake here.
> Htdig is great, like so many OSS projects, and I love to find out 
> this new way to use it.

This time around, it wasn't deliberate "ripening", which I don't do to
followups, even when they're off-list.  However, I got busy and it took
a while to get back to your message.  That's the benefit of keeping even
followup messages on the list - there's always a chance someone else
can answer you, or even clarify my answer, before I can get back to it.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
