Hi,
I am trying to improve my ht://Dig installation a little further, and I
have noticed that the parsing of Word and PDF docs on my site usually
(always?) gives them a $TITLE of "Word Document" or "PDF Document",
respectively.

This looks poor, and breaks my 'search for similar' function, which does
a new search based on that variable.

The fix would seem to be fairly simple - create appropriate
template_patterns as shown at the foot of
http://www.htdig.org/hts_selectors.html#template_patterns

My problem is that I am already using a set of domain-based patterns so
that I can differentiate between pages from my sites, and pages from
'third party' sites. What seems to happen is that the very short file
extension strings are being overridden by the more verbose domain
names, since every URL must match one of my existing patterns.

For example:

http://my.domain.one ${common_dir}/web.html  /

http://my.domain.two ${common_dir}/web.html  /

http:// ${common_dir}/external.html  /

.pdf  ${common_dir}/web_pdf.html
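
The suspected behaviour can be sketched as a toy model in Python. This is
only an illustration of the hypothesis above, not ht://Dig's actual
matching code: it assumes each pattern is a plain substring, that the
pattern whose match starts earliest in the URL wins, and that the longer
pattern wins a tie at the same position. The function name pick_template
and the sample URLs are made up for the example.

```python
# Toy model only. ASSUMPTION: patterns are plain substrings, the match that
# starts earliest in the URL wins, and the longest pattern wins a tie.
# This is a guess at the behaviour, not taken from the ht://Dig source.
PATTERNS = [
    ("http://my.domain.one", "web.html"),
    ("http://my.domain.two", "web.html"),
    ("http://", "external.html"),
    (".pdf", "web_pdf.html"),
]


def pick_template(url, patterns=PATTERNS):
    """Return the template for the hypothesised winning pattern, or None."""
    best_key = None
    best_template = None
    for pattern, template in patterns:
        pos = url.find(pattern)
        if pos == -1:
            continue  # pattern does not occur in this URL
        key = (pos, -len(pattern))  # earliest start first, then longest
        if best_key is None or key < best_key:
            best_key = key
            best_template = template
    return best_template


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise each rule.
    for url in (
        "http://my.domain.one/docs/report.pdf",
        "http://third.party.example/paper.pdf",
    ):
        print(url, "->", pick_template(url))
```

Under these assumptions the ".pdf" pattern can never win for an http URL,
because one of the domain patterns always matches at position 0. That
would explain the symptom described above.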


Does anyone have any ideas how to solve this, and can anyone confirm
what the matching process is when there is more than one possible match?

Are there any wildcards that I could use that would allow me to do:

http://my.domain.one_[Wildcard]_.pdf   ${common_dir}/web_pdf.html


Thanks in advance,
Mike

