Hi, I am trying to improve my Ht://dig installation a little further, and I have noticed that the parsing of Word and PDF docs on my site usually (always?) gives them a $TITLE of Word Document or PDF Document, respectively.
This looks poor, and breaks my 'search for similar' function, which does a new search based on that variable.

The fix would seem to be fairly simple - create appropriate template_patterns as shown at the foot of http://www.htdig.org/hts_selectors.html#template_patterns

My problem is that I am already using a set of domain-based patterns so that I can differentiate between pages from my sites and pages from 'third party' sites. What seems to happen is that the very short file extension strings are being overridden by the more verbose domain names, since every URL must match one of my existing patterns. For example:

    http://my.domain.one ${common_dir}/web.html \
    http://my.domain.two ${common_dir}/web.html \
    http:// ${common_dir}/external.html \
    .pdf ${common_dir}/web_pdf.html

Does anyone have any ideas how to solve this, and can anyone confirm what the matching process is when there is more than one possible match? Are there any wildcards that I could use that would allow me to do:

    http://my.domain.one_[Wildcard]_.pdf ${common_dir}/web_pdf.html

Thanks in advance,
Mike
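
P.S. To make the question about the matching process more concrete, here is a small Python sketch of two rules that would both explain what I am seeing. It is only a guess and is not taken from the ht://Dig source: the function names and the /docs/report.pdf path are made up for illustration, and I do not know which rule, if either, htsearch really applies.

```python
# Hypothetical model of template_patterns matching; NOT the ht://Dig code.
# Each entry is (substring pattern, template file), in the order listed above.
PATTERNS = [
    ("http://my.domain.one", "web.html"),
    ("http://my.domain.two", "web.html"),
    ("http://", "external.html"),
    (".pdf", "web_pdf.html"),
]


def first_in_list(url, patterns):
    """Guess 1: the first listed pattern found anywhere in the URL wins."""
    for pattern, template in patterns:
        if pattern in url:
            return template
    return None


def earliest_in_url(url, patterns):
    """Guess 2: the pattern matching nearest the start of the URL wins.

    Ties on position go to the pattern listed first.
    """
    best = None
    for pattern, template in patterns:
        pos = url.find(pattern)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, template)
    return best[1] if best else None


# Made-up example URL on one of my own domains.
url = "http://my.domain.one/docs/report.pdf"

# Same patterns, but with the .pdf entry moved to the front of the list.
pdf_first = [PATTERNS[3]] + PATTERNS[:3]

for name, rule in (("first_in_list", first_in_list),
                   ("earliest_in_url", earliest_in_url)):
    print(name, rule(url, PATTERNS), rule(url, pdf_first))
```

Under both guesses the PDF gets web.html with my current ordering, which matches what I observe. They only differ once .pdf is listed first: guess 1 would then pick web_pdf.html, while guess 2 would still pick web.html because the domain pattern matches at the very start of the URL. So if reordering alone does not help, a wildcard or some other mechanism would seem to be needed.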

