On Sun, 26 Nov 2006, Ichi Brown wrote:

> I have been testing htdig to archive a large set of pdf documents. The
> output I receive from the homegrown pdf-parsing script I've written has a
> legitimate <title> tag with the title we'd like to show.

Are you converting the PDF content to HTML and feeding that into htdig? If so, I would expect htdig to pick up the title without any extra effort on your part. However, this assumes that the output of your parser is valid HTML.

> however, whenever htdig runs its search results are typically shown as
> filename.pdf as the title on the main link.
>
> 0613_AIR_DOM_00_028_00.pdf <http://10.35.11.22/PagePDFs/Newsprint_Products/Lo_Res/2005/06_13_05/AIR/0613_AIR_DOM_00_028_00.pdf>
> is an example.
>
> When the title for that document the parser outputs isn't the filename. Is

It would be necessary to know more about the process you are using and what the script output looks like in order to say why this is happening.

> this a configuration issue? or is this something that can be changed pretty
> easily? I like the way HTDIG works thus far, but this is the biggest
> drawback i have with the package.

If you have not already done so, you might also want to take a look at htdig's external_parsers support.

http://www.htdig.org/attrs.html#external_parsers

This would probably allow you more control over the manner in which htdig indexes your documents.

Jim
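
P.S. As a rough illustration of the "valid HTML" point above, here is a minimal Python sketch of the last step a PDF-to-HTML converter might perform: wrapping the extracted title and text in a small, well-formed page with an escaped <title>. The function name build_html and the sample strings are hypothetical, and how the text gets out of the PDF is left to your own script. I am also assuming the converter would be hooked in through the external_parsers attribute documented at the URL above; check that page for the exact syntax and calling convention.

```python
import html


def build_html(title, body_text):
    """Wrap extracted text in a minimal, well-formed HTML page.

    Both arguments are plain strings. They are escaped so that
    characters such as '&' or '<' in the PDF text cannot break the markup.
    """
    # Treat blank-line-separated chunks of the extracted text as paragraphs.
    paragraphs = [p.strip() for p in body_text.split("\n\n") if p.strip()]
    body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
    return (
        "<html>\n"
        "<head>\n"
        "<title>{}</title>\n"
        "</head>\n"
        "<body>\n"
        "{}\n"
        "</body>\n"
        "</html>\n"
    ).format(html.escape(title), body)


# Hypothetical sample input; a real script would pass the title and text
# it extracted from the PDF and write the result to standard output.
page = build_html("Air & Sea Newsprint", "First paragraph.\n\nSecond paragraph.")
```

If the page your script emits looks like this and the result still shows filename.pdf as the title, the problem is more likely in how htdig is being fed the document than in the HTML itself.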

