Normally, a .txt file is served as "text/plain" and htdig will not need a converter
script, though you could provide one if you wished.
I think a parser that can extract META tags and a title from any arbitrary
text/plain page would be too difficult to write, but if you have a set of documents
in a common style then you might be able to do something.
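For illustration, here is a minimal sketch of such a converter for documents in a
common style. It assumes a hypothetical house style: the first non-blank line is the
title, and optional "Keywords:" or "Description:" lines right after it become META
tags. Adjust it to whatever convention your documents really follow. It reads the
file named on the command line and writes HTML to stdout; wiring it into htdig (via
the external_parsers attribute) is not shown.

```python
import html
import sys


def text_to_html(text: str) -> str:
    """Wrap a plain-text document in HTML, assuming a hypothetical house style:
    first non-blank line = title, then optional 'Keywords:' / 'Description:' lines."""
    lines = text.splitlines()
    # Skip leading blank lines; the first non-blank line is taken as the title.
    while lines and not lines[0].strip():
        lines.pop(0)
    title = lines.pop(0).strip() if lines else ""

    metas = []
    # Consume header lines of the form "Keywords: ..." or "Description: ...".
    while lines and ":" in lines[0]:
        name, _, value = lines[0].partition(":")
        if name.strip().lower() not in ("keywords", "description"):
            break
        metas.append((name.strip().lower(), value.strip()))
        lines.pop(0)

    head = [f"<title>{html.escape(title)}</title>"]
    for name, value in metas:
        head.append(f'<meta name="{name}" content="{html.escape(value)}">')
    # Everything left over is the body, kept preformatted and HTML-escaped.
    body = html.escape("\n".join(lines).strip())
    return ("<html><head>\n" + "\n".join(head) + "\n</head>\n"
            "<body><pre>\n" + body + "\n</pre></body></html>\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        sys.stdout.write(text_to_html(f.read()))
```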
.asp URLs are usually scripts that return "text/html", and their authors
should have provided sensible META tags and titles.
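If you want to check what an .asp page actually hands back, a small sketch using
Python's standard html.parser can list its title and named META tags. This is only a
checking aid, not how htdig itself parses pages; the names HeadInfo and head_info are
hypothetical.

```python
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. "description" or "keywords"
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_info(page: str):
    """Return (title, {meta name: content}) for an HTML string."""
    parser = HeadInfo()
    parser.feed(page)
    parser.close()
    return parser.title.strip(), parser.meta
```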
htdig does need a converter for Acrobat files (.pdf), and the xpdf package
is suitable. Xpdf includes code for converting .pdf files into HTML, but
(IMHO) the pdf2html.pl script included with doc2html does a better job
of META tags and title. pdf2html.pl uses pdfinfo and pdftotext from the
xpdf package.
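A minimal sketch of the same idea, assuming pdfinfo and pdftotext from xpdf are on
the PATH. It is not the pdf2html.pl code: the PDF Title becomes the HTML title, and
the mapping of the Subject, Keywords and Author fields onto META names is my own
assumption.

```python
import html
import subprocess
import sys


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def build_html(info: dict, text: str, fallback_title: str) -> str:
    """Build an HTML page: PDF Title -> <title>, selected fields -> META tags."""
    title = info.get("Title") or fallback_title
    head = [f"<title>{html.escape(title)}</title>"]
    # Assumed mapping of PDF document-info fields onto META names.
    for pdf_key, meta_name in (("Subject", "description"),
                               ("Keywords", "keywords"),
                               ("Author", "author")):
        if info.get(pdf_key):
            head.append(f'<meta name="{meta_name}" '
                        f'content="{html.escape(info[pdf_key])}">')
    return ("<html><head>\n" + "\n".join(head) + "\n</head>\n"
            "<body><pre>\n" + html.escape(text) + "\n</pre></body></html>\n")


def pdf_to_html(path: str) -> str:
    """Run xpdf's pdfinfo and pdftotext on path and combine their output."""
    def run(args):
        return subprocess.run(args, capture_output=True, text=True,
                              errors="replace", check=True).stdout
    info = parse_pdfinfo(run(["pdfinfo", path]))
    # "-" as the output file tells pdftotext to write the text to stdout.
    text = run(["pdftotext", path, "-"])
    return build_html(info, text, fallback_title=path)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(pdf_to_html(sys.argv[1]))
```

Falling back to the file name when the PDF has no Title field keeps search results
from showing an empty title.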
--
David Adams
Computing Services
Southampton University
----- Original Message -----
From: "Amélie Frenette" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, June 18, 2002 2:31 PM
Subject: [htdig] pdf, txt, asp, etc.
Hi,
I know that a parser can convert pdf, txt and asp files to HTML. For pdf and
txt files, is all the information transferred into the body of the HTML file?
If so, won't it be useless to set HT://Dig to consider META tags and titles?
Thanks for your support,
Amélie Frenette
Master's student
Information science
Specialization: Electronic information management
Université de Montréal
----------------------------------------------------------------------------
Bringing you mounds of caffeinated joy
>>> http://thinkgeek.com/sf <<<
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to
<[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html