According to Andoni Ayala:
> When I try to parse documents (PDF, WordPerfect, etc.) with
> parse_doc.pl, the script splits accented words in two. But if I parse
> the document directly with the particular parser (e.g. wp2html or
> pdftohtml), the accents display correctly.

Are you sure it's the parse_doc.pl script, and not htdig, that's splitting
the words?  Do you have your locale set correctly?  See

  http://www.htdig.org/FAQ.html#q4.9
  http://www.htdig.org/FAQ.html#q4.10
  http://www.htdig.org/FAQ.html#q5.8
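To illustrate why the locale matters, here is a minimal Python sketch. It is not htdig's actual code; it only mimics the general mechanism, assuming the document text is ISO-8859-1 (Latin-1) encoded. A word splitter that treats only ASCII letters as word characters (roughly what happens under the default "C" locale) cuts an accented word at the accented byte, while one that knows the character set keeps the word whole. The sample text and function names are hypothetical.

```python
import re

# Hypothetical sample: a Latin-1 (ISO-8859-1) encoded document fragment.
raw = "canción acentuada".encode("latin-1")


def split_c_locale(data: bytes) -> list[str]:
    """Mimic a word splitter in the "C" locale, where only ASCII letters
    and digits count as word characters."""
    # In a bytes pattern, [A-Za-z0-9] matches ASCII only, so the byte
    # 0xF3 ('ó' in Latin-1) acts as a word separator.
    return [w.decode("ascii") for w in re.findall(rb"[A-Za-z0-9]+", data)]


def split_latin1_locale(data: bytes) -> list[str]:
    """Mimic a word splitter in an ISO-8859-1 locale, where accented
    letters are word characters too."""
    text = data.decode("latin-1")
    # \w in a str pattern is Unicode-aware, so 'ó' counts as a letter.
    return re.findall(r"\w+", text)


print(split_c_locale(raw))       # ['canci', 'n', 'acentuada']
print(split_latin1_locale(raw))  # ['canción', 'acentuada']
```

If the split happens only when indexing through htdig, and not when the converter output is viewed directly, this kind of locale mismatch is the likely cause rather than the conversion script.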

You should probably also use an external converter, such as conv_doc.pl or,
better yet, doc2html, as you'll get better results than with parse_doc.pl.
The doc2html converter also makes it easier to add other conversion
programs.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
