Hi again, Jesse.  Any news on the catdoc testing at your end?

I tested it on my system today.  I grabbed catdoc 0.34 from the location
mentioned in the contrib/htparsedoc/parse_word_doc.pl script, and cleaned
up and customised that script.  It works fine.  When I run the parser
manually on the cec3wp6.doc you gave me, it does indeed spit out 8-bit
control characters, but these don't show up in db.wordlist.  It seems that
WordList::valid_word() would reject these "words" because they don't have
any alphabetic characters in them (isalpha() test).  So, all I can say is
I can't reproduce the error you reported.  If you can still cause the
latest snapshot to crash, please give us details of how you've set it up,
and which version of catdoc and parse_word_doc you're running.  (Mail me
the source if you can.)
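
To make the isalpha() point concrete, here is a minimal sketch of the kind
of check I mean.  This is not the actual WordList::valid_word() code; the
function name has_alpha_char() is made up for illustration, and it assumes
the default "C" locale, where bytes above 0x7f are not alphabetic.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the test described above; this is NOT the
// real WordList::valid_word() from htdig, just an illustration of it.
// Returns true if the word contains at least one alphabetic character.
bool has_alpha_char(const std::string &word)
{
    for (unsigned char c : word) {
        // Iterating as unsigned char keeps 8-bit bytes in the range
        // std::isalpha() accepts; in the "C" locale they are rejected.
        if (std::isalpha(c))
            return true;
    }
    return false;
}
```

With a check like this, a "word" made up only of 8-bit control characters
is rejected, while one that mixes such bytes with ordinary letters still
gets through.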

According to J. op den Brouw:
> 
> I was hoping I could wait for the final release, but what the
> heck; I will install it tonight on my Linux box at home...
> 
> Gilles Detillieux wrote:
> > 
> > According to J. op den Brouw:
> > > Gilles Detillieux wrote:
> > > > According to J. op den Brouw:
> > > > > Well, the web server sends you a mime-type back that
> > > > > is configured for the extension .doc. The server doesn't
> > > > > know what the contents are. WP docs should have
> > > > > extensions like .wp or .wp5 or .wp<whatever>
> > > > >
> > > (Snip a lot...)
> > >
> > > Here is a WP 6 file that has a .doc extension. Try to index it
> > > and you'll see (I hope) that htdig crashes because catdoc
> > > sends back 8-bit characters...
> > >
> > > http://www.st.hhs.nl/htdig/cec3wp6.doc
> > 
> > OK, I grabbed the file, but I haven't set up catdoc on my system
> > yet.  That's why I was hoping you'd test out my patched version of
> > ExternalParsers.cc for me.  :)  Your message doesn't make it clear
> > if htdig still crashes after the patch is applied.  If it does, I'd
> > gladly look into it further.  I don't spot anything in the code that
> > would blow up on 8-bit characters, but that doesn't mean testing won't
> > reveal something.
> > 
> > Just so I know I'm testing the same thing you are, which version of
> > catdoc & htparsedoc are you running, and where can I get it?  All I have
> > is the stuff in contrib/htparsedoc, from Sept. 7.
> > 
> > Also, if you can get a backtrace from a core dump when htdig crashes,
> > I'd like to see where it's happening.  I can try to reproduce the problem
> > here, but I'd like to know if what I try to find and fix is the same
> > problem you're running into - these things are sometimes system dependent.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930