According to J. op den Brouw:
> Well, the web server sends you back the MIME type that
> is configured for the extension .doc. The server doesn't
> know what the contents are. WP docs should have
> extensions like .wp or .wp5 or .wp<whatever>
>
> catdoc should complain if the file is not a Word file.
> In fact it does, but not always.
>
> On Fri, 29 Jan 1999, Geoff Hutchison wrote:
> > >Fourth, catdoc sometimes fails dramatically when a non-Word
> > >file ends with .doc and gets parsed by catdoc. It crashed
> > >htdig at my place...
> >
> > Hmm. So the file was sent with the incorrect mime-type? Is there a way we
> > can detect this easily?
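One inexpensive way to detect this, assuming the files of interest are Word 97-2003 binary documents: those are OLE2 compound documents, which begin with the fixed 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The helper below, `looks_like_ole2`, is hypothetical (it is not part of htdig or catdoc); a wrapper script or program could use a check like this to skip files that merely end in .doc before handing them to catdoc. It does not recognize older, pre-OLE2 Word formats.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` starts with the
// 8-byte OLE2 compound-document signature used by Word 97-2003 .doc files.
bool looks_like_ole2(const std::string& path)
{
    static const unsigned char magic[8] =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return false;                       // unreadable: treat as not Word

    unsigned char buf[8];
    in.read(reinterpret_cast<char*>(buf), sizeof buf);
    if (in.gcount() != static_cast<std::streamsize>(sizeof buf))
        return false;                       // too short to hold the signature

    for (std::size_t i = 0; i < sizeof buf; ++i)
        if (buf[i] != magic[i])
            return false;                   // signature mismatch
    return true;
}
```

A signature check only says the file is some OLE2 container (Excel and PowerPoint files share it), so catdoc still needs its own error handling; it just filters out the obvious mismatches cheaply.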
Improving the error checking in catdoc may be a solution, but the question
in my mind was "why is any external parser able to take htdig down with
it?" I took a look at htdig/ExternalParser.cc and found some of its
error checking to be less than bullet-proof. Some of the ands looked
funny: single &'s happen to work here, because each operand is a
comparison that yields 0 or 1, but the right operator in this context
is &&. I also added a lot of error checking for strtok's results.
First of all, I no longer assume strtok can be called again after it
has returned NULL, as that may be implementation dependent. I also made
sure every return value is checked for NULL before it is used.
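The strtok discipline described above can be sketched in isolation. `parse_word_line` and `WordRecord` are hypothetical names, and reading the last two fields of a "w" record as a location and a heading level is an assumption based on the two integer arguments passed to `retriever.got_word`. The chained NULL checks mirror the patched "w" case: strtok is called again only after it has returned a token.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical record for a "w" line: w<TAB>word<TAB>location<TAB>heading
// (the meaning of the two numeric fields is assumed).
struct WordRecord
{
    const char* word;
    int location;
    int heading;
};

// Parses a tab-separated "w" line in place (strtok modifies `line`).
// Returns false for malformed input instead of dereferencing NULL.
bool parse_word_line(char* line, WordRecord& rec)
{
    char* tag = std::strtok(line, "\t");
    if (tag == NULL || *tag != 'w')
        return false;                    // empty line or not a word record

    char* token1 = std::strtok(NULL, "\t");
    char* token2 = NULL;
    char* token3 = NULL;
    if (token1 != NULL)                  // only continue after a real token
        token2 = std::strtok(NULL, "\t");
    if (token2 != NULL)
        token3 = std::strtok(NULL, "\t");
    if (token1 == NULL || token2 == NULL || token3 == NULL)
        return false;                    // too few fields: report, don't crash

    rec.word = token1;
    rec.location = std::atoi(token2);
    rec.heading = std::atoi(token3);
    return true;
}
```

Because `token2` and `token3` start out as NULL, a short line simply falls through to the final check; no strtok call is ever made after one has already returned NULL.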
I don't use external parsers at my site, so could someone who uses them
give this patch a try, please? I'd especially like to know if this solves
the crashing problem reported by the person who started this thread.
(Sorry, I don't have the original message, so I don't recall who this was.)
This patch is against the 012799 snapshot. If you want to apply it to
3.1.0b4, the last two hunks will fail, because they touch the meta tag
code that was added after the 011499 snapshot.
--- ./htdig/ExternalParser.cc.nullchk Wed Jan 27 18:57:07 1999
+++ ./htdig/ExternalParser.cc Mon Feb 1 10:30:09 1999
@@ -151,13 +151,19 @@
while (readLine(input, line))
{
token1 = strtok(line, "\t");
+ if (token1 == NULL)
+ token1 = "";
+ token2 = NULL;
+ token3 = NULL;
switch (*token1)
{
case 'w': // word
token1 = strtok(0, "\t");
- token2 = strtok(0, "\t");
- token3 = strtok(0, "\t");
- if ( token1!=NULL & token2!=NULL & token3!=NULL )
+ if (token1 != NULL)
+ token2 = strtok(0, "\t");
+ if (token2 != NULL)
+ token3 = strtok(0, "\t");
+ if (token1 != NULL && token2 != NULL && token3 != NULL)
retriever.got_word(token1, atoi(token2), atoi(token3));
else
cerr<< "External parser error in line:"<<line<<"\n";
@@ -165,17 +171,20 @@
case 'u': // href
token1 = strtok(0, "\t");
- token2 = strtok(0, "\t");
- url.parse(token1);
- if (token1 != NULL & token2 != NULL )
+ if (token1 != NULL)
+ token2 = strtok(0, "\t");
+ if (token1 != NULL && token2 != NULL)
+ {
+ url.parse(token1);
retriever.got_href(url, token2);
+ }
else
cerr<< "External parser error in line:"<<line<<"\n";
break;
case 't': // title
token1 = strtok(0, "\t");
- if (token1 != NULL )
+ if (token1 != NULL)
retriever.got_title(token1);
else
cerr<< "External parser error in line:"<<line<<"\n";
@@ -183,7 +192,7 @@
case 'h': // head
token1 = strtok(0, "\t");
- if (token1 != NULL )
+ if (token1 != NULL)
retriever.got_head(token1);
else
cerr<< "External parser error in line:"<<line<<"\n";
@@ -204,7 +213,9 @@
else
cerr<< "External parser error in line:"<<line<<"\n";
break;
+
case 'm': // meta
+ {
// Using good_strtok means we can accept empty
// fields.
char *httpEquiv = good_strtok(token1+2, '\t');
@@ -315,6 +326,11 @@
}
else
cerr<< "External parser error in line:"<<line<<"\n";
+ break;
+ }
+
+ default:
+ cerr<< "External parser error in line:"<<line<<"\n";
break;
}
}
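The two structural changes in the patch, substituting an empty string when the first strtok call finds no token and adding a `default:` branch, can also be shown in a standalone sketch. `count_bad_records` is a hypothetical function, not htdig code; it accepts only the record types visible in the patch (w, u, t, h, m), which is an assumption, and it omits the per-type field checks. Declaring `token1` as `const char*` avoids assigning a string literal to a plain `char*`, a conversion deprecated in C++98 and removed in C++11.

```cpp
#include <cstring>
#include <iostream>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>
#include <vector>

// Hypothetical dispatcher modelled on the patched loop: one record per line,
// switch on the first character of the first tab-separated token, and report
// empty or unknown records instead of dereferencing a NULL pointer.
// Returns the number of rejected lines.
int count_bad_records(std::istream& input, std::ostream& err)
{
    int bad = 0;
    std::string text;
    while (std::getline(input, text))
    {
        std::vector<char> line(text.begin(), text.end());
        line.push_back('\0');                // strtok needs a writable C string

        const char* token1 = std::strtok(&line[0], "\t");
        if (token1 == NULL)
            token1 = "";                     // empty line: *token1 is '\0'

        switch (*token1)
        {
        case 'w': case 'u': case 't': case 'h': case 'm':
            break;                           // known types (field checks omitted)
        default:
            err << "External parser error in line:" << text << "\n";
            ++bad;
            break;
        }
    }
    return bad;
}
```

With the fallback to `""`, a blank line or one made only of tabs lands in `default:` like any other unrecognized record, so a misbehaving parser produces error messages rather than a crash.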
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.