According to Alexander Cohen:
> Ok, did a little more checking and found out the exact cause: it seems a null
> user-agent field in a robots.txt causes the seg fault:
>
> New server: <a server with a webmaster who doesn't know how to write a robots.txt>, 80
>
> - Persistent connections: enabled
> - HEAD before GET: disabled
> - Timeout: 30
> - Connection space: 0
> - Max Documents: -1
> - TCP retries: 1
> - TCP wait time: 5
> Trying to retrieve robots.txt file
> Parsing robots.txt file using myname = htdig
> Found 'user-agent' line:
> Pattern:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0 in ?? ()
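Judging from that output, the robots.txt on that server presumably
contains a User-agent field with an empty value, i.e. something along
these lines (a guess, since we haven't seen the actual file):

    User-agent:
    Disallow: /

which would explain the empty pattern printed just before the crash.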
Are you able to get a stack backtrace from the core dump using the
debugger, or can you run htdig under the debugger and get a backtrace
when it fails?
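Something like this, with the core dump (the paths and options here are
just placeholders for whatever you ran before):

    $ gdb htdig core
    (gdb) bt

or, running it live under the debugger:

    $ gdb htdig
    (gdb) run <same options as before>
    ... wait for the SIGSEGV ...
    (gdb) bt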
The HtRegex::set() method looks like it should deal sensibly with an empty
pattern, so I don't understand why this would lead to a segmentation
fault. A good stack backtrace should help narrow down exactly where
the problem happens, as long as the stack itself didn't get wiped out.
With the debugging output above, we don't know how much code was executed
between the output of the empty pattern and the segmentation fault, so
it may even be that the two events are unrelated.
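For what it's worth, here is a minimal sketch of the kind of guard I'd
expect in there. This is NOT the actual HtRegex code; the interface (a
set() method that compiles a POSIX regex) is just an assumption for
illustration:

    // Hypothetical illustration only -- not htdig's HtRegex. The point
    // is that an empty pattern is rejected up front, so a later match()
    // can never use a pattern that was never compiled.
    #include <regex.h>
    #include <string>

    class PatternMatcher {
        regex_t re;
        bool compiled;
    public:
        PatternMatcher() : compiled(false) { }
        ~PatternMatcher() { if (compiled) regfree(&re); }

        // Returns false, leaving the object in a safe state, for an
        // empty or malformed pattern, instead of crashing later.
        bool set(const std::string &pattern) {
            if (compiled) { regfree(&re); compiled = false; }
            if (pattern.empty())
                return false;
            compiled = (regcomp(&re, pattern.c_str(),
                                REG_EXTENDED | REG_ICASE | REG_NOSUB) == 0);
            return compiled;
        }

        bool match(const std::string &s) const {
            return compiled && regexec(&re, s.c_str(), 0, 0, 0) == 0;
        }
    };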
A user-agent line in robots.txt with no Disallow records following it
is valid, if I'm not mistaken, and htdig should allow that.
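For example, a record consisting of nothing but

    User-agent: htdig

with no Disallow fields after it should simply be parsed and tolerated,
not crash the parser.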
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general