OK, I have two problems people should be aware of. This is the first one, which seems
to be where it's barfing on the null string:
Program received signal SIGSEGV, Segmentation fault.
0x30001027fe8 in String::get (this=0x140004510) at String.cc:241
#0 0x30001027fe8 in String::get (this=0x140004510) at String.cc:241
#1 0x120039cb8 in main (ac=3, av=0x11ffffe08) at htdig.cc:298
Length seemed to be set to 65536 (a wraparound?) when that happens. For now I just
hacked String::get up to return a NULL string in that case, to see if I can get past it.
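The hack is roughly the following (just a sketch; "Length" and "Data" are my guesses
at the member names in String.h, so the real code will differ):

    char *String::get()
    {
        // Temporary hack: a length of 65536 looks like a wrapped counter,
        // so bail out with a null pointer instead of touching the buffer.
        // NOTE: "Length" and "Data" are assumed names, not the real ones.
        if (Length < 0 || Length >= 65536)
            return 0;
        Data[Length] = '\0';    // terminate before handing the buffer out
        return Data;
    }

With that hack in place I got a bit further, but then encountered another problem: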
Program received signal SIGSEGV, Segmentation fault.
0x3ff800d9300 in fclose () from /usr/shlib/libc.so
#0 0x3ff800d9300 in fclose () from /usr/shlib/libc.so
#1 0x12003a154 in main (ac=3, av=0x11ffffe08) at htdig.cc:346
which is apparently called as part of cleanup in htdig:
    //
    // Cleanup
    //
    if (urls_seen)
        fclose(urls_seen);
    if (images_seen)
        fclose(images_seen);
It seems fclose is seg faulting on a supposedly valid file handle. Commenting those
two fcloses out (yeah yeah, just trying to make it _run_ at all for now) lets it run
through htdig without crashing.
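Rather than just commenting them out, a more defensive cleanup might be worth trying;
this is only a sketch, on the guess that something ends up closing these handles twice:

    //
    // Cleanup -- NULL the handles after closing so a second pass
    // through this code can't fclose() them again
    //
    if (urls_seen)
    {
        fclose(urls_seen);
        urls_seen = NULL;
    }
    if (images_seen)
    {
        fclose(images_seen);
        images_seen = NULL;
    }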
However, even with those commented out, it didn't quite work yet:
Trying to retrieve robots.txt file
Unable to find the host: 0/robots.txt (port 80)
Program received signal SIGSEGV, Segmentation fault.
0x0 in ?? ()
That isn't exactly useful to backtrace. My earlier hacking may possibly have
introduced another problem; I've only started to look at this code and definitely
don't follow it all that much yet. Does someone with good knowledge of the codebase
have a clue as to what's causing any of these issues?
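For anyone who wants to poke at this: a frame of 0x0 in ?? () usually means a call
through a null function pointer or a clobbered return address, so gdb can't symbolize
it. You can still fish around, though; here's roughly how I'm running it (the config
path is just an example):

    $ gdb ./htdig
    (gdb) run -v -c /path/to/htdig.conf
    Program received signal SIGSEGV, Segmentation fault.
    0x0 in ?? ()
    (gdb) bt
    (gdb) info registers
    (gdb) x/8gx $sp

bt may still show the outer frames if only the innermost one is trashed, and the saved
return address in the registers or on the stack can hint at which function made the
bogus call.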
--- Alex
Gilles Detillieux wrote:
> According to Alexander Cohen:
> > Ok, did a little more checking and found the exact cause: it seems a null
> > user-agent field in a robots.txt causes the seg fault:
> >
> > New server: <a server with a webmaster who doesn't know how to write a robots.txt>, 80
> >
> > - Persistent connections: enabled
> > - HEAD before GET: disabled
> > - Timeout: 30
> > - Connection space: 0
> > - Max Documents: -1
> > - TCP retries: 1
> > - TCP wait time: 5
> > Trying to retrieve robots.txt file
> > Parsing robots.txt file using myname = htdig
> > Found 'user-agent' line:
> > Pattern:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0 in ?? ()
>
> Are you able to get a stack backtrace from the core dump using the
> debugger, or can you run htdig from the debugger and get a backtrace
> when it fails?
>
> The HtRegex::set() method looks like it should deal sensibly with an empty
> pattern, so I don't understand why this would lead to a segmentation
> fault. A good stack backtrace should help narrow down exactly where
> the problem happens, as long as the stack itself didn't get wiped out.
> With the debugging output above, we don't know how much code was executed
> between the outputting of the empty pattern and the segmentation fault,
> and it may be that the two events are unrelated.
>
> A user-agent line in robots.txt with no Disallow records following it
> is valid, if I'm not mistaken, and htdig should allow that.
>
> --
> Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
--
-----------------------------------
Alexander Cohen
La Trobe University - ITS
[EMAIL PROTECTED]
(03) 9479-5580
-----------------------------------
_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general