Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
Hi, is he by any chance using the multithreaded version? This looks very much like what happened when there was not enough memory for a thread: it crashes in exactly the same place.

On Thu, 01 Mar 2001 09:15:37 +0400 Alexander Barkov [EMAIL PROTECTED] wrote:

AB> OK. Please check also this:
AB>
AB>     print realsize
AB>     print *Doc
AB>
AB> Zenon Panoussis wrote:
AB> >> please run the following commands in gdb:
AB> >>
AB> >>     frame 1
AB> >>     print content_type
AB> >>     print Method
AB> >>     print Doc
AB> >>     print Doc->content
AB> >>     print Doc->url
AB> >
AB> > #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
AB> > 97        _CRC32_(crc, *p) ;
AB> > (gdb) frame 1
AB> > #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
AB> > 1150    crc32=UdmCRC32(Doc->content, (size_t)realsize);
AB> > (gdb) print content_type
AB> > $1 = 0x4021c027 "application/unknown"
AB> > (gdb) print Method
AB> > $2 = 1
AB> > (gdb) print Doc
AB> > $3 = (UDM_DOCUMENT *) 0x91ef7d8
AB> > (gdb) print Doc->content
AB> > $4 = 0x4021c03e ""
AB> > (gdb) print Doc->url
AB> > $5 = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/."
AB> >
AB> > See my (bounced) posting from [EMAIL PROTECTED] on
AB> > Tue, 27 Feb 2001 15:46:37 +0100 for details about this and
AB> > the other URLs that the indexer crashes on.

___
If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
A realsize of -1 means that there was an error while downloading the document. I found that indexer.c does not check for this. Please find a patch here:

http://gw.udmsearch.izhnet.ru/~bar/crc32.indexer.c.patch.gz

It should fix the crash.

Take a look into proto.c. UDM_NET_ERROR (it is -1) is returned in only two places:

1. in the open_host() function, when port is 0;
2. in the UdmHTTPGet() function, when select() returns an error.

I have no idea what is happening.

Zenon Panoussis wrote:
> Alexander Barkov wrote:
>> OK. Please check also this:
>>
>>     print realsize
>>     print *Doc
>
> (gdb) frame 1
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
> 1150    crc32=UdmCRC32(Doc->content, (size_t)realsize);
> (gdb) print realsize
> $1 = -1
> (gdb) print *Doc
> $2 = {url_id = 12018, status = 0, size = 0, rating = 0, order = 0,
>   referrer = 0, tag = 0, hops = 3, indexed = 0,
>   url = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/.",
>   content_type = 0x0, title = 0x0, keywords = 0x0, description = 0x0,
>   text = 0x0, category = 0x0, content = 0x4021c03e "",
>   last_mod_time = 0, last_index_time = 983253816, next_index_time = 0,
>   crc32 = 0}
>
> Z
>
> --
> oracle@everywhere: The ephemeral source of the eternal truth...
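The missing check described in the message above can be sketched roughly as follows. This is not the contents of the linked patch, which is not reproduced in this thread. It is a minimal illustration that assumes the download size arrives as a signed int, and every name in it (SKETCH_NET_ERROR, sketch_crc32, sketch_checksum_download) is hypothetical. The CRC routine is a plain bitwise CRC-32 used as a stand-in for UdmCRC32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UDM_NET_ERROR, which the thread says is -1. */
#define SKETCH_NET_ERROR (-1)

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320).
 * A stand-in for UdmCRC32, not the project's implementation. */
uint32_t sketch_crc32(const char *buf, size_t size)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        crc ^= (unsigned char)buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 0 and stores the checksum in *crc_out on success.
 * Returns -1, leaving *crc_out untouched, when realsize is negative,
 * so a failed download never reaches the (size_t) cast. */
int sketch_checksum_download(const char *content, int realsize,
                             uint32_t *crc_out)
{
    if (realsize < 0) {
        return -1;
    }
    *crc_out = sketch_crc32(content, (size_t)realsize);
    return 0;
}
```

The point of the guard is its position: the sign test happens while the value is still signed, before it is converted to size_t. A caller would treat a -1 return as "download failed" and skip the checksum and clone detection for that URL.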
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
I'm just not sure which version you are using. Is it here:

    }else{
        /* Unknown Content-Type */
        if(Method!=UDM_HEAD){
            crc32=UdmCRC32(Doc->content, (size_t)realsize);
            changed=!(crc32==Doc->crc32);
            if(CurSrv->use_clones){
                origin=UdmFindOrigin(Indexer, crc32, size);
                origin=((origin==Doc->url_id)?0:origin);
            }
        }
    }

Please run the following commands in gdb:

    frame 1
    print content_type
    print Method
    print Doc
    print Doc->content
    print Doc->url

Zenon Panoussis wrote:
> ./indexer -c [600 | 15000] segfaults on 3.1.11 patched (see my earlier
> postings in this thread for details)
>
> Segfault #1:
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 1181568253
> (gdb) print *p
> Cannot access memory at address 0x40499000
> (gdb) print p
> $2 = 0x40499000 <Address 0x40499000 out of bounds>
> (gdb) backtrace
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
> #2  0x804a050 in thread_main (arg=0x0) at main.c:256
> #3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
> #4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
>     init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
>     stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
>
> Segfault #2:
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 4285190670
> (gdb) print *p
> Cannot access memory at address 0x404d3000
> (gdb) print p
> $2 = 0x404d3000 <Address 0x404d3000 out of bounds>
> (gdb) backtrace
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x8094480, index_flags=4) at indexer.c:1150
> #2  0x804a050 in thread_main (arg=0x0) at main.c:256
> #3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
> #4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
>     init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
>     stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
>
> Segfault #3:
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 2724492306
> (gdb) print *p
> Cannot access memory at address 0x40432000
> (gdb) print p
> $2 = 0x40432000 <Address 0x40432000 out of bounds>
> (gdb) backtrace
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x8094480, index_flags=4) at indexer.c:1150
> #2  0x804a050 in thread_main (arg=0x0) at main.c:256
> #3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
> #4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
>     init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
>     stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
>
> Segfault #4:
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 2252292711
> (gdb) print *p
> Cannot access memory at address 0x40432000
> (gdb) print p
> $2 = 0x40432000 <Address 0x40432000 out of bounds>
> (gdb) backtrace
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
> #2  0x804a050 in thread_main (arg=0x0) at main.c:256
> #3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
> #4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
>     init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
>     stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
>
> Segfault #5:
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 879758289
> (gdb) print *p
> Cannot access memory at address 0x4054f000
> (gdb) print p
> $2 = 0x4054f000 <Address 0x4054f000 out of bounds>
> (gdb) backtrace
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
> #2  0x804a050 in thread_main (arg=0x0) at main.c:256
> #3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
> #4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
>     init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
>     stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
>
> Segfault #6:
> #0  0x80600ca in
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
Alexander Barkov wrote:
> I'm just not sure which version you are using.

3.1.11 with the add_url.3.1.11.diff patch only.

> Is it here:
>
>     }else{
>         /* Unknown Content-Type */
>         if(Method!=UDM_HEAD){
>             crc32=UdmCRC32(Doc->content, (size_t)realsize);
>             changed=!(crc32==Doc->crc32);
>             if(CurSrv->use_clones){
>                 origin=UdmFindOrigin(Indexer, crc32, size);
>                 origin=((origin==Doc->url_id)?0:origin);
>             }
>         }
>     }
>
> Please run the following commands in gdb:
>
>     frame 1
>     print content_type
>     print Method
>     print Doc
>     print Doc->content
>     print Doc->url

#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97        _CRC32_(crc, *p) ;
(gdb) frame 1
#1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
1150    crc32=UdmCRC32(Doc->content, (size_t)realsize);
(gdb) print content_type
$1 = 0x4021c027 "application/unknown"
(gdb) print Method
$2 = 1
(gdb) print Doc
$3 = (UDM_DOCUMENT *) 0x91ef7d8
(gdb) print Doc->content
$4 = 0x4021c03e ""
(gdb) print Doc->url
$5 = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/."

See my (bounced) posting from [EMAIL PROTECTED] on Tue, 27 Feb 2001 15:46:37 +0100 for details about this and the other URLs that the indexer crashes on.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
OK. Please check also this:

    print realsize
    print *Doc

Zenon Panoussis wrote:
>> please run the following commands in gdb:
>>
>>     frame 1
>>     print content_type
>>     print Method
>>     print Doc
>>     print Doc->content
>>     print Doc->url
>
> #0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
> 97        _CRC32_(crc, *p) ;
> (gdb) frame 1
> #1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
> 1150    crc32=UdmCRC32(Doc->content, (size_t)realsize);
> (gdb) print content_type
> $1 = 0x4021c027 "application/unknown"
> (gdb) print Method
> $2 = 1
> (gdb) print Doc
> $3 = (UDM_DOCUMENT *) 0x91ef7d8
> (gdb) print Doc->content
> $4 = 0x4021c03e ""
> (gdb) print Doc->url
> $5 = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/."
>
> See my (bounced) posting from [EMAIL PROTECTED] on
> Tue, 27 Feb 2001 15:46:37 +0100 for details about this and
> the other URLs that the indexer crashes on.
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
> Hi
> I tested several pages from your site and everything seems to work fine.

What you see on the web is 3.1.10. The 3.1.11 (patched yesterday) runs in a separate directory with a separate database; this way I can keep the search functional while the 3.1.11 bugs are worked out of it :) You can use 3.1.11 by going to http://search.freewinds.cx/cgi-bin/v4.cgi .

On the other hand, it is *indexer* that's crashing; the search part works fine (apart from the little ul= problem I reported yesterday).

> Does indexer crash always on the same URL?

This is new since yesterday's patch: indexer crashes after a few minutes, always in the middle of a URL, like this:

    Indexer[21800]: [1] http://www.scientology-kills.org/dead.htm
    Indexer[21800]: [1] http://www.xs4all.nl/~fishman/ls/.
    Tue 27 08:26:06 [21283] Client #0 left
    Segmentation fault (core dumped)

This particular URL is not very long and contains no spaces or other funny stuff; what is missing after http://www.xs4all.nl/~fishman/ls/ is something like ls02b.html

After a crash I restart indexer and it goes on with the status 0 URLs in the order it has them, so it won't go back and won't crash on the same URL.

> Please send also "backtrace" gdb command output.

#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97        _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 1181568253
(gdb) print p
$2 = 0x40499000 <Address 0x40499000 out of bounds>
(gdb) print *p
Cannot access memory at address 0x40499000
(gdb) backtrace
#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
#1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
#2  0x804a050 in thread_main (arg=0x0) at main.c:256
#3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
#4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4,
    init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>,
    stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118

I'm saving the core,

--
oracle@everywhere: The ephemeral source of the eternal truth...
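The size=4294967295 argument in the backtrace above is what a realsize of -1 becomes once it is cast to an unsigned size type. The sketch below illustrates that arithmetic. It assumes a 32-bit size_t, inferred from the value shown by gdb and modelled here with uint32_t so the result is the same on any host. The helper names are hypothetical and are not part of mnogosearch.

```c
#include <stdint.h>

/* Models a 32-bit size_t, as on the machine that produced the backtrace
 * (an assumption inferred from size=4294967295). */
typedef uint32_t sketch_size32;

/* Converting a negative int to an unsigned type wraps modulo 2^32,
 * so -1 becomes the largest representable value. */
sketch_size32 sketch_as_unsigned_size(int realsize)
{
    return (sketch_size32)realsize;
}

/* Distance between the start of the buffer and the faulting pointer,
 * i.e. how far a byte-by-byte loop got before touching unmapped memory. */
uint32_t sketch_bytes_walked(uint32_t buf_addr, uint32_t fault_addr)
{
    return fault_addr - buf_addr;
}
```

With the addresses from the backtrace (buf=0x4021c03e, p=0x40499000) the difference is 0x27CFC2 bytes, roughly 2.6 MB. That is consistent with a loop that was told to read about 4 GB from an empty buffer and kept going until it reached an unmapped page.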