Webboard: Trailing dot=segfault
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: v3.1.17, RH Linux 7.1, MySQL 3.23.26
Indexer[10300]: [1] http://www.freezone.org/.
Tue 03 12:35:25 [10044] Client #0 left
Segmentation fault (core dumped)
Three times in a row on the same URL. Notice the trailing dot in the URL. I didn't put it there; it is part of the URL as indexer printed it before crashing. Could it be that indexer can't deal with a malformed link like that?
Z
Reply: http://www.mnogosearch.org/board/message.php?id=2577
___
If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Trailing dot=segfault
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: Here's another one:
Indexer[12263]: [1] http://members.aol.com/pjmoy/..
Wed 04 03:03:10 [10044] Client #0 left
Segmentation fault (core dumped)
Notice the trailing dots. Indexer segfaults every time it comes to a URL with trailing dots, and never otherwise. I have indexed these URLs with earlier versions without problems. I suspect that previous versions could deal with bad HTML better than .17 can. Ah, and another difference: in previous versions I didn't use crosswords, but now I do. The problem could be there too.
Z
Reply: http://www.mnogosearch.org/board/message.php?id=2582
Webboard: Weird search results
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message:
> It seems you are using search results cache and it is quite old. Remove all cached queries.
No, it's not that. I am not using the results cache at all.
Z
Reply: http://www.mnogosearch.org/board/message.php?id=2435
Spammers, mother fuckers, virii and this list
Guys (sadly no gals around as far as the eye can see...),
if we could all agree that
- nationality and character are not derivatives of each other,
- spammers are mother fuckers,
- superonline.com is an ISP that really sucks (I have had to deal with them myself, I know first-hand),
- Outlook is a mail client that really sucks and there is no virus that won't spread through it,
- anybody can make a mistake or become the victim of a mistake,
and, most important of all, that
- this list is not a general discussion list,
then perhaps we could all shake virtual hands, leave this subject behind us and go back to our favourite nerding activities? Please do not reply, except by e-mail if you need to.
Z
-- 
oracle@everywhere: The ephemeral source of the eternal truth...
Re: Webboard: SSI
Alexander Barkov wrote:
>> Does anybody know of any way to put server side includes in search.htm?
> Use $iurl(http://some/include.html) template syntax. It includes the given URL.
> You may also use $if(/usr/local/httpd/include.html). This command includes the given file from the local system.
It works excellently. And I realise that I have to read the documentation again. The last time I did that we were at 3.1.7 or so, and lots of things seem to have been added in the meantime.
Z
Webboard: SSI
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: Does anybody know of any way to put server side includes in search.htm? It seems rather impossible to do it through the server configuration; after all, the file is not processed by the server, but by search.cgi.
Reply: http://search.mnogo.ru/board/message.php?id=1905
Webboard: Link length
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: The webboard is going to reformat this and make it look like shit, but you can get the same information with
# mysql -p mnogosearch
mysql> describe url;
The length is 128 characters.
Z
+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
| Field           | Type         | Null | Key | Default | Extra          | Privileges                      |
+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
| rec_id          | int(11)      |      | PRI | NULL    | auto_increment | select,insert,update,references |
| status          | int(11)      |      |     | 0       |                | select,insert,update,references |
| url             | varchar(128) |      | UNI |         |                | select,insert,update,references |
| content_type    | varchar(48)  |      |     |         |                | select,insert,update,references |
| title           | varchar(128) |      |     |         |                | select,insert,update,references |
| txt             | varchar(255) |      |     |         |                | select,insert,update,references |
| docsize         | int(11)      |      |     | 0       |                | select,insert,update,references |
| last_index_time | int(11)      |      |     | 0       |                | select,insert,update,references |
| next_index_time | int(11)      |      |     | 0       |                | select,insert,update,references |
| last_mod_time   | int(11)      |      |     | 0       |                | select,insert,update,references |
| referrer        | int(11)      |      |     | 0       |                | select,insert,update,references |
| tag             | varchar(11)  |      |     | 0       |                | select,insert,update,references |
| hops            | int(11)      |      |     | 0       |                | select,insert,update,references |
| category        | varchar(11)  |      |     |         |                | select,insert,update,references |
| keywords        | varchar(255) |      |     |         |                | select,insert,update,references |
| description     | varchar(100) |      |     |         |                | select,insert,update,references |
| crc32           | int(11)      |      | MUL | 0       |                | select,insert,update,references |
| lang            | char(2)      |      |     |         |                | select,insert,update,references |
+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
Reply: http://search.mnogo.ru/board/message.php?id=1906
Webboard: Compliments
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: In the past couple of days I've been watching URLs scroll up the screen, ending in things like
Indexer[12895]: [1] Done (100122 seconds)
and
200 0 190277 OK
In the meantime, no core dumps. No segfaults. No complaints. No problems whatsoever. At the same time I have been getting lots of public compliments for "my" search engine, while really it is *your* search engine and the compliments should be addressed to you. You guys have come a long way. It has to be said and acknowledged, and I am raising a glass to you. Cheers!
Thanks for really good work.
Reply: http://search.mnogo.ru/board/message.php?id=1891
Webboard: indexer -g
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: v3.1.12, MySQL: ./indexer -g 0101 correctly indexes all the pages that are listed in that category in indexer.conf, but it doesn't stop there; it goes on and indexes pages in other categories too if they are linked to from pages in the right category. Hmm. Do I make myself clear? I think not. This is what I mean. Assume that indexer.conf contains the following:
Category 0101
Server site http://www.here.com
Category 0102
Server site http://www.there.org
If I now give the command "indexer -g 0101", the indexer will crawl www.here.com. However, if there is a link in www.here.com to www.there.org, then the indexer will continue and index www.there.org too, which was not my intention.
Z
Reply: http://search.mnogo.ru/board/message.php?id=1878
Webboard: One URL per domain
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: At times a search term happens to be repeated many times on different pages of the same site, so that the results get clogged. Imagine looking for shops in your area that offer "home delivery". If one shop has the line "home delivery $2 extra per order" on every single page of each of its thousands of articles, you will never get past it to any other shop. Yet what you are looking for is different shops, not different articles. The solution to this is an option to return only one URL per domain. To my knowledge, http://www.vindex.nl is currently the only search engine that offers this option, and it does so by default. I think it is an excellent option to add to the todo list.
Reply: http://search.mnogo.ru/board/message.php?id=1879
Webboard: CVS
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: Being totally unfamiliar with CVS, I keep reading HOWTOs and yet can't manage to check out a current version (%#@*!). It would be very helpful if you put the exact cvs command together with the CVS info on the main page.
Z
Reply: http://search.mnogo.ru/board/message.php?id=1877
Webboard: DB.robots ?
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: Duh. It took some time to find out what had happened, but as it turns out I had typed
mysql -p database > source/create/mysql/create.txt
thereby truncating the create.txt file to 0 bytes and not creating the tables either. db.robots simply happened to be the first table that indexer tried to access. The bug is me.
Z
Reply: http://search.mnogo.ru/board/message.php?id=1697
Re: mirror paths
Caffeinate The World wrote:
> #MirrorRoot /path/to/mirror
> #MirrorHeadersRoot /path/to/headers
> in regard to the above, are they relative to the installation path --PREFIX like var is?
No, these are absolute.
Z
Re: mirror paths
Caffeinate The World wrote:
> this is so strange, i still don't see anything in my mirror directories...
Try ./indexer -m
Z
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
Alexander Barkov wrote:
> I'm just not sure what version you are using.
3.1.11 with the add_url.3.1.11.diff patch only.
> Is it here:
>     }else{
>         /* Unknown Content-Type */
>         if(Method!=UDM_HEAD){
>             crc32=UdmCRC32(Doc->content, (size_t)realsize);
>             changed=!(crc32==Doc->crc32);
>             if(CurSrv->use_clones){
>                 origin=UdmFindOrigin(Indexer, crc32, size);
>                 origin=((origin==Doc->url_id)?0:origin);
>             }
>         }
>     }
> please run the following commands in gdb:
> frame 1
> print content_type
> print Method
> print Doc
> print Doc->content
> print Doc->url
#0 0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97 _CRC32_(crc, *p) ;
(gdb) frame 1
#1 0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
1150 crc32=UdmCRC32(Doc->content, (size_t)realsize);
(gdb) print content_type
$1 = 0x4021c027 "application/unknown"
(gdb) print Method
$2 = 1
(gdb) print Doc
$3 = (UDM_DOCUMENT *) 0x91ef7d8
(gdb) print Doc->content
$4 = 0x4021c03e ""
(gdb) print Doc->url
$5 = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/."
See my (bounced) posting from [EMAIL PROTECTED] on Tue, 27 Feb 2001 15:46:37 +0100 for details about this and the other URLs that the indexer crashes on.
Z
Webboard: Alias news://xyz news://123 ?
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message:
> What exactly didn't work? Alias? Does indexer connect to the original server? Or something else?
Nothing at all happens. The indexer connects to cachelogd and exits again normally after one or two seconds without ever connecting to the news server. I gather from your answer that aliasing news should be no problem, so later today I'll try to trace just what indexer does and report back.
Z
Reply: http://search.mnogo.ru/board/message.php?id=1581
Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]
> Hi
> I tested several pages from your site and everything seems to work fine.
What you see on the web is 3.1.10. The 3.1.11 (patched yesterday) runs in a separate directory and a separate database; this way I can keep the search functional while the 3.1.11 bugs are worked out of it :) You can use 3.1.11 by going to http://search.freewinds.cx/cgi-bin/v4.cgi. On the other hand, it is *indexer* that's crashing; the search part works fine (apart from the little ul= problem I reported yesterday).
> Does indexer always crash on the same URL?
This is new since yesterday's patch: indexer crashes after a few minutes, always in the middle of a URL, like this:
Indexer[21800]: [1] http://www.scientology-kills.org/dead.htm
Indexer[21800]: [1] http://www.xs4all.nl/~fishman/ls/.
Tue 27 08:26:06 [21283] Client #0 left
Segmentation fault (core dumped)
This particular URL is not very long and contains no spaces or other funny stuff; what is missing after http://www.xs4all.nl/~fishman/ls/ is something like ls02b.html. After a crash I restart indexer and it goes on with the status 0 URLs in the order it has them, so it won't go back and won't crash on the same URL.
> Please also send the "backtrace" gdb command output.
#0 0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97 _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 1181568253
(gdb) print p
$2 = 0x40499000 <Address 0x40499000 out of bounds>
(gdb) print *p
Cannot access memory at address 0x40499000
(gdb) backtrace
#0 0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
#1 0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
#2 0x804a050 in thread_main (arg=0x0) at main.c:256
#3 0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
#4 0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4, init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>, stack_end=0xbaac) at ../sysdeps/generic/libc-start.c:118
I'm saving the core,
Webboard: Site search
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: The escape bug in previous versions of cache mode is fixed, but ul=site seems to work only in the format http://www.domain.dom/. Trying http://www.domain yields no results at all, and plain www.domain.dom returns anything at all (in other words, it is ignored). Wouldn't it be a good idea to put the ul string between % wildcards (%...%) in the SQL query, so that the user can type any string and get the search limited to that? E.g. use
- [nothing] or http:// for any match,
- http://www.domain.dom or just domain.dom for domain.dom,
- .dom/ for anything in the .dom TLD.
Z
Reply: http://search.mnogo.ru/board/message.php?id=1537
Webboard: Segfault (bad reference)
Author: Zenon Panoussis Email: [EMAIL PROTECTED]
Message: First try with a freshly compiled 3.1.11 on RH-7.0, MySQL.
# ./indexer -m -c 300
Indexer[20577]: indexer from mnogosearch-3.1.11/MySQL started with '/usr/local/mn3111-1/etc/indexer.conf'
Wed 21 04:24:54 [20556] Client #0 connected
Segmentation fault (core dumped)
# Wed 21 04:24:55 [20556] Client #0 left
# gdb indexer core
[loading etc.]
#0 0x400d4e1f in _IO_vfprintf (s=0xbfff4f20, format=0x806eec0 "INSERT INTO url (url,referrer,hops,crc32,last_index_time,next_index_time,status,tag,category) VALUES ('%s',%d,%d,0,%d,%d,0,'%s','%s')", ap=0xbfff5020) at ../sysdeps/i386/bits/string.h:343
343 ../sysdeps/i386/bits/string.h: No such file or directory.
I have string.h in the following places:
# locate string.h
/usr/include/asm/string.h
/usr/include/linux/string.h
/usr/include/bits/string.h
/usr/include/string.h
/usr/include/g++-3/std/bastring.h
/usr/include/linuxconf/sstring.h
/usr/include/mysql/m_string.h
/usr/lib/bcc/include/string.h
/usr/local/include/php/ext/standard/php_string.h
I used --prefix=/usr/local/mn3111-1 --localstatedir=/var/mn3111-1. That "3111-1" almost looks as if I were expecting a 3111-2 ;)
Z
Reply: http://search.mnogo.ru/board/message.php?id=1485