Re: UdmSearch: Webboard: Segfault (grrr)
Zenon Panoussis wrote:
>
> Oops. Something else is not OK:
> cache.c:687:87: warning: #ifdef with no argument [etc]

I think that the mailer is responsible for this. There are lots of broken lines in the code that shouldn't be broken. Perhaps it's better to attach the file in .gz format instead of text.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
__
If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov wrote:
>
> We finally found a bug in cache.c. New version is in attachment.
> Everybody who has problems with splitter's crashes is welcome to test.
> Please, give feedback!

Oops. Something else is not OK:

cache.c:687:87: warning: #ifdef with no argument
cache.c:692:87: warning: #ifdef with no argument
cache.c:697:87: warning: #ifdef with no argument
cache.c:702:87: warning: #ifdef with no argument
cache.c: In function `UdmFindCache':
cache.c:969: parse error before `?'
cache.c:982: `real_num' undeclared (first use in this function)
cache.c:982: (Each undeclared identifier is reported only once
cache.c:982: for each function it appears in.)
cache.c:994: `fd1' undeclared (first use in this function)
cache.c:996: `group' undeclared (first use in this function)
cache.c:1000: `group_num' undeclared (first use in this function)
cache.c: At top level:
cache.c:1011: initializer element is not constant
cache.c:1011: warning: data definition has no type or storage class
cache.c:1012: parse error before string constant
cache.c:1013: parse error before string constant
cache.c:1013: warning: data definition has no type or storage class
cache.c:1014: redefinition of `ticks'
cache.c:1011: `ticks' previously defined here
cache.c:1014: initializer element is not constant
cache.c:1014: warning: data definition has no type or storage class
cache.c:1015: parse error before string constant
cache.c:1015: warning: data definition has no type or storage class
cache.c:1024: `i' undeclared here (not in a function)
cache.c:1024: parse error before `.'
cache.c:1030: register name not specified for `p'
cache.c:1032: parse error before `if'
cache.c:1035: `pmerg' undeclared here (not in a function)
cache.c:1035: `pmerg' undeclared here (not in a function)
cache.c:1035: warning: data definition has no type or storage class
cache.c:1036: parse error before `&'
cache.c:1043: `k' undeclared here (not in a function)
cache.c:1043: warning: data definition has no type or storage class
cache.c:1044: parse error before `}'
cache.c:1046: conflicting types for `p'
cache.c:1030: previous declaration of `p'
cache.c:1046: `pmerg' undeclared here (not in a function)
cache.c:1046: warning: data definition has no type or storage class
cache.c:1047: parse error before `&'
cache.c:1048: parse error before `->'
cache.c:1058: warning: initialization makes integer from pointer without a cast
cache.c:1058: warning: data definition has no type or storage class
cache.c:1058: parse error before `}'
cache.c:1061: redefinition of `ticks'
cache.c:1014: `ticks' previously defined here
cache.c:1061: initializer element is not constant
cache.c:1061: warning: data definition has no type or storage class
cache.c:1063: parse error before string constant
cache.c:1071: warning: parameter names (without types) in function declaration
cache.c:1071: conflicting types for `UdmGroupByURL'
../include/udm_searchtool.h:7: previous declaration of `UdmGroupByURL'
cache.c:1071: warning: data definition has no type or storage class
cache.c:1072: parse error before `}'
make[1]: *** [cache.lo] Error 1
make[1]: Leaving directory `/root/mnogosearch-3.1.10/src'
make: *** [all-recursive] Error 1

--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov wrote:
>
> We finally found a bug in cache.c. New version is in attachment.
> Everybody who has problems with splitter's crashes is welcome to test.
> Please, give feedback!

You guys are great! I'll re-compile and get back to you with reports.

BTW, can I remove http://search.freewinds.cx/garbage_in_sbin.tar.gz now?

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Webboard: Indexer still runs but search.cgi does not and a small story about the problems
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
> I have tried 600, which is fine for the indexer and I have tried
> even 777 but it did not make any difference. After all search
> (and search.cgi) can read it via ssh but it fails via browser ...

Oops - error. When you access it via ssh, it is user "wendibus" reading the file, while if you access it via the browser it is user "nobody" reading it. If the permissions of search.htm were -rw------- (and everything else OK) you would get precisely that effect. search.htm should be -rw-r--r-- so that "nobody" can read it.

In your particular case the problem is elsewhere (the thing didn't work with search.htm in -rwxrwxrwx mode either), but anybody else reading the webboard should keep this in mind.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1427>
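The difference between the two modes can be checked from the shell. What follows is only a minimal sketch: the temporary file stands in for search.htm, the helper name others_can_read is made up for illustration, and mktemp is assumed to be available. It only looks at the file's own mode; the directories above it must be accessible to "nobody" as well.

```shell
#!/bin/sh
# Succeeds if the "other" read bit is set on the given file, i.e. if an
# unrelated user such as "nobody" is allowed to read it.
others_can_read() {
    # ls -l prints the mode string first, e.g. -rw-r--r--;
    # character 8 of that string is the read bit for "other".
    mode=$(ls -l "$1" | cut -c1-10)
    [ "$(printf '%s' "$mode" | cut -c8)" = "r" ]
}

f=$(mktemp)            # stand-in for search.htm

chmod 600 "$f"         # -rw-------: only the owner can read
if others_can_read "$f"; then
    echo "600: readable by others"
else
    echo "600: NOT readable by others"
fi

chmod 644 "$f"         # -rw-r--r--: everybody can read
if others_can_read "$f"; then
    echo "644: readable by others"
else
    echo "644: NOT readable by others"
fi

rm -f "$f"
```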
Re: UdmSearch: no files found in mirror directories
Caffeinate The World wrote:
>
> i have indexer going but i see nothing in the mirror directories. when
> does it store the pages to the mirror directory?

If your pages are already indexed, when you re-index with -a indexer will check the headers and only download files that have been modified since the last indexing. Thus, pages that are not modified will not be downloaded and therefore not mirrored either.

To create the mirror you need to either (a) start again with a clean database or (b) use the -m switch.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Setting up
> Basically, my UNIX/LINUX knowledge is non-existent. I've FTP'd the tar file
> to our LINUX box, and have extracted it... but where to go from there??? I
> haven't got a clue.

If that is so, you are bound to run into problems all the time. Perhaps installing a search engine is not the first thing you should do as basic Linux training. Anyway, I'll try to help you a bit on the way.

In the following I assume that you are logged in as root. If you don't have root access to your machine, more steps are needed, marked with [*]. In such case, ask again.

- Install your database (mysql or whatever you are going to use). Make sure it is working.
- Download the .tar.gz file to your home directory, e.g. /root
- Unpack it with
  # tar zxvf mnogosearch-3.1.10.tar.gz
- Move into the newly created directory with
  # cd mnogosearch-3.1.10
- [*]
- Prepare the source with
  # ./configure --with-[your_database]
- Compile with
  # make && make install
- You will now have a new directory called /usr/local/mnogosearch . Go there with
  # cd /usr/local/mnogosearch/etc
- Create a new database and tables in it. Since I don't know what database you are using, I can't help you here. Note that you have to take at least two steps: (1) create the database, (2) create the sql tables in it. Two additional steps that are highly recommended but not necessary are to (3) add the stopword tables of your choice and (4) create a new user on the database, so that indexer and search don't run as root.
- Edit indexer.conf with
  # vi indexer.conf
  (this is torture: now you have to learn vi while trying to install the search. I'm sorry to tell you, vi is a bitch. Anyway, use the arrows to move around. Use i to enter insert mode. Use [esc] to exit insert mode. Use [esc]:x[enter] to save and exit. With these four things you should be able to edit a file.)
  If you edit the following items to correspond to your settings you will have a minimal working configuration:
  - DBAddr (example: mysql://username:password@machine/database/ )
  - DBMode (example: single)
  - LocalCharset (example: LocalCharset iso-8859-1 )
  - Server (example: Server path http://www.domain.dom/path )
  For DBMode and LocalCharset all you need to do is uncomment the right line (remove the "#" in front of it). Save and exit.
- Copy search.htm-dist to search.htm with the command
  # cp search.htm-dist search.htm
- Edit search.htm with
  # vi search.htm
  Adjust DBAddr, DBMode and LocalCharset to the exact same settings as in indexer.conf. Don't touch anything else. Save and exit.
- [*]
- Find where the cgi-bin directory of your webserver is. If you are running apache without virtual domains it will be in /var/www/cgi-bin or /vol/www/cgi-bin or something similar. If you are working on your ISP's machine, ask the ISP.
- Copy search.cgi to the cgi-bin directory. Assuming that you are still in /usr/local/mnogosearch/etc , you do that with
  # cp ../bin/search.cgi /[full_path_to]/cgi-bin/search.cgi
- [*]
- Try to access search.cgi with your browser. Go to http://www.your_domain.dom/cgi-bin/search.cgi . If you get a search box, you have come a long way. If not, you need more help. When asking for it, describe exactly what you did, how you did it, what worked or did not and what error messages you are getting.
- Change to the sbin directory of the mnogo installation. Assuming that you are still in /usr/local/mnogosearch/etc , do
  # cd ../sbin
- Start indexer with
  # ./indexer -a -c 300
  This will cause indexer to run for 5 minutes and stop. Go back to your browser and search for a word that you know occurs in the files that you just saw indexer index. If you get results, you have come a very long way. In that case you can
- Re-start indexer with
  # ./indexer
  and let it finish its job.

Good luck.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
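The build part of the steps above can be collected into one script. This is only a sketch under assumptions: it defaults to a dry run that prints each command instead of executing it, the DRY_RUN and DB variables and the run helper are made up for illustration, and --with-mysql is just one possible value for --with-[your_database]. Creating the database and editing indexer.conf and search.htm remain manual steps.

```shell
#!/bin/sh
# Dry-run sketch of the build steps from the message above.
# With DRY_RUN=1 (the default) each command is only printed;
# set DRY_RUN=0 to really execute them (as root).
DRY_RUN=${DRY_RUN:-1}
DB=${DB:-mysql}          # assumption: use whatever database you installed
VERSION=3.1.10

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stops at the first step that fails.
build_steps() {
    run tar zxvf "mnogosearch-$VERSION.tar.gz" &&
    run cd "mnogosearch-$VERSION" &&
    run ./configure "--with-$DB" &&
    run make &&
    run make install &&
    run cd /usr/local/mnogosearch/etc &&
    run cp search.htm-dist search.htm
}

build_steps
```

Running it as is only lists the commands, which makes it easy to compare them against the list above before trying DRY_RUN=0.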
Re: UdmSearch: Bug report
Bernd Schulze wrote:
>
> Files that consist of html code get a
> correct title entry in the database.
> Other document types (and we have 90%
> pdf) (that is files that do not have the
> possibility of a title tag) get assigned
> the last title that has been successfully
> found in a tag.

I've had this problem with .txt files and v 3.1.8/mysql. After indexing everything, I removed the .txt files from the index (indexer -C -u %.txt) and re-indexed them (indexer -a -u %.txt). Somehow that fixed the problem and all .txt files got the correct "No title" title.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Webboard: Segfault (grrr)
Zenon Panoussis wrote:
>
> By now, I have almost 1 GB of indexed files, 4 indexer
> crashes and one splitter crash. I'll do the debugging and
> post its output tomorrow.

===
# gdb indexer core.indexer.01
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `./indexer -m -s 200'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
97      _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 1928826335
(gdb) print p
$2 = 0x40431000
===
# gdb indexer core.indexer.02
#0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
97      _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 835566978
(gdb) print p
$2 = 0x40404000
===
# gdb indexer core.indexer.03
#0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
97      _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 2869617068
(gdb) print p
$2 = 0x40404000
===
# gdb indexer core.indexer.04
(gdb) print crc
$1 = 1253677059
(gdb) print p
$2 = 0x40431000
===

And finally the splitter:

# gdb splitter core.splitter.01
This GDB was configured as "i386-redhat-linux"...
Core was generated by `/usr/local/mnogo3110/sbin/splitter'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x8057d15 in UdmSplitCacheLog (log=118) at cache.c:635
635     logwords[count+j].wrd_id=table[w].wrd_id;
(gdb) print count
$1 = 13121220
(gdb) print count+j
$2 = 13125316
(gdb) print logwords
$3 = (UDM_LOGWORD *) 0x0
(gdb) print table[w]
$4 = {wrd_id = 1918989871, weight = 1869507887, pos = 825454439, len = 1949249585}
(gdb) print logwords[count+j]
Cannot access memory at address 0x15e7bd70
===

This time I'm keeping the core dumps, so let me know if there's anything else you want me to check.

Apart from this, I got some garbage directories with misnamed splitter files in them in sbin:

# pwd
/usr/local/mnogo3110/sbin
# ls -l
-rw-r--r-- 1 root root 457672 Feb 13 08:28 àË???
drwxr-xr-x 3 root root 4096 Feb 13 08:28 àË???3F
-rw-r--r-- 1 root root 487224 Feb 13 08:27 æmE

http://search.freewinds.cx/garbage_in_sbin.tar.gz

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
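Two of the numbers in the splitter session can be checked with plain shell arithmetic. The sketch below is not from the thread: the helper decode_le32 is made up for illustration, and it assumes that the four fields of table[w] are 32-bit integers stored lowest byte first, as on the i386 machine gdb reports. It prints the bytes of the four values as text, then divides the faulting address by the index count+j.

```shell
#!/bin/sh
# Print the four bytes of a 32-bit integer in little-endian order
# (lowest byte first), the way they sit in memory on i386.
decode_le32() {
    n=$1
    i=0
    while [ "$i" -lt 4 ]; do
        # Emit the low byte through an octal escape, then shift it out.
        printf "\\$(printf '%03o' $(( n & 255 )))"
        n=$(( n >> 8 ))
        i=$(( i + 1 ))
    done
}

# wrd_id, weight, pos and len exactly as gdb printed them for table[w]
for v in 1918989871 1869507887 825454439 1949249585; do
    decode_le32 "$v"
done
echo

# Faulting address divided by count+j: quotient and remainder
echo $(( 0x15e7bd70 / 13125316 )) $(( 0x15e7bd70 % 13125316 ))
```

The first line of output is /var/mnogo3110/t and the second is 28 0. The text looks like the start of a path under /var/mnogo3110, so one possible reading is that a path string ended up where word records were expected; that is a guess, not something the dump proves. The exact division fits logwords being a null pointer indexed with 28-byte elements, where the element size is inferred from the arithmetic and not taken from the source code.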
UdmSearch: Webboard: Full on search engine?
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: Depends on what you mean. Follow the links at http://search.mnogo.ru/users.html and see what it can do. Z Reply: <http://search.mnogo.ru/board/message.php?id=1392> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Segfault (grrr)
Caffeinate The World wrote:
>
> i've been going through this and back again time and time again. what
> would really be nice is indexer save the logs in a format that's easy
> to use again. for instance, you can use the format re-index to sql etc.
> or if you want to reindex again, you don't have to crawl through all
> the external websites. saves a lot of time and we can debug faster.

I'm not sure what you mean here. The Mirror statement does just that (and luckily, I had an almost complete mirror already).

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Webboard: Segfault (grrr)
Zenon Panoussis wrote:
>
> Now for 31 MB adventures :)

# ./run-splitter -k
Sending -HUP signal to cachelogd...
Done
# ./run-splitter -p
Preparing logs...
Open dir '/var/mnogo3110/raw'
Preparing word log 982024900 [ 42176 bytes]
Preparing word log 982027284 [31465324 bytes]
Preparing word log 982027618 [ 8815804 bytes]
Preparing del log 982024900
Preparing del log 982027284
Preparing del log 982027618
Renaming logs...
Done

Running ./run-splitter on these worked fine. No problems at all. After that, I went on indexing and created

   59920 Feb 13 06:05 982040748.del.done
31457740 Feb 13 06:05 982040748.wrd.done
    1480 Feb 13 06:06 982040807.del.done
  637240 Feb 13 06:06 982040807.wrd.done
   51920 Feb 13 07:21 982045300.del.done
31469304 Feb 13 07:21 982045300.wrd.done
   69248 Feb 13 07:51 982047843.del.done
30213344 Feb 13 07:51 982047843.wrd.done

another two 31 MB files and two smaller ones. All of them were split without problems.

[two days later]

Indexing kept crashing (see separate posting) and splitting kept going fine until tonight, when the opposite occurred. By now, I have almost 1 GB of indexed files, 4 indexer crashes and one splitter crash. I'll do the debugging and post its output tomorrow.

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Webboard: This works
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
> this works, works ;-))
> I do have no right to write to the main cgi-bin of the server.
> But I do have the right to install cgis in the user dirs.

I begin to suspect that you are confusing mnogosearch/bin with cgi-bin. Can you give me the directory structure of *your* file area? Like

basedir
basedir/mnogo
basedir/www
etc.

Now, if the little perl script I gave you worked, copy search.cgi to the same place and access it in the same way. Whatever you did to get that perl script to work: do the same with search.cgi. Do not recompile or anything, just copy it where you want it.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1376>
UdmSearch: Webboard: indexer runs and search.cgi does not
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
> Question: are you running your own web server? Is the cgi-bin of
> your particular domain and user account *really* set to the cgi-bin
> directory that you are using? Are you sure?

If you don't know how the web server is configured, here is how to test it. Put this in a file called test.pl :

#!/usr/bin/perl -w
use CGI ':standard';
print header();
print start_html();
print h5("This works");
print end_html();

Do chmod a+x test.pl and place the file in the same directory as your search.cgi . Do ./test.pl on the shell; that should give you a simple HTML page. Now call the script from your browser with http://your.domain/cgi-bin/test.pl or with http://your.domain/your_dir/cgi-bin/test.pl . Does it work? If not, your problem is in the server configuration and the location of your cgi-bin.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1372>
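If the Perl CGI module is not installed, the same kind of check might be done with a plain shell script. This is a variation under assumptions and not what was posted above: the function name cgi_page is made up, and whether the web server will run a shell script from cgi-bin depends on its configuration.

```shell
#!/bin/sh
# Minimal CGI response: a Content-Type header, an empty line,
# then the HTML body.
cgi_page() {
    printf 'Content-Type: text/html\r\n\r\n'
    printf '<html><body><h5>This works</h5></body></html>\n'
}

cgi_page
```

As with test.pl, make it executable with chmod a+x, run it on the shell first, and then call it from the browser.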
UdmSearch: Webboard: Stupid Question about the host
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > I have another stupid question. Is it possible that I have to call > the configure script with an option for the host type? > I have installed it via ssh on the host but have not used any > stuff there... and the database is on a different server then > the script is. It needs to be configured for the machine where indexer runs, not the machine where mysql and/or the databases reside. If you compiled it on the same machine as the one indexer runs on, it should be fine. Besides, we already know that search.cgi works from the shell, so it can't be a platform problem you are having. Just as general information though, configure does allow you to compile for different machines. Do ./configure --help and check the "Host type" section. Z Reply: <http://search.mnogo.ru/board/message.php?id=1371> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: indexer runs and search.cgi does not
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > If I move search.htm the script complains that it cant find > search.htm via the ssh. Well that's good. It shows that the cgi is looking in the right place for the right file. > maybe it is any kind of help I have called the script like this > configure --with-mysql --prefix=/mylocal/dir/ That should be OK. > it does not matter if I put it into a local dir or in a local > cgi-bin/dir it never works ... "A local cgi-bin/dir"? What do you mean? Question: are you running your own web server? Is the cgi-bin of your particular domain and user account *really* set to the cgi-bin directory that you are using? Are you sure? Z Reply: <http://search.mnogo.ru/board/message.php?id=1370> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: indexer runs and search.cgi does not
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: Grrr. OK, try this then: Make a static HTML page with this form in it: Search for: Try it and tell me what you get. Somehow we need to figure whether the error is in search.htm, file permissions, the web server configuration or yet something else. BTW, do you have a 100% standard installation? No recompilations after you copied search.cgi in cgi-bin, no particular modifications to anything? Z Reply: <http://search.mnogo.ru/board/message.php?id=1369> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Other segfault
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
[3.1.10, RH 7.0 on PII, mysql-3.23.29-1, cache mode]

While trying to reproduce the splitter segfault, I got a segfault from indexer. I don't remember this ever happening before and I've been using mnogosearch since the early days of 3.1.7. The way things have been lately I would start questioning my RAM, but everything else on the machine runs fine, so it can't be that. OK, the debug:

# gdb indexer core
GNU gdb 5.0
This GDB was configured as "i386-redhat-linux"...
Core was generated by `./indexer -m -s 200'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
97      _CRC32_(crc, *p) ;
(gdb) backtrace
#0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
#1  0x804d768 in UdmIndexNextURL (Indexer=0x808d308, index_flags=5) at indexer.c:1145
#2  0x804a020 in thread_main (arg=0x0) at main.c:256
#3  0x804a9b0 in main (argc=4, argv=0xbaa4) at main.c:596
#4  0x4009bbfc in __libc_start_main (main=0x804a13c <main>, argc=4, ubp_av=0xbaa4, init=0x8049684 <_init>, fini=0x8068bec <_fini>, rtld_fini=0x4000d674 <_dl_fini>, stack_end=0xba9c) at ../sysdeps/generic/libc-start.c:118

This time I'm keeping core. Just tell me if you want me to run gdb on anything else and how to do that. In any case I'd suggest that you don't bother with this now. Indexer has been working so well so far that we can probably ascribe this to pure bad luck. If it happens again I'll let you know.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1350>
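The size argument in frame #0 is worth a second look: 4294967295 is the largest value an unsigned 32-bit integer can hold. A small sketch of the reinterpretation follows; the helper name as_signed32 is made up, and the shell is assumed to do its arithmetic in 64 bits.

```shell
#!/bin/sh
# Reinterpret an unsigned 32-bit value as a signed 32-bit value.
as_signed32() {
    if [ "$1" -ge 2147483648 ]; then
        # Values from 2^31 upwards wrap around to the negative range.
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

as_signed32 4294967295    # the size argument shown by gdb
```

This prints -1. It might mean that a length of -1 reached UdmCRC32 and was treated as a huge unsigned size, but nothing in the backtrace confirms where such a value would come from.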
UdmSearch: Webboard: splitter -p does not rename logs
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: v 3.1.10 in cache mode: splitter -p should rename the n.del and .wrd logs to .done (cachemode.txt, 4B). It did until v 3.1.9, but doesn't any more. run-splitter -p does though. Z Reply: <http://search.mnogo.ru/board/message.php?id=1349> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: indexer runs and search.cgi does not
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > thanks for the hint. It looks like this iExplorer 5.5 believes > the above mentioned code. Opera 4.0 says that the page is just > empty. > the code from the ssh session looks quite fine to me. Here it > is. OK, we're getting closer. Your code says . This results in no action at all. Check your search.htm: it should say . Z Reply: <http://search.mnogo.ru/board/message.php?id=1347> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov wrote:
>
> Could you check count, j, w, table[w], logwords[count+j]
> variable values? Use print gdb command.

AAARGH! I deleted the core dump. I didn't know that I could do that :(

Z

--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Webboard: This is SHITE!!!
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > After spending nearly 3 Days trying to get this thing to work, I > have come to the conclusion that it is a waste of time and a > JOKE:-(..) In that case you are entitled to your money back. Every penny of it. > The documentation is poor and the support I am getting from this > board is daft. Really? Let's see: The query is OKAY except it is expecting some input. Question: Do the inputs need to be supplied during indexing? My users supply the inputs when they are searching. At indexing time, I have no idea what inputs they will supply. What input? If the users supply input during *searching*, you can't very well expect that that input will be used during *indexing*, can you? Indexing is supposed to take place *before* searching. And also: Can this web board be searched? I hope it is not some kinda strategy for this site to get more hits. (insults in the very question) Alexander: There is a link from main page to the site search. Webboard is indexed too among static documents and mailing list archive. You: Where is it? I cannot find it. Your question was answered. If you can't find the main page of this site, you shouldn't be installing databases. If you can't find the search link on the main page, you shouldn't be near computers at all. > Does anyone else no of any alternative? If so > please let me know. No alternatives will help you. You need to start at the basics. "How to find a link on a web page" etc. > See my postings below to see problems I have been having and > the replies i get and you will see why I am feeling this way. The only reason I bother to reply to any of this is that the developers have put a tremendous amount of work into something they provide to you for free, and all you know to do is (a) pose incomprehensible questions, (b) pose stupid questions and (c) post insults. I wish you luck with your computercontractor.net . 
Perhaps one day you can use it to find someone with more skills and less arrogance than yourself. Z Reply: <http://search.mnogo.ru/board/message.php?id=1343> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Parameters...
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
> I have built my index using this:
> HTDBDoc \
> SELECT concat( \
> [etc]
> FROM jobsadvertised \
> WHERE job_id='$1' and to_days(now()) - to_days(job_inp_dte) <= '$2' and site_type >= '$4' and job_location = '$3' and job_type = '$5'
> How do I pass the required values to indexer from my browser.
> I.e, how will it know what $1 is etc...

This means that you have created your tables and that you are not using search.cgi, but are writing your own search scripts too. Well then you should look at the Perl DBI module and try something like

***
#!/usr/bin/perl -w
use strict;
use DBI;
use CGI ':standard';
use CGI::Carp 'fatalsToBrowser';
$CGI::POST_MAX = 300;

# Import the form fields into the R:: namespace ($R::search_term etc.)
import_names('R');

# Placeholder connection details: put in your own database, user and password
my $dbh = DBI->connect('DBI:mysql:database=your_database', 'your_user',
                       'your_password', { RaiseError => 1 });

print header();
print start_html();
# [insert your HTML here]

# Table and column names are placeholders. The user's input is passed as
# bind values (?) instead of being pasted into the SQL string.
my $sth = $dbh->prepare('SELECT job FROM your_table WHERE parameter = ? '
                      . 'AND other_parameter LIKE ? GROUP BY what_you_want');
$sth->execute($R::search_term, "%$R::other_term%");
while (my $ref = $sth->fetchrow_hashref()) {
    print "$ref->{'job'}", "\n";
}
# [more HTML]
$sth->finish();
$dbh->disconnect();
print end_html();
***

You call this jobs.cgi. Then you create a search form in plain HTML with action=/cgi-bin/jobs.cgi where you name the input fields. The names that you have given to those input fields will be passed to the script in the form R::name and go straight where you want them.

Note that the above example is just a very quick adaptation of something I had ready, so it will most probably not work as it is. In any case, for this kind of thing you might be screaming in the wrong forum. You'd probably be better off asking questions in the appropriate database and/or perl module mailing lists.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1341>
UdmSearch: Webboard: indexer runs and search.cgi does not
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: The page is still empty. The actual code is: > > !DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">; > > > > > So there is no error message. When I go to the search.cgi dir and call search.cgi >via ssh I still receive a quite good looking HTML-Code on the stdout... Hmm. I'm beginning to wonder: could it be that charset tag? What OS and browser are you using to look at the page with? Try this: 1. Copy/paste the html from ssh onto a static page on the same server. Access that page with your browser. What do you see? 2. Remove the charset tag from the static page. Try again. What do you see? Z Reply: <http://search.mnogo.ru/board/message.php?id=1338> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: indexer -m -e -n 1000
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: Is it easy to implement? With big databases it would allow forced re-indexing from the bottom up in a controlled manner. As it is now, if you just do indexer -m the indexer will run away and do the entire URL list in the database. If you do indexer -m -c n and then repeat it, the indexer will take the same documents in both runs. And if you don't know which documents are oldest, you can't use indexer -m -u pattern (which would also be very tedious if you have a huge list of URLs). Z Reply: <http://search.mnogo.ru/board/message.php?id=1339> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: run-splitter
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: run-splitter does not obey --localstatedir . If you configure with --prefix=/usr/local/mnogo3110 --localstatedir=/var/mnogo3110 , run-splitter comes out as PREFIX=/usr/local/mnogo3110 VAR=$PREFIX/var SBIN=$PREFIX/sbin PID=$VAR/cachelogd.pid SPLITTER=$SBIN/splitter It's only cosmetic, but should be easy to fix. Z Reply: <http://search.mnogo.ru/board/message.php?id=1333> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
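One way the script could honour a separate state directory is to fall back to $PREFIX/var only when nothing else is given. The sketch below is an illustration and not the actual fix: the function splitter_paths and its optional second argument are hypothetical, and the real run-splitter may well get its paths from configure in some other way.

```shell
#!/bin/sh
# Print the paths run-splitter would use.
# $1 = prefix, $2 = optional state directory (hypothetical parameter);
# without $2 the state directory falls back to $PREFIX/var.
splitter_paths() {
    PREFIX=$1
    VAR=${2:-$PREFIX/var}
    SBIN=$PREFIX/sbin
    echo "PID=$VAR/cachelogd.pid"
    echo "SPLITTER=$SBIN/splitter"
}

# Behaviour described in the message above: VAR derived from PREFIX
splitter_paths /usr/local/mnogo3110

# With the state directory given separately
splitter_paths /usr/local/mnogo3110 /var/mnogo3110
```

The second call prints PID=/var/mnogo3110/cachelogd.pid, which matches the --localstatedir used in the configure line above.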
Re: UdmSearch: Webboard: Segfault (grrr)
Zenon Panoussis wrote:
>
> I'll delete the entire tree directory and start re-indexing from
> scratch. I'll make and split a small file first, ca 5 MB, then a
> 31 MB file, if that works yet another 31 MB file, and so on until
> I get in problems again. Will report back later this evening.

First step OK:

- indexed for a while, created 2.8 MB log file
- split successfully and even got the FFF directory:

/var/mnogo3110/tree/FF/F/FFFE6000 old: 0 new: 2 total: 2
/var/mnogo3110/tree/FF/F/FFFE7000 old: 0 new: 24 total: 24

Now for 31 MB adventures :)

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: single mode works, cache mode not.
Fredy Kuenzler skrev: > > It seems to me, that cache mode in 3.1.9 and 3.1.10 does not work > good. Indexer (according to the /doc) works in cache mode and > single mode, however search.cgi does not find anything in cache > mode. In single mode everything works as expected. It's a whole series of things. Your sql tables must be in single mode (i.e. created with create.txt only) and you must have set cache mode in both indexer.conf and search.htm and you must have run cachelogd, indexer, splitter -p and splitter in the right way in the right order. Is all that OK? Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Segfault (grrr)
Zenon Panoussis wrote:
>
> And a really HARD hang at the same place as before. So hard
> that I can't even kill splitter.

BTW, although I couldn't kill splitter, I did find a core dump in sbin. Here's the backtrace:

# gdb splitter core
GNU gdb 5.0
This GDB was configured as "i386-redhat-linux"...
Core was generated by `./splitter'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x8057d15 in UdmSplitCacheLog (log=300) at cache.c:635
635 logwords[count+j].wrd_id=table[w].wrd_id;
(gdb) backtrace
#0 0x8057d15 in UdmSplitCacheLog (log=300) at cache.c:635
#1 0x8049f29 in main (argc=1, argv=0xbac4) at splitter.c:74
#2 0x4009bbfc in __libc_start_main (main=0x8049e20 , argc=1, ubp_av=0xbac4, init=0x8049630 <_init>, fini=0x8064f7c <_fini>, rtld_fini=0x4000d674 <_dl_fini>, stack_end=0xbabc) at ../sysdeps/generic/libc-start.c:118

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Webboard: Segfault (grrr)
Caffeinate The World wrote:
>
> in my tests your 3 little files wouldn't make a difference. he would
> have to run splitter -p and splitter on all the files starting from the
> first original RAW file, including all the 31 MB file. i believe in my
> case it was the original 31mb file which caused the problem.

OK, I'll try to do this systematically now and write down everything I do. Bear with me if you get too much information at times; it's hard to know what could be relevant and what not.

1. Installed v 3.1.10 with a new prefix (produced one error, see the webboard on subject "warning: no newline at end of file")
2. Copied 319/var/* to 3110/var/ (oofff! took more than an hour)

The raw directory looks like this:

-rw-r--r-- 1 root root    17096 Feb  7 07:58 981529092.del.done
-rw-r--r-- 1 root root 10060900 Feb  7 07:58 981529092.wrd.done
-rw-r--r-- 1 root root    16808 Feb  7 09:24 981534260.del.done
-rw-r--r-- 1 root root 11374124 Feb  7 09:24 981534260.wrd.done
-rw-r--r-- 1 root root    13400 Feb  7 11:03 981540190.del.done
-rw-r--r-- 1 root root 11698476 Feb  7 11:03 981540190.wrd.done
-rw-r--r-- 1 root root    20328 Feb  7 12:54 981546899.del.done
-rw-r--r-- 1 root root  8055532 Feb  7 12:54 981546899.wrd.done
-rw-r--r-- 1 root root     7312 Feb  7 14:52 981553965.del.done
-rw-r--r-- 1 root root  4459360 Feb  7 14:52 981553965.wrd.done
-rw-r--r-- 1 root root     9912 Feb  7 16:52 981561131.del.done
-rw-r--r-- 1 root root  5254828 Feb  7 16:52 981561131.wrd.done
-rw-r--r-- 1 root root    14240 Feb  7 18:53 981568430.del.done
-rw-r--r-- 1 root root 10220088 Feb  7 18:53 981568430.wrd.done
-rw-r--r-- 1 root root      216 Feb  7 22:27 981581773.del.done
-rw-r--r-- 1 root root   220988 Feb  7 22:27 981581773.wrd.done
-rw-r--r-- 1 root root    14088 Feb  8 22:40 981669855.del.done
-rw-r--r-- 1 root root  8719924 Feb  8 22:40 981669855.wrd.done
-rw-r--r-- 1 root root      136 Feb  8 23:05 981669947.del.done
-rw-r--r-- 1 root root   125028 Feb  8 23:05 981669947.wrd.done
-rw-r--r-- 1 root root     5288 Feb  9 01:51 981679960.del.done
-rw-r--r-- 1 root root   396972 Feb  9 01:51 981679960.wrd.done
-rw-r--r-- 1 root root     1192 Feb  9 03:32 981686015.del.done
-rw-r--r-- 1 root root   693916 Feb  9 03:32 981686015.wrd.done
-rw-r--r-- 1 root root     4008 Feb 11 21:56 981925017.del.done
-rw-r--r-- 1 root root  1876884 Feb 11 21:56 981925017.wrd.done
-rw-r--r-- 1 root root     4192 Feb 11 22:51 981928286.del.done
-rw-r--r-- 1 root root  3349232 Feb 11 22:51 981928286.wrd.done
-rw-r--r-- 1 root root     4096 Feb 11 23:45 981931533.del.done
-rw-r--r-- 1 root root  1265304 Feb 11 23:45 981931533.wrd.done
-rw-r--r-- 1 root root    12944 Feb 12 02:56 981945565.del
-rw-r--r-- 1 root root  6801160 Feb 12 02:56 981945565.wrd
-rw-r--r-- 1 root root     9024 Feb 12 04:10 981993028.del
-rw-r--r-- 1 root root  3751064 Feb 12 04:10 981993028.wrd
-rw-r--r-- 1 root root        0 Feb 12 16:50 del.log
-rw-r--r-- 1 root root        0 Feb 12 16:50 wrd.log

* As you see, no 31 MB files; last time I got them they produced segfaults, so I deleted them and went on, leaving the pages they contained for the next re-indexing. This means that words that should be in the word files according to the mysql database are not there. I don't think it matters, but I cannot be sure. Actually, depending on how words are indexed, this could be the cause of the current segfaults. However, even so, this wouldn't change the fact that the 31 MB files caused segfaults in the first place, before I deleted them.

In this context I should also mention that I am using non-ECC memory. If splitting depends on the integrity of the pre-existing word files, an error introduced by bad copying/writing would affect all subsequent splitting attempts.

This is what the database looks like:

# du -c -h tree
1.6G total

# ./indexer -S
Database statistics

Status  Expired   Total
     0     4240    4240  Not indexed yet
   200        0   38945  OK
   301        0      52  Moved Permanently
   302        0     312  Moved Temporarily
   304        0      65  Not Modified
   400        0       2  Bad Request
   403        0      35  Forbidden
   404        0    2133  Not found
   503        5       5  Service Unavailable
   504        1       1  Gateway
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov skrev: > > > http://search.freewinds.cx/logs/logs.tar.gz > Not Found I'm senile. It's fixed (the 404, not the senility ;) Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov skrev: > > Could you please put zipped /var/mnogo319/tree/12/B/12BFD000 and > a file /splitter/XXX.wrd with correspondent XXX.del which produce > crash somewhere on the net? http://search.freewinds.cx/logs/logs.tar.gz Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: warning: no newline at end of file
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Does it matter?

/bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I../include -I../include -I/usr/include/mysql -g -O2 -DUDM_CONF_DIR=\"/usr/local/mnogo3110/etc\" -DUDM_VAR_DIR=\"/var/mnogo3110\" -c udmutils.c
gcc -DHAVE_CONFIG_H -I. -I. -I../include -I../include -I/usr/include/mysql -g -O2 -DUDM_CONF_DIR=\"/usr/local/mnogo3110/etc\" -DUDM_VAR_DIR=\"/var/mnogo3110\" -Wp,-MD,.deps/udmutils.pp -c udmutils.c -o udmutils.o
udmutils.c:1560:9: warning: no newline at end of file

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1329>
Re: UdmSearch: Webboard: Segfault (grrr)
Alexander Barkov wrote:
>
> Can you guys give us a log file produced by splitter -p which caused
> crash? We can't reproduce crash :-(

Huh? splitter doesn't accept the -v5 argument, so it won't give more detailed logs than the normal ones. The only log I had, the one written to stdout, is the one I included with my first posting in this thread:

Delete from cache-file /var/mnogo319/tree/12/B/12BFD000
/var/mnogo319/tree/12/C/12C1 old: 69 new: 1 total: 70
./run-splitter: line 118: 18790 Segmentation fault (core dumped) $SPLITTER

Until this point everything was normal.

Anyway, as I said, I strongly suspect corruption in the word database. On a previous occasion when this happened, I deleted the entire tree/* directory structure and started all over again. Splitter worked like a dream with both small and big log files until one of the following occurred:

1. I stopped indexer with ^C and then ran splitter, or
2. Splitter had to work its way through some 31 MB files. (These files are not all the same size; they tend to get slightly bigger the more of them there are, i.e. something like this:

0001.log 31.500.000 bytes
0002.log 31.550.000 bytes
0003.log 31.580.000 bytes

sort of.)

Unfortunately I haven't been making notes, so I can't tell for sure which of these two things happened before things stopped working.

I tried splitter again today with ./splitter >splitter.log . It went in a very normal way *almost* as far as yesterday, and then hung so badly that not even kill -9 could kill it. The log of this run looks like

Delete from cache-file /var/mnogo319/tree/12/B/12B27000
Delete from cache-file /var/mnogo319/tree/12/B/12B2D000
Delete from cache-file /var/mnogo319/tree/12/B/12B3
Delete from cache-file /var/mnogo319/tree/12/B/12B31000
Delete from cache-file /var/mnogo319/tree/12/B/12B3

I am attaching the three files that could be involved, namely tree/12/B/12B31000, 12B32000 and 12B35000.

I'll install 3.1.10 now, try it on the old word database and see what it does. If it doesn't work, I'll remove the word database and start again from scratch. I'll try to make detailed notes this time and report back.

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...

wordfiles.tar.gz
UdmSearch: Webboard: Segfault (grrr)
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

RH Linux 7.0, search 3.1.9, MySQL 3.23.29, cache mode, with the new patches for cache.c and sql.c.

It happens all the time. It started happening when "maximum size" 31 MB log files were indexed, but by now it happens on any indexing, no matter how big or small the log file, as if the database somehow was corrupt:

Delete from cache-file /var/mnogo319/tree/12/B/12BFD000
/var/mnogo319/tree/12/C/12C1 old: 69 new: 1 total: 70
./run-splitter: line 118: 18790 Segmentation fault (core dumped) $SPLITTER

For the same log file it always crashes at the same index file (e.g. every time I try to reindex 12345678.log it will crash at tree/12/3/4567000). If I delete the log file and start again with a new log file, it will crash at a different place, but it will still be consistent in crashing at the same place every time.

And the backtrace:

# gdb splitter core
GNU gdb 5.0
[...]
This GDB was configured as "i386-redhat-linux"...
Core was generated by `/usr/local/mnogo319/sbin/splitter'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
552 logwords[count+j].wrd_id=table[w].wrd_id;
(gdb) backtrace
#0 0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
#1 0x8049e89 in main (argc=1, argv=0xba94) at splitter.c:70
#2 0x4009bbfc in __libc_start_main (main=0x8049d80 , argc=1, ubp_av=0xba94, init=0x80495bc <_init>, fini=0x8065b7c <_fini>, rtld_fini=0x4000d674 <_dl_fini>, stack_end=0xba8c) at ../sysdeps/generic/libc-start.c:118

Since 3.1.10 is coming out today, I'll try it and see if things work better. If not, I'll post more bad news later ;)

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1320>
UdmSearch: Webboard: A bug in search.cgi???
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > When searching words in spanish (accentuated characters, ñ) with search.cgi I get >results like the following: > > If I search for «España», search.cgi breaks the word in two parts, searching for >«Espa» and also for «a», ignoring «ñ». > Or perhaps I'm doing something wrong... Have you set local charset to 8859-1? If not, do so in both indexer.conf and search.htm . Z Reply: <http://search.mnogo.ru/board/message.php?id=1319> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Site search (ul) in cache mode
Zenon Panoussis skrev: > > > We found a bug. Please find patches against sql.c and cache.c > > in attachement. > The patch didn't work by itself, so I did the replacements manually. > The patched source compiled without complaints. I replaced the old > search.cgi with the new one but site search still doesn't work. > Should I re-run splitter or re-index completely? I completely re-indexed some documents with indexer -a -g (category) with the patched compilation. Site search (ul) still doesn't work. I am using it in the form of and fill in http://www.domain.dom/ (domain.dom being one of the just re-indexed servers) before hitting "search". Now what? Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Site search (ul) in cache mode
Alexander Barkov skrev: > > We found a bug. Please find patches against sql.c and cache.c > in attachement. The patch didn't work by itself, so I did the replacements manually. The patched source compiled without complaints. I replaced the old search.cgi with the new one but site search still doesn't work. Should I re-run splitter or re-index completely? Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Site search (ul) in cache mode
Alexander Barkov skrev: > > > > > Now the tags and categories work fine, but not the site search. > > > > The ul= directive is completely ignored by search.cgi. > > > What was the value of ul= variable you tryed? > > I tried all of the following: > > - http://www.domain.dom > > - www.domain.dom > > - domain > > - /path/ > > - path > > Nothing works. Actually, right now there is garbage text in the > > ul variable and the search doesn't care about it. You can see > > it at http://search.freewinds.cx -> hit "New search". > Check http://www.domain.dom/ please with trailing slash. That doesn't work either. Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Splitter: core dumped
Caffeinate The World wrote:
>
> i'll wait. for now, i'm indexing but running splitter when the files
> are around 2MB.

I've been running indexer -c 3600 since last night, producing log files of 5-10 MB and running splitter every time afterwards, with cleaning of var/splitter and all. So far no problems at all. I have a hunch that the problem lies in splitting multiple big files in one go.

A friend offered to lend me some memory. If I can get my ass over there and fetch it, I'll try a huge splitting first with my standard 128 MB RAM and then with 1 GB RAM. If there is any difference in the behaviour of splitter, it will be a good indication of where to look for the problem.

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Webboard: Splitter: core dumped
Caffeinate The World wrote:
>
> > I run splitter -p and finish fine. I then run splitter and,
> > halfway through the splitting, crash: segmentation fault, or
> > just a hang, core dumped. So I restart splitter and next time
> > finish fine.
>
> what machine are you on? Alpha? OS?

Intel PII, RH Linux 7.0 with 2.2 kernel.

> i had the same problem and i sent a message to the mailing list
> describing how i corrected it. search for "core" and "splitter"

Found it. My dump appeared at a different position than yours, at 076, but was just as persistent as yours. Also, the premises are similar: I had run indexer for a long time and I had five 31 MB files waiting to be split. Splitter choked every time on the third one of them. This has never happened before or after when the logs have been smaller than 31 MB, so I'm just re-running smaller chunks at a time.

> can you check another thing? i've never seen my splitter split the
> last file "FFF.log". do you get that file? it goes as high as FFE.log
> only.

Indeed, last night I saw it stop at FFE.log . But I have had files at tree/FF/F/... , so I assume that other times it went all the way to FFF.

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Webboard: Splitter: core dumped
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: [3.1.9, cache mode] I run splitter -p and finish fine. I then run splitter and, halfway through the splitting, crash: segmentation fault, or just a hang, core dumped. So I restart splitter and next time finish fine. The question is: what can this do to the word database? Will it still be accurate, or will some words be inserted twice? Can I just re-run and finish and be happy, or should I re-index? Z Reply: <http://search.mnogo.ru/board/message.php?id=1271> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Search results display problem
Matthew Sullivan wrote:
>
> > I have the same problem (see http://search.freewinds.cx ) and
> > I thought it was my own HTML that did it. If you find the cause,
> > will you please post it on the webboard?
>
> Yours looks ok to me.

That's only because I took your advice in the meantime. All my tables were width="95%" and were contained in one big table of width="100%", except , which was width="100%" itself. I changed it to 95% and the problem seems to be gone. Feel credited :)

BTW, you should post your reply to the webboard. I suspect that lots of people read it who are not on the list. Besides, it's searchable.

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Webboard: Search results display problem
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > I have a problem with the display of search results in netscape. > When ever I have a large number of results, the width of the > search results are wider than the td(table data) and extend > beyond it... I have the same problem (see http://search.freewinds.cx ) and I thought it was my own HTML that did it. If you find the cause, will you please post it on the webboard? Z Reply: <http://search.mnogo.ru/board/message.php?id=1270> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Webboard <-> mailing list interaction
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: What is posted on the webboard goes to the mailing list too, but what is posted on the mailing list doesn't go to the webboard. Wouldn't it be a good idea if it did? Z Reply: <http://search.mnogo.ru/board/message.php?id=1250> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Webboard: Site search (ul) in cache mode
Alexander Barkov skrev: > > > Now the tags and categories work fine, but not the site search. > > The ul= directive is completely ignored by search.cgi. > What was the value of ul= variable you tryed? I tried all of the following: - http://www.domain.dom - www.domain.dom - domain - /path/ - path Nothing works. Actually, right now there is garbage text in the ul variable and the search doesn't care about it. You can see it at http://search.freewinds.cx -> hit "New search". BTW, if you go to the site in half an hour or so, "New search" will have been moved to "Search"; I'm just in the process of replacing the MySQL search with the cache mode one. Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Splitter
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> Hi, can anybody point me to some documentation about the
> Splitter, what it is, what it does, ... Splitter -h doesn't
> really help me further, and I haven't found an answer in the
> mailing list or the included documentation. Thanks in advance.

Check doc/cachemode.txt. splitter -h is not documented; probably it does not exist.

- run-splitter -k renames the current word logs and starts new ones.
- run-splitter -p or just splitter -p divides the renamed word log into 4096 files in basedir/var/splitter.
- run-splitter -s or just splitter divides those words in turn into 1.000.000 files in basedir/var/tree.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1245>
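The two file counts look like hexadecimal prefix counts, going by the names quoted elsewhere on this list (split logs running up to FFF.log, tree files such as tree/12/B/12BFD000). The sketch below is an inference from those names only, not taken from the splitter source, and `tree_path` is a hypothetical helper.

```sh
#!/bin/sh
# Three hex digits cover the split logs, five hex digits the tree files.
split_logs=$((16 * 16 * 16))
tree_files=$((16 * 16 * 16 * 16 * 16))
echo "split logs: $split_logs"
echo "tree files: $tree_files"

# Hypothetical helper: build a tree path from a five-digit hex prefix,
# following the layout seen in paths like tree/12/B/12BFD000.
tree_path() {
    id=$1
    first2=$(printf '%.2s' "$id")        # first two hex digits
    third=$(printf '%.1s' "${id#??}")    # third hex digit
    first5=$(printf '%.5s' "$id")        # first five hex digits
    printf 'tree/%s/%s/%s000\n' "$first2" "$third" "$first5"
}

tree_path 12BFD
tree_path FFFE6
```

Five hex digits give 1048576 possible files, which would fit the "1.000.000" figure as a round number.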
UdmSearch: Webboard: Search in title: weird behaviour
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: Go to http://search.freewinds.cx , hit "New search", search for "zenon" (without the quotes) and limit the search to title only. You will get 6 results where the search term is in the title, and one (http://www.users.wineasy.se/noname/zenon/index.htm) where it is not. Is the indexer confusing title with URL? Z PS. Since the indexing is going on, at the time you try this there might be more results. Reply: <http://search.mnogo.ru/board/message.php?id=1237> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: search.cgi does not work
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> I have the following problem. If I try the search.cgi I
> get an error message empty page! If I enter search.cgi
> from the telnet session I get a valid html output but
> all the vars from search.htm are empty (I mean you cannot
> see anything $A is replaced with nothing)

The empty page you get when you try from the web, does it say "An error occurred"?

> Has anyone any idea what to try next?

First of all, did indexing go OK? Did your database grow as it should? If yes, check this:

- Have you renamed search.htm-dist to search.htm?
- Have you put the right DBAddr and user:password, DBMode etc in search.htm? Do all settings in search.htm match the equivalent settings in indexer.conf?
- Have you set permissions correctly for search.htm?
- Are you using search results cache? Did you try without?
- Are you tracking queries? Did you try not to?
- Are you using ispell? Did you try without?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1236>
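The "do the settings match" item on that checklist can be checked mechanically. A rough sketch, assuming each directive sits on its own line as a name followed by its value; `setting_of` and `same_setting` are made-up helper names, the DBAddr value is only a placeholder, and the demo writes two throwaway files instead of touching a real installation.

```sh
#!/bin/sh
# Print the value of directive $1 in file $2. Only lines whose first
# word is exactly the directive name are considered.
setting_of() {
    awk -v name="$1" '$1 == name { $1 = ""; sub(/^[ \t]+/, ""); print }' "$2"
}

# Succeed when directive $1 has the same value in files $2 and $3.
same_setting() {
    [ "$(setting_of "$1" "$2")" = "$(setting_of "$1" "$3")" ]
}

# Demo with throwaway files; the values are placeholders.
tmp=$(mktemp -d)
printf 'DBMode cache\nDBAddr mysql://user:pass@localhost/db/\n' > "$tmp/indexer.conf"
printf 'DBMode single\nDBAddr mysql://user:pass@localhost/db/\n' > "$tmp/search.htm"

for name in DBAddr DBMode; do
    if same_setting "$name" "$tmp/indexer.conf" "$tmp/search.htm"; then
        echo "$name: match"
    else
        echo "$name: MISMATCH"
    fi
done
```

In the demo the DBMode lines differ on purpose, which is the kind of mismatch that is easy to overlook by eye.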
Re: UdmSearch: Webboard: Site search (ul) in cache mode
> Have you solved the problem with tags and categories not working?
> How exactly, if yes?

Oh - I posted it on the webboard: compile with --enable-fasttag --enable-fastcat and --enable-fastsite instead of --enable-fast-tag --enable-fast-cat and --enable-fast-site.

Now the tags and categories work fine, but not the site search. The ul= directive is completely ignored by search.cgi.

Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
Re: UdmSearch: Bug report
Luis Bravo skrev: > > My files are in Spanish. We have words like oración, apéndice, > estómago, etc. When they are indexed, indexer split that words. > In the database they are in two words: oraci n, ap ndice, est mago. > What Can I do? In later versions you need to set LocalCharset iso-8859-1 both in indexer.conf and in search.htm . If you don't, US-ASCII is assumed and all accented characters are discarded. I don't know if the LocalCharset directive was already used in 3.1.2 . Z -- oracle@everywhere: The ephemeral source of the eternal truth... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Site search (ul) in cache mode
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Solve one problem and get on to the next :(

Search 3.1.9 compiled with --enable-fastsite, mysql, cache mode. The site search doesn't work. Tried both http://a_site/"> and All sites Site 1 Site 2 to no avail. Either way, search.cgi returns results from all indexed sites.

And yes, I double-checked. ./configure said

checking for fast site search support... enabled

search.cgi and splitter come from this compilation and everything has been re-indexed with it, so it should work. Any ideas?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1232>
UdmSearch: Webboard: Tags and categories in cache mode
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> More problems: neither tags nor categories seem to work.
>
> I'm using v 3.1.9 with MySQL in cache mode, compiled with
> --enable-fast-tag/cat/site ...

Found the problem:

./configure --enable-fast-tag --enable-fast-cat --enable-fast-site

returns

...
checking for fast tag search support... disabled
checking for fast category search support... disabled
checking for fast site search support... disabled
...

./configure --enable-fasttag --enable-fastcat --enable-fastsite

works much better :)

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1231>
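Since ./configure accepts the misspelled options without complaint, it may be worth scanning its output for features that ended up disabled. A small sketch; `disabled_features` is a hypothetical helper, and the sample lines are copied from configure output quoted in this thread.

```sh
#!/bin/sh
# Hypothetical helper: read configure output on stdin and print every
# "checking for ... support..." line that ended in "disabled".
disabled_features() {
    grep 'support\.\.\. disabled$'
}

# Sample lines as quoted in this thread, mixed for illustration.
sample='checking for fast tag search support... disabled
checking for fast category search support... disabled
checking for fast site search support... enabled'

printf '%s\n' "$sample" | disabled_features
```

Against a real build the same filter could be fed from `./configure --enable-fasttag --enable-fastcat --enable-fastsite 2>&1`; an empty result would then mean none of the checked features was reported as disabled.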
UdmSearch: Webboard: Tags and categories in cache mode
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

More problems: neither tags nor categories seem to work.

I'm using v 3.1.9 with MySQL in cache mode, compiled with --enable-fast-tag/cat/site . I've read the part on fast search with tag etc limits in cachemode.txt, but I doubt I understood it properly.

My indexer config looked like this:

Tag A
(lots of Server statements)
Tag B
(lots of Server statements)

This didn't work. Since cachemode.txt talks about a 10-digit HEX string, I replaced the above with "Tag 10" and "Tag 20" and with "Category 10" and "Category 20" respectively, deleting all indexes and re-indexing from scratch every time. Neither alternative worked.

My search.htm contains the options

All sites One Another A third A fourth (n/a)

No matter what you choose, you get results from all sites.

Any ideas? Is it me or is it the search engine?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1230>
UdmSearch: Webboard: Cache mode questions
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > > > The search works very nicely, but it returns a tremendous > > amount of quoted document data... > This is because of --enable-news-extensions Is there *any* way to limit the quotes to just a few lines? If not, is there any chance this can be fixed in 3.1.10? Z Reply: <http://search.mnogo.ru/board/message.php?id=1229> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Is this normal?
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > > Isn't 55 MB a bit much just for storing 15.000 URLs? > That's very big size. Do you use --enable-news-extensions? Yes. But I haven't indexed any news yet. By now I have 21.500 URLs and an index of 158 MB, all from ordinary webpages. Z Reply: <http://search.mnogo.ru/board/message.php?id=1226> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Cache mode questions
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: > > The search works very nicely, but it returns a tremendous > > amount of quoted document data... > Can I take a look on your search page? Yes. Go to http://search.freewinds.cx and use "New search". Search for the word "something" and format "Long" and you'll get a results page that's almost half a megabyte. BTW, there is some other strange behaviour there. Searching for beginning of word or substring doesn't work at all. Ispell is not enabled, but as I understand it doesn't need to be either. Z Reply: <http://search.mnogo.ru/board/message.php?id=1225> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Is this normal?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Isn't 55 MB a bit much just for storing 15.000 URLs? After all, it's only URLs stored, isn't it? Or am I wrong? Is anything else stored in url.myd?

The statistics:

Database statistics

Status  Expired   Total
     0     9147    9373  Not indexed yet
   200        0    6009  OK
   301        0      14  Moved Permanently
   302        0      38  Moved Temporarily
   400        0       1  Bad Request
   403        0      14  Forbidden
   404        0      97  Not found
   503        0       4  Service Unavailable
 Total     9147   15550

And the MySQL database:

[root@goat /root]# ls -l /var/lib/mysql/criticscache/
total 56308
-rw-rw---- 1 mysql mysql        0 Feb  1 16:55 dict.MYD
-rw-rw---- 1 mysql mysql     1024 Feb  1 16:55 dict.MYI
-rw-rw---- 1 mysql mysql     8608 Feb  1 16:55 dict.frm
-rw-rw---- 1 mysql mysql        0 Feb  1 16:55 robots.MYD
-rw-rw---- 1 mysql mysql     1024 Feb  1 16:55 robots.MYI
-rw-rw---- 1 mysql mysql     8586 Feb  1 16:55 robots.frm
-rw-rw---- 1 mysql mysql        0 Feb  1 16:55 stopword.MYD
-rw-rw---- 1 mysql mysql     1024 Feb  1 16:55 stopword.MYI
-rw-rw---- 1 mysql mysql     8578 Feb  1 16:55 stopword.frm
-rw-rw---- 1 mysql mysql        0 Feb  1 16:55 thread.MYD
-rw-rw---- 1 mysql mysql     1024 Feb  1 16:55 thread.MYI
-rw-rw---- 1 mysql mysql     8584 Feb  1 16:55 thread.frm
-rw-rw---- 1 mysql mysql 56568860 Feb  2 05:20 url.MYD
-rw-rw---- 1 mysql mysql   944128 Feb  2 05:20 url.MYI
-rw-rw---- 1 mysql mysql     9358 Feb  1 16:55 url.frm

v 3.1.9, DBMode cache with MySQL 3.23.29.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1218>
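A quick division puts the question in numbers. This only does arithmetic on the figures quoted in the message (the size of url.MYD, the total URL count and the count of status 200 documents); it says nothing about what the table actually stores.

```sh
#!/bin/sh
# Figures taken from the listing and the statistics in the message.
url_myd_bytes=56568860
total_urls=15550
indexed_ok=6009

# Integer division: average share of url.MYD per URL row, and per
# document that was fetched with status 200.
per_url=$((url_myd_bytes / total_urls))
per_indexed=$((url_myd_bytes / indexed_ok))

echo "bytes per URL row: $per_url"
echo "bytes per indexed document: $per_indexed"
```

Several kilobytes per row is more than URL strings alone would normally take, which fits the suspicion that something besides the URL ends up in that table.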
UdmSearch: Webboard: Cache mode questions
Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: >Shouldn't the files in /var/raw also be deleted? Or are they >needed in any way? /Me stupid. The answer is in cachemode.txt: "All processed logs in /var/raw directory are renamed to *.done ... you can remove them or keep them for backup purposes". Please forget I asked. > 2. The search works very nicely, but it returns a tremendous >amount of quoted document data... Re-reading the documentation, I haven't found the answer to this one. If you know, pray, tell. Z Reply: <http://search.mnogo.ru/board/message.php?id=1217> __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Webboard: Cache mode questions
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

As more things work, more questions arise. v 3.1.9 in DBMode cache, compiled with news-extension and using MySQL with create.txt from the news-3.1.tar.gz module.

1. cachemode.txt says that after running splitter, "it is better to delete (or backup) files in /var/splitter directory". Shouldn't the files in /var/raw also be deleted? Or are they needed in any way?

2. The search works very nicely, but it returns a tremendous amount of quoted document data with each hit; often the entire document. You can see this if you search for "picket" on http://search.freewinds.cx/cgi-bin/search2.cgi . If you do this, you will get a results page of about 0.5 MB. How can the quoted text be limited to, say, four lines?

3. I am using tags to separate different types of sites. Is it possible to use a tag for news? That is, use tag A for some websites, tag B for other websites and tag C for news? How can this be done?

BTW, I found out by coincidence that if search.cgi and search.htm are renamed the same way, e.g. search-X.cgi and search-X.htm respectively, it is possible to run separate searches on the same or on different databases from the same cgi-bin directory. This can be very useful and should be documented.

Finally, thank you for an excellent job done.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1214>
UdmSearch: Webboard: New behaviour of indexer.conf
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

The format

Server path http://bladiblah/ #comment1 comment2

in indexer.conf used to be OK until 3.1.8. However, in 3.1.9 indexer skips the first comment, reads the second one and exits with the error

too many arguments: ´comment2´

(and BTW, "argument" is misspelled ;)

Z

Reply: <http://search.mnogo.ru/board/message.php?id=1215>
UdmSearch: Webboard: Indexing /usr/doc ?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> I'm trying this:
>
> Server file://usr/doc/
>
> But it doesn't work. Is UdmSearch able to recursively index
> directories? The documentation doesn't say anything about this.
> I'm using version 3.0.23.

That will never work by itself. The documentation of v 3.1.8 says this on the subject:

#Alias <master> <mirror>
# You can use this command for example to organize search through
# master site by indexing a mirror site. It is also usefull to
# index your site from local file system.
# UdmSearch will display URLs from <master> while searching
# but go to the <mirror> while indexing.
# This command has global indexer.conf file effect.
# You may use several aliases in one indexer.conf.
#Alias http://www.mysql.com/ http://mysql.udm.net/
#Alias http://www.site.com/ file:/usr/local/apache/htdocs/

Thus, what you need to do is

Alias http://URL_that_you_want_in_the_results file:/usr/doc/
Server path http://URL_that_you_want_in_the_results

and run the indexer. Now, if you would like the results to point to files as well, instead of to a proper URL, you might try

Alias file:///usr/doc/ file:/usr/doc/
Server path file:///usr/doc/

but I have no clue whether it will work or not. In any case you might want to check the change log first to see if the Alias directive works with your version.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=971>
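The Alias description quoted above amounts to a prefix substitution: results keep the master prefix, while the indexer fetches from the mirror prefix. The following is only a minimal sketch of that idea in Python, assuming plain first-match prefix replacement; the function name `fetch_location` and the alias list are illustrative and are not taken from the UdmSearch source.

```python
# Hypothetical illustration of the Alias idea, not UdmSearch code.
# Each pair is (master prefix shown in results, mirror prefix used for fetching).
ALIASES = [
    ("http://www.site.com/", "file:/usr/local/apache/htdocs/"),
    ("http://www.mysql.com/", "http://mysql.udm.net/"),
]

def fetch_location(display_url, aliases=ALIASES):
    """Return the location to fetch for a URL that is displayed in results."""
    for master, mirror in aliases:
        if display_url.startswith(master):
            # Swap the master prefix for the mirror prefix, keep the rest.
            return mirror + display_url[len(master):]
    # No alias applies: fetch the URL as it is.
    return display_url

print(fetch_location("http://www.site.com/docs/a.html"))
```

Under these assumptions a URL that matches no alias is simply fetched unchanged, which is why a Server line is still needed for the master URL.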
UdmSearch: Webboard: Restricted Search - how da hell does it worx!?!?!?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> I can't find a way to make the Restricted Search work! i can't find in
> the DB any data that says that a specific URL is relative to a
> restricted criteria (like Sports or Shopping, wich are given as an
> example in the search.php of the latest MnogoSearch!). How does it
> work? and how can i use it in my queries?

If you only have a few categories, the easiest way to do this is to use tags. Put in indexer.conf the following:

==
Tag A
Server site http://www.domain.fr/
Server site http://www.otherdomain.it/
Server path http://www.3rddomain.de/shop/
Tag B
Server site http://www.domain.com/
Server site http://www.otherdomain.com/
Server path http://www.3rddomain.com/shop/
==

Then put the following in search.htm:

==
All sites European sites US sites Reserved
==

and you're ready. If you have a more complex system of classification you might want to use categories instead. They work basically the same way and they are properly described in the documentation.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=970>
UdmSearch: Webboard: Cosmetic correction
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Minor, trivial stuff: indexer -S returns the caption "UdmSearch statistics". Since that can end up public, as for instance in http://search.freewinds.cx/cgi-bin/stats , you might want to change it.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=961>
UdmSearch: Webboard: WORDCHAR and CONTRACTIONCHARS
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

There was a discussion about word separators back in January; see http://www.mail-archive.com/udmsearch%40web.izhcom.ru/msg00200.html . Since I just realised that I am facing the same problem, I wonder if Charlie's idea was implemented in newer versions. If not, will it be?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=959>
UdmSearch: Webboard: How can users add their homepage to the index?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> Since 3.1.7 we have new "ServerTable" feature.
> Currently we have no front-end to add URLs into
> server tables, but we have a plan to make it soon.

I use the following form and CGI for this purpose. You will, of course, have to adjust them for your needs. See further comments at the bottom.

The form:

=
Submit a URL for indexing. Choose how to limit the indexing. Read the &$%@#!! instructions first. Limit to: Entire domain: Directory and below: Single page only: Choose a category: Critics: Free Zone: Media: Done? Is everything correct? Then
=

And the script:

=
#!/usr/bin/perl -w
use strict;
use CGI ':standard';
use CGI::Carp 'fatalsToBrowser';
$CGI::POST_MAX=1024;
my ($query) = @_;
# Import the form fields into the R:: namespace.
import_names('R');
my $h = remote_host();
my $url = $R::url;
my $limit = $R::limit;
my $category = $R::category;
my $d = localtime;
# Delete every character that is not in the allowed set.
$url =~ tr/ a-zA-Z0-9!.,:_\-#\$%&+\[\]=\?\/\~//cd;
$limit =~ tr/a-z//cd;
$category =~ tr/ a-z//cd;
# Append the submission to indexer.conf as a commented-out Server line.
open(FILE, ">>/usr/local/mnogosearch/etc/indexer.conf")
    or die "Cannot open indexer.conf: $!";
print FILE "\#Server $limit $url in $category from $h on $d\n";
close(FILE);
print header();
print start_html(-title=>'URL submission completed');
print "\n";
print "You submitted $url for indexing as a $limit in the category $category.\n";
print "The page(s) will be examined and added to the index within a few days.";
print "";
print "\n\n";
print end_html();
=

As you probably can see, this is a bit primitive; the tag is added at the end of the "server" directive and leaves me to manually move the submitted URLs to the right place. For me this works fine because I don't trust submissions, but need to check them manually anyway. If you trust submissions (and trust that nobody will spam your index by filling it with irrelevant shit), you can add an "if/then" statement to put the appropriate tag or category before the URL.

If you don't use tags or categories, the script will work fine as it is; just remove the "in $category from $h on $d\n" part from the "print FILE" statement. Also note that I am not using the -T option. For your own safety you should. Also, you might want to check and possibly restrict the funny characters that you allow in URLs (the "$url =~ tr/ a-zA-Z0-9!.,:_\-#\$%&+\[\]=\?\/\~//cd;" statement). Mine is perhaps a bit too generous.

You can see both this and a few other scripts in action at http://search.freewinds.cx .

Z

Reply: <http://search.mnogo.ru/board/message.php?id=957>
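For readers less used to Perl, the tr/...//cd statements in the script delete every character that is not in the listed set. A rough Python equivalent of that whitelist filter, given only as a sketch: the function name `sanitize` and the sample input are made up for illustration, and the allowed set is copied from the script.

```python
import re

# Complement of the allowed set from the Perl tr///cd statement:
# space, letters, digits and ! . , : _ - # $ % & + [ ] = ? / ~
NOT_ALLOWED_IN_URL = re.compile(r"[^ a-zA-Z0-9!.,:_\-#$%&+\[\]=?/~]")

def sanitize(value, not_allowed=NOT_ALLOWED_IN_URL):
    """Delete every character outside the allowed set (like tr/...//cd)."""
    return not_allowed.sub("", value)

# Newlines and angle brackets are dropped, so a submission cannot
# smuggle an extra line into indexer.conf.
print(sanitize("http://example.com/a b;<script>\nServer site http://spam/"))
```

Tightening the filter is then a matter of shortening the character class, for instance dropping the space and `$` if no legitimate URL on the site needs them.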
UdmSearch: Webboard: Oops!
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Beginning today or yesterday, the search at http://search.mnogo.ru/search/search.php3 returns "Fatal error: Cannot redeclare crc32() in /usr/apache/search.mnogo.ru/share/htdocs/search/crc32.inc on line 11". Looks bad for the new front end ;)

Z

Reply: <http://search.mnogo.ru/board/message.php?id=939>
UdmSearch: Webboard: Search.cgi displaying no result
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> Search.cgi is not displaying any search result, only returing
> the search form.
> DBMode and DBAddr is the same as in indexer.

Does it say "Sorry, an error occured"? If it does, try commenting out Cache, TrackQuery and Ispell in search.htm to see if any of them is causing the problem.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=903>
UdmSearch: Webboard: Mirroring
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

If a page that had previously been indexed has been removed from the web, indexer will remove it from the database when re-run. If mirroring is on, will indexer also remove the page copy from the mirror?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=902>
UdmSearch: Webboard: Pedantic
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Cosmetic bug without any consequence: If the download timeout limit is set to 8, the indexer will try 9 URLs on a server before it skips it, not only 8. Therefore I assume that one part of the programme is counting from 0 up and another one from 1 up. I suggest you don't bother fixing it unless you happen to be looking at that code anyway.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=899>
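The kind of off-by-one guessed at above typically comes down to which comparison is used against the limit. The sketch below is purely hypothetical (it is not the indexer's code, and the function name is invented); it only shows how ">" instead of ">=" lets one extra URL through when every fetch times out.

```python
import operator

def urls_tried_before_skip(limit, should_skip):
    """Count fetch attempts on a server where every fetch times out.

    should_skip(errors, limit) decides when the server is given up on.
    """
    errors = 0
    tried = 0
    while not should_skip(errors, limit):
        tried += 1    # one more URL is attempted
        errors += 1   # and, by assumption, it times out
    return tried

print(urls_tried_before_skip(8, operator.ge))  # skip once errors >= limit
print(urls_tried_before_skip(8, operator.gt))  # skip once errors > limit
```

With a limit of 8, the first variant gives up after 8 attempts and the second after 9, which matches the behaviour described in the message.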
UdmSearch: Webboard: PHP front end and categories
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

search.cgi has been working fine here, but experimenting with the PHP front end I ran into problems:

Query error: SELECT path,link,name FROM categories WHERE path LIKE '__' ORDER BY NAME ASC
Table 'db.categories' doesn't exist

I don't use categories. I don't want to use categories. How do I get rid of this? (But I do use tags. How do I put in tags instead?)

Z

Reply: <http://search.mnogo.ru/board/message.php?id=817>
UdmSearch: Webboard: "An error occured" in search.cgi
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> I've copied the search.cgi to my cgi directory and edited the
> search.htm. When I type a search word in the search form, it thinks
> for a little while, then returns with a little red text saying
> "An error occured!" ... obviously in the area where the
> search result was supposed to be printed.

See to it that the settings in search.htm really correspond to those in indexer.conf. Specifically, the DBAddr and DBMode lines must be identical in both files. Also, try commenting out TrackQuery and Cache and see if either one is causing the problem.

Reply: <http://search.mnogo.ru/board/message.php?id=809>
UdmSearch: Webboard: Cache search
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

What hardware is http://udm.aspseek.com/cgi-bin/search.cgi running on?

Reply: <http://search.mnogo.ru/board/message.php?id=808>
UdmSearch: Webboard: Installation Help
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> Can't open template file '/usr/local/udmsearch/etc/search.htm'!
>
> There is a search.htm-dist in that directory which I tried to rename by because of
> permissions I could not. Any help would be appreciated. :)

I did the same thing myself: forgot to copy search.htm-dist to search.htm . Do so, edit the new file for database name, user and password, table type and cache or not, and your problem will be solved.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=789>
UdmSearch: Webboard: No 'Server' command for url... deleted.
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> What does "No 'Server' command for url... deleted."
> means when I run Indexer.

It means
(a) that you have set DeleteNoServer to "yes" in indexer.conf,
(b) that you have at some point had a Server path line in indexer.conf for the site that is being deleted, and that the site had been indexed, and
(c) that you removed the Server path statement, so on your next indexing run all the indexed pages of that site have been deleted from the database.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=780>
UdmSearch: Limitations?
Are there any inherent limitations on how long the Server path list can get? Would the indexer work with, say, a 2 MB list of URLs to index, or would it choke?

Z
UdmSearch: Webboard: Mysql query blues
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

THANK YOU!

Z

Reply: <http://search.mnogo.ru/board/message.php?id=770>
UdmSearch: Webboard: Mysql query blues
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

This is a stupid question. Please bear with a total newbie to mysql.

mysql>SELECT url FROM url WHERE status="404";

works fine and returns all the 404s. However,

mysql>SELECT url FROM url WHERE status="404" AND url="%domain%";

returns "empty set" despite the fact that there are 404s in the domain in question. More weirdly, even

mysql>SELECT url FROM url WHERE url="*";

returns an empty set. What am I doing wrong?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=768>
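For anyone hitting the same wall: in SQL the "=" operator compares the whole string literally, so "%domain%" and "*" are searched for as-is; the "%" wildcard only has its special meaning with LIKE. The sketch below shows the difference using Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the table rows and the example.com URLs are invented).

```python
import sqlite3

# In-memory stand-in for the url table; only the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url (url TEXT, status INTEGER)")
conn.executemany("INSERT INTO url VALUES (?, ?)", [
    ("http://www.example.com/missing.html", 404),
    ("http://www.example.com/index.html", 200),
    ("http://other.example.org/gone.html", 404),
])

def urls(where, params=()):
    """Run SELECT url FROM url WHERE <where> and return the URLs as a list."""
    rows = conn.execute(f"SELECT url FROM url WHERE {where} ORDER BY url", params)
    return [row[0] for row in rows]

# "=" looks for a URL that is literally the text %www.example.com%.
exact = urls("status = 404 AND url = ?", ("%www.example.com%",))

# LIKE treats % as "any run of characters".
matched = urls("status = 404 AND url LIKE ?", ("%www.example.com%",))

print(exact)
print(matched)
```

The same reasoning applies to url="*": it asks for a row whose URL is a single asterisk. To select every row, leave out the WHERE clause or use LIKE with "%".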
UdmSearch: Webboard: Rotating indexing
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> I think that the right way to those webmasters is to use robots.txt.

Oh, I don't mean that they don't want their sites indexed at all; only that they get grumpy if you hit them with 100 requests per minute. Rotating the targets would be a way to put the indexer in "polite mode" without adding delays to the indexing itself.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=766>
UdmSearch: Webboard: Rotating indexing
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Some webmasters can be terribly grumpy about their servers being hit continuously from the same IP, even if they are sitting on monster machines capable of serving any number of requests. So I wonder if there is any way to force the indexer to rotate between sites. That is, to make it change site for every page it fetches (if multiple sites are indexed) instead of first indexing all pages on one server (or one depth level) and then moving to the next.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=763>
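The rotation asked for here is a round-robin over per-site queues. The following sketch only illustrates the scheduling idea; it says nothing about how the indexer actually orders its work, and the function name `rotate` and the site names are invented for the example.

```python
from collections import deque

def rotate(queues):
    """Yield URLs one site at a time, cycling through the sites.

    queues maps a site name to the list of URLs still to fetch from it.
    """
    pending = deque((site, deque(urls)) for site, urls in queues.items() if urls)
    while pending:
        site, urls = pending.popleft()
        yield urls.popleft()
        if urls:
            # The site still has work: send it to the back of the line.
            pending.append((site, urls))

order = list(rotate({
    "a": ["a1", "a2", "a3"],
    "b": ["b1"],
    "c": ["c1", "c2"],
}))
print(order)
```

No single server sees two consecutive requests as long as at least two sites still have pending URLs; once only one site is left, a delay would be needed to stay polite.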
UdmSearch: Webboard: How to index pages and docs that are not linked?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> How can I index html pages or other documents, that are not linked from other pages ?
>
> I found that indexer only processes the pages that are linked from the main page and
> further.
> When I put a 'loose' html document in my serverroot and I have no link to it, it
> will not be indexed !?

The spider cannot guess that the document is there, so of course it can't find it. There are a couple of things you can do to solve this. One is to add a "Server page full_name_of_document" statement in your indexer.conf file. Another is to remove your index.html file(s) temporarily, make sure that the webserver allows directory browsing, index the site and then put back index.html.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=762>
UdmSearch: Webboard: Another mirroring suggestion
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

If a directory is first indexed and mirrored, and then removed from the indexer.conf file with DeleteNoServer=yes, the indexer does not delete the mirrored files. I think it should, because otherwise the mirrors grow forever with obsolete files and eventually become useless both for off-line indexing and for actual mirroring.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=758>
UdmSearch: Webboard: Mirroring
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

If MirrorRoot is specified in indexer.conf, mnogosearch copies the files it indexes to directories such as mirror_root/http/domain/dir . I can see three possible improvements in the mirroring behaviour. The first two should be easy to fix, while the third is more of a long-term improvement:

1. The ../http/.. directory could be eliminated. Not only is it unnecessary, but it also gives an ugly directory structure if you want to actually make the mirror accessible to outsiders.

2. Only files that are actually indexed are mirrored. This is very sensible for indexing purposes, but it defeats other possible uses of a mirror, such as backup or protection of a site from being forcefully taken down. There should be a MirrorAll command to override the Allow and Disallow commands and force *all* files to be mirrored, while the Allow and Disallow commands still apply to what is actually indexed.

3. If the indexer can be used as a combined mirroring and indexing tool, then functionality could be added to translate internal absolute links to either relative links or translated links. E.g., if I would index and mirror the mnogosearch site, the indexer could translate all http://search.mnogo.ru/whatever links in the mirrored pages to /whatever or to http://mysite/mirrors/whatever. I suspect that lots of the code for this could be taken from the wget code.

Z

Reply: <http://search.mnogo.ru/board/message.php?id=750>
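The link translation proposed in point 3 can be pictured as a prefix rewrite applied to every link found in a mirrored page. This is only a sketch of the idea, assuming the origin prefix ends with a slash; the function name `translate_link` is hypothetical, and neither mnogosearch nor wget is being quoted here.

```python
def translate_link(link, origin, target="/"):
    """Rewrite an absolute link under origin so that it points under target.

    origin is expected to end with "/", e.g. "http://search.mnogo.ru/".
    Links outside origin are returned unchanged.
    """
    if not link.startswith(origin):
        return link
    rest = link[len(origin):]
    # Join target and the remaining path with exactly one slash.
    return target.rstrip("/") + "/" + rest.lstrip("/")

origin = "http://search.mnogo.ru/"
print(translate_link("http://search.mnogo.ru/whatever", origin))
print(translate_link("http://search.mnogo.ru/whatever", origin,
                     "http://mysite/mirrors/"))
```

The two calls produce the two variants mentioned in the message: a site-relative link and a link into a mirror directory on another host.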
UdmSearch: Webboard: Looping URLs
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Using 3.1.8 with MySQL 3.23.24 on RH7.

I am indexing part of a site with Server Path http://site/dir/dir/dir/ . Everything in the directories to be indexed is normal HTML with no funny stuff. Most directory indices are auto-generated by the web server.

Yet the indexer loops. It reads the auto-generated index at / , adds the files in it to the database, re-reads it with a new name, adds new files to be indexed, ad infinitum. Left alone overnight, a few directories containing a couple of thousand files produced a pile of more than 90.000 entries with status 0 or 200.

The problem directories are at Server Path http://www.xs4all.nl/~kspaink/cos/ and there are no dynamically created pages in them. The resulting URLs in the database look like

http://www.xs4all.nl/~kspaink/cos/SecrServ/ops/go732/?952181106go732xhtmgo732l.htmgo732i.htmgo732q.htmgo732.htm

You see the loop. The only really existing dir is the one before the question mark. The question mark itself and all that comes after it are "invented" by the indexer.

There are references in the to-be-indexed pages to URLs higher than the to-be-indexed path in the form of . Could that be confusing the indexer?

Z

Reply: <http://search.mnogo.ru/board/message.php?id=731>
UdmSearch: Webboard: Manually Deleting BAD Urls
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Try "indexer -C -s 403" or whatever status URLs you want to get rid of.

Z

> Allo,
> I forgot to switch on the DeleteBad to YES...
> now we have about 26K of bad URLS.. can I delete then manually via MyAdmin...I do
> not wish to use -a becuase my bandwidth is limited..
>
> <PRE>
> Status    Expired      Total
> -----------------------------
>      0      56536      56536 Not indexed yet
>      1          1          1 Unknown status
>    200      36793      66223 OK
>    301          1          6 Moved Permanently
>    302       7891      16368 Moved Temporarily
>    304       1353       1377 Not Modified
>    400          4          4 Bad Request
>    401          1        937 Unauthorized
>    403      10652      10652 Forbidden
>    404        625        625 Not found
>    503        128        128 Service Unavailable
>    504      20875      71462 Gateway Timeout
> -----------------------------
>  Total     134860     224319
> </PRE>
>
> AJ Khan

Reply: <http://search.mnogo.ru/board/message.php?id=730>
Re: UdmSearch: Grrrr!
Problem solved. There was a pointer to mysql in /etc/ld.so.conf that pointed to the wrong place. Correcting the pointer and recompiling mnogosearch didn't help. I ended up removing the pointer, uninstalling mysql completely, reinstalling it again, and then recompiling and reinstalling mnogosearch. Everything works properly now. Z Original message= > mnogosearch 3.1.8, mysql 3.23.22 > > This happened: > > The search worked fine. Then I re-installed MySQL (3.23 instead > of 3.22) and Apache, and the directory structure of both changed. > I moved the old search.cgi to the new cgi-bin. I exported the old > database with mysqldump and re-imported it in the new MYI/MYD > format in the same (deleted and re-created) database. The indexer > works fine in the new setup with the old configuration. The search > does not; it returns "an error occured". > > This is what I tried: > > - Searched the Apache and MySQL error logs. Nothing there. Most > important, there are no "access denied" messages in the mysql log, > meaning that the search never even reaches mysql before it fails. > - Recompiled and reinstalled mnogosearch and copied the new search.cgi > to cgi-bin. It didn't help. > - Double-checked search.htm. This shouldn't be necessary since both > the database and search.htm are the same as before, but anyway. The > DBAddr statement is identical to the one in indexer.conf, including > trailing slash. So are the DBMode and charset statements. > - Beat my wife, screamed to the dog, kicked my children and broke my > monitor. That didn't help either. > > Finally I straced search.cgi, but I don't understand the output. If > you do, you'll find it below. > > Any ideas? 
UdmSearch: Grrrr!
mnogosearch 3.1.8, mysql 3.23.22

This happened:

The search worked fine. Then I re-installed MySQL (3.23 instead of 3.22) and Apache, and the directory structure of both changed. I moved the old search.cgi to the new cgi-bin. I exported the old database with mysqldump and re-imported it in the new MYI/MYD format in the same (deleted and re-created) database. The indexer works fine in the new setup with the old configuration. The search does not; it returns "an error occured".

This is what I tried:

- Searched the Apache and MySQL error logs. Nothing there. Most important, there are no "access denied" messages in the mysql log, meaning that the search never even reaches mysql before it fails.
- Recompiled and reinstalled mnogosearch and copied the new search.cgi to cgi-bin. It didn't help.
- Double-checked search.htm. This shouldn't be necessary since both the database and search.htm are the same as before, but anyway. The DBAddr statement is identical to the one in indexer.conf, including trailing slash. So are the DBMode and charset statements.
- Beat my wife, screamed to the dog, kicked my children and broke my monitor. That didn't help either.

Finally I straced search.cgi, but I don't understand the output. If you do, you'll find it below.

Any ideas?
Z =strace.out= execve("/var/www/cgi-bin/search.cgi", ["/var/www/cgi-bin/search.cgi"], [/* 24 vars */]) = 0 _sysctl({{CTL_KERN, KERN_OSRELEASE}, 2, "2.2.16-22", 9, NULL, 0}) = 0 brk(0) = 0x80908c0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000 open("/etc/ld.so.preload", O_RDONLY)= -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 4 fstat64(4, 0xb32c) = -1 ENOSYS (Function not implemented) fstat(4, {st_mode=S_IFREG|0644, st_size=21769, ...}) = 0 old_mmap(NULL, 21769, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000 close(4)= 0 open("/usr/lib/mysql/libmysqlclient.so.9", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0755, st_size=196204, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 d\0\000"..., 4096) = 4096 old_mmap(NULL, 172480, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x4001d000 mprotect(0x40036000, 70080, PROT_NONE) = 0 old_mmap(0x40036000, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x18000) = 0x40036000 old_mmap(0x40047000, 448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40047000 close(4)= 0 open("/lib/libm.so.6", O_RDONLY)= 4 fstat(4, {st_mode=S_IFREG|0755, st_size=493588, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300I\0"..., 4096) = 4096 old_mmap(NULL, 125352, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40048000 mprotect(0x40066000, 2472, PROT_NONE) = 0 old_mmap(0x40066000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x1d000) = 0x40066000 close(4)= 0 open("/usr/lib/libz.so.1", O_RDONLY)= 4 fstat(4, {st_mode=S_IFREG|0755, st_size=58940, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\36\0"..., 4096) = 4096 old_mmap(NULL, 54064, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40067000 mprotect(0x40073000, 4912, PROT_NONE) = 0 old_mmap(0x40073000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xb000) = 0x40073000 close(4)= 0 open("/lib/libc.so.6", O_RDONLY)= 4 fstat(4, {st_mode=S_IFREG|0755, 
st_size=4686077, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\230\270"..., 4096) = 4096 old_mmap(NULL, 1167368, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40075000 mprotect(0x40189000, 36872, PROT_NONE) = 0 old_mmap(0x40189000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x113000) = 0x40189000 old_mmap(0x4018f000, 12296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4018f000 close(4)= 0 open("/lib/libnsl.so.1", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0755, st_size=392107, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p?\0\000"..., 4096) = 4096 old_mmap(NULL, 93120, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40193000 mprotect(0x401a7000, 11200, PROT_NONE) = 0 old_mmap(0x401a7000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x13000) = 0x401a7000 old_mmap(0x401a8000, 7104, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x401a8000 close(4)= 0 open("/lib/libcrypt.so.1", O_RDONLY)= 4 fstat(4, {st_mode=S_IFREG|0755, st_size=82333, ...}) = 0 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\17"..., 4096) = 4096 old_mmap(NULL, 184252, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x401aa000 mprotect(0x401af000, 163772, PROT_NONE) = 0 old_mmap(0x401af000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVA
UdmSearch: Refusing to index
v 3.1.8: I have been indexing, and once in a while ^C-ing the indexer in order to do something else. The current progress status looks like this:

[root@goat /root]# /usr/local/mnogosearch/sbin/indexer -S

          UdmSearch statistics
Status    Expired      Total
-----------------------------
     0          0        604 Not indexed yet
   200          0       4524 OK
   301          0          9 Moved Permanently
   302          0          6 Moved Temporarily
   304          0        311 Not Modified
   403          0          1 Forbidden
   404          0         86 Not found
   503          0         19 Service Unavailable
   504          0         26 Gateway Timeout
-----------------------------
 Total          0       5586

So I start the indexer again, and here is what it does:

[root@goat /root]# /usr/local/mnogosearch/sbin/indexer
Indexer[1252]: indexer from UdmSearch v.3.1.8/MySQL started with '/usr/local/mnogosearch/etc/indexer.conf'
Indexer[1252]: [1] Done (1 seconds)

Namely nothing. It has 604 unwalked URLs, yet it refuses to walk them. To make sure, I add a couple of new Server statements to the indexer.conf file and try again:

Indexer[1068]: indexer from UdmSearch v.3.1.8/MySQL started with '/usr/local/mnogosearch/etc/indexer.conf'
Indexer[1152]: [1] http://www.cedar.net/users/dvanhorn/Gallery/arscc.htm
Indexer[1152]: [1] http://www.cedar.net/robots.txt
Indexer[1152]: [1] http://www.cisar.org/
Indexer[1152]: [1] http://www.cisar.org/robots.txt
Indexer[1152]: [1] Done (102 seconds)

Those were the new sites to be indexed. No go. The indexer fetches robots.txt and then refuses to walk.

Until now everything seemed to work perfectly well. Any ideas on what might be causing this weird behaviour?

Regards,
Z
--
oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: Bug report
UdmSearch version: 3.1.7
Platform: i586
OS: RH Linux 6.2 / 2.2.16
Database: MySQL 9.38 / 3.22.32
Statistics: Perl
Severity: cosmetic.

The search page reports one result too many. E.g. if 20 results per page are requested, the caption on the results page will say "Displaying documents 1-21 of xxx found". If fewer than one page worth of results are found, the caption will still increase the count by 1, e.g. "Displaying documents 1-4 of 3 found".

Similarly, at the bottom of the first results page, links to subsequent pages appear even when all the results fit on the first page. E.g., if only three results have been returned, there will still be a link like "<< Previous 1 2 3 Next >>" at the bottom of the page, where "Previous" and "Next" are dead, but 2 and 3 are live and point to a page containing the last result of the three already shown.

For a live example go to http://194.109.240.22/cgi-bin/search.cgi and search for "bronson".

Regards,
Z
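
For reference, the arithmetic the caption and the page links should follow can be sketched in C. This is only an illustration of the expected behaviour, not the actual search front-end code; the `PageRange` struct and the `page_range` function are hypothetical names made up for this example. It assumes pages are numbered from 0 internally and that documents are displayed with 1-based, inclusive numbering.

```c
#include <assert.h>

/* Hypothetical helper, not part of UdmSearch: describes what one
 * results page should display. */
typedef struct {
    int first;  /* 1-based number of the first document shown, 0 if none */
    int last;   /* 1-based number of the last document shown, 0 if none */
    int pages;  /* total number of result pages that should get a link */
} PageRange;

/* total: documents found; per_page: documents per page; page: 0-based. */
PageRange page_range(int total, int per_page, int page)
{
    PageRange r = {0, 0, 0};
    assert(per_page > 0 && total >= 0 && page >= 0);

    /* Ceiling division: 3 results at 20 per page is 1 page, not 3. */
    r.pages = (total + per_page - 1) / per_page;
    if (page >= r.pages)
        return r;                     /* nothing to show on this page */

    r.first = page * per_page + 1;
    r.last = r.first + per_page - 1;  /* inclusive range, hence the -1 */
    if (r.last > total)
        r.last = total;               /* clamp a short final page */
    return r;
}
```

With this arithmetic, 20 results per page yields "1-20" on the first page, and a search with three hits yields "1-3 of 3" with a single page link. Omitting the -1 on the inclusive upper bound, or omitting the clamp to the total, would produce the "1-21" and "1-4 of 3" captions described above.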
UdmSearch: New message on the WebBoard #1: Period directive?
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]

Message:
The instructions in the indexer.conf-dist file say that M is minute and m is month. However, the examples given right after the instructions indicate the opposite. Which is correct?

Reply: <http://search.mnogo.ru/board/message.php?id=611>
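
To make the ambiguity concrete, here is a rough C sketch of a case-sensitive period parser. It is not the indexer's real parser; `parse_period` is a hypothetical function written for this note. It takes the reading given in the instructions (M = minute, m = month), which is exactly the point the question leaves open. The other unit letters (s, h, d, y), the compound form such as "1h15M", and the 30-day month and 365-day year are further assumptions of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not UdmSearch code. Returns the period in
 * seconds, or -1 if the string is malformed. Unit letters are
 * case-sensitive, so "30M" and "30m" mean different things. */
long parse_period(const char *s)
{
    long total = 0;
    assert(s != NULL);
    if (*s == '\0')
        return -1;                      /* empty string is not a period */

    while (*s) {
        char *end;
        long n, unit;

        if (!isdigit((unsigned char)*s))
            return -1;                  /* every part must start with a number */
        n = strtol(s, &end, 10);

        switch (*end) {
        case 's': unit = 1L; break;                  /* assumed: second */
        case 'M': unit = 60L; break;                 /* assumed: minute */
        case 'h': unit = 3600L; break;               /* assumed: hour */
        case 'd': unit = 86400L; break;              /* assumed: day */
        case 'm': unit = 30L * 86400L; break;        /* assumed: month of 30 days */
        case 'y': unit = 365L * 86400L; break;       /* assumed: year of 365 days */
        default:  return -1;            /* missing or unknown unit letter */
        }
        total += n * unit;
        s = end + 1;                    /* continue after the unit letter */
    }
    return total;
}
```

Under these assumptions "30M" is 1800 seconds while "30m" is 30 months, so swapping the two letters in a Period line changes the reindexing interval by several orders of magnitude.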
UdmSearch: New message on the WebBoard #1: udmsearch.robots not found ?!
Author: Zenon Panoussis
Email: [EMAIL PROTECTED]

Message:
I just installed 3.1.7 on Linux 2.2.16 with:

./configure --with-mysql    (MySQL 3.22.32)
make
make install

and no changes in the configuration file. I proceeded to create one database and tables with:

mysqladmin create udmsearch
mysql udmsearch < multi.txt

Do I understand the INSTALL instructions correctly in that multi.txt replaces both create.txt and all stop.lang.txt files?

I edited indexer.conf minimally and left Robots at the default yes. Running indexer fails with the following error:

Indexer[15034]: indexer from UdmSearch v.3.1.7/MySQL started with '/usr/local/udmsearch/etc/indexer.conf'
Indexer[15034]: [1] Error: '#1146: Table 'udmsearch.robots' doesn't exist'

Changing to "Robots no" in indexer.conf doesn't help. I grepped the entire documentation for 'udmsearch.robots' and found nothing. Thus, I have no idea what the udmsearch.robots table needs to look like and how to create it.

Does anyone know what's wrong and how it can be fixed?

Regards,
Z

Reply: <http://search.mnogo.ru/board/message.php?id=602>