Webboard: multilanguages sites
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Use different tags. The latest 3.1.x branch also supports search limits by language. I have done a web site with 5 different languages. Each language shows different pages, which can be in only one language. For example, a page can be in the french site, but not in the english one. I've installed udmSearch on the french version, and I started to index it. Everything works fine. A search in the french version will show the results of the french website. But I'd now like to index the english one. And I want the results displayed by the search engine in the english version to be only the english pages, and not the french ones. How to do it, using only one indexer.conf file? Reply: http://search.mnogo.ru/board/message.php?id=1921 ___ If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]
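A minimal sketch of the tag-based setup suggested above, assuming the Tag indexer.conf command and the t= search parameter of mnoGoSearch 3.1.x (the site URLs are hypothetical):

```
# Assumed indexer.conf fragment: one config file, two language sections
Tag en
Server http://www.example.com/en/

Tag fr
Server http://www.example.com/fr/
```

The english front-end would then pass t=en to search.cgi so that only pages indexed under that tag are returned, and the french front-end t=fr.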
Webboard: Indexing ok, command line show info, no web output
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Do you use the built-in or the SQL version? What database is being used in the latter case? Some of them provide SQL query logging. What SQL queries are sent at search time? Hi guys, I have not found any info on this subject; more than an error, it is no error but no search results. In fact, I was able to index my site with no problem. I did have some trouble indexing the site at first, but after adding the line in the config file it worked. Now I am at the point where I can use the search, but I get no output at all. Here is what it says: Sorry, but search returned no results. Any suggestions! It works fine from the command line, for example: ./search.cgi remo gives me at least 10 results. Reply: http://search.mnogo.ru/board/message.php?id=1922
Webboard: Link length
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: It is 128 characters. Can you tell me please what is the maximum link length that indexer can process? I mean how many characters can be in HREF ? Thanks, Alexander Reply: http://search.mnogo.ru/board/message.php?id=1925
Webboard: How to index ftps with non-standard port numbers
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Why did you decide that? You must change the function that connects to the MySQL database. See the code below:

static int InitDB(DB *db){
    mysql_init(&(db->mysql));
    if(!(mysql_real_connect(&(db->mysql),DBHost,DBUser,DBPass,DBName,DBPort?DBPort:0,NULL,0))){
        fprintf(stderr,"Failed to connect to database: Error: %s\n",mysql_error(&(db->mysql)));
        db->errcode=1;
        return(1);
    }
    db->connected=1;
    return(0);
}

Reply: http://search.mnogo.ru/board/message.php?id=1924
Webboard: Search bugs for Windows
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Do you use ispell in indexer? hi if i search for image it would give me 172 results, but if i search for images it gives me no result, however i am able to see the title "Macbeth, Images of blood and water" when i search for Image Image http://essay.studyarea.com/cgi-bin/essay/search.exe?q=Image&ps=20&o=0&m=any&wm=wrd&ul=&wf=10 Images http://essay.studyarea.com/cgi-bin/essay/search.exe?q=Images&ps=20&o=0&m=any&wm=wrd&ul=&wf=10 Reply: http://search.mnogo.ru/board/message.php?id=1928
Webboard: Search.cgi Problem
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: When I go to run the search.cgi program, I get an Error 500 Internal Server Error. The Apache error log tells me that the library mysqlclient.so.9 cannot be found. I have added this path to LD_LIBRARY_PATH and re-compiled and installed over again. Still no go... any ideas? You have to set this variable at run time, not at compile time. Check also our FAQ at http://search.mnogo.ru/faq.html Reply: http://search.mnogo.ru/board/message.php?id=1935
Webboard: Search bugs for Windows
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Yes, I use ispell. Joe You have to add the corresponding ispell commands into your template. The default one has some examples. Reply: http://search.mnogo.ru/board/message.php?id=1955
Webboard: description and txt
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: It is the $DE template variable. Take a look into doc/templates.txt to check all available variables. Hi, is there a feature so that when a site has a description, it will show the description instead of the txt, and if there is no description, it will show the txt? Joe Reply: http://search.mnogo.ru/board/message.php?id=1956
Webboard: Multiple databases
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: why don't you use a tag to separate the two sites Joe Which tags and sites do you mean? Reply: http://search.mnogo.ru/board/message.php?id=1957
Webboard: Unable to configure mnogosearch...
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: I have no ideas :-( BTW, I'm using RedHat 7.0. I know that there are some compatibility problems with the new GNU packages in RH 7.0... I've tried to ./configure both mnogosearch-3.1.12.tar.gz and udmsearch-3.0.23.tar.gz; it was the same fault. Reply: http://search.mnogo.ru/board/message.php?id=1980
Webboard: Same Trouble search displays blank pages
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: What http server do you use? I had that problem on another server.. that wasn't a hard fix.. it depends on how you set up the conf-dist and htm-dist, what database... etc. The problem I am having now is that the search form itself will not show up when executed via browser. If I remove the search.htm file I should get an error, e.g. template not found, but I don't even get that. Run from telnet it prints out the html of my search.htm, so the script does know where to find it via telnet.. but this browser thing has got me crazy; it seems to me I should at least get an error page.. Reply: http://search.mnogo.ru/board/message.php?id=1987
Webboard: Same Trouble search displays blank pages
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: It seems that search.cgi crashes during execution. Try to test it from the command line: ./search.cgi some_word Does it crash? I have exactly the same problem!!! ./indexer -S Database statistics:

Status  Expired  Total
200     0        33    OK
302     0        13    Moved Temporarily
404     0        1     Not found
Total   0        47

33 OK urls and http://search.easy-list.net/bin/search.cgi is ALWAYS blank!! Reply: http://search.mnogo.ru/board/message.php?id=1988
Webboard: TXT over 255 characters
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Is it possible to adjust the txt field to hold more than 255 characters? I have tried to define the field as a [txt] field with a limit of 65535 characters. Not that I will use that amount, but can it be possible to define how much indexer stores? Best regards Thomas Thygesen. You are right, you have to change the txt field definition in the SQL table. Also there is a #define UDM_MAXTEXTSIZE 255 in udm_common.h which is responsible for the text length. Change it and recompile. Reply: http://search.mnogo.ru/board/message.php?id=1989
Webboard: Same Trouble search displays blank pages
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: What is displayed in both cases: with and without the template? What can you see in "View HTML source"? I had that problem on another server.. that wasn't a hard fix.. it depends on how you set up the conf-dist and htm-dist, what database... etc. The problem I am having now is that the search form itself will not show up when executed via browser. If I remove the search.htm file I should get an error, e.g. template not found, but I don't even get that. Run from telnet it prints out the html of my search.htm, so the script does know where to find it via telnet.. but this browser thing has got me crazy; it seems to me I should at least get an error page.. Reply: http://search.mnogo.ru/board/message.php?id=1990
Webboard: Link not found
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Run indexer -amv6 and check its output when it visits the topics list. I've got a little problem. Actually I'm running an UltraBoard conference on my site and it generates links to the topics like this: <a href="UltraBoard.pl?Action=ShowPost&Board=mssql&Post=779&Idle=365&Sort=0&Order=Descend&Page=0&Session=" OnMouseOver="window.status='Read this (Encoding problem when using DTS) topic.';return true;" OnMouseOut="window.status=''">Encoding problem when using DTS</a> - After running indexer it doesn't add this link: "UltraBoard.pl?Action=ShowPost&Board=mssql&Post=779&Idle=365&Sort=0&Order=Descend&Page=0&Session=" to the database (url table) and so doesn't index it. Can you tell me what can cause the problem? Many thanks, Alexander Reply: http://search.mnogo.ru/board/message.php?id=1991
Re: UdmSearch: Webboard: ERROR: Cannot insert a duplicate key intounique index url_url
I can't believe that. The only thing that I can imagine is that it is PostgreSQL which sometimes is "successful" in inserting duplicate keys. Peter Hanecak wrote: Hello, On Thu, 11 Jan 2001, Alexander Barkov wrote: Author: mocha Email: [EMAIL PROTECTED] Message: i see a lot of these errors in my postgres log: ERROR: Cannot insert a duplicate key into unique index url_url ... ERROR: Cannot insert a duplicate key into unique index url_url is that normal? This is normal. indexer is trying to add documents which are already in the database. It ignores "duplicate key" errors after an attempt to run "INSERT INTO url ...". It looks like indexer is sometimes "successful" in inserting a duplicate URL into the url table, because it has happened quite a few times to me that I dumped the DB for backup, and when trying to restore it I get the error: ERROR: Cannot insert a duplicate key into unique index url_url It happens with older mnogosearch 3.1.x indexers and it also happens with the mnogosearch 3.1.12 indexer and a clean database at start. I'm using PostgreSQL 7.0.3 on Linux 2.4.3, glibc 2.2.1, threads enabled (and used 2 threads when indexing).
Webboard: Core dump under Solaris
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Seeing this report I can imagine that you compiled the threaded version. Is that true? If yes, how did you do it? There is threads support only for FreeBSD and Linux.

gdb "/usr/local/mnogosearch/sbin/indexer -a /usr/local/mnogosearch/etc/indexer_open.conf" core
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"... unknown option `-a'
Core was generated by `/usr/local/mnogosearch/sbin/indexer -a /usr/local/mnogosearch/etc/indexer_open.'.
Program terminated with signal 11, Segmentation Fault.
Reading symbols from /usr/lib/libsocket.so.1...done. Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libxnet.so.1...done. Loaded symbols for /usr/lib/libxnet.so.1
Reading symbols from /usr/local/lib/mysql/libmysqlclient.so.6...done. Loaded symbols for /usr/local/lib/mysql/libmysqlclient.so.6
Reading symbols from /usr/lib/libm.so.1...done. Loaded symbols for /usr/lib/libm.so.1
Reading symbols from /usr/lib/libc.so.1...done. Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libnsl.so.1...done. Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libdl.so.1...done. Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done. Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Ultra-250/lib/libc_psr.so.1...done. Loaded symbols for /usr/platform/SUNW,Ultra-250/lib/libc_psr.so.1
Reading symbols from /usr/lib/nss_files.so.1...done. Loaded symbols for /usr/lib/nss_files.so.1
Reading symbols from /usr/lib/nss_dns.so.1...done. Loaded symbols for /usr/lib/nss_dns.so.1
Reading symbols from /usr/lib/libresolv.so.2...done. Loaded symbols for /usr/lib/libresolv.so.2
#0 0x0 in ?? ()
(gdb) backtrace
#0 0x0 in ?? ()
#1 0xef4d7494 in res_init () from /usr/lib/libresolv.so.2
#2 0xef4daeac in gethostbyname2 () from /usr/lib/libresolv.so.2
#3 0xef5c0c28 in _gethostbyname () from /usr/lib/nss_dns.so.1
#4 0xef5c0ce8 in getbyname () from /usr/lib/nss_dns.so.1
#5 0xef655244 in nss_search () from /usr/lib/libc.so.1
#6 0xef5197ac in _switch_gethostbyname_r () from /usr/lib/libnsl.so.1
#7 0xef52fcb4 in _door_gethostbyname_r () from /usr/lib/libnsl.so.1
#8 0xef5179cc in _get_hostserv_inetnetdir_byname () from /usr/lib/libnsl.so.1
#9 0xef52f6d0 in gethostbyname_r () from /usr/lib/libnsl.so.1
#10 0x2b81c in UdmHostLookup (Conf=0x552b0, connp=0x81048) at host.c:140
#11 0x23528 in open_host (Indexer=0x7d8f0, hostname=0xefffcc4c "www.cimaglobal.com", port=80, timeout=30) at proto.c:259
#12 0x23908 in UdmHTTPGet (Indexer=0x7d8f0, header=0xefffe650 "GET /main/index.htm HTTP/1.0\r\nIf-Modified-Since: Fri, 23 Mar 2001 11:47:42 GMT\r\nUser-Agent: UdmSearch/3.1.12\r\nHost: www.cimaglobal.com\r\n\r\n", host=0xefffcc4c "www.cimaglobal.com", port=80) at proto.c:370
#13 0x1488c in UdmIndexNextURL (Indexer=0x7d8f0, index_flags=0) at indexer.c:712
#14 0x12324 in thread_main (arg=0x0) at main.c:256
#15 0x12d04 in main (argc=1, argv=0xecd3) at main.c:596

Reply: http://search.mnogo.ru/board/message.php?id=1993
Webboard: Need help with regexp in config
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: The first one, with a space between. Is the Disallow line above one command or two? i.e.: Disallow *ubbmisc.cgi<space here>*privatesend* ? or should it be *ubbmisc.cgi*privatesend* ? Reply: http://search.mnogo.ru/board/message.php?id=2008
Webboard: BIG BUGS descriptions fixes
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hello! I posted a message with the topic "Too many open files" a week ago and there are no messages about it. It makes me sad. I like your project, but your support and programming culture is BAD ENOUGH!!! 1) The first problem was in the "too many open files" topic. As I guessed the first time, you forgot to close the TCP socket when the connection to a host fails (times out). Look at line 265 in proto.c (function open_host)... I found this bug in 20 minutes by simply reading and text-searching through all your source. I do not understand why the developers did not react to my message. They get money for installing and supporting their system for clients. It seems that it's senseless to pay them money for support. Hey, guys, do not lose your clients. We are developing such a big project for the first time. So probably you are right, our programming culture may still not be good enough. 3 years ago I personally didn't even know what a socket is. So we are learning how to open and close them while developing the project. Yes, that's my code. And I'm VERY VERY VERY sorry that I forgot to close the socket when the connection fails. Probably this is because I didn't have so many timeouts to test this case. We are currently very busy developing the new 3.2 branch; it will have many new nice features. It can seem that we don't react because we are doing our best to get the first release out as soon as possible. However, all bug reports are collected and we consider how to fix them. All fixes will be incorporated into both the 3.1.13 release and the new 3.2.x releases. 2) Your UdmEscapeURL() function from udmutils.c (line 394) is WRONG. It does not escape russian characters. A more accurate and precise variant of the while statement is the following one:

for ( ; *s; s++, d++){
    if (isalnum(*s)) *d=*s;
    else if (*s==' ') *d='+';
    else { sprintf(d, "%%%02X", (unsigned char)*s); d+=2; }
}

This is a known RFC incompatibility.
Unfortunately, this DOES NOT WORK under Apache with mod_charset, available from apache.lexa.ru. The incorrect behaviour appears when somebody presses the Next page link and the browser and the CGI script work in different character sets. Links become broken: all letters in the range 128-255 are not in the original form posted by the user. And the CGI does not even know the original form; it has already recoded the query string. So we didn't implement this because: 1. This DOES NOT affect non-Russian users. At least we never got such bug reports from non-Russians. All national characters work fine for Germans, Czechs, Hebrews and many many other people. 2. Apache with mod_charset is the MOST POPULAR in the Russian world. I will hardly be much wrong if I say that 95% of Russian web servers work under Apache with mod_charset. 3. This DOES NOT WORK under Apache with mod_charset. 4. Our version DOES WORK under Apache with mod_charset. 5. Our version DOES WORK under almost any HTTP server without built-in charset processing. I agree we can add something like --enable-escaping into configure with conditional compilation for this piece of code. But trust me, we had much other work to do before implementing this. We had only one related bug report; it was from the guys who port the OS/2 version of msearch. So we spent our time on other most-requested things. 3) Your HTML parsing is wrong in some cases. For example, when parsing the META tags Content-Type/charset and Refresh/URL. Can you imagine that "URL" may be in lower or mixed case??? I've repaired a lot of your lines like this one (parsehtml.c, line 190):

if(!strcasecmp(tag.name,"refresh")){
    if((href=strstr(tag.content,"URL="))) href+=4;
}else

The right code is:

if(!strcasecmp(tag.name,"refresh")){
    if((href=strcasestr(tag.content,"URL="))) href+=4;
}else

Don't look for strcasestr in the manuals, it's a handwritten function. I hope you are able to write it in 5 minutes.
This is several lines above:

// Make lower string
for(l=s; *l; *l=tolower(*l), l++);

4) You have some bugs in the spelling module when using two different languages. It does not work properly with some placements of the Affix and Spell lines in the config file. Ivan, please provide more information. Your third bug report was very informative; unfortunately it was fixed before it appeared. Reply: http://search.mnogo.ru/board/message.php?id=2016
Webboard: HOW TO protect indexer from DoS attacks?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Imagine that there is a magic index.php file. It has only a link to index.php?id=1. That one has a link to index.php?id=2, etc.; the N-th one (index.php?id=N) has a link to index.php?id=N+1... So, it's a simple example of a DoS attack on a MnoGoSearch-based search system. Is there a way to protect the system from it? I think a good one is to set a limit of URLs per server. But it should not require a lot of additional resources... Any ideas? Use the MaxHops indexer.conf command. Reply: http://search.mnogo.ru/board/message.php?id=2017
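A sketch of the suggested protection, using only the MaxHops command named in the reply (the site URL is hypothetical). MaxHops limits how many links deep indexer walks from a start page, so the index.php?id=N chain stops after the given depth:

```
# Assumed indexer.conf fragment: stop following links deeper than 8 hops
MaxHops 8
Server http://www.example.com/
```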
Webboard: Remote Quering
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: I was wondering: if I want to query my database from a remote server, how do I get cache data over the query? So for example, ServerA is a webserver with index.html, and ServerB is the search server with all the sql/cache data stored on it. How do I get ServerA to query ServerB for words? There is a solution: 1. install Apache on ServerB 2. configure Apache on ServerA to treat some path, for example http://ServerA/search/, as remote data from ServerB. You have to add mod_proxy; as far as I remember it is not built by default. Then add something like this into httpd.conf on ServerA:

ProxyRequests On
ProxyPass /search/ http://ServerB/search/
ProxyPassReverse /search/ http://ServerB/search/

3. Put search.cgi into the /search/ directory of ServerB. After that, all requests to http://ServerA/search/search.cgi will cause ServerA to make a request to http://ServerB/search/search.cgi and return its result to the client. This is a known technique to distribute a web server between several machines and keep all resources available under the same server name, i.e. http://ServerA/ in your case. Note that all machines except ServerA may be hidden in an internal network which is not directly reachable from the internet or is protected by firewalls. Reply: http://search.mnogo.ru/board/message.php?id=2023
Webboard: 404 URL's
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: My pages are accessed by something like main.php?p=1400&mt=1 where "mt" is the section and "p" is the page name. In the database I have indexed only pages like 1400.php (status 200), but URLs like main.php?p=1400&mt=1 have status 404 - not found! Any suggestions? indexer stores the server response. Check Apache's access_log. What is written there for main.php?p=1400&mt=1 ? Does it have status 200 or 404? Reply: http://search.mnogo.ru/board/message.php?id=2025
Webboard: HOW TO protect indexer from DoS attacks?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: There are many features planned to be implemented in future releases. I think this one is of low priority. Do you really have such a problem with a DoS attack? Hm, nice feature. But it helps only against a linear DoS attack. If it is a tree, for example a decimal one, this feature is useless. Look: index.php has 10 links: index.php?id=0, index.php?id=1, ..., index.php?id=9. ... index.php?id=abc has 10 links too: index.php?id=abc0, index.php?id=abc1, ..., index.php?id=abc9. etc. So, for example, MaxHops=8. It is a small enough value and I do not want to decrease it further. But in this case it is possible to flood 10^8 links before the limitation kicks in. A MaxDocuments variable would protect indexer much better. Reply: http://search.mnogo.ru/board/message.php?id=2027
Webboard: 404 URL's
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Status in Apache access_log is 200... Try to reindex those pages using: indexer -am -s404 Reply: http://search.mnogo.ru/board/message.php?id=2031
Webboard: Parsing URL Values
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hi! Take a look into the 'Using alias in Realm command' section of alias.txt. This is a very powerful thing. I hope it's what you need. hi guys! this has nothing much to do w/ the mnogosearch engine, but i really need help and i know that you people have more experience than me :) what i need to do is something like the google directory structure: let's suppose i have a URL like this: http://www.site.com/section1/lesson1/computers/computer.html The /section1/lesson1/computers/computer.html parts are nothing but variables (it could be like this too: ?sec=1&les=1&topic=computers&file=computer - but this is not readable by the common user). These variables (section1, lesson1, computers, computer) would then be related to IDs in a MySQL database. What i need is a way (using PHP and APACHE) to parse those variables without getting ERROR 404! how can i do it? is it possible w/ PHP4? do i need to make a DHandler (default handler) in APACHE, and how? where i work, we use Mason and PERL but we are switching to PHP. If you go to the Google Directory you will understand what I need. Hope for an answer... Thanks guys! :) Sergio Reply: http://search.mnogo.ru/board/message.php?id=2046
Re: Index 2 or more sites problem! - please help
Please try indexer -amv6 Tek Guy wrote: Hello, I still have problems with indexing. I would like to be able to index 3 of our websites but it seems like it only accepts 1 and only indexes 1 page. I'm using version 3.1.12 with MySQL support - Configuration file i specified: Robots no Server site http://www.domainX.com --Log of output-- Indexer[16469]: indexer from mnogosearch-3.1.12/MySQL started with '/usr/local/mnogosearch/etc/indexer.conf' Indexer[16469]: [1] http://www.domainX.com/ Indexer[16469]: [1] Server 'http://www.domainX.com/' Indexer[16469]: [1] Allow by default Indexer[16469]: [1] HTTP/1.1 200 OK Indexer[16469]: [1] Date: Thu, 19 Apr 2001 17:08:41 GMT . Indexer[16485]: [1] "/opalbum/": Allow by default Indexer[16485]: [1] "/chatting/": Allow by default Indexer[16485]: [1] Done (0 seconds) --log ends-- Hi! Check your robots.txt as well as the main page for links. Note that indexer does not follow links within JavaScript. You can also run indexer with these options: indexer -amv6 http://www.domainX.com/ and check its output. indexer will display a lot of debug information including all found links and the reasons why indexer accepts those links or does not accept them. Tek Guy wrote: Hello, I have 2 "Server site http://www.domainX.com/" lines with different values of X in the configuration file "indexer.conf" but for some reason it only indexes the first site and ignores the second one. I'm using mnogosearch-3.1.12 and the original dist conf indexer.conf-dist after renaming it to indexer.conf. 1. My problem: it only indexes 1 site instead of all the sites specified with the "Server" command. 2. Indexing only works for the first page, i.e. "index.html", instead of all the pages under the http://www.domainX.com/ setting. What I mean is it doesn't follow the links on the main page and index those pages. In the conf file, i have "Follow site". If anyone has a configuration that can index multiple pages within the same domains by following the links, and also multiple domains, could I have a copy? Powered by http://www.vietmedia.com - Free E-mail, Instant Messaging, and more!
Webboard: robots.txt problem
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: It seems that robots.txt still hasn't been indexed. Run indexer -amu http://servername/robots.txt then run indexer in the usual manner. I created a robots.txt file and placed it in the root of my web site. The contents of the robots.txt are as follows: User-agent: * Disallow: / This should keep all search engines from indexing my site. When I run mnoGoSearch, it still indexes all the pages. I have the robots.txt option checked in the servers tab. Any ideas? Thanks. Reply: http://search.mnogo.ru/board/message.php?id=2057
Webboard: $DD cut off: solution?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: I found in udm_indexer.h a line: #define UDM_MAXDESCSIZE 100 I changed it to #define UDM_MAXDESCSIZE 254 Heiko Yes, that's the right place to change the description size. Reply: http://search.mnogo.ru/board/message.php?id=2068
Re: Webboard: Indexing only .iso files
Try this combination instead of yours: Allow *.iso HrefOnly * duncan wrote: Here is the output: Indexer[31136]: indexer from mnogosearch-3.1.12/UdmDB started with '/home/mnogo/etc/indexer.conf' Indexer[31136]: [1] http://www.redhat.com/robots.txt Indexer[31136]: [1] Server 'http://www.redhat.com/download/' Indexer[31136]: [1] Disallow NoCase * Indexer[31136]: [1] http://www.redhat.com/download/mirror.html Indexer[31136]: [1] Server 'http://www.redhat.com/download/' Indexer[31136]: [1] Disallow NoCase * Indexer[31136]: [1] Done (0 seconds) thank you! On Tue, 24 Apr 2001, Alexander Barkov wrote: Please run indexer -amv6 and check its output. It will print information about all found links. duncan wrote: Hello, and thanks matthew- I tried what you suggested, and in fact, here is the whole conf file: Allow */ CheckOnly *.iso Disallow * Server http://www.redhat.com/download/mirror.html and it only returns this: Indexer[30131]: indexer from mnogosearch-3.1.12/UdmDB started with '/home/mnogo/etc/indexer.conf' Indexer[30131]: [1] http://www.redhat.com/robots.txt Indexer[30131]: [1] http://www.redhat.com/download/mirror.html Indexer[30131]: [1] Done (0 seconds) Something isn't right there, and I don't know what to do with it. I feel like i try so many things... is there more documentation out there, or more examples? thanks, i appreciate your response -- duncan shannon [EMAIL PROTECTED]
Webboard: Ignoring navigation text
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: If it is your site, you can use <!--UdmComment--> or NOINDEX tags. Check the documentation. Hi all, we have set up our pages using HTML for the navigation - this means that when the Indexer runs, it indexes the navigation words, so our display shows things like "Local Info Americas Africa/Middle East Asia Australasia Europe Corporate profile Financial management Any questions Research & sponsorship Malta CIMA university award This year's ceremony took place at the University of Malta on 23 Novemb" where Local Info Americas Africa/Middle East Asia Australasia Europe Corporate profile Financial management Any questions Research & sponsorship are all part of our navigation (this looks pretty dumb). Is there a way to get the indexer to ignore these? thanks Michael Reply: http://search.mnogo.ru/board/message.php?id=2094
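A sketch of the markup Alexander suggests, assuming the paired <!--UdmComment--> ... <!--/UdmComment--> markers as described in the mnoGoSearch documentation (the navigation links are hypothetical); text between the markers is excluded from the word index:

```html
<!--UdmComment-->
<a href="/americas/">Americas</a>
<a href="/europe/">Europe</a>
<!--/UdmComment-->
<p>Page body text here is indexed as usual.</p>
```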
Re: next_index_time query
Hello! Yes, you are right. You may also use a big Period command for those pages. - Original Message - From: Anand Raman [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, April 25, 2001 10:50 AM Subject: next_index_time query HI guys I am operating mnogosearch in db mode with postgresql. I want to stop some urls from reindexing.. Can i just change the next_index_time column in the url table to some value and prevent this from happening.. Any comments Thanks Anand
Re: Webboard: Ignoring navigation text
Gavin Love wrote: If you use <!--UdmComment--> or NOINDEX, does the indexer still follow the links contained within the area enclosed by the tags? Or will it simply not store the text in the txt field in the url table? It will follow links, but will not add those words into the word index, and will not add them into the TXT field.
Webboard: charset
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Check the HTTP headers which are sent by your web server. Try this: wget -s http://localhost/ What can you see in the Content-Type header? Hello All, I tried to index a web site with the Cyrillic koi8-r charset, but indexer didn't store any russian words in the dict table, only latin. As a result, I can search latin words, but not russian. indexer.conf:

# This is a minimal sample indexer config file
DBAddr mysql://user@pass:localhost/mnogosearch/
#DBMode crc
LocalCharset koi8-r
CharSet koi8-r
#IspellMode text
#Affix ru /usr/local/share/ispell/russian.aff
#Spell ru /usr/local/share/ispell/russian.dict
ServerTable server
Server http://localhost/
# Allow some known extensions and directory index
Allow *.html *.htm *.shtml *.txt */
# Disallow everything else
Disallow *

Reply: http://search.mnogo.ru/board/message.php?id=2104
Webboard: charset
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Please contact me by email % wget -s http://localhost/ --22:45:45-- http://localhost/ => `index.shtml' Connecting to localhost:80... connected! HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] index.shtml has the following header <html> <head> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=koi8-r"> Reply: http://search.mnogo.ru/board/message.php?id=2110 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
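The wget output above shows no charset in the HTTP Content-Type header; only the META tag carries it. If the indexer relies on the HTTP header, one common remedy (an assumption about this setup, not advice given in the thread; requires Apache 1.3.12 or later) is to make the web server announce the charset itself:

```
# Hypothetical httpd.conf / .htaccess fragment: makes Apache send
# "Content-Type: text/html; charset=koi8-r" for served documents.
AddDefaultCharset koi8-r
```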
Webboard: 3.1.12 and MacOSX
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Are there any debugging tools like gdb under MacOSX? Hi all, We report, for anyone interested, that mnogo installs correctly on the new MacOSX platform. It compiles with only one warning. (OSX 10.0.0 - Tenon iTools 6.01 (Apache) - MySql 3.23.27) The only problem is that it produces a "segmentation fault" at the end of indexing. Nevertheless the database seems to be correct, and searching runs fine with search.cgi. You can see it (in French) at http://mno.imotep.com/cgi-bin/search.cgi (this is a special Porsche crawl ;-) Reply: http://search.mnogo.ru/board/message.php?id=2137 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: 3.1.12 search.cgi remote gaining shell access exploit fix
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Thanks. This is fixed in the 3.1.13 sources. Bad news. I just checked your very recent search.c v1.23 via WWW cvs and saw that you added tmplt= variable parsing there. The previous buffer overflow (which I posted the patch for) overflows the data segment and stack by some indirect tricks, but the new tmplt= parsing allows direct writing to the stack, because template[] is on the stack of main(). The dangerous code is: sprintf(template,"%s%s%s",UDM_CONF_DIR,UDMSLASHSTR,token+6); It overflows even with my posted fix, because UDMSTRSIZ for token is increased by UDM_CONF_DIR+UDMSLASHSTR characters. If someone has a UDM_CONF_DIR long enough for shell code, he'll get it. Reply: http://search.mnogo.ru/board/message.php?id=2138 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Linux binary available?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Is there a Linux binary install of mnogosearch available that can be installed without "root" privilege and works with MySQL? Thanks. There are no binaries. You may install from sources without having root access. Reply: http://search.mnogo.ru/board/message.php?id=2142 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: question
No. La Rocca Network wrote: Hi! Is there any way to include a site in more than one category? Regards, Nelson ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: Few random things
Briggs, Gary wrote: Has anyone here got a way of indexing PowerPoint or Visio documents? Changing the document is not viable; I need a way to get the strings out of it. strings is not too bad on PowerPoint, but for Visio it's not worth the effort. You may use a so-called external parser - any program which can convert Visio documents into text or HTML. Check doc/parsers.txt Also, is there any way to convert documents with this in them: META HTTP-EQUIV=Content-Type CONTENT=text/html; charset=windows-1252 ? I'd ideally like to convert them to something more standard... Can I do this? What format do you want to get after conversion? As in, I can't change anything. At all. I need a way to do all these things in the search engine. ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: indexing multiple sites
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: You may use URL limits. Take a look into the default search.htm. SELECT NAME=ul is responsible for them. I need to know how I can index several sites, so that the search allows me to look in: - All the sites - Each site individually - Some section of some site. Example: I have the following URLs to index: - http://www.tercera.cl/ - http://www.tercera.cl/sitios/ - http://www.tercera.cl/casos/ - http://www.lacuarta.cl/ - http://www.lacuarta.cl/temas/ - http://www.lacuarta.cl/sitios/ - http://www.deportivo.cl/ - http://www.mouse.cl/ and some others... Can someone send me the indexer.conf and some search.htm? Thanks a lot. Reply: http://search.mnogo.ru/board/message.php?id=2172 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: htdb and his first entry
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Try this indexer.conf command: URLWeight 0 hello, my question: how do I get rid of the first entry in the url table, the one with all the other URLs produced by htdblist inside? why? because this entry comes out first when somebody searches for a word included in the url! thanks, manu :-) Reply: http://search.mnogo.ru/board/message.php?id=2173 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Errors During Make
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: RH 5.2 i686 linux. Getting these errors during make: AM_PROG_LIBTOOL not found in lib AM_DISABLE_SHARED not found in lib What do you think? Which version of msearch are you using? Is it taken from CVS? If so, you probably have to upgrade automake and autoconf. Reply: http://search.mnogo.ru/board/message.php?id=2174 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Long URLs and 3.2 branch
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hello! To fix this in 3.1.x: 1. Change the SQL url table structure, making the url field longer. 2. Change the UDM_URLSIZE definition in udm_common.h 3. Recompile Hello All! I just stumbled across a problem which is hopefully going to be solved. In the mnogo 3.1 branch, URLs which are longer than 128 bytes are obviously not supported. I found a mail that says this will be tackled in the 3.2 branch. Question: Is there a timeline for the 3.2 branch? When will it be released? Or does anyone have a patch which solves the problem for mnogosearch with MySQL? Thanks for your answers, Markus Reply: http://search.mnogo.ru/board/message.php?id=2175 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
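Step 1 above can be sketched as a single statement; the 255 length is an illustrative choice, and the table/column names assume the stock MySQL schema shipped with the package:

```sql
-- Hypothetical MySQL statement for step 1: widen the url column.
-- Remember to also raise UDM_URLSIZE in udm_common.h (step 2) to match,
-- then recompile (step 3).
ALTER TABLE url MODIFY url VARCHAR(255) NOT NULL DEFAULT '';
```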
Webboard: resulting url when using frameset
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Is there a simple way to give as the answer to a search the frameset file, rather than the file appearing in the <Frame src=...> tags? Unfortunately it's not implemented. Reply: http://search.mnogo.ru/board/message.php?id=2176 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: need to decode Intag field
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: When Phrase is yes: it is combined using the word position and its weight: pos*0x1+weight When Phrase is no: the word appearance count is used instead of its position: count*0x1+weight Reply: http://search.mnogo.ru/board/message.php?id=2177 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: multilanguage text
The 3.2.x branch will have a language guesser. It's already implemented and works very well for single-language pages, or even mostly single-language pages. I hope the first release of 3.2.x will be available in May. Danil Lavrentyuk wrote: [ On Wed, 9 May 2001, Maxime Zakharov wrote: ] MZ And what if a site has many texts uploaded by users? MZ Do I have to manually edit them all, setting lang attributes? :) MZ Do I have to demand it from the uploader? They will not. MZ MZ Users may upload big mega gifs as .html files :) That would be an obvious fraud... MZ Let's talk about W3C recommendations. ... but ignoring a far-away committee's recommendations could be simply laziness. Not all software follows all of the recommendations. Not all users know all of the recommendations. Not all users even think of using such recommendations. Text could be converted to HTML from some other text format. Who, for example, will check for foreign phrases in texts like big books which consist of many volumes (like Amber by Zelazny or The Wheel Of Time by Jordan, or even bigger)? :) Let's talk about the real world, where we have to index multilanguage texts without lang attributes. MZ What if I have to index texts placed somewhere on the internet, not locally? MZ What if a site contains the texts of many books (something like www.lib.ry, for MZ example)? MZ MZ Sometimes, without an explicit language definition, it's impossible to uniquely MZ select the language for a word. MZ For example, the word 'test' may be English or German. I know. I think it is realistic (though hard, I see) to make a system which could guess what the text's language is. It could use these steps: 1) Create a list of encodings this text could be written in (simply by testing whether all of a word's characters are alphas in this encoding). Here we could assume that two or more successive foreign words are from the same language.
2) Check (using ispell tables) all the languages which use encodings from the list created above, looking for one where these words are correct. 3) (optional) If more than one language is suitable, select the one that was selected for the previous phrase. OK, this method does not guarantee that the selection will always be correct. But in most cases it will be. Yes, I know, this method is not too quick... But it is better than no method at all. In any case it would be good to be able to turn it off in the indexer.conf file or by a command line option. Danil Lavrentyuk Communiware.net Programmer ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
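As a rough illustration of step 2 above (dictionary lookups deciding between candidate languages), here is a toy sketch. The tiny word lists and the helper names are mine, not mnogosearch code; a real guesser would use full ispell tables and the encoding test of step 1:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy per-language dictionaries: a handful of common function words.
   A real implementation would load full ispell word lists. */
static const char *en_words[] = {"the", "and", "is", "of", "to", 0};
static const char *de_words[] = {"der", "die", "das", "und", "ist", 0};

static int in_list(const char *w, const char **list) {
    for (; *list; list++)
        if (strcmp(w, *list) == 0) return 1;
    return 0;
}

/* Tokenize the text, count dictionary hits per language, and return
   the language with the most matches ("unknown" on a tie). */
const char *guess_lang(const char *text) {
    int hits_en = 0, hits_de = 0;
    char word[64];
    size_t n = 0;
    for (;; text++) {
        if (*text && isalpha((unsigned char)*text)) {
            if (n + 1 < sizeof(word))
                word[n++] = (char)tolower((unsigned char)*text);
        } else {
            if (n) {
                word[n] = 0;
                hits_en += in_list(word, en_words);
                hits_de += in_list(word, de_words);
                n = 0;
            }
            if (!*text) break;
        }
    }
    if (hits_en == hits_de) return "unknown";
    return hits_en > hits_de ? "en" : "de";
}
```

The design mirrors the proposal: hedging on ties (step 3 would instead fall back to the previous phrase's language).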
Webboard: indexing multiple sites
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: indexer.conf is OK. Don't forget to configure the SELECT with name ul in your search.htm I need to know how I can index several sites, so that the search allows me to look in: - All the sites - Each site individually - Some section of some site. Is this OK in indexer.conf? # Period 1d Server http://www.quepasa.cl/ Server path http://www.quepasa.cl/sitios/ Server path http://www.quepasa.cl/sitios/especiales/ Server path http://www.quepasa.cl/sitios/enfoco/ Server http://deportivo.tercera.cl/ Server http://dirigible.tercera.cl/ Server http://mouse.tercera.cl/ Server http://www.lacuarta.cl/ Server path http://www.lacuarta.cl/sitios/ Server path http://www.lacuarta.cl/temas/ Server http://www.lahora.cl/ Server http://mujer.tercera.cl/ Server http://siglo20.tercera.cl/ Server http://www.radiozero.cl/ Server http://papasfritas.tercera.cl/ Server http://www.tercera.cl/ Server path http://www.tercera.cl/sitios/ Server path http://www.tercera.cl/casos/ Reply: http://search.mnogo.ru/board/message.php?id=2189 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
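The ul SELECT mentioned in the reply is a form control in search.htm; a minimal sketch, where the option values are illustrative and simply reuse some of the sites from the question (an empty value means no URL limit):

```
<!-- Hypothetical search.htm fragment: URL-limit drop-down. -->
<SELECT NAME="ul">
<OPTION VALUE="" SELECTED>All sites</OPTION>
<OPTION VALUE="http://www.tercera.cl/">www.tercera.cl</OPTION>
<OPTION VALUE="http://www.tercera.cl/sitios/">tercera.cl: sitios</OPTION>
<OPTION VALUE="http://www.lacuarta.cl/">www.lacuarta.cl</OPTION>
</SELECT>
```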
Webboard: Mp3 Search
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hello. I am creating an mp3 search engine and I have a problem. When I am indexing some mp3s, some of them get status 206 (partially OK) It's OK. indexer does not download the whole file. It checks only those parts of the document where MP3 headers are expected to be found. and I can't search for those mp3s, but as I see in the url table, the descriptions for the mp3s have been written Probably those files have empty MP3 tags. Reply: http://search.mnogo.ru/board/message.php?id=2078 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: Webboard: Titles incorrect for pdf files
Richard Wall wrote: - Original Message - From: Richard Wall [EMAIL PROTECTED] Yeah, it works really well. In fact it accepts a third argument, the URL of the page, so I've modified your shell script as follows, using the $UDM_URL environment variable set by mnogosearch... Actually, I've discovered a problem. When indexing certain PDF documents, the doc2html perl script hangs and uses 100% processor resources. It always gets stuck at the same place... confident that the automotive sector can But I can't understand why. Alexander, could you try indexing this document with doc2html.pl... http://elkie.coventry-id.co.uk/~richard/wb58.pdf to see if you get the same problem. pdfinfo called from doc2html does not return anything to stdout. It warns about a bad format on stderr: /usr/home/bar pdfinfo wb58.txt Error: May not be a PDF file (continuing anyway) Error (0): PDF file is damaged - attempting to reconstruct xref table... Error: Couldn't find trailer dictionary Error: Couldn't read xref table So, doc2html seems to wait for pdfinfo output forever. ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: mnogosearch on intranet
Можете ли Вы читать по-русски? Can you read Russian? Florin Andrei wrote: On 12 May 2001 15:35:36 +0500, Alexander Barkov wrote: We tested up to 5 million documents in so-called cache mode storage. But this mode is still in beta, and people report that it does not work properly in some cases. What is the typical error for cache mode? If it's not something important or obvious, then I hope I can use it. -- Florin Andrei Remember, son: if you never try, you never fail - Homer Simpson ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Doc Relevance ($DR)
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: All of my returned results show the same relevance/rating [1] when using $DR... Perhaps I don't understand the meaning of this. I expected results at the top to have a higher rating. I have a rough idea of how this is being calculated (how many times the word appears in dict.word)... Suggestions? Comments? Thanks It's covered in doc/relevancy.txt Reply: http://search.mnogo.ru/board/message.php?id=2208 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: htdb and his first entry
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: it doesn't work! the entry is still there!!! (of course I deleted the db first: indexer -C) ??? manu :-) Try this patch
--- sql.c.orig Thu Mar 29 19:04:08 2001
+++ sql.c Tue May 15 13:22:24 2001
@@ -4405,7 +4405,7 @@
 #ifdef HAVE_MYSQL
 MYSQL_ROW row;
 row=mysql_fetch_row(db->res);
- sprintf(s,"<a href=\"%s\">%s</a><br>\n",*row,*row);
+ sprintf(s,"<a href=\"%s\"></a><br>\n",*row);
 s=UDM_STREND(s);
 #else
 sprintf(UDM_STREND(s),"<a href=\"%s\">%s</a><br>\n",
Reply: http://search.mnogo.ru/board/message.php?id=2209 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: different indexing/splitters
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: What would happen if I indexed some URLs with mnogo+sql+cachemode, but then split the cache logs using a different version that was mnogo+sql+phrase+cachemode+fasttag+fastcat? That seems to be the reason for the empty results: pages with 0 file size, no title and so on. Reply: http://search.mnogo.ru/board/message.php?id=2210 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: 3.1.12 fix
Hello, Thomas! Sorry for the late reply, we are working on the 3.2.x branch now, so I just had no time to check your function. It works fine and I have replaced the old one with your version. It will appear in the 3.1.13 release, which will be available this week. At least we hope so. Thank you very much for the contribution! Thomas Olsson wrote: Hi, Thanks for a very nice program :-) I've got a small contribution, since I found a bug in mnogosearch 3.1.12 and fixed it. It is the hilightcpy() function in search.c. This function chops off the last char if the input string ends in a single word character. E.g. a title string called RFC 7 will be output as RFC . I must admit I couldn't be bothered to work out exactly why it didn't work, so I just wrote a replacement. You can decide yourself if you want to fix the original bug, or use this replacement. I've been using it for some time now, and it does seem to work. I've tried to keep the style from the code, though it is pretty far from my usual formatting :-)
static char *hilightcpy(int LCharset, char *dst, char *src, char *w_list, char *start, char *stop)
{
    char *t = dst, *s = src, *word = src;
    char real_word[64];
    if (*s) {
        do {
            if (!UdmWordChar(*s, LCharset)) {
                if (word < s) {
                    char save = *s;
                    *s = 0;
                    sprintf(real_word, " %.61s ", word);
                    UdmTolower(real_word, LCharset);
                    if (strstr(w_list, real_word)) {
                        sprintf(t, "%s%s%s", start, word, stop);
                    }else{
                        strcpy(t, word);
                    }
                    t += strlen(t);
                    *t++ = *s++ = save;
                    word = s;
                }else{
                    *t++ = *s++;
                    word++;
                }
            }else{
                s++;
            }
        } while (s[-1]);
    }
    *t = 0;
    return dst;
}
Regards, Thomas -- Thomas Olsson http://www.armware.dk/ ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: htdb and his first entry
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: You are right! Thanks! hi, thank you very much, the patch does the work! but for the distributed version it should look as follows (note the line number):
--- mnogosearch-3.1.12/src/sql.c.SQL Tue May 15 11:45:46 2001
+++ mnogosearch-3.1.12/src/sql.c Tue May 15 11:47:45 2001
@@ -4406,7 +4406,7 @@
 #ifdef HAVE_MYSQL
 MYSQL_ROW row;
 row=mysql_fetch_row(db->res);
- sprintf(s,"<a href=\"%s\">%s</a><br>\n",*row,*row);
+ sprintf(s,"<a href=\"%s\"></a><br>\n",*row);
 s=UDM_STREND(s);
 #else
 sprintf(UDM_STREND(s),"<a href=\"%s\">%s</a><br>\n",
spasiba! manu :-) Reply: http://search.mnogo.ru/board/message.php?id=2213 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: htdb and reindex
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: To add new entries you have to reindex the page which is generated by the HTDBList query, using the -am arguments. Then run indexer in the usual manner, without arguments. Also take a look into the MySQL query log; it may help to check what happens. hi, I am indexing with mnogosearch ver 3.1.12 and the htdb method (mysql). every clean index (after an indexer -C) works really fine, but if I want to reindex (indexer -a) or add the new entries to the db (indexer without args), the indexer reads the urls but nothing is indexed, so I have to delete and index the whole db again, and this takes a long time. any solution? thanks, manu :-) Reply: http://search.mnogo.ru/board/message.php?id=2220 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: Windows character sets
Not all charsets can be converted to ascii or latin1. windows-1251 can't be converted to latin1/ascii, at least not its Cyrillic part. Don't worry about windows-1252, it's letter-compatible with latin1. windows-1250 can be converted to latin1/ascii without losing major information, but the 3.1.x branch does not have such powerful charset conversion code. The answer is NO, you can't do the translation in indexer. You may try to do it in PHP. Briggs, Gary wrote: I'm outputting XML from my search engine for use in other people's websites, and I'm having a small problem. Some of the sites I'm indexing are made in Word [I've no control over this], and output as HTML. And they're in strange character sets like windows-125{0,1,2}. When I output the XML, it contains things like 92s, which are the Word equivalent of a normal '. Is there any way I can do translations on this, either in the indexer or in the PHP? [I'm using the PHP front end and the crc-multi DB schema]. Basically, I'd like to see nothing more than US-ASCII or friends; much easier to use, and it won't break perl scripts on unix boxes. Anybody? Ta, Gary (-; PS I never got any response to my RFC on my code for putting stuff INTO the database from XML. Does anyone have anything to add to it? ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
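As a sketch of the post-processing translation suggested above (in PHP or anywhere else in the pipeline), here is a trivial byte map for the windows-1252 punctuation range, written in C for illustration. The function name and the ASCII stand-ins chosen (quote, dash, dot) are my assumptions, not something from the thread:

```c
#include <assert.h>

/* Map windows-1252 "smart punctuation" bytes (the 0x80-0x9F range,
   e.g. the 0x92 curly apostrophe mentioned above) to plain US-ASCII
   stand-ins; all other bytes pass through unchanged. */
int w1252_to_ascii(unsigned char c) {
    switch (c) {
    case 0x91: case 0x92: return '\'';  /* curly single quotes */
    case 0x93: case 0x94: return '"';   /* curly double quotes */
    case 0x96: case 0x97: return '-';   /* en and em dashes */
    case 0x85:            return '.';   /* ellipsis (lossy) */
    default:              return c;
    }
}
```

The same table is easy to port to a PHP strtr() call; note the mapping is intentionally lossy, which is acceptable when the goal is just safe US-ASCII output.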
Webboard: index by time
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: I want to index a block of urls within mnogo.url, so I set last_index_time=unix_timestamp()-3600, last_mod_time=unix_timestamp()-1800, next_index_time=unix_timestamp() and all status=209, but when I run indexer (usu. -m -s 209), it doesn't seem to care about what the dates are; does anyone have the same situation where indexer seems to ignore the time values? What do you mean, it does not care about dates? If you want to exclude some URLs from indexing this way, you have to set their next_index_time to something in the future, for example a month: next_index_time=unix_timestamp()+30*24*60*60 Reply: http://search.mnogo.ru/board/message.php?id=2227 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
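The reply above can be sketched as a full statement; the one-month offset comes from the reply itself, while the WHERE pattern and table name are illustrative assumptions based on the stock MySQL schema:

```sql
-- Push selected URLs a month into the future so indexer skips them.
UPDATE url
   SET next_index_time = unix_timestamp() + 30*24*60*60
 WHERE url LIKE 'http://www.example.com/%';
```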
Re: why indexer indexes url with ?
You have Allow * before those Disallow commands. Note that Allow/Disallow commands are checked in the order of their appearance in indexer.conf, so indexer finds Allow * before the others. You have to move this command after all Disallow commands. FL wrote: Here it is, I have cut the URLs (about 1000). Thanks. François At 11:10 15/05/01 +0500, Alexander Barkov wrote: Please send your indexer.conf FL wrote: Hi! I don't want to index urls with '?'. This is my indexer file (no modifications from the default): # Exclude cgi-bin and non-parsed-headers using string match: Disallow */cgi-bin/* *.cgi */nph-* # Exclude anything with a '?' sign in the URL. Note that the '?' sign has a # special meaning in string match, so we have to use regex match here: Disallow Regex \? But I can see URLs indexed like: http://www.premier-ministre.gouv.fr/spihtm/sig_ie4/theme/r_t.cfm?t1=Culture; t2=Histoire What's wrong? François ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED] Name: sample.zip sample.zip Type: Zip Compressed Data (application/x-zip-compressed) Encoding: base64 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
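The ordering rule explained above can be sketched like this; the patterns are taken from the question, and the final catch-all Allow is an illustrative placement, not the user's exact file:

```
# Order matters: Disallow patterns must come before the catch-all Allow,
# because indexer applies the first matching Allow/Disallow command.
Disallow */cgi-bin/* *.cgi */nph-*
Disallow Regex \?
# Only now allow everything that survived the checks above.
Allow *
```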
Webboard: AliasProg
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Thanks for reporting! It's fixed in the 3.1.13 sources; it will be available today. The very nature of any URL that is passed in AliasProg's $1 is wrong and messed up. That's why it's being Aliased. My issue is with URLs that contain a single quote (') or a double quote ("). Whenever I try to pass one to a script to be processed (via $1), the shell interprets the single (or double) quote and waits for another ending quote. The problem would also exist with asterisks (*), because essentially the raw URL is being processed by the shell. Any occurrence of this type of URL will crash "indexer". Either the URL should be passed on stdin to a shell script (specified in AliasProg), or the URL needs to be escaped. Otherwise this problem will persist. It's an easy fix too. Escaping the URL would probably make the most sense, and wouldn't change the syntax of AliasProg. Maybe adding an "AliasProgStdin" directive would do it too. -justin Reply: http://search.mnogo.ru/board/message.php?id=2230 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: Webboard: Indexing product categories which are dynamically updated from base.
Hi! It seems there is a way to work around this. I have not yet figured out exactly how, but it looks possible. And what exactly is stored on the file system, and in what form? Regarding the SELECT: you do not have to generate the whole search.htm, only a part of it, and include that part via $if(/file/to/select.htm) You can also include it as an external URL: $iurl(http://www.blalba.ru/select.sgi) Author: Alexander Email: [EMAIL PROTECTED] Message: I need to attach some search engine to an online shop (product search). I settled on mnoGoSearch. OS: Linux, DB: Oracle. I have installed everything and it all seems to work, but there are some problems. 1. Products have to be indexed from the file system, not over www. 2. I need search by sections (with a corresponding select), where the sections are _dynamically_ updated from the database (search section == product catalog section). I used the category field for this. Combining 1 and 2, I wrote a script which, for each category in turn: - generates a file tree (files of the form ../perl-cgi/product_card?prod_id=nnn), fetching from the database the information about the products to be indexed (description, price, etc.); each product_card?prod_id=nnn file contains an HTML document with a title, keywords, product description and so on; the product cards live on www under the corresponding paths (http://www.blablabla.ru/perl-cgi/product_card?prod_id=nnn); - generates indexer.conf, inserting the current category into the Category field; - runs the indexer. All of this is repeated for every product category. Finally, the script generates search.htm, inserting a select tag with all the categories as options. This is the first thing that came to my mind. But there is a problem with deleting products from the database: the indexer either keeps all products, or deletes the products that were indexed for the previous categories. And in general it is a bit clumsy: generating indexer.conf every time, running the indexer many times, and so on... :( What is the best way to do this? Thanks in advance.
___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Re: Problems deleting urls with 3.1.10
[EMAIL PROTECTED] wrote: I saw this question posted back in February, but I didn't see that an answer had been given. I'm using 3.1.10 with cachemode, with about 6.4 million urls indexed. I'm trying to delete a set of urls that match a certain pattern, but when I attempt to do so I get the following: indexer -C -u http://some.url.com%;; You are going to delete database 'search' content Are you sure?(YES/no)YES Indexer[3617]: Error: 'Can't write to logd: Socket operation on non-socket' Deleting...Done Cachelogd is indeed running and indexing works just fine... Any suggestions? Is indexing running simultaneously? What happens after restarting cachelogd? ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: index just first page
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Try to clear the database using ./indexer -Cw then start it with a high verbose level: ./indexer -v6 It will display information about every found link, among other useful information. I execute ./indexer -a ./indexer.conf and I get this result: indexer from mnogosearch-3.1.12/MySQL started with './indexer.conf' [1] Done (0 seconds) I execute ./indexer -S ./indexer.conf and I get an empty table with Total 0 My indexer.conf is: DBaddr mysql://... Robots no Follow yes Allow * Server http://apache.cadrus.fr/najean/ (dams.com is a local domain name on the Intranet.) I don't understand why I index just the first page (index.htm) and not the other pages. Sorry for my bad English, I'm French. Thanks for answering me. Reply: http://search.mnogo.ru/board/message.php?id=2233 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Sort by date?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Unfortunately, there is no sorting by date and no possibility to display a percentage. Hi, We are using mnoGoSearch to search a list of news items. Is it possible to sort the results by the date a news item was posted, instead of sorting by relevance? And is it possible to show the relevance behind the page title as a percentage instead of a small number? Most people don't understand why there is a (1) or (3) behind a result. Thank you for your help. Ilan Shemes Reply: http://search.mnogo.ru/board/message.php?id=2234 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Announce 3.1.13
Hello! mnogosearch-3.1.13 is available from our site http://www.mnogosearch.org Regards! P.S. ChangeLog: * Added installation script install.pl to simplify the installation process * HTTPS support has been added. Thanks Dubun Guillaume [EMAIL PROTECTED] * search.cgi now accepts a tmplt parameter. It can be used to specify an alternative search template to be opened. * Content-Language: HTTP header support for detecting document language. * Using the language of normalized words for document language detection. * Now all programs can accept an alternative /var working directory. This allows putting built-in and cache-mode databases in non-default directories without having to recompile the package. indexer, spelld and search.cgi take the path from the VarDir command in indexer.conf, spelld.conf and search.htm respectively. splitter and cachelogd take the working directory value from the -w command line argument. * A problem with quotes in AliasProg has been fixed. Thanks Justin [EMAIL PROTECTED] for reporting. * Fixed that sgml entities (like amp; quot; auml;) were not unescaped in META KEYWORDS and DESCRIPTION. Thanks Danil Lavrentyuk [EMAIL PROTECTED] for reporting. * A bug that basic authorization did not work when ServerTable is used has been fixed. * A bug that META NAME=Refresh Content=... was not processed properly in some cases has been fixed. Thanks Ivan Mikhnevich [EMAIL PROTECTED] * A bug that text highlighting did not work properly in some cases has been fixed. Thanks Thomas Olsson [EMAIL PROTECTED]. * A bug in spelld hanging has been fixed. * Some bugs and possible exploits in search.cgi have been fixed. * Fixed a bug that the socket was not closed when connect() failed. Thanks Ivan Mikhnevich [EMAIL PROTECTED]. * A trap while fetching too big newsgroup lists has been fixed. * Fixed a bug that the Host: HTTP header was composed incorrectly when the port is not 80. * A minor bug in the built-in database has been fixed.
* A bug that a line in indexer.conf which contains only spaces caused an Error in config file has been fixed. * A bug that indexer crashed when a URL command argument has no corresponding Server/Realm command has been fixed. * An ISO 10646 character entity reference skipping bug has been fixed. ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: index just first page / Allow NoCase
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Please send me your indexer.conf, I'll try it on my box. Thanks for answering me. I see the link, but I don't understand what "Allow NoCase" is. Maybe that is the answer to my problem? Could anyone help me please? [1] http://apache.cadrus.fr/najean/ [1] Server 'http://apache.cadrus.fr/najean/' [1] Allow NoCase * [1] HTTP/1.1 200 OK [1] Date: Fri, 18 May 2001 08:30:24 GMT [1] Server: Apache/1.3.17 (Unix) PHP/4.0.4pl1 [1] Last-Modified: Thu, 17 May 2001 09:37:41 GMT [1] ETag: "3084-434-3b039be5" [1] Accept-Ranges: bytes [1] Content-Length: 1076 [1] Connection: close [1] text/html [1] HTTP/1.1 200 OK text/html 1076 [1] "http://apache.cadrus.fr/najean/docpgsql/admin": Allow NoCase * [1] "http://apache.cadrus.fr/najean/docpgsql/programmer": Allow NoCase * [1] "docpgsql/postgres": Allow NoCase * [1] "docpgsql/tutorial": Allow NoCase * [1] "docpgsql/user": Allow NoCase * [1] "mail/mail.htm": Allow NoCase * [1] Done (0 seconds) Reply: http://search.mnogo.ru/board/message.php?id=2237 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: index by time
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Use the Server command argument with a trailing slash, i.e.: Server http://www.altec.com/ instead of Server http://www.altec.com works FINE; this is what I was having problems with ACTUALLY: I inserted some debugging commands into search.c and got some results after running indexer: disallow *.com www.altec.com :: this url disallowed by default, deleting It reads this as url="http://www.altec.com", but when it reads it as url="http://www.altec.com/" it works fine. Can this be fixed so that indexer doesn't need a trailing slash when indexing urls that have an explicit disallow for urls that are not files? Reply: http://search.mnogo.ru/board/message.php?id=2242 ___ If you want to unsubscribe send unsubscribe general to [EMAIL PROTECTED]
Webboard: Indexer 1146 table not exist
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: You have to create the tables from the scripts which can be found in the /create directory of the sources. Take a look into the INSTALL file. When I try to run the indexer to catalog my site I get the following: Indexer[429]: indexer from mnogosearch-3.1.13/MySql start with 'usr/local/mnogosearch/etc/indexer.conf' Indexer[429]: [1]Error: '#1146: Table 'mnoSearch.url' doesn't exist' My indexer.conf file is: # This is a minimal sample indexer config file DBAddr mysql://nogo:nogo@localhost/mnoGoSearch/ #DBAddr mysql://root:@localhost/mnoGoSearch/ DBMode multi DeleteNoServer no Server http://smurf.lollydom.ass/ Server http://localhost Server http://smurf.lollydom.ass/~prwb/ file:/home/prwb/public_html/ # Allow some known extensions and directory index Allow *.html *.htm *.shtml *.txt */ # Disallow everything else Disallow * What is this table that the error refers to? It doesn't appear to be valid, and it's not in any of the SQL scripts for MySQL in the create dir. Thanks for your help; I've been pulling my hair out trying to guess this one. PB-) Reply: http://search.mnogo.ru/board/message.php?id=2243
Webboard: Date/Time-Format
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hi, how can I manipulate the output date/time format? I want to use, for example, the German format ("25.05.2000"). Thanx! Find this code in search.c near line 630: case 'M': UdmTime_t2HttpStr(Doc?Doc->last_mod_time:0, buf); sprintf(UDM_STREND(Target),"%s",buf);break; Doc->last_mod_time is a unix time stamp. You may pass it to the strftime() function together with the format you want. Check the strftime man page. Reply: http://search.mnogo.ru/board/message.php?id=2244
Re: Webboard: Compil warning with GCC under OSX 10.0.0
Is a core file created? If so, try to inspect it using gdb. Check the doc/bugs.txt file for an explanation of how it should be done. richard riegert wrote: At 2:13 +0400 18/05/2001, Maxime Zakharov wrote: That's the worst effect (see the end of my log); spelld.o won't compile :( Try the current version from CVS. That compiles now without any warning (apart from the all-* targets; would that be normal?) You are really efficient. Thanks a lot. Unfortunately, the parsing of more than about 50 Server entries always ends with a segmentation fault (on Rhapsody and Darwin). The search seems to work without problems, but I'm worried about errors. I can't use the 'period' config with this problem. It is not very annoying thanks to crontab, but... it would be nice if that could be corrected. Anyhow, I'm happy with mnoGoSearch; it is a great product.
Re: Russian letter 'io' ('ё')
Danil Lavrentyuk wrote: Hello! How does mnoGoSearch count the Russian letter 'io' ('ё')? Does it count this letter as equal to Russian 'ie' ('е')? Or not? Do I have to use this letter in ispell dictionaries or not? It's not equal to 'ie'; it's considered a separate letter. In ispell it's considered a separate letter too.
Webboard: How do I exclude subdirs in indexer.conf?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: To index your site, just use only the first command. Hi, I'd like to index http://www.mydomain.com/ with all subdirs and pages except every subdirectory like http://www.mydomain.com/0/ or http://www.mydomain.com/123/ etc. (only numbers). In my indexer.conf I said: Server http://www.mydomain.com/ Realm Regex NoMatch ^http://www\.mydomain\.com/[0-9]*/ but it won't work at all :(. With the first line only, everything is fine, but when I add the NoMatch line, other pages are also indexed which do not start with www.mydomain.com. How do I set up my indexer.conf correctly? tnx! cu Markus Reply: http://search.mnogo.ru/board/message.php?id=2256
Webboard: Doc Relevance
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: DR means the number of unique words found in the document. It is always 1 if you search for only one word. However, the most relevant document is always displayed first. In 3.2 we want to add the possibility to display something like a percentage. While performing a search, I realised that the document relevancy $DR is always 1. This is rather weird, cos' I always thought that the document relevancy value should be derived from the search text. How is it possible that a document which contains more occurrences of the search text has the same document relevancy value as documents with fewer occurrences? How should I configure indexer.conf during indexing so that the document relevancy can be taken into account when a search is issued? Any help on the matter is much appreciated. -- Jenson Reply: http://search.mnogo.ru/board/message.php?id=2258
Webboard: compiling 3.1.13 failed on spelld.c
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Replace socklen_t with int. Thanks for reporting. FreeBSD 3.3 GNU make 3.79.1 mnoGoSearch 3.1.13 ./configure --with-mysql gmake ... ... spelld.c: In function `main': spelld.c:235: `socklen_t' undeclared (first use in this function) spelld.c:235: (Each undeclared identifier is reported only once spelld.c:235: for each function it appears in.) spelld.c:235: parse error before `addrlen' spelld.c:241: `addrlen' undeclared (first use in this function) gmake[1]: *** [spelld.o] Error 1 gmake[1]: Leaving directory `/usr/local/src/mnogosearch-3.1.13/src' gmake: *** [all-recursive] Error 1 thanks in advance. Reply: http://search.mnogo.ru/board/message.php?id=2259
Webboard: Cookies Support
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hello! It's still on TODO. Probably one of the possible solutions is to hack the function UdmAddURL() and cut SESS=XXX substrings before inserting into the database. I have posted a question here several times about mnoGoSearch's support of cookies. Someone answered that it was in the TODO list. I just would like to know if the coders have an idea of when? Because I'm really interested ;p In fact it's because I'm using sessions (PHP) on my website, and if the browser (like the parser) doesn't support cookies, sessions are forwarded in the URL (like SESS=ksjfhsjkdf45zefD). Well, the problem is that when I try to delete only the session in my database, MySQL answers me that the URL is already in. After some research I discovered that (it goes without saying) mnoGoSearch considers different sessions as different web pages... I hope someone has understood me ;p And thank you if someone has a solution. Cheers, Reply: http://search.mnogo.ru/board/message.php?id=2262
Webboard: How do I exclude subdirs in indexer.conf?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Sorry for my previous post; I didn't exactly understand what you wanted. Try something like this: Server http://www.mydomain.com/ Disallow regex ^http://www\.mydomain\.com/[0-9]*/ Write this Disallow command BEFORE any other Allow/Disallow commands. I'd like to index http://www.mydomain.com/ with all subdirs and pages except every subdirectory like http://www.mydomain.com/0/ or http://www.mydomain.com/123/ etc. (only numbers). In my indexer.conf I said: Server http://www.mydomain.com/ Realm Regex NoMatch ^http://www\.mydomain\.com/[0-9]*/ but it won't work at all :(. With the first line only, everything is fine, but when I add the NoMatch line, other pages are also indexed which do not start with www.mydomain.com. How do I set up my indexer.conf correctly? tnx! cu Markus Reply: http://search.mnogo.ru/board/message.php?id=2263
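Laid out as it would appear in indexer.conf, the configuration suggested in the answer above is:

```
# Index the whole site...
Server http://www.mydomain.com/

# ...but skip purely numeric subdirectories such as /0/ or /123/.
# As noted above, this Disallow must come before any other
# Allow/Disallow commands.
Disallow regex ^http://www\.mydomain\.com/[0-9]*/
```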
Re: Bug report
Hello! There is a name conflict between libudmsearch and the php sources. One of the users found a workaround for this. Take a look here: http://www.php.net/manual/en/ref.mnogo.php We'll fix this name conflict in 3.1.14. Thanks for reporting! Miguel Feitosa wrote: UdmSearch version: 3.1.13 Platform: linux pentium III Asus dual board only one processor OS: rh7.1 2.4 kernel Database: mysql-3.23.36 Statistics: php-4.0.5 Hello Developers! I have been using udmsearch for more than eight months and it certainly RULES in its field. I haven't been able to compile it into php on a new setup because I get the following errors while compiling. I am going to try to get mnogosearch 3.1.10. Thanks, Sincerely, Miguel Feitosa Brasil - 3499 2143 /bin/sh /usr/local/php-4.0.5/libtool --silent --mode=link gcc -I. -I/usr/local/php-4.0.5/ -I/usr/local/php-4.0.5/main -I/usr/local/php-4.0.5 -I/usr/include/apache -I/usr/local/php-4.0.5/Zend -I/usr//include -I/usr/include/freetype -I/usr/local/include -I/usr/local/mnogosearch/include -I/usr//include/mysql -I/usr/local/include/ucd-snmp -I/usr/local/php-4.0.5/ext/xml/expat/xmltok -I/usr/local/php-4.0.5/ext/xml/expat/xmlparse -I/usr/local/php-4.0.5/TSRM -I/usr/include/apache -I/usr/local/php-4.0.5/Zend -I/usr//include -I/usr/include/freetype -I/usr/local/include -I/usr/local/mnogosearch/include -I/usr//include/mysql -I/usr/local/include/ucd-snmp -DLINUX=22 -DMOD_SSL=208101 -DEAPI -DEAPI_MM -DUSE_EXPAT -DSUPPORT_UTF8 -DXML_BYTE_ORDER=12 -g -O2 -o libphp4.la -rpath /usr/local/php-4.0.5/libs -avoid-version -L/usr//lib -L/usr/local/lib -L/usr//lib/mysql -L/usr/local/mnogosearch/lib -R /usr//lib -R /usr/local/lib -R /usr//lib/mysql -R /usr/local/mnogosearch/lib stub.lo Zend/libZend.la sapi/apache/libsapi.la main/libmain.la regex/libregex.la ext/bcmath/libbcmath.la ext/bz2/libbz2.la ext/calendar/libcalendar.la ext/dbase/libdbase.la ext/ftp/libftp.la ext/gd/libgd.la ext/imap/libimap.la ext/ldap/libldap.la ext/mnogosearch/libmnogosearch.la ext/mysql/libmysql.la ext/openssl/libopenssl.la ext/pcre/libpcre.la ext/posix/libposix.la ext/recode/librecode.la ext/session/libsession.la ext/snmp/libsnmp.la ext/sockets/libsockets.la ext/standard/libstandard.la ext/xml/libxml.la ext/zlib/libzlib.la TSRM/libtsrm.la -lpam -lrecode -lc-client -ldl -lz -lssl -lcrypto -lsnmp -lmysqlclient -ludmsearch -lz -lm -lmysqlclient -lldap -llber -lttf -lz -lpng -lgd -lbz2 -lssl -lcrypto -lresolv -lm -ldl -lcrypt -lnsl -lresolv /usr/local/mnogosearch/lib/libudmsearch.a(ftp.o): In function `ftp_close': /usr/local/mnogosearch-3.1.13/src/ftp.c:475: multiple definition of `ftp_close' ext/ftp/.libs/libftp.al(ftp.lo):/usr/local/php-4.0.5/ext/ftp/ftp.c:179: first defined here /usr/bin/ld: Warning: size of symbol `ftp_close' changed from 69 to 76 in ftp.o /usr/local/mnogosearch/lib/libudmsearch.a(ftp.o): In function `ftp_login': /usr/local/mnogosearch-3.1.13/src/ftp.c:247: multiple definition of `ftp_login' ext/ftp/.libs/libftp.al(ftp.lo):/usr/local/php-4.0.5/ext/ftp/ftp.c:223: first defined here /usr/bin/ld: Warning: size of symbol `ftp_login' changed from 155 to 298 in ftp.o /usr/local/mnogosearch/lib/libudmsearch.a(ftp.o): In function `ftp_list': /usr/local/mnogosearch-3.1.13/src/ftp.c:369: multiple definition of `ftp_list' ext/ftp/.libs/libftp.al(ftp.lo):/usr/local/php-4.0.5/ext/ftp/ftp.c:419: first defined here /usr/bin/ld: Warning: size of symbol `ftp_list' changed from 41 to 204 in ftp.o /usr/local/mnogosearch/lib/libudmsearch.a(ftp.o): In function `ftp_get': /usr/local/mnogosearch-3.1.13/src/ftp.c:392: multiple definition of `ftp_get' ext/ftp/.libs/libftp.al(ftp.lo):/usr/local/php-4.0.5/ext/ftp/ftp.c:499: first defined here /usr/bin/ld: Warning: size of symbol `ftp_get' changed from 469 to 138 in ftp.o /usr/local/mnogosearch/lib/libudmsearch.a(ftp.o): In function `ftp_mdtm': /usr/local/mnogosearch-3.1.13/src/ftp.c:411: multiple definition of `ftp_mdtm' ext/ftp/.libs/libftp.al(ftp.lo):/usr/local/php-4.0.5/ext/ftp/ftp.c:639: first defined here /usr/bin/ld: Warning: size of symbol `ftp_mdtm' changed from 263 to 160 in ftp.o collect2: ld returned 1 exit status
Webboard: mysql compiling
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Try --with-mysql=/usr/ OK, newbie to working with MySQL here. I'm trying to ./configure udmsearch-3.0.23, mysql distro 3.23.36, on a Red Hat 7.1 i386 system. During the ./configure --with-mysql it fails when it tries to find the mysql.h file, saying "Invalid MySql Directory - unable to find mysql.h". I've looked high and low and haven't found this file. Doing a "which mysql" gives me /usr/bin/mysql. I tried ./configure --with-mysql=/usr/bin/ and it didn't find what it wanted. So the question is, what am I missing? Should the MySQL server be stopped during the configure and install? Is there a mysql file that has disappeared from the computer, am I pointing to the wrong place, am I entering the command wrong? Any suggestions/advice would be much appreciated. -jon- Reply: http://search.mnogo.ru/board/message.php?id=2266
Re: UDMsearch 3.1.13 cachelogd problem: proper action to correct
Thanks for reporting! We've fixed it in the 3.1.14 sources. [EMAIL PROTECTED] wrote: Gents, when compiling udmsearch, a typo makes cachelogd fail. When making, gcc gets as an argument: -DUDM_VAR_DIR=\udmsearch_path/var\. The best way to solve this bug properly is to add in src/cachelogd.c, right after line 384 and just before sprintf(pidname,"%s%s",vardir,"cachelogd.pid"), the following: if (vardir[strlen(vardir)-1]!='/') { strcat(vardir,"/"); } Consider the -w argument as a temporary workaround if you don't dare changing the source code. @+
Re: recent optimizations?
Hi! Tonu Samuel improved the MySQL-related code, so indexing with MySQL now runs faster. If I haven't forgotten something, that is the only major improvement. Damon Tkoch wrote: Hello, have there been any major optimizations to mnogosearch with MySQL between 3.1.11 and 3.1.14? I just upgraded mnogosearch on one of my machines and it seems to run circles around the older, non-upgraded indexers. (whatever you guys did, it rocks!)
Re: Problem indexing
Dovli wrote: Hello! I have another question. If I use the Server directive, the URLs are indexed, but if I want to index all the URLs in a given domain using Realm, I get the error no 'Server' command for url...deleted for each and every URL the indexer is processing. Thank you very much for your help. Probably you wrote an incorrect Realm argument.
Webboard: indexer loops !
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: What is the Period command argument in your indexer.conf? The problem now is different: the indexer doesn't want to stop!!! Does it not tag URLs it has already visited and skip them? It looks like it is visiting the same ones again and again and again. I'll give SWISH a try :-) marcio Reply: http://search.mnogo.ru/board/message.php?id=2300
Webboard: search.cgi//blank html page//getting closer/
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: If you include search.cgi using SSI, you have to set up an environment variable UDM_TEMPLATE with the path to the template file. You can do it using the SetEnv and PassEnv directives in Apache's httpd.conf. Well, I decided to try accessing search.cgi using SSI; at least I now get the 'can't open template file' error instead of a blank page when accessing through a browser. It still finds the template in telnet and prints the HTML form. I have tried permission settings and moving it to a different dir; do that and telnet says no good. Any ideas at all!!! Thanks Reply: http://search.mnogo.ru/board/message.php?id=2303
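In httpd.conf, the SetEnv directive mentioned above would look like this; the template path is only an assumed example location:

```
# httpd.conf: tell search.cgi where its template lives when it is
# invoked via SSI (path is an example; use your real search.htm path)
SetEnv UDM_TEMPLATE /usr/local/mnogosearch/etc/search.htm
```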
Re: Bug report
Kelvin Chen wrote: UdmSearch version: 3.1.14 Platform: PII XEON OS: Redhat 6.2 Database: Oracle 8.1.16 Statistics: Unknown 4.0.1 I don't know if this is a problem. First, I found it is not so easy for me to install mnogosearch. After I installed the whole package and indexed the site, I can't use search.cgi to search. I guess maybe I can't use the web to connect to Oracle, since when I use search.cgi in a shell, it works correctly. What can you see in the browser after pressing the Search button?
Webboard: *Weight not working?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Sergey, please check this. Probably there is a bug in the PHP module... Using 3.1.13 as a backend, I've indexed a bunch of pages. I didn't set any of the *Weight directives in indexer.conf, as I was happy with the defaults. However, using php-4.0.5 w/ 3.1.14 (for the built-in functions) with: udm_set_agent_param($udm,UDM_PARAM_WEIGHT_FACTOR,"F8421"); causes weird behavior. Namely, the weighting doesn't actually happen; pages that have the query string in the META KEYWORDS, for instance, are frequently listed lower than pages that just have the term in the body (and nowhere else). What gives? If you need more info, I'd be happy to oblige. Peter Reply: http://search.mnogo.ru/board/message.php?id=2331
Webboard: mnogosearch.url table?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: I get this error when I try to index: Indexer[1460]: indexer from mnogosearch-3.1.14/MySQL started with '/usr/local/mnogosearch/etc/indexer.conf' Indexer[1462]: [1] Error: '#1146: Table 'mnogosearch.url' doesn't exist' Any ideas? You didn't create the table structure. Take a look into the INSTALL file. Reply: http://search.mnogo.ru/board/message.php?id=2332
Webboard: Will Mnogo index multiple websites?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Yes, you can index several web sites. Morphology support can be added using ispell dictionaries. Take a look into ispell.txt, which is supplied with the mnogosearch sources. ha? And if yes, then will it support the morphology of Russian? Reply: http://search.mnogo.ru/board/message.php?id=2334
Re: 3.1.14 bad configure...
[EMAIL PROTECTED] wrote: Gents, there is a problem with configure in 3.1.14. When compiled with openssl, it sets -L openssl_path/include (instead of lib) in the makefiles (./ and ./src), so linking fails... Hello! Thanks for reporting! This patch for configure.in fixes the bug: 292c292 < SSL_LFLAGS=-L$SSL_INCDIR -lcrypto -lssl --- > SSL_LFLAGS=-L$SSL_LIBDIR -lcrypto -lssl You may use this patch if you have autoconf installed on your machine. Apply the patch, then run autoconf, configure, and make.
Webboard: update
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Can I update from mnoGoSearch 3.1.12 to 3.1.14 and not lose the URL database? (cache storage mode) Will my users search the latest database? Thanks Yes, you can. Reply: http://www.mnogosearch.org/board/message.php?id=2342
Webboard: timeout/doc limit breaks infinite loop but I get duplicates !
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Look, I did not try to reindex anything. I know you claim it does not loop, but in my attempt it did loop. That's what I have been saying in my past messages. There's still no solution from you guys; that's too bad. I can't reproduce this bug. Never mind, I tried ht://Dig and it worked just fine, as did Greenstone. None of them looped forever like mnoGoSearch. If you are really interested in tracing this bug in mnoGoSearch, post your email address and I can send you my email and/or Yahoo Messenger ID. Please do it; my email is [EMAIL PROTECTED] Reply: http://www.mnogosearch.org/board/message.php?id=2343
Re: bugs
xiao shibin wrote: Version: mnogosearch 3.1.14 for Windows. When I run the mnogosearch spider with multiple threads (6), MySQL locks. I ran mysqladmin processlist; the result is: | 4 | root | localhost | test | Query | 460 | Locked | UPDATE url SET status=504,next_index_time=989192576 WHERE rec_id=12870 | ... It seems that a concurrent thread did something wrong. Can you see any other processes in mysqladmin processlist?
Webboard: passing variables to search.cgi
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: You have to hack search.c. As best as I can determine, the only variables that I can pass to search.cgi are "ul", "ps", "m", "q", "o", and "t". I would like to pass a few extra variables (of my own) to search.cgi. My desire is to be able to display the values of these additional user-defined variables on the search results page by referencing them from inside the search.htm template. Does anyone know a way that I can do this? Thanks in advance. Reply: http://www.mnogosearch.org/board/message.php?id=2345
Webboard: What about the size of the index?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: It depends on storage mode, phrase support, and ispell support: within 30-100 percent of the original document size. Hi, I have a question about the size of a normal index. What disk space would about one million URLs require? I am looking forward to your answers... Best regards Tim Block Reply: http://www.mnogosearch.org/board/message.php?id=2346
Webboard: update. problem
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: What database and DBMode do you use? I updated mnogosearch today, but my indexer.conf and search.htm are the same as what I have had since version 3.1.10. With this new search.cgi, I always get 'Sorry, but search returned no results'. I had to delete the new script, and the old 'search.cgi' is actually running fine. Any idea about this problem? Thanks Reply: http://www.mnogosearch.org/board/message.php?id=2350
Webboard: How can I hide parts of my html-pages from the indexer?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: There are such tags, but their names are <!--UdmComment--> and <!--/UdmComment--> Hi, I want to exclude certain parts of HTML pages from being indexed. How can I do that? Is it possible to use tags like <!-- mnogosearch-hide --> Text to hide <!-- mnogosearch-show --> ? I don't want to hide them from visitors, so I can't use <!-- Text to hide --> Reply: http://www.mnogosearch.org/board/message.php?id=2351
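In a page, the tags named in the answer above would be used like this (the surrounding markup is just an example):

```html
<p>This paragraph is indexed normally.</p>
<!--UdmComment-->
<p>This paragraph is shown to visitors but skipped by the indexer.</p>
<!--/UdmComment-->
<p>Indexing resumes here.</p>
```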
Webboard: update. problem
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: DBMode cache Probably you used different --enable-fast-cat, --enable-fast-site and --enable-fast-tag configure parameters during compilation of 3.1.10 and 3.1.14 Reply: http://www.mnogosearch.org/board/message.php?id=2354
Webboard: Indexing listprocessor archives?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: How does listprocessor store its messages? Hi there! I would like to know if it is possible to index listprocessor archives with mnogosearch. Listprocessor is a relatively old email-list manager. If mnogosearch cannot do this, how hard would it be to implement? Cheers, Gerhard Reply: http://www.mnogosearch.org/board/message.php?id=2358
Re: Cache mode - incorrect results returned
Joe Frost wrote: Hi, I've just set up the following: RedHat 7.1 with ReiserFS, Mnogosearch 3.1.14, PostgreSQL 7.0.3 (as shipped with Red Hat). I've set the system up to work in cache mode and the indexer has run okay. I can see that there are entries in the url table in postgres, but when I do a search, words that I can clearly see listed in the keywords field in the url table return no results. I know that these words are on the test site that I've indexed, and the indexing process seemed to complete okay. Some words work okay and others return nothing; is this a known problem with cache mode? Thanks for your help and best regards, Did you run splitter?
Re: Cache mode - incorrect results returned
Joe Frost wrote: Did you run splitter? Yes, I ran: cachelogd indexer kill -HUP `cat /var/mnogosearch/cachelogd.pid` splitter -p splitter Is this okay? Joe It's OK. Do you have any files under the /usr/local/mnogosearch/var/tree/ directory? How much time did splitter -p and splitter take?
Re: uncomplete indexing
Rokky Irvayandi wrote: On Thu, 7 Jun 2001, Alexander Barkov wrote: Rokky Irvayandi wrote: Hi, I want to index all of the mp3 files on a server, but I always find that it fails to index all of them. Some files in the same directory can't be indexed. I did not find anything wrong with the files. Can anyone help me??? How do you index them, from local disk or via http? via http Probably you have no links to all those files. You have to create a page with links and index it.