UdmSearch: Webboard: What proxy software on a freebsd machine?
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message:
> I'm just wondering what proxy software I should use in conjunction with udmSearch (for ftpsearch.conf). Which is known to work best? Thanks, Ari
It is tested with squid. You may also use a 3.1.x version, which has native FTP support.
Reply: http://search.mnogo.ru/board/message.php?id=720
__ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Server tables
I'm working on a few web pages where users can enter information so that their website is added to my search engine. It's all fine, except that this is open to some forms of abuse. Is it safe to just add an "added by..." column to the server table, or does this break anything? Thank you very much, Gary (-;
UdmSearch: Webboard: How many indexer ?
Author: FL Email: [EMAIL PROTECTED] Message: Hi! Udm is an amazingly good tool. I now want to index about 100,000 URLs. The machine is a Linux box with a PII-300 and 256 MB of RAM. The connection is a 10 Mbit/s Ethernet card (Via-Rhine). Question: how many instances of indexer should I launch? How should I find the best setting? Thanks for your help. Francois Reply: http://search.mnogo.ru/board/message.php?id=721
Re: UdmSearch: Server tables
"Briggs, Gary" wrote:
> I'm working on a few web pages where users can enter information so that their website is added to my search engine. It's all fine, except that this is open to some forms of abuse. Is it safe to just add an "added by..." column to the server table, or does this break anything?
It should be safe.
UdmSearch: Yet more questions
I really hate asking this many questions, but it's kinda important, and I'm wondering if anyone can help me. I want to use the basic_auth abilities of udmsearch. My problem is that unless a user can authenticate to the same site that the basic_auth was needed for, I don't want to reveal even the existence of that site. E.g., if one indexed site has information about making a new top-secret car, I don't want unauthorised users to even know that the site exists. On the other hand, if the user IS able to authenticate against that site, I want them to be able to search and get results from it. I'm using the most recent version of the PHP front end with 3.1.8 and the MySQL backend, with crc-multi as the storage method.
The other thing, which I haven't heard from anyone about yet, is another security issue. I want two users in my MySQL database for this search engine: one, used by the indexer, that is allowed to insert and change data, and one, used by the front end, that is not allowed to change anything. I can't work out a good way of doing it, seeing as the PHP front end needs to be able to create and drop randomly named databases. Anyone? Security is very important to me... Thank you very much, Gary (-;
RE: UdmSearch: Webboard: How many indexer ?
Empirical testing. It kinda depends on a few things, including the storage method and back end you're using. Just try it; between 4 and 8, probably. Gary (-;
-Original Message- From: FL [SMTP:[EMAIL PROTECTED]] Sent: Friday, November 10, 2000 10:27 AM To: [EMAIL PROTECTED] Subject: UdmSearch: Webboard: How many indexer ?
Re: UdmSearch: Yet more questions
"Briggs, Gary" wrote:
> I want to be able to use the basic_auth abilities of udmsearch. My problem is that unless a user is able to authenticate themselves to the same site as the basic_auth was necessary for, I don't even want to serve up existential information about any of that site. E.g., if one site being indexed has information about making a new top-secret car, I don't want unauthorised users to even know about the existence of that site. On the other hand, if the user IS able to authenticate themselves against that site, I want them to be able to search and have the results from that site.
Maybe the "tag" feature?
UdmSearch: Webboard: Failing to index titles (udmsearch 3.0.23)
Author: Chi Email: [EMAIL PROTECTED] Message: Indexer does not seem to index title tags coming from a CGI. I've indexed http://www.beautycommercial.com with indexer.conf set to accept cgis, nphs and ?s, and it spiders through the site fine. Checking the MySQL database shows that for most of the CGIs (.mxs) the title was not indexed, even though a title exists. Reply: http://search.mnogo.ru/board/message.php?id=723
Re[2]: UdmSearch: Webboard: Performance: cache db
Hi! Friday, November 10, 2000, 1:36:44 PM, you wrote:
PS> As for debugging I went to my unpacking directory (where I keep a virgin copy of the software) and made the change you mentioned to sql.c. Then I recompiled and ran search.cgi by hand... all I got was a copy of my webpage outputted... no real debug data
It prints debug data on the stderr stream. You can see it by redirecting stdout to a file or to /dev/null:
    export QUERY_STRING=word1
    ./search.cgi > /dev/null
-- Regards, Sergey aka gluke.
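A small, self-contained way to see the stdout/stderr split described above without touching a real installation. This is only a sketch: `fake_search_cgi` is a made-up stand-in for search.cgi, and the text it prints is invented; only the redirections are the point.

```sh
# Hypothetical stand-in for search.cgi: HTML page on stdout, debug on stderr.
fake_search_cgi() {
    printf 'Content-type: text/html\n\n<html>results</html>\n'
    printf 'DEBUG: query=%s\n' "$QUERY_STRING" >&2
}

QUERY_STRING=word1
export QUERY_STRING

# Discard the page; the debug line still reaches the terminal via stderr.
fake_search_cgi >/dev/null

# Capture only the debug line. Order matters: stderr is first pointed at the
# capture pipe, and only then is stdout sent to /dev/null.
debug_only=$(fake_search_cgi 2>&1 >/dev/null)
printf '%s\n' "$debug_only"
```

With the real binary the same redirections should apply, for instance `./search.cgi 2>debug.log >/dev/null` to keep the debug output in a file.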
Re[4]: UdmSearch: Webboard: Performance: cache db
Hi! Friday, November 10, 2000, 1:50:53 PM, you wrote:
PS> When I run the export command (which I know works on Linux) I get an error.. this machine is a FreeBSD box... can I just use SET instead?
export is a bash and ksh command. For csh or tcsh (if I am not mistaken) you should use setenv. Please read the man page for your command shell about setting and exporting environment variables.
-- Regards, Sergey aka gluke.
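As a rough illustration of the difference, the sketch below sets the variable the Bourne-shell way, shows the csh/tcsh form as a comment, and then uses env(1), which sets a variable for a single command and so does not depend on the login shell. The `sh -c` child is only a stand-in for ./search.cgi, and the values word1 and word2 are arbitrary.

```sh
# Bourne-family shells (sh, bash, ksh): assign, then export.
QUERY_STRING=word1
export QUERY_STRING

# csh/tcsh equivalent, shown as a comment because it is not sh syntax:
#   setenv QUERY_STRING word1

# Shell-independent alternative: env(1) sets the variable for one command.
# The child shell below stands in for ./search.cgi.
seen=$(env QUERY_STRING=word2 sh -c 'printf "%s" "$QUERY_STRING"')
printf 'child saw: %s\n' "$seen"
```

Typed at a csh prompt, a line of the form `env QUERY_STRING=word1 ./search.cgi > /dev/null` should therefore behave the same as the export version does under bash.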
Re[2]: UdmSearch: UdmSearch PHP Frontend - is it possible to search by the URL, not by keyword listed in dict?
Hi! Friday, November 10, 2000, 5:32:51 PM, you wrote:
AS> It is indexing the pages fine, it is a problem with the PHP frontend. Isn't allow/disallow for the indexer only?
The dict table is filled only by the indexer. If you say that it is incomplete, then you should edit your indexer.conf.
AS> is it possible to search by the URL, not by keyword listed in "dict"? It's a file database, and the "dict" table doesn't contain over half of the file names, so although I know a link to a file does exist the search engine doesn't pick it up. Any ideas how to fix this?
-- Regards, Sergey aka gluke.
RE: Re[2]: UdmSearch: Webboard: Performance: cache db
As another piece of info... I tried using gmake instead of make, and it made no difference in performance or in the actual debug output. Hope these little pieces of information help...:) Paul Stewart Nexicom Inc. http://www.nexicom.net
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Sergey Kartashoff Sent: Friday, November 10, 2000 6:49 AM To: Paul Stewart Cc: [EMAIL PROTECTED] Subject: Re[2]: UdmSearch: Webboard: Performance: cache db
UdmSearch: Links problem. identified it, but I dont get the code
Hi. I just identified my problem with symlinks over the ftp protocol. This code in ftp.c is most likely causing the problem:
    case 'l':
        ch = strstr(fname, " -> ");
        if (!ch) break;
        ch += 4;
        if (ch[0] == '.') {
            len = len_h + len_p + strlen(ch);
            udm_snprintf(buf_out + cur_len, len + 1, "<a href=\"ftp://%s%s%s/\"></a>", connp->hostname, path, ch);
        } else {
            len = len_h + strlen(ch);
            udm_snprintf(buf_out + cur_len, len + 1, "<a href=\"ftp://%s%s/\"></a>", connp->hostname, ch);
        }
        ...
What is the reason for checking for links to /^\./ files? On our ftp server, we use links to files starting with a . to avoid listing the target paths in a normal ls. Here is an example listing:
    lrwxrwxrwx 1 root 0 51 Oct 13 15:26 solaris -> .mirror-sites/ftp.tuwien.ac.at/sun/solaris/packages/
    drwxr-xr-x 2 705 1000 4096 Jun 22 14:15 ssl/
    drwxr-xr-x 2 705 1000 4096 Nov 1 10:28 suse-linux/
    lrwxrwxrwx 1 root 0 40 Oct 13 15:26 tex -> .mirror-sites/ftp.gwdg.de/pub/misc2/ctan/
    drwxr-xr-x 3 705 1000 4096 Nov 6 23:02 windows/
Does anyone understand why this check is done, and what happens when it finds a link with a target of ^\..*?
Regards, Mario Lang Technical University Graz mailto:[EMAIL PROTECTED] Department Computing Services http://www.cis.tu-graz.ac.at/zid/lang/ Phone: +43 (0) 316 / 873 - 8508 ICQ: 69372257 UFOs are for real: the Air Force doesn't exist.
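To make the two branches easier to follow, here is a rough shell restatement of what the quoted fragment appears to do with one listing line. It is only a reading of the snippet, not the actual ftp.c logic: the function name `symlink_href`, the host `ftp.example.org` and the directory `/pub/` are invented for illustration, and the directory is assumed to end in a slash.

```sh
# symlink_href HOST DIR ENTRY
# Prints the URL the quoted branch would seem to build for a symlink entry.
# All names and sample values are hypothetical.
symlink_href() {
    ftp_host=$1
    ftp_dir=$2
    entry=$3
    case $entry in
        *' -> '*) ;;          # only entries with a link arrow are handled
        *) return 1 ;;
    esac
    # Text after the first " -> ", like strstr() followed by ch += 4.
    target=${entry#*' -> '}
    case $target in
        # Target starts with '.': treated as relative, so DIR is prepended.
        .*) printf 'ftp://%s%s%s/\n' "$ftp_host" "$ftp_dir" "$target" ;;
        # Anything else is used as if it were a path from the server root.
        *)  printf 'ftp://%s%s/\n' "$ftp_host" "$target" ;;
    esac
}

symlink_href ftp.example.org /pub/ \
    'lrwxrwxrwx 1 root 0 40 Oct 13 15:26 tex -> .mirror-sites/ftp.gwdg.de/pub/misc2/ctan/'
```

Read this way, a target such as .mirror-sites/... takes the first branch and is resolved against the current directory; the trailing slash of the target is copied as-is, so the sketch prints a doubled slash at the end.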
Re: Re[2]: UdmSearch: PgSQL: DELETE INDEX url_url;
On Fri, 10 Nov 2000, Sergey Kartashoff wrote:
> Hi! Thursday, November 09, 2000, 9:55:11 PM, you wrote:
> THH> On Thu, 9 Nov 2000, Alexander Barkov wrote:
> THH> > Don't forget to recreate this index before starting indexer. Since the url_url index is UNIQUE, it does not allow indexer to add the same link several times. If you remove the index, the same documents might be added several times.
> THH> can this unique index not be based on the crc32 value instead? that might explain why I'm up to 140K docs when I was only expecting 91k :)
> no, an index on crc32 cannot be unique, because it would block adding site mirrors to the url table.
okay, can someone make the following changes to the source code, so that the search avoids using the index ... this will at least give a temporary fix until our LIKE optimizer is fixed:
    SELECT ndict.url_id, ndict.intag
    FROM ndict, url
    WHERE ndict.word_id = 1971739852
      AND url.rec_id = ndict.url_id
      AND ((url.url || ' ') LIKE 'http://www.postgresql.org/% ');
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
UdmSearch: I got a new problem !!
Hello,
THE FIRST QUESTION: After indexing my document with this command
    ./indexer -i -u http://132.248.104.3/home/httpd/html/manual/artus1.html
I have a problem: the Udm statistics (using indexer -S) display the following:
    Status  Expired  Total
    ---------------------------------------
         0        2      2  Not indexed yet
       404        0      1  Not found
    ---------------------------------------
     Total        2      3
and "artus1.html" is the file that was not found. When I look at the record for this file in PostgreSQL, the contents (text) of the file are not recorded, only the URL field. I know that status 404 means "Not found" (there were references to URLs that do not exist), but I don't understand why this happens. In fact, "artus1.html" is a copy of a file (an HTML system file) that is indexed correctly. I have tried some other HTML files, but the results are the same. I don't know whether some parameter(s) in indexer.conf must be changed; if so, which should I modify or verify?
THE SECOND QUESTION: Is UdmSearch able to read XML files? The question arises because we intend to add some fields that HTML does NOT allow, so that we can search over selected fields. For instance, suppose I have two documents in which Mr. White appears: in the first as a member of a Community, in the second as the Author of a technical paper. When I search for Mr. White as an Author (e.g. the query "Author:Mr.White"), I don't want to fetch the other document, in which Mr. White appears as a member of a Community. This is why we intend to use a language that allows us to declare specific tags and to retrieve documents by field-specific search. Adding these fields to the database(s) is no problem. The thing is that the search mechanism must be changed, isn't it? Probably there is an easier solution to this problem... can you give me your opinion or some advice? Thanks a lot for your help. Regards, Ing. Arturo Pulido Centro Universitario de Investigaciones Bibliotecologicas Universidad Nacional Autonoma de Mexico.
UdmSearch: Webboard: Help with Sound file Search!
Author: Adrift Email: [EMAIL PROTECTED] Message: Hello everyone! First off, I want to thank the UDMSearch guys for their killer app! I've compiled UDMSearch with the --enable-mp3 option, and everything works. Here's my problem. I want to create a spider that only catalogs sound files (.mp3, .wav, .ra, .ram, .vqf, .au, etc.). I've tried to do this through ALLOW and DISALLOW, but for some reason the spider won't add records for any files! It only indexes the address of the HTML file, but does not index any of the files linked from it! Can someone please send me an indexer.conf file that will catalog the links to sound files contained in an HTML file? I'm not sure if this is clear, so I'll include an example. Here is the HTML file, let's call it test.html:
    This is my sound <a href="http://www.url1.com/sound1.wav"> Download </a>
I want it set up so that http://www.url1.com/sound1.wav is added to the database, not just that HTML file. Thanks, Adrift Reply: http://search.mnogo.ru/board/message.php?id=725
UdmSearch: error 1045
Folks, I hope you can help me out. I am trying to install udmsearch with the MySQL backend. I believe I have followed the instructions in the INSTALL file that comes with the package. Clearly I am missing something, as I get the following error message when doing a search:
    An error occured! #1045: Access denied for user: 'foo@localhost' (Using password: YES)
If it is any use, the search.cgi is at: http://www7.imahal.com/cgi-bin/sitesearch/search.cgi Any help would be appreciated. Best regards, -Pradeep
Pradeep Misra, Electrical Engineering Dept, Wright State Univ, Dayton, OH 45435 (T) 937 775 5062 (F) 937 775 5009 [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: UdmSearch: PgSQL: DELETE INDEX url_url;
The Hermit Hacker wrote:
> okay, can someone make the following changes to the source code, so that the search avoids using the index ... this will at least give a temporary fix until our LIKE optimizer is fixed:
>     SELECT ndict.url_id, ndict.intag
>     FROM ndict, url
>     WHERE ndict.word_id = 1971739852
>       AND url.rec_id = ndict.url_id
>       AND ((url.url || ' ') LIKE 'http://www.postgresql.org/% ');
I don't think it is a good solution to change the search to work around a buggy LIKE optimizer and then change it back once the optimizer is fixed.
UdmSearch: Webboard: External PDF and doc parsers
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message:
> Where can I get external parsers for PDF files?
Try looking on freshmeat.net.
Reply: http://search.mnogo.ru/board/message.php?id=726