RE: security: private or public pages
Hmmm. Well, yes. But at the same time, it's considerably more secure than anything else you'll be able to do... And while the ACLs _could_ work on just tags or categories, I'd rather define them as groups of URLs, as that would give a greater degree of flexibility... I'd ideally like full regexp support... And finally, the database security: yeah, I've done it so that it works. But when they search, the algorithm actually involves CREATEing a table, SELECTing some stuff into it, and then DROPping it again afterwards. Bugger.

Gary (-;

PS If you're interested, I can send you a sanitised version of the mysql database...

-Original Message-
From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
Sent: Thursday, August 23, 2001 4:23 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: security: private or public pages

Briggs, Gary wrote on Wed Aug 22 10:27:50 2001:

> Hmmm. This is of significance to me, as I work in a secure environment... I'm not sure how much compute power and secondary storage you have, but in the end, if it's possible, my humble recommendation is to have two databases, and use some webserver/.htaccess voodoo to arrange it so that outside people see a different search page/database backend to internal people.

I do not wish to use this solution. It significantly increases disk storage, computing time and network resources.

> Of course, depending on resources, you might like to implement that at the database layer with some sort of balancing in it, but we're talking hideous implementation details by now... If that's not viable, I'm working on a wholesome way of adding ACLs to the engine, but it's not a simple exercise to work out what the ACLs need to be, especially on the scale of many many thousands of URLs.

These ACLs might operate only on the tags or categories, which reduces the complexity.

> That would also require an authentication method on the search page, which is intrusive...
> NTLM is NOT my friend, before you mention it (=

Using several symbolic links to the search CGI, each having its own template (search.htm), you might use access controls for each symbolic link in a unique .htaccess, without increasing the space used by this CGI.

> And along the same lines, are you using mySQL? [I plan to move to Sybase at some point in the future, but until then...] If so, how have you implemented the security of the database user for /searching/ being effectively read-only, and the indexer/admin user being rw? I've basically created a HUGE permission list, but I feel that's probably not the best solution...

First of all, our database has a user/password, and is used by the webserver, which runs on a dedicated account. Thus there is no risk of intrusion. MySQL has a privilege mechanism which makes it possible to have several user/password accesses for each database, table and column, each user/password having its own privileges. Thus one user/password can update the database while another can only consult it (SELECT statements).

> The problem, you see, is that the read-only user needs to be able to create and drop tables, and have write access to some parts of the qtracking table. [note: only some parts of it, because I have greatly enhanced the qtracking here, and have several daemons running that are finding out further information about the users than just what they give]

It seems that the privilege mechanism of MySQL should be able to solve this problem.

Dominique

> Gary (-; PS e-mail me personally if you want snippets of source code, etc. In fact, feel free to e-mail me personally about this anyway...
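Dominique's symlink idea can be sketched as an Apache setup. This assumes mnoGoSearch's behaviour of picking the template named after the CGI it was invoked as (so internal.cgi reads internal.htm); all file names and paths here are illustrative, not from the thread:

```apache
# cgi-bin/ contains:
#   search.cgi     the real binary, public, uses template search.htm
#   internal.cgi   a symlink to search.cgi, uses template internal.htm
#
# .htaccess in cgi-bin/ protects only the internal entry point:
<Files "internal.cgi">
    AuthType Basic
    AuthName "Internal search"
    AuthUserFile /path/to/.htpasswd
    Require valid-user
</Files>
```

Both entry points share one binary and one database; only the template and the access control differ.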
-Original Message-
From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
Sent: Tuesday, August 21, 2001 6:27 PM
To: [EMAIL PROTECTED]
Subject: security: private or public pages

Hello,

I wish to arrange that, on the one hand, the whole set of documents from our site is normally referenced and retrieved when searched by one of our team's users, and, on the other hand, that any external web visitor only sees references to public documents. As the spider works from inside the site, access controls are not effective. Specifically, the search engine may provide references to, and display, pages that are strictly for internal use.

One solution may be to use tags or categories, but, at search time, it may still be possible to hack the URL in order to remove the CGI parameters that limit the search (t=, cat=). Another way around this problem would be to constrain, in the variables part of the template file (<!--variables ... -->), the CGI parameters that we want, so that they cannot be suppressed or overridden during a search. However, it would still be possible to short-cut this solution with the tmplt= CGI parameter, but in the case where the restriction is added
RE: biggest (maximum) numbers for BodyWeight, TitleWeight, DescWeight, etc.
Because that way you can search through just the titles, etc., because of a bitwise OR operation done on the weight.

Gary (-;

-Original Message-
From: Andre Pfeiler [SMTP:[EMAIL PROTECTED]]
Sent: Monday, August 20, 2001 10:13 AM
To: [EMAIL PROTECTED]; Andre Pfeiler
Subject: Re: biggest (maximum) numbers for BodyWeight, TitleWeight, DescWeight, etc.

On Monday 20 August 2001 09:33, you wrote:

hello,

Posted by c4miles 2001-01-15 02:29:48
Regarding BodyWeight, TitleWeight, DescWeight, etc. What is the weight of these weights? Does 1 carry greater importance than 5, for example? Or vice-versa?

Posted by gluke 2001-01-15 10:57:08
No. The bigger number gives greater importance.

...what are the biggest (maximum) numbers I should use? ...is it strongly recommended to use weight numbers that are powers of 2?

greets Andre

___ If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]

-- This message is intended only for the personal and confidential use of the designated recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any review, dissemination, distribution or copying of this message is strictly prohibited. This communication is for information purposes only and should not be regarded as an offer to sell or as a solicitation of an offer to buy any financial product, an official confirmation of any transaction, or as an official statement of Lehman Brothers. Email transmission cannot be guaranteed to be secure or error-free. Therefore, we do not represent that this information is complete or accurate and it should not be relied upon as such. All information is subject to change without notice.
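The bitwise trick Gary alludes to can be illustrated like this. This is an illustrative sketch, not mnoGoSearch's actual code: the section names and bit assignments are hypothetical, but they show why power-of-2 weights let a search mask on a single section:

```python
# Hypothetical per-section bits (powers of 2, so they never overlap).
BODY, TITLE, DESC = 1, 2, 4

def stored_weight(sections):
    """OR together the bits for every section the word occurs in."""
    w = 0
    for s in sections:
        w |= s
    return w

# A word seen in the body and the title:
w = stored_weight([BODY, TITLE])
print(w)                # 3
print(bool(w & TITLE))  # True: a title-only search still matches it
print(bool(w & DESC))   # False: a description-only search does not
```

With overlapping (non-power-of-2) weights, two different section sets could produce the same stored value, and masking out one section would no longer be possible.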
RE: Only indexing first part of a page (possible bug)
Hi,

There's a limit to the amount of content it'll download and index for each individual page. IIRC, it's set in the indexer.conf file, but it may be compiled in...

Gary (-;

-Original Message-
From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
Sent: Thursday, July 26, 2001 3:45 PM
To: [EMAIL PROTECTED]
Subject: Only indexing first part of a page (possible bug)

Is there a word count limit per page implicitly defined in the mnogosearch indexer? In debugging why some links aren't picked up in our site indexing, I've found that when indexing a single page of about 850 words (only about 9k), only the first quarter (rough estimate) is being stored. I have no idea why. This is evidenced by both words and links from the latter part of this single page missing from the dict and url sql tables. I can almost spot the exact point in the page at which words stop going into the index database by checking the dict table. Anyone have a clue why this is happening? I'm using mnogosearch on a redhat 7.1 machine with postgresql to index an SSL intranet.

Thanks, nick
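If the limit is the one Gary is thinking of, it can be raised in indexer.conf. A sketch, assuming the MaxDocSize directive of 3.1.x-era mnoGoSearch (the 1 MB value is an arbitrary example; check your version's documentation for the default):

```
# indexer.conf: maximum number of bytes downloaded (and therefore
# indexed) per document; content past this limit is truncated.
MaxDocSize 1048576
```

A 9k page being cut off at roughly a quarter would be consistent with a limit of a few kilobytes, so comparing the cut-off point against the configured value is a quick way to confirm the diagnosis.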
Hmmm.
As threatened the other day, I've coded cookie support into the indexer. I also implemented arbitrary header strings while I was at it... Anyway, it reads columns called cookie_string and header_string out of the server table, so you'll need to modify it with something like:

ALTER TABLE server ADD cookie_string VARCHAR(100) NOT NULL DEFAULT ''
ALTER TABLE server ADD header_string VARCHAR(100) NOT NULL DEFAULT ''

For the cookie bit, you need only add the string, and not the whole header. For the arbitrary header bit, on your head be it if you break servers and stuff... I know that one is a superset of the other, but that's the way it goes. And this is only on a per-server basis; if you want something global, use indexer.conf.

I'm quite happy to continue with some development on this if anyone else has any use for things like this.

Enjoy yourselves,
Gary (-;

cookie.diff.gz
Cookie Support
Right, people. On a per-server basis, I wrote a patch before the weekend for cookie support. It reads a column called cookie_string from the server table, and sends it. Quite simple. I'm not too good with diff and friends, but I hope this works...

Gary (-;

cookie.diff.gz
Cookies
I've nearly finished writing cookie support into this [but only for server database tables; if you want to do it with indexer.conf, then you can just use HTTPHeader directives, easy]. Does anyone want a copy? Does anyone care?

Gary
Searching on Multiple tags
I can't get this to happen... Can you give me an exact example of a t= parameter to search on; for example, "0 and offsite", "1 and andyandgary", or something?

Thank-you very much,
Gary (-;
Black magic, SPROCs, and Sybase.
I've given up trying to get advice on XML; it's OK, I won't ask again. Has anyone written a coherent SPROC on Sybase that, given the relevant parameters [tag, keywords, url, etc., etc.], comes back with the results? In almost any form would be OK. I mean, once someone's done the core bit...

Thank-you very much,
Gary
RE: Webboard: passing variables to search.cgi
Actually, you're missing a few. Here's the API documentation as it stands for my search-engine-on-demand here at work. Sorry it's in a disgusting format, but it's what people like at this company. The XML-returning stuff is irrelevant and you can ignore it; it's just extensions I've made to the local one.

Gary (-;

searchapi.doc

-Original Message-
From: Chuck Maiden [SMTP:[EMAIL PROTECTED]]
Sent: Tuesday, June 05, 2001 3:57 PM
To: [EMAIL PROTECTED]
Subject: Webboard: passing variables to search.cgi

Author: Chuck Maiden
Email: [EMAIL PROTECTED]

Message: As best as I can determine, the only variables that I can pass to search.cgi are ul, ps, m, q, o, and t. I would like to pass a few extra variables (of my own) to search.cgi. My desire is that I would be able to display the values of these additional user-defined variables on the search results page by referencing them from inside the search.htm template. Does anyone know a way that I can do this? Thanks in advance.

Reply: http://www.mnogosearch.org/board/message.php?id=2337
RE: sleep when system is heavily loaded
Have you tried writing an independent daemon to do this? Just send indexer SIGSTOPs whenever the load average goes above whatever, then SIGCONTs whenever it drops again. This would have the added advantage of using almost no resources, and the indexer would never need to know...

But for reference, why use load average? It's usually a bad measure of what's going on in a system, and unreliable as hell. If you're running the db on the same box as the indexer, you may accidentally throw the whole system out of kilter, because an I/O-bound DB can increase the load average more than is entirely natural, so you'd get an ugly cycle going on... Depending on what the box is doing: if it's only a web server, then why not just look at Apache's resource usage? And so on and so forth.

But anyway. No, I don't think indexer can do this.

Gary (-;

-Original Message-
From: Danil Lavrentyuk [SMTP:[EMAIL PROTECTED]]
Sent: Tuesday, May 29, 2001 8:21 AM
To: [EMAIL PROTECTED]
Subject: sleep when system is heavily loaded

Hello! It would be good if I could say to the mnoGoSearch indexer: "When the system's load average is more than 2, go to sleep for 30 seconds."

Danil Lavrentyuk
Communiware.net Programmer
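The independent-daemon idea can be sketched like this. The thresholds, hysteresis gap, and poll interval are all made-up for illustration, and the watchdog must run as a user allowed to signal the indexer's pid:

```python
# Watchdog sketch: pause a process with SIGSTOP when the 1-minute load
# average is high, resume it with SIGCONT once the load drops again.
import os
import signal
import time

HIGH, LOW = 2.0, 1.0  # hypothetical hysteresis thresholds


def action(load, stopped):
    """Decide what to do given the current load and whether the
    target is already stopped. Returns "stop", "cont" or "none"."""
    if not stopped and load > HIGH:
        return "stop"
    if stopped and load < LOW:
        return "cont"
    return "none"


def watch(pid, interval=30):
    """Poll the load average and signal the target process."""
    stopped = False
    while True:
        load = os.getloadavg()[0]
        act = action(load, stopped)
        if act == "stop":
            os.kill(pid, signal.SIGSTOP)
            stopped = True
        elif act == "cont":
            os.kill(pid, signal.SIGCONT)
            stopped = False
        time.sleep(interval)
```

The hysteresis gap (resume only below LOW, not HIGH) is what avoids the ugly stop/start cycle Gary warns about when the watchdog's own target is contributing to the load.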
Windows character sets
I'm outputting XML from my search engine for use in other people's websites, and I'm having a small problem. Some of the sites I'm indexing are made in Word [I've no control over this] and output as HTML. And they're in strange character sets like windows-125{0,1,2}. When I output the XML, it contains things like 0x92 characters, which are the Word equivalent of a normal '. Is there any way I can do translations on this, either in the indexer, or in the PHP? [I'm using the PHP front end, and the crc-multi DB schema.] Basically, I'd like to see nothing more than US-ASCII or friends; much easier to use, and it won't break perl scripts on unix boxes. Anybody?

Ta,
Gary (-;

PS I never got any response to my RFC on my code for putting stuff INTO the database from XML. Does anyone have anything to add to it?
RE: 2 indexers at the same time
Yep. Just try it. I believe that the database locking should work no matter how you do it. And for reference, in all likelihood you just need to fire off more indexers on the same box; the web server being indexed is more likely to be the bottleneck than the box that's doing the indexing. Hint: try running the indexer on one box, and the database on another. Then run multiple indexers.

Gary (-;

-Original Message-
From: La Rocca Network [SMTP:[EMAIL PROTECTED]]
Sent: Saturday, May 05, 2001 3:23 PM
To: [EMAIL PROTECTED]
Subject: 2 indexers at the same time

hi! very important question: is there any way to use the indexer on more than one pc at the same time? we need better performance and we have available bandwidth...

regards, Nelson
Few random things
Has anyone here got a way of indexing PowerPoint or Visio documents? Changing the document is not viable; I need a way to get the strings out of it. strings is not too bad on PowerPoint, but for Visio it's not worth the effort.

Also, is there any way to convert documents with this in them:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">

? I'd ideally like to convert them to something more standard... Can I do this? As in, I can't change anything. At all. I need a way to do all these things in the search engine.

Please help with this,
Gary (-;
RE: Problems with indexing files on local hard drive.
That's because what you want to do is say:

Server http://mywebsite
Alias http://mywebsite file:/path-to-files

...

Gary (-;

-Original Message-
From: Cliff Olle [SMTP:[EMAIL PROTECTED]]
Sent: Monday, April 09, 2001 12:41 AM
To: [EMAIL PROTECTED]
Subject: Problems with indexing files on local hard drive.

This is what indexer reports:

Indexer[28709]: indexer from mnogosearch-3.1.12/MySQL started with '/var/local/mnogosearch-3.1.12/etc/indexer.conf'
Indexer[28716]: [1] Done (0 seconds)

This is my indexer file:

DBAddr mysql://myuser:mypass@localhost/mnogosearch/
robots no
Server http://angelcities.com/ file:/path/to/my/home/dir
# Allow some known extensions and directory index
Allow \.html$ \.htm$ \.txt$ \/$
# Disallow everything else
Disallow .*

This directory contains about 15 html files and there are about 1 directories underneath that I would also like indexed. There is not a link between all the html files, as this is a user system I am trying to link. Is there a reason it isn't indexing anything?

Thank you, Cliff
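Written out against the original message's host and path, the fix Gary describes would look like this indexer.conf fragment (Alias syntax as I understand it for mnoGoSearch 3.1.x; double-check against your version's docs):

```
Server http://angelcities.com/
Alias  http://angelcities.com/ file:/path/to/my/home/dir/
```

With this, the indexer reads documents from the local filesystem but stores and presents them under their http:// URLs, rather than trying to use the file: path as a second Server argument.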
XML Stuff
OK, I've written a system that can take XML of a certain form and put it into a crc-multi type of database. Just have a look. I'm interested in all your feedback, especially Alexander's... It's obviously not yet ready to put on the d/load bit of your website, but I'm working on it...

http://hercules.cs.york.ac.uk/~gjb105/udm_xml.tar.gz

Gary (-;

PS It needs the perl LWP, mysql DBD, DBI, and XML::Parser modules...
CRC32 in URL table
What is this? I'm unable to find out what it is; I'm comparing some of the things in my already-existing database [generated by the indexer], and it's not the text extract, it's not the URL itself, it's not the keywords, and it's not the meta description. What is it? I'm writing an application that inserts stuff into the database based on XML not dissimilar to the XML attached to this message. I've already got everything working well, but I can't work out what the CRC32 is actually of. I assume it's used for the clone detection, but I'm not entirely sure.

Thank-you very much,
Gary (-;

PS Yes, I can search on this, which is my test data, and other pieces of XML which are thousands of times larger. And it's fast.

searchindex.xml
URL Table in database
What is the CRC32 actually _of_? I'm currently doing it as the CRC32 of the url itself... is this right? And for reference, I've basically finished working on a system that takes in XML and puts it straight into the database.

Thank-you very much,
Gary (-;
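If the column really is the CRC-32 of the URL string (an assumption worth verifying against a row your own indexer wrote), it can be reproduced with any standard CRC-32 implementation, e.g.:

```python
# Compute a standard CRC-32 of a URL string; the mask keeps the value
# in unsigned 32-bit range regardless of platform.
import zlib

url = "http://example.com/index.html"
crc = zlib.crc32(url.encode("latin-1")) & 0xFFFFFFFF
print(crc)
```

Comparing this value against the database column for a URL the indexer stored would settle the question; if they differ, the indexer is hashing something else (e.g. the document body, for clone detection).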
RE: Webboard: Use of Tags...
Example: you're searching 4 websites:

http://server1/~gbriggs
http://server2/~gbriggs
http://server3/~acoates
http://server2/~acoates

If you want to be able to search on just the ~gbriggs homepages, you give them one tag [say, "gb"]. Then you give the ~acoates homepages another tag [say, "ac"]. That way, when you're searching, you can give "t=gb", and it'll search through just the ~gbriggs pages. Does this help? If you want, I can give you a more complete example.

Gary (-;

-Original Message-
From: Martyn [SMTP:[EMAIL PROTECTED]]
Sent: Friday, March 23, 2001 1:04 AM
To: [EMAIL PROTECTED]
Subject: Webboard: Use of Tags...

Author: Martyn
Email: [EMAIL PROTECTED]

Message: I also do not understand the use of Tags and would appreciate some enlightenment. Martyn

Reply: http://search.mnogo.ru/board/message.php?id=1781
RE: Webboard: Premature end of indexing?
-Original Message-
From: Orjan Sandland [SMTP:[EMAIL PROTECTED]]
Sent: Friday, March 23, 2001 9:17 AM
To: [EMAIL PROTECTED]
Subject: Webboard: Premature end of indexing?

Author: Orjan Sandland
Email: [EMAIL PROTECTED]

> Message: I'm running latest mnogosearch and redhat 7 with mysql rpm installed.

Redhat7. <g>. RPM. <g> OK, I'll put my pettiness aside.

> I compiled mnogosearch with support for pthreads btw.

Good.

> Two days ago, I had about 200.000 urls indexed, with 125.000 of them being status OK; about 20.000 were status not modified. This morning, after running the indexer all night, the total count is up to 255.000 urls, but the Gateway Timeout count was over 150.000, and only 60.000 had OK status.

Theory: there was an outage, and max_retry_errors [or whatever it's called in the config file] was reached. The indexer then gave up.

> Yesterday I started the indexer to reindex, using the -a option. Why did so many get into status Gateway Timeout?? My first thought was that there was some huge network error (most of the pages I index are separated from the server by the Atlantic ocean :-). I can live with this (as long as they get indexed again at some point ;-), but I tried to force it to index the urls with Gateway Timeout. Running ./indexer -s 504 only works for 4 seconds, then gives me the "Done" message.

"indexer -s 504" would only have re-indexed out-of-date documents. If, the night before, it had indexed all of them but gotten timeouts, they would be marked in the database as indexed, and would have to wait for the default of 2 weeks before being indexed again. You should have hit it with "indexer -a -s 504". That would have reindexed all the 504s.

> Am I doing anything wrong? I'm quite new with this, still learning.

Naaah.
Looks like you know what you're doing [although Linux isn't the best choice in my experience; I use Solaris].

> I'm realising that I will need to learn a lot, because at this point, with 250.000 urls, I've only partially indexed 200 of the 1-2000 websites I intend to index.

Hope this helps,
Gary (-;
RE: Webboard: libmysqlclient.so.6
Edit /etc/ld.so.conf and add /usr/local/mysql/lib/mysql to it [don't forget to run /sbin/ldconfig], or:

export LD_LIBRARY_PATH=/usr/local/mysql/lib/mysql:$LD_LIBRARY_PATH

or similar.

Gary (-;

-Original Message-
From: Doos [SMTP:[EMAIL PROTECTED]]
Sent: Friday, March 23, 2001 2:21 PM
To: [EMAIL PROTECTED]
Subject: Webboard: libmysqlclient.so.6

Author: Doos
Email: [EMAIL PROTECTED]

Message: when i start indexer i get the error msg: ./indexer: error in loading shared libraries: libmysqlclient.so.6: cannot open shared object file: No such file or directory. but i have it in: /usr/local/mysql/lib/mysql/libmysqlclient.so.6 Does anyone know how to fix this? thanks

Reply: http://search.mnogo.ru/board/message.php?id=1787
RE: Will mnoGoSearch let me...
Do you want my stuff to do this? I've written some quite useful things...

Gary (-;

-Original Message-
From: Dustin S. [SMTP:[EMAIL PROTECTED]]
Sent: Sunday, March 18, 2001 8:20 AM
To: [EMAIL PROTECTED]
Subject: Will mnoGoSearch let me...

...make it so visitors can add their own site to the engine?
Here's some more things people may find of interest.
Hmmm. The daemon that grabs usernames off the network; I can give that to people if they want. And by the way, this is based around the expanded qtracking that my system does; it's an add-on to the php frontend that I mailed to this list a while ago.

Gary (-;

qtrackingalanysis.tar.gz
RE: crc-multi and millions of urls
I'll tell you what: here's _my_ personal experience. I'm indexing many, many gigabytes of data. Currently it's at ~20, but it will go up in the near[ish] future, when I get a cluster to run it on, and the correct access to the other files, etc. I'm using crc-multi and most of my queries are coming back in under a second. The average is only quite high because I happen to know that if I search for "index || perl", I absolutely trounce the system, because those are about the two most popular words. Anyway. I'm using crc-multi, and I've added a few features [eg the query tracking, more useful server tables, and some other stuff], and it's still fast enough to not worry about. I haven't done any proper stress-testing yet, because the server _cannot_ go down. But from the behavior I'm seeing, it shouldn't be too bad.

Gary (-;

-Original Message-
From: Caffeinate The World [SMTP:[EMAIL PROTECTED]]
Sent: Tuesday, March 13, 2001 4:41 PM
To: [EMAIL PROTECTED]
Subject: crc-multi and millions of urls

is there any reason why you can't index millions of urls using the DB crc-multi mode? is it speed? when i first started using mnogosearch, i was under the assumption that if you were to index millions of urls, you should use cachemode. now that i've run into several limitations of cachemode itself: 1. limited depth for categories, 2. unreliable -- i've yet to have a fully indexed service -- seems like i've been debugging for months and indexer, cachelogd, splitter still core dump; some were fixed and new ones showed up. so now i'm back to investigating the use of a sql db instead. i do like the speed i see in cachemode, but the unreliability doesn't make it usable.
Question about ServerTables
Why is Active in servertable an int(11)? Surely a bool or a smallint would be faster?

Ta,
Gary (-;
RE: [3.1.11] indexing .cgi's?
The thing is, your URL doesn't technically come under the disallowed set of things. For that particular case, try adding /cgi/ to the Disallow path. [I'm assuming www.pg.com/cgi is a place where you keep CGI scripts on your web server.]

Hope this helps,
Gary (-;

-Original Message-
From: The Hermit Hacker [SMTP:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 4:58 PM
To: [EMAIL PROTECTED]
Subject: [3.1.11] indexing .cgi's?

%sbin/indexer -h
indexer from mnogosearch-3.1.11/PgSQL http://search.mnogo.ru (C) 1998-2000, LavTech Corp.

I have it disallow'd in my config file:

# Exclude cgi-bin and non-parsed-headers
Disallow /cgi-bin/ \.cgi /nph \?

yet, if I run the indexer, it's indexing a *load* of cgi URLs, of the format:

Indexer[83032]: [1] http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/configure.in.diff?r1=1.59&r2=1.45&sortby=rev&only_with_tag=MAIN

why? and how can I get it to stop? thanks...

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
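In config terms, the suggestion amounts to one extra Disallow line alongside the existing rule (assuming this version matches Disallow arguments as patterns against the whole URL):

```
# Exclude cgi-bin and non-parsed-headers
Disallow /cgi-bin/ \.cgi /nph \?
# ...and the /cgi/ directory that cvsweb lives under
Disallow /cgi/
```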
More interesting things I've done with PHP frontend
This one gives you the option of returning XML. The way to use it is this: if you use it as a regular php script [eg http://lonwebhost20:8080/udm/blank.php], it does what you'd expect; it returns results formatted based on the template blank.htm. If, on the other hand, you run it as http://lonwebhost20:8080/udm/blank.php/results.xml or similar, it returns XML. When I say "or similar", it actually returns XML for ANYTHING where you finish it with a ".xml" extension [case insensitive], so http://lonwebhost20:8080/udm/blank.php/www.linuxgames.com.xml would also work just fine (=

From my experience, I'd recommend [if you're using IE and actually want to get XML back] that you only do this as an HTTP POST, because IE can be a bit annoying sometimes... I'd love to hear from anyone who actually finds this useful, especially if they have anything to add.

Just in case anyone's interested, the actual use for this is that I'm working on a system whereby people in this company can go to a web page, add their information, and then have a search engine on their page with almost no effort on their part. The blank.htm version is there so that if they just want to put it in a frame and not do any formatting, etc., then it works just fine.

Gary (-;

PS Mozilla/NS6 doesn't like it because the <?xml version="1.0" standalone="yes"?> line isn't necessarily on the first line. This is a bug in NS/Moz, and you probably want to do some server-side XSL translation anyway.

blank.tar.gz
Is this, by any chance, a Bad Thing (TM)?
010305 9:28:18 Aborted connection 394 to db: 'udmsearch' user: 'udm' host: `localhost' (Got a packet bigger than 'max_allowed_packet')

From the mySQL error log.

Ta,
Gary (-;
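Not necessarily fatal, but it is why the connection aborted: some query or row exceeded the server's packet limit. A likely remedy for MySQL of this era is to raise the limit in my.cnf and restart mysqld; the 16M value here is an arbitrary example:

```
[mysqld]
set-variable = max_allowed_packet=16M
```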
Minimum permissions on MySQL database?
I'm trying to lock down some of my mySQL tables, since I accidentally deleted one of the tables the other day. What're the minimum permissions I need to set to make searching possible? I'm hoping to have two users: udm [which will be used by the indexer process; a password breach could be used to delete everything], and udm_ro [which will be used by the searching process; a password breach should not be able to do anything at all, except maybe a DoS, which I could fix really quickly]. I see that the read-only user does need create and drop permissions, since it uses a temporary table AFAICT. Is there anything else?

Thank-you very much,
Gary (-;
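A sketch of that two-user split, assuming a database named udmsearch and MySQL-3.23-style GRANT syntax; the qtracking table name is from an earlier thread, and the exact privilege list should be checked against what the search front end actually executes:

```sql
-- Indexer user: full access to the search database.
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP
  ON udmsearch.* TO udm@localhost IDENTIFIED BY 'indexer-password';

-- "Read-only" search user: SELECT everywhere, plus CREATE and DROP
-- because the search algorithm builds and drops a scratch table.
GRANT SELECT, CREATE, DROP
  ON udmsearch.* TO udm_ro@localhost IDENTIFIED BY 'search-password';

-- Write access only to the query-tracking table.
GRANT INSERT, UPDATE ON udmsearch.qtracking TO udm_ro@localhost;
```

The udm_ro user can still CREATE and DROP tables within the database, so the split limits a password breach rather than eliminating it; the udm password is the one that must stay out of the web tier.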
RE: Webboard: Installation
If a file is a .tar.gz file [or .tgz], you can extract it with:

gzip -cd the-file.tar.gz | tar -xf -

Gary (-;

-Original Message-
From: Mike Davis [SMTP:[EMAIL PROTECTED]]
Sent: Friday, March 02, 2001 10:29 AM
To: [EMAIL PROTECTED]
Subject: Webboard: Installation

Author: Mike Davis
Email: [EMAIL PROTECTED]

Message: mnoGoSearch comes with great recommendations. I have installed a fair number of scripts but am unfamiliar with the process of unpacking and decompressing tar files. I have not used "make" either. I have managed to use winzip to uncompress the files and unpack them onto MY computer, but what do I do now to put them onto my web site? Perhaps the better question is: once I've taken the original tar.gz file and uploaded it to my web site, how do I unpack and install it there? My web host does not provide access to telnet. Do I need this in order to install these kinds of packages? If you know of a link where I can learn how to do this (preferably for dummies!), I'll be glad to learn there. I think your search engine would be a real asset. Thanks, Mike

Reply: http://search.mnogo.ru/board/message.php?id=1595
RE: Searching multiple tags
I'm using the PHP frontend at the moment. And I shall actually continue using it, especially when PHP 4.0.5 comes out (=

Gary (-;

-Original Message-
From: Laurent LEVIER [SMTP:[EMAIL PROTECTED]]
Sent: Wednesday, February 28, 2001 6:38 PM
To: [EMAIL PROTECTED]; Briggs, Gary; '[EMAIL PROTECTED]'
Subject: Re: Searching multiple tags

When searching, the search tool restricts tags within the select, so for sure it is possible; but what are you using as a searching tool? CGI? PHP? PERL?

At 17:49 28/02/2001 +, Briggs, Gary wrote:

> Is there any way that I can search from multiple tags in one search? Gary (-;

Laurent LEVIER
IT Systems & Networks, Unix System Engineer, Security Specialist
Argosnet Security Server: http://www.Argosnet.com
"Le Veilleur Technologique", "The Technology Watcher"
RE: Searching multiple tags
Hmmm. The PHP one can't.

Gary (-;

-----Original Message-----
From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
Sent: Thursday, March 01, 2001 5:13 AM
To: [EMAIL PROTECTED]; Briggs, Gary
Subject: Re: Searching multiple tags

"Briggs, Gary" wrote:
> Is there any way that I can search from multiple tags in one search?
> Gary (-;

Just submit several t=XXX pairs from the HTML form. At least search.cgi can do it.
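Alexander's suggestion (several t=XXX pairs submitted from the HTML form) might look like the sketch below. The action path and tag values are hypothetical placeholders, not taken from a real mnoGoSearch template; only the repeated `t` field name comes from his message:

```shell
#!/bin/sh
# Sketch of a search form that submits several t=XXX pairs to search.cgi.
# A GET submit of this form yields a query string like q=...&t=docs&t=intranet,
# i.e. the CGI receives the t parameter twice.
form='<form method="get" action="/cgi-bin/search.cgi">
  <input type="text" name="q">
  <!-- repeated hidden t inputs: both values travel in one request -->
  <input type="hidden" name="t" value="docs">
  <input type="hidden" name="t" value="intranet">
  <input type="submit" value="Search">
</form>'
printf '%s\n' "$form"
```

This is also why a frontend has to cooperate: it must read `t` as a list rather than a single scalar, which is presumably what the PHP frontend of the time did not do.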
RE: Crosswords and servertables
For large volumes of data, which of these two is faster?

Gary (-;

-----Original Message-----
From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
Sent: Wednesday, February 28, 2001 9:52 AM
To: [EMAIL PROTECTED]; Briggs, Gary
Subject: Re: Crosswords and servertables

This feature works in single and multi modes only. Thanks for noticing this; we'll add it to the documentation.

"Briggs, Gary" wrote:
> Having looked some more, the docs say that it is "not supported in built-in database and Cachemode". No matter what I do, I can't get it to work at all in crc-multi mode on my database. Help?
> Thank-you very much,
> Gary Briggs

> -----Original Message-----
> From: Briggs, Gary [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, February 27, 2001 11:56 AM
> To: '[EMAIL PROTECTED]'
> Subject: Crosswords and servertables
>
> Is there any way that I can control what the weight of crosswords is when all my servers are pulled out of a table on the database?
> Thank-you very much,
> Gary (-;

--
This message is intended only for the personal and confidential use of the designated recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any review, dissemination, distribution or copying of this message is strictly prohibited. This communication is for information purposes only and should not be regarded as an offer to sell or as a solicitation of an offer to buy any financial product, an official confirmation of any transaction, or as an official statement of Lehman Brothers Inc. Email transmission cannot be guaranteed to be secure or error-free. Therefore, we do not represent that this information is complete or accurate and it should not be relied upon as such. All information is subject to change without notice.
RE: Crosswords and servertables
I think that's because it doesn't read anything from the ServerTable about crossweight by default, and then sets the crossweight to "0". But that's just a guess.

I've got it inserting stuff into the crossdict table by editing sql.c and making it read in an extra column from the ServerTable called "crossweight" that I've put in ["alter table server add crossweight int default 32", IIRC]. Two lines of code. I can brew a diff if you want, but it's really not worth the effort [especially since I did a fugly hack that means the number 32 appeared in the middle of a series at one point].

Gary (-;

-----Original Message-----
From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
Sent: Wednesday, February 28, 2001 9:57 AM
To: [EMAIL PROTECTED]; Briggs, Gary
Subject: Re: Crosswords and servertables

Hi! It seems to be a bug. CrossWeight is not working when servers are loaded from an SQL table using the ServerTable command. Thanks for reporting!

"Briggs, Gary" wrote:
> Having looked some more, the docs say that it is "not supported in built-in database and Cachemode". No matter what I do, I can't get it to work at all in crc-multi mode on my database. Help?
> Thank-you very much,
> Gary Briggs
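As a sketch, the schema change Gary recalls could be staged like this. The column name and default come from his message; the database name and credentials in the trailing comment are hypothetical placeholders, and the patched sql.c that actually reads the column is not shown here:

```shell
#!/bin/sh
# Sketch: generate the ServerTable schema change from Gary's message
# (per-server crossweight column, default 32 per his recollection).
sql='ALTER TABLE server ADD crossweight INT DEFAULT 32;'
printf '%s\n' "$sql"

# To apply it, pipe the statement to the mysql client, for example
# (user/database names are made up):
#   printf '%s\n' "$sql" | mysql -u admin -p udmsearch
```

Keeping the weight in the server table means each row loaded via ServerTable carries its own crossword weight, which is exactly the per-server control the original question asked for.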
RE: Crosswords and servertables
Why, thank-you. Want a well-scabby patch? (=

Gary (-;

-----Original Message-----
From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
Sent: Wednesday, February 28, 2001 10:21 AM
To: Briggs, Gary
Cc: [EMAIL PROTECTED]
Subject: Re: Crosswords and servertables

You are right!

"Briggs, Gary" wrote:
> I think that's because it doesn't read anything from the ServerTable about crossweight by default, and then sets the crossweight to "0".
Searching multiple tags
Is there any way that I can search from multiple tags in one search?

Gary (-;
Crosswords and servertables
Is there any way that I can control what the weight of crosswords is when all my servers are pulled out of a table on the database?

Thank-you very much,
Gary (-;
RE: Crosswords and servertables
OK, now I feel stupid. The PHP front end doesn't support it, right? Does anyone know when it's likely to be supported in this frontend?

Gary (-;

-----Original Message-----
From: Briggs, Gary
Sent: Tuesday, February 27, 2001 2:20 PM
To: '[EMAIL PROTECTED]'; Briggs, Gary
Subject: RE: Crosswords and servertables

Having looked some more, the docs say that it is "not supported in built-in database and Cachemode". No matter what I do, I can't get it to work at all in crc-multi mode on my database. Help?

Thank-you very much,
Gary Briggs

> -----Original Message-----
> From: Briggs, Gary [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, February 27, 2001 11:56 AM
> To: '[EMAIL PROTECTED]'
> Subject: Crosswords and servertables
>
> Is there any way that I can control what the weight of crosswords is when all my servers are pulled out of a table on the database?
> Thank-you very much,
> Gary (-;
RE: Webboard: Segmentation Fault, core dump (gdb report included)
OK, it still breaks:

[lonwebhost20:/opt/udmsearch/sbin/]$ pwd
/opt/udmsearch/sbin
[lonwebhost20:/opt/udmsearch/sbin/]$ cat ./chunkystuff
#!/bin/sh
LD_LIBRARY_PATH=/opt/mySQL/lib/mysql:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
exec ./indexer $*
[lonwebhost20:/opt/udmsearch/sbin/]$ ./chunkystuff
indexer from mnogosearch-3.1.11/MySQL started with '/opt/udmsearch/etc/indexer.conf'
Segmentation Fault (core dumped)
[lonwebhost20:/opt/udmsearch/sbin/]$ gdb --core=./core indexer
GDB is free software and you are welcome to distribute copies of it under certain conditions; type "show copying" to see the conditions. There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.6), Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
Core was generated by `./indexer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/udmsearch/lib/libudmsearch-3.1.so...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libxnet.so.1...(no debugging symbols found)...done.
Reading symbols from /opt/mySQL/lib/mysql/libmysqlclient.so.10...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/nss_files.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols found)...done.
#0  0xef5a4734 in strlen ()
(gdb) bt
#0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef69fd5c in UdmAddURL ()
#4  0xef6b8850 in UdmStoreHrefs ()
#5  0xef68ab3c in UdmIndexNextURL ()
#6  0x12460 in thread_main ()
#7  0x1387c in main ()
(gdb) frame 3
#3  0xef69fd5c in UdmAddURL ()
(gdb) where
#0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef69fd5c in UdmAddURL ()
#4  0xef6b8850 in UdmStoreHrefs ()
#5  0xef68ab3c in UdmIndexNextURL ()
#6  0x12460 in thread_main ()
#7  0x1387c in main ()
(gdb)
[lonwebhost20:/opt/udmsearch/sbin/]$

I made it with:

[lonwebhost20:/opt/udmsearch/src/]$ cat ./makeconfig
#!/bin/sh
CC="/opt/SUNWspro/bin/cc"; export CC
#CFLAGS="-fast -g"; export CFLAGS
CFLAGS="-g"; export CFLAGS
CXX="/opt/SUNWspro/bin/CC"; export CXX
./configure --prefix=/opt/udmsearch --with-mysql=/opt/mySQL --disable-syslog --disable-mp3 --disable-news --enable-shared

Hope this helps [and that you can help me!],

Gary (-;

-----Original Message-----
From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
Sent: Monday, February 26, 2001 11:06 AM
To: [EMAIL PROTECTED]
Subject: Webboard: Segmentation Fault, core dump (gdb report included)

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
You have a very strange place of crash :-( Probably compiling with the -g cc flag will produce a backtrace with more information. To compile with the -g flag, run:

export CFLAGS=-g
./configure ..
make

then run the new indexer and check the backtrace after the crash.

Reply: http://search.mnogo.ru/board/message.php?id=1517
RE: Problem with mnogosearch unpack
It's because you've got a .tar.gz file, and your version of tar [are you using Solaris, by any chance?] doesn't understand gzip compression. The way to decompress .tar.gz files that works on almost any un*x platform is:

gzip -cd something.tar.gz | tar -xf -

Although be warned: Solaris tar has a HUGE problem, and for a lot of things you should use the GNU one instead [and if you're using the GNU one, then the -z option will work anyway]. Usually it's installed someplace like /opt/gnu/bin.

Hope this helps,

Gary (-;

-----Original Message-----
From: PNTCD [SMTP:[EMAIL PROTECTED]]
Sent: Sunday, February 25, 2001 1:19 PM
To: [EMAIL PROTECTED]
Subject: Problem with mnogosearch unpack

To unpack UDM I used:

tar -zxf udmsearch-3.1.3.tar.gz

but the server responded:

tar: z: unknown option

Then I tried:

tar -xf udmsearch-3.1.3.tar.gz

The server:

tar: directory checksum error

Can anybody help me!!! Thank you!

Claudiu Cristea
[EMAIL PROTECTED]
RE: Segmentation Fault core dump (gdb report include in this mail)
I'm getting this too.

Solaris 2.6
MySQL 3.23.33
Mnogo 3.1.11

compiled with:

CC="/opt/SUNWspro/bin/cc"; export CC
CFLAGS="-fast"; export CFLAGS
CXX="/opt/SUNWspro/bin/CC"; export CXX
./configure --cache-file=/dev/null --prefix=/opt/udmsearch \
  --with-mysql=/opt/mySQL --disable-syslog --disable-mp3 \
  --disable-news --enable-shared

It's in CRC-multi mode. Here's a backtrace:

[lonwebhost20:/opt/udmsearch/sbin/]$ pwd
/opt/udmsearch/sbin
[lonwebhost20:/opt/udmsearch/sbin/]$ cat ./chunkystuff
#!/bin/sh
export LD_LIBRARY_PATH=/opt/mySQL/lib/mysql:$LD_LIBRARY_PATH
exec ./indexer $*
[lonwebhost20:/opt/udmsearch/sbin/]$ ./chunkystuff
indexer from mnogosearch-3.1.11/MySQL started with '/opt/udmsearch/etc/indexer.conf'
Segmentation Fault (core dumped)
[lonwebhost20:/opt/udmsearch/sbin/]$ echo "bt" | gdb --core=./core indexer
GDB is free software and you are welcome to distribute copies of it under certain conditions; type "show copying" to see the conditions. There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.6), Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
Core was generated by `./indexer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/udmsearch/lib/libudmsearch-3.1.so...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libxnet.so.1...(no debugging symbols found)...done.
Reading symbols from /opt/mySQL/lib/mysql/libmysqlclient.so.6...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/nss_files.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols found)...done.
#0  0xef5a4734 in strlen ()
(gdb) #0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef71374c in UdmAddURL ()
#4  0xef71f5c8 in UdmStoreHrefs ()
#5  0xef709a4c in UdmIndexNextURL ()
#6  0x113f4 in thread_main ()
#7  0x11dbc in main ()
(gdb)
[lonwebhost20:/opt/udmsearch/sbin/]$

Thank-you very much,
Gary Briggs

-----Original Message-----
From: filip.sergeys [SMTP:[EMAIL PROTECTED]]
Sent: Thursday, February 22, 2001 7:14 PM
To: [EMAIL PROTECTED]
Subject: Segmentation Fault core dump (gdb report include in this mail)

Hi,

When trying to start the indexer I get a core dump. The complete installation is new.

Linux: 6.2, kernel 2.2.14
MySQL: 3.23.33
Mnogosearch: 3.1.11

compiled with --enable-syslog=LOG_LOCAL6 --enable-linux-pthreads --with-mysql

The gdb report:

GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `./indexer -v 3'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libz.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
#0  0x400e9c61 in __libc_nanosleep () from /lib/libc.so.6
(gdb) backtrace
#0  0x400e9c61 in __libc_nanosleep () from /lib/libc.so.6
#1  0x400e9bed in __sleep (seconds=1) at ../sysdeps/unix/sysv/linux/sleep.c:82
#2  0x804b0ec in main (argc=3, argv=0xbd70) at main.c:593
(gdb) q

Hope you can help me. Thanks in advance,

FS
Direct Database Injection And Some probably stupid questions
OK, so I've been reading the source code, and I'm having real trouble working out what some parts of the url table are for...

Why bother to compute a crc32 for the URLs? Do I need it? [I'm currently using crc-multi db mode on MySQL.] It seems to be the primary key, but then what's the point of rec_id? Seeing as rec_id autoincrements, surely it's actually more unique than a crc, which by your own analysis is not unique for about 250 in any given 1,600,000 URLs?

Why is there a keywords field? I thought that the search worked by:

0) computing the crc's of the keywords we're looking for;
1) looking up the crc's we're searching for in the dict tables;
2) using url_id as a foreign key, looking up the relevant URL by the rec_id key in the url table;
3) also looking up all the other information from the url table, such as description, title, and text.

Surely this doesn't need a keywords field, since we're searching other tables based on keywords anyway?

What's the difference between txt and description? I assume that description is the description if there's a description meta-tag, and txt is an extract of the text. Why is there both? Surely a unified field could contain the description, if there is one, or an extract, if there's not?

Thank-you very much,
Gary
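As a sanity check on the collision figure quoted above, a birthday-bound estimate (assuming the crc behaves like a uniform 32-bit hash, which is an assumption, not something stated in the thread) lands in the same ballpark:

```shell
#!/bin/sh
# Back-of-envelope check: the expected number of colliding pairs among
# n items hashed uniformly into m buckets is roughly n^2 / (2m).
# With n = 1,600,000 URLs and m = 2^32 crc values this gives ~298,
# consistent with the "about 250 per 1,600,000" figure quoted above.
collisions=$(awk 'BEGIN { n = 1600000; m = 2^32; printf "%.0f", n * n / (2 * m) }')
echo "$collisions"   # prints 298
```

Which is exactly why the crc alone cannot serve as a truly unique key, and an autoincrementing rec_id is still needed alongside it: the crc buys fast fixed-width comparisons, not uniqueness.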