RE: security: private or public pages

2001-08-24 Thread Briggs, Gary

Hmmm. Well, yes. But at the same time, it's considerably more secure than
anything else you'll be able to do...

And while the ACLs _could_ work on just tags or categories, I'd rather
define them as groups of URLs as that would have a greater degree of
flexibility... I'd ideally like full regexp support...

And finally, the database security: yeah, I've done it so that it works. But
when they search, the algorithm actually involves CREATEing a table,
SELECTing some stuff into it, and then DROPping it again afterwards.

Bugger.
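
For the curious, the pattern being described is roughly this (table and
column names are illustrative, not necessarily the exact mnoGoSearch schema):

  CREATE TABLE tmp_results (url_id INT, weight INT);    -- scratch table per query
  INSERT INTO tmp_results
      SELECT url_id, SUM(intag) FROM dict
      WHERE word IN (...) GROUP BY url_id;              -- collect matching pages
  SELECT u.url, t.weight FROM url u, tmp_results t
      WHERE u.rec_id = t.url_id ORDER BY t.weight DESC; -- rank and return
  DROP TABLE tmp_results;                               -- throw it away again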

Gary (-;

PS If you're interested, I can send you a sanitised version of the mysql
database...


 -Original Message-
 From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
 Sent: Thursday, August 23, 2001 4:23 PM
 To:   [EMAIL PROTECTED]
 Cc:   [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject:  Re: security: private or public pages
 
 Briggs, Gary wrote on Wed Aug 22 10:27:50 2001:
  
  Hmmm. This is of significance to me, as I work in a secure environment...
  
  I'm not sure how much compute power and secondary storage you have, but in
  the end, if it's possible, my humble recommendation is to have two
  databases, and use some webserver/htaccess voodoo to arrange so that
  outside people see a different search page/database backend to internal
  people.
 
 I do not want this solution.  It significantly increases disk storage,
 computing time and network resources.
 
  
  Of course, depending on resources, you might like to implement that at the
  database layer with some sort of balancing in it, but we're talking hideous
  implementation details by now...
  
  If that's not viable, I'm working on a wholesome way of adding ACLs to the
  engine, but it's not a simple exercise to work out what the ACLs need to be,
  especially on the scale of many many thousands of URLs.
 
 These ACLs might operate only on the tags or categories, which reduces
 the complexity.
 
  That would also require an authentication method on the search page, which
  is intrusive... NTLM is NOT my friend, before you mention it (=
 
 Using several symbolic links to the search CGI, each with its own
 template (search.htm), you can apply access controls to each symbolic
 link in its own .htaccess, without duplicating the CGI itself.
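
 A minimal sketch of that arrangement (all paths and names here are
 invented for illustration):

   # one search binary, two entry points, each with its own template
   ln -s /usr/local/mnogosearch/cgi-bin/search.cgi public.cgi
   ln -s /usr/local/mnogosearch/cgi-bin/search.cgi internal.cgi

   # and in the .htaccess covering internal.cgi only:
   #   <Files internal.cgi>
   #     AuthType Basic
   #     AuthUserFile /etc/httpd/internal.passwd
   #     Require valid-user
   #   </Files>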
 
  
  And along the same lines, are you using MySQL? [I plan to move to Sybase at
  some point in the future, but until then...]
  If so, how have you implemented the security of the database user for
  /searching/ being effectively read-only, and the indexer/admin user being
  rw? I've basically created a HUGE permission list, but I feel that's
  probably not the best solution...
 
 First of all, our database has a user/password, and is used by
 the webserver, which runs under a dedicated account. Thus there is no
 risk of intrusion.
 
 MySQL has a privilege mechanism which allows several user/password
 accounts for each database, table and column, each account having its
 own privileges.  Thus one user/password can update the database whereas
 another can only consult it (SELECT statements).
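
 In MySQL terms that split looks something like this (user names and
 database name are examples; passwords elided):

   GRANT SELECT ON udmsearch.* TO search@localhost IDENTIFIED BY '...';
   GRANT SELECT, INSERT, UPDATE, DELETE ON udmsearch.*
       TO indexer@localhost IDENTIFIED BY '...';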
 
  
  The problem, you see, is that the read-only user needs to be able to create
  and drop tables, and have write access to some parts of the qtracking table.
  [note: only some parts of it, because I have greatly enhanced the qtracking
  here, and have several daemons running that are finding further information
  about the users than just what they give]
 
 It seems that the privilege mechanism of MySQL should be able to solve
 this problem.
 
 Dominique
 
  
  Gary (-;
  
  PS e-mail me personally if you want snippets of source code, etc. In fact,
  feel free to e-mail me personally about this anyway...
  
   -Original Message-
   From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
   Sent: Tuesday, August 21, 2001 6:27 PM
   To:   [EMAIL PROTECTED]
   Subject:  security: private or public pages
   
   Hello,
   
   I wish to arrange that, on one side, the whole set of documents from
   our site is normally referenced and retrieved when searched by one of
   our team's users, and, on the other side, that any external web visitor
   only sees references to public documents. As the spider works from
   inside the site, access controls are not effective. More specifically,
   the search engine may provide references and display pages that are
   strictly for internal use.
   
   One solution may be to use tags or categories, but, at search time,
   it may still be possible to hack the URL in order to remove the
   restricting CGI parameters (t=, cat=).
   
   Another way to get around this problem would be to constrain, in the
   template file's (<!--variables ... -->) section, the CGI parameters that
   we want, so that they cannot be suppressed or overridden during a
   search. However, it would still be possible to short-cut this solution
   with the tmplt= CGI parameter, but in the case where the restriction is
   added

RE: biggest (maximum) numbers for BodyWeight, TitleWeight, DescWeight, etc.

2001-08-20 Thread Briggs, Gary

That way you can search through just the titles, etc., because a bitwise OR
operation is done on the weight.
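
The idea, as I understand it (values illustrative): give each section a
weight that is a distinct power of two, e.g.

  BodyWeight  1
  TitleWeight 2
  DescWeight  4

A word found in both the body and the title is then stored with weight
1|2 = 3, so a title-only search just has to test the title bit.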

Gary (-;

 -Original Message-
 From: Andre Pfeiler [SMTP:[EMAIL PROTECTED]]
 Sent: Monday, August 20, 2001 10:13 AM
 To:   [EMAIL PROTECTED]; Andre Pfeiler
 Subject:  Re: biggest (maximum) numbers for BodyWeight, TitleWeight,
 DescWeight, etc.
 
 On Monday 20 August 2001 09:33, you wrote:
  hello,
 
  Posted by c4miles 2001-01-15 02:29:48
  
   Regarding BodyWeight, TitleWeight, DescWeight, etc.
  What is the weight of these weights?
  Does 1 carry greater importance than 5, for example?
  Or vice-versa?
  
  Posted by gluke 2001-01-15 10:57:08
  
  No. The bigger number gives greater importance.
 
   ...what are the biggest (maximum) numbers I should use?
 
  ...is it strongly recommended to use weight numbers that are powers of 2?
 
   greets
  Andre




RE: Only indexing first part of a page (possible bug)

2001-07-26 Thread Briggs, Gary

Hi,

There's a limit to the amount of content it'll download and index for each
individual page. IIRC, it's set in the indexer.conf file, but it may be
compiled in...
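
If memory serves, the relevant indexer.conf directive looks something like
this (size in bytes; the default is whatever your build was compiled with):

  MaxDocSize 1048576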

Gary (-;

 -Original Message-
 From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
 Sent: Thursday, July 26, 2001 3:45 PM
 To:   [EMAIL PROTECTED]
 Subject:  Only indexing first part of a page (possible bug)
 
 Is there a word count limit per page implicitly defined in the
 mnogosearch indexer?
 
 In debugging why some links aren't picked up in our site indexing, I've
 found that for indexing a single page of about 850 words (only about
 9k), only the first quarter (rough estimate) is being stored.  I have no
 idea why.
 
 This is evidenced by both words and links from the latter part of this
 single page missing from the dict and url sql tables.  I can almost
 spot the exact point in the page at which words stop going into the
 index database by checking in the dict table.
 
 Anyone have a clue why this is happening?  I'm using mnogosearch on a
 redhat 7.1 machine with postgresql to index an SSL intranet.
 
 Thanks, nick




Hmmm.

2001-07-05 Thread Briggs, Gary

As threatened the other day, I've coded cookie support into the indexer. I
also implemented arbitrary header strings while I was at it...

Anyway, it reads columns called cookie_string and header_string out of
the server table, so you'll need to modify it with something like
MODIFY server ADD cookie_string VARCHAR(100) NOT NULL DEFAULT 
MODIFY server ADD header_string VARCHAR(100) NOT NULL DEFAULT 

For the cookie bit, you only need to add the string, and not the whole header.
For the arbitrary header bit, on your head be it if you break servers and
stuff...

I know that one is a superset of the other, but that's the way it goes. And
this is only on a per-server basis; if you want something global, use the
indexer.conf
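
So, to set a cookie for one indexed server, something like this (the value
and URL are invented):

  UPDATE server SET cookie_string = 'SESSIONID=abc123'
      WHERE url = 'http://intranet.example.com/';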

I'm quite happy to continue with some development on this if anyone else has
any use for things like this.

Enjoy yourselves,
Gary (-;

 cookie.diff.gz 




Cookie Support

2001-07-02 Thread Briggs, Gary

Right, people

On a per-server basis, I wrote a patch before the weekend for cookie
support. It reads a column called cookie_string from the server table, and
sends it.

Quite simple.

I'm not too good with diff and friends, but I hope this works...

Gary (-;

 cookie.diff.gz 



Cookies

2001-06-29 Thread Briggs, Gary

I've nearly finished writing cookie support into this [but only for server
database tables; if you want to do it with indexer.conf, then you can just
use HTTPHeader directives, easy]

Does anyone want a copy?

Does anyone care?

Gary




Searching on Multiple tags

2001-06-27 Thread Briggs, Gary

I can't get this to happen...

Can you give me an exact example of a parameter t=something, to search on,
for example,

0 and offsite
1 and andyandgary

or something?

Thank-you very much,
Gary (-;




Black magic, SPROCs, and sybase.

2001-06-11 Thread Briggs, Gary

I've given up trying to get advice on XML, it's OK, I won't ask again.

Has anyone written a coherent SPROC on Sybase that, given the relevant
parameters [tag, keywords, url, etc, etc] comes back with the results? In
almost any form would be OK. I mean, once someone's done the core bit...

Thank-you very much,
Gary




RE: Webboard: passing variables to search.cgi

2001-06-05 Thread Briggs, Gary

Actually, you're missing a few.

Here's the API documentation as it stands for my search-engine-on-demand
here where I work

Sorry it's in a disgusting format, but it's what people like at this
company.

The XML-returning stuff is irrelevant and you can ignore it. It's just
extensions I've made to the local one.

Gary (-;

 searchapi.doc 

 -Original Message-
 From: Chuck Maiden [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, June 05, 2001 3:57 PM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: passing variables to search.cgi
 
 Author: Chuck Maiden
 Email: [EMAIL PROTECTED]
 Message:
 As best as I can determine, the only variables that I can pass to
 search.cgi are ul, ps, m, q, o, and t.
 
 I would like to pass a few extra variables (of my own) to search.cgi.  My
 desire is that I would be able to display the values of these additional
 user-defined variables on the search results page by referencing them from
 inside the search.htm template.
 
 Does anyone know a way that I can do this?
 
 Thanks in advance.
 
 
 
 Reply: http://www.mnogosearch.org/board/message.php?id=2337
 

 searchapi.doc


RE: sleep when system is heavily loaded

2001-05-29 Thread Briggs, Gary

Have you tried writing an independent daemon to do this?

Just send indexer SIGSTOPs whenever the load average goes above whatever,
then SIGCONTs whenever it drops again. This would have the added advantage
of using almost no resources, and the indexer would never need to
know...
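
A back-of-an-envelope version of such a daemon (illustrative only; assumes
a single indexer PID, a threshold of 2, and that uptime prints "load
average:" on your system):

  #!/bin/sh
  # pause/resume a process based on the 1-minute load average
  PID=$1
  while kill -0 $PID 2>/dev/null; do
      LOAD=`uptime | awk -F'average:' '{print $2}' | awk -F, '{print int($1)}'`
      if [ $LOAD -ge 2 ]; then kill -STOP $PID; else kill -CONT $PID; fi
      sleep 30
  done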

But for reference, why use load average? It's usually a bad measure of
what's going on in a system, and unreliable as hell. 

If you're running the db on the same box as the indexer, you may
accidentally throw the whole system out of kilter, because an i/o bound DB
can increase the load average more than is entirely natural, so you'd get an
ugly cycle going on...

Depending on what the box is doing, if it's only a web server then why not
just look at Apache's resource usage?

And so on and so forth.

But anyway: no, I don't think indexer can do this.

Gary (-;

 -Original Message-
 From: Danil Lavrentyuk [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, May 29, 2001 8:21 AM
 To:   [EMAIL PROTECTED]
 Subject:  sleep when system is heavily loaded
 
 Hello!
 
 It would be good if I could say to the mnoGoSearch indexer: when the
 system's load average is more than 2, go to sleep for 30 seconds.
 
 
 Danil Lavrentyuk
 Communiware.net
 Programmer
 




Windows character sets

2001-05-15 Thread Briggs, Gary

I'm outputting XML from my search engine for use in other people's websites,
and I'm having a small problem.

Some of the sites I'm indexing are made in word [I've no control over this],
and outputted as html.

And they're in strange character sets like windows-125{0,1,2}.

When I output the XML, it contains things like 0x92 characters, which are
the Word equivalent of a normal '. Is there any way I can do translations on
this, either in the indexer, or in the PHP? [I'm using the PHP front end,
and crc-multi DB schema].

Basically, I'd like to see nothing more than US-ASCII or friends; much
easier to use, and won't break perl scripts on unix boxes.

Anybody?

Ta,
Gary (-;

PS I never got any response to my RFC on my code for putting stuff INTO the
database from XML. Does anyone have anything to add to it?




RE: 2 indexers at the same time

2001-05-08 Thread Briggs, Gary

yep.

Just try it. I believe that the database locking should work no matter how
you do it.

And for reference, in all likelihood you just need to fire off more indexers
on the same box; the web server being crawled is more likely to be the
bottleneck than the box that's doing the indexing.

Hint: try running the indexer on one box, and the database on the other.
Then run multiple indexers.
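
In other words, something along these lines (paths as in my setup; the
database box is named in DBAddr in indexer.conf):

  # on the indexing box, fire off several indexers in parallel
  /opt/udmsearch/sbin/indexer &
  /opt/udmsearch/sbin/indexer &
  /opt/udmsearch/sbin/indexer &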

Gary (-;

 -Original Message-
 From: La Rocca Network [SMTP:[EMAIL PROTECTED]]
 Sent: Saturday, May 05, 2001 3:23 PM
 To:   [EMAIL PROTECTED]
 Subject:  2 indexers at the same time
 
 hi !
 
 very important question:
 is there any way to use the indexer on more than one PC at the same time?
 
 we need better performance and we have available bandwidth...
 
 
 
 regards,
 Nelson
 
 
 




Few random things

2001-05-08 Thread Briggs, Gary

Has anyone here got a way of indexing powerpoint or visio documents?

Changing the document is not viable; I need a way to get the strings out of
it.

strings is not too bad on powerpoint, but for visio it's not worth the
effort.


Also, is there any way to convert documents with this in them:
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
?

I'd ideally like to convert them to something more standard... Can I do
this?

As in, I can't change anything. At all. I need a way to do all these things
in the search engine.

Please help with this,
Gary (-;




RE: Problems with indexing files on local hard drive.

2001-04-09 Thread Briggs, Gary

That's because what you want to do is say:


Server http://mywebsite

Alias http://mywebsite file:/path-to-files
...

Gary (-;

 -Original Message-
 From: Cliff Olle [SMTP:[EMAIL PROTECTED]]
 Sent: Monday, April 09, 2001 12:41 AM
 To:   [EMAIL PROTECTED]
 Subject:  Problems with indexing files on local hard drive.
 
 This is what indexer reports
 Indexer[28709]: indexer from mnogosearch-3.1.12/MySQL started with
 '/var/local/mnogosearch-3.1.12/etc/indexer.conf'
 Indexer[28716]: [1] Done (0 seconds)
 
  
 This is my indexer file
 DBAddr  mysql://myuser:mypass@localhost/mnogosearch/
  
 robots no
 
 Server  http://angelcities.com/ file:/path/to/my/home/dir
  
 # Allow some known extensions and directory index
 Allow \.html$ \.htm$ \.txt$ \/$
  
 # Disallow everything else
 Disallow .*
  
 This directory contains about 15 html files and there are about 1
 directories underneath that I would also like indexed.  There is not a
 link between all the html files as this is a user system I am trying to
 link.   Is there is a reason it isn't indexing anything?
  
 Thank you,
 Cliff




XML Stuff

2001-04-06 Thread Briggs, Gary

OK, I've written a system that can take XML of a certain form and put it
into a crc-multi type of database.

Just have a look. I'm interested in all your feedback, especially
Alexander...

It's obviously not yet ready to put on the d/load bit of your website, but
I'm working on it...

http://hercules.cs.york.ac.uk/~gjb105/udm_xml.tar.gz

Gary (-;

PS It needs the perl LWP, mysql DBD, DBI, and XML::Parser modules...




CRC32 in URL table

2001-04-04 Thread Briggs, Gary

What is this?

I'm unable to find what it is; I'm comparing some of the things in my
already existing database [generated by the indexer], and it's not the text
extract, it's not the URL itself, it's not the keywords, and it's not the
meta description.

What is it?

I'm writing an application that inserts stuff into the database based on XML
not dissimilar to the XML attached to this message.

I've already got everything working well, but I can't work out what the
CRC32 is actually of. I assume it's used for the clone detection, but I'm
not entirely sure.

Thank-you very much
Gary (-;

PS Yes, I can search on this, which is my test data, and other pieces of XML
which are thousands of times larger. And it's fast.


 searchindex.xml 



URL Table in database

2001-03-30 Thread Briggs, Gary

What is the CRC32 actually _of_?

I'm currently doing it as the CRC32 of the url itself... is this right?

And for reference, I've basically finished working on a system that takes in
XML and puts it straight into the database.

Thank-you very much,

Gary (-;




RE: Webboard: Use of Tags...

2001-03-23 Thread Briggs, Gary

Example:
You're searching 4 websites.

http://server1/~gbriggs
http://server2/~gbriggs
http://server3/~acoates
http://server2/~acoates

If you want to be able to search on just ~gbriggs homepages, you give them
one tag [say, "gb"].
Then you give the ~acoates homepages another tag [say, "ac"].

That way, when you're searching, you can give "t=gb", and it'll just search
through ~gbriggs pages.

Does this help?
If you want, I can give you a more complete example.
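
For instance, in indexer.conf terms it would be something like this
(assuming you configure servers there rather than in a ServerTable):

  Tag gb
  Server http://server1/~gbriggs/
  Server http://server2/~gbriggs/

  Tag ac
  Server http://server3/~acoates/
  Server http://server2/~acoates/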

Gary (-;

 -Original Message-
 From: Martyn [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, March 23, 2001 1:04 AM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: Use of Tags...
 
 Author: Martyn
 Email: [EMAIL PROTECTED]
 Message:
 I also do not understand the use of Tags and would appreciate some
 enlightenment.
 
 Martyn
 
 Reply: http://search.mnogo.ru/board/message.php?id=1781
 




RE: Webboard: Premature end of indexing?

2001-03-23 Thread Briggs, Gary



 -Original Message-
 From: Orjan Sandland [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, March 23, 2001 9:17 AM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: Premature end of indexing?
 
 Author: Orjan Sandland
 Email: [EMAIL PROTECTED]
 Message:
 I'm running latest mnogosearch and redhat 7 with mysql rpm installed.
 
Redhat7. <g>

RPM. <g>

OK, I'll put my pettiness aside.

 I compiled mnogosearch with support for pthreads btw.
 
Good.

 Two days ago, I had about 200.000 urls indexed, with 125.000 of them being
 status OK, about 20.000 was status not modified.
 This morning, after running the indexer all night, the total count is up
 to 255.000 urls, but the Gateway Timeout count was over 150.000, and only
 60.000 had OK status.
 
Theory: there was an outage, and max_retry_errors [or whatever it's called
in the config file] was reached. The indexer then gave up.

 Yesterday I started the indexer to reindex, using -a option. 
 Why did so many get into status Gateway Timeout??
 My first thought was that there was some huge network error (most of the
 pages I index are separated from the server by the atlantic ocean :-).
 
 I can live with this (as long as they get indexed again at some point ;-),
 but I tried to force it to index the urls with Gateway Timeout. Running
 ./indexer -s 504  only works for 4 seconds, then gives me the "Done"
 message.
 
"indexer -s 504" would only have re-indexed out-of-date documents.
If, the night before, it had indexed all of the but given timeouts, they
would be marked in the database as indexed, and have to wait for the default
of 2 weeks before they get indexed again.

You should have hit it with a "indexer -a -s 504". That would have reindexed
all the 504s.

 Am I doing anything wrong? I'm quite new with this, still learning.
 
Naaah. Looks like you know what you're doing [although Linux isn't the best
choice in my experience - I use Solaris]

 I'm realising that I will need to learn alot, because at this point, with
 250.000 urls, I've only partially indexed 200 of the 1-2000 websites I
 intend to index
 
Hope this helps,

Gary (-;





RE: Webboard: libmysqlclient.so.6

2001-03-23 Thread Briggs, Gary

edit /etc/ld.so.conf and add /usr/local/mysql/lib/mysql to it
[don't forget to run /sbin/ldconfig]
or export LD_LIBRARY_PATH=/usr/local/mysql/lib/mysql:$LD_LIBRARY_PATH
or similar.

Gary (-;

 -Original Message-
 From: Doos [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, March 23, 2001 2:21 PM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: libmysqlclient.so.6
 
 Author: Doos
 Email: [EMAIL PROTECTED]
 Message:
 when i start indexer i get the error msg.:
 ./indexer: error in loading shared libraries: libmysqlclient.so.6: cannot
 open shared object file: No such file or directory.
 
 but i have it in: /usr/local/mysql/lib/mysql/libmysqlclient.so.6
 
 Does anyone know how to fix this?
 
 thanks
 
 Reply: http://search.mnogo.ru/board/message.php?id=1787
 




RE: Will mnoGoSearch let me...

2001-03-19 Thread Briggs, Gary

Do you want my stuff to do this?

I've written some quite useful things...

Gary (-;

 -Original Message-
 From: Dustin S. [SMTP:[EMAIL PROTECTED]]
 Sent: Sunday, March 18, 2001 8:20 AM
 To:   [EMAIL PROTECTED]
 Subject:  Will mnoGoSearch let me...
 
 ...make it so visitors can add their own site to the engine?
  
  
  




Here are some more things people may find of interest.

2001-03-19 Thread Briggs, Gary

 qtrackingalanysis.tar.gz 

Hmmm.
The daemon that grabs usernames off the network; I can give that to people
if they want.

And by the way: this is based around the expanded qtracking that my system
does; it's an add-on to the PHP frontend that I mailed to this list a while
ago.

Gary (-;



RE: crc-multi and millions of urls

2001-03-14 Thread Briggs, Gary

I'll tell you what: here's _my_ personal experience.

I'm indexing many many gigabytes of data. Currently it's at ~20, but it will
go up in the near[ish] future. When I get a cluster to run it on, and the
correct access to the other files, etc.

I'm using crc-multi and most of my queries are coming back in under a
second.

The average is only quite high because I happen to know that if I search for
"index || perl", I absolutely trounce the system, because those are about the
two most popular words.

Anyway.

I'm using CRC-multi, and I've added a few features [eg the query tracking,
more useful server tables, and some other stuff], and it's still fast enough
to not worry about.

I haven't done any proper stress-testing yet, because the server _cannot_ go
down. But from the behavior I'm seeing, it shouldn't be too bad.

Gary (-;

 -Original Message-
 From: Caffeinate The World [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, March 13, 2001 4:41 PM
 To:   [EMAIL PROTECTED]
 Subject:  crc-multi and millions of urls
 
 is there any reason why you can't index millions of urls using the DB
 crc-multi mode? is it speed? when i first started using mnogosearch, i
 was under the assumption that if you were to index millions of urls,
 you should use cachemode. now that i've run into several limitations of
 cachemode itself:
 
 1. limited depth for categories, 
 2. unreliable -- i've yet to have a fully indexed 
service -- seems like i've been debugging for months 
and indexer, cachelogd, splitter still core dumps, some
were fixed and new ones showed up.
 
 so now i'm back to investigating the use of sql db instead. i do like
 the speed i see in cachemode, but the unreliabililty doesn't make it
 usable.
 
 __
 Do You Yahoo!?
 Yahoo! Auctions - Buy the things you want at great prices.
 http://auctions.yahoo.com/




Question about ServerTables

2001-03-08 Thread Briggs, Gary

Why is Active in servertable an int(11)?

Surely a bool or a smallint would be faster?

Ta,
Gary (-;




RE: [3.1.11] indexing .cgi's?

2001-03-07 Thread Briggs, Gary

The thing is, your URL doesn't technically come under the disallowed set of
things.

For that particular case, try adding /cgi/ to the Disallow path. [I'm
assuming www.pg.com/cgi is a place where you keep CGI scripts on your web
server]
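
That is, extend the existing line to something like:

  Disallow /cgi-bin/ \.cgi /nph \? /cgi/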

Hope this helps,
Gary (-;

 -Original Message-
 From: The Hermit Hacker [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, March 06, 2001 4:58 PM
 To:   [EMAIL PROTECTED]
 Subject:  [3.1.11] indexing .cgi's?
 
 
 %sbin/indexer -h
 
 indexer from mnogosearch-3.1.11/PgSQL
 http://search.mnogo.ru (C) 1998-2000, LavTech Corp.
 
 I have it disallow'd in my config file:
 
 # Exclude cgi-bin and non-parsed-headers
 Disallow /cgi-bin/ \.cgi /nph \?
 
 yet, if I run the indexer, its indexing a *load* of cgi URLs, of the
 format:
 
 Indexer[83032]: [1]
 http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/configure.in.diff?r1=1.59&r2=1.45&sortby=rev&only_with_tag=MAIN
 
 why?  and how can I get it to stop?
 
 thanks ...
 
 
 Marc G. Fournier   ICQ#7615664   IRC Nick:
 Scrappy
 Systems Administrator @ hub.org
 primary: [EMAIL PROTECTED]   secondary:
 scrappy@{freebsd|postgresql}.org
 
 




More interesting things I've done with PHP frontend

2001-03-07 Thread Briggs, Gary

This one gives you the option of returning XML.

The way to use it is this:

If you use it as a regular PHP script
[eg http://lonwebhost20:8080/udm/blank.php ]
it does what you'd expect: returns results that are formatted based on the
template blank.htm

If, on the other hand, you run it with
http://lonwebhost20:8080/udm/blank.php/results.xml
or similar, it returns XML.

When I say "or similar", it actually returns XML for ANYTHING where you
finish it with a ".xml" extension [case insensitive], so
http://lonwebhost20:8080/udm/blank.php/www.linuxgames.com.xml would also
work just fine (=
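
The switch itself is nothing fancy; in PHP 4 terms it presumably boils down
to something like this (variable names invented):

  /* return XML iff the extra path info ends in ".xml" */
  $path = getenv("PATH_INFO");
  if (eregi('\.xml$', $path)) {
      header("Content-Type: text/xml");
      /* ...emit the XML version of the results... */
  }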

From my experience, I'd recommend [if you're using IE and actually want to
get XML back] that you only do this as an HTTP POST, because IE can be a bit
annoying sometimes...

I'd love to hear from anyone who actually finds this useful, especially if
they have anything to add.

Just in case anyone's interested, the actual use for this is that I'm
working on a system whereby people in this company can go to a web page, add
their information, and then have a search engine on their page with almost
no effort by them.

The blank.htm version is so that if they just want to put it in a frame and
not have to do any formatting, etc, then it works just fine.

Gary (-;

PS Mozilla/NS6 doesn't like it because the <?xml version="1.0"
standalone="yes"?> line isn't necessarily on the first line. This is a bug
in NS/moz, and you probably want to do some server-side XSL translation
anyway.



 blank.tar.gz 



Is this, by any chance, a Bad Thing (TM)?

2001-03-05 Thread Briggs, Gary

010305  9:28:18  Aborted connection 394 to db: 'udmsearch' user: 'udm' host:
`localhost' (Got a packet bigger than 'max_allowed_packet')

From the mySQL error log.

Ta,
Gary (-;




Minimum permissions on MySQL database?

2001-03-05 Thread Briggs, Gary

I'm trying to lock down some of my mySQL tables, since I accidentally
deleted one of the tables the other day.

What're the minimum permissions I need to set to make searching possible?
I'm hoping to have two users:
udm [which will be used by the indexer process; a password breach could be
used to delete everything], and
udm_ro [which will be used by the searching process; a password breach should
not be able to do anything at all, except maybe a DoS, which I could fix
really quickly]

I see that the read-only user does need create and drop permissions, since
it uses a temporary table AFAICT.
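
A first cut at those two users might look like this (database name assumed
to be udmsearch, as elsewhere in this thread; MySQL of this era has no
separate temporary-table privilege, so the read-only user gets CREATE/DROP
on the whole database):

  GRANT ALL ON udmsearch.* TO udm@localhost IDENTIFIED BY '...';
  GRANT SELECT, CREATE, DROP ON udmsearch.*
      TO udm_ro@localhost IDENTIFIED BY '...';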

Is there anything else?

Thank-you very much,
Gary (-;




RE: Webboard: Installation

2001-03-02 Thread Briggs, Gary

if a file is a .tar.gz file [or .tgz], you can extract it with:
gzip -cd the-file.tar.gz | tar -xf -

Gary (-;

 -Original Message-
 From: Mike Davis [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, March 02, 2001 10:29 AM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: Installation
 
 Author: Mike Davis
 Email: [EMAIL PROTECTED]
 Message:
 mnoGoSearch comes with great recommendations. I have installed a fair
 number of scripts but am unfamiliar with the process of unpacking and
 decompressing tar files. I have not used "make" either. I have managed to
 use winzip to uncompress the files and unpack them onto MY computer, but
 what do I do now to put them onto my web site? Perhaps the better question
 is, once I've taken the original tar.gz file and uploaded it to my web
 site, how do I unpack and install it there? My web host does not provide
 access to telnet. Do I need this in order to install these kinds of
 packages? If you know of a link where I can learn how to do this
 (preferably for dummies!), I'll be glad to learn there. I think your
 search engine would be a real asset. Thanks, Mike
 
 Reply: http://search.mnogo.ru/board/message.php?id=1595
 




RE: Searching multiple tags

2001-03-01 Thread Briggs, Gary

I'm using the PHP frontend at the moment.

And I shall actually continue using it, especially when the PHP 4.0.5 comes
out (=

Gary (-;

 -Original Message-
 From: Laurent LEVIER [SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, February 28, 2001 6:38 PM
 To:   [EMAIL PROTECTED]; Briggs, Gary; '[EMAIL PROTECTED]'
 Subject:  Re: Searching multiple tags
 
 When searching, the search tool is restricting tags within the select.
 
 So for sure it is possible, but what are you using as searching tool ? CGI
 ? PHP ? PERL ?
 
 At 17:49 28/02/2001 +, Briggs, Gary wrote:
 Is there any way that I can search from multiple tags in one search?
 
 Gary (-;
 
 Laurent LEVIER
 IT Systems  Networks, Unix System Engineer
 Security Specialist
 
 Argosnet Security Server : http://www.Argosnet.com
 "Le Veilleur Technologique", "The Technology Watcher"
 




RE: Searching multiple tags

2001-03-01 Thread Briggs, Gary

Hmmm.
The PHP one can't.

Gary (-;

 -Original Message-
 From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
 Sent: Thursday, March 01, 2001 5:13 AM
 To:   [EMAIL PROTECTED]; Briggs, Gary
 Subject:  Re: Searching multiple tags
 
 "Briggs, Gary" wrote:
  
  Is there any way that I can search from multiple tags in one search?
  
  Gary (-;
 
 
 Just submit several t=XXX  pairs from HTML form.
 At least search.cgi can do it.




RE: Crosswords and servertables

2001-02-28 Thread Briggs, Gary

For large volumes of data, which of these two is faster?

Gary (-;

 -Original Message-
 From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, February 28, 2001 9:52 AM
 To:   [EMAIL PROTECTED]; Briggs, Gary
 Subject:  Re: Crosswords and servertables
 
 This feature works in single and multi modes only.
 Thanks for noticing this, we'll add this into documentation.
 
 
 
 "Briggs, Gary" wrote:
  
  having looked some more, the docs say that it is "not supported in built-in
  database and Cachemode".
  
  No matter what I do, I can't get it to work at all in crc-multi mode on my
  database.
  
  Help?
  
  Thank-you very much,
  Gary Briggs
  
   -Original Message-----
   From: Briggs, Gary [SMTP:[EMAIL PROTECTED]]
   Sent: Tuesday, February 27, 2001 11:56 AM
   To:   '[EMAIL PROTECTED]'
   Subject:  Crosswords and servertables
  
   Is there any way that I can control what the weight of crosswords is when
   all my servers are pulled out of a table on the database?
  
   Thank-you very much,
   Gary (-;
   ___
   If you want to unsubscribe send "unsubscribe general"
   to [EMAIL PROTECTED]
  
  
  
  




RE: Crosswords and servertables

2001-02-28 Thread Briggs, Gary

I think that's because it doesn't read anything from the ServerTable about
crossweight by default, and then sets the crossweight to "0".

But that's just a guess. I've got it inserting stuff into the crossdict
table by editing sql.c and making it read in an extra column from the
ServerTable called "crossweight" that I've put in ["alter table server add
crossweight int default 32", IIRC]

Two lines of code, I can brew a diff if you want, but it's really not worth
the effort [especially since I did a fugly hack that means the number 32
appeared in the middle of a series at one point]

Gary (-;

 -Original Message-
 From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, February 28, 2001 9:57 AM
 To:   [EMAIL PROTECTED]; Briggs, Gary
 Subject:  Re: Crosswords and servertables
 
 Hi!
 
 It seems to be a bug. CrossWeight is not working when servers are loaded
 from an SQL table using the ServerTable command.
 
 Thanks for reporting!
 
 
 "Briggs, Gary" wrote:
  
   having looked some more, the docs say that it is "not supported in built-in
   database and Cachemode".
   
   No matter what I do, I can't get it to work at all in crc-multi mode on my
   database.
  
  Help?
  
  Thank-you very much,
  Gary Briggs
  
   -Original Message-
   From: Briggs, Gary [SMTP:[EMAIL PROTECTED]]
   Sent: Tuesday, February 27, 2001 11:56 AM
   To:   '[EMAIL PROTECTED]'
   Subject:  Crosswords and servertables
  
    Is there any way that I can control what the weight of crosswords is when
    all my servers are pulled out of a table on the database?
  
   Thank-you very much,
   Gary (-;
   ___
   If you want to unsubscribe send "unsubscribe general"
   to [EMAIL PROTECTED]
  
  
  
  




RE: Crosswords and servertables

2001-02-28 Thread Briggs, Gary

Why, thank-you

Want a well-scabby patch? (=

Gary (-;

 -Original Message-
 From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, February 28, 2001 10:21 AM
 To:   Briggs, Gary
 Cc:   [EMAIL PROTECTED]
 Subject:  Re: Crosswords and servertables
 
 You are right!
 
 "Briggs, Gary" wrote:
  
   I think that's because it doesn't read anything from the ServerTable about
   crossweight by default, and then sets the crossweight to "0".
 




Searching multiple tags

2001-02-28 Thread Briggs, Gary

Is there any way that I can search from multiple tags in one search?

Gary (-;




Crosswords and servertables

2001-02-27 Thread Briggs, Gary

Is there any way that I can control what the weight of crosswords is when
all my servers are pulled out of a table on the database?

Thank-you very much,
Gary (-;




RE: Crosswords and servertables

2001-02-27 Thread Briggs, Gary

OK, now I feel stupid.
The PHP front end doesn't support it, right?

Does anyone know when it's likely to be supported in this frontend?

Gary (-;

 -Original Message-
 From: Briggs, Gary 
 Sent: Tuesday, February 27, 2001 2:20 PM
 To:   '[EMAIL PROTECTED]'; Briggs, Gary
 Subject:  RE: Crosswords and servertables
 
 having looked some more, the docs say that it is "not supported in
 built-in database and Cachemode". 
 
 No matter what I do, I can't get it to work at all in crc-multi mode on my
 database.
 
 Help?
 
 Thank-you very much,
 Gary Briggs
 
 -Original Message-----
 From: Briggs, Gary [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, February 27, 2001 11:56 AM
 To:   '[EMAIL PROTECTED]'
 Subject:  Crosswords and servertables
 
 Is there any way that I can control what the weight of crosswords is when
 all my servers are pulled out of a table on the database?
 
 Thank-you very much,
 Gary (-;




RE: Webboard: Segmentation Fault, core dump (gdb report included)

2001-02-26 Thread Briggs, Gary

OK, it still breaks:


[lonwebhost20:/opt/udmsearch/sbin/]$ pwd
/opt/udmsearch/sbin
[lonwebhost20:/opt/udmsearch/sbin/]$ cat ./chunkystuff 
#!/bin/sh
LD_LIBRARY_PATH=/opt/mySQL/lib/mysql:$LD_LIBRARY_PATH; export
LD_LIBRARY_PATH
exec ./indexer $*

[lonwebhost20:/opt/udmsearch/sbin/]$ ./chunkystuff 
indexer from mnogosearch-3.1.11/MySQL started with
'/opt/udmsearch/etc/indexer.conf'
Segmentation Fault (core dumped)
[lonwebhost20:/opt/udmsearch/sbin/]$ gdb --core=./core indexer
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.6), Copyright 1996 Free Software Foundation,
Inc...(no debugging symbols found)...
Core was generated by `./indexer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/udmsearch/lib/libudmsearch-3.1.so...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libxnet.so.1...(no debugging symbols
found)...done.
Reading symbols from /opt/mySQL/lib/mysql/libmysqlclient.so.10...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols
found)...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/nss_files.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols
found)...done.
#0  0xef5a4734 in strlen ()
(gdb) bt
#0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef69fd5c in UdmAddURL ()
#4  0xef6b8850 in UdmStoreHrefs ()
#5  0xef68ab3c in UdmIndexNextURL ()
#6  0x12460 in thread_main ()
#7  0x1387c in main ()
(gdb) frame 3
#3  0xef69fd5c in UdmAddURL ()
(gdb) where
#0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef69fd5c in UdmAddURL ()
#4  0xef6b8850 in UdmStoreHrefs ()
#5  0xef68ab3c in UdmIndexNextURL ()
#6  0x12460 in thread_main ()
#7  0x1387c in main ()
(gdb)
[lonwebhost20:/opt/udmsearch/sbin/]$ 

I made it with:

[lonwebhost20:/opt/udmsearch/src/]$ cat ./makeconfig 
#!/bin/sh

CC="/opt/SUNWspro/bin/cc";export CC
#CFLAGS="-fast -g";export CFLAGS
CFLAGS="-g";export CFLAGS

CXX="/opt/SUNWspro/bin/CC";export CXX

./configure --prefix=/opt/udmsearch --with-mysql=/opt/mySQL --disable-syslog
--disable-mp3 --disable-news --enable-shared

Hope this helps, [and that you can help me!]
Gary (-;

 -Original Message-
 From: Alexander Barkov [SMTP:[EMAIL PROTECTED]]
 Sent: Monday, February 26, 2001 11:06 AM
 To:   [EMAIL PROTECTED]
 Subject:  Webboard: Segmentation Fault, core dump (gdb report
 included)
 
 Author: Alexander Barkov
 Email: [EMAIL PROTECTED]
 Message:
 You have a very strange place of crash :-(  Probably compiling with the
 -g cc flag will produce a backtrace with more information. To compile
 with the -g flag, run
 
   export CFLAGS=-g
   ./configure ..
   make
 
 
 then run new indexer and check backtrace after crash
 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1517
 




RE: Problem with mnogosearch unpack

2001-02-25 Thread Briggs, Gary

It's because you've got a .tar.gz file, and your version of tar [are you
using Solaris, by any chance?] doesn't understand gzip compression. The way
to decompress .tar.gz files that works on almost any un*x platform is:
gzip -cd something.tar.gz | tar -xf -
Although be warned: Solaris tar has a HUGE problem, and for a lot of things
you should use the GNU one instead [and if you're using the GNU one, then
the -z option will work anyway]. Usually it's installed someplace like
/opt/gnu/bin

Hope this helps
Gary (-;



 -Original Message-
 From: PNTCD [SMTP:[EMAIL PROTECTED]]
 Sent: Sunday, February 25, 2001 1:19 PM
 To:   [EMAIL PROTECTED]
 Subject:  Problem with mnogosearch unpack
 
 To unpack UDM I had use 
 tar -zxf udmsearch-3.1.3.tar.gz 
 but the server respond: 
 tar: z: unknown option 
 
 then I'd try 
 tar -xf udmsearch-3.1.3.tar.gz 
 
 the server: 
 tar: directory checksum error 
 
 Can anybody help me!!! 
 
 Thank you! 
 
 Claudiu Cristea
 [EMAIL PROTECTED]
 
 




RE: Segmentation Fault core dump (gdb report include in this mail)

2001-02-23 Thread Briggs, Gary

I'm getting this too.
Solaris 2.6
MySQL 3.23.33
Mnogo 3.1.11
   compiled with
CC="/opt/SUNWspro/bin/cc";export CC
CFLAGS="-fast";export CFLAGS
CXX="/opt/SUNWspro/bin/CC";export CXX
./configure --cache-file=/dev/null --prefix=/opt/udmsearch \
   --with-mysql=/opt/mySQL --disable-syslog --disable-mp3 \
   --disable-news --enable-shared 

It's in CRC-multi mode.
Here's a BackTrace:


[lonwebhost20:/opt/udmsearch/sbin/]$ pwd
/opt/udmsearch/sbin

[lonwebhost20:/opt/udmsearch/sbin/]$ cat ./chunkystuff 

#!/bin/sh
export LD_LIBRARY_PATH=/opt/mySQL/lib/mysql:$LD_LIBRARY_PATH
exec ./indexer $*

[lonwebhost20:/opt/udmsearch/sbin/]$ ./chunkystuff 
indexer from mnogosearch-3.1.11/MySQL started with
'/opt/udmsearch/etc/indexer.conf'
Segmentation Fault (core dumped)

[lonwebhost20:/opt/udmsearch/sbin/]$ echo "bt" | gdb --core=./core indexer
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.6), Copyright 1996 Free Software Foundation,
Inc...(no debugging symbols found)...
Core was generated by `./indexer'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/udmsearch/lib/libudmsearch-3.1.so...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/libsocket.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libxnet.so.1...(no debugging symbols
found)...done.
Reading symbols from /opt/mySQL/lib/mysql/libmysqlclient.so.6...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libnsl.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libm.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libdl.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libmp.so.2...(no debugging symbols
found)...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/nss_files.so.1...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/nss_nis.so.1...(no debugging symbols
found)...done.
#0  0xef5a4734 in strlen ()
(gdb) #0  0xef5a4734 in strlen ()
#1  0xef5da62c in _doprnt ()
#2  0xef5e3804 in sprintf ()
#3  0xef71374c in UdmAddURL ()
#4  0xef71f5c8 in UdmStoreHrefs ()
#5  0xef709a4c in UdmIndexNextURL ()
#6  0x113f4 in thread_main ()
#7  0x11dbc in main ()
(gdb)
[lonwebhost20:/opt/udmsearch/sbin/]$ 


Thank-you very much,
Gary Briggs


 -Original Message-
 From: filip.sergeys [SMTP:[EMAIL PROTECTED]]
 Sent: Thursday, February 22, 2001 7:14 PM
 To:   [EMAIL PROTECTED]
 Subject:  Segmentation Fault core dump (gdb report include in this
 mail)
 
 Hi,
 When trying to start the indexer I get a core dump.
 The complete installation is new
 Linux : 6.2 kernel 2.2.14
 Mysql : 3.23.33
 Mnogosearch : 3.1.11 
   compiled with --enable-syslog=LOG_LOCAL6
   --enable-linux-pthreads --with-mysql
 
 The dbg report:
 
 GNU gdb 19991004
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you
 are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for
 details.
 This GDB was configured as "i386-redhat-linux"...
 Core was generated by `./indexer -v 3'.
 Program terminated with signal 11, Segmentation fault.
 Reading symbols from /lib/libpthread.so.0...done.
 Reading symbols from /lib/libm.so.6...done.
 Reading symbols from /usr/lib/libz.so.1...done.
 Reading symbols from /lib/libc.so.6...done.
 Reading symbols from /lib/ld-linux.so.2...done.
 Reading symbols from /lib/libnss_files.so.2...done.
 #0  0x400e9c61 in __libc_nanosleep () from /lib/libc.so.6
 (gdb) backtrace
 #0  0x400e9c61 in __libc_nanosleep () from /lib/libc.so.6
 #1  0x400e9bed in __sleep (seconds=1) at
 ../sysdeps/unix/sysv/linux/sleep.c:82
 #2  0x804b0ec in main (argc=3, argv=0xbd70) at main.c:593
 (gdb) q
 
 Hope you can help me. Thanks in advance
 
 FS




Direct Database Injection And Some probably stupid questions

2001-02-16 Thread Briggs, Gary

OK, so I've been reading the source code, and I'm having real trouble with
what some parts of the url table are for...

Why bother to compute a crc32 for the urls? do I need it? [I'm currently
using crc-multi db mode on mySQL]

It seems to be the primary key, but then what's the point of rec_id? Seeing
as how rec_id autoincrements, surely it's actually more unique than a crc,
which by your own analysis is not unique for about 250 in any given
1,600,000 urls?

Why is there a keywords field? I thought that the search worked by:

0)  Compute the crc's of the keywords we're looking for
1)  looking up the crc's we're searching for from the dict tables
2)  using url_id as a foreign key, look up the relevant url for rec_id key
in the url table
3)  also look up all the other information from the url table, such as
description, title, text
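
In SQL terms, steps 1-3 amount to something like this (using the table and
column names above; <crc32 of keyword> is computed on the client side):

  SELECT u.url, u.title, u.description, u.txt
  FROM dict d, url u
  WHERE d.word = <crc32 of keyword>     -- step 1
    AND d.url_id = u.rec_id;            -- steps 2 and 3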

Surely this doesn't need a keyword field, since we're searching other tables
based on keywords anyway?


What's the difference between txt and description? I assume that Description
is the description if there's a description meta-tag, and txt is an extract
of the text.
Why is there both? Surely a unified field could contain the description, if
it's there, or an extract if there's not?


Thank-you very much,
Gary