UdmSearch: Webboard: What proxy software on a freebsd machine?

2000-11-10 Thread Alexander Barkov

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
> I'm just wondering what proxy software I should use in conjunction with udmSearch
> (for ftpsearch.conf). Which is known to work best?
> Thanks
> Ari

It has been tested with Squid. You could also use a 3.1.x
version, which has native FTP support.


Reply: http://search.mnogo.ru/board/message.php?id=720

__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




UdmSearch: Server tables

2000-11-10 Thread Briggs, Gary

I'm working on a few web pages where users can enter information into a
web page and have their website added to my search engine.

It's all fine, except for the fact that this is open to some forms of abuse.
Is it safe to just add an "added by..." column to the server table? Or does
this break anything...?

Thank-you very much,
Gary (-;




UdmSearch: Webboard: How many indexer ?

2000-11-10 Thread FL

Author: FL
Email: [EMAIL PROTECTED]
Message:
Hi !

Udm is an amazingly good tool.
I now want to index about 100,000 URLs.
The machine is a Linux box with a PII-300 and 256 MB of RAM. The connection is a
10 Mb/s Ethernet card (VIA Rhine).

Question: how many instances of indexer should I launch? How can I find the best
solution?

Thanks for your help.

Francois



Reply: http://search.mnogo.ru/board/message.php?id=721





Re: UdmSearch: Server tables

2000-11-10 Thread Alexander Barkov

"Briggs, Gary" wrote:
 
> I'm working on a few web pages where users can enter information into a
> web page and have their website added to my search engine.
>
> It's all fine, except for the fact that this is open to some forms of abuse.
> Is it safe to just add an "added by..." column to the server table? Or does
> this break anything...?
 

It should be safe.




UdmSearch: Yet more questions

2000-11-10 Thread Briggs, Gary

I really hate just asking questions this much, but it's kinda important; I'm
wondering if anyone can help me:

I want to be able to use the basic_auth abilities of udmsearch. My problem
is that unless a user can authenticate against the same site that the
basic_auth was needed for, I don't want to reveal even the existence of
that site.

E.g., if one site being indexed has information about making a new top-secret
car, I don't want unauthorised users to even know about the existence of
that site. On the other hand, if the user IS able to authenticate themselves
against that site, I want them to be able to search and get results
from that site.

I'm using the most recent versions of the PHP front end and 3.1.8, with the
MySQL backend and crc-multi as the storage method.

The other thing, which I haven't heard from anyone about yet, is another
security issue. I want two users in my MySQL database for this search
engine: one, used by the indexer, that is allowed to insert and change data
in the database, and one, used by the front end, that is not allowed to
modify it. I can't work out a good way of doing it, seeing as the PHP front
end needs to be able to create and drop randomly named databases. Anyone?

Security is a very important thing to me...

Thank-you very much,
Gary (-;




RE: UdmSearch: Webboard: How many indexer ?

2000-11-10 Thread Briggs, Gary

Empirical testing. It kinda depends on a few things, including the storage
method and back end you're using.

Just try it; between 4 and 8, probably.
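
One way to run the empirical test suggested above is to time the same batch
of work with different numbers of parallel processes. The sketch below is an
illustration only, not part of UdmSearch: run_parallel is a hypothetical
helper, and `true` stands in for the indexer so that the loop runs anywhere.
It assumes a POSIX shell and a date(1) that supports +%s.

```shell
# Launch COUNT copies of a command in the background and wait for all of them.
# run_parallel is a hypothetical helper name, not a UdmSearch tool.
run_parallel() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait
}

# Time one batch per candidate count. `true` is a stand-in command;
# in real use it would be the indexer binary (its path is an assumption).
for n in 1 2 4 8; do
    start=$(date +%s)
    run_parallel "$n" true
    end=$(date +%s)
    echo "$n instances: $((end - start))s"
done
```

For a meaningful comparison, each batch should start from the same database
state, so that every run has a comparable amount of work to do.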

Gary (-;

> -Original Message-
> From: FL [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, November 10, 2000 10:27 AM
> To:   [EMAIL PROTECTED]
> Subject:  UdmSearch: Webboard: How many indexer ?
>
> Author: FL
> Email: [EMAIL PROTECTED]
> Message:
> Hi !
>
> Udm is an amazingly good tool.
> I now want to index about 100,000 URLs.
> The machine is a Linux box with a PII-300 and 256 MB of RAM. The
> connection is a 10 Mb/s Ethernet card (VIA Rhine).
>
> Question: how many instances of indexer should I launch? How can I
> find the best solution?
>
> Thanks for your help.
>
> Francois
>
>
>
> Reply: http://search.mnogo.ru/board/message.php?id=721
 




Re: UdmSearch: Yet more questions

2000-11-10 Thread Alexander Barkov

"Briggs, Gary" wrote:
 
> I really hate just asking questions this much, but it's kinda important; I'm
> wondering if anyone can help me:
>
> I want to be able to use the basic_auth abilities of udmsearch. My problem
> is that unless a user can authenticate against the same site that the
> basic_auth was needed for, I don't want to reveal even the existence of
> that site.
>
> E.g., if one site being indexed has information about making a new top-secret
> car, I don't want unauthorised users to even know about the existence of
> that site. On the other hand, if the user IS able to authenticate themselves
> against that site, I want them to be able to search and get results
> from that site.


Maybe the "tag" feature would help?




UdmSearch: Webboard: Failing to index titles (udmsearch 3.0.23)

2000-11-10 Thread Chi

Author: Chi
Email: [EMAIL PROTECTED]
Message:
Indexer seems unable to index title tags in pages generated by a CGI.

I've indexed http://www.beautycommercial.com with indexer.conf set to accept cgis,
nphs and ?s, and it spiders through the site fine.
Checking the MySQL database shows that for most of the CGIs (.mxs) the title was
not indexed, even though a title exists.



Reply: http://search.mnogo.ru/board/message.php?id=723





Re[2]: UdmSearch: Webboard: Performance: cache db

2000-11-10 Thread Sergey Kartashoff

Hi!

Friday, November 10, 2000, 1:36:44 PM, you wrote:

PS> As for debugging I went to my unpacking directory (where I keep a virgin
PS> copy of the software) and made the change you mentioned to sql.c  Then I
PS> recompiled and ran search.cgi by hand... all I got was a copy of my webpage
PS> outputted... no real debug data

It prints debug data to the stderr stream. You can see it on its own by
redirecting stdout to a file or to /dev/null:

export QUERY_STRING=word1
./search.cgi > /dev/null


-- 
Regards, Sergey aka gluke.






Re[4]: UdmSearch: Webboard: Performance: cache db

2000-11-10 Thread Sergey Kartashoff

Hi!

Friday, November 10, 2000, 1:50:53 PM, you wrote:

PS> When I run the export command (which I know works on Linux) I get an error..
PS> this machine is a FreeBSD box... can I just use SET instead?

export is a bash and ksh command.
For csh or tcsh (if I am not mistaken) you should use setenv.
Please read the man page for your command shell about setting and
exporting environment variables.
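
If the login shell is unknown, the env(1) utility sets a variable for a
single command without relying on export or setenv. A minimal sketch, in
which the CGI variable is a hypothetical stand-in for ./search.cgi so that
it runs without UdmSearch installed:

```shell
# Hypothetical stand-in for ./search.cgi: page on stdout, debug on stderr.
CGI='echo "page for $QUERY_STRING"; echo "debug: $QUERY_STRING" >&2'

# env sets QUERY_STRING for this one command only. stdout is discarded,
# so only the stderr line reaches the terminal. This form also works
# when typed at a csh or tcsh prompt.
env QUERY_STRING=word1 sh -c "$CGI" > /dev/null

# sh/bash/ksh only: capture stderr alone. stderr is pointed at the
# capture pipe first, then stdout is sent to /dev/null.
debug=$(env QUERY_STRING=word1 sh -c "$CGI" 2>&1 > /dev/null)
echo "$debug"
```

With the real program, the first command would read
env QUERY_STRING=word1 ./search.cgi > /dev/null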

-- 
Regards, Sergey aka gluke.






Re[2]: UdmSearch: UdmSearch PHP Frontend - is it possible to search by the URL, not by keyword listed in dict?

2000-11-10 Thread Sergey Kartashoff

Hi!

Friday, November 10, 2000, 5:32:51 PM, you wrote:

AS> It is indexing the pages fine, it is a problem with the PHP frontend. Isn't
AS> allow/disallow for the indexer only?

The dict table is filled only by the indexer.
If, as you say, it is incomplete, then you should edit your indexer.conf.

AS>> is it possible to search by the URL, not by keyword listed in "dict"?
AS>> It's a file database, and the "dict" table doesn't contain over half of the
AS>> file names, so although I know a link to a file does exist the search engine
AS>> doesn't pick it up. Any ideas how to fix this?

-- 
Regards, Sergey aka gluke.






RE: Re[2]: UdmSearch: Webboard: Performance: cache db

2000-11-10 Thread Paul Stewart

As another piece of info...

I tried using gmake instead of make and it made no difference in performance
or in the actual debug output

Hope these little pieces of information help...:)

Paul Stewart
Nexicom Inc.
http://www.nexicom.net

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of Sergey Kartashoff
Sent: Friday, November 10, 2000 6:49 AM
To: Paul Stewart
Cc: [EMAIL PROTECTED]
Subject: Re[2]: UdmSearch: Webboard: Performance: cache db


Hi!

Friday, November 10, 2000, 1:36:44 PM, you wrote:

PS> As for debugging I went to my unpacking directory (where I keep a virgin
PS> copy of the software) and made the change you mentioned to sql.c  Then I
PS> recompiled and ran search.cgi by hand... all I got was a copy of my webpage
PS> outputted... no real debug data

It prints debug data to the stderr stream. You can see it on its own by
redirecting stdout to a file or to /dev/null:

export QUERY_STRING=word1
./search.cgi > /dev/null


--
Regards, Sergey aka gluke.






UdmSearch: Links problem. identified it, but I dont get the code

2000-11-10 Thread Mario Lang

Hi.

I just identified my problem with symlinks over the FTP protocol.
This code in ftp.c is most likely causing the problem:
    case 'l':
        ch = strstr(fname, " -> ");
        if (!ch)
            break;
        ch += 4;
        if (ch[0] == '.') {
            len = len_h + len_p + strlen(ch);
            udm_snprintf(buf_out + cur_len, len + 1,
                         "<a href=\"ftp://%s%s%s/\"></a>",
                         connp->hostname, path, ch);
        } else {
            len = len_h + strlen(ch);
            udm_snprintf(buf_out + cur_len, len + 1,
                         "<a href=\"ftp://%s%s/\"></a>",
                         connp->hostname, ch);
        }

...

What is the reason for checking for links to /^\./ files?
On our ftp, we use links to files starting with a . to avoid listing
the target paths in a normal ls.

Here is an example listing:
lrwxrwxrwx   1 root 0  51 Oct 13 15:26 solaris -> .mirror-sites/ftp.tuwien.ac.at/sun/solaris/packages/
drwxr-xr-x   2 705  1000 4096 Jun 22 14:15 ssl/
drwxr-xr-x   2 705  1000 4096 Nov  1 10:28 suse-linux/
lrwxrwxrwx   1 root 0  40 Oct 13 15:26 tex -> .mirror-sites/ftp.gwdg.de/pub/misc2/ctan/
drwxr-xr-x   3 705  1000 4096 Nov  6 23:02 windows/


Does anyone understand why this check is done, and what
happens when it finds a link whose target matches ^\..*?
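
To make the two branches easier to compare, here is a shell sketch of what
the quoted ftp.c fragment appears to do with a listing line. It is one
reading of the snippet, not a statement of the author's intent: link_url is
a hypothetical helper, and the host and path values passed to it are
assumptions (the real values of connp->hostname and path are not shown).

```shell
# Mimic the quoted branch: split the listing line at " -> ", then build a URL.
# Targets starting with "." get the current path prepended; others do not.
link_url() {
    host=$1 path=$2 line=$3
    case $line in
        *" -> "*) target=${line#*" -> "} ;;   # text after the first " -> "
        *) return 1 ;;                        # no arrow: not a symlink line
    esac
    case $target in
        .*) printf 'ftp://%s%s%s/\n' "$host" "$path" "$target" ;;
        *)  printf 'ftp://%s%s/\n' "$host" "$target" ;;
    esac
}

# Example call with assumed host and path values.
link_url ftp.example.org /pub/ \
    'lrwxrwxrwx   1 root 0  40 Oct 13 15:26 tex -> .mirror-sites/ftp.gwdg.de/pub/misc2/ctan/'
```

Read this way, a target beginning with "." is treated as relative to the
current directory, while any other target is appended directly to the host
name, which only yields a valid URL when the target is an absolute path.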

Regards,
  Mario Lang

Technical University Graz   mailto:[EMAIL PROTECTED]
Department Computing Services   http://www.cis.tu-graz.ac.at/zid/lang/
Phone: +43 (0) 316 / 873 - 8508   ICQ: 69372257

UFOs are for real: the Air Force doesn't exist.




Re: Re[2]: UdmSearch: PgSQL: DELETE INDEX url_url;

2000-11-10 Thread The Hermit Hacker

On Fri, 10 Nov 2000, Sergey Kartashoff wrote:

> Hi!
>
> Thursday, November 09, 2000, 9:55:11 PM, you wrote:
>
> THH> On Thu, 9 Nov 2000, Alexander Barkov wrote:
>
> >> Don't forget to recreate this index before starting indexer.
> >> Because the url_url index is UNIQUE, it prevents indexer from adding
> >> the same link several times. If you remove the index, the same documents
> >> might be added several times.
>
> THH> can this unique index not be based on the crc32 value instead?  that might
> THH> explain why I'm up to 140K docs when I was only expecting 91k :)
>
> No, an index on crc32 cannot be unique, because it would block adding
> site mirrors to the url table.

okay, can someone make the following changes to the source code, so that
the search avoids using the index ... this will at least give a temporary
fix until our LIKE optimizer is fixed:

SELECT ndict.url_id,ndict.intag 
  FROM ndict,url 
 WHERE ndict.word_id=1971739852 
   AND url.rec_id=ndict.url_id  
   AND ( (url.url || ' ') LIKE 'http://www.postgresql.org/% ');

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org 





UdmSearch: I got a new problem !!

2000-11-10 Thread Arturo


Hello,

THE FIRST QUESTION:

Now after indexing my document with this command

 ./indexer -i -u http://132.248.104.3/home/httpd/html/manual/artus1.html

I have a problem: the UdmSearch statistics (using indexer -S) display the
following:

Status    Expired      Total
-----------------------------
     0          2          2 Not indexed yet
   404          0          1 Not found
-----------------------------
 Total          2          3

and "artus1.html" is the file that was not found. When I look at the record for
this file in PostgreSQL, the contents (text) of the file are not recorded,
just the URL field.

I know that status 404 means "Not found" (there were references to
URLs that do not exist), but I don't understand why this happens. In
fact, "artus1.html" is a copy of a file (an HTML system file) that is
correctly indexed.

I have tried some other HTML files, but the results are the same.

I don't know whether some parameter(s) in "indexer.conf" must be
changed; if so, which should I modify or verify?

--

THE SECOND QUESTION:

Is UdmSearch able to read XML files? The question arises because we
intend to add some fields that HTML does NOT allow, so that we
can search over selected fields.

For instance, suppose I have two documents in which Mr. White appears.
In the first he appears as a member of a Community, in the second as the
Author of a technical paper.

When I search for Mr. White as an Author (e.g. the query "Author:Mr.White"), I
don't want to fetch the other document, in which Mr. White appears as a
member of a Community.

This is why we intend to use a language that allows us to declare specific
tags and retrieve documents by means of field-specific searches.

Adding these fields to the database(s) is no problem. The thing is that
the search mechanism must be changed, mustn't it?

Probably there is an easier solution to this problem ... can you give me
your opinion or some advice?

Thanks a lot for your help.

Regards, Ing. Arturo Pulido

Centro Universitario de Investigaciones Bibliotecologicas
Universidad Nacional Autonoma de Mexico.





UdmSearch: Webboard: Help with Sound file Search!

2000-11-10 Thread Adrift

Author: Adrift
Email: [EMAIL PROTECTED]
Message:
Hello Everyone!
First off, I want to thank the UDMSearch guys for their killer app!
I've compiled UDMSearch with the --enable-mp3 option, and everything works. Here's my
problem.

I want to create a spider that only catalogs sound files (.mp3, .wav, .ra, .ram,
.vqf, .au, etc.). I've tried to do this through ALLOW and DISALLOW, but for some
reason the spider won't add records for any files! It only indexes the address of the
HTML file, but does not index any of the files linked to within the HTML file!

Can someone please send me an indexer.conf file which will catalog the links to sound
files contained within an HTML file?

I'm not sure if this is clear, so I'll include an example. Here is the HTML file,
let's call it test.html:

This is my sound
<a href="http://www.url1.com/sound1.wav"> Download </a>

I want it set up so that http://www.url1.com/sound1.wav is added to the database, not
just that HTML file.

Thanks
Adrift



Reply: http://search.mnogo.ru/board/message.php?id=725





UdmSearch: error 1045

2000-11-10 Thread Pradeep Misra

Folks. I hope you can help me out. I am trying to
install udmsearch with mysql backend. I believe I
have followed the instructions that come with the
package as given in INSTALL file.

Clearly I am missing something as I get the following
error message on doing a search

An error occured! 
#1045: Access denied for user: 'foo@localhost' (Using password: YES)

If it is any use, the search.cgi is at:
http://www7.imahal.com/cgi-bin/sitesearch/search.cgi

Any help would be appreciated.

Best regards,
-Pradeep


Pradeep Misra                 (T) 937 775 5062
Electrical Engineering Dept   (F) 937 775 5009
Wright State Univ             [EMAIL PROTECTED]
Dayton, OH 45435              [EMAIL PROTECTED]



Re: UdmSearch: PgSQL: DELETE INDEX url_url;

2000-11-10 Thread Alexander Barkov

The Hermit Hacker wrote:
 
> okay, can someone make the following changes to the source code, so that
> the search avoids using the index ... this will at least give a temporary
> fix until our LIKE optimizer is fixed:
>
> SELECT ndict.url_id,ndict.intag
>   FROM ndict,url
>  WHERE ndict.word_id=1971739852
>    AND url.rec_id=ndict.url_id
>    AND ( (url.url || ' ') LIKE 'http://www.postgresql.org/% ');


I don't think it is a good solution to change the search code to work
around a buggy LIKE optimizer and then change it back once the optimizer
is fixed.




UdmSearch: Webboard: External PDF and doc parsers

2000-11-10 Thread Alexander Barkov

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
> Where can I get external parsers for PDF files?

Try looking on freshmeat.net.

Reply: http://search.mnogo.ru/board/message.php?id=726
