UdmSearch: are regexps in Allow/Disallow case sensitive?

2000-10-19 Thread Peter Hanecak

Hello,

I'm trying to block URLs that have upper-case character(s) in the host
name part, because otherwise I end up with two identical documents under
different URLs in my database. I tried this:

Disallow http://.*[A-Z]*.*\.[sS][kK]/

but:

[hany@m1 ~]$ indexer -v 6 -n 1 -m -i -a -u http://WWW.MEGALOMAN.SK/
Indexer[1691]: indexer from UdmSearch v.3.0.23/PgSQL started with '/etc/indexer.conf'
Indexer[1693]: [1] http://WWW.MEGALOMAN.SK/
Indexer[1693]: [1] 'Disallowhttp://.*[A-Z]*.*\.[sS][kK]/'
Indexer[1693]: [1] Done
[hany@m1 ~]$ indexer -v 6 -n 1 -m -i -a -u http://www.megaloman.sk/
Indexer[1695]: indexer from UdmSearch v.3.0.23/PgSQL started with '/etc/indexer.conf'
Indexer[1697]: [1] http://www.megaloman.sk/
Indexer[1697]: [1] 'Disallowhttp://.*[A-Z]*.*\.[sS][kK]/'
Indexer[1697]: [1] Done

Is there a way (other than hacking the sources) to avoid indexing URLs like:

WWW.MEGALOMAN.SK
www.MEGALOMAN.sk

but still index the URL www.megaloman.sk?
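The log above is consistent with a plain regex fact: `[A-Z]*` can match zero characters, so the pattern effectively becomes `http://.*.*\.[sS][kK]/` and matches every .sk URL whatever its case. The minimal sketch below uses Python's `re` module (not the POSIX regex library UdmSearch is built on, though both behave the same for these simple patterns) to compare that pattern with `^http://[^/]*[A-Z]`, which requires at least one upper-case letter between `http://` and the first following slash. It assumes the indexer matches Disallow patterns case-sensitively against the full URL; if it folds case, no pattern can tell the two URLs apart.

```python
import re

# Pattern from the question: "[A-Z]*" may match the empty string, so this
# effectively matches any URL under a .sk host, whatever its case.
ORIGINAL = r"http://.*[A-Z]*.*\.[sS][kK]/"

# Illustrative alternative (not a tested UdmSearch rule): after "http://",
# scan only non-slash characters (the host part) and require one upper-case letter.
UPPERCASE_HOST = r"^http://[^/]*[A-Z]"

URLS = [
    "http://WWW.MEGALOMAN.SK/",
    "http://www.MEGALOMAN.sk/",
    "http://www.megaloman.sk/",
    "http://www.megaloman.sk/Some/Path.html",  # upper case only in the path
]


def disallowed(pattern: str, url: str) -> bool:
    """Return True if a case-sensitive regex search finds the pattern in the URL."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for url in URLS:
        print(f"{url:45} original={disallowed(ORIGINAL, url)!s:5} "
              f"uppercase_host={disallowed(UPPERCASE_HOST, url)}")
```

In indexer.conf the corresponding line would presumably be `Disallow ^http://[^/]*[A-Z]`; whether `^` and the character class behave this way depends on how the indexer compiles the pattern, so it is worth checking with `indexer -v 6` as in the log above.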

Thanks in advance for any reply.

Sincerely

Peter Hanecak

-- 
===
  Peter Hanecak [EMAIL PROTECTED]
  GPG pub.key: http://www.megaloman.com/gpg/hanecak-megaloman.txt
===


__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




Re: UdmSearch: UdmSearch + HTDB

2000-10-19 Thread Thomas Yengst

I do not believe HTDB works. Here are my results:

HTDBList SELECT concat('message.php?doc_id=',doc_id) FROM Metadata.DTS_archive

HTDBDoc \
SELECT concat( \
'HTTP/1.0 200 OK\\r\\n',\
'Content-type: text/html\\r\\n',\
'<HTML><BODY>\\n',\
title, ' (<b>',user_filename,'</b>)\\n',\
'</BODY></HTML>') \
FROM Metadata.DTS_archive \
WHERE doc_id=$1

Server http://localhost/
Alias http://localhost/message.php?doc_id=  htdb:/
Alias http://localhost/ htdb:/

===

Indexer[314]: indexer from UdmSearch v.3.1.7/MySQL started with
'/usr/local/udmsearch/etc/indexer.conf'
[314] SQL 0.02s: SELECT hostinfo,path FROM robots
[314] SQL 0.00s: SELECT word,lang FROM stopword
[314] SQL 0.00s: INSERT INTO url
(url,referrer,hops,crc32,last_index_time,next_index_time,status) VALUES
('http://localhost/',0,0,0,971969420,971969420,0)
[314] SQL 0.00s: LOCK TABLES url WRITE
[314] SQL 0.00s: SELECT
url,rec_id,docsize,status,last_index_time,hops,crc32,last_mod_time FROM
url WHERE next_index_time=971969420  LIMIT 250
[314] SQL 0.01s: UPDATE url SET next_index_time=971983820 WHERE rec_id
in (1)
[314] SQL 0.00s: UNLOCK TABLES
Indexer[314]: [1] http://localhost/
Indexer[314]: [1] 'Allow\/$'
Indexer[314]: [1] Alias: 'htdb:/'
[314] SQL 0.00s: SELECT concat('message.php?doc_id=',doc_id) FROM
Metadata.DTS_archive
Indexer[314]: [1] HTTP/1.0 200 OK
Indexer[314]: [1] Content-type: text/html
Indexer[314]: [1] HTTP/1.0 200 OK text/html 568
[314] SQL 0.00s: SELECT rec_id FROM url WHERE crc32=635106556 AND
status=200 AND docsize=568
Indexer[314]: [1] "message.php?doc_id=1" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=2" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=3" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=4" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=5" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=6" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=7" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=8" : 'Disallow\?'
Indexer[314]: [1] "message.php?doc_id=9" : 'Disallow\?'
[314] SQL 0.00s: LOCK TABLES dict WRITE
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag)
VALUES(1,'message',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'php',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'doc',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'id',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'1',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'2',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'3',1)
[314] SQL 0.01s: INSERT INTO dict (url_id,word,intag) VALUES(1,'4',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'5',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'6',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'7',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'8',1)
[314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'9',1)
[314] SQL 0.00s: UNLOCK TABLES
[314] SQL 0.00s: UPDATE url SET
status=200,last_mod_time=0,next_index_time=972574221,tag='',txt='message.php?doc_id=1 
message.php?doc_id=2  message.php?doc_id=3  message.php?doc_id=4 
message.php?doc_id=5  message.php?doc_id=6  message.php?doc_id=7 
message.php?doc_id=8  message.php?doc_id=9
',title='',content_type='text/html',docsize=568,keywords='',description='',crc32=635106556,lang='',category=''
WHERE rec_id=1
[314] SQL 0.00s: LOCK TABLES url WRITE
[314] SQL 0.01s: SELECT
url,rec_id,docsize,status,last_index_time,hops,crc32,last_mod_time FROM
url WHERE next_index_time=971969421  LIMIT 250
[314] SQL 0.00s: UNLOCK TABLES
Indexer[314]: [1] Done (1 seconds)

This tells me that the indexer did not search the database tables, but
instead just grabbed the primary keys.

A possible enhancement would be a configuration parameter in
indexer.conf naming a timestamp column of the table being indexed.
That way, the indexer could check whether the table had changed since
the last indexing run.
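The proposed enhancement amounts to incremental indexing keyed on a timestamp column. The sketch below models the idea with Python's built-in `sqlite3` module; the table `dts_archive`, its `modified` column, the sample rows, and the `rows_to_reindex` helper are all hypothetical, and nothing in this thread suggests UdmSearch 3.1.7 has such an option.

```python
import sqlite3


def rows_to_reindex(conn: sqlite3.Connection, last_index_time: int) -> list[int]:
    """Return doc_ids whose hypothetical 'modified' timestamp (Unix seconds)
    is newer than the time of the previous indexing run."""
    cur = conn.execute(
        "SELECT doc_id FROM dts_archive WHERE modified > ? ORDER BY doc_id",
        (last_index_time,),
    )
    return [doc_id for (doc_id,) in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dts_archive (doc_id INTEGER PRIMARY KEY, title TEXT, modified INTEGER)"
    )
    # Illustrative sample rows, not real data.
    conn.executemany(
        "INSERT INTO dts_archive VALUES (?, ?, ?)",
        [(1, "old report", 971900000),
         (2, "edited report", 971990000),
         (3, "new report", 972000000)],
    )
    last_run = 971969420  # next_index_time value taken from the indexer log above
    print(rows_to_reindex(conn, last_run))  # [2, 3]
```

Only rows modified after the previous run are returned, so an indexer built this way could skip unchanged records instead of refetching the whole table.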

-- 
My PGP public key is at
http://wwwkeys.pgp.net:11371/pks/lookup?op=index&search=yengst
Lookup anyone's PGP key at http://www.openpgp.net/pgpsrv.html

Thomas R. Yengst
Photon Research Associates, Inc.
5720 Oberlin Drive
San Diego, CA 92121-1723
(858) 455-9741
(858) 455-0658 fax




Re: UdmSearch: UdmSearch + HTDB

2000-10-19 Thread Alexander Barkov

Thomas Yengst wrote:
 
 I do not believe HTDB works...here are my results
 



 Indexer[314]: [1] "message.php?doc_id=1" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=2" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=3" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=4" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=5" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=6" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=7" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=8" : 'Disallow\?'
 Indexer[314]: [1] "message.php?doc_id=9" : 'Disallow\?'


This means that you have "Disallow \?" in your indexer.conf, so the
indexer rejects every URL containing a "?" sign. Just remove that
line.
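To see why every `message.php?doc_id=N` link was rejected, here is a toy model of the rule check in Python. It assumes, as a simplification rather than a description of UdmSearch's code, that rules are tried in the order they appear in indexer.conf, the first matching regex decides, and a URL matching no rule is allowed. The rule lists are hypothetical, reconstructed from the two rules visible in the log (`Allow \/$` and `Disallow \?`).

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, regex), e.g. ("Disallow", r"\?")


def check(url: str, rules: List[Rule], default: str = "Allow") -> str:
    """Return the action of the first rule whose regex is found in the URL.

    Simplified model (an assumption): first match wins, and a URL that
    matches no rule falls back to `default`.
    """
    for action, pattern in rules:
        if re.search(pattern, url):
            return action
    return default


# The two rules seen in the log of the original report.
WITH_QUERY_BLOCK: List[Rule] = [("Disallow", r"\?"), ("Allow", r"\/$")]
# The same configuration after removing "Disallow \?", as suggested.
WITHOUT_QUERY_BLOCK: List[Rule] = [("Allow", r"\/$")]

if __name__ == "__main__":
    for url in ("http://localhost/", "http://localhost/message.php?doc_id=1"):
        print(url, check(url, WITH_QUERY_BLOCK), check(url, WITHOUT_QUERY_BLOCK))
```

If other query-string URLs should stay blocked, the same first-match model suggests placing a narrower line such as `Allow message\.php\?doc_id=` before `Disallow \?`; confirm the actual precedence rules for your UdmSearch version before relying on this.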




UdmSearch: New message on the WebBoard #1: perl cgi

2000-10-19 Thread BaggioOwen

Author: BaggioOwen
Email: 
Message:
I've found that UdmSearch supports Perl search frontends, but I don't know how to set one up.
Can anyone tell me, please?
Also, I need to add some database-related features. Which file should I modify?

Thank you for your attention.

Reply: http://search.mnogo.ru/board/message.php?id=541





UdmSearch: Indexing problem

2000-10-19 Thread sanaa

Hello

When I tried to index using udmsearch-3.1.5 under Linux (Mandrake 6)
and MySQL, I got the following message:

[root@Linux src]# ./indexer
Indexer[985]: indexer from UdmSearch v.3.1.5/MySQL started with
'/usr/local/udmsearch/etc/indexer.conf'
Indexer[985] : [1] Error: '#1: parse error near 'LOCK TABLES url WRITE'
at line 1'

In the 'url' table I have only the URL itself; the other fields are
empty.

In my indexer.conf I have:
DBAddr mysql://root@localhost/udmsearch/
FollowOutside yes
server file:/etc
server http://www.yahoo.com/

Thank you.





UdmSearch: New message on the WebBoard #1: Problem with Russian language

2000-10-19 Thread Yakov Grusovsky

Author: Yakov Grusovsky
Email: [EMAIL PROTECTED]
Message:
My UdmSearch doesn't find words whose first letter is a capital "Я" (the last letter of the
Russian alphabet). Help me! I'm not making this up
(try http://194.44.181.3/search.shtml;
this site is under construction).

My site uses Win-1251 encoding, and I'm using a MySQL database and PHP4.
Thanks. Sorry for my bad English.
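The letter in question is the capital "Я". A short Python check (standard codecs only) shows its byte values in Windows-1251 and KOI8-R; the KOI8-R byte 0xF1 is also why the letter can appear as "ñ" when a message is displayed as Latin-1. It further shows that the lower-case "я" is byte 0xFF in Windows-1251, which reads as -1 if stored in a signed 8-bit char, the usual value of C's EOF. That is one known pitfall worth ruling out, not a confirmed diagnosis of this problem.

```python
# Byte values of the Russian letter "Я"/"я" in two common Cyrillic encodings.
CAPITAL_YA = "\u042f"  # Я
SMALL_YA = "\u044f"    # я


def byte_of(ch: str, encoding: str) -> int:
    """Return the single byte value that encodes `ch` in an 8-bit `encoding`."""
    (value,) = ch.encode(encoding)
    return value


def as_signed_char(value: int) -> int:
    """Interpret an unsigned byte value as a signed 8-bit C char."""
    return value - 256 if value > 127 else value


if __name__ == "__main__":
    for name, ch in (("capital", CAPITAL_YA), ("small", SMALL_YA)):
        cp = byte_of(ch, "cp1251")
        koi = byte_of(ch, "koi8_r")
        print(f"{name} {ch}: cp1251=0x{cp:02X} koi8-r=0x{koi:02X} "
              f"cp1251 byte as signed char={as_signed_char(cp)}")
    # The KOI8-R byte for capital Я, misread as Latin-1:
    print(bytes([byte_of(CAPITAL_YA, "koi8_r")]).decode("latin-1"))
```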




Reply: http://search.mnogo.ru/board/message.php?id=0





UdmSearch: New message on the WebBoard #1: Problem with Russian language

2000-10-19 Thread Yakov Grusovsky

Author: Yakov Grusovsky
Email: [EMAIL PROTECTED]
Message:
My UdmSearch doesn't find words whose first letter is a capital "Я" (the last letter of the
Russian alphabet).

I'm using a MySQL database and PHP4; all documents are in Win-1251 encoding.

URL: http://194.44.181.3/search.shtml

Help me! Thanks.

Reply: http://search.mnogo.ru/board/message.php?id=0





Re: UdmSearch: New message on the WebBoard #1: Problem with Russian language

2000-10-19 Thread Sergey Kartashoff

Hi!

Friday, October 20, 2000, 1:49:22 AM, you wrote:

YG Author: Yakov Grusovsky
YG Email: [EMAIL PROTECTED]
YG My UdmSearch don't find words if first letter is big "Я"-(last letter of russian
YG alphabet).Help me!! I'm don't lie
YG ( try http://194.44.181.3/search.shtml
YG this site is under construction)

YG My site have Win-1251 encoding,i'm use MySQL database and PHP4.

Please specify which version of UdmSearch you are using, and which
frontend (with its version).

-- 
Regards, Sergey aka gluke.

