UdmSearch: are regexps in Allow/Disallow case sensitive?
Hello,

I'm trying to block URLs which have some upper case character(s) in the host name part, because then there are two identical documents with different URLs in my database. I tried this:

    Disallow http://.*[A-Z]*.*\.[sS][kK]/

but:

    [hany@m1 ~]$ indexer -v 6 -n 1 -m -i -a -u http://WWW.MEGALOMAN.SK/
    Indexer[1691]: indexer from UdmSearch v.3.0.23/PgSQL started with '/etc/indexer.conf'
    Indexer[1693]: [1] http://WWW.MEGALOMAN.SK/
    Indexer[1693]: [1] 'Disallowhttp://.*[A-Z]*.*\.[sS][kK]/'
    Indexer[1693]: [1] Done
    [hany@m1 ~]$ indexer -v 6 -n 1 -m -i -a -u http://www.megaloman.sk/
    Indexer[1695]: indexer from UdmSearch v.3.0.23/PgSQL started with '/etc/indexer.conf'
    Indexer[1697]: [1] http://www.megaloman.sk/
    Indexer[1697]: [1] 'Disallowhttp://.*[A-Z]*.*\.[sS][kK]/'
    Indexer[1697]: [1] Done

Is there a way (except hacking sources) to avoid indexing URLs like:

    WWW.MEGALOMAN.SK
    www.MEGALOMAN.sk

but still index the URL

    www.megaloman.sk

?

Thanks in advance for any reply.

Sincerely
Peter Hanecak
--
===
Peter Hanecak [EMAIL PROTECTED]
GPG pub.key: http://www.megaloman.com/gpg/hanecak-megaloman.txt
===
__
If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
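A note on the pattern itself, as a minimal sketch: in ordinary regular-expression semantics `[A-Z]*` also matches zero characters, so the rule in the post can match any `.sk` URL regardless of case. That would be consistent with both log excerpts showing the Disallow rule. The snippet uses Python's `re` module purely for illustration. Whether UdmSearch 3.0.23 compares case-sensitively is the open question of this post, and the alternative pattern `HOST_HAS_UPPER` is an assumption, not a tested indexer.conf rule.

```python
import re

# Pattern from the post: [A-Z]* may match the empty string,
# so no upper-case letter is actually required.
ORIGINAL = re.compile(r"http://.*[A-Z]*.*\.[sS][kK]/")

# Hypothetical alternative: require at least one upper-case letter
# somewhere in the host part (everything up to the first "/").
HOST_HAS_UPPER = re.compile(r"http://[^/]*[A-Z][^/]*/")

URLS = [
    "http://WWW.MEGALOMAN.SK/",
    "http://www.MEGALOMAN.sk/",
    "http://www.megaloman.sk/",
]


def blocked(pattern, url):
    """Return True if the pattern matches at the start of the URL."""
    return pattern.match(url) is not None


for url in URLS:
    print(url, blocked(ORIGINAL, url), blocked(HOST_HAS_UPPER, url))
```

Under case-sensitive matching, the original pattern matches all three URLs, while the alternative matches only the two with upper-case letters in the host name.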
Re: UdmSearch: UdmSearch + HTDB
I do not believe HTDB works...here are my results

    HTDBList SELECT concat('message.php?doc_id=',doc_id) FROM Metadata.DTS_archive

    HTDBDoc \
    SELECT concat( \
     'HTTP/1.0 200 OK\\r\\n',\
     'Content-type: text/html\\r\\n',\
     '<HTML><BODY>\\n',\
     title, ' (<b>',user_filename,'</b>)\\n',\
     '</BODY></HTML>') \
    FROM Metadata.DTS_archive \
    WHERE doc_id=$1

    Server http://localhost/
    Alias http://localhost/message.php?doc_id= htdb:/
    Alias http://localhost/ htdb:/

===

    Indexer[314]: indexer from UdmSearch v.3.1.7/MySQL started with '/usr/local/udmsearch/etc/indexer.conf'
    [314] SQL 0.02s: SELECT hostinfo,path FROM robots
    [314] SQL 0.00s: SELECT word,lang FROM stopword
    [314] SQL 0.00s: INSERT INTO url (url,referrer,hops,crc32,last_index_time,next_index_time,status) VALUES ('http://localhost/',0,0,0,971969420,971969420,0)
    [314] SQL 0.00s: LOCK TABLES url WRITE
    [314] SQL 0.00s: SELECT url,rec_id,docsize,status,last_index_time,hops,crc32,last_mod_time FROM url WHERE next_index_time=971969420 LIMIT 250
    [314] SQL 0.01s: UPDATE url SET next_index_time=971983820 WHERE rec_id in (1)
    [314] SQL 0.00s: UNLOCK TABLES
    Indexer[314]: [1] http://localhost/
    Indexer[314]: [1] 'Allow\/$'
    Indexer[314]: [1] Alias: 'htdb:/'
    [314] SQL 0.00s: SELECT concat('message.php?doc_id=',doc_id) FROM Metadata.DTS_archive
    Indexer[314]: [1] HTTP/1.0 200 OK
    Indexer[314]: [1] Content-type: text/html
    Indexer[314]: [1] HTTP/1.0 200 OK text/html 568
    [314] SQL 0.00s: SELECT rec_id FROM url WHERE crc32=635106556 AND status=200 AND docsize=568
    Indexer[314]: [1] "message.php?doc_id=1" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=2" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=3" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=4" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=5" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=6" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=7" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=8" : 'Disallow\?'
    Indexer[314]: [1] "message.php?doc_id=9" : 'Disallow\?'
    [314] SQL 0.00s: LOCK TABLES dict WRITE
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'message',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'php',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'doc',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'id',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'1',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'2',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'3',1)
    [314] SQL 0.01s: INSERT INTO dict (url_id,word,intag) VALUES(1,'4',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'5',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'6',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'7',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'8',1)
    [314] SQL 0.00s: INSERT INTO dict (url_id,word,intag) VALUES(1,'9',1)
    [314] SQL 0.00s: UNLOCK TABLES
    [314] SQL 0.00s: UPDATE url SET status=200,last_mod_time=0,next_index_time=972574221,tag='',txt='message.php?doc_id=1 message.php?doc_id=2 message.php?doc_id=3 message.php?doc_id=4 message.php?doc_id=5 message.php?doc_id=6 message.php?doc_id=7 message.php?doc_id=8 message.php?doc_id=9 ',title='',content_type='text/html',docsize=568,keywords='',description='',crc32=635106556,lang='',category='' WHERE rec_id=1
    [314] SQL 0.00s: LOCK TABLES url WRITE
    [314] SQL 0.01s: SELECT url,rec_id,docsize,status,last_index_time,hops,crc32,last_mod_time FROM url WHERE next_index_time=971969421 LIMIT 250
    [314] SQL 0.00s: UNLOCK TABLES
    Indexer[314]: [1] Done (1 seconds)

This tells me that indexer did not search the database tables, but instead just grabbed the primary keys.

A possible enhancement would be a configuration parameter in indexer.conf naming the timestamp column of the table being indexed. That way, indexer could check whether the table had changed since the last indexing.
--
My PGP public key is at http://wwwkeys.pgp.net:11371/pks/lookup?op=index&search=yengst
Lookup anyone's PGP key at http://www.openpgp.net/pgpsrv.html
Thomas R. Yengst    Photon Research Associates, Inc.
(858) 455-9741      5720 Oberlin Drive
(858) 455-0658 fax  San Diego, CA 92121-1723
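The enhancement proposed in this message can be sketched roughly as follows. This is only an illustration of the idea, not an existing UdmSearch feature: the `timestamp_column` parameter, the `last_changed` column, and the use of Python with an in-memory SQLite table in place of MySQL are all assumptions made for the example.

```python
import sqlite3


def table_changed_since(conn, table, timestamp_column, last_index_time):
    """Return True if any row's timestamp is newer than the last indexing run.

    table and timestamp_column are treated as trusted configuration values,
    because SQL identifiers cannot be bound as query parameters.
    """
    query = f"SELECT MAX({timestamp_column}) FROM {table}"
    (newest,) = conn.execute(query).fetchone()
    # An empty table yields NULL (None): nothing to re-index.
    return newest is not None and newest > last_index_time


# Hypothetical stand-in for Metadata.DTS_archive with an added timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DTS_archive ("
    "doc_id INTEGER PRIMARY KEY, title TEXT, last_changed INTEGER)"
)
conn.executemany(
    "INSERT INTO DTS_archive VALUES (?, ?, ?)",
    [(1, "first", 971969000), (2, "second", 971969500)],
)

# 971969420 is the index time that appears in the posted log.
changed_after_run = table_changed_since(conn, "DTS_archive", "last_changed", 971969420)
changed_later = table_changed_since(conn, "DTS_archive", "last_changed", 971969600)
print(changed_after_run, changed_later)
```

With the sample rows, the first check reports a change (one row is newer than the index time) and the second does not.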
Re: UdmSearch: UdmSearch + HTDB
Thomas Yengst wrote:
> I do not believe HTDB works...here are my results
>
> Indexer[314]: [1] "message.php?doc_id=1" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=2" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=3" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=4" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=5" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=6" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=7" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=8" : 'Disallow\?'
> Indexer[314]: [1] "message.php?doc_id=9" : 'Disallow\?'

This means that you have "Disallow \?" in your indexer.conf, so indexer does not accept URLs containing the "?" sign. Just remove it.
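The effect described in this reply can be reproduced with a small sketch. It assumes that a Disallow pattern is an ordinary regular expression searched anywhere in the link, which agrees with the log but is otherwise an assumption; `is_disallowed` is a hypothetical helper, not part of UdmSearch.

```python
import re


def is_disallowed(link, disallow_patterns):
    """Return True if any Disallow regex is found anywhere in the link."""
    return any(re.search(p, link) for p in disallow_patterns)


# Links produced by the HTDBList query, as they appear in the log.
links = [f"message.php?doc_id={n}" for n in range(1, 10)]

# With the "\?" rule every generated link is rejected.
with_rule = [l for l in links if not is_disallowed(l, [r"\?"])]

# With the rule removed all nine links are accepted.
without_rule = [l for l in links if not is_disallowed(l, [])]

print(len(with_rule), len(without_rule))
```

Every link generated by HTDBList contains "?", so a single `\?` rule is enough to stop all of them from being followed.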
UdmSearch: New message on the WebBoard #1: perl cgi
Author: BaggioOwen
Email:
Message:
I found that Udm supports Perl search frontends, but I don't know how to set one up. Can anyone tell me, please? Moreover, I need to add some features concerning the database; which file should I modify? Thank you for your attention.

Reply: http://search.mnogo.ru/board/message.php?id=541
UdmSearch: Indexing problem
Hello,

When I tried to index using udmsearch-3.1.5 under Linux (Mandrake 6) with MySQL, I got the following message:

    [root@Linux src]# ./indexer
    Indexer[985]: indexer from UdmSearch v.3.1.5/MySQL started with '/usr/local/udmsearch/etc/indexer.conf'
    Indexer[985] : [1] Error: '#1: parse error near 'LOCK TABLES url WRITE' at line 1'

In the 'url' table, I have just the name of the URL; the other fields are empty.

In my indexer.conf, I have:

    DBAddr mysql://root@localhost/udmsearch/
    FollowOutside yes
    server file:/etc
    server http://www.yahoo.com/

Thank you.
UdmSearch: New message on the WebBoard #1: Problem with Russian language
Author: Yakov Grusovsky
Email: [EMAIL PROTECTED]
Message:
My UdmSearch doesn't find words if the first letter is a capital "Я" (the last letter of the Russian alphabet). Help me!! I'm not lying (try http://194.44.181.3/search.shtml; this site is under construction). My site uses Win-1251 encoding, and I use a MySQL database and PHP4. Thanks. Sorry for my bad English.

Reply: http://search.mnogo.ru/board/message.php?id=0
UdmSearch: New message on the WebBoard #1: Problem with Russian language
Author: Yakov Grusovsky
Email: [EMAIL PROTECTED]
Message:
My UdmSearch doesn't find words if the first letter is a capital "Я", the last letter of the Russian alphabet. I use a MySQL database and PHP4, and all documents are in Win-1251 encoding.
URL: http://194.44.181.3/search.shtml
Help me!!! Thanks

Reply: http://search.mnogo.ru/board/message.php?id=0
Re: UdmSearch: New message on the WebBoard #1: Problem with Russian language
Hi!

Friday, October 20, 2000, 1:49:22 AM, you wrote:

YG> Author: Yakov Grusovsky
YG> Email: [EMAIL PROTECTED]
YG> My UdmSearch don't find words if first letter is big "Я"-(last letter of russian alphabet).Help me!! I'm don't lie
YG> ( try http://194.44.181.3/search.shtml
YG> this site is under construction)
YG> My site have Win-1251 encoding,i'm use MySQL database and PHP4.

Please specify what version of udmsearch you are using, and what frontend (with version).

--
Regards, Sergey aka gluke.