UdmSearch: Webboard: Crash!

2001-01-28 Thread Mario Gray

Author: Mario Gray
Email: [EMAIL PROTECTED]
Message:
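search 3.1.9 crashes with SIGSEGV (segmentation fault), and no core file is written.
I'm using gdb to trace it. It says the crash is somewhere in free() in glibc,
but when I do a "list" I only get to the top of void main() {...
I think there is a nasty memory bug somewhere (a crash inside free() usually
points to heap corruption or a double free rather than a plain leak).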
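A missing core file often just means the core-size limit is 0 (what "ulimit -c" reports). Below is a minimal sketch, assuming you can patch the program's startup code. It uses the POSIX getrlimit()/setrlimit() calls to raise the soft RLIMIT_CORE limit to the hard limit, so that a later SIGSEGV can leave a core behind. The helper name enable_core_dumps is hypothetical and not part of mnoGoSearch.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE limit up to the hard
 * limit so that a crash (e.g. SIGSEGV inside free()) can produce a core
 * file. An unprivileged process may raise its soft limit only as far as
 * the hard limit. Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;          /* soft limit := hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Call it at the top of main() before anything else runs. The core is then typically written to the process's current working directory, which must be writable by the user the CGI runs as.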



__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




UdmSearch: file: URL indexing bug in mnoGoSearch 3.1.9 / udmSearch 3.0.23

2001-01-28 Thread Kaspar Brand

Hi,

while experimenting with udmSearch/mnoGoSearch, I encountered the following
bug in how indexer processes file: URLs. While indexer correctly handles
file names containing special characters (which are escaped as %XX
sequences), it fails when such characters appear in the *directory* part
(dirname) of the URL. For example, given the following URLs:

file:/directory1/file%201  ("/directory1/file 1")
file:/directory%202/file2  ("/directory 2/file2")

indexer will mistakenly treat the second URL as a directory, which means it
rewrites it to "file:/directory%202/file2/" and tries to read its
(non-existing) directory contents. The overall result is that indexing such
files fails with status 500 (Internal Server Error).

Fixing this bug is simple (I'm attaching a patch for mnoGoSearch 3.1.9). I'm
not really proficient in C, but as far as I can see, this patch fixes the
problem. (In short: UdmFILEGet() in proto.c now uses "openname" [the
unescaped form of the current directory] instead of "filename" [the escaped
form] when putting together the directory name and the entries it reads from
it. This prevents it from mistakenly recognizing file entries as
(sub-)directory entries. I can elaborate on this fix if anybody is
interested.)
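The idea behind the fix can be sketched as follows. The sketch assumes that the unescaped directory name (what the patch calls "openname") is already at hand and that the entry type is decided with stat(); the real UdmFILEGet() may do this differently. The function name classify_entry is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper illustrating the fix: build "<dir>/<entry>" from the
 * *unescaped* directory name and let stat() decide what the entry is.
 * Returns 1 for a directory, 0 for anything else that exists, and -1 if
 * the path is too long or stat() fails (e.g. because the caller passed
 * an escaped name like "dir%202", which does not exist on disk). */
static int classify_entry(const char *unescaped_dir, const char *entry)
{
    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", unescaped_dir, entry);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path would be truncated */
    if (stat(path, &st) != 0)
        return -1;                      /* errno says why */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Take a directory called "dir 2" that contains file2. Then classify_entry("dir 2", "file2") reports a regular file. Passing the escaped "dir%202" makes stat() fail, because no directory of that name exists.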

Hope this helps. And yes, I like mnoGoSearch. Nice thing.

Regards,
Kaspar







UdmSearch: file: URL indexing bug in mnoGoSearch 3.1.9 / udmSearch 3.0.23 - here is the patch

2001-01-28 Thread Kaspar Brand

Hi,

sorry, I forgot to attach the patch in my previous message. Here it is.

Regards,
Kaspar

 mnogosearch-3.1.9-FILEGet.patch


UdmSearch: Webboard: Indexing a multilanguage site

2001-01-28 Thread Presedo Roberto

Author: Presedo Roberto
Email: [EMAIL PROTECTED]
Message:
Hello,

I would like to index a web site that is in several languages, but when a user asks
the engine to search in a given language, I would like to show only results in that
language.

How should I index the website, given that the main page of each language has
a different URL?

Do I have to store the data in different databases?

If all languages are stored in one database, will that slow down the queries?
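Because each language has its own main URL, one option worth considering is to derive the language from the URL itself instead of splitting the data into separate databases. Below is a minimal sketch of that mapping. It assumes a hypothetical site layout where each language sits under its own path prefix, such as /en/ or /fr/. The table and the function language_for_url are illustrative only and are not an mnoGoSearch feature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical URL-prefix -> language table for a site laid out as
 * http://www.example.com/en/..., /fr/..., /de/... Adjust to the real site. */
struct lang_prefix {
    const char *prefix;
    const char *lang;
};

static const struct lang_prefix LANG_PREFIXES[] = {
    { "http://www.example.com/en/", "en" },
    { "http://www.example.com/fr/", "fr" },
    { "http://www.example.com/de/", "de" },
};

/* Return the language code for url, or NULL if no prefix matches. */
static const char *language_for_url(const char *url)
{
    size_t i;

    for (i = 0; i < sizeof LANG_PREFIXES / sizeof LANG_PREFIXES[0]; i++) {
        const char *p = LANG_PREFIXES[i].prefix;
        if (strncmp(url, p, strlen(p)) == 0)
            return LANG_PREFIXES[i].lang;
    }
    return NULL;
}
```

For example, language_for_url("http://www.example.com/fr/index.html") returns "fr", and a URL outside the listed prefixes returns NULL.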

Thanks







UdmSearch: Webboard: ISO 10646

2001-01-28 Thread Presedo Roberto

Author: Presedo Roberto
Email: [EMAIL PROTECTED]
Message:
Hello,

I would like to index a web site whose charset is ISO-10646. Will this engine
index that charset if the MySQL database is configured to store it?

Thanks





UdmSearch: Webboard: ISO 10646

2001-01-28 Thread Alexander Barkov

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
> Hello,
> 
> I would like to index a web site whose charset is ISO-10646. Will this engine
> index that charset if the MySQL database is configured to store it?
> 
> 


ISO-10646 charset is not supported.





Re: UdmSearch: Speed and Indexes...

2001-01-28 Thread Matthew Sullivan


Hi,
Alexander Barkov wrote:
> Matthew Sullivan wrote:
> >
> > Alexander Barkov wrote:
> >
> > > Matthew Sullivan wrote:
> > > >
> > > > Alexander Barkov wrote:
> > > >
> > > > > How many records are in the ndict4 and ndict6 tables?
> > > > >
> > > > > You can find information on how to make search faster in doc/performance.txt
> > > > >
> > > > > Matthew Sullivan wrote:
> > > > > > ndict4 contains 2051909 rows
> > > > > > ndict6 contains 1415176 rows
> > >
> > > Big enough. Try sorting the table contents as described in
> > > performance.txt
> >
> > Already done this... (I run optimize.sh after every db update)
> Check what MySQL reports for "EXPLAIN SELECT * FROM ndict4 WHERE
> word_id=XXX"
> Probably something is wrong with the indexes.
mysql> explain select * from ndict6 where (word_id='-175892837');
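+--------+------+---------------+----------+---------+------------+------+-------+
| table  | type | possible_keys | key      | key_len | ref        | rows | Extra |
+--------+------+---------------+----------+---------+------------+------+-------+
| ndict6 | ref  | word_id6      | word_id6 |       4 | -175892837 | 2406 |       |
+--------+------+---------------+----------+---------+------------+------+-------+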
1 row in set (0.01 sec)
mysql>
(this is cached in memory)
mysql> explain select * from ndict7 where (word_id='2082865617');
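+--------+------+---------------+----------+---------+------------+------+-------+
| table  | type | possible_keys | key      | key_len | ref        | rows | Extra |
+--------+------+---------------+----------+---------+------------+------+-------+
| ndict7 | ref  | word_id7      | word_id7 |       4 | 2082865617 |   19 |       |
+--------+------+---------------+----------+---------+------------+------+-------+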
1 row in set (0.02 sec)
mysql>
This one wasn't cached.
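Sorting the table contents, as Alexander suggests, can matter because the disk is read in blocks. Fetching a word's rows therefore costs roughly one read per distinct block those rows occupy, not one read per row. Below is a minimal sketch of that count. It assumes a hypothetical fixed number of rows per block and identifies rows by their position in the data file; real MySQL storage is more involved.

```c
#include <stddef.h>

/* Count the distinct disk blocks touched when fetching rows stored at the
 * given positions (row numbers within the data file), assuming a
 * hypothetical fixed rows_per_block. positions must be sorted ascending
 * so that rows sharing a block are adjacent in the array. */
static size_t blocks_touched(const unsigned long *positions, size_t n,
                             unsigned long rows_per_block)
{
    size_t count = 0;
    unsigned long last_block = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long block = positions[i] / rows_per_block;
        if (count == 0 || block != last_block) {
            count++;
            last_block = block;
        }
    }
    return count;
}
```

Take the 2406 rows of the ndict6 example and an assumed 100 rows per block. Rows stored next to each other touch 25 blocks. Rows spread one per block need 2406 separate reads.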
 
--
Yours
Matthew
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Any connection between your reality and mine is purely coincidental.
 



UdmSearch: Webboard: Speed and scalability (practical situation)

2001-01-28 Thread Catalin Braescu

Author: Catalin Braescu
Email: [EMAIL PROTECTED]
Message:
Hardware = K6-2/500 MHz, 96 MB RAM, IDE HDD at 5,400 RPM. Software = Red Hat 7.0,
PostgreSQL 7.0.2, mnoGoSearch 3.1.8.

The problem is that a single-word query takes from 3 seconds to 35 seconds, depending
on how common that word is. The database has roughly 10,000 documents indexed (I am not
entirely sure of the count), and the index size is about 1 GB.

My questions are: is this speed normal? Is this index size normal for such a small
number of documents? What should I do to improve search speed (most important
for me) and to keep the index small enough to fit on a regular hard disk?
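A back-of-the-envelope calculation helps put these figures in perspective. The sketch below takes the numbers from the message (about 1 GB of index for about 10,000 documents) and computes the index bytes per document. It also computes how many random disk reads a given query time corresponds to, assuming a hypothetical average access time per read. For a 5,400 RPM disk, rotational latency alone averages half a revolution, about 5.6 ms, and seek time comes on top of that. The function names are hypothetical.

```c
/* Back-of-the-envelope figures for the setup described above.
 * Inputs are the approximate numbers from the message plus one
 * assumption: the average time per random disk read. */

/* Average rotational latency in milliseconds: half a revolution. */
static double rotational_latency_ms(double rpm)
{
    return 0.5 * 60000.0 / rpm;
}

/* Index bytes per indexed document. */
static double bytes_per_document(double index_bytes, double documents)
{
    return index_bytes / documents;
}

/* Number of random reads that would account for query_seconds,
 * given an assumed average access time per read in milliseconds. */
static double equivalent_random_reads(double query_seconds, double access_ms)
{
    return query_seconds * 1000.0 / access_ms;
}
```

About 1 GB for about 10,000 documents works out to roughly 100 KB of index per document. If each random read costs, say, 15 ms (an assumed figure), a 35-second query corresponds to roughly 2,300 random reads.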





UdmSearch: Webboard: Differences between ASPSEEK and MNOGOSEARCH

2001-01-28 Thread Catalin Braescu

Author: Catalin Braescu
Email: [EMAIL PROTECTED]
Message:
What DBMS (database) does ASPSeek use? What hardware would you suggest for a Linux box
that has to handle about 400,000 documents? How large will the index file be? Many
thanks for any hints.





UdmSearch: Webboard: Performance: cache db

2001-01-28 Thread Catalin Braescu

Author: Catalin Braescu
Email: [EMAIL PROTECTED]
Message:
Any news on this issue? I am also thinking about using built-in mode, and now I am
concerned.
