At 09:02 PM 11/27/01 +0000, Mark Maunder wrote:
>I'm using it on our site and searching fulltext
>indexes on three fields (including a large text field) in under 3 seconds
>on over 70,000 records on a p550 with 490 megs of ram.
>
>
Hi Mark,

<plug>

Some day if you are bored, try indexing with swish-e (the development
version).
http://swish-e.org

The big problem with it right now is that it doesn't do incremental indexing.
One of the developers is trying to get that working within a few weeks.
But for most small sets of files it's not an issue, since indexing is so fast.

My favorite feature is that it can run an external program, such as a Perl
mbox or HTML parser, a Perl spider, a DBI program, or whatever, to get the
source to index.  Use it with Cache::Cache and mod_perl and it's nice and
fast from page to page of results.
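
The page-to-page speedup from caching can be sketched roughly as follows. This is only an illustration, written in Python rather than Perl with Cache::Cache. The `ResultCache` class, the `run_search` callable, and the 300-second expiry are hypothetical names and values, not part of swish-e or Cache::Cache. The idea is to run the search once, keep the full hit list keyed by the query, and slice later pages out of the cached list.

```python
import time


class ResultCache:
    """Tiny in-memory cache with per-entry expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds      # assumed expiry, pick what suits the site
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # query -> (expires_at, hits)

    def get(self, query):
        entry = self._store.get(query)
        if entry is None:
            return None
        expires_at, hits = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[query]
            return None
        return hits

    def set(self, query, hits):
        self._store[query] = (self.clock() + self.ttl, hits)


def page_of_results(cache, run_search, query, page, per_page=10):
    """Return one page of hits, running the search only on a cache miss.

    run_search is a hypothetical callable that takes the query string and
    returns the complete list of hits (for example by invoking swish-e).
    Pages are numbered from 1.
    """
    hits = cache.get(query)
    if hits is None:
        hits = run_search(query)
        cache.set(query, hits)
    start = (page - 1) * per_page
    return hits[start:start + per_page]
```

With this shape, only the first page of a query pays for the search; later pages of the same query are list slices until the entry expires.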

Here's indexing only 24,000 files:

> ./swish-e -c u -i /usr/doc
Indexing Data Source: "File-System"
Indexing "/usr/doc"
270279 unique words indexed.
4 properties sorted.
23840 files indexed.  177638538 total bytes.
Elapsed time: 00:03:50 CPU time: 00:03:16
Indexing done!

Here's searching:

> ./swish-e -w install -m 1
# SWISH format: 2.1-dev-24
# Search words: install
# Number of hits: 2202
# Search time: 0.006 seconds
# Run time: 0.011 seconds

A phrase:

> ./swish-e -w '"public license"' -m 1
# SWISH format: 2.1-dev-24
# Search words: "public license"
# Number of hits: 348
# Search time: 0.007 seconds
# Run time: 0.012 seconds
998 /usr/doc/packages/ijb/gpl.html "gpl.html" 26002
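
For anyone wanting to feed this output into a script, here is a minimal parsing sketch in Python. It assumes the layout seen in the transcript above: `# Name: value` header lines, then result lines that appear to be rank, path, quoted title, and size in bytes. Those field meanings are inferred from the single sample line, and the function name `parse_swish_output` is hypothetical; check the swish-e documentation for the output format of your version.

```python
import re

# "# Number of hits: 348" -> ("Number of hits", "348")
HEADER_RE = re.compile(r"^# ([^:]+): (.*)$")
# '998 /path/file.html "title" 26002' -> rank, path, title, size (assumed)
RESULT_RE = re.compile(r'^(\d+) (.+) "([^"]*)" (\d+)$')


def parse_swish_output(text):
    """Split search output into a header dict and a list of result dicts."""
    headers = {}
    results = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER_RE.match(line)
        if header:
            headers[header.group(1)] = header.group(2)
            continue
        result = RESULT_RE.match(line)
        if result:
            rank, path, title, size = result.groups()
            results.append({
                "rank": int(rank),
                "path": path,
                "title": title,
                "size": int(size),
            })
    return headers, results
```

Lines that match neither pattern are skipped, so blank lines and any trailing end-of-results marker do not cause errors.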


A wild card and boolean search:

> ./swish-e -w 'sa* or java' -m 1
# SWISH format: 2.1-dev-24
# Search words: sa* or java
# Number of hits: 7476
# Search time: 0.082 seconds
# Run time: 0.087 seconds

Or a good number of results:

> ./swish-e -w 'is or und or run' -m 1
# SWISH format: 2.1-dev-24
# Search words: is or und or run
# Number of hits: 14477
# Search time: 0.084 seconds
# Run time: 0.089 seconds

Or everything:

> ./swish-e -w 'not dksksks' -m 1
# SWISH format: 2.1-dev-24
# Search words: not dksksks
# Number of hits: 23840
# Search time: 0.069 seconds
# Run time: 0.074 seconds


This is pushing the limit for little old swish, but here's indexing a few
more very small XML files (~150 bytes each):

3830016 files indexed.  582898349 total bytes.
Elapsed time: 00:48:22 CPU time: 00:44:01
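
To put the two indexing runs side by side, the throughput can be worked out from the reported totals and elapsed times. The sketch below is plain arithmetic on the numbers printed above; the helper names `elapsed_seconds` and `throughput` are hypothetical.

```python
def elapsed_seconds(hms):
    """Convert an "HH:MM:SS" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput(files, total_bytes, elapsed):
    """Derive simple rates from the figures swish-e prints after indexing."""
    secs = elapsed_seconds(elapsed)
    return {
        "files_per_sec": files / secs,
        "bytes_per_sec": total_bytes / secs,
        "avg_bytes_per_file": total_bytes / files,
    }


# Figures copied from the two indexing runs in this message.
usr_doc_run = throughput(23840, 177638538, "00:03:50")
small_xml_run = throughput(3830016, 582898349, "00:48:22")

for name, stats in (("/usr/doc", usr_doc_run), ("small xml", small_xml_run)):
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

This works out to roughly 104 files per second for the /usr/doc run and about 1,320 files per second for the XML run, at an average of about 152 bytes per file, which agrees with the "~150 bytes each" estimate.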

</plug>

Bill Moseley
mailto:[EMAIL PROTECTED]