Re: [OT] Re: search.cpan.org
On Tue, 27 Nov 2001, Nick Tonkin wrote:

> Well, ask Ask if you want the whole truth. But when I asked him that's
> what he said. Maybe there's a problem with the architecture and some
> pre-indexing is done per session or something suboptimal like that.

Ask? No, Robert is right. It's just searches that are doing a full scan of the database.

I know Graham is working on a better search system. If Bill got swish-e to support incremental database updates I'm sure it would help. ;-)

 - ask

--
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com
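To illustrate the point about full scans versus an index that accepts incremental updates, here is a minimal Python sketch. It is not how search.cpan.org or swish-e is implemented; the module names and descriptions are made-up sample data, and `full_scan` and `InvertedIndex` are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def full_scan(docs, word):
    """Examine every document on every query; cost grows with total text size."""
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in tokenize(text))


class InvertedIndex:
    """Map each word to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Incremental update: index one new document without rebuilding."""
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, word):
        """A single dictionary lookup instead of a pass over all documents."""
        return sorted(self.postings.get(word.lower(), ()))


# Made-up sample data, for illustration only.
docs = {
    "HTML::Parser": "HTML parser class",
    "XML::Parser": "A perl module for parsing XML documents",
    "DBI": "Database independent interface for Perl",
}

index = InvertedIndex()
for doc_id, text in docs.items():
    index.add(doc_id, text)
```

Both approaches return the same hits; the difference is that the full scan re-reads all the text per query, while the index pays that cost once at `add` time.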
Re: [OT] Re: search.cpan.org
On Tue, 27 Nov 2001, Bill Moseley wrote:

> At 12:55 PM 11/27/01 -0800, Nick Tonkin wrote:
> >
> >Because it does a full text search of all the contents of the DB.
>
> Perhaps, but it's just overloaded.

I think the load, and the network connection, is the main reason; the search itself, if you were connected locally at a time when the machine isn't so busy, is pretty quick.

> I'm sure he's working on it, but anyone want to offer Graham free hosting?
> A few mirrors would be nice, too.

They (Graham and Elaine) are aware that it can be slow at times, and have set up at least one mirror site to help spread the load.

best regards,
randy kobes
Re: [OT] Re: search.cpan.org
At 09:02 PM 11/27/01 +, Mark Maunder wrote:
>I'm using it on our site and searching fulltext
>indexes on three fields (including a large text field) in under 3 seconds on over
>70,000 records on a p550 with 490 megs of ram.
>
>

Hi Mark,

Some day if you are bored, try indexing with swish-e (the development version).

http://swish-e.org

The big problem with it right now is that it doesn't do incremental indexing. One of the developers is trying to get that working within a few weeks. But for most small sets of files it's not an issue, since indexing is so fast.

My favorite feature is that it can run an external program, such as a perl mbox or html parser, a perl spider, a DBI program or whatever, to get the source to index.

Use it with Cache::Cache and mod_perl and it's nice and fast from page to page of results.

Here's indexing only 24,000 files:

> ./swish-e -c u -i /usr/doc
Indexing Data Source: "File-System"
Indexing "/usr/doc"
270279 unique words indexed.
4 properties sorted.
23840 files indexed. 177638538 total bytes.
Elapsed time: 00:03:50 CPU time: 00:03:16
Indexing done!

Here's searching:

> ./swish-e -w install -m 1
# SWISH format: 2.1-dev-24
# Search words: install
# Number of hits: 2202
# Search time: 0.006 seconds
# Run time: 0.011 seconds

A phrase:

> ./swish-e -w '"public license"' -m 1
# SWISH format: 2.1-dev-24
# Search words: "public license"
# Number of hits: 348
# Search time: 0.007 seconds
# Run time: 0.012 seconds
998 /usr/doc/packages/ijb/gpl.html "gpl.html" 26002

A wild card and boolean search:

> ./swish-e -w 'sa* or java' -m 1
# SWISH format: 2.1-dev-24
# Search words: sa* or java
# Number of hits: 7476
# Search time: 0.082 seconds
# Run time: 0.087 seconds

Or a good number of results:

> ./swish-e -w 'is or und or run' -m 1
# SWISH format: 2.1-dev-24
# Search words: is or und or run
# Number of hits: 14477
# Search time: 0.084 seconds
# Run time: 0.089 seconds

Or everything:

> ./swish-e -w 'not dksksks' -m 1
# SWISH format: 2.1-dev-24
# Search words: not dksksks
# Number of hits: 23840
# Search time: 0.069 seconds
# Run time: 0.074 seconds

This is pushing the limit for little old swish, but here's indexing a few more very small xml files (~150 bytes each):

3830016 files indexed. 582898349 total bytes.
Elapsed time: 00:48:22 CPU time: 00:44:01

Bill Moseley
mailto:[EMAIL PROTECTED]
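The remark about fast page-to-page results comes down to running the search once and serving later pages from a cache. The sketch below shows that pattern in Python as a stand-in; it does not use the Perl Cache::Cache API or mod_perl, and `ResultCache`, `fetch_page`, `run_search`, and the 300-second expiry are all assumptions made for illustration.

```python
import time


class ResultCache:
    """Tiny in-memory cache with a time-to-live, keyed by query string."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # query -> (expires_at, list of hits)

    def get(self, query):
        """Return cached hits, or None if missing or expired."""
        entry = self.entries.get(query)
        if entry is None:
            return None
        expires_at, hits = entry
        if self.clock() >= expires_at:
            del self.entries[query]
            return None
        return hits

    def set(self, query, hits):
        """Store the full hit list for a query until the TTL runs out."""
        self.entries[query] = (self.clock() + self.ttl, hits)


def fetch_page(cache, run_search, query, page, per_page=10):
    """Return one page of hits, calling run_search only on a cache miss.

    run_search is a hypothetical callable that takes a query string and
    returns the complete list of hits (for example, by invoking a search
    engine and parsing its output).
    """
    hits = cache.get(query)
    if hits is None:
        hits = run_search(query)
        cache.set(query, hits)
    start = (page - 1) * per_page
    return hits[start:start + per_page]
```

Caching the whole hit list rather than individual pages means a user stepping through results triggers one search, at the price of holding the full list in memory for the lifetime of the entry.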
Re: [OT] Re: search.cpan.org
Nick Tonkin wrote:

> Because it does a full text search of all the contents of the DB.
>

Not sure what he's using for a back end, but mysql 4.0 (in alpha) has very fast and feature-rich full text searching now, so perhaps he can migrate to that once it's released sometime in December. I'm using it on our site and searching fulltext indexes on three fields (including a large text field) in under 3 seconds on over 70,000 records on a p550 with 490 megs of RAM.
Re: [OT] Re: search.cpan.org
At 12:55 PM 11/27/01 -0800, Nick Tonkin wrote:
>
>Because it does a full text search of all the contents of the DB.

Perhaps, but it's just overloaded. I'm sure he's working on it, but anyone want to offer Graham free hosting? A few mirrors would be nice, too.

(Plus, all my CPAN.pm setups are now failing to work, too)

Bill Moseley
mailto:[EMAIL PROTECTED]