Re: Strategy for handling large (and growing) index: horizontal partitioning?
How many documents are in the index? If you haven't already done this, I'd take a really close look at your schema and make sure you're only storing the things that should really be stored, and the same with the indexed fields. I drastically reduced my index size just by changing some indexed/stored options on a few fields.

On Thu, Feb 28, 2008 at 10:54 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

James, I can't comment more on the SN's arch choices. Here is the story about your questions: 1 Solr instance can hold 1+ indices, either via JNDI (see the Wiki) or via the new multi-core support, which works but is still being hacked on. See SOLR-303 in JIRA for distributed search. Yonik committed it just the other day, so it's now in nightly builds if you want to try it. There are 2 Wiki pages about that, too; see the Recent Changes log on the Wiki to quickly find them.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: James Brady [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, February 29, 2008 1:11:07 AM
Subject: Re: Strategy for handling large (and growing) index: horizontal partitioning?

Hi Otis, Thanks for your comments -- I didn't realise the wiki is open to editing; my apologies. I've put in a few words to try and clear things up a bit. So determining n will probably be a best guess followed by trial and error; that's fine. I'm still not clear about whether single Solr servers can operate across several indices, however... can anyone give me some pointers here?

An alternative would be to have 1 index per instance, and n instances per server, where n is small. This might actually be a practical solution -- I'm spending ~20% of my time committing, so I should probably only have 3 or 4 indices in total per server to avoid two committing at the same time.

Your mention of The Large Social Network was interesting! A social network's data is by definition pretty poorly partitioned by user id, so unless they've done something extremely clever, like co-locating social cliques in the same indices, I would have thought it would be a sub-optimal architecture. If my friends and I are scattered around different indices, each search would have to be federated massively.

James

On 28 Feb 2008, at 20:49, Otis Gospodnetic wrote:

James, Regarding your questions about n users per index - this is a fine approach. The largest Social Network that you know of uses the same approach for various things, including full-text indices (not Solr, but close). You'd have to maintain a user-to-shard/index mapping somewhere, of course. What should the n be, you ask? Look at the overall index size, I'd say, against server capabilities (RAM, disk, CPU), and increase n up to the point where you're maximizing your hardware at some target query rate.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: James Brady
To: solr-user@lucene.apache.org
Sent: Wednesday, February 27, 2008 10:08:02 PM
Subject: Strategy for handling large (and growing) index: horizontal partitioning?

Hi all, Our current setup is a master and slave pair on a single machine, with an index size of ~50GB. Query and update times are still respectable, but commits are taking ~20% of time on the master, while our daily index optimise can take up to 4 hours... Here's the most relevant part of solrconfig.xml: true 10 1000 1 1

I've given both master and slave 2.5GB of RAM. Does an index optimise read and re-write the whole thing? If so, taking about 4 hours is pretty good!
However, the documentation here: http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten+minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b states "Optimizations can take nearly ten minutes to run...", which leads me to think that we've grossly misconfigured something...

Firstly, we would obviously love any way to reduce this optimise time - I have yet to experiment extensively with the settings above, and with optimise frequency, but some general guidance would be great.

Secondly, this index size is increasing monotonically over time as we acquire new users. We need to take action to ensure we can scale in the future. The approach we're favouring at the moment is horizontal partitioning of indices by user id, as our data suits this scheme well. A given index would hold the indexed data for n users, where n would probably be between 1 and 100 users, and we will have multiple indices per search server. Running a server per index is impractical, especially for a small n, so is a single Solr instance capable of managing multiple indices?
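For readers tuning the same knobs, the merge-related settings in a Solr 1.x solrconfig.xml look roughly like this minimal sketch; every value below is an illustrative assumption, not the poster's actual configuration:

    <indexDefaults>
      <!-- false = one file per segment component; true trades speed for fewer file handles -->
      <useCompoundFile>false</useCompoundFile>
      <!-- segments allowed to accumulate before a merge; lower means fewer segments to optimize -->
      <mergeFactor>10</mergeFactor>
      <!-- documents buffered in RAM before a new segment is flushed -->
      <maxBufferedDocs>1000</maxBufferedDocs>
      <maxMergeDocs>2147483647</maxMergeDocs>
      <maxFieldLength>10000</maxFieldLength>
    </indexDefaults>

A lower mergeFactor does more merge work during indexing but keeps the segment count down, which generally shortens optimize times. And yes: an optimize rewrites the whole index into a single segment, so hours for ~50GB is slow but not absurd.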
solr not finding all results
I've found an odd situation where solr is not returning all of the documents that I think it should. A search for Geckoplp4-M returns 3 documents but I know that there are at least 100 documents with that string. Here is an example query for that phrase and the result set:

http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">comments,id</str>
      <str name="q">Geckoplp4-M</str>
      <str name="version">2.2</str>
     </lst>
    </lst>
    <result name="response" numFound="3" start="0">
     <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816500</str>
     </doc>
     <doc>
      <str name="comments">toptrax recordings. Same tracks. Geckoplp4-M</str>
      <str name="id">m2816544</str>
     </doc>
     <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2815903</str>
     </doc>
    </result>
    </response>

Now here's an example of a search for two documents that I know have that string, but were not returned in the previous search:

http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">1</int>
     <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">id,comments</str>
      <str name="q">id:m2816615 OR id:m2816611</str>
      <str name="version">2.2</str>
     </lst>
    </lst>
    <result name="response" numFound="2" start="0">
     <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816611</str>
     </doc>
     <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816615</str>
     </doc>
    </result>
    </response>

Here is the definition for the comments field:

    <field name="comments" type="text" indexed="true" stored="true"/>

And here is the definition for a text field:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
      </analyzer>
    </fieldtype>

Any ideas? Am I doing something wrong?

thanks, Kevin
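When the index-time and query-time analyzers differ like they do here, the first thing to rule out is tokenization: Solr's admin analysis page (admin/analysis.jsp on a stock install) shows the token stream after each filter for both analyzers. As a sketch of my reading of the WordDelimiterFilterFactory settings above (worth verifying on the analysis page, not offered as a diagnosis):

    input:    Geckoplp4-M
    subwords: Geckoplp / 4 / M    (splits at the hyphen and the letter-digit boundaries)
    index:    geckoplp, 4, m      (no adjacent word parts, so catenateWords=1 adds nothing here)
    query:    geckoplp, 4, m

If both sides really do produce the same tokens, the analyzer is ruled out and the problem is upstream, in how the documents were fed to Solr.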
Re: solr not finding all results
Sorry, I've figured out my own problem. There was a problem with the way I create the XML document for indexing that was causing some of the comments fields to not be listed correctly in the default search field, content.

On 10/12/07, Kevin Lewandowski [EMAIL PROTECTED] wrote: I've found an odd situation where solr is not returning all of the documents that I think it should. A search for Geckoplp4-M returns 3 documents but I know that there are at least 100 documents with that string.
Re: index size
To achieve this I have to keep the document field set to stored, right?

Yes, the field needs to be stored to return snippets.

When I do this my index becomes huge - a 10 GB index, because I have 10K docs but each is very lengthy HTML. Is there any better solution? Why is the index created by Nutch so small in comparison (about 27 MB) but it still returns snippets!

Are you storing the complete HTML? If so, I think you should strip out the HTML and then index the document.

On 10/9/07, Kevin Lewandowski [EMAIL PROTECTED] wrote: Late reply on this but I just wanted to say thanks for the suggestions. I went through my whole schema and was storing things that didn't need to be stored and indexing a lot of things that didn't need to be indexed. Just completed a full reindex and it's a much more reasonable size now. Kevin

On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote: On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields.

An ls -sh will tell you roughly where the space is being occupied. There is something strange going on: 2.5kB * 2.7m is only ~7GB, and I have trouble imagining where the 30-fold index size expansion is coming from. -Mike
Re: index size
Late reply on this but I just wanted to say thanks for the suggestions. I went through my whole schema and was storing things that didn't need to be stored and indexing a lot of things that didn't need to be indexed. Just completed a full reindex and it's a much more reasonable size now. Kevin

On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote: On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields.

An ls -sh will tell you roughly where the space is being occupied. There is something strange going on: 2.5kB * 2.7m is only ~7GB, and I have trouble imagining where the 30-fold index size expansion is coming from. -Mike
index size
Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields. thanks, Kevin
Re: Snapshooting or replicating recently indexed data
snapshooter does create incremental builds of the index. It doesn't appear so if you look at the contents, because the existing files are hard links. But it is incremental.

On 4/20/07, Doss [EMAIL PROTECTED] wrote: Hi Yonik, Thanks for your quick response. My question is this: can we take incremental backup/replication in Solr? Regards, Doss. M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise)

- Original Message -
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, April 19, 2007 7:42 PM
Subject: Re: Snapshooting or replicating recently indexed data

On 4/19/07, Doss [EMAIL PROTECTED] wrote: It seems the snapshooter takes an exact copy of the indexed data, that is, all the contents inside the index directory. How can we take only the recently added ones?

    ...
    cp -lr ${data_dir}/index ${temp}
    mv ${temp} ${name}
    ...

I don't quite understand your question, but since hard links are used, it's more like pointing to the index files instead of copying them. Rsync is used as a transport to only move the files that changed from the master to the slaves. -Yonik
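Hard links are also easy to verify from the shell. A quick sketch with made-up paths, using the same cp -lr trick as snapshooter:

    # snapshot the index via hard links (what snapshooter does)
    cp -lr /var/solr/data/index /var/solr/data/temp
    mv /var/solr/data/temp /var/solr/data/snapshot.20070420
    # du counts shared inodes once, so the snapshot adds almost no disk
    du -sh /var/solr/data/index /var/solr/data/snapshot.20070420
    # a link count greater than 1 confirms each segment file is shared
    ls -l /var/solr/data/snapshot.20070420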
Re: Facet Browsing
I recommend you build your query with facet options in raw format and make sure you're getting back the data you want. Then build it into your app. On 4/18/07, Jennifer Seaman [EMAIL PROTECTED] wrote: Does anyone have any sample code (php, perl, etc) how to setup facet browsing with paging? I can't seem to get things like facet.mincount to work. Thank you. Jennifer Seaman
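To make "raw format" concrete: faceting is just extra parameters on a select request. A sketch against a stock install, with a made-up field name:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.mincount=1&facet.limit=20&indent=on

Once the counts in the raw XML look right, paging the matching documents is the usual start/rows parameters, and only then is it worth wiring the query into PHP or Perl.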
Re: Incremental replication...
snapshooter copies all files, but most files in the snapshot directories are hard links pointing to segments in the main index directory, so only new segments end up getting copied. We've been running replication on discogs.com for several months and it works great.

On 2/13/07, escher2k [EMAIL PROTECTED] wrote: I was wondering if the scripts provided in Solr do incremental replication. Looking at the script for snapshooter, it seems like the whole index directory is copied over. Is that correct? If so, isn't performance a problem over the long run? Thanks for the clarification in advance (I hope I am wrong!!).
Re: replication
This should explain most everything: http://wiki.apache.org/solr/CollectionDistribution

I've been running solr replication on discogs.com for a few months and it works great! Kevin

On 1/23/07, S Edirisinghe [EMAIL PROTECTED] wrote: Hi, I just started looking into solr. I like the features that have been listed. I'm interested in how the replication feature is implemented, since I have built my own replication for lucene using unix rsync scripts. Where would the best starting point be to find out how replication of the search indices is done? thanks
Re: solr/tomcat stops responding
Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing else but generate a stack trace to the console or a log file. If you don't start tomcat by hand, the stack trace may go somewhere else, I suppose. This would be useful to learn how to do on your particular system (and we should add it to a debugging/troubleshooting wiki too).

Okay, I figured out how to get the thread dump. It was in the tomcat logfile. I'm attaching it here.

Are you load-balancing at all, or is this your only search server? FYI, I'm looking into something that will help.

I'm load balancing two solr servers.

thanks, Kevin

Attachment: thread_dump.txt.gz (GNU Zip compressed data)
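For anyone else hunting this, the procedure on a stock Tomcat install is roughly the following (the log path is an assumption; yours may differ):

    # find the tomcat jvm's pid
    ps aux | grep java
    # SIGQUIT asks the JVM for a thread dump; it does not kill the process
    kill -3 <pid>
    # tomcat redirects the JVM's stdout, so the dump lands in catalina.out
    tail -300 /usr/local/tomcat/logs/catalina.out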
Re: solr/tomcat stops responding
accept connections for 3 or 4 hours ... did you try taking some thread dumps like yonik suggested to see what all the threads were doing?

A kill -3 will not kill the process. It does nothing and there's no thread dump on the console. kill -9 does kill it though.

btw, this has been a bigger problem for me because there's a separate hardware issue and the system freezes about every 12 hours, so I have to reboot it. After that I noticed solr not responding. I've done a temporary fix for this by running a proxy in front of tomcat. Then I updated my system startup to start solr, wait 20 seconds, run a few queries, wait 20 seconds, then start the proxy. This is working fine now. But I'd still like to fix the real problem. Let me know if there's anything else I can test or information I can provide.

thanks, Kevin
Re: solr/tomcat stops responding
My solr installation has been running fine for a few weeks, but now after a server reboot it starts and runs for a few seconds, then stops responding. I don't see any errors in the logfiles, apart from snapinstaller not being able to issue a commit. Also, the process is using 100% cpu and stops responding to http requests (admin interface and queries).

Okay, some more happened after I sent this email. About 3 hours after the reboot solr started running normally again. Then I rebooted it to see if I could reproduce it. This time solr remained in the not-responding state for about 4 hours, but I did not wait longer to see if it would come back.

- check what got changed after the server reboot... anything?

Nothing had been changed on the server.

Part of the fix for this has recently been committed into Lucene (multiple threads won't generate the same FieldCache entry).

Has that been added to solr yet? I'm running solr-2006-11-20.

To see if this is your problem, restart the server and make sure no traffic goes to it. Then run some queries of the same type that will be hitting it to warm it up, then turn on normal traffic.

Okay, I did that. Shut off traffic to the server, restarted solr, ran a few queries against it, then turned traffic back on, and it's running fine now. So maybe the initial flood of requests has something to do with it?

thanks, Kevin
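The manual warm-up described here can be automated: solrconfig.xml can register queries that run against a new searcher before it takes traffic. A minimal sketch, with a placeholder query:

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">some typical user query</str>
          <str name="start">0</str><str name="rows">10</str>
        </lst>
      </arr>
    </listener>

A matching listener on the newSearcher event covers the post-commit case, so the first real request never hits a completely cold searcher.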
Re: Cache stats
In the admin interface, if you click statistics, there's a cache section.

On 11/29/06, Tom [EMAIL PROTECTED] wrote: Hi - I'm starting to try to tune my installation a bit, and I'm looking for cache statistics. Is there a way to peek into a running installation and see what my cache stats are? I'm looking for the usual cache hits/cache misses sort of thing. Also, on a related note, I was looking for solr info via mbeans. I fired up jconsole, and I can see all sorts of tomcat mbeans, but nothing for solr. Is there something extra I have to do to turn this on? I see things implementing SolrInfoMBean, so I'm assuming there is something there. (Off topic, but suggestions for anything better than JConsole also welcome.) Thanks, Tom
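For reference, each cache's entry on the statistics page reports counters along these lines (the values here are made up for illustration):

    name:  filterCache
    class: org.apache.solr.search.LRUCache
    stats: lookups=12034  hits=11456  hitratio=0.95
           inserts=601    evictions=0  size=578
           cumulative_lookups=48220  cumulative_hits=45903

hitratio and evictions are usually the two to watch: a low ratio or constant eviction means the cache is sized wrong for the query mix.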
Minimum time between distributions
On Discogs I'm running Solr with two slaves and one master, using the distribution scripts. The slaves pull and install a new snapshot every five minutes and this is working very well so far. Are there any risks with reducing this window to every one or two minutes? With large caches could the autowarming take longer than one or two minutes? It isn't a business need to reduce the window but I'm just curious about the feasibility and risks. How often do other people run snappuller and snapinstaller? thanks, Kevin
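For anyone curious what that setup looks like, the five-minute pull is typically just cron driving the distribution scripts on each slave; the install path below is an assumption, and the master host/port details live in conf/scripts.conf:

    # slave crontab: pull the newest snapshot and install it every 5 minutes
    */5 * * * * /opt/solr/bin/snappuller && /opt/solr/bin/snapinstaller

On the risk raised here: if autowarmCount on the caches is large, autowarming a new searcher can indeed outlast a one- or two-minute window, so it's worth watching the searcher's warmupTime on the admin statistics page before tightening the schedule.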
Re: Spellchecker in Solr?
I have not done one but have been planning to do it based on this article: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html With Solr it would be much simpler than the java examples they give. On 10/30/06, Michael Imbeault [EMAIL PROTECTED] wrote: Hello everyone, Has anybody successfully implemented a Lucene spellchecker within Solr? If so, could you give details on how one would achieve this? If not, is it planned to make it as standard within Solr? Its a feature almost every Solr application would want to use, so I think it would be a nice idea. Sadly, I'm no Java developer, so I fear I won't be the one coding that :( Thanks, -- Michael Imbeault CHUL Research Center (CHUQ) 2705 boul. Laurier Ste-Foy, QC, Canada, G1V 4G2 Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Spellchecker in Solr?
I had the very same article in mind - how would it be simpler in Solr than in Lucene? A spellchecker is pretty much standard in every major

I meant it would be a simpler implementation in Solr because you don't have to deal with java or any Lucene APIs. You just create a document for each correct word. For example, the word lettuce would have a document:

    <doc>
      <field name="word">lettuce</field>
      <field name="start3">let</field>
      <field name="gram3">let ett ttu tuc uce</field>
      <field name="end3">uce</field>
      <field name="start4">lett</field>
      <field name="gram4">lett ettu ttuc tuce</field>
      <field name="end4">tuce</field>
    </doc>

Then you query Solr using the same syntax they describe in the article. Anyway, I haven't done this or tested it, but when reading that article I thought it would be much easier to implement using Solr, at least for me, since I already have a database of correct words in Solr.

Kevin
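Following that scheme, the lookup is an ordinary query over the gram fields with the boundary grams boosted, as in the article. A sketch for the misspelling "letuce", using the field names from the document above (the boost values are illustrative, and like Kevin I haven't tested this):

    q=start3:let^2.0 OR end3:uce^1.5 OR gram3:let OR gram3:etu OR gram3:tuc OR gram3:uce

The top-scoring word documents are the spelling suggestions; no custom Java involved.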
Re: Solr use case
No, after you add new documents you simply issue a <commit/> command and the new docs are searchable.

On Discogs.com we have just over 1 million docs in the index and do about 20,000 updates per day. Every 15 minutes we read a queue and add new documents, then commit. And we optimize once per day. I've had no problems with that. Kevin

On 10/11/06, climbingrose [EMAIL PROTECTED] wrote: Hi all, Is it true that Solr is mainly used for applications that rarely change the underlying data? As I understand it, if you submit new data or modify existing data on the Solr server, you would have to refresh the cache somehow to display the updated data. If my application frequently gets new data/updates from users, should I use Solr? I love faceted browsing and dynamic properties so much, but I need to justify the choice of Solr. Thanks. By the way, does anyone have any performance measures that can be shared (apart from the one on the Wiki)? As I estimate it, my application probably has half a million docs, each of which has around 15 properties. Does anyone know the type of hardware I would need for reasonable performance? Thanks. -- Regards, Cuong Hoang
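For completeness, that add/commit/optimize cycle is plain HTTP against the update handler; the host and port here assume the example install:

    # add or update documents from a file
    curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @docs.xml
    # make the new documents searchable
    curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>'
    # once a day, merge segments back down
    curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<optimize/>'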
Re: Couple of problems
I've had a problem similar to this and it was because of the schema.xml. It was valid XML but there were some incorrect field definitions and/or the default field listed was not a defined field. I'd suggest you start with the default schema and build on it piece by piece, each time testing for the error with a ping operation in the admin page. Kevin

On 10/11/06, mark [EMAIL PROTECTED] wrote: Hi, I have installed solr under a stand alone tomcat5.5 installation. I can see the admin screens etc. When I submit documents I get this error:

    Oct 11, 2006 10:05:44 AM org.apache.solr.core.SolrException log
    SEVERE: java.lang.NullPointerException
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
        at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
        at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        ...

My docs follow this schema:

    <fields>
      <field name="id" type="string" indexed="false" stored="true"/>
      <field name="timestamp" type="string" indexed="true" stored="true"/>
      <field name="url" type="string" indexed="false" stored="true"/>
      <field name="collection" type="text_ws" indexed="true" stored="true"/>
      <field name="mimetype" type="string" indexed="true" stored="true"/>
      <field name="content" type="text" indexed="true" stored="false"/>
    </fields>

Also - since getting this error I can no longer see part of the solr/admin/stats.jsp screen - the boxes core, update, cache and other are now empty. I deleted and reinstalled solr (including the unpacked webapps dir) but not tomcat, and the problem is still there.

cheers mark
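One way to follow that build-it-up advice is to pair each ping with a one-document add against the current schema. A sketch assuming Tomcat on port 8080 and mark's field names (the document values are made up):

    # sanity-check the running schema
    curl http://localhost:8080/solr/admin/ping
    # post one minimal document, then commit
    curl http://localhost:8080/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '
    <add><doc>
      <field name="id">test1</field>
      <field name="timestamp">2006-10-11T10:05:44Z</field>
      <field name="url">http://example.com/</field>
      <field name="collection">test</field>
      <field name="mimetype">text/html</field>
      <field name="content">hello world</field>
    </doc></add>'
    curl http://localhost:8080/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>'

If the default schema works and the error appears only after a particular field is added, that field definition is the culprit.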
Re: Can't get q.op working
Now I feel dumb! I hadn't deployed the latest build properly. The new .war file was there but for some reason restarting tomcat didn't reload it. Anyway, q.op is working fine now.

On 9/27/06, Erik Hatcher [EMAIL PROTECTED] wrote: Kevin, I've just tried this locally using the tutorial example data, using both a default (in schema.xml) of AND and OR. (I use the Ruby response writer because it's easier to read than XML ;)

Use the default operator from schema.xml:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin

Override with AND:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin&q.op=AND

Override with OR:
http://localhost:8983/solr/select?wt=ruby&indent=2&q=ipod%20belkin&q.op=OR

All worked as expected in all cases. There is one result with AND and three results with OR. I recommend you try this same scenario out with the tutorial example data and ensure things work as I've stated here. Let us know more details if the problem persists. Erik

On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote: I'm running the latest nightly build (2006-09-27) and cannot seem to get the q.op parameter working. I have the default operator set to AND and am testing with a two-word query that returns no results. If I add OR to the query I get results. But if I remove the OR and add q.op=OR to the Solr query I still get no results. Is there anything I could be doing wrong? thanks Kevin
How much ram can Solr use?
On the performance wiki page it mentions a test box with 16GB ram. Did anything special need to be done to use that much ram (with the OS or java)? Would Solr on a system with Linux x86_64 and Tomcat be able to use that much ram? (sorry, I don't know Java, so I don't know if there are any limitations there) thanks, Kevin
Solr now used on Discogs.com
I just wanted to say thanks to the Solr developers. I'm now using Solr for the main search engine on Discogs.com. I've been through five revisions of the search engine and this was definitely the least painful. Solr gives me the power of Lucene without having to deal with the guts. It made for a much faster implementation than all other search packages I've worked with. Some stats: there are now 1.1 million documents in the index and it handles 200,000 searches per day (on a single-cpu P4 server with 1 gig ram). Kevin
Re: acts_as_solr
You might want to look at acts_as_searchable for Ruby: http://rubyforge.org/projects/ar-searchable That's a similar plugin for the Hyperestraier search engine using its REST interface.

On 8/28/06, Erik Hatcher [EMAIL PROTECTED] wrote: I've spent a few hours tinkering with a Ruby ActiveRecord plugin to index, delete, and search models fronted by a database into Solr.