On Jun 26, 2007, at 3:22 PM, rubdabadub wrote:

>> I currently use Sami's SolrIndexer with the trunk solrj, and we have
>> a single Solr index of about 5m pages on a single 4GB machine, with
>> stored content. Although the indexing is fast and stable, complicated
>> full-text queries are too slow for comfort (forget about MLT/faceting
>> etc.). We are currently looking into ways of partitioning this and we
>> may be of service in the future here.
>
> Brian, just wondering: wouldn't searching be more of a Solr issue?
> I know some of the Solr sites have more than 5m docs, no? Are you
> doing something special? I am very curious to know. We are looking
> into implementing Solr in production, and so far so good. However,
> we are only dealing with 10 fields and 3 million Lucene docs.
>
The Solr installations I know of with many millions of docs don't have hundreds of KB of text per doc. The "special" thing I'm doing is storing the parsed text from the Nutch crawls (and other sources), which we need for various reasons. We have an extraordinary number of unique tokens, which turns Solr/Lucene into a disk-seek speed test.

Full-text search is certainly possible, even with stored content, but I am seeing QTime (the milliseconds Solr takes to process and return a query) degrade after we crossed the 2-3m document mark. It's currently ~200-1000ms or so for uncached single-term queries on a very capable server with lots of heap. That's not tenable for a real-time use case (but we don't use it in this manner).
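In case it helps anyone compare numbers, here is a minimal sketch of how a QTime measurement like the above could be taken through solrj. The server URL, field name, and query term are all placeholders, and the class names assume the trunk solrj API of this era (CommonsHttpSolrServer), so adjust to whatever your checkout provides:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QTimeCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- point this at your own Solr instance.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            // "content" and "example" are placeholders for an indexed
            // field and an uncached single term.
            SolrQuery query = new SolrQuery("content:example");
            query.setRows(10);

            QueryResponse rsp = server.query(query);

            // QTime is the server-side processing time in ms; it does not
            // include the network cost of shipping large stored fields back.
            System.out.println("QTime: " + rsp.getQTime() + " ms, hits: "
                + rsp.getResults().getNumFound());
        }
    }

Note that QTime only covers query processing on the server, so with big stored documents the wall-clock time seen by the client can be noticeably worse than the numbers quoted above.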
