Re: millions of records problem

2011-10-18 Thread Tom Gullo
Getting a solid-state drive might help.



millions of records problem

2011-10-17 Thread Jesús Martín García

Hi,

I've got 500 million documents in Solr, each with the same number
of fields and similar width. The Solr version I use is 1.4.1
with Lucene 2.9.3.


I don't have the option to use shards, so the whole index has to live on a
single machine...


The size of the index is about 50 GB and the RAM is 8 GB... Everything is
working, but the searches are very slow, even though I have tried different
configurations in solrconfig.xml, such as:


- configuring a firstSearcher listener with the most-used searches
- configuring the caches (query, filter and document) with large sizes...

but everything is still slow, so do you have any ideas to speed up the
searches without the penalty of using much more RAM?
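
For reference, the kind of solrconfig.xml entries I mean look roughly like
this (the query and the sizes are only illustrative, not my real values):

  <!-- warm the most-used searches when the first searcher opens -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">some popular query</str><str name="rows">10</str></lst>
    </arr>
  </listener>

  <!-- caches with large sizes (numbers purely illustrative) -->
  <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
  <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
  <documentCache class="solr.LRUCache" size="16384" initialSize="4096"/>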


Thanks in advance,

Jesús

--
...
      __
    /   /       Jesús Martín García
 C E / S / C A   Tècnic de Projectes
  /__ /         Centre de Serveis Científics i Acadèmics de Catalunya

Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat
...



Re: millions of records problem

2011-10-17 Thread Jan Høydahl
Hi,

What exactly do you mean by slow search? 1s? 10s?
Which operating system, how many CPUs, which servlet container and how much RAM 
have you allocated to your JVM? (-Xmx)
What kind and size of docs? Your numbers indicate about 100 bytes per doc?
What kind of searches? Facets? Sorting? Wildcards?
Have you tried to slim down your schema by setting indexed=false and 
stored=false wherever possible?
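
For example, in schema.xml (the field names here are just placeholders):

  <!-- searched but never returned: no stored data kept around -->
  <field name="body" type="text" indexed="true" stored="false"/>
  <!-- returned but never searched: no index terms created -->
  <field name="thumbnail_url" type="string" indexed="false" stored="true"/>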

First thought is that it's really impressive if you've managed to get 500 million 
docs into one index with only 8 GB RAM!! I would expect that to fail or at best 
be very slow. If you have a beefy server I'd first try putting in 64 GB RAM, 
slimming down your schema, and perhaps even switching to Solr 4.0 (trunk), which 
is more RAM efficient.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. okt. 2011, at 12:19, Jesús Martín García wrote:

 [quoted text snipped]



Re: millions of records problem

2011-10-17 Thread Nick Veenhof
You could try this technique; I'm currently reading up on it myself:
http://khaidoan.wikidot.com/solr-common-gram-filter
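
The relevant schema.xml piece looks roughly like this (a sketch; the words
file and the field type name are placeholders):

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- forms n-grams of very common terms to speed up phrase queries -->
      <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>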


On 17 October 2011 12:57, Jan Høydahl jan@cominvent.com wrote:
 [quoted text snipped]





Re: millions of records problem

2011-10-17 Thread Otis Gospodnetic
Hi Jesús,

Others have already asked a number of relevant questions.  If I had to guess, 
I'd guess this is simply a disk IO issue, but of course there may be room for 
improvement without getting more RAM or SSDs, so tell us more about your 
queries, the disk IO you are seeing, etc.
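
On Linux, something like this (from the sysstat package) will show whether the
disks are the bottleneck while you run queries:

  iostat -x 5   # extended per-device stats (utilization, await) every 5 seconds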

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Jesús Martín García jmar...@cesca.cat
To: solr-user@lucene.apache.org
Sent: Monday, October 17, 2011 6:19 AM
Subject: millions of records problem

 [quoted text snipped]





Re: millions of records problem

2011-10-17 Thread Vadim Kisselmann
Hi,
a number of relevant questions have been asked already.
I have another one:
Which type of docs do you have? Do you add new docs every day, or is it
a stable number of docs (500 million)?
What about replication?

Regards Vadim


2011/10/17 Otis Gospodnetic otis_gospodne...@yahoo.com

 [quoted text snipped]