That number of docs is far more than I've ever worked with, but I'm still surprised it takes 4 minutes to initialize an index reader.
What exactly do you mean by initialization? Show us the code that
takes 4 minutes. What version of lucene? What OS? What disks?

--
Ian.

On Wed, Mar 20, 2013 at 6:21 PM, ash nix <nixd...@gmail.com> wrote:
> Thanks Ian.
>
> Number of documents in the index is 381,153,828.
> The data set size is 1.9TB.
> The index size of this dataset is 290G. It is a single index.
> The following fields are indexed for each document:
>
> 1. Document id: a StoredField, generally around 128 chars or more.
> 2. Text field: a TextField, not stored.
> 3. Title: a TextField, not stored.
> 4. Anchor: a TextField, not stored.
> 5. Timestamp: a DoubleDocValuesField, not stored. Actually this
> should be a DoubleField and I need to fix it.
>
> Initialization of the IndexReader at the start of search takes
> approximately 4 min.
> After initialization, I execute a series of Boolean AND queries
> of 2-3 terms. Each search result is dumped, with some information on
> score and doc id, to an output file.
>
> The resident size (RES) of the process is 1.7 GB.
> The total virtual memory (VIRT) is 307 GB.
>
> Do you think this is okay?
> Do you think I should use Solr instead of Lucene core?
>
> I have timestamps for the documents, so I can split them into multiple
> indexes sorted chronologically.
>
> Thanks,
> Ashwin
>
> On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea <ian....@gmail.com> wrote:
>> Searching doesn't usually use that much memory, even on large indexes.
>>
>> What version of lucene are you on? How many docs in the index? What
>> does a slow query look like (q.toString()) and what search method are
>> you calling? Anything else relevant you forgot to tell us?
>>
>> Or google "lucene sharding" if you are determined to split the index.
>>
>> --
>> Ian.
>>
>> On Wed, Mar 20, 2013 at 5:12 PM, ash nix <nixd...@gmail.com> wrote:
>>> Hi Everybody,
>>>
>>> I have created a single compound index which is 250 GB in size.
>>> I open a single index reader to search simple boolean queries.
>>> The process consumes a lot of memory and search is painfully slow.
>>>
>>> It seems that I will have to create multiple indexes and use multiple
>>> index readers.
>>> Can anyone suggest a good blog or documentation on creating multiple
>>> indexes and performing parallel search?
>>>
>>> --
>>> Thanks,
>>> A
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>
> --
> Thanks,
> A
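[Editor's note: the multiple-indexes-with-parallel-search setup asked about above can be sketched roughly as follows. This is a minimal sketch against the Lucene 4.x API that was current at the time of this thread, not code from the thread itself; the shard paths, field name, and query terms are hypothetical. A `MultiReader` presents several shard readers as one logical index, and passing an `ExecutorService` to `IndexSearcher` lets it search the underlying segments concurrently.]

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        // Open one reader per shard directory (paths are hypothetical).
        IndexReader shard1 = DirectoryReader.open(FSDirectory.open(new File("/index/shard1")));
        IndexReader shard2 = DirectoryReader.open(FSDirectory.open(new File("/index/shard2")));

        // MultiReader exposes the shards as a single logical index;
        // doc ids are remapped, scores remain comparable across shards.
        IndexReader multi = new MultiReader(shard1, shard2);

        // Passing an executor lets IndexSearcher fan the search out
        // across index segments in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        IndexSearcher searcher = new IndexSearcher(multi, pool);

        // A 2-term Boolean AND query, like those described in the thread.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("text", "lucene")), Occur.MUST);
        query.add(new TermQuery(new Term("text", "sharding")), Occur.MUST);

        TopDocs hits = searcher.search(query, 10);
        for (ScoreDoc sd : hits.scoreDocs) {
            // Doc id and score, as the original poster dumps per result.
            System.out.println(sd.doc + " score=" + sd.score);
        }

        multi.close();  // also closes the wrapped shard readers
        pool.shutdown();
    }
}
```

Note that for a single index, parallelism can be had without sharding at all: constructing `IndexSearcher` over one `DirectoryReader` with an executor already searches its segments concurrently.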