Hi Christian,

For public-facing applications, the usual goal is sub-second search results. For some applications, waiting even a minute or more is OK.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Christian Brennsteiner <christ...@brennsteiner.at>
> To: java-user@lucene.apache.org
> Sent: Monday, December 22, 2008 2:55:01 AM
> Subject: Re: lucene suiteable ? 6 mio recods / day 1k
>
> hi otis,
>
> i think that out of 2 k, 80 % can be stemmed and many of the words are
> duplicates, so they would not need the full space.
> can you give me an idea what, in your opinion, "don't need
> queries to be quick" would mean ...
> i have no idea in what timeframe it could be handled if it is not
> completely in RAM.
>
> regards chris
>
> On Mon, Dec 22, 2008 at 4:41 AM, Otis Gospodnetic wrote:
> > Christian
> >
> > You can certainly purge old documents on a daily basis in order to keep the
> > corpus from growing, but note that 3M*90=270M 2K docs may be a bit too much for
> > a single index unless you really have lots of RAM or you don't need queries to
> > be quick. In other words, you may have to spread this over multiple
> > indices/machines.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> >> From: Christian Brennsteiner
> >> To: java-user@lucene.apache.org
> >> Sent: Friday, December 19, 2008 6:22:40 AM
> >> Subject: lucene suiteable ? 6 mio recods / day 1k
> >>
> >> hi *,
> >>
> >> i am searching for a fulltext index capable of the following requirements:
> >>
> >> index everyday 3 000 000 new records with a validity of N days (e.g.
> >> 90 days expiration)
> >> == 34,7 / s
> >> one record is e.g. an url and can be up to 2 k big
> >>
> >> http://example.com/somedir/some.html
> >>
> >> lucene should use "/" as a word separator and should e.g. eliminate all ":"
> >>
> >> so the following "sentence" should be indexed:
> >>
> >> http example.com somedir some.html when having the url
> >> http://example.com/somedir/some.html
> >>
> >> my main concern about this requirement is that the index should not
> >> grow over time, as it always holds
> >> NR OF DAYS * RECORDS PER DAY and expires the records after a given
> >> time. in my opinion there must be some background thread always
> >> throwing away expired hits.
> >>
> >> is this easily possible with lucene?
> >>
> >> regards chris
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> ---------------
> Christian Brennsteiner
> Linzergasse 21 / 14
> 5020 Salzburg
> Austria / Europe
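
For anyone skimming the thread, here is a rough sketch of the numbers and the tokenization rule discussed above. It is plain Python, not Lucene code, and every name in it (`tokenize_url`, `steady_state_docs`, `ingest_rate_per_second`, `is_expired`) is made up for illustration. It assumes the figures from the original question (3,000,000 records per day, a 90-day expiry) and treats both "/" and ":" as separators, which yields the token list Christian asked for. In Lucene itself, this splitting would be done by an analyzer and the expiry by a scheduled delete, neither of which is shown here.

```python
import re

# Figures taken from the original question; adjust to your own workload.
RECORDS_PER_DAY = 3_000_000
RETENTION_DAYS = 90          # example expiry window ("N days")
SECONDS_PER_DAY = 86_400


def tokenize_url(url: str) -> list[str]:
    """Split a URL on runs of '/' and ':' and drop empty pieces.

    Dots are left inside tokens, so 'example.com' stays one token,
    matching the example in the question.
    """
    return [tok for tok in re.split(r"[/:]+", url) if tok]


def steady_state_docs(per_day: int = RECORDS_PER_DAY,
                      days: int = RETENTION_DAYS) -> int:
    """Documents held once old records expire as fast as new ones arrive."""
    return per_day * days


def ingest_rate_per_second(per_day: int = RECORDS_PER_DAY) -> float:
    """Average indexing rate if records arrive evenly over the day."""
    return per_day / SECONDS_PER_DAY


def is_expired(indexed_day: int, today: int,
               days: int = RETENTION_DAYS) -> bool:
    """True when a record indexed on `indexed_day` is due for purging.

    Days are plain integer day counters (an assumption of this sketch).
    """
    return today - indexed_day >= days


if __name__ == "__main__":
    print(tokenize_url("http://example.com/somedir/some.html"))
    print(steady_state_docs())
    print(round(ingest_rate_per_second(), 1))
```

The steady-state count is the 3M*90=270M figure from Otis's reply, and the average rate is the 34,7 / s from the question. A daily purge job would apply a check like `is_expired` to decide what to delete, which keeps the corpus at roughly that size instead of growing.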