The norms are modified so that each norm value is stored as 4 bytes instead of 
1 byte. That uses more memory, but the hardware we run on is two 8-CPU HP 
servers with 16 GB of RAM each. 
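
To put a rough number on the extra memory: Lucene keeps one norm value per document for every indexed field that has norms enabled, so widening the value from 1 to 4 bytes quadruples that array. The sketch below is only back-of-the-envelope arithmetic. The document count and field count are made-up illustrative inputs, not figures from our setup, and `NormMemory` is a hypothetical helper, not part of Lucene.

```java
// Rough estimate of heap used by norms: one value per document for each
// indexed field with norms enabled. All inputs are illustrative assumptions.
final class NormMemory {
    static long normBytes(long docCount, int normedFields, int bytesPerNorm) {
        return docCount * normedFields * bytesPerNorm;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L; // hypothetical number of documents in one index
        int fields = 2;          // hypothetical number of fields with norms
        long mb = 1024L * 1024L;
        System.out.println("1-byte norms: " + normBytes(docs, fields, 1) / mb + " MB");
        System.out.println("4-byte norms: " + normBytes(docs, fields, 4) / mb + " MB");
    }
}
```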

We scale the index by date range (and the ranking is modified to sort by 
date). 

Ex:
2007-01 - 2007-06  > ix 1
2007-07 - 2007-12 > ix 2
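
The partitioning in the example can be sketched as a small routing function. This assumes half-year partitions numbered from 1 with the first one starting in 2007-01, as in the example; `DateRangeRouter` and `BASE_YEAR` are hypothetical names used only for illustration.

```java
import java.time.YearMonth;

// Maps an article's publication month to the number of the index that holds it.
final class DateRangeRouter {
    // Assumption: the first partition (ix 1) starts at 2007-01.
    private static final int BASE_YEAR = 2007;

    static int indexFor(YearMonth month) {
        if (month.getYear() < BASE_YEAR) {
            throw new IllegalArgumentException("no index for dates before " + BASE_YEAR);
        }
        int half = (month.getMonthValue() - 1) / 6; // 0 for Jan-Jun, 1 for Jul-Dec
        return (month.getYear() - BASE_YEAR) * 2 + half + 1;
    }
}
```

A query restricted to a date range then only needs to be sent to the indexes whose numbers fall between `indexFor(start)` and `indexFor(end)`.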

Each index is hosted in a separate hosting application, and we have a layer 
in front of all the indexes that merges the results. So it is our own "search 
engine". We tried Solr, but I didn't really like it, and we had to make some 
modifications to support the business. 
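
The merge step can be sketched roughly as below. This is not our actual code: it assumes each index returns its hits already sorted newest first, and `Hit`, `Cursor` and `ResultMerger` are hypothetical types made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class ResultMerger {
    /** Hypothetical hit type: a document id plus its publication time. */
    static final class Hit {
        final String id;
        final long dateMillis;
        Hit(String id, long dateMillis) { this.id = id; this.dateMillis = dateMillis; }
    }

    /** One cursor per index: the current hit and an iterator over the rest. */
    private static final class Cursor {
        Hit head;
        final Iterator<Hit> rest;
        Cursor(Iterator<Hit> it) { this.rest = it; this.head = it.next(); }
    }

    /** Merges per-index lists (each sorted newest first) into the top n hits. */
    static List<Hit> mergeNewestFirst(List<List<Hit>> perIndex, int n) {
        // The queue always hands back the cursor whose current hit is newest.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            Comparator.comparingLong((Cursor c) -> c.head.dateMillis).reversed());
        for (List<Hit> hits : perIndex) {
            if (!hits.isEmpty()) queue.add(new Cursor(hits.iterator()));
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next(); // advance this index and re-queue it
                queue.add(c);
            }
        }
        return merged;
    }
}
```

Because the sort key is the date and the indexes are split by date range, the merge layer only has to pull from as many indexes as it takes to fill the requested page.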

Since we are running a 32-bit OS (Windows with PAE; we are moving to 64-bit 
soon, which will be interesting), each process can use only 2-2.5 GB of 
memory, so we have a lot of indexes on the same machine. 

We built this some years ago and it has run 24/7 without any failure, indexing 
approx. 200-300k new articles every day. 


Anyway... We really need to find a good API, or someone who knows how to add 
inverted searching to Lucene. 
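
To make "inverted searching" concrete: the profiles are the stored items and each incoming article acts as the query. The toy sketch below shows the idea in plain Java. It is not Lucene or Verity API, it assumes the simplest possible profile (a set of terms that must all occur in the article), and `ProfileMatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ProfileMatcher {
    // profile id -> terms that must all be present in a matching article
    private final Map<String, Set<String>> requiredTerms = new HashMap<>();
    // term -> ids of the profiles filed under that term
    private final Map<String, List<String>> profilesByTerm = new HashMap<>();

    /** Registers a profile; ids are assumed to be unique. */
    void addProfile(String profileId, String... terms) {
        Set<String> required = new HashSet<>();
        for (String t : terms) required.add(t.toLowerCase(Locale.ROOT));
        if (required.isEmpty()) throw new IllegalArgumentException("profile needs at least one term");
        requiredTerms.put(profileId, required);
        // File the profile under one of its terms: any matching article must contain it.
        String key = required.iterator().next();
        profilesByTerm.computeIfAbsent(key, k -> new ArrayList<>()).add(profileId);
    }

    /** Returns the ids of all profiles whose terms all occur in the article. */
    Set<String> match(String articleText) {
        // Naive tokenization: lower-case and split on anything but letters and digits.
        Set<String> docTerms = new HashSet<>(Arrays.asList(
            articleText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")));
        Set<String> matches = new TreeSet<>();
        for (String term : docTerms) {
            for (String id : profilesByTerm.getOrDefault(term, Collections.emptyList())) {
                if (docTerms.containsAll(requiredTerms.get(id))) matches.add(id);
            }
        }
        return matches;
    }
}
```

With this layout only the profiles filed under a term that actually occurs in the article are checked, rather than all 50k. Real profiles with phrases, boolean operators or ranking would of course need a proper query engine behind the same idea.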

/Regards
Marcus






-----Original message-----
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: 16 January 2008 19:14
To: java-user@lucene.apache.org
Subject: Re: Inverted search / Search on profilenet

Don't have any info to add, but out of curiosity, what kind of setup are you
using to host the 300 mil archive? Is the index distributed? Single machine?
Solr?

Thanks,

Mark

On Jan 16, 2008 12:27 PM, Marcus Falk <[EMAIL PROTECTED]> wrote:

> Hi again,
>
>
>
> Today we are hosting a 300-million-document search index without any
> problems in a Lucene environment, with just some customization of the
> Lucene API for ranking etc...
>
> So we are really satisfied with Lucene.
>
>
>
> We also need to match incoming documents against profiles. We currently
> use Verity (Autonomy) for this: we store the profiles in the index and
> use the document as the query.
>
> The Verity API we are using seems to have some internal threading
> problems (race conditions), so we need to find another way to perform
> this kind of search.
>
>
>
> Does anybody know of an API that could do this for us? Any
> ideas on how Lucene could be modified to do this kind of search?
>
>
>
> The volumes are around 300k full-length articles, distributed somewhat
> evenly over a 24-hour period, against a profilenet of 50k profiles.
>
>
>
>
>
> /Regards
>
> Marcus
>

