I confess that I've just skimmed your e-mail, but there's absolutely no requirement that the entire index fit in RAM. The fact that your index is larger than available RAM isn't the reason you're hitting OOM.
Typical reasons for this are: 1> you're sorting on a field with many, many, many unique values. If you're sorting on a fine-grained timestamp, this is quite possible. 2> You've bumped MAX_BOOLEAN_CLAUSES and are searching on, say, one-letter wildcards. 3> many other reasons. I agree with Jacob, jumping into a multi-machine solution without understanding the problem in detail may not be your best course. So, can you tell us more about the conditions under which you hit OOM? Maybe with more details we can come up with better solutions. If you absolutely *must* implement a multi-machine solution, have you seen ParallelMultiSearcher? Best Erick On Mon, Nov 16, 2009 at 2:13 AM, Wenbo Zhao <zha...@gmail.com> wrote: > Yes, exactly 'distributed'... > From maintenance point of view, the 'horizontal' expandable is very > important. > For my case, the data file is a kind of 'history' file, categorized > by date. Once the data file is indexed, it will not change, unless > the searching fields changed. > Say I make whole ten years data indexed, generated 400G index, > requiring 8G ram. When I do backup, I have to backup the entire 400G > every time. I need another 8G machine for backup. And 8G is not > enough, the index is increasing everyday. > Compare to distributed solution, I can split the index by year or by > seasons. Say I have 10x40G index. I can easily run 10 jvm process > each with 1G heap space, in 3-5 low cost not dedicated x86 machines. > Consider the backup, 9 of 10 indexes are old, only need backup once, > they won't change. only 1 hot index is changing everyday, so I just > backup up to 40G. The spare machine is also very cheap. And the > machines are so cheap, I can use VMs to run this, it's more flexible > in resource management. As time goes by, I just install new jvm > instance when needed. I don't worry about ram and search speed > anymore. > I do think there should be more bigger cases out there just like mine. > The general distributed Lucene will be very useful. It will bring > Lucene to more enterprise applications, or more bigger, industry > applications. > > > 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>: > > Sounds like you may need to have some sort of distributed system, I just > > wanted to make sure you were aware of the cost/benifits of just buying a > big > > 62bit/8Gb ram machine, vs having to not only maintain and power several > 32 > > bit machines, but also maintain and support your now more complicated > code. > > > > I have seen it too many times developers/companies spend so much money in > > not just the initial development, but long term support and maintenance > that > > could have been simplified by just buying a bigger/better more powerful > > machine in the first place. > > > > I am interested to see what other people have to say about how to solve > your > > problem. > > > > Best regards, > > Jacob > > > > On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote: > > > >> My data is categorized by date. About 14M+ docs per month, 37M+ terms. > >> When I use 1G heap size to do search of 10 month index, I got OOM. > >> The problem is I can't increase heap size in an easy way. > >> I have several machines, all 32bit windows, 4G ram. > >> And my goal is to index 10 year's data, plus more data every day ! > >> If I put all of them together, I will need 8G+ ram to run search. > >> Maybe another 8G+ ram to run indexwriter. > >> > >> I think to split large index into smaller indexes and use a group of > >> machines to work as one is more flexible and faster compare to one > >> huge ram machine. > >> Any suggestions ? beside more rams. > >> > >> > >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>: > >>> > >>> Not sure how large your index is, but it might be easier (if possible > to > >>> increase your memory) than to develop a fairly complicated alternative > >>> strategy. > >>> > >>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote: > >>> > >>>> Hi, all > >>>> I'm facing a large index, on a x86 win platform which may not have big > >>>> enough jvm heap space to hold the entire index. > >>>> So, I think it's possible to split the index into several smaller > >>>> indexes, run them in different jvm instances on different machine. > >>>> Then for each query, I can concurrently run it one every indexes and > >>>> merge the result together. > >>>> This can be a workaround of OutOfMemory issue. > >>>> But before I start to do this, I want to ask if Lucene already have a > >>>> solution for things like this. > >>>> Thanks. > >>>> > >>>> -- > >>>> > >>>> Best Regards, > >>>> ZHAO, Wenbo > >>>> > >>>> ======================= > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>> > >>> > >>> ____________________________________ > >>> Information Technology Services, > >>> The University of Melbourne > >>> > >>> Email: jrho...@unimelb.edu.au > >>> Phone: +61 3 8344 2884 > >>> Mobile: +61 4 1095 7575 > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>> > >>> > >> > >> > >> > >> -- > >> > >> Best Regards, > >> ZHAO, Wenbo > >> > >> ======================= > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-user-h...@lucene.apache.org > >> > > > > ____________________________________ > > Information Technology Services, > > The University of Melbourne > > > > Email: jrho...@unimelb.edu.au > > Phone: +61 3 8344 2884 > > Mobile: +61 4 1095 7575 > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > -- > > Best Regards, > ZHAO, Wenbo > > ======================= > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >