Thank you, Sami Siren, Andrzej Bialecki, Devaraj Das and Mahadev Konar, for your input. I was finally able to get past 1 million documents with 2 changes.
1. Reduced the document size significantly.
2. Increased the file-handle limit from 1024 to 4096.

These 2 did the magic. I was able to successfully process 5 million docs. Planning a test for processing 25 million.

I'll keep things posted.

Thanks,
Venkat

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Venkat Seeth wrote:
> > Hi Andrzej,
> >
> > A quick question on your suggestion.
> >
> >>> Configuration:
> >>> I have about 128 maps and 8 reduces so I get to
> >>> create 8 partitions of my index.
> >>
> >> I think that with this configuration you could increase the number of
> >> reduces, to decrease the amount of data each reduce task has to handle.
> >> In your current config you run at most 2 reduces per machine.
> >
> > You suggested to increase the number of reduces. I did
> > come up with 8 partitions for my index each containing
> > about 10 million documents.
> >
> > Are you saying I could probably create 32 partitions
> > and then later merge into smaller number of partitions?
> >
> > If I have a huge number of partitions, I do not know
> > how it'll affect federating search across these large
> > number of indexes and merging the results from those
> > searches.
> >
> > Any thoughts are greatly appreciated.
>
> The only reason I suggested to increase the number of reduces is to get
> you past the memory problems. From the search performance point of view
> you should definitely merge partial indexes.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
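
P.S. For anyone who hits the same wall: the message above does not say how the file-handle limit was raised, so the following is only a minimal sketch of one way to inspect and raise the per-process open-file limit on a Unix system, using Python's standard `resource` module. The helper name `raise_nofile_limit` and the default target of 4096 are illustrative assumptions, not part of the original setup.

```python
import resource


def raise_nofile_limit(target=4096):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit can never exceed the hard limit, so the request is
    capped at the hard limit. Returns the (soft, hard) pair in effect
    after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # RLIM_INFINITY means "no limit"; otherwise cap the request at `hard`.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    # Only ever raise the limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_nofile_limit(4096)
    print("open-file limit: soft=%s hard=%s" % (soft, hard))
```

A change made this way applies only to the calling process and the children it starts afterwards. For long-running Java task processes, the limit would more likely be raised in the shell or system configuration that launches them (for example with `ulimit -n`), which is an assumption about the deployment rather than something stated in this thread.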