Perhaps some kind of in-memory index would be better than iterating an array? A binary tree, for example. I did something similar with polygon indexes and point data. It requires careful memory planning for the nodes if the indexes are large (mine were several GB).
Just a thought,
Tim

On Sat, May 16, 2009 at 1:56 PM, PORTO aLET <portoa...@gmail.com> wrote:
> Hi,
>
> I am trying to include the stop words into hadoop map reduce, and later on,
> into hive.
> What is the accepted solution regarding the stop words in hadoop?
>
> All I can think is to load all the stop words into an array in the mapper,
> and then check each token against the stop words.. (this would be O(n^2))
>
> Regards
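A minimal sketch of the approach from the quoted message, with the array swapped for a HashSet so each token check is O(1) on average (for a plain membership test a hash set is simpler than a binary tree index). The class name and word list here are illustrative, not from the thread; in a real mapper the set would be built once, e.g. in setup()/configure():

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch, not anyone's actual job code: load stop words once
// into a HashSet, then test each token with an O(1) average-case lookup
// instead of scanning an array per token.
public class StopWordFilter {
    private static final Set<String> STOP_WORDS = new HashSet<>();
    static {
        // Hypothetical word list; in practice this would be read from a
        // file shipped with the job.
        for (String w : new String[] {"a", "an", "the", "and", "of", "to"}) {
            STOP_WORDS.add(w);
        }
    }

    public static boolean isStopWord(String token) {
        return STOP_WORDS.contains(token.toLowerCase());
    }

    public static void main(String[] args) {
        String line = "The rise of Hadoop and the art of indexing";
        StringBuilder kept = new StringBuilder();
        for (String token : line.split("\\s+")) {
            if (!isStopWord(token)) {
                kept.append(token).append(' ');
            }
        }
        System.out.println(kept.toString().trim());
        // prints: rise Hadoop art indexing
    }
}
```

In a Hadoop job the stop-word file would typically be distributed to the task nodes (e.g. via the DistributedCache) and loaded into the set in the mapper's setup method, so the cost of building the set is paid once per task rather than once per record.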