Perhaps some kind of in-memory index would be better than iterating over an
array?  A binary tree or something along those lines.
I did something similar with polygon indexes and point data.  It requires
careful memory planning for the nodes if the indexes are large (mine
were several GB).
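
Something like the sketch below is what I have in mind. It's untested, and the
names (StopWordMapper, /stopwords.txt) are just placeholders; it assumes the
0.20 mapreduce API and that the stop word file is on the classpath (swap in
DistributedCache or HDFS as you see fit). A TreeSet is Java's stock red-black
tree if you want the binary-tree route; a HashSet would give O(1) lookups if
you don't need ordering.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
import java.util.TreeSet;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StopWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Red-black tree, so membership checks are O(log m) for m stop words.
    private final TreeSet<String> stopWords = new TreeSet<String>();
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void setup(Context context) throws IOException {
        // Load the stop word list once per mapper, not once per record.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                getClass().getResourceAsStream("/stopwords.txt")));
        String line;
        while ((line = in.readLine()) != null) {
            stopWords.add(line.trim().toLowerCase());
        }
        in.close();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tok = new StringTokenizer(value.toString());
        while (tok.hasMoreTokens()) {
            String token = tok.nextToken().toLowerCase();
            if (!stopWords.contains(token)) {   // drop stop words, emit the rest
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}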

Just a thought,

Tim

On Sat, May 16, 2009 at 1:56 PM, PORTO aLET <portoa...@gmail.com> wrote:
> Hi,
>
> I am trying to include stop word filtering in Hadoop MapReduce, and later on
> in Hive.
> What is the accepted way of handling stop words in Hadoop?
>
> All I can think of is to load all the stop words into an array in the mapper
> and then check each token against that array (this would be O(n*m) for n
> tokens and m stop words).
>
> Regards
>
