Thanks so much for your explanation. There is one thing I want to make
sure of: if I add the same token to the same field several times, is it
stored redundantly internally?

Also, suppose I have many fields. What is the best way to list the
frequency of all the tokens from the different fields without knowing in
advance which tokens they contain?

Once again, thank you very much for your reply.

Best regards,

Sengly


On 4/4/07, Erick Erickson <[EMAIL PROTECTED]> wrote:

See below

On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote:
>
> Dear all,
>
> My problem is a little bit strange. Instead of passing the content of the
> document to the indexer, I am adding tokens one by one. Here is a piece of
> my code:
>
> Document doc = new Document();
> doc.add(Field.Text("Features", "blue"));
> doc.add(Field.Text("Features", "beautiful"));
> doc.add(Field.Text("Features", "black"));
> doc.add(Field.Text("Features", "white"));
> doc.add(Field.Text("Features", "blue"));
> doc.add(Field.Text("Features", "blue"));
>
> I'd like to know whether the internal representation of this is the same
> as when we add the whole content of the document at once as one long
> string?


There's no internal difference unless you implement an Analyzer that
returns a value other than 1 from getPositionIncrementGap(). Why would
you do this, you ask? Occasionally, it is useful to index data in the
same field but NOT allow proximity queries to span separate
additions. Imagine indexing a book and, for some reason, you
didn't want span queries to cross, say, chapters. You could
do something like
doc.add("page", <contents of page 1>);
doc.add("page", <contents of page 2>);



then in your Analyzer, have something in getPositionIncrementGap
like
if (page is beginning of chapter) {
   return 1000;
} else {
   return 1;
}

Now, no Span query with a slop of less than 1,000 would
match from the last page in one chapter to the first page of another.
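
To make the effect of the gap concrete, here is a small stand-alone
sketch. It is not Lucene code: the class PositionGapSketch and its
methods are hypothetical, and the slop test is a simplified model in
which the gap is the number of extra positions skipped between two
values of the same field. Check the exact gap semantics against the
Lucene version you are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of token positions in a multi-valued field.
// NOT Lucene code; all names here are hypothetical.
class PositionGapSketch {

    // Returns the position of every token, value by value. Tokens inside
    // one value are consecutive; `gap` extra positions are skipped
    // between one value and the next.
    static List<Integer> positions(List<String[]> values, int gap) {
        List<Integer> result = new ArrayList<>();
        int next = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                next += gap; // the gap applies only between values
            }
            for (String token : values.get(v)) {
                result.add(next++);
            }
        }
        return result;
    }

    // Simplified proximity test: the number of positions strictly
    // between the two tokens must not exceed the slop.
    static boolean withinSlop(int posA, int posB, int slop) {
        return Math.abs(posB - posA) - 1 <= slop;
    }

    public static void main(String[] args) {
        List<String[]> pages = Arrays.asList(
            new String[] {"end", "of", "chapter"},
            new String[] {"next", "chapter", "begins"});

        List<Integer> noGap = positions(pages, 0);
        List<Integer> bigGap = positions(pages, 1000);

        System.out.println(noGap);
        System.out.println(bigGap);
        // Last token of page 1 against first token of page 2, slop 5.
        System.out.println(withinSlop(noGap.get(2), noGap.get(3), 5));
        System.out.println(withinSlop(bigGap.get(2), bigGap.get(3), 5));
    }
}
```

In this model, with no gap the two values behave like one long string,
while with a gap of 1000 the last token of one page and the first token
of the next are too far apart for any slop below 1000.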

FWIW
Erick

> I'd like also to count the number of "blue" occurrences in the document.
> How to do this?


See the other reply <G>.

> Thank you very much for your suggestion in advance.
>
> Regards,
>
> Sengly
>
