Re: How to get terms of a particular field of a particular document

Michael Wechner Sun, 12 Nov 2023 14:36:20 -0800

Thanks again. I think I have now found what I wanted (without needing
the Highlighter):


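// DirectoryReader.open expects a Directory, not a path string
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
log.info("Get terms of document ...");
// 'text' holds the stored value of the field for the document of interest.
// Passing null for the term vectors makes TokenSources re-analyze that text;
// -1 means no limit on the start offset.
TokenStream stream = TokenSources.getTokenStream("field_name", null, text,
        analyzer, -1);
stream.reset();
while (stream.incrementToken()) {
    log.info("Term: " + stream.getAttribute(CharTermAttribute.class));
}
stream.end();
stream.close();
reader.close();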
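
For comparison, the other option mentioned in this thread (index term vectors and read them back per document) might look roughly like the sketch below. It is only an illustration under a few assumptions: Lucene 9.5 or later (for `IndexReader.termVectors()`), a field indexed with term vectors enabled, and an in-memory index with a single sample document. The class name `TermVectorTerms`, the helper `termsOf` and the sample text are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

// Hypothetical demo class; assumes Lucene 9.5+ on the classpath.
public class TermVectorTerms {

    /** Collects the terms of one field of one document from its term vector. */
    static List<String> termsOf(IndexReader reader, int docId, String field) throws IOException {
        List<String> result = new ArrayList<>();
        // Returns null if the document has no term vector for this field
        Terms vector = reader.termVectors().get(docId, field);
        if (vector == null) {
            return result;
        }
        TermsEnum termsEnum = vector.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            result.add(term.utf8ToString());
        }
        return result;
    }

    /** Indexes one sample document in memory and returns the terms of its field. */
    static List<String> demo() throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            // Term vectors must be enabled at index time for this approach to work
            FieldType type = new FieldType(TextField.TYPE_STORED);
            type.setStoreTermVectors(true);
            type.freeze();

            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                Document doc = new Document();
                doc.add(new Field("field_name", "Hello Lucene world", type));
                writer.addDocument(doc);
            }

            try (IndexReader reader = DirectoryReader.open(dir)) {
                // The only document in this fresh index has doc id 0
                return termsOf(reader, 0, "field_name");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The trade-off is that term vectors increase index size, whereas re-analyzing the stored text costs analysis time on every lookup and requires the field to be stored.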

Thanks

Michael




> On 12.11.2023 at 22:00, Mikhail Khludnev <[email protected]> wrote:
> 
> it's something over there
> https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159
> 
> 
> On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner <[email protected]>
> wrote:
> 
>> Hi Mikhail
>> 
>> Thank you very much for your feedback!
>> 
>> I have found various examples for the first option when running a query,
>> e.g.
>> 
>> https://howtodoinjava.com/lucene/lucene-search-highlight-example/
>> 
>> but I don't understand how to implement the second option, i.e. how to
>> get the extracted terms of a document field independently of a query.
>> 
>> Can you maybe give a code example?
>> 
>> Thanks
>> 
>> Michael
>> 
>> 
>> 
>> On 12.11.23 at 18:46, Mikhail Khludnev wrote:
>>> Hello,
>>> This is what highlighters do. There are two options:
>>>  - index termVectors, obtain them at search time.
>>>  - obtain the stored field value, analyse it again, get all terms.
>>>  Good Luck
>>> 
>>> On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner <[email protected]> wrote:
>>> 
>>>> Hi
>>>> 
>>>> IIUC I can get all terms of a particular field of an index with
>>>> 
>>>> IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
>>>> List<LeafReaderContext> list = reader.leaves();
>>>> for (LeafReaderContext lrc : list) {
>>>>     Terms terms = lrc.reader().terms("field_name");
>>>>     if (terms != null) {
>>>>         TermsEnum termsEnum = terms.iterator();
>>>>         BytesRef term = null;
>>>>         while ((term = termsEnum.next()) != null) {
>>>>             System.out.println("Term: " + term.utf8ToString());
>>>>         }
>>>>     }
>>>> }
>>>> reader.close();
>>>> But how can I get all terms of a particular field of a particular document?
>>>> Thanks
>>>> Michael
>>>> 
>>>> P.S.: By the way, does it make sense to update the Lucene FAQ entry
>>>> 
>>>> https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments
>>>> 
>>>> with the code above?
>>>> I can do this, but I want to make sure that I don't update it in the wrong
>>>> way.
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
