Re: How to Index each file and then each Line for Complete Phrase Match. Sample Data shown.

Michael McCandless Tue, 06 Aug 2013 06:04:31 -0700

The suggester builds its own index when you call the build() method;
you need to provide a TermFreqIterator that iterates over all your
suggestions.
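
As a rough sketch of what "all your suggestions" could look like for the log-line case in this thread: each distinct line becomes one suggestion, and its occurrence count becomes the weight. This is plain Java, not Lucene code. The class and method names are hypothetical, and using the occurrence count as the weight is an assumption for illustration. The resulting (phrase, weight) pairs would still need to be wrapped in a TermFreqIterator before being passed to build().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns raw log lines into weighted suggestion candidates.
public class WeightedPhrases {

    // Collapse raw lines into distinct phrases, counting occurrences as the weight.
    public static Map<String, Long> weightedPhrases(List<String> lines) {
        Map<String, Long> weights = new LinkedHashMap<>();
        for (String raw : lines) {
            String phrase = raw.trim();
            if (phrase.isEmpty()) {
                continue; // skip blank lines
            }
            weights.merge(phrase, 1L, Long::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "package execution happening-1",
            "This needs urgent attention",
            "",
            "package execution happening-1");
        // Print each distinct phrase with its weight.
        weightedPhrases(lines).forEach((p, w) -> System.out.println(w + "\t" + p));
    }
}
```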


Each suggester has different tradeoffs, e.g. the FST based suggesters
are prefix-only matching, while AnalyzingInfixSuggester will suggest
based on non-prefix matches.  You can see AnalyzingInfixSuggester
running at http://jirasearch.mikemccandless.com e.g. try typing fst.
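
To make the prefix versus infix tradeoff concrete, here is a small plain-Java stand-in. It is not how the Lucene suggesters are implemented (they work on analyzed tokens and FSTs or an index). It is only a simplified illustration of which suggestions each style of matching would return, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of prefix-only vs. infix matching over whole phrases.
public class MatchModes {

    // Prefix-only: the typed text must match the start of the whole suggestion.
    public static List<String> prefixMatches(List<String> phrases, String typed) {
        String q = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            if (p.toLowerCase(Locale.ROOT).startsWith(q)) {
                out.add(p);
            }
        }
        return out;
    }

    // Infix (simplified): the typed text may match the start of any word in the suggestion.
    public static List<String> infixMatches(List<String> phrases, String typed) {
        String q = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            for (String word : p.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(q)) {
                    out.add(p);
                    break; // one matching word is enough
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
            "package execution happening-1",
            "This needs urgent attention",
            "Job is running fine.");
        System.out.println("prefix 'exec': " + prefixMatches(phrases, "exec"));
        System.out.println("infix  'exec': " + infixMatches(phrases, "exec"));
    }
}
```

With the sample phrases from this thread, typing "exec" finds nothing under prefix-only matching, because no phrase starts with it, while the infix variant returns the "package execution happening-1" line.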

If you want spell-checker like behavior, use FuzzySuggester, which
allows up to 2 "edits" when finding a matching suggestion.
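
The "edits" idea can be sketched in plain Java as follows. This is a stand-in, not FuzzySuggester itself: it computes a plain Levenshtein distance between the typed text and the best-matching prefix of each phrase, on raw lower-cased characters, whereas the real suggester works on the analyzed form. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of fuzzy prefix matching with a bounded number of edits.
public class FuzzyPrefix {

    // Smallest Levenshtein distance between 'typed' and any prefix of 'phrase'.
    public static int prefixEditDistance(String typed, String phrase) {
        int n = typed.length();
        int m = phrase.length();
        int[] prev = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // empty typed text vs. prefix of length j
        }
        for (int i = 1; i <= n; i++) {
            int[] cur = new int[m + 1];
            cur[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = typed.charAt(i - 1) == phrase.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(prev[j - 1] + cost,
                         Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        // The last row holds the distance to every prefix; take the minimum.
        int best = prev[0];
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    // Phrases whose best prefix is within maxEdits of the typed text.
    public static List<String> fuzzyMatches(List<String> phrases, String typed, int maxEdits) {
        String q = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String p : phrases) {
            if (prefixEditDistance(q, p.toLowerCase(Locale.ROOT)) <= maxEdits) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
            "package execution happening-1",
            "This needs urgent attention",
            "Job is running fine.");
        System.out.println(fuzzyMatches(phrases, "package executing", 2));
    }
}
```

In this sketch, the query "package executing" from the original question is two substitutions away from the prefix "package execution", so it only matches when two edits are allowed.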

Once the suggester is built, use the lookup method to find suggestions ...

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 6, 2013 at 3:39 AM, Ankit Murarka
<[email protected]> wrote:
> Hello.
>
> I can't seem to figure out what to use. I started with AnalyzingSuggester and
> passed a StandardAnalyzer to its constructor.
>
> But to get suggestions, it seems I will have to index the already indexed
> document again. How do I index it again using AnalyzingSuggester?
>
> I cannot use SpellChecker for this, as it seems to accept only an Analyzer and
> not an AnalyzingSuggester.
>
> Is there a different way of using AnalyzingSuggester to get search
> suggestions?
>
> Also, I verified with Luke that indexing the document with
> LineNumberReader is working properly: each line is being indexed
> separately.
>
> Now how do I go about implementing this phrase "did you mean" search?
>
>
> On 8/5/2013 5:08 PM, Michael McCandless wrote:
>>
>> Why not use one of the suggesters under lucene/suggest/*?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Aug 5, 2013 at 4:49 AM, Ankit Murarka
>> <[email protected]>  wrote:
>>
>>>
>>> Hello.
>>>
>>> 1. What I am trying to implement is a "Complete Suggestion Match / Did You
>>> Mean" feature for a phrase. I did it for a single word. I want to do it now
>>> for a sentence.
>>>
>>> 2. My understanding of how to index each line of a particular file as a
>>> valid phrase is as follows:
>>>
>>> a. Instead of providing a directory name to index, give a file name.
>>> b. Use the following code to read each line. This might be wrong, as I am not
>>> fully aware of how to index each log line as a valid phrase rather than as
>>> individual words.
>>>
>>>
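>>>       // Index each line as its own Document.
>>>       // Assumes "writer" is an already-open IndexWriter (not shown in the original snippet).
>>>       File file = new File("D:\\Lucene\\FileSearch\\Memo-1094.20130722-005200_10761334-10771333.txt");
>>>       try (LineNumberReader lnr = new LineNumberReader(new FileReader(file))) {
>>>           String line;
>>>           while ((line = lnr.readLine()) != null) {
>>>               Document doc = new Document();  // one document per log line
>>>               doc.add(new TextField("contents", line, Field.Store.YES));
>>>               writer.addDocument(doc);
>>>           }
>>>       }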
>>>
>>> c. Use StandardAnalyzer and store the index in a separate location.
>>>
>>> Obviously, after this I ran into a problem. I provided this index to
>>> SpellChecker so it could build its own index from it, and then invoked its
>>> suggestion method. I got only one word back as the suggestion.
>>>
>>> I know I have made a serious mistake here, but I can't seem to figure out
>>> what it is.
>>>
>>> I guess I need to index the whole line (as present in the file) as a phrase
>>> that the spellchecker can suggest. I am wondering what a possible approach
>>> would be. Any help will be highly appreciated.
>>>
>>>
>>> On 8/3/2013 7:25 PM, Jack Krupansky wrote:
>>>
>>>>
>>>> Why not start with something simple? Like, index each log line as a
>>>> tokenized text field and then do PhraseQuery against that text field? Is
>>>> there something else you need beyond that?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Ankit Murarka
>>>> Sent: Saturday, August 03, 2013 3:22 AM
>>>> To: [email protected]
>>>> Subject: How to Index each file and then each Line for Complete Phrase
>>>> Match. Sample Data shown.
>>>>
>>>> Hello All,
>>>>
>>>> I have the data shown below in a log file. Until now I have been indexing
>>>> the complete directory of files that contain data like this.
>>>>
>>>> Now I need to index each line of the file to implement complete phrase
>>>> search. I intend to store phrases in the index and then use the SpellChecker
>>>> API to suggest similar phrases.
>>>>
>>>> 7/20/2013 7:45 *package execution happening-1
>>>> * FATAL *check request has been sent for instance* Ip:Port
>>>> *EXCEPTION*
>>>> 7/20/2013 7:45 *This is not working perfectly
>>>> * DEBUG *check request for instance being received is status=200
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *Encountering a constant error.
>>>> * DEBUG *response is not proper.Expecting some more information on
>>>> this detail.
>>>> * Ip:Port *EXCEPTION*
>>>> 7/20/2013 7:45 *This needs urgent attention
>>>> * FATAL *I am still trying to ensure it is running perfectly.
>>>> Encountering some issues.
>>>> * Ip:Port *EXCEPTION*
>>>>
>>>> 7/20/2013 8:01 *Job is running fine.*
>>>> INFO
>>>>
>>>> *************************************************************************\
>>>>
>>>> *Exception Occured in ClassFactory* * Function()
>>>> java.nullPointerException: Value is null
>>>> * *Should not be null*
>>>>
>>>> To implement complete phrase search, I reckon I need to index each line
>>>> and store the phrase. *Phrases in the table above are highlighted in
>>>> bold.*
>>>>
>>>> So, if I am able to index these lines and store the phrases in the index,
>>>> then when a user searches for "package executing", Lucene would be able to
>>>> offer "package execution happening-1" as a valid suggestion.
>>>>
>>>> These columns do not have names, so I cannot index based on column name.
>>>> Also, as shown in the table above, the first column may contain a
>>>> time/date or a phrase in itself (shown in the last row).
>>>>
>>>> Please suggest how this is possible using Lucene and its API. The Javadoc
>>>> does not seem to guide me anywhere for this case.
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters compared
>>> with
>>> what lies within us"
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
> "What lies behind us and what lies before us are tiny matters compared with
> what lies within us"
>
>
>

