On Sep 22, 2011, at 4:59 AM, Ian Lea wrote:
>> I am not analyzing the title
>>
>> Field titleField = new Field("title", article.getTitle(),Field.Store.YES,
>> Field.Index.NOT_ANALYZED);
>
> OK. But the output you quote says "no match on required clause
> (title:List of newspapers in New York)" so something is out of synch
> somewhere.
I am reindexing the content with no analysis, just in case.
>
> What does Luke show?
Luke shows the title as unanalyzed text.
> See
> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
> for more things to check.
I'll walk through them as soon as I can.
>
>> Do you think booleanquery is the right approach for solving the problem
>> (finding lucene score of a word or a phrase in _a_ particular document)?
>
> Sounds OK to me. You could look at the contrib MemoryIndex as a
> possible alternative.
Thanks for your help, Ian.
Peyman
>
>
> --
> Ian.
>
>
>> On Sep 21, 2011, at 1:00 PM, Ian Lea wrote:
>>
>>> How is the "title" field indexed? Seems likely it is analyzed in
>>> which case a TermQuery won't match because "list of newspapers in New
>>> York" would be analyzed into terms "list", "newspapers", "new", "york"
>>> assuming things were lowercased, stop words removed etc.
>>>
>>> Maybe you need your "word" as TermQuery, assuming it is lowercased
>>> etc., and pass the title through query parser. In other words,
>>> reverse what you've got for the two fields.
>>>
>>> As for performance, first narrow down where it is taking the time. If
>>> it is in lucene, read
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>>
>>>
>>> --
>>> Ian.
>>>
>>> On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <[email protected]>
>>> wrote:
>>>> Hi
>>>>
>>>> The problem I would like to solve is determining the Lucene score of a
>>>> word in _a particular_ given document. The 2 candidates I have been trying
>>>> are
>>>>
>>>> - QueryWrapperFilter
>>>> - BooleanQuery
>>>>
>>>> Both restrict the search to a given search space. But according to Doug
>>>> Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery.
>>>> However, I am experiencing both performance problems (very slow) and response
>>>> problems (the query does not match any doc).
>>>>
>>>> The setup is as follows. Given a user query "word":
>>>>
>>>> // Parse the user query against the analyzed "content" field.
>>>> QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
>>>>     new StandardAnalyzer(Version.LUCENE_32));
>>>> Query query = parser.parse(word);
>>>> // Look up the title of the document we want to score.
>>>> Document d = WikiIndexSearcher.doc(match.doc);
>>>> String docTitle = d.get("title");
>>>> // Require an exact match on the title and a match on the content.
>>>> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
>>>> BooleanQuery bQuery = new BooleanQuery();
>>>> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
>>>> bQuery.add(query, BooleanClause.Occur.MUST);
>>>> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>>>>
>>>> In other words, find a Wikipedia doc with a particular title (in the example
>>>> below it is "List of newspapers in New York",
>>>> http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then
>>>> create a BooleanQuery in which the title clause must match the title and
>>>> the content clause must match the user query ('american' in the example below).
>>>>
>>>> Here is the output of a run on the user query "american" in a doc with the title
>>>> "List of newspapers in New York":
>>>>
>>>> ... QUERY: content:american
>>>> ... doc: List of newspapers in New York
>>>> ... query: +title:List of newspapers in New York +content:american
>>>> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>>>   0.0 = no match on required clause (title:List of newspapers in New York)
>>>>   0.011818626 = (MATCH) weight(content:american in 212081), product of:
>>>>     0.15625292 = queryWeight(content:american), product of:
>>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>>       0.0645564 = queryNorm
>>>>     0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>>>>       1.0 = tf(termFreq(content:american)=1)
>>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>>       0.03125 = fieldNorm(field=content, doc=212081)
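The numbers in the explanation above can be reproduced by hand, which helps when reading explain() output. The following is a rough sketch in plain Java, not Lucene code. It assumes the default similarity's formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq), and takes queryNorm and fieldNorm straight from the output rather than deriving them. The class and method names are hypothetical.

```java
// Recomputes the content:american clause from the explain() output.
// Plain Java illustration, not Lucene code.
final class ExplainArithmetic {
    // idf as assumed for the default similarity: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // tf as assumed for the default similarity: sqrt(termFreq).
    static double tf(double termFreq) {
        return Math.sqrt(termFreq);
    }

    // weight = queryWeight * fieldWeight, as laid out in the explanation.
    static double termWeight(double termFreq, long docFreq, long maxDocs,
                             double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;                 // idf * queryNorm
        double fieldWeight = tf(termFreq) * idf * fieldNorm;  // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values copied from the explanation output.
        double weight = termWeight(1.0, 392249L, 1623450L, 0.0645564, 0.03125);
        System.out.println("idf    = " + idf(392249L, 1623450L));
        System.out.println("weight = " + weight);
    }
}
```

The result should agree with the reported 0.011818626 up to float rounding. Note that this is only the score of the content clause: the overall score stays 0.0 because the required title clause does not match.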
>>>>
>>>> As you can see, there is no match for the query (and hits.totalHits is 0).
>>>> The search is very slow too.
>>>>
>>>> Any help would be much appreciated
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>