Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-04 Thread SergeyG

Otis,

Here're the logs - method calls along with their outputs (sorry for the bulk
data :) ). I compared 3 runs.


1) GetMethod
 a) url=http://localhost:8080/solr/mlt
 b)
query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:
INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand():
url=http://localhost:8080/solr/mlt;
queryString=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 INFO MLT2SearchRequestProcessor:76 - 

002.098612S.G.SG_Book0.28923997O. HenryS.G.Four
Million, The0.08667877Katherine
MosbyThe Season of Lillian Dawes0.07947738Jerome K. JeromeThree Men in a
Boat0.047219563Charles
OliverS.G.ABC's of Science1.01.01.01.01.0



2) GetMethod
 a) url=http://localhost:8080/solr/select
 b)
query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:

INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand():
url=http://localhost:8080/solr/select;
queryString=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 INFO MLT2SearchRequestProcessor:76 - 

015title author scorecontent_mltid:10truedetails5
2.098612S.G.SG_Book0.24578805O. HenryS.G.Four Million, The0.22171465Jerome K.
JeromeThree Men in a Boat0.22018899Katherine MosbyThe Season of Lillian
Dawes0.098666154Charles OliverS.G.ABC's of Science
id:10id:10id:10id:10
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.9994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
OldLuceneQParser15.00.00.00.00.00.00.015.00.00.015.00.00.0



3) SolrJ call
 a) url=http://localhost:8080/solr
 b)
query=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:

INFO MLTSearchRequestProcessor:45 - SolrServer url:
http://localhost:8080/solr
 INFO MLTSearchRequestProcessor:51 - id = 10
 INFO MLTSearchRequestProcessor:53 - constructedQuery> id:10
 INFO MLTSearchRequestProcessor:63 - solrQuery>
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
 INFO MLTSearchRequestProcessor:69 - Number of docs found = 1
 INFO MLTSearchRequestProcessor:73 - title = SG_Book; score = 2.098612


One can see that the results of the 2 runs with GetMethod are almost identical:
the docs found and their weights are the same. (The values themselves are
doubtful, though. For example, the response contains the original doc, although
it wasn't supposed to be in the returned list of "more like this" docs. Also,
its weight seems to show that id=10 was found in three other docs, which
shouldn't be the case. Or is it just a rare coincidence that 10 is among the
most important terms of this doc and other docs happen to contain it? That
looks very unlikely. Or do I simply misinterpret it? Plus, the individual
weights for "interestingTerms" are all the same (1.0), and that's also
questionable.)
The 3rd run (SolrJ call) returned just the original doc (with the same
weight as in the first two calls).
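
A possible explanation for the 3rd run, offered as an assumption rather than
a confirmed diagnosis: when MoreLikeThis runs as a component of /select
(mlt=true), Solr is generally understood to return the similar docs in a
separate "moreLikeThis" section instead of the main result list, so code that
reads only the main results would see just the original doc. The sketch below
uses only the JDK to pull titles out of such a section. The class name, the
method name, and the sample XML layout are illustrative assumptions; they are
not copied from the logs above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts titles from the "moreLikeThis" section of a
// Solr XML response. The element layout below is assumed for illustration.
class MltResponseReader {

    // Made-up sample response: one main result plus two similar docs.
    static final String SAMPLE =
            "<response>"
          + "<result name=\"response\" numFound=\"1\" start=\"0\">"
          + "<doc><str name=\"title\">SG_Book</str></doc></result>"
          + "<lst name=\"moreLikeThis\">"
          + "<result name=\"10\" numFound=\"2\" start=\"0\">"
          + "<doc><str name=\"title\">Four Million, The</str></doc>"
          + "<doc><str name=\"title\">Three Men in a Boat</str></doc>"
          + "</result></lst></response>";

    static List<String> similarTitles(String xml) {
        // Select only the title fields nested under <lst name="moreLikeThis">,
        // skipping the main <result name="response"> list.
        String path =
                "/response/lst[@name='moreLikeThis']/result/doc/str[@name='title']";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    path, new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not read the response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(similarTitles(SAMPLE));
    }
}
```

With SolrJ, the raw response returned by QueryResponse.getResponse() may
expose the same section under the key "moreLikeThis"; that is worth checking
against the SolrJ version in use before relying on it.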

Maybe the problem lurks somewhere in solrconfig.xml? Right now I don't have
the slightest idea where to look for a hint.

Anyway, it's a holiday today. (Hopefully my message doesn't interrupt it. :)
)

Have a great 4th of July!

Sergey


Otis Gospodnetic wrote:
> 
> 
> Sergey,
> 
> I think I confused you.  The comment about the fields listed in the "fl"
> parameter has nothing to do with the SolrJ calls not working.
> 
> For SolrJ calls not working my suggestion is to look at the logs and
> compare the GetMethod call with the SolrJ call.  Paste them if you want
> more people to look at them.
> 
> 
> Otis 
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-03 Thread Otis Gospodnetic

Sergey,

I think I confused you.  The comment about the fields listed in the "fl" 
parameter has nothing to do with the SolrJ calls not working.

For SolrJ calls not working my suggestion is to look at the logs and compare 
the GetMethod call with the SolrJ call.  Paste them if you want more people to 
look at them.


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: SergeyG 
> To: solr-user@lucene.apache.org
> Sent: Friday, July 3, 2009 4:08:37 AM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> 
> Otis,
> 
> Thanks a lot. I'd certainly follow your advice and check the logs. Although,
> I must say that I've already tried all possible variations of the string for
> the "fl" parameter (spaces, commas, plus signs). More than that - the query
> still doesn't want to fetch any docs (other than the one with the id
> specified in the query) even when the line solrQuery.setParam("fl", "title
> author score"); is commented out. So I suspect that the problem is that the
> request with the url
> "http://localhost:8080/solr/select?q=id:1&mlt=true&mlt.fl=content&..." due
> to some reason doesn't work properly. And when I use the GetMethod(url)
> approach and send url directly in the form
> "http://localhost:8080/solr/mlt?q=id:1&mlt.fl=content&...", Solr picks up
> the mlt component. (At least, I'll have this backup solution if the main one
> keeps committing sabotage. :) I'll just need to add a parser for an incoming
> xml-response.)
> 
> I'll continue my "research" of this issue and, if you're interested in
> results, I'll definitely let you know.
> 
> Cheers,
> Sergey
> 
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24319269.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-03 Thread SergeyG

Otis,

Thanks a lot. I'll certainly follow your advice and check the logs. Although,
I must say that I've already tried all possible variations of the string for
the "fl" parameter (spaces, commas, plus signs). More than that - the query
still doesn't fetch any docs (other than the one with the id
specified in the query) even when the line solrQuery.setParam("fl", "title
author score"); is commented out. So I suspect that the problem is that the
request with the url
"http://localhost:8080/solr/select?q=id:1&mlt=true&mlt.fl=content&..." for
some reason doesn't work properly. And when I use the GetMethod(url)
approach and send the url directly in the form
"http://localhost:8080/solr/mlt?q=id:1&mlt.fl=content&...", Solr picks up
the mlt component. (At least, I'll have this backup solution if the main one
keeps committing sabotage. :) I'll just need to add a parser for an incoming
xml-response.)

I'll continue my "research" of this issue and, if you're interested in
results, I'll definitely let you know.

Cheers,
Sergey


Otis Gospodnetic wrote:
> 
> 
> Sergey,
> 
> Glad to hear the suggestion worked!
> 
> I can't spot the problem (though I think you want to use a comma to
> separate the list of fields in the fl parameter value).
> I suggest you look at the servlet container logs and Solr logs and compare
> requests that these two calls make.  Once you see how the second one
> is different from the first one, you will probably be able to figure out
> how to adjust the second one to produce the same results as the first one.
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24319269.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Otis Gospodnetic

Sergey,

Glad to hear the suggestion worked!

I can't spot the problem (though I think you want to use a comma to separate 
the list of fields in the fl parameter value).
I suggest you look at the servlet container logs and Solr logs and compare 
requests that these two calls make.  Once you see how the second one is 
different from the first one, you will probably be able to figure out how to 
adjust the second one to produce the same results as the first one.
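
The comparison suggested above can be sketched as a small parameter diff
between two logged query strings. This is a minimal sketch using only the
JDK; the class and method names are hypothetical, and the query strings in
the usage note are simplified stand-ins rather than the full requests from
the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reports which request parameters differ between two
// query strings, e.g. one logged for GetMethod and one logged for SolrJ.
class QueryParamDiff {

    // Decode %XX escapes and '+' so "id%3A1" and "id:1" compare as equal.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Split "a=1&b=2" into a sorted map; a repeated key keeps its last value.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new TreeMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    // One line per parameter whose value differs or that is missing on a side.
    static List<String> diff(String first, String second) {
        Map<String, String> a = parse(first);
        Map<String, String> b = parse(second);
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        List<String> changes = new ArrayList<>();
        for (String key : keys) {
            String va = a.containsKey(key) ? a.get(key) : "(absent)";
            String vb = b.containsKey(key) ? b.get(key) : "(absent)";
            if (!va.equals(vb)) {
                changes.add(key + ": " + va + " -> " + vb);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println(diff("q=id:1&mlt.fl=content",
                                "q=id%3A1&mlt=true&mlt.fl=content"));
    }
}
```

Run against the two simplified strings in main, the only reported difference
is the mlt parameter, which is present in the second request and absent from
the first; the encoded and unencoded forms of q compare as equal.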

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: SergeyG 
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 6:17:59 PM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> 
> Otis,
> 
> Your recipe does work: after copying an indexing field and excluding stop
> words the MoreLikeThis query started fetching meaningful results. :)
> 
> Just one issue remained. 
> 
> When I execute query in this way:
> 
> String query = "q=id:1&mlt.fl=content&...&fl=title+author+score";
> HttpClient client = new HttpClient();
> GetMethod get = new GetMethod("http://localhost:8080/solr/mlt");
> get.setQueryString(query);
> client.executeMethod(get);
> ...
> 
> it works fine bringing results as an XML string. 
> 
> But when I use "Solr-like" approach:
> 
> String query = "id:1";
> solrQuery.setQuery(query);
> solrQuery.setParam("mlt", "true");
> solrQuery.setParam("mlt.fl", "content");
> solrQuery.setParam("fl", "title author score");
> QueryResponse queryResponse = server.query( solrQuery );
> 
> the result contains only one doc with id=1 and no other "more like" docs. 
> 
> In my solrconfig.xml, I have these settings: 
> ...
> 
> ...
> 
> I guess it all is a matter of syntax but I can't figure out what's wrong.
> 
> Thank you very much (and again, thanks to Michael and Walter).
> 
> Cheers,
> Sergey
> 
> 
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24314840.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread SergeyG

Otis,

Your recipe does work: after copying the indexed field and excluding stop
words, the MoreLikeThis query started fetching meaningful results. :)

Just one issue remains.

When I execute the query this way:

String query = "q=id:1&mlt.fl=content&...&fl=title+author+score";
HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://localhost:8080/solr/mlt");
get.setQueryString(query);
client.executeMethod(get);
...

it works fine, returning the results as an XML string.

But when I use the "Solr-like" approach:

String query = "id:1";
solrQuery.setQuery(query);
solrQuery.setParam("mlt", "true");
solrQuery.setParam("mlt.fl", "content");
solrQuery.setParam("fl", "title author score");
QueryResponse queryResponse = server.query( solrQuery );

the result contains only one doc with id=1 and no other "more like" docs. 

In my solrconfig.xml, I have these settings: 
 ...

...

I guess it's all a matter of syntax, but I can't figure out what's wrong.

Thank you very much (and again, thanks to Michael and Walter).

Cheers,
Sergey



Michael Ludwig-4 wrote:
> 
> SergeyG wrote:
> 
>> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
>> in the same app taking into account the fact that for the former to
>> work the stop words list needs to be included and this results in the
>> latter putting stop words among the most important words?
> 
> Why would the inclusion of a stopword list result in stopwords being of
> top importance in the MoreLikeThis query?
> 
> Michael Ludwig
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24314840.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Otis Gospodnetic

I could be wrong about MLT - maybe it really does use TF IDF and not raw 
frequency.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Walter Underwood 
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 10:26:33 AM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> I think it works better to use the highest tf.idf terms, not the highest tf.
> That is what I implemented for Ultraseek ten years ago. With tf, you get
> lots of terms with low discrimination power.
> 
> wunder
> 



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread SergeyG

wunder, thank you. (Sorry, I'm not sure this is your first name). I thought
the MoreLikeThis query normally uses tf.idf of the terms when deciding what
terms are the most important (not the most frequent). And if this is not the
case, how can I change its behavior?




-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24309831.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread SergeyG

Thanks, Otis. I'll try that right away and tell you about the result. And if
you come up with any other idea, please let me know - just for the future.

Also thanks to Michael for the discussion.

Best regards,
Sergey




-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24309525.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Walter Underwood
I think it works better to use the highest tf.idf terms, not the highest tf.
That is what I implemented for Ultraseek ten years ago. With tf, you get
lots of terms with low discrimination power.
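
The difference between the two selection rules can be sketched with made-up
numbers. The code below is an illustration under stated assumptions, not the
MoreLikeThis implementation: the class name, the sample counts, and the
smoothed idf formula 1 + ln(numDocs / (docFreq + 1)) are all chosen for the
example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: pick the "most important" terms of one doc
// either by raw term frequency (tf) or by tf * idf.
class TermRanking {

    static final int SAMPLE_NUM_DOCS = 100; // made-up corpus size

    // Assumed smoothed idf; rare terms get a larger value than common ones.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Top n terms by how often they occur in the doc.
    static List<String> topByTf(Map<String, Integer> tf, int n) {
        List<String> terms = new ArrayList<>(tf.keySet());
        terms.sort((a, b) -> Integer.compare(tf.get(b), tf.get(a)));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Top n terms by tf * idf, which discounts terms found in most docs.
    static List<String> topByTfIdf(Map<String, Integer> tf,
                                   Map<String, Integer> df,
                                   int numDocs, int n) {
        List<String> terms = new ArrayList<>(tf.keySet());
        terms.sort((a, b) -> Double.compare(
                tf.get(b) * idf(df.get(b), numDocs),
                tf.get(a) * idf(df.get(a), numDocs)));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Made-up term counts within a single doc.
    static Map<String, Integer> sampleTf() {
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("the", 6);
        tf.put("of", 4);
        tf.put("science", 3);
        tf.put("boat", 2);
        return tf;
    }

    // Made-up number of docs (out of SAMPLE_NUM_DOCS) containing each term.
    static Map<String, Integer> sampleDf() {
        Map<String, Integer> df = new LinkedHashMap<>();
        df.put("the", 100);
        df.put("of", 98);
        df.put("science", 2);
        df.put("boat", 1);
        return df;
    }

    public static void main(String[] args) {
        System.out.println("by tf:     " + topByTf(sampleTf(), 2));
        System.out.println("by tf.idf: "
                + topByTfIdf(sampleTf(), sampleDf(), SAMPLE_NUM_DOCS, 2));
    }
}
```

On this sample, ranking by tf selects the two stop words, while ranking by
tf.idf selects "science" and "boat". With a very small index the idf values
of common and rare terms are much closer together, so frequent words can
still come out on top even under tf.idf.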

wunder

On 7/2/09 4:48 AM, "Otis Gospodnetic"  wrote:

> 
> Michael - because they are the most frequent, which is how MLT selects terms
> to use for querying, IIRC.
> 
> 
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Otis Gospodnetic

Michael - because they are the most frequent, which is how MLT selects terms to 
use for querying, IIRC.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Michael Ludwig 
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 6:20:05 AM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> SergeyG wrote:
> 
> > Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> > in the same app taking into account the fact that for the former to
> > work the stop words list needs to be included and this results in the
> > latter putting stop words among the most important words?
> 
> Why would the inclusion of a stopword list result in stopwords being of
> top importance in the MoreLikeThis query?
> 
> Michael Ludwig



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Otis Gospodnetic

Hi,

Rushing quickly through this one, one way you can use the same index for both 
is by copying fields.  One field copy would leave stopwords in (for PQ), and 
the other copy would remove stopwords (for MLT).  There may be more elegant 
ways to accomplish this - this is the first thing that comes to mind.
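
A schema.xml fragment along these lines could express the field-copy idea.
It is a sketch, not a tested configuration: the field names content and
content_mlt appear elsewhere in this thread, while the type names
text_keepstop and text_mlt, the choice of tokenizer and filters, and the
stopwords.txt file name are assumptions made for the example.

```xml
<!-- Hypothetical type for phrase queries: stop words stay in the index. -->
<fieldType name="text_keepstop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type for MoreLikeThis: same analysis, stop words removed. -->
<fieldType name="text_mlt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- Phrase queries go against content; mlt.fl points at content_mlt. -->
<field name="content" type="text_keepstop" indexed="true" stored="true"/>
<field name="content_mlt" type="text_mlt" indexed="true" stored="false" termVectors="true"/>

<!-- Index the same source text into both fields. -->
<copyField source="content" dest="content_mlt"/>
```

copyField passes the raw source text to the destination field, so each field
applies its own analyzer independently. Both fields live in the same index,
so no second core is needed for this approach.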

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: SergeyG 
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 5:31:21 AM
> Subject: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> 
> Hi,
> 
> Recently I've posted a question regarding using stop words in a PhraseQuery
> and in a MoreLikeThis query in the same app. I posted it twice.
> Unfortunately I didn't get any responses. I realize that the question might
> not have been formulated clearly. So let me reformulate it.
> 
> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented in
> the same app taking into account the fact that for the former to work the
> stop words list needs to be included and this results in the latter putting
> stop words among the most important words? Or these two queries need to use
> two different indexes and thus have to be implemented in different
> applications or in different cores of Solr (with different schema.xml files:
> one with the StopWord Filter and another without it.)?
> 
> Any opinion will be highly appreciated. 
> 
> Thank you.
> 
> Regards,
> Sergey Goldberg
> 
> 
> P.S. Just for the reference, here is my original message.
> 
> 1. There're 3 kinds of searches in my application: a) PhraseQuery search; b)
> search for separate words; c) MLT search. The problem I encountered is in
> the use of a stop words list. If I don't take it into account, the MLT query
> picks up common words as the most important words what is not right. And
> when I use it, the PhraseQuery stops working. I tried it with the ps and qs
> parameters (ps=100, qs=100) but that didn't change anything. (Both indexed
> fields are of type text, the StandardAnalyzer is applied, and all docs are
> in English.)
> 
> 2. Do I understand it right that the query
> q=id:1&mlt=true&mlt.fl=content&...
> should bring back documents where the most important words are in the set of
> those for the doc with id=1?
> -- 
> View this message in context: 
> http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24303817.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread SergeyG

>Why would the inclusion of a stopword list result in stopwords being of
>top importance in the MoreLikeThis query?

Michael, 

I just saw some of them (words from the stop words list) in the MLT query's
response.

Sergey




-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24304705.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Michael Ludwig

SergeyG wrote:

> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> in the same app taking into account the fact that for the former to
> work the stop words list needs to be included and this results in the
> latter putting stop words among the most important words?


Why would the inclusion of a stopword list result in stopwords being of
top importance in the MoreLikeThis query?

Michael Ludwig


Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread SergeyG

Hi,

I recently posted a question about using stop words in a PhraseQuery
and in a MoreLikeThis query in the same app. I posted it twice.
Unfortunately I didn't get any responses. I realize that the question might
not have been formulated clearly. So let me reformulate it.

Can both queries - PhraseQuery and MoreLikeThis Query - be implemented in
the same app taking into account the fact that for the former to work the
stop words list needs to be included and this results in the latter putting
stop words among the most important words? Or do these two queries need to use
two different indexes and thus have to be implemented in different
applications or in different cores of Solr (with different schema.xml files:
one with the StopWord Filter and another without it)?

Any opinion will be highly appreciated. 

Thank you.

Regards,
Sergey Goldberg


P.S. Just for reference, here is my original message.

1. There are 3 kinds of searches in my application: a) PhraseQuery search; b)
search for separate words; c) MLT search. The problem I encountered is in
the use of a stop words list. If I don't use it, the MLT query
picks up common words as the most important words, which is not right. And
when I use it, the PhraseQuery stops working. I tried it with the ps and qs
parameters (ps=100, qs=100) but that didn't change anything. (Both indexed
fields are of type text, the StandardAnalyzer is applied, and all docs are
in English.)

2. Do I understand it right that the query
q=id:1&mlt=true&mlt.fl=content&...
should bring back documents where the most important words are in the set of
those for the doc with id=1?
-- 
View this message in context: 
http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24303817.html
Sent from the Solr - User mailing list archive at Nabble.com.