Re: Phrase Query Problem?
On 11/1/2010 11:14 PM, Ken Stanley wrote:
On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote:
I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example:

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

should, with an exact match, return only one entry, but it returns five, some of which don't have any of the fields I've specified. I've tried this both with and without quotes. What could I be doing wrong?

Thanks - Tod

Tod,

Without knowing your exact field definition, my first guess would be your first boolean query. Because it is not quoted, SOLR typically transforms that type of query into something like this (assuming your default search field is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do (mykeywords:"Compliance+With+Conduct+Standards") you might see different (better?) results. Otherwise, append debugQuery=on to your URL and you can see exactly how SOLR is parsing your query. If none of that helps, what is your field definition in your schema.xml?
- Ken

The field definition is:

<field name="mykeywords" type="string" indexed="true" stored="true" multiValued="true"/>

The request:

select?q=(((mykeywords:Compliance+With+Attorney+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on

The response looks like this:

responseHeader:{ status:0, QTime:8, params:{ wt:json, q:(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL))), start:0, indent:true, fl:mykeywords, debugQuery:on}},
response:{numFound:6,start:0,docs:[
  { mykeywords:[Compliance With Attorney Conduct Standards]},
  { mykeywords:[Anti-Bribery,Bribes]},
  { mykeywords:[Marketing Guidelines,Marketing]},
  {},
  { mykeywords:[Anti-Bribery,Due Diligence]},
  { mykeywords:[Anti-Bribery,AntiBribery]}]
},
debug:{
  rawquerystring:(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL))),
  querystring:(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL))),
  parsedquery:(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL,
  parsedquery_toString:(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL,
  explain:{ ...

As you mentioned, looking at the parsed query, it's breaking the request up on word boundaries rather than treating it as one phrase. The goal is to return only the very first entry. Any ideas?

Thanks - Tod
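The parsedquery line above shows the fan-out Ken predicted: only "Compliance" stays on mykeywords, and the remaining words go to the default field (text, in Tod's setup). The toy Python model below is a sketch of that behaviour only. It is not Lucene's QueryParser grammar: the regex, the `clauses` helper and the default of "text" are assumptions made for illustration, and analysis, boolean operators and parentheses are ignored.

```python
import re

# Toy model only, NOT Lucene's QueryParser grammar. A "field:" prefix binds
# to the single token right after it; bare tokens fall back to a default
# field. A quoted phrase, or a run joined by backslash-escaped spaces, stays
# one token. Analysis (lowercasing, stopwords, stemming), boolean operators
# and parentheses are all ignored.
CLAUSE = re.compile(r'(?:(\w+):)?("[^"]*"|(?:\\.|[^\s\\])+)')


def clauses(query, default_field="text"):
    """Return (field, value) pairs the way this toy model splits `query`."""
    result = []
    for field, value in CLAUSE.findall(query):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # quoted phrase: keep the words together
        else:
            value = re.sub(r"\\(.)", r"\1", value)  # drop escaping backslashes
        result.append((field or default_field, value))
    return result


print(clauses("mykeywords:Compliance With Conduct Standards"))
# [('mykeywords', 'Compliance'), ('text', 'With'), ('text', 'Conduct'), ('text', 'Standards')]
print(clauses('mykeywords:"Compliance With Conduct Standards"'))
# [('mykeywords', 'Compliance With Conduct Standards')]
print(clauses(r"mykeywords:Compliance\ With\ Conduct\ Standards"))
# [('mykeywords', 'Compliance With Conduct Standards')]
```

Only the last two forms hand the whole value to mykeywords. That is what a string field needs, because it indexes each value as one untokenized term.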
Re: Phrase Query Problem?
That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards)

Best
Erick

On Tue, Nov 2, 2010 at 5:25 AM, Tod listac...@gmail.com wrote:
[...]
Re: Phrase Query Problem?
On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick

I agree with Erick: your query string showed quotes, but your parsed query did not. Using quotes or parentheses would pretty much leave your query alone. There is one exception that I've found. If you use a stopword analyzer, any stop words are converted to ? in the parsed query. So if you absolutely need every single word to match, you cannot use a field type that uses the stopword analyzer. For example, I have two dynamic field definitions: df_text_*, which does the default text transformations (including stop words), and df_text_exact_*, which does nothing (field type is string). When I run the query df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America", the following is shown as my query/parsed query when debugQuery is on:

<str name="rawquerystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str>
<str name="querystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str>
<str name="parsedquery">df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:"bank ? america")</str>
<str name="parsedquery_toString">df_text_exact_company_name:Bank of America df_text_company_name:"bank ? america"</str>

The difference is subtle, but important. If I were to search df_text_company_name:"Bank and America", I would still match Bank of America. Keep this in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool in the admin panel.
You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken
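Ken's "bank ? america" output means the stop filter dropped "of" but kept its position as a gap. Any other stopword in that slot therefore fills the same gap. The sketch below reproduces that under stated assumptions. The `STOPWORDS` set and both helpers are hypothetical and do not come from Solr's stopwords.txt. The model also assumes lowercase/whitespace tokenizing and exact-phrase matching on relative positions.

```python
# Hypothetical stopword list for illustration; Solr's stopwords.txt differs.
STOPWORDS = {"a", "an", "and", "of", "the", "with"}


def analyze(text):
    """Lowercase, split on whitespace and drop stopwords while keeping each
    surviving token's original position (the gap is what prints as '?')."""
    return [(pos, tok) for pos, tok in enumerate(text.lower().split())
            if tok not in STOPWORDS]


def phrase_matches(query, document):
    """Exact phrase match on analyzed tokens: every query token must sit at
    the same relative position in the document, gaps included."""
    doc_positions = {}
    for pos, tok in analyze(document):
        doc_positions.setdefault(tok, set()).add(pos)
    tokens = analyze(query)
    if not tokens:
        return False
    first_pos, first_tok = tokens[0]
    offsets = [(pos - first_pos, tok) for pos, tok in tokens]
    return any(
        all(start + off in doc_positions.get(tok, set()) for off, tok in offsets)
        for start in doc_positions.get(first_tok, set())
    )


print(analyze("Bank of America"))                              # [(0, 'bank'), (2, 'america')]
print(phrase_matches("Bank and America", "Bank of America"))   # True
print(phrase_matches("Bank America", "Bank of America"))       # False
# A string field compares the whole, untouched value instead:
print("Bank and America" == "Bank of America")                 # False
```

The Analysis page in the admin panel shows the same positions for the real field type, so it is the authoritative check.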
Re: Phrase Query Problem?
On 11/2/2010 9:21 AM, Ken Stanley wrote:
[...]

What it turned out to be was escaping the spaces.

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

became

q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

If I tried q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped the spaces, it worked as expected. This seems odd, since I would have expected the quotes to trigger a phrase query.

Thanks for your help. - Tod
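A likely explanation for why Tod's escaped version worked: each "+" in a URL decodes to a space, so Solr probably received Compliance\ With\ Conduct\ Standards. The query parser appears to keep backslash-escaped spaces inside one term, and that term equals the stored string value. The sketch below does the escaping and URL-encoding in Python. The `escape_query_chars` helper is hypothetical. Its character set is modeled on SolrJ's `ClientUtils.escapeQueryChars`, but the exact set is an assumption, so check it against your Solr version.

```python
import re
from urllib.parse import urlencode

# Character set modeled on SolrJ's ClientUtils.escapeQueryChars, plus any
# whitespace. Treat the exact set as an assumption for your Solr version.
SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/]|\s)')


def escape_query_chars(value):
    """Backslash-escape query-syntax characters and whitespace so the query
    parser keeps `value` as a single term."""
    return SPECIAL.sub(r"\\\1", value)


term = escape_query_chars("Compliance With Conduct Standards")
print(term)  # Compliance\ With\ Conduct\ Standards

# Only the value is escaped; the "mykeywords:" field prefix stays as syntax.
# urlencode() then percent-encodes the backslash (%5C) and colon (%3A) and
# turns each space into "+".
print(urlencode({"q": "mykeywords:" + term, "wt": "json"}))
# q=mykeywords%3ACompliance%5C+With%5C+Conduct%5C+Standards&wt=json
```

Escaping the value this way is a reasonable pattern for exact matches on string fields, because it does not depend on whether stray quotes survive the trip through the URL.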
Re: Phrase Query Problem?
Indeed something doesn't seem right about that. Quotes are for phrases, you are right, and I get confused even thinking about what happens when you try to escape spaces like that. I think there's something odd going on with your URI-escaping in general. Here's what mykeywords:"Compliance With Conduct Standards" should actually look like when put into a URI:

mykeywords%3A%22Compliance+With+Conduct+Standards%22

You really ought to escape the colon and the double quotes too, to follow the URI spec. If you weren't escaping the double quotes, that could explain your issue. And I seriously don't understand what putting a backslash in the URI accomplishes in this case. It confuses me trying to understand what's going on there, and personally I never like it when I just try random things until something I don't understand works.
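Rather than hand-escaping, a standard URL-encoding routine produces exactly the string given above. The sketch below uses Python's `urllib.parse`. The base URL `http://localhost:8983/solr/select` is a hypothetical placeholder, so adjust the host, port and core to your install.

```python
from urllib.parse import quote_plus, urlencode

phrase_query = 'mykeywords:"Compliance With Conduct Standards"'

# quote_plus() percent-encodes ':' and '"' and turns spaces into '+'.
print(quote_plus(phrase_query))
# mykeywords%3A%22Compliance+With+Conduct+Standards%22

# urlencode() applies the same encoding to every parameter and joins them
# with '&'.
params = {"q": phrase_query, "fl": "mykeywords", "start": "0",
          "indent": "true", "wt": "json"}
# Hypothetical base URL; adjust host, port and core to your install.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
# http://localhost:8983/solr/select?q=mykeywords%3A%22Compliance+With+Conduct+Standards%22&fl=mykeywords&start=0&indent=true&wt=json
```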
Re: phrase query problem .. how to?
On 2/4/07, rubdabadub [EMAIL PROTECTED] wrote: Suppose you have a field 'name' with data - Sony CLT2134 handheld camera. When doing a phrase search like "Sony Camera" or "sony handheld", Solr returns 0 results. Oftentimes our searchers don't know the model number but perform a phrase search. How do I solve this issue?

If you are controlling the query structure, you could:
- use a sloppy phrase query... "sony handheld"~10
- use the dismax handler to create a different query structure
- not use a phrase query at all... change the default operator to AND (q.op=AND) to require both terms.

-Yonik
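The sloppy-phrase option works because an exact phrase is slop 0 and the model number sits between the words the users type. The sketch below estimates the slop each phrase needs against the example value. The helpers are hypothetical and the model is simplified. It assumes a lowercase/whitespace tokenizer, and it scores a choice of positions p_i for phrase terms i as max(p_i - i) - min(p_i - i). This is not Lucene's scorer code.

```python
from itertools import product


def positions(text):
    """Term -> list of positions, using a lowercase/whitespace tokenizer
    (an assumption; a real field type may split or stem differently)."""
    pos = {}
    for i, tok in enumerate(text.lower().split()):
        pos.setdefault(tok, []).append(i)
    return pos


def required_slop(phrase, text):
    """Smallest slop at which `phrase` would match `text` in this simplified
    model, or None if a term is missing. For one chosen position p_i per
    phrase term i, the cost is max(p_i - i) - min(p_i - i): how far the
    terms sit from lining up as an exact phrase."""
    pos = positions(text)
    terms = phrase.lower().split()
    if not terms or any(t not in pos for t in terms):
        return None
    costs = []
    for choice in product(*(pos[t] for t in terms)):
        shifted = [p - i for i, p in enumerate(choice)]
        costs.append(max(shifted) - min(shifted))
    return min(costs)


doc = "Sony CLT2134 handheld camera"
for phrase in ["sony handheld", "sony camera", "handheld camera"]:
    slop = required_slop(phrase, doc)
    # An exact phrase query is slop 0; "..."~10 tolerates anything up to 10.
    print(phrase, slop, slop == 0, slop <= 10)
```

In this model "sony handheld" needs slop 1 and "sony camera" needs 2, so both fail as exact phrases but pass at ~10. With q.op=AND, positions stop mattering entirely and both terms only have to be present.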