Re: Problems with WordDelimiterFilterFactory
On Fri, Oct 9, 2009 at 3:33 AM, Patrick Jungermann patrick.jungerm...@googlemail.com wrote:

> Hi Bern,
>
> the problem is the character sequence "--". A query is not allowed to have minus characters that directly follow one another. Remove one minus character and the query will be parsed without problems.

Or you could escape the hyphen character. If you are using SolrJ, use ClientUtils.escapeQueryChars on the query string.

--
Regards,
Shalin Shekhar Mangar.
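For illustration, here is a minimal Java sketch of what escaping does. It is a hand-rolled stand-in, not SolrJ's ClientUtils.escapeQueryChars: the class name, the method names and the exact set of escaped characters are assumptions made for this example. It escapes only the text the user typed, so the surrounding query syntax (the parentheses, AND and status_i:) stays intact.

```java
final class QueryEscapeSketch {
    // Characters treated as query syntax in this sketch.
    // Assumption: the real set depends on the query parser version in use.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;";

    // Prefix every special character in the user-entered text with a backslash.
    static String escapeTerm(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length() * 2);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical query builder modelled on the query string quoted in this thread.
    static String buildQuery(String userInput, int status) {
        return "(" + escapeTerm(userInput) + " AND status_i:(" + status + "))";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Asia -- Civilization", 2));
        System.out.println(buildQuery("hot and cold: temperatures", 2));
    }
}
```

Escaping the whole query string instead would also escape the parentheses and the colon after status_i, which is probably not what the application wants.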
Re: Problems with WordDelimiterFilterFactory
Hi Bernadette,

Bernadette Houghton wrote:

> Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen.

That's probably because the hyphen/minus has a special meaning (not containing)? Try putting the input in quotes. But I agree with Christian that the hyphens should have been removed at index time by the token filters.

cheers
chantal

> Changing this part of our schema.xml -
>
>   <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
>
> To
>
>   <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=" " replace="all" />
>
> i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.
>
> Regards
> Bern
RE: Problems with WordDelimiterFilterFactory
Here's the query and the error -

Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2))
Oct 09 08:20:17 [debug] [196] Solr sort by: score desc
Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount

Bern

-----Original Message-----
From: Christian Zambrano [mailto:czamb...@gmail.com]
Sent: Thursday, 8 October 2009 12:48 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested in the solr query. In other words, the query that your system sends to solr.

Thanks,
Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:

> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601
>
> Either scroll down and click one of the "television broadcasting -- asia" links, or type it in the Quick Search box.
>
> TIA
> bern
>
> -----Original Message-----
> From: Christian Zambrano [mailto:czamb...@gmail.com]
> Sent: Thursday, 8 October 2009 9:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Problems with WordDelimiterFilterFactory
>
> Could you please provide the exact URL of a query where you are experiencing this problem?
> e.g. (not URL encoded): q=fieldName:hot and cold: temperatures
>
> On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
>
>> We are having some issues with our solr parent application not retrieving records as expected.
>>
>> For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine.
>>
>> Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work.
>>
>> Our schema.xml contains the following -
>>
>>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>     <analyzer type="index">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <!-- in this example, we will only use synonyms at query time
>>       <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>       -->
>>       <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>>
>> Bernadette Houghton, Library Business Applications Developer
>> Deakin University Geelong Victoria 3217 Australia.
>> Phone: 03 5227 8230 International: +61 3 5227 8230
>> Fax: 03 5227 8000 International: +61 3 5227 8000
>> MSN: bern_hough...@hotmail.com
>> Email: bernadette.hough...@deakin.edu.au
>> Website: http://www.deakin.edu.au
>> Deakin University CRICOS Provider Code 00113B (Vic)
>>
>> Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
RE: Problems with WordDelimiterFilterFactory
Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
Re: Problems with WordDelimiterFilterFactory
Hi Bern,

the problem is the character sequence "--". A query is not allowed to have minus characters that directly follow one another. Remove one minus character and the query will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query before it is submitted to the Solr server, replacing each sequence of minus characters with a single one.

Regards,
Patrick
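The cleanup step suggested above could be sketched roughly as follows, assuming the client application is free to rewrite the user's input before building the query. The class and method names are hypothetical, and the thread does not say what language the parent application is written in; Java is used here only because SolrJ is mentioned.

```java
import java.util.regex.Pattern;

final class QueryCleanupSketch {
    // Matches two or more consecutive '-' characters.
    private static final Pattern HYPHEN_RUN = Pattern.compile("-{2,}");

    // Replace each run of hyphens in the user-entered text with a single hyphen.
    static String collapseHyphens(String userInput) {
        return HYPHEN_RUN.matcher(userInput).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(collapseHyphens("Asia -- Civilization"));
        System.out.println(collapseHyphens("asia-civilization"));
    }
}
```

This only avoids the ParseException. A single free-standing hyphen is still query syntax, so the cleaned-up query may parse and yet not match the intended record; quoting or escaping the user's text is a separate step.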
RE: Problems with WordDelimiterFilterFactory
Thanks for this, marklo; it is a *very* useful page.

bern

-----Original Message-----
From: marklo [mailto:mar...@pcmall.com]
Sent: Thursday, 8 October 2009 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed/queried

--
View this message in context: http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Problems with WordDelimiterFilterFactory
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen.

Changing this part of our schema.xml -

  <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />

To

  <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=" " replace="all" />

i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.

Regards
Bern
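To make the effect of that schema change concrete, here is a small Java sketch of the regular-expression step only. It is not the Solr filter itself, which runs on tokens inside the analyzer chain. The class and method names are hypothetical, and it assumes the previous replacement value was the empty string and that the input has already been lowercased.

```java
final class PatternReplaceSketch {
    // Apply the pattern from the schema snippet, ([^a-z]), to already-lowercased text,
    // replacing every match (the equivalent of replace="all").
    static String replaceNonAlpha(String lowercased, String replacement) {
        return lowercased.replaceAll("([^a-z])", replacement);
    }

    public static void main(String[] args) {
        // Empty replacement: the two words are glued together.
        System.out.println(replaceNonAlpha("asia-civilization", ""));
        // Space replacement: the hyphen becomes a word break.
        System.out.println(replaceNonAlpha("asia-civilization", " "));
    }
}
```

With a space as the replacement, each non-alphabetic character becomes one space, so "asia - civilization" ends up with three spaces between the words; whether that matters depends on the tokenizer used in that field type.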
Re: Problems with WordDelimiterFilterFactory
Bern,

The only way that could be happening is if you are not using the field type you described in your original e-mail. The WordDelimiterFilterFactory token filter should take care of the hyphen.
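The point about the hyphen being handled at analysis time can be illustrated with a deliberately simplified Java sketch. It is not WordDelimiterFilterFactory: it only splits on whitespace, then on non-alphanumeric characters, and lowercases the parts, ignoring stop words, stemming, case-change splitting and the catenate options. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordDelimiterSketch {
    // Very rough stand-in for "whitespace tokenizer + split on delimiters + lowercase".
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Split each whitespace-delimited token on runs of non-alphanumeric characters.
            for (String part : token.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    out.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("asia-civilization"));
        System.out.println(analyze("asia - civilization"));
    }
}
```

Under these assumptions both spellings reduce to the same two tokens, which is consistent with the suggestion that the difference arises elsewhere, for example in how the query parser treats a free-standing hyphen or in which field type the field actually uses.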
Re: Problems with WordDelimiterFilterFactory
Could you please provide the exact URL of a query where you are experiencing this problem?
e.g. (not URL encoded): q=fieldName:hot and cold: temperatures
RE: Problems with WordDelimiterFilterFactory
Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

Either scroll down and click one of the "television broadcasting -- asia" links, or type it in the Quick Search box.

TIA
bern
Re: Problems with WordDelimiterFilterFactory
Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. 
Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Re: Problems with WordDelimiterFilterFactory
Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed and queried.
--
View this message in context: http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Problems with WordDelimiterFilterFactory
Hi Bern,

I indexed some records containing - and : today using your configuration, and searched with the following URLs:

http://localhost/solr/select?q=CONTENT:cold : temperature
http://localhost/solr/select?q=CONTENT:cold: temperature
http://localhost/solr/select?q=CONTENT:cold :temperature
http://localhost/solr/select?q=CONTENT:cold temperature

and

http://localhost/solr/select?q=CONTENT:asia - civilization
http://localhost/solr/select?q=CONTENT:asia- civilization
http://localhost/solr/select?q=CONTENT:asia -civilization
http://localhost/solr/select?q=CONTENT:asia civilization

The results were the same in every case: each query worked and I saw the relevant records.

Regards,
Sandeep
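The test URLs above put spaces, colons and hyphens into q unescaped, so the query parser may read them as syntax rather than as text. Below is a minimal sketch, in Python, of escaping user input before it goes into q, in the spirit of SolrJ's ClientUtils.escapeQueryChars. The character list and the helper names (escape_query_chars, build_select_url) are assumptions made for illustration, not Solr's own API. Whether the escaped input then matches a record still depends on the field's analyzer.

```python
from urllib.parse import urlencode

# Characters the Lucene query parser treats as syntax.
# Assumption: this list is illustrative and may not match the exact set
# escaped by any particular Solr/SolrJ version.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Backslash-escape query-syntax characters and whitespace in user input."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_select_url(base_url, field, user_input):
    """Build a /select URL whose q parameter is escaped and URL-encoded.

    base_url and field are hypothetical values supplied by the caller.
    """
    query = f"{field}:{escape_query_chars(user_input)}"
    return f"{base_url}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(escape_query_chars("asia -- civilization"))
    print(build_select_url("http://localhost/solr", "CONTENT", "cold: temperature"))
```

Escaping the whitespace as well hands the whole input to the field as a single term, so the field's analyzer, not the query parser, decides what happens to the colon or hyphen.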
Re: Problems with WordDelimiterFilterFactory
Tried this once again, with different combinations; none of them actually worked.

David Smiley @MITRE.org wrote:

It seems you want Id to only match on complete field values. If that is the case then you should not do tokenization, nor perhaps any text analysis at all. Consider removing the whole <analyzer/> block or using KeywordTokenizerFactory plus a modicum of other stuff (perhaps lowercasing). For the particular examples you gave, it would be sufficient to simply remove WordDelimiterFilterFactory as an analysis step.

~ David

GPS. wrote:

I am using a fieldType with the following configuration:

<!-- Less flexible matching, but less false matches. Probably not ideal for product names, but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I have

<field name="Id" type="textTight" indexed="true" stored="true" omitNorms="true"/>

When I try searching with http://localhost:8001/solr/select/?q=Id:ARMZ it gives me the complete list, where Id is ARMZ, ARMZ117 or ARMZ129.

What I want is that if I search for ARMZ, it should tightly match only ARMZ and shouldn't return ARMZ117 or ARMZ129. Similarly, if I search for ARMZ1, it shouldn't give me either ARMZ117 or ARMZ129.

Is it possible to achieve this, by somehow strictly mapping the input text to the field Id? Any help on this matter would be deeply appreciated.

Thanks
GPS.