Re: Solr failing on y charakter in string?
OK, thanks, you're right. The thing is, my users will often search for expressions like "Harr" or "har", so I thought I would automatically append the wildcard * to every request. But that gets me into trouble too:

Harr* = no result
harry* = no result

What should I do?

Otis Gospodnetic wrote:
> I believe it's because wildcard queries are not stemmed. During indexing
> harry probably got stemmed to harr, so now harry* doesn't match, because
> there is no harry token in that string, only harr.
> [...]
-- View this message in context: http://www.nabble.com/Solr-failing-on-%22y%22-charakter-in-string--tp24783211p24789070.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr failing on y charakter in string?
The easiest thing to do would be to create a new field in your schema that has only a LowerCaseFilter applied to it. When searching, query across both fields and you should get the desired results. You can use the copyField directive in your schema.xml to copy data from your original field into the new field.

Cheers
Avlesh

On Mon, Aug 3, 2009 at 4:51 PM, gateway0 reiterwo...@yahoo.de wrote:
> [...] So I thought I would automatically append the wildcard * to every
> request. But that gets me into trouble too:
> Harr* = no result
> harry* = no result
> What should I do?
Re: Solr failing on y charakter in string?
OK, it is still not working with the new field text_two:

    <str name="q">text:Har* text_two:Har*</str>

=> 0 results

Schema updates:

    <fieldType name="text_two" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="text_two" type="text_two" indexed="true" stored="false" multiValued="true"/>

    <copyField source="text" dest="text_two"/>

This is what you suggested, right?

kind regards, S.

gateway0 wrote:
> Hi,
>
> I have the following setting:
>
> schema.xml:
>     <field name="kunde" type="text" indexed="true" stored="true" />
>
> The text field type was updated with the preserveOriginal=1 option in the
> schema.
>
> I have the following string indexed in the field kunde:
>     Harry Heim KG
>
> Now when I search for kunde:harry* it gives me an empty result.
> When I search for kunde:harry I get the right result.
> Also kunde:harr* works just fine.
>
> The strange thing is that with every other string (for example kunde:heim*)
> I get the right result. So why not for harry*, with a y* at the end?
>
> kind regards, S.

--
View this message in context: http://www.nabble.com/Solr-failing-on-%22y%22-charakter-in-string--tp24783211p24790774.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr failing on y charakter in string?
> OK, it is still not working with the new field text_two:
>
>     <str name="q">text:Har* text_two:Har*</str>
>
> => 0 results
> [...]

I'm pretty sure the wildcard query string needs to be lower-case, since a wildcard query is not analyzed. I think what Avlesh was suggesting was more like this:

    <str name="q">text:Har text_two:har*</str>

So the original field would be used for a regular query containing whatever the user entered, which undergoes the usual analysis for searching, and the secondary field would be used to construct a wildcard query that strictly serves the begins-with case.

-Ken
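Ken's suggestion can be sketched on the client side roughly as follows. This is only an illustration: the helper name build_query is made up for this example, the field names default to the text and text_two fields from this thread, and no escaping of special query-parser characters or handling of multi-word input is attempted.

```python
def build_query(user_input, analyzed_field="text", prefix_field="text_two"):
    """Build a two-clause query string (hypothetical helper, not a Solr API).

    The first clause sends the input as typed to the analyzed field.
    The second clause lower-cases the input by hand and appends '*',
    because the wildcard clause will not be analyzed by the server.
    """
    # Drop surrounding whitespace and any wildcard the user already typed.
    term = user_input.strip().rstrip("*")
    if not term:
        raise ValueError("empty search term")
    return f"{analyzed_field}:{term} {prefix_field}:{term.lower()}*"


if __name__ == "__main__":
    print(build_query("Har"))
    print(build_query("Harry*", analyzed_field="kunde", prefix_field="kunde_prefix"))
```

The resulting string would go into the q parameter. The field name kunde_prefix in the second call is likewise invented for the example; it stands for whatever lower-case-only copy of kunde the schema defines.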
Re: Solr failing on y charakter in string?
I have a Solr text field, and when I use Solr's field analysis tool it shows that wildcard queries are being stemmed. But the query results indicate that they are not. It looks like there is a bug in the tool.

Bill
Re: Solr failing on y charakter in string?
> I have a Solr text field, and when I use Solr's field analysis tool it
> shows that wildcard queries are being stemmed. But the query results
> indicate that they are not. It looks like there is a bug in the tool.

I am in agreement. Seems like a bug to me.

Cheers
Avlesh
Re: Solr failing on y charakter in string?
I believe it's because wildcard queries are not stemmed. During indexing harry probably got stemmed to harr, so now harry* doesn't match, because there is no harry token in that string, only harr.

Why wildcard queries are not analyzed is described in the Lucene FAQ on the Lucene Wiki.

You could also try searching for kunde:Harr*, for example (note the upper-case H). I bet it won't result in a hit, for the same reason: at index time you probably lower-case tokens with LowerCaseFilter(Factory), and if you search for Harr*, the lower-casing won't happen because a query string containing the wildcard character isn't analyzed.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message -----
From: gateway0 reiterwo...@yahoo.de
To: solr-user@lucene.apache.org
Sent: Sunday, August 2, 2009 7:30:19 PM
Subject: Solr failing on y charakter in string?

> Hi,
>
> I have the following setting:
>
> schema.xml:
>     <field name="kunde" type="text" indexed="true" stored="true" />
>
> The text field type was updated with the preserveOriginal=1 option in the
> schema.
>
> I have the following string indexed in the field kunde:
>     Harry Heim KG
>
> Now when I search for kunde:harry* it gives me an empty result.
> When I search for kunde:harry I get the right result.
> Also kunde:harr* works just fine.
>
> The strange thing is that with every other string (for example kunde:heim*)
> I get the right result. So why not for harry*, with a y* at the end?
>
> kind regards, S.
>
> --
> View this message in context: http://www.nabble.com/Solr-failing-on-%22y%22-charakter-in-string--tp24783211p24783211.html
> Sent from the Solr - User mailing list archive at Nabble.com.
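The mismatch Otis describes can be reproduced with a small stand-alone simulation. Everything here is a toy model rather than Solr or Lucene code: toy_stem is a made-up stand-in that simply drops a trailing "y", mirroring the guess above that harry is indexed as harr. A real Porter-style stemmer would more likely produce harri, which leads to the same outcome, since no indexed token starts with harry either way.

```python
def toy_stem(token):
    """Hypothetical stand-in for a stemmer: drop a trailing 'y' from longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("y") else token


def index_tokens(text):
    """Index-time analysis: split on whitespace, lower-case, then stem."""
    return {toy_stem(word.lower()) for word in text.split()}


def term_query(indexed, user_input):
    """A plain term query goes through the same analysis as the indexed text."""
    return toy_stem(user_input.lower()) in indexed


def wildcard_query(indexed, user_input):
    """A trailing-wildcard query is not analyzed: the prefix is compared as typed."""
    prefix = user_input.rstrip("*")
    return any(token.startswith(prefix) for token in indexed)


indexed = index_tokens("Harry Heim KG")

if __name__ == "__main__":
    print("harry ->", term_query(indexed, "harry"))
    for query in ("harry*", "harr*", "Harr*", "heim*"):
        print(query, "->", wildcard_query(indexed, query))
```

In this model harry matches as a plain term and harr* and heim* match as prefixes, while harry* and Harr* find nothing, which is the pattern reported in the thread.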