Re: no search results for specific search in solr 6.6.0

2017-09-20 Thread Erick Erickson
Just go to the admin/analysis page and enter the terms in the "index"
box (I usually uncheck the "verbose" checkbox). You will see exactly
which element in your analysis chain is doing this. You'll see light
gray two-letter codes on the side, e.g. "ST". Hover over one with your
mouse, and you should see the class name, and thus the
easily identifiable element of your fieldType for the field in
question. For instance:

solr.StandardTokenizerFactory

text_general may have fixed _this_ problem, but it's not a great
solution. The French analysis chain is tuned to give better
results for, well, French. Likely solr.FrenchLightStemFilterFactory
is removing the last "o", but that's a guess.

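In general, stemming is incompatible with wildcards. E.g. "running"
stems to "run", but there is no real algorithm that can stem "runni*",
so the wildcard term is matched against the index unstemmed and never
lines up with the indexed "run".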
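
To make the mismatch concrete, here is a minimal Python sketch. It is not Solr code: toy_analyze is a hypothetical stand-in for an index-time chain (lowercase, then drop a doubled final letter), chosen only because it reproduces the "FRaoo" -> "frao" behavior reported in this thread. The assumption is that plain terms go through the full chain while wildcard patterns are only lowercased.

```python
from fnmatch import fnmatchcase


def toy_analyze(text):
    """Hypothetical analysis chain: lowercase, then drop a doubled final letter.

    This is NOT the real French light stemmer; it only mimics the
    "FRaoo" -> "frao" result reported in the thread.
    """
    token = text.lower()
    if len(token) >= 2 and token[-1] == token[-2]:
        token = token[:-1]
    return token


def term_query(indexed_terms, text):
    # Plain terms are analyzed the same way at query time as at index time.
    return toy_analyze(text) in indexed_terms


def wildcard_query(indexed_terms, pattern):
    # Assumption: wildcard patterns are lowercased but never stemmed.
    pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in indexed_terms)


# The stored value is "FRaoo", but the indexed term is the analyzed token.
indexed_terms = {toy_analyze("FRaoo")}

print(indexed_terms)
print(term_query(indexed_terms, "FRaoo"))
print(wildcard_query(indexed_terms, "*FRao*"))
print(wildcard_query(indexed_terms, "*FRaoo*"))
```

Under these assumptions the plain term and "*FRao*" match, while "*FRaoo*" does not, which is the same pattern as the three searches in the original report.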

Best,
Erick


Re: no search results for specific search in solr 6.6.0

2017-09-20 Thread Sascha Tuschinski
Hello Erick and Josh,

Thanks for your hints and comments.

I found out that the “text_fr” field type didn’t store “fraoo” as a term. It 
stored “frao” only, maybe because of the French field type. This field had been 
automatically created. I’m new to Solr, so this may be correct behavior.

I use “text_general” as the field type now, and this works fine. This 
solves our problem.

I can deliver the output of the debug query from admin/analysis for the text_fr 
field type if required.

Thanks again!
Sascha
 


Re: no search results for specific search in solr 6.6.0

2017-09-19 Thread Erick Erickson
Unfortunately the link you provided goes to "localhost", which isn't accessible.

The very first thing I'd do is go to the admin/analysis page and put
the terms in both the "index" and "query" boxes for the field in
question.
Next, attach &debug=query to the query to see how the query is actually parsed.
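
As a rough illustration of that request, the Python sketch below only builds the URL; it does not contact Solr. The base URL and the core name "test_core" are taken from the localhost links quoted elsewhere in this thread, and the helper name build_debug_query_url is made up for this example.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, core, query):
    """Build a /select URL that asks Solr to report how the query was parsed."""
    params = {
        "q": query,
        "wt": "json",
        "indent": "on",
        "debug": "query",  # adds the parsed-query section to the response
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


debug_url = build_debug_query_url(
    "http://localhost:8983/solr",   # assumption: local default port
    "test_core",                    # core name as seen in the quoted links
    "f_1197829839_txt_fr:*FRaoo*",
)
print(debug_url)
```

The parsed form of the query then shows up in the "debug" section of the JSON response, next to the normal results.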

My bet: You are using a different stemmer for the two cases and the
actual token in the index is FRao in the problem field, but that's
just a guess.

It often fools people that the field returned in the document (i.e. in
the fl list) is the _stored_ value, not the actual token in the index.
You can also use the TermsComponent to see the actual terms in the
index as well as the admin/schema_browser link.
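
A TermsComponent request could be built along these lines. This is a sketch that assumes a /terms request handler is available on the core; the helper name build_terms_url, the prefix "fra" and the limit are illustrative choices, and again nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, core, field, prefix, limit=10):
    """Build a TermsComponent URL listing indexed terms that start with prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # only terms beginning with this string
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{base_url}/{core}/terms?{urlencode(params)}"


terms_url = build_terms_url(
    "http://localhost:8983/solr",   # assumption: local default port
    "test_core",                    # core name as seen in the quoted links
    "f_1197829839_txt_fr",
    "fra",
)
print(terms_url)
```

The response lists the tokens actually in the index for that field, which is what a wildcard has to match, as opposed to the stored value returned in the document.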

Best,
Erick




Re: no search results for specific search in solr 6.6.0

2017-09-19 Thread Josh Lincoln
Can you provide the fieldType definition for text_fr?

Also, when you use the Analysis page in the admin UI, what tokens are
generated during indexing for FRaoo using the text_fr fieldType?

On Tue, Sep 19, 2017 at 12:01 PM Sascha Tuschinski 
wrote:

> Hello Community,
>
> We are using a Solr Core with Solr 6.6.0 on Windows 10 (latest updates)
> with field names defined like "f_1179014266_txt". The number in the middle
> of the name differs for each field we use. For language specific fields we
> are adding an language specific extension e.g. "f_1179014267_txt_fr",
> "f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
> We are having the following odd issue within the french "_fr" field only:
> Field: f_1197829835_txt_fr
> <http://localhost:8983/solr/#/test_core/schema?field=f_1197829835_txt_fr>
> Dynamic Field: *_txt_fr
> <http://localhost:8983/solr/#/test_core/schema?dynamic-field=*_txt_fr>
> Type: text_fr
>
>   *   The saved value which had been added with no problem to the Solr
> index is "FRaoo".
>   *   When searching within the Solr query tool for
> "f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen
> below - OK.
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:*FRao*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"129",
> "f_1197829834_txt_en":"EnAir",
> "f_1197829822_txt_de":"Lufti",
> "f_1197829835_txt_fr":"FRaoi",
> "f_1197829836_txt_it":"ITAir",
> "f_1197829799_txt":["Lufti"],
> "f_1197829838_txt_en":"EnAir",
> "f_1197829839_txt_fr":"FRaoo",
> "f_1197829840_txt_it":"ITAir",
> "_version_":1578520424165146624}]
>   }}
>
>   *   When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found -
> Wrong!
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:*FRaoo*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   }}
> When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the matching
> items are found - OK
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:FRaoo",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"129",
> "f_1197829834_txt_en":"EnAir",
> "f_1197829822_txt_de":"Lufti",
> "f_1197829835_txt_fr":"FRaoi",
> "f_1197829836_txt_it":"ITAir",
> "f_1197829799_txt":["Lufti"],
> "f_1197829838_txt_en":"EnAir",
> "f_1197829839_txt_fr":"FRaoo",
> "f_1197829840_txt_it":"ITAir",
> "_version_":1578520424165146624}]
>   }}
> If we save exact the same value into a different language field e.g.
> ending on "_en", means "f_1197829834_txt_en", then the search
> "f_1197829834_txt_en:*FRaoo*" find all items correctly!
> We have no idea what's wrong here and we even recreated the index and can
> reproduce this problem all the time. I can only see that the value starts
> with "FR" and the field extension ends with "fr" but this is not problem
> for "en", "de" an so on. All fields are used in the same way and have the
> same field properties.
> Any help or ideas are highly appreciated. I filed a bug for this
> https://issues.apache.org/jira/browse/SOLR-11367 but had been asked to
> publish my question here. Thanks for reading.
> Greetings,
> ___
> Sascha Tuschinski
> Manager Quality Assurance // Canto GmbH
> Phone: +49 (0)30 390485-41
> E-mail: stuschin...@canto.com
> Web: canto.com
>
> Canto GmbH
> Lietzenburger Str. 46
> 10789 Berlin
> Phone: +49 (0)30 390485-0
> Fax: +49 (0)30 390485-55
> Amtsgericht Berlin-Charlottenburg HRB 88566
> Geschäftsführer: Jack McGannon, Thomas Mockenhaupt
>
>


no search results for specific search in solr 6.6.0

2017-09-19 Thread Sascha Tuschinski
Hello Community,

We are using a Solr core with Solr 6.6.0 on Windows 10 (latest updates) with 
field names defined like "f_1179014266_txt". The number in the middle of the 
name differs for each field we use. For language-specific fields we add 
a language-specific extension, e.g. "f_1179014267_txt_fr", 
"f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
We are having the following odd issue within the French "_fr" field only:
Field: f_1197829835_txt_fr
Dynamic Field: *_txt_fr
Type: text_fr

  *   The saved value which had been added with no problem to the Solr index is 
"FRaoo".
  *   When searching within the Solr query tool for 
"f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen 
below - OK.
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"f_1197829839_txt_fr:*FRao*",
      "indent":"on",
      "wt":"json",
      "_":"1505808887827"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"129",
        "f_1197829834_txt_en":"EnAir",
        "f_1197829822_txt_de":"Lufti",
        "f_1197829835_txt_fr":"FRaoi",
        "f_1197829836_txt_it":"ITAir",
        "f_1197829799_txt":["Lufti"],
        "f_1197829838_txt_en":"EnAir",
        "f_1197829839_txt_fr":"FRaoo",
        "f_1197829840_txt_it":"ITAir",
        "_version_":1578520424165146624}]
  }}

  *   When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found - Wrong!
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"f_1197829839_txt_fr:*FRaoo*",
      "indent":"on",
      "wt":"json",
      "_":"1505808887827"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}
  *   When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the matching 
items are found - OK
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"f_1197829839_txt_fr:FRaoo",
      "indent":"on",
      "wt":"json",
      "_":"1505808887827"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"129",
        "f_1197829834_txt_en":"EnAir",
        "f_1197829822_txt_de":"Lufti",
        "f_1197829835_txt_fr":"FRaoi",
        "f_1197829836_txt_it":"ITAir",
        "f_1197829799_txt":["Lufti"],
        "f_1197829838_txt_en":"EnAir",
        "f_1197829839_txt_fr":"FRaoo",
        "f_1197829840_txt_it":"ITAir",
        "_version_":1578520424165146624}]
  }}
If we save exactly the same value into a different language field, e.g. one ending in 
"_en" such as "f_1197829834_txt_en", then the search 
"f_1197829834_txt_en:*FRaoo*" finds all items correctly!
We have no idea what's wrong here; we even recreated the index and can 
reproduce this problem every time. I can only see that the value starts with 
"FR" and the field extension ends with "fr", but this is not a problem for "en", 
"de" and so on. All fields are used in the same way and have the same field 
properties.
Any help or ideas are highly appreciated. I filed a bug for this, 
https://issues.apache.org/jira/browse/SOLR-11367, but was asked to publish 
my question here. Thanks for reading.
Greetings,
___
Sascha Tuschinski
Manager Quality Assurance // Canto GmbH
Phone: +49 (0)30 390485-41
E-mail: stuschin...@canto.com
Web: canto.com

Canto GmbH
Lietzenburger Str. 46
10789 Berlin
Phone: +49 (0)30 390485-0
Fax: +49 (0)30 390485-55
Amtsgericht Berlin-Charlottenburg HRB 88566
Geschäftsführer: Jack McGannon, Thomas Mockenhaupt