Re: Get first value in a multivalued field
You can copy the field to another field, then use the FirstFieldValueUpdateProcessorFactory to limit that field to the first value. At least, that seems to be what that URP does. I have not used it.

https://solr.apache.org/guide/8_8/update-request-processors.html

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Mar 4, 2021, at 11:42 AM, ufuk yılmaz wrote:
>
> Hi,
>
> Is it possible in any way to get the first value in a multivalued field?
> Using function queries, streaming expressions or any other way without
> reindexing? (Stream decorators have array(), but no way to get a value at a
> specific index?)
>
> Another one, is it possible to match a regex to a text field and extract only
> the matching part?
>
> I tried very hard for this too but couldn't find a way.
>
> --ufuk
Get first value in a multivalued field
Hi,

Is it possible in any way to get the first value in a multivalued field? Using function queries, streaming expressions or any other way without reindexing? (Stream decorators have array(), but no way to get a value at a specific index?)

Another one, is it possible to match a regex to a text field and extract only the matching part?

I tried very hard for this too but couldn't find a way.

--ufuk
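For readers who only need the value on the client side, both questions can be answered after the documents are fetched, with no reindexing. This is a minimal Python sketch, not a Solr feature: it assumes the documents have already been parsed from Solr's JSON response, and the field names (tags, body) and helper names are made up for illustration.

```python
import re

def first_value(doc, field, default=None):
    """Return the first value of a (possibly multivalued) field in one document."""
    value = doc.get(field, default)
    if isinstance(value, list):
        return value[0] if value else default
    return value

def first_match(doc, field, pattern):
    """Return the first substring of a text field that matches pattern, or None."""
    text = doc.get(field)
    if text is None:
        return None
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Hypothetical document, shaped like one entry of response.docs
docs = [{"id": "1", "tags": ["red", "green"], "body": "order AB-1234 shipped"}]
print(first_value(docs[0], "tags"))
print(first_match(docs[0], "body", r"[A-Z]{2}-\d+"))
```

This only helps when the first value is needed for display; it does not make the first value searchable or sortable inside Solr.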
Cross join on multivalued field
Hi,

I am wondering whether there are plans to implement cross-collection join queries on multivalued fields.

Thanks
Multivalued text_general field returns lowercased value in "if" function query
I have a type="text_general" multivalued="true" field, named fieldA. When I use a function query, with fields like

fields=if(true, fieldA, -1), fieldA

Response is:

"response":{"numFound":1,"start":0,"maxScore":4.6553917,"docs":[
  {
    "fieldA":["SomeMixedCaseValue"],
    "if(true,fieldA,-1)":"somemixedcasevalue"}]
}}

Is this a bug or an expected output? Is there a way to avoid it getting lowercased?

Whole field definition is:

-ufuk yilmaz
parsing multivalued fields in Value Source Parser
Hello All,

I am writing a custom function query that needs to parse a multivalued field. I am getting this exception:

org.apache.solr.common.SolrException: can not use FieldCache on multivalued field

The function query works as expected with a single-valued field. How can I parse a multivalued field with FunctionQParser (or any other way) and get all the values for that field for further processing in my custom function?

TIA,
Tridib Manna
Why am I able to sort on a multiValued field?
I am adding a new float field to my index that I want to perform range searches and sorting on. It will only contain a single value. I have an existing dynamic field definition in my schema.xml that I wanted to use to avoid having to update the schema:

I went ahead and implemented this in a test system (recently updated to Solr 8.7), but then it occurred to me that I am not going to be able to sort on the field because it is defined as multiValued. But to my surprise sorting worked, and gave the expected results. Why? Can this behavior be relied on in future releases?

Appreciate any insights.

Thanks
- AndyC -
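One possible explanation, as I understand the Solr Reference Guide's description of the sort parameter: for multiValued primitive fields with docValues, recent Solr versions pick one representative value per document, the minimum for asc and the maximum for desc (equivalent to field(name,min) and field(name,max)). With exactly one value per document that is the value itself, which would match the results seen above. The Python below only models that selection rule; the field name price_fs and the documents are invented, and placing documents without a value last is an assumption of this sketch.

```python
def sort_docs(docs, field, descending=False):
    """Sort docs by a multivalued numeric field: min value for asc, max for desc."""
    pick = max if descending else min
    have = [d for d in docs if d.get(field)]
    missing = [d for d in docs if not d.get(field)]  # assumption: missing values go last
    return sorted(have, key=lambda d: pick(d[field]), reverse=descending) + missing

docs = [
    {"id": "a", "price_fs": [3.0]},
    {"id": "b", "price_fs": [1.5]},
    {"id": "c", "price_fs": [2.0, 9.0]},
]
print([d["id"] for d in sort_docs(docs, "price_fs")])
print([d["id"] for d in sort_docs(docs, "price_fs", descending=True)])
```

Document c shows why the rule matters once a field really holds several values: it sorts by 2.0 ascending but by 9.0 descending.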
Re: Avoiding duplicate entry for a multivalued field
add-distinct is similar to add, but it does a contains check before adding the value. In general, the performance overhead should be minimal.

Regards,
Munendra S N

On Fri, Oct 30, 2020 at 7:29 PM Srinivas Kashyap wrote:

> Thanks Munendra, this will really help me. Are there any performance
> overhead with this?
>
> Thanks,
> Srinivas
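The difference between add and add-distinct described in this thread can be modelled in a few lines. This is a Python sketch of the semantics only, not Solr's implementation; the function names are hypothetical.

```python
def apply_add(existing, new_values):
    """Model of 'add': append every incoming value, duplicates included."""
    return list(existing) + list(new_values)

def apply_add_distinct(existing, new_values):
    """Model of 'add-distinct': append only values that are not already present."""
    result = list(existing)
    for value in new_values:
        if value not in result:  # the "contains check" mentioned in the reply
            result.append(value)
    return result

current = ["a", "b"]
print(apply_add(current, ["b", "c"]))
print(apply_add_distinct(current, ["b", "c"]))
```

The contains check is linear in the number of existing values, which fits the remark that the overhead should be minimal for fields of ordinary size.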
RE: Avoiding duplicate entry for a multivalued field
Thanks Munendra, this will really help me. Is there any performance overhead with this?

Thanks,
Srinivas

From: Munendra S N
Sent: 30 October 2020 19:20
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas,

For atomic updates, you could use add-distinct operation to avoid duplicates - https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3

Regards,
Munendra S N
Re: Avoiding duplicate entry for a multivalued field
Srinivas,

For atomic updates, you could use the add-distinct operation to avoid duplicates - https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3.

Regards,
Munendra S N

On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood wrote:

> Since you are already taking the performance hit of atomic updates,
> I doubt you'll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
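To make the add-distinct suggestion concrete, here is a small Python sketch that builds the JSON body of such an atomic update. The document id, the field name tags, and the helper name are assumptions for illustration; the body would be POSTed to the collection's /update handler with Content-Type application/json, which this sketch does not do.

```python
import json

def add_distinct_update(doc_id, field, values, id_field="id"):
    """Build the JSON body for an atomic update that uses the add-distinct modifier."""
    update = {id_field: doc_id, field: {"add-distinct": list(values)}}
    return json.dumps([update])

# Hypothetical document id and field name
payload = add_distinct_update("doc-1", "tags", ["red", "blue"])
print(payload)
```

From SolrJ the same structure is usually expressed by setting the field value to a map with the single key add-distinct.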
Re: Avoiding duplicate entry for a multivalued field
Since you are already taking the performance hit of atomic updates, I doubt you'll see any impact from field types or update request processors. The extra cost of atomic updates will be much greater than indexing cost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap wrote:
>
> Thanks Dwane,
>
> I have a doubt, according to the java doc, the duplicates still continue to
> exist in the field. May be during query time, the field returns only unique
> values? Am I right with my assumption?
>
> And also, what is the performance overhead for this UniqFields*Factory?
>
> Thanks,
> Srinivas
Re: Avoiding duplicate entry for a multivalued field
If I understand correctly what you're trying to do, docValues for a number of field types are (at least in their multivalued incarnation) backed by SortedSetDocValues, which inherently deduplicate values per-document. In your case it sounds like you could maybe rely on that behavior as a feature, set stored=false, docValues=true, useDocValuesAsStored=true, and achieve the desired behavior?

Michael

On Thu, Oct 29, 2020 at 6:17 AM Srinivas Kashyap wrote:
>
> Thanks Dwane,
>
> I have a doubt, according to the java doc, the duplicates still continue to
> exist in the field. May be during query time, the field returns only unique
> values? Am I right with my assumption?
>
> And also, what is the performance overhead for this UniqFields*Factory?
>
> Thanks,
> Srinivas
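The docValues suggestion in this message has one side effect worth seeing: a sorted set drops duplicates, but it also returns values in sorted order rather than insertion order. The Python below is only a model of that per-document behavior, with invented values; it is not how Solr reads docValues.

```python
def as_sorted_set(values):
    """Model of per-document sorted-set storage: unique values in sorted order."""
    return sorted(set(values))

indexed = ["pear", "apple", "pear", "banana"]
print(as_sorted_set(indexed))
```

If the application depends on the order in which values were added, this approach would not preserve it.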
RE: Avoiding duplicate entry for a multivalued field
Thanks Dwane,

I have a doubt: according to the Javadoc, the duplicates still continue to exist in the field. Maybe at query time the field returns only unique values? Am I right with my assumption?

And also, what is the performance overhead for this UniqFields*Factory?

Thanks,
Srinivas

From: Dwane Hall
Sent: 29 October 2020 14:33
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas this is possible by adding an unique field update processor to the update processor chain you are using to perform your updates (/update, /update/json, /update/json/docs, .../a_custom_one)

The Java Documents explain its use nicely (https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html) or there are articles on stack overflow addressing this exact problem (https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655)

Thanks,

Dwane
Re: Avoiding duplicate entry for a multivalued field
Srinivas, this is possible by adding a unique field update processor to the update processor chain you are using to perform your updates (/update, /update/json, /update/json/docs, .../a_custom_one).

The Javadocs explain its use nicely (https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html) or there are articles on Stack Overflow addressing this exact problem (https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655)

Thanks,

Dwane

From: Srinivas Kashyap
Sent: Thursday, 29 October 2020 3:49 PM
To: solr-user@lucene.apache.org
Subject: Avoiding duplicate entry for a multivalued field

Hello,

Say, I have a schema field which is multivalued. Is there a way to maintain distinct values for that field though I continue to add duplicate values through atomic update via solrj?

Is there some property setting to have only unique values in a multi valued fields?

Thanks,
Srinivas
Avoiding duplicate entry for a multivalued field
Hello,

Say I have a schema field which is multivalued. Is there a way to maintain distinct values for that field even though I continue to add duplicate values through atomic updates via SolrJ?

Is there some property setting to have only unique values in a multivalued field?

Thanks,
Srinivas

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by replying to the e-mail, and then delete it without making copies or using it in any way.
No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.

Disclaimer

The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been automatically archived by Mimecast Ltd, an innovator in Software as a Service (SaaS) for business. Providing a safer and more useful place for your human generated data. Specializing in; Security, archiving and compliance. To find out more visit the Mimecast website.
Multivalued field for Analysis on Admin page.
I forgot how to enter multiple values for a multivalued field on the Analysis page in the Admin UI. Can anyone help?

Jae
Query function error - can not use FieldCache on multivalued field
Hi,

I'm trying to use a Solr query function as a boost for term matches in the title field. Here's my boost function:

bf=if(exists(query({!v='title:Import data'})),10,0)

This throws the following error --> can not use FieldCache on multivalued field: data

The function seems to work only for a single term. The title field isn't multivalued, but it's configured to analyze terms. Here's the field definition.

I was under the impression that I would be able to use the query function to evaluate a regular query field. Am I missing something? If there's a constraint on this function, can this boost be done in a different way?

Any pointers will be appreciated.

Thanks,
Shamik
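One thing that may be worth trying, offered as a guess rather than a confirmed diagnosis: the bf parameter accepts several functions separated by whitespace, so the space inside 'title:Import data' may be splitting the function and leaving "data" to be read as a field name. Moving the nested query into its own request parameter and dereferencing it with $name keeps bf free of whitespace. The Python sketch below only builds the request parameters; the parameter name titleq, the use of edismax, and the helper name are assumptions.

```python
from urllib.parse import urlencode

def boost_params(user_query, title_terms, boost=10):
    """Build request params with the nested query dereferenced from bf."""
    return {
        "defType": "edismax",                           # assumption: bf is used with (e)dismax
        "q": user_query,
        "bf": f"if(exists(query($titleq)),{boost},0)",  # no whitespace inside bf
        "titleq": f"title:({title_terms})",             # hypothetical parameter name
    }

params = boost_params("Import data", "Import data")
print(urlencode(params))
```

If the error persists with this form, the cause is something other than whitespace splitting.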
Re: JTS, IsWithin predicate, and multivalued fields
Replying to myself and for posterity:

This is expected behavior per the comments on https://issues.apache.org/jira/browse/LUCENE-4644 that originally added the IsWithin predicate. Too bad, I was really pleased with my idea but I can see why it was implemented the way it was. Back to the drawing board.

From: Murray Johnston
Sent: Monday, July 13, 2020 11:52:20 AM
To: solr-user@lucene.apache.org
Subject: JTS, IsWithin predicate, and multivalued fields

Hi all,

I'm trying to use (abuse[1]) a SpatialRecursivePrefixTreeFieldType field that is multivalued with the IsWithin JTS predicate. After some testing, it appears that all values of the field must satisfy the predicate in order for the document to be returned. Is that expected? It seems somewhat different than other semantics for a multivalued field.

Thanks,
-Murray

[1] My use case is an extension of the SpatialForTimeDurations usage. My prices have a "time in advance" that must also be satisfied. My idea was to model this as a line instead of just a point and verify that the entire line IsWithin the bounding box but I might be blocked from doing this if it won't work for a multivalued field.
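The behavior discussed in this thread comes down to "all values" versus "any value" matching. The Python sketch below models it with plain points and an axis-aligned box; it is a simplified illustration with invented coordinates and function names, not the spatial implementation in Lucene or JTS.

```python
def within(point, box):
    """True if a 2D point lies inside an axis-aligned box (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y

def matches_iswithin(values, box):
    """IsWithin as observed in the thread: every value of the field must be inside."""
    return bool(values) and all(within(p, box) for p in values)

def matches_any_value(values, box):
    """The more usual multivalued semantics: one matching value is enough."""
    return any(within(p, box) for p in values)

box = (0, 0, 10, 10)
doc_values = [(2, 3), (50, 50)]  # one value inside the box, one outside
print(matches_iswithin(doc_values, box))
print(matches_any_value(doc_values, box))
```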
JTS, IsWithin predicate, and multivalued fields
Hi all,

I'm trying to use (abuse[1]) a SpatialRecursivePrefixTreeFieldType field that is multivalued with the IsWithin JTS predicate. After some testing, it appears that all values of the field must satisfy the predicate in order for the document to be returned. Is that expected? It seems somewhat different than other semantics for a multivalued field.

Thanks,
-Murray

[1] My use case is an extension of the SpatialForTimeDurations usage. My prices have a "time in advance" that must also be satisfied. My idea was to model this as a line instead of just a point and verify that the entire line IsWithin the bounding box but I might be blocked from doing this if it won't work for a multivalued field.
Re: use highlighting on multivalued fields with positionIncrementGap 0
I haven't worked with highlighting much but what's the need to store terms in multivalued field? On Fri, 14 Feb 2020 at 20:04, Nicolas Franck wrote: > I'm trying to use highlighting on a multivalued text field (analysis not > so important) .. > > > { text: [ "hello", "world" ], id: 1 } > > but I want to match across the string boundaries: > > q=text:"hello world" > > This works by setting the attribute > positionIncrementGap to 0, but then the highlighting entry is empty > > "highlighting": { "1" : { "text" : [] } } > > Parameters are: > > hl=true > hl.fl=text > hl.snippets=50 > hl.fragSize=1 > > Any idea why this happens? > I guess this gap is internal stuff handled by Lucene that Solr doesn't > know about? > (as for lucene, there are no multivalued fields!) -- Regards, Paras Lehana
use highlighting on multivalued fields with positionIncrementGap 0
I'm trying to use highlighting on a multivalued text field (analysis not so important) .. { text: [ "hello", "world" ], id: 1 } but I want to match across the string boundaries: q=text:"hello world" This works by setting the attribute positionIncrementGap to 0, but then the highlighting entry is empty "highlighting": { "1" : { "text" : [] } } Parameters are: hl=true hl.fl=text hl.snippets=50 hl.fragSize=1 Any idea why this happens? I guess this gap is internal stuff handled by Lucene that Solr doesn't know about? (as for lucene, there are no multivalued fields!)
Re: Analysing Multivalued Fields
First, if you’re using primitive types, there is no analysis so in that case the question is irrelevant. If you’re using a text-based field, the only difference between single-valued and multi-valued fields for analyzed types (i.e. text fields) is the offset recorded between entries. For instance:

Single value: this is some text
position token
0 this
1 is
2 some
3 text

Multi valued with positionIncrementGap=100: this is some text
position token
0 this
1 is
101 some
102 text

With a positionIncrementGap of 1, there’d be no difference. So if you’re using text-based fields, just do the values one at a time. Or this is an XY problem, you’re trying to solve some problem. If the above is irrelevant, what is that problem you’re trying to solve? Best, Erick > On Dec 31, 2019, at 1:32 AM, Sidharth Negi wrote: > > Hi, > > Is there a way to analyze how multiple values in a multivalued field are > being tokenized and processed during indexing? > > The "Analysis" page on the UI assumes that my multiple comma-separated > values is a single value. It filters out the comma and acts as if it's a > single value that I specified. > > Thanks in advance!
Analysing Multivalued Fields
Hi, Is there a way to analyze how multiple values in a multivalued field are being tokenized and processed during indexing? The "Analysis" page on the UI assumes that my multiple comma-separated values is a single value. It filters out the comma and acts as if it's a single value that I specified. Thanks in advance!
Re: SolrNet...multiValued.
On 9/2/2019 8:58 AM, Britto Raj wrote: I am working MNC and doing Prototype using SolrNet and Solr. I have few questions and got stuck not able to move forward.. 4. When i try to access using SolrQueryResults results = solr.Query(new SolrQuery("title:\"changeme4\"")); It throws error as could not convert value 'system.collections.arraylist' to property 'title' of document type solr rest client.program+product' SolrNet is third party software. It was not created by the Solr project. I have absolutely no idea what that error message means. With no experience at all using .net, I can't even tell you whether that error message comes from the SolrNet library or from the language itself. I found this: https://groups.google.com/forum/embed/#!forum/solrnet There is also an "issues" tab on the project page: https://github.com/SolrNet/SolrNet Thanks, Shawn
SolrNet...multiValued.
Hello All, I am working at an MNC and doing a prototype using SolrNet and Solr. I have a few questions and got stuck, not able to move forward.
1. I have created a collection, e.g. Product, without any fields.
2. Using SolrNet, I created one field like below.
public class Product { [SolrField("title")] public string title { get; set; } }
3. When the code below is executed, title is created with multiValued as true.
Startup.Init<Product>("http://localhost:8983/solr/product");
ISolrOperations<Product> solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
// Product test = new Product() {title = new List<string>() { "changeme4" } };
Product test = new Product() { title = "changeme4" };
solr.Add(test);
solr.Commit();
4. When I try to access it using
SolrQueryResults<Product> results = solr.Query(new SolrQuery("title:\"changeme4\""));
it throws the error: could not convert value 'system.collections.arraylist' to property 'title' of document type solr rest client.program+product'
So I would need to convert public string title { get; set; } to public ICollection<string> title { get; set; } But I want to create a set of properties in my document. I really don't need ICollection, and I want multiValued to be false. Please help me resolve the issue so I can move forward.
Re: SOLR Atomic Update - String multiValued Field
Hi Doss, What was existing value and what happens after you do atomic update? Kind Regards, Furkan KAMACI On Wed, Jul 24, 2019 at 2:47 PM Doss wrote: > HI, > > I have a multiValued field of type String. > > multiValued="true"/> > > I want to keep this list unique, so I am using atomic updates with > "add-distinct" > > {"docid":123456,"namelist":{"add-distinct":["Adam","Jane"]}} > > but this is not maintaining the expected uniqueness, am I doing something > wrong? Guide me please. > > Thanks, > Doss. >
SOLR Atomic Update - String multiValued Field
HI, I have a multiValued field of type String. I want to keep this list unique, so I am using atomic updates with "add-distinct" {"docid":123456,"namelist":{"add-distinct":["Adam","Jane"]}} but this is not maintaining the expected uniqueness, am I doing something wrong? Guide me please. Thanks, Doss.
Parse multivalued field as list with custom function
Hello, I'm trying to parse multivalued field (i.e : [8, 6, 9, 50]) as a List in a custom function. I looked all the existing parser here : (https://github.com/apache/lucene-solr/tree/master/solr/core/src/java/org/apache/solr/search), and I don't find any example of how to parse a multivalued field in a List. Can you give me example or some leads ? I thank you in advance.
Re: How to know which value matched in multivalued field
I found this page. https://stackoverflow.com/questions/2135072/determine-which-value-produced-a-hit-in-solr-multivalued-field-type Hmmm... On Fri, Jul 12, 2019 at 22:08, Takashi Sasaki wrote: > > Hi Solr experts, > > I have multivalued location on RPT field. > Is there a way to know which location matched by query? > > sample query: > &q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5 > > Of course I can recalculate on the client side, > but I want to know how to do it using Solr's features. > > Solr version is 7.3.1. > > Thanks, > Takashi Sasaki
How to know which value matched in multivalued field
Hi Solr experts, I have a multivalued location in an RPT field. Is there a way to know which location was matched by the query? sample query: &q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5 Of course I can recalculate on the client side, but I want to know how to do it using Solr's features. Solr version is 7.3.1. Thanks, Takashi Sasaki
Re: Search using filter query on multivalued fields
Another option is to index dynamically. In this case you would index (or at least this is what I would do): INGREDIENT_SALT_i:40 INGREDIENT_EGG_i:20 etc. and query INGREDIENT_SALT_i:[20 TO *] or, with an arbitrary max value since these are percentages, INGREDIENT_SALT_i:[20 TO 100] On Fri, May 3, 2019 at 12:01 PM Erick Erickson wrote: > There is no way to do this with the setup you describe. That is, there’s > no way to say “only use the third element of a multiValued field”. > > What I’d do is index (perhaps in a separate field) with payloads, so you > have input like SALT|20, then use some of the payload functionality to make > this happen. See: https://lucidworks.com/2017/09/14/solr-payloads/ > > There are some other strategies that are simpler, one could index (again, > perhaps in a separate field) SALT_20. Then you can form filter queries like > “fq=ingredient:[SALT_20 TO *]”. That’s not very flexible and you have to > normalize (i.e. 1% couldn’t be SALT_1), so “it depends”. > > The point is that you have to index cleverly to do what you want. > > Best, > Erick > > > On May 3, 2019, at 6:26 AM, Srinivas Kashyap > wrote: > > > > Hi, > > > > I have indexed data as shown below using DIH: > > > > "INGREDIENT_NAME": [ > > "EGG", > > "CANOLA OIL", > > "SALT" > >], > > "INGREDIENT_NO": [ > > "550", > > "297", > > "314" > >], > > "COMPOSITION PERCENTAGE": [ > > 20, > > 60, > > 40 > >], > > > > Similar to this, many other records are also indexed. These are > multi-valued fields. > > > > I have a requirement to search all the records which has ingredient name > salt and it's composition percentage is more than 20. > > > > How do I write a filter query for this? > > > > P.S: I should only fetch records, whose Salt Composition percentage is > more than 20 and not other percentages. > > > > Thanks and Regards, > > Srinivas Kashyap
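The per-ingredient field idea in this reply can be sketched as a small pre-indexing step. This is only an illustration in Python, under two assumptions that are not stated in the thread: the parallel arrays are aligned by index, and the schema has a dynamic integer field pattern such as *_i. The helper name to_dynamic_fields is hypothetical.

```python
import re
from typing import Dict, List


def to_dynamic_fields(names: List[str], percentages: List[int]) -> Dict[str, int]:
    """Pair each ingredient name with its percentage as INGREDIENT_<NAME>_i.

    Assumes the two lists are aligned by index (an assumption, not something
    the thread guarantees).
    """
    if len(names) != len(percentages):
        raise ValueError("names and percentages must have the same length")
    fields: Dict[str, int] = {}
    for name, pct in zip(names, percentages):
        # Field names cannot safely contain spaces, so collapse them to "_".
        key = re.sub(r"[^A-Za-z0-9]+", "_", name.strip().upper())
        fields[f"INGREDIENT_{key}_i"] = pct
    return fields


doc = to_dynamic_fields(["EGG", "CANOLA OIL", "SALT"], [20, 60, 40])
print(doc)

# The original question asks for "more than 20"; an exclusive lower bound
# uses a curly brace instead of a square bracket.
fq = "INGREDIENT_SALT_i:{20 TO *]"
print(fq)
```

Note that the query in the reply, INGREDIENT_SALT_i:[20 TO *], includes 20 itself; the exclusive form above is one way to express "more than 20".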
Re: Search using filter query on multivalued fields
There is no way to do this with the setup you describe. That is, there’s no way to say “only use the third element of a multiValued field”. What I’d do is index (perhaps in a separate field) with payloads, so you have input like SALT|20, then use some of the payload functionality to make this happen. See: https://lucidworks.com/2017/09/14/solr-payloads/ There are some other strategies that are simpler, one could index (again, perhaps in a separate field) SALT_20. Then you can form filter queries like “fq=ingredient:[SALT_20 TO *]”. That’s not very flexible and you have to normalize (i.e. 1% couldn’t be SALT_1), so “it depends”. The point is that you have to index cleverly to do what you want. Best, Erick > On May 3, 2019, at 6:26 AM, Srinivas Kashyap wrote: > > Hi, > > I have indexed data as shown below using DIH: > > "INGREDIENT_NAME": [ > "EGG", > "CANOLA OIL", > "SALT" >], > "INGREDIENT_NO": [ > "550", > "297", > "314" >], > "COMPOSITION PERCENTAGE": [ > 20, > 60, > 40 >], > > Similar to this, many other records are also indexed. These are multi-valued > fields. > > I have a requirement to search all the records which has ingredient name salt > and it's composition percentage is more than 20. > > How do I write a filter query for this? > > P.S: I should only fetch records, whose Salt Composition percentage is more > than 20 and not other percentages. > > Thanks and Regards, > Srinivas Kashyap
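The normalization caveat above (1% couldn't be SALT_1) comes from string ranges comparing tokens character by character. A minimal Python sketch of one way to normalize, assuming percentages are whole numbers from 0 to 100 and using a hypothetical helper named encode:

```python
def encode(name: str, pct: int, width: int = 3) -> str:
    """Build a token like SALT_020 so that string order follows numeric order.

    The zero-padding width of 3 is an assumption that fits 0..100.
    """
    if not 0 <= pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return f"{name.upper()}_{pct:0{width}d}"


tokens = [encode("SALT", p) for p in (1, 5, 20, 40, 100)]
print(tokens)

# Unpadded tokens sort incorrectly: "SALT_5" compares greater than "SALT_20".
print("SALT_5" > "SALT_20")

# A filter for "salt, more than 20 percent" could then look like this,
# with "ingredient" being the field name used in the message above.
fq = "ingredient:{%s TO %s]" % (encode("SALT", 20), encode("SALT", 100))
print(fq)
```

Bounding the range at SALT_100 keeps tokens of other ingredients that sort after SALT out of the result. Ingredient names containing spaces (CANOLA OIL) would need escaping or replacing before being used in a range like this.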
Search using filter query on multivalued fields
Hi, I have indexed data as shown below using DIH: "INGREDIENT_NAME": [ "EGG", "CANOLA OIL", "SALT" ], "INGREDIENT_NO": [ "550", "297", "314" ], "COMPOSITION PERCENTAGE": [ 20, 60, 40 ], Similar to this, many other records are also indexed. These are multi-valued fields. I have a requirement to search for all the records which have the ingredient name salt and whose composition percentage for salt is more than 20. How do I write a filter query for this? P.S: I should only fetch records whose salt composition percentage is more than 20, not records where only some other ingredient's percentage is more than 20. Thanks and Regards, Srinivas Kashyap
Range query on multivalued string field results in useless highlighting
Range queries against multivalued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true" I have uncovered what I believe is a bug. At the very least it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1). I have a Field defined in my schema as: I am using a query containing a Range clause and I am using highlighting to get the list of values that match the range query. All examples below were using the appropriate Solr Admin Server Query page. The range query using Solr v5.1.0 produces CORRECT and useful results: { "responseHeader": { "status": 0, "QTime": 366, "params": { "q": "ResourceCorrespondent:[A TO B}", "hl": "true", "indent": "true", "hl.preserveMulti": "true", "fl": "ResourceCorrespondent,ResourceID", "hl.requireFieldMatch": "true", "hl.usePhraseHighlighter": "true", "hl.fl": "ResourceCorrespondent", "wt": "json", "hl.highlightMultiTerm": "true", "_": "1553275722025" } }, "response": { "numFound": 999, "start": 0, "docs": [ { "ResourceCorrespondent": [ "Stanley, Wendell M.", "Avery, Roy" ], "ResourceID": "CCAAHG" }, { "ResourceCorrespondent": [ "Avery, Roy" ], "ResourceID": "CCGMDS" }, ... lots more docs, then ] }, ... we get to the highlighting portion of the response ... this tells me which values of each ResourceCorrespondent field ... actually matching the query "highlighting": { "CCAAHG": { "ResourceCorrespondent": [ "Avery, Roy" ] }, "CCGMDS": { "ResourceCorrespondent": [ "Avery, Roy" ] }, "BBACKV": { "ResourceCorrespondent": [ "American Institute of Biological Sciences", "Albritton, Errett C." ] }, ... lots more useful highlight values. Note two matching values ... for document BBACKV. 
} *** *** However, using exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the response is basically the same including the number of documents found { "responseHeader":{ "status":0, "QTime":245, "params":{ "q":"ResourceCorrespondent:[A TO B}", "hl":"on", "hl.preserveMulti":"true", "fl":"ResourceID, ResourceCorrespondent", "hl.requireFieldMatch":"true", "hl.fl":"ResourceCorrespondent", "hightlightMultiTerm":"true", "wt":"json", "_":"1553105129887", "usePhraseHighLighter":"true"}}, "response":{"numFound":999,"start":0,"docs":[ The documents are in a different order, but that doesn't matter. The problem is with the highlighting which is effectively empty. I don't know what values in each document actually matched the query: "highlighting":{ "QQBBLX":{}, "QQBCLN":{}, "QQBCLM":{}, ... etc. *** NOTE: The data is the same for all Solr versions and the Solr indexes were rebuilt for each Solr version. *** Changing to using "hl.method=unified", the highlighting looks like: "highlighting":{ "QQBBLX":{ "ResourceCorrespondent":[]}, "QQBCLN":{ "ResourceCorrespondent":[]}, "QQBCLM":{ "ResourceCorrespondent":[]}, *** Closer but still no useful values *** NOTE: if I change only the query to be a wildcard query to q="ResourceCorrespondent:A*" the highlighting is correct in both Solr v7.5.0 and v7.7.1: "highlighting":{ "QQBBLX":{ "ResourceCorrespondent":["American Public Health Association"]}, "QQBCLN":{ "ResourceCorrespondent":["Abram, Morris B."]}, "QQBCLM":{ "ResourceCorrespondent":["Abram, Morris B."]}, ... etc. *** This makes me think there is some problem with a Range query feeding the Highlighter code. *** All variations of hl specs or other query parameters do not fix the problem. 
The wildcard query is my current workaround, but there is still a problem with range queries. So there is some incompatibility among: 1) A multivalued string field AND 2) A range query against that field AND 3) Highlighting. The highlight portion of the response is effectively "empty". I don't know when this issue was first introduced. I have recently been updating from 5.1.0 to 7.5.0 in one big leap. I have attempted to read through the change logs for the intervening versions but I gave up to save my sanity. --Karl
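Until the highlighter behaves, one fallback is to recompute on the client which stored values fall inside the range. The following is a minimal Python sketch of that idea, not a Solr feature; values_in_range is a hypothetical helper, the two documents are taken from the v5.1.0 response above, and plain Python string comparison is assumed to agree with Solr's ordering for these ASCII values.

```python
from typing import Dict, List


def values_in_range(values: List[str], lower: str, upper: str) -> List[str]:
    """Keep values with lower <= v < upper, mirroring the query [A TO B}."""
    return [v for v in values if lower <= v < upper]


docs = [
    {"ResourceID": "CCAAHG",
     "ResourceCorrespondent": ["Stanley, Wendell M.", "Avery, Roy"]},
    {"ResourceID": "CCGMDS",
     "ResourceCorrespondent": ["Avery, Roy"]},
]

# Map each document ID to the subset of its values that satisfy the range.
matches: Dict[str, List[str]] = {
    d["ResourceID"]: values_in_range(d["ResourceCorrespondent"], "A", "B")
    for d in docs
}
print(matches)
```

For these two documents the result lines up with the highlighting section that v5.1.0 returned.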
Re: Accessing multiValued field from within custom function
Hi, Any hints on this topic? How to access String / Text values from a multiValued field inside custom function? Best regards, Dariusz Wojtas On Thu, Jan 3, 2019 at 6:18 PM Dariusz Wojtas wrote: > Hi, > > I am using SOLR 7.5 in the cloud mode. > I want to create a custom function similar to 'strdist' that works on > multivalued fields (multiValued=true) and finds the highest matching score. > Yes, I know the potential performance issues, but in my usecase this would > bring a huge benefit. > > There is not much information on how to work with multiValued fields, but > I have found a piece of code that might be useful. It's how SOLR standard > functions are registered: > > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ValueSourceParser.java > > The interesting part for me starts in line 424, when the 'field' function > is registered. > It optionally accepts a multivalue field for min/max calculation. > If the 2nd argument is 'min' or 'max' it tries to resolve the field as > SchemaField. > SchemaField f = fp.getReq().getSchema().getField(fieldName); > > Now the questions are: > 1. Is this the path I should follow? If not - are there any other ways? > 2. How to retrieve all the actual *String *or *Text *values from a > multivalue field, not just a single value? Some kind of a table or set of > values. How? > 3. Does cloud mode change anything here? In my case the whole index is on > a single machine, but there are several replicas. > > Best regards, > Dariusz Wojtas > >
Accessing multiValued field from within custom function
Hi, I am using SOLR 7.5 in the cloud mode. I want to create a custom function similar to 'strdist' that works on multivalued fields (multiValued=true) and finds the highest matching score. Yes, I know the potential performance issues, but in my usecase this would bring a huge benefit. There is not much information on how to work with multiValued fields, but I have found a piece of code that might be useful. It's how SOLR standard functions are registered: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ValueSourceParser.java The interesting part for me starts in line 424, when the 'field' function is registered. It optionally accepts a multivalue field for min/max calculation. If the 2nd argument is 'min' or 'max' it tries to resolve the field as SchemaField. SchemaField f = fp.getReq().getSchema().getField(fieldName); Now the questions are: 1. Is this the path I should follow? If not - are there any other ways? 2. How to retrieve all the actual *String *or *Text *values from a multivalue field, not just a single value? Some kind of a table or set of values. How? 3. Does cloud mode change anything here? In my case the whole index is on a single machine, but there are several replicas. Best regards, Dariusz Wojtas
How to return matches against a multivalued field in Solr search results?
G'day, We're running Solr 5.5.5 to build a search application for a repository of MS-Office docs and PDFs. Our schema includes a multivalued field that holds the IDs of objects embedded in our documents - there can be 100s sometimes 1000s of such objects per document. We have a custom query parser that transforms (portions of) the user's query into a search term against this IDs field. The search results return all of the IDs found in the documents matching the query. Is there some way to identify for each document in the results, the subset of IDs it contains that matched the query? See also: https://issues.apache.org/jira/browse/SOLR-3955 Thanks, Chris.
Re: Copyto with DIH Interpreting string as MultiValued field on copy
Makes total sense. Thanks to both of you for the clarification! On 8/18/18, 8:03 AM, "Alexandre Rafalovitch" wrote: >And part of the issue is that SolrEntityProcessor does not take individual >field definitions. So that part is ignored and instead just 'fl' mapping is >used as Shawn explained. > >So you could also remap authorText in that definition to an ignored field. >See >https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml > >Regards, >Alex > >On Fri, Aug 17, 2018, 11:50 PM Shawn Heisey, wrote: > >> On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote: >> > I'm trying to track down an odd issue I'm seeing when using the >> SolrEntityProcessor to seed some test data from a solr 4.x cluster to a >> solr 7.x cluster. It seems like strings are being interpreted as >> multivalued when passed from a string field to a text field via the copyTo >> directive. Any clever ideas how to resolve this? >> >> What's happening is deceptively simple. >> >> In the source system, you're copying from author to authorText. Both >> fields are stored. So if you have "Jeff Hartley" in author, you also >> have "Jeff Hartley" in authorText. So what's happening is that when the >> destination system imports from the source system, it gets "Jeff >> Hartley" in both fields, and then copyField says "put a copy of what's >> in author into authorText" ... and suddenly there are two copies of >> "Jeff Hartley" in authorText. >> >> There are two ways to deal with this: >> >> 1) In the query you're doing with SolrEntityProcessor, add an "fl" >> parameter and list all the fields *except* authorText and any other >> field where this same problem is happening. >> >> 2) Remove the copyField from the schema until after the import from the >> source server is done. >> >> Thanks, >> Shawn >> >>
Re: Copyto with DIH Interpreting string as MultiValued field on copy
And part of the issue is that SolrEntityProcessor does not take individual field definitions. So that part is ignored and instead just 'fl' mapping is used as Shawn explained. So you could also remap authorText in that definition to an ignored field. See https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml Regards, Alex On Fri, Aug 17, 2018, 11:50 PM Shawn Heisey, wrote: > On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote: > > I’m trying to track down an odd issue I’m seeing when using the > SolrEntityProcessor to seed some test data from a solr 4.x cluster to a > solr 7.x cluster. It seems like strings are being interpreted as > multivalued when passed from a string field to a text field via the copyTo > directive. Any clever ideas how to resolve this? > > What's happening is deceptively simple. > > In the source system, you're copying from author to authorText. Both > fields are stored. So if you have "Jeff Hartley" in author, you also > have "Jeff Hartley" in authorText. So what's happening is that when the > destination system imports from the source system, it gets "Jeff > Hartley" in both fields, and then copyField says "put a copy of what's > in author into authorText" ... and suddenly there are two copies of > "Jeff Hartley" in authorText. > > There are two ways to deal with this: > > 1) In the query you're doing with SolrEntityProcessor, add an "fl" > parameter and list all the fields *except* authorText and any other > field where this same problem is happening. > > 2) Remove the copyField from the schema until after the import from the > source server is done. > > Thanks, > Shawn > >
Re: Copyto with DIH Interpreting string as MultiValued field on copy
On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote: I’m trying to track down an odd issue I’m seeing when using the SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 7.x cluster. It seems like strings are being interpreted as multivalued when passed from a string field to a text field via the copyTo directive. Any clever ideas how to resolve this? What's happening is deceptively simple. In the source system, you're copying from author to authorText. Both fields are stored. So if you have "Jeff Hartley" in author, you also have "Jeff Hartley" in authorText. So what's happening is that when the destination system imports from the source system, it gets "Jeff Hartley" in both fields, and then copyField says "put a copy of what's in author into authorText" ... and suddenly there are two copies of "Jeff Hartley" in authorText. There are two ways to deal with this: 1) In the query you're doing with SolrEntityProcessor, add an "fl" parameter and list all the fields *except* authorText and any other field where this same problem is happening. 2) Remove the copyField from the schema until after the import from the source server is done. Thanks, Shawn
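Option 1 above amounts to building an explicit field list that leaves out every copyField destination. A minimal Python sketch of that bookkeeping follows; build_fl is a hypothetical helper, "id" and "title" are made-up field names for illustration, while author, authorText and the orig_version_l:_version_ alias come from this thread.

```python
from typing import Iterable, List


def build_fl(source_fields: Iterable[str],
             copy_destinations: Iterable[str],
             extra: Iterable[str] = ()) -> str:
    """Return a comma-separated fl value without copyField destinations."""
    skip = set(copy_destinations)
    keep: List[str] = [f for f in source_fields if f not in skip]
    return ",".join(keep + list(extra))


fl = build_fl(
    source_fields=["id", "author", "authorText", "title"],  # partly hypothetical
    copy_destinations=["authorText"],
    extra=["orig_version_l:_version_"],  # alias used in the poster's DIH config
)
print(fl)
```

The destination schema's copyField then repopulates authorText from author during the import, so each document ends up with a single value there.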
Copyto with DIH Interpreting string as MultiValued field on copy
Hi, I’m trying to track down an odd issue I’m seeing when using the SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 7.x cluster. It seems like strings are being interpreted as multivalued when passed from a string field to a text field via the copyTo directive. Any clever ideas how to resolve this? Schema: Fields and CopyTo Text fieldtype declaration: DIH Config: http://cluster.solr.eng.techtarget.com/solr/vignette " query="*:*" fl="*,orig_version_l:_version_"> Error: org.apache.solr.common.SolrException: ERROR: [doc=d751e434c69b6210VgnVCM100d01c80aRCRD] Error adding field 'author'='Jeff Hartley' msg=Multiple values encountered for non multiValued copy field authorText: Jeff Hartley at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:203) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:101) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:980) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234) ~[solr-core-7.4.0.jar:7.4.0 
9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1168) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80) ~[?:?] at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:258) ~[?:?] at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:527) ~[?:?] at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) ~[?:?] at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) ~[?:?] at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) ~[?:?] at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) ~[?:?] 
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) ~[?:?] at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466) ~[?:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] Caused by: org.apache.solr.common.SolrException: Multiple values encountered for non multiValued copy field authorText: Jeff Hartley at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:180) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13] ... 22 more
Re: _childDocuments_ automatically multivalued field type
Ok, I'll have a look at the link above. Thanks a lot... Best JB -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: _childDocuments_ automatically multivalued field type
Ok, I see what I have to look for, thanks to your reply. I'll adjust the schema and see the difference. Thanks. Best JB
Re: _childDocuments_ automatically multivalued field type
On 7/2/2018 9:18 AM, jeebix wrote: > I don't understand why for example "type_cmd_s" get the field type attribute > "singleValued", but "TTC" or "kits_sans_suite" get "multiValued" attribute ? > Why those field are in the managed-schema and enseigne_s (for example) is > not ? The field named enseigne_s is almost certainly handled by a dynamic field definition, most likely one with the name "*_s". That field (and its field type) do not have multiValued="true". This was probably already in your schema before you did any indexing. The ones that were automatically added by the data-driven nature of your schema were added as the "strings" type, which IS multi-valued. The update processor definition that is in the Solr examples is set up to add fields as multiValued, so that if a later indexing request comes in with multiple values for the field, it will not fail. This is the major danger of relying on Solr to automatically add fields to your schema. Chances are good that the choice it makes for the field will be the wrong choice. And when that happens, you will need to fix the schema and completely reindex. https://wiki.apache.org/solr/HowToReindex Thanks, Shawn
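The distinction drawn above, between a name covered by a dynamic field pattern and a name that was added to the schema explicitly, can be pictured with a small Python sketch. The resolve helper and the pattern list are hypothetical, and Solr's real matching rules (for example precedence among overlapping patterns) are more involved than this.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Set, Tuple


def resolve(field_name: str,
            explicit: Set[str],
            dynamic_patterns: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Say whether a field is explicit, dynamic (and by which pattern), or unknown."""
    if field_name in explicit:
        return ("explicit", field_name)
    for pattern in dynamic_patterns:
        if fnmatchcase(field_name, pattern):
            return ("dynamic", pattern)
    return ("unknown", None)


explicit_fields = {"id", "TTC"}          # TTC was auto-added by schema guessing
patterns = ["*_s", "*_i", "*_dt"]        # assumed dynamicField patterns

print(resolve("enseigne_s", explicit_fields, patterns))
print(resolve("TTC", explicit_fields, patterns))
```

A field resolved through *_s takes its single-valued setting from that dynamic field definition, while an auto-added field such as TTC carries whatever type the guessing logic picked, which in the example configuration is a multivalued one.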
Re: _childDocuments_ automatically multivalued field type
Your _s fields map to the dynamicField definition, so they are created dynamically without needing a separate definition in the schema for each field.

The TTC field was mapped explicitly, perhaps by the "schemaless" mapping autodiscovery, which does create specific field definitions, but always as multivalued.

The multiValued attribute can be set on a field type, not just on an individual field, so you may just want to adjust the schema definition to use the single-valued types instead. The Admin UI schema screen is helpful for seeing such differences.

Regards,
Alex
_childDocuments_ automatically multivalued field type
Hello everybody, I have a problem with some field types in the managed-schema generated. First, the data SOLR returned with a standard query : response":{"numFound":365567,"start":0,"docs":[ { "id":"560.561.134676", "parent_i":560, "asso_i":561, "personne_i":134676, "etat_technique_s":"avec_documents", "etat_marketing_s":"actif", "type_parent_s":"Ecole élémentaire publique", "type_asso_s":"APE (association de parents d'élèves)", "groupe_type_parent_s":"ENSEIGNEMENT_PRIMAIRE", "groupe_type_asso_s":"ASSOCIATION_DE_PARENTS", "nombre_commandes_brut_i":2, "nombre_commandes_i":1, "nombre_kits_saveur_i":0, "ca_periode_i":560, "ca_periode_fleur_i":0, "ca_periode_saveur_i":0, "zone_scolaire_s":"A", "territoire_s":"France Métropolitaine", "region_s":"AUVERGNE RHONE-ALPES", "departement_s":"01 AIN", "postal_country_s":"FR", "asso_country_s":"FRANCE", "object_type_s":"contact", "date_derni_re_commande_dt":"2016-05-20T00:00:00Z", "_version_":1604889647955050496, "_childDocuments_":[ { "fixe_facturation":["0256897856"], "object_type":["order"], "mobile_livraison":["0658987874"], "kit_sans_suite":["false"], "fixe_livraison":["0450598311"], "type_cde_s":"CDE", "statut_s":"V", "mobile_facturation":["0658787458"], "campagne_s":"A", "TTC":[780], "date_dt":"2016-05-20T00:00:00Z", "id":"A28837", "enseigne_s":"CRE"}, { "fixe_facturation":["0245784975"], "object_type":["order"], "mobile_livraison":["0645789874"], "kit_sans_suite":["false"], "type_cde_s":"KIT", "statut_s":"V", "mobile_facturation":["0612345678"], "campagne_s":"A", "TTC":[0], "date_dt":"2016-05-04T00:00:00Z", "id":"A25415", "enseigne_s":"CRE"}]} My goal is to sum fields "TTC" by parentDocument. But with the type "multiValued", I can't use aggregation functions. 
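Until the schema is fixed and the data reindexed, the per-parent total can also be computed on the client from a response like the one above. A minimal sketch in Python, assuming the response has already been parsed into dictionaries; the helper name `sum_ttc` is hypothetical, and it accepts TTC either as a list (the multiValued form returned by Solr) or as a plain number (the form in index.json).

```python
def sum_ttc(parent):
    """Sum the TTC values of all child documents of one parent document."""
    total = 0
    for child in parent.get("_childDocuments_", []):
        value = child.get("TTC", 0)
        if isinstance(value, list):
            # multiValued field: Solr returns a list even for a single value
            total += sum(value)
        else:
            total += value
    return total


if __name__ == "__main__":
    parent = {
        "id": "560.561.134676",
        "_childDocuments_": [
            {"id": "A28837", "TTC": [780]},
            {"id": "A25415", "TTC": [0]},
        ],
    }
    print(sum_ttc(parent))
```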
The core get the data from this script : /opt/solr/bin/post -c -format solr build/index.json The index.json looks like that: [ { "id": "781.782.134878", "parent_i": 781, "asso_i": 782, "personne_i": 134878, "etat_technique_s": "avec_documents", "etat_marketing_s": "inactif", "type_parent_s": "Ecole élémentaire privée", "type_asso_s": "APEL (association de parents école libre)", "groupe_type_parent_s": "ENSEIGNEMENT_PRIMAIRE", "groupe_type_asso_s": "ASSOCIATION_DE_PARENTS", "nombre_commandes_brut_i": 4, "nombre_commandes_i": 2, "nombre_kits_saveur_i": 2, "date_dernière_commande_dt": "2010-11-16", "ca_periode_i": 0, "ca_periode_fleur_i": 0, "ca_periode_saveur_i": 0, "zone_scolaire_s": "A", "territoire_s": "France Métropolitaine", "region_s": "AUVERGNE RHONE-ALPES", "departement_s": "01 AIN", "postal_country_s": "FR", "asso_country_s": "FRANCE", "object_type_s": "contact", "kits_sans_suite_ss": null, "_childDocuments_": [ { "fixe_facturation": "0450407279", "object_type": "order", "mobile_livraison": "0628332864", "kit_sans_suite": "false", "fixe_livraison": "0450407279", "type_cde_s": "KIT", "statut_s": "V", "mobile_facturation": "0628332864", "campagne_s": "L", "TTC": 0, "date_dt": "2009-10-12T00:00:00Z", "id": "L14276", "enseigne_s": "SAV", "gamme": [ "KITS > Kits Saveurs" ] }, { "fixe_facturation": "0450407279", "object_type": "order", "mobile_livraison": "0628332864", "kit_sans_suite": "false", "fixe_livraison": "0450407279", "type_cde_s": "CDE", "statut_s": "V", "mobile_facturation": "0628332864", "campagne_s": "L", "TTC": 1045, "date_dt": "2009-11-14T00:00:00Z", "id": "L25049", "enseigne_s": "SAV", "gamme": [ "CHOCOLAT > Assortiment", "CHOCOLAT > Individuel", "CHOCOLAT > Mono-produit", "EQUIPEMENT MAISON > Cuisine", "EQUIPEMENT MAISON > Décoration", "KITS > Kits Saveurs", "SAVEURS > Confiserie", "SAVEURS > Pâtisserie" ] } ] In the managed-schema, only those fields appear: I don't understand why for example "type_cmd_s" get the field type attribute 
"singleValued", but "TTC" or "kits_sans_suite" get "multiValued" attribute ? Why those field are in the managed-schema and enseigne_s (for example) is not ? Thanks a lot for your help... Best JB -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr sort multivalued field
On 6/12/2018 2:56 AM, Marc Lammers wrote:
> I want to sort my data by a multivalued field. I add this to my query
> "sort=field(foo,min) asc". The configuration in the schema for this
> field is

The documentation for the field function says that the field must contain numeric docvalues. Your field has type="string" and although you did not indicate what the definition of string is in your schema, most likely it is the solr.StrField class.

https://lucene.apache.org/solr/guide/7_3/function-queries.html#FunctionQueries-field

Because this is not a numeric field, I'm guessing that it will not work with the field function. All of the examples for that function are referencing a float field.

Thanks,
Shawn
Solr sort multivalued field
Hi all,

I want to sort my data by a multivalued field. I add this to my query: "sort=field(foo,min) asc". The configuration in the schema for this field is

The Solr documentation says that I have to add the docValues="true" attribute for this field. After doing so I deleted the collection and reimported the data. But when I execute my query I get the following error message:

"sort param could not be parsed as a query, and is not a field that exists in the index: field(foo,min)"

Did I forget to set something?

Thanks in advance,
Marc
Re: Query a particular index from a multivalued field.
There's no such syntax out of the box. You could append the position to each value, so your input doc would look something like:

doc 1= {
  "id": "1",
  "status": [ "b1", "a2" ]
}

and search appropriately. Perhaps this would be a duplicated field used only when you want to search by position.

Best,
Erick

On Thu, Jun 7, 2018 at 8:36 AM, root23 wrote:
> Hi all,
> is there a way i can query a particular index of a multivalued field.
> e.g lets say i have a document like this
> doc 1= {
> "id": "1",
> "status": [
> "b",
> "a"
> ]
> }
>
> doc2= {
> "id": "1",
> "status": [
> "c",
> "b"
> ]
> }
>
> can i query like give me the document which has status = b at index 0. which
> should only return doc 1.
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
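The suggestion to append the position to each value can be done at indexing time on the client. A minimal sketch in Python, assuming the positional copy goes into a separate field; the field name `status_pos` and the helper `with_positions` are hypothetical, and the numbering starts at 1 to match the "b1", "a2" example.

```python
def with_positions(values, start=1):
    """Append each value's position to the value itself, e.g. b -> b1."""
    return [f"{value}{position}" for position, value in enumerate(values, start=start)]


def add_positional_field(doc, field="status", target="status_pos"):
    """Return a copy of doc with a duplicated, position-tagged field."""
    enriched = dict(doc)
    enriched[target] = with_positions(doc.get(field, []))
    return enriched


if __name__ == "__main__":
    doc1 = add_positional_field({"id": "1", "status": ["b", "a"]})
    print(doc1)
    # A query such as status_pos:b1 would then match only documents
    # whose first status value is "b".
```

If the values themselves can end in digits, a separator between value and position (for example an underscore) avoids ambiguous tokens.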
Query a particular index from a multivalued field.
Hi all,

Is there a way I can query a particular index of a multivalued field? E.g. let's say I have documents like this:

doc 1= {
  "id": "1",
  "status": [ "b", "a" ]
}

doc2= {
  "id": "1",
  "status": [ "c", "b" ]
}

Can I ask for the documents which have status = b at index 0? That should return only doc 1.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: How to replacing values on multiValued all together by using 1 query
On 5/10/2018 7:51 AM, Issei Nishigata wrote:
> I create a field called employee_name, and use it as multiValued.
> If "Mr.Smith" that is part of the value of the field is changed to "Mr.Brown",
> do I have to create 1 million deletion queries and updating queries in case
> where "Mr.Smith" appears in 1 million documents?
> Do we have a simple way of updating to use only 1 query?

Solr is not a database. There is no functionality to update field X on all documents that match a query.

If your index has a field designated as uniqueKey, you won't need to do a delete, you can just reindex those documents and Solr will do the delete for you. If you do not have a uniqueKey, then you will need to delete the old document before indexing its replacement.

https://wiki.apache.org/solr/HowToReindex

If your index meets the criteria for Atomic Updates, then you could use the delete/add functionality in that feature to take care of it. Solr will construct a complete document from your atomic update and all the data currently in the document, then index the new document, deleting the old one in the process. Atomic Updates also require a uniqueKey. See this page for information on Atomic Updates and the field storage requirements:

https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html

Thanks,
Shawn
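For the Atomic Update route, the request body could be built along these lines. This is a sketch under stated assumptions, not a tested recipe: "remove" and "add" are atomic-update modifiers described in the Solr Reference Guide page linked above, but combining both in one map for the same field, and the order in which they are applied, should be verified on your Solr version. The helper names and the example ids are hypothetical, and the ids of the affected documents still have to be collected first with a query on employee_name.

```python
import json


def rename_value_update(doc_id, field, old, new):
    """Build one atomic-update document that swaps old for new in a multiValued field."""
    # Assumption: both modifiers may appear in the same map for one field.
    return {"id": doc_id, field: {"remove": old, "add": new}}


def build_payload(doc_ids, field, old, new):
    """Build the JSON body for a single update request covering many documents."""
    return json.dumps([rename_value_update(doc_id, field, old, new) for doc_id in doc_ids])


if __name__ == "__main__":
    # Hypothetical ids, as returned by a query for employee_name:"Mr.Smith"
    body = build_payload(["doc-1", "doc-2"], "employee_name", "Mr.Smith", "Mr.Brown")
    print(body)
```

The resulting JSON would be posted to the collection's update handler in batches, so one million documents need far fewer than one million requests, although each document is still reindexed individually inside Solr.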
How to replacing values on multiValued all together by using 1 query
Hi all,

I created a field called employee_name and use it as multiValued. If "Mr.Smith", which is one of the values of the field, is changed to "Mr.Brown", do I have to create 1 million delete queries and update queries when "Mr.Smith" appears in 1 million documents? Is there a simple way to do the update with only 1 query?

Thanks,
Issei

--
Issei Nishigata
Solr sort on latest upcoming timestamp value on multivalued field
I have a multivalued field holding the session timings (stored as timestamps) of group documents, e.g. session_timings: [1526882026, 1513882026, 1533882026].

My sorting logic is that groups should be listed in order of their next upcoming session time. For example, Group A has three session_timings = [1, 2, 5], and Group B also has three session_timings = [1, 6, 7]. If the current timestamp is 3, then Group A should come first, because the next session for Group A is at 5, whereas for Group B it is at 6.

Is this possible with Solr sorting? Or do I have to do this another way? Any help would be great.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
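The ordering being asked for can be stated precisely with a small client-side sketch. This is Python run outside Solr, only to pin down the logic, not a Solr sorting feature; the function names are hypothetical, and groups with no upcoming session are placed last by assumption.

```python
def next_session(timings, now):
    """Return the earliest timestamp strictly after now, or None if there is none."""
    upcoming = [t for t in timings if t > now]
    return min(upcoming) if upcoming else None


def sort_groups(groups, now):
    """Sort group names by next upcoming session; groups without one go last."""
    def sort_key(item):
        upcoming = next_session(item[1], now)
        return (upcoming is None, upcoming if upcoming is not None else 0)

    return [name for name, _ in sorted(groups.items(), key=sort_key)]


if __name__ == "__main__":
    groups = {"Group B": [1, 6, 7], "Group A": [1, 2, 5]}
    print(sort_groups(groups, now=3))
```

The same function could also be run at indexing time to store the next session in a single-valued field, at the cost of having to refresh that field as sessions pass.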
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
On Mon, Feb 26, 2018 at 7:14 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > Faceting works on multivalued fields, perhaps you can do something with > that? > > The main difference I see in this case between facets and groups is that groups are sorted by score, so most relevant group comes first. Which is very useful when I have to return grouped results to the user. -- Vincenzo D'Amore
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Of course, and in that use-case you'd want a particular document to appear in all three categories. Another client may want the doc to appear in only the "most important" category, however that's defined. Another client may want the doc to appear in "the more recent" day (assuming we're grouping by date). Or "the oldest day".

That's what I meant by "rather than do something which will be wrong it throws an error". Whatever you choose will be "wrong" in some use-case.

Your use-case is certainly valid, but nobody has come forward with a patch to allow it that I know of.

Faceting works on multivalued fields, perhaps you can do something with that?

Best,
Erick

On Mon, Feb 26, 2018 at 9:10 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi Erick,
>
> please consider this case where there is a group products that are
> televisions.
>
> Now I have only one category per product, but in same cases like the
> television I could have more than one.
>
> Some products should be available simultaneously in more categories, thats
> why the field I was trying to group is a multivalue, for example:
>
> /home-video/televisions/tv-led (516)
> /home-video/televisions/tv-ultra-hd-4k (363)
> /home-video/televisions/smart-tv (19)
>
> So there can be a television that is simultaneously a TV led, a smart tv
> and is ultra hd 4k.
>
> So, for example, I should be able to submit the following query:
>
> - fq=available:true
> - fq=vertical:0
> - q=television
> - rows=3
> - group=true
> - group.field=category
> - group.limit=0
>
> So the returned groups should be something like this (this is the output I
> have now for the single value field)
>
> 51653
> /home-video/televisions/tv-led (maxScore="0.6224861")
> /home-video/televisions/tv-ultra-hd-4k (maxScore="0.5923965")
> /home-video/televisions/smart-tv
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Hi Erick,

please consider this case, where there is a group of products that are televisions.

At the moment I have only one category per product, but in some cases, like televisions, I could have more than one.

Some products should be available in several categories simultaneously; that's why the field I was trying to group on is multivalued. For example:

/home-video/televisions/tv-led (516)
/home-video/televisions/tv-ultra-hd-4k (363)
/home-video/televisions/smart-tv (19)

So there can be a television that is simultaneously an LED TV, a smart TV and an ultra HD 4K TV.

So, for example, I should be able to submit the following query:

- fq=available:true
- fq=vertical:0
- q=television
- rows=3
- group=true
- group.field=category
- group.limit=0

The returned groups should then be something like this (this is the output I have now for the single-valued field):

51653
/home-video/televisions/tv-led
/home-video/televisions/tv-ultra-hd-4k
/home-video/televisions/smart-tv

--
Vincenzo D'Amore
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
What does "group by" mean on a field with more than one value? Say I have "A" and "B" in the field in a single document. What group does it go in, one labeled "A" or one labeled "B"?

So IIUC, rather than do something which will be wrong, it throws an error if the field is defined as multiValued. And whatever option is chosen (e.g. use the min or the max) will be wrong sometimes.

Although admittedly the error is a bit obscure...

Best,
Erick
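The contrast between grouping and faceting on a multivalued field can be seen in a toy example. This is a client-side sketch in Python, not Solr's faceting code; the documents and the helper `facet_counts` are hypothetical. A facet simply counts a document once in every bucket it has a value for, so no single "owning" group has to be chosen.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count, per distinct value, how many documents carry that value."""
    counts = Counter()
    for doc in docs:
        # set() so a document counts once per bucket even if a value repeats
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return counts


if __name__ == "__main__":
    docs = [
        {"id": "tv-1", "category": ["/home-video/televisions/tv-led",
                                    "/home-video/televisions/smart-tv"]},
        {"id": "tv-2", "category": ["/home-video/televisions/tv-led"]},
    ]
    counts = facet_counts(docs, "category")
    for value, count in sorted(counts.items()):
        print(value, count)
    # The bucket counts add up to more than the number of documents,
    # because tv-1 is counted in two buckets.
    print(sum(counts.values()), len(docs))
```

Grouping, by contrast, has to return each document under exactly one group value, which is the choice the error avoids making.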
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Hi Amrit,

thanks for your help.

I know that only 5-10% of the documents in the collection have more than one value for the field I was trying to group by, so memory usage should not be a particular concern in this case. Do you know of any other contraindication I should be aware of?

I was thinking of avoiding this problem by hacking the source code and deploying a customised version of Solr.

Best regards,
Vincenzo

--
Vincenzo D'Amore
Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Vincenzo,

As I read the source code, in SchemaField.java:

    /**
     * Sanity checks that the properties of this field type are plausible
     * for a field that may be used to get a FieldCacheSource, throwing
     * an appropriate exception (including the field name) if it is not.
     * FieldType subclasses can choose to call this method in their
     * getValueSource implementation
     * @see FieldType#getValueSource
     */
    public void checkFieldCacheSource() throws SolrException {
      if ( multiValued() ) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                                "can not use FieldCache on multivalued field: "
                                + getName());
      }
      if (! hasDocValues() ) {
        if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                                  "can not use FieldCache on a field w/o docValues unless it is indexed and supports Uninversion: "
                                  + getName());
        }
      }
    }

It seems FieldCache is not allowed to un-invert values for multi-valued fields.

I suspect the reason is that multiple values would eat up more memory, but I'm not sure; someone else can weigh in.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2
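The order of the two checks in the quoted method matters: the multiValued test comes first, so under this logic adding docValues to the field does not make the error go away. The sketch below restates the same conditions in Python purely as an illustration; it is a model of the quoted Java, not Solr code, and the function name and boolean parameters are hypothetical stand-ins for the SchemaField methods.

```python
def check_field_cache_source(name, multi_valued, has_doc_values,
                             indexed, supports_uninversion):
    """Model of the quoted check: raise ValueError where the Java throws SolrException."""
    if multi_valued:
        # Checked first, regardless of docValues or indexed settings.
        raise ValueError(f"can not use FieldCache on multivalued field: {name}")
    if not has_doc_values and not (indexed and supports_uninversion):
        raise ValueError(
            "can not use FieldCache on a field w/o docValues unless it is "
            f"indexed and supports Uninversion: {name}"
        )


if __name__ == "__main__":
    try:
        check_field_cache_source("categoryLevels", multi_valued=True,
                                 has_doc_values=True, indexed=True,
                                 supports_uninversion=True)
    except ValueError as err:
        print(err)
```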
Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels
Hi,

while trying to run a group query on a multivalued field I received this error:

can not use FieldCache on multivalued field:

  true
  400
  4
  org.apache.solr.common.SolrException
  org.apache.solr.common.SolrException
  can not use FieldCache on multivalued field: categoryLevels
  400

I don't understand why this is happening.

Do you know any way to work around this problem?

Thanks in advance,
Vincenzo

--
Vincenzo D'Amore
Re: Title Search scoring issues with multivalued field & norm
Using edismax with a different field for each title will affect the final scores if the tie parameter is non-zero.

Can we create a separate document for each title? The uniqueness would then be per title rather than per movie_id. In this manner, even while using edismax, the other titles won't affect the score.

Is there any other way to handle norms in a multivalued field?

On Thu, Feb 1, 2018 at 12:24 PM, Sravan Kumar <sra...@caavo.com> wrote:

> @Walter: Perhaps you are right on not to consider stemming. Instead fuzzy
> search will cover these along with the misspellings.
>
> In case of symbols, we want the titles matching the symbols ranked higher
> than the others. Perhaps we can use this field only for boosting.
>
> Certain movies have around 4-6 different aliases based on what our source
> gives and we do not really know what is the max. Is there no other way from
> lucene/solr to use a multivalued field?
>
> On Thu, Feb 1, 2018 at 11:06 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> I was the first search engineer at Netflix and moved their search from a
>> home-grown engine to Solr. It worked very well with a single title field
>> and aliases.
>>
>> I think your schema is too complicated for movie search.
>>
>> Stemming is not useful. It doesn’t help search and it can hurt. You don’t
>> want the movie “Saw” to match the query “see”.
>>
>> When is it useful to search with symbols? Remove the punctuation.
>>
>> The only movie titles with symbols that caused any challenge were:
>>
>> * Frost/Nixon
>> * .hack//Sign
>> * +/-
>>
>> For the first two, removing punctuation worked fine. For the last one, I
>> hardcoded a translation to “plus/minus” before indexing or querying.
>>
>> Query completion made a huge difference, taking our clickthrough rate
>> from 0.45 to 0.55.
>>
>> Later, we added fuzzy search to handle misspellings.
>> >> wunder >> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >> > On Jan 31, 2018, at 8:54 PM, Sravan Kumar <sra...@caavo.com> wrote: >> > >> > @Tim Casey: Yeah... TFIDFSimilarity weighs towards shorter documents. >> This >> > is done through the fieldnorm component in the class. The issue is when >> the >> > field is multivalued. Consider the field has two string each of 4 >> tokens. >> > The fieldNorm from the lucene TFIDFSimilarity class considers the total >> sum >> > of these two values i.e 8 for normalizing instead of 4. Hence, the >> ranking >> > is distorted. >> > Regarding the search evaluation, we do have a curated set. >> > >> > >> > On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote: >> > >> >> For smaller length documents TFIDFSimilarity will weight towards >> shorter >> >> documents. Another way to say this, if your documents are 5-10 terms, >> the >> >> 5 terms are going to win. >> >> You might think about having per token, or token pair, weight. I >> would be >> >> surprised if there was not something similar out there. This is a >> common >> >> issue with any short text. >> >> I guess I would think of this as TFICF, where the CF is the corpus >> >> frequency. You also might want to weight inversely proportional to the >> age >> >> of the title, older are less important. This is assuming people are >> doing >> >> searches within some time cluster, newer is more likely. >> >> >> >> For some obvious advice, things you probably already know. This kind >> of >> >> search needs some hard measurement to begin to know how to tune it. >> You >> >> need to find a reasonable annotated representation. So, if you took >> the >> >> previous months searches where there is a chain of successive >> searches. If >> >> you weighted things differently would you shorten the length of the >> chain. >> >> Can you get the click throughs to happen sooner. 
>> >> >> >> Anyway, just my 2 cents >> >> >> >> >> >> On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> >> wrote: >> >> >> >>> >> >>> @Walter: We have 6 fields declared in schema.xml for title each with >> >>> different type of analyzer. One without processing symbols, other >> stemmed >> >>> and other removing symbols, etc. So, if we have separate fields for
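The one-document-per-title idea can be pictured with the sketch below. It is only an illustration: the field names `id`, `movie_id` and `title` and the `explode_titles` helper are assumptions, not taken from any schema in this thread. Each alias becomes its own document, so its norm reflects only its own length; results would then need to be grouped or collapsed on `movie_id` at query time.

```python
def explode_titles(movie_id, titles):
    """Return one document per title, keyed by a hypothetical composite id."""
    return [
        {"id": f"{movie_id}_{i}", "movie_id": movie_id, "title": title}
        for i, title in enumerate(titles)
    ]


# Document 3 from the thread has two titles, so it becomes two documents.
docs = explode_titles("3", ["Beauty and the Beast", "La bella y la bestia"])
for doc in docs:
    print(doc)
```

The trade-off is a larger index and the need to de-duplicate movies in the result list.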
Re: Title Search scoring issues with multivalued field & norm
@Walter: Perhaps you are right not to consider stemming. Fuzzy search will cover those cases along with the misspellings.

In the case of symbols, we want titles that match the symbols ranked higher than the others. Perhaps we can use this field only for boosting.

Certain movies have around 4-6 different aliases, depending on what our source gives us, and we do not really know the maximum. Is there no other way in Lucene/Solr to use a multivalued field?

On Thu, Feb 1, 2018 at 11:06 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> I was the first search engineer at Netflix and moved their search from a
> home-grown engine to Solr. It worked very well with a single title field
> and aliases.
>
> I think your schema is too complicated for movie search.
>
> Stemming is not useful. It doesn’t help search and it can hurt. You don’t
> want the movie “Saw” to match the query “see”.
>
> When is it useful to search with symbols? Remove the punctuation.
>
> The only movie titles with symbols that caused any challenge were:
>
> * Frost/Nixon
> * .hack//Sign
> * +/-
>
> For the first two, removing punctuation worked fine. For the last one, I
> hardcoded a translation to “plus/minus” before indexing or querying.
>
> Query completion made a huge difference, taking our clickthrough rate from
> 0.45 to 0.55.
>
> Later, we added fuzzy search to handle misspellings.
Re: Title Search scoring issues with multivalued field & norm
I was the first search engineer at Netflix and moved their search from a home-grown engine to Solr. It worked very well with a single title field and aliases.

I think your schema is too complicated for movie search.

Stemming is not useful. It doesn’t help search and it can hurt. You don’t want the movie “Saw” to match the query “see”.

When is it useful to search with symbols? Remove the punctuation.

The only movie titles with symbols that caused any challenge were:

* Frost/Nixon
* .hack//Sign
* +/-

For the first two, removing punctuation worked fine. For the last one, I hardcoded a translation to “plus/minus” before indexing or querying.

Query completion made a huge difference, taking our clickthrough rate from 0.45 to 0.55.

Later, we added fuzzy search to handle misspellings.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 31, 2018, at 8:54 PM, Sravan Kumar <sra...@caavo.com> wrote:
>
> @Tim Casey: Yeah... TFIDFSimilarity weighs towards shorter documents. This
> is done through the fieldnorm component in the class. The issue is when the
> field is multivalued. Consider the field has two string each of 4 tokens.
> The fieldNorm from the lucene TFIDFSimilarity class considers the total sum
> of these two values i.e 8 for normalizing instead of 4. Hence, the ranking
> is distorted.
> Regarding the search evaluation, we do have a curated set.
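The punctuation handling described above can be sketched as a small normalizer applied to both titles and queries. This is a guess at the shape of it, not the Netflix code: the `SPECIAL_TITLES` table and `normalize_title` function are hypothetical names, and replacing punctuation with a space (rather than deleting it) and lowercasing are assumptions.

```python
import re

# Hypothetical hand-maintained translations, applied before punctuation removal.
SPECIAL_TITLES = {"+/-": "plus/minus"}


def normalize_title(text):
    """Translate known symbol-only titles, then replace punctuation with spaces."""
    text = SPECIAL_TITLES.get(text.strip(), text)
    text = re.sub(r"[^\w\s]", " ", text)   # anything that is not a word char or space
    return " ".join(text.lower().split())  # lowercase and collapse whitespace


for title in ["Frost/Nixon", ".hack//Sign", "+/-"]:
    print(title, "->", normalize_title(title))
```

Running the same function at index time and at query time keeps the two sides consistent, which is the point of translating "+/-" before either step.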
Re: Title Search scoring issues with multivalued field & norm
@Tim Casey: Yes, TFIDFSimilarity weighs towards shorter documents. This is done through the fieldNorm component of the class. The issue arises when the field is multivalued. Suppose the field has two string values of 4 tokens each. The fieldNorm in Lucene's TFIDFSimilarity class uses the combined length of the two values, i.e. 8, for normalization instead of 4. Hence, the ranking is distorted.

Regarding search evaluation, we do have a curated set.

On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote:
> For smaller length documents TFIDFSimilarity will weight towards shorter
> documents. Another way to say this, if your documents are 5-10 terms, the
> 5 terms are going to win.
> You might think about having per token, or token pair, weight. I would be
> surprised if there was not something similar out there. This is a common
> issue with any short text.
> I guess I would think of this as TFICF, where the CF is the corpus
> frequency. You also might want to weight inversely proportional to the age
> of the title, older are less important. This is assuming people are doing
> searches within some time cluster, newer is more likely.
>
> For some obvious advice, things you probably already know. This kind of
> search needs some hard measurement to begin to know how to tune it. You
> need to find a reasonable annotated representation. So, if you took the
> previous months searches where there is a chain of successive searches. If
> you weighted things differently would you shorten the length of the chain.
> Can you get the click throughs to happen sooner.
>
> Anyway, just my 2 cents
Re: Title Search scoring issues with multivalued field & norm
For short documents, TFIDFSimilarity will weight towards the shorter ones. Put another way, if your documents are 5-10 terms long, the 5-term documents are going to win.

You might think about having a per-token, or per-token-pair, weight. I would be surprised if there was not something similar out there. This is a common issue with any short text. I guess I would think of this as TFICF, where the CF is the corpus frequency. You also might want to weight inversely proportional to the age of the title, so that older titles are less important. This assumes people are searching within some time cluster, where newer is more likely.

Now for some obvious advice, things you probably already know. This kind of search needs some hard measurement before you can know how to tune it. You need to find a reasonable annotated representation. For example, take the previous month's searches where there is a chain of successive searches. If you weighted things differently, would you shorten the length of the chain? Can you get the clickthroughs to happen sooner?

Anyway, just my 2 cents

On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> wrote:
>
> @Walter: We have 6 fields declared in schema.xml for title each with
> different type of analyzer. One without processing symbols, other stemmed
> and other removing symbols, etc. So, if we have separate fields for each
> alias it will be that many times the number of final fields declared in
> schema.xml. And we exactly do not know what is the maximum number of
> aliases a movie can have.
> @Walter: I will try this but isn’t there any other way where I can tweak ?
>
> @eric: will try this. But it will work only for exact matches.
Re: Title Search scoring issues with multivalued field & norm
@Walter: We have 6 fields declared in schema.xml for title, each with a different type of analyzer: one that leaves symbols unprocessed, another that is stemmed, another that removes symbols, and so on. So if we have a separate field for each alias, the number of fields declared in schema.xml is multiplied by the number of aliases. And we do not know exactly what the maximum number of aliases for a movie is.

@Walter: I will try this, but isn’t there any other way I can tweak it?

@Erick: Will try this. But it will work only for exact matches.

> On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Or use a boost for the phrase, something like
> "beauty and the beast"^5
Re: Title Search scoring issues with multivalued field & norm
Or use a boost for the phrase, something like
"beauty and the beast"^5

On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> You can use a separate field for title aliases. That is what I did for
> Netflix search.
>
> Why disable idf? Disabling tf for titles can be a good idea, for example the
> movie “New York, New York” is not twice as much about New York as some other
> film that just lists it once.
>
> Also, consider using a popularity score as a boost.
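With edismax, a similar effect to writing the boosted phrase by hand can usually be had through the `pf` parameter, which boosts documents where the whole query matches as a phrase. The sketch below only builds the request parameters; the field names `title` and `title_alias` and the weights are made up for illustration.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "beauty and the beast",
    "qf": "title^2 title_alias",    # hypothetical fields used for term matching
    "pf": "title^5 title_alias^5",  # extra boost when the query matches as a phrase
    "tie": "0.0",                   # only the best-scoring field counts
}
query_string = urlencode(params)
print(query_string)
```

As noted elsewhere in the thread, a phrase boost mainly rewards exact and near-exact matches, so it complements rather than replaces fuzzy matching.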
Re: Title Search scoring issues with multivalued field & norm
You can use a separate field for title aliases. That is what I did for Netflix search.

Why disable idf? Disabling tf for titles can be a good idea, for example the movie “New York, New York” is not twice as much about New York as some other film that just lists it once.

Also, consider using a popularity score as a boost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
>
> [...]
> Hence, we use a modified version of TFIDFSimilarity with the following
> changes.
> - disabled TF & IDF and will only have 1 as value.
> - disabled norms by specifying omitNorms as true for all the fields.
> [...]
> But, movies could have aliases and have multiple titles. So, we made the
> fields multivalued.
> [...]
Title Search scoring issues with multivalued field & norm
Hi,

We are using Solr for our movie title search.

As it is "title search", it should be treated differently from normal document search. Hence, we use a modified version of TFIDFSimilarity with the following changes:
- disabled TF & IDF, so both always have the value 1.
- disabled norms by specifying omitNorms as true for all the fields.

There are 6 fields with different analyzers, and we use different weights in edismax's qf & pf parameters to match tokens & boost phrases.

But movies can have aliases and multiple titles. So, we made the fields multivalued.

Now, consider the following four documents:
1> "Beauty and the Beast"
2> "The Real Beauty and the Beast"
3> "Beauty and the Beast", "La bella y la bestia"
4> "Beauty and the Beast"

Note: Document 3 has two titles in it.

For the query "Beauty and the Beast" with the above configuration, all the documents receive the same score. But documents 1, 3 and 4 should get the same score, and document 2 a lower one.

To solve this, we followed what is suggested in the following thread:
http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html

Now, the fields used for boosting have norms enabled, and the fields used for matching have norms disabled. This is to make sure that exact & near-exact matches are rewarded.

But, for the same query, we get the following results.

query: "Beauty & the Beast"
Search Results:
1> "Beauty and the Beast"
4> "Beauty and the Beast"
2> "The Real Beauty and the Beast"
3> "Beauty and the Beast", "La bella y la bestia"

Clearly, the changes have solved only part of the problem. Document 3 should be ranked/scored higher than document 2.

This happens because Lucene uses the total field length across all the values of a multivalued field for normalization.

How do we handle this scenario and make sure that normalization is handled properly in multivalued fields?

--
Regards,
Sravan
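The ranking above can be reproduced with a little arithmetic. The sketch below uses the classic TF-IDF length norm, 1/sqrt(number of terms), computed over all values of the field combined. It assumes plain whitespace tokenization with no stop-word removal, and it ignores the lossy compression Lucene applies when storing norms, so the numbers are illustrative only.

```python
import math


def length_norm(values):
    """Classic length norm over the combined token count of all field values."""
    num_terms = sum(len(value.split()) for value in values)
    return 1.0 / math.sqrt(num_terms)


docs = {
    1: ["Beauty and the Beast"],
    2: ["The Real Beauty and the Beast"],
    3: ["Beauty and the Beast", "La bella y la bestia"],
    4: ["Beauty and the Beast"],
}
for doc_id, titles in docs.items():
    print(doc_id, round(length_norm(titles), 3))
```

Document 3 is normalized as a 9-token field (4 + 5) and document 2 as a 6-token field, so document 3 ends up with the smallest norm even though one of its values is an exact match.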
Re: hashJoin - Multivalued field
I'm sorry, everything is working fine!

2018-01-23 16:44 GMT-02:00 Kojo:
> I am trying to solve one problem, exactly as the case described here:
>
> http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-multi-valued-field-td4353794.html
>
> I cannot accomplish that on Solr 6.6, my streaming expression returns
> nothing:
> [...]
> Can you help me, please?
hashJoin - Multivalued field
I am trying to solve one problem, exactly as the case described here:

http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-multi-valued-field-td4353794.html

I cannot accomplish that on Solr 6.6, my streaming expression returns nothing:

hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number", sort="p_number asc"),
  hashed=cartesianProduct(
    search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO *]", fl="processes, id", sort="id asc"),
    processes,
  ),
  on="p_number=processes"
)

Both fields are of type string.

One strange thing is that if I filter the first query using fq, some results appear.

hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number", sort="p_number asc", fq= "sch_id:905 OR sch_id:3487"),
  hashed=cartesianProduct(
    search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO *]", fl="processes, id", sort="id asc"),
    processes,
  ),
  on="p_number=processes"
)

{
  "result-set": {
    "docs": [
      {
        "processes": "00/01011-6",
        "p_number": "00/01011-6",
        "id": "43256"
      },
      {
        "processes": "97/13133-4",
        "p_number": "97/13133-4",
        "id": "43256"
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 343
      }
    ]
  }
}

Can you help me, please?
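For readers unfamiliar with the two decorators, the expression above can be pictured as the in-memory sketch below. It is not Solr code: `cartesian_product` and `hash_join` are hypothetical stand-ins that mimic what the streaming decorators of the same name are meant to do, and the third scholarship number is made-up sample data.

```python
def cartesian_product(tuples, field):
    """Emit one tuple per value of a multivalued field."""
    for t in tuples:
        values = t[field] if isinstance(t[field], list) else [t[field]]
        for value in values:
            yield {**t, field: value}


def hash_join(left, hashed, left_key, right_key):
    """Inner join: index the hashed side in memory, then stream the left side."""
    index = {}
    for t in hashed:
        index.setdefault(t[right_key], []).append(t)
    for l in left:
        for r in index.get(l[left_key], []):
            yield {**l, **r}


scholarships = [
    {"p_number": "00/01011-6"},
    {"p_number": "97/13133-4"},
    {"p_number": "99/00000-0"},  # made-up value with no matching article
]
articles = [{"id": "43256", "processes": ["00/01011-6", "97/13133-4"]}]

joined = list(
    hash_join(scholarships, cartesian_product(articles, "processes"),
              "p_number", "processes")
)
for row in joined:
    print(row)
```

Flattening the multivalued field first is what lets an equality join on a single value work at all.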
Re: DocValues for multivalued strings and boolean fields
On 12/20/2017 6:09 PM, S G wrote:
> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
>
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?

Adding to what Emir said:

The docValues data will be the same as stored data, but it will be uncompressed, and written in such a way that Lucene can read all values for one field simply by reading data off the disk; no computations or seeks within the file are required.

If the field is indexed and stored, then docValues will not be accessed during normal queries unless there is a sort parameter or a facet parameter that mentions a field with docValues. If present, docValues data will be used for sorting and facets; otherwise indexed values will be used.

Usually, sorting or faceting with docValues uses less memory and performs faster than the same operation without docValues. If the machine has insufficient system RAM to effectively cache index data, the performance may not improve.

When docValues is added to a field, a complete reindex is required, or Solr will not work properly. If a field that already contains docValues has a change in the setting for multiValued, then that will require a reindex, but you must also take another step -- completely wiping the index directory before reloading or restarting. If the wipe doesn't happen in this situation, then the core is going to completely break and throw exceptions.

Thanks,
Shawn
Re: DocValues for multivalued strings and boolean fields
Hi SG, Doc values is another file to write, so indexing performance will suffer. In theory, query performance will suffer too, because the alternative is an in-memory structure (fieldCache and fieldValueCache). In practice it will not, because the in-memory structure requires a larger heap and takes time and resources to build after each commit or on the first query, and the doc values files are likely to be cached by the OS, so reads will not happen at “disk speed”. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 21 Dec 2017, at 02:09, S G <sg.online.em...@gmail.com> wrote: > > Hi, > > One of our Solr users is trying to set docValues="true" for multivalued > string fields and boolean-type fields. > > I am not sure what the performance impact of that would be. > Can docValues negatively affect performance in any way? > > We are using Solr 6.5.1 and also experimenting with 7.1.0 > > Thanks > SG
DocValues for multivalued strings and boolean fields
Hi, One of our Solr users is trying to set docValues="true" for multivalued string fields and boolean-type fields. I am not sure what the performance impact of that would be. Can docValues negatively affect performance in any way? We are using Solr 6.5.1 and also experimenting with 7.1.0 Thanks SG
Re: Schemaless detecting multivalued fields
Also, if you _know_ certain fields should be defined, you can define them explicitly and let schemaless figure out all the others. That said, eventually you're going to have to control your schema; schemaless is _not_ recommended for production systems unless you can absolutely guarantee the input is in a specific format. And by "specific format" I mean that no field first encountered as, say, an int later comes through as a float, all date fields are in acceptable formats, no field first encountered as a single-valued field is ever multivalued later, etc. And if you can guarantee that, you can create an explicitly defined schema anyway. Best, Erick On Thu, Oct 19, 2017 at 2:00 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote: > Hi John, > You should be able to do that with custom update request processor chain and > https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html > > HTH, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >> On 19 Oct 2017, at 08:00, John Davis <johndavis925...@gmail.com> wrote: >> >> Hi, >> I know about the schemaless configuration defaulting to multivalued fields >> of the corresponding type. >> >> I was just wondering if there was a way to first detect if the incoming >> value is list or singleton, and based on it pick the corresponding types. >> Ideally if the value is an long then use tlong while if it is list of longs >> then use tlongS. >> >> Thanks! >> John >
Re: Schemaless detecting multivalued fields
Hi John, You should be able to do that with a custom update request processor chain and https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 19 Oct 2017, at 08:00, John Davis <johndavis925...@gmail.com> wrote: > > Hi, > I know about the schemaless configuration defaulting to multivalued fields > of the corresponding type. > > I was just wondering if there was a way to first detect if the incoming > value is list or singleton, and based on it pick the corresponding types. > Ideally if the value is an long then use tlong while if it is list of longs > then use tlongS. > > Thanks! > John
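The update-processor route suggested above lives inside Solr. A different, client-side way to get the behaviour John asks for is to inspect each document before sending it and define the fields explicitly. The sketch below is an assumption-laden illustration of that alternative, not the processor-chain approach: the helper names are hypothetical, and the type names tlong and tlongs are taken from the question and assumed to exist in the schema.

```python
def pick_long_type(value):
    """Return 'tlongs' for a non-empty list of ints, 'tlong' for a single int, else None."""
    def is_long(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, int) and not isinstance(v, bool)

    if isinstance(value, list):
        return "tlongs" if value and all(is_long(v) for v in value) else None
    return "tlong" if is_long(value) else None


def add_field_commands(doc):
    """Build Schema API 'add-field' commands for the long-valued fields of one document."""
    commands = []
    for name, value in doc.items():
        field_type = pick_long_type(value)
        if field_type is not None:
            commands.append({"add-field": {"name": name, "type": field_type}})
    return commands


commands = add_field_commands({"views": 42, "scores": [1, 2, 3], "title": "x"})
```

The usual caveat from this thread still applies: a field first seen as a singleton may later arrive as a list, so a decision based on the first document can turn out to be wrong.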
Schemaless detecting multivalued fields
Hi, I know about the schemaless configuration defaulting to multivalued fields of the corresponding type. I was just wondering if there was a way to first detect whether the incoming value is a list or a singleton, and pick the corresponding type based on that. Ideally, if the value is a long then use tlong, while if it is a list of longs then use tlongS. Thanks! John
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
The key is removing the entire data directory, as in "rm -rf solr_core/data", with Solr down, then restarting Solr. Or create a new core. It's most probably working on Windows because the schema was set with multiValued=false when you indexed your first document.

Best, Erick

On Thu, Jul 20, 2017 at 5:16 AM, prashantas <prashanta.t...@gmail.com> wrote:
> I am not running solr in cloud mode.
>
> On Thu, Jul 20, 2017 at 4:40 PM, Shawn Heisey-2 [via Lucene] < ml+s472066n4346954...@n3.nabble.com> wrote:
>
>> On 7/20/2017 2:30 AM, prashantas wrote:
>> > I am using solr6.4. In my managed-schema, I have defined my field details.
>> > None of my fields are multiValued. If I set property multiValued=false, it
>> > works fine in Windows, but in CentOS/RHEL, it does not accept the same and
>> > the field still shows multiValued true in my solr admin UI. Please help me
>> > how can I set multiValued = false in some fields.
>> > <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png>
>>
>> Is Solr running in cloud mode on either of these systems?
>>
>> Thanks,
>> Shawn
>
> --
> *with regards,Prashanta*
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
I am not running solr in cloud mode.

On Thu, Jul 20, 2017 at 4:40 PM, Shawn Heisey-2 [via Lucene] < ml+s472066n4346954...@n3.nabble.com> wrote:
> On 7/20/2017 2:30 AM, prashantas wrote:
> > I am using solr6.4. In my managed-schema, I have defined my field details.
> > None of my fields are multiValued. If I set property multiValued=false, it
> > works fine in Windows, but in CentOS/RHEL, it does not accept the same and
> > the field still shows multiValued true in my solr admin UI. Please help me
> > how can I set multiValued = false in some fields.
> > <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png>
>
> Is Solr running in cloud mode on either of these systems?
>
> Thanks,
> Shawn

--
*with regards,Prashanta*
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
On 7/20/2017 2:30 AM, prashantas wrote: > I am using solr6.4. In my managed-schema, I have defined my field details. > None of my fields are multiValued. If I set property multiValued=false , it > works fine in Windows, but in CentOS/RHEL, it does not accept the same and > the field still shows multiValued true in my solr admin UI. Please help me > how can I set multiValued = false in some fields. > <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png> Is Solr running in cloud mode on either of these systems? Thanks, Shawn
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
You say: "I am just adding multiValued=false in the managed-schema file." Are you modifying the file in the local filesystem "conf" directory, or are you going into the core's conf directory and changing it there? If you are running SolrCloud, you need to make the same change in ZooKeeper.
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
Assuming "service solr restart" does its job, I think the only thing I would do differently is to completely remove the data directory content, instead of just running the delete query. Bear in mind that when you delete a document in Solr, it is only marked as deleted, and it can take a while until it really leaves the index (after a successful segment merge). This could lead to conflicts in the data structures when documents indexed under different schemas are in the index. I don't know if this is your case, but I would double check. --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
I am just adding multiValued=false in the managed-schema file. Then I delete all the data by running the command

    curl http://localhost:8983/solr/Schools/update?commit=true -d '<delete><query>*:*</query></delete>'

where 'Schools' is my core name. Then I restart Solr with "service solr restart" and import the CSV file by executing the command

    curl 'http://localhost:8983/solr/Schools/update?commit=true' --data-binary @tbl_SCHOOLS.csv -H 'Content-type:application/csv'

Please let me know if I am doing anything wrong.

with regards, Prashanta

On Thu, Jul 20, 2017 at 2:29 PM, alessandro.benedetti [via Lucene] < ml+s472066n4346941...@n3.nabble.com> wrote:
> I doubt it is an environment problem at all.
> How are you modifying your schema ?
> How you reloading your core/collection ?
> Are you restarting your Solr instance ?
>
> Regards
> ---
> Alessandro Benedetti
> Search Consultant, R Software Engineer, Director
> Sease Ltd. - www.sease.io

--
*with regards,Prashanta*
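The two update requests described in this message can also be expressed as a small Python sketch. It only builds the requests and does not send them; the function names are hypothetical, the base URL and core name are the ones from the message, and the text/xml content type on the delete is an assumption rather than something the poster specified.

```python
import urllib.request


def delete_all_request(base_url, core):
    """Build (but do not send) a delete-by-query request matching every document."""
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=b"<delete><query>*:*</query></delete>",
        headers={"Content-Type": "text/xml"},  # assumed content type
        method="POST",
    )


def csv_import_request(base_url, core, csv_bytes):
    """Build (but do not send) a CSV import request for the same core."""
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


req = delete_all_request("http://localhost:8983/solr", "Schools")
imp = csv_import_request("http://localhost:8983/solr", "Schools", b"id,name\n1,A\n")
```

Passing either object to urllib.request.urlopen would send it. Note that other replies in this thread recommend wiping the data directory with Solr stopped, instead of relying on the delete query, when the multiValued setting has changed.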
Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS
I doubt it is an environment problem at all. How are you modifying your schema? How are you reloading your core/collection? Are you restarting your Solr instance? Regards --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
multiValued=false is not working in Solr 6.4 in RHEL/CentOS
I am using solr6.4. In my managed-schema, I have defined my field details. None of my fields are multiValued. If I set property multiValued=false , it works fine in Windows, but in CentOS/RHEL, it does not accept the same and the field still shows multiValued true in my solr admin UI. Please help me how can I set multiValued = false in some fields. <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png>
Re: Field x is not multivalued and destination for multiple copyFields
Nawab,

If you have multivalued=true and you index a document with two or more name_* fields, then name_token will have two or more values, all of which will be searchable. I think this is what you want, because names can be quite different in different languages. For example, the country names

    de  Deutschland
    en  Germany
    fr  Allemagne
    es  Alemania

or names for 'shoe'

    schuh
    zapato
    cipela

If you have multivalued=false and try to index two or more languages for the name, it would cause an exception to be thrown in the indexing thread and the document will not be indexed (schema validation will fail).

If you use multivalued=false on a field which is a copyField destination, this is not a common usage pattern, and in my mind (using a northern idiom) you are "walking on thin ice". Tell us how it goes.

Cheers -- Rick

On 2017-06-05 03:23 AM, Nawab Zada Asad Iqbal wrote:
> Hi, I have a field 'name_token' which gets value via copyFields on different language-specific fields (e.g. name_en, name_it, name_es, etc.) If I can ensure that only one of these language-specific fields will have a value for a given document, is it ok to ignore this warning: "IndexSchema Field name_token is not multivalued and destination for multiple copyFields"
> Also, what will happen if my a record happens to have values in two language specific name fields (e.g. if a name or word exists in two languages: name_zh and name_ja).
> My understanding is that the value is same anyways, so there is no drawback, but can it result in an exception?
> Regards
> Nawab
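The behaviour Rick describes can be made concrete with a small plain-Python model of copyField into one destination. This is a sketch of the described behaviour, not Solr's implementation; the function name and the error text are hypothetical.

```python
def apply_copy_fields(doc, sources, dest, multi_valued):
    """Model of copyField: gather every populated source value into one destination field."""
    values = []
    for name in sources:
        v = doc.get(name)
        if v is None:
            continue  # this source field is absent from the document
        values.extend(v if isinstance(v, list) else [v])
    if not multi_valued and len(values) > 1:
        # In this model a single-valued destination rejects the whole document.
        raise ValueError(f"multiple values for single-valued field {dest}: {values}")
    out = dict(doc)
    if values:
        out[dest] = values if multi_valued else values[0]
    return out


sources = ["name_en", "name_zh", "name_ja"]
one_language = apply_copy_fields({"name_en": "Germany"}, sources, "name_token", False)
```

In this model, a document with values in both name_zh and name_ja fails for a single-valued name_token even when the two values are identical, because the destination still receives two values. That matches the exception Rick warns about, so ignoring the warning is only safe while at most one source field is populated per document.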
Field x is not multivalued and destination for multiple copyFields
Hi, I have a field 'name_token' which gets its value via copyFields from different language-specific fields (e.g. name_en, name_it, name_es, etc.). If I can ensure that only one of these language-specific fields will have a value for a given document, is it OK to ignore this warning: "IndexSchema Field name_token is not multivalued and destination for multiple copyFields"? Also, what will happen if a record happens to have values in two language-specific name fields (e.g. if a name or word exists in two languages: name_zh and name_ja)? My understanding is that the value is the same anyway, so there is no drawback, but can it result in an exception? Regards, Nawab
Re: Grouping by a multivalued field
Shacky Quote "A multivalued field is useful when there are more than one value present for the field. An easy example would be tags, there can be multiple tags that need to be indexed...". So yes, you are on the right track. Cheers -- Rick https://stackoverflow.com/questions/5800762/what-is-the-use-of-multivalued-field-type-in-solr On May 26, 2017 9:45:48 AM EDT, shacky <shack...@gmail.com> wrote: >Hi, >I need to create a new collection on my Solr 6.1.0 cluster where every >row >is a "content" and every content can belong to one or many categories, >which are specified in a multivalued field "categories". > >In my web app the user can search by categories, and if wanted it can >even >group results by category. If it wants to group by category, what about >the >contents which belongs to more than one category? > >In this case the search results page should show the same content more >times in different categories. I don't want the web application to >filter >and order results because in this case it should ask Solr for every >rows (I >know this is not advised for bad performance), so is there a way to let >Solr make this? For example, repeating the same content in two >categories >if a flag is enabled or if I am asking Solr to sort by category? > >Thank you very much! >Bye -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Grouping by a multivalued field
Hi, I need to create a new collection on my Solr 6.1.0 cluster where every row is a "content" and every content can belong to one or many categories, which are specified in a multivalued field "categories". In my web app the user can search by category and, if wanted, can even group results by category. If the user wants to group by category, what about the contents which belong to more than one category? In this case the search results page should show the same content several times, once in each of its categories. I don't want the web application to filter and order the results, because in that case it would have to ask Solr for all rows (I know this is not advised because of bad performance), so is there a way to let Solr do this? For example, repeating the same content in two categories if a flag is enabled or if I am asking Solr to sort by category? Thank you very much! Bye
Re: Managed Schema multiValued Predict Problem
Lova, When a search term is "foo*" or similar, you have a multivalue search. In schema.xml you have for a typical field, an index analysis chain and a query analysis chain. In the multivalue case, neither of these chains is followed. There is a wiki page which explains what chain gets followed, perhaps someone can supply the link. To get further with this question, you could show us parts of the schema.xml Cheers -- Rick On April 26, 2017 5:28:34 AM EDT, Lova <miandrisoal...@gmail.com> wrote: >Hello, >I have this error >org.apache.solr.common.SolrException: can not use FieldCache on >multivalued >field: post_title > >I can need specific field as multivalue, it's a bug in my app > >what I change in solrconfig.xml please? > >Thanks -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Managed Schema multiValued Predict Problem
Hello, I have this error org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: post_title I can need specific field as multivalue, it's a bug in my app what I change in solrconfig.xml please? Thanks
Re: KeywordTokenizer and multiValued field
So I have a field named "key" that uses KeywordTokenizer and has multiValued="true" set. A doc like

    val one
    yet another value
    third

My field will have exactly three indexed tokens

    val one
    yet another value
    third

Best, Erick

On Wed, Apr 12, 2017 at 2:38 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> I don't understand the first option, what is each value? Keyword tokenizer
> emits single token, analogous to string type.
>
> On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood
> <wun...@wunderwood.org> wrote:
> Does the KeywordTokenizer make each value into a unitary string or does it
> take the whole list of values and make that a single string?
>
> I really hope it is the former. I can’t find this in the docs (including
> JavaDocs).
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
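Erick's example can be restated as a tiny plain-Python model: the analysis chain runs once per value, and a keyword-style tokenizer turns each value into exactly one token. The function names are hypothetical, and this models the behaviour described in the thread without calling Lucene.

```python
def keyword_tokenize(value):
    """Model of KeywordTokenizer: the entire input becomes one token."""
    return [value]


def analyze_multivalued(values, tokenizer=keyword_tokenize):
    """Run the tokenizer once per value, as the thread describes for multiValued fields."""
    tokens = []
    for v in values:
        tokens.extend(tokenizer(v))
    return tokens


tokens = analyze_multivalued(["val one", "yet another value", "third"])
```

Swapping in a whitespace splitter such as str.split as the tokenizer shows the contrast: the same three values would then produce six tokens instead of three.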
Re: KeywordTokenizer and multiValued field
I don't understand the first option, what is each value? Keyword tokenizer emits a single token, analogous to the string type. On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood wrote: Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single string? I really hope it is the former. I can’t find this in the docs (including JavaDocs). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Re: KeywordTokenizer and multiValued field
Hi Wunder, I think it's the first option: if you have 3 values then the analyzer chain is executed three times. Andrea On 12/04/17 18:45, Walter Underwood wrote: Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single string? I really hope it is the former. I can’t find this in the docs (including JavaDocs). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
KeywordTokenizer and multiValued field
Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single string? I really hope it is the former. I can’t find this in the docs (including JavaDocs). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Error when using percentile facet on multivalued fields (again)
Hi, I'm using Solr 5.3.1. When trying to do a percentile facet on a multivalued field I get the following exception -

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://host_name:8983/solr/core_name: can not use FieldCache on multivalued field: attributes.size_num
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)

The query url is - http://host_name:8983/solr/core_name/select?json.facet=%7B%22my_facet_name%22:%22percentile(attributes.size_num,70.0,80.0,90.0)%22%7D=0

And the faceted fields schema is - ... ...

I came across this thread from 2015 - http://grokbase.com/t/lucene/solr-user/157dh11ffx/fieldcache-error-for-multivalued-fields-in-json-facets in which Yonik says Solr 5.2 doesn't support sum() on multivalued fields.

Is this also the case for percentiles in Solr 5.3.1? Is there any solution to this? If not, is this resolved in a more recent release of Solr?

Thank you very much, Ron Visbord
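For reference, the request in this message can be assembled programmatically so the json.facet parameter is encoded consistently. The sketch below only builds the URL; the function name is hypothetical, and the q=*:* and rows=0 parameters are assumptions added for the example. It does not address the FieldCache error itself.

```python
import json
from urllib.parse import urlencode


def percentile_facet_url(base_url, core, facet_name, field, percentiles):
    """Build a /select URL carrying a JSON Facet percentile() aggregation."""
    args = ",".join(str(p) for p in percentiles)
    facet = {facet_name: f"percentile({field},{args})"}
    query = urlencode({
        "q": "*:*",   # assumed: match all documents
        "rows": 0,    # assumed: only the facet result is wanted
        "json.facet": json.dumps(facet, separators=(",", ":")),
    })
    return f"{base_url}/{core}/select?{query}"


url = percentile_facet_url(
    "http://host_name:8983/solr", "core_name",
    "my_facet_name", "attributes.size_num", [70.0, 80.0, 90.0],
)
```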
Count Dates Given A Range in a Multivalued Field
Hi All, I have a multivalued date field, e.g.: [2017-02-06T00:00:00Z,2017-02-09T00:00:00Z,2017-03-04T00:00:00Z] I want to count how many dates in that field fall within a given date range, e.g. for start: 2017-02-01T00:00:00Z end: 2017-02-28T00:00:00Z the result is 2 (2017-02-06T00:00:00Z and 2017-02-09T00:00:00Z). I want to do it with the JSON Facet API. How can I do it?
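To pin down the result being asked for, here is a client-side sketch that counts the values of one multivalued date field that fall inside an inclusive range. It is not a JSON Facet API solution, only a precise statement of the expected semantics; the function names are hypothetical and the dates are the ones from the question.

```python
from datetime import datetime, timezone


def parse_solr_date(s):
    """Parse a Solr UTC timestamp such as 2017-02-06T00:00:00Z."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def count_in_range(values, start, end):
    """Count the values with start <= value <= end (both bounds inclusive)."""
    lo, hi = parse_solr_date(start), parse_solr_date(end)
    return sum(1 for v in values if lo <= parse_solr_date(v) <= hi)


dates = ["2017-02-06T00:00:00Z", "2017-02-09T00:00:00Z", "2017-03-04T00:00:00Z"]
count = count_in_range(dates, "2017-02-01T00:00:00Z", "2017-02-28T00:00:00Z")
```

The distinction matters because a facet that counts matching documents would report 1 for this document, while the question asks for the number of matching values, which is 2.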
sum multivalued field index with banana
Hi, sorry if this is a little off topic. I've just started using the Banana dashboard, and I want to summarize data that is indexed in Solr. Can I compute a sum with Banana when a field holds multivalued data? This is my sample data in Solr: "timestamp_dt":"2016-12-30T15:50:00Z", "FR":["fr1"], "EV":"89v", "RC":[0], "SF":["SSP"], "CT":["POST"], "rb.id":["rb30", "rb30"], "rb.co":[1, 2], "rb.lat":[47, 9] From the data above, is it possible to sum the values of "rb.co", grouped by EV? On my Banana dashboard panel, I've tried to set something like this: but nothing happens. Any suggestions, please? Best Regards, Yuza
Re: Managed Schema multiValued Predict Problem
You are right, I mean schemaless mode. I saw that it's your answer ;) I've edited solrconfig.xml and fixed it. Thanks! On Mon, Mar 13, 2017 at 5:46 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > There is managed schema, which means it is editable via API, and there > is 'schemaless' mode that uses that to auto-define the field based on > the first occurance. > > 'schemaless' mode does not know if the field will be multi-valued the > first time it sees content for that field. So, all the fields created > automatically are multivalued. You can change the definition or you > can define the field explicitly using the API or Admin UI. > > 'schemaless' is only there really for a quick prototyping with unknown > content. > > Regards, >Alex. > P.s. That's my SO answer :-) Glad you found it useful. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 13 March 2017 at 11:15, Furkan KAMACI <furkankam...@gmail.com> wrote: > > Hi, > > > > I generate dummy documents to test Solr 6.4.2. I create a field like that > > at my test code: > > > > int customCount = r.nextInt(500); > > document.addField("custom_count", customCount); > > > > This field is indexed as: > > > > org.apache.solr.schema.TrieLongField > > > > and > > > > Multivalued. > > > > I want to use FieldCache on multivalued field and don't want it to be > > multivalued. When I check managed-schema I see that: > > > >> positionIncrementGap="0" docValues="true" precisionStep="0"/> > >> positionIncrementGap="0" docValues="true" multiValued="true" > > precisionStep="0"/> > > > > So, it seems that it's predicted as longs instead of long. > > > > What is the reason behind that? > > > > Kind Regards, > > Furkan KAMACI >
Re: Managed Schema multiValued Predict Problem
There is managed schema, which means it is editable via API, and there is 'schemaless' mode that uses that to auto-define the field based on the first occurrence. 'schemaless' mode does not know if the field will be multi-valued the first time it sees content for that field. So, all the fields created automatically are multivalued. You can change the definition or you can define the field explicitly using the API or Admin UI. 'schemaless' is really only there for quick prototyping with unknown content. Regards, Alex. P.s. That's my SO answer :-) Glad you found it useful. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 March 2017 at 11:15, Furkan KAMACI <furkankam...@gmail.com> wrote: > Hi, > > I generate dummy documents to test Solr 6.4.2. I create a field like that > at my test code: > > int customCount = r.nextInt(500); > document.addField("custom_count", customCount); > > This field is indexed as: > > org.apache.solr.schema.TrieLongField > > and > > Multivalued. > > I want to use FieldCache on multivalued field and don't want it to be > multivalued. When I check managed-schema I see that: > >positionIncrementGap="0" docValues="true" precisionStep="0"/> >positionIncrementGap="0" docValues="true" multiValued="true" > precisionStep="0"/> > > So, it seems that it's predicted as longs instead of long. > > What is the reason behind that? > > Kind Regards, > Furkan KAMACI
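One way to "change the definition" through the API, as mentioned above, is the Schema API's "replace-field" command. The sketch below builds such a command for the custom_count field from the question; the function name is hypothetical, and the single-valued type name "long" is inferred from the question's remark about "longs instead of long", so it is an assumption about that schema.

```python
import json


def make_single_valued(name, field_type):
    """Build a Schema API 'replace-field' command that redefines a field as single-valued."""
    return json.dumps({
        "replace-field": {
            "name": name,
            "type": field_type,   # assumed single-valued long type in this schema
            "multiValued": False,
        }
    })


body = make_single_valued("custom_count", "long")
```

The body would be POSTed to the collection's /schema endpoint. Documents already indexed under the old multivalued definition would need to be reindexed, and other threads in this archive suggest wiping the index data when multiValued changes.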