Re: Get first value in a multivalued field

2021-03-04 Thread Walter Underwood
You can copy the field to another field, then use the 
FirstFieldValueUpdateProcessorFactory to limit that field to the first value. 
At least, that seems to be what that URP does. I have not used it.

https://solr.apache.org/guide/8_8/update-request-processors.html
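
For illustration only, here is a rough Python sketch of what such a copy-then-truncate chain would do to a document at index time. It is not Solr code: the field names `tags` and `first_tag` and the helper `clone_and_keep_first` are made up for this example, and in Solr the work would be done by update processors configured in solrconfig.xml.

```python
from copy import deepcopy

def clone_and_keep_first(doc, source, dest):
    """Mimic copying `source` into `dest`, then trimming `dest` to its first value.

    Hypothetical helper for illustration; not part of any Solr API.
    """
    out = deepcopy(doc)           # leave the caller's document untouched
    values = out.get(source)
    if values is None:
        return out                # nothing to copy
    if not isinstance(values, list):
        values = [values]         # treat a single value as a one-element list
    if values:
        out[dest] = values[0]     # keep only the first value in the new field
    return out

doc = {"id": "1", "tags": ["alpha", "beta", "gamma"]}
result = clone_and_keep_first(doc, "tags", "first_tag")
print(result)
```

Since update processors run at index time, existing documents would have to be reindexed before the new field shows up.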

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 4, 2021, at 11:42 AM, ufuk yılmaz  wrote:
> 
> Hi,
> 
> Is it possible in any way to get the first value in a multivalued field? 
> Using function queries, streaming expressions or any other way without 
> reindexing? (Stream decorators have array(), but no way to get a value at a 
> specific index?)
> 
> Another one, is it possible to match a regex to a text field and extract only 
> the matching part?
> 
> I tried very hard for this too but couldn’t find a way.
> 
> --ufuk



Get first value in a multivalued field

2021-03-04 Thread ufuk yılmaz
Hi,

Is it possible in any way to get the first value in a multivalued field? Using 
function queries, streaming expressions or any other way without reindexing? 
(Stream decorators have array(), but no way to get a value at a specific index?)

Another one, is it possible to match a regex to a text field and extract only 
the matching part?

I tried very hard for this too but couldn’t find a way.

--ufuk

Sent from Mail for Windows 10



Cross join on multivalued field

2021-02-23 Thread Luke Oak
Hi,

I am wondering whether there are plans to implement cross-collection join 
queries on multivalued fields.

Thanks 

Sent from my iPhone

Multivalued text_general field returns lowercased value in "if" function query

2021-02-23 Thread ufuk yılmaz
I have a type="text_general" multiValued="true" field named fieldA.

When I use a function query, with fields like

fields=if(true, fieldA, -1), fieldA

Response is:

"response":{"numFound":1,"start":0,"maxScore":4.6553917,"docs":[
  {
    "fieldA":["SomeMixedCaseValue"],
    "if(true,fieldA,-1)":"somemixedcasevalue"}]
}}

Is this a bug or an expected output? Is there a way to avoid it getting 
lowercased?

Whole field definition is:


-ufuk yilmaz

Sent from Mail for Windows 10



parsing multivalued fields in Value Source Parser

2020-12-03 Thread Manna,Tridib
Hello All,

I am writing a custom function query that needs to parse a multivalued 
field. I am getting this exception: org.apache.solr.common.SolrException: can 
not use FieldCache on multivalued field
The function query works as expected with a single-valued field.

How can I parse a multivalued field with FunctionQParser (or any other way) 
and get all the values of that field for further processing in my custom 
function?

TIA,

Tridib Manna


Why am I able to sort on a multiValued field?

2020-11-13 Thread Andy C
I am adding a new float field to my index that I want to perform range
searches and sorting on. It will only contain a single value.

I have an existing dynamic field definition in my schema.xml that I wanted
to use to avoid having to update the schema:




I went ahead and implemented this in a test system (recently updated to
Solr 8.7), but then it occurred to me that I am not going to be able to
sort on the field because it is defined as multiValued.

But to my surprise sorting worked, and gave the expected results. Why? Can
this behavior be relied on in future releases?

Appreciate any insights.

Thanks
- AndyC -


Re: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Munendra S N
add-distinct is similar to add, but it does a contains check before adding
the value. In general, the performance overhead should be minimal.
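
To make the difference concrete, here is a small Python sketch of the two behaviours as described above. The function names `atomic_add` and `atomic_add_distinct` are made up for illustration; this models the semantics only and is not how Solr implements them.

```python
def atomic_add(existing, new_values):
    """Plain "add": append every incoming value, duplicates included."""
    return list(existing) + list(new_values)

def atomic_add_distinct(existing, new_values):
    """"add-distinct": append a value only if the field does not already hold it."""
    result = list(existing)
    for value in new_values:
        if value not in result:   # the "contains check"
            result.append(value)
    return result

current = ["red", "green"]
added = atomic_add(current, ["green", "blue"])
added_distinct = atomic_add_distinct(current, ["green", "blue"])
print(added)
print(added_distinct)
```

In this sketch the check is a linear scan over the values already in the field, so it only becomes noticeable for fields holding very many values.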

Regards,
Munendra S N



On Fri, Oct 30, 2020 at 7:29 PM Srinivas Kashyap
 wrote:

> Thanks Munendra, this will really help me. Is there any performance
> overhead with this?
>
> Thanks,
> Srinivas
>

RE: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Srinivas Kashyap
Thanks Munendra, this will really help me. Is there any performance overhead 
with this?

Thanks,
Srinivas



Re: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Munendra S N
Srinivas,

For atomic updates, you could use the add-distinct operation to avoid
duplicates:
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3 onwards.
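
As a minimal sketch, an atomic update using add-distinct could be built as a JSON body like the one below. The document id `doc-1`, the field name `tags`, and the helper `add_distinct_update` are assumptions for the example; the body would be sent to the collection's /update handler as application/json.

```python
import json

def add_distinct_update(doc_id, field, values):
    """Build an atomic-update payload that applies add-distinct to one field.

    Hypothetical helper; the id and field names are placeholders.
    """
    return [{"id": doc_id, field: {"add-distinct": list(values)}}]

payload = add_distinct_update("doc-1", "tags", ["green", "blue"])
body = json.dumps(payload)
print(body)
```

Only values not already present in the field should end up being added, so resending the same update would leave the document unchanged.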

Regards,
Munendra S N



On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood 
wrote:

> Since you are already taking the performance hit of atomic updates,
> I doubt you’ll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Walter Underwood
Since you are already taking the performance hit of atomic updates, 
I doubt you’ll see any impact from field types or update request processors.
The extra cost of atomic updates will be much greater than indexing cost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap 
>  wrote:
> 
> Thanks Dwane,
> 
> I have a doubt: according to the Javadoc, the duplicates still continue to 
> exist in the field. Maybe the field returns only unique values at query 
> time? Is my assumption right?
> 
> And also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?
> 
> Thanks,
> Srinivas



Re: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Michael Gibney
If I understand correctly what you're trying to do, docValues for a
number of field types are (at least in their multivalued incarnation)
backed by SortedSetDocValues, which inherently deduplicate values
per-document. In your case it sounds like you could maybe rely on that
behavior as a feature, set stored=false, docValues=true,
useDocValuesAsStored=true, and achieve the desired behavior?
Michael
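
A tiny Python model of that behaviour, assuming a string field backed by a sorted set per document. The helper `as_sorted_set` and the sample values are made up for illustration; this is not Lucene code.

```python
def as_sorted_set(values):
    """Model a per-document sorted set: duplicates collapse, order becomes sorted."""
    return sorted(set(values))

indexed = ["pear", "apple", "pear", "banana"]   # values as sent at index time
returned = as_sorted_set(indexed)               # what would come back from docValues
print(returned)
```

The trade-off in this model is that the original insertion order is lost, which matters if the order of values carries meaning.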

On Thu, Oct 29, 2020 at 6:17 AM Srinivas Kashyap
 wrote:
>
> Thanks Dwane,
>
> I have a doubt: according to the Javadoc, the duplicates still continue to 
> exist in the field. Maybe the field returns only unique values at query 
> time? Is my assumption right?
>
> And also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?
>
> Thanks,
> Srinivas


RE: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Srinivas Kashyap
Thanks Dwane,

I have a doubt: according to the Javadoc, the duplicates still continue to 
exist in the field. Maybe the field returns only unique values at query 
time? Is my assumption right?

And also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?

Thanks,
Srinivas

From: Dwane Hall 
Sent: 29 October 2020 14:33
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas, this is possible by adding a unique-fields update processor to the 
update processor chain you are using to perform your updates (/update, 
/update/json, /update/json/docs, .../a_custom_one).

The Javadoc explains its use nicely
(https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html),
and there are articles on Stack Overflow addressing this exact problem 
(https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655).

Thanks,

Dwane

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


Re: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Dwane Hall
Srinivas, this is possible by adding a unique-fields update processor to the 
update processor chain you are using to perform your updates (/update, 
/update/json, /update/json/docs, .../a_custom_one).

The Javadoc explains its use nicely
(https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html),
and there are articles on Stack Overflow addressing this exact problem 
(https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655).
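
As a rough model of what such a processor does to a document before it is indexed, here is a short Python sketch. It assumes, as the Javadoc appears to describe, that values are walked in order and later repeats are dropped; the helper `unique_values` and the field name `tags` are made up for illustration.

```python
def unique_values(values):
    """Drop repeated values, keeping the first occurrence of each in order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result

doc = {"id": "1", "tags": ["red", "green", "red", "blue", "green"]}
doc["tags"] = unique_values(doc["tags"])   # applied before the document is indexed
print(doc)
```

Under that assumption the duplicates never reach the index, so nothing needs to be filtered at query time.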

Thanks,

Dwane

From: Srinivas Kashyap 
Sent: Thursday, 29 October 2020 3:49 PM
To: solr-user@lucene.apache.org 
Subject: Avoiding duplicate entry for a multivalued field

Hello,

Say I have a schema field which is multivalued. Is there a way to maintain 
distinct values for that field even though I continue to add duplicate values 
through atomic updates via SolrJ?

Is there some property setting to keep only unique values in a multivalued 
field?

Thanks,
Srinivas



Avoiding duplicate entry for a multivalued field

2020-10-28 Thread Srinivas Kashyap
Hello,

Say I have a schema field which is multivalued. Is there a way to maintain 
distinct values for that field even though I continue to add duplicate values 
through atomic updates via SolrJ?

Is there some property setting to keep only unique values in a multivalued 
field?

Thanks,
Srinivas



Multivalued field for Analysis on Admin page.

2020-10-09 Thread Jae Joo
I forgot how to enter multiple values for a multivalued field on the Analysis
page in the Admin UI. Can anyone help?

Jae


Query function error - can not use FieldCache on multivalued field

2020-09-14 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use Solr query function as a boost for term matches in the
title field. Here's my boost function

bf=if(exists(query({!v='title:Import data'})),10,0)

This throws the following error --> can not use FieldCache on multivalued
field: data

The function seems to work only for a single term. The title field
is not multivalued, but it is configured to analyze terms. Here's
the field definition.



I was under the impression that I would be able to use the query function
to evaluate a regular query field. Am I missing something? If there's a
constraint on this function, can this boost be done in a different way?

Any pointers will be appreciated.

Thanks,
Shamik


Re: JTS, IsWithin predicate, and multivalued fields

2020-07-13 Thread Murray Johnston
Replying to myself and for posterity:


This is expected behavior per the comments on 
https://issues.apache.org/jira/browse/LUCENE-4644 that originally added the 
IsWithin predicate.


Too bad, I was really pleased with my idea but I can see why it was implemented 
the way it was.  Back to the drawing board



From: Murray Johnston 
Sent: Monday, July 13, 2020 11:52:20 AM
To: solr-user@lucene.apache.org
Subject: JTS, IsWithin predicate, and multivalued fields

Hi all,


I'm trying to use (abuse[1]) a SpatialRecursivePrefixTreeFieldType field that 
is multivalued with the IsWithin JTS predicate.  After some testing, it appears 
that all values of the field must satisfy the predicate in order for the 
document to be returned.  Is that expected?  It seems somewhat different than 
other semantics for a multivalued field.


Thanks,


-Murray



[1] My use case is an extension of the SpatialForTimeDurations usage.  My 
prices have a "time in advance" that must also be satisfied.  My idea was to 
model this as a line instead of just a point and verify that the entire line 
IsWithin the bounding box but I might be blocked from doing this if it won't 
work for a multivalued field.




JTS, IsWithin predicate, and multivalued fields

2020-07-13 Thread Murray Johnston
Hi all,


I'm trying to use (abuse[1]) a SpatialRecursivePrefixTreeFieldType field that 
is multivalued with the IsWithin JTS predicate.  After some testing, it appears 
that all values of the field must satisfy the predicate in order for the 
document to be returned.  Is that expected?  It seems somewhat different than 
other semantics for a multivalued field.


Thanks,


-Murray



[1] My use case is an extension of the SpatialForTimeDurations usage.  My 
prices have a "time in advance" that must also be satisfied.  My idea was to 
model this as a line instead of just a point and verify that the entire line 
IsWithin the bounding box but I might be blocked from doing this if it won't 
work for a multivalued field.




Re: use highlighting on multivalued fields with positionIncrementGap 0

2020-02-23 Thread Paras Lehana
I haven't worked with highlighting much but what's the need to store terms
in multivalued field?

On Fri, 14 Feb 2020 at 20:04, Nicolas Franck 
wrote:

> I'm trying to use highlighting on a multivalued text field (analysis not
> so important) ..
>
>
>   { text: [ "hello", "world" ], id: 1 }
>
> but I want to match across the string boundaries:
>
>   q=text:"hello world"
>
> This works by setting the attribute
> positionIncrementGap to 0, but then the highlighting entry is empty
>
>   "highlighting": { "1" : { "text" : [] } }
>
> Parameters are:
>
>   hl=true
>   hl.fl=text
>   hl.snippets=50
>   hl.fragSize=1
>
> Any idea why this happens?
> I guess this gap is internal stuff handled by Lucene that Solr doesn't
> know about?
> (as for lucene, there are no multivalued fields!)
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



use highlighting on multivalued fields with positionIncrementGap 0

2020-02-14 Thread Nicolas Franck
I'm trying to use highlighting on a multivalued text field (analysis not so 
important) ..


  { text: [ "hello", "world" ], id: 1 }

but I want to match across the string boundaries:

  q=text:"hello world"

This works by setting the attribute
positionIncrementGap to 0, but then the highlighting entry is empty

  "highlighting": { "1" : { "text" : [] } }

Parameters are:

  hl=true
  hl.fl=text
  hl.snippets=50
  hl.fragSize=1

Any idea why this happens? 
I guess this gap is internal stuff handled by Lucene that Solr doesn't know 
about?
(as for lucene, there are no multivalued fields!)



Re: Analysing Multivalued Fields

2019-12-31 Thread Erick Erickson
First, if you’re using primitive types, there is no analysis so in that case 
the question is irrelevant.

If you’re using a text-based field, the only difference between single-valued 
and multi-valued fields for analyzed types (i.e. text fields) is the position 
gap recorded between entries. For instance:

Single value
this is some text
position   token
0   this
1   is
2   some
3   text

Multi valued with positionIncrementGap=100
this is
some text

position   token
0   this
1   is
102   some
103   text

With a positionIncrementGap of 0, there’d be no difference. So if you’re using 
text-based fields, just do the values one at a time.
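
The position arithmetic can be modelled in a few lines. This is only a
sketch, assuming whitespace tokenization, a position increment of 1 per
token, and Lucene's documented convention that a gap of 0 leaves the terms of
consecutive values in successive positions. The function name is made up for
illustration and is not part of Solr or Lucene.

```python
def token_positions(values, position_increment_gap):
    """Model token positions for the values of one multi-valued text field."""
    positions = []
    position = -1  # so that the very first token lands on position 0
    for value in values:
        if positions:
            # The gap is added once before each value that follows
            # already-indexed tokens.
            position += position_increment_gap
        for token in value.split():
            position += 1
            positions.append((position, token))
    return positions


print(token_positions(["this is some text"], 100))
print(token_positions(["this is", "some text"], 100))
print(token_positions(["this is", "some text"], 0))
```

With a gap of 0 the two-value case produces the same positions as the single
value, which is why phrase queries can then match across value boundaries.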

Or this may be an XY problem: you’re trying to solve some underlying problem. 
If the above is irrelevant, what is the problem you’re trying to solve?

Best,
Erick

> On Dec 31, 2019, at 1:32 AM, Sidharth Negi  wrote:
> 
> Hi,
> 
> Is there a way to analyze how multiple values in a multivalued field are
> being tokenized and processed during indexing?
> 
> The "Analysis" page on the UI assumes that my multiple comma-separated
> values is a single value. It filters out the comma and acts as if it's a
> single value that I specified.
> 
> Thanks in advance!



Analysing Multivalued Fields

2019-12-30 Thread Sidharth Negi
Hi,

Is there a way to analyze how multiple values in a multivalued field are
being tokenized and processed during indexing?

The "Analysis" page on the UI assumes that my multiple comma-separated
values is a single value. It filters out the comma and acts as if it's a
single value that I specified.

Thanks in advance!


Re: SolrNet...multiValued.

2019-09-02 Thread Shawn Heisey

On 9/2/2019 8:58 AM, Britto Raj wrote:

I am working MNC and doing Prototype using SolrNet and Solr. I have
few questions and got stuck not able to move forward..





4. When i try to access using
SolrQueryResults<Product> results = solr.Query(new
SolrQuery("title:\"changeme4\""));
It throws error as
could not convert value 'system.collections.arraylist' to
property 'title' of document type solr rest client.program+product'


SolrNet is third party software.  It was not created by the Solr 
project.  I have absolutely no idea what that error message means.  With 
no experience at all using .net, I can't even tell you whether that 
error message comes from the SolrNet library or from the language itself.


I found this:

https://groups.google.com/forum/embed/#!forum/solrnet

There is also an "issues" tab on the project page:

https://github.com/SolrNet/SolrNet

Thanks,
Shawn


SolrNet...multiValued.

2019-09-02 Thread Britto Raj
Hello All,
I am working at an MNC and building a prototype using SolrNet and Solr. I have
a few questions and am stuck, unable to move forward.

1. I have created a collection, e.g. Product, without any fields.
2. Using SolrNet, I created one field as below.
public class Product
{
    [SolrField("title")]
    public string title { get; set; }
}
3. The code below is executed; title is created with multiValued set to true.
   Startup.Init<Product>("http://localhost:8983/solr/product");

   ISolrOperations<Product> solr =
       ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
   // Product test = new Product() { title = new List<string>() { "changeme4" } };
   Product test = new Product() { title = "changeme4" };
   solr.Add(test);
   solr.Commit();
4. When I try to access it using
   SolrQueryResults<Product> results = solr.Query(new
   SolrQuery("title:\"changeme4\""));
it throws this error:
   could not convert value 'system.collections.arraylist' to
property 'title' of document type solr rest client.program+product'
So I would need to change public string title { get; set; } to public
ICollection<string> title { get; set; }
But I want to create a set of properties in my document, and I really
don't want to use ICollection; I need multiValued to be false. Please help me
resolve the issue so I can move forward.
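
A possible direction, sketched under assumptions the thread does not confirm:
if the collection runs in schemaless mode, a field that first appears in a
document is guessed and typically created as multiValued, which would explain
the ArrayList coming back. Defining title explicitly as single-valued before
indexing avoids the guess. The Python below only builds the JSON body for a
Schema API add-field command; the "string" field type and the helper name are
assumptions to adapt.

```python
import json


def add_single_valued_field(name, field_type="string"):
    """Build a Schema API 'add-field' command for a single-valued field."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,  # assumption: a "string" type exists in the schema
            "indexed": True,
            "stored": True,
            "multiValued": False,
        }
    }


body = json.dumps(add_single_valued_field("title"))
print(body)
```

The body would be posted to the collection's /schema endpoint. If title
already exists as a multiValued field, the field would have to be replaced
and the documents reindexed before a plain string property can be mapped.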


Re: SOLR Atomic Update - String multiValued Field

2019-07-24 Thread Furkan KAMACI
Hi Doss,

What was existing value and what happens after you do atomic update?

Kind Regards,
Furkan KAMACI

On Wed, Jul 24, 2019 at 2:47 PM Doss  wrote:

> HI,
>
> I have a multiValued field of type String.
>
>  multiValued="true"/>
>
> I want to keep this list unique, so I am using atomic updates with
> "add-distinct"
>
> {"docid":123456,"namelist":{"add-distinct":["Adam","Jane"]}}
>
> but this is not maintaining the expected uniqueness, am I doing something
> wrong? Guide me please.
>
> Thanks,
> Doss.
>


SOLR Atomic Update - String multiValued Field

2019-07-24 Thread Doss
HI,

I have a multiValued field of type String.



I want to keep this list unique, so I am using atomic updates with
"add-distinct"

{"docid":123456,"namelist":{"add-distinct":["Adam","Jane"]}}

but this is not maintaining the expected uniqueness, am I doing something
wrong? Guide me please.
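
For reference, the behaviour expected from the update above can be modelled
in a few lines. This is a plain Python sketch of the intended semantics, not
Solr's implementation; it assumes exact, case-sensitive comparison of string
values, and the helper names are made up.

```python
def add_distinct(existing, incoming):
    """Return the existing values plus any incoming values not already present."""
    merged = list(existing)
    for value in incoming:
        if value not in merged:
            merged.append(value)
    return merged


def atomic_update(doc_id, field, values):
    """Build an update document shaped like the one in the message above."""
    return {"docid": doc_id, field: {"add-distinct": values}}


print(add_distinct(["Adam", "Eve"], ["Adam", "Jane"]))
print(atomic_update(123456, "namelist", ["Adam", "Jane"]))
```

Comparing the stored values before and after the update against this model
shows quickly whether duplicates come from the update itself or were already
in the document.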

Thanks,
Doss.


Parse multivalued field as list with custom function

2019-07-17 Thread Gregory.Guichard
Hello,


I'm trying to parse a multivalued field (e.g. [8, 6, 9, 50]) as a List in a 
custom function.

I looked at all the existing parsers here: 
(https://github.com/apache/lucene-solr/tree/master/solr/core/src/java/org/apache/solr/search),
 but I can't find any example of how to parse a multivalued field into a List.


Can you give me an example or some leads?


I thank you in advance.





Parse multivalued field as list with custom function

2019-07-16 Thread Gregory.Guichard
Hello,


I'm trying to parse a multivalued field (e.g. [8, 6, 9, 50]) as a List in a 
custom function.

I looked at all the existing parsers here: 
(https://github.com/apache/lucene-solr/tree/master/solr/core/src/java/org/apache/solr/search),
 but I can't find any example of how to parse a multivalued field into a List.


Can you give me an example or some leads?


I thank you in advance.





Re: How to know which value matched in multivalued field

2019-07-12 Thread Takashi Sasaki
I found this page.
https://stackoverflow.com/questions/2135072/determine-which-value-produced-a-hit-in-solr-multivalued-field-type
Hmmm...

On Fri, Jul 12, 2019 at 22:08, Takashi Sasaki wrote:
>
> Hi Solr experts,
>
> I have multivalued location on RPT field.
> Is there a way to know which location matched by query?
>
> sample query:
> q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5
>
> Of course I can recalculate on the client side,
> but I want to know how to do it using Solr's features.
>
> Solr version is 7.3.1.
>
> Thanks,
> Takashi Sasaki


How to know which value matched in multivalued field

2019-07-12 Thread Takashi Sasaki
Hi Solr experts,

I have a multivalued location in an RPT field.
Is there a way to know which location was matched by the query?

sample query:
q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5

Of course I can recalculate on the client side,
but I want to know how to do it using Solr's features.
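
For comparison, the client-side recalculation could look roughly like this.
It is a sketch only: it uses a great-circle (haversine) distance check, which
behaves like a circular filter and is therefore slightly stricter than the
bounding box used by bbox, and the sample coordinates and function names are
made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matching_locations(locations, center, max_km):
    """Return the (lat, lon) values that lie within max_km of center."""
    lat0, lon0 = center
    return [
        (lat, lon)
        for lat, lon in locations
        if haversine_km(lat0, lon0, lat, lon) <= max_km
    ]


# Hypothetical values of the multivalued "store" field for one document.
store = [(45.17614, -93.87341), (44.9609, -93.2921)]
print(matching_locations(store, (45.15, -93.85), 5))
```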

Solr version is 7.3.1.

Thanks,
Takashi Sasaki


Re: Search using filter query on multivalued fields

2019-05-03 Thread David Hastings
Another option is to index into dynamic fields. In this case, this is what I
would do:
INGREDIENT_SALT_i:40
INGREDIENT_EGG_i:20
etc

and query
INGREDIENT_SALT_i:[20 TO *]
or an arbitrary max value, since these are percentages

INGREDIENT_SALT_i:[20 TO 100]
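
As a rough sketch of the indexing side, the parallel arrays from the original
document can be turned into one dynamic field per ingredient. It assumes the
names and percentages line up by position, that the schema has a *_i dynamic
field for integers, and that spaces in names should become underscores; the
function name is hypothetical.

```python
import re


def ingredient_fields(names, percentages):
    """Map parallel name/percentage lists to INGREDIENT_<NAME>_i fields."""
    fields = {}
    for name, percentage in zip(names, percentages):
        # Normalize e.g. "CANOLA OIL" to "CANOLA_OIL" for use in a field name.
        key = re.sub(r"[^A-Z0-9]+", "_", name.upper()).strip("_")
        fields["INGREDIENT_%s_i" % key] = percentage
    return fields


print(ingredient_fields(["EGG", "CANOLA OIL", "SALT"], [20, 60, 40]))
```

The resulting fields are added to each document at index time, after which
the range queries above apply directly.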


On Fri, May 3, 2019 at 12:01 PM Erick Erickson 
wrote:

> There is no way to do this with the setup you describe. That is, there’s
> no way to say “only use the third element of a multiValued field”.
>
> What I’d do is index (perhaps in a separate field) with payloads, so you
> have input like SALT|20, then use some of the payload functionality to make
> this happen. See: https://lucidworks.com/2017/09/14/solr-payloads/
>
> There are some other strategies that are simpler, one could index (again,
> perhaps in a separate field) SALT_20. Then you can form filter queries like
> “fq=ingredient:[SALT_20 TO *]”. That’s not very flexible and you have to
> normalize (i.e. 1% couldn’t be SALT_1), so “it depends”.
>
> The point is that you have to index cleverly to do what you want.
>
> Best,
> Erick
>
> > On May 3, 2019, at 6:26 AM, Srinivas Kashyap 
> wrote:
> >
> > Hi,
> >
> > I have indexed data as shown below using DIH:
> >
> > "INGREDIENT_NAME": [
> >  "EGG",
> >  "CANOLA OIL",
> >  "SALT"
> >],
> > "INGREDIENT_NO": [
> >  "550",
> >  "297",
> >  "314"
> >],
> > "COMPOSITION PERCENTAGE": [
> >  20,
> >  60,
> >  40
> >],
> >
> > Similar to this, many other records are also indexed. These are
> multi-valued fields.
> >
> > I have a requirement to search all the records which has ingredient name
> salt and it's composition percentage is more than 20.
> >
> > How do I write a filter query for this?
> >
> > P.S: I should only fetch records, whose Salt Composition percentage is
> more than 20 and not other percentages.
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> > 
> > DISCLAIMER:
> > E-mails and attachments from Bamboo Rose, LLC are confidential.
> > If you are not the intended recipient, please notify the sender
> immediately by replying to the e-mail, and then delete it without making
> copies or using it in any way.
> > No representation is made that this email or any attachments are free of
> viruses. Virus scanning is recommended and is the responsibility of the
> recipient.
>
>


Re: Search using filter query on multivalued fields

2019-05-03 Thread Erick Erickson
There is no way to do this with the setup you describe. That is, there’s no way 
to say “only use the third element of a multiValued field”.

What I’d do is index (perhaps in a separate field) with payloads, so you have 
input like SALT|20, then use some of the payload functionality to make this 
happen. See: https://lucidworks.com/2017/09/14/solr-payloads/

There are some other strategies that are simpler: one could index (again, 
perhaps in a separate field) SALT_20. Then you can form filter queries like 
“fq=ingredient:[SALT_20 TO *]”. That’s not very flexible and you have to 
normalize (i.e. 1% couldn’t be SALT_1), so “it depends”.

The point is that you have to index cleverly to do what you want.
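
The normalization step can be illustrated as follows. This is a sketch under
assumptions: percentages are whole numbers from 0 to 100, tokens are
zero-padded to three digits so that string order matches numeric order, and
an explicit upper bound is added so that other ingredients sorting after
SALT_ are not swept into the range. The helper names are hypothetical.

```python
def range_token(name, percentage, width=3):
    """Build a token such as SALT_020 whose string order follows the number."""
    return f"{name.upper().replace(' ', '_')}_{percentage:0{width}d}"


def more_than_filter(name, minimum):
    """Filter query for 'percentage strictly greater than minimum'."""
    lower = range_token(name, minimum)
    upper = range_token(name, 100)
    # A curly brace makes the lower bound exclusive.
    return "ingredient:{%s TO %s]" % (lower, upper)


print(range_token("SALT", 1))
print(more_than_filter("SALT", 20))
```

Without the padding, SALT_5 would sort after SALT_20 as a string, which is
the problem the normalization avoids.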

Best,
Erick

> On May 3, 2019, at 6:26 AM, Srinivas Kashyap  wrote:
> 
> Hi,
> 
> I have indexed data as shown below using DIH:
> 
> "INGREDIENT_NAME": [
>  "EGG",
>  "CANOLA OIL",
>  "SALT"
>],
> "INGREDIENT_NO": [
>  "550",
>  "297",
>  "314"
>],
> "COMPOSITION PERCENTAGE": [
>  20,
>  60,
>  40
>],
> 
> Similar to this, many other records are also indexed. These are multi-valued 
> fields.
> 
> I have a requirement to search all the records which has ingredient name salt 
> and it's composition percentage is more than 20.
> 
> How do I write a filter query for this?
> 
> P.S: I should only fetch records, whose Salt Composition percentage is more 
> than 20 and not other percentages.
> 
> Thanks and Regards,
> Srinivas Kashyap
> 



Search using filter query on multivalued fields

2019-05-03 Thread Srinivas Kashyap
Hi,

I have indexed data as shown below using DIH:

"INGREDIENT_NAME": [
  "EGG",
  "CANOLA OIL",
  "SALT"
],
"INGREDIENT_NO": [
  "550",
  "297",
  "314"
],
"COMPOSITION PERCENTAGE": [
  20,
  60,
  40
],

Similar to this, many other records are also indexed. These are multi-valued 
fields.

I have a requirement to search all the records which has ingredient name salt 
and it's composition percentage is more than 20.

How do I write a filter query for this?

P.S: I should only fetch records, whose Salt Composition percentage is more 
than 20 and not other percentages.

Thanks and Regards,
Srinivas Kashyap



Range query on multivalued string field results in useless highlighting

2019-03-22 Thread Wolf, Karl (NIH/NLM/LHC) [C]
Range queries against multivalued string fields produce useless highlighting, 
even though "hl.highlightMultiTerm":"true"

I have uncovered what I believe is a bug. At the very least it is a difference 
in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a Field defined in my schema as:




I am using a query containing a Range clause and I am using highlighting to get 
the list of values that match the range query.

All examples below were using the appropriate Solr Admin Server Query page.

The range query using Solr v5.1.0 produces CORRECT and useful results:

{
  "responseHeader": {
"status": 0,
"QTime": 366,
"params": {
  "q": "ResourceCorrespondent:[A TO B}",
  "hl": "true",
  "indent": "true",
  "hl.preserveMulti": "true",
  "fl": "ResourceCorrespondent,ResourceID",
  "hl.requireFieldMatch": "true",
  "hl.usePhraseHighlighter": "true",
  "hl.fl": "ResourceCorrespondent",
  "wt": "json",
  "hl.highlightMultiTerm": "true",
  "_": "1553275722025"
}
  },
  "response": {
"numFound": 999,
"start": 0,
"docs": [
  {
"ResourceCorrespondent": [
  "Stanley, Wendell M.",
  "Avery, Roy"
],
"ResourceID": "CCAAHG"
  },
  {
"ResourceCorrespondent": [
  "Avery, Roy"
],
"ResourceID": "CCGMDS"
  },
... lots more docs, then
]
  },
... we get to the highlighting portion of the response
... this tells me which values of each ResourceCorrespondent field
... actually matching the query

  "highlighting": {
"CCAAHG": {
  "ResourceCorrespondent": [
"Avery, Roy"
  ]
},
"CCGMDS": {
  "ResourceCorrespondent": [
"Avery, Roy"
  ]
},
"BBACKV": {
  "ResourceCorrespondent": [
"American Institute of Biological Sciences",
"Albritton, Errett C."
  ]
},
... lots more useful highlight values. Note two matching values
... for document BBACKV.
}

***
***
However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top
portion of the response is basically the same, including the number of
documents found

{
  "responseHeader":{
"status":0,
"QTime":245,
"params":{
  "q":"ResourceCorrespondent:[A TO B}",
  "hl":"on",
  "hl.preserveMulti":"true",
  "fl":"ResourceID, ResourceCorrespondent",
  "hl.requireFieldMatch":"true",
  "hl.fl":"ResourceCorrespondent",
  "hightlightMultiTerm":"true",
  "wt":"json",
  "_":"1553105129887",
  "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter.

The problem is with the highlighting, which is effectively empty. I don't know
which values in each document actually matched the query:

  "highlighting":{
"QQBBLX":{},
"QQBCLN":{},
"QQBCLM":{},
... etc.

*** NOTE: The data is the same for all Solr versions and the Solr indexes were 
rebuilt
for each Solr version.

***
Changing to using "hl.method=unified", the highlighting looks like:

  "highlighting":{
"QQBBLX":{
  "ResourceCorrespondent":[]},
"QQBCLN":{
  "ResourceCorrespondent":[]},
"QQBCLM":{
  "ResourceCorrespondent":[]},

*** Closer but still no useful values

***
NOTE: if I change only the query to be a wildcard query to 
q="ResourceCorrespondent:A*"

the highlighting is correct in both Solr v7.5.0 and v7.7.1:

  "highlighting":{
"QQBBLX":{
  "ResourceCorrespondent":["American Public Health Association"]},
"QQBCLN":{
  "ResourceCorrespondent":["Abram, Morris B."]},
"QQBCLM":{
  "ResourceCorrespondent":["Abram, Morris B."]},
... etc.

*** This makes me think there is some problem with a Range query feeding the
Highlighter code.

***
All variations of hl specs or other query parameters do not fix the problem.
The wildcard query is my current work around but there still is a problem with
range queries:

So there is some incompatibility among:

1) A multivalued string field AND
2) A range query against that field AND
3) Highlighting

The highlight portion of the response is effectively "empty"
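
Until the highlighter returns values again, the matching values can be
recomputed on the client from the stored field. A minimal sketch, assuming
plain string comparison is close enough to how the string field is ordered
(true for ASCII values like the ones shown); the function name is made up.

```python
def values_in_range(values, lower, upper):
    """Values v with lower <= v < upper, i.e. the [lower TO upper} form."""
    return [value for value in values if lower <= value < upper]


doc_values = ["Stanley, Wendell M.", "Avery, Roy"]
print(values_in_range(doc_values, "A", "B"))
```

This reproduces what the v5.1.0 highlighting section reported for document
CCAAHG, at the cost of returning the whole field in fl.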

I don't know when this issue was first introduced. I have recently been 
updating from 5.1.0
to 7.5.0 in one big leap. I have attempted to read through the change logs for 
the intervening
versions but I gave up to save my sanity.

--Karl


Re: Accessing multiValued field from within custom function

2019-02-07 Thread Dariusz Wojtas
Hi,

Any hints on this topic?
How to access String / Text values from a multiValued field inside custom
function?

Best regards,
Dariusz Wojtas

On Thu, Jan 3, 2019 at 6:18 PM Dariusz Wojtas  wrote:

> Hi,
>
> I am using SOLR 7.5 in the cloud mode.
> I want to create a custom function similar to 'strdist' that works on
> multivalued fields (multiValued=true) and finds the highest matching score.
> Yes, I know the potential performance issues, but in my usecase this would
> bring a huge benefit.
>
> There is not much information on how to work with multiValued fields, but
> I have found a piece of code that might be useful. It's how SOLR standard
> functions are registered:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ValueSourceParser.java
>
> The interesting part for me starts in line 424, when the 'field' function
> is registered.
> It optionally accepts a multivalue field for min/max calculation.
> If the 2nd argument is 'min' or 'max' it tries to resolve the field as
> SchemaField.
>   SchemaField f = fp.getReq().getSchema().getField(fieldName);
>
> Now the questions are:
> 1. Is this the path I should follow? If not - are there any other ways?
> 2. How to retrieve all the actual *String *or *Text *values from a
> multivalue field, not just a single value? Some kind of a table or set of
> values. How?
> 3. Does cloud mode change anything here? In my case the whole index is on
> a single machine, but there are several replicas.
>
> Best regards,
> Dariusz Wojtas
>
>


Accessing multiValued field from within custom function

2019-01-03 Thread Dariusz Wojtas
Hi,

I am using SOLR 7.5 in the cloud mode.
I want to create a custom function similar to 'strdist' that works on
multivalued fields (multiValued=true) and finds the highest matching score.
Yes, I know the potential performance issues, but in my use case this would
bring a huge benefit.

There is not much information on how to work with multiValued fields, but I
have found a piece of code that might be useful. It's how SOLR standard
functions are registered:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ValueSourceParser.java

The interesting part for me starts in line 424, when the 'field' function
is registered.
It optionally accepts a multivalue field for min/max calculation.
If the 2nd argument is 'min' or 'max' it tries to resolve the field as
SchemaField.
  SchemaField f = fp.getReq().getSchema().getField(fieldName);

Now the questions are:
1. Is this the path I should follow? If not - are there any other ways?
2. How to retrieve all the actual *String *or *Text *values from a
multivalue field, not just a single value? Some kind of a table or set of
values. How?
3. Does cloud mode change anything here? In my case the whole index is on a
single machine, but there are several replicas.
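
To make the goal concrete, here is the computation such a function would
perform, written as plain Python and independent of any Solr API. It assumes
an edit-distance similarity of 1 - distance / max(length), in the spirit of
strdist with the "edit" measure; the names are hypothetical and the sketch
says nothing about how to read the values inside a ValueSource.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(query, values):
    """Return (score, value) for the most similar value, or (0.0, None)."""
    return max(((similarity(query, v), v) for v in values), default=(0.0, None))


print(best_match("Jon Smith", ["John Smith", "Jon Smyth", "Jane Doe"]))
```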

Best regards,
Dariusz Wojtas


How to return matches against a multivalued field in Solr search results?

2018-09-27 Thread Pudney Chris (ext) GBJH
G'day,

We're running Solr 5.5.5 to build a search application for a repository of 
MS-Office docs and PDFs.

Our schema includes a multivalued field that holds the IDs of objects embedded 
in our documents - there can be 100s sometimes 1000s of such objects per 
document.

We have a custom query parser that transforms (portions of) the user's query 
into a search term against this IDs field. The search results return all of the 
IDs found in the documents matching the query.

Is there some way to identify for each document in the results, the subset of 
IDs it contains that matched the query?
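
Failing a server-side answer, the subset can be computed after the fact by
intersecting each returned document's IDs with the IDs the query parser
searched for. A small sketch; the field name object_ids and the document
shape are assumptions for illustration.

```python
def matched_ids(doc_ids, query_ids):
    """IDs from the document that also appear in the query, in document order."""
    wanted = set(query_ids)
    return [object_id for object_id in doc_ids if object_id in wanted]


def annotate(docs, query_ids, field="object_ids"):
    """Map each document id to the subset of its IDs that matched."""
    return {doc["id"]: matched_ids(doc.get(field, []), query_ids) for doc in docs}


docs = [
    {"id": "doc1", "object_ids": ["A1", "B2", "C3"]},
    {"id": "doc2", "object_ids": ["B2"]},
]
print(annotate(docs, ["B2", "C3", "Z9"]))
```

With thousands of IDs per document this means returning the whole field in
fl, so it is only practical for modest page sizes.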

See also: https://issues.apache.org/jira/browse/SOLR-3955

Thanks,
Chris.

This message may contain confidential information. If you are not the 
designated recipient, please notify the sender immediately, and delete the 
original and any copies. Any use of the message by you is prohibited.


Re: Copyto with DIH Interpreting string as MultiValued field on copy

2018-08-18 Thread Zimmermann, Thomas
Makes total sense. Thanks to both of you for the clarification!

On 8/18/18, 8:03 AM, "Alexandre Rafalovitch"  wrote:

>And part of the issue is that SolrEntityProcessor does not take individual
>field definitions. So that part is ignored and instead just 'fl' mapping is
>used as Shawn explained.
>
>So you could also remap authorText in that definition to an ignored field.
>See
>https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml
>
>Regards,
>Alex
>
>On Fri, Aug 17, 2018, 11:50 PM Shawn Heisey,  wrote:
>
>> On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote:
>> > I'm trying to track down an odd issue I'm seeing when using the
>> SolrEntityProcessor to seed some test data from a solr 4.x cluster to a
>> solr 7.x cluster. It seems like strings are being interpreted as
>> multivalued when passed from a string field to a text field via the
>>copyTo
>> directive. Any clever ideas how to resolve this?
>>
>> What's happening is deceptively simple.
>>
>> In the source system, you're copying from author to authorText.  Both
>> fields are stored.  So if you have "Jeff Hartley" in author, you also
>> have "Jeff Hartley" in authorText. So what's happening is that when the
>> destination system imports from the source system, it gets "Jeff
>> Hartley" in both fields, and then copyField says "put a copy of what's
>> in author into authorText" ... and suddenly there are two copies of
>> "Jeff Hartley" in authorText.
>>
>> There are two ways to deal with this:
>>
>> 1) In the query you're doing with SolrEntityProcessor, add an "fl"
>> parameter and list all the fields *except* authorText and any other
>> field where this same problem is happening.
>>
>> 2) Remove the copyField from the schema until after the import from the
>> source server is done.
>>
>> Thanks,
>> Shawn
>>
>>



Re: Copyto with DIH Interpreting string as MultiValued field on copy

2018-08-18 Thread Alexandre Rafalovitch
And part of the issue is that SolrEntityProcessor does not take individual
field definitions. So that part is ignored and instead just 'fl' mapping is
used as Shawn explained.

So you could also remap authorText in that definition to an ignored field.
See
https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml

Regards,
Alex

On Fri, Aug 17, 2018, 11:50 PM Shawn Heisey,  wrote:

> On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote:
> > I’m trying to track down an odd issue I’m seeing when using the
> SolrEntityProcessor to seed some test data from a solr 4.x cluster to a
> solr 7.x cluster. It seems like strings are being interpreted as
> multivalued when passed from a string field to a text field via the copyTo
> directive. Any clever ideas how to resolve this?
>
> What's happening is deceptively simple.
>
> In the source system, you're copying from author to authorText.  Both
> fields are stored.  So if you have "Jeff Hartley" in author, you also
> have "Jeff Hartley" in authorText. So what's happening is that when the
> destination system imports from the source system, it gets "Jeff
> Hartley" in both fields, and then copyField says "put a copy of what's
> in author into authorText" ... and suddenly there are two copies of
> "Jeff Hartley" in authorText.
>
> There are two ways to deal with this:
>
> 1) In the query you're doing with SolrEntityProcessor, add an "fl"
> parameter and list all the fields *except* authorText and any other
> field where this same problem is happening.
>
> 2) Remove the copyField from the schema until after the import from the
> source server is done.
>
> Thanks,
> Shawn
>
>


Re: Copyto with DIH Interpreting string as MultiValued field on copy

2018-08-17 Thread Shawn Heisey

On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote:

I’m trying to track down an odd issue I’m seeing when using the 
SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 
7.x cluster. It seems like strings are being interpreted as multivalued when 
passed from a string field to a text field via the copyTo directive. Any clever 
ideas how to resolve this?


What's happening is deceptively simple.

In the source system, you're copying from author to authorText.  Both 
fields are stored.  So if you have "Jeff Hartley" in author, you also 
have "Jeff Hartley" in authorText. So what's happening is that when the 
destination system imports from the source system, it gets "Jeff 
Hartley" in both fields, and then copyField says "put a copy of what's 
in author into authorText" ... and suddenly there are two copies of 
"Jeff Hartley" in authorText.


There are two ways to deal with this:

1) In the query you're doing with SolrEntityProcessor, add an "fl" 
parameter and list all the fields *except* authorText and any other 
field where this same problem is happening.


2) Remove the copyField from the schema until after the import from the 
source server is done.
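
The sequence described above, and the effect of option 1, can be simulated in
a few lines. This is a toy model in plain Python, not Solr code: documents
are dicts of value lists, and both helper names are made up for illustration.

```python
def import_doc(source_doc, fl=None):
    """Keep only the fields listed in fl (all stored fields when fl is None)."""
    if fl is None:
        return {name: list(values) for name, values in source_doc.items()}
    return {name: list(values) for name, values in source_doc.items() if name in fl}


def apply_copy_field(doc, source, dest):
    """Append the source values to dest, the way a copyField rule would."""
    result = {name: list(values) for name, values in doc.items()}
    result.setdefault(dest, []).extend(result.get(source, []))
    return result


stored = {"id": ["1"], "author": ["Jeff Hartley"], "authorText": ["Jeff Hartley"]}

# Importing every stored field, then copying: authorText ends up with two values.
naive = apply_copy_field(import_doc(stored), "author", "authorText")

# Leaving authorText out of fl: the copy produces exactly one value.
fixed = apply_copy_field(import_doc(stored, fl={"id", "author"}), "author", "authorText")

print(naive["authorText"])
print(fixed["authorText"])
```

The two-value case is what triggers the "Multiple values encountered for non
multiValued copy field" error on a single-valued authorText.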


Thanks,
Shawn



Copyto with DIH Interpreting string as MultiValued field on copy

2018-08-17 Thread Zimmermann, Thomas
Hi,

I’m trying to track down an odd issue I’m seeing when using the 
SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 
7.x cluster. It seems like strings are being interpreted as multivalued when 
passed from a string field to a text field via the copyTo directive. Any clever 
ideas how to resolve this?

Schema:


Fields and CopyTo








Text fieldtype declaration:








































DIH Config:





http://cluster.solr.eng.techtarget.com/solr/vignette "

query="*:*"

fl="*,orig_version_l:_version_">



















Error:


org.apache.solr.common.SolrException: ERROR: 
[doc=d751e434c69b6210VgnVCM100d01c80aRCRD] Error adding field 
'author'='Jeff Hartley' msg=Multiple values encountered for non multiValued 
copy field authorText: Jeff Hartley

at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:203) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:101)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:980)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1168)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
 ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80) 
~[?:?]

at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:258)
 ~[?:?]

at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:527)
 ~[?:?]

at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
 ~[?:?]

at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) 
~[?:?]

at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) 
~[?:?]

at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
 ~[?:?]

at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) 
~[?:?]

at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
 ~[?:?]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]

Caused by: org.apache.solr.common.SolrException: Multiple values encountered 
for non multiValued copy field authorText: Jeff Hartley

at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:180) 
~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz 
- 2018-06-18 16:55:13]

... 22 more



Re: _childDocuments_ automatically multivalued field type

2018-07-02 Thread jeebix
Ok, I'll have a look at the link above.

Thanks a lot...

Best
JB



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: _childDocuments_ automatically multivalued field type

2018-07-02 Thread jeebix
Ok, I see what I have to look for, thanks to your reply. I'll adjust the
schema and see the difference.

Thanks.

Best
JB





Re: _childDocuments_ automatically multivalued field type

2018-07-02 Thread Shawn Heisey
On 7/2/2018 9:18 AM, jeebix wrote:
> I don't understand why for example "type_cmd_s" get the field type attribute
> "singleValued", but "TTC" or "kits_sans_suite" get "multiValued" attribute ?
> Why those field are in the managed-schema and enseigne_s (for example) is
> not ?

The field named enseigne_s is almost certainly handled by a dynamic
field definition, most likely one with the name "*_s".  That field (and
its field type) do not have multiValued="true".  This was probably
already in your schema before you did any indexing.

The ones that were automatically added by the data-driven nature of your
schema were added as the "strings" type, which IS multi-valued.  The
update processor definition that is in the Solr examples is set up to
add fields as multiValued, so that if a later indexing request comes in
with multiple values for the field, it will not fail.

This is the major danger of relying on Solr to automatically add fields
to your schema.  Chances are good that the choice it makes for the field
will be the wrong choice.  And when that happens, you will need to fix
the schema and completely reindex.

https://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn
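
One way to "fix the schema" for a managed schema is the Schema API's
replace-field command. The sketch below only builds the JSON body for such a
request; it does not contact Solr. The field type name "plong" is an
assumption (use whatever single-valued numeric type your schema actually
defines), and a full reindex is still needed afterwards, as described above.

```python
import json


def replace_field_command(name, field_type, multi_valued=False, stored=True):
    """Build a Schema API 'replace-field' command body for one field."""
    return {
        "replace-field": {
            "name": name,
            "type": field_type,          # assumed type name, adjust to your schema
            "multiValued": multi_valued,
            "stored": stored,
        }
    }


# Redefine TTC as a single-valued numeric field (type name is an assumption).
payload = replace_field_command("TTC", "plong")
print(json.dumps(payload, indent=2))
```

The resulting JSON would be POSTed to the schema endpoint of the core, and
the data then reindexed from scratch.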



Re: _childDocuments_ automatically multivalued field type

2018-07-02 Thread Alexandre Rafalovitch
Because your _s fields are matched by the dynamicField definition and are
created accordingly, without needing an explicit definition for each field.

The TTC field is mapped explicitly, probably by the "schemaless" mode's
autodiscovery, which does create specific field definitions, but always
multivalued ones.

The multiValued attribute can be set on the field type, not just on an
individual field.

So you may just want to adjust the schema definition to use single-valued
types instead.

The Admin UI schema screen is helpful for seeing such differences.

Regards,
 Alex

On Mon, Jul 2, 2018, 11:43 AM jeebix,  wrote:

> Hello everybody,
>
> I have a problem with some field types in the managed-schema generated.
>
> First, the data SOLR returned with a standard query :
>
> response":{"numFound":365567,"start":0,"docs":[
>   {
> "id":"560.561.134676",
> "parent_i":560,
> "asso_i":561,
> "personne_i":134676,
> "etat_technique_s":"avec_documents",
> "etat_marketing_s":"actif",
> "type_parent_s":"Ecole élémentaire publique",
> "type_asso_s":"APE (association de parents d'élèves)",
> "groupe_type_parent_s":"ENSEIGNEMENT_PRIMAIRE",
> "groupe_type_asso_s":"ASSOCIATION_DE_PARENTS",
> "nombre_commandes_brut_i":2,
> "nombre_commandes_i":1,
> "nombre_kits_saveur_i":0,
> "ca_periode_i":560,
> "ca_periode_fleur_i":0,
> "ca_periode_saveur_i":0,
> "zone_scolaire_s":"A",
> "territoire_s":"France Métropolitaine",
> "region_s":"AUVERGNE RHONE-ALPES",
> "departement_s":"01 AIN",
> "postal_country_s":"FR",
> "asso_country_s":"FRANCE",
> "object_type_s":"contact",
> "date_derni_re_commande_dt":"2016-05-20T00:00:00Z",
> "_version_":1604889647955050496,
> "_childDocuments_":[
> {
>   "fixe_facturation":["0256897856"],
>   "object_type":["order"],
>   "mobile_livraison":["0658987874"],
>   "kit_sans_suite":["false"],
>   "fixe_livraison":["0450598311"],
>   "type_cde_s":"CDE",
>   "statut_s":"V",
>   "mobile_facturation":["0658787458"],
>   "campagne_s":"A",
>   "TTC":[780],
>   "date_dt":"2016-05-20T00:00:00Z",
>   "id":"A28837",
>   "enseigne_s":"CRE"},
> {
>   "fixe_facturation":["0245784975"],
>   "object_type":["order"],
>   "mobile_livraison":["0645789874"],
>   "kit_sans_suite":["false"],
>   "type_cde_s":"KIT",
>   "statut_s":"V",
>   "mobile_facturation":["0612345678"],
>   "campagne_s":"A",
>   "TTC":[0],
>   "date_dt":"2016-05-04T00:00:00Z",
>   "id":"A25415",
>   "enseigne_s":"CRE"}]}
>
> My goal is to sum fields "TTC" by parentDocument. But with the type
> "multiValued", I can't use aggregation functions.
>
> The core get the data from this script : /opt/solr/bin/post -c 
> -format solr build/index.json
>
> The index.json looks like that:
>
> [
>   {
> "id": "781.782.134878",
> "parent_i": 781,
> "asso_i": 782,
> "personne_i": 134878,
> "etat_technique_s": "avec_documents",
> "etat_marketing_s": "inactif",
> "type_parent_s": "Ecole élémentaire privée",
> "type_asso_s": "APEL (association de parents école libre)",
> "groupe_type_parent_s": "ENSEIGNEMENT_PRIMAIRE",
> "groupe_type_asso_s": "ASSOCIATION_DE_PARENTS",
> "nombre_commandes_brut_i": 4,
> "nombre_commandes_i": 2,
> "n

_childDocuments_ automatically multivalued field type

2018-07-02 Thread jeebix
Hello everybody,

I have a problem with some field types in the generated managed-schema.

First, the data Solr returned for a standard query:

"response":{"numFound":365567,"start":0,"docs":[
  {
"id":"560.561.134676",
"parent_i":560,
"asso_i":561,
"personne_i":134676,
"etat_technique_s":"avec_documents",
"etat_marketing_s":"actif",
"type_parent_s":"Ecole élémentaire publique",
"type_asso_s":"APE (association de parents d'élèves)",
"groupe_type_parent_s":"ENSEIGNEMENT_PRIMAIRE",
"groupe_type_asso_s":"ASSOCIATION_DE_PARENTS",
"nombre_commandes_brut_i":2,
"nombre_commandes_i":1,
"nombre_kits_saveur_i":0,
"ca_periode_i":560,
"ca_periode_fleur_i":0,
"ca_periode_saveur_i":0,
"zone_scolaire_s":"A",
"territoire_s":"France Métropolitaine",
"region_s":"AUVERGNE RHONE-ALPES",
"departement_s":"01 AIN",
"postal_country_s":"FR",
"asso_country_s":"FRANCE",
"object_type_s":"contact",
"date_derni_re_commande_dt":"2016-05-20T00:00:00Z",
"_version_":1604889647955050496,
"_childDocuments_":[
{
  "fixe_facturation":["0256897856"],
  "object_type":["order"],
  "mobile_livraison":["0658987874"],
  "kit_sans_suite":["false"],
  "fixe_livraison":["0450598311"],
  "type_cde_s":"CDE",
  "statut_s":"V",
  "mobile_facturation":["0658787458"],
  "campagne_s":"A",
  "TTC":[780],
  "date_dt":"2016-05-20T00:00:00Z",
  "id":"A28837",
  "enseigne_s":"CRE"},
{
  "fixe_facturation":["0245784975"],
  "object_type":["order"],
  "mobile_livraison":["0645789874"],
  "kit_sans_suite":["false"],
  "type_cde_s":"KIT",
  "statut_s":"V",
  "mobile_facturation":["0612345678"],
  "campagne_s":"A",
  "TTC":[0],
  "date_dt":"2016-05-04T00:00:00Z",
  "id":"A25415",
  "enseigne_s":"CRE"}]}

My goal is to sum the "TTC" fields per parent document. But because the field
is multiValued, I can't use aggregation functions.

The core gets its data from this command: /opt/solr/bin/post -c 
-format solr build/index.json

The index.json looks like this:

[
  {
"id": "781.782.134878",
"parent_i": 781,
"asso_i": 782,
"personne_i": 134878,
"etat_technique_s": "avec_documents",
"etat_marketing_s": "inactif",
"type_parent_s": "Ecole élémentaire privée",
"type_asso_s": "APEL (association de parents école libre)",
"groupe_type_parent_s": "ENSEIGNEMENT_PRIMAIRE",
"groupe_type_asso_s": "ASSOCIATION_DE_PARENTS",
"nombre_commandes_brut_i": 4,
"nombre_commandes_i": 2,
"nombre_kits_saveur_i": 2,
"date_dernière_commande_dt": "2010-11-16",
"ca_periode_i": 0,
"ca_periode_fleur_i": 0,
"ca_periode_saveur_i": 0,
"zone_scolaire_s": "A",
"territoire_s": "France Métropolitaine",
"region_s": "AUVERGNE RHONE-ALPES",
"departement_s": "01 AIN",
"postal_country_s": "FR",
"asso_country_s": "FRANCE",
"object_type_s": "contact",
"kits_sans_suite_ss": null,
"_childDocuments_": [
  {
"fixe_facturation": "0450407279",
"object_type": "order",
    "mobile_livraison": "0628332864",
"kit_sans_suite": "false",
"fixe_livraison": "0450407279",
"type_cde_s": "KIT",
"statut_s": "V",
"mobile_facturation": "0628332864",
"campagne_s": "L",
"TTC": 0,
"date_dt": "2009-10-12T00:00:00Z",
"id": "L14276",
"enseigne_s": "SAV",
"gamme": [
  "KITS > Kits Saveurs"
]
  },
  {
"fixe_facturation": "0450407279",
"object_type": "order",
"mobile_livraison": "0628332864",
"kit_sans_suite": "false",
"fixe_livraison": "0450407279",
"type_cde_s": "CDE",
"statut_s": "V",
"mobile_facturation": "0628332864",
"campagne_s": "L",
"TTC": 1045,
"date_dt": "2009-11-14T00:00:00Z",
"id": "L25049",
"enseigne_s": "SAV",
"gamme": [
  "CHOCOLAT > Assortiment",
  "CHOCOLAT > Individuel",
  "CHOCOLAT > Mono-produit",
  "EQUIPEMENT MAISON > Cuisine",
  "EQUIPEMENT MAISON > Décoration",
  "KITS > Kits Saveurs",
  "SAVEURS > Confiserie",
  "SAVEURS > Pâtisserie"
]
}
]

In the managed-schema, only those fields appear:














I don't understand why, for example, "type_cmd_s" gets a single-valued field
type, but "TTC" or "kits_sans_suite" get the "multiValued" attribute.
Why are those fields in the managed-schema while enseigne_s (for example) is
not?

Thanks a lot for your help...

Best
JB







Re: Solr sort multivalued field

2018-06-12 Thread Shawn Heisey

On 6/12/2018 2:56 AM, Marc Lammers wrote:

> I want to sort my data by a multivalued field. I add this to my query
> "sort=field(foo,min) asc". The configuration in the schema for this field is




The documentation for the field function says that the field must 
contain numeric docvalues.  Your field has type="string" and although 
you did not indicate what the definition of string is in your schema, 
most likely it is the solr.StrField class.


https://lucene.apache.org/solr/guide/7_3/function-queries.html#FunctionQueries-field

Because this is not a numeric field, I'm guessing that it will not work 
with the field function.  All of the examples for that function are 
referencing a float field.


Thanks,
Shawn
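
For illustration, this is how the sort parameter could be assembled once the
data lives in a numeric, multivalued field with docValues. The field name
"foo_num" is hypothetical, and the sketch only builds the request parameters;
it does not contact Solr.

```python
from urllib.parse import urlencode


def min_sort_params(field, query="*:*", direction="asc"):
    """Build query params that sort by the smallest value of a multivalued field."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {"q": query, "sort": f"field({field},min) {direction}"}


# "foo_num" is a hypothetical numeric multivalued field with docValues.
params = min_sort_params("foo_num")
print(urlencode(params))
```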



Solr sort multivalued field

2018-06-12 Thread Marc Lammers
Hi All.



I want to sort my data by a multivalued field. I added this to my query:
"sort=field(foo,min) asc". The configuration in the schema for this field is:







The Solr documentation says that I have to add the docValues="true"
attribute for this field. After doing this I deleted the collection and
reimported the data. But when I execute my query I get the following error
message:

"sort param could not be parsed as a query, and is not a field that
exists in the index: field(foo,min)"


Did I forget to set something?

Thanks in advance,

Marc


Re: Query a particular index from a multivalued field.

2018-06-07 Thread Erick Erickson
There's no such syntax out of the box.

You could append the position to each value. So your input doc would look something like:

 doc 1= {
"id": "1",
"status": [
  "b1",
  "a2"
]
 }

and search appropriately.

Perhaps this would be a duplicated field used only when you wanted to
search by position.

Best,
Erick
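
A minimal sketch of that idea, done on the client before indexing. The
duplicated field name "status_pos" is hypothetical; the 1-based numbering
follows the "b1", "a2" example above.

```python
def tag_positions(values):
    """Append the 1-based position to each value: ["b", "a"] -> ["b1", "a2"]."""
    return [f"{value}{position}" for position, value in enumerate(values, start=1)]


def with_position_field(doc, source="status", target="status_pos"):
    """Return a copy of doc with an extra field holding position-tagged values."""
    tagged = dict(doc)
    tagged[target] = tag_positions(doc.get(source, []))
    return tagged


doc1 = {"id": "1", "status": ["b", "a"]}
print(with_position_field(doc1))
```

A search for "b in the first position" would then be a plain term query on
the duplicated field, e.g. status_pos:b1.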

On Thu, Jun 7, 2018 at 8:36 AM, root23  wrote:
> Hi all,
> is there a way i can query a particular index of a multivalued field.
> e.g lets say i have a document like this
>  doc 1= {
> "id": "1",
> "status": [
>   "b",
>   "a"
> ]
>  }
>
> doc2= {
> "id": "1",
> "status": [
>   "c",
>   "b"
> ]
>  }
>
> can i query like give me the document which has status = b at index 0. which
> should only return doc 1.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Query a particular index from a multivalued field.

2018-06-07 Thread root23
Hi all,
is there a way I can query a particular index of a multivalued field?
E.g. let's say I have documents like this:
 doc 1= {
"id": "1",
"status": [
  "b",
  "a"
]
 }

doc2= {
"id": "1",
"status": [
  "c",
  "b"
]
 }

Can I run a query like "give me the documents which have status = b at index 0", which
should only return doc 1?





Re: How to replacing values on multiValued all together by using 1 query

2018-05-10 Thread Shawn Heisey

On 5/10/2018 7:51 AM, Issei Nishigata wrote:

> I create a field called employee_name, and use it as multiValued.
> If "Mr.Smith" that is part of the value of the field is changed to
> "Mr.Brown", do I have to create 1 million deletion queries
> and updating queries in case where "Mr.Smith" appears in 1 million
> documents?
>
> Do we have a simple way of updating to use only 1 query?


Solr is not a database.  There is no functionality to update field X on 
all documents that match a query.


If your index has a field designated as uniqueKey, you won't need to do 
a delete, you can just reindex those documents and Solr will do the 
delete for you.  If you do not have a uniqueKey, then you will need to 
delete the old document before indexing its replacement.


https://wiki.apache.org/solr/HowToReindex

If your index meets the criteria for Atomic Updates, then you could use 
the delete/add functionality in that feature to take care of it.  Solr 
will construct a complete document from your atomic update and all the 
data currently in the document, then index the new document, deleting 
the old one in the process. Atomic Updates also require a uniqueKey.


See this page for information on Atomic Updates and the field storage 
requirements:


https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html

Thanks,
Shawn
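
As a rough sketch of the atomic update route: one update document per
matching id, using the "remove" and "add" modifiers on the multivalued field.
The ids below are made up (in practice they would come from a query for the
old value first), and putting both modifiers in a single update for the same
field is an assumption that should be checked against your Solr version. The
code only builds the JSON body.

```python
import json


def rename_value_updates(doc_ids, field, old_value, new_value):
    """Build one atomic-update document per id: drop old_value, add new_value."""
    return [
        {"id": doc_id, field: {"remove": old_value, "add": new_value}}
        for doc_id in doc_ids
    ]


# Hypothetical ids, as if returned by a query for employee_name:"Mr.Smith"
matching_ids = ["doc-1", "doc-2", "doc-3"]
updates = rename_value_updates(matching_ids, "employee_name", "Mr.Smith", "Mr.Brown")
body = json.dumps(updates)
print(body)
```

This is still one update per document, but the documents can be batched into
a small number of requests rather than a million separate ones.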



How to replacing values on multiValued all together by using 1 query

2018-05-10 Thread Issei Nishigata
Hi, all


I created a field called employee_name, and use it as multiValued.
If "Mr.Smith", which is one of the values of the field, is changed to
"Mr.Brown", do I have to create 1 million delete queries
and update queries if "Mr.Smith" appears in 1 million
documents?

Is there a simple way to do this update with only one query?


Thanks,
Issei

-- 
Issei Nishigata


Solr sort on latest upcoming timestamp value on multivalued field

2018-04-17 Thread sayantan94
I have a multivalued field for session timings (where I store timestamps) on
group documents, e.g. session_timings: [1526882026, 1513882026, 1533882026].
My sorting logic is that the groups should be listed in order of their
next upcoming session time.

For example, Group A has three session_timings = [1, 2, 5]. And Group B also
has three session_timings = [1, 6, 7]. If the current timestamp is 3, then Group
A should come first, because the next session for Group A is at 5, whereas for
Group B it is at 6. Is this possible with Solr sorting? Or do I have to use
another way to do this? Any help would be great.
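
To pin down the ordering being asked for, here is a client-side sketch of it.
This is not a Solr feature, just the intended logic applied to the example
groups; placing groups with no upcoming session last is an assumption.

```python
def next_session(timings, now):
    """Return the earliest timestamp strictly after now, or None if there is none."""
    upcoming = [t for t in timings if t > now]
    return min(upcoming) if upcoming else None


def sort_by_next_session(groups, now):
    """Order group names by their next upcoming session; groups without one go last."""
    def key(item):
        upcoming = next_session(item[1], now)
        return (upcoming is None, upcoming if upcoming is not None else 0)

    return [name for name, _ in sorted(groups.items(), key=key)]


groups = {"B": [1, 6, 7], "A": [1, 2, 5]}
print(sort_by_next_session(groups, now=3))
```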





Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
On Mon, Feb 26, 2018 at 7:14 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

>
> Faceting works on multivalued fields, perhaps you can do something with
> that?
>

The main difference I see in this case between facets and groups is that
groups are sorted by score, so the most relevant group comes first.
This is very useful when I have to return grouped results to the user.


-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
Of course, and in that use-case you'd want a particular document to
appear in all three categories.

Another client may want the doc to appear in only the "most important"
category, however that's defined.

Another client may want the doc to appear in "the more recent" day
(assuming we're grouping by date). Or "the oldest day".

That's what I meant by "rather than do something which will be wrong
it throws an error". Whatever you choose will be "wrong" in some
use-case.

Your use-case is certainly valid, but nobody has come forward with a
patch to allow it that I know of.

Faceting works on multivalued fields, perhaps you can do something with that?

Best,
Erick
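
To illustrate the faceting suggestion: with a multivalued field, a document
is counted once under every value it carries, which is the "appear in all
categories" behaviour. The sketch below mimics that counting on the client
with made-up documents and also builds the request parameters; it does not
contact Solr.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_counts(docs, field):
    """Count each document once under every distinct value of a multivalued field."""
    counts = Counter()
    for doc in docs:
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return counts


# Made-up documents; ids and values are for illustration only.
docs = [
    {"id": "tv-1", "categoryLevels": ["tv-led", "smart-tv"]},
    {"id": "tv-2", "categoryLevels": ["tv-led", "tv-ultra-hd-4k", "smart-tv"]},
    {"id": "tv-3", "categoryLevels": ["tv-ultra-hd-4k"]},
]
counts = facet_counts(docs, "categoryLevels")
print(dict(counts))

# Request parameters for the equivalent facet query.
params = urlencode({"q": "television", "rows": 0,
                    "facet": "true", "facet.field": "categoryLevels"})
print(params)
```

Facet buckets are ordered by count rather than by score, so this does not
replace the relevance ordering that grouping gives.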

On Mon, Feb 26, 2018 at 9:10 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi Erick,
>
> please consider this case where there is a group products that are
> televisions.
>
> Now I have only one category per product, but in same cases like the
> television I could have more than one.
>
> Some products should be available simultaneously in more categories, thats
> why the field I was trying to group is a multivalue, for example:
>
> /home-video/televisions/tv-led (516)
> /home-video/televisions/tv-ultra-hd-4k (363)
> /home-video/televisions/smart-tv (19)
>
> So there can be a television that is simultaneously a TV led, a smart tv
> and is ultra hd 4k.
>
> So, for example, I should be able to submit the following query:
>
> - fq=available:true
> - fq=vertical:0
> - q=television
> - rows=3
> - group=true
> - group.field=category
> - group.limit=0
>
> So the returned groups should be something like this (this is the output I
> have now for the single value field)
>
> 
>   
> 51653
> 
>   
> /home-video/televisions/tv-led
>  maxScore="0.6224861">
> 
>   
>   
> /home-video/televisions/tv-ultra-hd-4k
>  maxScore="0.5923965">
> 
>   
>   
> /home-video/televisions/smart-tv
> 
> 
>   
> 
>   
> 
>
>
>
> On Mon, Feb 26, 2018 at 4:44 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> What does "group by" mean on a field with more than one value? Say I
>> have "A" and "B" in the field in a single document. What group does it
>> go in, one labeld "A" or one labeled "B"?
>>
>> So IIUC, rather than do something which will be wrong it throws an
>> error if the field is defined as multiValued. And whatever option is
>> chosen (e.g. use the min or max or) will be wrong sometime.
>>
>> Although admittedly the error is a bit obscure...
>>
>> Best,
>> Erick
>>
>> On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore <v.dam...@gmail.com>
>> wrote:
>> > Hi Amrit,
>> >
>> > thanks for your help.
>> >
>> > I know that only 5/10% of documents in the collection have more than one
>> > value for the field I was trying to group by.
>> >
>> > So there isn't a particular memory usage in this case. Do you know if
>> there
>> > is any other counter-indication I have to be aware of?
>> >
>> > I was thinking to avoid this problem hacking the source code and deploy a
>> > personalised version of Solr.
>> >
>> > Best regards,
>> > Vincenzo
>> >
>> >
>> >
>> > On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
>> > wrote:
>> >
>> >> Vincenzo,
>> >>
>> >> As I read the source code;  SchemaField.java
>> >>
>> >> /**
>> >>  * Sanity checks that the properties of this field type are plausible
>> >>  * for a field that may be used to get a FieldCacheSource, throwing
>> >>  * an appropriate exception (including the field name) if it is not.
>> >>  * FieldType subclasses can choose to call this method in their
>> >>  * getValueSource implementation
>> >>  * @see FieldType#getValueSource
>> >>  */
>> >> public void checkFieldCacheSource() throws SolrException {
>> >>   if ( multiValued() ) {
>> >> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>> >> "can not use FieldCache on multivalued
>> field: "
>> >> + getName());
>> >>   }
>> >>   if (! hasDocValues() ) {
>> >> if ( ! ( indexed() && null != this.type.getUninve

Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Erick,

please consider this case, where there is a group of products that are
televisions.

Right now I have only one category per product, but in some cases, like
televisions, I could have more than one.

Some products should be available in several categories simultaneously; that's
why the field I was trying to group on is multivalued, for example:

/home-video/televisions/tv-led (516)
/home-video/televisions/tv-ultra-hd-4k (363)
/home-video/televisions/smart-tv (19)

So there can be a television that is simultaneously an LED TV, a smart TV
and ultra HD 4K.

So, for example, I should be able to submit the following query:

- fq=available:true
- fq=vertical:0
- q=television
- rows=3
- group=true
- group.field=category
- group.limit=0

So the returned groups should be something like this (this is the output I
have now for the single value field)


  
51653

  
/home-video/televisions/tv-led


  
  
/home-video/televisions/tv-ultra-hd-4k


  
  
/home-video/televisions/smart-tv


  

  




On Mon, Feb 26, 2018 at 4:44 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> What does "group by" mean on a field with more than one value? Say I
> have "A" and "B" in the field in a single document. What group does it
> go in, one labeld "A" or one labeled "B"?
>
> So IIUC, rather than do something which will be wrong it throws an
> error if the field is defined as multiValued. And whatever option is
> chosen (e.g. use the min or max or) will be wrong sometime.
>
> Although admittedly the error is a bit obscure...
>
> Best,
> Erick
>
> On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore <v.dam...@gmail.com>
> wrote:
> > Hi Amrit,
> >
> > thanks for your help.
> >
> > I know that only 5/10% of documents in the collection have more than one
> > value for the field I was trying to group by.
> >
> > So there isn't a particular memory usage in this case. Do you know if
> there
> > is any other counter-indication I have to be aware of?
> >
> > I was thinking to avoid this problem hacking the source code and deploy a
> > personalised version of Solr.
> >
> > Best regards,
> > Vincenzo
> >
> >
> >
> > On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
> > wrote:
> >
> >> Vincenzo,
> >>
> >> As I read the source code;  SchemaField.java
> >>
> >> /**
> >>  * Sanity checks that the properties of this field type are plausible
> >>  * for a field that may be used to get a FieldCacheSource, throwing
> >>  * an appropriate exception (including the field name) if it is not.
> >>  * FieldType subclasses can choose to call this method in their
> >>  * getValueSource implementation
> >>  * @see FieldType#getValueSource
> >>  */
> >> public void checkFieldCacheSource() throws SolrException {
> >>   if ( multiValued() ) {
> >> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
> >> "can not use FieldCache on multivalued
> field: "
> >> + getName());
> >>   }
> >>   if (! hasDocValues() ) {
> >> if ( ! ( indexed() && null != this.type.getUninversionType(this) )
> ) {
> >>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
> >>   "can not use FieldCache on a field w/o
> >> docValues unless it is indexed and supports Uninversion: "
> >>   + getName());
> >> }
> >>   }
> >> }
> >>
> >> Seems like FieldCache are not allowed to un-invert values for
> >> multi-valued fields.
> >>
> >> I can suspect the reason, multiple values will eat up more memory? Not
> >> sure, someone else can weigh in.
> >>
> >>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> Medium: https://medium.com/@sarkaramrit2
> >>
> >> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore <v.dam...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > while trying to run a group query on a multivalue field I received
> this
> >> > error:
> >> >
> >> > can not use FieldCache on multivalued field:
> >> >
> >> > 
> >> > 
> >> >
> >> > 
> >> >   true
> >> >   400
> >> >   4
> >> > 
> >> > 
> >> >   
> >> > org.apache.solr.common.SolrException str>
> >> > org.apache.solr.common.
> >> > SolrException
> >> >   
> >> >   can not use FieldCache on multivalued field:
> >> > categoryLevels
> >> >   400
> >> > 
> >> > 
> >> >
> >> > I don't understand why this is happening.
> >> >
> >> > Do you know any way to work around this problem?
> >> >
> >> > Thanks in advance,
> >> > Vincenzo
> >> >
> >> > --
> >> > Vincenzo D'Amore
> >> >
> >>
> >
> >
> >
> > --
> > Vincenzo D'Amore
>



-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Erick Erickson
What does "group by" mean on a field with more than one value? Say I
have "A" and "B" in the field in a single document. What group does it
go in, one labeled "A" or one labeled "B"?

So IIUC, rather than do something which will be wrong, it throws an
error if the field is defined as multiValued. And whatever option is
chosen (e.g. use the min or the max) will be wrong sometimes.

Although admittedly the error is a bit obscure...

Best,
Erick
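
A small illustration of that ambiguity, using made-up documents. The "pick"
policies (min and max) are just two of the possible choices, and each puts
the same document in a different group.

```python
from collections import defaultdict


def group_by(docs, field, pick):
    """Group document ids by a single value chosen from a multivalued field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[pick(doc[field])].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "1", "cat": ["A", "B"]},
    {"id": "2", "cat": ["B"]},
]

by_min = group_by(docs, "cat", min)
by_max = group_by(docs, "cat", max)
print(by_min)
print(by_max)
```

With "min", doc 1 lands in group A; with "max", it lands in group B. Neither
is right for every use-case.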

On Mon, Feb 26, 2018 at 7:37 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi Amrit,
>
> thanks for your help.
>
> I know that only 5/10% of documents in the collection have more than one
> value for the field I was trying to group by.
>
> So there isn't a particular memory usage in this case. Do you know if there
> is any other counter-indication I have to be aware of?
>
> I was thinking to avoid this problem hacking the source code and deploy a
> personalised version of Solr.
>
> Best regards,
> Vincenzo
>
>
>
> On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
> wrote:
>
>> Vincenzo,
>>
>> As I read the source code;  SchemaField.java
>>
>> /**
>>  * Sanity checks that the properties of this field type are plausible
>>  * for a field that may be used to get a FieldCacheSource, throwing
>>  * an appropriate exception (including the field name) if it is not.
>>  * FieldType subclasses can choose to call this method in their
>>  * getValueSource implementation
>>  * @see FieldType#getValueSource
>>  */
>> public void checkFieldCacheSource() throws SolrException {
>>   if ( multiValued() ) {
>> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>> "can not use FieldCache on multivalued field: "
>> + getName());
>>   }
>>   if (! hasDocValues() ) {
>> if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
>>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>>   "can not use FieldCache on a field w/o
>> docValues unless it is indexed and supports Uninversion: "
>>   + getName());
>> }
>>   }
>> }
>>
>> Seems like FieldCache are not allowed to un-invert values for
>> multi-valued fields.
>>
>> I can suspect the reason, multiple values will eat up more memory? Not
>> sure, someone else can weigh in.
>>
>>
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore <v.dam...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > while trying to run a group query on a multivalue field I received this
>> > error:
>> >
>> > can not use FieldCache on multivalued field:
>> >
>> > 
>> > 
>> >
>> > 
>> >   true
>> >   400
>> >   4
>> > 
>> > 
>> >   
>> > org.apache.solr.common.SolrException
>> > org.apache.solr.common.
>> > SolrException
>> >   
>> >   can not use FieldCache on multivalued field:
>> > categoryLevels
>> >   400
>> > 
>> > 
>> >
>> > I don't understand why this is happening.
>> >
>> > Do you know any way to work around this problem?
>> >
>> > Thanks in advance,
>> > Vincenzo
>> >
>> > --
>> > Vincenzo D'Amore
>> >
>>
>
>
>
> --
> Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi Amrit,

thanks for your help.

I know that only 5-10% of the documents in the collection have more than one
value for the field I was trying to group by.

So memory usage should not be a particular concern in this case. Do you know
if there is any other drawback I have to be aware of?

I was thinking of working around this problem by hacking the source code and
deploying a customised version of Solr.

Best regards,
Vincenzo



On Mon, Feb 26, 2018 at 3:22 PM, Amrit Sarkar <sarkaramr...@gmail.com>
wrote:

> Vincenzo,
>
> As I read the source code;  SchemaField.java
>
> /**
>  * Sanity checks that the properties of this field type are plausible
>  * for a field that may be used to get a FieldCacheSource, throwing
>  * an appropriate exception (including the field name) if it is not.
>  * FieldType subclasses can choose to call this method in their
>  * getValueSource implementation
>  * @see FieldType#getValueSource
>  */
> public void checkFieldCacheSource() throws SolrException {
>   if ( multiValued() ) {
> throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>     "can not use FieldCache on multivalued field: "
> + getName());
>   }
>   if (! hasDocValues() ) {
> if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>   "can not use FieldCache on a field w/o
> docValues unless it is indexed and supports Uninversion: "
>   + getName());
> }
>   }
> }
>
> Seems like FieldCache are not allowed to un-invert values for
> multi-valued fields.
>
> I can suspect the reason, multiple values will eat up more memory? Not
> sure, someone else can weigh in.
>
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore <v.dam...@gmail.com>
> wrote:
>
> > Hi,
> >
> > while trying to run a group query on a multivalue field I received this
> > error:
> >
> > can not use FieldCache on multivalued field:
> >
> > 
> > 
> >
> > 
> >   true
> >   400
> >   4
> > 
> > 
> >   
> > org.apache.solr.common.SolrException
> > org.apache.solr.common.
> > SolrException
> >   
> >   can not use FieldCache on multivalued field:
> > categoryLevels
> >   400
> > 
> > 
> >
> > I don't understand why this is happening.
> >
> > Do you know any way to work around this problem?
> >
> > Thanks in advance,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> >
>



-- 
Vincenzo D'Amore


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Amrit Sarkar
Vincenzo,

As I read the source code of SchemaField.java:

/**
 * Sanity checks that the properties of this field type are plausible
 * for a field that may be used to get a FieldCacheSource, throwing
 * an appropriate exception (including the field name) if it is not.
 * FieldType subclasses can choose to call this method in their
 * getValueSource implementation
 * @see FieldType#getValueSource
 */
public void checkFieldCacheSource() throws SolrException {
  if ( multiValued() ) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "can not use FieldCache on multivalued field: "
        + getName());
  }
  if (! hasDocValues() ) {
    if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "can not use FieldCache on a field w/o docValues unless it is indexed and supports Uninversion: "
          + getName());
    }
  }
}

It seems that FieldCache is not allowed to un-invert values for
multi-valued fields.

I suspect the reason is that multiple values would use more memory, but I am
not sure. Someone else can weigh in.
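
The order of the checks in the method quoted above can be modeled in a few lines. This is only a Python sketch of the decision logic, not Solr code: the function name, its parameters and the BadRequest class are hypothetical stand-ins for the SchemaField properties and for SolrException.

```python
class BadRequest(Exception):
    """Stand-in for a SolrException with ErrorCode.BAD_REQUEST (HTTP 400)."""


def check_field_cache_source(name, multi_valued, has_doc_values,
                             indexed, supports_uninversion):
    # Multivalued fields are rejected first, whatever their other properties.
    if multi_valued:
        raise BadRequest(
            "can not use FieldCache on multivalued field: " + name)
    # Without docValues the field must be indexed and support uninversion.
    if not has_doc_values and not (indexed and supports_uninversion):
        raise BadRequest(
            "can not use FieldCache on a field w/o docValues unless it is "
            "indexed and supports Uninversion: " + name)
```

Note that in this model the multivalued test comes before the docValues test, so a multivalued field fails even when it has docValues.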



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore <v.dam...@gmail.com>
wrote:

> Hi,
>
> while trying to run a group query on a multivalue field I received this
> error:
>
> can not use FieldCache on multivalued field:
>
> 
> 
>
> 
>   true
>   400
>   4
> 
> 
>   
> org.apache.solr.common.SolrException
> org.apache.solr.common.
> SolrException
>   
>   can not use FieldCache on multivalued field:
> categoryLevels
>   400
> 
> 
>
> I don't understand why this is happening.
>
> Do you know any way to work around this problem?
>
> Thanks in advance,
> Vincenzo
>
> --
> Vincenzo D'Amore
>


Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Vincenzo D'Amore
Hi,

While trying to run a group query on a multivalued field I received this
error:

can not use FieldCache on multivalued field:





  true
  400
  4


  
org.apache.solr.common.SolrException
org.apache.solr.common.SolrException
  
  can not use FieldCache on multivalued field:
categoryLevels
  400



I don't understand why this is happening.

Do you know any way to work around this problem?

Thanks in advance,
Vincenzo

-- 
Vincenzo D'Amore


Re: Title Search scoring issues with multivalued field & norm

2018-02-04 Thread Sravan Kumar
Using edismax with a different field for each title will affect the final
scores if the tie parameter is non-zero.

Can we create a separate document for each title? The uniqueness would then be
per title rather than per movie_id. That way, even with edismax, the
other titles won't affect the score.

Is there any other way to handle norms in a multivalued field?
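
A rough sketch of the one-document-per-title idea, in Python. The field names (id, movie_id, title, titles) and the id scheme are assumptions made up for illustration, not an existing schema.

```python
def expand_titles(movie):
    """Return one flat document per title of a movie record."""
    docs = []
    for position, title in enumerate(movie["titles"]):
        docs.append({
            # Hypothetical unique key: movie id plus the title's position.
            "id": f'{movie["movie_id"]}_{position}',
            "movie_id": movie["movie_id"],
            "title": title,
        })
    return docs


movie = {"movie_id": "3",
         "titles": ["Beauty and the Beast", "La bella y la bestia"]}
docs = expand_titles(movie)
```

Each title then gets its own norm, but a movie can appear more than once in the results, so hits with the same movie_id would have to be folded together at query time or in the client.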

On Thu, Feb 1, 2018 at 12:24 PM, Sravan Kumar <sra...@caavo.com> wrote:

> @Walter: Perhaps you are right on not to consider stemming. Instead fuzzy
> search will cover these along with the misspellings.
>
> In case of symbols, we want the titles matching the symbols ranked higher
> than the others. Perhaps we can use this field only for boosting.
>
> Certain movies have around 4-6 different aliases based on what our source
> gives and we do not really know what is the max. Is there no other way from
> lucene/solr to use a multivalued field?
>
>
> On Thu, Feb 1, 2018 at 11:06 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> I was the first search engineer at Netflix and moved their search from a
>> home-grown engine to Solr. It worked very well with a single title field
>> and aliases.
>>
>> I think your schema is too complicated for movie search.
>>
>> Stemming is not useful. It doesn’t help search and it can hurt. You don’t
>> want the movie “Saw” to match the query “see”.
>>
>> When is it useful to search with symbols? Remove the punctuation.
>>
>> The only movie titles with symbols that caused any challenge were:
>>
>> * Frost/Nixon
>> * .hack//Sign
>> * +/-
>>
>> For the first two, removing punctuation worked fine. For the last one, I
>> hardcoded a translation to “plus/minus” before indexing or querying.
>>
>> Query completion made a huge difference, taking our clickthrough rate
>> from 0.45 to 0.55.
>>
>> Later, we added fuzzy search to handle misspellings.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Jan 31, 2018, at 8:54 PM, Sravan Kumar <sra...@caavo.com> wrote:
>> >
>> > @Tim Casey: Yeah... TFIDFSimilarity weighs towards shorter documents.
>> This
>> > is done through the fieldnorm component in the class. The issue is when
>> the
>> > field is multivalued. Consider the field has two string each of 4
>> tokens.
>> > The fieldNorm from the lucene TFIDFSimilarity class considers the total
>> sum
>> > of these two values i.e 8 for normalizing instead of 4. Hence, the
>> ranking
>> > is distorted.
>> > Regarding the search evaluation, we do have a curated set.
>> >
>> >
>> > On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote:
>> >
>> >> For smaller length documents TFIDFSimilarity will weight towards
>> shorter
>> >> documents.  Another way to say this, if your documents are 5-10 terms,
>> the
>> >> 5 terms are going to win.
>> >> You might think about having per token, or token pair, weight.  I
>> would be
>> >> surprised if there was not something similar out there.  This is a
>> common
>> >> issue with any short text.
>> >> I guess I would think of this as TFICF, where the CF is the corpus
>> >> frequency. You also might want to weight inversely proportional to the
>> age
>> >> of the title, older are less important.  This is assuming people are
>> doing
>> >> searches within some time cluster, newer is more likely.
>> >>
>> >> For some obvious advice, things you probably already know.  This kind
>> of
>> >> search needs some hard measurement to begin to know how to tune it.
>> You
>> >> need to find a reasonable annotated representation.  So, if you took
>> the
>> >> previous months searches where there is a chain of successive
>> searches.  If
>> >> you weighted things differently would you shorten the length of the
>> chain.
>> >> Can you get the click throughs to happen sooner.
>> >>
>> >> Anyway, just my 2 cents
>> >>
>> >>
>> >> On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com>
>> wrote:
>> >>
>> >>>
>> >>> @Walter: We have 6 fields declared in schema.xml for title each with
>> >>> different type of analyzer. One without processing symbols, other
>> stemmed
>> >>> and other removing  symbols, etc. So, if we have separate fields for

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Sravan Kumar
@Walter: Perhaps you are right that we should not use stemming. Fuzzy
search will cover those cases along with the misspellings.

For symbols, we want the titles that match the symbols ranked higher
than the others. Perhaps we can use this field only for boosting.

Certain movies have around 4-6 different aliases, depending on what our source
gives us, and we do not really know what the maximum is. Is there no other way
in Lucene/Solr to use a multivalued field?


On Thu, Feb 1, 2018 at 11:06 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I was the first search engineer at Netflix and moved their search from a
> home-grown engine to Solr. It worked very well with a single title field
> and aliases.
>
> I think your schema is too complicated for movie search.
>
> Stemming is not useful. It doesn’t help search and it can hurt. You don’t
> want the movie “Saw” to match the query “see”.
>
> When is it useful to search with symbols? Remove the punctuation.
>
> The only movie titles with symbols that caused any challenge were:
>
> * Frost/Nixon
> * .hack//Sign
> * +/-
>
> For the first two, removing punctuation worked fine. For the last one, I
> hardcoded a translation to “plus/minus” before indexing or querying.
>
> Query completion made a huge difference, taking our clickthrough rate from
> 0.45 to 0.55.
>
> Later, we added fuzzy search to handle misspellings.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jan 31, 2018, at 8:54 PM, Sravan Kumar <sra...@caavo.com> wrote:
> >
> > @Tim Casey: Yeah... TFIDFSimilarity weighs towards shorter documents.
> This
> > is done through the fieldnorm component in the class. The issue is when
> the
> > field is multivalued. Consider the field has two string each of 4 tokens.
> > The fieldNorm from the lucene TFIDFSimilarity class considers the total
> sum
> > of these two values i.e 8 for normalizing instead of 4. Hence, the
> ranking
> > is distorted.
> > Regarding the search evaluation, we do have a curated set.
> >
> >
> > On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote:
> >
> >> For smaller length documents TFIDFSimilarity will weight towards shorter
> >> documents.  Another way to say this, if your documents are 5-10 terms,
> the
> >> 5 terms are going to win.
> >> You might think about having per token, or token pair, weight.  I would
> be
> >> surprised if there was not something similar out there.  This is a
> common
> >> issue with any short text.
> >> I guess I would think of this as TFICF, where the CF is the corpus
> >> frequency. You also might want to weight inversely proportional to the
> age
> >> of the title, older are less important.  This is assuming people are
> doing
> >> searches within some time cluster, newer is more likely.
> >>
> >> For some obvious advice, things you probably already know.  This kind of
> >> search needs some hard measurement to begin to know how to tune it.  You
> >> need to find a reasonable annotated representation.  So, if you took the
> >> previous months searches where there is a chain of successive
> searches.  If
> >> you weighted things differently would you shorten the length of the
> chain.
> >> Can you get the click throughs to happen sooner.
> >>
> >> Anyway, just my 2 cents
> >>
> >>
> >> On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> wrote:
> >>
> >>>
> >>> @Walter: We have 6 fields declared in schema.xml for title each with
> >>> different type of analyzer. One without processing symbols, other
> stemmed
> >>> and other removing  symbols, etc. So, if we have separate fields for
> each
> >>> alias it will be that many times the number of final fields declared in
> >>> schema.xml. And we exactly do not know what is the maximum number of
> >>> aliases a movie can have.
> >>> @Walter: I will try this but isn’t there any other way  where I can
> >> tweak ?
> >>>
> >>> @eric: will try this. But it will work only for exact matches.
> >>>
> >>>
> >>>> On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com
> >
> >>> wrote:
> >>>>
> >>>> Or use a boost for the phrase, something like
> >>>> "beauty and the beast"^5
> >>>>
> >>>>> On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood 

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Walter Underwood
I was the first search engineer at Netflix and moved their search from a 
home-grown engine to Solr. It worked very well with a single title field and 
aliases.

I think your schema is too complicated for movie search.

Stemming is not useful. It doesn’t help search and it can hurt. You don’t want 
the movie “Saw” to match the query “see”.

When is it useful to search with symbols? Remove the punctuation.

The only movie titles with symbols that caused any challenge were:

* Frost/Nixon
* .hack//Sign
* +/-

For the first two, removing punctuation worked fine. For the last one, I 
hardcoded a translation to “plus/minus” before indexing or querying.
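
A minimal sketch of that kind of title normalization, in Python. It assumes punctuation is replaced by a space rather than deleted outright, and the lookup table holds only the one special case mentioned above; the names are made up for illustration.

```python
import re

# Titles that are nothing but symbols get a hardcoded translation first.
SPECIAL_TITLES = {"+/-": "plus/minus"}


def normalize_title(text):
    """Apply hardcoded translations, then strip punctuation."""
    text = SPECIAL_TITLES.get(text.strip(), text)
    # Replace every character that is neither a word character nor
    # whitespace with a space, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())
```

The same function would have to run on both the indexed titles and the incoming queries so that the two sides agree.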

Query completion made a huge difference, taking our clickthrough rate from 0.45 
to 0.55.

Later, we added fuzzy search to handle misspellings.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 31, 2018, at 8:54 PM, Sravan Kumar <sra...@caavo.com> wrote:
> 
> @Tim Casey: Yeah... TFIDFSimilarity weighs towards shorter documents. This
> is done through the fieldnorm component in the class. The issue is when the
> field is multivalued. Consider the field has two string each of 4 tokens.
> The fieldNorm from the lucene TFIDFSimilarity class considers the total sum
> of these two values i.e 8 for normalizing instead of 4. Hence, the ranking
> is distorted.
> Regarding the search evaluation, we do have a curated set.
> 
> 
> On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote:
> 
>> For smaller length documents TFIDFSimilarity will weight towards shorter
>> documents.  Another way to say this, if your documents are 5-10 terms, the
>> 5 terms are going to win.
>> You might think about having per token, or token pair, weight.  I would be
>> surprised if there was not something similar out there.  This is a common
>> issue with any short text.
>> I guess I would think of this as TFICF, where the CF is the corpus
>> frequency. You also might want to weight inversely proportional to the age
>> of the title, older are less important.  This is assuming people are doing
>> searches within some time cluster, newer is more likely.
>> 
>> For some obvious advice, things you probably already know.  This kind of
>> search needs some hard measurement to begin to know how to tune it.  You
>> need to find a reasonable annotated representation.  So, if you took the
>> previous months searches where there is a chain of successive searches.  If
>> you weighted things differently would you shorten the length of the chain.
>> Can you get the click throughs to happen sooner.
>> 
>> Anyway, just my 2 cents
>> 
>> 
>> On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> wrote:
>> 
>>> 
>>> @Walter: We have 6 fields declared in schema.xml for title each with
>>> different type of analyzer. One without processing symbols, other stemmed
>>> and other removing  symbols, etc. So, if we have separate fields for each
>>> alias it will be that many times the number of final fields declared in
>>> schema.xml. And we exactly do not know what is the maximum number of
>>> aliases a movie can have.
>>> @Walter: I will try this but isn’t there any other way  where I can
>> tweak ?
>>> 
>>> @eric: will try this. But it will work only for exact matches.
>>> 
>>> 
>>>> On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>> 
>>>> Or use a boost for the phrase, something like
>>>> "beauty and the beast"^5
>>>> 
>>>>> On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <
>>> wun...@wunderwood.org> wrote:
>>>>> You can use a separate field for title aliases. That is what I did for
>>> Netflix search.
>>>>> 
>>>>> Why disable idf? Disabling tf for titles can be a good idea, for
>>> example the movie “New York, New York” is not twice as much about New
>> York
>>> as some other film that just lists it once.
>>>>> 
>>>>> Also, consider using a popularity score as a boost.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>>> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> We are using solr for our movie title search.
>>>>>> 
>>>>>> 
>>>>>

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Sravan Kumar
@Tim Casey: Yes, TFIDFSimilarity weights towards shorter documents. This
is done through the fieldNorm component in the class. The issue arises when the
field is multivalued. Suppose the field has two strings of 4 tokens each.
The fieldNorm in Lucene's TFIDFSimilarity class uses the total
of these two values, i.e. 8, for normalizing instead of 4. Hence, the ranking
is distorted.
Regarding the search evaluation, we do have a curated set.
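
To put numbers on the 4 versus 8 example: the sketch below assumes the classic length norm 1 / sqrt(number of terms) and ignores index-time boosts and the lossy encoding Lucene applies when it stores norms, so the values are illustrative only.

```python
import math


def length_norm(num_terms):
    """Classic TF-IDF length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


single_title = length_norm(4)        # one 4-token title
two_titles = length_norm(4 + 4)      # two 4-token titles in one field
```

Under these assumptions the single title gets a norm of 0.5, while the field with two titles gets about 0.354, even though the matching title is the same length in both cases.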


On Thu, Feb 1, 2018 at 9:18 AM, Tim Casey <tca...@gmail.com> wrote:

> For smaller length documents TFIDFSimilarity will weight towards shorter
> documents.  Another way to say this, if your documents are 5-10 terms, the
> 5 terms are going to win.
> You might think about having per token, or token pair, weight.  I would be
> surprised if there was not something similar out there.  This is a common
> issue with any short text.
> I guess I would think of this as TFICF, where the CF is the corpus
> frequency. You also might want to weight inversely proportional to the age
> of the title, older are less important.  This is assuming people are doing
> searches within some time cluster, newer is more likely.
>
> For some obvious advice, things you probably already know.  This kind of
> search needs some hard measurement to begin to know how to tune it.  You
> need to find a reasonable annotated representation.  So, if you took the
> previous months searches where there is a chain of successive searches.  If
> you weighted things differently would you shorten the length of the chain.
> Can you get the click throughs to happen sooner.
>
> Anyway, just my 2 cents
>
>
> On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> wrote:
>
> >
> > @Walter: We have 6 fields declared in schema.xml for title each with
> > different type of analyzer. One without processing symbols, other stemmed
> > and other removing  symbols, etc. So, if we have separate fields for each
> > alias it will be that many times the number of final fields declared in
> > schema.xml. And we exactly do not know what is the maximum number of
> > aliases a movie can have.
> > @Walter: I will try this but isn’t there any other way  where I can
> tweak ?
> >
> > @eric: will try this. But it will work only for exact matches.
> >
> >
> > > On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> > >
> > > Or use a boost for the phrase, something like
> > > "beauty and the beast"^5
> > >
> > >> On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <
> > wun...@wunderwood.org> wrote:
> > >> You can use a separate field for title aliases. That is what I did for
> > Netflix search.
> > >>
> > >> Why disable idf? Disabling tf for titles can be a good idea, for
> > example the movie “New York, New York” is not twice as much about New
> York
> > as some other film that just lists it once.
> > >>
> > >> Also, consider using a popularity score as a boost.
> > >>
> > >> wunder
> > >> Walter Underwood
> > >> wun...@wunderwood.org
> > >> http://observer.wunderwood.org/  (my blog)
> > >>
> > >>> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
> > >>>
> > >>> Hi,
> > >>> We are using solr for our movie title search.
> > >>>
> > >>>
> > >>> As it is "title search", this should be treated different than the
> > normal
> > >>> document search.
> > >>> Hence, we use a modified version of TFIDFSimilarity with the
> following
> > >>> changes.
> > >>> -  disabled TF & IDF and will only have 1 as value.
> > >>> -  disabled norms by specifying omitNorms as true for all the fields.
> > >>>
> > >>> There are 6 fields with different analyzers and we make use of
> > different
> > >>> weights in edismax's qf & pf parameters to match tokens & boost
> > phrases.
> > >>>
> > >>> But, movies could have aliases and have multiple titles. So, we made
> > the
> > >>> fields multivalued.
> > >>>
> > >>> Now, consider the following four documents
> > >>> 1>  "Beauty and the Beast"
> > >>> 2>  "The Real Beauty and the Beast"
> > >>> 3>  "Beauty and the Beast", "La bella y la bestia"
> > >>> 4>  "Beauty and the Beast"

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Tim Casey
For short documents, TFIDFSimilarity will weight towards the shorter
ones.  Put another way, if your documents are 5-10 terms long, the
5-term documents are going to win.
You might think about having a per-token, or per-token-pair, weight.  I would
be surprised if there was not something similar out there.  This is a common
issue with any short text.
I guess I would think of this as TFICF, where the CF is the corpus
frequency. You also might want to weight inversely proportional to the age
of the title, so that older titles are less important.  This assumes people
are doing searches within some time cluster, where newer is more likely.

Some obvious advice, things you probably already know:  this kind of
search needs some hard measurement before you can know how to tune it.  You
need to find a reasonable annotated representation.  For example, take the
previous month's searches where there is a chain of successive searches.  If
you weighted things differently, would you shorten the length of the chain?
Can you get the click-throughs to happen sooner?

Anyway, just my 2 cents


On Wed, Jan 31, 2018 at 6:38 PM, Sravan Kumar <sra...@caavo.com> wrote:

>
> @Walter: We have 6 fields declared in schema.xml for title each with
> different type of analyzer. One without processing symbols, other stemmed
> and other removing  symbols, etc. So, if we have separate fields for each
> alias it will be that many times the number of final fields declared in
> schema.xml. And we exactly do not know what is the maximum number of
> aliases a movie can have.
> @Walter: I will try this but isn’t there any other way  where I can tweak ?
>
> @eric: will try this. But it will work only for exact matches.
>
>
> > On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> > Or use a boost for the phrase, something like
> > "beauty and the beast"^5
> >
> >> On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <
> wun...@wunderwood.org> wrote:
> >> You can use a separate field for title aliases. That is what I did for
> Netflix search.
> >>
> >> Why disable idf? Disabling tf for titles can be a good idea, for
> example the movie “New York, New York” is not twice as much about New York
> as some other film that just lists it once.
> >>
> >> Also, consider using a popularity score as a boost.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
> >>>
> >>> Hi,
> >>> We are using solr for our movie title search.
> >>>
> >>>
> >>> As it is "title search", this should be treated different than the
> normal
> >>> document search.
> >>> Hence, we use a modified version of TFIDFSimilarity with the following
> >>> changes.
> >>> -  disabled TF & IDF and will only have 1 as value.
> >>> -  disabled norms by specifying omitNorms as true for all the fields.
> >>>
> >>> There are 6 fields with different analyzers and we make use of
> different
> >>> weights in edismax's qf & pf parameters to match tokens & boost
> phrases.
> >>>
> >>> But, movies could have aliases and have multiple titles. So, we made
> the
> >>> fields multivalued.
> >>>
> >>> Now, consider the following four documents
> >>> 1>  "Beauty and the Beast"
> >>> 2>  "The Real Beauty and the Beast"
> >>> 3>  "Beauty and the Beast", "La bella y la bestia"
> >>> 4>  "Beauty and the Beast"
> >>>
> >>> Note: Document 3 has two titles in it.
> >>>
> >>> So, for a query "Beauty and the Beast" and with the above
> configuration all
> >>> the documents receive same score. But 1,3,4 should have got same score
> and
> >>> document 2 lesser than others.
> >>>
> >>> To solve this, we followed what is suggested in the following thread:
> >>> http://lucene.472066.n3.nabble.com/Influencing-scores-
> on-values-in-multiValue-fields-td1791651.html
> >>>
> >>> Now, the fields which are used to boost are made to use Norms. And for
> >>> matching norms are disabled. This is to make sure that exact & near
> exact
> >>> matches are rewarded.
> >>>
> >>> But, for the same query, we get the following results.
> >>> query: "Beauty & the Beast"
> >>> Search Results:
> >>> 1>  "Beauty and the Beast"
> >>> 4>  "Beauty and the Beast"
> >>> 2>  "The Real Beauty and the Beast"
> >>> 3>  "Beauty and the Beast", "La bella y la bestia"
> >>>
> >>> Clearly, the changes have solved only a part of the problem. The
> document 3
> >>> should be ranked/scored higher than document 2.
> >>>
> >>> This is because lucene considers the total field length across all the
> >>> values in a multivalued field for normalization.
> >>>
> >>> How do we handle this scenario and make sure that in multivalued
> fields the
> >>> normalization is taken care of?
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Sravan
> >>
>


Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Sravan Kumar

@Walter: We have 6 fields declared in schema.xml for the title, each with a
different type of analyzer: one that does not process symbols, one stemmed,
one that removes symbols, etc. So, if we have separate fields for each alias,
the number of fields declared in schema.xml is multiplied by the number of
aliases. And we do not know exactly what the maximum number of aliases a movie
can have is.
@Walter: I will try this, but isn’t there any other way I can tweak it?

@Erick: I will try this, but it will work only for exact matches.


> On Jan 31, 2018, at 10:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Or use a boost for the phrase, something like
> "beauty and the beast"^5
> 
>> On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <wun...@wunderwood.org> 
>> wrote:
>> You can use a separate field for title aliases. That is what I did for 
>> Netflix search.
>> 
>> Why disable idf? Disabling tf for titles can be a good idea, for example the 
>> movie “New York, New York” is not twice as much about New York as some other 
>> film that just lists it once.
>> 
>> Also, consider using a popularity score as a boost.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
>>> 
>>> Hi,
>>> We are using solr for our movie title search.
>>> 
>>> 
>>> As it is "title search", this should be treated different than the normal
>>> document search.
>>> Hence, we use a modified version of TFIDFSimilarity with the following
>>> changes.
>>> -  disabled TF & IDF and will only have 1 as value.
>>> -  disabled norms by specifying omitNorms as true for all the fields.
>>> 
>>> There are 6 fields with different analyzers and we make use of different
>>> weights in edismax's qf & pf parameters to match tokens & boost phrases.
>>> 
>>> But, movies could have aliases and have multiple titles. So, we made the
>>> fields multivalued.
>>> 
>>> Now, consider the following four documents
>>> 1>  "Beauty and the Beast"
>>> 2>  "The Real Beauty and the Beast"
>>> 3>  "Beauty and the Beast", "La bella y la bestia"
>>> 4>  "Beauty and the Beast"
>>> 
>>> Note: Document 3 has two titles in it.
>>> 
>>> So, for a query "Beauty and the Beast" and with the above configuration all
>>> the documents receive same score. But 1,3,4 should have got same score and
>>> document 2 lesser than others.
>>> 
>>> To solve this, we followed what is suggested in the following thread:
>>> http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html
>>> 
>>> Now, the fields which are used to boost are made to use Norms. And for
>>> matching norms are disabled. This is to make sure that exact & near exact
>>> matches are rewarded.
>>> 
>>> But, for the same query, we get the following results.
>>> query: "Beauty & the Beast"
>>> Search Results:
>>> 1>  "Beauty and the Beast"
>>> 4>  "Beauty and the Beast"
>>> 2>  "The Real Beauty and the Beast"
>>> 3>  "Beauty and the Beast", "La bella y la bestia"
>>> 
>>> Clearly, the changes have solved only a part of the problem. The document 3
>>> should be ranked/scored higher than document 2.
>>> 
>>> This is because lucene considers the total field length across all the
>>> values in a multivalued field for normalization.
>>> 
>>> How do we handle this scenario and make sure that in multivalued fields the
>>> normalization is taken care of?
>>> 
>>> 
>>> --
>>> Regards,
>>> Sravan
>> 


Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Erick Erickson
Or use a boost for the phrase, something like
"beauty and the beast"^5

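For illustration, a boosted phrase like the one above could be added to an edismax request as follows. This is a Python sketch under assumptions: the field name "title" is hypothetical, and the phrase is simply appended to the plain terms as an extra clause.

```python
from urllib.parse import urlencode


def boosted_phrase(phrase, boost):
    """Quote a phrase and attach a boost, e.g. "beauty and the beast"^5."""
    return f'"{phrase}"^{boost}'


params = {
    "defType": "edismax",
    # Plain terms plus the boosted exact phrase as an additional clause.
    "q": "beauty and the beast " + boosted_phrase("beauty and the beast", 5),
    "qf": "title",  # hypothetical field name
}
query_string = urlencode(params)
```

The resulting query_string can be appended to a /select URL. How much the phrase clause changes the ranking depends on the other weights in qf and pf.
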
On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> You can use a separate field for title aliases. That is what I did for 
> Netflix search.
>
> Why disable idf? Disabling tf for titles can be a good idea, for example the 
> movie “New York, New York” is not twice as much about New York as some other 
> film that just lists it once.
>
> Also, consider using a popularity score as a boost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
>>
>> Hi,
>> We are using solr for our movie title search.
>>
>>
>> As it is "title search", this should be treated different than the normal
>> document search.
>> Hence, we use a modified version of TFIDFSimilarity with the following
>> changes.
>> -  disabled TF & IDF and will only have 1 as value.
>> -  disabled norms by specifying omitNorms as true for all the fields.
>>
>> There are 6 fields with different analyzers and we make use of different
>> weights in edismax's qf & pf parameters to match tokens & boost phrases.
>>
>> But, movies could have aliases and have multiple titles. So, we made the
>> fields multivalued.
>>
>> Now, consider the following four documents
>> 1>  "Beauty and the Beast"
>> 2>  "The Real Beauty and the Beast"
>> 3>  "Beauty and the Beast", "La bella y la bestia"
>> 4>  "Beauty and the Beast"
>>
>> Note: Document 3 has two titles in it.
>>
>> So, for a query "Beauty and the Beast" and with the above configuration all
>> the documents receive same score. But 1,3,4 should have got same score and
>> document 2 lesser than others.
>>
>> To solve this, we followed what is suggested in the following thread:
>> http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html
>>
>> Now, the fields which are used to boost are made to use Norms. And for
>> matching norms are disabled. This is to make sure that exact & near exact
>> matches are rewarded.
>>
>> But, for the same query, we get the following results.
>> query: "Beauty & the Beast"
>> Search Results:
>> 1>  "Beauty and the Beast"
>> 4>  "Beauty and the Beast"
>> 2>  "The Real Beauty and the Beast"
>> 3>  "Beauty and the Beast", "La bella y la bestia"
>>
>> Clearly, the changes have solved only a part of the problem. The document 3
>> should be ranked/scored higher than document 2.
>>
>> This is because lucene considers the total field length across all the
>> values in a multivalued field for normalization.
>>
>> How do we handle this scenario and make sure that in multivalued fields the
>> normalization is taken care of?
>>
>>
>> --
>> Regards,
>> Sravan
>


Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Walter Underwood
You can use a separate field for title aliases. That is what I did for Netflix 
search.

Why disable idf? Disabling tf for titles can be a good idea, for example the 
movie “New York, New York” is not twice as much about New York as some other 
film that just lists it once.
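
The effect of disabling tf can be seen in a small Python sketch. It assumes the classic tf factor sqrt(freq) and a naive whitespace tokenizer; the second title is only there as a comparison case.

```python
import math


def classic_tf(freq):
    """Classic TF-IDF term-frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


def flat_tf(freq):
    """tf disabled: any match counts once, however often the term repeats."""
    return 1.0 if freq > 0 else 0.0


def term_freq(title, term):
    """Count occurrences of a term after naive lowercasing and splitting."""
    tokens = title.lower().replace(",", " ").split()
    return tokens.count(term)


repeated = term_freq("New York, New York", "york")
single = term_freq("Escape from New York", "york")
```

With classic_tf the repeated title scores higher for "york" than the other one; with flat_tf both get the same factor, which is the behavior described above.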

Also, consider using a popularity score as a boost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
> 
> Hi,
> We are using solr for our movie title search.
> 
> 
> As it is "title search", this should be treated different than the normal
> document search.
> Hence, we use a modified version of TFIDFSimilarity with the following
> changes.
> -  disabled TF & IDF and will only have 1 as value.
> -  disabled norms by specifying omitNorms as true for all the fields.
> 
> There are 6 fields with different analyzers and we make use of different
> weights in edismax's qf & pf parameters to match tokens & boost phrases.
> 
> But, movies could have aliases and have multiple titles. So, we made the
> fields multivalued.
> 
> Now, consider the following four documents
> 1>  "Beauty and the Beast"
> 2>  "The Real Beauty and the Beast"
> 3>  "Beauty and the Beast", "La bella y la bestia"
> 4>  "Beauty and the Beast"
> 
> Note: Document 3 has two titles in it.
> 
> So, for a query "Beauty and the Beast" and with the above configuration all
> the documents receive same score. But 1,3,4 should have got same score and
> document 2 lesser than others.
> 
> To solve this, we followed what is suggested in the following thread:
> http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html
> 
> Now, the fields which are used to boost are made to use Norms. And for
> matching norms are disabled. This is to make sure that exact & near exact
> matches are rewarded.
> 
> But, for the same query, we get the following results.
> query: "Beauty & the Beast"
> Search Results:
> 1>  "Beauty and the Beast"
> 4>  "Beauty and the Beast"
> 2>  "The Real Beauty and the Beast"
> 3>  "Beauty and the Beast", "La bella y la bestia"
> 
> Clearly, the changes have solved only a part of the problem. The document 3
> should be ranked/scored higher than document 2.
> 
> This is because lucene considers the total field length across all the
> values in a multivalued field for normalization.
> 
> How do we handle this scenario and make sure that in multivalued fields the
> normalization is taken care of?
> 
> 
> -- 
> Regards,
> Sravan



Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Sravan Kumar
Hi,
We are using solr for our movie title search.


As it is "title search", this should be treated different than the normal
document search.
Hence, we use a modified version of TFIDFSimilarity with the following
changes.
-  disabled TF & IDF and will only have 1 as value.
-  disabled norms by specifying omitNorms as true for all the fields.

There are 6 fields with different analyzers and we make use of different
weights in edismax's qf & pf parameters to match tokens & boost phrases.

But, movies could have aliases and have multiple titles. So, we made the
fields multivalued.

Now, consider the following four documents
1>  "Beauty and the Beast"
2>  "The Real Beauty and the Beast"
3>  "Beauty and the Beast", "La bella y la bestia"
4>  "Beauty and the Beast"

Note: Document 3 has two titles in it.

So, for a query "Beauty and the Beast" and with the above configuration all
the documents receive same score. But 1,3,4 should have got same score and
document 2 lesser than others.

To solve this, we followed what is suggested in the following thread:
http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html

Now, the fields which are used to boost are made to use Norms. And for
matching norms are disabled. This is to make sure that exact & near exact
matches are rewarded.

But, for the same query, we get the following results.
query: "Beauty & the Beast"
Search Results:
1>  "Beauty and the Beast"
4>  "Beauty and the Beast"
2>  "The Real Beauty and the Beast"
3>  "Beauty and the Beast", "La bella y la bestia"

Clearly, the changes have solved only a part of the problem. The document 3
should be ranked/scored higher than document 2.

This is because lucene considers the total field length across all the
values in a multivalued field for normalization.

How do we handle this scenario and make sure that in multivalued fields the
normalization is taken care of?


-- 
Regards,
Sravan


Re: hashJoin - Multivalued field

2018-01-23 Thread Kojo
I'm sorry, everything is working fine!

2018-01-23 16:44 GMT-02:00 Kojo :

> I am trying to solve one problem, exactly as the case described here:
>
> http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-
> multi-valued-field-td4353794.html
>
> I cannot accomplish that on Solr 6.6, my streaming expression returns
> nothing:
>
>
> hashJoin(
>   search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
> sort="p_number asc"),
>   hashed=cartesianProduct(
>   search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
> *]", fl="processes, id", sort="id asc"),
>   processes,
>   ),
>   on="p_number=processes"
> )
>
> Both fields are of type string.
>
>
> One strange thing is that if I filter the first query using fq, some
> results appear.
>
> hashJoin(
>   search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
> sort="p_number asc", fq= "sch_id:905 OR sch_id:3487"),
>   hashed=cartesianProduct(
>   search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
> *]", fl="processes, id", sort="id asc"),
>   processes,
>   ),
>   on="p_number=processes"
> )
>
>
>
> {
>   "result-set": {
> "docs": [
>   {
> "processes": "00/01011-6",
> "p_number": "00/01011-6",
> "id": "43256"
>   },
>   {
> "processes": "97/13133-4",
> "p_number": "97/13133-4",
> "id": "43256"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 343
>   }
> ]
>   }
> }
>
>
> Can you help me, please?
>


hashJoin - Multivalued field

2018-01-23 Thread Kojo
I am trying to solve one problem, exactly as the case described here:

http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-multi-valued-field-td4353794.html

I cannot accomplish that on Solr 6.6, my streaming expression returns
nothing:


hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
sort="p_number asc"),
  hashed=cartesianProduct(
  search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
*]", fl="processes, id", sort="id asc"),
  processes,
  ),
  on="p_number=processes"
)

Both fields are of type string.


One strange thing is that if I filter the first query using fq, some
results appear.

hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
sort="p_number asc", fq= "sch_id:905 OR sch_id:3487"),
  hashed=cartesianProduct(
  search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
*]", fl="processes, id", sort="id asc"),
  processes,
  ),
  on="p_number=processes"
)



{
  "result-set": {
"docs": [
  {
"processes": "00/01011-6",
"p_number": "00/01011-6",
"id": "43256"
  },
  {
"processes": "97/13133-4",
"p_number": "97/13133-4",
"id": "43256"
  },
  {
"EOF": true,
"RESPONSE_TIME": 343
  }
]
  }
}


Can you help me, please?
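
For readers following along, here is a minimal in-memory sketch of what the expression above is meant to do: flatten the multivalued "processes" field into one tuple per value, then hash join on p_number=processes. It is not how Solr implements these decorators; the helper names are made up for illustration, and the third scholarship number is a hypothetical non-matching value.

```python
from collections import defaultdict


def cartesian_product(tuples, field):
    """Emit one tuple per value of a (possibly multivalued) field."""
    for t in tuples:
        values = t[field] if isinstance(t[field], list) else [t[field]]
        for value in values:
            yield {**t, field: value}


def hash_join(left, hashed, left_key, right_key):
    """Load the `hashed` stream into memory, then probe it with `left`."""
    table = defaultdict(list)
    for t in hashed:
        table[t[right_key]].append(t)
    for l in left:
        for r in table.get(l[left_key], []):
            yield {**r, **l}


scholarships = [
    {"p_number": "00/01011-6"},
    {"p_number": "97/13133-4"},
    {"p_number": "99/00000-0"},  # hypothetical value with no matching article
]
articles = [
    {"id": "43256", "processes": ["00/01011-6", "97/13133-4"]},
]

joined = list(
    hash_join(
        scholarships,
        cartesian_product(articles, "processes"),
        left_key="p_number",
        right_key="processes",
    )
)
for row in joined:
    print(row)
```

The single article matches two scholarships, so it appears twice in the output, once per flattened value, as in the result set shown above.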


Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Shawn Heisey

On 12/20/2017 6:09 PM, S G wrote:
> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
>
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?


Adding to what Emir said:

The docValues data will be the same as stored data, but it will be 
uncompressed, and written in such a way that Lucene can read all values 
for one field simply by reading data off the disk, no computations or 
seeks within the file are required.


If the field is indexed and stored, then docValues will not be accessed 
during normal queries unless there is a sort parameter or a facet 
parameter that mentions a field with docValues.  If present, docValues 
data will be used for sorting and facets, otherwise indexed values will 
be used.  Usually, sorting or facets with docValues uses less memory and 
performs faster than the same operation without docValues.  If the 
machine has insufficient system RAM to effectively cache index data, the 
performance may not improve.


When docValues is added to a field, a complete reindex is required, or 
Solr will not work properly.


If a field that already contains docValues has a change in the setting 
for multiValued, then that will require a reindex, but you must also 
take another step -- completely wiping the index directory before 
reloading or restarting.  If the wipe doesn't happen in this situation, 
then the core is going to completely break and throw exceptions.
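
As an illustration of the setting being discussed, this sketch builds a Schema API "add-field" command for a multivalued string field with docValues enabled. The field name "tags" is a placeholder, and the snippet only prints the JSON body; it assumes a managed schema and does not replace the reindex described above.

```python
import json


def add_field_command(name, field_type, multi_valued, doc_values):
    """Build the body of a Schema API add-field request."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "multiValued": multi_valued,
            "docValues": doc_values,
        }
    }


# "tags" is a hypothetical field name used only for this example.
command = add_field_command("tags", "string", multi_valued=True, doc_values=True)
payload = json.dumps(command, indent=2)
print(payload)
```

The printed body would be POSTed to the collection's /schema endpoint. For a field that already exists, the corresponding command is "replace-field", followed by a full reindex.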


Thanks,
Shawn


Re: DocValues for multivalued strings and boolean fields

2017-12-21 Thread Emir Arnautović
Hi SG,
Doc values are another file to write, so indexing performance will suffer. In 
theory, query performance will also suffer, because the alternative is an 
in-memory structure (fieldCache and fieldValueCache). In practice it will not, 
because the in-memory structure requires a larger heap and takes time and 
resources to build after each commit or on the first query, and the doc values 
files will likely be cached by the OS, so access will not be at “disk speed”.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Dec 2017, at 02:09, S G <sg.online.em...@gmail.com> wrote:
> 
> Hi,
> 
> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
> 
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?
> 
> We are using Solr 6.5.1 and also experimenting with 7.1.0
> 
> Thanks
> SG



DocValues for multivalued strings and boolean fields

2017-12-20 Thread S G
Hi,

One of our Solr users is trying to set docValues="true" for multivalued
string fields and boolean-type fields.

I am not sure what the performance impact of that would be.
Can docValues negatively affect performance in any way?

We are using Solr 6.5.1 and also experimenting with 7.1.0

Thanks
SG


Re: Schemaless detecting multivalued fields

2017-10-19 Thread Erick Erickson
Also, if you _know_ certain fields should be defined you can define
them explicitly and let schemaless figure out all the others.

That said, eventually you're going to have to control your schema,
schemaless is _not_ recommended for production systems unless you can
absolutely guarantee the input is in a specific format. And by
"specific format" I mean no field first encountered as, say, an int
later comes through as a float, all date fields are in acceptable
formats, no field first encountered as a single-valued field is ever
multivalued later, etc.

And if you can guarantee that you can create an explicitly defined
schema anyway.
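
One way to act on that advice is to scan a sample of the input on the client side and decide the types before defining the schema explicitly. The sketch below is an assumption about how such a pre-pass could look, not a Solr feature; it only handles integer values and the tlong/tlongs names from the question, and the sample field names are hypothetical.

```python
def pick_field_type(value):
    """Return 'tlongs' for a list of ints, 'tlong' for one int, else None."""

    def is_long(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    if isinstance(value, list):
        return "tlongs" if value and all(is_long(v) for v in value) else None
    return "tlong" if is_long(value) else None


def infer_schema(docs):
    """Scan all docs; once a field is seen as a list it stays multivalued."""
    inferred = {}
    for doc in docs:
        for name, value in doc.items():
            field_type = pick_field_type(value)
            if field_type is None:
                continue
            if inferred.get(name) != "tlongs":
                inferred[name] = field_type
    return inferred


# Hypothetical sample input; "count" is single valued first, a list later.
sample = [
    {"views": 10, "ratings": [4, 5], "count": 1},
    {"views": 12, "ratings": [3], "count": [2, 3]},
]
schema = infer_schema(sample)
print(schema)
```

The "count" field shows the failure mode described above: judged on the first document alone it would have been created as single valued, and the second document would then be rejected.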

Best,
Erick

On Thu, Oct 19, 2017 at 2:00 AM, Emir Arnautović
<emir.arnauto...@sematext.com> wrote:
> Hi John,
> You should be able to do that with custom update request processor chain and 
> https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
>  
> <https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html>
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 19 Oct 2017, at 08:00, John Davis <johndavis925...@gmail.com> wrote:
>>
>> Hi,
>> I know about the schemaless configuration defaulting to multivalued fields
>> of the corresponding type.
>>
>> I was just wondering if there was a way to first detect if the incoming
>> value is list or singleton, and based on it pick the corresponding types.
>> Ideally if the value is an long then use tlong while if it is list of longs
>> then use tlongS.
>>
>> Thanks!
>> John
>


Re: Schemaless detecting multivalued fields

2017-10-19 Thread Emir Arnautović
Hi John,
You should be able to do that with custom update request processor chain and 
https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
 

HTH,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Oct 2017, at 08:00, John Davis <johndavis925...@gmail.com> wrote:
> 
> Hi,
> I know about the schemaless configuration defaulting to multivalued fields
> of the corresponding type.
> 
> I was just wondering if there was a way to first detect if the incoming
> value is list or singleton, and based on it pick the corresponding types.
> Ideally if the value is an long then use tlong while if it is list of longs
> then use tlongS.
> 
> Thanks!
> John



Schemaless detecting multivalued fields

2017-10-19 Thread John Davis
Hi,
I know about the schemaless configuration defaulting to multivalued fields
of the corresponding type.

I was just wondering if there was a way to first detect whether the incoming
value is a list or a singleton, and pick the corresponding type based on that.
Ideally, if the value is a long then use tlong, while if it is a list of longs
then use tlongs.

Thanks!
John


Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Erick Erickson
The key is removing the entire data directory as in

"rm -rf solr_core/data"

with Solr down then restarting Solr. Or create a new core.

It's most probably working on Windows because the schema
was set with multiValued=false when you indexed your first
document.

Best,
Erick

On Thu, Jul 20, 2017 at 5:16 AM, prashantas <prashanta.t...@gmail.com> wrote:
> I am not running solr in cloud mode.
>
> On Thu, Jul 20, 2017 at 4:40 PM, Shawn Heisey-2 [via Lucene] <
> ml+s472066n4346954...@n3.nabble.com> wrote:
>
>> On 7/20/2017 2:30 AM, prashantas wrote:
>> > I am using solr6.4. In my managed-schema, I have defined my field
>> details.
>> > None of my fields are multiValued. If I set property multiValued=false ,
>> it
>> > works fine in Windows, but in CentOS/RHEL, it does not accept the same
>> and
>> > the field still shows multiValued true in my solr admin UI. Please help
>> me
>> > how can I set multiValued = false  in some fields.
>> > <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png>
>>
>>
>> Is Solr running in cloud mode on either of these systems?
>>
>> Thanks,
>> Shawn
>>
>>
>>
>>
>
>
>
> --
>
> *with regards,Prashanta*
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939p4346967.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am not running solr in cloud mode.

On Thu, Jul 20, 2017 at 4:40 PM, Shawn Heisey-2 [via Lucene] <
ml+s472066n4346954...@n3.nabble.com> wrote:

> On 7/20/2017 2:30 AM, prashantas wrote:
> > I am using solr6.4. In my managed-schema, I have defined my field
> details.
> > None of my fields are multiValued. If I set property multiValued=false ,
> it
> > works fine in Windows, but in CentOS/RHEL, it does not accept the same
> and
> > the field still shows multiValued true in my solr admin UI. Please help
> me
> > how can I set multiValued = false  in some fields.
> > <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png>
>
>
> Is Solr running in cloud mode on either of these systems?
>
> Thanks,
> Shawn
>
>
>
>



-- 

*with regards,Prashanta*




--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939p4346967.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Shawn Heisey
On 7/20/2017 2:30 AM, prashantas wrote:
> I am using solr6.4. In my managed-schema, I have defined my field details.
> None of my fields are multiValued. If I set property multiValued=false , it
> works fine in Windows, but in CentOS/RHEL, it does not accept the same and
> the field still shows multiValued true in my solr admin UI. Please help me
> how can I set multiValued = false  in some fields.
> <http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png> 

Is Solr running in cloud mode on either of these systems?

Thanks,
Shawn



Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Amrit Sarkar
By saying:

> I am just adding multiValued=false in the managed-schema file.

Are you modifying the file in the local filesystem "conf", or going into the
core's conf directory and changing it there? If you are running SolrCloud, you
should make the same change in ZooKeeper.


Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread alessandro.benedetti
Assuming "service solr restart" does its job, I think the only
thing I would do is to completely remove the data directory content, instead
of just running the delete query.

Bear in mind that when you delete a document in Solr, it is only marked as
deleted, and it can take a while until it really leaves the index
(after a successful segment merge).
This could lead to conflicts in the data structures when documents
indexed under different schemas are in the index.
I don't know if that is your case, but I would double check.



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939p4346945.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am just adding multiValued=false in the managed-schema file.

Then I delete all the data by running the following command, where 'Schools'
is my core name:

curl 'http://localhost:8983/solr/Schools/update?commit=true' -d '<delete><query>*:*</query></delete>'

Then I restart Solr with "service solr restart".

And then I import the CSV file by executing the command:

curl 'http://localhost:8983/solr/Schools/update?commit=true' --data-binary @tbl_SCHOOLS.csv -H 'Content-type:application/csv'

Please let me know if I am doing anything wrong.

with regards,
Prashanta

On Thu, Jul 20, 2017 at 2:29 PM, alessandro.benedetti [via Lucene] <
ml+s472066n4346941...@n3.nabble.com> wrote:

> I doubt it is an environment problem at all.
> How are you modifying your schema ?
> How you reloading your core/collection ?
> Are you restarting your Solr instance ?
>
> Regards
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
>
>
>



-- 

*with regards,Prashanta*




--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939p4346943.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread alessandro.benedetti
I doubt it is an environment problem at all.
How are you modifying your schema ?
How you reloading your core/collection ?
Are you restarting your Solr instance ?

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939p4346941.html
Sent from the Solr - User mailing list archive at Nabble.com.


multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread prashantas
I am using solr6.4. In my managed-schema, I have defined my field details.
None of my fields are multiValued. If I set property multiValued=false , it
works fine in Windows, but in CentOS/RHEL, it does not accept the same and
the field still shows multiValued true in my solr admin UI. Please help me
how can I set multiValued = false  in some fields.
<http://lucene.472066.n3.nabble.com/file/n4346939/multiValued_CentOS.png> 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-false-is-not-working-in-Solr-6-4-in-RHEL-CentOS-tp4346939.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field x is not multivalued and destination for multiple copyFields

2017-06-05 Thread Rick Leir

Nawab,

If you have multivalued=true and you index a document with two or more 
name_* fields, then the name_token will have two or more values, all of 
which will be searchable. I think this is what you want, because names 
can be quite different in different languages. For example the country names


de Deutschland
en Germany
fr Allemagne
es Alemania

or names for 'shoe'
schuh
zapato
cipela

If you have multivalued=false and try to index two or more languages for 
the name, it would cause an exception to be thrown in the indexing 
thread and the document will not be indexed (schema validation will fail).


If you use multivalued=false on a field which is a copyfield 
destination, this is not a common usage pattern and in my mind (using a 
northern idiom) you are "walking on thin ice". Tell us how it goes.
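
To make the two cases concrete, here is a small simulation of several copyField sources feeding one destination. It is only a model of the behaviour described above, not Solr code; the exception class, the function name and the sample documents are made up for the example.

```python
class MultipleValuesError(Exception):
    """Raised when a single-valued destination receives several values."""


def apply_copy_fields(doc, sources, dest, multi_valued):
    """Copy every populated source field into `dest`."""
    values = [doc[s] for s in sources if doc.get(s) not in (None, "")]
    if multi_valued:
        return {**doc, dest: values}
    if len(values) > 1:
        raise MultipleValuesError(
            f"{dest} is single valued but received {len(values)} values"
        )
    return {**doc, dest: values[0]} if values else dict(doc)


sources = ["name_en", "name_de", "name_fr"]

# Only one language populated: a single-valued destination is fine.
one_language = apply_copy_fields(
    {"id": "1", "name_en": "Germany"}, sources, "name_token", multi_valued=False
)

# Two languages populated: works only if the destination is multivalued.
two_languages = {"id": "2", "name_en": "Germany", "name_de": "Deutschland"}
as_multi = apply_copy_fields(two_languages, sources, "name_token", multi_valued=True)

try:
    apply_copy_fields(two_languages, sources, "name_token", multi_valued=False)
    rejected = False
except MultipleValuesError:
    rejected = True

print(one_language["name_token"], as_multi["name_token"], rejected)
```

So the warning can be ignored only as long as the "one language per document" guarantee really holds for every document.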

Cheers -- Rick


On 2017-06-05 03:23 AM, Nawab Zada Asad Iqbal wrote:
> Hi, I have a field 'name_token' which gets value via copyFields on
> different language-specific fields (e.g. name_en , name_it, name_es, etc.)
> If I can ensure that, only one of these language-specific fields will have
> a value for a given document, is it ok to ignore this warning:
> "IndexSchema Field name_token is not multivalued and destination for
> multiple copyFields"
>
> Also, what will happen if my a record happens to have values in two
> language specific name fields (e.g. if a name or word exists in two
> languages: name_zh and name_ja ).
>
> My understanding is that the value is same anyways, so there is no
> drawback, but can it result in an exception?
>
> Regards
> Nawab




Field x is not multivalued and destination for multiple copyFields

2017-06-05 Thread Nawab Zada Asad Iqbal
Hi, I have a field 'name_token' which gets its value via copyFields from
different language-specific fields (e.g. name_en, name_it, name_es, etc.).

If I can ensure that only one of these language-specific fields will have a
value for a given document, is it OK to ignore this warning: "IndexSchema
Field name_token is not multivalued and destination for multiple copyFields"?

Also, what will happen if a record happens to have values in two
language-specific name fields (e.g. if a name or word exists in two
languages: name_zh and name_ja)?

My understanding is that the value is the same anyway, so there is no
drawback, but can it result in an exception?

Regards,
Nawab


Re: Grouping by a multivalued field

2017-05-26 Thread Rick Leir
Shacky
Quote "A multivalued field is useful when there are more than one value present 
for the field. An easy example would be tags, there can be multiple tags that 
need to be indexed...". So yes, you are on the right track. Cheers -- Rick
https://stackoverflow.com/questions/5800762/what-is-the-use-of-multivalued-field-type-in-solr


On May 26, 2017 9:45:48 AM EDT, shacky <shack...@gmail.com> wrote:
>Hi,
>I need to create a new collection on my Solr 6.1.0 cluster where every
>row
>is a "content" and every content can belong to one or many categories,
>which are specified in a multivalued field "categories".
>
>In my web app the user can search by categories, and if wanted it can
>even
>group results by category. If it wants to group by category, what about
>the
>contents which belongs to more than one category?
>
>In this case the search results page should show the same content more
>times in different categories. I don't want the web application to
>filter
>and order results because in this case it should ask Solr for every
>rows (I
>know this is not advised for bad performance), so is there a way to let
>Solr make this? For example, repeating the same content in two
>categories
>if a flag is enabled or if I am asking Solr to sort by category?
>
>Thank you very much!
>Bye

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Grouping by a multivalued field

2017-05-26 Thread shacky
Hi,
I need to create a new collection on my Solr 6.1.0 cluster where every row
is a "content" and every content can belong to one or many categories,
which are specified in a multivalued field "categories".

In my web app the user can search by category and, if wanted, can even
group results by category. When grouping by category, what happens to
contents that belong to more than one category?

In this case the search results page should show the same content multiple
times, once under each category. I don't want the web application to filter
and order the results, because it would then have to ask Solr for every row
(which I know is not advised, for performance reasons). Is there a way to let
Solr do this? For example, by repeating the same content in two categories
if a flag is enabled, or if I ask Solr to sort by category?

Thank you very much!
Bye


Re: Managed Schema multiValued Predict Problem

2017-04-26 Thread Rick Leir
Lova,
When a search term is "foo*" or similar, you have a multivalue search.

In schema.xml you have for a typical field, an index analysis chain and a query 
analysis chain. In the multivalue case, neither of these chains is followed. 
There is a wiki page which explains what chain gets followed, perhaps someone 
can supply the link. 

To get further with this question, you could show us parts of the schema.xml
Cheers -- Rick

On April 26, 2017 5:28:34 AM EDT, Lova <miandrisoal...@gmail.com> wrote:
>Hello,
>I have this error
>org.apache.solr.common.SolrException: can not use FieldCache on
>multivalued
>field: post_title
>
>I can need specific field as multivalue, it's a bug in my app
>
>what I change in solrconfig.xml please?
>
>Thanks
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Managed-Schema-multiValued-Predict-Problem-tp4324634p4331936.html
>Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Managed Schema multiValued Predict Problem

2017-04-26 Thread Lova
Hello,
I have this error
org.apache.solr.common.SolrException: can not use FieldCache on multivalued
field: post_title

I need a specific field to be multivalued; it's a bug in my app.

What should I change in solrconfig.xml, please?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Managed-Schema-multiValued-Predict-Problem-tp4324634p4331936.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Erick Erickson
So I have a field named "key" that uses KeywordTokenizer and has
multiValued="true" set. A doc like

  <field name="key">val one</field>
  <field name="key">yet another value</field>
  <field name="key">third</field>


My field will have exactly three indexed tokens

val one
yet another value
third
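
The same behaviour as a toy model: the analysis chain runs once per value, and a keyword-style tokenizer turns each value into exactly one token. This only mimics the outcome described here; it is not the Lucene KeywordTokenizer itself, and the function names are made up.

```python
def keyword_tokenize(value):
    """Keyword-style tokenization: the whole value becomes one token."""
    return [value]


def analyze_field(values):
    """Run the tokenizer once per value of a multivalued field."""
    tokens = []
    for value in values:
        tokens.extend(keyword_tokenize(value))
    return tokens


tokens = analyze_field(["val one", "yet another value", "third"])
print(tokens)
```

Three values in, three tokens out; the values are never concatenated into a single string.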

Best,
Erick

On Wed, Apr 12, 2017 at 2:38 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> I don't understand the first option, what is each value? Keyword tokenizer 
> emits single token, analogous to string type.
>
>
>
> On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood 
> <wun...@wunderwood.org> wrote:
> Does the KeywordTokenizer make each value into a unitary string or does it 
> take the whole list of values and make that a single string?
>
> I really hope it is the former. I can’t find this in the docs (including 
> JavaDocs).
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Ahmet Arslan
I don't understand the first option, what is each value? Keyword tokenizer 
emits single token, analogous to string type.



On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood 
 wrote:
Does the KeywordTokenizer make each value into a unitary string or does it take 
the whole list of values and make that a single string?

I really hope it is the former. I can’t find this in the docs (including 
JavaDocs).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Andrea Gazzarini

Hi Wunder,
I think it's the first option: if you have 3 values then the analyzer 
chain is executed three times.


Andrea

On 12/04/17 18:45, Walter Underwood wrote:
> Does the KeywordTokenizer make each value into a unitary string or does it
> take the whole list of values and make that a single string?
>
> I really hope it is the former. I can’t find this in the docs (including
> JavaDocs).
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)







KeywordTokenizer and multiValued field

2017-04-12 Thread Walter Underwood
Does the KeywordTokenizer make each value into a unitary string or does it take 
the whole list of values and make that a single string?

I really hope it is the former. I can’t find this in the docs (including 
JavaDocs).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Error when using percentile facet on multivalued fields (again)

2017-04-03 Thread ron visbord
Hi,

I'm using Solr 5.3.1. When trying to do a percentile facet on a multivalued
field I get the following exception -

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://host_name:8983/solr/core_name: can not use
FieldCache on multivalued field: attributes.size_num
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
The query url is -
http://host_name:8983/solr/core_name/select?json.facet=%7B%22my_facet_name%22:%22percentile(attributes.size_num,70.0,80.0,90.0)%22%7D=0
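
For reference, a minimal sketch of how a json.facet parameter like the one in that URL can be URL-encoded with Python's standard library. The host, core, facet name and field are the placeholders from the message; no other request parameters are assumed.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "http://host_name:8983/solr/core_name/select"  # placeholder host and core
facet = {"my_facet_name": "percentile(attributes.size_num,70.0,80.0,90.0)"}

# Compact separators keep the JSON free of extra spaces before encoding.
params = {"json.facet": json.dumps(facet, separators=(",", ":"))}
url = base_url + "?" + urlencode(params)
print(url)

# Decode the query string again to confirm the facet survives the round trip.
decoded = json.loads(parse_qs(urlsplit(url).query)["json.facet"][0])
print(decoded)
```

Encoding the parameter this way only shows how the request is put together. It does not change whether the server accepts a percentile facet on a multivalued field.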

And the faceted fields schema is -


...


...




I came across this thread from 2015 -
http://grokbase.com/t/lucene/solr-user/157dh11ffx/fieldcache-error-for-multivalued-fields-in-json-facets
in which Yonik says Solr 5.2 doesn't support sum() on multivalued fields.

Is this also the case for percentiles in Solr 5.3.1? Is there any
solution to this?
If not, is this resolved in a more recent release of Solr?

Thank you very much,
Ron Visbord


Count Dates Given A Range in a Multivalued Field

2017-03-20 Thread Furkan KAMACI
Hi All,

I have a multivalued date field i.e.:

[2017-02-06T00:00:00Z,2017-02-09T00:00:00Z,2017-03-04T00:00:00Z]

I want to count how many dates in such a field fall within a given date range.
i.e.

start: 2017-02-01T00:00:00Z
end: 2017-02-28T00:00:00Z

result is 2 (2017-02-06T00:00:00Z and 2017-02-09T00:00:00Z). I want to do
it with JSON Facet API.

How can I do it?
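
One way to sanity-check the expected answer is to count the values on the client side. This is a plain Python sketch using the values from the message, not a JSON Facet API request; parse_solr_date and count_in_range are hypothetical helpers, and the range is assumed to be inclusive at both ends.

```python
from datetime import datetime, timezone


def parse_solr_date(text):
    """Parse a Solr-style UTC timestamp such as 2017-02-06T00:00:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def count_in_range(values, start, end):
    """Count the values that fall inside [start, end], both ends inclusive."""
    lo, hi = parse_solr_date(start), parse_solr_date(end)
    return sum(1 for value in values if lo <= parse_solr_date(value) <= hi)


dates = [
    "2017-02-06T00:00:00Z",
    "2017-02-09T00:00:00Z",
    "2017-03-04T00:00:00Z",
]
count = count_in_range(dates, "2017-02-01T00:00:00Z", "2017-02-28T00:00:00Z")
print(count)  # 2
```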


sum multivalued field index with banana

2017-03-16 Thread tkg_cangkul

Hi, sorry if this is a little bit off topic.

I've just started using the banana dashboard, and I want to sum 
data that is indexed in Solr.


Can I do a sum with the banana dashboard when some of my fields hold 
multivalued data?


This is my sample data in Solr:

"timestamp_dt":"2016-12-30T15:50:00Z",
"FR":["fr1"],
"EV":"89v",
"RC":[0],
"SF":["SSP"],
"CT":["POST"],
"rb.id":["rb30", "rb30"],
"rb.co":[1,  2],
"rb.lat":[47, 9]

OK, from the data above, is it possible to sum the values of 
"rb.co", grouped by EV?

On my banana dashboard panel, I've tried to set something like this:



but nothing happens.

Any suggestions, please?
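
To make the intended result concrete, here is a client-side sketch of the aggregation in plain Python. It does not use banana or Solr; sum_by_group is a hypothetical helper, and the document is trimmed to the fields from the sample that matter for the sum.

```python
from collections import defaultdict

# One document, reduced to the relevant fields of the sample data.
docs = [
    {
        "timestamp_dt": "2016-12-30T15:50:00Z",
        "EV": "89v",
        "rb.co": [1, 2],
    },
]


def sum_by_group(documents, group_field, value_field):
    """Add up every value of a multivalued field, grouped by another field."""
    totals = defaultdict(int)
    for doc in documents:
        totals[doc[group_field]] += sum(doc.get(value_field, []))
    return dict(totals)


totals = sum_by_group(docs, "EV", "rb.co")
print(totals)  # {'89v': 3}
```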



Best Regards,

Yuza


Re: Managed Schema multiValued Predict Problem

2017-03-13 Thread Furkan KAMACI
You are right, I meant schemaless mode. I saw that it's your answer ;) I've
edited solrconfig.xml and fixed it. Thanks!

On Mon, Mar 13, 2017 at 5:46 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> There is managed schema, which means it is editable via API, and there
> is 'schemaless' mode that uses that to auto-define the field based on
> the first occurrence.
>
> 'schemaless' mode does not know if the field will be multi-valued the
> first time it sees content for that field. So, all the fields created
> automatically are multivalued. You can change the definition or you
> can define the field explicitly using the API or Admin UI.
>
> 'schemaless' is only there really for a quick prototyping with unknown
> content.
>
> Regards,
>Alex.
> P.s. That's my SO answer :-) Glad you found it useful.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 13 March 2017 at 11:15, Furkan KAMACI <furkankam...@gmail.com> wrote:
> > Hi,
> >
> > I generate dummy documents to test Solr 6.4.2. I create a field like that
> > at my test code:
> >
> > int customCount = r.nextInt(500);
> > document.addField("custom_count", customCount);
> >
> > This field is indexed as:
> >
> >   org.apache.solr.schema.TrieLongField
> >
> > and
> >
> > Multivalued.
> >
> > I want to use FieldCache on multivalued field and don't want it to be
> > multivalued. When I check managed-schema I see that:
> >
> >> positionIncrementGap="0" docValues="true" precisionStep="0"/>
> >> positionIncrementGap="0" docValues="true" multiValued="true"
> > precisionStep="0"/>
> >
> > So, it seems that it's predicted as longs instead of long.
> >
> > What is the reason behind that?
> >
> > Kind Regards,
> > Furkan KAMACI
>


Re: Managed Schema multiValued Predict Problem

2017-03-13 Thread Alexandre Rafalovitch
There is managed schema, which means it is editable via API, and there
is 'schemaless' mode that uses that to auto-define the field based on
the first occurrence.

'schemaless' mode does not know if the field will be multi-valued the
first time it sees content for that field. So, all the fields created
automatically are multivalued. You can change the definition or you
can define the field explicitly using the API or Admin UI.

'schemaless' is only there really for a quick prototyping with unknown content.
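
As a rough sketch of the "define the field explicitly using the API" option, the snippet below builds a Schema API request that redefines custom_count as a single-valued long. The host and collection name are assumptions, the "long" type name is taken from the question, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

solr_url = "http://localhost:8983/solr/mycollection"  # assumed host and collection

# replace-field swaps in a complete new definition for an existing field.
payload = {
    "replace-field": {
        "name": "custom_count",
        "type": "long",
        "multiValued": False,
    }
}

request = Request(
    solr_url + "/schema",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
# To actually send it: urllib.request.urlopen(request)
```

Documents indexed before such a change would normally need to be reindexed for the new definition to take effect.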

Regards,
   Alex.
P.s. That's my SO answer :-) Glad you found it useful.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 13 March 2017 at 11:15, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi,
>
> I generate dummy documents to test Solr 6.4.2. I create a field like that
> at my test code:
>
> int customCount = r.nextInt(500);
> document.addField("custom_count", customCount);
>
> This field is indexed as:
>
>   org.apache.solr.schema.TrieLongField
>
> and
>
> Multivalued.
>
> I want to use FieldCache on multivalued field and don't want it to be
> multivalued. When I check managed-schema I see that:
>
>positionIncrementGap="0" docValues="true" precisionStep="0"/>
>positionIncrementGap="0" docValues="true" multiValued="true"
> precisionStep="0"/>
>
> So, it seems that it's predicted as longs instead of long.
>
> What is the reason behind that?
>
> Kind Regards,
> Furkan KAMACI

