Re: Avoiding duplicate entry for a multivalued field
add-distinct is similar to add, but it checks whether the field already contains the value before adding it. In general, the performance overhead should be minimal.

Regards,
Munendra S N

On Fri, Oct 30, 2020 at 7:29 PM Srinivas Kashyap wrote:
> Thanks Munendra, this will really help me. Is there any performance
> overhead with this?
>
> Thanks,
> Srinivas
RE: Avoiding duplicate entry for a multivalued field
Thanks Munendra, this will really help me. Is there any performance overhead with this?

Thanks,
Srinivas

From: Munendra S N
Sent: 30 October 2020 19:20
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas,

For atomic updates, you could use the add-distinct operation to avoid duplicates:
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html

This operation is available from Solr 7.3.

Regards,
Munendra S N
Re: Avoiding duplicate entry for a multivalued field
Srinivas,

For atomic updates, you could use the add-distinct operation to avoid duplicates:
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html

This operation is available from Solr 7.3.

Regards,
Munendra S N

On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood wrote:
> Since you are already taking the performance hit of atomic updates,
> I doubt you’ll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
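As a rough illustration of the add-distinct operation mentioned above, the sketch below builds the JSON body of an atomic update using only the Python standard library. The document id `doc1`, the field name `tags`, and the helper `add_distinct_update` are hypothetical names chosen for the example; the sketch only constructs the request body and does not send it to the collection's /update handler.

```python
import json


def add_distinct_update(doc_id, field, values, id_field="id"):
    """Build one atomic-update document that applies add-distinct to a field.

    The field value is a single-key object whose key names the atomic
    operation, which is the shape Solr's JSON update format expects.
    """
    return {id_field: doc_id, field: {"add-distinct": list(values)}}


# Hypothetical document id and field name, for illustration only.
payload = [add_distinct_update("doc1", "tags", ["red", "blue"])]

# JSON text that would form the body of a POST to the /update handler.
body = json.dumps(payload)
print(body)
```

In SolrJ the same modifier is usually expressed by giving a SolrInputDocument field a single-entry Map keyed by "add-distinct" as its value, in the same way "set" and "add" are passed.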
Re: Avoiding duplicate entry for a multivalued field
Since you are already taking the performance hit of atomic updates, I doubt you’ll see any impact from field types or update request processors. The extra cost of atomic updates will be much greater than indexing cost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap wrote:
>
> Thanks Dwane,
>
> I have a doubt: according to the Javadoc, the duplicates still continue to
> exist in the field. Maybe at query time the field returns only unique
> values? Is my assumption right?
>
> Also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?
>
> Thanks,
> Srinivas
Re: Avoiding duplicate entry for a multivalued field
If I understand correctly what you're trying to do: docValues for a number of field types are (at least in their multivalued incarnation) backed by SortedSetDocValues, which inherently deduplicates values per document. In your case it sounds like you could rely on that behavior as a feature: set stored=false, docValues=true, useDocValuesAsStored=true, and achieve the desired behavior?

Michael

On Thu, Oct 29, 2020 at 6:17 AM Srinivas Kashyap wrote:
>
> Thanks Dwane,
>
> I have a doubt: according to the Javadoc, the duplicates still continue to
> exist in the field. Maybe at query time the field returns only unique
> values? Is my assumption right?
>
> Also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?
>
> Thanks,
> Srinivas
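To make the docValues suggestion above concrete, here is a small Python sketch of what a per-document sorted set does to the values of a multivalued field. It is only a model of the semantics, not Lucene code, and the function name `as_sorted_set` and the sample values are made up for the example. One consequence worth noticing is that the values come back in sorted order rather than in the order they were added.

```python
def as_sorted_set(values):
    """Model a per-document sorted set: duplicates dropped, values sorted."""
    return sorted(set(values))


# Values as they might be added to one document over several updates.
indexed = ["b", "a", "b", "c", "a"]

# What a field read back from such a structure would look like.
returned = as_sorted_set(indexed)
print(returned)
```

If the original insertion order of the values matters to the application, this approach may not be suitable, since that order is not retained.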
RE: Avoiding duplicate entry for a multivalued field
Thanks Dwane,

I have a doubt: according to the Javadoc, the duplicates still continue to exist in the field. Maybe at query time the field returns only unique values? Is my assumption right?

Also, what is the performance overhead of UniqFieldsUpdateProcessorFactory?

Thanks,
Srinivas

From: Dwane Hall
Sent: 29 October 2020 14:33
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas, this is possible by adding a unique-fields update processor to the update processor chain you are using to perform your updates (/update, /update/json, /update/json/docs, .../a_custom_one).
Re: Avoiding duplicate entry for a multivalued field
Srinivas, this is possible by adding a unique-fields update processor to the update processor chain you are using to perform your updates (/update, /update/json, /update/json/docs, .../a_custom_one).

The Javadoc explains its use nicely (https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html), and there are answers on Stack Overflow addressing this exact problem (https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655).

Thanks,

Dwane

From: Srinivas Kashyap
Sent: Thursday, 29 October 2020 3:49 PM
To: solr-user@lucene.apache.org
Subject: Avoiding duplicate entry for a multivalued field

Hello,

Say I have a schema field which is multivalued. Is there a way to maintain distinct values for that field even though I continue to add duplicate values through atomic updates via SolrJ?

Is there some property setting to keep only unique values in a multivalued field?

Thanks,
Srinivas
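The kind of deduplication a unique-fields update processor performs can be pictured with the short Python sketch below: walk the values in order and keep only the first occurrence of each. This is an illustration of the idea under that assumption, not Solr's implementation; `uniq_field_values` and the sample names are hypothetical. It also only models deduplication over the list of values the processor is handed.

```python
def uniq_field_values(values):
    """Return values with later duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical values of one multivalued field in an incoming document.
incoming = ["Steve", "Lucy", "Jim", "Steve", "Alice", "Bob", "Alice"]
deduplicated = uniq_field_values(incoming)
print(deduplicated)
```

Unlike a sorted set, this keeps the values in the order they were first seen.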
Avoiding duplicate entry for a multivalued field
Hello,

Say I have a schema field which is multivalued. Is there a way to maintain distinct values for that field even though I continue to add duplicate values through atomic updates via SolrJ?

Is there some property setting to keep only unique values in a multivalued field?

Thanks,
Srinivas

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential. If you are not the intended recipient, please notify the sender immediately by replying to the e-mail, and then delete it without making copies or using it in any way. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.

Disclaimer

The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been automatically archived by Mimecast Ltd, an innovator in Software as a Service (SaaS) for business. Providing a safer and more useful place for your human generated data. Specializing in; Security, archiving and compliance. To find out more visit the Mimecast website.