Re: Solr dependency update at Apache Beam - which versions should be supported

2020-10-30 Thread Piotr Szuberski
Well, it would require maintaining tests for all of the versions Beam wants
to support. Beam has had SolrJ 5.5.4 as a compile dependency the whole time,
so it's not likely a needed feature.

On Fri, Oct 30, 2020 at 1:30 PM matthew sporleder 
wrote:

> Is there a reason you can't use a bunch of solr versions and let beam
> users choose at runtime?
>
> > On Oct 30, 2020, at 4:58 AM, Piotr Szuberski <
> piotr.szuber...@polidea.com> wrote:
> >
> > Thank you very much for your answer!
> >
> > Beam has a compile time dependency on Solr so the user doesn't have to
> > provide his own. The problem would happen when a user wants to use both
> > Solr X version and Beam SolrIO in the same project.
> >
> > As I understood, the best choice would be to use the 8.x.y version, and it
> > shouldn't break anything for users that have Beam as their only
> dependency?
> >
> > Regards,
> > Piotr
> >
> >> On Tue, Oct 27, 2020 at 10:26 PM Mike Drob  wrote:
> >>
> >> Piotr,
> >>
> >> Based on the questions that we've seen over the past month on this list,
> >> there are still users with Solr on 6, 7, and 8. I suspect there are
> still
> >> Solr 5 users out there too, although they don't appear to be asking for
> >> help - likely they are in set it and forget it mode.
> >>
> >> Solr 7 may not be officially deprecated on our site, but it's pretty
> old at
> >> this point and we're not doing any development on it outside of
> maybe a
> >> very high-profile security fix. Even then, we might acknowledge it and
> >> recommend users update to 8.x anyway.
> >>
> >> The index files generated by Lucene and consumed by Solr are backwards
> >> compatible up to one major version. Some of the API remains compatible;
> a
> >> client issuing simple queries to Solr 5 would probably work fine even
> >> against Solr 9 when it comes out eventually. A client doing admin
> >> operations will be less certain. I don't know enough about Beam to tell
> you
> >> where on the spectrum your use will fall.
> >>
> >> I'm not sure if this was helpful or not, but maybe it is a nudge in the
> >> right direction.
> >>
> >> Good luck,
> >> Mike
> >>
> >>
> >> On Tue, Oct 27, 2020 at 11:09 AM Piotr Szuberski <
> >> piotr.szuber...@polidea.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> We are working on dependency updates at Apache Beam and I would like to
> >>> consult which versions should be supported so we don't break any
> existing
> >>> users.
> >>>
> >>> Previously the supported Solr version was 5.5.4.
> >>>
> >>> Versions 8.x.y and 7.x.y naturally come to mind as they are the only
> ones not
> >>> deprecated. But maybe there are users on some earlier versions?
> >>>
> >>> Are these versions backwards-compatible or are there things to be aware
> >> of?
> >>>
> >>> Regards
> >>>
> >>
> >
> >
> > --
> >
> > *Piotr Szuberski*
> > Polidea  | Junior Software Engineer
> >
> > E: piotr.szuber...@polidea.com
> >
> > Unique Tech
> > Check out our projects! 
>


Re: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Munendra S N
add-distinct is similar to add but does a contains check before adding the
value. In general, the performance overhead should be minimal.
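
A minimal SolrJ sketch of such an atomic update, for reference (the
collection name "mycollection" and the multivalued "tags" field are made-up
examples):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddDistinctExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      // A Map value marks the field as an atomic update; "add-distinct"
      // appends the value only if it is not already present (Solr 7.3+).
      doc.addField("tags", Collections.singletonMap("add-distinct", "solr"));
      client.add("mycollection", doc);
      client.commit("mycollection");
    }
  }
}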

Regards,
Munendra S N



On Fri, Oct 30, 2020 at 7:29 PM Srinivas Kashyap
 wrote:

> Thanks Munendra, this will really help me. Is there any performance
> overhead with this?
>
> Thanks,
> Srinivas
>
>
> From: Munendra S N 
> Sent: 30 October 2020 19:20
> To: solr-user@lucene.apache.org
> Subject: Re: Avoiding duplicate entry for a multivalued field
>
> Srinivas,
>
> For atomic updates, you could use add-distinct operation to avoid
> duplicates -
> https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
> This operation is available from Solr 7.3
>
> Regards,
> Munendra S N
>
>
>
> > On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood
> wrote:
>
> > Since you are already taking the performance hit of atomic updates,
> > I doubt you’ll see any impact from field types or update request
> > processors.
> > The extra cost of atomic updates will be much greater than indexing cost.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> > wrote:
> > >
> > > Thanks Dwane,
> > >
> > > I have a doubt: according to the Javadoc, the duplicates still
> continue
> > to exist in the field. Maybe during query time, the field returns only
> > unique values? Am I right with my assumption?
> > >
> > > And also, what is the performance overhead for this
> UniqFieldsUpdateProcessorFactory?
> > >
> > > Thanks,
> > > Srinivas
> > >
> > > From: Dwane Hall <dwaneh...@hotmail.com>
> > > Sent: 29 October 2020 14:33
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Avoiding duplicate entry for a multivalued field
> > >
> > > Srinivas, this is possible by adding a unique-field update processor to
> > the update processor chain you are using to perform your updates
> (/update,
> > /update/json, /update/json/docs, .../a_custom_one)
> > >
> > > The Javadocs explain its use nicely (
> > > https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
> > > ) or there are articles on Stack Overflow addressing this exact problem (
> > > https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655
> > > )
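
A rough SolrJ sketch of sending updates through such a chain (this assumes
solrconfig.xml defines an updateRequestProcessorChain named "uniq-fields"
that includes solr.UniqFieldsUpdateProcessorFactory configured to select the
multivalued field; the chain, collection, and field names are made up):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class UniqFieldsChainExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      UpdateRequest req = new UpdateRequest();
      // Route the update through the hypothetical "uniq-fields" chain.
      req.setParam("update.chain", "uniq-fields");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("tags", "hiking");
      doc.addField("tags", "hiking"); // duplicate; the processor keeps one copy
      req.add(doc);
      req.process(client, "mycollection");
      client.commit("mycollection");
    }
  }
}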
> > >
> > > Thanks,
> > >
> > > Dwane
> > > 
> > > From: Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> > > Sent: Thursday, 29 October 2020 3:49 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Avoiding duplicate entry for a multivalued field
> > >
> > > Hello,
> > >
> > > Say, I have a schema field which is multivalued. Is there a way to
> > maintain distinct values for that field though I continue to add
> duplicate
> > values through atomic update via solrj?
> > >
> > > Is there some property setting to have only unique values in a
> > multivalued field?
> > >
> > > Thanks,
> > > Srinivas
> > > 

Re: Simulate facet.exists for json query facets

2020-10-30 Thread Michael Gibney
> If all of those facet queries are _known_ to be a performance hit,
> you might be able to do something custom. That would require
> custom code though and I wouldn’t go there unless you can
> demonstrate need.

Yeah ... indeed if those facet queries are relatively static (and thus
cacheable ... even if there are a lot of them), an appropriately-sized
filterCache would allow them to be cached to good effect and then the
performance hit should be negligible. Knowing what the queries are up
front, you could even add them to your warming queries.
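
A rough client-side sketch of that idea (an alternative to configuring
newSearcher warming queries in solrconfig.xml; the collection name and facet
queries below are made up): fire the known facet request once after each
commit so that identical requests are then served from a warm filterCache.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class FacetCacheWarmup {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery warm = new SolrQuery("*:*");
      warm.setRows(0);
      // The static facet queries, passed as the raw json.facet parameter.
      warm.add("json.facet",
          "{ tour: { type: query, q: \"+categoryId:(21450 21453)\" },"
              + " story: { type: query, q: \"+categoryId:21515\" } }");
      client.query("content", warm); // response discarded; caches are now warm
    }
  }
}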

It'd also be unusual (though possible, sure?) to run these kinds of
facet queries with no intention of ever conditionally following up in
a way that would want the actual results/docSet -- even if the
initial/more common query only cares about boolean existence.

The case in which this type of functionality really might be indicated is:
1. only care about boolean result (obvious, ok)
2. dynamic (i.e., not-particularly-cacheable) queries
3. never intend to follow up with a request that calls for full results

If both of the first two conditions hold, and especially if the third
also holds, there would in principle definitely be efficiency to be
gained by early termination (and avoiding the creation of a DocSet,
which at the moment happens unconditionally for every facet query).
I'm also thinking about this through the lens of bringing the JSON
Facet API to parity with the legacy facet API, fwiw ...

On Fri, Oct 30, 2020 at 9:02 AM Erick Erickson  wrote:
>
> I don’t think there’s anything that does what you’re asking OOB.
>
> If all of those facet queries are _known_ to be a performance hit,
> you might be able to do something custom. That would require
> custom code though and I wouldn’t go there unless you can
> demonstrate need.
>
> If you issue a debug=timing you’ll see the time each component
> takes,  and there’s a separate entry for faceting so that’ll give you
> a clue whether it’s worth the effort.
>
> Best,
> Erick
>
> > On Oct 30, 2020, at 8:10 AM, Michael Gibney  
> > wrote:
> >
> > Michael, sorry for the confusion; I was positing a *hypothetical*
> > "exists()" function that doesn't currently exist, that *is* an
> > aggregate function, and that *does* stop early. I didn't account for
> > the fact that there's already an "exists()" function *query* that
> > behaves very differently. So yes, definitely confusing :-). I guess
> > choosing a different name for the proposed aggregate function would
> > make sense. I was suggesting it mostly as an alternative to extending
> > the syntax of JSON Facet "query" facet type, and to say that I think
> > the implementation of such an aggregate function would be pretty
> > straightforward.
> >
> > On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
> >>
> >> @Erick
> >>
> >> Sorry! I chose a simple example as I wanted to reduce complexity.
> >> In detail:
> >> * We have distinct contents like tours, offers, events, etc which
> >> themselves may be categorized: A tour may be a hiking tour, a
> >> mountaineering tour, ...
> >> * We have hundreds of customers that want to facet their searches on those
> >> content types, but often with distinct combinations of categories, i.e.
> >> customer A wants his facet "tours" to only count hiking tours, customer B
> >> only mountaineering tours, customer C a combination of both, etc.
> >> * We use "query" facets as each facet request will be built dynamically (it
> >> is not feasible to aggregate certain categories and add them as an
> >> additional solr schema field as we have hundreds of different 
> >> combinations).
> >> * Anyways, our ui only requires adding a toggle to filter for (for example)
> >> "tours" in case a facet result is present. We do not care about the number
> >> of tours.
> >> * As we have millions of contents and dozens of content types (and dozens
> >> of categories per content type) such queries may take a very long time.
> >>
> >> A complex example may look like this:
> >>
> >> q=*:*&json.facet={
> >>   tour:      { type: query, q: "+categoryId:(21450 21453)" },
> >>   guide:     { type: query, q: "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
> >>   story:     { type: query, q: "+categoryId:21515" },
> >>   condition: { type: query, q: "+categoryId:21514" },
> >>   hut:       { type: query, q: "+categoryId:8510" },
> >>   skiresort: { type: query, q: "+categoryId:21493" },
> >>   offer:     { type: query, q: "+categoryId:21462" },
> >>   lodging:   { type: query, q: "+categoryId:6061" },
> >>   event:     { type: query, q: "+categoryId:21465" },
> >>   poi:       { type: query, q: "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
> >>   authors:   { type: query, q: "+categoryId:(21205 21206)" },
> >>   partners:  { type: query, q: "+categoryId:21200" },
> >>   list:      { type: query, q: "+categoryId:21481" }
> >> }&rows=0

RE: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Srinivas Kashyap
Thanks Munendra, this will really help me. Is there any performance overhead
with this?

Thanks,
Srinivas


From: Munendra S N 
Sent: 30 October 2020 19:20
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas,

For atomic updates, you could use add-distinct operation to avoid
duplicates -
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3

Regards,
Munendra S N



On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> Since you are already taking the performance hit of atomic updates,
> I doubt you’ll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> wrote:
> >
> > Thanks Dwane,
> >
> > I have a doubt: according to the Javadoc, the duplicates still continue
> to exist in the field. Maybe during query time, the field returns only
> unique values? Am I right with my assumption?
> >
> > And also, what is the performance overhead for this UniqFieldsUpdateProcessorFactory?
> >
> > Thanks,
> > Srinivas
> >
> > From: Dwane Hall <dwaneh...@hotmail.com>
> > Sent: 29 October 2020 14:33
> > To: solr-user@lucene.apache.org
> > Subject: Re: Avoiding duplicate entry for a multivalued field
> >
> > Srinivas, this is possible by adding a unique-field update processor to
> the update processor chain you are using to perform your updates (/update,
> /update/json, /update/json/docs, .../a_custom_one)
> >
> > The Javadocs explain its use nicely (
> > https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
> > ) or there are articles on Stack Overflow addressing this exact problem (
> > https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655
> > )
> >
> > Thanks,
> >
> > Dwane
> > 
> > From: Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> > Sent: Thursday, 29 October 2020 3:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Avoiding duplicate entry for a multivalued field
> >
> > Hello,
> >
> > Say, I have a schema field which is multivalued. Is there a way to
> maintain distinct values for that field though I continue to add duplicate
> values through atomic update via solrj?
> >
> > Is there some property setting to have only unique values in a
> multivalued field?
> >
> > Thanks,
> > Srinivas
> > 

Re: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Munendra S N
Srinivas,

For atomic updates, you could use add-distinct operation to avoid
duplicates -
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3

Regards,
Munendra S N



On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood 
wrote:

> Since you are already taking the performance hit of atomic updates,
> I doubt you’ll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap 
> > 
> wrote:
> >
> > Thanks Dwane,
> >
> > I have a doubt: according to the Javadoc, the duplicates still continue
> to exist in the field. Maybe during query time, the field returns only
> unique values? Am I right with my assumption?
> >
> > And also, what is the performance overhead for this UniqFieldsUpdateProcessorFactory?
> >
> > Thanks,
> > Srinivas
> >
> > From: Dwane Hall 
> > Sent: 29 October 2020 14:33
> > To: solr-user@lucene.apache.org
> > Subject: Re: Avoiding duplicate entry for a multivalued field
> >
> > Srinivas, this is possible by adding a unique-field update processor to
> the update processor chain you are using to perform your updates (/update,
> /update/json, /update/json/docs, .../a_custom_one)
> >
> > The Javadocs explain its use nicely (
> > https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
> > ) or there are articles on Stack Overflow addressing this exact problem (
> > https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655
> > )
> >
> > Thanks,
> >
> > Dwane
> > 
> > From: Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> > Sent: Thursday, 29 October 2020 3:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Avoiding duplicate entry for a multivalued field
> >
> > Hello,
> >
> > Say, I have a schema field which is multivalued. Is there a way to
> maintain distinct values for that field though I continue to add duplicate
> values through atomic update via solrj?
> >
> > Is there some property setting to have only unique values in a
> multivalued field?
> >
> > Thanks,
> > Srinivas
> > 


Re: Simulate facet.exists for json query facets

2020-10-30 Thread Erick Erickson
I don’t think there’s anything that does what you’re asking OOB.

If all of those facet queries are _known_ to be a performance hit,
you might be able to do something custom. That would require
custom code though and I wouldn’t go there unless you can
demonstrate need.

If you issue a debug=timing you’ll see the time each component 
takes,  and there’s a separate entry for faceting so that’ll give you
a clue whether it’s worth the effort.
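
For example, a minimal SolrJ sketch of pulling that timing section (the
collection name "content" is made up):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingDebug {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.set("debug", "timing"); // request only the timing section of debug output
      QueryResponse rsp = client.query("content", q);
      // The timing map breaks down prepare/process time per search component,
      // including a separate entry for the facet module.
      System.out.println(rsp.getDebugMap().get("timing"));
    }
  }
}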

Best,
Erick

> On Oct 30, 2020, at 8:10 AM, Michael Gibney  wrote:
> 
> Michael, sorry for the confusion; I was positing a *hypothetical*
> "exists()" function that doesn't currently exist, that *is* an
> aggregate function, and that *does* stop early. I didn't account for
> the fact that there's already an "exists()" function *query* that
> behaves very differently. So yes, definitely confusing :-). I guess
> choosing a different name for the proposed aggregate function would
> make sense. I was suggesting it mostly as an alternative to extending
> the syntax of JSON Facet "query" facet type, and to say that I think
> the implementation of such an aggregate function would be pretty
> straightforward.
> 
> On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
>> 
>> @Erick
>> 
>> Sorry! I chose a simple example as I wanted to reduce complexity.
>> In detail:
>> * We have distinct contents like tours, offers, events, etc which
>> themselves may be categorized: A tour may be a hiking tour, a
>> mountaineering tour, ...
>> * We have hundreds of customers that want to facet their searches on those
>> content types, but often with distinct combinations of categories, i.e.
>> customer A wants his facet "tours" to only count hiking tours, customer B
>> only mountaineering tours, customer C a combination of both, etc.
>> * We use "query" facets as each facet request will be built dynamically (it
>> is not feasible to aggregate certain categories and add them as an
>> additional solr schema field as we have hundreds of different combinations).
>> * Anyways, our ui only requires adding a toggle to filter for (for example)
>> "tours" in case a facet result is present. We do not care about the number
>> of tours.
>> * As we have millions of contents and dozens of content types (and dozens
>> of categories per content type) such queries may take a very long time.
>> 
>> A complex example may look like this:
>>
>> q=*:*&json.facet={
>>   tour:      { type: query, q: "+categoryId:(21450 21453)" },
>>   guide:     { type: query, q: "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
>>   story:     { type: query, q: "+categoryId:21515" },
>>   condition: { type: query, q: "+categoryId:21514" },
>>   hut:       { type: query, q: "+categoryId:8510" },
>>   skiresort: { type: query, q: "+categoryId:21493" },
>>   offer:     { type: query, q: "+categoryId:21462" },
>>   lodging:   { type: query, q: "+categoryId:6061" },
>>   event:     { type: query, q: "+categoryId:21465" },
>>   poi:       { type: query, q: "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
>>   authors:   { type: query, q: "+categoryId:(21205 21206)" },
>>   partners:  { type: query, q: "+categoryId:21200" },
>>   list:      { type: query, q: "+categoryId:21481" }
>> }&rows=0
>> 
>> @Michael
>> 
>> Thanks for your suggestion, but this does not work because
>> * the facet module expects an aggregate function (which I simply added by
>> wrapping your call in sum(...))
>> * and (please correct me if I am wrong) the exists() function does not stop
>> at the first match, but counts the number of results for which the query
>> matches a document.



Re: Solr dependency update at Apache Beam - which versions should be supported

2020-10-30 Thread matthew sporleder
Is there a reason you can't use a bunch of solr versions and let beam users 
choose at runtime?

> On Oct 30, 2020, at 4:58 AM, Piotr Szuberski  
> wrote:
> 
> Thank you very much for your answer!
> 
> Beam has a compile time dependency on Solr so the user doesn't have to
> provide his own. The problem would happen when a user wants to use both
> Solr X version and Beam SolrIO in the same project.
> 
> As I understood, the best choice would be to use the 8.x.y version, and it
> shouldn't break anything for users that have Beam as their only dependency?
> 
> Regards,
> Piotr
> 
>> On Tue, Oct 27, 2020 at 10:26 PM Mike Drob  wrote:
>> 
>> Piotr,
>> 
>> Based on the questions that we've seen over the past month on this list,
>> there are still users with Solr on 6, 7, and 8. I suspect there are still
>> Solr 5 users out there too, although they don't appear to be asking for
>> help - likely they are in set it and forget it mode.
>> 
>> Solr 7 may not be officially deprecated on our site, but it's pretty old at
>> this point and we're not doing any development on it outside of maybe a
>> very high-profile security fix. Even then, we might acknowledge it and
>> recommend users update to 8.x anyway.
>> 
>> The index files generated by Lucene and consumed by Solr are backwards
>> compatible up to one major version. Some of the API remains compatible; a
>> client issuing simple queries to Solr 5 would probably work fine even
>> against Solr 9 when it comes out eventually. A client doing admin
>> operations will be less certain. I don't know enough about Beam to tell you
>> where on the spectrum your use will fall.
>> 
>> I'm not sure if this was helpful or not, but maybe it is a nudge in the
>> right direction.
>> 
>> Good luck,
>> Mike
>> 
>> 
>> On Tue, Oct 27, 2020 at 11:09 AM Piotr Szuberski <
>> piotr.szuber...@polidea.com> wrote:
>> 
>>> Hi,
>>> 
>>> We are working on dependency updates at Apache Beam and I would like to
>>> consult which versions should be supported so we don't break any existing
>>> users.
>>> 
>>> Previously the supported Solr version was 5.5.4.
>>> 
>>> Versions 8.x.y and 7.x.y naturally come to mind as they are the only ones
>>> not deprecated. But maybe there are users on some earlier versions?
>>> 
>>> Are these versions backwards-compatible or are there things to be aware
>> of?
>>> 
>>> Regards
>>> 
>> 
> 
> 
> -- 
> 
> *Piotr Szuberski*
> Polidea  | Junior Software Engineer
> 
> E: piotr.szuber...@polidea.com
> 
> Unique Tech
> Check out our projects! 


Re: Simulate facet.exists for json query facets

2020-10-30 Thread Michael Gibney
Michael, sorry for the confusion; I was positing a *hypothetical*
"exists()" function that doesn't currently exist, that *is* an
aggregate function, and that *does* stop early. I didn't account for
the fact that there's already an "exists()" function *query* that
behaves very differently. So yes, definitely confusing :-). I guess
choosing a different name for the proposed aggregate function would
make sense. I was suggesting it mostly as an alternative to extending
the syntax of JSON Facet "query" facet type, and to say that I think
the implementation of such an aggregate function would be pretty
straightforward.

On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
>
> @Erick
>
> Sorry! I chose a simple example as I wanted to reduce complexity.
> In detail:
> * We have distinct contents like tours, offers, events, etc which
> themselves may be categorized: A tour may be a hiking tour, a
> mountaineering tour, ...
> * We have hundreds of customers that want to facet their searches on those
> content types, but often with distinct combinations of categories, i.e.
> customer A wants his facet "tours" to only count hiking tours, customer B
> only mountaineering tours, customer C a combination of both, etc.
> * We use "query" facets as each facet request will be built dynamically (it
> is not feasible to aggregate certain categories and add them as an
> additional solr schema field as we have hundreds of different combinations).
> * Anyways, our ui only requires adding a toggle to filter for (for example)
> "tours" in case a facet result is present. We do not care about the number
> of tours.
> * As we have millions of contents and dozens of content types (and dozens
> of categories per content type) such queries may take a very long time.
>
> A complex example may look like this:
>
> q=*:*&json.facet={
>   tour:      { type: query, q: "+categoryId:(21450 21453)" },
>   guide:     { type: query, q: "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
>   story:     { type: query, q: "+categoryId:21515" },
>   condition: { type: query, q: "+categoryId:21514" },
>   hut:       { type: query, q: "+categoryId:8510" },
>   skiresort: { type: query, q: "+categoryId:21493" },
>   offer:     { type: query, q: "+categoryId:21462" },
>   lodging:   { type: query, q: "+categoryId:6061" },
>   event:     { type: query, q: "+categoryId:21465" },
>   poi:       { type: query, q: "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
>   authors:   { type: query, q: "+categoryId:(21205 21206)" },
>   partners:  { type: query, q: "+categoryId:21200" },
>   list:      { type: query, q: "+categoryId:21481" }
> }&rows=0
>
> @Michael
>
> Thanks for your suggestion, but this does not work because
> * the facet module expects an aggregate function (which I simply added by
> wrapping your call in sum(...))
> * and (please correct me if I am wrong) the exists() function does not stop
> at the first match, but counts the number of results for which the query
> matches a document.


Re: Solr dependency update at Apache Beam - which versions should be supported

2020-10-30 Thread Piotr Szuberski
Thank you very much for your answer!

Beam has a compile time dependency on Solr so the user doesn't have to
provide his own. The problem would happen when a user wants to use both
Solr X version and Beam SolrIO in the same project.

As I understood, the best choice would be to use the 8.x.y version, and it
shouldn't break anything for users that have Beam as their only dependency?

Regards,
Piotr

On Tue, Oct 27, 2020 at 10:26 PM Mike Drob  wrote:

> Piotr,
>
> Based on the questions that we've seen over the past month on this list,
> there are still users with Solr on 6, 7, and 8. I suspect there are still
> Solr 5 users out there too, although they don't appear to be asking for
> help - likely they are in set it and forget it mode.
>
> Solr 7 may not be officially deprecated on our site, but it's pretty old at
> this point and we're not doing any development on it outside of maybe a
> very high-profile security fix. Even then, we might acknowledge it and
> recommend users update to 8.x anyway.
>
> The index files generated by Lucene and consumed by Solr are backwards
> compatible up to one major version. Some of the API remains compatible; a
> client issuing simple queries to Solr 5 would probably work fine even
> against Solr 9 when it comes out eventually. A client doing admin
> operations will be less certain. I don't know enough about Beam to tell you
> where on the spectrum your use will fall.
>
> I'm not sure if this was helpful or not, but maybe it is a nudge in the
> right direction.
>
> Good luck,
> Mike
>
>
> On Tue, Oct 27, 2020 at 11:09 AM Piotr Szuberski <
> piotr.szuber...@polidea.com> wrote:
>
> > Hi,
> >
> > We are working on dependency updates at Apache Beam and I would like to
> > consult which versions should be supported so we don't break any existing
> > users.
> >
> > Previously the supported Solr version was 5.5.4.
> >
> > Versions 8.x.y and 7.x.y naturally come to mind as they are the only ones
> > not deprecated. But maybe there are users on some earlier versions?
> >
> > Are these versions backwards-compatible or are there things to be aware
> of?
> >
> > Regards
> >
>


-- 

*Piotr Szuberski*
Polidea  | Junior Software Engineer

E: piotr.szuber...@polidea.com

Unique Tech
Check out our projects! 


Re: Simulate facet.exists for json query facets

2020-10-30 Thread michael dürr
@Erick

Sorry! I chose a simple example as I wanted to reduce complexity.
In detail:
* We have distinct contents like tours, offers, events, etc which
themselves may be categorized: A tour may be a hiking tour, a
mountaineering tour, ...
* We have hundreds of customers that want to facet their searches on those
content types, but often with distinct combinations of categories, i.e.
customer A wants his facet "tours" to only count hiking tours, customer B
only mountaineering tours, customer C a combination of both, etc.
* We use "query" facets as each facet request will be built dynamically (it
is not feasible to aggregate certain categories and add them as an
additional solr schema field as we have hundreds of different combinations).
* Anyways, our ui only requires adding a toggle to filter for (for example)
"tours" in case a facet result is present. We do not care about the number
of tours.
* As we have millions of contents and dozens of content types (and dozens
of categories per content type) such queries may take a very long time.

A complex example may look like this:

q=*:*&json.facet={
  tour:      { type: query, q: "+categoryId:(21450 21453)" },
  guide:     { type: query, q: "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
  story:     { type: query, q: "+categoryId:21515" },
  condition: { type: query, q: "+categoryId:21514" },
  hut:       { type: query, q: "+categoryId:8510" },
  skiresort: { type: query, q: "+categoryId:21493" },
  offer:     { type: query, q: "+categoryId:21462" },
  lodging:   { type: query, q: "+categoryId:6061" },
  event:     { type: query, q: "+categoryId:21465" },
  poi:       { type: query, q: "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
  authors:   { type: query, q: "+categoryId:(21205 21206)" },
  partners:  { type: query, q: "+categoryId:21200" },
  list:      { type: query, q: "+categoryId:21481" }
}&rows=0
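
Since the facet set is assembled per customer at request time, here is a
rough SolrJ sketch of building that kind of request programmatically
(JsonQueryRequest and QueryFacetMap ship with SolrJ 8.x; the collection name
and the two facet entries are made up):

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;
import org.apache.solr.client.solrj.request.json.QueryFacetMap;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DynamicQueryFacets {
  public static void main(String[] args) throws Exception {
    // Per-customer facet definitions, assembled at runtime.
    Map<String, String> facetQueries = new LinkedHashMap<>();
    facetQueries.put("tour", "+categoryId:(21450 21453)");
    facetQueries.put("story", "+categoryId:21515");

    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      JsonQueryRequest req = new JsonQueryRequest().setQuery("*:*").setLimit(0);
      facetQueries.forEach((name, query) -> req.withFacet(name, new QueryFacetMap(query)));
      QueryResponse rsp = req.process(client, "content");
      System.out.println(rsp.getResponse().get("facets"));
    }
  }
}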

@Michael

Thanks for your suggestion, but this does not work because
* the facet module expects an aggregate function (which I simply added by
wrapping your call in sum(...))
* and (please correct me if I am wrong) the exists() function does not stop
at the first match, but counts the number of results for which the query
matches a document.