Re: Faceting: !terms vs mincount precedence

2020-11-17 Thread Jason Gerlowski
Thanks for the context David - I didn't realize this was built as an
internal mechanism and then documented later on.  A few other thoughts
below:

> {!terms}, it suggests a reference to the TermsQParser, but when you write 
> {!terms=a,b,c} it suggests local-params
I agree that the two are easy to confuse.  Apologies for abbreviating
it at points in my earlier email - I was doing it for brevity and
didn't intend the confusion.
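
To make the distinction concrete, here is a minimal sketch of how a request using the "terms" local-param (not the terms query parser) could be assembled. It assumes the facet is requested through facet.field on a field named genre_s with a per-field mincount; the archived example further down is garbled, so these names are a reading of it rather than a confirmed copy of the original request.

```python
from urllib.parse import urlencode, parse_qs


def terms_facet_params(field, terms, mincount):
    """Build Solr params for a field facet restricted to specific terms.

    "terms" here is a local-param key on facet.field ({!terms=...}),
    not a reference to the terms query parser ({!terms f=...}).
    """
    local_params = "{!terms='%s'}" % ",".join(terms)
    return [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", local_params + field),
        # Per-field mincount; whether it should apply to the listed terms
        # is exactly the open question in this thread.
        ("f.%s.facet.mincount" % field, str(mincount)),
    ]


# Field name and values are assumptions taken from the example in the thread.
params = terms_facet_params("genre_s", ["fantasy", "scifi", "mystery"], 2)
query_string = urlencode(params)  # percent-encodes braces, quotes and commas
decoded = parse_qs(query_string)
print(query_string)
```

Building the parameters as a list and encoding them in one step keeps the braces and quotes of the local-params block from being mangled in transit.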

> I think that "terms" local-param to faceting was a purely internal thing that 
> wasn't documented
That may be.  But I disagree that it shouldn't've been documented in
the first place.  Digging into this has cost me a good bit of time,
and even now maybe I've got more digging to do, maybe a bug to fix,
etc.  But without someone's (Christine's?) documentation I'd be even
worse off, without any idea that this "terms" local-params support
exists at all.  The documentation even mentions that "terms" doesn't
work well with some other faceting params.  The details could be a bit
fuller, but the warning *is* there.  So I don't find any fault with
documenting this sort of stuff - especially when it gives warnings
about potential limitations.

Anyway, still hoping someone else might chime in with a slick
workaround or something.  But it does look at this point like I'll
have to go another route or put in some effort myself.

Jason

On Tue, Nov 17, 2020 at 3:41 PM David Smiley  wrote:
>
> This is confusing because when you write {!terms}, it suggests a reference
> to the TermsQParser, but when you write {!terms=a,b,c} it suggests
> local-params, with key "terms" and value "a,b,c" -- entirely different
> things.  I think that "terms" local-param to faceting was a purely internal
> thing that wasn't documented; it existed as an internal implementation
> detail.  Then someone (I think Christine, if not then Mikhail) observed it
> wasn't documented, and added some basic docs.  Now you come along and try
> to use it with other things that unsurprisingly it just wasn't designed
> for.  That's my estimation of the matter... and *if* true, illustrates that
> maybe some internal params should stay internal and don't need to be
> publicly documented.  I confess I've used that faceting local-param in an
> app once before too; it's useful.  I know my response isn't a direct answer
> to your question RE mincount... perhaps it can be made to work?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Nov 17, 2020 at 8:21 AM Jason Gerlowski 
> wrote:
>
> > Hey all,
> >
> > I was using the {!terms} local parameter on some traditional field
> > facets to make sure particular values were returned.
> >
> > e.g.
> > facet=true&facet.field={!terms='fantasy,scifi,mystery'}genre_s&f.genre_s.facet.mincount=2
> >
> > On single-shard collections in 8.6.3 this worked as I expected -
> > "fantasy", "scifi", and "mystery" were the only 3 field values
> > returned, and "mystery" was returned despite its count value being
> > less than the specified "mincount".  But on a multi-shard collection
> > "mystery" isn't returned (presumably because a "mincount" check
> > filters out the values on the facet aggregator node).
> >
> > What are the expected semantics when "{!terms}" and "mincount" are
> > used together?  Should mincount filter out values in {!terms}, or
> > should those values be excluded from any mincount filtering?  The
> > behavior is clearly inconsistent between single and multi-shard, so it
> > deserves a JIRA either way.  Just trying to figure out what the
> > expected behavior is.
> >
> > Best,
> >
> > Jason
> >


Re: Faceting: !terms vs mincount precedence

2020-11-17 Thread David Smiley
This is confusing because when you write {!terms}, it suggests a reference
to the TermsQParser, but when you write {!terms=a,b,c} it suggests
local-params, with key "terms" and value "a,b,c" -- entirely different
things.  I think that "terms" local-param to faceting was a purely internal
thing that wasn't documented; it existed as an internal implementation
detail.  Then someone (I think Christine, if not then Mikhail) observed it
wasn't documented, and added some basic docs.  Now you come along and try
to use it with other things that unsurprisingly it just wasn't designed
for.  That's my estimation of the matter... and *if* true, illustrates that
maybe some internal params should stay internal and don't need to be
publicly documented.  I confess I've used that faceting local-param in an
app once before too; it's useful.  I know my response isn't a direct answer
to your question RE mincount... perhaps it can be made to work?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Nov 17, 2020 at 8:21 AM Jason Gerlowski 
wrote:

> Hey all,
>
> I was using the {!terms} local parameter on some traditional field
> facets to make sure particular values were returned.
>
> e.g.
> facet=true&facet.field={!terms='fantasy,scifi,mystery'}genre_s&f.genre_s.facet.mincount=2
>
> On single-shard collections in 8.6.3 this worked as I expected -
> "fantasy", "scifi", and "mystery" were the only 3 field values
> returned, and "mystery" was returned despite its count value being
> less than the specified "mincount".  But on a multi-shard collection
> "mystery" isn't returned (presumably because a "mincount" check
> filters out the values on the facet aggregator node).
>
> What are the expected semantics when "{!terms}" and "mincount" are
> used together?  Should mincount filter out values in {!terms}, or
> should those values be excluded from any mincount filtering?  The
> behavior is clearly inconsistent between single and multi-shard, so it
> deserves a JIRA either way.  Just trying to figure out what the
> expected behavior is.
>
> Best,
>
> Jason
>


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Sorry, correction, taking "the" time

On Mon, 19 Oct 2020 22:18:30 +0300
uyilmaz  wrote:

> Thanks for taking time to write a detailed answer.
> 
> We use Solr to both store our data and to perform aggregations, using 
> faceting or streaming expressions. When required analysis is too complex to 
> do in Solr, we export large query results from Solr to a more capable 
> analysis tool.
> 
> So I guess all fields need to be docValues="true", because export handler and 
> streaming both require fields to have docValues, and even if I won't use a 
> field in queries or facets, it should be available to read in the result set. 
> Fields that won't be searched or faceted can be (indexed=false stored=false 
> docValues=true) right?
> 
> --uyilmaz
> 
> 
> On Mon, 19 Oct 2020 14:14:27 -0400
> Michael Gibney  wrote:
> 
> > As you've observed, it is indeed possible to facet on fields with
> > docValues=true, indexed=false; but in almost all cases you should
> > probably set indexed=true. 1. for distributed facet count refinement,
> > the "indexed" approach is used to look up counts by value; 2. assuming
> > you're wanting to do something usual, e.g. allow users to apply
> > filters based on facet counts, the filter application would use the
> > "indexed" approach as well. Where indexed=false, if either filtering
> > or distributed refinement is attempted, I'm not 100% sure what
> > happens. It might fail, or lead to inconsistent results, or attempt to
> > look up results via the equivalent of a "table scan" over docValues (I
> > think the last of these is what actually happens, fwiw) ... but none
> > of these options is likely desirable.
> > 
> > Michael
> > 
> > On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> > >
> > > Thanks! This also contributed to my confusion:
> > >
> > > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> > >
> > > "If you want Solr to perform both analysis (for searching) and faceting 
> > > on the full literal strings, use the copyField directive in your Schema 
> > > to create two versions of the field: one Text and one String. Make sure 
> > > both are indexed="true"."
> > >
> > > On Mon, 19 Oct 2020 13:08:00 -0400
> > > Alexandre Rafalovitch  wrote:
> > >
> > > > I think this is all explained quite well in the Ref Guide:
> > > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > > >
> > > > DocValues is a different way to index/store values. Faceting is a
> > > > primary use case where docValues are better than what 'indexed=true'
> > > > gives you.
> > > >
> > > > Regards,
> > > >Alex.
> > > >
> > > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  
> > > > wrote:
> > > > >
> > > > >
> > > > > Hey all,
> > > > >
> > > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > > mistake) we can facet on fields marked as both indexed and stored 
> > > > > being false:
> > > > >
> > > > > > <field ... indexed="false" stored="false" docValues="true"/>
> > > > > >
> > > > > > I'm surprised by this, I thought I would need to index it. Can you 
> > > > > confirm this?
> > > > >
> > > > > Regards
> > > > >
> > > > > --
> > > > > uyilmaz 
> > >
> > >
> > > --
> > > uyilmaz 
> 
> 
> -- 
> uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks for taking time to write a detailed answer.

We use Solr to both store our data and to perform aggregations, using faceting 
or streaming expressions. When required analysis is too complex to do in Solr, 
we export large query results from Solr to a more capable analysis tool.

So I guess all fields need to be docValues="true", because export handler and 
streaming both require fields to have docValues, and even if I won't use a 
field in queries or facets, it should be available to read in the result set. 
Fields that won't be searched or faceted can be (indexed=false stored=false 
docValues=true) right?

--uyilmaz


On Mon, 19 Oct 2020 14:14:27 -0400
Michael Gibney  wrote:

> As you've observed, it is indeed possible to facet on fields with
> docValues=true, indexed=false; but in almost all cases you should
> probably set indexed=true. 1. for distributed facet count refinement,
> the "indexed" approach is used to look up counts by value; 2. assuming
> you're wanting to do something usual, e.g. allow users to apply
> filters based on facet counts, the filter application would use the
> "indexed" approach as well. Where indexed=false, if either filtering
> or distributed refinement is attempted, I'm not 100% sure what
> happens. It might fail, or lead to inconsistent results, or attempt to
> look up results via the equivalent of a "table scan" over docValues (I
> think the last of these is what actually happens, fwiw) ... but none
> of these options is likely desirable.
> 
> Michael
> 
> On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> >
> > Thanks! This also contributed to my confusion:
> >
> > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> >
> > "If you want Solr to perform both analysis (for searching) and faceting on 
> > the full literal strings, use the copyField directive in your Schema to 
> > create two versions of the field: one Text and one String. Make sure both 
> > are indexed="true"."
> >
> > On Mon, 19 Oct 2020 13:08:00 -0400
> > Alexandre Rafalovitch  wrote:
> >
> > > I think this is all explained quite well in the Ref Guide:
> > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > >
> > > DocValues is a different way to index/store values. Faceting is a
> > > primary use case where docValues are better than what 'indexed=true'
> > > gives you.
> > >
> > > Regards,
> > >Alex.
> > >
> > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > > >
> > > >
> > > > Hey all,
> > > >
> > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > mistake) we can facet on fields marked as both indexed and stored being 
> > > > false:
> > > >
> > > > > <field ... indexed="false" stored="false" docValues="true"/>
> > > > >
> > > > > I'm surprised by this, I thought I would need to index it. Can you 
> > > > confirm this?
> > > >
> > > > Regards
> > > >
> > > > --
> > > > uyilmaz 
> >
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Walter Underwood
Hmm. Fields used for faceting will also be used for filtering, which is a kind
of search. Are docValues OK for filtering? I expect they might be slow the
first time, then cached.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 19, 2020, at 11:15 AM, Erick Erickson  wrote:
> 
> uyilmaz:
> 
> Hmm, that _is_ confusing. And inaccurate.
> 
> In this context, it should read something like
> 
> The Text field should have indexed="true" docValues="false" if used for
> searching but not faceting, and the String field should have indexed="false"
> docValues="true" if used for faceting but not searching.
> 
> I’ll fix this, thanks for pointing this out.
> 
> Erick
> 
>> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
>> 
>> Thanks! This also contributed to my confusion:
>> 
>> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>> 
>> "If you want Solr to perform both analysis (for searching) and faceting on 
>> the full literal strings, use the copyField directive in your Schema to 
>> create two versions of the field: one Text and one String. Make sure both 
>> are indexed="true"."
>> 
>> On Mon, 19 Oct 2020 13:08:00 -0400
>> Alexandre Rafalovitch  wrote:
>> 
>>> I think this is all explained quite well in the Ref Guide:
>>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>>> 
>>> DocValues is a different way to index/store values. Faceting is a
>>> primary use case where docValues are better than what 'indexed=true'
>>> gives you.
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
 
 
>>>> Hey all,
>>>>
>>>> From my little experiments, I see that (if I didn't make a stupid mistake) 
>>>> we can facet on fields marked as both indexed and stored being false:
>>>>
>>>> <field ... indexed="false" stored="false" docValues="true"/>
>>>>
>>>> I'm surprised by this, I thought I would need to index it. Can you confirm 
>>>> this?
>>>>
>>>> Regards
>>>>
>>>> --
>>>> uyilmaz 
>> 
>> 
>> -- 
>> uyilmaz 
> 



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Erick Erickson
uyilmaz:

Hmm, that _is_ confusing. And inaccurate.

In this context, it should read something like

The Text field should have indexed="true" docValues="false" if used for
searching but not faceting, and the String field should have indexed="false"
docValues="true" if used for faceting but not searching.

I’ll fix this, thanks for pointing this out.
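
As an illustration of the corrected wording, here is a hedged sketch that builds a Schema API payload for such a Text/String pair joined by a copyField. The field names title_t and title_s and the types text_general and string are assumptions chosen for the example (they are not from this thread), and the payload is only constructed here, not sent to a server.

```python
import json


def copyfield_schema_commands(text_field, string_field):
    """Return Schema API commands for a searchable field plus a facet-only copy."""
    return {
        "add-field": [
            # Analyzed field used for searching, not for faceting.
            {"name": text_field, "type": "text_general",
             "indexed": True, "docValues": False},
            # Literal string copy used for faceting, not for searching.
            {"name": string_field, "type": "string",
             "indexed": False, "docValues": True},
        ],
        "add-copy-field": {"source": text_field, "dest": string_field},
    }


# Hypothetical field names, for illustration only.
commands = copyfield_schema_commands("title_t", "title_s")
print(json.dumps(commands, indent=2))
```

Note that Michael Gibney's reply in this thread suggests keeping indexed="true" on the String field as well if users will filter on facet values or if distributed refinement is involved.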

Erick

> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
> 
> Thanks! This also contributed to my confusion:
> 
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> 
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
> 
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
> 
>> I think this is all explained quite well in the Ref Guide:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>> 
>> DocValues is a different way to index/store values. Faceting is a
>> primary use case where docValues are better than what 'indexed=true'
>> gives you.
>> 
>> Regards,
>>   Alex.
>> 
>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>>> 
>>> 
>>> Hey all,
>>> 
>>> From my little experiments, I see that (if I didn't make a stupid mistake) 
>>> we can facet on fields marked as both indexed and stored being false:
>>> 
>>> <field ... indexed="false" stored="false" docValues="true"/>
>>> 
>>> I'm surprised by this, I thought I would need to index it. Can you confirm 
>>> this?
>>> 
>>> Regards
>>> 
>>> --
>>> uyilmaz 
> 
> 
> -- 
> uyilmaz 



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Michael Gibney
As you've observed, it is indeed possible to facet on fields with
docValues=true, indexed=false; but in almost all cases you should
probably set indexed=true. 1. for distributed facet count refinement,
the "indexed" approach is used to look up counts by value; 2. assuming
you're wanting to do something usual, e.g. allow users to apply
filters based on facet counts, the filter application would use the
"indexed" approach as well. Where indexed=false, if either filtering
or distributed refinement is attempted, I'm not 100% sure what
happens. It might fail, or lead to inconsistent results, or attempt to
look up results via the equivalent of a "table scan" over docValues (I
think the last of these is what actually happens, fwiw) ... but none
of these options is likely desirable.
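
For context, the "filter application" mentioned above usually means turning a clicked facet value into a filter query. A minimal sketch, assuming the term query parser is used for that and using a hypothetical field name; how such a filter is resolved when the field has indexed=false is the uncertainty described above.

```python
def facet_value_filter(field, value):
    """Return an fq string selecting documents whose field equals value.

    Uses the {!term} query parser so the value is matched literally,
    without query-syntax escaping.
    """
    return "{!term f=%s}%s" % (field, value)


# Hypothetical field and value, for illustration only.
fq = facet_value_filter("genre_s", "scifi")
print(fq)
```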

Michael

On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
>
> Thanks! This also contributed to my confusion:
>
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
>
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
>
> > I think this is all explained quite well in the Ref Guide:
> > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> >
> > DocValues is a different way to index/store values. Faceting is a
> > primary use case where docValues are better than what 'indexed=true'
> > gives you.
> >
> > Regards,
> >Alex.
> >
> > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > >
> > >
> > > Hey all,
> > >
> > > From my little experiments, I see that (if I didn't make a stupid 
> > > mistake) we can facet on fields marked as both indexed and stored being 
> > > false:
> > >
> > > <field ... indexed="false" stored="false" docValues="true"/>
> > >
> > > I'm surprised by this, I thought I would need to index it. Can you confirm 
> > > this?
> > >
> > > Regards
> > >
> > > --
> > > uyilmaz 
>
>
> --
> uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks! This also contributed to my confusion:

https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters

"If you want Solr to perform both analysis (for searching) and faceting on the 
full literal strings, use the copyField directive in your Schema to create two 
versions of the field: one Text and one String. Make sure both are 
indexed="true"."

On Mon, 19 Oct 2020 13:08:00 -0400
Alexandre Rafalovitch  wrote:

> I think this is all explained quite well in the Ref Guide:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html
> 
> DocValues is a different way to index/store values. Faceting is a
> primary use case where docValues are better than what 'indexed=true'
> gives you.
> 
> Regards,
>Alex.
> 
> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> >
> >
> > Hey all,
> >
> > From my little experiments, I see that (if I didn't make a stupid mistake) 
> > we can facet on fields marked as both indexed and stored being false:
> >
> > <field ... indexed="false" stored="false" docValues="true"/>
> >
> > I'm surprised by this, I thought I would need to index it. Can you confirm 
> > this?
> >
> > Regards
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Alexandre Rafalovitch
I think this is all explained quite well in the Ref Guide:
https://lucene.apache.org/solr/guide/8_6/docvalues.html

DocValues is a different way to index/store values. Faceting is a
primary use case where docValues are better than what 'indexed=true'
gives you.

Regards,
   Alex.

On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>
>
> Hey all,
>
> From my little experiments, I see that (if I didn't make a stupid mistake) we 
> can facet on fields marked as both indexed and stored being false:
>
> <field ... indexed="false" stored="false" docValues="true"/>
>
> I'm surprised by this, I thought I would need to index it. Can you confirm 
> this?
>
> Regards
>
> --
> uyilmaz 


Re: Faceting with Stats

2019-07-05 Thread Erick Erickson
Thanks for bringing closure to this. Yeah, “escaping hell” is something that
happens to us all, something that works in a browser doesn’t work
from SolrJ and neither one may work with curl and……

Pretty often, BTW, I look at the Solr log. It takes a little practice to 
reconstruct the query, but it’s not very hard. Then I work 
backwards…

Best,
Erick

> On Jul 4, 2019, at 11:14 PM, Ahmed Adel  wrote:
> 
> Thanks for your reply! Yes, it turned out to be an issue with the way the
> request was being sent, which was cURL that required special handling and
> escaping of spaces and special characters. Using another client cleared
> this issue and the request below worked perfectly now.
> 
> Best,
> A.
> 
> On Thu, Jul 4, 2019 at 4:53 PM Erick Erickson 
> wrote:
> 
>> Might be a formatting error with my mail client, but the very first line
>> is not well formed.
>> 
>> q: * is incorrect
>> 
>> q=*:*
>> 
>> 
>> 
>> I do not see that example on the page either. Looks like you took the bit
>> that starts with stats=true and mis-typed the q clause.
>> 
>> Best,
>> Erick
>>> On Jul 3, 2019, at 5:08 AM, Ahmed Adel  wrote:
>>> 
>>> Hi,
>>> 
>>> As per the documentation recommendation of using pivot with stats
>> component
>>> instead (
>>> 
>> https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots
>> ),
>>> replacing the stats options that were previously used with the newer
>> pivot
>>> options as follows:
>>> 
>>> q: *
>>> stats=true
>>> stats.field={!tag=piv1 mean=true}average_rating_f
>>> facet=true
>>> facet.pivot={!stats=piv1}author_s
>>> 
>>> returns the following error:
>>> 
>>> Bad Message 400
>>> reason: Illegal character SPACE=' '
>>> 
>>> This is a syntax issue rather than a logical one, however. Any thoughts
>> of
>>> what could be missing would be appreciated.
>>> 
>>> Thanks,
>>> A. Adel
>>> 
>>> On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:
>>> 
>>>> Hi,
>>>>
>>>> How can stats field value be calculated for top facet values? In other
>>>> words, the following request parameters should return the stats.field
>>>> measures for facets sorted by count:
>>>>
>>>> q: *
>>>> wt: json
>>>> stats: true
>>>> stats.facet: authors_s
>>>> stats.field: average_rating_f
>>>> facet.missing: true
>>>> f.authors_s.facet.sort: count
>>>>
>>>> However, the response is not sorted by facet field count. Is there
>>>> something missing?
>>>>
>>>> Best,
>>>> A.
 
>> 
>> --
> Sent from my iPhone



Re: Faceting with Stats

2019-07-05 Thread Ahmed Adel
Thanks for your reply! Yes, it turned out to be an issue with the way the
request was being sent, which was cURL that required special handling and
escaping of spaces and special characters. Using another client cleared
this issue and the request below worked perfectly now.
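
For anyone hitting the same "Illegal character SPACE" error, a minimal sketch of letting a client library do the escaping. It uses the parameters from the request quoted below with q=*:* as suggested in the reply; the request is only encoded here, not sent, and no host or collection name is assumed.

```python
from urllib.parse import urlencode, parse_qs

# Parameters from the quoted request, with the corrected q clause.
params = [
    ("q", "*:*"),
    ("stats", "true"),
    ("stats.field", "{!tag=piv1 mean=true}average_rating_f"),
    ("facet", "true"),
    ("facet.pivot", "{!stats=piv1}author_s"),
]

# urlencode percent-encodes braces and '!' and turns the space inside the
# local-params block into '+', so no raw space reaches the server.
query_string = urlencode(params)
print(query_string)
```

With curl, passing each parameter through --data-urlencode should have a similar effect.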

Best,
A.

On Thu, Jul 4, 2019 at 4:53 PM Erick Erickson 
wrote:

> Might be a formatting error with my mail client, but the very first line
> is not well formed.
>
> q: * is incorrect
>
> q=*:*
>
>
>
> I do not see that example on the page either. Looks like you took the bit
> that starts with stats=true and mis-typed the q clause.
>
> Best,
> Erick
> > On Jul 3, 2019, at 5:08 AM, Ahmed Adel  wrote:
> >
> > Hi,
> >
> > As per the documentation recommendation of using pivot with stats
> component
> > instead (
> >
> https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots
> ),
> > replacing the stats options that were previously used with the newer
> pivot
> > options as follows:
> >
> > q: *
> > stats=true
> > stats.field={!tag=piv1 mean=true}average_rating_f
> > facet=true
> > facet.pivot={!stats=piv1}author_s
> >
> > returns the following error:
> >
> > Bad Message 400
> > reason: Illegal character SPACE=' '
> >
> > This is a syntax issue rather than a logical one, however. Any thoughts
> of
> > what could be missing would be appreciated.
> >
> > Thanks,
> > A. Adel
> >
> > On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:
> >
> >> Hi,
> >>
> >> How can stats field value be calculated for top facet values? In other
> >> words, the following request parameters should return the stats.field
> >> measures for facets sorted by count:
> >>
> >> q: *
> >> wt: json
> >> stats: true
> >> stats.facet: authors_s
> >> stats.field: average_rating_f
> >> facet.missing: true
> >> f.authors_s.facet.sort: count
> >>
> >> However, the response is not sorted by facet field count. Is there
> >> something missing?
> >>
> >> Best,
> >> A.
> >>
>
> --
Sent from my iPhone


Re: Faceting with Stats

2019-07-04 Thread Erick Erickson
Might be a formatting error with my mail client, but the very first line is not 
well formed. 

q: * is incorrect

q=*:*



I do not see that example on the page either. Looks like you took the bit
that starts with stats=true and mis-typed the q clause.

Best,
Erick
> On Jul 3, 2019, at 5:08 AM, Ahmed Adel  wrote:
> 
> Hi,
> 
> As per the documentation recommendation of using pivot with stats component
> instead (
> https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots),
> replacing the stats options that were previously used with the newer pivot
> options as follows:
> 
> q: *
> stats=true
> stats.field={!tag=piv1 mean=true}average_rating_f
> facet=true
> facet.pivot={!stats=piv1}author_s
> 
> returns the following error:
> 
> Bad Message 400
> reason: Illegal character SPACE=' '
> 
> This is a syntax issue rather than a logical one, however. Any thoughts of
> what could be missing would be appreciated.
> 
> Thanks,
> A. Adel
> 
> On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:
> 
>> Hi,
>> 
>> How can stats field value be calculated for top facet values? In other
>> words, the following request parameters should return the stats.field
>> measures for facets sorted by count:
>> 
>> q: *
>> wt: json
>> stats: true
>> stats.facet: authors_s
>> stats.field: average_rating_f
>> facet.missing: true
>> f.authors_s.facet.sort: count
>> 
>> However, the response is not sorted by facet field count. Is there
>> something missing?
>> 
>> Best,
>> A.
>> 



Re: Faceting with Stats

2019-07-04 Thread Ahmed Adel
Hi,

As per the documentation recommendation of using pivot with stats component
instead (
https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots),
replacing the stats options that were previously used with the newer pivot
options as follows:

q: *
stats=true
stats.field={!tag=piv1 mean=true}average_rating_f
facet=true
facet.pivot={!stats=piv1}author_s

returns the following error:

Bad Message 400
reason: Illegal character SPACE=' '

This is a syntax issue rather than a logical one, however. Any thoughts of
what could be missing would be appreciated.

Thanks,
A. Adel

On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:

> Hi,
>
> How can stats field value be calculated for top facet values? In other
> words, the following request parameters should return the stats.field
> measures for facets sorted by count:
>
> q: *
> wt: json
> stats: true
> stats.facet: authors_s
> stats.field: average_rating_f
> facet.missing: true
> f.authors_s.facet.sort: count
>
> However, the response is not sorted by facet field count. Is there
> something missing?
>
> Best,
> A.
>


Re: Faceting with Stats

2019-07-03 Thread Ahmed Adel
Hi,

As per the documentation recommendation of using pivot with stats component
instead (
https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots),
replacing the stats options that were previously used with the newer pivot
options as follows:

q: *
stats=true
stats.field={!tag=piv1 mean=true}average_rating_f
facet=true
facet.pivot={!stats=piv1}author_s

returns the following error:

Bad Message 400
reason: Illegal character SPACE=' '

This is a syntax issue rather than a logical one, however. Any thoughts of
what could be missing would be appreciated.

Thanks,
A. Adel

On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:

> Hi,
>
> How can stats field value be calculated for top facet values? In other
> words, the following request parameters should return the stats.field
> measures for facets sorted by count:
>
> q: *
> wt: json
> stats: true
> stats.facet: authors_s
> stats.field: average_rating_f
> facet.missing: true
> f.authors_s.facet.sort: count
>
> However, the response is not sorted by facet field count. Is there
> something missing?
>
> Best,
> A.
>


Re: Faceting filter tagging doesn't work in case where 0 matches are found

2019-02-18 Thread Mikhail Khludnev
I've consulted regarding this case. This is not an issue: you can bring the
facets back by adding the not-yet-documented property processEmpty:true
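
A hedged sketch of what that could look like for the request quoted below. Placing processEmpty at the top level of the "facet" block is an assumption, since the property is undocumented; the rest of the request is copied from the original report, and the body is only built here, not sent.

```python
import json


def facet_request_with_process_empty(filter_query):
    """Build the JSON request body from the report, with processEmpty added."""
    return {
        "query": "*:*",
        "limit": 0,
        "filter": [filter_query],
        "facet": {
            # Assumed placement: ask Solr to compute facets even when the
            # filtered base document set is empty.
            "processEmpty": True,
            "latitude_f": {
                "type": "range",
                "field": "latitude_f",
                "start": -90,
                "end": -70,
                "gap": 10,
                "domain": {"excludeTags": "latitude_f"},
                "facet": {"population": "sum(population_i)"},
            },
        },
    }


# The filter that matched zero documents in the original report.
body = facet_request_with_process_empty(
    "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
)
print(json.dumps(body, indent=2))
```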

On Mon, Feb 18, 2019 at 10:42 AM Mikhail Khludnev  wrote:

> Hello,
> I'm not sure but it sounds like an issue, would you mind raising one at
> https://issues.apache.org/jira/projects/SOLR/ ?
>
> On Sun, Feb 17, 2019 at 6:57 PM Arvydas Silanskas <
> nma.arvydas.silans...@gmail.com> wrote:
>
>> Good evening,
>>
>> I am using facet json api to query aggregation data, and I don't care
>> about
>> the returned documents themselves. One of the use cases I want to employ
>> is
>> tagging filter queries for fields, and then exclude those filters when
>> faceting. My problem is, however, that in those cases where the filter has
>> 0 matches, the facets aren't calculated at all.
>>
>> I'm using dataset I found at
>> https://www.raspberry.nl/2010/12/29/solr-test-dataset/ . To illustrate --
>> this is an an example when filter doesn't filter out everything (working
>> as
>> expected):
>>
>> Request:
>> {
>>   "query": "*:*",
>>   "facet": {
>>     "latitude_f": {
>>       "type": "range",
>>       "start": -90,
>>       "facet": {
>>         "population": "sum(population_i)"
>>       },
>>       "domain": {
>>         "excludeTags": "latitude_f"
>>       },
>>       "gap": 10,
>>       "end": -70,
>>       "field": "latitude_f"
>>     }
>>   },
>>   "limit": 0,
>>   "filter": [
>>     "{!tag=latitude_f}latitude_f:[-80.0 TO -70.0]"
>>   ]
>> }
>>
>> Response:
>>
>> {
>>   "facets": {
>>     "count": 1,
>>     "latitude_f": {
>>       "buckets": [
>>         {
>>           "val": -90,
>>           "count": 0
>>         },
>>         {
>>           "val": -80,
>>           "count": 1,
>>           "population": 1258
>>         }
>>       ]
>>     }
>>   }
>> }
>>
>>
>> Example when filter filters everything out:
>>
>> Request is the same, except the filter field value is
>>
>>   "filter": [
>>     "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
>>   ]
>>
>> and response is
>>
>>  "facets":{
>> "count":0}
>>
>> . I get no facets whatsoever. However, I'd expect the response to be
>> the same as for the first request, since only one filter is used,
>> and it is excluded when faceting.
>>
>> Is this a bug? What are the workarounds for such problem?
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Faceting filter tagging doesn't work in case where 0 matches are found

2019-02-18 Thread Zheng Lin Edwin Yeo
Hi,

Which version of Solr are you using when you face this problem?

Regards,
Edwin

On Mon, 18 Feb 2019 at 15:43, Mikhail Khludnev  wrote:

> Hello,
> I'm not sure but it sounds like an issue, would you mind raising one at
> https://issues.apache.org/jira/projects/SOLR/ ?
>
> On Sun, Feb 17, 2019 at 6:57 PM Arvydas Silanskas <
> nma.arvydas.silans...@gmail.com> wrote:
>
> > Good evening,
> >
> > I am using facet json api to query aggregation data, and I don't care
> about
> > the returned documents themselves. One of the use cases I want to employ
> is
> > tagging filter queries for fields, and then exclude those filters when
> > faceting. My problem is, however, that in those cases where the filter
> has
> > 0 matches, the facets aren't calculated at all.
> >
> > I'm using dataset I found at
> > https://www.raspberry.nl/2010/12/29/solr-test-dataset/ . To illustrate
> --
> > this is an an example when filter doesn't filter out everything (working
> as
> > expected):
> >
> > Request:
> > {
> >   "query": "*:*",
> >   "facet": {
> > "latitude_f": {
> >   "type": "range",
> >   "start": -90,
> >   "facet": {
> > "population": "sum(population_i)"
> >   },
> >   "domain": {
> > "excludeTags": "latitude_f"
> >   },
> >   "gap": 10,
> >   "end": -70,
> >   "field": "latitude_f"
> > }
> >   },
> >   "limit": 0,
> >   "filter": [
> > "{!tag=latitude_f}latitude_f:[-80.0 TO -70.0]"
> >   ]
> > }
> >
> > Response:
> >
> > {
> >   "facets": {
> > "count": 1,
> > "latitude_f": {
> >   "buckets": [
> > {
> >   "val": -90,
> >   "count": 0
> > },
> > {
> >   "val": -80,
> >   "count": 1,
> >   "population": 1258
> > }
> >   ]
> > }
> >   }
> > }
> >
> >
> > Example when filter filters everything out:
> >
> > Request is the same, except the filter field value is
> >
> >   "filter": [
> > "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
> >   ]
> >
> > and response is
> >
> >  "facets":{
> > "count":0}
> >
> > . I'm returned no facets whatsoever. However I'd expect the response to
> be
> > the same as and for the first request, since the only one filter is used,
> > and is excluded in faceting.
> >
> > Is this a bug? What are the workarounds for such problem?
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Faceting filter tagging doesn't work in case where 0 matches are found

2019-02-17 Thread Mikhail Khludnev
Hello,
I'm not sure, but it sounds like an issue. Would you mind raising one at
https://issues.apache.org/jira/projects/SOLR/ ?

On Sun, Feb 17, 2019 at 6:57 PM Arvydas Silanskas <
nma.arvydas.silans...@gmail.com> wrote:

> Good evening,
>
> I am using facet json api to query aggregation data, and I don't care about
> the returned documents themselves. One of the use cases I want to employ is
> tagging filter queries for fields, and then exclude those filters when
> faceting. My problem is, however, that in those cases where the filter has
> 0 matches, the facets aren't calculated at all.
>
> I'm using dataset I found at
> https://www.raspberry.nl/2010/12/29/solr-test-dataset/ . To illustrate --
> this is an an example when filter doesn't filter out everything (working as
> expected):
>
> Request:
> {
>   "query": "*:*",
>   "facet": {
> "latitude_f": {
>   "type": "range",
>   "start": -90,
>   "facet": {
> "population": "sum(population_i)"
>   },
>   "domain": {
> "excludeTags": "latitude_f"
>   },
>   "gap": 10,
>   "end": -70,
>   "field": "latitude_f"
> }
>   },
>   "limit": 0,
>   "filter": [
> "{!tag=latitude_f}latitude_f:[-80.0 TO -70.0]"
>   ]
> }
>
> Response:
>
> {
>   "facets": {
> "count": 1,
> "latitude_f": {
>   "buckets": [
> {
>   "val": -90,
>   "count": 0
> },
> {
>   "val": -80,
>   "count": 1,
>   "population": 1258
> }
>   ]
> }
>   }
> }
>
>
> Example when filter filters everything out:
>
> Request is the same, except the filter field value is
>
>   "filter": [
> "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
>   ]
>
> and response is
>
>  "facets":{
> "count":0}
>
> . I'm returned no facets whatsoever. However I'd expect the response to be
> the same as and for the first request, since the only one filter is used,
> and is excluded in faceting.
>
> Is this a bug? What are the workarounds for such problem?
>


-- 
Sincerely yours
Mikhail Khludnev
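
For anyone who wants to reproduce the report, the two requests differ only in the range of the tagged filter. Below is a minimal sketch in Python that builds both JSON bodies. The helper name build_request is made up for illustration; the field names, ranges and facet definition are copied from the quoted message. It only constructs the request bodies and does not talk to Solr.

```python
import json


def build_request(filter_range):
    """Build the JSON Facet API body from the report for a given filter range.

    `build_request` is a hypothetical helper; the body mirrors the request
    quoted in this thread.
    """
    low, high = filter_range
    return {
        "query": "*:*",
        "limit": 0,
        # The filter is tagged so that the facet can exclude it again.
        "filter": [f"{{!tag=latitude_f}}latitude_f:[{low} TO {high}]"],
        "facet": {
            "latitude_f": {
                "type": "range",
                "field": "latitude_f",
                "start": -90,
                "end": -70,
                "gap": 10,
                "domain": {"excludeTags": "latitude_f"},
                "facet": {"population": "sum(population_i)"},
            }
        },
    }


working = build_request((-80.0, -70.0))  # filter matches one document
failing = build_request((-90.0, -80.0))  # filter matches nothing

print(json.dumps(failing, indent=2))
```

Since the facet section is identical in both bodies and excludes the only filter, one would expect identical buckets from both, which is the expectation stated in the report.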


Re: Faceting with a multi valued field

2018-09-27 Thread Shawn Heisey

On 9/25/2018 2:14 PM, Hanjan, Harinder wrote:

> Hello!


When starting a new topic on the mailing list, do not reply to an 
existing message.  Your thread is buried within a thread originally 
titled "Extracting top level URL when indexing document".


https://home.apache.org/~hossman/#threadhijack


> Notice that the Communities facet has 2 non zero results. I understand this is 
> because I'm using fq to get only documents which contain BANFF TRAIL but those 
> documents also contain PARKDALE.


Facets return information about what the documents that match the query 
contain.  ALL of the information.  The query that returned those matches 
is not examined at all when calculating facets, only the *results* of 
the query are examined.  I don't think there's any way you can exclude 
the information that you want to exclude, other than removing it from 
the documents entirely.  I would imagine that the PARKDALE information 
is required in those documents for other purposes and probably can't be 
removed.


Thanks,
Shawn
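
To make the point concrete, here is a toy model in Python of how field faceting counts values. It is only an illustration of the behaviour described above, not Solr's implementation; the documents are the three from the original question, reduced to the faceted field, and facet_counts is a made-up helper.

```python
from collections import Counter

# The three documents from the original question (only the faceted field).
docs = [
    {"Communities": ["BANFF TRAIL - BNF", "PARKDALE - PKD"]},
    {"Communities": ["BANFF TRAIL - BNF", "PARKDALE - PKD"]},
    {"Communities": ["SUNALTA - SNA"]},
]


def facet_counts(all_docs, matches):
    """Count every value of the field in the matching documents (toy model)."""
    # Start every known term at zero so unmatched terms still show up.
    counts = Counter({v: 0 for d in all_docs for v in d["Communities"]})
    for doc in matches:
        # Every value of a matching document is counted, not just the one
        # the filter asked for.
        counts.update(doc["Communities"])
    return dict(counts)


# Equivalent of filtering on Communities:"BANFF TRAIL - BNF"
matched = [d for d in docs if "BANFF TRAIL - BNF" in d["Communities"]]
print(facet_counts(docs, matched))
```

This yields 2 for BANFF TRAIL, 2 for PARKDALE and 0 for SUNALTA, the same counts as in the question, because both matching documents carry both values.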



RE: [EXT] Re: Faceting with a multi valued field

2018-09-27 Thread Hanjan, Harinder
I control everything except the data that's being indexed. So I can manipulate 
the Solr query as needed.

I tried the facet.prefix option and initial testing shows promise. 
q=*:*&facet=on&facet.field=Communities&facet.prefix=BANFF+TRAIL+-+BNF

Thanks much! 
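
A request like this can be assembled with proper URL encoding rather than by hand. A small sketch in Python, assuming the parameter set q, facet, facet.field and facet.prefix; the variable name selected is just a placeholder for whatever the UI passes in.

```python
from urllib.parse import urlencode

selected = "BANFF TRAIL - BNF"  # the community the user clicked (placeholder)

params = [
    ("q", "*:*"),
    ("facet", "on"),
    ("facet.field", "Communities"),
    # Only return facet terms that start with the selected value.
    ("facet.prefix", selected),
]

# urlencode escapes reserved characters and turns spaces into '+'.
query_string = urlencode(params)
print(query_string)
```

Note that facet.prefix is a plain prefix match on the indexed term, so it would also keep any other community whose name starts with the same text.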


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, September 25, 2018 3:14 PM
To: solr-user
Subject: [EXT] Re: Faceting with a multi valued field

What specifically do you control? Just keyword (and "Communities:"
part is locked?) or anything after q= or anything that allows multiple 
variables?

Because if you could isolate search value, you could use for example 
facet.prefix, set in solrconfig as a default parameter and populated from the 
same variable as the Communities search.

You may also want to set facet.mincount=1 in solrconfig.xml to avoid 0-value 
facets in general:
https://lucene.apache.org/solr/guide/7_4/faceting.html

Regards,
   Alex.


On 25 September 2018 at 16:50, John Blythe  wrote:
> you can update your filter query to be a facet query, this will apply 
> the query to the resulting facet set instead of the Communities field itself.
>
> --
> John Blythe
>
>
> On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder 
> 
> wrote:
>
>> Hello!
>>
>> I am doing faceting on a field which has multiple values and it's 
>> yielding expected but undesireable results. I need different 
>> behaviour but not sure how to formulate a query for it. Here is my current 
>> setup.
>>
>> = Data Set =
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "SolrId":"http://thesimpsons.com/one"
>>   }
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/two"
>>   }
>>   {
>>     "Communities":["SUNALTA - SNA"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/three"
>>   }
>>
>> = Query I run now =
>>
>> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>>
>>
>> = Results I get now =
>> {
>>   ...
>>   "facet_counts":{
>> "facet_queries":{},
>> "facet_fields":{
>>   "Communities":[
>> "BANFF TRAIL - BNF",2,
>> "PARKDALE - PKD",2,
>> "SUNALTA - SNA",0]},
>>...
>>
>> Notice that the Communities facet has 2 non zero results. I 
>> understand this is because I'm using fq to get only documents which 
>> contain BANFF TRAIL but those documents also contain PARKDALE.
>>
>> Now, I am using facets to drive navigation on my page. The business 
>> case is that user can select a community to get documents pertaining 
>> to that specific community only. This works with the query I have 
>> above. However, the facets results also contain other communities 
>> which then get displayed to the user. For example, with the query 
>> above, user will see both BANFF TRAIL and

RE: [EXT] Re: Faceting with a multi valued field

2018-09-27 Thread Hanjan, Harinder
John,

I just want to make sure I understand correctly. Replace fq with facet.query?

So then the resultant query goes from:
q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"

to:
q=*:*&facet=on&facet.field=Communities&facet.query="BANFF TRAIL - BNF"


If that's correct, then this does not resolve the issue. I still get 2 values 
under Communities facet.

Harinder

-Original Message-
From: John Blythe [mailto:johnbly...@gmail.com] 
Sent: Tuesday, September 25, 2018 2:50 PM
To: solr-user@lucene.apache.org
Subject: [EXT] Re: Faceting with a multi valued field

you can update your filter query to be a facet query, this will apply the query 
to the resulting facet set instead of the Communities field itself.

--
John Blythe


On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder 
wrote:

> Hello!
>
> I am doing faceting on a field which has multiple values and it's 
> yielding expected but undesireable results. I need different behaviour 
> but not sure how to formulate a query for it. Here is my current setup.
>
> = Data Set =
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "SolrId":"http://thesimpsons.com/one"
>   }
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/two"
>   }
>   {
>     "Communities":["SUNALTA - SNA"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/three"
>   }
>
> = Query I run now =
>
> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>
>
> = Results I get now =
> {
>   ...
>   "facet_counts":{
> "facet_queries":{},
> "facet_fields":{
>   "Communities":[
> "BANFF TRAIL - BNF",2,
> "PARKDALE - PKD",2,
> "SUNALTA - SNA",0]},
>...
>
> Notice that the Communities facet has 2 non zero results. I understand 
> this is because I'm using fq to get only documents which contain BANFF 
> TRAIL but those documents also contain PARKDALE.
>
> Now, I am using facets to drive navigation on my page. The business 
> case is that user can select a community to get documents pertaining 
> to that specific community only. This works with the query I have 
> above. However, the facets results also contain other communities 
> which then get displayed to the user. For example, with the query 
> above, user will see both BANFF TRAIL and PARKDALE as selected values 
> even though user only selected BANFF TRAIL. It's worthwhile noting 
> that I have no control over the data being sent to Solr and can't change it.
>
> How can I formulate a query to ensure that when user selects BANFF 
> TRAIL, only BANFF TRAIL is returned under Solr facets?
>
> Thanks!
> Harinder
>
> 
> NOTICE -
> This communication is intended ONLY for the use of the person or 
> entity named above and may contain information that is confidential or 
> legally privileged. If you are not the intended recipient named above 
> or a person responsible for delivering messages or communications to 
> the intended recipient, YOU ARE HEREBY NOTIFIED that any use, 
> distribution, or copying of this communication or any of the 
> information contained in it is strictly prohibited. If you have 
> received this communication in error, please notify us immediately by 
> telephone and then destroy or delete this communication, or return it 
> to us by mail if requested by us. The City of Calgary thanks you for your 
> attention and co-operation.
>


Re: Faceting with a multi valued field

2018-09-25 Thread Alexandre Rafalovitch
What specifically do you control? Just keyword (and "Communities:"
part is locked?) or anything after q= or anything that allows multiple
variables?

Because if you could isolate the search value, you could use, for example,
facet.prefix, set in solrconfig as a default parameter and populated
from the same variable as the Communities search.

You may also want to set facet.mincount=1 in solrconfig.xml to avoid
0-value facets in general:
https://lucene.apache.org/solr/guide/7_4/faceting.html

Regards,
   Alex.
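
To illustrate what the two parameters would do to the facet list from the question, here is a toy model in Python. It only mimics the filtering on a term-to-count map and is not how Solr implements it; apply_facet_params is a made-up helper name.

```python
def apply_facet_params(counts, prefix="", mincount=0):
    """Toy model of facet.prefix and facet.mincount on a term -> count map."""
    return {
        term: n
        for term, n in counts.items()
        # Keep terms that start with the prefix and reach the minimum count.
        if term.startswith(prefix) and n >= mincount
    }


# Counts as reported in the original question.
counts = {"BANFF TRAIL - BNF": 2, "PARKDALE - PKD": 2, "SUNALTA - SNA": 0}

# facet.mincount=1 alone only drops the zero-count SUNALTA entry.
print(apply_facet_params(counts, mincount=1))

# Adding facet.prefix with the selected community leaves only BANFF TRAIL.
print(apply_facet_params(counts, prefix="BANFF TRAIL - BNF", mincount=1))
```

So mincount on its own does not hide PARKDALE; it is the prefix that narrows the list to the selected community.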


On 25 September 2018 at 16:50, John Blythe  wrote:
> you can update your filter query to be a facet query, this will apply the
> query to the resulting facet set instead of the Communities field itself.
>
> --
> John Blythe
>
>
> On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder 
> wrote:
>
>> Hello!
>>
>> I am doing faceting on a field which has multiple values and it's yielding
>> expected but undesireable results. I need different behaviour but not sure
>> how to formulate a query for it. Here is my current setup.
>>
>> = Data Set =
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "SolrId":"http://thesimpsons.com/one"
>>   }
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/two"
>>   }
>>   {
>>     "Communities":["SUNALTA - SNA"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/three"
>>   }
>>
>> = Query I run now =
>>
>> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>>
>>
>> = Results I get now =
>> {
>>   ...
>>   "facet_counts":{
>> "facet_queries":{},
>> "facet_fields":{
>>   "Communities":[
>> "BANFF TRAIL - BNF",2,
>> "PARKDALE - PKD",2,
>> "SUNALTA - SNA",0]},
>>...
>>
>> Notice that the Communities facet has 2 non zero results. I understand
>> this is because I'm using fq to get only documents which contain BANFF
>> TRAIL but those documents also contain PARKDALE.
>>
>> Now, I am using facets to drive navigation on my page. The business case
>> is that user can select a community to get documents pertaining to that
>> specific community only. This works with the query I have above. However,
>> the facets results also contain other communities which then get displayed
>> to the user. For example, with the query above, user will see both BANFF
>> TRAIL and PARKDALE as selected values even though user only selected BANFF
>> TRAIL. It's worthwhile noting that I have no control over the data being
>> sent to Solr and can't change it.
>>
>> How can I formulate a query to ensure that when user selects BANFF TRAIL,
>> only BANFF TRAIL is returned under Solr facets?
>>
>> Thanks!
>> Harinder
>>


Re: Faceting with a multi valued field

2018-09-25 Thread John Blythe
You can change your filter query into a facet query; this will apply the
query to the resulting facet set instead of the Communities field itself.

--
John Blythe


On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder 
wrote:

> Hello!
>
> I am doing faceting on a field which has multiple values and it's yielding
> expected but undesireable results. I need different behaviour but not sure
> how to formulate a query for it. Here is my current setup.
>
> = Data Set =
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "SolrId":"http://thesimpsons.com/one"
>   }
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/two"
>   }
>   {
>     "Communities":["SUNALTA - SNA"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/three"
>   }
>
> = Query I run now =
>
> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>
>
> = Results I get now =
> {
>   ...
>   "facet_counts":{
> "facet_queries":{},
> "facet_fields":{
>   "Communities":[
> "BANFF TRAIL - BNF",2,
> "PARKDALE - PKD",2,
> "SUNALTA - SNA",0]},
>...
>
> Notice that the Communities facet has 2 non zero results. I understand
> this is because I'm using fq to get only documents which contain BANFF
> TRAIL but those documents also contain PARKDALE.
>
> Now, I am using facets to drive navigation on my page. The business case
> is that user can select a community to get documents pertaining to that
> specific community only. This works with the query I have above. However,
> the facets results also contain other communities which then get displayed
> to the user. For example, with the query above, user will see both BANFF
> TRAIL and PARKDALE as selected values even though user only selected BANFF
> TRAIL. It's worthwhile noting that I have no control over the data being
> sent to Solr and can't change it.
>
> How can I formulate a query to ensure that when user selects BANFF TRAIL,
> only BANFF TRAIL is returned under Solr facets?
>
> Thanks!
> Harinder
>


Re: Faceting with EnumFieldType in 7.1

2018-09-20 Thread Walter Underwood
Yes.

Consider search for a bug database with severity levels in an enum field. A 
facet on severity would be a normal feature for that search.

I would call this a bug.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 20, 2018, at 8:41 AM, Peter Tyrrell  wrote:
> 
> I have to assume this is a bug. (EnumFieldType does not return facet values.) 
> I didn't want to create a ticket before floating the issue on this mailing 
> list, but absent any further guidance or discussion...
> 
> So one last shot. Should a field of EnumFieldType be useable as a facet field?
> 
> Thanks,
> 
> Peter
> 
> Peter Tyrrell, MLIS
> Lead Developer at Andornot
> 1-866-266-2525 x706 / ptyrr...@andornot.com
> 
> -Original Message-
> From: Peter Tyrrell 
> Sent: September 14, 2018 3:04 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Faceting with EnumFieldType in 7.1
> 
> Yes.
> 
> Peter Tyrrell, MLIS
> Lead Developer at Andornot
> 1-866-266-2525 x706 / ptyrr...@andornot.com
> 
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: September 13, 2018 8:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Faceting with EnumFieldType in 7.1
> 
> Was the document re-indexed in Solr 7.1?
> 
> Regards,
> Edwin
> 
> On Wed, 12 Sep 2018 at 23:38, Peter Tyrrell  wrote:
> 
>> I updated an older Solr 4.10 core to Solr 7.1 recently. In so doing, I 
>> took an old 'gradeLevel_enum' field of type EnumField and made it an 
>> EnumFieldType, since the former has been deprecated. The old core was 
>> able to facet on gradeLevel_enum, but the new 7.1 core just returns no 
>> facet values whatsoever for that field. Both cores return 
>> gradeLevel_enum values ok when fl=gradeLevel_enum.
>> 
>> In the schema, gradeLevel_enum is defined dynamically:
>> 
>> > multiValued="true" />
>> > enumsConfig="enumsConfig.xml" enumName="gradeLevels">
>> 
>> This simple query fails to return any facet values in 7.1, but does 
>> facet in 4.10:
>> 
>> 
>> http://localhost:8983/solr/core1/select?facet.field=gradeLevel_enum&facet=on&fl=id,gradeLevel_enum&q=*:*&wt=json
>> 
>> Thanks for any insight.
>> 
>> Peter Tyrrell, MLIS
>> Lead Developer at Andornot
>> 1-866-266-2525 x706 /
>> ptyrr...@andornot.com<mailto:ptyrr...@andornot.com>
>> 
>> 



RE: Faceting with EnumFieldType in 7.1

2018-09-20 Thread Peter Tyrrell
I have to assume this is a bug. (EnumFieldType does not return facet values.) I 
didn't want to create a ticket before floating the issue on this mailing list, 
but absent any further guidance or discussion...

So one last shot. Should a field of EnumFieldType be useable as a facet field?

Thanks,

Peter

Peter Tyrrell, MLIS
Lead Developer at Andornot
1-866-266-2525 x706 / ptyrr...@andornot.com

-Original Message-
From: Peter Tyrrell 
Sent: September 14, 2018 3:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Faceting with EnumFieldType in 7.1

Yes.

Peter Tyrrell, MLIS
Lead Developer at Andornot
1-866-266-2525 x706 / ptyrr...@andornot.com

-Original Message-
From: Zheng Lin Edwin Yeo 
Sent: September 13, 2018 8:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Faceting with EnumFieldType in 7.1

Was the document re-indexed in Solr 7.1?

Regards,
Edwin

On Wed, 12 Sep 2018 at 23:38, Peter Tyrrell  wrote:

> I updated an older Solr 4.10 core to Solr 7.1 recently. In so doing, I 
> took an old 'gradeLevel_enum' field of type EnumField and made it an 
> EnumFieldType, since the former has been deprecated. The old core was 
> able to facet on gradeLevel_enum, but the new 7.1 core just returns no 
> facet values whatsoever for that field. Both cores return 
> gradeLevel_enum values ok when fl=gradeLevel_enum.
>
> In the schema, gradeLevel_enum is defined dynamically:
>
>  multiValued="true" />
>  enumsConfig="enumsConfig.xml" enumName="gradeLevels">
>
> This simple query fails to return any facet values in 7.1, but does 
> facet in 4.10:
>
>
> http://localhost:8983/solr/core1/select?facet.field=gradeLevel_enum&facet=on&fl=id,gradeLevel_enum&q=*:*&wt=json
>
> Thanks for any insight.
>
> Peter Tyrrell, MLIS
> Lead Developer at Andornot
> 1-866-266-2525 x706 /
> ptyrr...@andornot.com<mailto:ptyrr...@andornot.com>
>
>


RE: Faceting with EnumFieldType in 7.1

2018-09-14 Thread Peter Tyrrell
Yes.

Peter Tyrrell, MLIS
Lead Developer at Andornot
1-866-266-2525 x706 / ptyrr...@andornot.com

-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: September 13, 2018 8:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Faceting with EnumFieldType in 7.1

Was the document re-indexed in Solr 7.1?

Regards,
Edwin

On Wed, 12 Sep 2018 at 23:38, Peter Tyrrell  wrote:

> I updated an older Solr 4.10 core to Solr 7.1 recently. In so doing, I 
> took an old 'gradeLevel_enum' field of type EnumField and made it an 
> EnumFieldType, since the former has been deprecated. The old core was 
> able to facet on gradeLevel_enum, but the new 7.1 core just returns no 
> facet values whatsoever for that field. Both cores return 
> gradeLevel_enum values ok when fl=gradeLevel_enum.
>
> In the schema, gradeLevel_enum is defined dynamically:
>
>  multiValued="true" />
>  enumsConfig="enumsConfig.xml" enumName="gradeLevels">
>
> This simple query fails to return any facet values in 7.1, but does 
> facet in 4.10:
>
>
> http://localhost:8983/solr/core1/select?facet.field=gradeLevel_enum&facet=on&fl=id,gradeLevel_enum&q=*:*&wt=json
>
> Thanks for any insight.
>
> Peter Tyrrell, MLIS
> Lead Developer at Andornot
> 1-866-266-2525 x706 / 
> ptyrr...@andornot.com<mailto:ptyrr...@andornot.com>
>
>


Re: Faceting with EnumFieldType in 7.1

2018-09-13 Thread Zheng Lin Edwin Yeo
Was the document re-indexed in Solr 7.1?

Regards,
Edwin

On Wed, 12 Sep 2018 at 23:38, Peter Tyrrell  wrote:

> I updated an older Solr 4.10 core to Solr 7.1 recently. In so doing, I
> took an old 'gradeLevel_enum' field of type EnumField and made it an
> EnumFieldType, since the former has been deprecated. The old core was able
> to facet on gradeLevel_enum, but the new 7.1 core just returns no facet
> values whatsoever for that field. Both cores return gradeLevel_enum values
> ok when fl=gradeLevel_enum.
>
> In the schema, gradeLevel_enum is defined dynamically:
>
>  multiValued="true" />
>  enumsConfig="enumsConfig.xml" enumName="gradeLevels">
>
> This simple query fails to return any facet values in 7.1, but does facet
> in 4.10:
>
>
> http://localhost:8983/solr/core1/select?facet.field=gradeLevel_enum&facet=on&fl=id,gradeLevel_enum&q=*:*&wt=json
>
> Thanks for any insight.
>
> Peter Tyrrell, MLIS
> Lead Developer at Andornot
> 1-866-266-2525 x706 / ptyrr...@andornot.com
>
>


Re: Faceting with nested Document

2018-08-11 Thread Mikhail Khludnev
The first two mistakes are:
 - using fq for children fields,
 - using a value master_id:0 as a parents' filter.
Regarding the question, you are getting non-zero facets because your exclude
filter produces empty results.


Re: Faceting over ExternalFileField

2018-05-09 Thread Mikhail Khludnev
The absence of an error looks like a bug to me. The problem is that eff values
are doubles, not strings with ordinals. It would be possible after
https://issues.apache.org/jira/browse/SOLR-10528
For now, you can try to create several type:query subfacets, passing either
{!frange} or just a plain Lucene query (there is a slight chance they work
with eff).
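
As a rough sketch of that workaround, the Python below generates one type:query subfacet per age range using {!frange}. The helper age_bucket_facets, the bucket names and the range edges are invented for illustration, and whether frange actually evaluates against an ExternalFileField here is exactly the open question above, so treat it as something to try rather than a confirmed fix.

```python
import json


def age_bucket_facets(field, edges):
    """Build one type:query subfacet per [low, high) range of `field`.

    Hypothetical helper; bucket names such as "age_0_18" are made up here.
    """
    facets = {}
    for low, high in zip(edges, edges[1:]):
        facets[f"age_{low}_{high}"] = {
            "type": "query",
            # frange: l = lower bound, u = upper bound, incu=false excludes u
            "q": f"{{!frange l={low} u={high} incu=false}}{field}",
        }
    return facets


# Example edges only; pick whatever ranges make sense for the data.
print(json.dumps(age_bucket_facets("eff_age", [0, 18, 40, 65]), indent=2))
```

The resulting object would go in place of the single terms facet in the facet= parameter of the request.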

On Wed, May 9, 2018 at 1:36 PM, Michal Danilák 
wrote:

> Is it possible to facet over ExternalFileField values?
>
> If I have this in my schema.xml:
>
>  keyField="id" defVal="0" stored="false" indexed="false" />
>
> 
>
> And request the following facet:
>
> facet={
> "age": {
> "field": "eff_age",
> "type": "terms",
> "limit": 10
> }
> }
>
> It returns an empty list of buckets.
>
> First, it doesn't throw an error, which means, it should do something.
> Second, if I'm not mistaken, external file fields behave like doc values,
> so it should be possible to facet over them.
>
> Am I doing something wrong? Is there some other way around this?
>
> Thanks
>
> -Michal
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Faceting question

2018-05-02 Thread Shawn Heisey
On 5/2/2018 2:56 PM, Weffelmeyer, Stacie wrote:
> Question on faceting.  We have a dynamicField that we want to facet
> on. Below is the field and the type of information that field generates.
>
>  
>
> cid:image001.png@01D3E22D.DE028870
>

This image is not available.  This mailing list will almost always strip
attachments from email that it receives.

>    
> "*customMetadata*":["{\"controlledContent\":{\"metadata\":{\"programs\":[\"program1\"],\"departments\":[\"department1\"],\"locations\":[\"location1\"],\"functions\":[\"function1\"],\"customTags\":[\"customTag1\",\"customTag2\"],\"corporate\":false,\"redline\":false},\"who\":{\"lastUpdateDate\":\"2018-04-26T14:35:02.268Z\",\"creationDate\":\"2018-04-26T14:35:01.445Z\",\"createdBy\":38853},\"clientOwners\":[38853],\"clientLastUpdateDate\":\"2018-04-25T21:15:06.000Z\",\"clientCreationDate\":\"2018-04-25T20:58:34.000Z\",\"clientContentId\":\"DOC-8030\",\"type\":{\"applicationId\":2574,\"code\":\"WI\",\"name\":\"Work
> Instruction\",\"id\":\"5ac3d4d111570f0047a8ceb9\"},\"status\":\"active\",\"version\":1}}"],
>

I do not know what this is.  It looks a little like JSON.  But if it's
json, there are a lot of escaped quotes in it, and I don't really know
what I'm looking at.

>  
>
> It will always have customMetadata.controlledContent.metadata
>
>  
>
> Then from metadata, it could be anything, which is why it is a
> dynamicField.
>
>  
>
> In this example there is
>
> customMetadata.controlledContent.metadata.programs
>
> customMetadata.controlledContent.metadata.departments
>
> customMetadata.controlledContent.metadata.locations
>

Solr does not have the concept of a nested data type.  So how are you
getting from all that text above to period-delimited strings in a
hierarchy?  If you're using some kind of custom plugin for Solr to have
it support something it doesn't do out of the box, you're probably going
to need to talk to the author of that plugin.

Solr's dynamicField support is only dynamic in the sense that the
precise field name is not found in the schema.  The field name is
dynamic.  When it comes to what's IN the field, it doesn't matter
whether it's a dynamic field or not.

> If I enable faceting, it will do so with the field customMetadata. But
> it doesn’t help because it separates every space as a term.  But
> ideally I want to facet on customMetadata.controlledContent.metadata.
> Doing so brings back no facets.
>
>  
>
> Is this possible?  How can we best accomplish this?
>

We will need to understand exactly what you are indexing, what's in your
schema, the exact query requests you are sending, and what you are
expecting back.

Thanks,
Shawn



Re: Faceting Word Count

2017-11-09 Thread Toke Eskildsen
On Wed, 2017-11-08 at 16:58 +0200, Wael Kader wrote:
> Facets are taking around 1 minute to return data now.

Can you verify whether this is due to updates causing a new searcher to be
opened, or whether it just takes that long? An easy way to test is to stop
updating the index and then make a few calls with different search criteria:
do they all take ~1 minute, or just the first?

- Toke Eskildsen, Royal Danish Library 
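
The check can be scripted roughly as follows. This is a sketch in Python under the assumption that you have some function that sends the facet request; run_query is a placeholder for it, and the factor of 5 is an arbitrary threshold, not a Solr setting.

```python
import time


def time_calls(run_query, criteria_list):
    """Return the elapsed seconds of run_query(c) for each criteria value."""
    timings = []
    for criteria in criteria_list:
        start = time.perf_counter()
        run_query(criteria)  # placeholder: send the facet request here
        timings.append(time.perf_counter() - start)
    return timings


def only_first_is_slow(timings, factor=5.0):
    """True if the first call took at least `factor` times the slowest later call."""
    if len(timings) < 2:
        return False
    return timings[0] >= factor * max(timings[1:])
```

With indexing stopped, a result like [60.0, 1.0, 2.0] would point at searcher warm-up, while [60.0, 55.0, 58.0] would suggest the facet itself is slow for every request.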



Re: Faceting Word Count

2017-11-08 Thread alessandro.benedetti
Apart from the performance, getting a "word cloud" from a subset of documents
is a slightly different problem than getting the facets out of it.

If my understanding is correct, what you want is to extract the "significant
terms" out of your result set.[1]

Using faceting is a rough approximation that may be good enough in your
case.
I second the previous comments and, in addition, I definitely discourage the
term enum approach if you have millions of terms...

[1] https://issues.apache.org/jira/browse/SOLR-9851



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Faceting Word Count

2017-11-08 Thread Wael Kader
Hi,

I want to know the best option for getting a word cloud in Solr.
Is it saving the data as multivalued, using vectors, or JSON faceting (which
didn't work for me)? Terms doesn't work because I can't provide any criteria.

I don't mind changing the design, but I need to know the best feasible way
that won't cause any problems in the long run.
I want to be able to get the word frequency based on some criteria. Facets
are taking around 1 minute to return data now.

Regards,
Wael

On Wed, Nov 8, 2017 at 11:06 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Wael,
> You can try out JSON faceting - it’s not just about rq/resp format, but it
> uses different implementation as well. In any case you will have to index
> documents differently in order to be able to use docValues.
>
> HTH
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Faceting Word Count

2017-11-08 Thread Emir Arnautović
Hi Wael,
You can try out JSON faceting - it’s not just about the request/response
format; it uses a different implementation as well. In any case, you will have
to index documents differently in order to be able to use docValues.

HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
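
For reference, a minimal sketch of what a JSON Facet request body for this
use case might look like. The field names (user_id, created_at, words_ss)
are hypothetical, and the sketch assumes the words have already been indexed
into a multivalued string field with docValues enabled.

```python
import json


def build_word_facet_request(user_id, start, end, field="words_ss", limit=100):
    """Build a JSON request body: filter by user and date range, facet on terms."""
    return {
        "query": "*:*",
        "filter": [
            f"user_id:{user_id}",
            f"created_at:[{start} TO {end}]",
        ],
        "limit": 0,  # no documents needed, only the facet counts
        "facet": {
            "words": {"type": "terms", "field": field, "limit": limit},
        },
    }


body = build_word_facet_request("42", "2017-11-01T00:00:00Z", "2017-11-08T00:00:00Z")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the POST body to the collection's search
handler.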



> On 7 Nov 2017, at 09:26, Wael Kader  wrote:
> 
> Hi,
> 
> The whole index has 100M but when I add the criteria, it will filter the
> data to maybe 10k as a max number of rows.
> The facet isn't working when the total number of records in the index is
> 100M but it was working at 5M.
> 
> I have social media & RSS data in the index and I am trying to get the word
> count for a specific user on specific date intervals.
> 
> Regards,
> Wael

Re: Faceting Word Count

2017-11-07 Thread Erick Erickson
bq: 10k as a max number of rows.

This doesn't matter. In order to facet on the word count, Solr has to
be prepared to facet on all possible docs. For all Solr knows, a
_single_ document may contain every word, so the structure
that contains the counters has to be sized for N buckets, where N
is the total number of distinct words in the entire corpus.

I think you'll really have to find an alternative approach, somehow
restricting the choices, etc.

Best,
Erick
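
The sizing argument above can be illustrated with a toy model. This is not
Solr's actual implementation, only a sketch under the assumption that each
distinct term in the corpus gets one counter slot: the counter array has one
bucket per distinct term no matter how few documents match the query.

```python
def facet_counts(doc_terms, matching_doc_ids):
    """Count, per term, how many of the matching documents contain it.

    doc_terms maps a document id to the list of terms in that document.
    """
    # One ordinal per distinct term across the WHOLE corpus.
    ordinals = {}
    for terms in doc_terms.values():
        for term in terms:
            ordinals.setdefault(term, len(ordinals))

    # N buckets are allocated regardless of how many documents match.
    counters = [0] * len(ordinals)
    for doc_id in matching_doc_ids:
        for term in set(doc_terms[doc_id]):
            counters[ordinals[term]] += 1
    return counters, ordinals


corpus = {1: ["solr", "facet"], 2: ["facet", "cloud"], 3: ["index"]}
counters, ordinals = facet_counts(corpus, matching_doc_ids=[1])
print(len(counters))  # 4 buckets, although only one document matched
```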

On Tue, Nov 7, 2017 at 12:26 AM, Wael Kader  wrote:
> Hi,
>
> The whole index has 100M but when I add the criteria, it will filter the
> data to maybe 10k as a max number of rows.
> The facet isn't working when the total number of records in the index is
> 100M but it was working at 5M.
>
> I have social media & RSS data in the index and I am trying to get the word
> count for a specific user on specific date intervals.
>
> Regards,
> Wael

Re: Faceting Word Count

2017-11-07 Thread Wael Kader
Hi,

The whole index has 100M documents, but when I add the criteria, it filters
the data down to maybe 10k rows at most.
Faceting isn't working now that the total number of records in the index is
100M, but it was working at 5M.

I have social media & RSS data in the index and I am trying to get the word
count for a specific user over specific date intervals.

Regards,
Wael

On Mon, Nov 6, 2017 at 3:42 PM, Erick Erickson 
wrote:

> _Why_ do you want to get the word counts? Faceting on all of the
> tokens for 100M docs isn't something Solr is ordinarily used for. As
> Emir says it'll take a huge amount of memory. You can use one of the
> function queries (termfreq IIRC) that will give you the count of any
> individual term you have and will be very fast.
>
> But getting all of the word counts in the index is probably not
> something I'd use Solr for.
>
> This may be an XY problem, you're asking how to do something specific
> (X) without explaining what the problem you're trying to solve is (Y).
> Perhaps there's another way to accomplish (Y) if we knew more about
> what it is.
>
> Best,
> Erick

Re: Faceting Word Count

2017-11-06 Thread Jokin C
He said that he's using it to get a word cloud. If it's not related to the
search and it's a generic word cloud of the index, using the Luke request
handler to get the top 250 or 500 words could work.

http://localhost:8983/solr/core/admin/luke?fl=text&numTerms=500&wt=json
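
A small sketch of turning the Luke handler's output into (term, count)
pairs for a word cloud. It assumes the JSON response lists a field's
topTerms as a flat array alternating term and count; check the actual
response from your Solr version before relying on that shape. The sample
data is made up.

```python
def top_terms(luke_response, field):
    """Return [(term, count), ...] from a Luke handler response for `field`."""
    # Assumed shape: ["term1", count1, "term2", count2, ...]
    flat = luke_response["fields"][field]["topTerms"]
    return list(zip(flat[0::2], flat[1::2]))


sample = {"fields": {"text": {"topTerms": ["solr", 120, "facet", 80, "cloud", 45]}}}
print(top_terms(sample, "text"))  # [('solr', 120), ('facet', 80), ('cloud', 45)]
```

Note that these are index-wide counts, so this only fits the case where the
word cloud does not depend on a query.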


On Mon, Nov 6, 2017 at 4:42 PM, Erick Erickson 
wrote:

> _Why_ do you want to get the word counts? Faceting on all of the
> tokens for 100M docs isn't something Solr is ordinarily used for. As
> Emir says it'll take a huge amount of memory. You can use one of the
> function queries (termfreq IIRC) that will give you the count of any
> individual term you have and will be very fast.
>
> But getting all of the word counts in the index is probably not
> something I'd use Solr for.
>
> This may be an XY problem, you're asking how to do something specific
> (X) without explaining what the problem you're trying to solve is (Y).
> Perhaps there's another way to accomplish (Y) if we knew more about
> what it is.
>
> Best,
> Erick


Re: Faceting Word Count

2017-11-06 Thread Erick Erickson
_Why_ do you want to get the word counts? Faceting on all of the
tokens for 100M docs isn't something Solr is ordinarily used for. As
Emir says it'll take a huge amount of memory. You can use one of the
function queries (termfreq IIRC) that will give you the count of any
individual term you have and will be very fast.

But getting all of the word counts in the index is probably not
something I'd use Solr for.

This may be an XY problem: you're asking how to do something specific
(X) without explaining what problem you're trying to solve (Y).
Perhaps there's another way to accomplish (Y) if we knew more about
what it is.

Best,
Erick
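
A minimal sketch of the termfreq approach mentioned above, assuming a field
named text and a filter on a hypothetical user_id field. termfreq(field,term)
is evaluated per document, so the request returns the term's frequency in
each matching document under the alias tf. The term is inserted unescaped,
so this only suits simple terms without quotes.

```python
from urllib.parse import urlencode


def termfreq_params(field, term, filters=(), rows=10):
    """Build request parameters that return termfreq() for one term per document."""
    params = [
        ("q", "*:*"),
        ("rows", str(rows)),
        ("fl", f"id,tf:termfreq({field},'{term}')"),
    ]
    params.extend(("fq", fq) for fq in filters)
    return params


params = termfreq_params("text", "solr", filters=["user_id:42"])
query_string = urlencode(params)
print(query_string)
```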



On Mon, Nov 6, 2017 at 4:15 AM, Emir Arnautović
 wrote:
> Hi Wael,
> You are faceting on analyzed field. This results in field being uninverted - 
> fieldValueCache being built - on first call after every commit. This is both 
> time and memory consuming (you can check in admin console in stats how much 
> memory it took).
> What you need to do is to create multivalue string field (not text) and parse 
> values (do analysis steps) on client side and store it like that. This will 
> allow you to enable docValues on that field and avoid building 
> fieldValueCache.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/


Re: Faceting Word Count

2017-11-06 Thread Emir Arnautović
Hi Wael,
You are faceting on an analyzed field. This results in the field being
uninverted - fieldValueCache being built - on the first call after every
commit. This is both time and memory consuming (you can check in the admin
console, under stats, how much memory it took).
What you need to do is create a multivalued string field (not text), parse
the values (do the analysis steps) on the client side, and store them like
that. This will allow you to enable docValues on that field and avoid
building fieldValueCache.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
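
A rough sketch of the client-side analysis suggested above. The stopword
list, the tokenization rule, and the field name words_ss are placeholders,
not the poster's actual analysis chain. The target field is assumed to be
declared as a multivalued string field with docValues="true".

```python
import re

# Placeholder stopword list; a real one would mirror stopwords.txt.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in"})


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, split on non-word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


def to_solr_doc(doc_id, text, field="words_ss"):
    """Build a document whose `field` holds the distinct tokens of `text`."""
    # Facet counts are per document, so duplicates within a document add nothing.
    distinct = list(dict.fromkeys(tokenize(text)))
    return {"id": doc_id, "text": text, field: distinct}


doc = to_solr_doc("1", "The quick fox and the quick dog")
print(doc["words_ss"])  # ['quick', 'fox', 'dog']
```

Keeping only distinct tokens per document is a design choice for word clouds
based on document counts; keep the duplicates if you need something else.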



> On 6 Nov 2017, at 13:06, Wael Kader  wrote:
> 
> Hi,
> 
> I am using a custom field. Below is the field definition.
> I am using this because I don't want stemming.
> 
> 
> positionIncrementGap="100">
>  
> mapping="mapping-ISOLatin1Accent.txt"/>
>
> 
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>protected="protwords.txt"
>generateWordParts="0"
>generateNumberParts="1"
>catenateWords="1"
>catenateNumbers="1"
>catenateAll="0"
>splitOnCaseChange="1"
>preserveOriginal="1"/>
>
> 
>
>  
>  
> mapping="mapping-ISOLatin1Accent.txt"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> 
>protected="protwords.txt"
>generateWordParts="0"
>catenateWords="0"
>catenateNumbers="0"
>catenateAll="0"
>splitOnCaseChange="1"
>preserveOriginal="1"/>
>
>
>
>
>  
>
> 
> 
> Regards,
> Wael



Re: Faceting Word Count

2017-11-06 Thread Wael Kader
Hi,

I am using a custom field. Below is the field definition.
I am using this because I don't want stemming.



  








  
  










  



Regards,
Wael

On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Wael,
> Can you provide your field definition and sample query.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/


-- 
Regards,
Wael


Re: Faceting Word Count

2017-11-06 Thread Emir Arnautović
Hi Wael,
Can you provide your field definition and a sample query?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 6 Nov 2017, at 08:30, Wael Kader  wrote:
> 
> Hello,
> 
> I am having an index with around 100 Million documents.
> I have a multivalued column that I am saving big chunks of text data in. It
> has around 20 GB of RAM and 4 CPU's.
> 
> I was doing faceting on it to get word cloud but it was taking around 1
> second to retrieve when the data was 5-10 Million .
> Now I have more data and its taking minutes to get the results (that is if
> it gets it and SOLR doesn't crash). Whats the best way to make it run or
> maybe its not scalable to make it run on my current schema and design with
> News articles.
> 
> I am looking to find the best solution for this. Maybe create another index
> to split the data while inserting it or maybe if I change some settings in
> SolrConfig or add some RAM, it would perform better.
> 
> -- 
> Regards,
> Wael



Re: Faceting and Grouping Performance Degradation in Solr 5

2017-02-06 Thread Solr User
I am pleased to report that we are in Production on Solr 5.5.3 with
comparable performance to Solr 4.8.1 through leveraging facet.method=uif as
well as https://issues.apache.org/jira/browse/SOLR-9176.  Thanks to
everyone who worked on these!

On Mon, Oct 3, 2016 at 3:55 PM, Solr User  wrote:

> Below is some further testing.  This was done in an environment that had
> no other queries or updates during testing.  We ran through several
> scenarios so I pasted this with HTML formatting below so you may view this
> as a table.  Sorry if you have to pull this out into a different file for
> viewing, but I did not want the formatting to be messed up.  The times are
> average times in milliseconds.  Same test methodology as above except there
> was a 5 minute warmup and a 15 minute test.
>
> Note that both the segment and deletions were recorded from only 1 out of
> 2 of the shards so we cannot try to extrapolate a function between them and
> the outcome.  In other words, just view them as "non-optimized" versus
> "optimized" and "has deletions" versus "no deletions".  The only exceptions
> are the 0 deletes were true for both shards and the 1 segment and 8 segment
> cases were true for both shards.  A few of the tests were repeated as well.
>
> The only conclusion that I could draw is that the number of segments and
> the number of deletes appear to greatly influence the response times, at
> least more than any difference in Solr version.  There also appears to be
> some external contributor to variance (maybe network, etc.).
>
> Thoughts?
>
>
> Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
> 9/29/2016  5.5.2         57873         34             YES               198          92
> 9/29/2016  5.5.2         57873         34             YES               210          88
> 9/29/2016  4.8.1         176958        18             N/A               145          59
> 9/30/2016  4.8.1         593694        27             N/A               186          62
> 9/30/2016  4.8.1         593694        27             N/A               190          58
> 9/30/2016  5.5.2         57873         34             YES               208          72
> 9/30/2016  5.5.2         57873         34             YES               209          70
> 9/30/2016  5.5.2         57873         34             NO                210          77
> 9/30/2016  5.5.2         57873         34             NO                206          74
> 9/30/2016  5.5.2         0             8              NO                109          68
> 9/30/2016  5.5.2         0             8              YES               142          73
> 9/30/2016  5.5.2         0             1              YES               73           63
> 9/30/2016  5.5.2         0             1              NO                70           61
> 10/3/2016  4.8.1         0             8              N/A               160          66
> 10/3/2016  4.8.1         0             8              N/A               109          54
> 10/3/2016  4.8.1         0             1              N/A               83           52
> 10/3/2016  4.8.1         0             1              N/A               85           51
>
>
>
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:
>
>> I plan to re-test this in a separate environment that I have more control
>> over and will share the results when I can.
>>
>> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>>
>>> Certainly.  And I would of course welcome anyone else to test this for
>>> themselves especially with facet.method=uif to see if that has indeed
>>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>>> testing is invalid due to variance, problem in process, etc.  One thing I
>>> was pondering is if I should force merge the index to a certain amount of
>>> segments because indexing yields a random number of segments and
>>> deletions.  The only thing stopping me short of doing that were
>>> observations of longer Solr 4 times even with more deletions and similar
>>> number of segments.
>>>
>>> We use Soasta as our testing tool.  Before testing, load is sent for
>>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>>> with input being pulled from data files.  The requests are repeatable test
>>> to test.
>>>
>>> The numbers posted above are average response times as reported by
>>> Soasta.  However, respective time differences are supported by Splunk which
>>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>>> JVM's.
>>>
>>> The versions are deployed to the same machines thereby overlaying the
>>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>>> of indexing all documents and then deleting any that were not touched.
>>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>>> results as the previous Solr 4 test.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>>> wrote:
>>>
>>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>>> > Further testing indicates that any performance difference is not due
>>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>>> > deletes.
>>>>
>>>> Sanity check: Could you describe how you test?
>>>>
>>>> * How many queries do you issue for each test?
>>>> * Is each query a new one or do you re-use the same query?
>>>> * Do you discard the first X calls?
>>>> * Are the numbers averages, medians or something else?
>>>> * What do you do about disk cache?
>>>> * Are both Solrs on the same machine?
>>>> * Do they 

Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-05 Thread Zheng Lin Edwin Yeo
Thanks for the information.
Will try them out.

Regards,
Edwin

On 5 October 2016 at 02:58, Mikhail Khludnev  wrote:

> Edwin,
> It seems you are trying to pull the document hierarchy back. That's usually done
> by searching parents and fl=[child ..],,.
>
> On Tue, Oct 4, 2016 at 5:22 PM, Zheng Lin Edwin Yeo 
> wrote:
>
> > Some of the sample documents are like the following:
> >
> > Author is the Header, while Books are the Child
> >
> > Author: Edwin
> > Books: Book 1
> >Book 2
> >Book 3
> >
> > Author: John
> > Books: Book 4
> >Book 5
> >
> > For this query:
> >
> > http://localhost:8983/solr/collection1/select?q=*:*
> > ={
> >author:{
> > type:terms,
> > field:author_s,
> >   domain: { blockParent : "type_s:author" }
> >},
> >books:{
> > type:terms,
> > field:book_s,
> > domain: { blockChild : "type_s:book" }
> >   }
> > }=null=0
> >
> > I'll get the following results:
> >
> > "facets":{
> > "count":2,
> > "author":{
> >   "buckets":[{
> >   "val":"Edwin",
> >   "count":1,
> >   "books":{
> > "buckets":[]}},
> > {
> >   "val":"John",
> >   "count":1,
> >   "books":{
> > "buckets":[]}},
> >
> > I can't manage to get the list of books to be displayed in the buckets
> for
> > books.
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 October 2016 at 19:29, Yonik Seeley  wrote:
> >
> > > Perhaps show a couple sample documents, and then what data you're
> > > looking for in a response?
> > > This stuff can be tough to pin down without concrete examples.
> > >
> > > -Yonik
> > >
> > >
> > > On Tue, Oct 4, 2016 at 5:22 AM, Zheng Lin Edwin Yeo
> > >  wrote:
> > > > I have tried to use this nested query, but I still can't get results
> > for
> > > > the list of books.
> > > >
> > > > http://localhost:8983/solr/collection1/select?q=*:*
> > > > ={
> > > >items:{
> > > >   type:terms,
> > > >   field:author_s,
> > > >  domain: { blockParent : "type_s:author" },
> > > >  facet:{
> > > > by1:{
> > > > type:terms,
> > > > field:book_s,
> > > > domain: { blockChild : "type_s:book" }
> > > > }
> > > > }
> > > >  }
> > > >}
> > > > }=null=0
> > > >
> > > >
> > > > > Only when I dropped the nesting and queried the two facets
> > > > > individually, as below, did I manage to get a result.
> > > >
> > > > http://localhost:8983/solr/collection1/select?q=*:*
> > > > ={
> > > >items:{
> > > > type:terms,
> > > > field:author_s,
> > > >   domain: { blockParent : "type_s:author" }
> > > >},
> > > >by1:{
> > > > type:terms,
> > > > field:book_s,
> > > > domain: { blockChild : "type_s:book" }
> > > >   }
> > > > }=null=0
> > > >
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 4 October 2016 at 15:22, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > > > wrote:
> > > >
> > > >> You need to switch the domain to the child records. It is somewhere
> in
> > > the
> > > >> guide or Yonik's blog linked.
> > > >>
> > > >> Regards,
> > > >>Alex
> > > >>
> > > >> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo" 
> > > wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > Is it possible to do nested faceting on both records in parent and
> > > child
> > > >> in
> > > >> > a single query?
> > > >> >
> > > >> > For example, I want to facet both author_s and book_s. Author is
> > > indexed
> > > >> as
> > > >> > a parent, whereas Book is indexed as a child.
> > > >> >
> > > >> > I tried the following JSON Facet query, which is to do a facet of
> > all
> > > the
> > > >> > list of author (in the parent), followed by a facet of all the
> list
> > of
> > > >> > books (in the child) that are written by the author.
> > > >> >
> > > >> > http://localhost:8983/solr/collection1/select?q=*:*
> > > >> > ={
> > > >> >items:{
> > > >> >   type:terms,
> > > >> >   field:author_s,
> > > >> >  facet:{
> > > >> > by1:{
> > > >> > type:terms,
> > > >> > field:book_s
> > > >> > }
> > > >> > }
> > > >> >  }
> > > >> >}
> > > >> > }=null=0
> > > >> >
> > > >> >
> > > >> > However, it only managed to return me the facet of the list of
> > > author. I
> > > >> > could not get any results for the list of books. Is this possible
> to
> > > be
> > > >> > done, or what could be wrong with my query?
> > > >> >
> > > >> >
> > > >> > Regards,
> > > >> > Edwin
> > > >> >
> > > >>
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Mikhail Khludnev
Edwin,
It seems you are trying to pull the document hierarchy back. That's usually done
by searching parents and fl=[child ..],,.
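
A minimal sketch of that approach, assuming the [child] document transformer
with its parentFilter parameter and the type_s:author parent marker used
elsewhere in this thread.  The collection URL is a placeholder.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own host and collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def hierarchy_params(parent_filter):
    """Search parent documents and attach their children to each hit."""
    return {
        "q": parent_filter,
        # parentFilter must match every parent document in the index.
        "fl": "*,[child parentFilter=" + parent_filter + "]",
        "wt": "json",
    }


params = hierarchy_params("type_s:author")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

This returns each author with its books attached as child documents, rather
than as facet buckets, so it fits when the goal is to list the hierarchy and
not to count terms.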

On Tue, Oct 4, 2016 at 5:22 PM, Zheng Lin Edwin Yeo 
wrote:

> Some of the sample documents are like the following:
>
> Author is the Header, while Books are the Child
>
> Author: Edwin
> Books: Book 1
>Book 2
>Book 3
>
> Author: John
> Books: Book 4
>Book 5
>
> For this query:
>
> http://localhost:8983/solr/collection1/select?q=*:*
> ={
>author:{
> type:terms,
> field:author_s,
>   domain: { blockParent : "type_s:author" }
>},
>books:{
> type:terms,
> field:book_s,
> domain: { blockChild : "type_s:book" }
>   }
> }=null=0
>
> I'll get the following results:
>
> "facets":{
> "count":2,
> "author":{
>   "buckets":[{
>   "val":"Edwin",
>   "count":1,
>   "books":{
> "buckets":[]}},
> {
>   "val":"John",
>   "count":1,
>   "books":{
> "buckets":[]}},
>
> I can't manage to get the list of books to be displayed in the buckets for
> books.
>
>
> Regards,
> Edwin
>
>
> On 4 October 2016 at 19:29, Yonik Seeley  wrote:
>
> > Perhaps show a couple sample documents, and then what data you're
> > looking for in a response?
> > This stuff can be tough to pin down without concrete examples.
> >
> > -Yonik
> >
> >
> > On Tue, Oct 4, 2016 at 5:22 AM, Zheng Lin Edwin Yeo
> >  wrote:
> > > I have tried to use this nested query, but I still can't get results
> for
> > > the list of books.
> > >
> > > http://localhost:8983/solr/collection1/select?q=*:*
> > > ={
> > >items:{
> > >   type:terms,
> > >   field:author_s,
> > >  domain: { blockParent : "type_s:author" },
> > >  facet:{
> > > by1:{
> > > type:terms,
> > > field:book_s,
> > > domain: { blockChild : "type_s:book" }
> > > }
> > > }
> > >  }
> > >}
> > > }=null=0
> > >
> > >
> > > > Only when I dropped the nesting and queried the two facets
> > > > individually, as below, did I manage to get a result.
> > >
> > > http://localhost:8983/solr/collection1/select?q=*:*
> > > ={
> > >items:{
> > > type:terms,
> > > field:author_s,
> > >   domain: { blockParent : "type_s:author" }
> > >},
> > >by1:{
> > > type:terms,
> > > field:book_s,
> > > domain: { blockChild : "type_s:book" }
> > >   }
> > > }=null=0
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 4 October 2016 at 15:22, Alexandre Rafalovitch 
> > > wrote:
> > >
> > >> You need to switch the domain to the child records. It is somewhere in
> > the
> > >> guide or Yonik's blog linked.
> > >>
> > >> Regards,
> > >>Alex
> > >>
> > >> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo" 
> > wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > Is it possible to do nested faceting on both records in parent and
> > child
> > >> in
> > >> > a single query?
> > >> >
> > >> > For example, I want to facet both author_s and book_s. Author is
> > indexed
> > >> as
> > >> > a parent, whereas Book is indexed as a child.
> > >> >
> > >> > I tried the following JSON Facet query, which is to do a facet of
> all
> > the
> > >> > list of author (in the parent), followed by a facet of all the list
> of
> > >> > books (in the child) that are written by the author.
> > >> >
> > >> > http://localhost:8983/solr/collection1/select?q=*:*
> > >> > ={
> > >> >items:{
> > >> >   type:terms,
> > >> >   field:author_s,
> > >> >  facet:{
> > >> > by1:{
> > >> > type:terms,
> > >> > field:book_s
> > >> > }
> > >> > }
> > >> >  }
> > >> >}
> > >> > }=null=0
> > >> >
> > >> >
> > >> > However, it only managed to return me the facet of the list of
> > author. I
> > >> > could not get any results for the list of books. Is this possible to
> > be
> > >> > done, or what could be wrong with my query?
> > >> >
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >>
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Alexandre Rafalovitch
I _think_ what is happening is that you are going in both parent and
child directions in your filters.

Try making your query ('q') define your original domain
(q=type_s:author).  Then nest 'books' inside the parent "author"
facet, and change the domain there.
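
A rough sketch of a request shaped that way.  It assumes the JSON Facet
domain option is spelled blockChildren and takes the filter that matches
all parent documents; the thread itself writes blockChild with a child
filter, so check the spelling and semantics against the reference guide
for your Solr version.  The URL is a placeholder.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own host and collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def nested_author_book_facet():
    """Facet on authors, then switch to their child docs to facet on books."""
    return {
        "author": {
            "type": "terms",
            "field": "author_s",
            "facet": {
                "books": {
                    "type": "terms",
                    "field": "book_s",
                    # Assumption: blockChildren maps each author bucket to its
                    # children; the value is the filter matching all parents.
                    "domain": {"blockChildren": "type_s:author"},
                }
            },
        }
    }


params = {
    "q": "type_s:author",  # the query defines the original (parent) domain
    "rows": 0,
    "json.facet": json.dumps(nested_author_book_facet()),
}
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

The point of the shape is that only one domain switch happens, from parents
down to children, inside the nested facet.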

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 4 October 2016 at 21:22, Zheng Lin Edwin Yeo  wrote:
> Some of the sample documents are like the following:
>
> Author is the Header, while Books are the Child
>
> Author: Edwin
> Books: Book 1
>Book 2
>Book 3
>
> Author: John
> Books: Book 4
>Book 5
>
> For this query:
>
> http://localhost:8983/solr/collection1/select?q=*:*
> ={
>author:{
> type:terms,
> field:author_s,
>   domain: { blockParent : "type_s:author" }
>},
>books:{
> type:terms,
> field:book_s,
> domain: { blockChild : "type_s:book" }
>   }
> }=null=0
>
> I'll get the following results:
>
> "facets":{
> "count":2,
> "author":{
>   "buckets":[{
>   "val":"Edwin",
>   "count":1,
>   "books":{
> "buckets":[]}},
> {
>   "val":"John",
>   "count":1,
>   "books":{
> "buckets":[]}},
>
> I can't manage to get the list of books to be displayed in the buckets for
> books.
>
>
> Regards,
> Edwin
>
>
> On 4 October 2016 at 19:29, Yonik Seeley  wrote:
>
>> Perhaps show a couple sample documents, and then what data you're
>> looking for in a response?
>> This stuff can be tough to pin down without concrete examples.
>>
>> -Yonik
>>
>>
>> On Tue, Oct 4, 2016 at 5:22 AM, Zheng Lin Edwin Yeo
>>  wrote:
>> > I have tried to use this nested query, but I still can't get results for
>> > the list of books.
>> >
>> > http://localhost:8983/solr/collection1/select?q=*:*
>> > ={
>> >items:{
>> >   type:terms,
>> >   field:author_s,
>> >  domain: { blockParent : "type_s:author" },
>> >  facet:{
>> > by1:{
>> > type:terms,
>> > field:book_s,
>> > domain: { blockChild : "type_s:book" }
>> > }
>> > }
>> >  }
>> >}
>> > }=null=0
>> >
>> >
>> > Only when I dropped the nesting and queried the two facets
>> > individually, as below, did I manage to get a result.
>> >
>> > http://localhost:8983/solr/collection1/select?q=*:*
>> > ={
>> >items:{
>> > type:terms,
>> > field:author_s,
>> >   domain: { blockParent : "type_s:author" }
>> >},
>> >by1:{
>> > type:terms,
>> > field:book_s,
>> > domain: { blockChild : "type_s:book" }
>> >   }
>> > }=null=0
>> >
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 4 October 2016 at 15:22, Alexandre Rafalovitch 
>> > wrote:
>> >
>> >> You need to switch the domain to the child records. It is somewhere in
>> the
>> >> guide or Yonik's blog linked.
>> >>
>> >> Regards,
>> >>Alex
>> >>
>> >> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo" 
>> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > Is it possible to do nested faceting on both records in parent and
>> child
>> >> in
>> >> > a single query?
>> >> >
>> >> > For example, I want to facet both author_s and book_s. Author is
>> indexed
>> >> as
>> >> > a parent, whereas Book is indexed as a child.
>> >> >
>> >> > I tried the following JSON Facet query, which is to do a facet of all
>> the
>> >> > list of author (in the parent), followed by a facet of all the list of
>> >> > books (in the child) that are written by the author.
>> >> >
>> >> > http://localhost:8983/solr/collection1/select?q=*:*
>> >> > ={
>> >> >items:{
>> >> >   type:terms,
>> >> >   field:author_s,
>> >> >  facet:{
>> >> > by1:{
>> >> > type:terms,
>> >> > field:book_s
>> >> > }
>> >> > }
>> >> >  }
>> >> >}
>> >> > }=null=0
>> >> >
>> >> >
>> >> > However, it only managed to return me the facet of the list of
>> author. I
>> >> > could not get any results for the list of books. Is this possible to
>> be
>> >> > done, or what could be wrong with my query?
>> >> >
>> >> >
>> >> > Regards,
>> >> > Edwin
>> >> >
>> >>
>>


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Zheng Lin Edwin Yeo
Some of the sample documents are like the following:

Author is the parent (header) document, while Books are the children

Author: Edwin
Books: Book 1
   Book 2
   Book 3

Author: John
Books: Book 4
   Book 5

For this query:

http://localhost:8983/solr/collection1/select?q=*:*
={
   author:{
type:terms,
field:author_s,
  domain: { blockParent : "type_s:author" }
   },
   books:{
type:terms,
field:book_s,
domain: { blockChild : "type_s:book" }
  }
}=null=0

I'll get the following results:

"facets":{
"count":2,
"author":{
  "buckets":[{
  "val":"Edwin",
  "count":1,
  "books":{
"buckets":[]}},
{
  "val":"John",
  "count":1,
  "books":{
"buckets":[]}},

I can't manage to get the list of books to be displayed in the buckets for
books.
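
For reference, a sketch of how the sample documents above might be laid out
as parent/child blocks for indexing.  It assumes the _childDocuments_ key of
Solr's JSON update format; the id values are made up for illustration, while
type_s, author_s and book_s follow the queries in this thread.

```python
import json


def author_block(author_id, author, books):
    """Build one parent document with its books nested as child documents."""
    return {
        "id": author_id,  # hypothetical id scheme
        "type_s": "author",
        "author_s": author,
        "_childDocuments_": [
            {"id": f"{author_id}-book-{i}", "type_s": "book", "book_s": title}
            for i, title in enumerate(books, start=1)
        ],
    }


docs = [
    author_block("a1", "Edwin", ["Book 1", "Book 2", "Book 3"]),
    author_block("a2", "John", ["Book 4", "Book 5"]),
]
payload = json.dumps(docs, indent=2)  # body for a JSON update request
print(payload)
```

Block join queries and domain switches only work when parents and children
are indexed together as a block like this.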


Regards,
Edwin


On 4 October 2016 at 19:29, Yonik Seeley  wrote:

> Perhaps show a couple sample documents, and then what data you're
> looking for in a response?
> This stuff can be tough to pin down without concrete examples.
>
> -Yonik
>
>
> On Tue, Oct 4, 2016 at 5:22 AM, Zheng Lin Edwin Yeo
>  wrote:
> > I have tried to use this nested query, but I still can't get results for
> > the list of books.
> >
> > http://localhost:8983/solr/collection1/select?q=*:*
> > ={
> >items:{
> >   type:terms,
> >   field:author_s,
> >  domain: { blockParent : "type_s:author" },
> >  facet:{
> > by1:{
> > type:terms,
> > field:book_s,
> > domain: { blockChild : "type_s:book" }
> > }
> > }
> >  }
> >}
> > }=null=0
> >
> >
> > Only when I dropped the nesting and queried the two facets
> > individually, as below, did I manage to get a result.
> >
> > http://localhost:8983/solr/collection1/select?q=*:*
> > ={
> >items:{
> > type:terms,
> > field:author_s,
> >   domain: { blockParent : "type_s:author" }
> >},
> >by1:{
> > type:terms,
> > field:book_s,
> > domain: { blockChild : "type_s:book" }
> >   }
> > }=null=0
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 October 2016 at 15:22, Alexandre Rafalovitch 
> > wrote:
> >
> >> You need to switch the domain to the child records. It is somewhere in
> the
> >> guide or Yonik's blog linked.
> >>
> >> Regards,
> >>Alex
> >>
> >> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo" 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > Is it possible to do nested faceting on both records in parent and
> child
> >> in
> >> > a single query?
> >> >
> >> > For example, I want to facet both author_s and book_s. Author is
> indexed
> >> as
> >> > a parent, whereas Book is indexed as a child.
> >> >
> >> > I tried the following JSON Facet query, which is to do a facet of all
> the
> >> > list of author (in the parent), followed by a facet of all the list of
> >> > books (in the child) that are written by the author.
> >> >
> >> > http://localhost:8983/solr/collection1/select?q=*:*
> >> > ={
> >> >items:{
> >> >   type:terms,
> >> >   field:author_s,
> >> >  facet:{
> >> > by1:{
> >> > type:terms,
> >> > field:book_s
> >> > }
> >> > }
> >> >  }
> >> >}
> >> > }=null=0
> >> >
> >> >
> >> > However, it only managed to return me the facet of the list of
> author. I
> >> > could not get any results for the list of books. Is this possible to
> be
> >> > done, or what could be wrong with my query?
> >> >
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >>
>


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Yonik Seeley
Perhaps show a couple sample documents, and then what data you're
looking for in a response?
This stuff can be tough to pin down without concrete examples.

-Yonik


On Tue, Oct 4, 2016 at 5:22 AM, Zheng Lin Edwin Yeo
 wrote:
> I have tried to use this nested query, but I still can't get results for
> the list of books.
>
> http://localhost:8983/solr/collection1/select?q=*:*
> ={
>items:{
>   type:terms,
>   field:author_s,
>  domain: { blockParent : "type_s:author" },
>  facet:{
> by1:{
> type:terms,
> field:book_s,
> domain: { blockChild : "type_s:book" }
> }
> }
>  }
>}
> }=null=0
>
>
> Only when I dropped the nesting and queried the two facets
> individually, as below, did I manage to get a result.
>
> http://localhost:8983/solr/collection1/select?q=*:*
> ={
>items:{
> type:terms,
> field:author_s,
>   domain: { blockParent : "type_s:author" }
>},
>by1:{
> type:terms,
> field:book_s,
> domain: { blockChild : "type_s:book" }
>   }
> }=null=0
>
>
> Regards,
> Edwin
>
>
> On 4 October 2016 at 15:22, Alexandre Rafalovitch 
> wrote:
>
>> You need to switch the domain to the child records. It is somewhere in the
>> guide or Yonik's blog linked.
>>
>> Regards,
>>Alex
>>
>> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo"  wrote:
>>
>> > Hi,
>> >
>> > Is it possible to do nested faceting on both records in parent and child
>> in
>> > a single query?
>> >
>> > For example, I want to facet both author_s and book_s. Author is indexed
>> as
>> > a parent, whereas Book is indexed as a child.
>> >
>> > I tried the following JSON Facet query, which is to do a facet of all the
>> > list of author (in the parent), followed by a facet of all the list of
>> > books (in the child) that are written by the author.
>> >
>> > http://localhost:8983/solr/collection1/select?q=*:*
>> > ={
>> >items:{
>> >   type:terms,
>> >   field:author_s,
>> >  facet:{
>> > by1:{
>> > type:terms,
>> > field:book_s
>> > }
>> > }
>> >  }
>> >}
>> > }=null=0
>> >
>> >
>> > However, it only managed to return me the facet of the list of author. I
>> > could not get any results for the list of books. Is this possible to be
>> > done, or what could be wrong with my query?
>> >
>> >
>> > Regards,
>> > Edwin
>> >
>>


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Zheng Lin Edwin Yeo
I have tried to use this nested query, but I still can't get results for
the list of books.

http://localhost:8983/solr/collection1/select?q=*:*
={
   items:{
  type:terms,
  field:author_s,
 domain: { blockParent : "type_s:author" },
 facet:{
by1:{
type:terms,
field:book_s,
domain: { blockChild : "type_s:book" }
}
}
 }
   }
}=null=0


Only when I dropped the nesting and queried the two facets
individually, as below, did I manage to get a result.

http://localhost:8983/solr/collection1/select?q=*:*
={
   items:{
type:terms,
field:author_s,
  domain: { blockParent : "type_s:author" }
   },
   by1:{
type:terms,
field:book_s,
domain: { blockChild : "type_s:book" }
  }
}=null=0


Regards,
Edwin


On 4 October 2016 at 15:22, Alexandre Rafalovitch 
wrote:

> You need to switch the domain to the child records. It is somewhere in the
> guide or Yonik's blog linked.
>
> Regards,
>Alex
>
> On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo"  wrote:
>
> > Hi,
> >
> > Is it possible to do nested faceting on both records in parent and child
> in
> > a single query?
> >
> > For example, I want to facet both author_s and book_s. Author is indexed
> as
> > a parent, whereas Book is indexed as a child.
> >
> > I tried the following JSON Facet query, which is to do a facet of all the
> > list of author (in the parent), followed by a facet of all the list of
> > books (in the child) that are written by the author.
> >
> > http://localhost:8983/solr/collection1/select?q=*:*
> > ={
> >items:{
> >   type:terms,
> >   field:author_s,
> >  facet:{
> > by1:{
> > type:terms,
> > field:book_s
> > }
> > }
> >  }
> >}
> > }=null=0
> >
> >
> > However, it only managed to return me the facet of the list of author. I
> > could not get any results for the list of books. Is this possible to be
> > done, or what could be wrong with my query?
> >
> >
> > Regards,
> > Edwin
> >
>


Re: Faceting on both Parent and Child records in Block Join Query Parser

2016-10-04 Thread Alexandre Rafalovitch
You need to switch the domain to the child records. It is somewhere in the
guide or Yonik's blog linked.

Regards,
   Alex

On 4 Oct 2016 1:55 PM, "Zheng Lin Edwin Yeo"  wrote:

> Hi,
>
> Is it possible to do nested faceting on both records in parent and child in
> a single query?
>
> For example, I want to facet both author_s and book_s. Author is indexed as
> a parent, whereas Book is indexed as a child.
>
> I tried the following JSON Facet query, which is meant to facet on the
> list of authors (in the parent), followed by a facet on the list of
> books (in the child) that are written by each author.
>
> http://localhost:8983/solr/collection1/select?q=*:*
> ={
>items:{
>   type:terms,
>   field:author_s,
>  facet:{
> by1:{
> type:terms,
> field:book_s
> }
> }
>  }
>}
> }=null=0
>
>
> However, it only returned the facet for the list of authors; I could
> not get any results for the list of books. Can this be done, or what
> could be wrong with my query?
>
>
> Regards,
> Edwin
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-10-03 Thread Solr User
Below is some further testing.  This was done in an environment that had no
other queries or updates during testing.  We ran through several scenarios
so I pasted this with HTML formatting below so you may view this as a
table.  Sorry if you have to pull this out into a different file for
viewing, but I did not want the formatting to be messed up.  The times are
average times in milliseconds.  Same test methodology as above except there
was a 5 minute warmup and a 15 minute test.

Note that the segment and deletion counts were recorded from only 1 of the 2
shards, so we cannot extrapolate a function between them and
the outcome.  In other words, just view them as "non-optimized" versus
"optimized" and "has deletions" versus "no deletions".  The only exceptions
are that the 0-delete cases and the 1-segment and 8-segment cases held
for both shards.  A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and
the number of deletes appear to greatly influence the response times, at
least more than any difference in Solr version.  There also appears to be
some external contributor to variance (maybe network, etc.).

Thoughts?


Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
9/29/2016  5.5.2         57873         34             YES               198          92
9/29/2016  5.5.2         57873         34             YES               210          88
9/29/2016  4.8.1         176958        18             N/A               145          59
9/30/2016  4.8.1         593694        27             N/A               186          62
9/30/2016  4.8.1         593694        27             N/A               190          58
9/30/2016  5.5.2         57873         34             YES               208          72
9/30/2016  5.5.2         57873         34             YES               209          70
9/30/2016  5.5.2         57873         34             NO                210          77
9/30/2016  5.5.2         57873         34             NO                206          74
9/30/2016  5.5.2         0             8              NO                109          68
9/30/2016  5.5.2         0             8              YES               142          73
9/30/2016  5.5.2         0             1              YES               73           63
9/30/2016  5.5.2         0             1              NO                70           61
10/3/2016  4.8.1         0             8              N/A               160          66
10/3/2016  4.8.1         0             8              N/A               109          54
10/3/2016  4.8.1         0             1              N/A               83           52
10/3/2016  4.8.1         0             1              N/A               85           51
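
To make the segment-count effect easier to see, here is a small sketch that
averages the posted times by segment count.  The numbers are the table values
as read from the flattened paste, so the column splits are a best-effort
reading, and the three bucket names are my own grouping.

```python
from collections import defaultdict
from statistics import mean

# (segment count, Scenario #1 avg ms, Scenario #2 avg ms), one tuple per run,
# in the order the runs appear in the table.
RUNS = [
    (34, 198, 92), (34, 210, 88), (18, 145, 59), (27, 186, 62), (27, 190, 58),
    (34, 208, 72), (34, 209, 70), (34, 210, 77), (34, 206, 74),
    (8, 109, 68), (8, 142, 73), (1, 73, 63), (1, 70, 61),
    (8, 160, 66), (8, 109, 54), (1, 83, 52), (1, 85, 51),
]


def bucket(segments):
    """Group runs into the three index states seen in the table."""
    if segments == 1:
        return "1 segment"
    if segments == 8:
        return "8 segments"
    return "18+ segments"


def mean_times_by_bucket(runs):
    """Return {bucket: (mean Scenario #1 ms, mean Scenario #2 ms)}."""
    grouped = defaultdict(list)
    for segments, s1, s2 in runs:
        grouped[bucket(segments)].append((s1, s2))
    return {
        name: (mean(s1 for s1, _ in rows), mean(s2 for _, s2 in rows))
        for name, rows in grouped.items()
    }


summary = mean_times_by_bucket(RUNS)
for name, (s1, s2) in summary.items():
    print(f"{name}: scenario 1 = {s1:.1f} ms, scenario 2 = {s2:.1f} ms")
```

The grouping mixes Solr versions and facet methods on purpose, since the
claim being illustrated is that segment count matters more than either.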




On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.
>
> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>
>> Certainly.  And I would of course welcome anyone else to test this for
>> themselves especially with facet.method=uif to see if that has indeed
>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>> testing is invalid due to variance, problem in process, etc.  One thing I
>> was pondering is if I should force merge the index to a certain amount of
>> segments because indexing yields a random number of segments and
>> deletions.  The only thing stopping me short of doing that were
>> observations of longer Solr 4 times even with more deletions and similar
>> number of segments.
>>
>> We use Soasta as our testing tool.  Before testing, load is sent for
>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>> with input being pulled from data files.  The requests are repeatable test
>> to test.
>>
>> The numbers posted above are average response times as reported by
>> Soasta.  However, respective time differences are supported by Splunk which
>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>> JVM's.
>>
>> The versions are deployed to the same machines thereby overlaying the
>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>> of indexing all documents and then deleting any that were not touched.
>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>> results as the previous Solr 4 test.
>>
>>
>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>> wrote:
>>
>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>> > Further testing indicates that any performance difference is not due
>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>> > deletes.
>>>
>>> Sanity check: Could you describe how you test?
>>>
>>> * How many queries do you issue for each test?
>>> * Is each query a new one or do you re-use the same query?
>>> * Do you discard the first X calls?
>>> * Are the numbers averages, medians or something else?
>>> * What do you do about disk cache?
>>> * Are both Solrs on the same machine?
>>> * Do they use the same index?
>>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>
>>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
I plan to re-test this in a separate environment that I have more control
over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:

> Certainly.  And I would of course welcome anyone else to test this for
> themselves especially with facet.method=uif to see if that has indeed
> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
> testing is invalid due to variance, problem in process, etc.  One thing I
> was pondering is if I should force merge the index to a certain amount of
> segments because indexing yields a random number of segments and
> deletions.  The only thing stopping me short of doing that were
> observations of longer Solr 4 times even with more deletions and similar
> number of segments.
>
> We use Soasta as our testing tool.  Before testing, load is sent for 10-15
> minutes to make sure any Solr caches have stabilized.  Then the test is run
> for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and
> Scenario #2 tested at 100 req/sec.  Each request is different with input
> being pulled from data files.  The requests are repeatable test to test.
>
> The numbers posted above are average response times as reported by
> Soasta.  However, respective time differences are supported by Splunk which
> indexes the Solr logs and Dynatrace which is instrumented on one of the
> JVM's.
>
> The versions are deployed to the same machines thereby overlaying the
> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
> the same input data.  Being in SolrCloud mode, the full indexing comprises
> of indexing all documents and then deleting any that were not touched.
> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
> results as the previous Solr 4 test.
>
>
> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
> wrote:
>
>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>> > Further testing indicates that any performance difference is not due
>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>> > deletes.
>>
>> Sanity check: Could you describe how you test?
>>
>> * How many queries do you issue for each test?
>> * Is each query a new one or do you re-use the same query?
>> * Do you discard the first X calls?
>> * Are the numbers averages, medians or something else?
>> * What do you do about disk cache?
>> * Are both Solrs on the same machine?
>> * Do they use the same index?
>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
Certainly.  And I would of course welcome anyone else to test this for
themselves, especially with facet.method=uif, to see if that has indeed
bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
testing turns out to be invalid due to variance, a problem in the process,
etc.  One thing I was pondering is whether I should force merge the index
to a fixed number of segments, because indexing yields a random number of
segments and deletions.  The only thing stopping me short of doing that were
observations of longer Solr 4 times even with more deletions and a similar
number of segments.

We use Soasta as our testing tool.  Before testing, load is sent for 10-15
minutes to make sure any Solr caches have stabilized.  Then the test is run
for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and
Scenario #2 tested at 100 req/sec.  Each request is different with input
being pulled from data files.  The requests are repeatable test to test.
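
As a rough answer to the "how many queries" question, the request counts implied by those rates can be worked out directly.  This is only a sketch of the arithmetic: it assumes the rate is held exactly for the full 30 minutes and it excludes the 10-15 minute warm-up; the variable names are made up for illustration.

```python
# Request counts implied by the test description: 30 minutes of steady load.
TEST_DURATION_S = 30 * 60

# Rates taken from the description above (requests per second).
rates_per_sec = {"Scenario #1": 15, "Scenario #2": 100}

total_requests = {
    name: rate * TEST_DURATION_S for name, rate in rates_per_sec.items()
}

for name, total in total_requests.items():
    print(f"{name}: {total} requests in {TEST_DURATION_S // 60} minutes")
```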

The numbers posted above are average response times as reported by Soasta.
However, the respective time differences are supported by Splunk, which
indexes the Solr logs, and Dynatrace, which is instrumented on one of the JVMs.

The versions are deployed to the same machines, thereby overlaying the
previous installation.  Going Solr 4 to Solr 5, full indexing is run with
the same input data.  Being in SolrCloud mode, the full indexing consists
of indexing all documents and then deleting any that were not touched.
Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
results as the previous Solr 4 test.


On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
wrote:

> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Are each query a new one or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians or something third?
> * What do you do about disk cache?
> * Are both Solr's on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Toke Eskildsen
On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> Further testing indicates that any performance difference is not due
> to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> deletes.

Sanity check: Could you describe how you test?

* How many queries do you issue for each test?
* Is each query a new one or do you re-use the same query?
* Do you discard the first X calls?
* Are the numbers averages, medians or something else?
* What do you do about disk cache?
* Are both Solr instances on the same machine?
* Do they use the same index?
* Do you alternate between testing 4.8.1 and 5.5.2 first?
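
To show why the questions about discarded calls and averages versus medians matter, here is a minimal sketch of how the same measurements can give very different summaries.  The function name `summarize`, the `warmup` parameter and the sample timings are all invented for illustration; they are not taken from any test in this thread.

```python
import statistics


def summarize(times_ms, warmup=0):
    """Drop the first `warmup` samples, then report mean and median in ms."""
    measured = times_ms[warmup:]
    if not measured:
        raise ValueError("no samples left after discarding warm-up calls")
    return {
        "count": len(measured),
        "mean": statistics.mean(measured),
        "median": statistics.median(measured),
    }


# Made-up sample: two slow cold-cache calls followed by steady-state calls.
sample = [900, 450, 40, 42, 38, 41, 39]
print(summarize(sample))            # cold-cache calls included
print(summarize(sample, warmup=2))  # first two calls discarded
```

With the cold-cache calls included the mean is dominated by the two outliers, while the median barely moves, which is why it helps to know which of the two is being reported.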

- Toke Eskildsen, State and University Library, Denmark


Re: Faceting search issues

2016-09-27 Thread Tomás Fernández Löbbe
I wonder why the "facet_fields" section of the first query's response says:
"facet_fields": {"id": []}
when it should say:
"facet_fields": {"name": []}

Also, why does the second query not include the fq in the echoed params
section?
What is that other query with fq=aggregatename:story?

This is not in a SolrCloud environment, right? just a single host with no
replication?
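
The facet_fields mismatch can also be checked mechanically.  The following is a minimal sketch, assuming the response was requested with echoed params so that "facet.field" appears in the responseHeader; the function name `unexpected_facet_fields` is hypothetical and the JSON is a trimmed copy of the first response Iyob posted.

```python
import json


def unexpected_facet_fields(response_text):
    """Return facet_fields keys that were not requested via facet.field."""
    response = json.loads(response_text)
    requested = response["responseHeader"]["params"].get("facet.field", [])
    if isinstance(requested, str):  # a single facet.field is echoed as a string
        requested = [requested]
    returned = response["facet_counts"]["facet_fields"].keys()
    return sorted(set(returned) - set(requested))


# Trimmed copy of the first response posted in this thread.
first_response = """
{"responseHeader": {"status": 0, "QTime": 460,
                    "params": {"q": "*:*", "facet.field": "name",
                               "facet.mincount": "2", "rows": "0",
                               "facet": "true"}},
 "response": {"numFound": 318828, "start": 0, "docs": []},
 "facet_counts": {"facet_queries": {}, "facet_fields": {"id": []}}}
"""
print(unexpected_facet_fields(first_response))
```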

Tomás

On Tue, Sep 27, 2016 at 1:10 AM, Jan Høydahl  wrote:

> Please tell some more
> - Solr version
> - Add to your query: =true=all and paste the result
> - How is “string_ci” defined ()?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 26. sep. 2016 kl. 23.59 skrev Beyene, Iyob :
> >
> > Hi,
> >
> > When I query solr using faceted search to check for duplicates using the
> following ,
> >
> > 'http://localhost:8983/solr/core/
> select?q=*:*=true=name=2`,
> >
> > I get the following response with no facet data.
> >
> >
> > {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
> > "facet.field": "name","facet.mincount": "2","rows": "0","facet":
> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
> {},"facet_heatmaps": {}}}
> >
> >
> > but when I specify the name in fq
> >
> > 'http://localhost:8983/solr/core/
> select?q=*:*=true=name=2&
> fq=name:elephant`
> >
> > I get a facet result like these
> >
> > {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
> >
> >
> > The field I am basing the facet search on is defined like below
> >
> >  required="true" multiValued="true"/>
> >
> >
> > Is there some variation of faceting that could help me analyze the
> difference?
> >
> > Thanks
> >
> > Iyob
> >
> >
> >
> >
> >
>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Solr User
Further testing indicates that any performance difference is not due to
deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes.
The times appear to converge on an optimized index.  Below are the
details.  Not sure what else to make of this at this point other than
moving forward with an upgrade with an optimized index wherever possible.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different than Scenario #1 and perform much
better with enum hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms
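
For reference, the relative differences implied by the numbers above can be worked out with a short script.  This is only a sketch of the arithmetic; the dictionary layout and the function name `relative_change` are made up for illustration, and the timings are copied from the figures above.

```python
# Average response times in ms, copied from the figures above.
# Each entry is (Solr 4.8.1, Solr 5.5.2).
timings = {
    "scenario 1 (uif)": {
        "with deletes": (115, 155),
        "without deletes": (104, 125),
        "1 segment without deletes": (55, 44),
    },
    "scenario 2 (enum)": {
        "with deletes": (38, 49),
        "without deletes": (35, 42),
        "1 segment without deletes": (28, 34),
    },
}


def relative_change(old_ms, new_ms):
    """Percent change going from old_ms to new_ms (positive means slower)."""
    return (new_ms - old_ms) / old_ms * 100.0


for scenario, cases in timings.items():
    for case, (solr4, solr5) in cases.items():
        change = relative_change(solr4, solr5)
        print(f"{scenario}, {case}: {change:+.1f}% for 5.5.2 vs 4.8.1")
```

On these numbers 5.5.2 is slower in every case except the single-segment uif run, where it is about 20% faster.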

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti  wrote:

> Hi !
> At the time we didn't investigate the deletion implication at all.
> This can be interesting.
> if you proceed with your investigations and discover what changed in the
> deletion approach, I would be more than happy to help!
>
> Cheers
>
> On Mon, Sep 26, 2016 at 10:59 PM, Solr User  wrote:
>
> > Thanks again for your work on honoring the facet.method.  I have an
> > observation that I would like to share and get your feedback on if
> > possible.
> >
> > I performance tested Solr 5.5.2 with various facet queries and the only way
> > I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
> > possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
> > Here are the details.
> >
> > Scenario #1:  Using facet.method=uif with faceting on several multi-valued
> > fields.
> > 4.8.1 (with deletes): 115 ms
> > 5.5.2 (with deletes): 155 ms
> > 5.5.2 (without deletes): 125 ms
> > 5.5.2 (1 segment without deletes): 44 ms
> >
> > Scenario #2:  Using facet.method=enum with faceting on several multi-valued
> > fields.  These fields are different than Scenario #1 and perform much
> > better with enum hence that method is used instead.
> > 4.8.1 (with deletes): 38 ms
> > 5.5.2 (with deletes): 49 ms
> > 5.5.2 (without deletes): 42 ms
> > 5.5.2 (1 segment without deletes): 34 ms
> >
> >
> >
> > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > Interesting developments :
> > >
> > > https://issues.apache.org/jira/browse/SOLR-9176
> > >
> > > I think we found why term Enum seems slower in recent Solr !
> > > In our case it is likely to be related to the commit I mention in the
> > Jira.
> > > Have a check Joel !
> > >
> > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > > abenede...@apache.org> wrote:
> > >
> > > > I am investigating this scenario right now.
> > > > I can confirm that the enum slowness is in Solr 6.0 as well.
> > > > And I agree with Joel, it seems to be un-related with the famous
> > faceting
> > > > regression :(
> > > >
> > > > Furthermore with the legacy facet approach, if you set docValues for
> > the
> > > > field you are not going to be able to try the enum approach anymore.
> > > >
> > > > org/apache/solr/request/SimpleFacets.java:448
> > > >
> > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > > >   // only fc can handle docvalues types
> > > >   method = FacetMethod.FC;
> > > > }
> > > >
> > > >
> > > > I got really horrible regressions simply using term enum in both
> Solr 4
> > > > and Solr 6.
> > > >
> > > > And even the most optimized fcs approach with docValues and
> > > > facet.threads=nCore does not perform as the simple enum in Solr 4 .
> > > >
> > > > i.e.
> > > >
> > > > For some sample queries I have 40 ms vs 160 ms and similar...
> > > > I think we should open an issue if we can confirm it is not related
> > with
> > > > the other.
> > > > A lot of people will continue using the legacy approach for a
> while...
> > > >
> > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein  >
> > > > wrote:
> > > >
> > > >> The enum slowness is interesting. It would appear on the surface to
> > not
> > > be
> > > >> related to the FieldCache issue. I don't think the main emphasis of
> > the
> > > >> JSON facet API has been the enum approach. You may find using the
> JSON
> > > >> facet API and eliminating the use of enum meets your performance
> > needs.
> > > >>
> > > >> With the CollapsingQParserPlugin top_fc is definitely faster during
> > > >> queries. The tradeoff is slower warming times and increased memory
> > usage
> > > >> if
> > > >> the collapse fields are used in faceting, as faceting will load the
> > > field
> > > >> into a different cache.
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > 

Re: Faceting search issues

2016-09-27 Thread Jan Høydahl
Please tell some more
- Solr version
- Add to your query: =true=all and paste the result
- How is “string_ci” defined ()?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 26. sep. 2016 kl. 23.59 skrev Beyene, Iyob :
> 
> Hi,
> 
> When I query solr using faceted search to check for duplicates using the 
> following ,
> 
> 'http://localhost:8983/solr/core/select?q=*:*=true=name=2`,
> 
> I get the following response with no facet data.
> 
> 
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
> "facet.field": "name","facet.mincount": "2","rows": "0","facet": 
> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs": 
> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name": 
> []},"facet_dates": {},"facet_ranges": {},"facet_intervals": 
> {},"facet_heatmaps": {}}}
> 
> 
> but when I specify the name in fq
> 
> 'http://localhost:8983/solr/core/select?q=*:*=true=name=2=name:elephant`
> 
> I get a facet result like these
> 
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": 
> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount": 
> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start": 
> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries": 
> {},"facet_fields": {"name": ["elephant",4]},"facet_dates": {},"facet_ranges": 
> {},"facet_intervals": {},"facet_heatmaps": {}}}
> 
> 
> The field I am basing the facet search on is defined like below
> 
>  required="true" multiValued="true"/>
> 
> 
> Is there some variation of faceting that could help me analyze the difference?
> 
> Thanks
> 
> Iyob
> 
> 
> 
> 
> 



Re: Faceting search issues

2016-09-27 Thread Beyene, Iyob
From: Beyene, Iyob <ibey...@gannett.com>
Sent: Tuesday, September 27, 2016 11:22 AM
To: solr-user
Subject: Re: Faceting search issues

Here is the result from running the first query, i.e.
http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&rows=0&facet.mincount=2&echoParams=all<http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&fq=aggregatename:story&rows=0&facet.mincount=2&echoParams=all>


{"responseHeader": {"status": 0,"QTime": 460,"params": {"q": 
"*:*","facet.field": "name","df": "text","echoParams": "all","facet.mincount": 
"2",
"collection": "core","rows": "0","facet": "true","wt": "json"}},"response": 
{"numFound": 318828,"start": 0,"maxScore": 1,"docs": []},"facet_counts": 
{"facet_queries": {},"facet_fields": {"id": []},"facet_dates": 
{},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}},


and the second one:
http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&rows=0&facet.mincount=2&echoParams=all&fq=name:elephant


{"responseHeader": {"status": 0,"QTime": 353,"params": {"q": 
"*:*","facet.field": "name","df": "text","echoParams": "all","facet.mincount": 
"2",
"collection": "core","rows": "0","facet": "true","wt": "json"}},"response": 
{"numFound": 2,"start": 0,"maxScore": 1,"docs": []},"facet_counts": 
{"facet_queries": {},"facet_fields": {"name": ["elephant",4]},"facet_dates": 
{},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}


Yes I have that many docs indexed.

Thanks








From: Alexandre Rafalovitch <arafa...@gmail.com>
Sent: Tuesday, September 27, 2016 10:36:19 AM
To: solr-user
Subject: Re: Faceting search issues

That's weird.

Could you rerun both queries with echoParams=all and see if some
additional conditions will show up unexpectedly. Specifically, an 'fq'
in the first query that the second query overrides.

Alternatively, do you definitely have 316544 documents in the index?
That's the number that's your supposed "return all documents" query
gives.

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 21:26, Beyene, Iyob <ibey...@gannett.com> wrote:
>
> Alessandro, thanks for your quick reply.
>
> When I say duplicates I meant to say how many documents the term appears in.
>
> All that I wanted to see is the number of times a particular name is 
> appearing in documents in solr.
>
> thanks
>
>
> 
> From: Alessandro Benedetti <abenede...@apache.org>
> Sent: Tuesday, September 27, 2016 5:30:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Faceting search issues
>
> When you say "check for duplicates" what do you mean ? no duplicate tokens
> are in the index per field.
> What is your definition of duplicate for a term? Do you consider lowercase
> and uppercase version duplicate ?
> Maybe you have an analysis problem.
>
> MinCount=2 means : "include only terms appearing at least in 2 documents in
> the result set .
>
> Cheers
>
>
>
> On Mon, Sep 26, 2016 at 10:59 PM, Beyene, Iyob <ibey...@gannett.com> wrote:
>
>> Hi,
>>
>> When I query solr using faceted search to check for duplicates using the
>> following ,
>>
>> 'http://localhost<http://localhost>:8983/solr/core/
>> select?q=*:*=true=name=2`,
>>
>> I get the following response with no facet data.
>>
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
>> "facet.field": "name","facet.mincount": "2","rows": "0","facet":
>> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
>> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
>> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
>> {},"facet_heatmaps": {}}}
>>
>>
>> but when I specify the name in fq
>>
>> 'http://localhost<http://localhost/>:8983/solr/core/
>> select?q=*:*=true=name=2&
>> fq=name:elephant`
>>
>> I get a facet result like these
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
>> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
>> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
>> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
>> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
>> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
>>
>>
>> The field I am basing the facet search on is defined like below
>>
>> > required="true" multiValued="true"/>
>>
>>
>> Is there some variation of faceting that could help me analyze the
>> difference?
>>
>> Thanks
>>
>> Iyob
>>
>>
>>
>>
>>
>>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Faceting search issues

2016-09-27 Thread Beyene, Iyob
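Here is the result from running the first query, i.e.
http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&rows=0&facet.mincount=2&echoParams=all<http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&fq=aggregatename:story&rows=0&facet.mincount=2&echoParams=all>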


{"responseHeader": {"status": 0,"QTime": 460,"params": {"q": 
"*:*","facet.field": "name","df": "text","echoParams": "all","facet.mincount": 
"2",
"collection": "core","rows": "0","facet": "true","wt": "json"}},"response": 
{"numFound": 318828,"start": 0,"maxScore": 1,"docs": []},"facet_counts": 
{"facet_queries": {},"facet_fields": {"id": []},"facet_dates": 
{},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}},


and the second one:
http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&rows=0&facet.mincount=2&echoParams=all&fq=name:elephant


{"responseHeader": {"status": 0,"QTime": 353,"params": {"q": 
"*:*","facet.field": "name","df": "text","echoParams": "all","facet.mincount": 
"2",
"collection": "core","rows": "0","facet": "true","wt": "json"}},"response": 
{"numFound": 2,"start": 0,"maxScore": 1,"docs": []},"facet_counts": 
{"facet_queries": {},"facet_fields": {"id": ["name",4]},"facet_dates": 
{},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}


Yes I have that many docs indexed.

Thanks




From: Alexandre Rafalovitch <arafa...@gmail.com>
Sent: Tuesday, September 27, 2016 10:36:19 AM
To: solr-user
Subject: Re: Faceting search issues

That's weird.

Could you rerun both queries with echoParams=all and see if some
additional conditions will show up unexpectedly. Specifically, an 'fq'
in the first query that the second query overrides.

Alternatively, do you definitely have 316544 documents in the index?
That's the number that's your supposed "return all documents" query
gives.

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 21:26, Beyene, Iyob <ibey...@gannett.com> wrote:
>
> Alessandro, thanks for your quick reply.
>
> When I say duplicates I meant to say how many documents the term appears in.
>
> All that I wanted to see is the number of times a particular name is 
> appearing in documents in solr.
>
> thanks
>
>
> 
> From: Alessandro Benedetti <abenede...@apache.org>
> Sent: Tuesday, September 27, 2016 5:30:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Faceting search issues
>
> When you say "check for duplicates" what do you mean ? no duplicate tokens
> are in the index per field.
> What is your definition of duplicate for a term? Do you consider lowercase
> and uppercase version duplicate ?
> Maybe you have an analysis problem.
>
> MinCount=2 means : "include only terms appearing at least in 2 documents in
> the result set .
>
> Cheers
>
>
>
> On Mon, Sep 26, 2016 at 10:59 PM, Beyene, Iyob <ibey...@gannett.com> wrote:
>
>> Hi,
>>
>> When I query solr using faceted search to check for duplicates using the
>> following ,
>>
>> 'http://localhost<http://localhost>:8983/solr/core/
>> select?q=*:*=true=name=2`,
>>
>> I get the following response with no facet data.
>>
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
>> "facet.field": "name","facet.mincount": "2","rows": "0","facet":
>> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
>> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
>> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
>> {},"facet_heatmaps": {}}}
>>
>>
>> but when I specify the name in fq
>>
>> 'http://localhost<http://localhost/>:8983/solr/core/
>> select?q=*:*=true=name=2&
>> fq=name:elephant`
>>
>> I get a facet result like these
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
>> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
>> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
>> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
>> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
>> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
>>
>>
>> The field I am basing the facet search on is defined like below
>>
>> > required="true" multiValued="true"/>
>>
>>
>> Is there some variation of faceting that could help me analyze the
>> difference?
>>
>> Thanks
>>
>> Iyob
>>
>>
>>
>>
>>
>>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Faceting search issues

2016-09-27 Thread Alexandre Rafalovitch
That's weird.

Could you rerun both queries with echoParams=all and see if some
additional conditions show up unexpectedly?  Specifically, an 'fq'
in the first query that the second query overrides.

Alternatively, do you definitely have 316544 documents in the index?
That's the number that your supposed "return all documents" query
gives.
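
In case it helps, here is a minimal sketch of building both queries with echoParams=all added.  The host, core name and parameter values are the ones from the original post; the helper name `build_query_url` is made up, and the script only prints the URLs without sending any request.

```python
from urllib.parse import urlencode

# Base URL as used in the thread; adjust host and core name as needed.
BASE_URL = "http://localhost:8983/solr/core/select"


def build_query_url(fq=None):
    """Build the facet query from the thread with echoParams=all added."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", "name"),
        ("facet.mincount", "2"),
        ("rows", "0"),
        ("echoParams", "all"),
    ]
    if fq is not None:
        params.append(("fq", fq))
    return BASE_URL + "?" + urlencode(params)


print(build_query_url())                    # first query, no filter
print(build_query_url(fq="name:elephant"))  # second query, with the fq
```

Comparing the "params" section of the two responses should then show whether an unexpected fq or other default is being applied to the first query.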

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 21:26, Beyene, Iyob <ibey...@gannett.com> wrote:
>
> Alessandro, thanks for your quick reply.
>
> When I say duplicates I meant to say how many documents the term appears in.
>
> All that I wanted to see is the number of times a particular name is 
> appearing in documents in solr.
>
> thanks
>
>
> 
> From: Alessandro Benedetti <abenede...@apache.org>
> Sent: Tuesday, September 27, 2016 5:30:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Faceting search issues
>
> When you say "check for duplicates" what do you mean ? no duplicate tokens
> are in the index per field.
> What is your definition of duplicate for a term? Do you consider lowercase
> and uppercase version duplicate ?
> Maybe you have an analysis problem.
>
> MinCount=2 means : "include only terms appearing at least in 2 documents in
> the result set .
>
> Cheers
>
>
>
> On Mon, Sep 26, 2016 at 10:59 PM, Beyene, Iyob <ibey...@gannett.com> wrote:
>
>> Hi,
>>
>> When I query solr using faceted search to check for duplicates using the
>> following ,
>>
>> 'http://localhost<http://localhost>:8983/solr/core/
>> select?q=*:*=true=name=2`,
>>
>> I get the following response with no facet data.
>>
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
>> "facet.field": "name","facet.mincount": "2","rows": "0","facet":
>> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
>> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
>> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
>> {},"facet_heatmaps": {}}}
>>
>>
>> but when I specify the name in fq
>>
>> 'http://localhost<http://localhost/>:8983/solr/core/
>> select?q=*:*=true=name=2&
>> fq=name:elephant`
>>
>> I get a facet result like these
>>
>> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
>> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
>> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
>> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
>> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
>> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
>>
>>
>> The field I am basing the facet search on is defined like below
>>
>> > required="true" multiValued="true"/>
>>
>>
>> Is there some variation of faceting that could help me analyze the
>> difference?
>>
>> Thanks
>>
>> Iyob
>>
>>
>>
>>
>>
>>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Faceting search issues

2016-09-27 Thread Beyene, Iyob

Alessandro, thanks for your quick reply.

When I said duplicates, I meant how many documents the term appears in.

All I wanted to see is the number of times a particular name appears in
documents in Solr.

thanks



From: Alessandro Benedetti <abenede...@apache.org>
Sent: Tuesday, September 27, 2016 5:30:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Faceting search issues

When you say "check for duplicates" what do you mean ? no duplicate tokens
are in the index per field.
What is your definition of duplicate for a term? Do you consider lowercase
and uppercase version duplicate ?
Maybe you have an analysis problem.

MinCount=2 means : "include only terms appearing at least in 2 documents in
the result set .

Cheers



On Mon, Sep 26, 2016 at 10:59 PM, Beyene, Iyob <ibey...@gannett.com> wrote:

> Hi,
>
> When I query solr using faceted search to check for duplicates using the
> following ,
>
> 'http://localhost<http://localhost>:8983/solr/core/
> select?q=*:*=true=name=2`,
>
> I get the following response with no facet data.
>
>
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
> "facet.field": "name","facet.mincount": "2","rows": "0","facet":
> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
> {},"facet_heatmaps": {}}}
>
>
> but when I specify the name in fq
>
> 'http://localhost<http://localhost/>:8983/solr/core/
> select?q=*:*=true=name=2&
> fq=name:elephant`
>
> I get a facet result like these
>
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
>
>
> The field I am basing the facet search on is defined like below
>
>  required="true" multiValued="true"/>
>
>
> Is there some variation of faceting that could help me analyze the
> difference?
>
> Thanks
>
> Iyob
>
>
>
>
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Faceting search issues

2016-09-27 Thread Alessandro Benedetti
When you say "check for duplicates", what do you mean?  No duplicate tokens
are in the index per field.
What is your definition of duplicate for a term?  Do you consider lowercase
and uppercase versions duplicates?
Maybe you have an analysis problem.

facet.mincount=2 means: "include only terms appearing in at least 2
documents in the result set".
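
To illustrate the mincount rule, here is a minimal sketch of the counting it describes.  This is a simplified model and not how Solr implements faceting; the function name `facet_counts` and the sample documents are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field, mincount=0):
    """Count how many documents contain each term of `field`.

    A term is counted once per document even if the multi-valued field
    repeats it, and terms with a count below `mincount` are dropped.
    """
    counts = Counter()
    for doc in docs:
        for term in set(doc.get(field, [])):
            counts[term] += 1
    return {term: n for term, n in counts.items() if n >= mincount}


# Hypothetical documents, not taken from the index discussed here.
docs = [
    {"id": "1", "name": ["elephant"]},
    {"id": "2", "name": ["elephant", "zebra"]},
    {"id": "3", "name": ["lion"]},
]
print(facet_counts(docs, "name", mincount=2))
```

In this model only "elephant" survives mincount=2, because it is the only term present in at least two documents of the result set.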

Cheers



On Mon, Sep 26, 2016 at 10:59 PM, Beyene, Iyob  wrote:

> Hi,
>
> When I query solr using faceted search to check for duplicates using the
> following ,
>
> 'http://localhost:8983/solr/core/
> select?q=*:*=true=name=2`,
>
> I get the following response with no facet data.
>
>
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q": "*:*",
> "facet.field": "name","facet.mincount": "2","rows": "0","facet":
> "true"}},"response": {"numFound": 316544,"start": 0,"maxScore": 1,"docs":
> []},"facet_counts": {"facet_queries": {},"facet_fields": {"name":
> []},"facet_dates": {},"facet_ranges": {},"facet_intervals":
> {},"facet_heatmaps": {}}}
>
>
> but when I specify the name in fq
>
> 'http://localhost:8983/solr/core/
> select?q=*:*=true=name=2&
> fq=name:elephant`
>
> I get a facet result like these
>
> {"responseHeader": {"status": 0,"QTime": 541,"params": {"q":
> "*:*","facet.field": "name","fq": "name:elephant","facet.mincount":
> "2","rows": "0","facet": "true"}},"response": {"numFound": 2,"start":
> 0,"maxScore": 1,"docs": []},"facet_counts": {"facet_queries":
> {},"facet_fields": {"name": ["elephant",4]},"facet_dates":
> {},"facet_ranges": {},"facet_intervals": {},"facet_heatmaps": {}}}
>
>
> The field I am basing the facet search on is defined like below
>
>  required="true" multiValued="true"/>
>
>
> Is there some variation of faceting that could help me analyze the
> difference?
>
> Thanks
>
> Iyob
>
>
>
>
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Alessandro Benedetti
Hi!
At the time we didn't investigate the implications of deletions at all.
This could be interesting.
If you proceed with your investigation and discover what changed in the
deletion approach, I would be more than happy to help!

Cheers

On Mon, Sep 26, 2016 at 10:59 PM, Solr User  wrote:

> Thanks again for your work on honoring the facet.method.  I have an
> observation that I would like to share and get your feedback on if
> possible.
>
> I performance tested Solr 5.5.2 with various facet queries and the only way
> I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
> possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
> Here are the details.
>
> Scenario #1:  Using facet.method=uif with faceting on several multi-valued
> fields.
> 4.8.1 (with deletes): 115 ms
> 5.5.2 (with deletes): 155 ms
> 5.5.2 (without deletes): 125 ms
> 5.5.2 (1 segment without deletes): 44 ms
>
> Scenario #2:  Using facet.method=enum with faceting on several multi-valued
> fields.  These fields are different than Scenario #1 and perform much
> better with enum hence that method is used instead.
> 4.8.1 (with deletes): 38 ms
> 5.5.2 (with deletes): 49 ms
> 5.5.2 (without deletes): 42 ms
> 5.5.2 (1 segment without deletes): 34 ms
>
>
>
> On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > Interesting developments :
> >
> > https://issues.apache.org/jira/browse/SOLR-9176
> >
> > I think we found why term Enum seems slower in recent Solr !
> > In our case it is likely to be related to the commit I mention in the
> Jira.
> > Have a check Joel !
> >
> > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > I am investigating this scenario right now.
> > > I can confirm that the enum slowness is in Solr 6.0 as well.
> > > And I agree with Joel, it seems to be un-related with the famous
> faceting
> > > regression :(
> > >
> > > Furthermore with the legacy facet approach, if you set docValues for
> the
> > > field you are not going to be able to try the enum approach anymore.
> > >
> > > org/apache/solr/request/SimpleFacets.java:448
> > >
> > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > >   // only fc can handle docvalues types
> > >   method = FacetMethod.FC;
> > > }
> > >
> > >
> > > I got really horrible regressions simply using term enum in both Solr 4
> > > and Solr 6.
> > >
> > > And even the most optimized fcs approach with docValues and
> > > facet.threads=nCore does not perform as the simple enum in Solr 4 .
> > >
> > > i.e.
> > >
> > > For some sample queries I have 40 ms vs 160 ms and similar...
> > > I think we should open an issue if we can confirm it is not related
> with
> > > the other.
> > > A lot of people will continue using the legacy approach for a while...
> > >
> > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> > > wrote:
> > >
> > >> The enum slowness is interesting. It would appear on the surface to
> not
> > be
> > >> related to the FieldCache issue. I don't think the main emphasis of
> the
> > >> JSON facet API has been the enum approach. You may find using the JSON
> > >> facet API and eliminating the use of enum meets your performance
> needs.
> > >>
> > >> With the CollapsingQParserPlugin top_fc is definitely faster during
> > >> queries. The tradeoff is slower warming times and increased memory
> usage
> > >> if
> > >> the collapse fields are used in faceting, as faceting will load the
> > field
> > >> into a different cache.
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
> > >>
> > >> > Joel,
> > >> >
> > >> > Thank you for taking the time to respond to my question.  I tried
> the
> > >> JSON
> > >> > Facet API for one query that uses facet.method=enum (since this one
> > has
> > >> a
> > >> > ton of unique values and performed better with enum) but this was
> way
> > >> > slower than even the slower Solr 5 times.  I did not try the new API
> > >> with
> > >> > the non-enum queries though so I will give that a go.  It looks like
> > >> Solr
> > >> > 5.5.1 also has a facet.method=uif which will be interesting to try.
> > >> >
> > >> > If these do not prove helpful, it looks like I will need to wait for
> > >> > SOLR-8096 to be resolved before upgrading.
> > >> >
> > >> > Thanks also for your comment on top_fc for the CollapsingQParser.  I
> > use
> > >> > collapse/expand for some queries but traditional grouping for others
> > >> due to
> > >> > performance.  It will be interesting to see if those grouping
> queries
> > >> > perform better now using CollapsingQParser with top_fc.
> > >> >
> > >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <
> joels...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Yes, SOLR-8096 is the issue here.
> > >> > >
> > >> > > I don't believe indexing with docValues is going to help too 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-26 Thread Solr User
Thanks again for your work on honoring the facet.method.  I have an
observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries and the only way
I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
Here are the details.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different than Scenario #1 and perform much
better with enum hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms
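
For anyone who wants to repeat the "without deletes" measurements, here is a minimal sketch of the update request that commits with expungeDeletes. The host, port, and collection name are hypothetical placeholders, and the snippet only builds the URL; it does not send it.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, collection):
    """Build an update URL that commits and asks Solr to expunge deleted docs."""
    # commit and expungeDeletes are passed as request parameters to /update.
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = expunge_deletes_url("http://localhost:8983/solr", "mycollection")
print(url)
```

Note that expungeDeletes only merges segments containing deletes; it does not necessarily reduce the index to one segment, which may be why the "1 segment" rows above differ from the "without deletes" rows.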



On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> Interesting developments :
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term Enum seems slower in recent Solr !
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a check Joel !
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is in Solr 6.0 as well.
> > And I agree with Joel, it seems to be un-related with the famous faceting
> > regression :(
> >
> > Furthermore with the legacy facet approach, if you set docValues for the
> > field you are not going to be able to try the enum approach anymore.
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> >
> > I got really horrible regressions simply using term enum in both Solr 4
> > and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as the simple enum in Solr 4 .
> >
> > i.e.
> >
> > For some sample queries I have 40 ms vs 160 ms and similar...
> > I think we should open an issue if we can confirm it is not related with
> > the other.
> > A lot of people will continue using the legacy approach for a while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> > wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface to not
> be
> >> related to the FieldCache issue. I don't think the main emphasis of the
> >> JSON facet API has been the enum approach. You may find using the JSON
> >> facet API and eliminating the use of enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin top_fc is definitely faster during
> >> queries. The tradeoff is slower warming times and increased memory usage
> >> if
> >> the collapse fields are used in faceting, as faceting will load the
> field
> >> into a different cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
> >>
> >> > Joel,
> >> >
> >> > Thank you for taking the time to respond to my question.  I tried the
> >> JSON
> >> > Facet API for one query that uses facet.method=enum (since this one
> has
> >> a
> >> > ton of unique values and performed better with enum) but this was way
> >> > slower than even the slower Solr 5 times.  I did not try the new API
> >> with
> >> > the non-enum queries though so I will give that a go.  It looks like
> >> Solr
> >> > 5.5.1 also has a facet.method=uif which will be interesting to try.
> >> >
> >> > If these do not prove helpful, it looks like I will need to wait for
> >> > SOLR-8096 to be resolved before upgrading.
> >> >
> >> > Thanks also for your comment on top_fc for the CollapsingQParser.  I
> use
> >> > collapse/expand for some queries but traditional grouping for others
> >> due to
> >> > performance.  It will be interesting to see if those grouping queries
> >> > perform better now using CollapsingQParser with top_fc.
> >> >
> >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> >> > wrote:
> >> >
> >> > > Yes, SOLR-8096 is the issue here.
> >> > >
> >> > > I don't believe indexing with docValues is going to help too much
> with
> >> > > this. The enum slowness may not be related, but I'm not positive
> about
> >> > > that.
> >> > >
> >> > > The major slowdowns are likely due to the removal of the top level
> >> > > FieldCache from general use and the removal of the FieldValuesCache
> >> which
> >> > > was used for multi-value field faceting.
> >> > >
> >> > > The JSON facet API covers all the functionality in the traditional
> >> > > faceting, and it has been developed to be very performant.
> >> > >
> >> > > You may also want to see if Collapse/Expand can meet your
> 

Re: [E] Re: Faceting Question(s)

2016-06-03 Thread MaryJo Sminkey
Just a follow-up on this: I found that the method below using URL params
doesn't work when using the REST API. If you try to set the field in your
facet object to something like "{!ex=dt}doctype", it throws an error. Here's
the documentation on the correct method to use with the API.

http://yonik.com/multi-select-faceting/
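
A rough sketch of what that looks like, assuming the API in question is the JSON Facet API described at the link above. The filter is still tagged with {!tag=dt}, but the exclusion moves into the facet's "domain" block as excludeTags instead of an {!ex=dt} prefix on the field. The facet name "doctypes" and the function name are made up for illustration.

```python
import json
from urllib.parse import urlencode


def multiselect_params(main_query, selected_doctype):
    """Request params for a multi-select doctype facet via json.facet."""
    facet = {
        "doctypes": {  # hypothetical facet name
            "type": "terms",
            "field": "doctype",
            # Ignore filters tagged "dt" when computing this facet.
            "domain": {"excludeTags": "dt"},
        }
    }
    return [
        ("q", main_query),
        ("fq", "{!tag=dt}doctype:" + selected_doctype),
        ("json.facet", json.dumps(facet)),
    ]


params = multiselect_params("mainquery", "pdf")
print(urlencode(params))
```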

MJ


On Thu, Jun 2, 2016 at 2:17 PM, Andrew Chillrud 
wrote:

> To return counts for doctype values that are currently not selected, tag
> filters that directly constrain doctype, and exclude those filters when
> faceting on doctype.
>
>
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
>
> Filter exclusion is supported for all types of facets. Both the tag and ex
> local parameters may specify multiple values by separating them with commas.
>








Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
On Fri, Jun 3, 2016 at 1:25 AM, Erick Erickson 
wrote:

> We can always use more documentation. One of the
> valuable things about people getting started is that it's an
> opportunity to clarify documents. Sometimes the people who
> develop/write the docs jump into the middle and assume
> the reader has knowledge they couldn't be expected to have
>
> Hint, hint.
>

Well, I'm not sure how best to document this, but it basically relates to using
some of the edismax parameters (boosts on search fields, phrase matching and
phrase boosts, etc.), all of which are intended to work on actual search terms.
I had added these to my config, but in some of my searches the filters do all
the work and I just have a wildcard search (q=*:*). It seems that if you have
entries for these edismax settings and do this kind of search, you can get some
odd results (I didn't really track down exactly which setting was the culprit,
but it was definitely related to the edismax-specific params). In my case, some
of the docs that should have shown up based on the filters were going missing.
Once I figured this out, moved the edismax params out of the defaults, and only
turned them on when I had actual search terms, I got the results I expected. But
it took quite a lot of time to track them down as the cause due to the
complexity of the code I am working with.

So I guess what would be useful would be a caution on the Edismax page
about having these parameters set when wildcard searches are possible
(not sure if this applies to dismax as the parser as well, but it probably does).
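
The workaround amounts to something like the sketch below: only send the edismax-specific params when there are real search terms, and fall back to a plain match-all query otherwise. The field names and boost values (title, body, ^2, ^4) are invented for illustration and are not from my actual config.

```python
MATCH_ALL = "*:*"


def build_params(user_terms, filters):
    """Attach edismax params only when the user actually typed search terms."""
    terms = (user_terms or "").strip()
    if terms and terms != MATCH_ALL:
        params = [
            ("q", terms),
            ("defType", "edismax"),
            ("qf", "title^2 body"),  # hypothetical field boosts
            ("pf", "title^4"),       # hypothetical phrase boost
        ]
    else:
        # Filter-only search: no edismax boosts, just match everything.
        params = [("q", MATCH_ALL)]
    params += [("fq", f) for f in filters]
    return params


print(build_params("", ["doctype:pdf"]))
print(build_params("solr faceting", ["doctype:pdf"]))
```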

Mary Jo







Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Erick Erickson
We can always use more documentation. One of the
valuable things about people getting started is that it's an
opportunity to clarify documents. Sometimes the people who
develop/write the docs jump into the middle and assume
the reader has knowledge they couldn't be expected to have

Hint, hint.

Best,
Erick

On Thu, Jun 2, 2016 at 10:09 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:
> Yeah even though I'm still fairly new to this, I'm generally a good problem
> solver or I'd never have gotten as far as I have already on my own (really
> wanted to hire a Solr consultant and pushed VERY hard for it, but my boss
> really likes us to figure things out on our own!) Just wish I'd found this
> list long before now, I have a feeling it would have saved me some very
> long nights and weekends trying to work out some of the more baffling
> issues. That's why I jumped in and why I misinterpreted the question...
> because the way I read it was the thing that literally drove me crazy for
> two days straight trying to figure out. ;-)  But I'm very excited to find
> out the real question and answer as that is something that definitely
> applies to us as well and will certainly speed up our searches to drop the
> extra server call.
>
> MJ
>
>
>
> On Fri, Jun 3, 2016 at 12:59 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> One of the most valuable things I did when I started out
>> (way back in the Lucene-only days) was try to answer _one_
>> question every so often. Even if someone else beat me to the
>> punch, I benefitted from the research. And the rest of the time
>> I discovered things I never knew about Solr/Lucene!
>>
>> I think one of the most valuable lessons was "Somebody's
>> probably run into this before, I wonder what _they_ did?"
>> ;)
>>
>> Best,
>> Erick
>>
>> On Thu, Jun 2, 2016 at 9:46 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> wrote:
>> > Well thanks for asking the question because I had no idea what Andrew
>> > posted was even possible... and I most definitely will be using that
>> > myself! Totally brilliant stuff. I am so loving Solr... well, when it's
>> not
>> > driving me bonkers.
>> >
>> > Mary Jo
>> >
>> >
>> > On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
>> > sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>> >
>> >> Thank you Andrew, that looks like exactly what I am looking for =)
>> >> Thank you Robert, it looks like we are both doing it in similar fashion
>> =)
>> >> Thank you MaryJo  for jumping right in!
>> >>
>> >> Sas
>> >>
>> >>
>> >>
>> >> -Original Message-
>> >> From: Andrew Chillrud [mailto:achill...@opentext.com]
>> >> Sent: Thursday, June 2, 2016 2:17 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: RE: [E] Re: Faceting Question(s)
>> >>
>> >> It is possible to get the original facet counts for the field you are
>> >> filtering on (we have been using this since Solr 3.6). Don't know if
>> this
>> >> can be extended to get the original counts for all fields however.
>> >>
>> >> This syntax is described here:
>> >> https://cwiki.apache.org/confluence/display/solr/Faceting
>> >>
>> >> Tagging and Excluding Filters
>> >>
>> >> You can tag specific filters and exclude those filters when faceting.
>> This
>> >> is useful when doing multi-select faceting.
>> >>
>> >> Consider the following example query with faceting:
>> >>
>> >>
>> q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
>> >>
>> >> Because everything is already constrained by the filter doctype:pdf, the
>> >> facet.field=doctype facet command is currently redundant and will
>> return 0
>> >> counts for everything except doctype:pdf.
>> >>
>> >> To implement a multi-select facet for doctype, a GUI may want to still
>> >> display the other doctype values and their associated counts, as if the
>> >> doctype:pdf constraint had not yet been applied. For example:
>> >> === Document Type ===
>> >>   [ ] Word (42)
>> >>   [x] PDF  (96)
>> >>   [ ] Excel(11)
>> >>   [ ] HTML (63)
>> >>
>> >> To return counts for doctype values that are currently not selected, tag
>> >> filters that directly constrain doctype, and exclude those filters when
>

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Yeah even though I'm still fairly new to this, I'm generally a good problem
solver or I'd never have gotten as far as I have already on my own (really
wanted to hire a Solr consultant and pushed VERY hard for it, but my boss
really likes us to figure things out on our own!) Just wish I'd found this
list long before now, I have a feeling it would have saved me some very
long nights and weekends trying to work out some of the more baffling
issues. That's why I jumped in and why I misinterpreted the question...
because the way I read it was the thing that literally drove me crazy for
two days straight trying to figure out. ;-)  But I'm very excited to find
out the real question and answer as that is something that definitely
applies to us as well and will certainly speed up our searches to drop the
extra server call.

MJ



On Fri, Jun 3, 2016 at 12:59 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> One of the most valuable things I did when I started out
> (way back in the Lucene-only days) was try to answer _one_
> question every so often. Even if someone else beat me to the
> punch, I benefitted from the research. And the rest of the time
> I discovered things I never knew about Solr/Lucene!
>
> I think one of the most valuable lessons was "Somebody's
> probably run into this before, I wonder what _they_ did?"
> ;)
>
> Best,
> Erick
>
> On Thu, Jun 2, 2016 at 9:46 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
> > Well thanks for asking the question because I had no idea what Andrew
> > posted was even possible... and I most definitely will be using that
> > myself! Totally brilliant stuff. I am so loving Solr... well, when it's
> not
> > driving me bonkers.
> >
> > Mary Jo
> >
> >
> > On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
> > sarfaraz.ja...@verizonwireless.com.invalid> wrote:
> >
> >> Thank you Andrew, that looks like exactly what I am looking for =)
> >> Thank you Robert, it looks like we are both doing it in similar fashion
> =)
> >> Thank you MaryJo  for jumping right in!
> >>
> >> Sas
> >>
> >>
> >>
> >> -Original Message-
> >> From: Andrew Chillrud [mailto:achill...@opentext.com]
> >> Sent: Thursday, June 2, 2016 2:17 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: [E] Re: Faceting Question(s)
> >>
> >> It is possible to get the original facet counts for the field you are
> >> filtering on (we have been using this since Solr 3.6). Don't know if
> this
> >> can be extended to get the original counts for all fields however.
> >>
> >> This syntax is described here:
> >> https://cwiki.apache.org/confluence/display/solr/Faceting
> >>
> >> Tagging and Excluding Filters
> >>
> >> You can tag specific filters and exclude those filters when faceting.
> This
> >> is useful when doing multi-select faceting.
> >>
> >> Consider the following example query with faceting:
> >>
> >>
> q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
> >>
> >> Because everything is already constrained by the filter doctype:pdf, the
> >> facet.field=doctype facet command is currently redundant and will
> return 0
> >> counts for everything except doctype:pdf.
> >>
> >> To implement a multi-select facet for doctype, a GUI may want to still
> >> display the other doctype values and their associated counts, as if the
> >> doctype:pdf constraint had not yet been applied. For example:
> >> === Document Type ===
> >>   [ ] Word (42)
> >>   [x] PDF  (96)
> >>   [ ] Excel(11)
> >>   [ ] HTML (63)
> >>
> >> To return counts for doctype values that are currently not selected, tag
> >> filters that directly constrain doctype, and exclude those filters when
> >> faceting on doctype.
> >>
> >>
> >>
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
> >>
> >> Filter exclusion is supported for all types of facets. Both the tag and
> ex
> >> local parameters may specify multiple values by separating them with
> commas.
> >>
> >> - Andy -
> >>
> >> -Original Message-
> >> From: Robert Brown [mailto:r...@intelcompute.com]
> >> Sent: Thursday, June 02, 2016 2:12 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: [E] Re: Faceting Question(s)
> >>
> >> MaryJo, I think you've mis-understood.  The counts are different simply
> >> because the 2nd query contains

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Erick Erickson
One of the most valuable things I did when I started out
(way back in the Lucene-only days) was try to answer _one_
question every so often. Even if someone else beat me to the
punch, I benefitted from the research. And the rest of the time
I discovered things I never knew about Solr/Lucene!

I think one of the most valuable lessons was "Somebody's
probably run into this before, I wonder what _they_ did?"
;)

Best,
Erick

On Thu, Jun 2, 2016 at 9:46 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:
> Well thanks for asking the question because I had no idea what Andrew
> posted was even possible... and I most definitely will be using that
> myself! Totally brilliant stuff. I am so loving Solr... well, when it's not
> driving me bonkers.
>
> Mary Jo
>
>
> On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Thank you Andrew, that looks like exactly what I am looking for =)
>> Thank you Robert, it looks like we are both doing it in similar fashion =)
>> Thank you MaryJo  for jumping right in!
>>
>> Sas
>>
>>
>>
>> -Original Message-
>> From: Andrew Chillrud [mailto:achill...@opentext.com]
>> Sent: Thursday, June 2, 2016 2:17 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: [E] Re: Faceting Question(s)
>>
>> It is possible to get the original facet counts for the field you are
>> filtering on (we have been using this since Solr 3.6). Don't know if this
>> can be extended to get the original counts for all fields however.
>>
>> This syntax is described here:
>> https://cwiki.apache.org/confluence/display/solr/Faceting
>>
>> Tagging and Excluding Filters
>>
>> You can tag specific filters and exclude those filters when faceting. This
>> is useful when doing multi-select faceting.
>>
>> Consider the following example query with faceting:
>>
>> q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
>>
>> Because everything is already constrained by the filter doctype:pdf, the
>> facet.field=doctype facet command is currently redundant and will return 0
>> counts for everything except doctype:pdf.
>>
>> To implement a multi-select facet for doctype, a GUI may want to still
>> display the other doctype values and their associated counts, as if the
>> doctype:pdf constraint had not yet been applied. For example:
>> === Document Type ===
>>   [ ] Word (42)
>>   [x] PDF  (96)
>>   [ ] Excel(11)
>>   [ ] HTML (63)
>>
>> To return counts for doctype values that are currently not selected, tag
>> filters that directly constrain doctype, and exclude those filters when
>> faceting on doctype.
>>
>>
>> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
>>
>> Filter exclusion is supported for all types of facets. Both the tag and ex
>> local parameters may specify multiple values by separating them with commas.
>>
>> - Andy -
>>
>> -Original Message-
>> From: Robert Brown [mailto:r...@intelcompute.com]
>> Sent: Thursday, June 02, 2016 2:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: [E] Re: Faceting Question(s)
>>
>> MaryJo, I think you've mis-understood.  The counts are different simply
>> because the 2nd query contains an filter of a facet value from the 1st
>> query - that's completely expected.
>>
>> The issue is how to get the original facet counts (with no filters but
>> same q) in the same call as also filtering by one of those facet values.
>>
>> Personally I don't think it's possible, but will be interested to hear
>> others input, since it's a very common situation for me - I cache the first
>> result in memcached and tag future queries as related to the first.
>>
>> Or could you always make 2 calls back to Solr (one original (again), and
>> one with the filters), the caches should help massively.
>>
>>
>>
>> On 02/06/16 19:07, MaryJo Sminkey wrote:
>> > And you're saying the count for the second query is different than
>> > what was returned in the facet? You may need to check for any defaults
>> > you have set up in the solrconfig for the select parser, if for
>> > instance you have any grouping going on, but aren't doing grouping in
>> > your facet, that could result in the counts being off.
>> >
>> > MJ
>> >
>> >
>> >
>> >
>> > On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
>> > sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>> >
>> >> Absolutely,

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Well thanks for asking the question because I had no idea what Andrew
posted was even possible... and I most definitely will be using that
myself! Totally brilliant stuff. I am so loving Solr... well, when it's not
driving me bonkers.

Mary Jo


On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Thank you Andrew, that looks like exactly what I am looking for =)
> Thank you Robert, it looks like we are both doing it in similar fashion =)
> Thank you MaryJo  for jumping right in!
>
> Sas
>
>
>
> -Original Message-
> From: Andrew Chillrud [mailto:achill...@opentext.com]
> Sent: Thursday, June 2, 2016 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: RE: [E] Re: Faceting Question(s)
>
> It is possible to get the original facet counts for the field you are
> filtering on (we have been using this since Solr 3.6). Don't know if this
> can be extended to get the original counts for all fields however.
>
> This syntax is described here:
> https://cwiki.apache.org/confluence/display/solr/Faceting
>
> Tagging and Excluding Filters
>
> You can tag specific filters and exclude those filters when faceting. This
> is useful when doing multi-select faceting.
>
> Consider the following example query with faceting:
>
> q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
>
> Because everything is already constrained by the filter doctype:pdf, the
> facet.field=doctype facet command is currently redundant and will return 0
> counts for everything except doctype:pdf.
>
> To implement a multi-select facet for doctype, a GUI may want to still
> display the other doctype values and their associated counts, as if the
> doctype:pdf constraint had not yet been applied. For example:
> === Document Type ===
>   [ ] Word (42)
>   [x] PDF  (96)
>   [ ] Excel(11)
>   [ ] HTML (63)
>
> To return counts for doctype values that are currently not selected, tag
> filters that directly constrain doctype, and exclude those filters when
> faceting on doctype.
>
>
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
>
> Filter exclusion is supported for all types of facets. Both the tag and ex
> local parameters may specify multiple values by separating them with commas.
>
> - Andy -
>
> -----Original Message-
> From: Robert Brown [mailto:r...@intelcompute.com]
> Sent: Thursday, June 02, 2016 2:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [E] Re: Faceting Question(s)
>
> MaryJo, I think you've mis-understood.  The counts are different simply
> because the 2nd query contains an filter of a facet value from the 1st
> query - that's completely expected.
>
> The issue is how to get the original facet counts (with no filters but
> same q) in the same call as also filtering by one of those facet values.
>
> Personally I don't think it's possible, but will be interested to hear
> others input, since it's a very common situation for me - I cache the first
> result in memcached and tag future queries as related to the first.
>
> Or could you always make 2 calls back to Solr (one original (again), and
> one with the filters), the caches should help massively.
>
>
>
> On 02/06/16 19:07, MaryJo Sminkey wrote:
> > And you're saying the count for the second query is different than
> > what was returned in the facet? You may need to check for any defaults
> > you have set up in the solrconfig for the select parser, if for
> > instance you have any grouping going on, but aren't doing grouping in
> > your facet, that could result in the counts being off.
> >
> > MJ
> >
> >
> >
> >
> > On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
> > sarfaraz.ja...@verizonwireless.com.invalid> wrote:
> >
> >> Absolutely,
> >>
> >> Here is what it looks like:
> >>
> >> This brings the right counts as it should http://
> >> **select?q=video=true=*=20=true
> >> cet.field=team
> >>
> >> Then when I specify which team
> >> http://
> >> **select?q=video=true=*=20=true
> >> cet.field=team=team:rollback
> >>
> >> The counts are obviously different now, as the result set is limited
> >> to one team.
> >>
> >> Sas
> >>
> >> -Original Message-
> >> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
> >> Sent: Thursday, June 2, 2016 1:56 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: [E] Re: Faceting Question(s)
> >>
> >> Jamai - what is your q= set to? And do you have a fq for the original
> >> query? I have f

RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Thank you Andrew, that looks like exactly what I am looking for =)
Thank you Robert, it looks like we are both doing it in similar fashion =)
Thank you MaryJo  for jumping right in!

Sas



-----Original Message-----
From: Andrew Chillrud [mailto:achill...@opentext.com] 
Sent: Thursday, June 2, 2016 2:17 PM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Faceting Question(s)

It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). Don't know if this can be extended 
to get the original counts for all fields however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.
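
As a minimal sketch of how a client might assemble such a request, the snippet below builds the query string with the tagged filter and the excluding facet field. The function name and its arguments are hypothetical; the tag name "dt" follows the example above.

```python
from urllib.parse import urlencode, parse_qsl


def multiselect_query(main_query, other_filters, field, selected, tag="dt"):
    """Build a query string for a multi-select facet on `field`."""
    params = [("q", main_query)]
    # Filters on other fields are applied normally, without a tag.
    params += [("fq", f) for f in other_filters]
    # Tag the filter that constrains the faceted field...
    params.append(("fq", "{!tag=%s}%s:%s" % (tag, field, selected)))
    params.append(("facet", "true"))
    # ...and exclude that tag when faceting on the same field.
    params.append(("facet.field", "{!ex=%s}%s" % (tag, field)))
    return urlencode(params)


qs = multiselect_query("mainquery", ["status:public"], "doctype", "pdf")
print(qs)
```

urlencode percent-encodes the braces, "!" and ":" in the local params, which is safer than pasting the raw string into a URL.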

- Andy -

-----Original Message-----
From: Robert Brown [mailto:r...@intelcompute.com]
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter on a facet value from the 1st query - that's 
completely expected.

The issue is how to get the original facet counts (with no filters but the same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but I will be interested to hear others' 
input, since it's a very common situation for me - I cache the first result in 
memcached and tag future queries as related to the first.

Or you could always make 2 calls back to Solr (one original (again), and one 
with the filters); the caches should help massively.
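
The caching approach can be sketched roughly as below, with an in-process dict standing in for memcached and a stub standing in for the real Solr call. Every name and count here is made up for illustration.

```python
def make_cached_facets(fetch_facets):
    """Wrap a facet-fetching function so unfiltered counts are fetched once per q."""
    cache = {}  # stand-in for memcached, keyed by the main query string

    def original_counts(q):
        if q not in cache:
            cache[q] = fetch_facets(q)
        return cache[q]

    return original_counts


calls = []


def fake_fetch(q):
    """Stub for the unfiltered Solr request; returns invented counts."""
    calls.append(q)
    return {"team": {"rollback": 3, "other": 5}}


original_counts = make_cached_facets(fake_fetch)
first = original_counts("video")
second = original_counts("video")  # served from the cache, no second fetch
print(len(calls))
```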



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than 
> what was returned in the facet? You may need to check for any defaults 
> you have set up in the solrconfig for the select parser, if for 
> instance you have any grouping going on, but aren't doing grouping in 
> your facet, that could result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz < 
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should http:// 
>> **select?q=video=true=*=20=true
>> cet.field=team
>>
>> Then when I specify which team
>> http://
>> **select?q=video=true=*=20=true
>> cet.field=team=team:rollback
>>
>> The counts are obviously different now, as the result set is limited 
>> to one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamal - what is your q= set to? And do you have a fq for the original 
>> query? I have found that if you do a wildcard search (*:*) you have 
>> to be careful about other parameters you set as that can often result 
>> in the numbers returned being off. In my case, my defaults had things 
>> like edismax settings for phrase boosting, etc. that don't apply if 
>> there isn't a search term, and once I removed those for a wildcard 
>> search I got the correct numbers. So possibly your facet query itself 
>> may be set up correctly but something else in the parameters and/or 
>> filters with the two queries may be the cause of the difference.
>>
>> Mary Jo
>>
>>
>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz < 
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am working on implementing some basic faceting into

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Ah yes I did misunderstand the question, I thought he was just saying the
count was not the same as what the facet in the first query had returned.

MJ



On Thu, Jun 2, 2016 at 2:11 PM, Robert Brown <r...@intelcompute.com> wrote:

> MaryJo, I think you've misunderstood.  The counts are different simply
> because the 2nd query contains a filter on a facet value from the 1st
> query - that's completely expected.
>
> The issue is how to get the original facet counts (with no filters but
> same q) in the same call as also filtering by one of those facet values.
>
> Personally I don't think it's possible, but I'll be interested to hear
> others' input, since it's a very common situation for me - I cache the first
> result in memcached and tag future queries as related to the first.
>
> Or you could always make 2 calls back to Solr (the original query again, and
> one with the filters); the caches should help massively.
>
>
>
> On 02/06/16 19:07, MaryJo Sminkey wrote:
>
>> And you're saying the count for the second query is different than what
>> was
>> returned in the facet? You may need to check for any defaults you have set
>> up in the solrconfig for the select parser, if for instance you have any
>> grouping going on, but aren't doing grouping in your facet, that could
>> result in the counts being off.
>>
>> MJ
>>
>>
>>
>>
>> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>> Absolutely,
>>>
>>> Here is what it looks like:
>>>
>>> This brings the right counts as it should
>>> http://
>>>
>>> **select?q=video=true=*=20=true=team
>>>
>>> Then when I specify which team
>>> http://
>>>
>>> **select?q=video=true=*=20=true=team=team:rollback
>>>
>>> The counts are obviously different now, as the result set is limited to
>>> one team.
>>>
>>> Sas
>>>
>>> -Original Message-
>>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>>> Sent: Thursday, June 2, 2016 1:56 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: [E] Re: Faceting Question(s)
>>>
>>> Jamai - what is your q= set to? And do you have a fq for the original
>>> query? I have found that if you do a wildcard search (*.*) you have to be
>>> careful about other parameters you set as that can often result in the
>>> numbers returned being off. In my case, my defaults had things like
>>> edismax
>>> settings for phrase boosting, etc. that don't apply if there isn't a
>>> search
>>> term, and once I removed those for a wildcard search I got the correct
>>> numbers. So possibly your facet query itself may be set up correctly but
>>> something else in the parameters and/or filters with the two queries may
>>> be
>>> the cause of the difference.
>>>
>>> Mary Jo
>>>
>>>
>>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
>>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>>
>>> Hello Everyone,
>>>>
>>>> I am working on implementing some basic faceting into my project.
>>>>
>>>> I have it working the way I want to, but I feel like there is probably
>>>> a better way the way I went about it.
>>>>
>>>> * I want to show a category and its count.
>>>> * when someone clicks a category, it sets a FQ= to that category.
>>>>
>>>> But now that the results are being filtered, the category counts from
>>>> the original query without the filters are off.
>>>>
>>>> So, I have a single api call that I make with rows set to 0 and the
>>>> base query without any filters, and use that to display my categories.
>>>>
>>>> And then I call the api again, this time to get the results. And the
>>>> category count is the same.
>>>>
>>>> I hope that makes sense.
>>>>
>>>> I was hoping  facet.query would be of help, but I am not sure I
>>>> understood it properly.
>>>>
>>>> Thanks in advance =)
>>>>
>>>> Sas
>>>>
>>>>
>


RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Andrew Chillrud
It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). Don't know if this can be extended 
to get the original counts for all fields however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.
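
A minimal sketch of how a client might assemble the tagged filter and the excluding facet field described above, using only the Python standard library. The helper name multiselect_params and its arguments are hypothetical; the field, values and tag name come from the documentation example.

```python
from urllib.parse import urlencode

def multiselect_params(q, field, selected, tag="dt", other_filters=()):
    """Build Solr params that filter on `field` but facet on it as if unfiltered."""
    params = [("q", q)]
    # Filters that should still constrain the facet counts stay untagged.
    params += [("fq", f) for f in other_filters]
    # Tag the filter that constrains the faceted field...
    params.append(("fq", "{!tag=%s}%s:%s" % (tag, field, selected)))
    params.append(("facet", "true"))
    # ...and exclude that tag when counting values of the same field.
    params.append(("facet.field", "{!ex=%s}%s" % (tag, field)))
    return params

params = multiselect_params("mainquery", "doctype", "pdf",
                            other_filters=["status:public"])
query_string = urlencode(params)  # percent-encodes the local-params braces
print(query_string)
```

The list of pairs (rather than a dict) keeps both fq parameters, since Solr accepts fq more than once.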

- Andy -

-Original Message-
From: Robert Brown [mailto:r...@intelcompute.com] 
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter on a facet value from the 1st query - that's 
completely expected.

The issue is how to get the original facet counts (with no filters but same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but I'll be interested to hear others' 
input, since it's a very common situation for me - I cache the first result in 
memcached and tag future queries as related to the first.

Or you could always make 2 calls back to Solr (the original query again, and one 
with the filters); the caches should help massively.



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than what was
> returned in the facet? You may need to check for any defaults you have set
> up in the solrconfig for the select parser, if for instance you have any
> grouping going on, but aren't doing grouping in your facet, that could
> result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should
>> http://
>> **select?q=video=true=*=20=true=team
>>
>> Then when I specify which team
>> http://
>> **select?q=video=true=*=20=true=team=team:rollback
>>
>> The counts are obviously different now, as the result set is limited to
>> one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamai - what is your q= set to? And do you have a fq for the original
>> query? I have found that if you do a wildcard search (*.*) you have to be
>> careful about other parameters you set as that can often result in the
>> numbers returned being off. In my case, my defaults had things like edismax
>> settings for phrase boosting, etc. that don't apply if there isn't a search
>> term, and once I removed those for a wildcard search I got the correct
>> numbers. So possibly your facet query itself may be set up correctly but
>> something else in the parameters and/or filters with the two queries may be
>> the cause of the difference.
>>
>> Mary Jo
>>
>>
>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am working on implementing some basic faceting into my project.
>>>
>>> I have it working the way I want to, but I feel like there is probably
>>> a better way the way I went about it.
>>>
>>> * I want to show a category and its count.
>>> * when someone clicks a category, it sets a FQ= to that category.
>>>
>>> But now that the results are being filtered, the category counts from
>>> the original que

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Robert Brown
MaryJo, I think you've misunderstood.  The counts are different simply 
because the 2nd query contains a filter on a facet value from the 1st 
query - that's completely expected.


The issue is how to get the original facet counts (with no filters but 
same q) in the same call as also filtering by one of those facet values.


Personally I don't think it's possible, but I'll be interested to hear 
others' input, since it's a very common situation for me - I cache the 
first result in memcached and tag future queries as related to the first.


Or you could always make 2 calls back to Solr (the original query again, and 
one with the filters); the caches should help massively.




On 02/06/16 19:07, MaryJo Sminkey wrote:

And you're saying the count for the second query is different than what was
returned in the facet? You may need to check for any defaults you have set
up in the solrconfig for the select parser, if for instance you have any
grouping going on, but aren't doing grouping in your facet, that could
result in the counts being off.

MJ




On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:


Absolutely,

Here is what it looks like:

This brings the right counts as it should
http://
**select?q=video=true=*=20=true=team

Then when I specify which team
http://
**select?q=video=true=*=20=true=team=team:rollback

The counts are obviously different now, as the result set is limited to
one team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamai - what is your q= set to? And do you have a fq for the original
query? I have found that if you do a wildcard search (*.*) you have to be
careful about other parameters you set as that can often result in the
numbers returned being off. In my case, my defaults had things like edismax
settings for phrase boosting, etc. that don't apply if there isn't a search
term, and once I removed those for a wildcard search I got the correct
numbers. So possibly your facet query itself may be set up correctly but
something else in the parameters and/or filters with the two queries may be
the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:


Hello Everyone,

I am working on implementing some basic faceting into my project.

I have it working the way I want to, but I feel like there is probably
a better way the way I went about it.

* I want to show a category and its count.
* when someone clicks a category, it sets a FQ= to that category.

But now that the results are being filtered, the category counts from
the original query without the filters are off.

So, I have a single api call that I make with rows set to 0 and the
base query without any filters, and use that to display my categories.

And then I call the api again, this time to get the results. And the
category count is the same.

I hope that makes sense.

I was hoping  facet.query would be of help, but I am not sure I
understood it properly.

Thanks in advance =)

Sas





Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
And you're saying the count for the second query is different than what was
returned in the facet? You may need to check for any defaults you have set
up in the solrconfig for the select parser, if for instance you have any
grouping going on, but aren't doing grouping in your facet, that could
result in the counts being off.

MJ




On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Absolutely,
>
> Here is what it looks like:
>
> This brings the right counts as it should
> http://
> **select?q=video=true=*=20=true=team
>
> Then when I specify which team
> http://
> **select?q=video=true=*=20=true=team=team:rollback
>
> The counts are obviously different now, as the result set is limited to
> one team.
>
> Sas
>
> -Original Message-
> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
> Sent: Thursday, June 2, 2016 1:56 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: Faceting Question(s)
>
> Jamai - what is your q= set to? And do you have a fq for the original
> query? I have found that if you do a wildcard search (*.*) you have to be
> careful about other parameters you set as that can often result in the
> numbers returned being off. In my case, my defaults had things like edismax
> settings for phrase boosting, etc. that don't apply if there isn't a search
> term, and once I removed those for a wildcard search I got the correct
> numbers. So possibly your facet query itself may be set up correctly but
> something else in the parameters and/or filters with the two queries may be
> the cause of the difference.
>
> Mary Jo
>
>
> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
> > Hello Everyone,
> >
> > I am working on implementing some basic faceting into my project.
> >
> > I have it working the way I want to, but I feel like there is probably
> > a better way the way I went about it.
> >
> > * I want to show a category and its count.
> > * when someone clicks a category, it sets a FQ= to that category.
> >
> > But now that the results are being filtered, the category counts from
> > the original query without the filters are off.
> >
> > So, I have a single api call that I make with rows set to 0 and the
> > base query without any filters, and use that to display my categories.
> >
> > And then I call the api again, this time to get the results. And the
> > category count is the same.
> >
> > I hope that makes sense.
> >
> > I was hoping  facet.query would be of help, but I am not sure I
> > understood it properly.
> >
> > Thanks in advance =)
> >
> > Sas
> >
>


RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Absolutely,

Here is what it looks like:

This brings the right counts as it should
http://**select?q=video=true=*=20=true=team

Then when I specify which team
http://**select?q=video=true=*=20=true=team=team:rollback

The counts are obviously different now, as the result set is limited to one 
team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com] 
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamal - what is your q= set to? And do you have a fq for the original query? I 
have found that if you do a wildcard search (*:*) you have to be careful about 
other parameters you set, as they can often result in the numbers returned being 
off. In my case, my defaults had things like edismax settings for phrase 
boosting, etc. that don't apply if there isn't a search term, and once I 
removed those for a wildcard search I got the correct numbers. So possibly your 
facet query itself may be set up correctly but something else in the parameters 
and/or filters with the two queries may be the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably 
> a better way the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from 
> the original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the 
> base query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the 
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I 
> understood it properly.
>
> Thanks in advance =)
>
> Sas
>


Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
In other words... to diagnose such a problem it would really help to see
the exact parameters and filters you are using on each of the searches.

Mary Jo

On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably a
> better way the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from the
> original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the base
> query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I understood
> it properly.
>
> Thanks in advance =)
>
> Sas
>


Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Jamal - what is your q= set to? And do you have a fq for the original
query? I have found that if you do a wildcard search (*:*) you have to be
careful about other parameters you set, as they can often result in the
numbers returned being off. In my case, my defaults had things like edismax
settings for phrase boosting, etc. that don't apply if there isn't a search
term, and once I removed those for a wildcard search I got the correct
numbers. So possibly your facet query itself may be set up correctly but
something else in the parameters and/or filters with the two queries may be
the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably a
> better way the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from the
> original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the base
> query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I understood
> it properly.
>
> Thanks in advance =)
>
> Sas
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-31 Thread Alessandro Benedetti
Interesting developments :

https://issues.apache.org/jira/browse/SOLR-9176

I think we found why term enum seems slower in recent Solr!
In our case it is likely related to the commit I mention in the Jira issue.
Have a look, Joel!

On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> I am investigating this scenario right now.
> I can confirm that the enum slowness is in Solr 6.0 as well.
> And I agree with Joel, it seems to be un-related with the famous faceting
> regression :(
>
> Furthermore with the legacy facet approach, if you set docValues for the
> field you are not going to be able to try the enum approach anymore.
>
> org/apache/solr/request/SimpleFacets.java:448
>
> if (method == FacetMethod.ENUM && sf.hasDocValues()) {
>   // only fc can handle docvalues types
>   method = FacetMethod.FC;
> }
>
>
> I got really horrible regressions simply using term enum in both Solr 4
> and Solr 6.
>
> And even the most optimized fcs approach with docValues and
> facet.threads=nCore does not perform as the simple enum in Solr 4 .
>
> i.e.
>
> For some sample queries I have 40 ms vs 160 ms and similar...
> I think we should open an issue if we can confirm it is not related with
> the other.
> A lot of people will continue using the legacy approach for a while...
>
> On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> wrote:
>
>> The enum slowness is interesting. It would appear on the surface to not be
>> related to the FieldCache issue. I don't think the main emphasis of the
>> JSON facet API has been the enum approach. You may find using the JSON
>> facet API and eliminating the use of enum meets your performance needs.
>>
>> With the CollapsingQParserPlugin top_fc is definitely faster during
>> queries. The tradeoff is slower warming times and increased memory usage
>> if
>> the collapse fields are used in faceting, as faceting will load the field
>> into a different cache.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
>>
>> > Joel,
>> >
>> > Thank you for taking the time to respond to my question.  I tried the
>> JSON
>> > Facet API for one query that uses facet.method=enum (since this one has
>> a
>> > ton of unique values and performed better with enum) but this was way
>> > slower than even the slower Solr 5 times.  I did not try the new API
>> with
>> > the non-enum queries though so I will give that a go.  It looks like
>> Solr
>> > 5.5.1 also has a facet.method=uif which will be interesting to try.
>> >
>> > If these do not prove helpful, it looks like I will need to wait for
>> > SOLR-8096 to be resolved before upgrading.
>> >
>> > Thanks also for your comment on top_fc for the CollapsingQParser.  I use
>> > collapse/expand for some queries but traditional grouping for others
>> due to
>> > performance.  It will be interesting to see if those grouping queries
>> > perform better now using CollapsingQParser with top_fc.
>> >
>> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
>> > wrote:
>> >
>> > > Yes, SOLR-8096 is the issue here.
>> > >
>> > > I don't believe indexing with docValues is going to help too much with
>> > > this. The enum slowness may not be related, but I'm not positive about
>> > > that.
>> > >
>> > > The major slowdowns are likely due to the removal of the top level
>> > > FieldCache from general use and the removal of the FieldValuesCache
>> which
>> > > was used for multi-value field faceting.
>> > >
>> > > The JSON facet API covers all the functionality in the traditional
>> > > faceting, and it has been developed to be very performant.
>> > >
>> > > You may also want to see if Collapse/Expand can meet your applications
>> > > needs rather Grouping. It allows you to specify using a top level
>> > > FieldCache if performance is a blocker without it.
>> > >
>> > >
>> > >
>> > >
>> > > Joel Bernstein
>> > > http://joelsolr.blogspot.com/
>> > >
>> > > On Wed, May 18, 2016 at 10:42 AM, Solr User 
>> wrote:
>> > >
>> > > > Does anyone know the answer to this?
>> > > >
>> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User 
>> wrote:
>> > > >
>> > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
>> > but
>> > > > had
>> > > > > to abort due to average response times degraded from a baseline
>> > volume
>> > > > > performance test.  The affected queries involved faceting (both
>> enum
>> > > > method
>> > > > > and default) and grouping.  There is a critical bug
>> > > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
>> > which I
>> > > > > gather is the cause of the slower response times.  One concern I
>> have
>> > > is
>> > > > > that discussions around the issue offer the suggestion of indexing
>> > with
>> > > > > docValues which alleviated the problem in at least that one
>> reported
>> > > > case.
>> > > > > However, indexing with docValues 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-25 Thread Alessandro Benedetti
I am investigating this scenario right now.
I can confirm that the enum slowness is in Solr 6.0 as well.
And I agree with Joel, it seems to be unrelated to the famous faceting
regression :(

Furthermore with the legacy facet approach, if you set docValues for the
field you are not going to be able to try the enum approach anymore.

org/apache/solr/request/SimpleFacets.java:448

if (method == FacetMethod.ENUM && sf.hasDocValues()) {
  // only fc can handle docvalues types
  method = FacetMethod.FC;
}


I got really horrible regressions simply using term enum in both Solr 4 and
Solr 6.

And even the most optimized fcs approach with docValues and
facet.threads=nCore does not perform as well as the simple enum in Solr 4.

i.e.

For some sample queries I have 40 ms vs 160 ms and similar...
I think we should open an issue if we can confirm it is not related to
the other one.
A lot of people will continue using the legacy approach for a while...

On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein  wrote:

> The enum slowness is interesting. It would appear on the surface to not be
> related to the FieldCache issue. I don't think the main emphasis of the
> JSON facet API has been the enum approach. You may find using the JSON
> facet API and eliminating the use of enum meets your performance needs.
>
> With the CollapsingQParserPlugin top_fc is definitely faster during
> queries. The tradeoff is slower warming times and increased memory usage if
> the collapse fields are used in faceting, as faceting will load the field
> into a different cache.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
>
> > Joel,
> >
> > Thank you for taking the time to respond to my question.  I tried the
> JSON
> > Facet API for one query that uses facet.method=enum (since this one has a
> > ton of unique values and performed better with enum) but this was way
> > slower than even the slower Solr 5 times.  I did not try the new API with
> > the non-enum queries though so I will give that a go.  It looks like Solr
> > 5.5.1 also has a facet.method=uif which will be interesting to try.
> >
> > If these do not prove helpful, it looks like I will need to wait for
> > SOLR-8096 to be resolved before upgrading.
> >
> > Thanks also for your comment on top_fc for the CollapsingQParser.  I use
> > collapse/expand for some queries but traditional grouping for others due
> to
> > performance.  It will be interesting to see if those grouping queries
> > perform better now using CollapsingQParser with top_fc.
> >
> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> > wrote:
> >
> > > Yes, SOLR-8096 is the issue here.
> > >
> > > I don't believe indexing with docValues is going to help too much with
> > > this. The enum slowness may not be related, but I'm not positive about
> > > that.
> > >
> > > The major slowdowns are likely due to the removal of the top level
> > > FieldCache from general use and the removal of the FieldValuesCache
> which
> > > was used for multi-value field faceting.
> > >
> > > The JSON facet API covers all the functionality in the traditional
> > > faceting, and it has been developed to be very performant.
> > >
> > > You may also want to see if Collapse/Expand can meet your applications
> > > needs rather Grouping. It allows you to specify using a top level
> > > FieldCache if performance is a blocker without it.
> > >
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
> > >
> > > > Does anyone know the answer to this?
> > > >
> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> > > >
> > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
> > but
> > > > had
> > > > > to abort due to average response times degraded from a baseline
> > volume
> > > > > performance test.  The affected queries involved faceting (both
> enum
> > > > method
> > > > > and default) and grouping.  There is a critical bug
> > > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
> > which I
> > > > > gather is the cause of the slower response times.  One concern I
> have
> > > is
> > > > > that discussions around the issue offer the suggestion of indexing
> > with
> > > > > docValues which alleviated the problem in at least that one
> reported
> > > > case.
> > > > > However, indexing with docValues did not improve the performance in
> > my
> > > > case.
> > > > >
> > > > > Can someone please confirm or correct my understanding that this
> > issue
> > > > has
> > > > > no path forward at this time and specifically that it is already
> > known
> > > > that
> > > > > docValues does not necessarily solve this?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 
--

Benedetti Alessandro
Visiting card : 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Joel Bernstein
The enum slowness is interesting. It would appear on the surface to not be
related to the FieldCache issue. I don't think the main emphasis of the
JSON facet API has been the enum approach. You may find using the JSON
facet API and eliminating the use of enum meets your performance needs.
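
For readers who have not used it, a rough sketch of what a terms facet looks like when sent through the JSON facet API, built in Python. The facet name "teams", the field "team" and the helper terms_facet are placeholders chosen for illustration, not anything from this thread's setup.

```python
import json

def terms_facet(name, field, limit=10):
    """Describe one terms facet in the shape the json.facet parameter expects."""
    return {name: {"type": "terms", "field": field, "limit": limit}}

# rows=0 because only the facet counts are wanted here, not documents.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("json.facet", json.dumps(terms_facet("teams", "team"))),
]
print(params[2][1])
```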

With the CollapsingQParserPlugin top_fc is definitely faster during
queries. The tradeoff is slower warming times and increased memory usage if
the collapse fields are used in faceting, as faceting will load the field
into a different cache.
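
A small sketch of how the top_fc hint is passed to the collapse filter, assuming a hypothetical collapse field called group_id; the helper collapse_params is made up for this example. It only builds the parameters, so the warming-time and memory tradeoff still has to be measured against a real index.

```python
from urllib.parse import urlencode

def collapse_params(q, collapse_field, use_top_fc=True):
    """Build params for a collapse/expand request, optionally with hint=top_fc."""
    local = "{!collapse field=%s" % collapse_field
    if use_top_fc:
        # Ask the CollapsingQParserPlugin to use a top-level FieldCache.
        local += " hint=top_fc"
    local += "}"
    return [("q", q), ("fq", local), ("expand", "true")]

print(urlencode(collapse_params("*:*", "group_id")))
```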

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:

> Joel,
>
> Thank you for taking the time to respond to my question.  I tried the JSON
> Facet API for one query that uses facet.method=enum (since this one has a
> ton of unique values and performed better with enum) but this was way
> slower than even the slower Solr 5 times.  I did not try the new API with
> the non-enum queries though so I will give that a go.  It looks like Solr
> 5.5.1 also has a facet.method=uif which will be interesting to try.
>
> If these do not prove helpful, it looks like I will need to wait for
> SOLR-8096 to be resolved before upgrading.
>
> Thanks also for your comment on top_fc for the CollapsingQParser.  I use
> collapse/expand for some queries but traditional grouping for others due to
> performance.  It will be interesting to see if those grouping queries
> perform better now using CollapsingQParser with top_fc.
>
> On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> wrote:
>
> > Yes, SOLR-8096 is the issue here.
> >
> > I don't believe indexing with docValues is going to help too much with
> > this. The enum slowness may not be related, but I'm not positive about
> > that.
> >
> > The major slowdowns are likely due to the removal of the top level
> > FieldCache from general use and the removal of the FieldValuesCache which
> > was used for multi-value field faceting.
> >
> > The JSON facet API covers all the functionality in the traditional
> > faceting, and it has been developed to be very performant.
> >
> > You may also want to see if Collapse/Expand can meet your application's
> > needs rather than Grouping. It allows you to specify using a top level
> > FieldCache if performance is a blocker without it.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
> >
> > > Does anyone know the answer to this?
> > >
> > > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> > >
> > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
> but
> > > had
> > > > to abort due to average response times degraded from a baseline
> volume
> > > > performance test.  The affected queries involved faceting (both enum
> > > method
> > > > and default) and grouping.  There is a critical bug
> > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
> which I
> > > > gather is the cause of the slower response times.  One concern I have
> > is
> > > > that discussions around the issue offer the suggestion of indexing
> with
> > > > docValues which alleviated the problem in at least that one reported
> > > case.
> > > > However, indexing with docValues did not improve the performance in
> my
> > > case.
> > > >
> > > > Can someone please confirm or correct my understanding that this
> issue
> > > has
> > > > no path forward at this time and specifically that it is already
> known
> > > that
> > > > docValues does not necessarily solve this?
> > > >
> > > > Thanks in advance!
> > > >
> > > >
> > > >
> > >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Joel,

Thank you for taking the time to respond to my question.  I tried the JSON
Facet API for one query that uses facet.method=enum (since this one has a
ton of unique values and performed better with enum) but this was way
slower than even the slower Solr 5 times.  I did not try the new API with
the non-enum queries though so I will give that a go.  It looks like Solr
5.5.1 also has a facet.method=uif which will be interesting to try.

If these do not prove helpful, it looks like I will need to wait for
SOLR-8096 to be resolved before upgrading.

Thanks also for your comment on top_fc for the CollapsingQParser.  I use
collapse/expand for some queries but traditional grouping for others due to
performance.  It will be interesting to see if those grouping queries
perform better now using CollapsingQParser with top_fc.

On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein  wrote:

> Yes, SOLR-8096 is the issue here.
>
> I don't believe indexing with docValues is going to help too much with
> this. The enum slowness may not be related, but I'm not positive about
> that.
>
> The major slowdowns are likely due to the removal of the top level
> FieldCache from general use and the removal of the FieldValuesCache which
> was used for multi-value field faceting.
>
> The JSON facet API covers all the functionality in the traditional
> faceting, and it has been developed to be very performant.
>
> You may also want to see if Collapse/Expand can meet your applications
> needs rather Grouping. It allows you to specify using a top level
> FieldCache if performance is a blocker without it.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
>
> > Does anyone know the answer to this?
> >
> > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> >
> > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> > had
> > > to abort due to average response times degraded from a baseline volume
> > > performance test.  The affected queries involved faceting (both enum
> > method
> > > and default) and grouping.  There is a critical bug
> > > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> > > gather is the cause of the slower response times.  One concern I have
> is
> > > that discussions around the issue offer the suggestion of indexing with
> > > docValues which alleviated the problem in at least that one reported
> > case.
> > > However, indexing with docValues did not improve the performance in my
> > case.
> > >
> > > Can someone please confirm or correct my understanding that this issue
> > has
> > > no path forward at this time and specifically that it is already known
> > that
> > > docValues does not necessarily solve this?
> > >
> > > Thanks in advance!
> > >
> > >
> > >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Joel Bernstein
Yes, SOLR-8096 is the issue here.

I don't believe indexing with docValues is going to help too much with
this. The enum slowness may not be related, but I'm not positive about
that.

The major slowdowns are likely due to the removal of the top level
FieldCache from general use and the removal of the FieldValuesCache which
was used for multi-value field faceting.

The JSON facet API covers all the functionality in the traditional
faceting, and it has been developed to be very performant.

You may also want to see if Collapse/Expand can meet your application's
needs rather than Grouping. It allows you to specify using a top level
FieldCache if performance is a blocker without it.




Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:

> Does anyone know the answer to this?
>
> On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
>
> > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> had
> > to abort due to average response times degraded from a baseline volume
> > performance test.  The affected queries involved faceting (both enum
> method
> > and default) and grouping.  There is a critical bug
> > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> > gather is the cause of the slower response times.  One concern I have is
> > that discussions around the issue offer the suggestion of indexing with
> > docValues which alleviated the problem in at least that one reported
> case.
> > However, indexing with docValues did not improve the performance in my
> case.
> >
> > Can someone please confirm or correct my understanding that this issue
> has
> > no path forward at this time and specifically that it is already known
> that
> > docValues does not necessarily solve this?
> >
> > Thanks in advance!
> >
> >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Does anyone know the answer to this?

On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:

> I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had
> to abort due to average response times degraded from a baseline volume
> performance test.  The affected queries involved faceting (both enum method
> and default) and grouping.  There is a critical bug
> https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> gather is the cause of the slower response times.  One concern I have is
> that discussions around the issue offer the suggestion of indexing with
> docValues which alleviated the problem in at least that one reported case.
> However, indexing with docValues did not improve the performance in my case.
>
> Can someone please confirm or correct my understanding that this issue has
> no path forward at this time and specifically that it is already known that
> docValues does not necessarily solve this?
>
> Thanks in advance!
>
>
>


Re: Faceting and multiValued field type

2016-01-19 Thread Erick Erickson
Yes.

What do you mean "how does it work"? The low-level
details or what?

Basically, faceting just... facets. I.e. for each unique
value in the field specified it counts the number of
docs in the result set that have that value.

So if you have a doc with two dates and facet on that
field, say 1/1/2015 and 1/1/2016,
that doc will be counted in each bucket.

Best,
Erick

On Tue, Jan 19, 2016 at 8:48 AM, Steven White  wrote:
> Hi everyone,
>
> Can I use facet on a field type of multiValued?  If so, how does facet work
> with field type of "date" set as multiValued?
>
> Thanks
>
> Steve


Re: Faceting and multiValued field type

2016-01-19 Thread Steven White
My apologies for not being clear -- I left out the keyword "range search"
with facet.  Let me try again.

Using DateRangeField field type, if this field is multiValued and I have 3
date values stored for one record, 5 for another, etc., which of those date
values will be used for faceting when I use range-search faceting on this
field?

Don't I have the same issue on other field types when it comes to range
searches?  Such as CurrencyField, or int, float, etc.

-- George

On Tue, Jan 19, 2016 at 1:10 PM, Erick Erickson 
wrote:

> Yes.
>
> What do you mean "how does it work"? The low-level
> details or what?
>
> Basically, faceting just... facets. I.e. for each unique
> value in the field specified it counts the number of
> docs in the result set that have that value.
>
> So if you have a doc with two dates and facet on that
> field, say 1/1/2015 and 1/1/2016,
> that doc will be counted in each bucket.
>
> Best,
> Erick
>
> On Tue, Jan 19, 2016 at 8:48 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > Can I use facet on a field type of multiValued?  If so, how does facet
> work
> > with field type of "date" set as multiValued?
> >
> > Thanks
> >
> > Steve
>


Re: Faceting and multiValued field type

2016-01-19 Thread Erick Erickson
bq: which of those date values will be used for faceting when I
use range-search faceting on this field?

All of them. Which values match in a multiValued field, range
query or not, has no bearing on the facet counts. Faceting
essentially says "take all the docs that match the query and,
for each unique value in the field being faceted upon, tell
me how many of the docs have that value."

So I have docs with a multivalued field f1 like this
doc1 f1=>a,b,c,d
doc2 f1=>c,d,e,f

q=f1:[b TO c]

Both docs are hits, and you get back facets like
a-1
b-1
c-2
d-2
e-1
f-1

q=*:*, q=f1:c would give the same facet counts.
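
The same counting can be mimicked in a few lines of Python. This is only a
toy model of the behaviour described above, not how Solr implements it; the
function names are invented for the illustration.

```python
from collections import Counter

# The two example documents, each with a multivalued field f1.
docs = {
    "doc1": ["a", "b", "c", "d"],
    "doc2": ["c", "d", "e", "f"],
}


def range_query(docs, low, high):
    """Return ids of docs where ANY value of f1 falls in [low, high]."""
    return [doc_id for doc_id, values in docs.items()
            if any(low <= v <= high for v in values)]


def facet_counts(docs, hits):
    """Count, for every unique value, how many hit docs contain it.

    All values of a hit doc are counted, not just the ones that
    matched the query.
    """
    counts = Counter()
    for doc_id in hits:
        counts.update(set(docs[doc_id]))
    return dict(sorted(counts.items()))


hits = range_query(docs, "b", "c")      # both docs match
print(facet_counts(docs, hits))
# {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 1, 'f': 1}
```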

Best,
Erick

On Tue, Jan 19, 2016 at 11:28 AM, Steven White  wrote:
> My apology for not being clear -- I left out the keyword "range search"
> with facet.  Let me try again.
>
> Using DateRangeField field type, if this field is multiValued and I have 3
> date values stored for one record, 5 for another, etc., which of those date
> values will be used for faceting when I use range-search faceting on this
> field?
>
> Don't I have the same issue on other field types when it comes to range
> searches?  Such as CurrencyField, or int, float, etc.
>
> -- George
>
> On Tue, Jan 19, 2016 at 1:10 PM, Erick Erickson 
> wrote:
>
>> Yes.
>>
>> What do you mean "how does it work"? The low-level
>> details or what?
>>
>> Basically, faceting just... facets. I.e. for each unique
>> value in the field specified it counts the number of
>> docs in the result set that have that value.
>>
>> So if you have a doc with two dates and facet on that
>> field, say 1/1/2015 and 1/1/2016,
>> that doc will be counted in each bucket.
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 19, 2016 at 8:48 AM, Steven White 
>> wrote:
>> > Hi everyone,
>> >
>> > Can I use facet on a field type of multiValued?  If so, how does facet
>> work
>> > with field type of "date" set as multiValued?
>> >
>> > Thanks
>> >
>> > Steve
>>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-20 Thread William Bell
Thanks Jamie.

On Sat, Dec 19, 2015 at 11:31 PM, Jamie Johnson  wrote:

> Bill,
>
> Check out the patch attached to
> https://issues.apache.org/jira/browse/SOLR-8096.  I had considered making
> the method uif after I had done most of the work, it would be trivial to
> change and would probably be more aligned with not adding unexpected
> changes to people that are currently using fc.
>
> -Jamie
>
> On Sat, Dec 19, 2015 at 11:03 PM, William Bell 
> wrote:
>
> > Can we add method=uif back when not using the JSON Facet API too?
> >
> > That would help a lot of people.
> >
> > On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley  wrote:
> >
> > > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore 
> > > wrote:
> > > > Hi all,
> > > >
> > > > given that solr 5.4 is finally released, is this what's more stable
> and
> > > > efficient version of solrcloud ?
> > > >
> > > > I have a website which receives many search requests. It serve
> normally
> > > > about 2000 concurrent requests, but sometime there are peak from 4000
> > to
> > > > 1 requests in few seconds.
> > > >
> > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1
> cluster
> > > to
> > > > a new brand version, but following this thread I read about the
> > problems
> > > > that can occur upgrading to latest version.
> > > >
> > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values
> > fields"
> > > > is fixed in 5.4.
> > > >
> > > > I'm using standard faceting without docValues. Should I add docValues
> > in
> > > > order to benefit of such fix?
> > >
> > > You'll have to try it I think...
> > > DocValues have a lot of advantages (much less heap consumption, and
> > > much smaller overhead when opening a new searcher), but they can often
> > > be slower as well.
> > >
> > > Comparing 4x to 5x non-docvalues, top-level field caches were removed
> > > by lucene, and while that benefits certain things like NRT (opening a
> > > new searcher very often), it will hurt performance for other
> > > configurations.
> > >
> > > The JSON Facet API currently allows you to pick your strategy via the
> > > "method" param for multi-valued string fields without docvalues:
> > > "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> > > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> > > "per-segment" strategy.
> > >
> > > -Yonik
> > >
> >
> >
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-19 Thread William Bell
Can we add method=uif back when not using the JSON Facet API too?

That would help a lot of people.

On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley  wrote:

> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 1 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
>
> You'll have to try it I think...
> DocValues have a lot of advantages (much less heap consumption, and
> much smaller overhead when opening a new searcher), but they can often
> be slower as well.
>
> Comparing 4x to 5x non-docvalues, top-level field caches were removed
> by lucene, and while that benefits certain things like NRT (opening a
> new searcher very often), it will hurt performance for other
> configurations.
>
> The JSON Facet API currently allows you to pick your strategy via the
> "method" param for multi-valued string fields without docvalues:
> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> "per-segment" strategy.
>
> -Yonik
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-19 Thread Jamie Johnson
Bill,

Check out the patch attached to
https://issues.apache.org/jira/browse/SOLR-8096.  I had considered making
the method uif after I had done most of the work; it would be trivial to
change and would probably be more aligned with not adding unexpected
changes for people who are currently using fc.

-Jamie

On Sat, Dec 19, 2015 at 11:03 PM, William Bell  wrote:

> Can we add method=uif back when not using the JSON Facet API too?
>
> That would help a lot of people.
>
> On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley  wrote:
>
> > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore 
> > wrote:
> > > Hi all,
> > >
> > > given that solr 5.4 is finally released, is this what's more stable and
> > > efficient version of solrcloud ?
> > >
> > > I have a website which receives many search requests. It serve normally
> > > about 2000 concurrent requests, but sometime there are peak from 4000
> to
> > > 1 requests in few seconds.
> > >
> > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> > to
> > > a new brand version, but following this thread I read about the
> problems
> > > that can occur upgrading to latest version.
> > >
> > > I have seen that issue SOLR-7730 "speed-up faceting on doc values
> fields"
> > > is fixed in 5.4.
> > >
> > > I'm using standard faceting without docValues. Should I add docValues
> in
> > > order to benefit of such fix?
> >
> > You'll have to try it I think...
> > DocValues have a lot of advantages (much less heap consumption, and
> > much smaller overhead when opening a new searcher), but they can often
> > be slower as well.
> >
> > Comparing 4x to 5x non-docvalues, top-level field caches were removed
> > by lucene, and while that benefits certain things like NRT (opening a
> > new searcher very often), it will hurt performance for other
> > configurations.
> >
> > The JSON Facet API currently allows you to pick your strategy via the
> > "method" param for multi-valued string fields without docvalues:
> > "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> > "per-segment" strategy.
> >
> > -Yonik
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-18 Thread Jamie Johnson
Can we still specify the cache implementation for the field cache?  When
this change occurred to faceting (uninverting reader vs field ) it
prevented us from moving to 5.x but if we can get the 4.x functionality
using that api we could look to port to the latest.

Jamie
On Dec 17, 2015 9:18 AM, "Yonik Seeley"  wrote:

> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 1 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
>
> You'll have to try it I think...
> DocValues have a lot of advantages (much less heap consumption, and
> much smaller overhead when opening a new searcher), but they can often
> be slower as well.
>
> Comparing 4x to 5x non-docvalues, top-level field caches were removed
> by lucene, and while that benefits certain things like NRT (opening a
> new searcher very often), it will hurt performance for other
> configurations.
>
> The JSON Facet API currently allows you to pick your strategy via the
> "method" param for multi-valued string fields without docvalues:
> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> "per-segment" strategy.
>
> -Yonik
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-18 Thread Jamie Johnson
Also can we get the capability to choose the method of faceting in the
older faceting component?  I'm not looking for complete feature parity just
the ability to specify the method.  As always thanks.

On Fri, Dec 18, 2015 at 8:04 AM, Jamie Johnson  wrote:

> Can we still specify the cache implementation for the field cache?  When
> this change occurred to faceting (uninverting reader vs field ) it
> prevented us from moving to 5.x but if we can get the 4.x functionality
> using that api we could look to port to the latest.
>
> Jamie
> On Dec 17, 2015 9:18 AM, "Yonik Seeley"  wrote:
>
>> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore 
>> wrote:
>> > Hi all,
>> >
>> > given that solr 5.4 is finally released, is this what's more stable and
>> > efficient version of solrcloud ?
>> >
>> > I have a website which receives many search requests. It serve normally
>> > about 2000 concurrent requests, but sometime there are peak from 4000 to
>> > 1 requests in few seconds.
>> >
>> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
>> to
>> > a new brand version, but following this thread I read about the problems
>> > that can occur upgrading to latest version.
>> >
>> > I have seen that issue SOLR-7730 "speed-up faceting on doc values
>> fields"
>> > is fixed in 5.4.
>> >
>> > I'm using standard faceting without docValues. Should I add docValues in
>> > order to benefit of such fix?
>>
>> You'll have to try it I think...
>> DocValues have a lot of advantages (much less heap consumption, and
>> much smaller overhead when opening a new searcher), but they can often
>> be slower as well.
>>
>> Comparing 4x to 5x non-docvalues, top-level field caches were removed
>> by lucene, and while that benefits certain things like NRT (opening a
>> new searcher very often), it will hurt performance for other
>> configurations.
>>
>> The JSON Facet API currently allows you to pick your strategy via the
>> "method" param for multi-valued string fields without docvalues:
>> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
>> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
>> "per-segment" strategy.
>>
>> -Yonik
>>
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-17 Thread Mikhail Khludnev
This fix definitely helps facet.field over docValues fields on a
multi-segment index since 5.4.
I suppose it's irrelevant to JSON Facets, non-dv fields, and pre-5.4 versions.
I cannot comment on comparing the performance of dv and non-dv fields,
because "it depends" (c); benchmarking and a profiler are the only advisers.

On Thu, Dec 17, 2015 at 9:22 AM, William Bell  wrote:

> Same question here
>
> Wondering if faceting performance is fixed and how to take advantage of it
> ?
>
> On Wed, Dec 16, 2015 at 2:57 AM, Vincenzo D'Amore 
> wrote:
>
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 1 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
> >
> > Best regards,
> > Vincenzo
> >
> >
> >
> > On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com
> > > wrote:
> >
> > > Uwe, it's good to know! I mean that you've recovered. Take care!
> > >
> > > On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh 
> > > wrote:
> > >
> > > > Sorry for the delay. I had an ugly flu.
> > > >
> > > > SOLR-7730 seems to work fine. Using docValues with Solr
> > > > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast
> again.
> > > > (90ms vs. 2ms) :-)
> > > >
> > > > Thanks
> > > > Uwe
> > > >
> > > >
> > > >
> > > >
> > > > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> > > >
> > > >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <
> r...@hebis.uni-frankfurt.de>
> > > >> wrote:
> > > >>
> > > >> When 5.4 with SOLR-7730 will be released, I will start to use
> > docValues.
> > > >>> Going this way, seems more straight forward to me.
> > > >>>
> > > >>
> > > >>
> > > >> Sure. Giving your answers docValues facets has a really good chance
> to
> > > >> perform in your index after SOLR-7730. It's really interesting to
> see
> > > >> performance numbers on early 5.4 builds:
> > > >>
> > > >>
> > >
> >
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> > > >>
> > > >>
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> > >
> >
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-17 Thread Yonik Seeley
On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore  wrote:
> Hi all,
>
> given that solr 5.4 is finally released, is this what's more stable and
> efficient version of solrcloud ?
>
> I have a website which receives many search requests. It serve normally
> about 2000 concurrent requests, but sometime there are peak from 4000 to
> 1 requests in few seconds.
>
> On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
> a new brand version, but following this thread I read about the problems
> that can occur upgrading to latest version.
>
> I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> is fixed in 5.4.
>
> I'm using standard faceting without docValues. Should I add docValues in
> order to benefit of such fix?

You'll have to try it I think...
DocValues have a lot of advantages (much less heap consumption, and
much smaller overhead when opening a new searcher), but they can often
be slower as well.

Comparing 4x to 5x non-docvalues, top-level field caches were removed
by lucene, and while that benefits certain things like NRT (opening a
new searcher very often), it will hurt performance for other
configurations.

The JSON Facet API currently allows you to pick your strategy via the
"method" param for multi-valued string fields without docvalues:
"uif" (UninvertedField) gets you the top-level strategy from Solr 4,
while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
"per-segment" strategy.

-Yonik
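
As a rough illustration of the "method" param mentioned above, here is a
sketch of a JSON Facet API request body built in Python. The field name
"subject_ss", the facet label "by_subject" and the helper function are
assumptions made up for the example; only the terms facet type and the
"uif" / "dv" method values come from the message.

```python
import json


def terms_facet(field, method, limit=10):
    """Describe a terms facet on `field` using the given faceting method.

    method="uif" selects the top-level UninvertedField strategy,
    method="dv" the per-segment DocValues strategy.
    """
    return {"type": "terms", "field": field, "method": method, "limit": limit}


# Hypothetical multi-valued string field without docValues.
request_body = {
    "query": "*:*",
    "facet": {"by_subject": terms_facet("subject_ss", "uif")},
}

# This JSON document would be POSTed to the collection's query handler.
print(json.dumps(request_body, indent=2))
```

Swapping "uif" for "dv" in the call is enough to compare the two strategies
on the same query.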


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-16 Thread William Bell
Same question here

Wondering if faceting performance is fixed and how to take advantage of it ?

On Wed, Dec 16, 2015 at 2:57 AM, Vincenzo D'Amore 
wrote:

> Hi all,
>
> given that solr 5.4 is finally released, is this what's more stable and
> efficient version of solrcloud ?
>
> I have a website which receives many search requests. It serve normally
> about 2000 concurrent requests, but sometime there are peak from 4000 to
> 1 requests in few seconds.
>
> On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
> a new brand version, but following this thread I read about the problems
> that can occur upgrading to latest version.
>
> I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> is fixed in 5.4.
>
> I'm using standard faceting without docValues. Should I add docValues in
> order to benefit of such fix?
>
> Best regards,
> Vincenzo
>
>
>
> On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> > wrote:
>
> > Uwe, it's good to know! I mean that you've recovered. Take care!
> >
> > On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh 
> > wrote:
> >
> > > Sorry for the delay. I had an ugly flu.
> > >
> > > SOLR-7730 seems to work fine. Using docValues with Solr
> > > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > > (90ms vs. 2ms) :-)
> > >
> > > Thanks
> > > Uwe
> > >
> > >
> > >
> > >
> > > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> > >
> > >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
> > >> wrote:
> > >>
> > >> When 5.4 with SOLR-7730 will be released, I will start to use
> docValues.
> > >>> Going this way, seems more straight forward to me.
> > >>>
> > >>
> > >>
> > >> Sure. Giving your answers docValues facets has a really good chance to
> > >> perform in your index after SOLR-7730. It's really interesting to see
> > >> performance numbers on early 5.4 builds:
> > >>
> > >>
> >
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> > >>
> > >>
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-16 Thread Vincenzo D'Amore
Hi all,

given that Solr 5.4 is finally released, is this the most stable and
efficient version of SolrCloud?

I have a website which receives many search requests. It normally serves
about 2000 concurrent requests, but sometimes there are peaks from 4000 to
1 requests in a few seconds.

In January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
a brand-new version, but following this thread I read about the problems
that can occur when upgrading to the latest version.

I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
is fixed in 5.4.

I'm using standard faceting without docValues. Should I add docValues in
order to benefit from this fix?

Best regards,
Vincenzo



On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev  wrote:

> Uwe, it's good to know! I mean that you've recovered. Take care!
>
> On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh 
> wrote:
>
> > Sorry for the delay. I had an ugly flu.
> >
> > SOLR-7730 seems to work fine. Using docValues with Solr
> > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > (90ms vs. 2ms) :-)
> >
> > Thanks
> > Uwe
> >
> >
> >
> >
> > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> >
> >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
> >> wrote:
> >>
> >> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
> >>> Going this way, seems more straight forward to me.
> >>>
> >>
> >>
> >> Sure. Giving your answers docValues facets has a really good chance to
> >> perform in your index after SOLR-7730. It's really interesting to see
> >> performance numbers on early 5.4 builds:
> >>
> >>
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> >>
> >>
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: [Faceting] Exact Value Faceting VS ID Faceting

2015-11-26 Thread Toke Eskildsen
On Wed, 2015-11-25 at 15:56 +, Alessandro Benedetti wrote:
> I would like to have docValues because facets are going to be heavy on
> those fields.

> *Faceting approach *
> *1) *Indexing the human readable field value

Technically this will be a SORTED or SORTED_SET, which again means that
a pool of terms is maintained for each segment. The mapping from
documents to terms is done using ordinals, which are not comparable
across segments.

> Facets will be returned readable, out of the box.
> I can not see any cons in this approach, I would say it is the standard one.

With multiple segments, the terms from each segment must somehow be
aligned, to avoid duplicate entries in the result. This can either be
done by creating a segment_ordinal->global_ordinal map upon first
faceting call (facet.method=fc) or by on-the-fly comparison of top-X
terms from each segment (facet.method=fcs). Either way, there is a
performance penalty.

>- When calculating faceting, in memory it is used the ordinal for each
>term, which means in memory we don't waste space for the actual term, or
>waste the time looking up for the value until the very end of the process,
>after the counts are done .

The segment_ordinal->global_ordinal map requires memory linear in the
number of unique values in the field. If fcs is used, there will be more
term lookups.
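
To make the segment_ordinal->global_ordinal idea concrete, here is a
minimal Python sketch. It is not Solr's implementation; the function
names and the in-memory list layout are assumptions made purely for
illustration. Each segment has its own sorted term list, documents refer
to terms by segment-local ordinal, and one map per segment translates
those ordinals into a shared global numbering before counting.

```python
def build_global_ordinals(segment_terms):
    """Build a segment_ordinal -> global_ordinal map for every segment.

    segment_terms: one sorted list of terms per segment (hypothetical layout).
    Returns (global_terms, seg_to_global), where
    seg_to_global[s][segment_ordinal] == global_ordinal.
    """
    # The global dictionary is the sorted union of all segment dictionaries,
    # so its size (and the size of the maps) grows with the unique values.
    global_terms = sorted(set().union(*segment_terms))
    global_ord = {term: i for i, term in enumerate(global_terms)}
    seg_to_global = [[global_ord[t] for t in terms] for terms in segment_terms]
    return global_terms, seg_to_global


def facet_counts(segment_docs, seg_to_global, global_terms):
    """Count facet values using ordinals only; resolve terms at the very end.

    segment_docs[s] is a list of documents; each document is a list of
    segment-local ordinals.
    """
    counts = [0] * len(global_terms)
    for seg, docs in enumerate(segment_docs):
        mapping = seg_to_global[seg]
        for doc_ords in docs:
            for seg_ord in doc_ords:
                counts[mapping[seg_ord]] += 1
    # Term strings are looked up once per non-empty bucket, after counting.
    return {global_terms[g]: c for g, c in enumerate(counts) if c}


if __name__ == "__main__":
    terms = [["apple", "cherry"], ["banana", "cherry"]]
    docs = [[[0], [0, 1]], [[1], [0]]]
    global_terms, seg_to_global = build_global_ordinals(terms)
    print(facet_counts(docs, seg_to_global, global_terms))
```

Note that ordinal 1 means "cherry" in the first segment and ordinal 0
means "banana" in the second, which is why the raw ordinals cannot be
summed across segments without the map.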

> *2)* Correlate outside the search system each term to a custom ID. Index
> the custom ID. After facets are calculated resolve the ID and show the
> human readable labels.

Assuming the ID is an integer (about the only thing that makes sense),
this ensures that the IDs are comparable across segments, so no
segment->global mapping is needed. This removes the performance penalty
described above and is (as far as I understand) the principle behind
Lucene faceting.
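
As a rough illustration of why integer IDs avoid the mapping step, the
following Python sketch counts IDs directly across segments and resolves
labels only for the top results. The data layout and the id_to_label
dictionary (standing in for the external term provider) are hypothetical
and chosen only to keep the example small.

```python
from collections import Counter


def facet_by_id(segments):
    """Count integer IDs across all segments.

    segments: list of segments; each segment is a list of documents; each
    document is a list of integer IDs. The IDs mean the same thing in
    every segment, so no per-segment translation is needed.
    """
    counts = Counter()
    for docs in segments:
        for doc_ids in docs:
            counts.update(doc_ids)
    return counts


def resolve_labels(counts, id_to_label, top_k):
    """Resolve human-readable labels for the top_k IDs only.

    id_to_label stands in for the external term provider; this lookup is
    the extra search-time hotspot.
    """
    return [(id_to_label[i], n) for i, n in counts.most_common(top_k)]


if __name__ == "__main__":
    segments = [[[7], [7, 3]], [[3], [7], [9]]]
    labels = {7: "Tolkien", 3: "Lewis", 9: "Pratchett"}
    print(resolve_labels(facet_by_id(segments), labels, top_k=2))
```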

On the other hand, this approach is indeed more complicated and it
introduces another hotspot both for indexing (as document construction
requires a lookup in the term provider) and searching (for resolving the
final terms).



If we had a hashing method String->long and guaranteed that there would
be no collisions (or we accepted the occasional faulty result), then we
could avoid the segment->global map as well as the centralized term
server. To my knowledge, this has not yet been attempted.

- Toke Eskildsen




Re: [Faceting] Exact Value Faceting VS ID Faceting

2015-11-26 Thread Alessandro Benedetti
Thanks Toke for the answer, let me comment inline:

On 26 November 2015 at 08:32, Toke Eskildsen  wrote:

> On Wed, 2015-11-25 at 15:56 +, Alessandro Benedetti wrote:
> > I would like to have docValues because facets are going to be heavy on
> > those fields.
>
> > *Faceting approach *
> > *1) *Indexing the human readable field value
>
> Technically this will be a SORTED or SORTED_SET, which again means that
> a pool of terms is maintained for each segment. The mapping from
> documents to terms is done using ordinals, which are not comparable
> across segments.
>

Thank you very much, I missed that part in my initial analysis.
I overlooked the fact that a segment is a fully working Lucene index, and
that segments are independent of each other (in their term dictionaries,
for example).
So the ordinal resolution is definitely something to consider.


>
> > Facets will be returned readable, out of the box.
> > I can not see any cons in this approach, I would say it is the standard
> one.
>
> With multiple segments, the terms from each segment must somehow be
> aligned to avoid duplicate entries in the result. This can either be
> done by creating a segment_ordinal->global_ordinal map upon the first
> faceting call (facet.method=fc) or by on-the-fly comparison of the top-X
> terms from each segment (facet.method=fcs). Either way, there is a
> performance penalty.
>
> >- When calculating faceting, in memory it is used the ordinal for each
> >term, which means in memory we don't waste space for the actual term,
> or
> >waste the time looking up for the value until the very end of the
> process,
> >after the counts are done .
>
> The segment_ordinal->global_ordinal map requires memory linear in the
> number of unique values in the field. If fcs is used, there will be more
> term lookups.
>
> > *2)* Correlate outside the search system each term to a custom ID. Index
> > the custom ID. After facets are calculated resolve the ID and show the
> > human readable labels.
>
> Assuming the ID is an integer (about the only thing that makes sense),
> this ensures that the IDs are comparable across segments, so no
> segment->global mapping is needed. This removes the performance penalty
> described above and is (as far as I understand) the principle behind
> Lucene faceting.
>

OK, so in the case of integer faceting, we skip the ordinal resolution
and count the integer values directly, right?

>
> On the other hand, this approach is indeed more complicated and it
> introduces another hotspot both for indexing (as document construction
> requires a lookup in the term provider) and searching (for resolving the
> final terms).
>
I agree.


>
>
> If we had a hashing method String->long and guaranteed that there would
> be no collisions (or we accepted the occasional faulty result), then we
> could avoid the segment->global map as well as the centralized term
> server. To my knowledge, this has not yet been attempted.
>
>
Thank you very much!


> - Toke Eskildsen
>
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [Faceting] Exact Value Faceting VS ID Faceting

2015-11-26 Thread Yonik Seeley
On Thu, Nov 26, 2015 at 3:32 AM, Toke Eskildsen  
wrote:
> If we had a hashing method String->long and guaranteed that there would
> be no collisions (or we accepted the occasional faulty result), then we
> could avoid the segment->global map as well as the centralized term
> server. To my knowledge, this has not yet been attempted.

I've thought about that before, but another problem with that approach
is how to map back to the actual term value (a string->long hash won't
be reversible).  A naive approach would index the hash and also store
the original string values in docvalues.  Then, after you find the top
K hashes, you can look up a document containing each hash and use the
string docvalues to resolve it (or store the string as a payload).
That's a lot of overhead.
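
A small Python sketch of the naive hash approach, under stated
assumptions: the 64-bit hash is taken from BLAKE2b (any String->long
function would do), collisions are simply accepted, and the per-document
dictionaries are a hypothetical stand-in for an indexed hash field plus
string docvalues. None of this reflects actual Solr or Lucene code.

```python
import hashlib
from collections import Counter


def term_hash(term):
    """Map a string to a signed 64-bit integer (collisions are possible)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def index_doc(terms):
    """Index-time step: keep the hashes and, separately, the original strings."""
    unique = sorted(set(terms))
    return {"hashes": [term_hash(t) for t in unique], "strings": unique}


def top_terms(segments, k):
    """Count by hash across segments, then map the top k hashes back to strings."""
    counts = Counter()
    sample_doc = {}  # hash -> (segment, doc) of one document containing it
    for s, docs in enumerate(segments):
        for d, doc in enumerate(docs):
            for h in doc["hashes"]:
                counts[h] += 1
                sample_doc.setdefault(h, (s, d))
    result = []
    for h, n in counts.most_common(k):
        s, d = sample_doc[h]
        # The hash is not reversible, so scan the sample document's stored
        # strings for the one that produces this hash.
        label = next(t for t in segments[s][d]["strings"] if term_hash(t) == h)
        result.append((label, n))
    return result


if __name__ == "__main__":
    segments = [
        [index_doc(["tolkien"]), index_doc(["tolkien", "lewis"])],
        [index_doc(["tolkien"]), index_doc(["lewis"]), index_doc(["pratchett"])],
    ]
    print(top_terms(segments, k=2))
```

Counting needs no segment->global map because the hashes are comparable
across segments, but every returned value costs an extra document lookup
and string scan, which is the overhead described above.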

-Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-10-08 Thread Uwe Reh

Sorry for the delay. I had an ugly flu.

SOLR-7730 seems to work fine. Using docValues with Solr 
5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again. 
(90ms vs. 2ms) :-)


Thanks
Uwe



On 27.09.2015 at 20:32, Mikhail Khludnev wrote:

> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh wrote:
>
>> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
>> Going this way, seems more straight forward to me.
>
> Sure. Giving your answers docValues facets has a really good chance to
> perform in your index after SOLR-7730. It's really interesting to see
> performance numbers on early 5.4 builds:
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/





Re: faceting is unusable slow since upgrade to 5.3.0

2015-10-08 Thread Mikhail Khludnev
Uwe, it's good to know! I mean that you've recovered. Take care!

On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh  wrote:

> Sorry for the delay. I had an ugly flu.
>
> SOLR-7730 seems to work fine. Using docValues with Solr
> 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> (90ms vs. 2ms) :-)
>
> Thanks
> Uwe
>
>
>
>
> On 27.09.2015 at 20:32, Mikhail Khludnev wrote:
>
>> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
>> wrote:
>>
>> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
>>> Going this way, seems more straight forward to me.
>>>
>>
>>
>> Sure. Giving your answers docValues facets has a really good chance to
>> perform in your index after SOLR-7730. It's really interesting to see
>> performance numbers on early 5.4 builds:
>>
>> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-28 Thread Toke Eskildsen
On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors 
> isn't a good idea.

In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly discovered regression) and scales well with the number
of references and unique terms. It gives the expected performance when
used with high-cardinality fields: Relatively heavy and with substantial
worst-case processing time.

As such, it should be enabled with care and a clear understanding of the
cost. But the same can be said of a great many other features when
building an IT system. Labelling it a good or bad idea only makes sense
when looking at the specific context.

I am being a stickler about this because high-cardinality faceting in
Solr has an undeserved bad rep. Rather than discouraging it, we should
be better at describing the consequences of using it.

> In the worst case, the relation book to author is 
> n:n. Never the less, thanks to authority files (which are intensively 
> used in Germany) the facet 'author' is often helpful.

We have been faceting on Author (10M uniques) since 2007. It helps our
users navigate the corpus. It is a good idea for us.

We tried faceting on 6 billion uniques/machine as default in our Net
Archive (custom hack). It raised our non-pathological 75th percentile to
2½ seconds, with little value for the researchers. It was a bad idea for
us.

- Toke Eskildsen, State and University Library, Denmark



