Re: Re: Re: FacetField-Result on String-Field contains value with count 0?

2017-01-14 Shawn Heisey
On 1/13/2017 7:36 AM, Sebastian Riemer wrote:
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book", which gave him 10 
> results. Then some other task/routine changes all those 10 books to, say, 
> 10 ebooks, because the type was incorrect. The user refreshes, still 
> filtering on "book", and gets 0 results (which is expected). Because we rule 
> out facet values with count 0, I don't get back the selected mediaType 
> "book", and thus I cannot select this value in the select-dropdown-filter 
> for the mediaType. This confuses the user: he has no results, but doesn't 
> see that this is because he still has the mediaType filter set to "book", 
> which now leads to 0 results.

Some users are always going to be confused in one way or another when
something behaves in a way that's contrary to their expectations.  If
you plan your interface correctly, you can eliminate the biggest sources
of confusion ... but there's an applicable saying here:  You can never
make things idiot-proof.  There's always a better idiot.

The facet.mincount parameter is the way to deal with this problem, as
Bill Bell already mentioned.  One of the reasons that facet.mincount
exists is to remove terms that have no documents, but still exist in the
index.

If the q parameter was an actual query instead of "all docs" and the
request didn't have facet.mincount, then the facet for that field would
still have thirteen entries, many of which might be zero.
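A minimal Python sketch of that suggestion, assuming the local core "wemi" used throughout this thread and plain query-string parameters. facet.mincount=1 asks Solr to omit values whose count is 0; a per-field variant would be f.m_mediaType_s.facet.mincount=1. The helper name facet_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a local Solr with the core "wemi" used throughout this thread.
SELECT = "http://localhost:8983/solr/wemi/select"

def facet_url(field, mincount=1, query="*:*"):
    """Hypothetical helper: build a facet request that drops low-count values."""
    params = {
        "q": query,
        "rows": 0,                   # only the facet counts are needed
        "facet": "on",
        "facet.field": field,
        "facet.mincount": mincount,  # 1 => values with count 0 are omitted
        "wt": "json",
    }
    # urlencode percent-encodes the query, so "*:*" becomes %2A%3A%2A
    return SELECT + "?" + urlencode(params)

print(facet_url("m_mediaType_s"))
```

With mincount=1 the value "1" from the original question would simply not be listed, whether or not deleted documents still carry it.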

Thanks,
Shawn



Re: Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Sebastian Riemer
Thanks, @Toke, for pointing out these options. I'll read up on 
expungeDeletes.

It sounds even more like having Solr filter out 0-counts is a good idea and 
that I should handle my use case outside of Solr.

Thanks again,
Sebastian



Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Sebastian Riemer
Nice, thank you very much for your explanation!

>> Solr returns all values as facet results that were present at some time, as 
>> long as the documents are somewhere in the index, even when they're marked 
>> as deleted. So there must have been a document with m_mediaType_s=1. Even if 
>> all these documents are deleted already, their values still appear in the 
>> facet result.

I did not know about that! That makes perfect sense. I am quite sure there was 
a time when that field contained the value "1". What's more, now that I have 
rebuilt my index, the value "1" no longer appears in the facet.field result.

I'll think about how to deal with my situation, then. Maybe it would be better 
to keep Solr filtering out 0-count facet values and to insert the filter value 
that leads to 0 results into the select dropdown "manually".
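The workaround described here (keep facet.mincount=1 and add the active filter value back on the client) could look roughly like this Python sketch. dropdown_options and the sample values "book", "ebook" and "journal" are illustrative assumptions, not part of Solr or of Sebastian's application.

```python
def dropdown_options(counts, selected=None):
    """Hypothetical helper: build the options for a filter dropdown.

    counts: value -> count, as returned with facet.mincount=1
    selected: the value in the user's active fq, or None
    """
    # Highest count first, ties broken alphabetically.
    options = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # With mincount=1 a value whose count dropped to 0 is missing from the
    # facet result, so the active selection is added back with count 0.
    if selected is not None and selected not in counts:
        options.append((selected, 0))
    return options

print(dropdown_options({"ebook": 10, "journal": 3}, selected="book"))
# [('ebook', 10), ('journal', 3), ('book', 0)]
```

This keeps Solr's output free of ghost terms from deleted documents while the user can still see (and clear) the filter that produces 0 results.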



Re: Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Toke Eskildsen
On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote:
> the second search should have been this: http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues
structure in the segment files, without respect to documents marked as
deleted. At some point you had one or more documents with
m_mediaType_s:1, which were later deleted.

If your index is not too large, you can verify this by optimizing down
to 1 segment, which will remove all traces of deleted documents (unless
the index is already 1 segment).

If you cannot live with the false terms, committing with
expungeDeletes=true should do the trick, although it is likely to make
your indexing process a lot heavier.
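Both options can be sent as request parameters to the update handler. A sketch, assuming the same local "wemi" core: optimize=true&maxSegments=1 merges the index down to one segment, while commit=true&expungeDeletes=true only merges segments with a sufficient share of deleted documents (the exact threshold is up to the merge policy), so a stray term may survive in a segment with few deletions. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the same local core "wemi" as in the rest of the thread.
UPDATE = "http://localhost:8983/solr/wemi/update"

def optimize_url(max_segments=1):
    """Hypothetical helper: force-merge (optimize) down to max_segments segments.

    Merging rewrites segments without their deleted documents, so terms that
    only belonged to deleted documents drop out of the term dictionary.
    """
    return UPDATE + "?" + urlencode({"optimize": "true", "maxSegments": max_segments})

def expunge_deletes_url():
    """Hypothetical helper: commit and merge away segments with many deletes.

    Which segments qualify is decided by the merge policy, so this is cheaper
    than a full optimize but not guaranteed to touch every segment.
    """
    return UPDATE + "?" + urlencode({"commit": "true", "expungeDeletes": "true"})

print(optimize_url())
print(expunge_deletes_url())
```

Either URL would be sent once with any HTTP client. As noted above, adding expungeDeletes to every commit can make indexing considerably heavier.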

The reason for this inaccuracy is that it is quite heavy to verify
whether a docvalue is referenced by a document: Each time one or more
documents in a segment are deleted, all references from all documents
in that segment would have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where
_all_ documents with a certain docvalue are deleted, my guess is that
it is seen as too much of an edge case to handle.
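The mechanism guessed at above can be illustrated with a toy model in Python. It is not Lucene's actual code or data layout: one segment has a sorted term dictionary, one ordinal per document pointing into it and a live-docs flag. Counting skips deleted documents, but the dictionary still lists every term, so a value whose documents are all deleted comes out with count 0 unless mincount filters it. Results are in index order, not sorted by count as Solr does by default.

```python
# Toy model only: NOT Lucene's real data structures or facet code.
# One segment: a sorted term dictionary, one ordinal per document pointing
# into it (similar in spirit to single-valued sorted DocValues) and a
# live-docs flag per document.
terms = ["1", "2", "3"]           # every term ever indexed in this segment
doc_ords = [1, 0, 2, 1]           # doc id -> index into terms
live = [True, False, True, True]  # doc 1, the only "1", is marked as deleted

def facet_counts(terms, doc_ords, live, mincount=0):
    counts = [0] * len(terms)
    for doc, ord_ in enumerate(doc_ords):
        if live[doc]:             # deleted documents are not counted ...
            counts[ord_] += 1
    # ... but every term in the dictionary still gets an entry, even with 0.
    return [(t, c) for t, c in zip(terms, counts) if c >= mincount]

print(facet_counts(terms, doc_ords, live))              # [('1', 0), ('2', 2), ('3', 1)]
print(facet_counts(terms, doc_ords, live, mincount=1))  # [('2', 2), ('3', 1)]
```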
-- 
Toke Eskildsen, Royal Danish Library



Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Michael Kuhlmann
Then I don't understand your problem. Solr already does exactly what you
want.

Maybe the problem is different: I assume you believe there was never a value
of "1" in the index, which is what confuses you.

Solr returns all values as facet results that were present at some time, as
long as the documents are somewhere in the index, even when they're marked
as deleted. So there must have been a document with m_mediaType_s=1. Even if
all these documents are deleted already, their values still appear in the
facet result.

This holds true until segments get merged so that all deleted documents
are pruned. So if you send a forceMerge request, chances are good that
"1" won't come up any more.

-Michael

On 13.01.2017 at 15:36, Sebastian Riemer wrote:
> Hi Bill,
>
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book", which gave him 10 
> results. Then some other task/routine changes all those 10 books to, say, 
> 10 ebooks, because the type was incorrect. The user refreshes, still 
> filtering on "book", and gets 0 results (which is expected). Because we rule 
> out facet values with count 0, I don't get back the selected mediaType 
> "book", and thus I cannot select this value in the select-dropdown-filter 
> for the mediaType. This confuses the user: he has no results, but doesn't 
> see that this is because he still has the mediaType filter set to "book", 
> which now leads to 0 results.



Re: Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Sebastian Riemer
Hi Bill,

Thanks, that's actually where I come from. But I don't want to exclude values 
leading to a count of zero.

Background to this: A user searched for mediaType "book", which gave him 10 
results. Then some other task/routine changes all those 10 books to, say, 10 
ebooks, because the type was incorrect. The user refreshes, still filtering on 
"book", and gets 0 results (which is expected). Because we rule out facet 
values with count 0, I don't get back the selected mediaType "book", and thus 
I cannot select this value in the select-dropdown-filter for the mediaType. 
This confuses the user: he has no results, but doesn't see that this is 
because he still has the mediaType filter set to "book", which now leads to 
0 results.

-----Original Message-----
From: billnb...@gmail.com [mailto:billnb...@gmail.com] 
Sent: Friday, 13 January 2017 15:23
To: solr-user@lucene.apache.org
Subject: Re: Re: FacetField-Result on String-Field contains value with count 0?

Set mincount to 1

Bill Bell
Sent from mobile



Re: Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 billnbell
Set mincount to 1

Bill Bell
Sent from mobile



Re: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Sebastian Riemer
Pardon me,
the second search should have been this:
http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
(or in other words, give me all documents having value "1" for field 
"m_mediaType_s")

Since this search gives zero results, why is the value "1" included in the 
facet.fields result-count list?



FacetField-Result on String-Field contains value with count 0?

2017-01-13 Sebastian Riemer
Hi,

Please help me understand:
http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
returns:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "m_mediaType_s":[
      "2",25561,
      "3",19027,
      "10",1966,
      "11",1705,
      "12",1067,
      "4",1056,
      "5",291,
      "8",68,
      "13",2,
      "6",2,
      "7",1,
      "9",1,
      "1",0]},
  "facet_ranges":{},
  "facet_intervals":{},
  "facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&indent=on&q=*:*&rows=0&start=0&wt=json
returns:
  "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&indent=on&q=*:*&rows=0&start=0&wt=json
returns:
  "response":{"numFound":0,"start":0,"docs":[]

So why does the facet.field result contain the value "1" at all, if it does 
not exist?

And why does it not, e.g., also contain 
"SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0?

Best regards,
Sebastian

Additional info, field m_mediaType_s is a string;
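For reference, Solr's default JSON layout for facet_fields (json.nl=flat) alternates value and count in a single array; json.nl=map would return an object instead. A small Python sketch, using a shortened copy of the "m_mediaType_s" array from the response above, that pairs the entries up and lists the zero-count values. The helper name to_pairs is hypothetical.

```python
import json

# Shortened copy of the "m_mediaType_s" array from the response above.
raw = '["2",25561,"3",19027,"7",1,"9",1,"1",0]'

def to_pairs(flat):
    """Hypothetical helper: turn a flat [value, count, value, count, ...] list into tuples."""
    return list(zip(flat[0::2], flat[1::2]))

pairs = to_pairs(json.loads(raw))
zero_count = [value for value, count in pairs if count == 0]
print(zero_count)  # ['1']
```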