[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling

2013-05-30 Thread Gilad Barkai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670200#comment-13670200
 ] 

Gilad Barkai commented on LUCENE-5016:
--

Patch looks good.
+1 for commit 

> Sampling can break FacetResult labeling 
> 
>
> Key: LUCENE-5016
> URL: https://issues.apache.org/jira/browse/LUCENE-5016
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: 4.3
>Reporter: Rob Audenaerde
>Assignee: Shai Erera
>Priority: Minor
> Attachments: LUCENE-5016.patch, test-labels.zip
>
>
> When sampling, the TopKInEachNodeHandler is used to get the 
> FacetResults.
> This is my case:
> A FacetResult is returned (which matches a FacetRequest) from the 
> StandardFacetsAccumulator. The facet has 0 results. The labelling of the 
> root node seems incorrect. I know, from the StandardFacetsAccumulator, that 
> the root node has a label, so I can use that one.
> Currently the recursivelyLabel method uses taxonomyReader.getPath() to 
> retrieve the label. I think we can skip that for the root node when there are 
> no children (and gain a little performance along the way, too?)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling

2013-05-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669445#comment-13669445
 ] 

Shai Erera commented on LUCENE-5016:


I checked that, and indeed there is an inconsistency here. 
StandardFacetsAccumulator and FacetsAccumulator return an empty result with the 
root node labeled, while the sampling accumulators return the root node 
unlabeled. Nothing is technically wrong here, because the category does 
not exist, but I think we should be consistent.

I was able to reproduce this behavior with an even simpler test, Rob: index a 
single document with category "A" and ask to count category "B". The problem is 
as follows:
* SamplingAccumulator delegates to SFA (StandardFacetsAccumulator).
* SFA detects that this category does not exist and creates an empty FacetResult, 
which sets the label of the root node to the request's CategoryPath.
* SamplingAccumulator receives the results and potentially runs SampleFixer. 
Then it labels the result, which sets the label to null because the category 
is not found in the taxonomy.
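To make the sequence above concrete, here is a minimal, self-contained Java model of it. It is a hypothetical sketch, not Lucene code: Taxonomy, ResultNode, accumulate() and samplingAccumulate() are invented stand-ins for TaxonomyReader, the FacetResult root node, SFA and SamplingAccumulator, and using -1 as the ordinal of a missing category is an assumption made only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the flow above; none of these types are Lucene classes.
class LabelFlowSketch {

    /** Stand-in for a taxonomy: category paths <-> ordinals. */
    static final class Taxonomy {
        private final Map<String, Integer> ordinals = new HashMap<>();
        private final Map<Integer, String> paths = new HashMap<>();

        int add(String path) {
            int ord = ordinals.size(); // ordinals are assigned sequentially
            ordinals.put(path, ord);
            paths.put(ord, path);
            return ord;
        }

        /** Returns -1 for an unknown category (assumed "invalid ordinal"). */
        int getOrdinal(String path) {
            return ordinals.getOrDefault(path, -1);
        }

        /** Returns null when the ordinal is unknown, like a failed path lookup. */
        String getPath(int ordinal) {
            return paths.get(ordinal);
        }
    }

    /** Stand-in for the root node of a FacetResult. */
    static final class ResultNode {
        final int ordinal;
        String label;

        ResultNode(int ordinal, String label) {
            this.ordinal = ordinal;
            this.label = label;
        }
    }

    /** "SFA" step: a missing category yields an empty result labeled with the request path. */
    static ResultNode accumulate(Taxonomy taxo, String requestPath) {
        int ord = taxo.getOrdinal(requestPath);
        String label = ord < 0 ? requestPath : taxo.getPath(ord);
        return new ResultNode(ord, label);
    }

    /** "SamplingAccumulator" step: delegates, then re-labels unconditionally from the taxonomy. */
    static ResultNode samplingAccumulate(Taxonomy taxo, String requestPath) {
        ResultNode root = accumulate(taxo, requestPath);
        // (a sample fixer could adjust counts here; omitted)
        root.label = taxo.getPath(root.ordinal); // null when the category is missing
        return root;
    }
}
```

With a taxonomy containing only "A", accumulate(taxo, "B") returns a root labeled "B", while samplingAccumulate(taxo, "B") returns a root whose label is null, which mirrors the inconsistency described above.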

Perhaps this additional labeling was needed at some earlier point in the code's 
history, I'm not sure :). But I think we should either remove the call to label 
the results in SamplingAccumulator, or at least not call taxoReader.getPath if 
node.label is already set. For instance, if you ask to count "A" (which does 
exist), labeling happens twice, once by SFA.accumulate and a second time by 
SamplingAccumulator, which is just a waste.
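The second option (only calling getPath when node.label is not already set) could look roughly like the following sketch. The types are hypothetical stand-ins rather than Lucene's API; the lookup counter exists only to show that already-labeled nodes, including the root of an empty result for a missing category, skip the taxonomy lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "label only if missing"; not Lucene's actual labeling code.
class GuardedLabelSketch {

    /** Stand-in taxonomy that counts how often a path lookup is performed. */
    static final class CountingTaxonomy {
        private final Map<Integer, String> paths = new HashMap<>();
        int lookups = 0;

        void put(int ordinal, String path) {
            paths.put(ordinal, path);
        }

        String getPath(int ordinal) {
            lookups++;
            return paths.get(ordinal);
        }
    }

    /** Stand-in for a FacetResult node with optional children. */
    static final class Node {
        final int ordinal;
        String label;
        final List<Node> children = new ArrayList<>();

        Node(int ordinal, String label) {
            this.ordinal = ordinal;
            this.label = label;
        }
    }

    /** Recursively labels a result tree, keeping any label that is already set. */
    static void labelIfMissing(CountingTaxonomy taxo, Node node) {
        if (node.label == null) {
            node.label = taxo.getPath(node.ordinal);
        }
        for (Node child : node.children) {
            labelIfMissing(taxo, child);
        }
    }
}
```

With this guard, a second labeling pass is harmless: it neither overwrites the request's label on an empty result nor repeats lookups for nodes the first pass already labeled.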

I'll later attach a short test that reproduces this and checks all existing 
accumulators.




[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling

2013-05-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666436#comment-13666436
 ] 

Shai Erera commented on LUCENE-5016:


I am not near the code and actually read the test in Notepad :). It looks like 
you're indexing 100K docs with categories A/docnum and then asking to count the 
categories "A" and "B". If I understand correctly, the assert at the end fails?

Basically, the FacetResult that you get back should have the same label as the 
request. If it doesn't (and I will verify that when I'm near the 
code), then it's probably a bug in SamplingAccumulator.

BTW, the test actually indexes 200K docs, passing 'j', which is 0 for the 
first 100K and 1 for the second 100K. But 'j' seems to be unused in addDocument. 
This shouldn't affect the test's behavior, but just FYI.

Thanks for reporting this, I'll take a deeper look later.




[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling

2013-05-24 Thread Rob Audenaerde (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666401#comment-13666401
 ] 

Rob Audenaerde commented on LUCENE-5016:


Now that I've written the tests, I realise that maybe the behaviour of the 
StandardFacetsAccumulator is incorrect, as it labels a FacetResult for a facet 
that does not exist in the taxonomy...

The behaviour of the SamplingAccumulator and the StandardFacetsAccumulator differs.

For my use case, it is very helpful if all the FacetRequests return a 
FacetResult with the same label as the request, but I can imagine that this is 
not desired.




[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling

2013-05-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666201#comment-13666201
 ] 

Shai Erera commented on LUCENE-5016:


Can you attach a simple test case exposing the problem? I'm not sure I follow 
what's wrong. As for skipping the labeling, I doubt it will gain us anything. 
Labeling is not very expensive, and labels are LRU-cached. Also, considering all 
the work done during search processing, the labeling cost is negligible.
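For readers unfamiliar with the term, an LRU (least recently used) cache keeps a bounded number of entries and evicts the one accessed longest ago. The sketch below is a generic Java illustration built on LinkedHashMap in access order; it shows the idea only and is not the cache the taxonomy reader actually uses. The loader function stands in for a hypothetical slow ordinal-to-path lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Generic LRU label cache sketch; NOT Lucene's taxonomy reader cache.
class LruLabelCache {
    private final IntFunction<String> loader; // hypothetical slow lookup: ordinal -> path
    private final Map<Integer, String> cache;
    int misses = 0;

    LruLabelCache(int capacity, IntFunction<String> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    String getPath(int ordinal) {
        String label = cache.get(ordinal); // a hit also refreshes the entry's recency
        if (label == null) {
            misses++;
            label = loader.apply(ordinal);
            if (label != null) {
                cache.put(ordinal, label);
            }
        }
        return label;
    }
}
```

Repeated lookups of the same ordinals, as when the same top categories are labeled query after query, are then served from memory rather than from the underlying lookup.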
