[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling
    [ https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670200#comment-13670200 ]

Gilad Barkai commented on LUCENE-5016:
--

Patch looks good. +1 for commit

> Sampling can break FacetResult labeling
>
>
>                 Key: LUCENE-5016
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5016
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 4.3
>            Reporter: Rob Audenaerde
>            Assignee: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-5016.patch, test-labels.zip
>
>
> When sampling FacetResults, the TopKInEachNodeHandler is used to get the
> FacetResults.
> This is my case:
> A FacetResult is returned (which matches a FacetRequest) from the
> StandardFacetAccumulator. The facet has 0 results. The labelling of the
> root-node seems incorrect. I know, from the StandardFacetAccumulator, that
> the rootnode has a label, so I can use that one.
> Currently the recursivelyLabel method uses the taxonomyReader.getPath() to
> retrieve the label. I think we can skip that for the rootNode when there are
> no children (and gain a little performance on the way too?)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling
    [ https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669445#comment-13669445 ]

Shai Erera commented on LUCENE-5016:

I checked that and indeed there is an inconsistency here. StandardFacetsAccumulator and FacetsAccumulator return an empty result with the root node labeled, while the sampling accumulators return the root node unlabeled. There isn't anything technically wrong here, because the category does not exist, but I think we should be consistent.

I was able to reproduce this behavior with an even simpler test, Rob: index a single document with category "A" and ask to count category "B". The problem is as follows:

* SamplingAccumulator delegates to StandardFacetsAccumulator (SFA).
* SFA detects that this category does not exist and creates an empty FacetResult, which sets the label of the root node to the request's CategoryPath.
* SamplingAccumulator receives the results and potentially runs SampleFixer. Then it labels the result, which resets the label to null because the category is not found in the taxonomy.

Perhaps at some point in the code's lifecycle this additional labeling was needed, I'm not sure :). But I think we should either remove the call that labels the results in SamplingAccumulator, or at least not call taxoReader.getPath if node.label is not null. For instance, if you ask to count "A" (which does exist), then labeling happens twice, once by SFA.accumulate and a second time by SamplingAccumulator, which is just a waste.

I'll attach a short test later which reproduces this and checks all existing accumulators.
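The relabeling problem described in this comment, and the guard it proposes (skip the taxonomy lookup when the node already has a label), can be illustrated with a small standalone sketch. This is not the Lucene code or the attached patch: `Node`, `ToyTaxonomy`, `relabelAlways`, and `labelIfMissing` are hypothetical stand-ins, and using ordinal -1 for a category that is missing from the taxonomy is an assumption made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Lucene classes.
class LabelingSketch {

    /** Minimal result node: an ordinal plus an optional label. */
    static final class Node {
        final int ordinal;
        String label; // null means "not labeled"

        Node(int ordinal, String label) {
            this.ordinal = ordinal;
            this.label = label;
        }
    }

    /** Toy taxonomy mapping ordinals to paths; unknown ordinals yield null. */
    static final class ToyTaxonomy {
        private final Map<Integer, String> paths = new HashMap<>();
        int lookups = 0; // counts getPath calls, to show the redundant work

        void add(int ordinal, String path) {
            paths.put(ordinal, path);
        }

        String getPath(int ordinal) {
            lookups++;
            return paths.get(ordinal);
        }
    }

    /** Unconditional relabeling: overwrites whatever label was already set. */
    static void relabelAlways(Node node, ToyTaxonomy taxo) {
        node.label = taxo.getPath(node.ordinal);
    }

    /** Guarded relabeling: consults the taxonomy only when no label is set. */
    static void labelIfMissing(Node node, ToyTaxonomy taxo) {
        if (node.label == null) {
            node.label = taxo.getPath(node.ordinal);
        }
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo = new ToyTaxonomy();
        taxo.add(1, "A"); // only category "A" exists

        // Empty result for the missing category "B", labeled from the request.
        Node always = new Node(-1, "B");
        relabelAlways(always, taxo);
        System.out.println("always:  " + always.label);

        Node guarded = new Node(-1, "B");
        labelIfMissing(guarded, taxo);
        System.out.println("guarded: " + guarded.label);
    }
}
```

In the sketch, the unconditional variant wipes the request-derived label "B" because the lookup finds nothing, while the guarded variant keeps it and also avoids the second lookup for categories that were already labeled.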
[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling
    [ https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666436#comment-13666436 ]

Shai Erera commented on LUCENE-5016:

I am not near the code and actually read the test in Notepad :). It looks like you're indexing 100K docs with categories A/docnum and then asking to count the categories "A" and "B". If I understand correctly, the assert at the end fails?

Basically, the FacetResult that you get back should have the same label as the request. If it doesn't (and I will validate that when I'm near the code), then it's probably a bug in SamplingAccumulator.

BTW, the test actually indexes 200K docs, passing 'j', which is 0 for the first 100K and 1 for the second. But 'j' seems to be unused in addDocument. This shouldn't affect the test behavior, just FYI.

Thanks for reporting this, I'll take a deeper look later.
[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling
    [ https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666401#comment-13666401 ]

Rob Audenaerde commented on LUCENE-5016:

Now that I have written the tests, I realise that maybe the behaviour of the StandardFacetsAccumulator is incorrect, as it labels a FacetResult for a facet that does not exist in the taxonomy...

The behaviour of the SamplingAccumulator and the StandardFacetsAccumulator differs. For my use case, it is very helpful if every FacetRequest returns a FacetResult with the same label as the request, but I can imagine that this is not desired.
[jira] [Commented] (LUCENE-5016) Sampling can break FacetResult labeling
    [ https://issues.apache.org/jira/browse/LUCENE-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666201#comment-13666201 ]

Shai Erera commented on LUCENE-5016:

Can you attach a simple testcase exposing the problem? Not sure that I follow what's wrong.

About not labeling, I doubt it will gain us anything. Labeling is not very expensive, and labels are LRU-cached. Also, considering all the work that's done during search processing, the labeling part is less than marginal.
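For readers unfamiliar with the term, the LRU caching mentioned in this comment can be sketched in a few lines with the standard `java.util.LinkedHashMap`. This only illustrates the general technique; it makes no claim about how the Lucene taxonomy reader actually implements its label cache, and the class names and the capacity of 2 are hypothetical choices for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache sketch; not the cache used by the Lucene taxonomy reader.
class LruLabelCacheSketch {

    /** LinkedHashMap in access order that evicts the least recently used entry. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // true = iterate in access order, not insertion order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each put; returning true drops the eldest entry.
            return size() > maxSize;
        }
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "A");
        cache.put(2, "A/1");
        cache.get(1);        // touch ordinal 1, so ordinal 2 becomes least recently used
        cache.put(3, "A/2"); // exceeds capacity and evicts ordinal 2
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```

With a cache like this in front of the label lookup, repeated requests for the same ordinals are answered from memory, which is consistent with the point that labeling cost is marginal.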