[
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447984#comment-13447984
]
Robert Muir commented on LUCENE-4355:
-------------------------------------
I don't think I agree, Mike. I think we should degrade gradually into expert
territory rather than it being a sharp cliff.
I think we should also make migration from previous versions of Lucene easier.
I think these APIs on IR are a good way to do that. I'm tempted to suggest
termDocs(Term) and termPositions(Term) as the sugar postings APIs, since those
pretty much match the 3.x functionality.
I'm not sure these sugar APIs should take BytesRef. That's one more
head-exploding concept for someone to learn on top of Term, which is simpler
and takes Strings.
If someone is going to be calling these on lots of things anyway, they can just
use fields()/terms()/etc.
We also have to realize it's a lot of work to compute something like docFreq
without any sugar at all. Just look at the code for docFreq:
{code}
final Fields fields = fields();
if (fields == null) {
  return 0;
}
final Terms terms = fields.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}
That's too much boilerplate and too many special cases. The terms(String) sugar
helps a lot here, reducing it to:
{code}
final Terms terms = ir.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}
But that's still too much. Making a positioned TermsEnum more accessible could
help with a lot of expert use cases, like getting enums with different Bits or
flags, or getting term-level stats:
{code}
final TermsEnum te = ir.termsEnum(new Term("field", "value"));
if (te == null) {
  return 0;
} else {
  return te.docFreq();
}
{code}
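To make the layering concrete, here is a rough, self-contained sketch of how such a sugar method could sit on top of the lower-level calls. Everything in it is a toy stand-in written for illustration: the nested Term, Terms, TermsEnum and Reader classes are not the Lucene classes, the single-argument seekExact and no-argument iterator() are simplifications, and termsEnum(Term) is the hypothetical sugar being proposed, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class TermsEnumSugarSketch {
    /** Toy stand-in for a (field, text) pair; not Lucene's Term. */
    static final class Term {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    /** Toy enum over one field's terms; only supports an exact seek. */
    static final class TermsEnum {
        private final TreeMap<String, Integer> docFreqs;
        private String current; // null until positioned by seekExact
        TermsEnum(TreeMap<String, Integer> docFreqs) { this.docFreqs = docFreqs; }

        boolean seekExact(String text) {
            current = docFreqs.containsKey(text) ? text : null;
            return current != null;
        }

        int docFreq() {
            if (current == null) throw new IllegalStateException("enum is not positioned");
            return docFreqs.get(current);
        }
    }

    /** Toy per-field term dictionary. */
    static final class Terms {
        private final TreeMap<String, Integer> docFreqs;
        Terms(TreeMap<String, Integer> docFreqs) { this.docFreqs = docFreqs; }
        TermsEnum iterator() { return new TermsEnum(docFreqs); }
    }

    /** Toy reader: terms(String) is the existing-style sugar, termsEnum(Term) the hypothetical one. */
    static final class Reader {
        private final Map<String, Terms> fields = new HashMap<>();

        void addField(String field, TreeMap<String, Integer> docFreqs) {
            fields.put(field, new Terms(docFreqs));
        }

        /** Returns null when the field does not exist. */
        Terms terms(String field) { return fields.get(field); }

        /** Hypothetical sugar: an enum positioned on the term, or null if field or term is missing. */
        TermsEnum termsEnum(Term term) {
            Terms terms = terms(term.field);
            if (terms == null) return null;
            TermsEnum te = terms.iterator();
            return te.seekExact(term.text) ? te : null;
        }

        /** docFreq collapses to a single null check once the positioned enum is available. */
        int docFreq(Term term) {
            TermsEnum te = termsEnum(term);
            return te == null ? 0 : te.docFreq();
        }
    }

    /** Small made-up index used by main(). */
    static Reader sampleReader() {
        Reader ir = new Reader();
        TreeMap<String, Integer> body = new TreeMap<>();
        body.put("lucene", 3);
        body.put("search", 1);
        ir.addField("body", body);
        return ir;
    }

    public static void main(String[] args) {
        Reader ir = sampleReader();
        System.out.println(ir.docFreq(new Term("body", "lucene")));  // 3
        System.out.println(ir.docFreq(new Term("body", "missing"))); // 0
        System.out.println(ir.docFreq(new Term("title", "lucene"))); // 0
    }
}
```

The point of the sketch is only that both null cases (missing field, missing term) fold into one check for the caller, while the positioned enum stays available for anything more expert.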
The oddity might be that, compared to 3.x, it's a seekExact rather than a
seekCeil. But I think that's OK: after all, we already "backwards-broke" since
terms() does something totally different than in 3.x (and I think we should
keep that, making it easy to access field-level metadata!)
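For anyone unsure what the seekExact vs. seekCeil difference means in practice, here is a toy illustration using a plain sorted set of strings in place of a term dictionary. The class and method names are made up for this sketch and only mimic the two behaviors; real term dictionaries order terms by bytes rather than by Java String order, which makes no difference for the ASCII terms used here.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class SeekSemanticsSketch {
    /** Exact seek: the target itself if present, otherwise null (nothing to position on). */
    static String seekExact(TreeSet<String> terms, String target) {
        return terms.contains(target) ? target : null;
    }

    /** Ceiling seek: the smallest term >= target, or null if the target is past the last term. */
    static String seekCeil(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("apple", "banana", "cherry"));
        System.out.println(seekExact(terms, "banana")); // banana
        System.out.println(seekExact(terms, "b"));      // null
        System.out.println(seekCeil(terms, "b"));       // banana
        System.out.println(seekCeil(terms, "zebra"));   // null
    }
}
```

With the ceiling behavior a caller has to compare the returned term against what was asked for; with the exact behavior a null result already means "not there".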
And I still think we should keep docFreq/totalTermFreq sugar!
> improve AtomicReader sugar apis
> -------------------------------
>
> Key: LUCENE-4355
> URL: https://issues.apache.org/jira/browse/LUCENE-4355
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
>
> I thought about this after looking @ LUCENE-4353:
> AtomicReader has some sugar APIs that are over top of the flex apis (Fields,
> Terms, ...). But these might be a little trappy/confusing compared to 3.x.
> # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and
> .termPositionsEnum(Bits, ...). I also don't think we need variants that take
> flags here. We should simplify these to be less trappy. I think we only need
> (String, BytesRef) here.
> # This means you need to use the flex apis for more expert usage: but we make
> this a bit too hard since we only let you get a Terms (which you must null
> check, then call .iterator() on, then seekExact, ...). I think it could help
> if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x
> had a method that let you get a 'positioned termsenum'.