[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

Robert Muir (JIRA) Tue, 04 Sep 2012 12:31:10 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447984#comment-13447984 ]


Robert Muir commented on LUCENE-4355:
-------------------------------------

I don't think I agree, Mike. I think we should degrade gradually into expert
territory rather than having a sharp cliff.
I think we should also make migration from previous versions of Lucene easier.

I think these apis on IR are a good way to do that. I'm tempted to suggest:

termDocs(Term) & termPositions(Term) as the sugar postings APIs, as those
pretty much match the 3.x functionality.

I'm not sure these sugar APIs should take BytesRef; that's another head
explosion for someone, compared to Term, which is simpler and takes Strings.

If someone is going to be calling these on lots of things anyway, they can
just use fields()/terms()/etc.

We also have to realize it's a lot of work to compute something like docFreq
without any sugar at all;
just look at the code for docFreq:
{code}
final Fields fields = fields();
if (fields == null) {
  return 0;
}
final Terms terms = fields.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}

That's too much boilerplate and too many special cases. The terms(String)
sugar helps a lot here, reducing it to:
{code}
final Terms terms = ir.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}

But that's still too much. Making a positioned TermsEnum more accessible could
help with a lot of expert use cases, like getting enums with different Bits or
flags, or getting term-level stats:

{code}
final TermsEnum te = ir.termsEnum(new Term("field", "value"));
if (te == null) {
  return 0;
} else {
  return te.docFreq();
}
{code}
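
To make the layering concrete, here is a minimal, self-contained sketch of how
a termsEnum-style sugar could absorb the null checks. Everything in it
(ToyReader, ToyTermStats, the in-memory maps) is a hypothetical stand-in
written for this illustration and is not Lucene code; it only assumes that a
missing field and a missing term should both collapse to a single null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-ins for illustration only; none of these types are Lucene classes.
final class ToyReader {
    // field name -> (term text -> number of documents containing the term)
    private final Map<String, TreeMap<String, Integer>> fields = new HashMap<>();

    void add(String field, String text, int docFreq) {
        fields.computeIfAbsent(field, f -> new TreeMap<>()).put(text, docFreq);
    }

    /** Hypothetical sugar: stats for exactly this term, or null if field or term is absent. */
    ToyTermStats termsEnum(String field, String text) {
        TreeMap<String, Integer> terms = fields.get(field);
        if (terms == null) {
            return null; // no such field
        }
        Integer df = terms.get(text);
        return df == null ? null : new ToyTermStats(df); // exact match only
    }

    /** Convenience sugar layered on termsEnum: the caller needs no null checks at all. */
    int docFreq(String field, String text) {
        ToyTermStats te = termsEnum(field, text);
        return te == null ? 0 : te.docFreq();
    }

    public static void main(String[] args) {
        ToyReader ir = new ToyReader();
        ir.add("body", "lucene", 3);
        System.out.println(ir.docFreq("body", "lucene"));  // existing term
        System.out.println(ir.docFreq("body", "solr"));    // missing term
        System.out.println(ir.docFreq("title", "lucene")); // missing field
    }
}

final class ToyTermStats {
    private final int docFreq;

    ToyTermStats(int docFreq) {
        this.docFreq = docFreq;
    }

    int docFreq() {
        return docFreq;
    }
}
```

Run as-is, this prints 3, 0 and 0. The point of the sketch is that callers who
want more than docFreq can stop at termsEnum and still only handle one null.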

The oddity might be that, compared to 3.x, it's a seekExact vs. a seekCeil.
But I think that's OK;
after all, we already "backwards-broke", since terms() does something totally
different than in 3.x (and I think
we should keep that, making it easy to access field-level metadata!)
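
For anyone unsure what the seekExact vs. seekCeil difference means in
practice, a rough sketch: below, a sorted java.util.TreeMap stands in for a
term dictionary (an assumption for illustration, not how Lucene stores terms),
and the two helper methods are hypothetical names that mimic the two lookup
styles.

```java
import java.util.TreeMap;

// Illustration only: a sorted map stands in for a term dictionary; this is not Lucene code.
final class SeekSketch {
    /** Ceiling-style lookup: the first term >= target, or null when past the end. */
    static String seekCeil(TreeMap<String, Integer> terms, String target) {
        return terms.ceilingKey(target);
    }

    /** Exact-style lookup: the term itself, or null when it is not present. */
    static String seekExact(TreeMap<String, Integer> terms, String target) {
        return terms.containsKey(target) ? target : null;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> terms = new TreeMap<>();
        terms.put("apple", 2);
        terms.put("cherry", 5);
        System.out.println(seekCeil(terms, "banana"));  // lands on the next term
        System.out.println(seekExact(terms, "banana")); // reports "not found"
    }
}
```

With a ceiling-style lookup, asking for a missing term silently lands on the
next one ("cherry" here), so the caller has to compare terms afterwards; with
an exact-style lookup, the miss is visible immediately as a null.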

And I still think we should keep docFreq/totalTermFreq sugar!

                
> improve AtomicReader sugar apis
> -------------------------------
>
>                 Key: LUCENE-4355
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4355
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>
> I thought about this after looking @ LUCENE-4353:
> AtomicReader has some sugar APIs that are over top of the flex apis (Fields, 
> Terms, ...). But these might be a little trappy/confusing compared to 3.x.
> # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and 
> .termPositionsEnum(Bits, ...). I also don't think we need variants that take 
> flags here. We should simplify these to be less trappy. I think we only need 
> (String, BytesRef) here.
> # This means you need to use the flex apis for more expert usage: but we make 
> this a bit too hard since we only let you get a Terms (which you must null 
> check, then call .iterator() on, then seekExact, ...). I think it could help 
> if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x 
> had a method that let you get a 'positioned termsenum'.

