[ 
https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721810#action_12721810
 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

Uwe, can you also open an issue for handling byte/short/Date with
Numeric*?

bq. I vote for factories - escaping back-compat woes by exposing minimum 
interface.

By this same logic, should we remove NumericRangeFilter/Query and use
static factories instead?

We can't let fear of our back-compat policies prevent progress.

I seem to be the only one [who's speaking up, at least] who feels
consumability of Lucene's APIs is important...

Here's my reasoning: numeric fields are common; many apps need them.
But, it's painful to use them today; it's a trap for users because
Lucene acts like it can handle them (eg SortField.INT exists) but then
RangeQuery is buggy unless you encode the numbers (zero pad ints, use
Solr's or your own NumberUtils for floats/doubles).  And once you've
figured out the encoding, you discover RangeQuery can have horrific
performance.
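
To make the encoding trap concrete, here is a minimal sketch in plain
Java (no Lucene classes involved).  The class and method names
(ZeroPadSketch, pad, inTermRange) are made up for illustration; the
sketch only assumes that a text-based range check compares terms
lexicographically, and it handles non-negative ints only.

```java
// Illustration only: not Lucene code. Shows why unpadded numeric
// strings break a lexicographic range check and how zero-padding helps.
class ZeroPadSketch {

    // Left-pads a non-negative int with zeros to a fixed width so that
    // String order matches numeric order.
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Integer.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Inclusive range check on terms, compared as text.
    static boolean inTermRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: 100 wrongly falls inside the range 10..20.
        System.out.println(inTermRange("100", "10", "20"));                   // true
        // Padded: 100 is correctly outside, 15 is correctly inside.
        System.out.println(inTermRange(pad(100, 5), pad(10, 5), pad(20, 5))); // false
        System.out.println(inTermRange(pad(15, 5), pad(10, 5), pad(20, 5)));  // true
    }
}
```

Negative numbers and floats/doubles need more than padding, which is
exactly the kind of detail a user should not have to work out alone.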

For the longest time Lucene could not provide good ootb handling of
numerics, but now finally an awesome step forward (thank you Uwe!)
comes along... and Lucene can provide correct & performant handling of
numerics.

Such important & useful functionality deserves a consumable API.
It should be obvious to people playing with Lucene how to use numeric
fields.  I should be able to do this:

{code}
Document doc = new Document();
doc.add(new NumericField("price", 15.50f));
{code}

not this:

{code}
Document doc = new Document();
Field f = new Field("price", new NumericTokenStream(4).setFloatValue(15.50f));
f.setOmitNorms(true);
f.setOmitTermFreqAndPositions(true);
doc.add(f);
{code}

nor this:

{code}
Document doc = new Document();
doc.add(NumericUtils.createFloatField("price", 15.50f));
{code}

When I want to reuse, I should be able to call
{{NumericField.setFloatValue()}}, not ask the TokenStream to set the
value.

In fact, as a user of this API, I shouldn't even have to know that a
powerful TokenStream was created to index my NumericField.  I
shouldn't have to know to set those advanced flags on Field.  These
are implementation details.  And with time we may improve these
"implementation details", so we don't want them baked into the
user's code.
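
As a rough sketch of the encapsulation I mean, here is a toy version
in plain Java.  ToyField and ToyNumericField are stand-ins invented
for this example, not Lucene classes, and the real NumericField may
end up looking quite different.  The point is only that the wrapper
sets the advanced flags once and exposes a setter for reuse.

```java
// Toy stand-in for a low-level field with "advanced" flags.
// This is NOT Lucene's Field; it exists only to make the sketch runnable.
class ToyField {
    final String name;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;
    float value;

    ToyField(String name) {
        this.name = name;
    }
}

// Hypothetical facade in the spirit of the proposed NumericField:
// the flags are configured in one place, out of the user's code.
class ToyNumericField {
    private final ToyField delegate;

    ToyNumericField(String name, float value) {
        delegate = new ToyField(name);
        delegate.omitNorms = true;                // implementation detail
        delegate.omitTermFreqAndPositions = true; // implementation detail
        delegate.value = value;
    }

    // Reuse the same instance for the next document.
    ToyNumericField setFloatValue(float value) {
        delegate.value = value;
        return this;
    }

    float floatValue() {
        return delegate.value;
    }

    boolean omitsNorms() {
        return delegate.omitNorms;
    }

    public static void main(String[] args) {
        ToyNumericField price = new ToyNumericField("price", 15.50f);
        float[] prices = {15.50f, 20.00f, 7.25f};
        for (float p : prices) {
            price.setFloatValue(p); // caller never touches flags or token streams
            System.out.println(price.floatValue());
        }
    }
}
```

If the flags or the indexing mechanism change later, only the wrapper
changes; code written against it keeps working.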

NumericUtils should hold utility methods used only by the current
implementation.  Ideally it would not even be public, but Java doesn't
give us the ability to be package private to "org.apache.lucene.*".

Here's what I propose:

  * Add NumericField and NumericSortField, and 
    rename RangeQuery -> TermRangeQuery  (TextRangeQuery?).

  * Move the Numeric FieldCache parsers into FieldCache,
    and make them (PLAIN_TEXT_INT_PARSER vs NUMERIC_INT_PARSER) public.

  * I would also really like to have NumericField come back when you
    retrieve the doc; this only requires 1 bit added to the flags
    stored in each doc's entry in the .fdt file.
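
For the last point, a sketch of what "1 bit added to the flags" could
look like, in plain Java.  The constant names and bit values below are
illustrative assumptions, not copied from the stored-fields writer,
and FIELD_IS_NUMERIC in particular is hypothetical.

```java
// Illustration only: per-field flag byte with one extra bit for "numeric".
// Names and values are assumptions made for this sketch.
class StoredFieldFlagsSketch {
    static final byte FIELD_IS_TOKENIZED  = 0x1;
    static final byte FIELD_IS_BINARY     = 0x2;
    static final byte FIELD_IS_COMPRESSED = 0x4;
    static final byte FIELD_IS_NUMERIC    = 0x8; // hypothetical new bit

    // Packs the per-field booleans into a single flag byte.
    static byte encode(boolean tokenized, boolean binary, boolean numeric) {
        int bits = 0;
        if (tokenized) bits |= FIELD_IS_TOKENIZED;
        if (binary)    bits |= FIELD_IS_BINARY;
        if (numeric)   bits |= FIELD_IS_NUMERIC;
        return (byte) bits;
    }

    // On retrieval, this bit would tell the reader to hand back a
    // numeric field instead of a plain one.
    static boolean isNumeric(byte flags) {
        return (flags & FIELD_IS_NUMERIC) != 0;
    }

    public static void main(String[] args) {
        byte flags = encode(true, false, true);
        System.out.println(isNumeric(flags));                      // true
        System.out.println(isNumeric(encode(true, true, false)));  // false
    }
}
```

Readers that don't know about the new bit would simply ignore it, as
long as the bit is unused in existing indexes.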

Why should we make such an excellent addition to Lucene, only to make
it hard to use?


> Add NumericField and NumericSortField, make plain text numeric parsers public 
> in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & I wanted to add a new NumericField 
> to o.a.l.document specifically for easy indexing. An alternative would be to 
> add a NumericUtils.newXxxField() factory that creates a preconfigured Field 
> instance with norms and tf off, optionally a stored text (LUCENE-1699), and 
> the TokenStream already initialized. On the other hand, 
> NumericUtils.newXxxSortField could be moved to NumericSortField.
> Yonik and I tend to prefer the factory for both; Mike tends to prefer the new 
> classes.
> Also, the parsers for string-formatted numerics are not public in FieldCache. 
> As the new SortField API (LUCENE-1478) makes it possible to support a parser 
> at SortField instantiation, it would be good to have the static parsers in 
> FieldCache publicly available. SortField would init its member variable to 
> them (instead of NULL), making code a lot simpler (FieldComparator has these 
> ugly null checks when retrieving values from the cache).
> Moving the Trie parsers into FieldCache as static instances would also make 
> the code cleaner, and we would be able to hide the "hack" 
> StopFillCacheException by making it private to FieldCache (currently it's 
> public because NumericUtils is in o.a.l.util).


