[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Michael McCandless (JIRA) Sat, 20 Jun 2009 03:36:36 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722166#action_12722166 ]


Michael McCandless commented on LUCENE-1701:
--------------------------------------------

{quote}
bq. Someday maybe I'll convince you to donate this "schema" layer on top of Lucene

It's not generic enough to be useful to every Lucene user, and it doesn't aim to be.
{quote}

Oh, OK.

bq. Solr has its own schema approach, and it has its merits and drawbacks
compared to mine. That's what is nice: we're able to use the same library in
differing ways, and it doesn't force its sense of 'best practices' on us.

There's no forcing going on here.  Even had we added the bit into the
index, there would still be no "forcing".  We're not preventing advanced
uses of Lucene by providing strong Numeric* support in Lucene.  Simple
things should be simple; complex things should be possible...


{quote}
bq. But I hope there are SOME named classes in there and not all static factory 
methods returning anonymous untyped impls.

SOME of them aren't static :-D
{quote}

Heh.

{quote}
bq. We shouldn't weaken trie's integration to core just because others have 
private implementations.

You shouldn't integrate into core something that is not core functionality.
Think microkernels.
It's strange seeing you drive CSFs, custom indexing chains and pluggability
everywhere on one hand, while trying to add some weird custom properties into
the index that are tightly interwoven with only one of the possible numeric
implementations on the other.
{quote}

I agree: if Lucene had all the extension points needed to integrate
Numeric* well without putting it in "core", we should use them.  But
we're just not there yet.  We want to get there, and we will, but we
can't hold up progress just because we think someday we'll get there.
That's like saying we can't improve the terms dict format because it's
not pluggable yet.

{quote}
bq. Design for today.

And spend two years deprecating and supporting today's designs after you get a
better thing tomorrow. Lucene-style back-compat and agile design don't marry
well.
{quote}

bq. donating something to Lucene means casting it in concrete.

We can't let fear of back-compat prevent us from making progress.

bq. The IndexReader/Writer pair is a good example of what I'm arguing against: a
dusty closet of microfeatures tightly interwoven into a complex,
hard-to-maintain mess with zillions of (possibly broken) control paths.
Remember the mutable deletes/norms + clone/reopen permutations? This could be
avoided if IR/W were kept to the bare minimum (which most people are going to
use) and more advanced features were built on top of them, not in the same place.

Sure, our approach today isn't perfect ("progress not perfection").
There are always improvements to be made.  If you see concrete steps
to simplify the current approach without losing functionality, please
post a patch.  I too would love to see such simplifications...

bq. NRT seems to tread the same path, and I'm not sure it's going to gain much
turnaround time beyond the newly introduced per-segment collection.

I agree: per-segment collection delivered the bulk of the gains needed
for NRT.  This was a big change and a huge step forward in simple
reopen turnaround.

But not having to write and read deletes to/from disk, and not having
to commit (fsync) from the writer in order to see those changes in the
reader, should also give us decent gains.  fsync is surprisingly and
intermittently costly.
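To make the deletes point concrete, here is a minimal sketch, assuming hypothetical `Writer`/`Reader` classes that are NOT Lucene's API: the writer keeps pending deletions in a RAM `java.util.BitSet` and hands a near-real-time reader a point-in-time copy, so the reader sees the deletes without any write to disk or fsync'd commit.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch, not Lucene's API: deletes live only in RAM and a
 * near-real-time reader gets a snapshot copy, with no disk write or fsync.
 */
public class NrtDeletesSketch {

    /** Hypothetical writer holding pending deletes in memory only. */
    static final class Writer {
        private final BitSet deleted = new BitSet();
        private final int maxDoc;

        Writer(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        void deleteDocument(int docId) {
            deleted.set(docId);
        }

        /** Snapshot the in-RAM deletes; nothing is written or synced to disk. */
        Reader getReader() {
            return new Reader(maxDoc, (BitSet) deleted.clone());
        }
    }

    /** Hypothetical reader that sees only the deletes present at snapshot time. */
    static final class Reader {
        private final int maxDoc;
        private final BitSet deleted;

        Reader(int maxDoc, BitSet deleted) {
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        boolean isDeleted(int docId) {
            return deleted.get(docId);
        }

        int numDocs() {
            return maxDoc - deleted.cardinality();
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer(10);
        writer.deleteDocument(3);
        Reader r1 = writer.getReader();   // snapshot: doc 3 deleted
        writer.deleteDocument(7);
        Reader r2 = writer.getReader();   // snapshot: docs 3 and 7 deleted
        System.out.println(r1.numDocs() + " " + r2.numDocs());        // 9 8
        System.out.println(r1.isDeleted(7) + " " + r2.isDeleted(7));  // false true
    }
}
```

Cloning the bit set keeps each reader's view stable while the writer keeps deleting; a real implementation would likely share or copy-on-write instead of cloning on every reopen.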

And this integration lets us take it a step further with LUCENE-1313,
where recently created segments can remain in RAM and be shared with
the reader.

If you have good simplifications/improvements on the approach here,
please post them.

bq. Some time ago I finished a first version of IR plugins, and I enjoy pretty
low reopen times (field/facet/filter cache warmups included). (Yes, I'm going
to open an issue for plugins once they stabilize enough.)

I'm confused: I thought that effort was to make SegmentReader's
components fully pluggable (not to actually change which components
SegmentReader creates).  E.g., does this modularization alter the
approach to NRT?  I thought they were orthogonal.

{quote}
bq. If we add some generic storable flags for Lucene fields, this is cool
(probably): NumericField can then capitalize on it, as can users writing
their own NNNFields.
bq. +1 Wanna make a patch?

No, I'd like to continue the IR cleanup and play with a positionIncrement
companion value that could enable true multiword synonyms.
{quote}

Well I'm looking forward to seeing your approach on these two!


> Add NumericField and NumericSortField, make plain text numeric parsers public 
> in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike and I wanted to add a new NumericField
> to o.a.l.document specifically for easy indexing. An alternative would be to
> add a NumericUtils.newXxxField() factory that creates a preconfigured Field
> instance with norms and tf off, optionally a stored text (LUCENE-1699), and
> the TokenStream already initialized. On the other hand,
> NumericUtils.newXxxSortField could be moved to NumericSortField.
> Yonik and I tend to prefer the factory for both; Mike tends to prefer creating
> the new classes.
> Also, the parsers for string-formatted numerics are not public in FieldCache.
> As the new SortField API (LUCENE-1478) makes it possible to supply a parser
> when instantiating a SortField, it would be good to make the static parsers in
> FieldCache publicly available. SortField would init its member variable to them
> (instead of null), making the code a lot simpler (FieldComparator has these
> ugly null checks when retrieving values from the cache).
> Moving the trie parsers into FieldCache as static instances as well would make
> the code cleaner, and we would be able to hide the "hack"
> StopFillCacheException by making it private to FieldCache (currently it's
> public because NumericUtils is in o.a.l.util).
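The parser part of the proposal can be sketched in plain Java. This is a minimal illustration under stated assumptions: `IntParser`, `FieldCacheSketch`, `SortFieldSketch` and `fill` are hypothetical stand-ins, NOT Lucene's real classes or signatures, and the "'0' prefix means full precision" rule is a toy substitute for the real trie term encoding. It shows public static default parsers, a SortField-like class that falls back to them instead of null, and a stop-filling exception kept private to the cache class.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the LUCENE-1701 parser proposal; these are NOT
 * Lucene's real classes or signatures.
 */
public class ParserDefaultsSketch {

    /** Stand-in for a FieldCache int parser. */
    interface IntParser {
        int parseInt(String term);
    }

    /** Stand-in for FieldCache: public default parsers, private stop signal. */
    static final class FieldCacheSketch {
        /** Public default for plain-text (string-formatted) numerics. */
        public static final IntParser DEFAULT_INT_PARSER = Integer::parseInt;

        /** The control-flow "hack", hidden now that the trie parser lives here. */
        private static final class StopFillCacheException extends RuntimeException {
        }

        /**
         * Toy trie parser (assumption): terms arrive sorted and full-precision
         * terms carry a '0' prefix, so the first other prefix ends the fill.
         */
        public static final IntParser TRIE_INT_PARSER = term -> {
            if (term.charAt(0) != '0') {
                throw new StopFillCacheException();
            }
            return Integer.parseInt(term.substring(1));
        };

        /** Fills a value array from sorted terms until the parser signals a stop. */
        static int[] fill(String[] sortedTerms, IntParser parser) {
            int[] values = new int[sortedTerms.length];
            int n = 0;
            try {
                for (String term : sortedTerms) {
                    int value = parser.parseInt(term);
                    values[n++] = value;
                }
            } catch (StopFillCacheException e) {
                // Remaining terms are lower-precision; nothing more to cache.
            }
            return Arrays.copyOf(values, n);
        }
    }

    /** Stand-in for SortField: never holds a null parser. */
    static final class SortFieldSketch {
        final String field;
        final IntParser parser;

        SortFieldSketch(String field, IntParser parser) {
            this.field = field;
            // Default to the public parser so comparators need no null checks.
            this.parser = (parser != null) ? parser : FieldCacheSketch.DEFAULT_INT_PARSER;
        }

        SortFieldSketch(String field) {
            this(field, null);
        }
    }

    public static void main(String[] args) {
        SortFieldSketch plain = new SortFieldSketch("price");
        System.out.println(plain.parser.parseInt("42")); // 42
        int[] trie = FieldCacheSketch.fill(
                new String[] {"05", "07", "10"}, FieldCacheSketch.TRIE_INT_PARSER);
        System.out.println(Arrays.toString(trie)); // [5, 7]
    }
}
```

Because the exception class is private to the cache holder, only code inside it can throw or catch the stop signal, which is the encapsulation the description asks for.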
