Re: Payloads and TrieRangeQuery

Yonik Seeley Thu, 11 Jun 2009 05:46:42 -0700

On Thu, Jun 11, 2009 at 7:01 AM, Michael
McCandless<[email protected]> wrote:
> On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley<[email protected]> 
> wrote:
>
>> Really goes into Solr land... my pref for Lucene is to remain a core
>> expert-level full-text search library and keep out things that are
>> easy to do in an application or at another level.
>
> I think this must be the crux of our disagreement.


Indeed.  The itch to scratch w.r.t Solr in Lucene is increased core
functionality, not more magic (that duplicates what Solr already does,
but just in a different way and thus makes the lives of Solr
developers harder).
If we asked on java-user about people's priorities/wishes, I bet
column stride fields, near real time indexing, and better performance
would dominate stuff like not having to specify how to sort a field.

> I feel, instead, that Lucene should stand on its own, as a useful
> search library, with a consumable API, good defaults, etc.  Lucene is
> more than "the expert level search API that's embedded in
> Solr". Lucene is consumed directly by apps other than Solr.
>
> In fact, I think there are many things in Solr that naturally belong
> in Lucene (and over time we've been gradually slurping them down).
> The line/criteria has always been rather blurry...

And conversely, Solr isn't just a wrapper around Lucene and an
incubator for Lucene technology.
Ask Lucene users if they would like pretty much any substantial piece
of functionality in Solr moved to Lucene as a module and you'll
probably get an affirmative answer.  But moving something from Solr to
Lucene can have a lot of negative effects for Solr, including taking
it out of the hands of Solr committers who aren't Lucene committers,
and taking it out of Solr's release cycle and easy ability to change -
if Solr needs to make a change to one of the moved classes, it's
necessary to get it through the Lucene change process and then upgrade
to the latest Lucene trunk - all or nothing.

It's also the case that the goals of Lucene classes and Solr classes
are often very different.  Lucene is more concerned with Java APIs (as
should be the case), while they are a bit more secondary in Solr...
the external APIs are of primary importance and one doesn't worry as
much (or at all) about the classes implementing that interface or its
Java API back compatibility (as a generalization... it depends on the
class).

> In Lucene, we should be able to add a NumericField to a document,
> index it, and then create RangeFilter or Sort on that field and have
> things "just work".

That feels like a false sense of simplicity, and Lucene isn't for
dummies ;-)  One needs to understand how things work under the hood to
avoid shooting oneself in the foot.  You need to understand the memory
implications of sorting on different fields, and you need to
understand that to sort on a text field, there really needs to be just
one token per field.  You need to understand the way Trie is
indexed, and that multiple values per field won't work if you use a
precision step less than the word size.
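
To make the precision step point concrete, here is a rough sketch of how
one value turns into several indexed terms.  It is not Lucene's actual
encoding (the real code also handles sign and packs the shift into the
term itself); the class and method names are made up for illustration,
and a 64-bit word size is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Lucene's API.
public class TrieTermsSketch {

    // Returns one pseudo-term per precision level for a 64-bit value.
    // Each level drops another `precisionStep` low-order bits.
    static List<String> prefixTerms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> terms = new ArrayList<String>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            // Unsigned shift keeps only the high-order bits at this level.
            terms.add("shift=" + shift + " prefix=" + Long.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        long value = 1234567890L;
        for (int step : new int[] {4, 8, 64}) {
            System.out.println("precisionStep=" + step + " -> "
                + prefixTerms(value, step).size() + " terms");
        }
    }
}
```

With a step of 64 (the word size in this sketch) each value yields exactly
one term.  With anything smaller, a single value already puts several
terms into the field, so code that expects one term per document for that
field sees a mix of full-precision and lower-precision terms.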

There have been a lot of bad design decisions (I'm talking software
development in general) due to citing "the user will be confused".
Often, this hypothetical user doesn't exist (or is an extreme
minority), and hence I prefer things of the form "I think this is
confusing".  Extra magic isn't always a good thing.

-Yonik
http://www.lucidimagination.com

