>>Maybe we could do something similar to declare that agiven field uses Trie*,
>>and with what datatype.
With the current implementation you can at least test for the presence of a
field called:
[fieldName]#trie
..which tells you some form of trie is used but could be extended to include
precision step value e.g.
[fieldName]#trie_8
But - overall, I can't help but feel we will struggle to offer facilities like
this when there is a lack of a formal schema for Lucene indexes.
Solr obviously includes some form of index definition - my Lucene-based apps
tend to use a custom config and Luke could certainly benefit from some form of
definition stored with the index.
Time for some standardised index metadata?
This trie/parser issue is an example of a broader issue for me.
Mark
----- Original Message ----
From: Michael McCandless <[email protected]>
To: [email protected]
Sent: Monday, 9 March, 2009 13:10:32
Subject: Re: Lucene 2.9
Uwe Schindler wrote:
>> Or perhaps we should move Trie* into core Lucene, and then build a
>> real (ootb) integration with QueryParser.
>
> The problem is that the query parser does not know if a field is encoded as
> trie or is just a normal text token. Furthermore, the new trie API does not
> differentiate between dates, doubles, longs (same for 32bit) because every
> trie field is identical (it is the application's task to keep track on the
> encoding when indexing and searching, TrieRange only supports the conversion
> of these data types to sortable integers), but the "datatype" itself is not
> stored in index. Solr has support for this in its "schema", but for Lucene
> all fields are identical. For the query parser there is no possibility to
> differentiate between a long, double or date.
Could we add APIs to QueryParser so the application can state the disposition
toward certain fields?
EG QueryParser now tries to guess whether a range query's upper/lower bound
should be parsed as dates, and there are methods exposed to set the resolution
on a per-field basis. Maybe we could do something similar to declare that a
given field uses Trie*, and with what datatype.
Just thinking aloud really... but since we haven't yet released Trie*, now (for
2.9)
is a good time to think hard about how we expose/integrate it... and making it
easier to use ootb seems important.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]