>>The lack of standardized metadata is an issue, of course - we could
start experimenting with this in Luke, to see whether we can squeeze a
subset of Solr schema there.

Actually, an "AnalyzerFactory" interface in Luke might provide the abstraction 
which would allow Solr, my proprietary metadata system, and any other config 
resource to hook into Luke. 
That would be pretty cool 



----- Original Message ----
From: Andrzej Bialecki <a...@getopt.org>
To: java-user@lucene.apache.org
Sent: Fri, 5 March, 2010 11:11:12
Subject: Re: SpanQueries in Luke

On 2010-03-05 11:22, mark harwood wrote:
>>> I'll commit the current mostly-working state today, you can take a look
> 
> OK. However I think this XMLQueryParser addition will only resurface a 
> long-standing issue with Luke and Lucene in general.
> This query parser works best on multiple fields (e.g. free-text<UserQuery>  
> tags and<TermsFilter>  on structured fields). Each field typically requires 
> different analyzers and there is currently no way of recording this 
> information as metadata alongside an index.
> Without this metadata each user's Luke session starts with a game of 
> "guess-which-analyzer-to-use?"

I guess ;) that generally speaking there is no good answer to this - the same 
token stream could have been produced by varying analysis chains, even across 
indexing sessions that append to the same index.

> 
> I use my own proprietary system for storing such index metadata and this is 
> through an XML file that contains a BeanEncoder-serialized 
> PerFieldAnalyserWrapper among other things.
> It would be nice to see some standardisation in how this information can be 
> made available in *any* Lucene index but I guess this overlaps with things 
> like Solr's config.

Yes.

Theoretically one could store such information in IndexCommit.getUserData(). 
The lack of standardized metadata is an issue, of course - we could start 
experimenting with this in Luke, to see whether we can squeeze a subset of Solr 
schema there.

-- Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to