Erik,

Thanks for the comments.

> I'm particularly interested in the XPath stuff I saw in LGQueryParser.

   * xpathFieldParse 'xpath' parser:
     param allfields[], with query or field[] possibly having
     wild-card notation:  *.start   annotation.*.text
     allowing '/' and '.' field separator

This is an *unfinished* attempt to support xpath-style queries, with
wild-cards or partial paths, when you have indexed XML data. An example
query:

   /annotation/*/text:term

I had to put this aside when I saw that pulling the xpath fields out of
a query string would take a fair amount of thought and code.

> > BioDataAnalyzer.java -- NumberField formats field for indexing
>
> *whew* - that is one complex piece of code. I like the DebugFilter

Mostly it is just a collection of small 10-line classes, packaged as
inner classes (I hate Java's insistence on 1 file/class :)

Some of the complexity there is because the standard Lucene analyzer
won't work for biology data (which uses a lot of symbols, mixed
upper/lowercase, etc.). This code lets one build an analyzer/indexer
that is tuned to the type of data in each field.
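Coming back to the wild-card field notation for a moment: the part that is easy to picture is expanding a pattern such as *.start or annotation.*.text against the list of indexed field names. What follows is only a rough sketch of that one step, not the LGQueryParser code. The class and method names (FieldPatternSketch, segments, matches, expand) are made up for illustration, and it assumes that '*' stands for exactly one path segment and that '/' and '.' are interchangeable separators. It does nothing about the harder problem of pulling the field paths out of a query string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand a wild-card field pattern against known field names.
final class FieldPatternSketch {

    // Treat '/' and '.' as the same separator and drop any leading separator,
    // so "/annotation/*/text" and "annotation.*.text" give the same segments.
    static String[] segments(String path) {
        String p = path.replace('/', '.');
        while (p.startsWith(".")) {
            p = p.substring(1);
        }
        return p.split("\\.");
    }

    // Assumption: '*' matches exactly one segment, never zero or several.
    static boolean matches(String pattern, String field) {
        String[] pat = segments(pattern);
        String[] fld = segments(field);
        if (pat.length != fld.length) {
            return false;
        }
        for (int i = 0; i < pat.length; i++) {
            if (!pat[i].equals("*") && !pat[i].equals(fld[i])) {
                return false;
            }
        }
        return true;
    }

    // Return every known field name that the pattern matches, in input order.
    static List<String> expand(String pattern, List<String> allFields) {
        List<String> out = new ArrayList<>();
        for (String f : allFields) {
            if (matches(pattern, f)) {
                out.add(f);
            }
        }
        return out;
    }
}
```

With fields BLOC.start, BLOC.stop and annotation.gene.text, calling expand("*.start", fields) would pick out BLOC.start only, and the expanded names could then be searched as ordinary fields.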
The configuration for parsing a given biology database includes
statements like:

   ## field tokenizers - base CharTokenizer, work before Filters
   tokenizer.SYM=org.eugenes.index.BiodataAnalyzer$DataTokenizer
   ## field filters - base TokenFilter, only are used if fieldtype=Text or UnStored
   tokenfilter.BLOC.start=org.eugenes.index.BiodataAnalyzer$NumberFilter
   ## fieldrecoder classes manipulate data before indexing, maybe making new fields
   fieldrecoder.BLOC=LucegeneIndexers$Location_FieldRecoder

This method then generates a TokenStream using such field-specific
parsers:

   public TokenStream tokenStream( String fieldName, Reader reader)
   {
     TokenStream result = null;
     // field-specific tokenizer, falling back to the standard one
     try { result= getTokenizer(fieldName, reader); }
     catch (Exception e) {
       result = new org.apache.lucene.analysis.standard.StandardTokenizer(reader);
     }
     // field-specific filter, falling back to LowerDataFilter
     try { result= getFilter(fieldName, result); }
     catch (Exception e) {
       LowerDataFilter ldf= new LowerDataFilter();
       ldf.setInput(result);
       result= ldf;
     }
     return result;
   }

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- [EMAIL PROTECTED] -- http://marmot.bio.indiana.edu/