I agree that the current Analyzers are a heap of bad copy-paste. But I'd rather have the ability to compose a number of CharFilters, Tokenizers and TokenFilters programmatically (without writing a new Analyzer) instead of using config files.
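To make the idea concrete, here is a minimal, self-contained sketch of that kind of programmatic composition in plain Java. It is not Lucene code. `Analyzer`, `AnalyzerBuilder` and its methods `filterStreamWith`, `tokenizeWith`, `filter` and `build` are hypothetical, as are the example stages (`stripTags`, `splitOnNonWord`, `lowerCase`, `stopWords`). Char filters, tokenizers and token filters are simplified to functions over `String` and `List<String>` instead of Lucene's streaming `Reader`/`TokenStream` classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch: plain-Java stand-ins, NOT Lucene's Analyzer/CharFilter/Tokenizer/TokenFilter API.
public class AnalyzerBuilderSketch {

    /** Toy analyzer: raw text in, tokens out. */
    public interface Analyzer extends Function<String, List<String>> {}

    /** Immutable builder: every call returns a new builder, so a shared base can be extended into variants. */
    public static final class AnalyzerBuilder {
        private final List<UnaryOperator<String>> charFilters;        // applied to the raw text, in order
        private final Function<String, List<String>> tokenizer;       // splits filtered text into tokens
        private final List<UnaryOperator<List<String>>> tokenFilters; // applied to the token list, in order

        public AnalyzerBuilder() {
            this(List.of(), null, List.of());
        }

        private AnalyzerBuilder(List<UnaryOperator<String>> charFilters,
                                Function<String, List<String>> tokenizer,
                                List<UnaryOperator<List<String>>> tokenFilters) {
            this.charFilters = charFilters;
            this.tokenizer = tokenizer;
            this.tokenFilters = tokenFilters;
        }

        @SafeVarargs
        public final AnalyzerBuilder filterStreamWith(UnaryOperator<String>... filters) {
            List<UnaryOperator<String>> next = new ArrayList<>(charFilters);
            next.addAll(Arrays.asList(filters));
            return new AnalyzerBuilder(List.copyOf(next), tokenizer, tokenFilters);
        }

        public AnalyzerBuilder tokenizeWith(Function<String, List<String>> newTokenizer) {
            return new AnalyzerBuilder(charFilters, newTokenizer, tokenFilters);
        }

        public AnalyzerBuilder filter(UnaryOperator<List<String>> tokenFilter) {
            List<UnaryOperator<List<String>>> next = new ArrayList<>(tokenFilters);
            next.add(tokenFilter);
            return new AnalyzerBuilder(charFilters, tokenizer, List.copyOf(next));
        }

        public Analyzer build() {
            if (tokenizer == null) {
                throw new IllegalStateException("tokenizeWith(...) was never called");
            }
            // Fields are final and the lists immutable, so later builder calls cannot change this analyzer.
            return text -> {
                String filtered = text;
                for (UnaryOperator<String> cf : charFilters) filtered = cf.apply(filtered);
                List<String> tokens = tokenizer.apply(filtered);
                for (UnaryOperator<List<String>> tf : tokenFilters) tokens = tf.apply(tokens);
                return tokens;
            };
        }
    }

    // Example stages (illustrative only).
    public static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", " "); // crude HTML tag removal
    }

    public static List<String> splitOnNonWord(String s) {
        return Arrays.stream(s.split("\\W+")).filter(t -> !t.isEmpty()).collect(Collectors.toList());
    }

    public static List<String> lowerCase(List<String> tokens) {
        return tokens.stream().map(t -> t.toLowerCase(Locale.ROOT)).collect(Collectors.toList());
    }

    public static UnaryOperator<List<String>> stopWords(Set<String> stop) {
        return tokens -> tokens.stream().filter(t -> !stop.contains(t)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One shared base; the three variants differ only in a couple of stages.
        AnalyzerBuilder base = new AnalyzerBuilder()
                .tokenizeWith(AnalyzerBuilderSketch::splitOnNonWord)
                .filter(AnalyzerBuilderSketch::lowerCase);
        Analyzer search = base.build();
        Analyzer index = base.filter(stopWords(Set.of("the", "a"))).build();
        Analyzer indexHtml = base.filterStreamWith(AnalyzerBuilderSketch::stripTags)
                .filter(stopWords(Set.of("the", "a")))
                .build();

        System.out.println(search.apply("The Quick Fox"));     // [the, quick, fox]
        System.out.println(index.apply("The Quick Fox"));      // [quick, fox]
        System.out.println(indexHtml.apply("<b>The</b> Fox")); // [fox]
    }
}
```

Making the sketch's builder immutable is one design choice that makes per-mode variants cheap: the index-mode, index-mode-with-HTML and search-mode analyzers in `main` share one base configuration and differ only in the stages added on top, and everything under test sits in plain code rather than in config files.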
Something that roughly looks like:

  Analyzer a = new AnalyzerBuilder()
      .filterStreamWith(charFilterA, charFilterB)
      .tokenizeWith(new MyFluffyTokenizer())
      .filter(new StopWordsFilter(..))
      .filter(whatever)
      .build();

Configgy stuff can then appear as a layer over such an API.

Building Analyzers programmatically has a number of benefits:
1. Easier tests. Everything being tested is in your test method, not smeared across a bunch of config files (w...@solr).
2. You can play around in a REPL.
3. You might have slightly different variations of the same Analyzer, and you don't have to write a bunch of almost-identical config files for them. E.g. in my code I have an index-mode analyzer, an index-mode analyzer with HTML handling, and a search-mode analyzer that differ only in the parameters of a couple of filters.
4. Type safety, anyone?

On Mon, Nov 29, 2010 at 13:59, Uwe Schindler <u...@thetaphi.de> wrote:
> I think with declarative model, he means more something like a "generic"
> Analyzer class, where you pass in a config file that lists all CharFilters,
> Tokenizers, TokenFilters. You can put this xml file or whatever into a jar
> file and then you have the same like hardcoded analyzers. We have simply
> stupid code duplication. And using these config files you can even supply
> variants for backwards compatibility.
>
> For this to implement, the factories from solr need to be moved to Lucene.
> Which would be a good thing, as e.g. Hibernate Search only references Solr
> jars to have a declarative (annotation-based) analyzer configuration. And for
> that the factories are needed.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Earwin Burrfoot [mailto:ear...@gmail.com]
>> Sent: Monday, November 29, 2010 11:53 AM
>> To: dev@lucene.apache.org
>> Subject: Re: deprecating Versions
>>
>> On Mon, Nov 29, 2010 at 13:34, Robert Muir <rcm...@gmail.com> wrote:
>> > On Mon, Nov 29, 2010 at 2:50 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
>> >> And for indexes:
>> >> * Index compatibility is guaranteed across two adjacent major
>> >> releases. eg 2.x -> 3.x, 3.x -> 4.x.
>> >> That includes both binary compat - codecs, and semantic compat -
>> >> analyzers (if appropriate Version is used).
>> >> * Older releases are most probably unsupported.
>> >> e.g. 4.x still supports shared docstores for reading, though never
>> >> writes them. 5.x won't read them either, so you'll have to at least
>> >> fully optimize your 3.x indexes when going through 4.x to 5.x.
>> >>
>> >
>> > Is it somehow possible i could convince everyone that all the
>> > analyzers we provide are simply examples?
>> > This way we could really make this a bit more reasonable and clean up
>> > a lot of stuff.
>> At the very least, you don't have to convince me. :)
>>
>> > Seems like we really want to move towards a more declarative model
>> > where these are just config files... so only then it will ok for us to
>> > change them because they suddenly aren't suffixed with .java?!
>> No freakin' declarative models! That's the domain of Solr.
>> Though others might disagree and then happily store these declarations
>> within index, and then per-segment, making the mess even more messy for
>> the glory of backasswards compatibility.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> Phone: +7 (495) 683-567-4
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org