One of the interesting tricks about MPS (I'm not married to the product, but to the concept) is that you essentially develop a domain-specific language that you can then compile into specific target languages. MPS does what the classic Unix parser-generator utilities do (the names escape me: Lex? Yacc?): it takes a domain-specific language and compiles it into another language, or into machine code.
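To make that concrete, here is a minimal sketch of the "describe what, not how" idea: a tiny declarative query AST plus one code generator that emits Java-flavoured source. Every name here is invented for illustration - this is not MPS output and not real Lucene code.

// Hypothetical sketch only - all names are invented for illustration.
// The AST captures the "what"; each emit method is one "how" for one
// target language.
import java.util.Arrays;
import java.util.List;

interface QueryNode {
    String toJava();   // emit Java-flavoured source for this node
}

class Term implements QueryNode {          // a single field:value match
    final String field, value;
    Term(String field, String value) { this.field = field; this.value = value; }
    public String toJava() {
        return "new TermQuery(new Term(\"" + field + "\", \"" + value + "\"))";
    }
}

class And implements QueryNode {           // boolean AND over child clauses
    final List<QueryNode> clauses;
    And(QueryNode... clauses) { this.clauses = Arrays.asList(clauses); }
    public String toJava() {
        StringBuilder sb = new StringBuilder("and(");
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(clauses.get(i).toJava());
        }
        return sb.append(")").toString();
    }
}

public class DslDemo {
    public static void main(String[] args) {
        // The "what": title contains lucene AND body contains port.
        QueryNode q = new And(new Term("title", "lucene"),
                              new Term("body", "port"));
        // The "how" for one target; a toCSharp() per node would be
        // all it takes to add a second target language.
        System.out.println(q.toJava());
    }
}

The description stays the same; only the emitters multiply, which is exactly the property that would make porting mechanical.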
If the core Lucene algorithms were written in a search-specific language, it would be much easier to compile them into other languages - optimized in the process. The problem today, IMHO, is that we are working with a direct imperative language, when it would be much easier to work with a declarative language that describes *WHAT* Lucene does, not *HOW* it does it. This would open the door to Lucene being easily ported to just about anything; all one would have to do is write the transformation to these other languages.

http://dinosaur.compilertools.net/
http://dinosaur.compilertools.net/#lex
http://dinosaur.compilertools.net/#yacc

I'm not a believer in "creating" work, but I do see a problem pattern emerging: the core team is constantly trying to improve the Java translation process, which is both automated and manual. The more Lucene continues in a Java-specific direction, the more complicated and painful the process will become, until eventually the complexity/benefit factors swap and it becomes impossible to keep up. I suspect that at that point Lucene.NET will start "jumping" over Lucene.JAVA versions in order to catch up and try to sync. Does this sound familiar?

Karell Ste-Marie
C.I.O. - BrainBank Inc
(514) 636-6655

P.S. For any support requests, please use the support email or the online helpdesk application.
support: idealinksupp...@brainbankinc.com
http://idealinksupport.brainbankinc.com/OnTimePortal/

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Friday, November 12, 2010 7:08 AM
To: lucene-net-...@lucene.apache.org
Subject: Re: Lucene project announcement

I agree with this idea completely.

Standardizing the file format and the query parser's syntax (ABNF? something similar probably already exists, since the parser is generated) would be a great start, plus some standards about what criteria an implementation of Lucene must meet to be valid. Obviously the unit tests are great for that, but they are platform-specific, and porting unit tests can leak bugs into the tests... so they are not always the most reliable way to validate a port.

One easy set of metrics is: "for the following set of data <describe some basic documents>, indexed the following way <describe field indexing settings>, a valid Lucene implementation should generate *exactly* this index <provide MD5 hash>..." Then, assuming that passed: "for the following query <describe query>, searched against the reference index just built, you should get *exactly* the following results <list expected results>, and it should execute in less than <indicate a timespan>."

We can build that into unit tests, but having it described outside of code, with MD5 hashes and in a formalized manner, might be more handy.

Thanks,
Troy
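A minimal sketch of the "exactly this index <provide MD5 hash>" check described above, assuming the reference index is a flat directory of files and that hashing the raw bytes of each file in sorted name order is an acceptable definition of "exactly" - both conventions are assumptions for illustration, not an existing Lucene standard:

// Sketch of the "exact index hash" idea: MD5 over the raw bytes of
// the index files, taken in sorted file-name order. The hashing
// convention (sorted names, raw bytes, flat directory) is an
// invented convention, not part of Lucene.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.util.Arrays;

public class IndexHash {
    public static String md5OfIndex(File indexDir) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        File[] files = indexDir.listFiles();
        if (files == null) throw new IOException("not a directory: " + indexDir);
        Arrays.sort(files);                 // deterministic file order
        for (File f : files) {
            if (!f.isFile()) continue;      // assume a flat index directory
            FileInputStream in = new FileInputStream(f);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);   // fold file contents into digest
                }
            } finally {
                in.close();
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5OfIndex(new File(args[0])));
    }
}

Two ports that produce the same digest for the same input documents agree byte-for-byte on the index format; any divergence points at either the file format or the indexing settings.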