One of the interesting tricks about MPS (I'm not married to the product, just to 
the concept) is that you essentially develop a domain-specific language that 
you can then compile into specific target languages. MPS does what the Unix 
parser-generator utilities Lex and Yacc do: it takes a domain-specific 
language and compiles it into another language, or into machine code.

If the core Lucene algorithms were written in a search-specific language, it 
would be much easier to compile them into other languages, with optimizations. 
The problem today, IMHO, is that we are working with a direct imperative 
language, when it would be much easier to work with an optimizable language 
that describes *WHAT* Lucene does, not *HOW* it does it.
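To make the "what, not how" idea concrete, here is a minimal sketch; every name in it is hypothetical and illustrative only, none of it comes from Lucene or MPS. A tiny declarative spec describes a scoring step, and a toy "compiler" emits imperative source for two different target languages:

```python
# Hypothetical sketch: a declarative description of WHAT a scoring step
# computes, separate from HOW any target language computes it.
# All names (term_score, tf, idf) are made up for illustration.

SPEC = {
    "name": "term_score",
    "inputs": ["tf", "idf"],
    "expr": ("mul", "tf", "idf"),  # declares: score = tf * idf
}

def emit(spec, target):
    """Emit imperative code for the declarative spec in the given target."""
    op, a, b = spec["expr"]
    body = f"{a} {'*' if op == 'mul' else '+'} {b}"
    params = ", ".join(f"double {p}" for p in spec["inputs"])
    if target == "java":
        return f"static double {spec['name']}({params}) {{ return {body}; }}"
    if target == "csharp":
        return f"static double {spec['name']}({params}) => {body};"
    raise ValueError(f"unknown target: {target}")

print(emit(SPEC, "java"))
print(emit(SPEC, "csharp"))
```

A real system would need a far richer spec language, but the point stands: the spec is written once, and each port is just another `emit` backend.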

This would open the door to Lucene being easily ported to just about anything; 
all one would have to do is write the transformation to the target language.

http://dinosaur.compilertools.net/
http://dinosaur.compilertools.net/#lex
http://dinosaur.compilertools.net/#yacc

I'm not a believer in "creating" work, but I do see a problem pattern emerging: 
the core team is constantly trying to improve the Java translation process, 
which is both automated and manual. The more Lucene continues in a 
Java-specific direction, the more complicated and painful that process will 
become, until eventually the complexity/benefit factors swap and it becomes 
impossible to keep up. I suspect that at that point Lucene.NET will start 
"jumping" over Lucene.JAVA versions in order to catch up and try to sync up.

Does this sound familiar?


Karell Ste-Marie
C.I.O. - BrainBank Inc
(514) 636-6655

P.S. For any support requests, please use the support email or the online 
helpdesk application
support: idealinksupp...@brainbankinc.com
http://idealinksupport.brainbankinc.com/OnTimePortal/

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com] 
Sent: Friday, November 12, 2010 7:08 AM
To: lucene-net-...@lucene.apache.org
Subject: Re: Lucene project announcement

I agree with this idea completely. Standardizing the file format and
the query parser's syntax (ABNF? probably something similar exists
already since the parser is generated) would be a great start. Plus
some standards about "what criteria must an implementation of Lucene
meet to be valid?".. Obviously the unit tests are great for that, but
they are platform-specific, and porting unit tests can leak bugs into
the tests... so they are not always the most reliable way to validate
a port.
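As a sketch of what such a grammar standard might look like, here is a hypothetical ABNF fragment covering only a tiny slice of a Lucene-style query syntax (field-scoped terms, phrases, and boolean operators); it is illustrative only and much simpler than what the actual generated parser accepts:

```abnf
query     = clause *( SP boolean SP clause )
boolean   = "AND" / "OR" / "NOT"
clause    = [ field ":" ] term
field     = 1*ALPHA
term      = 1*( ALPHA / DIGIT )
          / DQUOTE 1*( ALPHA / DIGIT / SP ) DQUOTE   ; quoted phrase
SP        = %x20
DQUOTE    = %x22
```

A spec like this could be tested directly: every port generates its parser from (or validates against) the same grammar, instead of re-porting a hand-maintained one.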

One easy set of metrics is "for the following set of data <describe
some basic documents> indexed the following way <describe field
indexing settings> a valid Lucene implementation should generate
*exactly* this index <provide MD5 hashcode>... " then assuming that
passed "for the following query <describe query> searched against the
reference index just built, you should get *exactly* the following
results <list expected results>, and it should execute in less than
<indicate a timespan>."
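The "exactly this index" check could boil down to a canonical hash over the index files. A minimal sketch, with no Lucene involved (the directory layout and the idea of hashing raw files are assumptions), of how one might compute a stable fingerprint of an index directory:

```python
import hashlib
import os

def index_fingerprint(index_dir):
    """Compute a stable MD5 fingerprint of an index directory by hashing
    every file's name and contents in sorted order, so two byte-identical
    indexes produce the same digest regardless of directory listing order."""
    digest = hashlib.md5()
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        digest.update(name.encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()
```

A conformance document could then say "indexing corpus X with settings Y must yield fingerprint Z" — though in practice segment contents can vary with merge timing, so the hashed representation would likely need normalization (e.g. a forced merge to one segment) first.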

We can build that into unit tests, but having it described outside of
code, with MD5 hashes and in a formalized manner, might be handier.

Thanks,
Troy
