Hi Marvin,

On 25 Feb 2011, at 4:05 PM, Marvin Humphrey wrote:

> On Fri, Feb 25, 2011 at 12:41:02PM +0000, Andrew S. Townley wrote:
>> Another issue I just hit with Ferret (actually, the same root problem, but
>> manifested in a different way) made me wonder something else about the
>> design of lucy: the query parsing API.
>
> I think there's a fundamental challenge with the proposed design
> (Lucy/Ferret/Lucene doesn't preserve metadata during scoring) regardless of
> which engine you choose, and I'll address it in a reply on the earlier thread.

Will address this in separate reply.

>> Is lucy's QueryParser API effectively this one?
>>
>> http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Search/QueryParser.html
>
> Yes.
>
>> If so, is there a way to traverse the query parse tree
>
> It's not public API yet, but it can be done, and perhaps we should consider
> making the API public.
>
> ANDQuery, ORQuery, RequiredOptionalQuery and NOTQuery are all subclasses of
> PolyQuery. PolyQuery provides a PolyQuery_Get_Children() method (which would
> be spelled get_children() in the Perl or Ruby-vaporware bindings). Using
> that, you can traverse the hierarchy. Indeed, that's what QueryParser does
> internally.

At the very least, I need to be able to walk the query tree for any string input query (or any query object) with a consistent API. What you have here is pretty similar to my own implementation of the Query Object pattern for my system, so that would be a start.

For any compound query term, I'm also "bubbling" up references to the property names as well as the query terms themselves. This means that I can retrieve these easily and do some analysis on the query before actually executing it. Anything that will support me doing the same type of thing with Lucy will work.

>
>> Also, I mentioned SWIG in passing the other day in a previous message.
>> Would it not be possible to just generate the bindings for Ruby with SWIG?
>
> SWIG won't do:
>
> http://lucy.markmail.org/thread/5uxmc655dvzzdpvx

Yeah, I got that and the rationale from Jens' reference.

> There are portions of Lucy that have been intentionally left unimplemented by
> the core. The Perl implementation code is located in trunk/perl/xs/ and
> trunk/perl/lib/Lucy.pm. This code will have to be ported for each new host
> language regardless.

Interesting approach. Are there docs or a rationale somewhere on which parts and why? Sounds worth understanding in more detail.

> Once that's done, it *might* be theoretically possible to generate SWIG
> bindings as a short-term experiment, but there would be a lot of problems.
> Lucy's autogenerated header files won't map well. There will be some quirks
> that would need to be worked out regarding Lucy's object model. Lots of
> features will be missing -- subclassing, automated refcount management,
> default parameter values, etc. It will also be quite unwieldy, because
> hashes, arrays, and strings won't get automatically converted at the binding
> barrier -- you'll have to do crazy stuff like creating Lucy::Object::CharBuf
> objects every time you want to pass a string into the Lucy core.

Yeah, ugh.

> What is planned instead is to adapt the materials under
> trunk/clownfish/lib/Clownfish/Binding/Perl to generate Ruby C API code instead
> of Perl C API code. There's actually not a lot there:
>
>   $ wc -l lib/Clownfish/Binding/Perl.pm lib/Clownfish/Binding/Perl/*
>        528 lib/Clownfish/Binding/Perl.pm
>        475 lib/Clownfish/Binding/Perl/Class.pm
>        150 lib/Clownfish/Binding/Perl/Constructor.pm
>        277 lib/Clownfish/Binding/Perl/Method.pm
>        269 lib/Clownfish/Binding/Perl/Subroutine.pm
>        298 lib/Clownfish/Binding/Perl/TypeMap.pm
>       1997 total
>   $
>
> Most of the work the Clownfish compiler does involves parsing the Lucy header
> files and building a model of the Lucy object hierarchy in memory. That work
> is done.
> What's left is to port the code that walks that object hierarchy and
> generates binding code. We have such code for Perl; we need to adapt it for
> Ruby.
>
> Once that work is done, it's done. Changes within Lucy's core don't require
> changes to Clownfish.
>
> I'm actively working on the Clownfish code now -- adding host languages is a
> force-multiplier for the project, so it's a high priority. See the roadmap at
> <http://markmail.org/thread/nfqfphjigqcl2svc>.
>
> I've been vacillating between Python and Ruby as far as which bindings to work
> on next, but I tend to go where there are active collaborators. If you're
> interested in contributing, you'll have company. :)
>
> At the least, I intend to finish porting the bulk of the Clownfish compiler
> from Perl to C so that it's easier for non-Perl people to grok.

That sounds good, and the roadmap makes sense. Have since subscribed to lucy-devel as well since a lot of the traffic there seems to address the kinds of things I'm interested in.

Thanks for the explanation.

ast
--
Andrew S. Townley <[email protected]>
http://atownley.org
