On Fri, Feb 25, 2011 at 12:41:02PM +0000, Andrew S. Townley wrote:
> Another issue I just hit with Ferret (actually, the same root problem, but
> manifested in a different way) made me wonder something else about the
> design of lucy: the query parsing API.

I think there's a fundamental challenge with the proposed design (Lucy/Ferret/Lucene don't preserve metadata during scoring) regardless of which engine you choose, and I'll address it in a reply on the earlier thread.

> Is lucy's QueryParser API effectively this one?
>
> http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Search/QueryParser.html

Yes.

> If so, is there a way to traverse the query parse tree

It's not public API yet, but it can be done, and perhaps we should consider making the API public.

ANDQuery, ORQuery, RequiredOptionalQuery and NOTQuery are all subclasses of PolyQuery. PolyQuery provides a PolyQuery_Get_Children() method (which would be spelled get_children() in the Perl or Ruby-vaporware bindings). Using that, you can traverse the hierarchy. Indeed, that's what QueryParser does internally.

> Also, I mentioned SWIG in passing the other day in a previous message.
> Would it not be possible to just generate the bindings for Ruby with SWIG?

SWIG won't do:

    http://lucy.markmail.org/thread/5uxmc655dvzzdpvx

There are portions of Lucy that have been intentionally left unimplemented by the core. The Perl implementation code is located in trunk/perl/xs/ and trunk/perl/lib/Lucy.pm. This code will have to be ported for each new host language regardless.

Once that's done, it *might* be theoretically possible to generate SWIG bindings as a short-term experiment, but there would be a lot of problems. Lucy's autogenerated header files won't map well. There will be some quirks that would need to be worked out regarding Lucy's object model. Lots of features will be missing -- subclassing, automated refcount management, default parameter values, etc. It will also be quite unwieldy, because hashes, arrays, and strings won't get automatically converted at the binding barrier -- you'll have to do crazy stuff like creating Lucy::Object::CharBuf objects every time you want to pass a string into the Lucy core.
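Going back to the query-tree question: since the PolyQuery traversal API isn't public yet, here is only a minimal C sketch of the depth-first walk that a get_children() method makes possible. The Query struct, the QueryKind enum, and the get_children(), add_child(), kind_name() and walk() helpers are all hypothetical stand-ins invented for illustration; they are not Lucy's actual C API. In the real library the composite nodes would be the PolyQuery subclasses named earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL node kinds loosely modelled on ANDQuery, ORQuery, NOTQuery
 * and RequiredOptionalQuery. This is not Lucy's real C API. */
typedef enum { Q_TERM, Q_AND, Q_OR, Q_NOT, Q_REQ_OPT } QueryKind;

#define MAX_CHILDREN 4

typedef struct Query {
    QueryKind kind;
    const char *term;                     /* set only on Q_TERM leaves */
    struct Query *children[MAX_CHILDREN];
    size_t num_children;                  /* 0 for leaves */
} Query;

/* Hypothetical stand-in for PolyQuery_Get_Children(). */
static Query *const *get_children(const Query *q, size_t *count) {
    *count = q->num_children;
    return q->children;
}

/* Attach a child to a composite node (fixed capacity for simplicity). */
static void add_child(Query *parent, Query *child) {
    assert(parent->num_children < MAX_CHILDREN);
    parent->children[parent->num_children++] = child;
}

static const char *kind_name(QueryKind kind) {
    switch (kind) {
    case Q_AND:     return "AND";
    case Q_OR:      return "OR";
    case Q_NOT:     return "NOT";
    case Q_REQ_OPT: return "REQUIRED_OPTIONAL";
    default:        return "TERM";
    }
}

/* Depth-first walk: print each node indented by depth and return the
 * number of term leaves found at or beneath q. */
static size_t walk(const Query *q, int depth) {
    printf("%*s%s", depth * 2, "", kind_name(q->kind));
    if (q->kind == Q_TERM) {
        printf(" '%s'\n", q->term);
        return 1;
    }
    printf("\n");

    size_t count;
    size_t terms = 0;
    Query *const *kids = get_children(q, &count);
    for (size_t i = 0; i < count; i++) {
        terms += walk(kids[i], depth + 1);
    }
    return terms;
}
```

For a tree shaped like AND(OR(lucy, ferret), NOT(lucene)), walk() prints one indented line per node and returns 3. Only composite nodes have children, so the recursion needs no special cases beyond the leaf check; a real binding would dispatch on the query's class rather than on an enum.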
What we plan to do instead of SWIG is adapt the materials under trunk/clownfish/lib/Clownfish/Binding/Perl to generate Ruby C API code rather than Perl C API code. There's actually not a lot there:

    $ wc -l lib/Clownfish/Binding/Perl.pm lib/Clownfish/Binding/Perl/*
         528 lib/Clownfish/Binding/Perl.pm
         475 lib/Clownfish/Binding/Perl/Class.pm
         150 lib/Clownfish/Binding/Perl/Constructor.pm
         277 lib/Clownfish/Binding/Perl/Method.pm
         269 lib/Clownfish/Binding/Perl/Subroutine.pm
         298 lib/Clownfish/Binding/Perl/TypeMap.pm
        1997 total

Most of the work the Clownfish compiler does involves parsing the Lucy header files and building a model of the Lucy object hierarchy in memory. That work is done. What's left is to port the code that walks that object hierarchy and generates binding code. We have such code for Perl; we need to adapt it for Ruby.

Once that work is done, it's done: changes within Lucy's core don't require changes to Clownfish.

I'm actively working on the Clownfish code now -- adding host languages is a force-multiplier for the project, so it's a high priority. See the roadmap at <http://markmail.org/thread/nfqfphjigqcl2svc>. I've been vacillating between Python and Ruby as to which bindings to work on next, but I tend to go where there are active collaborators. If you're interested in contributing, you'll have company. :)

At the least, I intend to finish porting the bulk of the Clownfish compiler from Perl to C so that it's easier for non-Perl people to grok.

> I did a couple of Ruby/GTK+ bindings (GtkHTML3 and WebKit), and the issue
> there is that it was all hand-coded C. Trying to track a fast moving target
> like I was with WebKit proved to be nearly impossible, and I eventually gave
> up.

Right, absolutely. We've been through the same slog, and now we autogenerate the vast majority of our binding code.

Marvin Humphrey
