Re: [lucy-user] Question about query parsing API

Marvin Humphrey Fri, 25 Feb 2011 08:06:53 -0800

On Fri, Feb 25, 2011 at 12:41:02PM +0000, Andrew S. Townley wrote:
> Another issue I just hit with Ferret (actually, the same root problem, but
> manifested in a different way) made me wonder something else about the
> design of lucy: the query parsing API.


I think there's a fundamental challenge with the proposed design no matter
which engine you choose: Lucy, Ferret, and Lucene don't preserve metadata
during scoring.  I'll address it in a reply on the earlier thread.
 
> Is lucy's QueryParser API effectively this one?
> 
> http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Search/QueryParser.html

Yes.
 
> If so, is there a way to traverse the query parse tree

It's not a public API yet, but it can be done, and perhaps we should consider
making it public.

ANDQuery, ORQuery, RequiredOptionalQuery and NOTQuery are all subclasses of
PolyQuery.  PolyQuery provides a PolyQuery_Get_Children() method (which would
be spelled get_children() in the Perl or Ruby-vaporware bindings).  Using
that, you can traverse the hierarchy.  Indeed, that's what QueryParser does
internally.
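For illustration, here is a minimal sketch of such a traversal in Python.  It
uses mock classes that only mimic the shape of the hierarchy (PolyQuery
subclasses exposing get_children()).  The class names mirror Lucy's, but the
constructors, the TermQuery fields, and the walk() helper are hypothetical,
not Lucy's actual API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


# Hypothetical stand-ins for Lucy's query classes.  This is NOT Lucy's API,
# only a model of "PolyQuery subclasses expose get_children()".
@dataclass
class Query:
    pass


@dataclass
class TermQuery(Query):
    field_name: str
    term: str


@dataclass
class PolyQuery(Query):
    children: List[Query] = field(default_factory=list)

    def get_children(self) -> List[Query]:
        return self.children


class ANDQuery(PolyQuery):
    pass


class ORQuery(PolyQuery):
    pass


class NOTQuery(PolyQuery):
    pass


class RequiredOptionalQuery(PolyQuery):
    pass


def walk(query: Query, depth: int = 0) -> Iterator[Tuple[int, Query]]:
    """Depth-first, pre-order traversal yielding (depth, query) pairs."""
    yield depth, query
    if isinstance(query, PolyQuery):
        for child in query.get_children():
            yield from walk(child, depth + 1)


tree = ANDQuery([
    TermQuery("title", "lucy"),
    ORQuery([TermQuery("body", "ruby"), TermQuery("body", "python")]),
])
for depth, q in walk(tree):
    print("  " * depth + type(q).__name__)
```

Leaf queries (here the mock TermQuery) have no children, so the recursion
stops there.  A depth-first walk like this is enough for tasks such as
collecting every term in a parsed query.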

> Also, I mentioned SWIG in passing the other day in a previous message.
> Would it not be possible to just generate the bindings for Ruby with SWIG?

SWIG won't do:

    http://lucy.markmail.org/thread/5uxmc655dvzzdpvx

There are portions of Lucy that have been intentionally left unimplemented by
the core.  The Perl implementation code is located in trunk/perl/xs/ and
trunk/perl/lib/Lucy.pm.  This code will have to be ported for each new host
language regardless.

Once that's done, it *might* be theoretically possible to generate SWIG
bindings as a short-term experiment, but there would be a lot of problems.
Lucy's autogenerated header files won't map well.  There will be some quirks
that would need to be worked out regarding Lucy's object model.  Lots of
features will be missing -- subclassing, automated refcount management,
default parameter values, etc.  It will also be quite unwieldy, because
hashes, arrays, and strings won't get automatically converted at the binding
barrier -- you'll have to do crazy stuff like creating Lucy::Object::CharBuf
objects every time you want to pass a string into the Lucy core.
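To make the conversion point concrete, here is a rough Python sketch of the
kind of marshaling a generated binding can do automatically at the barrier:
native strings, lists, and dicts are recursively turned into core objects.
CharBuf, VArray, Hash, and their push()/store() methods here are hypothetical
mocks standing in for Lucy's C objects, not the real interfaces.

```python
from typing import Any, Dict, List


# Hypothetical mocks of core container types.  The real Lucy classes
# (e.g. Lucy::Object::CharBuf) are C objects, not these Python classes.
class CharBuf:
    def __init__(self, text: str) -> None:
        self.text = text


class VArray:
    def __init__(self) -> None:
        self.elems: List[Any] = []

    def push(self, elem: Any) -> None:
        self.elems.append(elem)


class Hash:
    def __init__(self) -> None:
        self.entries: Dict[str, Any] = {}

    def store(self, key: CharBuf, value: Any) -> None:
        self.entries[key.text] = value


def to_core(value: Any) -> Any:
    """Recursively convert host-language values into (mock) core objects.

    A generated binding would do this at the barrier so that users never
    have to construct CharBuf/VArray/Hash objects by hand.
    """
    if isinstance(value, str):
        return CharBuf(value)
    if isinstance(value, list):
        arr = VArray()
        for elem in value:
            arr.push(to_core(elem))
        return arr
    if isinstance(value, dict):
        h = Hash()
        for key, elem in value.items():
            h.store(CharBuf(str(key)), to_core(elem))
        return h
    # Numbers, None, and already-converted objects pass through unchanged.
    return value
```

With SWIG-style raw bindings, every one of those conversions would be the
caller's job.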

The plan instead is to adapt the modules under
trunk/clownfish/lib/Clownfish/Binding/Perl so that they generate Ruby C API
code rather than Perl C API code.  There's actually not a lot there:

    $ wc -l lib/Clownfish/Binding/Perl.pm lib/Clownfish/Binding/Perl/*
         528 lib/Clownfish/Binding/Perl.pm
         475 lib/Clownfish/Binding/Perl/Class.pm
         150 lib/Clownfish/Binding/Perl/Constructor.pm
         277 lib/Clownfish/Binding/Perl/Method.pm
         269 lib/Clownfish/Binding/Perl/Subroutine.pm
         298 lib/Clownfish/Binding/Perl/TypeMap.pm
        1997 total
    $ 

Most of the work the Clownfish compiler does involves parsing the Lucy header
files and building a model of the Lucy object hierarchy in memory.  That work
is done.  What's left is to port the code that walks that object hierarchy and
generates binding code.  We have such code for Perl; we need to adapt it for
Ruby.
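As a rough illustration of what "walking the hierarchy and generating binding
code" means, here is a toy Python generator.  It assumes a hypothetical, much
simplified class model (the real Clownfish classes differ) and emits Ruby C
API registration calls.  rb_define_class() and rb_define_method() are real
Ruby C API functions, but the cPolyQuery variable, the rb_PolyQuery_* wrapper
names, and the PolyQuery_Get_Children -> get_children name mapping rule are
assumptions for this sketch; the wrapper functions themselves are not
generated here.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, heavily simplified model of what a Clownfish-style parser
# might build from a header; the real Clownfish classes differ.
@dataclass
class Method:
    macro_name: str  # e.g. "PolyQuery_Get_Children"
    num_args: int    # argument count, excluding self


@dataclass
class ClassModel:
    name: str        # e.g. "PolyQuery"
    methods: List[Method] = field(default_factory=list)


def host_method_name(method: Method, class_name: str) -> str:
    """Map e.g. "PolyQuery_Get_Children" to the host-style "get_children"."""
    prefix = class_name + "_"
    bare = method.macro_name
    if bare.startswith(prefix):
        bare = bare[len(prefix):]
    return bare.lower()


def gen_ruby_init(cls: ClassModel) -> str:
    """Emit C source lines registering each method via the Ruby C API."""
    var = "c" + cls.name
    lines = [f'{var} = rb_define_class("{cls.name}", rb_cObject);']
    for m in cls.methods:
        name = host_method_name(m, cls.name)
        wrapper = f"rb_{cls.name}_{name}"  # hypothetical wrapper function name
        lines.append(
            f'rb_define_method({var}, "{name}", {wrapper}, {m.num_args});'
        )
    return "\n".join(lines)


model = ClassModel("PolyQuery", [Method("PolyQuery_Get_Children", 0),
                                 Method("PolyQuery_Add_Child", 1)])
print(gen_ruby_init(model))
```

Because the generator only reads the model, adding or changing a method in
the core means rerunning the generator, not editing it.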

Once that work is done, it's done.  Changes within Lucy's core don't require
changes to Clownfish.

I'm actively working on the Clownfish code now -- adding host languages is a
force-multiplier for the project, so it's a high priority.  See the roadmap at
<http://markmail.org/thread/nfqfphjigqcl2svc>.

I've been vacillating between Python and Ruby as to which bindings to work on
next, but I tend to go where there are active collaborators.  If you're
interested in contributing, you'll have company. :)

At the least, I intend to finish porting the bulk of the Clownfish compiler
from Perl to C so that it's easier for non-Perl people to grok.

> I did a couple of Ruby/GTK+ bindings (GtkHTML3 and WebKit), and the issue
> there is that it was all hand-coded C.  Trying to track a fast moving target
> like I was with WebKit proved to be nearly impossible, and I eventually gave
> up.  

Right, absolutely.  We've been through the same slog, and now we autogenerate
the vast majority of our binding code.

Marvin Humphrey
