On 6/21/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
Greets,

Both of Lucy's present target platforms provide much of the
functionality missing from C and present in Java -- for instance,
portable filepath handling.  We could also get that from APR a la
Lucene4C, but while it may be possible to add a C target to Lucy
someday that uses APR as its foundation, we don't need to complicate
the install process by making APR a prerequisite for all targets.

There are a few dependencies I think we should bundle with Lucy:

   * Zlib
   * Snowball stemmers
   * some variant of vsnprintf

While Zlib is provided as part of core Perl and possibly as part of
all other platforms Lucy might target, bundling it means we don't
have to call back to the native API should we wish to access it from
C, as we might if FieldsWriter and/or FieldsReader end up implemented
in C.

This seems like a lot to bundle to me when, as you said, it will
probably be available on all platforms that Lucy might target. I don't
see the problem with calling back to the native API. We are going to
have to provide callbacks for things like memory allocation and
exception handling, so I don't think an extra inflate and deflate
callback is going to hurt. But if you feel strongly about this, I'm not
too fussed.
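
To make the callback idea concrete, here is a rough sketch of how the
host binding might hand its allocator and compression routines to the C
core. Every name in it (HostCallbacks, deflate_cb, inflate_cb,
store_field, copy_bytes) is made up for illustration and is not an
existing Lucy or Ferret API. The copy_bytes routine is only a stand-in
that copies bytes unchanged; a real binding would call the platform's
zlib there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of routines supplied by the host language binding.
 * On entry *dst_len holds the capacity of dst; on success it holds the
 * number of bytes written. Each codec returns 0 on success, -1 on error. */
typedef struct HostCallbacks {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
    int   (*deflate_cb)(const char *src, size_t src_len,
                        char *dst, size_t *dst_len);
    int   (*inflate_cb)(const char *src, size_t src_len,
                        char *dst, size_t *dst_len);
} HostCallbacks;

/* Stand-in codec that copies bytes without compressing them. */
static int
copy_bytes(const char *src, size_t src_len, char *dst, size_t *dst_len)
{
    if (*dst_len < src_len) {
        return -1;                 /* destination too small */
    }
    memcpy(dst, src, src_len);
    *dst_len = src_len;
    return 0;
}

/* Hypothetical core routine: compress a stored field through whatever
 * deflate callback the host provided, without linking zlib directly. */
static int
store_field(const HostCallbacks *host, const char *value, size_t value_len,
            char *out, size_t *out_len)
{
    assert(host != NULL && host->deflate_cb != NULL);
    return host->deflate_cb(value, value_len, out, out_len);
}
```

The cost is one indirect call per field, which is the trade-off against
bundling Zlib outright.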

The Snowball stemmers are also available via CPAN; I now maintain
that distribution (Lingua::Stem::Snowball).  However, other platforms
probably won't have something like that available, and even within
the Perl world, bundling Snowball means greater flexibility with
regard to how Lucy interacts with it.

This I agree with. I've bundled it with Ferret. I've also bundled the
lists of stopwords from http://snowball.tartarus.org/. Do you plan on
doing any other analysis at the C level or do you just want to make
the Snowball stemmers available in the target API?

We need vsnprintf for formatting error messages, which may include
user-controllable input and which are therefore ripe for buffer
overflow attack.  There are many variants available -- see
<http://www.ijs.si/software/snprintf/> for links to a few (some are
outdated).  We may be able to derive something from APR's
implementation if we can't find one with a compatible license we can
just bundle and #include.

I think I'd rather derive something from APR's implementation.
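
Whichever implementation gets used, the calling pattern would look
roughly like the sketch below. It assumes a C99-conforming vsnprintf
(one that returns the length the full message would have needed), and
format_error is a hypothetical helper name, not anything in APR or Lucy.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an error message into a fixed-size buffer
 * without ever writing past its end.
 * Returns 0 if the message fit, 1 if it was truncated, -1 on error. */
static int
format_error(char *buf, size_t buf_size, const char *fmt, ...)
{
    va_list args;
    int     needed;

    assert(fmt != NULL);
    if (buf == NULL || buf_size == 0) {
        return -1;
    }

    va_start(args, fmt);
    /* Writes at most buf_size - 1 characters plus a terminating NUL. */
    needed = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);

    if (needed < 0) {
        buf[0] = '\0';             /* encoding error: leave an empty string */
        return -1;
    }
    return (size_t)needed >= buf_size ? 1 : 0;
}
```

User-supplied text should only ever be passed as an argument, never as
the format string itself.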

If those are the only external dependencies, that implies we'll be
building a lot from scratch.  Here are some of the utilities we'll
need to code up:

   * hashtable
   * priority queue
   * byte buffer (an array of bytes that knows its own length)
   * bit vector
   * external sort
   * C test harness

How does that sound?

Sounds good to me. I've done all these before bar the external sort.
What is the byte buffer for in particular?
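
For reference, this is the sort of thing I take "an array of bytes that
knows its own length" to mean. It is only a minimal sketch; the ByteBuf
name, its fields and the BB_ functions are made up here and are not a
proposed Lucy API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte buffer: the length is stored alongside the bytes,
 * so the content may contain embedded NULs. */
typedef struct ByteBuf {
    char  *ptr;   /* the bytes, not NUL-terminated */
    size_t len;   /* number of bytes in use */
    size_t cap;   /* number of bytes allocated */
} ByteBuf;

static ByteBuf*
BB_new(size_t cap)
{
    ByteBuf *bb = malloc(sizeof(ByteBuf));
    if (bb == NULL) {
        return NULL;
    }
    bb->cap = cap ? cap : 16;
    bb->len = 0;
    bb->ptr = malloc(bb->cap);
    if (bb->ptr == NULL) {
        free(bb);
        return NULL;
    }
    return bb;
}

/* Append n bytes, growing the allocation by doubling when needed.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int
BB_cat(ByteBuf *bb, const void *bytes, size_t n)
{
    assert(bb != NULL);
    if (n == 0) {
        return 0;
    }
    if (n > bb->cap - bb->len) {
        size_t new_cap = bb->cap;
        char  *new_ptr;
        while (n > new_cap - bb->len) {
            if (new_cap > SIZE_MAX / 2) {
                return -1;         /* doubling would overflow size_t */
            }
            new_cap *= 2;
        }
        new_ptr = realloc(bb->ptr, new_cap);
        if (new_ptr == NULL) {
            return -1;             /* original buffer is left intact */
        }
        bb->ptr = new_ptr;
        bb->cap = new_cap;
    }
    memcpy(bb->ptr + bb->len, bytes, n);
    bb->len += n;
    return 0;
}

static void
BB_destroy(ByteBuf *bb)
{
    if (bb != NULL) {
        free(bb->ptr);
        free(bb);
    }
}
```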

Cheers,
Dave

PS: Any progress with the test harness? Would you like me to do it?
