On 10/19/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:

On Oct 18, 2006, at 8:03 PM, David Balmain wrote:

> For Ruby I can use the make alternative rake. But I'm thinking about
> Ferret at the moment.

Forgive me, I don't understand why you make the distinction in that
sentence between "Ruby" and "Ferret".  Is there a reason you could
use rake with Lucy but not Ferret?

Sorry, I definitely wasn't very clear. I just don't want the straight C
code in Ferret to have a dependency on Ruby. As far as building Ferret
with Ruby bindings goes, I already use rake so there is no problem
there.

>> I can spec extra flags to CBuilder's compile() function if it turns out
>> to be necessary.  However, CBuilder, by default, passes the same set
>> of flags that were used when compiling the Perl executable (which are
>> archived, along with a zillion other settings from Perl's Configure
>> script, in the Config module).  On a RedHat 9 box I have access to,
>> those flags include -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64,
>> and I'm assuming that other Perl installations where LFS isn't the OS
>> default also spec flags rather than defining macros within individual
>> source files.
>
> Unfortunately these values are defined as macros in Ruby.

Could we build a custom Charmonizer probe for Ruby then?

static char ruby_largefiles_code[] = METAQUOTE
     #include "ruby.h" /* or whatever the file is */
     #include "_charm.h"
     int main() {
         Charm_Setup;
         printf("%d", (int)sizeof(off_t));
         return 0;
     }
METAQUOTE;

Good idea but I think I'll have to work on this.

#include <stdio.h>
#include "ruby.h" /* or whatever the file is */
int main() {
   printf("%d\n", _FILE_OFFSET_BITS);
   printf("%d\n", (int)sizeof(off_t));
   return 0;
}

output:

64
4

:( I guess we could just check whether _FILE_OFFSET_BITS is defined
and equal to 64.
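
A minimal sketch of that macro check, assuming the host header requests
LFS through _FILE_OFFSET_BITS. The #define at the top is only a stand-in
for ruby.h, and host_requests_lfs() is a name made up for illustration.
The test runs in the preprocessor, so it reports what the headers asked
for, not what sizeof(off_t) turned out to be:

```c
/* Hypothetical stand-in for "ruby.h": assume the host header
 * defines the LFS macro like this. */
#ifndef _FILE_OFFSET_BITS
  #define _FILE_OFFSET_BITS 64
#endif

/* Returns 1 when _FILE_OFFSET_BITS is defined and equal to 64,
 * 0 otherwise.  Decided entirely at preprocessing time. */
int host_requests_lfs(void) {
#if defined(_FILE_OFFSET_BITS) && (_FILE_OFFSET_BITS == 64)
    return 1;
#else
    return 0;
#endif
}
```

Feature-test macros such as _FILE_OFFSET_BITS only affect system headers
that are included after the macro is defined, so the macro's value and
sizeof(off_t) can disagree within a single source file.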

> Any reason the native language needs to support LFS? If all access to
> the index files is through Lucy, it shouldn't matter right?

There are two levels of support we need to consider: whether the host
language was compiled using LFS, and whether LFS is available at
all.  I definitely want to avoid supporting systems that can't deal
with large files at all because I don't want to have to think about
how many bytes a file pointer might have every time I see one.  File
pointers in Lucy should be 64-bit integers.  Period.

As for the case where the host language may not support LFS, we
might get away with it, but I'm not a big fan of the idea, because
LFS bugs are really hard to test for and only bite you when you've
already got a lot going on.  And stuff can hide in funny places like
that stat() call example.

We should make Charmonizer's implementation fail-safe, regardless.
We can add a LargeFiles_try_macros() function which adds those
#defines to the probe code.  We can start off just with
_LARGEFILE_SOURCE and _FILE_OFFSET_BITS=64, getting into the more
esoteric #defines if we get failure reports.
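
One way LargeFiles_try_macros() might work, sketched under assumptions:
the two #defines are prepended to the probe source so that they come
before any system header.  build_lfs_probe(), lfs_macros and probe_body
are hypothetical names for illustration, not Charmonizer's actual API:

```c
#include <assert.h>
#include <stdio.h>

/* The two macros named above, placed ahead of every #include. */
static const char lfs_macros[] =
    "#define _LARGEFILE_SOURCE\n"
    "#define _FILE_OFFSET_BITS 64\n";

/* Hypothetical probe body: prints sizeof(off_t) when compiled and run. */
static const char probe_body[] =
    "#include <stdio.h>\n"
    "#include <sys/types.h>\n"
    "int main(void) {\n"
    "    printf(\"%d\", (int)sizeof(off_t));\n"
    "    return 0;\n"
    "}\n";

/* Writes the probe source into buf, with the LFS macros in front when
 * try_macros is nonzero.  Returns 0 on success, -1 if buf is too small. */
int build_lfs_probe(char *buf, size_t size, int try_macros) {
    int n;
    assert(buf != NULL);
    n = snprintf(buf, size, "%s%s", try_macros ? lfs_macros : "", probe_body);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The caller would compile and run the generated source, then compare the
printed size against 8.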

How many Ruby installs are there without LFS?   I'd be shocked if
there were more than a handful of old and decrepit ones.  Should we
support old versions?  I don't think Ferret is, and I'd prefer not
to.  KinoSearch supports only Perl 5.8.3 and later.

Not many if any on *nix based systems but I'm not sure about Windows.
The standard version on Windows doesn't have large file support.

I propose that we probe for LFS in Ruby and bomb out if it's not
there.  Then we add LargeFiles_try_macros() to ./charmonize and
define -DLUCY_RUBY as a flag to enable it when compiling charmonize.c.

#ifdef LUCY_RUBY
     LargeFiles_try_macros();
#endif
     LargeFiles_run(conf_fh);

> One other thing. Have you thought about detecting dirent.h in
> charmonizer?

We could add a Dirent module to Charmonizer, but I'm not sure I see
immediate benefits.  We'll definitely need dirent.h for Lucy, because
we need a way to list the contents of an FSDirectory/FSStore/
FSInvIndex.  Fortunately, dirent.h is widely available.  Building
Perl actually requires that it be available -- it's one of the few
non-ANSI C modules Perl can't live without.

Well, unfortunately it's not available on VC6 which I need to use to
compile Ruby extensions. This is a bit of an issue in the Ruby
community at the moment.

The thing is, the behavior of dirent.h is predictable enough for our
purposes.  Some systems provide d_namlen as a struct member, but
others don't, so if you want to write portable code you use
strlen(entry->d_name).  I think that's the end of the story, isn't it?
We absolutely must have dirent.h, and we can write portable code for it
without needing the sort of pre-compile-time probing Charmonizer
provides.  We don't need to worry about other struct members that may
or may not be there, and a couple of calls to strlen() on filenames
won't be a performance concern.
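
A minimal sketch of that portable pattern, assuming a POSIX-style
dirent.h.  total_name_bytes() is a made-up helper; it only touches
d_name and measures each name with strlen() instead of relying on
d_namlen:

```c
#include <dirent.h>
#include <string.h>

/* Sums the lengths of all entry names in a directory.
 * Returns -1 if the directory cannot be opened. */
long total_name_bytes(const char *path) {
    DIR *dir = opendir(path);
    struct dirent *entry;
    long total = 0;

    if (dir == NULL) {
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        /* d_name is the only member used; no d_namlen needed. */
        total += (long)strlen(entry->d_name);
    }
    closedir(dir);
    return total;
}
```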

The only thing I can think of is whether readdir_r, the reentrant
version of readdir, is always available.  That's something I don't
know.  But I don't see anything in the Autoconf documentation about
it, so I'd gather it's always there.

I think we're closing in on the feature set Lucy needs Charmonizer to
supply.  It'd be sorta nice to detect non-IEEE floats so we could
throw a meaningful error at compile-time rather than just fail
Similarity's tests on encode_norm/decode_norm.  But I don't think
it's worth the effort since those systems are so rare, and I'm going
to back-burner that one.
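
For the record, a rough sketch of what such a check might look like if
it ever comes off the back burner.  floats_look_ieee754() is a
hypothetical name, and the test is deliberately shallow: it checks the
float.h parameters and the byte pattern of 1.0f (0x3F800000), nothing
more:

```c
#include <float.h>
#include <string.h>

/* Returns 1 if float looks like an IEEE 754 single, 0 otherwise. */
int floats_look_ieee754(void) {
    float one = 1.0f;
    unsigned char b[sizeof(float)];

    if (sizeof(float) != 4 || FLT_RADIX != 2 || FLT_MANT_DIG != 24) {
        return 0;
    }
    memcpy(b, &one, sizeof one);
    /* 1.0f is 3F 80 00 00 big-endian, 00 00 80 3F little-endian. */
    return (b[0] == 0x3F && b[1] == 0x80 && b[2] == 0 && b[3] == 0)
        || (b[3] == 0x3F && b[2] == 0x80 && b[1] == 0 && b[0] == 0);
}
```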

Filepath handling is the one big feature left I think we ought to put
in Charmonizer.  That sounds ambitious, but it doesn't have to be.
Lucy basically only needs to know what the directory separator is,
because all it ever needs to do is concatenate the filename onto the
index directory.  Directory names ought to be normalized to full
filepaths, but such paths are always going to have to be supplied by
the user at the native level, so we can rely upon native routines for
normalization.

Since Charmonizer is only serving one master for now, its FilePath
module can be cheesy and only supply one constant macro, DIR_SEP.
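
Roughly what that could look like from the consumer's side.  The
#ifdef _WIN32 branch is only a stand-in for whatever Charmonizer would
actually detect and write out, and join_path() is a hypothetical helper
showing the one thing Lucy needs the macro for:

```c
#include <stdio.h>

/* Stand-in for the constant Charmonizer would generate. */
#ifdef _WIN32
  #define DIR_SEP "\\"
#else
  #define DIR_SEP "/"
#endif

/* Concatenates a filename onto the index directory.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int join_path(char *buf, size_t size, const char *dir, const char *file) {
    int n = snprintf(buf, size, "%s%s%s", dir, DIR_SEP, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Normalizing the directory name itself is left to the native layer, as
described above.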

That sounds fine to me.

> Are we going to need any directory reading functions in
> Lucy? I use it to clear the directory when the IndexWriter create flag
> is set to true but I guess this isn't really necessary.

You also need it when you read an index which resides on the
filesystem into a RAMDirectory/RAMStore/RAMInvIndex.

True, but we could simply use the segments file to see what files are
available. I guess it wasn't much code to make the dirent stuff I
needed available in VC6.
