Re: Segment

Marvin Humphrey Sat, 28 Mar 2009 19:54:40 -0700

On Wed, Mar 25, 2009 at 07:34:01AM -0400, Michael McCandless wrote:
> >> What does "incremented" mean?
> >
> > It means that the caller has to take responsibility for one refcount.
> > Usually you'll see that on constructors and factory methods.
> >
> > Having "incremented" as part of the method/function signature makes it
> > easier to autogenerate binding code that doesn't make refcounting errors
> > and leak memory.
> 
> OK got it.  It's like when Python's docs say "returns a reference".
> It's great to make this a "formal" part of the API.


I'm pretty sure you grok this already but for clarity's sake: this is
Boilerplater syntax -- so it's a "formal" part of an *internal* API.

Even though Boilerplater is a very small language, I was deeply reluctant to
write it.  Naturally I hate all programming languages and I have fantasies of
replacing C with something "better" :) -- but I recognize the challenges that
language authors face and have no desire to expose Boilerplater outside of
Lucy.  It's just a means to an end.

The C API docs -- which I expect we'll autogenerate from the .bp source files
just as I'm currently generating Perl POD docs from .bp files -- will probably
be HTML files and will say "returns a new reference" or "returns a borrowed
reference" just like the Python docs.
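
To make the new-versus-borrowed distinction concrete, here is a minimal sketch
in plain C.  The Obj and Holder types and every function below are
hypothetical stand-ins invented for illustration; they are not Lucy's object
model and not Boilerplater output.  The fetch function plays the role of an
"incremented" return (the caller owns one refcount), while the getter hands
back a borrowed reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, for illustration only. */
typedef struct Obj {
    unsigned refcount;
    int      value;
} Obj;

/* "incremented": returns a new reference, so the caller owns one refcount. */
Obj*
Obj_new(int value)
{
    Obj *self = (Obj*)malloc(sizeof(Obj));
    assert(self != NULL);
    self->refcount = 1;
    self->value    = value;
    return self;
}

/* Take an additional refcount on an existing object. */
Obj*
Obj_inc_ref(Obj *self)
{
    self->refcount++;
    return self;
}

/* Release one refcount; frees the object at zero.  Returns what remains. */
unsigned
Obj_dec_ref(Obj *self)
{
    unsigned remaining = --self->refcount;
    if (remaining == 0) { free(self); }
    return remaining;
}

/* Hypothetical container that owns one refcount on its item. */
typedef struct Holder {
    Obj *item;
} Holder;

/* Borrowed reference: the Holder keeps ownership; the caller must not
 * call Obj_dec_ref() on the result. */
Obj*
Holder_get_item(Holder *self)
{
    return self->item;
}

/* "incremented": the caller receives its own refcount and must release it. */
Obj*
Holder_fetch_item(Holder *self)
{
    return Obj_inc_ref(self->item);
}
```

A binding generator that knows which of the two conventions a function follows
can decide mechanically whether the wrapper needs to release a refcount after
handing the object to the host language.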

> Instead of having a bunch of version constants at the top of a class
> (eg FieldsReader.java), we'd invoke the "Versions.add(...)"  to create
> each version.

Where would we keep track of the registrations?  Will each DataReader subclass
keep a class Hash variable?

  static Hash* versions = NULL;

  /* Lazily build the class-wide table of format versions. */
  static void
  S_init_versions_hash(void)
  {
      versions = Hash_new(2);
      Hash_Store_Str(versions, "1", 1, CB_newf("initial format"));
      Hash_Store_Str(versions, "2", 1, CB_newf("fixed stoopid mistake"));
  }

  Hash*
  LexWriter_versions(LexWriter *self)
  {
      UNUSED_VAR(self);
      if (!versions) { S_init_versions_hash(); }
      return versions;
  }

Actually that'll leak memory without an atexit() or something like that, but
you get the idea.
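
One way the atexit() cleanup could look, as a rough sketch only.  To stay
self-contained it uses a hypothetical VersionTable struct in place of Hash,
and the Demo_versions and S_* names are made up for the example.  It assumes
single-threaded initialization; the lazy check is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Hash: version N is descriptions[N - 1]. */
typedef struct VersionTable {
    size_t   size;
    char   **descriptions;
} VersionTable;

static VersionTable *versions = NULL;

/* Heap-allocate a copy of a C string. */
static char*
S_copy_str(const char *str)
{
    size_t  len  = strlen(str) + 1;
    char   *copy = (char*)malloc(len);
    assert(copy != NULL);
    memcpy(copy, str, len);
    return copy;
}

/* Registered with atexit(); releases everything the table owns. */
static void
S_free_versions(void)
{
    size_t i;
    if (!versions) { return; }
    for (i = 0; i < versions->size; i++) {
        free(versions->descriptions[i]);
    }
    free(versions->descriptions);
    free(versions);
    versions = NULL;
}

/* Build the table once and schedule its cleanup for process exit. */
static void
S_init_versions(void)
{
    versions = (VersionTable*)malloc(sizeof(VersionTable));
    assert(versions != NULL);
    versions->size         = 2;
    versions->descriptions = (char**)malloc(2 * sizeof(char*));
    assert(versions->descriptions != NULL);
    versions->descriptions[0] = S_copy_str("initial format");
    versions->descriptions[1] = S_copy_str("fixed stoopid mistake");
    atexit(S_free_versions);
}

/* Borrowed reference: the caller must not free the returned table. */
const VersionTable*
Demo_versions(void)
{
    if (!versions) { S_init_versions(); }
    return versions;
}
```

The accessor returns a borrowed reference, so only the atexit() handler ever
frees the table.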

> Introspection/transparency is the primary reason I can think of --
> it's the same motivation that led you to JSON over private binary.
> Ie, it'd be great to see a string description of what "format: '2'"
> means; eg if each int has a known corresponding description, you could
> add a comment on that line of the JSON.
> 
> And, in the source code, we of course assign symbolic names to these
> constants anyway.
> 
> Also, having an explicit method call to "add" a new version avoids
> silly risks that when adding a new version someone messes up adding
> one to the int :) Or, messes up keeping track of the latest format
> (the format that's written).  It may help with the back compat unit
> tests, too, ensuring that each supported version is tested.
> 
> I guess it's a matter of where do you draw the line b/w browseability
> of your JSON metadata vs "you must pull in an external tool to get
> more details".

OK, I'm cool with this so long as we can come up with a sensible API.

There are no performance implications or significant shared-object-bloat issues.
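
For the sake of discussion, here is one possible shape for an "add"-style
registry, sketched in plain C.  Everything in it (the Versions struct,
VERSIONS_MAX, and the Versions_* functions) is a hypothetical illustration and
not an agreed-upon API.  The point is that the registry assigns the version
number itself and always knows the latest format, so nobody has to increment
an int by hand.

```c
#include <assert.h>
#include <stddef.h>

#define VERSIONS_MAX 16  /* arbitrary capacity for this sketch */

/* Hypothetical registry: version N is described by descriptions[N - 1]. */
typedef struct Versions {
    int         count;
    const char *descriptions[VERSIONS_MAX];
} Versions;

void
Versions_init(Versions *self)
{
    self->count = 0;
}

/* Register the next format version.  Returns the version number assigned
 * (starting at 1), or -1 if the registry is full.  The description string
 * is borrowed and must outlive the registry. */
int
Versions_add(Versions *self, const char *description)
{
    assert(description != NULL);
    if (self->count >= VERSIONS_MAX) { return -1; }
    self->descriptions[self->count] = description;
    self->count++;
    return self->count;
}

/* The latest version is the one that would be written; 0 if none. */
int
Versions_latest(const Versions *self)
{
    return self->count;
}

/* Look up the description for a version, or NULL if it is unknown. */
const char*
Versions_describe(const Versions *self, int version)
{
    if (version < 1 || version > self->count) { return NULL; }
    return self->descriptions[version - 1];
}
```

A back compat test could then loop from 1 to Versions_latest() and check that
each registered version has a matching test index.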

> You are needing to bring online a scary amount of basic
> infrastructure (GC, exception handling, object vtables, etc.) just to
> get the ball rolling.

True to an extent, but there's a huge payoff: the actual search code -- where
the rubber hits the road -- is only marginally harder to follow than Java.

Marvin Humphrey
