Re: Segment

Michael McCandless Wed, 25 Mar 2009 04:34:32 -0700

Marvin Humphrey <[email protected]> wrote:

>> >    public incremented Hash*
>> >    Metadata(DataWriter *self);
>
>> What does "incremented" mean?
>
> It means that the caller has to take responsibility for one refcount.  Usually
> you'll see that on constructors and factory methods.
>
> Having "incremented" as part of the method/function signature makes it easier
> to autogenerate binding code that doesn't make refcounting errors and leak
> memory.


OK got it.  It's like when Python's docs say "returns a reference".
It's great to make this a "formal" part of the API.

>> Looks good, though, I might add a way for a given module to register
>> the versions it reads & writes (presumably it only writes the most
>> recent one); then min/max can be derived based on what was registered.
>
> I thought about something like that.  It's more awkward, though, and I'm not
> sure how much it buys us.  I think the common case would be to drop support
> for versions below a certain minimum and to support anything later.  In the
> event that your DataReader really does support a discontiguous set of
> versions, you can just do extra error checking yourself.
>
> Even though DataReader is an advanced class, we should still value simplicity
> and try to make it as easy to use as possible.

Agreed, though I don't think recording int <-> string adds much
complexity to the impl nor the API.  That mapping need not be
recorded permanently anywhere (except in the source code).

Instead of having a bunch of version constants at the top of a class
(eg FieldsReader.java), we'd invoke the "Versions.add(...)"  to create
each version.

>>  This can be useful for introspection too, so instead of just seeing
>> "format 2" something could decode that to the string describing what
>> format 2 was (eg "added omitTermFreqAndPositions capability").
>
> So, the advantage would be that we could throw more meaningful error messages?
>
> The thing is, I'm not sure how useful it is to tell the user what kind of
> change occurred at "format 2".  How would that help them to recover?
>
> There's also Luke-style index browsing.  But there's only so much screen
> space, and I can't see how that info has utility compared to other things that
> Luke can show you.
>
> It seems to me that that kind of thing belongs in the plugin class
> documentation.  Am I missing another important runtime application?

Introspection/transparency is the primary reason I can think of --
it's the same motivation that led you to JSON over private binary.
Ie, it'd be great to see a string description of what "format: '2'"
means; eg if each int has a known corresponding description, you could
add a comment on that line the JSON.

And, in the source code, we of course assign symbolic names to these
constants anyway.

Also, having an explicit method call to "add" a new version avoids
silly risks that when adding a new version someone messes up adding
one to the int :) Or, messes up keeping track of the latest format
(the format that's written).  It may help with the back compat unit
tests, too, ensuring that each supported version is tested.

I guess it's a matter of where do you draw the line b/w browseability
of your JSON metadata vs "you must pull in an external tool to get
more details".

> We can throw exceptions that belong to meaningful classes without too much
> difficulty.  We just can't set up try-catch-finally.
>
> But that's not a big deal.  We can just set most things up to check return
> values, and throw fatal errors when necessary.

OK

>> > However, we could create full-fledged exception objects for Lucy, so that 
>> > THROW
>> > calls might look something like this:
>> >
>> >    THROW(Err_data_component_version, /* <--- An integer error id */
>> >        "Format version '%i32' is less than the minimum "
>> >        "supported version '%i32' for %o", format, min,
>> >        DataReader_Get_Class_Name(self));
>> >
>> > The exception objects generated by THROW calls do not have to subclass
>> > Lucy::Obj, because we will always be returning control to the host.  So, 
>> > they
>> > could be, for example, plain old Java Exception subclasses.
>>
>> What would THROW try to do, and, how?
>
> The Lucy core code would format an error message and choose an error number
> from a list of Lucy error codes.  A stack trace would be great, too, though
> that's hard to do portably.
>
> Then it would call a method which would have to be implemented per-Host.

OK so the primary goal of THROW is to cross the bridge back to the
host language and throw the exception there.

> For Java, the implementation might contain something  like this:
>
>  if (errorNumber == lucy_Err_data_component_version) {
>    throw new DataComponentVersionException(message);
>  }
>  else if (...) {
>
> I should also mention that THROW would be a macro, as implied by the all-caps.
> It would call the function lucy_Err_throw_at, automatically inserting line and
> function name information when possible:
>
>  #ifdef CHY_HAS_VARIADIC_MACROS
>    void
>    lucy_Err_throw_at(const char *file, int line, const char *func,
>                      const char *pattern, ...);
>    #ifdef CHY_HAS_ISO_VARIADIC_MACROS
>      #define LUCY_THROW(...) \
>        lucy_Err_throw_at(__FILE__, __LINE__, LUCY_ERR_FUNC_MACRO, \
>                         __VA_ARGS__)
>
> Some compilers don't support variadic macros, though (cough cough MSVC cough),
> so we have to omit the context data and define THROW as a variadic function.
>
>    void
>    LUCY_THROW(const char *pattern, ...);
>
> How about "Lucy::Util::Err" for the exception handling code?  I've been trying
> to avoid things like "String", "Array", "Exception" and such so that we don't
> conflict with core host symbols -- hence the funny names like "CharBuf" and
> "VArray".

Sounds good.  You are needing to bring online a scary amount of basic
infrastructure (GC, exception handling, object vtables, etc.) just to
get the ball rolling.

Mike

Re: Segment

Reply via email to