Marvin Humphrey <[email protected]> wrote:
>> > public incremented Hash*
>> > Metadata(DataWriter *self);
>
>> What does "incremented" mean?
>
> It means that the caller has to take responsibility for one refcount. Usually
> you'll see that on constructors and factory methods.
>
> Having "incremented" as part of the method/function signature makes it easier
> to autogenerate binding code that doesn't make refcounting errors and leak
> memory.
OK got it. It's like when Python's docs say "returns a reference".
It's great to make this a "formal" part of the API.
>> Looks good, though, I might add a way for a given module to register
>> the versions it reads & writes (presumably it only writes the most
>> recent one); then min/max can be derived based on what was registered.
>
> I thought about something like that. It's more awkward, though, and I'm not
> sure how much it buys us. I think the common case would be to drop support
> for versions below a certain minimum and to support anything later. In the
> event that your DataReader really does support a discontiguous set of
> versions, you can just do extra error checking yourself.
>
> Even though DataReader is an advanced class, we should still value simplicity
> and try to make it as easy to use as possible.
Agreed, though I don't think recording int <-> string adds much
complexity to the impl nor the API. That mapping need not be
recorded permanently anywhere (except in the source code).
Instead of having a bunch of version constants at the top of a class
(eg FieldsReader.java), we'd invoke the "Versions.add(...)" to create
each version.
>> This can be useful for introspection too, so instead of just seeing
>> "format 2" something could decode that to the string describing what
>> format 2 was (eg "added omitTermFreqAndPositions capability").
>
> So, the advantage would be that we could throw more meaningful error messages?
>
> The thing is, I'm not sure how useful it is to tell the user what kind of
> change occurred at "format 2". How would that help them to recover?
>
> There's also Luke-style index browsing. But there's only so much screen
> space, and I can't see how that info has utility compared to other things that
> Luke can show you.
>
> It seems to me that that kind of thing belongs in the plugin class
> documentation. Am I missing another important runtime application?
Introspection/transparency is the primary reason I can think of --
it's the same motivation that led you to JSON over private binary.
Ie, it'd be great to see a string description of what "format: '2'"
means; eg if each int has a known corresponding description, you could
add a comment on that line the JSON.
And, in the source code, we of course assign symbolic names to these
constants anyway.
Also, having an explicit method call to "add" a new version avoids
silly risks that when adding a new version someone messes up adding
one to the int :) Or, messes up keeping track of the latest format
(the format that's written). It may help with the back compat unit
tests, too, ensuring that each supported version is tested.
I guess it's a matter of where do you draw the line b/w browseability
of your JSON metadata vs "you must pull in an external tool to get
more details".
> We can throw exceptions that belong to meaningful classes without too much
> difficulty. We just can't set up try-catch-finally.
>
> But that's not a big deal. We can just set most things up to check return
> values, and throw fatal errors when necessary.
OK
>> > However, we could create full-fledged exception objects for Lucy, so that
>> > THROW
>> > calls might look something like this:
>> >
>> > THROW(Err_data_component_version, /* <--- An integer error id */
>> > "Format version '%i32' is less than the minimum "
>> > "supported version '%i32' for %o", format, min,
>> > DataReader_Get_Class_Name(self));
>> >
>> > The exception objects generated by THROW calls do not have to subclass
>> > Lucy::Obj, because we will always be returning control to the host. So,
>> > they
>> > could be, for example, plain old Java Exception subclasses.
>>
>> What would THROW try to do, and, how?
>
> The Lucy core code would format an error message and choose an error number
> from a list of Lucy error codes. A stack trace would be great, too, though
> that's hard to do portably.
>
> Then it would call a method which would have to be implemented per-Host.
OK so the primary goal of THROW is to cross the bridge back to the
host language and throw the exception there.
> For Java, the implementation might contain something like this:
>
> if (errorNumber == lucy_Err_data_component_version) {
> throw new DataComponentVersionException(message);
> }
> else if (...) {
>
> I should also mention that THROW would be a macro, as implied by the all-caps.
> It would call the function lucy_Err_throw_at, automatically inserting line and
> function name information when possible:
>
> #ifdef CHY_HAS_VARIADIC_MACROS
> void
> lucy_Err_throw_at(const char *file, int line, const char *func,
> const char *pattern, ...);
> #ifdef CHY_HAS_ISO_VARIADIC_MACROS
> #define LUCY_THROW(...) \
> lucy_Err_throw_at(__FILE__, __LINE__, LUCY_ERR_FUNC_MACRO, \
> __VA_ARGS__)
>
> Some compilers don't support variadic macros, though (cough cough MSVC cough),
> so we have to omit the context data and define THROW as a variadic function.
>
> void
> LUCY_THROW(const char *pattern, ...);
>
> How about "Lucy::Util::Err" for the exception handling code? I've been trying
> to avoid things like "String", "Array", "Exception" and such so that we don't
> conflict with core host symbols -- hence the funny names like "CharBuf" and
> "VArray".
Sounds good. You are needing to bring online a scary amount of basic
infrastructure (GC, exception handling, object vtables, etc.) just to
get the ball rolling.
Mike