CC to lucy-dev...
On Wed, Jun 03, 2009 at 09:50:09AM -0700, [email protected] wrote:
> The problem is that, if highlight_spans returns undef instead of [],
> we get another bus error. This is easy enough to work around, but
> ‘return [] unless ...’ is a unintuitive.
The intent for highlight_spans() was actually that all implementations should
return an array of Spans, which would in some cases be empty. That way,
compound implementations (such as the one in PolyCompiler, which is inherited
by e.g. ORCompiler) or other consumers wouldn't have to check the return
value for NULL/undef.
However, this being C and not Java or Perl, NULL-checking has to be performed
explicitly. Core C code only NULL-checks when absolutely necessary, expecting
callers to obey API instructions fastidiously; the bindings are where most of
the serious argument-checking happens. In this case, the binding code is not
performing the NULL-checking we'd like it to, so instead of a meaningful
exception, we get a segfault when vulnerable core code is exposed to an
unexpected NULL.
> And it seems your intention was to support undef:
>
> ------------------------------------------------------------------------
> r4631 | creamyg | 2009-05-26 16:27:46 -0700 (Tue, 26 May 2009) | 3 lines
>
> Fix segfaulting bug in Host_callback_obj when the supplied argument is
> something other than either a KinoSearch object or undef.
Host_callback_obj is a general routine, and it definitely has to translate
from Perl undef to C NULL. However, it was handling undef properly all along,
and that commit message actually has a different meaning: Host_callback_obj
used to choke on plain scalars, hashrefs, or arrayrefs -- anything other than
a KinoSearch object or undef -- but as of that commit it processes them
properly.
The root problem here is a limitation of the Boilerplater language and the
auto-generated binding code: there is currently no way of specifying that a
return value cannot be NULL in a Boilerplater method declaration.
Using Boilerplater's support for default argument values, it's possible to
indicate that arguments may or may not be NULL; for example, in Searchable's
Hits() method, "sort_spec" may be NULL, but "query" may not...
public incremented Hits*
Hits(Searchable *self, Obj *query, u32_t offset = 0,
u32_t num_wanted = 10, SortSpec *sort_spec = NULL);
... and the autogenerated bindings will throw a sensible error if you try this:
my $hits = $searcher->hits( query => undef );
The logically consistent fix would be to extend Boilerplater to allow the
"= NULL" construct for return values.
public incremented VArray* = NULL
Highlight_Spans(Compiler *self, Searchable *searchable,
DocVector *doc_vec, const CharBuf *field);
I dunno, though, that's kind of ugly since the return value doesn't have an
explicit name.
> If you don’t have time or don’t want to fix this, I’ll release another
> KSx::Search::RegexpTermQuery.
Having you fix this in RegexpTermQuery would not be sufficient: KS should be
throwing sensible errors rather than segfaulting.
For now, I'll just apply a band-aid: all core calls to highlight_spans() will
have their return values NULL-checked. However, we'll probably see other bugs
of this type until we come up with fix at the Boilerplater level.
Marvin Humphrey