On Sat, 29 Dec 2012, Tomi Ollila <tomi.ollila at iki.fi> wrote: > On Fri, Dec 28 2012, Austin Clements <amdragon at MIT.EDU> wrote: > >> This parses the subset of Xapian's boolean term quoting rules that are >> used by make_boolean_term. This is provided as a generic string >> utility, but will be used shortly in notmuch restore to parse and >> optimize for ID queries. >> --- >> util/string-util.c | 55 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> util/string-util.h | 11 +++++++++++ >> 2 files changed, 66 insertions(+) >> >> diff --git a/util/string-util.c b/util/string-util.c >> index e4bea21..83b4953 100644 >> --- a/util/string-util.c >> +++ b/util/string-util.c >> @@ -96,3 +96,58 @@ make_boolean_term (void *ctx, const char *prefix, const >> char *term, >> >> return 0; >> } >> + >> +int >> +parse_boolean_term (void *ctx, const char *str, >> + char **prefix_out, char **term_out) >> +{ >> + *prefix_out = *term_out = NULL; >> + >> + /* Parse prefix */ >> + const char *pos = strchr (str, ':'); >> + if (! pos) >> + goto FAIL; >> + *prefix_out = talloc_strndup (ctx, str, pos - str); >> + ++pos; >> + >> + /* Implement de-quoting compatible with make_boolean_term. */ >> + if (*pos == '"') { >> + char *out = talloc_array (ctx, char, strlen (pos)); >> + int closed = 0; >> + *term_out = out; >> + /* Skip the opening quote, find the closing quote, and >> + * un-double doubled internal quotes. */ >> + for (++pos; *pos; ) { >> + if (*pos == '"') { >> + ++pos; >> + if (*pos != '"') { >> + /* Found the closing quote. */ >> + closed = 1; >> + break; >> + } >> + } >> + *out++ = *pos++; >> + } >> + /* Did the term terminate without a closing quote or is there >> + * trailing text after the closing quote? */ >> + if (!closed || *pos) >> + goto FAIL; >> + *out = '\0'; >> + } else { >> + const char *start = pos; >> + /* Check for text after the boolean term. */ >> + while (*pos > ' ' && *pos != ')') >> + ++pos; >> + if (*pos) >> + goto FAIL; > > Mark pointed out a good case about trailing whitespace -- It would be nice > if the core were lenient for such cases. I personally remember once wasting > hours of work by just failing to notice trailing whitespace in one system > so this subject is sensitive to me...
Will do. > Another thing I saw earlyer today: make_boolean_term() checks > > if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127) > > but here the check is only > > while (*pos > ' ' && *pos != ')') > > I wonder whether it matters... This is correct in a conservative way. We quote things that we don't strictly need to quote (for example, a double quote in the middle of a term doesn't actually require the term to be quoted, but we do anyway), but we look for the end of an unquoted term in exactly the same way Xapian does. > Everyting else looks good to me. > > > Tomi > >> + /* No trailing text; dup the string so the caller can free >> + * it. */ >> + *term_out = talloc_strdup (ctx, start); >> + } >> + return 0; >> + >> + FAIL: >> + talloc_free (*prefix_out); >> + talloc_free (*term_out); >> + return 1; >> +} >> diff --git a/util/string-util.h b/util/string-util.h >> index b8844a3..43d49d0 100644 >> --- a/util/string-util.h >> +++ b/util/string-util.h >> @@ -33,4 +33,15 @@ char *strtok_len (char *s, const char *delim, size_t >> *len); >> int make_boolean_term (void *talloc_ctx, const char *prefix, const char >> *term, >> char **buf, size_t *len); >> >> +/* Parse a boolean term query produced by make_boolean_term, returning >> + * the prefix in *prefix_out and the term in *term_out. *prefix_out >> + * and *term_out will be talloc'd with context ctx. >> + * >> + * Return: 0 on success, non-zero on parse error (including trailing >> + * data in str). >> + */ >> +int >> +parse_boolean_term (void *ctx, const char *str, >> + char **prefix_out, char **term_out); >> + >> #endif >> -- >> 1.7.10.4 >> >> _______________________________________________ >> notmuch mailing list >> notmuch at notmuchmail.org >> http://notmuchmail.org/mailman/listinfo/notmuch