Jens Alfke created COUCHDB-2327:
-----------------------------------

             Summary: Add string/array prefix match option, for view queries
                 Key: COUCHDB-2327
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2327
             Project: CouchDB
          Issue Type: Improvement
      Security Level: public (Regular issues)
          Components: HTTP Interface
            Reporter: Jens Alfke


View querying provides no clean way to match a string prefix. The only advice 
I've seen is to set startkey to the prefix, and endkey to the prefix with "some 
really high Unicode character" appended, which is a total kludge*.

There's a similar issue with matching an array prefix, e.g. "all keys that 
start with [2014, ...]". Here the solution is less kludgy (append a "{}" to the 
endkey) but it's still very unintuitive to people learning CouchDB. I've had to 
explain it to newbies many times.
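
For reference, the two workarounds described above can be sketched in Python 
roughly as follows. The helper names are hypothetical, and \uFFFE is only a 
stand-in for "some really high Unicode character"; the code just builds the 
JSON-encoded query parameters and does not talk to a server.

```python
import json

# Assumption: a placeholder for "some really high Unicode character".
HIGH_CHAR = "\ufffe"


def string_prefix_params(prefix):
    """Range parameters meant to cover string keys that start with `prefix`."""
    return {
        "startkey": json.dumps(prefix),
        "endkey": json.dumps(prefix + HIGH_CHAR),
    }


def array_prefix_params(prefix):
    """Range parameters meant to cover array keys that start with `prefix`."""
    return {
        "startkey": json.dumps(prefix),
        # An empty object is appended to the endkey, as described above.
        "endkey": json.dumps(prefix + [{}]),
    }


if __name__ == "__main__":
    print(string_prefix_params("f"))
    print(array_prefix_params([2014]))
```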

I suggest adding an explicit query option to enable prefix matching. This 
doesn't need to mess with the actual query engine — all it has to do is modify 
the endkey by appending an appropriate Unicode character (in the string case) 
or empty object (in the array case.) If no `endkey` is given it will be based 
on the `startkey`.

I've already implemented a comparable feature for Couchbase Lite:
https://github.com/couchbase/couchbase-lite-ios/wiki/Query-Enhancements#prefix-matching

Note that I made the `prefix_match` parameter an integer, not a boolean. This 
is to support cases where you want to match a prefix of a _nested component_ of 
the key, for example "all keys in 2014 whose product name starts with 'f'", 
where the startkey would be [2014, "f"] and the prefix_match would be 2 to 
indicate that it's the nested string that should be prefix-matched, not the 
array. But in the common case you'd just set the value to 1 to indicate that 
the top-level key should be prefix-matched.
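
A rough Python sketch of how such an option could rewrite the endkey is shown 
here. It is one reading of the behavior described above, not the Couchbase Lite 
implementation: `prefix_end_key` is a hypothetical name, \uFFFE is a 
placeholder high character, and the sketch assumes the nested component to 
match is the last element of the array.

```python
# Assumption: a placeholder for the "appropriate Unicode character".
HIGH_CHAR = "\ufffe"


def prefix_end_key(key, depth=1):
    """Return an endkey that turns `key` into a prefix match at `depth`.

    depth=1 matches on the top-level key; depth=2 matches on the last
    element of a top-level array, and so on.
    """
    if depth < 1:
        return key  # prefix matching disabled
    if isinstance(key, str):
        return key + HIGH_CHAR
    if isinstance(key, list):
        if depth == 1 or not key:
            # Array prefix: append an empty object.
            return key + [{}]
        # Descend into the last element for nested prefix matching.
        return key[:-1] + [prefix_end_key(key[-1], depth - 1)]
    return key  # other key types are left unchanged in this sketch


if __name__ == "__main__":
    print(prefix_end_key("f"))
    print(prefix_end_key([2014]))
    print(prefix_end_key([2014, "f"], 2))
```

If no endkey is supplied, the startkey would be passed in as `key`, matching 
the fallback described earlier.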

* Why is adding "some high Unicode character" a kludge? Because Unicode is so 
complicated and so inconsistently implemented. Doing this immediately opens the 
possibility of weird Unicode issues in your development language's string type, 
in its HTTP client library, and in Erlang's equivalents on the server side. Not 
to mention the swamp that is the Unicode specification itself — for instance, 
I've seen advice to use a character like \uFFFE, which was correct until 
Unicode went 32-bit, and tended to work alright for a while after that, but 
will now fail with emoji characters (which are both very commonly used and well 
outside the 16-bit range.) Actually whether it fails depends on whether your 
string implementation operates on UTF-16 (very common) or true Unicode code 
points. Like I said, it's a kludge.
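
The UTF-16 versus code-point difference can be illustrated with a small Python 
sketch. It only compares the two orderings mentioned above and does not model 
CouchDB's own collation; the key "f" followed by U+1F600 is a made-up example.

```python
def utf16_units(s):
    """Big-endian UTF-16 bytes; comparing them compares UTF-16 code units."""
    return s.encode("utf-16-be")


prefix = "f"
endkey = prefix + "\ufffe"        # the "high character" endkey
key = prefix + "\U0001F600"       # an emoji outside the 16-bit range

# Python compares str values by code point: U+1F600 sorts after U+FFFE,
# so the key falls outside the range.
in_range_by_code_point = prefix <= key <= endkey

# In UTF-16 the emoji is a surrogate pair starting with 0xD83D, which sorts
# before 0xFFFE, so the key falls inside the range.
in_range_by_utf16 = utf16_units(prefix) <= utf16_units(key) <= utf16_units(endkey)

if __name__ == "__main__":
    print(in_range_by_code_point, in_range_by_utf16)
```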



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)