In my opinion:
A key can be up to 250 bytes. It may not contain:
null (0x00)
space (0x20)
tab (0x09)
newline (0x0a)
carriage-return (0x0d)
Beyond that, memcached shouldn't care. If your keys are UTF-8, fine.
If not, fine -- as long as they don't exceed 250 bytes and avoid the
bytes listed above, memcached will just treat them as binary blobs.
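
To make the rule concrete, here is a minimal sketch in Python of a
client-side check that follows the byte-level rules above. The names
is_valid_key, MAX_KEY_BYTES and FORBIDDEN_BYTES are made up for
illustration; they are not part of memcached or of any client library.
The rules above say nothing about empty keys, so this sketch doesn't
either.

```python
# Hypothetical helper, not part of memcached or any client library.
MAX_KEY_BYTES = 250

# null, space, tab, newline, carriage-return (as integer byte values)
FORBIDDEN_BYTES = frozenset(b"\x00 \t\n\r")


def is_valid_key(key: bytes) -> bool:
    """Return True if the key obeys the byte-level rules listed above."""
    if len(key) > MAX_KEY_BYTES:
        return False
    # Iterating over a bytes object yields integers, one per byte.
    return not any(b in FORBIDDEN_BYTES for b in key)


if __name__ == "__main__":
    print(is_valid_key("caf\u00e9:42".encode("utf-8")))  # UTF-8 key, no forbidden bytes
    print(is_valid_key(b"has space"))                    # contains 0x20
```

The check works on bytes, not characters, so the caller has to encode
a string before checking it. That is deliberate: the limit is 250
bytes, and a 250-character UTF-8 string can be longer than that.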
UTF-8, for those who don't know, cannot introduce any of the above
forbidden characters as part of its multibyte sequences. Every byte in
a multibyte UTF-8 sequence is in the 0x80-0xFF range (actually more
restricted than that).
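
The claim about multibyte sequences can be spot-checked with a short
Python sketch. The function name multibyte_bytes_are_high is
hypothetical, and a check over a few sample strings is only an
illustration, not a proof.

```python
def multibyte_bytes_are_high(text: str) -> bool:
    """Return True if every byte of every multibyte UTF-8 sequence
    in text is 0x80 or above."""
    for ch in text:
        encoded = ch.encode("utf-8")
        # Single-byte sequences are plain ASCII; only look at the rest.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    for sample in ("caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600"):
        print(repr(sample), multibyte_bytes_are_high(sample))
```

Note that a UTF-8 string can still contain a literal space or newline
as an ordinary single-byte character. The point is only that the
non-ASCII characters can never add one.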
UTF-16 or UTF-32 would likely cause problems, since they encode
ASCII-range characters with 0x00 bytes, but that's fine -- the rules
above, being based on raw bytes, already reject such keys.
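
As a rough illustration in Python, the sketch below lists which of the
forbidden bytes show up when the same text is encoded in different
ways. The helper name forbidden_bytes_in is hypothetical.

```python
# null, space, tab, newline, carriage-return (as integer byte values)
FORBIDDEN_BYTES = frozenset(b"\x00 \t\n\r")


def forbidden_bytes_in(text: str, encoding: str) -> list:
    """Return the sorted forbidden byte values found in the encoded text."""
    return sorted(set(text.encode(encoding)) & FORBIDDEN_BYTES)


if __name__ == "__main__":
    # ASCII text picks up 0x00 padding bytes in UTF-16 and UTF-32.
    print(forbidden_bytes_in("key", "utf-8"))
    print(forbidden_bytes_in("key", "utf-16-le"))
    print(forbidden_bytes_in("key", "utf-32-le"))
    # U+0920 encodes as the bytes 0x20 0x09 in UTF-16-LE.
    print(forbidden_bytes_in("\u0920", "utf-16-le"))
    print(forbidden_bytes_in("\u0920", "utf-8"))
```

The U+0920 case shows the problem isn't limited to null bytes: in
UTF-16 a single non-ASCII character can encode to a space and a tab.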
-Steve
On Dec 19, 2007, at 10:30 AM, Dustin Sallings wrote:
I just got a bug report for my client regarding multibyte
characters within a key. In order to fix it, I need to know what
*should* be allowed in a key.
The protocol document is fairly vague as far as what makes up a
key. It lists some specific characters that *aren't* valid, but
seems to have been written with an ASCII mindset.
In the binary protocol, we have a lot of freedom, but that freedom
doesn't extend to the text protocol.
Should we constrain keys to ASCII, or force clients to understand
UTF-8 (or some other specific encoding)?
--
Dustin Sallings