Anton, I don't see any reason why this can't be fixed. However, since I'm not familiar with the specifics of the JSON implementation, I'll need assistance. Please open an issue or pull request on the Python client: https://github.com/basho/riak-python-client/issues. We are open to major, breaking changes for the next release.
On Fri, Feb 1, 2013 at 8:06 AM, Anton <[email protected]> wrote:
> Let's talk python and Unicode (yey!)
>
> The objects that I want to store will have non-ASCII strings in them.
> Potentially a lot. How much is a lot? "Very many millions" should be a
> good estimate.
>
> Now, the default behaviour for storing a python object (ok, a dict of
> stuff), using the PBC transport is to pass them to json and encode
> them. I'm ok with that, I like JSON and the fact that I can read out
> an object in JSON, using a browser, helps a lot. It's really great for
> developing project-specific tools, say debugging tools.
>
> But here is where the fun part starts. The JSON encoder in python is
> not a simple thing, and takes a lot of parameters. And by default it
> works. So well that people rarely look at what's going on. When you
> look at what's going on, however, things get more entertaining.
>
> The JSON encoder works on unicode objects, not strings. When you pass
> it unicode objects, it's happy. When you pass it strings, it decodes
> them, using a specified encoding. By default this is set to 'utf-8'
> which makes everything quite ok. So far so good. However, there's
> another option - 'ensure_ascii'. This is set to True by default and it
> means that the JSON encoder will spew out an ASCII-encoded string.
> That is, in the result, every unicode code-point is encoded as \u0123,
> or a total of 6 bytes.
>
> Now, this is not good. For one, the JSON RFCs expect Unicode, encoded
> using UTF-*. Also, even if much of the data will require 3 bytes in
> UTF-8, that's still only half the bytes that the python default would
> take.
>
> Now, consider this elementary example. It already gives a significant
> (in bytes) difference for a short string:
> http://pastie.org/6011147
>
>
> Please tell me I'm not going crazy and all this is the state of
> affairs and it is, in fact, wrong and can/should be fixed.
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
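A minimal illustration of the size difference Anton describes, written as a sketch for Python 3 (the thread dates from the Python 2 era, where `json.dumps(..., ensure_ascii=False)` may return either `str` or `unicode`, so callers need more care there). The `ensure_ascii` flag is part of the standard `json` module; the sample payload and the `encode_json_utf8` helper are hypothetical and are not part of the Riak client's API.

```python
import json

# Hypothetical sample payload: "Пример" is six Cyrillic characters,
# each 2 bytes in UTF-8.
doc = {"name": "Пример"}

# Default behaviour (ensure_ascii=True): each non-ASCII code point is
# written as a 6-character \uXXXX escape, so the output is pure ASCII.
escaped = json.dumps(doc).encode("ascii")

# ensure_ascii=False keeps the characters as-is; the result is then
# encoded to UTF-8 bytes explicitly before storing it.
utf8 = json.dumps(doc, ensure_ascii=False).encode("utf-8")


def encode_json_utf8(obj):
    """Hypothetical encoder: serialize obj to UTF-8 JSON bytes
    without \\uXXXX escaping of non-ASCII characters."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


print(len(escaped), len(utf8))  # 48 24
```

Both byte strings decode to the same object; the escaped form is simply larger. For Cyrillic text the string body shrinks from 6 bytes to 2 bytes per character, and for 3-byte characters such as CJK it shrinks from 6 to 3, the 2:1 ratio mentioned in the thread.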
