On Thu, Jan 20, 2011 at 8:13 PM, Itagaki Takahiro <itagaki.takah...@gmail.com> wrote: > On Fri, Jan 21, 2011 at 09:11, Bruce Momjian <br...@momjian.us> wrote: >> What happened to our work to add a JSON data type for PG 9.1? > > Nothing will happen in 9.1. > I assume we are in "competition" status: > > http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php > > Also, if PGXN will work well, we might not have to include JSON > in the core. We can download any JSON implementations from the > site after installing the core server. Of course, if we will > use JSON types in the core (EXPLAIN JSON output?), we have to > include one of them.
Thanks for the update. However, I should probably take some blame as well for spending basically zero time on the JSON datatype for lengths at a time. After someone emailed me asking how to install the JSON datatype, I fixed up my module version of it: http://git.postgresql.org/gitweb?p=json-datatype.git I also have a version where JSON is a core datatype (the patches I submitted to CommitFest), but it is obsolete now. The module version has all the same features, but also fixes a Unicode bug* and adds PostgreSQL 8.4 compatibility. For those who want the JSON datatype right now, I recommend using the module. I plan to maintain and improve the JSON module (repository linked above), albeit at a snail's pace. Patches are certainly welcome. See roadmap.markdown in the repository for some minor and not-so-minor design decisions. Joey Adams * P.S. The "Unicode bug" involved UTF-16 surrogate pair calculation. JSON encodes Unicode characters that aren't in the Basic Multilingual Plane the same way UTF-16 does: using two 4-digit hex codes. The correct formula for determining the Unicode codepoint of a surrogate pair is: unicode = 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF)); where: 0xD800 <= uc <= 0xDBFF 0xDC00 <= lc <= 0xDFFF For example, consider the JSON string "\uD835\uDD0D": uc = 0xD835 lc = 0xDD0D unicode = 0x1D50D This pair of hex indices collapses into one Unicode codepoint, 0x1D50D (or "𝔍"). I originally used this: /* WRONG */ unicode = 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF); The problem is, ((uc & 0x3FF) << 10) | (lc & 0x3FF) has a range of 0 to 0xFFFFF, and in some of those numbers, `0x10000 |` adds zero. I am sharing all this because I have made this mistake twice, and I believe it is an easy mistake to make. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers