Re: Initial Review: JSON contrib module (was: Re: [HACKERS] Another swing at JSON)
Thanks for reviewing my patch!

On Mon, Jul 4, 2011 at 7:10 AM, Bernd Helmle <maili...@oopsware.de> wrote:
> +comment = 'data type for storing and manipulating JSON content'
>
> I'm not sure if manipulating is a correct description. Maybe I missed it,
> but I didn't see functions to manipulate JSON strings directly, for
> example, adding an object to a JSON string, ...

Indeed. I intend to add manipulation functions in the future. The words "and manipulating" shouldn't be there yet.

> -- Coding ...
>
> It is noticeable that the parser seems to define its own is_space() and
> is_digit() functions, e.g.:
>
> +#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')
>
> Wouldn't it be more clean to use isspace() instead?

isspace() accepts '\f', '\v', and sometimes other characters as well, depending on locale. Likewise, isdigit() accepts some non-ASCII characters in addition to [0-9], depending on the locale and platform. Thus, to avoid parsing a superset of the JSON spec, I can't use the ctype.h functions. I should add a comment with this explanation.

> This code block in function json_escape_unicode() looks suspicious:
>
> + /* Convert to UTF-8, if necessary. */
> + {
> +     const char *orig = json;
> +     json = (const char *)
> +         pg_do_encoding_conversion((unsigned char *) json, length,
> +                                   GetDatabaseEncoding(), PG_UTF8);
> +     if (json != orig)
> +         length = strlen(json);
> + }
>
> Seems it *always* wants to convert the string. Isn't there an if condition
> missing?

pg_do_encoding_conversion does this check already. Perhaps I should rephrase the comment slightly:

+ /* Convert to UTF-8 (only if necessary). */

> There's a commented code fragment missing, which should be removed (see
> last two lines of the snippet below):
>
> +static unsigned int
> +read_hex16(const char *in) ...
> +     Assert(is_hex_digit(c));
> +
> +     if (c >= '0' && c <= '9')
> +         tmp = c - '0';
> +     else if (c >= 'A' && c <= 'F')
> +         tmp = c - 'A' + 10;
> +     else /* if (c >= 'a' && c <= 'f') */
> +         tmp = c - 'a' + 10;

That comment is there for documentation purposes, to indicate the range check that's skipped because we know c is a hex digit, and [a-f] is the only thing it could be (and must be) at that line. Should it still be removed?

> json.c introduces new appendStringInfo* functions, e.g.
> appendStringInfoEscape() and appendStringInfoUtf8(). Maybe it's better to
> name them to something different, since it may occur someday that the
> backend will introduce the same notion with a different meaning. They are
> static, but why not naming them into something like
> jsonAppendStringInfoUtf8() or similar?

I agree.

> -- Regression Tests ...
>
> It also tests UNICODE sequences and input encoding with other server
> encoding than UTF-8.

It tests with other *client* encodings than UTF-8, but not other server encodings than UTF-8. Is it possible to write tests for different server encodings?

-- Self-review

There's a minor bug in the normalization code:

select $$ "\u0009" $$::json;
   json
----------
 "\u0009"
(1 row)

This should produce "\t" instead.

I'm thinking that my expect_* / next_* / pop_char / ... parsing framework is overkill, and that, for a syntax as simple as JSON's, visibly messy parsing code like this:

if (s < e && (*s == 'E' || *s == 'e'))
{
    s++;
    if (s < e && (*s == '+' || *s == '-'))
        s++;
    if (!(s < e && is_digit(*s)))
        return false;
    do
        s++;
    while (s < e && is_digit(*s));
}

would be easier to maintain than the deceptively elegant-looking code currently in place:

if (optional_char_cond(s, e, *s == 'E' || *s == 'e'))
{
    optional_char_cond(s, e, *s == '+' || *s == '-');
    skip1_pred(s, e, is_digit);
}

I don't plan on gutting the current macro-driven parser just yet.
When I do, the new parser will support building a parse tree (only when needed), and the parsing functions will have signatures like this:

static bool parse_value(const char **sp, const char *e, JsonNode **out);
static bool parse_string(const char **sp, const char *e, StringInfo *out);
...

The current patch doesn't provide manipulation features where a parse tree would come in handy. I'd rather hold off on rewriting the parser until such features are added.

I'll try to submit a revised patch within the next couple days. Thanks!

- Joey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
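The locale pitfall behind the isspace()/isdigit() exchange above can be demonstrated directly. A minimal, self-contained sketch: the is_space body matches the macro quoted in the review, while is_digit and the all_json_space helper are written here purely for illustration (they are not the patch's code).

```c
#include <assert.h>
#include <stdbool.h>

/* ASCII-only classification: JSON allows exactly these four whitespace
 * characters and the digits 0-9, regardless of locale.  ctype.h's
 * isspace()/isdigit() can accept more, depending on locale/platform. */
#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')
#define is_digit(c) ((c) >= '0' && (c) <= '9')

/* Illustrative helper: is the whole string a run of JSON whitespace? */
static bool
all_json_space(const char *s)
{
    for (; *s != '\0'; s++)
        if (!is_space(*s))
            return false;
    return true;
}
```

Unlike isspace(), is_space rejects '\f' and '\v', so the parser cannot silently accept a superset of the JSON grammar.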
Re: [HACKERS] Another swing at JSON
On Fri, Jun 17, 2011 at 2:29 AM, Bernd Helmle <maili...@oopsware.de> wrote:
> Joseph, are you able to remove the compatibility code for this CF?

Done. Note that this module builds, tests, and installs successfully with USE_PGXS=1. However, building without USE_PGXS=1 produces the following:

CREATE EXTENSION json;
ERROR:  incompatible library "/usr/lib/postgresql/json.so": version mismatch
DETAIL:  Server is version 9.1, library is version 9.2.

Similar problems occur with a couple other modules I tried (hstore, intarray).

Joey

json-contrib-no-compat-20110617.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 11, 2011 at 7:53 AM, Robert Haas <robertmh...@gmail.com> wrote:
> That's likely to be how it gets implemented, but you seem to have missed
> the point of some of the discussion upthread: the big problem with that is
> that someone might type DROP TYPE foo, and when they do, you need an
> efficient way to figure out whether foo is in use inside an instance of
> the variant type anywhere in the system. The devil is in the details...

Sorry, I missed that. With that in mind, I think I would lean more toward the union proposal as well. Can anyone think of a case where VARIANT would be more useful?

As for using one or two bytes to store the type of a UNION, that creates a problem when you want to extend the union in the future. That is, it's a problem if a UNION is simply a collection of the possible types that values of the UNION type can hold. If UNION is implemented more like a tagged union:

CREATE TYPE token AS TAGGED UNION (identifier TEXT, keyword TEXT, number INT);

then the problem of altering it is much like the problem of altering an ENUM.

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
> Examples of open union types could be "number", which all the numeric
> types compose, and so you can know say that you can use the generic
> numeric operators on values you have simply if their types compose the
> number union type, and it still works if more numeric types appear later.
> Likewise, the "string" open union could include both text and blob, as
> both support catenation and substring matches or extraction, for example.
> This would aid to operator overloading in a generic way, letting you use
> the same syntax for different types, but allowing types to mix is
> optional; eg, you could support add(int,int) and add(real,real) without
> supporting add(int,real) etc but the syntax add(x,y) is shared, and you do
> this while still having a strong type system; allowing the mixing is
> optional case-by-case.
Coming from a Haskell perspective, this is a great idea, but I don't think the union feature should be used to implement it. Closed unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) => a -> a -> a

I, for one, would like to see PostgreSQL steal some features from Haskell's type system. PostgreSQL seems to implement a subset of Haskell's system, without type classes and where functions can have only one type variable (anyelement). To express the (+) example in PostgreSQL, it would be tempting to simply say:

add(real, real) returns real

However, what if each "real" is a different type (e.g. INT and FLOAT)? Is that allowed? In the Haskell example above, (+) constrains both of its arguments to the same type. In ad-hoc syntax, it would look like this in PostgreSQL:

real anyelement => add(anyelement, anyelement) returns anyelement

Another thing to consider: attempting to use a type class as a column type, e.g.:

CREATE TABLE foo (n real);

Normally in Haskell, type information is passed implicitly as parameters (hence the term "parametric polymorphism"), rather than carried alongside values (like in object-oriented languages). In the case above, the type information would have to be carried with each value. Haskell actually supports this, but under a somewhat weird extension called "existential types" (see http://www.haskell.org/haskellwiki/Existential_type#Examples for an example). It isn't terribly useful in Haskell, and I don't think it will be in PostgreSQL either.

Joey Adams
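For readers more at home in C than Haskell, the TAGGED UNION idea from the message above corresponds to an ordinary discriminated union: a small tag field says which member is live, and altering the set of alternatives means altering the tag's enumeration, much like altering an ENUM. A hypothetical sketch (names and layout invented for illustration, loosely mirroring the CREATE TYPE token example):

```c
#include <assert.h>

/* Discriminator: which alternative of the union is live.  Adding a new
 * alternative later means extending this enum, which is the same kind
 * of problem as adding a label to an ENUM type. */
typedef enum { TOK_IDENTIFIER, TOK_KEYWORD, TOK_NUMBER } TokenTag;

typedef struct
{
    TokenTag    tag;
    union
    {
        const char *text;       /* identifier or keyword */
        int         number;
    }           u;
} Token;

/* Accessing a member under the wrong tag is a type error; here the
 * check is a runtime assertion. */
static int
token_number(const Token *t)
{
    assert(t->tag == TOK_NUMBER);
    return t->u.number;
}
```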
Re: [HACKERS] VARIANT / ANYTYPE datatype
It seems to me a reasonable way to implement VARIANT would be to have a data type called VARIANT that stores the OID of the inner type at the beginning, followed by the binary data. When you say pg_typeof(somevariant), you'll get 'variant'. Instead, you'd use a function like this:

variant_typeof(VARIANT) returns REGTYPE

VARIANT could then be cast to/from other types. I think the rules for converting to/from VARIANT should be a little stronger than normal PostgreSQL casting rules. For example:

SELECT '12345'::TEXT::INT;                -- allowed
SELECT '12345'::TEXT::VARIANT::INT;       -- not allowed
SELECT '12345'::TEXT::VARIANT::TEXT::INT; -- allowed

I'm not sure how conversions to/from VARIANT could be implemented other than creating conversion functions for every type. It'd be nice if we could avoid that. A more permissive way to convert out of VARIANT might be to have a function like this:

variant_unwrap(VARIANT) returns TEXT -- user casts to desired type

I suppose the in/out functions could prefix the value with the type name and a colon:

SELECT '12345'::INT::VARIANT;
    variant
---------------
 integer:12345

The VARIANT type, or similar, would be useful for the JSON data type I've been intermittently working on, as it would allow us to create a function like this:

from_json(JSON) returns VARIANT

from_json would unwrap a JSON string/number/bool/null/array, converting it to a VARIANT whose inner type is TEXT / (INT or BIGINT or DOUBLE or NUMERIC) / BOOL / [null] / ARRAY. In the [null] case, from_json would actually return NULL (I see no need for VARIANT to wrap nulls). This is rather type-safe compared to what I currently have:

from_json(JSON) returns TEXT -- user casts to desired type

By returning VARIANT, we can then cast to the desired type, and if the cast is invalid, a type error will occur.

Joey Adams
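The proposed layout above, an inner-type OID followed by the binary data, can be sketched as a plain C struct. This is only an illustration of the idea, not PostgreSQL code: a real implementation would embed this in a varlena, and the Oid typedef and sample OID values here are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/* Sketch of the proposed on-disk layout: the inner type's OID at the
 * beginning, followed by the inner value in its binary format. */
typedef struct
{
    Oid     inner_type;         /* e.g. 23 for int4, 25 for text */
    char    data[];             /* inner value's binary representation */
} Variant;

/* What variant_typeof(VARIANT) would report: the wrapped type, since
 * pg_typeof(somevariant) itself would just say 'variant'. */
static Oid
variant_typeof(const Variant *v)
{
    return v->inner_type;
}
```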
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Tue, May 10, 2011 at 10:29 PM, Joseph Adams <joeyadams3.14...@gmail.com> wrote:
> The VARIANT type, or similar, would be useful for the JSON data type I've
> been intermittently working on, as it would allow us to create a function
> like this:
>
> from_json(JSON) returns VARIANT

This occurred to me: if PostgreSQL functions could return ANYELEMENT, then we could just say:

from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value

Likewise, the variant conversion functions (which could be invoked automatically) could be done this way:

to_variant(ANYELEMENT) returns VARIANT
from_variant(VARIANT) returns ANYELEMENT

However, I'm not familiar enough with the innards of PostgreSQL's type system to know if returning ANYELEMENT would be possible and make sense.
Re: [HACKERS] cast from integer to money
On Sun, Apr 3, 2011 at 11:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Thanks for the patch, but I think you forgot to worry about overflow:
>
> rhaas=# select 9223372036854775807::money;
>  money
> --------
>  -$1.00
> (1 row)

cash_in doesn't test for overflow, either (tested on 8.4.0, 9.0.3, and HEAD):

joey=# select '9223372036854775807'::money;
 money
--------
 -$1.00
(1 row)

Is this a bug?

Detail: unlike cash_in, numeric_cash does check for overflow (implicitly, through its use of numeric_int8).

Joey Adams
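The kind of check cash_in is missing can be shown in miniature: accumulate digits into an int64 and refuse any step that would wrap, so INT64_MAX parses but one more refuses instead of wrapping to a negative value like the -$1.00 above. This is an illustrative sketch only; the real cash_in also handles currency symbols, signs, decimal points, and locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parse a string of ASCII digits into an int64, detecting overflow
 * before it happens.  Returns false on overflow or trailing junk. */
static bool
parse_int64_checked(const char *s, int64_t *out)
{
    int64_t     value = 0;

    for (; *s >= '0' && *s <= '9'; s++)
    {
        int     digit = *s - '0';

        /* value * 10 + digit would exceed INT64_MAX? */
        if (value > (INT64_MAX - digit) / 10)
            return false;
        value = value * 10 + digit;
    }
    if (*s != '\0')
        return false;           /* trailing non-digit */
    *out = value;
    return true;
}
```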
Re: [HACKERS] cast from integer to money
Attached is an updated version of the patch to allow conversion of int4/int8 directly to money. I added overflow checks, dropped int2->cash, and updated the documentation.

- Joey

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index ecc79e2..13b888d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -886,15 +886,22 @@ ALTER SEQUENCE <replaceable class="parameter">tablename</replaceable>_<replaceab
  </para>

  <para>
-   Values of the <type>numeric</type> data type can be cast to
-   <type>money</type>. Other numeric types can be converted to
-   <type>money</type> by casting to <type>numeric</type> first, for example:
+   Values of the <type>numeric</type>, <type>int</type>, and
+   <type>bigint</type> data types can be cast to <type>money</type>.
+   Conversion from the <type>real</type> and <type>double precision</type>
+   data types can be done by casting to <type>numeric</type> first, for
+   example:
 <programlisting>
-SELECT 1234::numeric::money;
+SELECT '12.34'::float8::numeric::money;
 </programlisting>
+   However, this is not recommended. Floating point numbers should not be
+   used to handle money due to the potential for rounding errors.
+ </para>
+
+ <para>
    A <type>money</type> value can be cast to <type>numeric</type> without
    loss of precision. Conversion to other types could potentially lose
-   precision, and it must be done in two stages, for example:
+   precision, and must also be done in two stages:
 <programlisting>
 SELECT '52093.89'::money::numeric::float8;
 </programlisting>
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 67f5128..61f7370 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -92,6 +92,21 @@ num_word(Cash value)
 	return buf;
 }	/* num_word() */

+static bool
+int8mul_direct(int64 arg1, int64 arg2, int64 *result)
+{
+	*result = arg1 * arg2;
+
+	/* Overflow check.  See comment in int8mul */
+	if (arg1 != (int64) ((int32) arg1) || arg2 != (int64) ((int32) arg2))
+	{
+		if (arg2 != 0 &&
+			(*result / arg2 != arg1 || (arg2 == -1 && arg1 < 0 && *result < 0)))
+			return false;
+	}
+
+	return true;
+}

 /* cash_in()
  * Convert a string to a cash data type.
@@ -938,3 +953,67 @@ numeric_cash(PG_FUNCTION_ARGS)

 	PG_RETURN_CASH(result);
 }
+
+/* int4_cash()
+ * Convert int4 (int) to cash
+ */
+Datum
+int4_cash(PG_FUNCTION_ARGS)
+{
+	int32		amount = PG_GETARG_INT32(0);
+	Cash		result;
+	int			fpoint;
+	int64		scale;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* compute required scale factor */
+	scale = 1;
+	for (i = 0; i < fpoint; i++)
+		scale *= 10;
+
+	/* compute amount * scale, checking for overflow */
+	if (!int8mul_direct(amount, scale, &result))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("money out of range")));
+
+	PG_RETURN_CASH(result);
+}
+
+/* int8_cash()
+ * Convert int8 (bigint) to cash
+ */
+Datum
+int8_cash(PG_FUNCTION_ARGS)
+{
+	int64		amount = PG_GETARG_INT64(0);
+	Cash		result;
+	int			fpoint;
+	int64		scale;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* compute required scale factor */
+	scale = 1;
+	for (i = 0; i < fpoint; i++)
+		scale *= 10;
+
+	/* compute amount * scale, checking for overflow */
+	if (!int8mul_direct(amount, scale, &result))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("money out of range")));
+
+	PG_RETURN_CASH(result);
+}
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index bf8a6fc..f7b7b76 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -126,6 +126,8 @@
 DATA(insert ( 1700  700 1745 i f ));
 DATA(insert ( 1700  701 1746 i f ));
 DATA(insert (  790 1700 3823 a f ));
 DATA(insert ( 1700  790 3824 a f ));
+DATA(insert (   23  790 3811 a f ));
+DATA(insert (   20  790 3812 a f ));

 /* Allow explicit coercions between int4 and bool */
 DATA(insert (   23   16 2557 e f ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cff64ba..7919a40 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -971,6 +971,10 @@
 DATA(insert OID = 3823 ( numeric PGNSP PGUID 12 1 0 0 f f f t f s 1 0 1700
 DESCR("convert money to numeric");
 DATA(insert OID = 3824 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "1700" _null_ _null_ _null_ _null_ numeric_cash _null_ _null_ _null_ ));
 DESCR("convert numeric to money");
+DATA(insert OID = 3811 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "23" _null_ _null_ _null_ _null_ int4_cash _null_ _null_ _null_ ));
+DESCR("convert int4 to money");
+DATA(insert OID = 3812 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "20" _null_ _null_ _null_ _null_ int8_cash _null_ _null_ _null_
Re: [HACKERS] psql 9.1 alpha5: connection pointer is NULL
2011/4/2 Devrim GÜNDÜZ <dev...@gunduz.org>:
> I'm getting the following message after upgrading to Alpha5 on my Fedora
> 14 box:
>
> $ psql -p 5433
> psql: connection pointer is NULL
>
> which comes from libpq. Server is running, and I can connect to it via
> 9.0's psql. This is a regular RPM build. Am I doing something wrong, or?

I couldn't reproduce this (using upstream source on Ubuntu). However, I did find a little bug in libpq causing the connection handle to become NULL in the event of an option parsing error. This bug has been around since release 9.0.0, and may be unrelated to the problem.

Joey Adams

From 5bcf99b6481dd1393525ee804eeae6a04999d86d Mon Sep 17 00:00:00 2001
From: Joey Adams <joeyadams3.14...@gmail.com>
Date: Sat, 2 Apr 2011 14:27:22 -0400
Subject: [PATCH] Fixed "return false;" in PQconnectStartParams, which returns a PGconn *

---
 src/interfaces/libpq/fe-connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index a4959ee..aa24c37 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -491,7 +491,7 @@ PQconnectStartParams(const char **keywords,
 	{
 		conn->status = CONNECTION_BAD;
 		/* errorMessage is already set */
-		return false;
+		return conn;
 	}

 /*
--
1.7.1
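Why did `return false;` compile silently in a function that returns a pointer? Because `false` from stdbool.h expands to the integer constant 0, which is a null pointer constant, so the compiler sees an ordinary `return NULL;`. The sketch below reproduces the pattern with invented stand-in names (this is not libpq code; PQconnectStartParams is documented to return a PGconn * handle even on failure, so a caller can inspect its status):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int status; } FakeConn;

/* The bug pattern: `false` is 0, and 0 silently converts to a null
 * pointer, so this compiles without so much as a warning. */
static FakeConn *
connect_buggy(FakeConn *conn)
{
    conn->status = -1;          /* analogous to CONNECTION_BAD */
    return false;               /* oops: caller receives NULL */
}

/* The fix: return the handle so the caller can still read its status. */
static FakeConn *
connect_fixed(FakeConn *conn)
{
    conn->status = -1;
    return conn;
}
```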
Re: [HACKERS] cast from integer to money
On Thu, Mar 31, 2011 at 6:39 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Going just integer->money, with 1 -> $1.00, seems completely reasonable to
> me. As for being too late in the cycle.. if someone's willing to do the
> work, I can't imagine it breaking anything, so I wouldn't be against
> putting it in. It really should be before the first beta tho.

Attached is a patch which enables casting int2/int4/int8 to money, with the same scaling as numeric uses. Hence, 1::money yields '$1.00'. The only other numeric types (other than oid, cardinal_number, etc.) that can't be cast directly to money are float4 and float8, and I suspect this is intentional.

The patch includes tests, but does not update the documentation. Should the docs be updated where it reads "Values of the numeric data type can be cast to money. Other numeric types can be converted to money by casting to numeric first"?

Because this change adds rows to the pg_proc and pg_cast catalogs, applying this patch for 9.1 will require alpha users to initdb again. Is that acceptable?
Regards,
Joey Adams

From aca9723db6f8614286a0eb82517e29407b09193e Mon Sep 17 00:00:00 2001
From: Joey Adams <joeyadams3.14...@gmail.com>
Date: Fri, 1 Apr 2011 22:09:12 -0400
Subject: [PATCH] Allow cast from int2/int4/int8 to money

---
 src/backend/utils/adt/cash.c        |   75 +++
 src/include/catalog/pg_cast.h       |    3 +
 src/include/catalog/pg_proc.h       |    6 +++
 src/include/utils/cash.h            |    4 ++
 src/test/regress/expected/money.out |   73 ++
 src/test/regress/sql/money.sql      |   14 ++
 6 files changed, 175 insertions(+), 0 deletions(-)

diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 67f5128..1981123 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -938,3 +938,78 @@ numeric_cash(PG_FUNCTION_ARGS)

 	PG_RETURN_CASH(result);
 }
+
+/* int2_cash()
+ * Convert int2 (smallint) to cash
+ */
+Datum
+int2_cash(PG_FUNCTION_ARGS)
+{
+	int16		amount = PG_GETARG_INT16(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
+
+/* int4_cash()
+ * Convert int4 (int) to cash
+ */
+Datum
+int4_cash(PG_FUNCTION_ARGS)
+{
+	int32		amount = PG_GETARG_INT32(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
+
+/* int8_cash()
+ * Convert int8 (bigint) to cash
+ */
+Datum
+int8_cash(PG_FUNCTION_ARGS)
+{
+	int64		amount = PG_GETARG_INT64(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index bf8a6fc..5c2bbef 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -126,6 +126,9 @@
 DATA(insert ( 1700  700 1745 i f ));
 DATA(insert ( 1700  701 1746 i f ));
 DATA(insert (  790 1700 3823 a f ));
 DATA(insert ( 1700  790 3824 a f ));
+DATA(insert (   21  790 3825 a f ));
+DATA(insert (   23  790 3811 a f ));
+DATA(insert (   20  790 3812 a f ));

 /* Allow explicit coercions between int4 and bool */
 DATA(insert (   23   16 2557 e f ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cff64ba..1f46fd9 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -971,6 +971,12 @@
 DATA(insert OID = 3823 ( numeric PGNSP PGUID 12 1 0 0 f f f t f s 1 0 1700
 DESCR("convert money to numeric");
 DATA(insert OID = 3824 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "1700" _null_ _null_ _null_ _null_ numeric_cash _null_ _null_ _null_ ));
 DESCR("convert numeric to money");
+DATA(insert OID = 3825 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "21" _null_ _null_ _null_ _null_ int2_cash _null_ _null_ _null_ ));
+DESCR("convert int2 to money");
+DATA(insert OID = 3811 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "23" _null_ _null_ _null_ _null_ int4_cash _null_ _null_ _null_ ));
+DESCR("convert int4 to money");
+DATA(insert OID = 3812 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "20" _null_ _null_ _null_ _null_ int8_cash _null_ _null_ _null_ ));
+DESCR("convert int8 to money");
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 2:42 PM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> Also, should uninstall_json.sql be named something else, like
>> json--uninstall--0.1.sql ?
>
> You don't need no uninstall script no more, try DROP EXTENSION json; and
> DROP EXTENSION json CASCADE;

It's there for pre-9.1, where DROP EXTENSION is not available.
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 2:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Joseph Adams joeyadams3.14...@gmail.com writes: It would be nice if I could make a Makefile conditional that skips the relocatable test and loads init-pre9.1.sql if the new extension interface isn't available. Is there a Makefile variable or something I can use to do this? You can use VERSION and MAJORVERSION variables, those are defined in Makefile.global and available as soon as you did include the PGXS Makefile. The problem is, I'd have to include the PGXS Makefile before defining a parameter used by the PGXS Makefile. I could include the PGXS Makefile twice, once at the top, and again at the bottom, but that produces a bunch of ugly warnings like: ../../src/Makefile.global:418: warning: overriding commands for target `submake-libpq' Not only that, but MAJORVERSION is formatted as a decimal (like 9.1 or 8.4). Although I could use some hacky string manipulation, or compare MAJORVERSION against all prior supported versions, either approach would be needlessly error-prone. I'm thinking the pg_config utility should either make PG_VERSION_NUM (e.g. 90100) available, or it should define something indicating the presence of the new extension system. Joey Adams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 4:02 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Here's the ugly trick from ip4r, that's used by more extension: PREFIX_PGVER = $(shell echo $(VERSION) | awk -F. '{ print $$1*100+$$2 }') Thanks. I applied a minor variation of this trick to the JSON module, so now it builds/installs/tests cleanly on both REL8_4_0 and HEAD (though it won't work if you copy contrib/json into a pre-9.1 PostgreSQL source directory and type `make` without USE_PGXS=1). I also went ahead and renamed uninstall_json.sql to json--uninstall--0.1.sql (again, it's for pre-9.1 users) and removed unnecessary trailing spaces. Anything going into the PostgreSQL code base will be for 9.2, so anything else would be a separate (if somewhat related) project. I suspect the code will be a good deal cleaner if you do just the 9.2+ version and see who wants it back-patched, if anyone does :) It's a trivial matter to remove backward compatibility from contrib/json, if anybody wants me to do it. I can just remove compat.[ch], */init-pre9.1.* , remove the PREFIX_PGVER trick from the Makefile, remove a few lines in the source code, and maintain the backported json module elsewhere. It's just a matter of whether or not explicit backward-compatibility is desirable in modules shipped with releases. Joey Adams add-json-contrib-module-4.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another swing at JSON
On Mon, Mar 28, 2011 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Mar 28, 2011 at 1:21 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Attached is a patch that adds a 'json' contrib module. Although we may want a built-in JSON data type in the near future, making it a module (for the time being) has a couple advantages: Is this something you'd hope to get committed at some point, or do you plan to maintain it as an independent project? I'm hoping to get it committed at some point, perhaps as a module soon, and a built-in later. Plenty of people are still waiting for JSON data type support, for example: http://stackoverflow.com/questions/4995945/optimize-escape-json-in-postgresql-9-0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] patch: Add PGXS support to hstore's Makefile (trivial)
I discovered today that hstore's Makefile currently does not support PGXS; namely, the ability to install it like so:

USE_PGXS=1 make
sudo USE_PGXS=1 make install

Moreover, hstore is the *only* contrib module whose Makefile does not contain PGXS. The attached patch changes hstore's Makefile so it is like the others. I have tested the above commands on a copy of the hstore directory located outside of the PostgreSQL source tree, and it appears to work just fine. I also tested conventional make and make install, and it still appears to work as well.

Joey Adams

commit a7384288531ba9a235292836d6ee1e17a65a3997
Author: Joey Adams <joeyadams3.14...@gmail.com>
Date:   Sun Jan 23 21:05:03 2011 -0500

    Added PGXS support to contrib/hstore/Makefile

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e466b6f..1d533fd 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -1,9 +1,5 @@
 # contrib/hstore/Makefile

-subdir = contrib/hstore
-top_builddir = ../..
-include $(top_builddir)/src/Makefile.global
-
 MODULE_big = hstore
 OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
@@ -12,4 +8,13 @@ DATA_built = hstore.sql
 DATA = uninstall_hstore.sql
 REGRESS = hstore

+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/hstore
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+endif
Re: [HACKERS] JSON data type status?
On Thu, Jan 20, 2011 at 8:13 PM, Itagaki Takahiro <itagaki.takah...@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 09:11, Bruce Momjian <br...@momjian.us> wrote:
>> What happened to our work to add a JSON data type for PG 9.1?
>
> Nothing will happen in 9.1. I assume we are in competition status:
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php
>
> Also, if PGXN will work well, we might not have to include JSON in the
> core. We can download any JSON implementations from the site after
> installing the core server. Of course, if we will use JSON types in the
> core (EXPLAIN JSON output?), we have to include one of them.

Thanks for the update. However, I should probably take some blame as well for spending basically zero time on the JSON datatype for long stretches at a time. After someone emailed me asking how to install the JSON datatype, I fixed up my module version of it:

http://git.postgresql.org/gitweb?p=json-datatype.git

I also have a version where JSON is a core datatype (the patches I submitted to CommitFest), but it is obsolete now. The module version has all the same features, but also fixes a Unicode bug* and adds PostgreSQL 8.4 compatibility. For those who want the JSON datatype right now, I recommend using the module.

I plan to maintain and improve the JSON module (repository linked above), albeit at a snail's pace. Patches are certainly welcome. See roadmap.markdown in the repository for some minor and not-so-minor design decisions.

Joey Adams

* P.S. The Unicode bug involved UTF-16 surrogate pair calculation. JSON encodes Unicode characters that aren't in the Basic Multilingual Plane the same way UTF-16 does: using two 4-digit hex codes.
The correct formula for determining the Unicode codepoint of a surrogate pair is:

unicode = 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF));

where:

0xD800 <= uc <= 0xDBFF
0xDC00 <= lc <= 0xDFFF

For example, consider the JSON string "\uD835\uDD0D":

uc = 0xD835
lc = 0xDD0D
unicode = 0x1D50D

This pair of hex indices collapses into one Unicode codepoint, 0x1D50D (or 픍). I originally used this:

/* WRONG */
unicode = 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF);

The problem is, ((uc & 0x3FF) << 10) | (lc & 0x3FF) has a range of 0 to 0xFFFFF, and in some of those numbers, `0x10000 |` adds zero.

I am sharing all this because I have made this mistake twice, and I believe it is an easy mistake to make.
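The formulas above can be packaged as a pair of small functions, one correct and one reproducing the bug, so the difference is easy to check. The function names are invented for illustration; the arithmetic is exactly the formulas from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Correct surrogate-pair decoding: uc is the high (leading) surrogate
 * in 0xD800..0xDBFF, lc the low (trailing) surrogate in 0xDC00..0xDFFF. */
static uint32_t
surrogate_pair_to_codepoint(uint32_t uc, uint32_t lc)
{
    assert(uc >= 0xD800 && uc <= 0xDBFF);
    assert(lc >= 0xDC00 && lc <= 0xDFFF);
    return 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF));
}

/* The buggy version: | instead of +.  When the 20-bit payload already
 * has the 0x10000 bit set (codepoints U+20000 and up), OR-ing in
 * 0x10000 adds nothing and the result is wrong. */
static uint32_t
surrogate_pair_buggy(uint32_t uc, uint32_t lc)
{
    return 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF);
}
```

For "\uD835\uDD0D" both versions happen to agree (the payload is below 0x10000); the pair \uD840\uDC00, which should decode to U+20000, is where the buggy version diverges.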
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote: Perhaps we should enumerate the attributes of what would make a good binary encoding? Not sure if we're discussing the internal storage format or the binary send/recv format, but in my humble opinion, some attributes of a good internal format are: 1. Lightweight - it'd be really nice for the JSON datatype to be available in core (even if extra features like JSONPath aren't). 2. Efficiency - Retrieval and storage of JSON datums should be efficient. The internal format should probably closely resemble the binary send/recv format so there's a good reason to use it. A good attribute of the binary send/recv format would be compatibility. For instance, if MongoDB (which I know very little about) has binary send/receive, perhaps the JSON data type's binary send/receive should use it. Efficient retrieval and update of values in a large JSON tree would be cool, but would be rather complex, and IMHO, overkill. JSON's main advantage is that it's sort of a least common denominator of the type systems of many popular languages, making it easy to transfer information between them. Having hierarchical key/value store support would be pretty cool, but I don't think it's what PostgreSQL's JSON data type should do. On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. 
Speculation: the overhead involved with retrieving/sending and receiving/storing JSON (not to mention TOAST compression/decompression) will be far greater than that of serializing/unserializing.
Re: [HACKERS] Basic JSON support
On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: Here's one thing I'm worried about: the bison/flex code in your patch looks rather similar to the code in http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the GPL. In particular, the incorrect number regex I discussed above can also be found in jsonval verbatim. However, because there are a lot of differences in both the bison and flex code now, I'm not sure they're close enough to be copied, but I am not a lawyer. It might be a good idea to contact Ben Spencer and ask him for permission to license our modified version of the code under PostgreSQL's more relaxed license, just to be on the safe side. With the help and motivation of David Fetter, I sent an e-mail to Ben Spencer (google jsonval, check out the git repo, and look in the git log to find his e-mail address) a few minutes ago, asking if he would license jsonval under a license compatible with PostgreSQL, and am currently awaiting a response. If he doesn't respond, or outright refuses (which I, for one, doubt will happen), my fallback plan is to rewrite the JSON validation code by drawing from my original code (meaning it won't be in bison/flex) and post a patch for it. Unfortunately, it seems to me that there aren't very many ways of expressing a JSON parser in bison/flex, and thus the idea of JSON parsing with bison/flex is effectively locked down by the GPL unless we can get a more permissive license for jsonval. But, I am not a lawyer.
Re: [HACKERS] Basic JSON support
On Mon, Oct 4, 2010 at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. Joseph seems to be confusing copyrights with patents. The idea of parse JSON with bison/flex is not patentable by any stretch of the imagination. What I meant is, anyone who sets out to write a JSON parser with bison/flex is probably going to end up with something very similar to jsonval even without looking at it. But having said that, I wonder whether bison/flex are really the best tool for the job in the first place. From what I understand of JSON (which admittedly ain't much) a bison parser seems like overkill: it'd probably be both bloated and slow compared to a simple handwritten recursive-descent parser. I agree, and also because JSON isn't a moving target. However, in my opinion, the differences in quality between a bison parser and a handwritten parser are too small to be a big deal right now. A bison parser is probably slower and bigger than a simple hand-written one, but I haven't benchmarked it, and it's probably not worth benchmarking at this point. In correctness, I suspect a bison parser and a hand-written one to be about the same. Both pass the same army of testcases I built up while writing my original JSON patch. Granted, there could be weird semantics of bison/flex that make the parser wrong in subtle ways (for instance, the . regex char doesn't match \n and EOF), but there can also be weird errors in hand-written code. Pick your poison. In safety, a handwritten recursive descent parser could stack overflow and crash the server unless the proper guards are in place (e.g. if someone tries to encode '[[...' as JSON). bison guards against this (chopping the maximum depth of JSON trees to 9997 levels or so), but I don't know if the real stack is smaller than bison thinks it is, or if bison uses its own buffer. A handwritten parser could also accept trees of any depth using a manually managed stack, but it wouldn't do much good because code that works with JSON trees (e.g. 
json_stringify) is currently recursive and has the same problem. In any case, I think limiting the depth of JSON, to something generous like 1000, is the right way to go. The RFC allows us to do that. In my opinion, the path of least resistance is the best path for now. If we use the bison parser now, then we can replace it with a hand-written one that's functionally equivalent later. Joey Adams
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Sat, Sep 18, 2010 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote: Hmm, yeah. I'd be tempted to try to keep the user's original whitespace as far as possible, but disregard it as far as equality comparison goes. However, I'm not quite sure what the right thing to do about 0 vs 0.0 is. Does the JSON spec say anything about that? I didn't find anything in the JSON spec about comparison, but in JavaScript, 0 == 0.0 and 0 === 0.0 are both true. Also, JavaScript considers two arrays or objects equal if and only if they are references to the same object, meaning [1,2,3] == [1,2,3] is false, but if you let var a = [1,2,3]; var b = a; , then a == b and a === b are both true. Hence, JavaScript can help us when deciding how to compare scalars, but as for arrays and objects, we're on our own (actually, JSON arrays could be compared lexically like PostgreSQL arrays already are; I don't think anyone would disagree with that). I cast my vote for 0 == 0.0 being true. As for whitespace preservation, I don't think we should go out of our way to keep it intact. Sure, preserving formatting for input and output makes some sense because we'd have to go out of our way to normalize it, but preserving whitespace in JSONPath tree selections (json_get) and updates (json_set) is a lot of work (that I shouldn't have done), and it doesn't really help anybody. Consider json_get on a node under 5 levels of indentation. Another thing to think about is the possibility of using a non-text format in the future (such as a binary format or even a format that is internally indexed). A binary format would likely be faster to compare (and hence faster to index). If the JSON data type preserves whitespace today, it might break assumptions of future code when it stops preserving whitespace. This should at least be documented.
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Fri, Sep 17, 2010 at 8:32 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Updated patch: the JSON code has all been moved into core, so this patch is now for a built-in data type. I have a question about the design of the JSON type. Why do we need to store the value in UTF8 encoding? It's true the RFC of JSON says the encoding SHALL be encoded in Unicode, but I don't understand why we should reject other encodings. Actually, the code in my original patch should work with any server encoding in PostgreSQL. However, internally, it operates in UTF-8 and converts to/from the server encoding when necessary. I did it this way because the JSON code needs to handle Unicode escapes like \u266B, but there is no simple and efficient way (that I know of) to convert single characters to/from the server encoding. I noticed that in your new patch, you sidestepped the encoding issue by simply storing strings in their encoded form (right?). This is nice and simple, but in the future, JSON tree conversions and updates will still need to deal with the encoding issue somehow. As I said before, I'd like to propose only 3 features in the commitfest: * TYPE json data type * text to json: FUNCTION json_parse(text) * json to text: FUNCTION json_stringify(json, whitelist, space) Although casting from JSON to TEXT does stringify it in my original patch, I think json_stringify would be much more useful. In addition to the formatting options, if the internal format of the JSON type changes and no longer preserves original formatting, then the behavior of the following would change: $$unnecessary\u0020escape $$ :: JSON :: TEXT json_stringify would be more predictable because it would re-encode the whitespace (but not the \u0020, unless we went out of our way to make it do that). 
Also, json_parse is unnecessary if you allow casting from TEXT to JSON (which my patch does), but I think having json_parse would be more intuitive for the same reason you do. Long story short: I like it :-) If you're keeping track, features from my patch not in the new code yet are: * Programmatically validating JSON ( json_validate() ) * Getting the type of a JSON value ( json_type() ) * Converting scalar values to/from JSON * Converting arrays to JSON * JSONPath JSONPath will be re-implemented on the basic functionalities in the subsequent commitfest. Do you have a plan to split your patch? Or, can I continue to develop my patch? If so, JSONPath needs to be adjusted to the new infrastructure. I think your patch is on a better footing than mine, so maybe I should start contributing to your code rather than the other way around. Before the next commitfest, I could merge the testcases from my patch in and identify parsing discrepancies (if any). Afterward, I could help merge the other features into the new JSON infrastructure. I can't compile your initial patch against the latest checkout because json_parser.h and json_scanner.h are missing. Is there a more recent patch, or could you update the patch so it compiles? I'd like to start tinkering with the new code. Thanks! Joey Adams
Re: [HACKERS] WIP: extensible enums
On Mon, Aug 23, 2010 at 1:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote: What do you need AFTER for? Seems to me that BEFORE should be enough. (You already have the unadorned syntax for adding an item after the last one, which is the corner case that BEFORE alone doesn't cover). You're right. Strictly speaking we don't need it. But it doesn't hurt much to provide it for a degree of symmetry. I'm with Alvaro: drop the AFTER variant. It provides more than one way to do the same thing, which isn't that exciting, and it's also going to make it harder to document the performance issues. Without that, you can just say ADD BEFORE will make the enum slower, but plain ADD won't (ignoring the issue of OID wraparound, which'll confuse matters in any case). But what if you want to insert an OID at the end? You can't do it if all you've got is BEFORE: CREATE TYPE colors AS ENUM ('red', 'green', 'blue'); If I want it to become ('red', 'green', 'blue', 'orange'), what am I to do?
Re: [HACKERS] JSON Patch for PostgreSQL - BSON Support?
On Sun, Aug 15, 2010 at 11:47 PM, Andrew Dunstan and...@dunslane.net wrote: If BSON is simply in effect an efficient encoding of JSON, then it's not clear to me that we would want another type at all. Rather, we might want to consider storing the data in this supposedly more efficient format, and maybe also some conversion routines. An issue is that the current JSON data type implementation preserves the original text (meaning if you say '[1,2,\u0020 ]'::JSON, it will yield '[1,2,\u0020 ]' rather than '[1,2, ]' . I haven't researched BSON much at all, but I seriously doubt that part of its spec includes handling external JSON encoding details like whitespace and superfluous escapes. Even though I spent a long time implementing it, the original text preservation feature should be dropped, in my opinion. Users tend to care more about the data inside of a JSON value rather than how it's encoded, and replacement of values can have funky consequences on indentation when working with indented JSON text (similar to how pasting in a text editor puts pasted content at the wrong indent level). By dropping original text preservation, in the future (or now), JSON could be encoded in BSON (or a more efficient format) in the database rather than in JSON-encoded text. Also, an idea would be to make json_send and json_recv (binary JSON send/receive) use BSON rather than JSON-encoded text, as sending/receiving JSON-encoded text is exactly what text send/receive do. Joey Adams
Re: [HACKERS] JSON Patch for PostgreSQL - BSON Support?
On Mon, Aug 16, 2010 at 7:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, if it's not just a binary encoding of JSON, I think we can forget about it ... certainly it won't work in the form I was visualizing. regards, tom lane I just read the spec, and BSON has a lot of bells and whistles attached (such as labeling binary data with a subtype like UUID or MD5). With a couple exceptions (see below), any JSON can be converted to BSON (but the way BSON does arrays is silly: item indices are stored as strings), but not all BSONs can be converted to JSON without losing some type details. Others already mentioned that you can't convert 2 billion byte long JSON strings to BSON. Another issue is that BSON cannot encode all JSON numbers without precision loss. JSON can hold any number matching '-'? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([Ee] [+-]? [0-9]+)? but BSON pigeonholes numeric values to either double, int32, int64, or a 12-byte MongoDB Object ID. Thus, for people who expect JSON to be able to hold arbitrary-precision numbers (which the JSON data type in my patch can), using BSON for transfer or storage will violate that expectation. Now that I know more about BSON, my opinion is that it shouldn't be used as the transfer or storage format of the JSON data type. Maybe if someone wants to do the work, BSON could be implemented as a contrib module, and functions could be provided in that module to convert to/from JSON with documented caveats. Joey Adams
Re: [HACKERS] patch: utf8_to_unicode (trivial)
On Tue, Jul 27, 2010 at 1:31 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jul 24, 2010 at 10:34 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: In src/include/mb/pg_wchar.h , there is a function unicode_to_utf8 , but no corresponding utf8_to_unicode . However, there is a static function called utf2ucs that does what utf8_to_unicode would do. I'd like this function to be available because the JSON code needs to convert UTF-8 to and from Unicode codepoints, and I'm currently using a separate UTF-8 to codepoint function for that. This patch renames utf2ucs to utf8_to_unicode and makes it public. It also fixes the version of utf2ucs in src/bin/psql/mbprint.c so that it's equivalent to the one in wchar.c . This is a patch against CVS HEAD for application. It compiles and tests successfully. Comments? Thanks, I feel obliged to respond to this since I'm supposed to be covering your GSoC project while Magnus is on vacation, but I actually know very little about this topic. What's undeniable, however, is that the coding in the two versions of utf2ucs() in the tree right now don't match. src/backend/utils/mb/wchar.c has: else if ((*c & 0xf8) == 0xf0) while src/bin/psql/mbprint.c, which is otherwise identical, has: else if ((*c & 0xf0) == 0xf0) I'm inclined to believe that your patch is right to think that the former version is correct, because it used to match the latter version until Tom Lane changed it in 2007, and I suspect he simply failed to update both copies. But I'd like someone who actually understands what this code is doing to confirm that. http://archives.postgresql.org/pgsql-committers/2007-01/msg00293.php I suspect we need to not only fix this, but back-patch it at least to 8.2, which is the first release where there are two copies of this function. I am not sure whether earlier releases need to be changed, or not. But again, someone who understands the issues better than I do needs to weigh in here. 
In terms of making this function non-static, I'm inclined to think that a better approach would be to move it to src/port. That gets rid of the need to have two copies in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company I've attached another patch that moves utf8_to_unicode to src/port per Robert Haas's suggestion. This patch itself is not quite as elegant as the first one because it puts platform-independent code that belongs in wchar.c into src/port . It also uses unsigned int instead of pg_wchar because the typedef of pg_wchar isn't available to the frontend, if I'm not mistaken. I'm not sure whether I like the old patch better or the new one. Joey Adams utf8-to-unicode-port.patch Description: Binary data
[HACKERS] patch: General purpose utility functions used by the JSON data type
I factored out the general-purpose utility functions in the JSON data type code into a patch against HEAD. I have made a few changes to them since I posted about them earlier ( http://archives.postgresql.org/pgsql-hackers/2010-08/msg00692.php ). A summary of the utility functions along with some of my own thoughts about them: getEnumLabelOids * Useful-ometer: ()---o * Rationale: There is currently no streamlined way to return a custom enum value from a PostgreSQL function written in C. This function performs a batch lookup of enum OIDs, which can then be cached with fn_extra. This should be reasonably efficient, and it's quite elegant to use. FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT * Useful-ometer: ()o * Rationale: Using fcinfo->flinfo->fn_extra takes a lot of boilerplate. These macros help cut down the boilerplate, and the comment explains what fn_extra is all about. getTypeInfo * Useful-ometer: ()---o * Rationale: The get_type_io_data six-fer function is very cumbersome to use, since one has to declare all the output variables. The getTypeInfo puts the results in a structure. It also performs the fmgr_info_cxt step, which is a step done after every usage of get_type_io_data in the PostgreSQL code. * Other thoughts: getTypeInfo also retrieves typcategory (and typispreferred), which is rather ad-hoc. This benefits the JSON code because to_json() uses the typcategory to figure out what type of JSON value to convert something to (for instance, things in category 'A' become JSON arrays). Other data types could care less about the typcategory. Should getTypeInfo leave that step out? pg_substring, pg_encoding_substring * Useful-ometer: ()---o * Rationale: The JSONPath code uses it / will use it for extracting substrings, which is probably not a very useful feature (but who am I to say that). This function could probably benefit the text_substring() function in varlena.c , but it would take a bit of work to ensure it continues to comply with standards. 
server_to_utf8, utf8_to_server, text_to_utf8_cstring, utf8_cstring_to_text, utf8_cstring_to_text_with_len * Useful-ometer: ()--o * Rationale: The JSON data type operates in UTF-8 rather than the server encoding because it needs to deal with Unicode escapes, but individual Unicode characters can't be converted to/from the server encoding simply and efficiently (as far as I know). These routines made the conversions done by the JSON data type vastly simpler, and they could simplify other data types in the future (XML does a lot of server-UTF-8 conversions too). This patch doesn't include tests. How would I go about writing them? I have made the JSON data type built-in, and I will post that patch shortly (it depends on this one). The built-in JSON data type uses all of these utility functions, and the tests for the JSON data type pass. json-util-01.patch Description: Binary data
Re: [HACKERS] patch: General purpose utility functions used by the JSON data type
On Fri, Aug 13, 2010 at 10:46 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 13, 2010 at 5:45 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: getEnumLabelOids * Useful-ometer: ()---o * Rationale: There is currently no streamlined way to return a custom enum value from a PostgreSQL function written in C. This function performs a batch lookup of enum OIDs, which can then be cached with fn_extra. This should be reasonably efficient, and it's quite elegant to use. It's possible that there might be a contrib module out there someplace that wants to do this, but it's clearly a waste of time for a core datatype. The OIDs are fixed. Just get rid of the enum altogether and use the OIDs directly wherever you would have used the enum. Then you don't need this. Indeed, a built-in JSON data type will certainly not need it. However, users may want to return enums from procedures written in C, and this function provides a way to do it. Incidentally, if we were going to accept this function, I think we'd need to add some kind of a check to throw an error if any of the labels can't be mapped onto an Oid. Otherwise, an error in this area might lead to difficult-to-find misbehavior. I agree. Perhaps an ereport(ERROR, ...) would be better, since it would not be hard for a user to cause an enum mapping error (even in a production database) by updating a stored procedure in C but not updating the corresponding enum (or vice versa, if enum labels are renamed). FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT * Useful-ometer: ()o * Rationale: Using fcinfo->flinfo->fn_extra takes a lot of boilerplate. These macros help cut down the boilerplate, and the comment explains what fn_extra is all about. I think this is not a good idea. Macros that use local variables under the covers make it hard for new contributors to read the code and understand what it does. It's only worth doing if it saves a lot of typing, and this doesn't. 
If we were going to do this, the right thing for your patch to do would be to update every instance of this coding pattern in our source base, so that people who copy those examples in the future do it the new way. But I think there's no real advantage in that: it would complicate back-patching future bug fixes for no particularly compelling reason. Fair enough. Perhaps the comment about FN_EXTRA (which explains fn_extra in more detail) could be reworded (to talk about using fcinfo->flinfo->fn_extra manually) and included in the documentation at xfunc-c.html . fn_extra currently only gets a passing mention in the discussion about set-returning functions. pg_substring, pg_encoding_substring * Useful-ometer: ()---o * Rationale: The JSONPath code uses it / will use it for extracting substrings, which is probably not a very useful feature (but who am I to say that). This function could probably benefit the text_substring() function in varlena.c , but it would take a bit of work to ensure it continues to comply with standards. I'm more positive about this idea. If you can make this generally useful, I'd encourage you to do that. On a random coding style note, I see that you have two copies of the following code, which can fairly obviously be written in a single line rather than five, assuming it's actually safe. + if (sub_end + len > e) + { + Assert(false); /* Clipped multibyte character */ + break; + } If I simply say Assert(sub_end + len <= e), the function will yield a range hanging off the edge of the input string (out of bounds). The five lines include a safeguard against that when assertion checking is off. This could happen if the input string has a clipped multibyte character at the end. Perhaps I should just drop the assertions and make it so if there's a clipped character at the end, it'll be ignored, no big deal. 
Joey Adams
Re: [HACKERS] patch: General purpose utility functions used by the JSON data type
On Fri, Aug 13, 2010 at 2:02 PM, David Fetter da...@fetter.org wrote: On Fri, Aug 13, 2010 at 01:33:06PM -0400, Robert Haas wrote: Maybe so, but it's not clear the interface that Joseph implemented is the one everyone wants... Fair enough. What's the interface now in a nutshell? Lack of nutshells might well mean the interface needs more work... Suppose we have: -- SQL -- CREATE TYPE colors AS ENUM ('red', 'green', 'blue'); -- C -- enum Colors {COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT}; In a nutshell: * Define an enum mapping like so: static EnumLabel enum_labels[COLOR_COUNT] = { {COLOR_RED, "red"}, {COLOR_GREEN, "green"}, {COLOR_BLUE, "blue"} }; * Get the OIDs of the enum labels: Oid oids[COLOR_COUNT]; getEnumLabelOids("colors", enum_labels, oids, COLOR_COUNT); * Convert C enum values to OIDs using the table: PG_RETURN_OID(oids[COLOR_BLUE]); A caller would typically cache the Oid table with fn_extra. Currently, getEnumLabelOids puts InvalidOid for labels that could not successfully be looked up. This will probably be changed to ereport(ERROR) instead. Also, I'm thinking that getEnumLabelOids should be renamed to something that's easier to remember. Maybe enum_oids or get_enum_oids. Joey Adams
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Update: I'm in the middle of cleaning up the JSON code ( http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary if you want to see the very latest ), so I haven't addressed all of the major problems with it yet. On Fri, Jul 23, 2010 at 2:34 PM, Robert Haas robertmh...@gmail.com wrote: - I was under the impression that we wanted EXPLAIN (FORMAT JSON) to return type json, but that's obviously not going to be possible if all of this is contrib. We could (a) move it all into core, (b) move the type itself and its input and output functions into core and leave the rest in contrib [or something along those lines], or (c) give up using it as the return type for EXPLAIN (FORMAT JSON). I've been developing it as a contrib module because: * I'd imagine it's easier than developing it as a built-in datatype right away (e.g. editing a .sql.in file versus editing pg_type.h ). * As a module, it has PGXS support, so people can try it out right away rather than having to recompile PostgreSQL. I, for one, think it would be great if the JSON datatype were all in core :-) However, if and how much JSON code should go into core is up for discussion. Thoughts, anyone? A particularly useful aspect of the JSON support is the ability to convert PostgreSQL arrays to JSON arrays (using to_json ), as there currently isn't any streamlined way to parse arrays in the PostgreSQL format client-side (that I know of). Joey Adams
[HACKERS] patch: utf8_to_unicode (trivial)
In src/include/mb/pg_wchar.h , there is a function unicode_to_utf8 , but no corresponding utf8_to_unicode . However, there is a static function called utf2ucs that does what utf8_to_unicode would do. I'd like this function to be available because the JSON code needs to convert UTF-8 to and from Unicode codepoints, and I'm currently using a separate UTF-8 to codepoint function for that. This patch renames utf2ucs to utf8_to_unicode and makes it public. It also fixes the version of utf2ucs in src/bin/psql/mbprint.c so that it's equivalent to the one in wchar.c . This is a patch against CVS HEAD for application. It compiles and tests successfully. Comments? Thanks, Joey Adams utf8-to-unicode.patch Description: Binary data
Re: [HACKERS] extensible enum types
On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan and...@dunslane.net wrote: Robert Haas wrote: On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.net wrote: You are just bumping up the storage cost. Part of the attraction of enums is their efficiency. What's efficient about them? Aren't we using 4 bytes to store a value that will nearly always fit in 2, if not 1? This was debated when we implemented enums. As between 1, 2 and 4 there is often not much to choose, as alignment padding makes it pretty much the same. But any of them are more efficient than storing a numeric value or the label itself. Anyway, it might well be moot. cheers andrew Something I don't understand in all this is: why can't the type of an enum be determined statically rather than stored in every single value? For instance, if we have: CREATE TYPE number AS ENUM ('one', 'two', 'three'); CREATE TYPE color AS ENUM ('red', 'green', 'blue'); PostgreSQL won't allow a comparison between two different enum types, e.g.: SELECT 'one'::number = 'red'::color; ERROR: operator does not exist: number = color However, when we say: SELECT 'one'::number = 'one'::number Couldn't enum_eq just use get_fn_expr_argtype to determine the type of enum input rather than rely on it being stored in the value (either implicitly via OID or explicitly as a word half)? Also, I can't seem to find the original debates from when enums were implemented. Does anyone have a link to that thread in the archives? Thanks. Joey Adams
[HACKERS] Should the JSON datatype be a specialization of text?
Currently, the JSON datatype (repository: http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary ) is implemented somewhat like a specialization of TEXT, like XML is. I'm beginning to question if this is the right way to go. This doesn't concern whether the JSON datatype should retain the given content verbatim (e.g. ' "string" '::JSON doesn't strip spaces) or whether it should be internally stored using varlena (the same way TEXT is stored). What I'm talking about revolves around two fundamental approaches to the API design: A. JSON as a specialization of TEXT. json('"string"')::TEXT = '"string"'. To unwrap it, you need a special function: from_json('"string"')::TEXT . B. JSON as a naked type. json('"string"')::TEXT = 'string'. To unwrap it, simply cast to the desired type. Early in the discussion of the JSON datatype proposal, we leaned in favor of approach A (see http://archives.postgresql.org/pgsql-hackers/2010-04/msg00263.php ). However, based on input I've received (mainly questions about why from_json and to_json exist), I'm beginning to think that while approach A makes more sense from an implementor's perspective, approach B makes a heck of a lot more sense to users. Although my code currently implements approach A, I am in favor of approach B. Arguments I can think of in favor of approach A (keeping JSON as a specialization of TEXT): * No surprises when casting between JSON and TEXT. If approach B is used, '"string"'::json would be '"string"', but '"string"'::json::text would be 'string'. * 'null'::json and NULL are distinct. 'null'::json is just a string containing 'null' and won't ever become NULL unless you explicitly pass it through from_json. Also, if I'm not mistaken, input functions can't yield null when given non-null input (see the above link). * For users who just want to store some JSON-encoded text in a database for a while, approach A probably makes more sense. * Is consistent with the XML datatype.
Arguments in favor of approach B (making JSON a naked data type): * Makes data more accessible. Just cast to the type you need, just like any other data type. No need to remember to_json and from_json (though these function names might be used for functions to convert JSON-formatted TEXT to/from the JSON datatype). * Is consistent with other programming languages. When you json_decode something in PHP, you get an object or an array. When you paste a JSON literal into JavaScript code, you end up with a native type, not some object you have to convert down to a native type. Notice how in these programming languages, you typically carry verbatim JSON texts around as strings, not a special string type that performs validation. * JSON was meant to be a format representing types in a programming language. It has arrays, objects, strings, true, false and null because JavaScript and many other popular scripting languages have those. * Users tend to care more about the underlying data in JSON values than the notion of JSON-formatted text (though users care about that too). If a user really wants to treat JSON like text, why not just use TEXT along with CHECK (json_validate(content)) ? Granted, it's not as fun :-) One workaround to the null problem of approach B might be to throw an error when 'null' is passed to the input function (as in, don't allow the JSON type to even hold 'null' (though null can be nested within an array/object)), and have a function for converting text to JSON that returns NULL if 'null' is given. Note that I am strongly against only allowing the JSON type to hold objects and arrays, in particular because it would break being able to extract non-compound values from JSON trees (e.g. json_get('[0,1,2,3]', '[2]') ). By the way, how hard would it be to get 'null'::JSON to yield NULL? Keep in mind there's a chance someone will standardize JSON/SQL in the future, so more may be at stake here than just PostgreSQL's codebase and users. 
Although text versus naked is a fundamental design aspect of the JSON datatype, it shouldn't be a blocker for me moving forward with the project. Most of the code in place and in the works shouldn't be affected by a transition from approach A to B.
Re: [HACKERS] hstore ==> and deprecate =>
On Fri, Jun 11, 2010 at 10:59 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: How about no operator at all? It won't be as cool to read, but consider, the arguments are text and text, not involving any hstore type at all, so whatever operator you choose is in practice blocked from everyone everywhere. No one could ever implement another key/value system that uses a similar interface. Or use the operator for anything else involving strings. Yeah, that's a good point. Maybe we should just deprecate the operator altogether. regards, tom lane To repeat an earlier question (which was in turn repeating an earlier question), would it be possible to do one of these, yielding 'key=>this, key2=>that'::hstore : hstore(key := 'this', key2 := 'that') hstore(key => 'this', key2 => 'that') hstore(row('this' AS key, 'that' AS key2)) In my opinion, it's important to find a reasonably elegant way to express hstore literals (rather than something like hstore(hstore_kvp('key', 'this'), hstore_kvp('key2', 'that')) ) because something so basic shouldn't be so difficult to work with. It'd be a bonus if there was a general form for arbitrary named parameters that new functions could opt in on (in particular, json_object :-) ). This type of function could be created by saying something like: CREATE FUNCTION hstore(NAMED) RETURNS hstore AS ... -- NAMED would be a new argtype Also, if a user needs to name a field from a variable, perhaps there should be a syntax that allows it, such as: hstore(pg_deref(var) => 'this', pg_deref(var2) => 'that') If implementing pg_deref would be problematic in general, perhaps it could initially just be supported in keys of functions like this.
Only if we allow for functions with named parameters in this fashion would I say +1 for deprecating the => operator from hstore without a replacement operator like ==> . Joey Adams
Re: [HACKERS] Working with PostgreSQL enums in C code
On Fri, May 28, 2010 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, May 28, 2010 at 12:07 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c . PG_RETURN macros shouldn't do any nontrivial processing (see the existing ones for references). Yeah, that was my first reaction too. If we don't already have one, it would be appropriate to provide a lookup enum value function (functionally about the same as enum_in, but designed to be called conveniently from C). Then, if you needed to work from a textual enum label, you'd call that function and then PG_RETURN_OID. Here is the function I wrote to look up enum values:

Oid
enum_label_to_oid(const char *typname, const char *label)
{
    Oid         enumtypoid;
    HeapTuple   tup;
    Oid         ret;

    enumtypoid = TypenameGetTypid(typname);
    Assert(OidIsValid(enumtypoid));

    tup = SearchSysCache2(ENUMTYPOIDNAME,
                          ObjectIdGetDatum(enumtypoid),
                          CStringGetDatum(label));
    Assert(HeapTupleIsValid(tup));

    ret = HeapTupleGetOid(tup);
    ReleaseSysCache(tup);

    return ret;
}

If this were added to PostgreSQL proper, what source/header files would it make sense to put it in? enum.c/builtins.h ? However, for a built-in enum type, I agree with Robert's solution of just #define-ing fixed OIDs for the values of the type. I don't know if the JSON datatype will be a contrib module or built-in yet, but if it were contrib, would it still be better to use fixed OIDs anyway? One issue with setting this precedent is that new contrib modules using enums wouldn't be compatible with older versions of PostgreSQL. One idea might be to extend CREATE TYPE name AS ENUM so OIDs can be specified explicitly, but then that could lead to OID clashes. That would be a really messy problem for users.
By the way, just curious: why can't the internal representation of an enum just be an INT starting from 0 by default, like in C? That would make a heck of a lot more sense, in my opinion. It might also allow users to do things like this in the future: CREATE TYPE number AS ENUM ('one' = 1, 'two' = 2, 'three' = 3); Joey Adams
Re: [HACKERS] functional call named notation clashes with SQL feature
On Wed, May 26, 2010 at 9:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we go with the spec's syntax I think we'd have no realistic choice except to forbid => altogether as an operator name. (And no, I'm not for that.) I suppose the most painful thing about doing that is that it would break hstore. Are there other commonly-used modules that rely on => as an operator name? There don't seem to be any other contrib modules that define => as an operator name, but I'm not sure what's out there on pgfoundry or elsewhere. The bigger issue to me is not so much hstore itself as that this is an awfully attractive operator name for anything container-ish. Wasn't the JSON-datatype proposal using => for an operator at one stage? (The current wiki page for it doesn't seem to reflect any such idea, though.) And I think I remember Oleg/Teodor proposing such an operator in conjunction with some GIN-related idea or other. In spite of the difficulties, I'm reluctant to give up on it. I always thought that the AS syntax was a crock and I'm not eager to invent another crock to replace it. Being compatible with the SQL standard and with Oracle is not to be taken lightly. Yeah, I know. Though this could end up being one of the bits of the spec that we politely decline to follow, like upper-casing identifiers. Still, it's a good idea to think again before we've set the release in stone ... regards, tom lane I didn't really consider using => for json because it would interfere with hstore (one of the signatures is text => text returns hstore, for instance). I am considering using -> as a json subscript operator (which is what hstore does) as it shouldn't interfere with hstore (as far as I know).
Here's a thought: suppose we did use the foo (name => value) syntax for naming parameters. It could still be used in a very similar way for hstore: hstore(key1 => 'value1', key2 => 'value2') One advantage here is that => wouldn't be exclusive to hstore anymore. E.g.: json(key1 => 'value1', key2 => 'value2') However, note that the left hand of => is an identifier here, whereas the left hand of hstore's current => operator is either text, text[], or hstore. If I had to choose between => and := for parameter naming, I'd go with := because it seems more SQLish to me. I wonder if the foo (name : value) syntax would be possible/desirable. Or maybe foo ({name: value}) :-) Joey Adams
Re: [HACKERS] functional call named notation clashes with SQL feature
On Fri, Jun 4, 2010 at 9:55 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: If I had to choose between => and := for parameter naming, I'd go with := because it seems more SQLish to me. On second thought, => might actually be a very intuitive syntax for defining dictionary types like hstore and json, since it matches PHP's associative array syntax, as in: hstore('key1' => 'value1', 'key2' => 'value2') -- hypothetical SQL array('key1' => 'value1', 'key2' => 'value2') // PHP That way, when people see =>, they can think dictionary whether they're in PHP or SQL. Note that this is a bit different than what I suggested earlier: hstore(key1 => 'value1', key2 => 'value2') Identifier names were used instead of literal names, which conflicts with the other approach. Also, the other approach is more flexible, as the user can generate names with expressions at runtime.
Re: [HACKERS] JSON manipulation functions
I've started implementing the JSON datatype; the repo is at http://git.postgresql.org/gitweb?p=json-datatype.git . On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 13, 2010 at 9:47 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. No, I think that's a fine idea. I tried making a function named json_type that has the same name as the type json_type. However, this doesn't work as expected: SELECT json_type('[1,2,3]'); Instead of calling json_type with '[1,2,3]' casted to JSON, it's trying to cast '[1,2,3]' to json_type. Is there a way to override this behavior, or would I be better off renaming the function? Note that if the function were renamed, the literal would implicitly be json: SELECT json_typeof('[1,2,3]'); -- works I tried this: CREATE OR REPLACE FUNCTION json_type(json) RETURNS json_type AS 'MODULE_PATHNAME','json_get_type' LANGUAGE C STRICT IMMUTABLE; CREATE CAST (json AS json_type) WITH FUNCTION json_type(json); However, json_type('[1,2,3]') still doesn't work (it doesn't infer that '[1,2,3]' should be casted to json first). I also tried each of AS ASSIGNMENT and AS IMPLICIT as well. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Working with PostgreSQL enums in C code
I encountered a situation while implementing JSON support where I needed to return an enum value from a C function. To clarify, here's the SQL:

CREATE TYPE json_type_t AS ENUM
    ('null', 'string', 'number', 'bool', 'object', 'array');

CREATE OR REPLACE FUNCTION json_type(json)
    RETURNS json_type_t
    AS 'MODULE_PATHNAME','json_get_type'
    LANGUAGE C STRICT IMMUTABLE;

I initially tried looking for another function returning an enum in the PostgreSQL source tree, but I couldn't find any. I guess this is because enums are a relatively new feature in PostgreSQL. I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c .

#define PG_RETURN_ENUM(typname, label) return enumLabelToOid(typname, label)

static Oid
enumLabelToOid(const char *typname, const char *label)
{
    Oid         enumtypoid;
    HeapTuple   tup;
    Oid         ret;

    enumtypoid = TypenameGetTypid(typname);
    Assert(OidIsValid(enumtypoid));

    tup = SearchSysCache2(ENUMTYPOIDNAME,
                          ObjectIdGetDatum(enumtypoid),
                          CStringGetDatum(label));
    Assert(HeapTupleIsValid(tup));

    ret = HeapTupleGetOid(tup);
    ReleaseSysCache(tup);

    return ret;
}

Feel free to nitpick the code above, as I'm still learning. Note that I replaced the more robust validity checks of enum_in with (quicker?) asserts, with the assumption that correct programs would only pass valid values to PG_RETURN_ENUM . The code using the method above can be found here: http://git.postgresql.org/gitweb?p=json-datatype.git;a=tree;f=contrib/json;h=1dd813da4016b31f35cb39b01c6d5f0999da672e;hb=092fa046f95580dd7906a07370ca401692a1f818 . My testcases passed, so everything seems to work. I suppose my PG_RETURN_ENUM macro is nice and simple, except for the fact that the coder has to keep an enum names table in sync with the SQL code and the C code.
However, going the other way around (PG_GETARG_ENUM) would need access to that enum names table. Hence, it'd make sense to have macros for defining this table so both PG_RETURN_ENUM and PG_GETARG_ENUM can reference it. I believe that these macros would be a useful addition to the PostgreSQL function manager API, as they would provide a decent way to receive and return custom enums from C code. Anyone agree/disagree? Joey Adams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
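[Editor's note] The shared-table idea above can be sketched without any backend APIs. Everything here is hypothetical (the struct, the names, the -1 convention are not fmgr interfaces); it only shows how one table could back both PG_GETARG_ENUM and PG_RETURN_ENUM:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the shared label table idea; none of these
 * names are real fmgr APIs.  One table drives both conversion
 * directions, so the SQL enum definition and the C code only have to
 * be kept in sync in a single place.
 */
typedef struct
{
	const char *label;		/* label as written in CREATE TYPE ... AS ENUM */
	int			cvalue;		/* value the C code works with */
} enum_label_entry;

static const enum_label_entry json_type_labels[] = {
	{"null", 0}, {"string", 1}, {"number", 2},
	{"bool", 3}, {"object", 4}, {"array", 5}
};

/*
 * Label -> C value.  A real PG_GETARG_ENUM would first map the
 * argument OID to its label via the syscache, then consult this table;
 * PG_RETURN_ENUM would walk it the other way before the OID lookup.
 */
static int
enum_label_to_cvalue(const char *label)
{
	size_t		i;

	for (i = 0; i < sizeof(json_type_labels) / sizeof(json_type_labels[0]); i++)
		if (strcmp(json_type_labels[i].label, label) == 0)
			return json_type_labels[i].cvalue;
	return -1;				/* real code would elog(ERROR, ...) instead */
}
```

A macro like PG_DEFINE_ENUM_TABLE(json_type_labels, ...) could generate this table so both macros reference it, which is the suggestion in the email above.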
Re: [HACKERS] JSON manipulation functions
I started a wiki article for brainstorming the JSON API: http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made substantial changes to the draft of the API based on discussion here and on the #postgresql IRC channel. Is it alright to use the wiki for brainstorming, or should it stay on the mailing list or go somewhere else? I'll try not to spend too much time quibbling over the specifics as I tend to do. While the brainstorming is going on, I plan to start implementing the datatype by itself so I can establish an initial working codebase.
Re: [HACKERS] JSON manipulation functions
Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. I'll try to explain it in one big nutshell: Instead of, for instance, json_to_number('5') and number_to_json(5), I propose changing it to from_json('5')::INT and to_json(5). Note how from_json simply returns TEXT containing the underlying value for the user to cast. I plan to make calling to_json/from_json with arrays or objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw an error for now, as implementing all the specifics of this could be quite distracting. If I'm not mistaken, json_object([content [AS name] [, ...]] | *) RETURNS json can't be implemented without augmenting the grammar (as was done with xmlforest), so I considered making it take a RECORD parameter like the hstore(RECORD) function does, as was suggested on IRC. However, this may be inadequate for selecting some columns but not others. Using examples from hstore: SELECT hstore(foo) FROM foo; => '"e"=>"2.71828", "pi"=>"3.14159"' -- this works, but what if we only want one field? SELECT hstore(pi) FROM foo; -- function type error SELECT hstore(row(pi)) FROM foo; => '"f1"=>"3.14159"' -- field name is lost SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => '"f1"=>"3.14159"' -- ugly, and field name is *still* lost To get (and set, which I overlooked before), use json_get and json_set. These take JSONPath expressions, but I don't plan to implement all sorts of fancy features during the summer.
However, I do plan to support some kind of parameter substitution so you can do this: json_get('[0,1,4,9,16,25]', '[%]' %% 2) => '4'::TEXT For this use case, though, it would be simpler to say: '[0,1,4,9,16,25]'::JSON -> 2
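[Editor's note] The proposed %% substitution was never specified beyond the example above; the following is an illustrative guess at its mechanics (the function name and '%' placeholder convention are assumptions, not part of the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative guess at the proposed '%%' parameter substitution:
 * splice an integer subscript into a JSONPath-style template wherever
 * '%' appears.  The operator and its semantics are hypothetical.
 */
static void
json_path_subst(char *out, size_t outlen, const char *tmpl, int index)
{
	size_t		o = 0;
	const char *p;

	for (p = tmpl; *p != '\0' && o + 13 < outlen; p++)
	{
		if (*p == '%')
			o += (size_t) snprintf(out + o, outlen - o, "%d", index);
		else
			out[o++] = *p;
	}
	out[o] = '\0';
}
```

A real implementation would also need to quote string parameters so values can't inject path syntax, which is exactly why substitution belongs in the operator rather than in user-side string concatenation.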
Re: [HACKERS] Fwd: Hiding data in postgresql
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, (I tried the question in another forum first) Does someone have any ideas how I can hide data without the meta data noticing? To explain further, I would like to save some collection of data where the meta-data does not see it. I am trying to do some security through obscurity. It is for research purposes. For example, populate a table with 1000 rows, but the meta-data only knows of about 500 of them? Only on an export of a dump can you find all the data again. Or maybe to make a hidden duplicate schema that can point to the hidden data? Does someone have any good ideas on how to achieve this or something similar? Kind regards Hector On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, does ANYONE have any tips on hiding data on a database server? This means that data is stored in places that is not necessarily picked up in the schema of the database. I am doing some research on databases and need some direction. Any help or direction will be highly appreciated. Kind regards Hector Not sure if this helpful, but be sure to know about views, which can be used to filter out rows of a table. 
Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);
CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible=true;

=> SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

=> SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to simulate an updatable view. You may also want to read about Veil: http://veil.projects.postgresql.org/curdocs/main.html
Re: [HACKERS] JSON manipulation functions
On Fri, May 14, 2010 at 11:33 AM, Bruce Momjian br...@momjian.us wrote: Joseph Adams wrote: == array/object conversion == The json_object function converts a tuple to a JSON object. If there are duplicate column names, there will be duplicate keys in the resulting JSON object. json_object([content [AS name] [, ...]]) returns json Likewise, the json_array function converts a tuple to a JSON array. Column names are ignored. json_array([content [AS name] [, ...]]) returns json Do you see any problems with the fact that JSON arrays can use mixed data types, e.g.: [ 1, 2, 'hi', false] I suppose the json_object and json_array functions would determine which JSON types to employ by looking at the types of arguments given (TEXT values would become strings, INT/FLOAT/NUMERIC/etc. values would become numbers, TRUE/FALSE would become true/false, NULLs would just be null, and JSON values would just be inserted as themselves). Note that json_array('"Hello"'::TEXT) would yield '["\"Hello\""]'::JSON, while json_array('"Hello"'::JSON) would yield '["Hello"]' . Going the other way around, values pulled out of JSON objects and arrays would just be of type JSON. This (revised) function signature says it all: json_values(JSON) returns JSON[] In short, I don't believe mixed data types in arrays will be a problem. json_to_* and *_to_json functions would be used for individual conversions. On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: json_keys gets the keys of a JSON object as a set. json_keys(json) returns setof text I would tend to make this return text[] rather than SETOF text. json_values gets the values of a JSON object or the items of a JSON array as a set. json_values(json) returns setof json Similarly I would make this return json[]. Agreed. For those who want sets, the unnest() function can be used. -> retrieves an item of a JSON object by key. [snip] [] retrieves a value of a JSON array/object by (one-based) index.
[snip] I think some kind of array dereference and object dereference mechanism is absolutely, positively 100% required. I don't know whether the particular syntax you've proposed here is best or whether we should pick another syntax or just use function notation, but I think we definitely need *something*. If the dereferencing operations aren't available, one could work around it by using json_keys/json_values. Of course, it would be a really clunky solution, and implementing -> will probably be easy compared to implementing those functions. I also think we need a function called something like json_length() which returns the length of a list or the number of keys in an object. Definitely. By the way, I'm considering making it so JSON arrays will be treated like objects when it comes to -> and the json_keys function. Thus, json_keys('[1,4,9,16,25]') would yield '{1,2,3,4,5}', and ('[1,4,9,16,25]'::JSON) -> 3 would yield the third item. This would obviate the need for an array-only subscript function/operator. In general, I prefer zero-based counting, but because PostgreSQL's array indexing is one-based, one-based array keys would be better for the sake of consistency. Note that if there was a function like this in the future: -- Access a JSON object like you would in JavaScript json_path('{"squares": [1,4,9,16,25]}', '.squares[2]') There could be confusion, as JavaScript uses zero-based indexing.
[HACKERS] JSON manipulation functions
First off, thank you for allowing me to participate in Google Summer of Code 2010. I'm sorry I haven't been active for the past few weeks. Today, I added the wiki page for my project, but the project schedule is highly tentative: http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010 . I'd like to discuss how the functions for type checking and conversion (the majority of my project) should work. Below is my first draft for the JSON manipulation function specs, along with annotations. == Type checking == The following function returns the type of any JSON value. json_type as enum ('null', 'string', 'number', 'boolean', 'object', 'array') json_type(json) returns json_type Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. I thought about having predicates like IS STRING and IS NUMBER, similar to the IS DOCUMENT predicate used for XML. However, a major problem with that approach is that it could lead to confusion involving IS NULL. By my understanding, the JSON datatype will just be a specialization of TEXT (it just validates the input). Like TEXT, a JSON string can be 'null'. 'null'::JSON is not NULL. Bear in mind that json_to_*('null') is NULL, though. I also thought about having a series of json_is_* functions. I don't think it's a bad idea, but I think json_type is a better solution. == text/number/boolean conversion == These functions each convert a non-compound JSON value to its respective return type. Run-time type checking is performed; a conversion will throw an error if the input JSON is not the correct type. If the JSON value is 'null', then the return value will be NULL. json_to_text(json) returns text json_to_number(json) returns numeric json_to_bool(json) returns boolean These functions convert values to JSON. 
Passing NULL to any of the functions below will return 'null': text_to_json(text) returns json number_to_json(numeric) returns json bool_to_json(boolean) returns json There could be a generic value_to_json(any), but not a json_to_value(json) function. See http://archives.postgresql.org/pgsql-hackers/2010-04/msg00321.php for more details. Conversion to/from number or boolean can also be achieved with casting. Note well that '"string"'::JSON::TEXT is '"string"', not the string's actual value. json_to_text is needed for this conversion. For this reason, casting JSON might seem like something to recommend against. However, IMHO, casting numbers and booleans to/from JSON is fine and dandy; the paragraphs below give some weight to this. I originally considered making json_to_number and number_to_json work with TEXT instead of NUMERIC. However, as Tom Lane pointed out in the above link, "Forcing people to insert explicit coercions from text isn't going to be particularly convenient to use." Nevertheless, NUMERIC introduces a problem. For instance, if you say: SELECT '-1e-38'::NUMERIC; This conversion knocks out the scientific notation and produces a 41-character string. I seriously doubt that all outside applications will handle 41-character numbers correctly. Perhaps there should be individual functions for specific data types, or maybe just a handful for particular cases. There might be json_to_int, json_to_float, and json_to_numeric. In any case, converting to/from number types can be achieved quite easily with casting. == array/object conversion == The json_object function converts a tuple to a JSON object. If there are duplicate column names, there will be duplicate keys in the resulting JSON object. json_object([content [AS name] [, ...]]) returns json Likewise, the json_array function converts a tuple to a JSON array. Column names are ignored.
json_array([content [AS name] [, ...]]) returns json The json_agg function reduces a set of JSON values to a single array containing those values. aggregate json_agg(json) returns json json_object and json_agg can be used together to convert an entire result set to one JSON array: SELECT json_agg(json_object(*)) FROM tablename; json_keys gets the keys of a JSON object as a set. json_keys(json) returns setof text json_values gets the values of a JSON object or the items of a JSON array as a set. json_values(json) returns setof json Note that all JSON slicing and splicing operations retain the original formatting of JSON content. == Miscellaneous == The features below would be nice, but will probably not be regarded as required for this Google Summer of Code project to be considered complete. json_cleanup accepts a superset of JSON and, if it can, cleans it up and returns a valid JSON string. This superset of JSON supports the following extra features: * Comments: - Single-line comments with // and # - C-style comments: /* comment */ * Unquoted object keys: {key: value} * Single quote strings: 'single quotes; double quotes do not need to be escaped here'
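[Editor's note] As a purely illustrative companion to the json_type spec above: once input is validated, the run-time tag can be read off the first significant character. None of these names come from the actual patch, and the ASCII-only whitespace test deliberately mirrors the parser's stated rationale for avoiding isspace():

```c
#include <assert.h>

/*
 * Sketch of the run-time tag a json_type() function could report,
 * keyed off the first significant character of an already-validated
 * JSON value.  Uses an ASCII-only whitespace test on purpose, matching
 * the parser's reason for not using the locale-dependent ctype.h
 * functions.  All names here are illustrative.
 */
typedef enum
{
	JSON_NULL, JSON_STRING, JSON_NUMBER,
	JSON_BOOL, JSON_OBJECT, JSON_ARRAY, JSON_INVALID
} json_type;

static json_type
json_type_of(const char *s)
{
	while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
		s++;
	switch (*s)
	{
		case 'n':	return JSON_NULL;	/* null */
		case '"':	return JSON_STRING;
		case 't':
		case 'f':	return JSON_BOOL;	/* true / false */
		case '{':	return JSON_OBJECT;
		case '[':	return JSON_ARRAY;
		default:
			return (*s == '-' || (*s >= '0' && *s <= '9'))
				? JSON_NUMBER : JSON_INVALID;
	}
}
```

This only works because the datatype's input function has already validated the text; on arbitrary input a full parse would be required.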
[HACKERS] SELECT * in a CREATE VIEW statement doesn't update column set automatically
This isn't exactly a bug, but it could be considered unintuitive behavior. Consider this: CREATE TABLE foo (a INT); CREATE VIEW foo_v AS SELECT * FROM foo; ALTER TABLE foo ADD COLUMN b INT; The ALTER TABLE statement affects table foo, but the column addition does not propagate to VIEW foo_v. Thus, it makes this deceptive: ... AS SELECT * FROM foo; I ran into this with an application where a real table is accessed if the user is an admin, while regular users access a view instead. I considered AS SELECT * FROM foo to be a promise that all columns from foo would be included in the view, but the promise is broken when ADD COLUMN is applied later on. Would it be a desirable feature to make `CREATE VIEW foo_v AS SELECT * FROM foo;` automatically update the column set when foo's columns change? Instead of the wildcard * being expanded once at CREATE VIEW time, it would (semantically) be expanded every time foo_v is selected on.
Re: [HACKERS] Virtual Private Database
On Sat, Apr 10, 2010 at 10:00 AM, Jean-Gérard Pailloncy j...@rilk.com wrote: Hello, 1) VPD: Virtual Private Database I would appreciate having a new feature in PostgreSQL. This is an Oracle-like feature that implements Row Level Security. This feature may be emulated using VIEW/RULE, but that is very time consuming and error prone. I would appreciate an estimate of the feasibility and the cost of implementing it. 2) Description The feature may be implemented with a simple expression associated with the table. ALTER TABLE table_name ADD FILTER filter_name CHECK(expression); ALTER TABLE table_name DROP FILTER filter_name; Usage/example: ALTER TABLE filtered_table ADD FILTER tf_username CHECK(filtered_table.creator=user); SELECT * FROM filtered_table; will really do SELECT * FROM filtered_table WHERE filtered_table.creator=user; Same thing for INSERT, UPDATE, and DELETE. UPDATE filtered_table SET b_column=1 WHERE a_column='a'; will really do UPDATE filtered_table SET b_column=1 WHERE a_column='a' AND filtered_table.creator=user; In practice, the devs will create a few functions: my_login, my_logout, my_filter, and the simple filtered_table.creator=user will be replaced by ACLs encapsulated in the function my_filter, plus a trigger to check data on INSERT and UPDATE. We could use Veil to build a very efficient filter. 3) Questions - Is it doable? - Is it the sound way of doing it? - Is it possible to have it in core? - Is there a pgsql dev interested in implementing it? - Are there other people interested in such a feature? - How much will this cost? - With what delay? Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers This is quite similar to an idea I posted about not long ago called access control jails ( http://archives.postgresql.org/pgsql-hackers/2010-03/msg00832.php ). 
I encountered this very problem writing a simple web application involving access control. There are a variety of ways to make implementing access control easier, and I think copying off of Oracle would be among the best ;-) Disclaimer: I am not a PostgreSQL hacker, but a newbie with some experience in other communities, absorbing what he can. Access control jailing, as I conceived it, would not simply filter per table, but would make it so all queries would be filtered. If used correctly, it would even be safe to execute untrusted SQL (though it might not be advisable). I looked at the Veil demo application a tiny bit, and the only thing I drew from it was the impression that it makes things more complicated, not less :( Then again, it may just be the example demonstrating a lot of different features at once. One problem that ought to be addressed for any of these ideas is how to do connection-local variables. For instance: UPDATE filtered_table SET b_column=1 WHERE a_column='a'; will really do UPDATE filtered_table SET b_column=1 WHERE a_column='a' AND filtered_table.creator=user; Here, what is =user referring to? I suppose it is a variable that is set not long after the session starts and only applies to that session? PostgreSQL has temporary tables and such, but you can't reference them until they're already created. Hence, I don't think PostgreSQL elegantly supports free variables that are bound temporarily per connection. There are GUCs and such, but using them for this purpose is far from elegant, if I understand correctly. Another problem is that session-local context doesn't go well with connection pooling, so you might need some workaround like passing context IDs back and forth. That's my own summary of the discussion about access control jails linked above. 
By the way, here's a hack to bind a free variable to a session: CREATE FUNCTION get_user_id() RETURNS INT AS $$ DECLARE ret INT; BEGIN SELECT INTO ret id FROM user_id_tbl; RETURN ret; END $$ LANGUAGE 'plpgsql'; Then, per-session: CREATE TEMPORARY TABLE user_id_tbl (id INT); INSERT INTO user_id_tbl VALUES (5); SELECT get_user_id(); It relies on plpgsql not complaining about user_id_tbl not existing at creation time. What this trick allows one to do is set the user ID once (e.g. after connecting), then views and such that call get_user_id() will have the appropriate user ID without needing to specify it per-query. I'm curious: is this trick a good idea? Does connection pooling play well with temporary tables (and thus this trick)? Could it result in substantial slowdowns (I don't see why it should, since get_user_id() needs to be called once per query that uses it)? I guess creating a temporary table every connection has the potential to be slow. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Gsoc XQuery
On Fri, Apr 9, 2010 at 5:02 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Necati Batur escribió: *Delivarables on Timeline* ... *Implementation of a GUI for XQuery for the PostgreSql users (2 week) What, you will *also* implement a GUI? Gimme a break. There's no way you can achieve most of these in 1-2 weeks. Are you Tom Lane perchance? Indeed. I've learned it's not a good idea to throw a GUI into a proposal when the rest of the project is already substantial, as it can lead to embarrassment when it doesn't get realized ;-) I did that last year. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Fri, Apr 9, 2010 at 4:08 PM, Dimitri Fontaine dfonta...@hi-media.com wrote: I guess a GSoC of reasonable size would be to define a spec for how to implement partitioning in PostgreSQL with a sound and accepted proposal on independent steps to contribute separately, in order to reach the full implementation in an incremental fashion and by different hackers. Then you could pick up one of those items. By then I mean after the summary and the plan both have been accepted by core people and by contributors who said in the past they wanted to spend precious hours on the topic. But I don't know if a GSoC can be completed without even coding. According to the link below, GSoC proposals for documentation aren't accepted. This probably extends to other non-coding work as well. http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#doc_proposals -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Petr Jelinek pjmo...@pjmodos.net writes: Dne 6.4.2010 7:57, Joseph Adams napsal(a): To me, the most logical approach is to do the obvious thing: make JSON's 'null' be SQL's NULL. For instance, SELECTing on a table with NULLs in it and converting the result set to JSON would yield a structure with 'null's in it. 'null'::JSON would yield NULL. I'm not sure what startling results would come of this approach, but I'm guessing this would be most intuitive and useful. +1 I think it's a pretty bad idea for 'null'::JSON to yield NULL. AFAIR there is no other standard datatype for which the input converter can yield NULL from a non-null input string, and I'm not even sure that the InputFunctionCall protocol allows it. (In fact a quick look indicates that it doesn't...) To me, what this throws into question is not so much whether JSON null should equate to SQL NULL (it should), but whether it's sane to accept atomic values. If I understood the beginning of this discussion, that's not strictly legal. I think it would be better for strict input mode to reject this, and permissive mode to convert it to a non-atomic value. Thus jsonify('null') wouldn't yield NULL but a structure containing a null. regards, tom lane Actually, I kind of made a zany mistake here. If 'null'::JSON yielded NULL, that would mean some type of automatic conversion was going on. Likewise, '3.14159'::JSON shouldn't magically turn into a FLOAT. I think the JSON datatype should behave more like TEXT. 'null'::JSON would yield a JSON fragment containing 'null'. 'null'::JSON::TEXT would yield the literal text 'null'. However, '3.14159'::JSON::FLOAT should probably not be allowed as a precaution, because '"hello"'::JSON::TEXT would yield '"hello"' (quotation marks included), not 'hello'. In other words, casting to the target type directly isn't the same as parsing JSON and extracting a value. Perhaps there could be conversion functions. 
E.g.: json_to_string('"hello"') yields 'hello' json_to_number('3.14159') yields '3.14159' as text (it is up to the user to cast it to the number type s/he wants) json_to_bool('true') yields TRUE json_to_null('null') yields NULL, json_to_null('nonsense') fails string_to_json('hello') yields '"hello"' as JSON number_to_json(3.14159) yields '3.14159' as JSON bool_to_json(TRUE) yields 'true' as JSON null_to_json(NULL) yields 'null' as JSON (kinda useless) I wonder if these could all be reduced to two generic functions, like json_to_value and value_to_json. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
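A generic json_to_value could start by classifying the fragment before dispatching to the right conversion. A hypothetical C sketch (the names are mine, and a real version would validate the entire token, not just its first characters):

```c
#include <string.h>

typedef enum
{
    JV_STRING, JV_NUMBER, JV_BOOL, JV_NULL, JV_COMPOUND, JV_INVALID
} jv_type;

/*
 * Classify a JSON fragment by its first non-space character.
 * JV_COMPOUND means an object or array; everything else is atomic.
 */
static jv_type jv_classify(const char *s)
{
    while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        s++;                            /* skip leading JSON whitespace */

    switch (*s)
    {
        case '"':  return JV_STRING;
        case '{':
        case '[':  return JV_COMPOUND;
        case 't':  return strncmp(s, "true", 4) == 0 ? JV_BOOL : JV_INVALID;
        case 'f':  return strncmp(s, "false", 5) == 0 ? JV_BOOL : JV_INVALID;
        case 'n':  return strncmp(s, "null", 4) == 0 ? JV_NULL : JV_INVALID;
        case '-':  return JV_NUMBER;
        default:   return (*s >= '0' && *s <= '9') ? JV_NUMBER : JV_INVALID;
    }
}
```

With a tag like this, json_to_value would raise an error when, say, json_to_bool is applied to a JV_STRING fragment, instead of silently coercing it.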
Re: [HACKERS] Proposal: Add JSON support
Another JSON strictness issue: the JSON standard ( http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only be an array or object. However, my implementation currently accepts any valid value. Thus, '3', 'hello', 'true', 'false', and 'null' are all accepted by my implementation, but are not strictly JSON text. The question is: should the JSON datatype accept atomic values (those that aren't arrays or objects) as valid JSON? I tried a few other JSON implementations to see where they stand regarding atomic types as input: JSON_checker (C) does not accept them. JSON.parse() (JavaScript) accepts them. json_decode() (PHP) accepts them. However, support is currently buggy (e.g. '1' is accepted, but '1 ' is not). cJSON (C) accepts them. JSON.pm (Perl) accepts them if you specify the allow_nonref option. Otherwise, it accepts 'true' and 'false', but not 'null', a number, or a string by itself. In my opinion, we should accept an atomic value as valid JSON content. I suppose we could get away with calling it a content fragment as is done with XML without a doctype. Accepting atomic values as valid JSON would be more orthogonal, as it would be possible to have a function like this: json_values(object_or_array JSON) RETURNS SETOF JSON -- extracts values from an object or members from an array, returning them as JSON fragments. Also, should we go even further and accept key:value pairs by themselves? 'key:value'::JSON I don't think we should, because doing so would be rather zany. It would mean JSON content could be invalid in value context, as in: // JavaScript var content = key : value; I improved my JSON library. It now only accepts strict, UTF-8 encoded JSON values (that is, objects, arrays, strings, numbers, true, false, and null). It also has a json_decode_liberal() function that accepts a string, cleans it up, and passes it through the stricter json_decode(). 
json_decode_liberal() filters out comments, allows single quoted strings, and accepts a lax number format compared to strict JSON. I may add Unicode repair to it later on, but implementing that well really depends on what type of Unicode errors appear in real life, I think. http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2 My json.c is now 1161 lines long, so I can't quite call it small anymore. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
I've been wondering whether the JSON datatype should be strict or conservative. For one, there's strict JSON (following the exact specification). Then there are more liberal JSON variants: some JSON parsers support comments, some support invalid number formats (e.g. '3.' or '+5'), etc. The consensus seems to be that JSON content should be stored verbatim (it should store the exact string the client sent to it), as is done with XML. However, this notion is somewhat incompatible with "Be conservative in what you do; be liberal in what you accept from others", because we can't accept loose JSON, then spit out conservative JSON, without messing with the content. Here's my idea: the datatype should only allow strict JSON, but there should be a function that accepts a liberal format, cleans it up to make it strict JSON, and converts it to JSON. I think making strict JSON the default makes the most sense because: * Inputs to the database will most likely be coming from programs, not humans. * Output is expected to be valid JSON and work anywhere JSON should work. * Strict JSON is what more people would expect, I'd think. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
I ended up reinventing the wheel and writing another JSON library: http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2 This is a first release, and it doesn't really have a name besides json. It's very similar to cJSON, except it is (sans unknown bugs) more reliable, more correct, and cleaner (unless you hate gotos ;-) ). It has a simple test suite. It is not prone to stack overflows, as it doesn't recurse. It is strict, requires input to be UTF-8 (it validates it first) and only outputs UTF-8. Other than treating numbers liberally, my implementation only accepts valid JSON code (it doesn't try to correct anything, even Unicode problems). It is under the MIT license. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Mon, Mar 29, 2010 at 2:23 PM, David E. Wheeler da...@kineticode.com wrote: On Mar 29, 2010, at 9:02 AM, Tom Lane wrote: If this is going to end up being one fairly small C file implementing a spec that is not a moving target, I'd vote against depending on an external library instead, no matter how spiffy and license-compatible the external library might be. Perhaps you could fork one, in that case. Best, David I'm considering using and adapting cJSON instead of continuing with my redundant implementation. I could run `indent -kr -i4` on it (will that match PostgreSQL's coding style?), add support for UTF-16 surrogate pairs (a pair of \u escapes for each character above U+FFFF, as required by the JSON spec), and add a switch to turn on/off pure ASCII output. P.S.: Sorry for the repeat, David. I forgot to CC the mailing list. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
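For reference, the surrogate-pair arithmetic itself is tiny. A hedged sketch (my own helper names, not cJSON's) of decoding two \u escapes such as "\uD834\uDD1E" into one code point, and splitting a code point back into a pair:

```c
/*
 * Combine a UTF-16 surrogate pair into a single code point.
 * hi must be in [0xD800, 0xDBFF], lo in [0xDC00, 0xDFFF];
 * a real parser must reject anything else.
 */
static unsigned int combine_surrogates(unsigned int hi, unsigned int lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Split a code point above U+FFFF into the surrogate pair
 * needed to emit it as two \u escapes.
 */
static void split_surrogates(unsigned int cp, unsigned int *hi, unsigned int *lo)
{
    cp -= 0x10000;
    *hi = 0xD800 + (cp >> 10);
    *lo = 0xDC00 + (cp & 0x3FF);
}
```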
[HACKERS] Proposal: Add JSON support
I introduced myself in the thread Proposal: access control jails (and introduction as aspiring GSoC student), and we discussed jails and session-local variables. But, as Robert Haas suggested, implementing variable support in the backend would probably be way too ambitious a project for a newbie like me. I decided instead to pursue the task of adding JSON support to PostgreSQL, hence the new thread. I plan to reference datatype-xml.html and functions-xml.html in some design decisions, but there are some things that apply to XML that don't apply to JSON and vice versa. For instance, jsoncomment wouldn't make sense because (standard) JSON doesn't have comments. For access, we might have something like json_get('foo[1].bar') and json_set('foo[1].bar', 'hello'). jsonforest and jsonagg would be beautiful. For mapping, jsonforest/jsonagg could be used to build a JSON string from a result set (SELECT jsonagg(jsonforest(col1, col2, ...)) FROM tbl), but I'm not sure on the best way to go the other way around (generate a result set from JSON). CSS-style selectors would be cool, but selecting is what SQL is all about, and I'm not sure having a json_select(dom-element[key=value]) function is a good, orthogonal approach. I'm wondering whether the internal representation of JSON should be plain JSON text, or some binary code that's easier to traverse and whatnot. For the sake of code size, just keeping it in text is probably best. Now my thoughts and opinions on the JSON parsing/unparsing itself: It should be built-in, rather than relying on an external library (like XML does). Priorities of the JSON implementation, in descending order, are: * Small * Correct * Fast Moreover, JSON operations shall not crash due to stack overflows. I'm thinking Bison/Flex is overkill for parsing JSON (I haven't seen any JSON implementations out there that use it anyway). I would probably end up writing the JSON parser/serializer manually. It should not take more than a week. 
As far as character encodings, I'd rather keep that out of the JSON parsing/serializing code itself and assume UTF-8. Wherever I'm wrong, I'll just throw encode/decode/validate operations at it. Thoughts? Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Now my thoughts and opinions on the JSON parsing/unparsing itself: It should be built-in, rather than relying on an external library (like XML does). Why? I'm not saying you aren't right, but you need to make an argument rather than an assertion. This is a community, so no one is entitled to decide anything unilaterally, and people want to be convinced - including me. I apologize; I was just starting the conversation with some of my ideas to receive feedback. I didn't want people to have to wade through too many "I think"s. I'll be sure to use opinion tags in the future :-) My reasoning for "It should be built-in" is: * It would be nice to have a built-in serialization format that's available by default. * It might be a little faster because it doesn't have to link to an external library. * The code to interface between JSON logic and PostgreSQL will probably be much larger than the actual JSON encoding/decoding itself. * The externally-maintained and packaged libjson implementations I saw brought in lots of dependencies (e.g. glib). * Everyone else (e.g. PHP) uses a statically-linked JSON implementation. Is the code in question *really* small? Well, not really, but it's not enormous either. By the way, I found a bug in PHP's JSON_parser (json_decode("true ") /* with a space */ returns null instead of true). I'll have to get around to reporting that. Now, assuming JSON support is built-in to PostgreSQL and is enabled by default, it is my opinion that encoding issues should not be dealt with in the JSON code itself, but that the JSON code itself should assume UTF-8. I think conversions should be done to/from UTF-8 before passing it through the JSON code because this would likely be the smallest way to implement it (not necessarily the fastest, though). 
Mike Rylander pointed out something wonderful, and that is that JSON code can be stored in plain old ASCII using \u escapes. If a target encoding supports all of Unicode, the JSON serializer could be told not to generate \u escapes. Otherwise, the \u escapes would be necessary. Thus, here's an example of how (in my opinion) character sets and such should be handled in the JSON code. Suppose the client's encoding is UTF-16, and the server's encoding is Latin-1.

When JSON is stored to the database:
1. The client is responsible and sends a valid UTF-16 JSON string.
2. PostgreSQL checks to make sure it is valid UTF-16, then converts it to UTF-8.
3. The JSON code parses it (to ensure it's valid).
4. The JSON code unparses it (to get a representation without needless whitespace). It is given a flag indicating it should only output ASCII text.
5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
1. ASCII is retrieved from the server.
2. If the user needs to extract one or more fields, the JSON is parsed, and the fields are extracted.
3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

Note that I am being biased toward optimizing code size rather than speed. Here's a question about semantics: should converting JSON to text guarantee that Unicode will be \u escaped, or should it render actual Unicode whenever possible (when the client uses a Unicode-complete charset)? As for reinventing the wheel, I'm in the process of writing yet another JSON implementation simply because I didn't find the other ones I looked at palatable. I am aiming for simple code, not fast code. I am using malloc for structures and realloc for strings/arrays rather than resorting to clever buffering tricks. Of course, I'll switch it over to palloc/repalloc before migrating it to PostgreSQL. 
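The ASCII-only output flag in storage step 4 boils down to emitting \uXXXX escapes for every non-ASCII character. A minimal sketch (my own function name, assuming the code point has already been decoded from UTF-8; code points above U+FFFF become a surrogate pair, per the JSON spec):

```c
#include <stdio.h>

/*
 * Write the \u escape(s) for one code point into `out`,
 * which must hold at least 13 bytes ("\uXXXX\uXXXX" + NUL).
 */
static void append_unicode_escape(char *out, unsigned int cp)
{
    if (cp > 0xFFFF)
    {
        unsigned int v = cp - 0x10000;

        sprintf(out, "\\u%04X\\u%04X",
                0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    }
    else
        sprintf(out, "\\u%04X", cp);
}
```

Since every escape is 7-bit ASCII, the result is valid in any server encoding, which is exactly what makes the Latin-1 storage step work.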
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
On Fri, Mar 26, 2010 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote: Hmm. I'm not sure exactly what problem you're trying to solve here. I don't think this is a particularly good design for supporting variables inside the server, since, well, it doesn't actually support variables inside the server. If we just want a crude hack for allowing the appearance of session-local server-side variables, that could be implemented entirely in client code - in fact it could be done as a thin wrapper around libpq that just does textual substitution of the variables actually referenced by a particular query. That wouldn't require any modifications to core PostgreSQL at all, and it would probably perform better too since you'd not send all the unnecessary variables with every query. One problem with a textual substitution is that implicit variable use (e.g. selecting from a view) can't be substituted, at least not trivially. As for sending unnecessary variables with every query, my idea was to store those variables in a global table keyed by context ID, then just send that context ID with every query. But, I think that implementing any kind of variable support in the backend is way too ambitious a project for a first-time hacker to get done in a couple of months. I would guess that's a two-year project for a first time hacker or a one-year project for an experienced hacker (or a three week project for Tom Lane). Here are some ideas from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer to the right size for GSOC: [...] Add JSON (JavaScript Object Notation) data type [tricky part will be getting community buy-in on which JSON library to use] The JSON idea caught my eye. I guess the best approach here would be not to use an external library, but to implement it manually using flex/bison. 
Most of the work would probably revolve around converting things to/from PostgreSQL types, writing test cases, and getting it integrated; writing the parser itself should be a piece of cake. At first, I figured adding JSON support would be almost too trivial: just parse it, then you're done. After seeing that src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit more to it :) I skimmed through some JSON implementations in C, and I didn't find any using bison/flex. From the looks of it, I do like JSON_parser ( http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears to be written for speed. I think one benefit of adding JSON support is that it would provide a way to store EAV-type data with less overhead than XML (and no dependency on an external library). If this were the only goal, binary encoding would be even better. However, I suppose JSON is more popular and easier to work with in practice. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
I apologize for my silence, as I've been busy reading up more on the internals of PostgreSQL. From what I can tell, a big problem with my jails idea (as well as the variables Robert described) is that there really isn't a way to store context in the backend specifically for the end client (e.g. a PHP script) due to connection pooling. Also, I almost feel that storing such context would be a disadvantage, as it would harm some of the referential transparency that pooling and caching take advantage of, now and in the future. However, I'm not going to give up :) Perhaps we could have some sort of LET statement that allows the client to pass data to the server, then have libpq automatically wrap queries with the LET statement (when necessary). Here's what it would look like to the PHP scripter: // New libpq function pg_set('current_user', 'bob'); $result = pg_query_params( 'SELECT answer FROM secrets WHERE user=current_user AND question=$1', array('Birth place')); What this really does is something like: $result = pg_query_params( 'LET current_user=$1 DO $2 $3', array( 'bob', 'SELECT answer FROM secrets WHERE user=current_user AND question=$1', 'Birth place')); Here, the hypothetical LET statement executes a query string, binding current_user to our desired value. The client library would wrap all future queries in this fashion. Granted, it would be silly to pass the value itself to the server over and over, so a serious implementation would probably pass a context ID, and these variable assignments would live in the backend instead. Moreover, LET is a terrible keyword choice here, considering most PostgreSQL users won't need to use it explicitly thanks to additional libpq support. Alternatively (this might require changing the client/server protocol), a context ID could be passed back and forth, thus providing a way to tell clients apart. Implementing this idea requires adding to the backend and to libpq. The backend would need at least two new statements. 
One would set a variable of a session context, creating one if necessary and returning its ID. Another would execute a string as a parameter and bind both immediate arguments and session context to it. libpq would need a function to set a variable, and it would need to wrap queries it sends out with LET statements if necessary. Note that these variables can't be used in pre-defined functions unless they are somehow declared in advance. One idea would be to first add global variable support, then make session-local contexts be able to temporarily reassign those variables. Another would be to provide an explicit declaration statement. Would this make a good proposal for GSoC?: Implement the backend part of my proposal, and create a proof-of-concept wrapper demonstrating it. This way, I add the new statements, but don't mess around with existing functionality too much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
Hello, I'm Joey Adams, and I'm interested in applying for Google Summer of Code to work on PostgreSQL. I'm a former GSoC student (I worked on CCAN last year), and a strong C programmer, though I am still new to working with large, established communities. I apologize if this is the wrong place to send GSoC student introductions. My proposal is bold, though I believe it can be trimmed down and refined into something very useful, yet simple enough to implement in 3 months by a newcomer to the PostgreSQL code base. I propose adding application-level access control to PostgreSQL via a jails concept. In a nutshell, a jail is created as part of the database definition (typically exposing a free variable for the current user). When a jail is activated for a session, the only accesses allowed are those indicated in the jail itself. A jail cannot be exited without closing the session. If used properly, jails make it possible to safely execute untrusted SQL code (though one may not want to, citing the principle of least privilege). For example, suppose we have the following database definition for a trivial discussion board system: CREATE TABLE users ( id SERIAL PRIMARY KEY, name TEXT, password TEXT, email TEXT, email_public BOOLEAN ); CREATE TABLE posts ( id SERIAL PRIMARY KEY, title TEXT, user INT REFERENCES users(id), content TEXT, published BOOLEAN ); CREATE TABLE tags ( post INT REFERENCES posts(id), tag TEXT ); Suppose our discussion board is written in PHP. It authenticates the user via cookies, after which it has an ID for the current user. One way to arbitrate access is to code it into the PHP using queries like this: pg_query_params('SELECT users.email FROM users WHERE id=$1 AND (email_public=TRUE OR id=$2)', array($_GET['user'], $current_user)); Simple enough. However, this access control check has to be done for every query, and it can get rather complex as more tables are involved. 
Views are a way to alleviate this complexity, but: * Views only apply to SELECTs * Queries still have to specify the current user (unless one uses a global variable (which requires a bit of a hack, if I'm not mistaken)) My proposal would make it possible to have a statement in the database definition much like this: CREATE JAIL jail ( SELECT id, name FROM users; SELECT email FROM users WHERE id=current_user OR email_public=TRUE; SELECT * FROM posts WHERE user=current_user OR published=TRUE; UPDATE posts WHERE user=current_user SET title, content, published; INSERT INTO posts WHERE user=current_user; DELETE FROM posts WHERE user=current_user; SELECT * FROM tags, posts WHERE tags.post=posts.id; INSERT INTO tags WHERE post IN ( SELECT id FROM posts WHERE user=current_user); DELETE FROM tags WHERE post IN ( SELECT id FROM posts WHERE user=current_user); ); Inside of the jail definition is a series of pseudo-statements that indicate the space of queries the user can perform. Simply creating a jail does not make it go into effect. A jail is activated using another query, and it remains in effect for the remainder of the session. It cannot be deactivated through the protocol, as doing so would constitute a privilege escalation. Example of a PHP script invoking a jail: <?php pg_connect('user=foo password=spiderman') or die('Database error occurred'); /* authentication-fu */ // $current_user now contains the id of the logged-in user. pg_query_params('USE JAIL jail WITH current_user=$1', array($current_user)); /* Remaining queries in this session will only be able to SELECT, UPDATE, INSERT, and DELETE on users, posts, and tags, following the rules set forth by the jail. */ ?> I came up with this idea while working on a small software-as-a-service application using PostgreSQL. I needed to implement access control in the PHP frontend. I ended up creating a database abstraction class to protect the rest of the frontend code from SQL messiness. 
For every database query I needed, I wrote or extended a PHP function. My database access class for my small application exploded to over 1000 lines (granted, a lot of it was comments). Although having accessor functions can make the code easier to read and maintain, it can result in a lot of redundancy. Why not just use the database abstraction library PostgreSQL gives you: SQL? Jails seem to me like a revolutionary addition to SQL, and implementing them fully could be quite involved. However, my guess is that by restricting what may be done in a jail simply to SELECT, UPDATE, INSERT and DELETE, an initial jail implementation