Re: Initial Review: JSON contrib module (was: Re: [HACKERS] Another swing at JSON)
Thanks for reviewing my patch!

On Mon, Jul 4, 2011 at 7:10 AM, Bernd Helmle <maili...@oopsware.de> wrote:
> +comment = 'data type for storing and manipulating JSON content'
>
> I'm not sure if manipulating is a correct description. Maybe I missed it,
> but I didn't see functions to manipulate JSON strings directly, for
> example, adding an object to a JSON string, ...

Indeed. I intend to add manipulation functions in the future. The words "and manipulating" shouldn't be there yet.

> -- Coding ...
>
> It is noticeable that the parser seems to define its own is_space() and
> is_digit() functions, e.g.:
>
> +#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')
>
> Wouldn't it be more clean to use isspace() instead?

isspace() accepts '\f', '\v', and sometimes other characters as well, depending on locale. Likewise, isdigit() accepts some non-ASCII characters in addition to [0-9], depending on the locale and platform. Thus, to avoid parsing a superset of the JSON spec, I can't use the ctype.h functions. I should add a comment with this explanation.

> This code block in function json_escape_unicode() looks suspicious:
>
> + /* Convert to UTF-8, if necessary. */
> + {
> +     const char *orig = json;
> +     json = (const char *)
> +         pg_do_encoding_conversion((unsigned char *) json, length,
> +                                   GetDatabaseEncoding(), PG_UTF8);
> +     if (json != orig)
> +         length = strlen(json);
> + }
>
> Seems it *always* wants to convert the string. Isn't there an if condition
> missing?

pg_do_encoding_conversion does this check already. Perhaps I should rephrase the comment slightly:

+ /* Convert to UTF-8 (only if necessary). */

> There's a commented code fragment missing, which should be removed (see
> last two lines of the snippet below):
>
> +static unsigned int
> +read_hex16(const char *in) ...
> +     Assert(is_hex_digit(c));
> +
> +     if (c >= '0' && c <= '9')
> +         tmp = c - '0';
> +     else if (c >= 'A' && c <= 'F')
> +         tmp = c - 'A' + 10;
> +     else /* if (c >= 'a' && c <= 'f') */
> +         tmp = c - 'a' + 10;

That comment is there for documentation purposes, to indicate the range check that's skipped because we know c is a hex digit, and [a-f] is the only thing it could be (and must be) at that line. Should it still be removed?

> json.c introduces new appendStringInfo* functions, e.g.
> appendStringInfoEscape() and appendStringInfoUtf8(). Maybe it's better to
> name them to something different, since it may occur someday that the
> backend will introduce the same notion with a different meaning. They are
> static, but why not naming them into something like
> jsonAppendStringInfoUtf8() or similar?

I agree.

> -- Regression Tests ...
>
> It also tests UNICODE sequences and input encoding with other server
> encoding than UTF-8.

It tests with other *client* encodings than UTF-8, but not other server encodings than UTF-8. Is it possible to write tests for different server encodings?

-- Self-review

There's a minor bug in the normalization code:

select $$ "\u0009" $$::json;
   json
----------
 "\u0009"
(1 row)

This should produce "\t" instead.

I'm thinking that my expect_* / next_* / pop_char / ... parsing framework is overkill, and that, for a syntax as simple as JSON's, visibly messy parsing code like this:

if (s < e && (*s == 'E' || *s == 'e'))
{
    s++;
    if (s < e && (*s == '+' || *s == '-'))
        s++;
    if (!(s < e && is_digit(*s)))
        return false;
    do
        s++;
    while (s < e && is_digit(*s));
}

would be easier to maintain than the deceptively elegant-looking code currently in place:

if (optional_char_cond(s, e, *s == 'E' || *s == 'e'))
{
    optional_char_cond(s, e, *s == '+' || *s == '-');
    skip1_pred(s, e, is_digit);
}

I don't plan on gutting the current macro-driven parser just yet.
When I do, the new parser will support building a parse tree (only when needed), and the parsing functions will have signatures like this:

static bool parse_value(const char **sp, const char *e, JsonNode **out);
static bool parse_string(const char **sp, const char *e, StringInfo *out);
...

The current patch doesn't provide manipulation features where a parse tree would come in handy. I'd rather hold off on rewriting the parser until such features are added.

I'll try to submit a revised patch within the next couple days. Thanks!

- Joey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
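The locale pitfall behind the isspace()/isdigit() exchange above can be demonstrated directly. A minimal, self-contained sketch: the is_space body matches the macro quoted in the review, while is_digit and the all_json_space helper are written here purely for illustration (they are not the patch's code).

```c
#include <assert.h>
#include <stdbool.h>

/* ASCII-only classification: JSON allows exactly these four whitespace
 * characters and the digits 0-9, regardless of locale.  ctype.h's
 * isspace()/isdigit() can accept more, depending on locale/platform. */
#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')
#define is_digit(c) ((c) >= '0' && (c) <= '9')

/* Illustrative helper: is the whole string a run of JSON whitespace? */
static bool
all_json_space(const char *s)
{
    for (; *s != '\0'; s++)
        if (!is_space(*s))
            return false;
    return true;
}
```

Unlike isspace(), is_space rejects '\f' and '\v', so the parser cannot silently accept a superset of the JSON grammar.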
Re: [HACKERS] Another swing at JSON
On Fri, Jun 17, 2011 at 2:29 AM, Bernd Helmle <maili...@oopsware.de> wrote:
> Joseph, are you able to remove the compatibility code for this CF?

Done. Note that this module builds, tests, and installs successfully with USE_PGXS=1. However, building without USE_PGXS=1 produces the following:

CREATE EXTENSION json;
ERROR:  incompatible library "/usr/lib/postgresql/json.so": version mismatch
DETAIL:  Server is version 9.1, library is version 9.2.

Similar problems occur with a couple other modules I tried (hstore, intarray).

Joey

json-contrib-no-compat-20110617.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 11, 2011 at 7:53 AM, Robert Haas <robertmh...@gmail.com> wrote:
> That's likely to be how it gets implemented, but you seem to have missed
> the point of some of the discussion upthread: the big problem with that is
> that someone might type DROP TYPE foo, and when they do, you need an
> efficient way to figure out whether foo is in use inside an instance of
> the variant type anywhere in the system. The devil is in the details...

Sorry, I missed that. With that in mind, I think I would lean more toward the union proposal as well. Can anyone think of a case where VARIANT would be more useful?

As for using one or two bytes to store the type of a UNION, that creates a problem when you want to extend the union in the future. That is, it's a problem if a UNION is simply a collection of the possible types that values of the UNION type can hold. If UNION is implemented more like a tagged union:

CREATE TYPE token AS TAGGED UNION (identifier TEXT, keyword TEXT, number INT);

then the problem of altering it is much like the problem of altering an ENUM.

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
> Examples of open union types could be "number", which all the numeric
> types compose, and so you can know say that you can use the generic
> numeric operators on values you have simply if their types compose the
> number union type, and it still works if more numeric types appear later.
> Likewise, the "string" open union could include both text and blob, as
> both support catenation and substring matches or extraction, for example.
> This would aid to operator overloading in a generic way, letting you use
> the same syntax for different types, but allowing types to mix is
> optional; eg, you could support add(int,int) and add(real,real) without
> supporting add(int,real) etc but the syntax add(x,y) is shared, and you do
> this while still having a strong type system; allowing the mixing is
> optional case-by-case.
Coming from a Haskell perspective, this is a great idea, but I don't think the union feature should be used to implement it. Closed unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) => a -> a -> a

I, for one, would like to see PostgreSQL steal some features from Haskell's type system. PostgreSQL seems to implement a subset of Haskell's system, without type classes and where functions can have only one type variable (anyelement). To express the (+) example in PostgreSQL, it would be tempting to simply say:

add(real, real) returns real

However, what if each "real" is a different type (e.g. INT and FLOAT)? Is that allowed? In the Haskell example above, (+) constrains both of its arguments to the same type. In ad-hoc syntax, it would look like this in PostgreSQL:

real anyelement => add(anyelement, anyelement) returns anyelement

Another thing to consider: attempting to use a type class as a column type, e.g.:

CREATE TABLE foo (n real);

Normally in Haskell, type information is passed implicitly as parameters (hence the term "parametric polymorphism"), rather than carried alongside values (like in object-oriented languages). In the case above, the type information would have to be carried with each value. Haskell actually supports this, but under a somewhat weird extension called "existential types" (see http://www.haskell.org/haskellwiki/Existential_type#Examples for an example). It isn't terribly useful in Haskell, and I don't think it will be in PostgreSQL either.

Joey Adams
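For readers more at home in C than Haskell, the TAGGED UNION idea from the message above corresponds to an ordinary discriminated union: a small tag field says which member is live, and altering the set of alternatives means altering the tag's enumeration, much like altering an ENUM. A hypothetical sketch (names and layout invented for illustration, loosely mirroring the CREATE TYPE token example):

```c
#include <assert.h>

/* Discriminator: which alternative of the union is live.  Adding a new
 * alternative later means extending this enum, which is the same kind
 * of problem as adding a label to an ENUM type. */
typedef enum { TOK_IDENTIFIER, TOK_KEYWORD, TOK_NUMBER } TokenTag;

typedef struct
{
    TokenTag    tag;
    union
    {
        const char *text;       /* identifier or keyword */
        int         number;
    }           u;
} Token;

/* Accessing a member under the wrong tag is a type error; here the
 * check is a runtime assertion. */
static int
token_number(const Token *t)
{
    assert(t->tag == TOK_NUMBER);
    return t->u.number;
}
```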
Re: [HACKERS] VARIANT / ANYTYPE datatype
It seems to me a reasonable way to implement VARIANT would be to have a data type called VARIANT that stores the OID of the inner type at the beginning, followed by the binary data. When you say pg_typeof(somevariant), you'll get 'variant'. Instead, you'd use a function like this:

variant_typeof(VARIANT) returns REGTYPE

VARIANT could then be cast to/from other types. I think the rules for converting to/from VARIANT should be a little stronger than normal PostgreSQL casting rules. For example:

SELECT '12345'::TEXT::INT;                -- allowed
SELECT '12345'::TEXT::VARIANT::INT;       -- not allowed
SELECT '12345'::TEXT::VARIANT::TEXT::INT; -- allowed

I'm not sure how conversions to/from VARIANT could be implemented other than creating conversion functions for every type. It'd be nice if we could avoid that. A more permissive way to convert out of VARIANT might be to have a function like this:

variant_unwrap(VARIANT) returns TEXT -- user casts to desired type

I suppose the in/out functions could prefix the value with the type name and a colon:

SELECT '12345'::INT::VARIANT;
    variant
---------------
 integer:12345

The VARIANT type, or similar, would be useful for the JSON data type I've been intermittently working on, as it would allow us to create a function like this:

from_json(JSON) returns VARIANT

from_json would unwrap a JSON string/number/bool/null/array, converting it to a VARIANT whose inner type is TEXT / (INT or BIGINT or DOUBLE or NUMERIC) / BOOL / [null] / ARRAY. In the [null] case, from_json would actually return NULL (I see no need for VARIANT to wrap nulls). This is rather type-safe compared to what I currently have:

from_json(JSON) returns TEXT -- user casts to desired type

By returning VARIANT, we can then cast to the desired type, and if the cast is invalid, a type error will occur.

Joey Adams
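The proposed layout above, an inner-type OID followed by the binary data, can be sketched as a plain C struct. This is only an illustration of the idea, not PostgreSQL code: a real implementation would embed this in a varlena, and the Oid typedef and sample OID values here are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/* Sketch of the proposed on-disk layout: the inner type's OID at the
 * beginning, followed by the inner value in its binary format. */
typedef struct
{
    Oid     inner_type;         /* e.g. 23 for int4, 25 for text */
    char    data[];             /* inner value's binary representation */
} Variant;

/* What variant_typeof(VARIANT) would report: the wrapped type, since
 * pg_typeof(somevariant) itself would just say 'variant'. */
static Oid
variant_typeof(const Variant *v)
{
    return v->inner_type;
}
```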
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Tue, May 10, 2011 at 10:29 PM, Joseph Adams <joeyadams3.14...@gmail.com> wrote:
> The VARIANT type, or similar, would be useful for the JSON data type I've
> been intermittently working on, as it would allow us to create a function
> like this:
>
> from_json(JSON) returns VARIANT

This occurred to me: if PostgreSQL functions could return ANYELEMENT, then we could just say:

from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value

Likewise, the variant conversion functions (which could be invoked automatically) could be done this way:

to_variant(ANYELEMENT) returns VARIANT
from_variant(VARIANT) returns ANYELEMENT

However, I'm not familiar enough with the innards of PostgreSQL's type system to know if returning ANYELEMENT would be possible and make sense.
Re: [HACKERS] cast from integer to money
On Sun, Apr 3, 2011 at 11:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Thanks for the patch, but I think you forgot to worry about overflow:
>
> rhaas=# select 9223372036854775807::money;
>  money
> --------
>  -$1.00
> (1 row)

cash_in doesn't test for overflow, either (tested on 8.4.0, 9.0.3, and HEAD):

joey=# select '9223372036854775807'::money;
 money
--------
 -$1.00
(1 row)

Is this a bug?

Detail: unlike cash_in, numeric_cash does check for overflow (implicitly, through its use of numeric_int8).

Joey Adams
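The kind of check cash_in is missing can be shown in miniature: accumulate digits into an int64 and refuse any step that would wrap, so INT64_MAX parses but one more refuses instead of wrapping to a negative value like the -$1.00 above. This is an illustrative sketch only; the real cash_in also handles currency symbols, signs, decimal points, and locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parse a string of ASCII digits into an int64, detecting overflow
 * before it happens.  Returns false on overflow or trailing junk. */
static bool
parse_int64_checked(const char *s, int64_t *out)
{
    int64_t     value = 0;

    for (; *s >= '0' && *s <= '9'; s++)
    {
        int     digit = *s - '0';

        /* value * 10 + digit would exceed INT64_MAX? */
        if (value > (INT64_MAX - digit) / 10)
            return false;
        value = value * 10 + digit;
    }
    if (*s != '\0')
        return false;           /* trailing non-digit */
    *out = value;
    return true;
}
```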
Re: [HACKERS] cast from integer to money
Attached is an updated version of the patch to allow conversion of int4/int8 directly to money. I added overflow checks, dropped int2->cash, and updated the documentation.

- Joey

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index ecc79e2..13b888d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -886,15 +886,22 @@ ALTER SEQUENCE <replaceable class="parameter">tablename</replaceable>_<replaceab
  </para>

  <para>
-   Values of the <type>numeric</type> data type can be cast to
-   <type>money</type>. Other numeric types can be converted to
-   <type>money</type> by casting to <type>numeric</type> first, for example:
+   Values of the <type>numeric</type>, <type>int</type>, and
+   <type>bigint</type> data types can be cast to <type>money</type>.
+   Conversion from the <type>real</type> and <type>double precision</type>
+   data types can be done by casting to <type>numeric</type> first, for
+   example:
 <programlisting>
-SELECT 1234::numeric::money;
+SELECT '12.34'::float8::numeric::money;
 </programlisting>
+   However, this is not recommended. Floating point numbers should not be
+   used to handle money due to the potential for rounding errors.
+ </para>
+
+ <para>
    A <type>money</type> value can be cast to <type>numeric</type> without
    loss of precision. Conversion to other types could potentially lose
-   precision, and it must be done in two stages, for example:
+   precision, and must also be done in two stages:
 <programlisting>
 SELECT '52093.89'::money::numeric::float8;
 </programlisting>
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 67f5128..61f7370 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -92,6 +92,21 @@ num_word(Cash value)
 	return buf;
 }	/* num_word() */

+static bool
+int8mul_direct(int64 arg1, int64 arg2, int64 *result)
+{
+	*result = arg1 * arg2;
+
+	/* Overflow check.  See comment in int8mul */
+	if (arg1 != (int64) ((int32) arg1) || arg2 != (int64) ((int32) arg2))
+	{
+		if (arg2 != 0 &&
+			(*result / arg2 != arg1 || (arg2 == -1 && arg1 < 0 && *result < 0)))
+			return false;
+	}
+
+	return true;
+}

 /* cash_in()
  * Convert a string to a cash data type.
@@ -938,3 +953,67 @@ numeric_cash(PG_FUNCTION_ARGS)

 	PG_RETURN_CASH(result);
 }
+
+/* int4_cash()
+ * Convert int4 (int) to cash
+ */
+Datum
+int4_cash(PG_FUNCTION_ARGS)
+{
+	int32		amount = PG_GETARG_INT32(0);
+	Cash		result;
+	int			fpoint;
+	int64		scale;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* compute required scale factor */
+	scale = 1;
+	for (i = 0; i < fpoint; i++)
+		scale *= 10;
+
+	/* compute amount * scale, checking for overflow */
+	if (!int8mul_direct(amount, scale, &result))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("money out of range")));
+
+	PG_RETURN_CASH(result);
+}
+
+/* int8_cash()
+ * Convert int8 (bigint) to cash
+ */
+Datum
+int8_cash(PG_FUNCTION_ARGS)
+{
+	int64		amount = PG_GETARG_INT64(0);
+	Cash		result;
+	int			fpoint;
+	int64		scale;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* compute required scale factor */
+	scale = 1;
+	for (i = 0; i < fpoint; i++)
+		scale *= 10;
+
+	/* compute amount * scale, checking for overflow */
+	if (!int8mul_direct(amount, scale, &result))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("money out of range")));
+
+	PG_RETURN_CASH(result);
+}
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index bf8a6fc..f7b7b76 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -126,6 +126,8 @@
 DATA(insert ( 1700  700 1745 i f ));
 DATA(insert ( 1700  701 1746 i f ));
 DATA(insert (  790 1700 3823 a f ));
 DATA(insert ( 1700  790 3824 a f ));
+DATA(insert (   23  790 3811 a f ));
+DATA(insert (   20  790 3812 a f ));

 /* Allow explicit coercions between int4 and bool */
 DATA(insert (   23   16 2557 e f ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cff64ba..7919a40 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -971,6 +971,10 @@
 DATA(insert OID = 3823 ( numeric PGNSP PGUID 12 1 0 0 f f f t f s 1 0 1700
 DESCR("convert money to numeric");
 DATA(insert OID = 3824 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "1700" _null_ _null_ _null_ _null_ numeric_cash _null_ _null_ _null_ ));
 DESCR("convert numeric to money");
+DATA(insert OID = 3811 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "23" _null_ _null_ _null_ _null_ int4_cash _null_ _null_ _null_ ));
+DESCR("convert int4 to money");
+DATA(insert OID = 3812 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "20" _null_ _null_ _null_ _null_ int8_cash _null_ _null_ _null_
Re: [HACKERS] psql 9.1 alpha5: connection pointer is NULL
2011/4/2 Devrim GÜNDÜZ <dev...@gunduz.org>:
> I'm getting the following message after upgrading to Alpha5 on my Fedora
> 14 box:
>
> $ psql -p 5433
> psql: connection pointer is NULL
>
> which comes from libpq. Server is running, and I can connect to it via
> 9.0's psql. This is a regular RPM build. Am I doing something wrong, or?

I couldn't reproduce this (using upstream source on Ubuntu). However, I did find a little bug in libpq causing the connection handle to become NULL in the event of an option parsing error. This bug has been around since release 9.0.0, and may be unrelated to the problem.

Joey Adams

From 5bcf99b6481dd1393525ee804eeae6a04999d86d Mon Sep 17 00:00:00 2001
From: Joey Adams <joeyadams3.14...@gmail.com>
Date: Sat, 2 Apr 2011 14:27:22 -0400
Subject: [PATCH] Fixed "return false;" in PQconnectStartParams, which returns a PGconn *

---
 src/interfaces/libpq/fe-connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index a4959ee..aa24c37 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -491,7 +491,7 @@ PQconnectStartParams(const char **keywords,
 	{
 		conn->status = CONNECTION_BAD;
 		/* errorMessage is already set */
-		return false;
+		return conn;
 	}

 /*
--
1.7.1
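Why did `return false;` compile silently in a function that returns a pointer? Because `false` from stdbool.h expands to the integer constant 0, which is a null pointer constant, so the compiler sees an ordinary `return NULL;`. The sketch below reproduces the pattern with invented stand-in names (this is not libpq code; PQconnectStartParams is documented to return a PGconn * handle even on failure, so a caller can inspect its status):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int status; } FakeConn;

/* The bug pattern: `false` is 0, and 0 silently converts to a null
 * pointer, so this compiles without so much as a warning. */
static FakeConn *
connect_buggy(FakeConn *conn)
{
    conn->status = -1;          /* analogous to CONNECTION_BAD */
    return false;               /* oops: caller receives NULL */
}

/* The fix: return the handle so the caller can still read its status. */
static FakeConn *
connect_fixed(FakeConn *conn)
{
    conn->status = -1;
    return conn;
}
```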
Re: [HACKERS] cast from integer to money
On Thu, Mar 31, 2011 at 6:39 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Going just integer->money, with 1 -> $1.00, seems completely reasonable to
> me. As for being too late in the cycle.. if someone's willing to do the
> work, I can't imagine it breaking anything, so I wouldn't be against
> putting it in. It really should be before the first beta tho.

Attached is a patch which enables casting int2/int4/int8 to money, with the same scaling as numeric uses. Hence, 1::money yields '$1.00'. The only other numeric types (other than oid, cardinal_number, etc.) that can't be cast directly to money are float4 and float8, and I suspect this is intentional.

The patch includes tests, but does not update the documentation. Should the docs be updated where it reads "Values of the numeric data type can be cast to money. Other numeric types can be converted to money by casting to numeric first"?

Because this change adds rows to the pg_proc and pg_cast catalogs, applying this patch for 9.1 will require alpha users to initdb again. Is that acceptable?
Regards,
Joey Adams

From aca9723db6f8614286a0eb82517e29407b09193e Mon Sep 17 00:00:00 2001
From: Joey Adams <joeyadams3.14...@gmail.com>
Date: Fri, 1 Apr 2011 22:09:12 -0400
Subject: [PATCH] Allow cast from int2/int4/int8 to money

---
 src/backend/utils/adt/cash.c        |   75 +++
 src/include/catalog/pg_cast.h       |    3 +
 src/include/catalog/pg_proc.h       |    6 +++
 src/include/utils/cash.h            |    4 ++
 src/test/regress/expected/money.out |   73 ++
 src/test/regress/sql/money.sql      |   14 ++
 6 files changed, 175 insertions(+), 0 deletions(-)

diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 67f5128..1981123 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -938,3 +938,78 @@ numeric_cash(PG_FUNCTION_ARGS)

 	PG_RETURN_CASH(result);
 }
+
+/* int2_cash()
+ * Convert int2 (smallint) to cash
+ */
+Datum
+int2_cash(PG_FUNCTION_ARGS)
+{
+	int16		amount = PG_GETARG_INT16(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
+
+/* int4_cash()
+ * Convert int4 (int) to cash
+ */
+Datum
+int4_cash(PG_FUNCTION_ARGS)
+{
+	int32		amount = PG_GETARG_INT32(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
+
+/* int8_cash()
+ * Convert int8 (bigint) to cash
+ */
+Datum
+int8_cash(PG_FUNCTION_ARGS)
+{
+	int64		amount = PG_GETARG_INT64(0);
+	Cash		result;
+	int			fpoint;
+	int			i;
+	struct lconv *lconvert = PGLC_localeconv();
+
+	/* see comments about frac_digits in cash_in() */
+	fpoint = lconvert->frac_digits;
+	if (fpoint < 0 || fpoint > 10)
+		fpoint = 2;
+
+	/* result = amount * 10^fpoint */
+	result = amount;
+	for (i = 0; i < fpoint; i++)
+		result *= 10;
+
+	PG_RETURN_CASH(result);
+}
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index bf8a6fc..5c2bbef 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -126,6 +126,9 @@
 DATA(insert ( 1700  700 1745 i f ));
 DATA(insert ( 1700  701 1746 i f ));
 DATA(insert (  790 1700 3823 a f ));
 DATA(insert ( 1700  790 3824 a f ));
+DATA(insert (   21  790 3825 a f ));
+DATA(insert (   23  790 3811 a f ));
+DATA(insert (   20  790 3812 a f ));

 /* Allow explicit coercions between int4 and bool */
 DATA(insert (   23   16 2557 e f ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cff64ba..1f46fd9 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -971,6 +971,12 @@
 DATA(insert OID = 3823 ( numeric PGNSP PGUID 12 1 0 0 f f f t f s 1 0 1700
 DESCR("convert money to numeric");
 DATA(insert OID = 3824 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "1700" _null_ _null_ _null_ _null_ numeric_cash _null_ _null_ _null_ ));
 DESCR("convert numeric to money");
+DATA(insert OID = 3825 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "21" _null_ _null_ _null_ _null_ int2_cash _null_ _null_ _null_ ));
+DESCR("convert int2 to money");
+DATA(insert OID = 3811 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "23" _null_ _null_ _null_ _null_ int4_cash _null_ _null_ _null_ ));
+DESCR("convert int4 to money");
+DATA(insert OID = 3812 ( money PGNSP PGUID 12 1 0 0 f f f t f s 1 0 790 "20" _null_ _null_ _null_ _null_ int8_cash _null_ _null_ _null_ ));
+DESCR("convert int8 to money");
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 2:42 PM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> Also, should uninstall_json.sql be named something else, like
>> json--uninstall--0.1.sql ?
>
> You don't need no uninstall script no more, try DROP EXTENSION json; and
> DROP EXTENSION json CASCADE;

It's there for pre-9.1, where DROP EXTENSION is not available.
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 2:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Joseph Adams joeyadams3.14...@gmail.com writes: It would be nice if I could make a Makefile conditional that skips the relocatable test and loads init-pre9.1.sql if the new extension interface isn't available. Is there a Makefile variable or something I can use to do this? You can use VERSION and MAJORVERSION variables, those are defined in Makefile.global and available as soon as you did include the PGXS Makefile. The problem is, I'd have to include the PGXS Makefile before defining a parameter used by the PGXS Makefile. I could include the PGXS Makefile twice, once at the top, and again at the bottom, but that produces a bunch of ugly warnings like: ../../src/Makefile.global:418: warning: overriding commands for target `submake-libpq' Not only that, but MAJORVERSION is formatted as a decimal (like 9.1 or 8.4). Although I could use some hacky string manipulation, or compare MAJORVERSION against all prior supported versions, either approach would be needlessly error-prone. I'm thinking the pg_config utility should either make PG_VERSION_NUM (e.g. 90100) available, or it should define something indicating the presence of the new extension system. Joey Adams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another swing at JSON
On Tue, Mar 29, 2011 at 4:02 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Here's the ugly trick from ip4r, that's used by more extension: PREFIX_PGVER = $(shell echo $(VERSION) | awk -F. '{ print $$1*100+$$2 }') Thanks. I applied a minor variation of this trick to the JSON module, so now it builds/installs/tests cleanly on both REL8_4_0 and HEAD (though it won't work if you copy contrib/json into a pre-9.1 PostgreSQL source directory and type `make` without USE_PGXS=1). I also went ahead and renamed uninstall_json.sql to json--uninstall--0.1.sql (again, it's for pre-9.1 users) and removed unnecessary trailing spaces. Anything going into the PostgreSQL code base will be for 9.2, so anything else would be a separate (if somewhat related) project. I suspect the code will be a good deal cleaner if you do just the 9.2+ version and see who wants it back-patched, if anyone does :) It's a trivial matter to remove backward compatibility from contrib/json, if anybody wants me to do it. I can just remove compat.[ch], */init-pre9.1.* , remove the PREFIX_PGVER trick from the Makefile, remove a few lines in the source code, and maintain the backported json module elsewhere. It's just a matter of whether or not explicit backward-compatibility is desirable in modules shipped with releases. Joey Adams add-json-contrib-module-4.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another swing at JSON
On Mon, Mar 28, 2011 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Mar 28, 2011 at 1:21 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Attached is a patch that adds a 'json' contrib module. Although we may want a built-in JSON data type in the near future, making it a module (for the time being) has a couple advantages: Is this something you'd hope to get committed at some point, or do you plan to maintain it as an independent project? I'm hoping to get it committed at some point, perhaps as a module soon, and a built-in later. Plenty of people are still waiting for JSON data type support, for example: http://stackoverflow.com/questions/4995945/optimize-escape-json-in-postgresql-9-0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] patch: Add PGXS support to hstore's Makefile (trivial)
I discovered today that hstore's Makefile currently does not support PGXS; namely, the ability to install it like so:

USE_PGXS=1 make
sudo USE_PGXS=1 make install

Moreover, hstore is the *only* contrib module whose Makefile does not contain PGXS. The attached patch changes hstore's Makefile so it is like the others. I have tested the above commands on a copy of the hstore directory located outside of the PostgreSQL source tree, and it appears to work just fine. I also tested conventional make and make install, and it still appears to work as well.

Joey Adams

commit a7384288531ba9a235292836d6ee1e17a65a3997
Author: Joey Adams <joeyadams3.14...@gmail.com>
Date:   Sun Jan 23 21:05:03 2011 -0500

    Added PGXS support to contrib/hstore/Makefile

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e466b6f..1d533fd 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -1,9 +1,5 @@
 # contrib/hstore/Makefile

-subdir = contrib/hstore
-top_builddir = ../..
-include $(top_builddir)/src/Makefile.global
-
 MODULE_big = hstore
 OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
@@ -12,4 +8,13 @@ DATA_built = hstore.sql
 DATA = uninstall_hstore.sql
 REGRESS = hstore

+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/hstore
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+endif
Re: [HACKERS] JSON data type status?
On Thu, Jan 20, 2011 at 8:13 PM, Itagaki Takahiro <itagaki.takah...@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 09:11, Bruce Momjian <br...@momjian.us> wrote:
>> What happened to our work to add a JSON data type for PG 9.1?
>
> Nothing will happen in 9.1. I assume we are in competition status:
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php
>
> Also, if PGXN will work well, we might not have to include JSON in the
> core. We can download any JSON implementations from the site after
> installing the core server. Of course, if we will use JSON types in the
> core (EXPLAIN JSON output?), we have to include one of them.

Thanks for the update. However, I should probably take some blame as well for spending basically zero time on the JSON datatype for long stretches at a time. After someone emailed me asking how to install the JSON datatype, I fixed up my module version of it:

http://git.postgresql.org/gitweb?p=json-datatype.git

I also have a version where JSON is a core datatype (the patches I submitted to CommitFest), but it is obsolete now. The module version has all the same features, but also fixes a Unicode bug* and adds PostgreSQL 8.4 compatibility. For those who want the JSON datatype right now, I recommend using the module.

I plan to maintain and improve the JSON module (repository linked above), albeit at a snail's pace. Patches are certainly welcome. See roadmap.markdown in the repository for some minor and not-so-minor design decisions.

Joey Adams

* P.S. The Unicode bug involved UTF-16 surrogate pair calculation. JSON encodes Unicode characters that aren't in the Basic Multilingual Plane the same way UTF-16 does: using two 4-digit hex codes.
The correct formula for determining the Unicode codepoint of a surrogate pair is:

unicode = 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF));

where:

0xD800 <= uc <= 0xDBFF
0xDC00 <= lc <= 0xDFFF

For example, consider the JSON string "\uD835\uDD0D":

uc = 0xD835
lc = 0xDD0D
unicode = 0x1D50D

This pair of hex indices collapses into one Unicode codepoint, 0x1D50D (or 픍). I originally used this:

/* WRONG */
unicode = 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF);

The problem is, ((uc & 0x3FF) << 10) | (lc & 0x3FF) has a range of 0 to 0xFFFFF, and in some of those numbers, `0x10000 |` adds zero.

I am sharing all this because I have made this mistake twice, and I believe it is an easy mistake to make.
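The formulas above can be packaged as a pair of small functions, one correct and one reproducing the bug, so the difference is easy to check. The function names are invented for illustration; the arithmetic is exactly the formulas from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Correct surrogate-pair decoding: uc is the high (leading) surrogate
 * in 0xD800..0xDBFF, lc the low (trailing) surrogate in 0xDC00..0xDFFF. */
static uint32_t
surrogate_pair_to_codepoint(uint32_t uc, uint32_t lc)
{
    assert(uc >= 0xD800 && uc <= 0xDBFF);
    assert(lc >= 0xDC00 && lc <= 0xDFFF);
    return 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF));
}

/* The buggy version: | instead of +.  When the 20-bit payload already
 * has the 0x10000 bit set (codepoints U+20000 and up), OR-ing in
 * 0x10000 adds nothing and the result is wrong. */
static uint32_t
surrogate_pair_buggy(uint32_t uc, uint32_t lc)
{
    return 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF);
}
```

For "\uD835\uDD0D" both versions happen to agree (the payload is below 0x10000); the pair \uD840\uDC00, which should decode to U+20000, is where the buggy version diverges.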
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote: Perhaps we should enumerate the attributes of what would make a good binary encoding? Not sure if we're discussing the internal storage format or the binary send/recv format, but in my humble opinion, some attributes of a good internal format are: 1. Lightweight - it'd be really nice for the JSON datatype to be available in core (even if extra features like JSONPath aren't). 2. Efficiency - Retrieval and storage of JSON datums should be efficient. The internal format should probably closely resemble the binary send/recv format so there's a good reason to use it. A good attribute of the binary send/recv format would be compatibility. For instance, if MongoDB (which I know very little about) has binary send/receive, perhaps the JSON data type's binary send/receive should use it. Efficient retrieval and update of values in a large JSON tree would be cool, but would be rather complex, and IMHO, overkill. JSON's main advantage is that it's sort of a least common denominator of the type systems of many popular languages, making it easy to transfer information between them. Having hierarchical key/value store support would be pretty cool, but I don't think it's what PostgreSQL's JSON data type should do. On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote: I think we should take a few steps back and ask why we think that binary encoding is the way to go. We store XML as text, for example, and I can't remember any complaints about that on -bugs or -performance, so why do we think JSON will be different? Binary encoding is a trade-off. A well-designed binary encoding should make it quicker to extract a small chunk of a large JSON object and return it; however, it will also make it slower to return the whole object (because you're adding serialization overhead). I haven't seen any analysis of which of those use cases is more important and why. 
Speculation: the overhead involved with retrieving/sending and receiving/storing JSON (not to mention TOAST compression/decompression) will be far greater than that of serializing/unserializing.
Re: [HACKERS] Basic JSON support
On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: Here's one thing I'm worried about: the bison/flex code in your patch looks rather similar to the code in http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the GPL. In particular, the incorrect number regex I discussed above can also be found in jsonval verbatim. However, because there are a lot of differences in both the bison and flex code now, I'm not sure they're close enough to be copied, but I am not a lawyer. It might be a good idea to contact Ben Spencer and ask him for permission to license our modified version of the code under PostgreSQL's more relaxed license, just to be on the safe side. With the help and motivation of David Fetter, I sent an e-mail to Ben Spencer (google jsonval, check out the git repo, and look in the git log to find his e-mail address) a few minutes ago, asking if he would license jsonval under a license compatible with PostgreSQL, and am currently awaiting a response. If he doesn't respond, or outright refuses (which I, for one, doubt will happen), my fallback plan is to rewrite the JSON validation code by drawing from my original code (meaning it won't be in bison/flex) and post a patch for it. Unfortunately, it seems to me that there aren't very many ways of expressing a JSON parser in bison/flex, and thus the idea of JSON parsing with bison/flex is effectively locked down by the GPL unless we can get a more permissive license for jsonval. But, I am not a lawyer.
Re: [HACKERS] Basic JSON support
On Mon, Oct 4, 2010 at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. Joseph seems to be confusing copyrights with patents. The idea of parse JSON with bison/flex is not patentable by any stretch of the imagination. What I meant is, anyone who sets out to write a JSON parser with bison/flex is probably going to end up with something very similar to jsonval even without looking at it. But having said that, I wonder whether bison/flex are really the best tool for the job in the first place. From what I understand of JSON (which admittedly ain't much) a bison parser seems like overkill: it'd probably be both bloated and slow compared to a simple handwritten recursive-descent parser. I agree, and also because JSON isn't a moving target. However, in my opinion, the differences in quality between a bison parser and a handwritten parser are too small to be a big deal right now. A bison parser is probably slower and bigger than a simple hand-written one, but I haven't benchmarked it, and it's probably not worth benchmarking at this point. In correctness, I suspect a bison parser and a hand-written one to be about the same. Both pass the same army of testcases I built up while writing my original JSON patch. Granted, there could be weird semantics of bison/flex that make the parser wrong in subtle ways (for instance, the . regex char doesn't match \n and EOF), but there can also be weird errors in hand-written code. Pick your poison. In safety, a handwritten recursive descent parser could stack overflow and crash the server unless the proper guards are in place (e.g. if someone tries to encode '[[...' as JSON). bison guards against this (chopping the maximum depth of JSON trees to 9997 levels or so), but I don't know if the real stack is smaller than bison thinks it is, or if bison uses its own buffer. A handwritten parser could also accept trees of any depth using a manually managed stack, but it wouldn't do much good because code that works with JSON trees (e.g. 
json_stringify) is currently recursive and has the same problem. In any case, I think limiting the depth of JSON, to something generous like 1000, is the right way to go. The RFC allows us to do that. In my opinion, the path of least resistance is the best path for now. If we use the bison parser now, then we can replace it with a hand-written one that's functionally equivalent later. Joey Adams
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Sat, Sep 18, 2010 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote: Hmm, yeah. I'd be tempted to try to keep the user's original whitespace as far as possible, but disregard it as far as equality comparison goes. However, I'm not quite sure what the right thing to do about 0 vs 0.0 is. Does the JSON spec say anything about that? I didn't find anything in the JSON spec about comparison, but in JavaScript, 0 == 0.0 and 0 === 0.0 are both true. Also, JavaScript considers two arrays or objects equal if and only if they are references to the same object, meaning [1,2,3] == [1,2,3] is false, but if you let var a = [1,2,3]; var b = a; , then a == b and a === b are both true. Hence, JavaScript can help us when deciding how to compare scalars, but as for arrays and objects, we're on our own (actually, JSON arrays could be compared lexically like PostgreSQL arrays already are; I don't think anyone would disagree with that). I cast my vote for 0 == 0.0 being true. As for whitespace preservation, I don't think we should go out of our way to keep it intact. Sure, preserving formatting for input and output makes some sense because we'd have to go out of our way to normalize it, but preserving whitespace in JSONPath tree selections (json_get) and updates (json_set) is a lot of work (that I shouldn't have done), and it doesn't really help anybody. Consider json_get on a node under 5 levels of indentation. Another thing to think about is the possibility of using a non-text format in the future (such as a binary format or even a format that is internally indexed). A binary format would likely be faster to compare (and hence faster to index). If the JSON data type preserves whitespace today, it might break assumptions of future code when it stops preserving whitespace. This should at least be documented.
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Fri, Sep 17, 2010 at 8:32 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Updated patch: the JSON code has all been moved into core, so this patch is now for a built-in data type. I have a question about the design of the JSON type. Why do we need to store the value in UTF8 encoding? It's true the RFC of JSON says the encoding SHALL be encoded in Unicode, but I don't understand why we should reject other encodings. Actually, the code in my original patch should work with any server encoding in PostgreSQL. However, internally, it operates in UTF-8 and converts to/from the server encoding when necessary. I did it this way because the JSON code needs to handle Unicode escapes like \u266B, but there is no simple and efficient way (that I know of) to convert single characters to/from the server encoding. I noticed that in your new patch, you sidestepped the encoding issue by simply storing strings in their encoded form (right?). This is nice and simple, but in the future, JSON tree conversions and updates will still need to deal with the encoding issue somehow. As I said before, I'd like to propose only 3 features in the commitfest: * TYPE json data type * text to json: FUNCTION json_parse(text) * json to text: FUNCTION json_stringify(json, whitelist, space) Although casting from JSON to TEXT does stringify it in my original patch, I think json_stringify would be much more useful. In addition to the formatting options, if the internal format of the JSON type changes and no longer preserves original formatting, then the behavior of the following would change: $$unnecessary\u0020escape $$ :: JSON :: TEXT json_stringify would be more predictable because it would re-encode the whitespace (but not the \u0020, unless we went out of our way to make it do that). 
Also, json_parse is unnecessary if you allow casting from TEXT to JSON (which my patch does), but I think having json_parse would be more intuitive for the same reason you do. Long story short: I like it :-) If you're keeping track, features from my patch not in the new code yet are: * Programmatically validating JSON ( json_validate() ) * Getting the type of a JSON value ( json_type() ) * Converting scalar values to/from JSON * Converting arrays to JSON * JSONPath JSONPath will be re-implemented on the basic functionalities in the subsequent commitfest. Do you have a plan to split your patch? Or, can I continue to develop my patch? If so, JSONPath needs to be adjusted to the new infrastructure. I think your patch is on a better footing than mine, so maybe I should start contributing to your code rather than the other way around. Before the next commitfest, I could merge the testcases from my patch in and identify parsing discrepancies (if any). Afterward, I could help merge the other features into the new JSON infrastructure. I can't compile your initial patch against the latest checkout because json_parser.h and json_scanner.h are missing. Is there a more recent patch, or could you update the patch so it compiles? I'd like to start tinkering with the new code. Thanks! Joey Adams
Re: [HACKERS] WIP: extensible enums
On Mon, Aug 23, 2010 at 1:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote: What do you need AFTER for? Seems to me that BEFORE should be enough. (You already have the unadorned syntax for adding an item after the last one, which is the corner case that BEFORE alone doesn't cover). You're right. Strictly speaking we don't need it. But it doesn't hurt much to provide it for a degree of symmetry. I'm with Alvaro: drop the AFTER variant. It provides more than one way to do the same thing, which isn't that exciting, and it's also going to make it harder to document the performance issues. Without that, you can just say ADD BEFORE will make the enum slower, but plain ADD won't (ignoring the issue of OID wraparound, which'll confuse matters in any case). But what if you want to insert an OID at the end? You can't do it if all you've got is BEFORE: CREATE TYPE colors AS ENUM ('red', 'green', 'blue'); If I want it to become ('red', 'green', 'blue', 'orange'), what am I to do?
Re: [HACKERS] JSON Patch for PostgreSQL - BSON Support?
On Sun, Aug 15, 2010 at 11:47 PM, Andrew Dunstan and...@dunslane.net wrote: If BSON is simply in effect an efficient encoding of JSON, then it's not clear to me that we would want another type at all. Rather, we might want to consider storing the data in this supposedly more efficient format, and maybe also some conversion routines. An issue is that the current JSON data type implementation preserves the original text (meaning if you say '[1,2,\u0020 ]'::JSON, it will yield '[1,2,\u0020 ]' rather than '[1,2, ]' . I haven't researched BSON much at all, but I seriously doubt that part of its spec includes handling external JSON encoding details like whitespace and superfluous escapes. Even though I spent a long time implementing it, the original text preservation feature should be dropped, in my opinion. Users tend to care more about the data inside of a JSON value rather than how it's encoded, and replacement of values can have funky consequences on indentation when working with indented JSON text (similar to how pasting in a text editor puts pasted content at the wrong indent level). By dropping original text preservation, in the future (or now), JSON could be encoded in BSON (or a more efficient format) in the database rather than in JSON-encoded text. Also, an idea would be to make json_send and json_recv (binary JSON send/receive) use BSON rather than JSON-encoded text, as sending/receiving JSON-encoded text is exactly what text send/receive do. Joey Adams
Re: [HACKERS] JSON Patch for PostgreSQL - BSON Support?
On Mon, Aug 16, 2010 at 7:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, if it's not just a binary encoding of JSON, I think we can forget about it ... certainly it won't work in the form I was visualizing. regards, tom lane I just read the spec, and BSON has a lot of bells and whistles attached (such as labeling binary data with a subtype like UUID or MD5). With a couple exceptions (see below), any JSON can be converted to BSON (but the way BSON does arrays is silly: item indices are stored as strings), but not all BSONs can be converted to JSON without losing some type details. Others already mentioned that you can't convert 2 billion byte long JSON strings to BSON. Another issue is that BSON cannot encode all JSON numbers without precision loss. JSON can hold any number matching '-'? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([Ee] [+-]? [0-9]+)? but BSON pigeonholes numeric values to either double, int32, int64, or a 12-byte MongoDB Object ID. Thus, for people who expect JSON to be able to hold arbitrary-precision numbers (which the JSON data type in my patch can), using BSON for transfer or storage will violate that expectation. Now that I know more about BSON, my opinion is that it shouldn't be used as the transfer or storage format of the JSON data type. Maybe if someone wants to do the work, BSON could be implemented as a contrib module, and functions could be provided in that module to convert to/from JSON with documented caveats. Joey Adams
Re: [HACKERS] patch: utf8_to_unicode (trivial)
On Tue, Jul 27, 2010 at 1:31 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jul 24, 2010 at 10:34 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: In src/include/mb/pg_wchar.h , there is a function unicode_to_utf8 , but no corresponding utf8_to_unicode . However, there is a static function called utf2ucs that does what utf8_to_unicode would do. I'd like this function to be available because the JSON code needs to convert UTF-8 to and from Unicode codepoints, and I'm currently using a separate UTF-8 to codepoint function for that. This patch renames utf2ucs to utf8_to_unicode and makes it public. It also fixes the version of utf2ucs in src/bin/psql/mbprint.c so that it's equivalent to the one in wchar.c . This is a patch against CVS HEAD for application. It compiles and tests successfully. Comments? Thanks, I feel obliged to respond to this since I'm supposed to be covering your GSoC project while Magnus is on vacation, but I actually know very little about this topic. What's undeniable, however, is that the coding in the two versions of utf2ucs() in the tree right now don't match. src/backend/utils/mb/wchar.c has: else if ((*c & 0xf8) == 0xf0) while src/bin/psql/mbprint.c, which is otherwise identical, has: else if ((*c & 0xf0) == 0xf0) I'm inclined to believe that your patch is right to think that the former version is correct, because it used to match the latter version until Tom Lane changed it in 2007, and I suspect he simply failed to update both copies. But I'd like someone who actually understands what this code is doing to confirm that. http://archives.postgresql.org/pgsql-committers/2007-01/msg00293.php I suspect we need to not only fix this, but back-patch it at least to 8.2, which is the first release where there are two copies of this function. I am not sure whether earlier releases need to be changed, or not. But again, someone who understands the issues better than I do needs to weigh in here. 
In terms of making this function non-static, I'm inclined to think that a better approach would be to move it to src/port. That gets rid of the need to have two copies in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company I've attached another patch that moves utf8_to_unicode to src/port per Robert Haas's suggestion. This patch itself is not quite as elegant as the first one because it puts platform-independent code that belongs in wchar.c into src/port . It also uses unsigned int instead of pg_wchar because the typedef of pg_wchar isn't available to the frontend, if I'm not mistaken. I'm not sure whether I like the old patch better or the new one. Joey Adams utf8-to-unicode-port.patch Description: Binary data
[HACKERS] patch: General purpose utility functions used by the JSON data type
I factored out the general-purpose utility functions in the JSON data type code into a patch against HEAD. I have made a few changes to them since I posted about them earlier ( http://archives.postgresql.org/pgsql-hackers/2010-08/msg00692.php ). A summary of the utility functions along with some of my own thoughts about them: getEnumLabelOids * Useful-ometer: ()---o * Rationale: There is currently no streamlined way to return a custom enum value from a PostgreSQL function written in C. This function performs a batch lookup of enum OIDs, which can then be cached with fn_extra. This should be reasonably efficient, and it's quite elegant to use. FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT * Useful-ometer: ()o * Rationale: Using fcinfo->flinfo->fn_extra takes a lot of boilerplate. These macros help cut down the boilerplate, and the comment explains what fn_extra is all about. getTypeInfo * Useful-ometer: ()---o * Rationale: The get_type_io_data six-fer function is very cumbersome to use, since one has to declare all the output variables. The getTypeInfo puts the results in a structure. It also performs the fmgr_info_cxt step, which is a step done after every usage of get_type_io_data in the PostgreSQL code. * Other thoughts: getTypeInfo also retrieves typcategory (and typispreferred), which is rather ad-hoc. This benefits the JSON code because to_json() uses the typcategory to figure out what type of JSON value to convert something to (for instance, things in category 'A' become JSON arrays). Other data types could care less about the typcategory. Should getTypeInfo leave that step out? pg_substring, pg_encoding_substring * Useful-ometer: ()---o * Rationale: The JSONPath code uses it / will use it for extracting substrings, which is probably not a very useful feature (but who am I to say that). This function could probably benefit the text_substring() function in varlena.c , but it would take a bit of work to ensure it continues to comply with standards. 
server_to_utf8, utf8_to_server, text_to_utf8_cstring, utf8_cstring_to_text, utf8_cstring_to_text_with_len * Useful-ometer: ()--o * Rationale: The JSON data type operates in UTF-8 rather than the server encoding because it needs to deal with Unicode escapes, but individual Unicode characters can't be converted to/from the server encoding simply and efficiently (as far as I know). These routines made the conversions done by the JSON data type vastly simpler, and they could simplify other data types in the future (XML does a lot of server-UTF-8 conversions too). This patch doesn't include tests. How would I go about writing them? I have made the JSON data type built-in, and I will post that patch shortly (it depends on this one). The built-in JSON data type uses all of these utility functions, and the tests for the JSON data type pass. json-util-01.patch Description: Binary data
Re: [HACKERS] patch: General purpose utility functions used by the JSON data type
On Fri, Aug 13, 2010 at 10:46 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 13, 2010 at 5:45 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: getEnumLabelOids * Useful-ometer: ()---o * Rationale: There is currently no streamlined way to return a custom enum value from a PostgreSQL function written in C. This function performs a batch lookup of enum OIDs, which can then be cached with fn_extra. This should be reasonably efficient, and it's quite elegant to use. It's possible that there might be a contrib module out there someplace that wants to do this, but it's clearly a waste of time for a core datatype. The OIDs are fixed. Just get rid of the enum altogether and use the OIDs directly wherever you would have used the enum. Then you don't need this. Indeed, a built-in JSON data type will certainly not need it. However, users may want to return enums from procedures written in C, and this function provides a way to do it. Incidentally, if we were going to accept this function, I think we'd need to add some kind of a check to throw an error if any of the labels can't be mapped onto an Oid. Otherwise, an error in this area might lead to difficult-to-find misbehavior. I agree. Perhaps an ereport(ERROR, ...) would be better, since it would not be hard for a user to cause an enum mapping error (even in a production database) by updating a stored procedure in C but not updating the corresponding enum (or vice versa, if enum labels are renamed). FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT * Useful-ometer: ()o * Rationale: Using fcinfo->flinfo->fn_extra takes a lot of boilerplate. These macros help cut down the boilerplate, and the comment explains what fn_extra is all about. I think this is not a good idea. Macros that use local variables under the covers make it hard for new contributors to read the code and understand what it does. It's only worth doing if it saves a lot of typing, and this doesn't. 
If we were going to do this, the right thing for your patch to do would be to update every instance of this coding pattern in our source base, so that people who copy those examples in the future do it the new way. But I think there's no real advantage in that: it would complicate back-patching future bug fixes for no particularly compelling reason. Fair enough. Perhaps the comment about FN_EXTRA (which explains fn_extra in more detail) could be reworded (to talk about using fcinfo->flinfo->fn_extra manually) and included in the documentation at xfunc-c.html . fn_extra currently only gets a passing mention in the discussion about set-returning functions. pg_substring, pg_encoding_substring * Useful-ometer: ()---o * Rationale: The JSONPath code uses it / will use it for extracting substrings, which is probably not a very useful feature (but who am I to say that). This function could probably benefit the text_substring() function in varlena.c , but it would take a bit of work to ensure it continues to comply with standards. I'm more positive about this idea. If you can make this generally useful, I'd encourage you to do that. On a random coding style note, I see that you have two copies of the following code, which can fairly obviously be written in a single line rather than five, assuming it's actually safe. + if (sub_end + len > e) + { + Assert(false); /* Clipped multibyte character */ + break; + } If I simply say Assert(sub_end + len <= e), the function will yield a range hanging off the edge of the input string (out of bounds). The five lines include a safeguard against that when assertion checking is off. This could happen if the input string has a clipped multibyte character at the end. Perhaps I should just drop the assertions and make it so if there's a clipped character at the end, it'll be ignored, no big deal. 
Joey Adams
Re: [HACKERS] patch: General purpose utility functions used by the JSON data type
On Fri, Aug 13, 2010 at 2:02 PM, David Fetter da...@fetter.org wrote: On Fri, Aug 13, 2010 at 01:33:06PM -0400, Robert Haas wrote: Maybe so, but it's not clear the interface that Joseph implemented is the one everyone wants... Fair enough. What's the interface now in a nutshell? Lack of nutshells might well mean the interface needs more work... Suppose we have: -- SQL -- CREATE TYPE colors AS ENUM ('red', 'green', 'blue'); -- C -- enum Colors {COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT}; In a nutshell: * Define an enum mapping like so: static EnumLabel enum_labels[COLOR_COUNT] = { {COLOR_RED, "red"}, {COLOR_GREEN, "green"}, {COLOR_BLUE, "blue"} }; * Get the OIDs of the enum labels: Oid oids[COLOR_COUNT]; getEnumLabelOids("colors", enum_labels, oids, COLOR_COUNT); * Convert C enum values to OIDs using the table: PG_RETURN_OID(oids[COLOR_BLUE]); A caller would typically cache the Oid table with fn_extra. Currently, getEnumLabelOids puts InvalidOid for labels that could not successfully be looked up. This will probably be changed to ereport(ERROR) instead. Also, I'm thinking that getEnumLabelOids should be renamed to something that's easier to remember. Maybe enum_oids or get_enum_oids. Joey Adams
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Update: I'm in the middle of cleaning up the JSON code ( http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary if you want to see the very latest ), so I haven't addressed all of the major problems with it yet. On Fri, Jul 23, 2010 at 2:34 PM, Robert Haas robertmh...@gmail.com wrote: - I was under the impression that we wanted EXPLAIN (FORMAT JSON) to return type json, but that's obviously not going to be possible if all of this is contrib. We could (a) move it all into core, (b) move the type itself and its input and output functions into core and leave the rest in contrib [or something along those lines], or (c) give up using it as the return type for EXPLAIN (FORMAT JSON). I've been developing it as a contrib module because: * I'd imagine it's easier than developing it as a built-in datatype right away (e.g. editing a .sql.in file versus editing pg_type.h ). * As a module, it has PGXS support, so people can try it out right away rather than having to recompile PostgreSQL. I, for one, think it would be great if the JSON datatype were all in core :-) However, if and how much JSON code should go into core is up for discussion. Thoughts, anyone? A particularly useful aspect of the JSON support is the ability to convert PostgreSQL arrays to JSON arrays (using to_json ), as there currently isn't any streamlined way to parse arrays in the PostgreSQL format client-side (that I know of). Joey Adams
[HACKERS] patch: utf8_to_unicode (trivial)
In src/include/mb/pg_wchar.h , there is a function unicode_to_utf8 , but no corresponding utf8_to_unicode . However, there is a static function called utf2ucs that does what utf8_to_unicode would do. I'd like this function to be available because the JSON code needs to convert UTF-8 to and from Unicode codepoints, and I'm currently using a separate UTF-8 to codepoint function for that. This patch renames utf2ucs to utf8_to_unicode and makes it public. It also fixes the version of utf2ucs in src/bin/psql/mbprint.c so that it's equivalent to the one in wchar.c . This is a patch against CVS HEAD for application. It compiles and tests successfully. Comments? Thanks, Joey Adams utf8-to-unicode.patch Description: Binary data
Re: [HACKERS] extensible enum types
On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan and...@dunslane.net wrote: Robert Haas wrote: On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.net wrote: You are just bumping up the storage cost. Part of the attraction of enums is their efficiency. What's efficient about them? Aren't we using 4 bytes to store a value that will nearly always fit in 2, if not 1? This was debated when we implemented enums. As between 1, 2 and 4 there is often not much to choose, as alignment padding makes it pretty much the same. But any of them are more efficient than storing a numeric value or the label itself. Anyway, it might well be moot. cheers andrew Something I don't understand in all this is: why can't the type of an enum be determined statically rather than stored in every single value? For instance, if we have: CREATE TYPE number AS ENUM ('one', 'two', 'three'); CREATE TYPE color AS ENUM ('red', 'green', 'blue'); PostgreSQL won't allow a comparison between two different enum types, e.g.: SELECT 'one'::number = 'red'::color; ERROR: operator does not exist: number = color However, when we say: SELECT 'one'::number = 'one'::number Couldn't enum_eq just use get_fn_expr_argtype to determine the type of enum input rather than rely on it being stored in the value (either implicitly via OID or explicitly as a word half)? Also, I can't seem to find the original debates from when enums were implemented. Does anyone have a link to that thread in the archives? Thanks. Joey Adams
[HACKERS] Should the JSON datatype be a specialization of text?
Currently, the JSON datatype (repository: http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary ) is implemented somewhat like a specialization of TEXT, like XML is. I'm beginning to question if this is the right way to go. This doesn't concern whether the JSON datatype should retain the given content verbatim (e.g. ' "string" '::JSON doesn't strip spaces) or whether it should be internally stored using varlena (the same way TEXT is stored). What I'm talking about revolves around two fundamental approaches to the API design: A. JSON as a specialization of TEXT. json('"string"')::TEXT = '"string"'. To unwrap it, you need a special function: from_json('"string"')::TEXT . B. JSON as a naked type. json('"string"')::TEXT = 'string'. To unwrap it, simply cast to the desired type. Early in the discussion of the JSON datatype proposal, we leaned in favor of approach A (see http://archives.postgresql.org/pgsql-hackers/2010-04/msg00263.php ). However, based on input I've received (mainly questions about why from_json and to_json exist), I'm beginning to think that while approach A makes more sense from an implementor's perspective, approach B makes a heck of a lot more sense to users. Although my code currently implements approach A, I am in favor of approach B. Arguments I can think of in favor of approach A (keeping JSON as a specialization of TEXT): * No surprises when casting between JSON and TEXT. If approach B is used, '"string"'::json would be '"string"', but '"string"'::json::text would be 'string'. * 'null'::json and NULL are distinct. 'null'::json is just a string containing 'null' and won't ever become NULL unless you explicitly pass it through from_json. Also, if I'm not mistaken, input functions can't yield null when given non-null input (see the above link). * For users who just want to store some JSON-encoded text in a database for a while, approach A probably makes more sense. * Is consistent with the XML datatype.
Arguments in favor of approach B (making JSON a naked data type): * Makes data more accessible. Just cast to the type you need, just like any other data type. No need to remember to_json and from_json (though these function names might be used for functions to convert JSON-formatted TEXT to/from the JSON datatype). * Is consistent with other programming languages. When you json_decode something in PHP, you get an object or an array. When you paste a JSON literal into JavaScript code, you end up with a native type, not some object you have to convert down to a native type. Notice how in these programming languages, you typically carry verbatim JSON texts around as strings, not a special string type that performs validation. * JSON was meant to be a format representing types in a programming language. It has arrays, objects, strings, true, false and null because JavaScript and many other popular scripting languages have those. * Users tend to care more about the underlying data in JSON values than the notion of JSON-formatted text (though users care about that too). If a user really wants to treat JSON like text, why not just use TEXT along with CHECK (json_validate(content)) ? Granted, it's not as fun :-) One workaround to the null problem of approach B might be to throw an error when 'null' is passed to the input function (as in, don't allow the JSON type to even hold 'null' (though null can be nested within an array/object)), and have a function for converting text to JSON that returns NULL if 'null' is given. Note that I am strongly against only allowing the JSON type to hold objects and arrays, in particular because it would break being able to extract non-compound values from JSON trees (e.g. json_get('[0,1,2,3]', '[2]') ). By the way, how hard would it be to get 'null'::JSON to yield NULL? Keep in mind there's a chance someone will standardize JSON/SQL in the future, so more may be at stake here than just PostgreSQL's codebase and users. 
Although text versus naked is a fundamental design aspect of the JSON datatype, it shouldn't be a blocker for me moving forward with the project. Most of the code in place and in the works shouldn't be affected by a transition from approach A to B.
Re: [HACKERS] hstore ==> and deprecate =>
On Fri, Jun 11, 2010 at 10:59 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: How about no operator at all? It won't be as cool to read, but consider, the arguments are text and text, not involving any hstore type at all, so whatever operator you choose is in practice blocked from everyone everywhere. No one could ever implement another key/value system that uses a similar interface. Or use the operator for anything else involving strings. Yeah, that's a good point. Maybe we should just deprecate the operator altogether. regards, tom lane To repeat an earlier question (which was in turn repeating an earlier question), would it be possible to do one of these, yielding 'key=>this, key2=>that'::hstore : hstore(key := 'this', key2 := 'that') hstore(key => 'this', key2 => 'that') hstore(row('this' AS key, 'that' AS key2)) In my opinion, it's important to find a reasonably elegant way to express hstore literals (rather than something like hstore(hstore_kvp('key', 'this'), hstore_kvp('key2', 'that')) ) because something so basic shouldn't be so difficult to work with. It'd be a bonus if there was a general form for arbitrary named parameters that new functions could opt in on (in particular, json_object :-) ). This type of function could be created by saying something like: CREATE FUNCTION hstore(NAMED) RETURNS hstore AS ... -- NAMED would be a new argtype Also, if a user needs to name a field from a variable, perhaps there should be a syntax that allows it, such as: hstore(pg_deref(var) => 'this', pg_deref(var2) => 'that') If implementing pg_deref would be problematic in general, perhaps it could initially just be supported in keys of functions like this.
Only if we allow for functions with named parameters in this fashion would I say +1 for deprecating the => operator from hstore without a replacement operator like ==> . Joey Adams
Re: [HACKERS] Working with PostgreSQL enums in C code
On Fri, May 28, 2010 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, May 28, 2010 at 12:07 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c . PG_RETURN macros shouldn't do any nontrivial processing (see the existing ones for references). Yeah, that was my first reaction too. If we don't already have one, it would be appropriate to provide a lookup enum value function (functionally about the same as enum_in, but designed to be called conveniently from C). Then, if you needed to work from a textual enum label, you'd call that function and then PG_RETURN_OID. Here is the function I wrote to look up enum values:

Oid
enum_label_to_oid(const char *typname, const char *label)
{
    Oid         enumtypoid;
    HeapTuple   tup;
    Oid         ret;

    enumtypoid = TypenameGetTypid(typname);
    Assert(OidIsValid(enumtypoid));

    tup = SearchSysCache2(ENUMTYPOIDNAME,
                          ObjectIdGetDatum(enumtypoid),
                          CStringGetDatum(label));
    Assert(HeapTupleIsValid(tup));

    ret = HeapTupleGetOid(tup);
    ReleaseSysCache(tup);

    return ret;
}

If this were added to PostgreSQL proper, what source/header files would it make sense to put it in? enum.c/builtins.h ? However, for a built-in enum type, I agree with Robert's solution of just #define-ing fixed OIDs for the values of the type. I don't know if the JSON datatype will be a contrib module or built-in yet, but if it were contrib, would it still be better to use fixed OIDs anyway? One issue with setting this precedent is that new contrib modules using enums wouldn't be compatible with older versions of PostgreSQL. One idea might be to extend CREATE TYPE name AS ENUM so OIDs can be specified explicitly, but then that could lead to OID clashes. That would be a really messy problem for users.
By the way, just curious: why can't the internal representation of an enum just be an INT starting from 0 by default, like in C? That would make a heck of a lot more sense, in my opinion. It might also allow users to do things like this in the future: CREATE TYPE number AS ENUM ('one' = 1, 'two' = 2, 'three' = 3); Joey Adams
Re: [HACKERS] functional call named notation clashes with SQL feature
On Wed, May 26, 2010 at 9:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we go with the spec's syntax I think we'd have no realistic choice except to forbid => altogether as an operator name. (And no, I'm not for that.) I suppose the most painful thing about doing that is that it would break hstore. Are there other commonly-used modules that rely on => as an operator name? There don't seem to be any other contrib modules that define => as an operator name, but I'm not sure what's out there on pgfoundry or elsewhere. The bigger issue to me is not so much hstore itself as that this is an awfully attractive operator name for anything container-ish. Wasn't the JSON-datatype proposal using => for an operator at one stage? (The current wiki page for it doesn't seem to reflect any such idea, though.) And I think I remember Oleg/Teodor proposing such an operator in conjunction with some GIN-related idea or other. In spite of the difficulties, I'm reluctant to give up on it. I always thought that the AS syntax was a crock and I'm not eager to invent another crock to replace it. Being compatible with the SQL standard and with Oracle is not to be taken lightly. Yeah, I know. Though this could end up being one of the bits of the spec that we politely decline to follow, like upper-casing identifiers. Still, it's a good idea to think again before we've set the release in stone ... regards, tom lane I didn't really consider using => for json because it would interfere with hstore (one of the signatures is text => text returns hstore, for instance). I am considering using -> as a json subscript operator (which is what hstore does) as it shouldn't interfere with hstore (as far as I know).
Here's a thought: suppose we did use the foo (name => value) syntax for naming parameters. It could still be used in a very similar way for hstore: hstore(key1 => 'value1', key2 => 'value2') One advantage here is that => wouldn't be exclusive to hstore anymore. E.g.: json(key1 => 'value1', key2 => 'value2') However, note that the left hand of => is an identifier here, whereas the left hand of hstore's current => operator is either text, text[], or hstore. If I had to choose between => and := for parameter naming, I'd go with := because it seems more SQLish to me. I wonder if the foo (name : value) syntax would be possible/desirable. Or maybe foo ({name: value}) :-) Joey Adams
Re: [HACKERS] functional call named notation clashes with SQL feature
On Fri, Jun 4, 2010 at 9:55 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: If I had to choose between => and := for parameter naming, I'd go with := because it seems more SQLish to me. On second thought, => might actually be a very intuitive syntax for defining dictionary types like hstore and json, since it matches PHP's associative array syntax, as in: hstore('key1' => 'value1', 'key2' => 'value2') -- hypothetical SQL array('key1' => 'value1', 'key2' => 'value2') // PHP That way, when people see =>, they can think dictionary whether they're in PHP or SQL. Note that this is a bit different than what I suggested earlier: hstore(key1 => 'value1', key2 => 'value2') Identifier names were used instead of literal names, which conflicts with the other approach. Also, the other approach is more flexible, as the user can generate names with expressions at runtime.
Re: [HACKERS] JSON manipulation functions
I've started implementing the JSON datatype; the repo is at http://git.postgresql.org/gitweb?p=json-datatype.git . On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 13, 2010 at 9:47 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. No, I think that's a fine idea. I tried making a function named json_type that has the same name as the type json_type. However, this doesn't work as expected: SELECT json_type('[1,2,3]'); Instead of calling json_type with '[1,2,3]' casted to JSON, it's trying to cast '[1,2,3]' to json_type. Is there a way to override this behavior, or would I be better off renaming the function? Note that if the function were renamed, the literal would implicitly be json: SELECT json_typeof('[1,2,3]'); -- works I tried this: CREATE OR REPLACE FUNCTION json_type(json) RETURNS json_type AS 'MODULE_PATHNAME','json_get_type' LANGUAGE C STRICT IMMUTABLE; CREATE CAST (json AS json_type) WITH FUNCTION json_type(json); However, json_type('[1,2,3]') still doesn't work (it doesn't infer that '[1,2,3]' should be casted to json first). I also tried each of AS ASSIGNMENT and AS IMPLICIT as well. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Working with PostgreSQL enums in C code
I encountered a situation while implementing JSON support where I needed to return an enum value from a C function. To clarify, here's the SQL:

CREATE TYPE json_type_t AS ENUM
    ('null', 'string', 'number', 'bool', 'object', 'array');

CREATE OR REPLACE FUNCTION json_type(json)
    RETURNS json_type_t
    AS 'MODULE_PATHNAME','json_get_type'
    LANGUAGE C STRICT IMMUTABLE;

I initially tried looking for another function returning an enum in the PostgreSQL source tree, but I couldn't find any. I guess this is because enums are a relatively new feature in PostgreSQL. I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c .

#define PG_RETURN_ENUM(typname, label) return enumLabelToOid(typname, label)

static Oid
enumLabelToOid(const char *typname, const char *label)
{
    Oid         enumtypoid;
    HeapTuple   tup;
    Oid         ret;

    enumtypoid = TypenameGetTypid(typname);
    Assert(OidIsValid(enumtypoid));

    tup = SearchSysCache2(ENUMTYPOIDNAME,
                          ObjectIdGetDatum(enumtypoid),
                          CStringGetDatum(label));
    Assert(HeapTupleIsValid(tup));

    ret = HeapTupleGetOid(tup);
    ReleaseSysCache(tup);

    return ret;
}

Feel free to nitpick the code above, as I'm still learning. Note that I replaced the more robust validity checks of enum_in with (quicker?) asserts, with the assumption that correct programs would only pass valid values to PG_RETURN_ENUM . The code using the method above can be found here: http://git.postgresql.org/gitweb?p=json-datatype.git;a=tree;f=contrib/json;h=1dd813da4016b31f35cb39b01c6d5f0999da672e;hb=092fa046f95580dd7906a07370ca401692a1f818 . My testcases passed, so everything seems to work. I suppose my PG_RETURN_ENUM macro is nice and simple, except for the fact that the coder has to keep an enum names table in sync with the SQL code and the C code.
However, going the other way around (PG_GETARG_ENUM) would need access to that enum names table. Hence, it'd make sense to have macros for defining this table so both PG_RETURN_ENUM and PG_GETARG_ENUM can reference it. I believe that these macros would be a useful addition to the PostgreSQL function manager API, as they would provide a decent way to receive and return custom enums from C code. Anyone agree/disagree? Joey Adams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
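[Editor's note] The shared-table idea above can be sketched without any backend APIs. Everything here is hypothetical (the struct, the names, the -1 convention are not fmgr interfaces); it only shows how one table could back both PG_GETARG_ENUM and PG_RETURN_ENUM:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the shared label table idea; none of these
 * names are real fmgr APIs.  One table drives both conversion
 * directions, so the SQL enum definition and the C code only have to
 * be kept in sync in a single place.
 */
typedef struct
{
	const char *label;		/* label as written in CREATE TYPE ... AS ENUM */
	int			cvalue;		/* value the C code works with */
} enum_label_entry;

static const enum_label_entry json_type_labels[] = {
	{"null", 0}, {"string", 1}, {"number", 2},
	{"bool", 3}, {"object", 4}, {"array", 5}
};

/*
 * Label -> C value.  A real PG_GETARG_ENUM would first map the
 * argument OID to its label via the syscache, then consult this table;
 * PG_RETURN_ENUM would walk it the other way before the OID lookup.
 */
static int
enum_label_to_cvalue(const char *label)
{
	size_t		i;

	for (i = 0; i < sizeof(json_type_labels) / sizeof(json_type_labels[0]); i++)
		if (strcmp(json_type_labels[i].label, label) == 0)
			return json_type_labels[i].cvalue;
	return -1;				/* real code would elog(ERROR, ...) instead */
}
```

A macro like PG_DEFINE_ENUM_TABLE(json_type_labels, ...) could generate this table so both macros reference it, which is the suggestion in the email above.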
Re: [HACKERS] JSON manipulation functions
I started a wiki article for brainstorming the JSON API: http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made substantial changes to the draft of the API based on discussion here and on the #postgresql IRC channel. Is it alright to use the wiki for brainstorming, or should it stay on the mailing list or go somewhere else? I'll try not to spend too much time quibbling over the specifics as I tend to do. While the brainstorming is going on, I plan to start implementing the datatype by itself so I can establish an initial working codebase.
Re: [HACKERS] JSON manipulation functions
Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. I'll try to explain it in one big nutshell: Instead of, for instance, json_to_number('5') and number_to_json(5), I propose changing it to from_json('5')::INT and to_json(5). Note how from_json simply returns TEXT containing the underlying value for the user to cast. I plan to make calling to_json/from_json with arrays or objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw an error for now, as implementing all the specifics of this could be quite distracting. If I'm not mistaken, json_object([content [AS name] [, ...]] | *) RETURNS json can't be implemented without augmenting the grammar (as was done with xmlforest), so I considered making it take a RECORD parameter like the hstore(RECORD) function does, as was suggested on IRC. However, this may be inadequate for selecting some columns but not others. Using examples from hstore: SELECT hstore(foo) FROM foo; => '"e"=>"2.71828", "pi"=>"3.14159"' -- this works, but what if we only want one field? SELECT hstore(pi) FROM foo; -- function type error SELECT hstore(row(pi)) FROM foo; => '"f1"=>"3.14159"' -- field name is lost SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => '"f1"=>"3.14159"' -- ugly, and field name is *still* lost To get (and set, which I overlooked before), use json_get and json_set. These take JSONPath expressions, but I don't plan to implement all sorts of fancy features during the summer.
However, I do plan to support some kind of parameter substitution so you can do this: json_get('[0,1,4,9,16,25]', '[%]' %% 2) => '4'::TEXT For this use case, though, it would be simpler to say: '[0,1,4,9,16,25]'::JSON -> 2
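[Editor's note] The proposed %% substitution was never specified beyond the example above; the following is an illustrative guess at its mechanics (the function name and '%' placeholder convention are assumptions, not part of the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative guess at the proposed '%%' parameter substitution:
 * splice an integer subscript into a JSONPath-style template wherever
 * '%' appears.  The operator and its semantics are hypothetical.
 */
static void
json_path_subst(char *out, size_t outlen, const char *tmpl, int index)
{
	size_t		o = 0;
	const char *p;

	for (p = tmpl; *p != '\0' && o + 13 < outlen; p++)
	{
		if (*p == '%')
			o += (size_t) snprintf(out + o, outlen - o, "%d", index);
		else
			out[o++] = *p;
	}
	out[o] = '\0';
}
```

A real implementation would also need to quote string parameters so values can't inject path syntax, which is exactly why substitution belongs in the operator rather than in user-side string concatenation.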
Re: [HACKERS] Fwd: Hiding data in postgresql
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, (I tried the question in another forum first) Does someone have any ideas how I can hide data without the meta data noticing? To explain further, I would like to save some collection of data where the meta-data does not see it. I am trying to do some security through obscurity. It is for research purposes. For example, populate a table with 1000 rows, but the meta-data only knows of about 500 of them? Only on an export of a dump can you find all the data again. Or maybe to make a hidden duplicate schema that can point to the hidden data? Does someone have any good ideas on how to achieve this or something similar? Kind regards Hector On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, does ANYONE have any tips on hiding data on a database server? This means that data is stored in places that is not necessarily picked up in the schema of the database. I am doing some research on databases and need some direction. Any help or direction will be highly appreciated. Kind regards Hector Not sure if this helpful, but be sure to know about views, which can be used to filter out rows of a table. 
Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);
CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible=true;

=> SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

=> SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to simulate an updatable view. You may also want to read about Veil: http://veil.projects.postgresql.org/curdocs/main.html
Re: [HACKERS] JSON manipulation functions
On Fri, May 14, 2010 at 11:33 AM, Bruce Momjian br...@momjian.us wrote: Joseph Adams wrote: == array/object conversion == The json_object function converts a tuple to a JSON object. If there are duplicate column names, there will be duplicate keys in the resulting JSON object. json_object([content [AS name] [, ...]]) returns json Likewise, the json_array function converts a tuple to a JSON array. Column names are ignored. json_array([content [AS name] [, ...]]) returns json Do you see any problems with the fact that JSON arrays can use mixed data types, e.g.: [ 1, 2, 'hi', false] I suppose the json_object and json_array functions would determine which JSON types to employ by looking at the types of arguments given (TEXT values would become strings, INT/FLOAT/NUMERIC/etc. values would become numbers, TRUE/FALSE would become true/false, NULLs would just be null, and JSON values would just be inserted as themselves). Note that json_array('"Hello"'::TEXT) would yield '["\"Hello\""]'::JSON, while json_array('"Hello"'::JSON) would yield '["Hello"]' . Going the other way around, values pulled out of JSON objects and arrays would just be of type JSON. This (revised) function signature says it all: json_values(JSON) returns JSON[] In short, I don't believe mixed data types in arrays will be a problem. json_to_* and *_to_json functions would be used for individual conversions. On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: json_keys gets the keys of a JSON object as a set. json_keys(json) returns setof text I would tend to make this return text[] rather than SETOF text. json_values gets the values of a JSON object or the items of a JSON array as a set. json_values(json) returns setof json Similarly I would make this return json[]. Agreed. For those who want sets, the unnest() function can be used. -> retrieves an item of a JSON object by key. [snip] [] retrieves a value of a JSON array/object by (one-based) index.
[snip] I think some kind of array dereference and object dereference mechanism is absolutely, positively 100% required. I don't know whether the particular syntax you've proposed here is best or whether we should pick another syntax or just use function notation, but I think we definitely need *something*. If the dereferencing operations aren't available, one could work around it by using json_keys/json_values. Of course, it would be a really clunky solution, and implementing -> will probably be easy compared to implementing those functions. I also think we need a function called something like json_length() which returns the length of a list or the number of keys in an object. Definitely. By the way, I'm considering making it so JSON arrays will be treated like objects when it comes to -> and the json_keys function. Thus, json_keys('[1,4,9,16,25]') would yield '{1,2,3,4,5}', and ('[1,4,9,16,25]'::JSON) -> 3 would yield the third item. This would obviate the need for an array-only subscript function/operator. In general, I prefer zero-based counting, but because PostgreSQL's array indexing is one-based, one-based array keys would be better for the sake of consistency. Note that if there was a function like this in the future: -- Access a JSON object like you would in JavaScript json_path('{"squares": [1,4,9,16,25]}', '.squares[2]') There could be confusion, as JavaScript uses zero-based indexing.
[HACKERS] JSON manipulation functions
First off, thank you for allowing me to participate in Google Summer of Code 2010. I'm sorry I haven't been active for the past few weeks. Today, I added the wiki page for my project, but the project schedule is highly tentative: http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010 . I'd like to discuss how the functions for type checking and conversion (the majority of my project) should work. Below is my first draft for the JSON manipulation function specs, along with annotations. == Type checking == The following function returns the type of any JSON value. json_type as enum ('null', 'string', 'number', 'boolean', 'object', 'array') json_type(json) returns json_type Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. I thought about having predicates like IS STRING and IS NUMBER, similar to the IS DOCUMENT predicate used for XML. However, a major problem with that approach is that it could lead to confusion involving IS NULL. By my understanding, the JSON datatype will just be a specialization of TEXT (it just validates the input). Like TEXT, a JSON string can be 'null'. 'null'::JSON is not NULL. Bear in mind that json_to_*('null') is NULL, though. I also thought about having a series of json_is_* functions. I don't think it's a bad idea, but I think json_type is a better solution. == text/number/boolean conversion == These functions each convert a non-compound JSON value to its respective return type. Run-time type checking is performed; a conversion will throw an error if the input JSON is not the correct type. If the JSON value is 'null', then the return value will be NULL. json_to_text(json) returns text json_to_number(json) returns numeric json_to_bool(json) returns boolean These functions convert values to JSON. 
Passing NULL to any of the functions below will return 'null': text_to_json(text) returns json number_to_json(numeric) returns json bool_to_json(boolean) returns json There could be a generic value_to_json(any), but not a json_to_value(json) function. See http://archives.postgresql.org/pgsql-hackers/2010-04/msg00321.php for more details. Conversion to/from number or boolean can also be achieved with casting. Note well that '"string"'::JSON::TEXT is '"string"', not the string's actual value. json_to_text is needed for this conversion. For this reason, casting JSON might seem like something to recommend against. However, IMHO, casting numbers and booleans to/from JSON is fine and dandy; the paragraphs below give some weight to this. I originally considered making json_to_number and number_to_json work with TEXT instead of NUMERIC. However, as Tom Lane pointed out in the above link, "Forcing people to insert explicit coercions from text isn't going to be particularly convenient to use." Nevertheless, NUMERIC introduces a problem. For instance, if you say: SELECT '-1e-38'::NUMERIC; This conversion knocks out the scientific notation and produces a 41-character string. I seriously doubt that all outside applications will handle 41-character numbers correctly. Perhaps there should be individual functions for specific data types, or maybe just a handful for particular cases. There might be json_to_int, json_to_float, and json_to_numeric. In any case, converting to/from number types can be achieved quite easily with casting. == array/object conversion == The json_object function converts a tuple to a JSON object. If there are duplicate column names, there will be duplicate keys in the resulting JSON object. json_object([content [AS name] [, ...]]) returns json Likewise, the json_array function converts a tuple to a JSON array. Column names are ignored.
json_array([content [AS name] [, ...]]) returns json The json_agg function reduces a set of JSON values to a single array containing those values. aggregate json_agg(json) returns json json_object and json_agg can be used together to convert an entire result set to one JSON array: SELECT json_agg(json_object(*)) FROM tablename; json_keys gets the keys of a JSON object as a set. json_keys(json) returns setof text json_values gets the values of a JSON object or the items of a JSON array as a set. json_values(json) returns setof json Note that all JSON slicing and splicing operations retain the original formatting of JSON content. == Miscellaneous == The features below would be nice, but will probably not be regarded as required for this Google Summer of Code project to be considered complete. json_cleanup accepts a superset of JSON and, if it can, cleans it up and returns a valid JSON string. This superset of JSON supports the following extra features: * Comments: - Single-line comments with // and # - C-style comments: /* comment */ * Unquoted object keys: {key: value} * Single quote strings: 'single quotes; double quotes do not need to be escaped here'
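[Editor's note] As a purely illustrative companion to the json_type spec above: once input is validated, the run-time tag can be read off the first significant character. None of these names come from the actual patch, and the ASCII-only whitespace test deliberately mirrors the parser's stated rationale for avoiding isspace():

```c
#include <assert.h>

/*
 * Sketch of the run-time tag a json_type() function could report,
 * keyed off the first significant character of an already-validated
 * JSON value.  Uses an ASCII-only whitespace test on purpose, matching
 * the parser's reason for not using the locale-dependent ctype.h
 * functions.  All names here are illustrative.
 */
typedef enum
{
	JSON_NULL, JSON_STRING, JSON_NUMBER,
	JSON_BOOL, JSON_OBJECT, JSON_ARRAY, JSON_INVALID
} json_type;

static json_type
json_type_of(const char *s)
{
	while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
		s++;
	switch (*s)
	{
		case 'n':	return JSON_NULL;	/* null */
		case '"':	return JSON_STRING;
		case 't':
		case 'f':	return JSON_BOOL;	/* true / false */
		case '{':	return JSON_OBJECT;
		case '[':	return JSON_ARRAY;
		default:
			return (*s == '-' || (*s >= '0' && *s <= '9'))
				? JSON_NUMBER : JSON_INVALID;
	}
}
```

This only works because the datatype's input function has already validated the text; on arbitrary input a full parse would be required.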
[HACKERS] SELECT * in a CREATE VIEW statement doesn't update column set automatically
This isn't exactly a bug, but it could be considered unintuitive behavior. Consider this: CREATE TABLE foo (a INT); CREATE VIEW foo_v AS SELECT * FROM foo; ALTER TABLE foo ADD COLUMN b INT; The ALTER TABLE statement affects table foo, but the column addition does not propagate to VIEW foo_v. Thus, it makes this deceptive: ... AS SELECT * FROM foo; I ran into this with an application where a real table is accessed if the user is an admin, while regular users access a view instead. I considered AS SELECT * FROM foo to be a promise that all columns from foo would be included in the view, but the promise is broken when ADD COLUMN is applied later on. Would it be a desirable feature to make `CREATE VIEW foo_v AS SELECT * FROM foo;` automatically update the column set when foo's columns change? Instead of the wildcard * being expanded once at CREATE VIEW time, it would (semantically) be expanded every time foo_v is selected on.
Re: [HACKERS] Virtual Private Database
On Sat, Apr 10, 2010 at 10:00 AM, Jean-Gérard Pailloncy j...@rilk.com wrote: Hello, 1) VPD: Virtual Private Database I would appreciate having a new feature in PostgreSQL. This is an Oracle-like feature that implements Row Level Security. This feature may be emulated using VIEW/RULE, but that is very time consuming and error prone. I would appreciate an estimate of the feasibility and the cost of implementing it. 2) Description The feature may be implemented with a simple expression associated with the table. ALTER TABLE table_name ADD FILTER filter_name CHECK(expression); ALTER TABLE table_name DROP FILTER filter_name; Usage/example: ALTER TABLE filtered_table ADD FILTER tf_username CHECK(filtered_table.creator=user); SELECT * FROM filtered_table; will really do SELECT * FROM filtered_table WHERE filtered_table.creator=user; Same thing for INSERT, UPDATE, and DELETE. UPDATE filtered_table SET b_column=1 WHERE a_column='a'; will really do UPDATE filtered_table SET b_column=1 WHERE a_column='a' AND filtered_table.creator=user; In practice, the devs will create a few functions: my_login, my_logout, my_filter, and the simple filtered_table.creator=user will be replaced by ACLs encapsulated in the function my_filter, plus a trigger to check data on INSERT and UPDATE. We could use Veil to build a very efficient filter. 3) Questions - Is it doable? - Is it the sound way of doing it? - Is it possible to have it in core? - Is there a pgsql dev interested in implementing it? - Are there other people interested in such a feature? - How much will this cost? - With what delay? Cordialement, Jean-Gérard Pailloncy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers This is quite similar to an idea I posted about not long ago called access control jails ( http://archives.postgresql.org/pgsql-hackers/2010-03/msg00832.php ). 
I encountered this very problem writing a simple web application involving access control. There are a variety of ways to make implementing access control easier, and I think copying off of Oracle would be among the best ;-) Disclaimer: I am not a PostgreSQL hacker, but a newbie with some experience in other communities, absorbing what he can. Access control jailing, as I conceived it, would not simply filter per table, but would make it so all queries would be filtered. If used correctly, it would even be safe to execute untrusted SQL (though it might not be advisable). I looked at the Veil demo application a tiny bit, and the only thing I drew from it was the impression that it makes things more complicated, not less :( Then again, it may just be the example demonstrating a lot of different features at once. One problem that ought to be addressed for any of these ideas is how to do connection-local variables. For instance: UPDATE filtered_table SET b_column=1 WHERE a_column='a'; will really do UPDATE filtered_table SET b_column=1 WHERE a_column='a' AND filtered_table.creator=user; Here, what is =user referring to? I suppose it is a variable that is set not long after the session starts and only applies to that session? PostgreSQL has temporary tables and such, but you can't reference them until they're already created. Hence, I don't think PostgreSQL elegantly supports free variables that are bound temporarily per connection. There are GUCs and such, but using them for this purpose is far from elegant, if I understand correctly. Another problem is that session-local context doesn't go well with connection pooling, so you might need some workaround like passing context IDs back and forth. That's my own summary of the discussion about access control jails linked above. 
By the way, here's a hack to bind a free variable to a session: CREATE FUNCTION get_user_id() RETURNS INT AS $$ DECLARE ret INT; BEGIN SELECT INTO ret id FROM user_id_tbl; RETURN ret; END $$ LANGUAGE 'plpgsql'; Then, per-session: CREATE TEMPORARY TABLE user_id_tbl (id INT); INSERT INTO user_id_tbl VALUES (5); SELECT get_user_id(); It relies on plpgsql not complaining about user_id_tbl not existing at creation time. What this trick allows one to do is set the user ID once (e.g. after connecting), then views and such that call get_user_id() will have the appropriate user ID without needing to specify it per-query. I'm curious: is this trick a good idea? Does connection pooling play well with temporary tables (and thus this trick)? Could it result in substantial slowdowns (I don't see why it should, since get_user_id() needs to be called once per query that uses it)? I guess creating a temporary table every connection has the potential to be slow. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Gsoc XQuery
On Fri, Apr 9, 2010 at 5:02 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Necati Batur escribió: *Delivarables on Timeline* ... *Implementation of a GUI for XQuery for the PostgreSql users (2 week) What, you will *also* implement a GUI? Gimme a break. There's no way you can achieve most of these in 1-2 weeks. Are you Tom Lane perchance? Indeed. I've learned it's not a good idea to throw a GUI into a proposal when the rest of the project is already substantial, as it can lead to embarrassment when it doesn't get realized ;-) I did that last year. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Fri, Apr 9, 2010 at 4:08 PM, Dimitri Fontaine dfonta...@hi-media.com wrote: I guess a GSoC of reasonable size would be to define a spec for how to implement partitioning in PostgreSQL with a sound and accepted proposal on independent steps to contribute separately, in order to reach the full implementation in an incremental fashion and by different hackers. Then you could pick up one of those items. By then I mean after the summary and the plan both have been accepted by core people and by contributors who said in the past they wanted to spend precious hours on the topic. But I don't know if a GSoC can be completed without even coding. According to the link below, GSoC proposals for documentation aren't accepted. This probably extends to other non-coding work as well. http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#doc_proposals -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Petr Jelinek pjmo...@pjmodos.net writes: Dne 6.4.2010 7:57, Joseph Adams napsal(a): To me, the most logical approach is to do the obvious thing: make JSON's 'null' be SQL's NULL. For instance, SELECTing on a table with NULLs in it and converting the result set to JSON would yield a structure with 'null's in it. 'null'::JSON would yield NULL. I'm not sure what startling results would come of this approach, but I'm guessing this would be most intuitive and useful. +1 I think it's a pretty bad idea for 'null'::JSON to yield NULL. AFAIR there is no other standard datatype for which the input converter can yield NULL from a non-null input string, and I'm not even sure that the InputFunctionCall protocol allows it. (In fact a quick look indicates that it doesn't...) To me, what this throws into question is not so much whether JSON null should equate to SQL NULL (it should), but whether it's sane to accept atomic values. If I understood the beginning of this discussion, that's not strictly legal. I think it would be better for strict input mode to reject this, and permissive mode to convert it to a non-atomic value. Thus jsonify('null') wouldn't yield NULL but a structure containing a null. regards, tom lane Actually, I kind of made a zany mistake here. If 'null'::JSON yielded NULL, that would mean some type of automatic conversion was going on. Likewise, '3.14159'::JSON shouldn't magically turn into a FLOAT. I think the JSON datatype should behave more like TEXT. 'null'::JSON would yield a JSON fragment containing 'null'. 'null'::JSON::TEXT would yield the literal text 'null'. However, '3.14159'::JSON::FLOAT should probably not be allowed as a precaution, because '"hello"'::JSON::TEXT would yield '"hello"' (quotation marks included), not 'hello'. In other words, casting to the target type directly isn't the same as parsing JSON and extracting a value. Perhaps there could be conversion functions. 
E.g.: json_to_string('"hello"') yields 'hello' json_to_number('3.14159') yields '3.14159' as text (it is up to the user to cast it to the number type s/he wants) json_to_bool('true') yields TRUE json_to_null('null') yields NULL, json_to_null('nonsense') fails string_to_json('hello') yields '"hello"' as JSON number_to_json(3.14159) yields '3.14159' as JSON bool_to_json(TRUE) yields 'true' as JSON null_to_json(NULL) yields 'null' as JSON (kinda useless) I wonder if these could all be reduced to two generic functions, like json_to_value and value_to_json. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
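A generic json_to_value could start by classifying the fragment before dispatching to the right conversion. A hypothetical C sketch (the names are mine, and a real version would validate the entire token, not just its first characters):

```c
#include <string.h>

typedef enum
{
    JV_STRING, JV_NUMBER, JV_BOOL, JV_NULL, JV_COMPOUND, JV_INVALID
} jv_type;

/*
 * Classify a JSON fragment by its first non-space character.
 * JV_COMPOUND means an object or array; everything else is atomic.
 */
static jv_type jv_classify(const char *s)
{
    while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        s++;                            /* skip leading JSON whitespace */

    switch (*s)
    {
        case '"':  return JV_STRING;
        case '{':
        case '[':  return JV_COMPOUND;
        case 't':  return strncmp(s, "true", 4) == 0 ? JV_BOOL : JV_INVALID;
        case 'f':  return strncmp(s, "false", 5) == 0 ? JV_BOOL : JV_INVALID;
        case 'n':  return strncmp(s, "null", 4) == 0 ? JV_NULL : JV_INVALID;
        case '-':  return JV_NUMBER;
        default:   return (*s >= '0' && *s <= '9') ? JV_NUMBER : JV_INVALID;
    }
}
```

With a tag like this, json_to_value would raise an error when, say, json_to_bool is applied to a JV_STRING fragment, instead of silently coercing it.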
Re: [HACKERS] Proposal: Add JSON support
Another JSON strictness issue: the JSON standard ( http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only be an array or object. However, my implementation currently accepts any valid value. Thus, '3', 'hello', 'true', 'false', and 'null' are all accepted by my implementation, but are not strictly JSON text. The question is: should the JSON datatype accept atomic values (those that aren't arrays or objects) as valid JSON? I tried a few other JSON implementations to see where they stand regarding atomic types as input: JSON_checker (C) does not accept them. JSON.parse() (JavaScript) accepts them. json_decode() (PHP) accepts them. However, support is currently buggy (e.g. '1' is accepted, but '1 ' is not). cJSON (C) accepts them. JSON.pm (Perl) accepts them if you specify the allow_nonref option. Otherwise, it accepts 'true' and 'false', but not 'null', a number, or a string by itself. In my opinion, we should accept an atomic value as valid JSON content. I suppose we could get away with calling it a content fragment as is done with XML without a doctype. Accepting atomic values as valid JSON would be more orthogonal, as it would be possible to have a function like this: json_values(object_or_array JSON) RETURNS SETOF JSON -- extracts values from an object or members from an array, returning them as JSON fragments. Also, should we go even further and accept key:value pairs by themselves? 'key:value'::JSON I don't think we should, because doing so would be rather zany. It would mean JSON content could be invalid in value context, as in: // JavaScript var content = key : value; I improved my JSON library. It now only accepts strict, UTF-8 encoded JSON values (that is, objects, arrays, strings, numbers, true, false, and null). It also has a json_decode_liberal() function that accepts a string, cleans it up, and passes it through the stricter json_decode(). 
json_decode_liberal() filters out comments, allows single quoted strings, and accepts a lax number format compared to strict JSON. I may add Unicode repair to it later on, but implementing that well really depends on what type of Unicode errors appear in real life, I think. http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2 My json.c is now 1161 lines long, so I can't quite call it small anymore. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
I've been wondering whether the JSON datatype should be strict or conservative. For one, there's strict JSON (following the exact specification). Then there are more liberal JSON variants: some JSON parsers support comments, some support invalid number formats (e.g. '3.' or '+5'), etc. The consensus seems to be that JSON content should be stored verbatim (it should store the exact string the client sent to it), as is done with XML. However, this notion is somewhat incompatible with "Be conservative in what you do; be liberal in what you accept from others", because we can't accept loose JSON, then spit out conservative JSON, without messing with the content. Here's my idea: the datatype should only allow strict JSON, but there should be a function that accepts a liberal format, cleans it up to make it strict JSON, and converts it to JSON. I think making strict JSON the default makes the most sense because: * Inputs to the database will most likely be coming from programs, not humans. * Output is expected to be valid JSON and work anywhere JSON should work. * Strict JSON is what more people would expect, I'd think. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
I ended up reinventing the wheel and writing another JSON library: http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2 This is a first release, and it doesn't really have a name besides json. It's very similar to cJSON, except it is (sans unknown bugs) more reliable, more correct, and cleaner (unless you hate gotos ;-) ). It has a simple test suite. It is not prone to stack overflows, as it doesn't recurse. It is strict, requires input to be UTF-8 (it validates it first) and only outputs UTF-8. Other than treating numbers liberally, my implementation only accepts valid JSON code (it doesn't try to correct anything, even Unicode problems). It is under the MIT license. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Mon, Mar 29, 2010 at 2:23 PM, David E. Wheeler da...@kineticode.com wrote: On Mar 29, 2010, at 9:02 AM, Tom Lane wrote: If this is going to end up being one fairly small C file implementing a spec that is not a moving target, I'd vote against depending on an external library instead, no matter how spiffy and license-compatible the external library might be. Perhaps you could fork one, in that case. Best, David I'm considering using and adapting cJSON instead of continuing with my redundant implementation. I could run `indent -kr -i4` on it (will that match PostgreSQL's coding style?), add support for UTF-16 surrogate pairs (a pair of \u escapes for each character above U+FFFF, as required by the JSON spec), and add a switch to turn on/off pure ASCII output. P.S.: Sorry for the repeat, David. I forgot to CC the mailing list. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
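For reference, the surrogate-pair arithmetic itself is tiny. A hedged sketch (my own helper names, not cJSON's) of decoding two \u escapes such as "\uD834\uDD1E" into one code point, and splitting a code point back into a pair:

```c
/*
 * Combine a UTF-16 surrogate pair into a single code point.
 * hi must be in [0xD800, 0xDBFF], lo in [0xDC00, 0xDFFF];
 * a real parser must reject anything else.
 */
static unsigned int combine_surrogates(unsigned int hi, unsigned int lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Split a code point above U+FFFF into the surrogate pair
 * needed to emit it as two \u escapes.
 */
static void split_surrogates(unsigned int cp, unsigned int *hi, unsigned int *lo)
{
    cp -= 0x10000;
    *hi = 0xD800 + (cp >> 10);
    *lo = 0xDC00 + (cp & 0x3FF);
}
```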
[HACKERS] Proposal: Add JSON support
I introduced myself in the thread Proposal: access control jails (and introduction as aspiring GSoC student), and we discussed jails and session-local variables. But, as Robert Haas suggested, implementing variable support in the backend would probably be way too ambitious a project for a newbie like me. I decided instead to pursue the task of adding JSON support to PostgreSQL, hence the new thread. I plan to reference datatype-xml.html and functions-xml.html in some design decisions, but there are some things that apply to XML that don't apply to JSON and vice versa. For instance, jsoncomment wouldn't make sense because (standard) JSON doesn't have comments. For access, we might have something like json_get('foo[1].bar') and json_set('foo[1].bar', 'hello'). jsonforest and jsonagg would be beautiful. For mapping, jsonforest/jsonagg could be used to build a JSON string from a result set (SELECT jsonagg(jsonforest(col1, col2, ...)) FROM tbl), but I'm not sure on the best way to go the other way around (generate a result set from JSON). CSS-style selectors would be cool, but selecting is what SQL is all about, and I'm not sure having a json_select(dom-element[key=value]) function is a good, orthogonal approach. I'm wondering whether the internal representation of JSON should be plain JSON text, or some binary code that's easier to traverse and whatnot. For the sake of code size, just keeping it in text is probably best. Now my thoughts and opinions on the JSON parsing/unparsing itself: It should be built-in, rather than relying on an external library (like XML does). Priorities of the JSON implementation, in descending order, are: * Small * Correct * Fast Moreover, JSON operations shall not crash due to stack overflows. I'm thinking Bison/Flex is overkill for parsing JSON (I haven't seen any JSON implementations out there that use it anyway). I would probably end up writing the JSON parser/serializer manually. It should not take more than a week. 
As far as character encodings, I'd rather keep that out of the JSON parsing/serializing code itself and assume UTF-8. Wherever I'm wrong, I'll just throw encode/decode/validate operations at it. Thoughts? Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Add JSON support
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Now my thoughts and opinions on the JSON parsing/unparsing itself: It should be built-in, rather than relying on an external library (like XML does). Why? I'm not saying you aren't right, but you need to make an argument rather than an assertion. This is a community, so no one is entitled to decide anything unilaterally, and people want to be convinced - including me. I apologize; I was just starting the conversation with some of my ideas to receive feedback. I didn't want people to have to wade through too many "I think"s. I'll be sure to use opinion tags in the future :-) My reasoning for "It should be built-in" is: * It would be nice to have a built-in serialization format that's available by default. * It might be a little faster because it doesn't have to link to an external library. * The code to interface between JSON logic and PostgreSQL will probably be much larger than the actual JSON encoding/decoding itself. * The externally-maintained and packaged libjson implementations I saw brought in lots of dependencies (e.g. glib). * Everyone else (e.g. PHP) uses a statically-linked JSON implementation. Is the code in question *really* small? Well, not really, but it's not enormous either. By the way, I found a bug in PHP's JSON_parser (json_decode("true ") /* with a space */ returns null instead of true). I'll have to get around to reporting that. Now, assuming JSON support is built-in to PostgreSQL and is enabled by default, it is my opinion that encoding issues should not be dealt with in the JSON code itself, but that the JSON code itself should assume UTF-8. I think conversions should be done to/from UTF-8 before passing it through the JSON code because this would likely be the smallest way to implement it (not necessarily the fastest, though). 
Mike Rylander pointed out something wonderful, and that is that JSON code can be stored in plain old ASCII using \u escapes. If a target encoding supports all of Unicode, the JSON serializer could be told not to generate \u escapes. Otherwise, the \u escapes would be necessary. Thus, here's an example of how (in my opinion) character sets and such should be handled in the JSON code. Suppose the client's encoding is UTF-16, and the server's encoding is Latin-1.

When JSON is stored to the database:
1. The client is responsible and sends a valid UTF-16 JSON string.
2. PostgreSQL checks to make sure it is valid UTF-16, then converts it to UTF-8.
3. The JSON code parses it (to ensure it's valid).
4. The JSON code unparses it (to get a representation without needless whitespace). It is given a flag indicating it should only output ASCII text.
5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
1. ASCII is retrieved from the server.
2. If the user needs to extract one or more fields, the JSON is parsed, and the fields are extracted.
3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

Note that I am being biased toward optimizing code size rather than speed. Here's a question about semantics: should converting JSON to text guarantee that Unicode will be \u escaped, or should it render actual Unicode whenever possible (when the client uses a Unicode-complete charset)? As for reinventing the wheel, I'm in the process of writing yet another JSON implementation simply because I didn't find the other ones I looked at palatable. I am aiming for simple code, not fast code. I am using malloc for structures and realloc for strings/arrays rather than resorting to clever buffering tricks. Of course, I'll switch it over to palloc/repalloc before migrating it to PostgreSQL. 
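The ASCII-only output flag in storage step 4 boils down to emitting \uXXXX escapes for every non-ASCII character. A minimal sketch (my own function name, assuming the code point has already been decoded from UTF-8; code points above U+FFFF become a surrogate pair, per the JSON spec):

```c
#include <stdio.h>

/*
 * Write the \u escape(s) for one code point into `out`,
 * which must hold at least 13 bytes ("\uXXXX\uXXXX" + NUL).
 */
static void append_unicode_escape(char *out, unsigned int cp)
{
    if (cp > 0xFFFF)
    {
        unsigned int v = cp - 0x10000;

        sprintf(out, "\\u%04X\\u%04X",
                0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    }
    else
        sprintf(out, "\\u%04X", cp);
}
```

Since every escape is 7-bit ASCII, the result is valid in any server encoding, which is exactly what makes the Latin-1 storage step work.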
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
On Fri, Mar 26, 2010 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote: Hmm. I'm not sure exactly what problem you're trying to solve here. I don't think this is a particularly good design for supporting variables inside the server, since, well, it doesn't actually support variables inside the server. If we just want a crude hack for allowing the appearance of session-local server-side variables, that could be implemented entirely in client code - in fact it could be done as a thin wrapper around libpq that just does textual substitution of the variables actually referenced by a particular query. That wouldn't require any modifications to core PostgreSQL at all, and it would probably perform better too since you'd not send all the unnecessary variables with every query. One problem with a textual substitution is that implicit variable use (e.g. selecting from a view) can't be substituted, at least not trivially. As for sending unnecessary variables with every query, my idea was to store those variables in a global table keyed by context ID, then just send that context ID with every query. But, I think that implementing any kind of variable support in the backend is way too ambitious a project for a first-time hacker to get done in a couple of months. I would guess that's a two-year project for a first time hacker or a one-year project for an experienced hacker (or a three week project for Tom Lane). Here are some ideas from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer to the right size for GSOC: [...] Add JSON (JavaScript Object Notation) data type [tricky part will be getting community buy-in on which JSON library to use] The JSON idea caught my eye. I guess the best approach here would be not to use an external library, but to implement it manually using flex/bison. 
Most of the work would probably revolve around converting things to/from PostgreSQL types, writing test cases, and getting it integrated; writing the parser itself should be a piece of cake. At first, I figured adding JSON support would be almost too trivial: just parse it, then you're done. After seeing that src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit more to it :) I skimmed through some JSON implementations in C, and I didn't find any using bison/flex. From the looks of it, I do like JSON_parser ( http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears to be written for speed. I think one benefit of adding JSON support is that it would provide a way to store EAV-type data with less overhead than XML (and no dependency on an external library). If this were the only goal, binary encoding would be even better. However, I suppose JSON is more popular and easier to work with in practice. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
I apologize for my silence, as I've been busy reading up more on the internals of PostgreSQL. From what I can tell, a big problem with my jails idea (as well as the variables Robert described) is that there really isn't a way to store context in the backend specifically for the end client (e.g. a PHP script) due to connection pooling. Also, I almost feel that storing such context would be a disadvantage, as it would harm some of the referential transparency that pooling and caching take advantage of, now and in the future. However, I'm not going to give up :) Perhaps we could have some sort of LET statement that allows the client to pass data to the server, then have libpq automatically wrap queries with the LET statement (when necessary). Here's what it would look like to the PHP scripter: // New libpq function pg_set('current_user', 'bob'); $result = pg_query_params( 'SELECT answer FROM secrets WHERE user=current_user AND question=$1', array('Birth place')); What this really does is something like: $result = pg_query_params( 'LET current_user=$1 DO $2 $3', array( 'bob', 'SELECT answer FROM secrets WHERE user=current_user AND question=$1', 'Birth place')); Here, the hypothetical LET statement executes a query string, binding current_user to our desired value. The client library would wrap all future queries in this fashion. Granted, it would be silly to pass the value itself to the server over and over, so a serious implementation would probably pass a context ID, and these variable assignments would live in the backend instead. Moreover, LET is a terrible keyword choice here, considering most PostgreSQL users won't need to use it explicitly thanks to additional libpq support. Alternatively (this might require changing the client/server protocol), a context ID could be passed back and forth, thus providing a way to tell clients apart. Implementing this idea requires adding to the backend and to libpq. The backend would need at least two new statements. 
One would set a variable of a session context, creating one if necessary and returning its ID. Another would execute a string as a parameter and bind both immediate arguments and session context to it. libpq would need a function to set a variable, and it would need to wrap queries it sends out with LET statements if necessary. Note that these variables can't be used in pre-defined functions unless they are somehow declared in advance. One idea would be to first add global variable support, then make session-local contexts be able to temporarily reassign those variables. Another would be to provide an explicit declaration statement. Would this make a good proposal for GSoC?: Implement the backend part of my proposal, and create a proof-of-concept wrapper demonstrating it. This way, I add the new statements, but don't mess around with existing functionality too much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
Hello, I'm Joey Adams, and I'm interested in applying for Google Summer of Code to work on PostgreSQL. I'm a former GSoC student (I worked on CCAN last year), and a strong C programmer, though I am still new to working with large, established communities. I apologize if this is the wrong place to send GSoC student introductions. My proposal is bold, though I believe it can be trimmed down and refined into something very useful, yet simple enough to implement in 3 months by a newcomer to the PostgreSQL code base. I propose adding application-level access control to PostgreSQL via a jails concept. In a nutshell, a jail is created as part of the database definition (typically exposing a free variable for the current user). When a jail is activated for a session, the only accesses allowed are those indicated in the jail itself. A jail cannot be exited without closing the session. If used properly, jails make it possible to safely execute untrusted SQL code (though one may not want to, citing the principle of least privilege). For example, suppose we have the following database definition for a trivial discussion board system: CREATE TABLE users ( id SERIAL PRIMARY KEY, name TEXT, password TEXT, email TEXT, email_public BOOLEAN ); CREATE TABLE posts ( id SERIAL PRIMARY KEY, title TEXT, user INT REFERENCES users(id), content TEXT, published BOOLEAN ); CREATE TABLE tags ( post INT REFERENCES posts(id), tag TEXT ); Suppose our discussion board is written in PHP. It authenticates the user via cookies, after which it has an ID for the current user. One way to arbitrate access is to code it into the PHP using queries like this: pg_query_params('SELECT users.email FROM users WHERE id=$1 AND (email_public=TRUE OR id=$2)', array($_GET['user'], $current_user)); Simple enough. However, this access control check has to be done for every query, and it can get rather complex as more tables are involved. 
Views are a way to alleviate this complexity, but: * Views only apply to SELECTs * Queries still have to specify the current user (unless one uses a global variable (which requires a bit of a hack, if I'm not mistaken)) My proposal would make it possible to have a statement in the database definition much like this: CREATE JAIL jail ( SELECT id, name FROM users; SELECT email FROM users WHERE id=current_user OR email_public=TRUE; SELECT * FROM posts WHERE user=current_user OR published=TRUE; UPDATE posts WHERE user=current_user SET title, content, published; INSERT INTO posts WHERE user=current_user; DELETE FROM posts WHERE user=current_user; SELECT * FROM tags, posts WHERE tags.post=posts.id; INSERT INTO tags WHERE post IN ( SELECT id FROM posts WHERE user=current_user); DELETE FROM tags WHERE post IN ( SELECT id FROM posts WHERE user=current_user); ); Inside of the jail definition is a series of pseudo-statements that indicate the space of queries the user can perform. Simply creating a jail does not make it go into effect. A jail is activated using another query, and it remains in effect for the remainder of the session. It cannot be deactivated through the protocol, as doing so would constitute a privilege escalation. Example of a PHP script invoking a jail: <?php pg_connect('user=foo password=spiderman') or die('Database error occurred'); /* authentication-fu */ // $current_user now contains the id of the logged-in user. pg_query_params('USE JAIL jail WITH current_user=$1', array($current_user)); /* Remaining queries in this session will only be able to SELECT, UPDATE, INSERT, and DELETE on users, posts, and tags, following the rules set forth by the jail. */ ?> I came up with this idea while working on a small software-as-a-service application using PostgreSQL. I needed to implement access control in the PHP frontend. I ended up creating a database abstraction class to protect the rest of the frontend code from SQL messiness. 
For every database query I needed, I wrote or extended a PHP function. My database access class for my small application exploded to over 1000 lines (granted, a lot of it was comments). Although having accessor functions can make the code easier to read and maintain, it can result in a lot of redundancy. Why not just use the database abstraction library PostgreSQL gives you: SQL? Jails seem to me like a revolutionary addition to SQL, and implementing them fully could be quite involved. However, my guess is that by restricting what may be done in a jail simply to SELECT, UPDATE, INSERT and DELETE, an initial jail implementation