Re: [HACKERS] additional json functionality

2013-11-21 Thread Maciej Gajewski
Hi everyone

I used to work on a project storing large quantities of schema-less data,
initially using MongoDB, then Postgres with JSON, and eventually I
implemented BSON support for Postgres to get the best of both worlds:
https://github.com/maciekgajewski/postgresbson

I don't think that JSONB is a good idea. There is a lot to learn from
MongoDB's mistakes in this area.

1. As noted in this thread previously, JSON is a serialization format, not
a document format.

2. Almost any structured data type, self-describing or not, can be
serialized to/from JSON, but each application uses only a subset of it
and interprets it in its own specific way.

3. JSON's greatest strength is interoperability. It is a great feature of
Postgres that JSON is stored as text; it's basically 'text, but you can
do something with it'.  There are many JSON implementations out there, and
one should make no assumptions about an application's expectations.
For instance: the JSON standard (RFC 4627) defines all numbers to be doubles.
Yet I've seen applications storing 64-bit integers (which wouldn't fit in
double precision), or even arbitrary-precision integers. Most parsers are OK
with that.
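The point about numbers is easy to demonstrate with any stock parser; here is a minimal sketch (Python, used purely for illustration, not part of postgresbson):

```python
import json

# A 64-bit integer that has no exact IEEE-754 double representation.
big = 2**63 - 1  # 9223372036854775807

# A text-based round trip preserves it exactly, despite the
# double-oriented reading of numbers in the JSON standard.
assert json.loads(json.dumps(big)) == big

# Forcing the value through double precision loses the low-order bits.
assert int(float(big)) != big
```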

4. JSON's greatest weakness is performance. Because of 1., it needs to be
parsed before any useful information can be extracted.

5. Points 1. and 3. are mutually exclusive; this is one of the most valuable
takeaways I have from working with Mongo, and with BSON in particular. BSON is
an attempt to create a 'binary JSON', and a failed one. It is a poor
serialization format: faster than JSON, but less flexible. Being binary, it
is strongly typed, and it uses various gimmicks to preserve flexibility:
implicit type casts, 3 different equality-comparison functions, etc. And
it's not fully convertible to/from JSON; no binary format is.
It is a poor document format as well: it retains some of JSON's
performance problems: its serial nature and inefficient storage.

6. Speed matters to some, and being able to generate binary data in the
application and send it to the database without any serialization/parsing in
between provides a great optimization opportunity. One thing the Mongo guys
got right is the fixed, well-defined binary representation. An application can
use the provided library to generate objects, and doesn't need to worry about
the server's version or endianness.

In the application I've mentioned before, switching from JSON to BSON (in
Postgres 9.2, using postgresbson) increased throughput by an order of
magnitude. It was an insert-heavy database with indexes on object fields.
Both serializing in the application and deserializing in the server were
~10x faster.

7. It seems to me that JSONB is going to repeat all the mistakes of BSON;
it's going to be 'neither'. If there is an agreement that Postgres needs a
'document' format, why not acknowledge 5. and simply adopt one of the
existing formats? Or even better: adopt none, provide many, provide binary
send/recv and conversion to and from JSON, and let the user choose.

The world is full of self-describing binary formats: BSON, MessagePack (
http://msgpack.org/), protobuf, hierarchical H-Store is coming along.
Adding another one would create confusion, and a situation similar to this:
http://xkcd.com/927/


And a side note:

Postgres' greatest and most under-advertised feature is its extensibility.
People tend to notice only the features present in the core package, while
there should be a huge banner on top of http://www.postgresql.org/: Kids,
we support all data types: we have XML, we have JSON, we have H-store, we
have BSON, and all of it with built-in indexing, storage compression and full
transaction support!



Maciej G.


Re: [HACKERS] additional json functionality

2013-11-21 Thread Josh Berkus
Maciej,

Thanks for feedback -- it's helpful to get a different perspective on this.

 I used to work on a project storing large quantities of schema-less data,
 initially using MongoDB, then Postgres with JSON, and eventually I
 implemented BSON support for Postgres to get the best of both worlds:
 https://github.com/maciekgajewski/postgresbson

Huh, is that linked anywhere else?

 
 I don't think that JSONB is a good idea. There is a lot to learn from
 MongoDB's mistakes in this area.
 
 1. As noted in this thread previously, JSON is a serialization format, not
 a document format.

The momentum of the industry would argue against you.

 5. Points 1. and 3. are mutually exclusive; this is one of the most valuable
 takeaways I have from working with Mongo and BSON in particular. BSON is an
 attempt to create 'binary JSON', and a failed one.

BSON was also developed part-time, by an inexperienced developer, as a
side effect of a different project.  And they froze the API too early.
And they only made a half-hearted effort to be compatible with JSON.

So the fact that BSON is a failure doesn't tell us any more than "don't
do this in a half-assed way."

 7. It seems to me that JSONB is going to repeat all the mistakes of BSON,
 it's going to be 'neither'. If there is an agreement that Postgres needs a
 'document' format, why not acknowledge 5., and simply adopt one of the
 existing formats. Or even better: adopt none, provide many, provide binary
 send/recv and conversion to and from JSON, let the user choose.

Well, I think others have already documented why we don't want BSON.

 The world is full of self-describing binary formats: BSON, MessagePack (
 http://msgpack.org/), protobuf, hierarchical H-Store is coming along.
 Adding another one would create confusion, and a situation similar to this:
 http://xkcd.com/927/

Apparently you missed that JSONB is just Hstore2 with different in/out
functions?

 Postgres' greatest and most under-advertised feature is its extensibility.

I agree that we don't do enough to promote our extensions.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] additional json functionality

2013-11-20 Thread Robert Haas
On Tue, Nov 19, 2013 at 1:43 PM, David Johnston pol...@yahoo.com wrote:
 IMO A reasonable default cast function should error if the json contents
 require anything more than a straight parse to be stored into jsonb.  If the
 user still needs to make the conversion we should have a standard and
 configurable parser function with json input and jsonb output.  In this case
 the key-keep options would be 'keep first encountered', 'keep last
 encountered', or 'fail on duplicate', the last of which would be the default.

 I have not really pondered storing scalars into jsonb but, before pondering
 usability, are there any technical concerns?  If the goal is to share the
 backend with hstore then current hstore does not allow for this and so the
 json aspect would either transfer back over or it would need customized
 code.

I confess to being a bit perplexed by why we want hstore and json to
share a common binary format.  hstore doesn't store hierarchical data;
json does.  If we design a binary format for json, doesn't that just
obsolete hstore?  Why go to a lot of trouble to extend our home-grown
format if we've got a standard format to plug into?

The thing that's really missing in all of these discussions (AFAICS)
is the issue of creating index support for these types.  If using some
variant of the existing hstore format makes that easier, then I guess
I understand the connection - but I'm not sure why or how that would
be the case, and it would be nice to make the connection more
explicit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] additional json functionality

2013-11-20 Thread Andrew Dunstan


On 11/20/2013 07:52 AM, Robert Haas wrote:

On Tue, Nov 19, 2013 at 1:43 PM, David Johnston pol...@yahoo.com wrote:

IMO A reasonable default cast function should error if the json contents
require anything more than a straight parse to be stored into jsonb.  If the
user still needs to make the conversion we should have a standard and
configurable parser function with json input and jsonb output.  In this case
the key-keep options would be 'keep first encountered', 'keep last
encountered', or 'fail on duplicate', the last of which would be the default.

I have not really pondered storing scalars into jsonb but, before pondering
usability, are there any technical concerns?  If the goal is to share the
backend with hstore then current hstore does not allow for this and so the
json aspect would either transfer back over or it would need customized
code.

I confess to being a bit perplexed by why we want hstore and json to
share a common binary format.  hstore doesn't store hierarchical data;
json does.  If we design a binary format for json, doesn't that just
obsolete hstore?  Why go to a lot of trouble to extend our home-grown
format if we've got a standard format to plug into?

The thing that's really missing in all of these discussions (AFAICS)
is the issue of creating index support for these types.  If using some
variant of the existing hstore format makes that easier, then I guess
I understand the connection - but I'm not sure why or how that would
be the case, and it would be nice to make the connection more
explicit.




Oleg and Teodor have done quite a lot of work on a version of hstore 
that supports nested structures. See their pgcon talk. With some 
additions it has become in effect a non-standard notation for json. 
Rather than repeat that work, my suggestion has been that they abstract 
the common parts into a library that can be used by jsonb or whatever we 
end up calling it as well as nested hstore. I understand Teodor is 
working on this.


In general I share your feelings, though.

cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-20 Thread Hannu Krosing
On 11/18/2013 06:49 PM, Josh Berkus wrote:
 On 11/18/2013 06:13 AM, Peter Eisentraut wrote:
 On 11/15/13, 6:15 PM, Josh Berkus wrote:
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibility break we'll want to call the new version 10.0.
 We could do something like SQL/XML and specify the level of validity
 in a typmod, e.g., json(loose), json(strict), etc.
 Doesn't work; with XML, the underlying storage format didn't change.
 With JSONB, it will ... so changing the typemod would require a total
 rewrite of the table.  That's a POLS violation if I ever saw one
We do rewrites on typmod changes already.

To me having json(string) and json(hstore) does not seem too bad.

Cheers
Hannu




Re: [HACKERS] additional json functionality

2013-11-20 Thread David Johnston
Hannu Krosing-3 wrote
 On 11/18/2013 06:49 PM, Josh Berkus wrote:
 On 11/18/2013 06:13 AM, Peter Eisentraut wrote:
 On 11/15/13, 6:15 PM, Josh Berkus wrote:
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed
 in
 those columns.  At the very least, if we're going to have that kind of
  backwards compatibility break we'll want to call the new version 10.0.
 We could do something like SQL/XML and specify the level of validity
 in a typmod, e.g., json(loose), json(strict), etc.
 Doesn't work; with XML, the underlying storage format didn't change.
 With JSONB, it will ... so changing the typemod would require a total
 rewrite of the table.  That's a POLS violation if I ever saw one
 We do rewrites on typmod changes already.
 
 To me having json(string) and json(hstore) does not seem too bad.

Three things:

1) How would this work in the face of functions that erase typemod
information?
2) json [no type mod] would have to effectively default to json(string)?
3) how would #1 and #2 interact?

I pondered the general idea but my (admittedly limited) gut feeling is that
using typemod would possibly be technically untenable and from an end-user
perspective would be even more confusing than having two types.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/additional-json-functionality-tp5777975p5779428.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] additional json functionality

2013-11-20 Thread Greg Stark
On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus j...@agliodbs.com wrote:

 On 11/15/2013 04:00 PM, David Johnston wrote:
  Looking at this a different way: could we just implement BSON and leave
  json alone?
 
  http://bsonspec.org/

 In short?  No.

 For one thing, our storage format is different from theirs (better,
 frankly), and as a result is not compliant with their standard.


Not being super familiar with either BSON or JSONB, what advantages are we
gaining from the difference?

It might be interesting if we supported the same binary representation so
we could have a binary send/recv routines that don't need to do any
serialization/deserialization. Especially since a standard format would
potentially be skipping the serialization/deserialization on both the
server and client.



-- 
greg


Re: [HACKERS] additional json functionality

2013-11-20 Thread Josh Berkus
On 11/20/2013 04:52 AM, Robert Haas wrote:
 I confess to being a bit perplexed by why we want hstore and json to
 share a common binary format.  hstore doesn't store hierarchical data;
 json does.  If we design a binary format for json, doesn't that just
 obsolete hstore?  Why go to a lot of trouble to extend our home-grown
 format if we've got a standard format to plug into?

See the hstore2 patch from Teodor.  That's what we're talking about, not
hstore1, which as you point out is non-hierarchical.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-20 Thread Andrew Dunstan


On 11/20/2013 12:50 PM, Greg Stark wrote:


On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus j...@agliodbs.com 
mailto:j...@agliodbs.com wrote:


On 11/15/2013 04:00 PM, David Johnston wrote:
 Looking at this a different way: could we just implement BSON and leave
 json alone?

 http://bsonspec.org/

In short?  No.

For one thing, our storage format is different from theirs (better,
frankly), and as a result is not compliant with their standard.


Not being super familiar with either BSON or JSONB, what advantages
are we gaining from the difference?


It might be interesting if we supported the same binary representation 
so we could have a binary send/recv routines that don't need to do any 
serialization/deserialization. Especially since a standard format 
would potentially be skipping the serialization/deserialization on 
both the server and client.







To start with, it doesn't support arbitrary precision numerics. That 
means that right off the bat it's only accepting a subset of what the 
JSON spec allows. 'Nuff said, I think.


cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-20 Thread Tom Lane
David Johnston pol...@yahoo.com writes:
 On 11/18/2013 06:13 AM, Peter Eisentraut wrote:
 We could do something like SQL/XML and specify the level of validity
 in a typmod, e.g., json(loose), json(strict), etc.

 Three things:

 1) How would this work in the face of functions that erase typemod
 information?

You'd have to make the data self-identifying (which I think was the plan
already), and ensure that *every* function taking json could cope with
both formats on input.  The typmod could only be expected to be enforced
when storing or explicitly casting to one subformat, much like operations
on numeric pay little attention to the original precision/scale if any.

I agree that this solution isn't terribly workable, mainly because it'd
break any third-party C functions that take json today.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-20 Thread Andrew Dunstan


On 11/20/2013 01:36 PM, Tom Lane wrote:


You'd have to make the data self-identifying (which I think was the plan
already), and ensure that *every* function taking json could cope with
both formats on input.  The typmod could only be expected to be enforced
when storing or explicitly casting to one subformat, much like operations
on numeric pay little attention to the original precision/scale if any.

I agree that this solution isn't terribly workable, mainly because it'd
break any third-party C functions that take json today.





Yeah, I had come to this conclusion. I don't think we can bolt on 
typmods after the event.


I don't think having a separate jsonb type will be a tragedy.

I'm already planning on overloading the existing json functions and 
operators.



cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-20 Thread Robert Haas
On Wed, Nov 20, 2013 at 12:50 PM, Greg Stark st...@mit.edu wrote:
 For one thing, our storage format is different from theirs (better,
 frankly), and as a result is not compliant with their standard.

 Not being super familiar with either BSON or JSONB, what advantages are we
 gaining from the difference?

BSON assumes, for example, that all integers fit in 64 bits and that all
floating-point values can be accurately represented as float8.  So not
all JSON objects can be represented as BSON without loss of
information.
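The lossiness is easy to see without BSON itself; a minimal sketch in Python (illustrative only) of what happens when a high-precision JSON number is forced through a 64-bit double:

```python
import json
from decimal import Decimal

# A number that a text-based json column (or a Postgres numeric) keeps
# exactly, but that cannot survive conversion to a 64-bit double.
s = '0.1000000000000000000000000001'

# A parser that keeps arbitrary precision round-trips it faithfully...
assert json.loads(s, parse_float=Decimal) == Decimal(s)

# ...while default double parsing collapses it to plain 0.1.
assert json.loads(s) == 0.1
```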

BSON also adds a bunch of extra types that are not part of JSON, like
timestamps, regular expressions, and embedded documents.  So not all
BSON objects can be represented as JSON without loss of information.

While it's tempting to think that BSON is a serialization format for
JSON, and the name is meant to suggest that, it really isn't.  It's
just a serialization format for approximately whatever the authors
thought would be useful, which happens to be kinda like JSON.  Sorta.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] additional json functionality

2013-11-19 Thread Robert Haas
On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I am sure you could also devise an json encoding scheme
 where white space is significant ;)

I don't even have to think hard.  If you want your JSON to be
human-readable, it's entirely possible that you want it stored using
the same whitespace that was present on input.  There is a valid use
case for normalizing whitespace, too, of course.

Everyone on this thread who thinks that there is Only One Right Way To
Do It should take a chill pill.  There is, in fact, more than one
right way to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] additional json functionality

2013-11-19 Thread Robert Haas
On Tue, Nov 19, 2013 at 12:27 PM, David E. Wheeler
da...@justatheory.com wrote:
 On Nov 19, 2013, at 8:14 AM, Robert Haas robertmh...@gmail.com wrote:
 Everyone on this thread who thinks that there is Only One Right Way To
 Do It should take a chill pill.  There is, in fact, more than one
 right way to do it.

 You shoulda been a Perl hacker, Robert.

I don't hack on Perl, but I spent about 10 years hacking *in* Perl, so
the phrasing was not a coincidence.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] additional json functionality

2013-11-19 Thread David E. Wheeler
On Nov 19, 2013, at 8:14 AM, Robert Haas robertmh...@gmail.com wrote:

 Everyone on this thread who thinks that there is Only One Right Way To
 Do It should take a chill pill.  There is, in fact, more than one
 right way to do it.

You shoulda been a Perl hacker, Robert.

D




Re: [HACKERS] additional json functionality

2013-11-19 Thread Josh Berkus
On 11/19/2013 08:14 AM, Robert Haas wrote:
 On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I am sure you could also devise an json encoding scheme
 where white space is significant ;)
 
 I don't even have to think hard.  If you want your JSON to be
 human-readable, it's entirely possible that you want it stored using
 the same whitespace that was present on input.  There is a valid use
 case for normalizing whitespace, too, of course.

Given that JSON is a data interchange format, I suspect that there is
an extremely large combination of factors which would result in an
unimplementably large number of parser settings.  For example, I
personally would have use for a type which allowed the storage of JSON
*fragments*.  Therefore I am interested in supporting only two:

a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's
apps, and

b) the optimal behavior for Hstore2/JSONB.

(a) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* no reordering of keys
* duplicate keys allowed

(b) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* reordering of keys
* duplicate keys prohibited 

If people want other manglings of JSON, they can use TEXT fields and
custom parsers written in PL/v8, the same way I do.
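The practical difference between (a) and (b) can be sketched with any off-the-shelf parser; a Python illustration (with last-duplicate-wins standing in for whichever duplicate-key policy (b) ends up enforcing):

```python
import json

doc = '{"b": 1, "a": 2, "a": 3}'

# (a)-style: the text is stored verbatim, so key order, duplicates and
# whitespace all survive.
stored_as_text = doc
assert stored_as_text == doc

# (b)-style: parse once and store a normalized form; duplicates are
# resolved (last wins here), keys are re-emitted in a canonical order,
# and the original text is no longer recoverable.
normalized = json.dumps(json.loads(doc), sort_keys=True, separators=(',', ':'))
assert normalized == '{"a":3,"b":1}'
```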

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-19 Thread Robert Haas
On Tue, Nov 19, 2013 at 12:59 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/19/2013 08:14 AM, Robert Haas wrote:
 On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I am sure you could also devise an json encoding scheme
 where white space is significant ;)

 I don't even have to think hard.  If you want your JSON to be
 human-readable, it's entirely possible that you want it stored using
 the same whitespace that was present on input.  There is a valid use
 case for normalizing whitespace, too, of course.

 Given that JSON is a data interchange format, I suspect that there are
 an extremely large combination of factors which would result in an
 unimplementably large number of parser settings.  For example, I
 personally would have use for a type which allowed the storage of JSON
 *fragments*.  Therefore I am interested only in supporting two:

 a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's
 apps, and

 b) the optimal behavior for Hstore2/JSONB.

 (a) is defined as:
 * complete documents only (no fragments)
 * whitespace not significant
 * no reordering of keys
 * duplicate keys allowed

 (b) is defined as:
 * complete documents only (no fragments)
 * whitespace not significant
 * reordering of keys
 * duplicate keys prohibited

 If people want other manglings of JSON, they can use TEXT fields and
 custom parsers written in PL/v8, the same way I do.

For (a), I assume you mean "whitespace not significant, but
preserved", because that is the current behavior, whereas for (b), I
think we would want to say "whitespace neither significant nor
preserved".  Other than that, I completely agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] additional json functionality

2013-11-19 Thread Andrew Dunstan


On 11/19/2013 12:59 PM, Josh Berkus wrote:

On 11/19/2013 08:14 AM, Robert Haas wrote:

On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

I am sure you could also devise an json encoding scheme
where white space is significant ;)

I don't even have to think hard.  If you want your JSON to be
human-readable, it's entirely possible that you want it stored using
the same whitespace that was present on input.  There is a valid use
case for normalizing whitespace, too, of course.

Given that JSON is a data interchange format, I suspect that there are
an extremely large combination of factors which would result in an
unimplementably large number of parser settings.  For example, I
personally would have use for a type which allowed the storage of JSON
*fragments*.  Therefore I am interested only in supporting two:

a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's
apps, and

b) the optimal behavior for Hstore2/JSONB.

(a) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* no reordering of keys
* duplicate keys allowed

(b) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* reordering of keys
* duplicate keys prohibited 

If people want other manglings of JSON, they can use TEXT fields and
custom parsers written in PL/v8, the same way I do.




Fragments are currently allowed in a):

   andrew=# select '"a"'::json;
    json
   ------
    "a"


Given that, I'm not sure we shouldn't permit them in b) either. I think 
I lost that argument back in the 9.2 dev cycle. I really don't want to 
get to a situation where foo::json::jsonb can produce an error.
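For what it's worth, most stock parsers outside Postgres also treat a bare scalar as a complete document (Python shown purely as an example):

```python
import json

# Bare scalars parse as complete documents, matching the "fragments
# allowed" behavior described above.
assert json.loads('"a"') == "a"
assert json.loads('42') == 42
```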


cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-19 Thread David Johnston
Andrew Dunstan wrote
 Given that, I'm not sure we shouldn't permit them in b) either. I think 
 I lost that argument back in the 9.2 dev cycle. I really don't want to 
 get to a situation where foo::json::jsonb can produce an error.

So what do you propose happens when the input json has duplicate keys?  

IMO A reasonable default cast function should error if the json contents
require anything more than a straight parse to be stored into jsonb.  If the
user still needs to make the conversion we should have a standard and
configurable parser function with json input and jsonb output.  In this case
the key-keep options would be 'keep first encountered', 'keep last
encountered', or 'fail on duplicate', the last of which would be the default.

I have not really pondered storing scalars into jsonb but, before pondering
usability, are there any technical concerns?  If the goal is to share the
backend with hstore then current hstore does not allow for this and so the
json aspect would either transfer back over or it would need customized
code.

David J.









Re: [HACKERS] additional json functionality

2013-11-19 Thread Gavin Flower

On 20/11/13 05:14, Robert Haas wrote:

On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

I am sure you could also devise an json encoding scheme
where white space is significant ;)

I don't even have to think hard.  If you want your JSON to be
human-readable, it's entirely possible that you want it stored using
the same whitespace that was present on input.  There is a valid use
case for normalizing whitespace, too, of course.

Everyone on this thread who thinks that there is Only One Right Way To
Do It should take a chill pill.  There is, in fact, more than one
right way to do it.


There is only one obvious 'Right Way', and that is 'My Way'!  :-)

More seriously, there are obviously variants in what people consider a
useful human-readable form of JSON output, but it is probably 
inefficient to store white space.  Which suggests it might be useful to 
allow users to store rules so that the output can include the white 
space that they want.  However, this is non-trivial - for example 
Eclipse allows Java/XML source to be formatted in different ways (here 
the source files are stored with white space), but lacks a couple of 
options that I would like.  Possibly, JSON output of white space would 
be less problematic?



Cheers,
Gavin




Re: [HACKERS] additional json functionality

2013-11-19 Thread Andrew Dunstan


On 11/19/2013 01:43 PM, David Johnston wrote:

Andrew Dunstan wrote

Given that, I'm not sure we shouldn't permit them in b) either. I think
I lost that argument back in the 9.2 dev cycle. I really don't want to
get to a situation where foo::json::jsonb can produce an error.

So what do you propose happens when the input json has duplicate keys?



I propose that we do what V8 does, what hstore does (yes, I know it's 
not json, but consistency is good), what we do now with the json_get() 
functions, what almost every other JSON engine I know of does, namely 
that the last key wins.
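Python's built-in parser, for instance, follows the same convention (shown only to illustrate the prevailing behavior):

```python
import json

# Duplicate keys: the last occurrence silently wins.
assert json.loads('{"k": "first", "k": "last"}') == {"k": "last"}
```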


cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-19 Thread David Johnston
Gavin Flower-2 wrote
 More seriously, there are obviously variants in what people consider 
 useful human readable form of JSON output, but it is probably 
 inefficient to store white space.  

Enough to matter?  Maybe the extra whitespace causes a marginal value to be
toasted but, IIUC, for a value that is going to be toasted anyway the
compression factors for both speed and space are going to make whitespace
considerations insignificant.
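
A rough sketch of that intuition, with Python's zlib standing in for TOAST's pglz compression (which differs in detail): the size gap between pretty-printed and compact JSON shrinks dramatically once both are compressed, because indentation is highly repetitive.

```python
import json
import zlib

# Build a sizable JSON document, then render it compact and pretty-printed.
data = {"row%d" % i: {"a": i, "b": str(i) * 5} for i in range(500)}
compact = json.dumps(data, separators=(",", ":")).encode()
pretty = json.dumps(data, indent=4).encode()

raw_gap = len(pretty) - len(compact)
zipped_gap = len(zlib.compress(pretty)) - len(zlib.compress(compact))

# Uncompressed, the whitespace costs thousands of bytes; compressed,
# the gap is a small fraction of that.
print(raw_gap, zipped_gap)
```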

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/additional-json-functionality-tp5777975p5779227.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] additional json functionality

2013-11-19 Thread Andrew Dunstan


On 11/19/2013 01:43 PM, David Johnston wrote:




I have not really pondered storing scalars into jsonb, but before pondering
usability, are there any technical concerns?  If the goal is to share the
backend with hstore then current hstore does not allow for this and so the
json aspect would either transfer back over or it would need customized
code.



Your premise here is simply wrong. The new hstore code does support 
scalar root elements.


cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-19 Thread Merlin Moncure
On Tue, Nov 19, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote:
 On 11/19/2013 08:14 AM, Robert Haas wrote:
 On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I am sure you could also devise an json encoding scheme
 where white space is significant ;)

 I don't even have to think hard.  If you want your JSON to be
 human-readable, it's entirely possible that you want it stored using
 the same whitespace that was present on input.  There is a valid use
 case for normalizing whitespace, too, of course.

 Given that JSON is a data interchange format, I suspect that there is
 an extremely large combination of factors which would result in an
 unimplementably large number of parser settings.  For example, I
 personally would have use for a type which allowed the storage of JSON
 *fragments*.  Therefore I am interested only in supporting two:

 a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's

I'm uncomfortable with the word 'legacy'.   This suggests the new type
will essentially deprecate the old type.  jsonb will likely be
pessimal for large serializations.   If you're not manipulating and
searching the documents, why would you use it?  It's going to take
more space on disk and in memory and should provide little benefit for
present *as well as future* code.  (Note: it will provide extreme
benefits for NoSQL-type uses, which is of huge strategic importance for
the project.)  json and jsonb APIs should work together cleanly, and
the documentation should explain how the two types differ and which is
better for which cases.

merlin




Re: [HACKERS] additional json functionality

2013-11-19 Thread Andrew Dunstan


On 11/19/2013 03:06 PM, Merlin Moncure wrote:



  Therefore I am interested only in supporting two:

a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's

I'm uncomfortable with the word 'legacy'.   This suggests the new type
will essentially deprecate the old type.



Existing might be a better word.


jsonb will likely be
pessimal for large serializations.   If you're not manipulating and
searching the documents, why would you use it?  It's going to take
more space on disk and in memory and should provide little benefit for
present *as well as future* code.  (Note: it will provide extreme
benefits for NoSQL-type uses, which is of huge strategic importance for
the project.)  json and jsonb APIs should work together cleanly, and
the documentation should explain how the two types differ and which is
better for which cases.



I agree with most of this.

cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-18 Thread Hannu Krosing
On 11/18/2013 05:19 AM, Andrew Dunstan wrote:

 On 11/17/2013 08:49 PM, Josh Berkus wrote:
 Now, if it turns out that the new hstore is not dealing with json input
 and output, we could have json, jstore and hstore.
 Jstore isn't the worst name suggestion I've heard on this thread.  The
 reason I prefer JSONB though, is that a new user looking for a place to
 put JSON data will clearly realize that JSON and JSONB are alternatives
 and related in some way.  They won't necessarily expect that jstore
 has anything to do with JSON, especially when there is another type
 called JSON.  Quite a few people are liable to think it's something to
 do with Java.

 Besides, we might get sued by these people: http://www.jstor.org/  ;-)


 I don't think any name that doesn't begin with json is acceptable.
 I could live with jsonb. It has the merit of brevity, but maybe it's a tad
 too close to json to be the right answer.
How about jsondoc or jsonobj?

It is still reasonably 'json' but not too easy to confuse with existing
json
when typing

And it perhaps hints better at the main difference from string-json, namely
that it is an object and not textual source code / notation / processing
info.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-18 Thread Peter Eisentraut
On 11/15/13, 6:15 PM, Josh Berkus wrote:
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibility break we'll want to call the new version 10.0.

We could do something like SQL/XML and specify the level of validity
in a typmod, e.g., json(loose), json(strict), etc.





Re: [HACKERS] additional json functionality

2013-11-18 Thread Merlin Moncure
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote:
 I don't think any name that doesn't begin with json is acceptable. I could
 live with jsonb. It has the merit of brevity, but maybe it's a tad too
 close to json to be the right answer.

I think that seems right.  Couple thoughts:

*) Aside from the text in and out routines, how is 'jsonb' different
from the coming 'nested hstore'?   Enough to justify two code bases?

*) How much of the existing json API has to be copied over to the
jsonb type and how exactly is that going to happen?  For example, I
figure we'd need a record_to_jsonb etc. for sure, but do we also
need a jsonb_each()...can't we overload instead?

Maybe we can cheat a little bit and overload the functions so that one of
the existing APIs (hstore or json) can be recovered -- only adding what
minimal functionality needs to be added to handle the type distinction
(mostly on serialization routines and casts).  What I'm driving at
here is that it would be nice if the API was not strongly bifurcated:
this would cause quite a bit of mindspace confusion.

merlin




Re: [HACKERS] additional json functionality

2013-11-18 Thread Andrew Dunstan


On 11/18/2013 09:38 AM, Merlin Moncure wrote:

On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote:

I don't think any name that doesn't begin with json is acceptable. I could
live with jsonb. It has the merit of brevity, but maybe it's a tad too
close to json to be the right answer.

I think that seems right.  Couple thoughts:

*) Aside from the text in and out routines, how is 'jsonb' different
from the coming 'nested hstore'?   Enough to justify two code bases?


The discussion has been around making a common library that would be 
used for both.





*) How much of the existing json API has to be copied over to the
jsonb type and how exactly is that going to happen?  For example, I
figure we'd need a record_to_jsonb etc. for sure, but do we also
need a jsonb_each()...can't we overload instead?



Overloading is what I was planning to do.


cheers

andrew







Re: [HACKERS] additional json functionality

2013-11-18 Thread Josh Berkus
On 11/18/2013 06:13 AM, Peter Eisentraut wrote:
 On 11/15/13, 6:15 PM, Josh Berkus wrote:
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibility break we'll want to call the new version 10.0.
 
 We could do something like SQL/XML and specify the level of validity
 in a typmod, e.g., json(loose), json(strict), etc.

Doesn't work; with XML, the underlying storage format didn't change.
With JSONB, it will ... so changing the typemod would require a total
rewrite of the table.  That's a POLS violation if I ever saw one.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-18 Thread Josh Berkus
Merlin,

 *) Aside from the text in and out routines, how is 'jsonb' different
 from the coming 'nested hstore'?   Enough to justify two code bases?

In/out functions and defaults are all different.  Otherwise, the two
types will be accessing the same code base, so no duplication.  Think of
it as XML vs. TEXT.

 Maybe we can cheat a little bit and overload the functions so that one of
 the existing APIs (hstore or json) can be recovered -- only adding what
 minimal functionality needs to be added to handle the type distinction
 (mostly on serialization routines and casts).  What I'm driving at
 here is that it would be nice if the API was not strongly bifurcated:
 this would cause quite a bit of mindspace confusion.

I'll also note that for functions designed for output to the client, it
doesn't make much of a difference whether the result is JSON or JSONB,
since the string representation will be identical.  However, it makes a
difference if the data is going to be stored, since a double conversion
on a large JSON value would be expensive.

In other words, we need a version of each function which outputs JSONB,
but that version doesn't have to be the default if users don't specify.

Note that this raises the issue of first alternate data type ambiguity
again for overloading builtin functions.  We really need that method of
prefering a specific version of the function ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-18 Thread Merlin Moncure
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus j...@agliodbs.com wrote:
 Merlin,

 *) Aside from the text in and out routines, how is 'jsonb' different
 from the coming 'nested hstore'?   Enough to justify two code bases?

 In/out functions and defaults are all different.  Otherwise, the two
 types will be accessing the same code base, so no duplication.  Think of
 it as XML vs. TEXT.

 Maybe we can cheat a little bit and overload the functions so that one of
 the existing APIs (hstore or json) can be recovered -- only adding what
 minimal functionality needs to be added to handle the type distinction
 (mostly on serialization routines and casts).  What I'm driving at
 here is that it would be nice if the API was not strongly bifurcated:
 this would cause quite a bit of mindspace confusion.

 I'll also note that for functions designed for output to the client, it
 doesn't make much of a difference whether the result is JSON or JSONB,
 since the string representation will be identical.  However, it makes a
 difference if the data is going to be stored, since a double conversion
 on a large JSON value would be expensive.

Hm, but it would matter, wouldn't it?  The jsonb representation would
give output with the record fields reordered, deduplicated, etc.
Also, presumably, jsonb serialization would necessarily be slower
for exactly that reason, although perhaps not enough to matter much.
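
A sketch of that effect, with a Python dict standing in for jsonb's internal tree (jsonb's actual key ordering differs in detail): round-tripping through a parsed representation discards the original whitespace, key order, and duplicate keys.

```python
import json

# Round-tripping text through a parsed (jsonb-like) representation is
# not byte-identical: formatting, key order, and duplicates are lost.
original = '{"z": 1,  "a": {"dup": 1, "dup": 2}}'
roundtrip = json.dumps(json.loads(original), sort_keys=True)
print(roundtrip)  # {"a": {"dup": 2}, "z": 1}
```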

 In other words, we need a version of each function which outputs JSONB,
 but that version doesn't have to be the default if users don't specify.

 Note that this raises the issue of first alternate data type ambiguity
 again for overloading builtin functions.  We really need that method of
 prefering a specific version of the function ...

You'd need explicit jsonb creating functions: record_to_jsonb,
array_to_jsonb etc.  AFAIK, these functions would be the only ones
that would have to explicitly reference the jsonb type if you don't
count casts.

It's tempting to *not* add those functions, as the user would then only
need to specify jsonb for table columns...you'd then go from json
to jsonb with a cast, perhaps even an implicit one.  The disadvantage
there is that you'd then always get unsimplified json.

Hm -- on that note, is it technically feasible to *not* duplicate the
json API implementations, but just (ab)use implicit casting between
the APIs?  That way the text API would own all the serialization
routines as it does now but you'd run mutation and searching through
jsonb...

merlin




Re: [HACKERS] additional json functionality

2013-11-17 Thread David E. Wheeler
On Nov 16, 2013, at 2:04 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 It’s still input and output as JSON, though.
 Yes, because JavaScript Object Notation *is* a serialization format
 (aka Notation) for converting JavaScript Objects to text format
 and back :)
 I still like JSONB best.
 To me it feels redundant, like binarytextbinary
 
 the binary representation of JSON is JavaScript(-like) Object, not
 binary json
 
 So my vote would be either jsobj or jsdoc (as document databases) tend
 to call the structured types documents

You know that both types support scalar values, right? 'a'::JSON works now, and 
'a'::hstore works with the WIP patch. For that reason I would not think that 
doc or obj would be good choices.
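
For instance, many JSON engines accept top-level scalars even though RFC 4627 nominally required the top level to be an object or array; Python's json module (used here only as an illustration) is one of them:

```python
import json

# Top-level scalars parse fine in many JSON engines, even though
# RFC 4627 originally required an object or array at the top level.
scalar_str = json.loads('"a"')
scalar_num = json.loads('42')
print(scalar_str, scalar_num)  # a 42
```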

I like JSONB because:

1. The B means binary
2. The B means second
3. It's short
4. See also BYTEA.

Best,

David






Re: [HACKERS] additional json functionality

2013-11-17 Thread David Johnston
David E. Wheeler-3 wrote
 I like JSONB because:
 
 1. The B means binary
 2. The B means second
 3. It's short
 4. See also BYTEA.

json_strict :

Not sure about the bytea reference off-hand...

I was pondering jsons which meets the short property just fine and the
trailing s would stand for strict which is the user-visible semantic
that this type exhibits rather than some less-visible binary attribute
which most users would not really care about.  I dislike the implication of
plural-ness that the s imparts, though.

Implication of second doesn't seem that important since both types provide
useful semantics.

I can imagine where the short aspect will lead people to accidentally type
json where they mean to use jsonb, and having just a single extra
character will increase the likelihood they will not notice.  Knowing about
and having previously used json_strict, it will be more probable that such
users will noticeably feel something is missing if they drop the whole
_strict suffix.

So, I'll toss out json_strict for my bikeshed contribution.

David J.
 





Re: [HACKERS] additional json functionality

2013-11-17 Thread Dimitri Fontaine
David E. Wheeler da...@justatheory.com writes:
 You know that both types support scalar values right? 'a'::JSON works now,
 and 'a'::hstore works with the WIP patch. For that reason I would not think
 that doc or obj would be good choices.

I'm wondering about just pushing hstore in core (even if technically
still an extension, install it by default, like we do for PLpgSQL), and
calling it a day.

If you need pre-9.4 JSON-is-text compatibility, use the json datatype,
if you want something with general index support, use hstore.

For bikeshedding purposes, what about calling it jstore, as in “we
actually know how to store your json documents”?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] additional json functionality

2013-11-17 Thread Hannu Krosing
On 11/17/2013 09:02 PM, David E. Wheeler wrote:
 On Nov 16, 2013, at 2:04 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 It’s still input and output as JSON, though.
 Yes, because JavaScript Object Notation *is* a serialization format
 (aka Notation) for converting JavaScript Objects to text format
 and back :)
 I still like JSONB best.
 To me it feels redundant, like binarytextbinary

 the binary representation of JSON is JavaScript(-like) Object, not
 binary json

 So my vote would be either jsobj or jsdoc (as document databases) tend
 to call the structured types documents
 You know that both types support scalar values right? 
 'a'::JSON works now, 
Yeah, and I remember all the bikeshedding about how
scalars should not be supported as they are
not really JSON by standard ...

At that time I was also quite vocal about not painting
ourselves into a corner by not normalising json on input and
thus generating a backwards compatibility problem in
case we would ever get proper json support.
 and 'a'::hstore works with the WIP patch. For that reason I would not think 
 that doc or obj would be good choices.
this is like claiming that text should not be text because you
can store a single character there as well.

I feel that both doc and obj convey the meaning that it is a
structured type meant for fast component lookup as opposed to
jsoN(otation) type which is text.

Also jsdoc/jsobj would be a natural bridge to pgdoc/pgobj which would be
similar to new json but allow any type supported by postgresql as a value.

(... and in several languages even scalars really are objects)

 I like JSONB because:

 1. The B means binary
Binary has really little to do with the fact that we
normalise on input, which is the real significant feature
of the new json type.
 2. The B means second
Why not just json2? (You know, like varchar2 in a certain other database ;)
 3. It's short
jsobj and jsdoc are exactly as short as jsonb
 4. See also BYTEA.
BYTEA is byte array, so not really relevant.

(unless you try to rhyme a byte-a, json-b sequence ;) )

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-17 Thread Hannu Krosing
On 11/17/2013 10:51 PM, Dimitri Fontaine wrote:
 David E. Wheeler da...@justatheory.com writes:
 You know that both types support scalar values right? 'a'::JSON works now,
 and 'a'::hstore works with the WIP patch. For that reason I would not think
 that doc or obj would be good choices.
 I'm wondering about just pushing hstore in core (even if technically
 still an extension, install it by default, like we do for PLpgSQL), and
 calling it a day.

 If you need pre-9.4 JSON-is-text compatibility, use the json datatype,
 if you want something with general index support, use hstore.
+1 for getting also hstore in

I think hstore needs to keep its text format compatible with older hstore
(in this discussion let's call this text format hson, short for
HStore Object Notation for added confusion :)

 For bikeshedding purposes, what about calling it jstore, 
+1 for jstore as well. I am happy with jstore, jsdoc, jsobj

jstore/jsobj/jsdoc really is *not* JSON, but a bona-fide freeform
structured datatype that happens to have JSON as convenient
I/O format.

You may want to use jstore even if you have never needed
JSON as serialisation/transport format before.

I do not like jsonB (it sounds too much like json2, i.e. like we were
trying to cover up a design accident), nor json_strict (it is not
really strict, as it does accept scalars, not just Arrays/Lists
and Objects/Dictionaries as per the JSON standard).
 as in “we actually know how to store your json documents”?

 Regards,

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-17 Thread David E. Wheeler
On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 I'm wondering about just pushing hstore in core (even if technically
 still an extension, install it by default, like we do for PLpgSQL), and
 calling it a day.

Its syntax is different from JSON's, so one would need to convert to and from 
JSON all the time to parse and serialize. PITA.

 For bikeshedding purposes, what about calling it jstore, as in “we
 actually know how to store your json documents”?

-1 Sounds like a Java storage API.

David



Re: [HACKERS] additional json functionality

2013-11-17 Thread Dimitri Fontaine
David E. Wheeler da...@justatheory.com writes:
 On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 I'm wondering about just pushing hstore in core (even if technically
 still an extension, install it by default, like we do for PLpgSQL), and
 calling it a day.

 Its syntax is different from JSON's, so one would need to convert to
 and from JSON all the time to parse and serialize. PITA.

Oh, I misremembered about that; I thought it would take JSON as input
as-is and could be made to output JSON. And IIRC the community input at
pgconf.eu has been to just always output json text and get rid of the
formatting GUCs.

Now, if it turns out that the new hstore is not dealing with json input
and output, we could have json, jstore and hstore.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] additional json functionality

2013-11-17 Thread David E. Wheeler
On Nov 17, 2013, at 2:26 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 Its syntax is different from JSON's, so one would need to convert to
 and from JSON all the time to parse and serialize. PITA.
 
 Oh, I misremembered about that; I thought it would take JSON as input
 as-is and could be made to output JSON. And IIRC the community input at
 pgconf.eu has been to just always output json text and get rid of the
 formatting GUCs.

Yeah, but for back-compat, it has to use => instead of : to separate keys from 
values, and cannot use braces for a root-level object. :-(

 Now, if it turns out that the new hstore is not dealing with json input
 and output, we could have json, jstore and hstore.

That's where this is headed, yes.

David





Re: [HACKERS] additional json functionality

2013-11-17 Thread Andrew Dunstan


On 11/17/2013 04:51 PM, Dimitri Fontaine wrote:

David E. Wheeler da...@justatheory.com writes:

You know that both types support scalar values right? 'a'::JSON works now,
and 'a'::hstore works with the WIP patch. For that reason I would not think
that doc or obj would be good choices.

I'm wondering about just pushing hstore in core (even if technically
still an extension, install it by default, like we do for PLpgSQL), and
calling it a day.


That would be one of the silliest and most short-sighted decisions we 
have made in many years, IMNSHO. The demand for strong JSON support is 
enormous. I don't think I have ever received as many positive comments 
on any other feature I have worked on in the last 9 years. What these 
people want is not something jsonish, they want json, pure and simple. 
And they want it fast and featured and efficient.


Much as I love hstore, it isn't json, and so it won't satisfy that demand.


cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-17 Thread Dimitri Fontaine
Andrew Dunstan and...@dunslane.net writes:
 That would be one of the silliest and most short-sighted decisions we have
 made in many years, IMNSHO. The demand for strong JSON support is enormous.

One of the silliest and most short-sighted decisions we made recently
might have been to actually ship that json variant in 9.2, after all.

The most popular PostgreSQL commands in 9.4 are going to be:

  $ sudo apt-get install postgresql-contrib-9.4

  # create extension jstore;
  # alter table foo alter column documents type jstore;
  # create index on foo using gist(documents);

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] additional json functionality

2013-11-17 Thread Gavin Flower

On 18/11/13 09:02, David E. Wheeler wrote:

On Nov 16, 2013, at 2:04 PM, Hannu Krosing ha...@2ndquadrant.com wrote:


It’s still input and output as JSON, though.

Yes, because JavaScript Object Notation *is* a serialization format
(aka Notation) for converting JavaScript Objects to text format
and back :)

I still like JSONB best.

To me it feels redundant, like binarytextbinary

the binary representation of JSON is JavaScript(-like) Object, not
binary json

So my vote would be either jsobj or jsdoc (as document databases) tend
to call the structured types documents

You know that both types support scalar values right? 'a'::JSON works now, and 'a'::hstore works 
with the WIP patch. For that reason I would not think that doc or obj would 
be good choices.

I like JSONB because:

1. The B means binary
2. The B means second
3. It's short
4. See also BYTEA.

Best,

David


 

Whatever, I think the first 4 characters have to be 'JSON' - for easy 
identification.



Cheers,
Gavin




Re: [HACKERS] additional json functionality

2013-11-17 Thread Gavin Flower

On 18/11/13 09:45, David Johnston wrote:

David E. Wheeler-3 wrote

I like JSONB because:

1. The B means binary
2. The B means second
3. It's short
4. See also BYTEA.

json_strict :

Not sure about the bytea reference off-hand...

I was pondering jsons which meets the short property just fine and the
trailing s would stand for strict which is the user-visible semantic
that this type exhibits rather than some less-visible binary attribute
which most users would not really care about.  I dislike the implication of
plural-ness that the s imparts, though.

Implication of second doesn't seem that important since both types provide
useful semantics.

I can imagine where the short aspect will lead people to accidentally type
json where they mean to use jsonb, and having just a single extra
character will increase the likelihood they will not notice.  Knowing about
and having used json_strict previously it will be more probable that such
users will noticeably feel something is missing if they drop the whole
_strict suffix.

So, I'll toss out json_strict for my bikeshed contribution.

David J.
  







+1




Re: [HACKERS] additional json functionality

2013-11-17 Thread Andrew Dunstan


On 11/17/2013 05:44 PM, Dimitri Fontaine wrote:

Andrew Dunstan and...@dunslane.net writes:

That would be one of the silliest and most short-sighted decisions we have
made in many years, IMNSHO. The demand for strong JSON support is enormous.

One of the silliest and most short-sighted decisions we made recently
might have been to actually ship that json variant in 9.2, after all.



The fact is that we had been going round and round on Json for a while, 
and Robert Haas rightly made a move to break the Gordian knot. We would 
not have done ourselves any service by not accepting it.


Hindsight is always 20-20, but even with what we have today people get 
excited, and more people move to use Postgres every day because we have 
that support.




The most popular PostgreSQL commands in 9.4 are going to be:

   $ sudo apt-get install postgresql-contrib-9.4

   # create extension jstore;
   # alter table foo alter column documents type jstore;
   # create index on foo using gist(documents);




Umm, not if I have anything to do with it.


cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-17 Thread Josh Berkus

 Now, if it turns out that the new hstore is not dealing with json input
 and output, we could have json, jstore and hstore.

Jstore isn't the worst name suggestion I've heard on this thread.  The
reason I prefer JSONB though, is that a new user looking for a place to
put JSON data will clearly realize that JSON and JSONB are alternatives
and related in some way.  They won't necessarily expect that jstore
has anything to do with JSON, especially when there is another type
called JSON.  Quite a few people are liable to think it's something to
do with Java.

Besides, we might get sued by these people: http://www.jstor.org/  ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-17 Thread David E. Wheeler

On Nov 17, 2013, at 5:49 PM, Josh Berkus j...@agliodbs.com wrote:

 Jstore isn't the worst name suggestion I've heard on this thread.  The
 reason I prefer JSONB though, is that a new user looking for a place to
 put JSON data will clearly realize that JSON and JSONB are alternatives
 and related in some way.  They won't necessarily expect that jstore
 has anything to do with JSON, especially when there is another type
 called JSON.  Quite a few people are liable to think it's something to
 do with Java.
 
 Besides, we might get sued by these people: http://www.jstor.org/  ;-)

Okay, how about JDATE? ;-P

David





Re: [HACKERS] additional json functionality

2013-11-17 Thread Gavin Flower

On 18/11/13 14:51, David E. Wheeler wrote:

On Nov 17, 2013, at 5:49 PM, Josh Berkus j...@agliodbs.com wrote:


Jstore isn't the worst name suggestion I've heard on this thread.  The
reason I prefer JSONB though, is that a new user looking for a place to
put JSON data will clearly realize that JSON and JSONB are alternatives
and related in some way.  They won't necessarily expect that jstore
has anything to do with JSON, especially when there is another type
called JSON.  Quite a few people are liable to think it's something to
do with Java.

Besides, we might get sued by these people: http://www.jstor.org/  ;-)

Okay, how about JDATE? ;-P

David




I don't want a Japanese Date - would cause complications with my wife!  :-)


Cheers,
Gavin




Re: [HACKERS] additional json functionality

2013-11-17 Thread Andrew Dunstan


On 11/17/2013 08:49 PM, Josh Berkus wrote:

Now, if it turns out that the new hstore is not dealing with json input
and output, we could have json, jstore and hstore.

Jstore isn't the worst name suggestion I've heard on this thread.  The
reason I prefer JSONB though, is that a new user looking for a place to
put JSON data will clearly realize that JSON and JSONB are alternatives
and related in some way.  They won't necessarily expect that jstore
has anything to do with JSON, especially when there is another type
called JSON.  Quite a few people are liable to think it's something to
do with Java.

Besides, we might get sued by these people: http://www.jstor.org/  ;-)



I don't think any name that doesn't begin with json is acceptable. I 
could live with jsonb. It has the merit of brevity, but maybe it's a 
tad too close to json to be the right answer.


cheers

andrew






Re: [HACKERS] additional json functionality

2013-11-17 Thread David E. Wheeler
On Nov 17, 2013, at 8:19 PM, Andrew Dunstan and...@dunslane.net wrote:

 I don't think any name that doesn't begin with json is acceptable. I could 
 live with jsonb. It has the merit of brevity, but maybe it's a tad too 
 close to json to be the right answer.

JSONFTW.

David





Re: [HACKERS] additional json functionality

2013-11-16 Thread Hannu Krosing
On 11/16/2013 12:15 AM, Josh Berkus wrote:
 On 11/15/2013 02:59 PM, Merlin Moncure wrote:
  On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing ha...@2ndquadrant.com 
 wrote:
 I think you may be on to something here.  This might also be a way
 opt-in to fast(er) serialization (upthread it was noted this is
 unimportant; I'm skeptical).  I deeply feel that two types is not the
 right path but I'm pretty sure that this can be finessed.

 As far as I understand merlin is mostly ok with stored json being
 normalised and the problem is just with constructing extended
 json (a.k.a. processing instructions) to be used as source for
 specialised parsers and renderers.
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibility break we'll want to call the new version 10.0.

 That's why naming old JSON as json_text won't work; it'll be a
 hardened roadblock to upgrading.
Then perhaps name the new binary json as jsob (JavaScript Object Binary)
or just jsobj (JavaScript Object) and keep current json for what it is,
namely
JavaScript Object Notation.
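[Editorial note: the upgrade concern quoted above is easy to demonstrate. The sketch below uses Python's stdlib json module purely as a stand-in for any tree-building parser; nothing here is proposed Postgres code.]

```python
import json

# A 9.3-style text json column stores this input verbatim, so key order
# and the duplicate "a" key survive a round trip unchanged.
raw = '{"a": 1, "a": 2, "b": 3}'

# Any tree-based (binary-style) representation must keep one value per
# key; Python's json module, like most parsers, keeps the last one.
parsed = json.loads(raw)
print(parsed)  # {'a': 2, 'b': 3} -- the first "a" is gone for good
```

Data that relies on duplicates or ordering would thus change meaning under a binary successor type.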

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-16 Thread David E. Wheeler

On Nov 16, 2013, at 12:04 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 Then perhaps name the new binary json as jsob (JavaScript Object Binary)
 or just jsobj (JavaScript Object) and keep current json for what it is,
 namely
 JavaScript Object Notation.

It’s still input and output as JSON, though. I still like JSONB best.

David





Re: [HACKERS] additional json functionality

2013-11-16 Thread Hannu Krosing
On 11/16/2013 10:30 PM, David E. Wheeler wrote:
 On Nov 16, 2013, at 12:04 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 Then perhaps name the new binary json as jsob (JavaScript Object Binary)
 or just jsobj (JavaScript Object) and keep current json for what it is,
 namely
 JavaScript Object Notation.
 It’s still input and output as JSON, though.
Yes, because JavaScript Object Notation *is* a serialization format
(aka Notation) for converting JavaScript Objects to text format
and back :)
  I still like JSONB best.
To me it feels redundant, like binary-text-binary

the binary representation of JSON is JavaScript(-like) Object, not
binary json

So my vote would be either jsobj or jsdoc, as document databases tend
to call the structured types documents.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-16 Thread Josh Berkus
On 11/16/2013 02:04 PM, Hannu Krosing wrote:
 On 11/16/2013 10:30 PM, David E. Wheeler wrote:
  I still like JSONB best.
 To me it feels redundant, like binary-text-binary
 
 the binary representation of JSON is JavaScript(-like) Object, not
 binary json

JSONB is as close as we can get to JSON, and it gives people the idea
that this is the successor type to JSON.  So +1 from me as well on JSONB.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-15 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
 da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.
 The current JSON type preserves key order and duplicates. But is it 
 documented that this is a feature, or something to be guaranteed?
 It doesn't, but the row_to_json function has a very clear mechanism of
 action.  And, 'not being documented' is not the standard for latitude
 to make arbitrary changes to existing function behaviors.
 the whole hash*() function family was changed based on an undocumented
 premise, so we do have a precedent.

 In my experience, no JSON parser guarantees key order or duplication.
 I found one in about two seconds.  http://docs.python.org/2/library/json.html

 object_pairs_hook, if specified, will be called with the result of
 every JSON object decoded with an ordered list of pairs. The return
 value of object_pairs_hook will be used instead of the dict. This
 feature can be used to implement custom decoders that rely on the
 order that the key and value pairs are decoded (for example,
 collections.OrderedDict() will remember the order of insertion). If
 object_hook is also defined, the object_pairs_hook takes priority.

 That makes the rest of your argument moot.  Plus, I quite clearly am
 dealing with parsers that do.
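[Editorial note: the Python documentation excerpt quoted above can be exercised in a few lines; stdlib json only, nothing Postgres-specific.]

```python
import json

raw = '{"b": 1, "a": 2, "a": 3}'

# Default decoding builds a dict: duplicate keys collapse, last value wins.
print(json.loads(raw))  # {'b': 1, 'a': 3}

# object_pairs_hook is handed every decoded pair in document order, so a
# custom decoder can observe both key order and duplicate keys.
pairs = json.loads(raw, object_pairs_hook=lambda ps: ps)
print(pairs)  # [('b', 1), ('a', 2), ('a', 3)]
```

So a parser that guarantees access to order and duplicates does exist, which is the point being made here.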
 I am sure you could also devise an json encoding scheme
 where white space is significant ;)

 The question is, how much of it should json *type* support.

 As discussed in other thread, most of your requirements
 would be met by having json/row/row set-to-text serializer
 functions which output json-formatted text.

No, that would not work, putting aside the fact that it would require
rewriting heaps of code.  What I do now inside the json wrapping
routines is create things like

{
  x: [
{dynamic object},
{dynamic object},
...
  ],
  y: ...,
  ...
}

The only way to do it is to build 'dynamic object' into json in
advance of the outer xxx_to_json call.  The 'dynamic object' is
created out of a json builder that takes a paired array -- basically a
variant of Andrew's 'json_build' upthread.  If the 'json serializer'
outputted text, the 'outer' to_json call would then re-escape the
object.  I can't use hstore for that purpose precisely because of the
transformations it does on the object.
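[Editorial note: the re-escaping problem described above is easy to reproduce outside Postgres. Below, stdlib json stands in for the server-side serializer; the names are purely illustrative.]

```python
import json

# An inner "dynamic object", already serialized to JSON-formatted text.
inner = json.dumps({"id": 1, "tag": "x"})

# Wrong: the outer serializer sees only a string, so it re-escapes it,
# and the result contains a quoted blob rather than a nested object.
wrapped_badly = json.dumps({"x": [inner]})
print(wrapped_badly)  # note the backslash-escaped quotes inside "x"

# Right: keep values structured until the single, outermost serialization.
wrapped = json.dumps({"x": [{"id": 1, "tag": "x"}]})
print(wrapped)
```

This is why the inner value must already *be* json (not text) when the outer xxx_to_json call runs.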

Stepping back, I'm using json serialization as a kind of 'supercharged
crosstab'.  To any client that can parse json, json serialization
completely displaces crosstabbing -- it's superior in every way.  I
am, if you will, kind of leading research efforts in the area and I can
tell you with absolute certainty that breaking this behavior is a
mistake.

Forcing hstore-ish output mechanisms removes the ability to handle
certain important edge cases that work just fine today. If that
ability was taken away, it would be a very bitter pill for me to
swallow and would have certain ramifications for me professionally; I
went out on a pretty big limb and pushed pg/json aggressively (over
strenuous objection) in an analytics product which is now in the final
stages of beta testing.  I would hate to see the conclusion of the
case study be "Ultimately we had to migrate the code back to Hibernate
due to compatibility issues."

Here are the options on the table:
1) convert existing json type to binary flavor (notwithstanding objections)
2) maintain side by side types, one representing binary, one text.
unfortunately, i think the text one must get the name 'json' due to
unfortunate previous decision.
3) merge the behaviors into a single type and get the best of both
worlds (as suggested upthread).

I think we need to take a *very* hard look at #3 before exploring #1
or #2: Haven't thought it through yet but it may be possible to handle
this in such a way that will be mostly transparent to the end user and
may have other benefits such as a faster path for serialization.

merlin




Re: [HACKERS] additional json functionality

2013-11-15 Thread David E. Wheeler
On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Here are the options on the table:
 1) convert existing json type to binary flavor (notwithstanding objections)
 2) maintain side by side types, one representing binary, one text.
 unfortunately, i think the text one must get the name 'json' due to
 unfortunate previous decision.
 3) merge the behaviors into a single type and get the best of both
 worlds (as suggested upthread).
 
 I think we need to take a *very* hard look at #3 before exploring #1
 or #2: Haven't thought it through yet but it may be possible to handle
 this in such a way that will be mostly transparent to the end user and
 may have other benefits such as a faster path for serialization.

If it’s possible to preserve order and still get the advantages of binary 
representation --- which are substantial (see 
http://theory.so/pg/2013/10/23/testing-nested-hstore/ and 
http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of 
examples) --- without undue maintenance overhead, then great.

I am completely opposed to duplicate key preservation in JSON, though. It has 
caused us a fair number of headaches at $work.

Best,

David





Re: [HACKERS] additional json functionality

2013-11-15 Thread Merlin Moncure
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote:
 On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Here are the options on the table:
 1) convert existing json type to binary flavor (notwithstanding objections)
 2) maintain side by side types, one representing binary, one text.
 unfortunately, i think the text one must get the name 'json' due to
 unfortunate previous decision.
 3) merge the behaviors into a single type and get the best of both
 worlds (as suggested upthread).

 I think we need to take a *very* hard look at #3 before exploring #1
 or #2: Haven't thought it through yet but it may be possible to handle
 this in such a way that will be mostly transparent to the end user and
 may have other benefits such as a faster path for serialization.

 If it’s possible to preserve order and still get the advantages of binary 
 representation --- which are substantial (see 
 http://theory.so/pg/2013/10/23/testing-nested-hstore/ and 
 http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of 
 examples) --- without undue maintenance overhead, then great.

 I am completely opposed to duplicate key preservation in JSON, though. It has 
 caused us a fair number of headaches at $work.

Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:

*) 'json' type internally has a binary as well as a text representation.
The text representation is basically the current type behavior
(deduplicated, unordered).  The binary representation is the hstore-ish
variant.  The text mode is discarded when it's deemed no longer
needed, and, once gone, can never be rebuilt as it
was.

*) only the binary internal representation ever gets stored to disk
(or anything else).

*) the text mode is preferred for output if it is there.  otherwise, a
deduplicated, reordered text representation is generated

*) When literal text is cast to json, the binary structure is built
up and kept alongside the text mode.  So, if you went: select '{"a":
1, "a": 2}'::json, you'd get the same thing back (this is how
it works now); but if you went: insert into foo select '{"a": 1,
"a": 2}'::json returning *, you'd get {"a": 2} back essentially
(although technically that would be a kind of race).

*) When the json is stored to table, the text representation gets
immediately discarded on the basis that it's no longer the true
representation of the data.

*) Ditto when making any equality operation (not as sure on this point).

*) Ditto when doing any operation that mutates the structure in any
way. the text representation is immutable except during serialization
and if it gets invalidated it gets destroyed.

*) New API function: json_simplify(); or some such.  It reorders and
dedups from user's point of view (but really just kills off the text
representation)

*) once the text mode is gone, you get basically the proposed 'hstore' behavior.

*) serialization functions are generally used in contexts that do not
store anything but get output as query data.  They create *only* the
text mode.  However, if the resultant json is stored anywhere, the
text mode is destroyed and replaced with binary variant.  This is both
a concession to the current behavior and an optimization of
'serialization-in-query' for which I think the binary mode is pessimal
performance wise.  so, xxx_to_json serialization functions work
exactly as they do now which fixes my problem essentially.

*) if you are unhappy with duplicates in the above, just get used to
calling json_simplify() on the serialized json (or deal with it on the
client side).

This is all pretty glossy, but maybe there is a way forward...
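[Editorial note: a toy model of the dual-representation sketch above, in Python. The class and method names are invented for illustration; a real implementation would live in the type's C code and, as others note, need core cooperation.]

```python
import json

class DualJson:
    """Toy sketch: keep the original text alongside a parsed tree; the
    text is authoritative for output until something invalidates it."""

    def __init__(self, text):
        self.text = text              # verbatim input: order, dup keys intact
        self.tree = json.loads(text)  # structural form: deduplicated

    def simplify(self):
        # json_simplify(): drop the text mode; output is regenerated.
        self.text = None
        return self

    def set(self, key, value):
        # Any mutation invalidates the stored text representation.
        self.tree[key] = value
        self.text = None
        return self

    def __str__(self):
        return self.text if self.text is not None else json.dumps(self.tree)

d = DualJson('{"a": 1, "a": 2}')
print(str(d))             # {"a": 1, "a": 2}  (text mode preserved)
print(str(d.simplify()))  # {"a": 2}          (regenerated from the tree)
```

The objection that follows in the thread is precisely that output depends on the value's history, which this toy makes visible.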

merlin




Re: [HACKERS] additional json functionality

2013-11-15 Thread Andrew Dunstan


On 11/15/2013 03:25 PM, Merlin Moncure wrote:

On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote:

On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote:


Here are the options on the table:
1) convert existing json type to binary flavor (notwithstanding objections)
2) maintain side by side types, one representing binary, one text.
unfortunately, i think the text one must get the name 'json' due to
unfortunate previous decision.
3) merge the behaviors into a single type and get the best of both
worlds (as suggested upthread).

I think we need to take a *very* hard look at #3 before exploring #1
or #2: Haven't thought it through yet but it may be possible to handle
this in such a way that will be mostly transparent to the end user and
may have other benefits such as a faster path for serialization.

If it’s possible to preserve order and still get the advantages of binary 
representation --- which are substantial (see 
http://theory.so/pg/2013/10/23/testing-nested-hstore/ and 
http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of 
examples) --- without undue maintenance overhead, then great.

I am completely opposed to duplicate key preservation in JSON, though. It has 
caused us a fair number of headaches at $work.

Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:

*) 'json' type internally has a binary as well as a text representation.
The text representation is basically the current type behavior
(deduplicated, unordered).  The binary representation is the hstore-ish
variant.  The text mode is discarded when it's deemed no longer
needed, and, once gone, can never be rebuilt as it
was.

*) only the binary internal representation ever gets stored to disk
(or anything else).

*) the text mode is preferred for output if it is there.  otherwise, a
deduplicated, reordered text representation is generated

*) When literal text is cast to json, the binary structure is built
up and kept alongside the text mode.  So, if you went: select '{"a":
1, "a": 2}'::json, you'd get the same thing back (this is how
it works now); but if you went: insert into foo select '{"a": 1,
"a": 2}'::json returning *, you'd get {"a": 2} back essentially
(although technically that would be a kind of race).

*) When the json is stored to table, the text representation gets
immediately discarded on the basis that it's no longer the true
representation of the data.

*) Ditto when making any equality operation (not as sure on this point).

*) Ditto when doing any operation that mutates the structure in any
way. the text representation is immutable except during serialization
and if it gets invalidated it gets destroyed.

*) New API function: json_simplify(); or some such.  It reorders and
dedups from user's point of view (but really just kills off the text
representation)

*) once the text mode is gone, you get basically the proposed 'hstore' behavior.

*) serialization functions are generally used in contexts that do not
store anything but get output as query data.  They create *only* the
text mode.  However, if the resultant json is stored anywhere, the
text mode is destroyed and replaced with binary variant.  This is both
a concession to the current behavior and an optimization of
'serialization-in-query' for which I think the binary mode is pessimal
performance wise.  so, xxx_to_json serialization functions work
exactly as they do now which fixes my problem essentially.

*) if you are unhappy with duplicates in the above, just get used to
calling json_simplify() on the serialized json (or deal with it on the
client side).

This is all pretty glossy, but maybe there is a way forward...




It's making my head hurt, to be honest, and it sounds like a recipe for 
years and years of inconsistencies and bugs.


I don't want to have two types, but I think I'd probably rather have two 
clean types than this. I can't imagine it being remotely acceptable to 
have behaviour depend on whether or not something was ever stored, which 
is what this looks like.


cheers

andrew



Re: [HACKERS] additional json functionality

2013-11-15 Thread Merlin Moncure
On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan and...@dunslane.net wrote:

 On 11/15/2013 03:25 PM, Merlin Moncure wrote:

 On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com
 wrote:

 On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Here are the options on the table:
 1) convert existing json type to binary flavor (notwithstanding
 objections)
 2) maintain side by side types, one representing binary, one text.
 unfortunately, i think the text one must get the name 'json' due to
 unfortunate previous decision.
 3) merge the behaviors into a single type and get the best of both
 worlds (as suggested upthread).

 I think we need to take a *very* hard look at #3 before exploring #1
 or #2: Haven't thought it through yet but it may be possible to handle
 this in such a way that will be mostly transparent to the end user and
 may have other benefits such as a faster path for serialization.

 If it’s possible to preserve order and still get the advantages of binary
 representation --- which are substantial (see
 http://theory.so/pg/2013/10/23/testing-nested-hstore/ and
 http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of
 examples) --- without undue maintenance overhead, then great.

 I am completely opposed to duplicate key preservation in JSON, though. It
 has caused us a fair number of headaches at $work.

 Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:

 *) 'json' type internally has a binary as well as a text representation.
 The text representation is basically the current type behavior
 (deduplicated, unordered).  The binary representation is the hstore-ish
 variant.  The text mode is discarded when it's deemed no longer
 needed, and, once gone, can never be rebuilt as it
 was.

 *) only the binary internal representation ever gets stored to disk
 (or anything else).

 *) the text mode is preferred for output if it is there.  otherwise, a
 deduplicated, reordered text representation is generated

 *) When literal text is cast to json, the binary structure is built
 up and kept alongside the text mode.  So, if you went: select '{"a":
 1, "a": 2}'::json, you'd get the same thing back (this is how
 it works now); but if you went: insert into foo select '{"a": 1,
 "a": 2}'::json returning *, you'd get {"a": 2} back essentially
 (although technically that would be a kind of race).

 *) When the json is stored to table, the text representation gets
 immediately discarded on the basis that it's no longer the true
 representation of the data.

 *) Ditto when making any equality operation (not as sure on this point).

 *) Ditto when doing any operation that mutates the structure in any
 way. the text representation is immutable except during serialization
 and if it gets invalidated it gets destroyed.

 *) New API function: json_simplify(); or some such.  It reorders and
 dedups from user's point of view (but really just kills off the text
 representation)

 *) once the text mode is gone, you get basically the proposed 'hstore'
 behavior.

 *) serialization functions are generally used in contexts that do not
 store anything but get output as query data.  They create *only* the
 text mode.  However, if the resultant json is stored anywhere, the
 text mode is destroyed and replaced with binary variant.  This is both
 a concession to the current behavior and an optimization of
 'serialization-in-query' for which I think the binary mode is pessimal
 performance wise.  so, xxx_to_json serialization functions work
 exactly as they do now which fixes my problem essentially.

 *) if you are unhappy with duplicates in the above, just get used to
 calling json_simplify() on the serialized json (or deal with it on the
 client side).

 This is all pretty glossy, but maybe there is a way forward...



 It's making my head hurt, to be honest, and it sounds like a recipe for
 years and years of inconsistencies and bugs.

 I don't want to have two types, but I think I'd probably rather have two
 clean types than this. I can't imagine it being remotely acceptable to have
 behaviour depend on whether or not something was ever stored, which is what
 this looks like.

Well, maybe so.  My main gripe with the 'two types' solutions is that:
1) current type is already in core (that is, not an extension). In
hindsight, I think this was a huge mistake.
2) current type has grabbed the 'json' type name and the 'json_xxx' API.
3) current type is getting used all over the place

'Two types' means that (AIUI) you can't mess around with the existing
API too much. And the new type (due out in 2016?) will be something of
a second citizen.  The ramifications of dealing with the bifurcation
is what makes *my* head hurt.  Every day the json stuff is getting
more and more widely adopted.  9.4 isn't going to drop until 2014 best
case and it won't be widely deployed in the enterprise until 2015 and
beyond.  So you're going to have a huge code base 

Re: [HACKERS] additional json functionality

2013-11-15 Thread Josh Berkus
On 11/15/2013 12:25 PM, Merlin Moncure wrote:
 Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:
 
  *) 'json' type internally has a binary as well as a text representation.
 The text representation is basically the current type behavior

snip long detailed explanation of behavior-dependent type

That's not at all workable.  Users would be completely unable to predict
or understand the JSON type and how it acts.  That's not just violating
POLS; that's bashing POLS' head in with a shovel.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-15 Thread Merlin Moncure
On Fri, Nov 15, 2013 at 2:54 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/15/2013 12:25 PM, Merlin Moncure wrote:
 Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:

 *) 'json' type internally has a binary as well as a text representation.
 The text representation is basically the current type behavior

 snip long detailed explanation of behavior-dependent type

 That's not at all workable.  Users would be completely unable to predict
 or understand the JSON type and how it acts.  That's not just violating
 POLS; that's bashing POLS' head in with a shovel.

All right: make a new type then, and leave the current one alone please.

merlin




Re: [HACKERS] additional json functionality

2013-11-15 Thread Andres Freund
On 2013-11-15 12:54:53 -0800, Josh Berkus wrote:
 On 11/15/2013 12:25 PM, Merlin Moncure wrote:
  Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:
  
  *) 'json' type internally has a binary as well as a text representation.
  The text representation is basically the current type behavior
 
 snip long detailed explanation of behavior-dependent type
 
 That's not at all workable.  Users would be completely unable to predict
 or understand the JSON type and how it acts.  That's not just violating
 POLS; that's bashing POLS' head in with a shovel.

It's also not currently possible to implement such a behaviour inside a
type's functions. You'd need core code cooperation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] additional json functionality

2013-11-15 Thread David E. Wheeler
On Nov 15, 2013, at 12:37 PM, Andrew Dunstan and...@dunslane.net wrote:

 It's making my head hurt, to be honest, and it sounds like a recipe for years 
 and years of inconsistencies and bugs.
 
 I don't want to have two types, but I think I'd probably rather have two 
 clean types than this. I can't imagine it being remotely acceptable to have 
 behaviour depend on whether or not something was ever stored, which is what 
 this looks like.

I dislike having two types (no, three -- there is hstore, too!). But if there 
is consensus for it (and I am not at all convinced that there is at this 
point), I can live with it. Docs would have to be pretty explicit, though.

David





Re: [HACKERS] additional json functionality

2013-11-15 Thread Josh Berkus
On 11/15/2013 01:12 PM, David E. Wheeler wrote:
 On Nov 15, 2013, at 12:37 PM, Andrew Dunstan and...@dunslane.net wrote:
 
 It's making my head hurt, to be honest, and it sounds like a recipe for 
 years and years of inconsistencies and bugs.

 I don't want to have two types, but I think I'd probably rather have two 
 clean types than this. I can't imagine it being remotely acceptable to have 
 behaviour depend on whether or not something was ever stored, which is what 
 this looks like.
 
 I dislike having two types (no, three -- there is hstore, too!). But if 
 there is consensus for it (and I am not at all convinced that there is at 
 this point), I can live with it. Docs would have to be pretty explicit, 
 though.

I would be happy to do a survey on how common key ordering and/or
duplicate keys are in postgresql+json.  However, I'm not clear on what
set of survey responses would decide the question in either direction.
Even as a pool of one, Merlin's case is a pretty persuasive example ...
and, as he points out, there will be applications built around 9.3's
JSON which haven't even been written yet.

I believe this was a danger we recognized when we added the JSON type,
including the possibility that a future binary type might need to be a
separate type due to compatibility issues.  The only sad thing is the
naming; it would be better for the new type to carry the JSON name in
the future, but there's no way to make that work that I can think of.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Merlin Moncure-2 wrote
 I don't want to have two types, but I think I'd probably rather have two
 clean types than this. I can't imagine it being remotely acceptable to
 have
 behaviour depend on whether or not something was ever stored, which is
 what
 this looks like.
 
 Well, maybe so.  My main gripe with the 'two types' solutions is that:
 1) current type is already in core (that is, not an extension). In
 hindsight, I think this was a huge mistake.
 2) current type has grabbed the 'json' type name and the 'json_xxx' API.
 3) current type is getting used all over the place
 
 'Two types' means that (AIUI) you can't mess around with the existing
 API too much. And the new type (due out in 2016?) will be something of
 a second-class citizen.  The ramifications of dealing with the bifurcation
 are what make *my* head hurt.  Every day the json stuff is getting
 more and more widely adopted.  9.4 isn't going to drop until 2014 best
 case and it won't be widely deployed in the enterprise until 2015 and
 beyond.  So you're going to have a huge code base operating on the
 'legacy' json type.
 
 merlin

The current type can store the exact same data as what a hash-like type
could store.  It can also store stuff a hash-like type would not be able to
store.  From my reading, the main reason for adding the new hash-like type
would be to improve the performance characteristics of using said type. So:

1) if reasonable performance can be had with the current type, the new type
would be unnecessary
2) if #1 is not possible, then the new type trades off leniency in format for
performance improvements

One implication of #2 is that existing json that wants the improved
performance will need to undergo a full-table rewrite in order to be
converted.

Both textual output representations are identical, and function overloading
and the API should be able to be maintained substantially identically
between the two types.

David J







Re: [HACKERS] additional json functionality

2013-11-15 Thread k...@rice.edu
On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote:
 
 I believe this was a danger we recognized when we added the JSON type,
 including the possibility that a future binary type might need to be a
 separate type due to compatibility issues.  The only sad thing is the
 naming; it would be better for the new type to carry the JSON name in
 the future, but there's no way to make that work that I can think of.
 
 -- 
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com
 

What about a GUC for json version? Then you could choose, and they
could both be called json.

Regards,
Ken




Re: [HACKERS] additional json functionality

2013-11-15 Thread Tom Lane
k...@rice.edu k...@rice.edu writes:
 On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote:
 I believe this was a danger we recognized when we added the JSON type,
 including the possibility that a future binary type might need to be a
 separate type due to compatibility issues.  The only sad thing is the
 naming; it would be better for the new type to carry the JSON name in
 the future, but there's no way to make that work that I can think of.

 What about a GUC for json version? Then you could choose, and they
 could both be called json.

GUCs that change user-visible semantics have historically proven to be
much less good ideas than they seem at first glance.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-15 Thread Andrew Dunstan


On 11/15/2013 04:53 PM, Tom Lane wrote:

k...@rice.edu k...@rice.edu writes:

On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote:

I believe this was a danger we recognized when we added the JSON type,
including the possibility that a future binary type might need to be a
separate type due to compatibility issues.  The only sad thing is the
naming; it would be better for the new type to carry the JSON name in
the future, but there's no way to make that work that I can think of.

What about a GUC for json version? Then you could choose and they
could both be called json.

GUCs that change user-visible semantics have historically proven to be
much less good ideas than they seem at first glance.





Yeah, it would be a total foot gun here I think.

I've come to the conclusion that the only possible solution is to have a 
separate type. That's a bit sad, but there it is. The upside is that 
this will make the work Teodor has mentioned simpler. (Desperately 
making lemonade from lemons here.)



cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-15 Thread David E. Wheeler
On Nov 15, 2013, at 2:02 PM, Andrew Dunstan and...@dunslane.net wrote:

 Yeah, it would be a total foot gun here I think.
 
 I've come to the conclusion that the only possible solution is to have a 
 separate type. That's a bit sad, but there it is. The upside is that this 
 will make the work Teodor has mentioned simpler. (Desperately making lemonade 
 from lemons here.)

Fine. My bikeshedding: Call the new type jsonb. “B” for “binary.” Also, the 
old one is implicitly jsona. Get it?

David





Re: [HACKERS] additional json functionality

2013-11-15 Thread Hannu Krosing
On 11/15/2013 09:25 PM, Merlin Moncure wrote:
 On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com 
 wrote:
 On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Here are the options on the table:
 1) convert existing json type to binary flavor (notwithstanding objections)
 2) maintain side by side types, one representing binary, one text.
 unfortunately, i think the text one must get the name 'json' due to
 unfortunate previous decision.
 3) merge the behaviors into a single type and get the best of both
 worlds (as suggested upthread).

 I think we need to take a *very* hard look at #3 before exploring #1
 or #2: Haven't thought it through yet, but it may be possible to handle
 this in such a way that will be mostly transparent to the end user and
 may have other benefits such as a faster path for serialization.
 If it’s possible to preserve order and still get the advantages of binary 
 representation --- which are substantial (see 
 http://theory.so/pg/2013/10/23/testing-nested-hstore/ and 
 http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of 
 examples) --- without undue maintenance overhead, then great.

 I am completely opposed to duplicate key preservation in JSON, though. It 
 has caused us a fair number of headaches at $work.
Let's just change the current json-constructing functions' return type to
json_text, which is exactly like text with 2 extra properties:

1) it is syntax-checked for valid json (that is it can be cast to json)

and

2) if included in outer json as data, it is included directly and is not
quoted like text


With just these two it should be possible to have the following

a) Merlin and others can keep (ab)using json_text as this
wonderfully versatile format for feeding json parsers and
visualisers which accept duplicates and consider order significant

b) cast this to binary json object if de-duplication and fast access to
internals is needed

I do not think we need anything else for this

As far as I understand merlin is mostly ok with stored json being
normalised and the problem is just with constructing extended
json (a.k.a. processing instructions) to be used as source for
specialised parsers and renderers.
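To make the proposal above concrete, here is a rough client-side sketch of the two properties. The JsonText name and the embed helper are hypothetical, purely for illustration of the semantics, not a proposed implementation:

```python
import json

class JsonText(str):
    """Property 1: syntax-checked on construction (so castable to json)."""
    def __new__(cls, text):
        json.loads(text)              # raises ValueError on invalid JSON
        return super().__new__(cls, text)

def embed(value):
    """Property 2: JsonText is spliced into outer JSON verbatim,
    not quoted the way a plain string would be."""
    if isinstance(value, JsonText):
        return str(value)
    if isinstance(value, dict):
        items = (json.dumps(k) + ": " + embed(v) for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    return json.dumps(value)

# Duplicate keys and ordering survive, because the text is never re-parsed:
doc = JsonText('{"b": 2, "a": 1, "a": 3}')
outer = embed({"payload": doc, "note": "plain text is still quoted"})
```

Casting `doc` to a binary json object (property b above) would then be the single point where de-duplication happens, and only for users who ask for it.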

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-15 Thread Merlin Moncure
 On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I think we need to take a *very* hard look at #3 before exploring #1
 or #2: Haven't thought it through yet, but it may be possible to handle
 this in such a way that will be mostly transparent to the end user and
 may have other benefits such as a faster path for serialization.
 If it’s possible to preserve order and still get the advantages of binary 
 representation --- which are substantial (see 
 http://theory.so/pg/2013/10/23/testing-nested-hstore/ and 
 http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of 
 examples) --- without undue maintenance overhead, then great.

 I am completely opposed to duplicate key preservation in JSON, though. It 
 has caused us a fair number of headaches at $work.
 Let's just change the current json-constructing functions' return type to
 json_text, which is exactly like text with 2 extra properties:

 1) it is syntax-checked for valid json (that is it can be cast to json)

 and

 2) if included in outer json as data, it is included directly and is not
 quoted like text


 With just these two it should be possible to have the following

 a) Merlin and others can keep (ab)using json_text as this
 wonderfully versatile format for feeding json parsers and
 visualisers which accept duplicates and consider order significant

 b) cast this to binary json object if de-duplication and fast access to
 internals is needed

 I do not think we need anything else for this

I think you may be on to something here.  This might also be a way to
opt in to fast(er) serialization (upthread it was noted this is
unimportant; I'm skeptical).  I deeply feel that two types is not the
right path, but I'm pretty sure that this can be finessed.

 As far as I understand merlin is mostly ok with stored json being
 normalised and the problem is just with constructing extended
 json (a.k.a. processing instructions) to be used as source for
 specialised parsers and renderers.

yes, this is correct.

merlin




Re: [HACKERS] additional json functionality

2013-11-15 Thread Josh Berkus
On 11/15/2013 02:59 PM, Merlin Moncure wrote:
  On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I think you may be on to something here.  This might also be a way to
 opt in to fast(er) serialization (upthread it was noted this is
 unimportant; I'm skeptical).  I deeply feel that two types is not the
 right path, but I'm pretty sure that this can be finessed.
 
 As far as I understand merlin is mostly ok with stored json being
 normalised and the problem is just with constructing extended
 json (a.k.a. processing instructions) to be used as source for
 specialised parsers and renderers.

Thing is, I'm not particularly concerned about *Merlin's* specific use
case, which there are ways around. What I am concerned about is that we
may have users who have years of data stored in JSON text fields which
won't survive an upgrade to binary JSON, because we will stop allowing
certain things (ordering, duplicate keys) which are currently allowed in
those columns.  At the very least, if we're going to have that kind of
backwards compatibility break, we'll want to call the new version 10.0.

That's why renaming the old JSON type to json_text won't work; it'll be a
hardened roadblock to upgrading.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Josh Berkus wrote
 On 11/15/2013 02:59 PM, Merlin Moncure wrote:
  On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing hannu@ wrote:
 I think you may be on to something here.  This might also be a way to
 opt in to fast(er) serialization (upthread it was noted this is
 unimportant; I'm skeptical).  I deeply feel that two types is not the
 right path, but I'm pretty sure that this can be finessed.
 
 As far as I understand merlin is mostly ok with stored json being
 normalised and the problem is just with constructing extended
 json (a.k.a. processing instructions) to be used as source for
 specialised parsers and renderers.
 
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibility break, we'll want to call the new version 10.0.
 
 That's why renaming the old JSON type to json_text won't work; it'll be a
 hardened roadblock to upgrading.

Agreed.  I can't imagine a use-case that would warrant breaking the current
behavior of json.  Either we live with just one, text-oriented, json type
and finesse whatever performance gains we can without breaking
compatibility; or we introduce additional types (I personally like adding two
instead of one, but just adding the binary one would be ok) which - barring
an overwhelming desire by -core to group-self-flagellate - means giving the
new type an as yet unused name.

From a marketing perspective having 3 types with the following properties is
an easy message to sell:

1) json - liberal interpretation w/ validation only; stored as text; output
as-is
2) json_text - strict interpretation w/ validation only; stored as text;
output as-is
3) json_binary - strict interpretation w/ validation & parsing; stored as
binary; output normalized

This way json seems less like a mistake and more like an intentional decision
to introduce a liberal type that meets data exchange needs in the short term
and now, later, a structured data storage mechanism similar to hstore.

Even if you have json_binary I can imagine that some people would want to be
able to store the original strict json as-is.  Sure, they can use text, but
this way intent is made clear and validation is attached directly to the
type as opposed to having to be done separately.  The use-cases described
for needing a liberal json prove this out.  That said, json would be an
acceptable replacement for json_text in many cases, and separate validation
for strict json prior to storing into json isn't that heinous.

David J.









Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Looking at this a different way: could we just implement BSON and leave json
alone?

http://bsonspec.org/

David J.









Re: [HACKERS] additional json functionality

2013-11-15 Thread Josh Berkus
On 11/15/2013 04:00 PM, David Johnston wrote:
 Looking at this a different way: could we just implement BSON and leave json
 alone?
 
 http://bsonspec.org/

In short?  No.

For one thing, our storage format is different from theirs (better,
frankly), and as a result is not compliant with their standard.

That's a reason why we won't use the name BSON, either, since it's a
trademark of 10gen.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-15 Thread Andrew Dunstan


On 11/15/2013 07:32 PM, Josh Berkus wrote:

On 11/15/2013 04:00 PM, David Johnston wrote:

Looking at this a different way: could we just implement BSON and leave json
alone?

http://bsonspec.org/

In short?  No.

For one thing, our storage format is different from theirs (better,
frankly), and as a result is not compliant with their standard.

That's a reason why we won't use the name BSON, either, since it's a
trademark of 10gen.




What is more, it has restrictions which we do not wish to have. See for 
example its treatment of numerics.


cheers

andrew




Re: [HACKERS] additional json functionality

2013-11-14 Thread Garick Hamlin
On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:
 On Nov 13, 2013, at 4:45 PM, Andrew Dunstan and...@dunslane.net wrote:
 
  It should be a pretty-printing function option, perhaps, but not core to 
  the type itself, IMO.
  
  I don't in the least understand how it could be a pretty printing option.
  If we move to a binary rep using the hstore stuff order will be destroyed
  and not stored anywhere, and duplicate keys will be lost. Once that's done,
  how would a pretty print option restore the lost info?
 
 I meant ordering the keys, usually in lexicographic order. I agree that 
 preserving order is untenable.

There is a canonical form.

http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful.  That's a bit trickier than sorting the
keys, and I don't know that there is an accepted canonical form for json yet
that can represent all json documents.  (The canonical form is not the pretty
form, but I think the key ordering should be the same.)

It might be nice to have a more general canonical form, if one emerges from
somewhere, that could encode any json, since without something like this,
hashing can only be well specified for the 'sensible subset of json' used in
security protocols.
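For that 'sensible subset' (unique keys, no exotic numbers), a rough approximation of canonicalization is just recursive key sorting plus minimal whitespace. This is an illustrative assumption, not the draft's exact rules:

```python
import json

doc = {"b": [1, 2], "a": {"y": True, "x": None}}

# sort_keys orders object keys recursively; the compact separators
# drop all insignificant whitespace, giving one stable byte string
# per logical document -- suitable as input to a hash.
canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
assert canonical == '{"a":{"x":null,"y":true},"b":[1,2]}'
```

The hard parts the draft has to pin down are exactly what this sketch ignores: number formatting, string escaping, and documents with duplicate keys.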

Garick




Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 11/14/2013 12:09 AM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should
 require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement but it is the caller's reponsibility to ensure that keys
 are not repeated..
 This is pretty standard in the programming languages I know of which use
 JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to repeated.
 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.
 Common practice?  The internet is littered with complaints about
 documents being spontaneously re-ordered and/or de-duplicated in
 various stacks.  Other stacks provide mechanisms for explicit key
 order handling (see here: http://docs.python.org/2/library/json.html).
 Why do you think they did that?

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.
 I guess we should not replace current JSON type with hstore based
 one, but add something json-like based on nested hstore instead.

 Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore).
It really comes down to this: changing the serialization behaviors
that have been in production for 2 releases (three if you count the
extension) is bad enough, but making impossible some legal json
constructions which are currently possible is an unacceptable
compatibility break.  It's going to break applications I've currently
put into production with no clear workaround.  This is quite frankly
not ok and I'm calling foul.  The RFC may claim that these
constructions are dubious but that's irrelevant.  It's up to the
parser to decide that and when serializing you are not in control of
the parser.
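The explicit key-order handling in Python's json library that Merlin points to upthread is object_pairs_hook: the parser hands the application its raw key/value pairs in document order, duplicates included, and the application decides the policy. A quick sketch:

```python
import json

raw = '{"b": 1, "a": 2, "a": 3}'

# Default decoding: dict semantics, the duplicate silently collapses
# and the last value wins.
assert json.loads(raw) == {"b": 1, "a": 3}

# object_pairs_hook receives every pair in document order, duplicates
# included, so order-sensitive or duplicate-tolerant callers can cope.
pairs = json.loads(raw, object_pairs_hook=lambda p: p)
assert pairs == [("b", 1), ("a", 2), ("a", 3)]
```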

Had the json type been stuffed into an extension, there would be a
clearer path to get to where you want to go since we could have walled
off the old functionality and introduced side by side API calls.  As
things stand now, I don't see a clean path to do that.

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.

It's not just a matter of performance.  It's the basic conflict of
JSON as document format vs. JSON as data storage.  For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus j...@agliodbs.com wrote:
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing.  What *exactly* is impossible, and why do
you think it justifies breaking compatibility with current
applications?   The way forward seems pretty straightforward: given
that hstore is getting nesting power and is moving closer to the json
way of doing things it is essentially 'binary mode json'.  I'm ok with
de-duplication and key ordering when moving into that structure since
it's opt in and doesn't break the serialization behaviors we have
today.  If you want to go further and unify the types then you have to
go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear-cut either.
The current json type is probably faster at bulk serialization
precisely because you *dont* need to deduplicate and reorder keys: the
serialization operates without context.  It will certainly be much
better for in place manipulations but it's not nearly as simple as you
are making it out to be.

merlin



Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 12:20 AM, Josh Berkus wrote:
 Merlin,


 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.

 XML went through the same thing, which is part of how we got a bunch of
 incompatible dialects of XML.

 Now, your use case does show us that there's a case to be made for still
 having text JSON even after we have binary JSON. 
text-json could easily be a domain (text + check that it is convertible
to json)

maybe it is even possible to teach pg_upgrade to do this automatically
  There's a strong simplicity argument against that, though ...
I think it confuses most people, similar to how storing 1+1 as
processing instructions instead of just evaluating it and storing 2 :)

OTOH we are in this mess now and have to solve the backwards
compatibility somehow.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 01:42 AM, Andrew Dunstan wrote:

 On 11/13/2013 07:01 PM, Hannu Krosing wrote:

 I guess we should not replace current JSON type with hstore based
 one, but add something json-like based on nested hstore instead.


 Well, that's two voices for that course of action.
I am not really for it (I would have liked to have a
json_object/json_structure instead of json_string as the meaning of json),
but I think there is quite a strong argument for not breaking backwards
compatibility.

 Interesting that I don't think I heard a single voice for this either
 at pgCon or pgOpen,
I attended neither, but I did voice my preference for _not_ having the
json-as-source-code type on the mailing lists during previous json
discussions.

 although I spent large amounts of time at both talking to people about
 Json, so I'd be interested to hear more voices.

 It would actually simplify things in a way if we do that - we've been
 working on
 a way of doing this that wouldn't upset pg_upgrade. This would render
 that effort unnecessary.
I wonder how hard it would be to rename the current json type to
json_source and have a new nested-hstore-based json?


 However it will complicate things for users who will have to choose
 between the data types,
 and function authors who will possibly have to write versions of
 functions to work with both types.
You mostly want the functions for the json-object type.

This is supported by the fact that current functions on json-source
treat it as json-object (for example, key lookup gives you the value
of the latest key and not a list of all matching key values).

You may want some new functions on json-source
(maybe json_source_enumerate_key_values(json, key))
but the current ones are really for json-object.


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).

yeah. hm. that's a good point.

Maybe there's a middle ground here: I bet the compatibility issues
would be minimized to an acceptable level if the 'xxx_to_json'
functions maintained their current behaviors; they would construct the
json type in a special internal mode that would behave like the
current type does.   In other words, the marshalling into binary
structure could happen when:

*) stored to a column in a table
*) when any modifying routine is called, updating a key, value, etc
*) manually via a function

but not at cast time.  This preserves compatibility for the important
points and allows serialization of structures that are difficult with
the binary mode variant.
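Roughly, the idea is a value that stays in its source form until something forces a binary representation. A toy client-side sketch of the semantics (LazyJson is a hypothetical name, not a proposed Postgres API):

```python
import json

class LazyJson:
    """Keep the original text until a structural operation forces a parse;
    only then do de-duplication and normalization occur."""
    def __init__(self, text):
        self._text = text          # verbatim source, order/dups intact
        self._tree = None          # parsed form, built on demand

    def __str__(self):             # casting back out returns the source
        return self._text

    def set(self, key, value):     # any mutation marshals to a dict first
        if self._tree is None:
            self._tree = json.loads(self._text)   # dedup happens here
        self._tree[key] = value
        self._text = json.dumps(self._tree)

doc = LazyJson('{"a": 1, "a": 2}')
assert str(doc) == '{"a": 1, "a": 2}'   # untouched: source preserved
doc.set("b", 3)
assert str(doc) == '{"a": 2, "b": 3}'   # mutated: normalized form
```

The xxx_to_json serialization path and plain casts would stay on the text branch, so existing applications keep their output byte-for-byte.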

merlin




Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 04:07 PM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 I guess we should not replace current JSON type with hstore based
 one, but add something json-like based on nested hstore instead.

 Maybe call it jsdoc or jdoc or jsobj or somesuch.
 This is exactly what needs to be done, full stop (how about: hstore).
hstore has completely different i/o formats and thus has similar
backwards compatibility problems.
 It really comes down to this: changing the serialization behaviors
It is really not about serialisation behaviours, as there is nothing you
can sensibly serialise that has repeated keys.

I agree that you can generate such JSON, which would be valid
input to any json parser, but there is no JavaScript Object which really
serializes to such JSON.
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  
we should have disallowed this from the beginning and should
have encouraged using text as storage for JavaScript source code.
 It's going to break applications I've currently
 put into production with no clear workaround.  
we could rename the old json type during pg_upgrade, but this
would likely break at least implicit casts in functions.
 This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.
You could choose a sane serializer ;)

The main argument here is still whether json is source
code or a serialization result for JavaScript Object (Notation).

 Had the json type been stuffed into an extension, there would be a
 clearer path to get to where you want to go since we could have walled
 off the old functionality and introduced side by side API calls.  As
 things stand now, I don't see a clean path to do that.

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  
But one could argue that this is not json either but rather some
json-like input format for special parsers.

Current recommendation is to use text for these kinds of things.

 The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.
All these would be possible if we redefined json as another notation
for XML instead of string representation of JavaScript Object :)

And things could really be in-place only inside pl/language functions,
as PostgreSQL is still MVCC.

What should be faster is access to nested values, though I suspect
that it is not significantly faster unless you have very large json
documents.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 05:06 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).
 yeah. hm. that's a good point.

 Maybe there's a middle ground here: I bet the compatibility issues
 would be minimized to an acceptable level if the 'xxx_to_json'
 functions maintained their current behaviors; they would construct the
 json type in a special internal mode that would behave like the
 current type does.   
Do you have any xxx_to_json usage which can generate a field with
multiple equal keys ?

Or is it just about preserving order ?
 In other words, the marshalling into binary
 structure could happen when:

 *) stored to a column in a table
 *) when any modifying routine is called, updating a key, value, etc
 *) manually via a function

 but not at cast time.  This preserves compatibility for the important
 points and allows serialization of structures that are difficult with
 the binary mode variant.
Seems like this would not play nice with how PostgreSQL type system work
in general, but could be a way forward if you say that you really do not
need
to store the order-preserving, multi-valued json.

But in this case it could also be possible for these functions to just
generate json-format text, and with proper casts this would act exactly
as you describe above, no?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 10:54 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 11/14/2013 05:06 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).
 yeah. hm. that's a good point.

 Maybe there's a middle ground here: I bet the compatibility issues
 would be minimized to an acceptable level if the 'xxx_to_json'
 functions maintained their current behaviors; they would construct the
 json type in a special internal mode that would behave like the
 current type does.

 Do you have any xxx_to_json usage which can generate a field with
 multiple equal keys ?

Absolutely -- that's what I've been saying all along.  For example:

IIRC the end consumer is jqgrid, although the structure may be shaped
that way to satisfy some intermediate transformations, perhaps in GWT
or in the browser itself.  The point is I didn't define the structure
(I think it sucks too); it was given to me to create and I did.  The
object's dynamic keys and values are moved into the json structure by
passing two parallel arrays into a userland function similar to what
Andrew is proposing with the json_build functionality.

{
    "classDisplayName": null,
    "rows": [
        {
            "PropertyName": "xxx",
            "Row": 1,
            "Group": "Executive Dashboard",
            "MetricName": "Occupancy",
            "2012": "95.4%",
            "Q2": "96.5%",
            "Q3": "96.3%",
            "Q4": "94.8%",
            "2013": "95.1%",
            "Q2": "94.1%",
            "Q3": "96.0%",
            "Q4": "96.1%"
        },
        {
            "PropertyName": "xxx",
            "Row": 2,
            "Group": "Executive Dashboard",
            "MetricName": "Occupancy",
            "2012": "95.9%",
            "Q2": "97.3%",
            "Q3": "95.7%",
            "Q4": "95.2%",
            "2013": "93.9%",
            "Q2": "93.4%",
            "Q3": "95.3%",
            "Q4": "95.1%"
        }
    ]
}
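The kind of userland constructor described here (parallel key and value arrays in, order- and duplicate-preserving JSON text out) can be sketched outside the database; the function name and sample data below are illustrative, not the actual production code from the thread:

```python
import json

def obj_from_pairs(keys, values):
    """Serialize parallel key/value lists to JSON object text,
    keeping key order and duplicates -- something a dict-based
    serializer (json.dumps on a dict) cannot represent."""
    members = ", ".join(
        f"{json.dumps(k)}: {json.dumps(v)}" for k, v in zip(keys, values)
    )
    return "{" + members + "}"

# "Q2" appears twice, as in the jqgrid-style document above.
print(obj_from_pairs(["MetricName", "Q2", "Q2"],
                     ["Occupancy", "96.5%", "94.1%"]))
# {"MetricName": "Occupancy", "Q2": "96.5%", "Q2": "94.1%"}
```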

 but not at cast time.  This preserves compatibility for the important
 points and allows serialization of structures that are difficult with
 the binary mode variant.

 Seems like this would not play nice with how PostgreSQL type system work
 in general, but could be a way forward if you say that you really do not
 need
 to store the order-preserving, multi-valued json.

Yes, exactly. I'm OK with simplifying the structure for storage
purposes because in that context postgres is the parser and gets to
decide what the precise behaviors are.  Simplifying the stored
structures during upgrade is an OK concession to make, I think.  It is
not safe to assume the structure should be simplified when
serializing.

 But in this case it could also be possible for these functions to just
 generate json-format text, and with proper casts this would act exactly
 as you describe above, no?

I think so, if I'm following you correctly.  Maybe you get the best of
both worlds and (mostly) maintain compatibility by deferring the
decomposition into the binary structure in certain contexts.  I'd even
throw in the equality operator (which, thankfully, we haven't defined
yet) as a place where decomposition could happen.  Pretty much any
scenario that isn't involved in raw assembly and output.

merlin




Re: [HACKERS] additional json functionality

2013-11-14 Thread David E. Wheeler
On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.

The current JSON type preserves key order and duplicates. But is it documented 
that this is a feature, or something to be guaranteed? Just because people have 
come to depend on something doesn’t mean we can’t change it. It’s one thing if 
we said this was a feature you could depend on, but AFAIK we haven’t. And 
frankly, the dupes have caused problems for some of my colleagues at work. To 
me, it’s a bug (or, at best, a mis-feature) that causes more issues than it 
prevents.

In my experience, no JSON parser guarantees key order or duplication. You can’t 
have dupes and there is no ordering in a Perl hash, Objective-C NSDictionary, 
or JavaScript object. There is of course order and there can be dupes in a JSON 
string, but not in the objects built from it. If you go in and out of a parser, 
dupes are eliminated and key order is not preserved. I expect the same from 
JSON storage.
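The round-trip behavior David describes is easy to check with, say, Python's stock parser (an illustration, not something from the thread): going in and out of the parser silently drops the duplicate.

```python
import json

# JSON text with a deliberate duplicate key and ordering.
text = '{"b": 1, "a": 2, "a": 3}'

obj = json.loads(text)   # default decoding: last duplicate wins
print(obj)               # {'b': 1, 'a': 3}

# Round-tripping through the parser cannot reproduce the original text:
print(json.dumps(obj))   # {"b": 1, "a": 3}
```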

With no guarantees of preserved ordering or duplication, and with no formal 
expectation of such by JSON parsers written for various programming languages, 
I think there is little to be lost by removing those aspects of the JSON type. 
For those (hopefully rare) situations where such expectations exist, the JSON 
should be stored as text, as Hannu suggests.

My $0.02.

Best,

David





Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.

 The current JSON type preserves key order and duplicates. But is it 
 documented that this is a feature, or something to be guaranteed?

It doesn't, but the row_to_json function has a very clear mechanism of
action.  And, 'not being documented' is not the standard for latitude
to make arbitrary changes to existing function behaviors.

 In my experience, no JSON parser guarantees key order or duplication.

I found one in about two seconds.  http://docs.python.org/2/library/json.html

object_pairs_hook, if specified, will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
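For what it's worth, the hook behaves exactly as quoted; a minimal check:

```python
import json

text = '{"a": 1, "a": 2, "b": 3}'

# Default decoding collapses the duplicate key (last value wins):
print(json.loads(text))                    # {'a': 2, 'b': 3}

# object_pairs_hook sees every pair, in document order, so a decoder
# built on it can preserve both duplicates and ordering:
pairs = json.loads(text, object_pairs_hook=lambda p: p)
print(pairs)                               # [('a', 1), ('a', 2), ('b', 3)]
```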

That makes the rest of your argument moot.  Plus, I quite clearly am
dealing with parsers that do.

merlin




Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
 da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.
 The current JSON type preserves key order and duplicates. But is it 
 documented that this is a feature, or something to be guaranteed?
 It doesn't, but the row_to_json function has a very clear mechanism of
 action.  And, 'not being documented' is not the standard for latitude
 to make arbitrary changes to existing function behaviors.
The whole hash*() function family was changed based on an undocumented
premise, so we do have a precedent.

 In my experience, no JSON parser guarantees key order or duplication.
 I found one in about two seconds.  http://docs.python.org/2/library/json.html

 object_pairs_hook, if specified, will be called with the result of
 every JSON object decoded with an ordered list of pairs. The return
 value of object_pairs_hook will be used instead of the dict. This
 feature can be used to implement custom decoders that rely on the
 order that the key and value pairs are decoded (for example,
 collections.OrderedDict() will remember the order of insertion). If
 object_hook is also defined, the object_pairs_hook takes priority.

 That makes the rest of your argument moot.  Plus, I quite clearly am
 dealing with parsers that do.
I am sure you could also devise a json encoding scheme
where white space is significant ;)

The question is, how much of it should json *type* support.

As discussed in the other thread, most of your requirements
would be met by having json/row/rowset-to-text serializer
functions which output json-formatted text.

Then if you actually want to save this as an easy-to-manipulate
json document, you can save this text to a field of type
json, which does de-duplication and loses order.

So my suggestion is to upgrade the existing json data type to
text - or maybe json_text with a format check - when upgrading
to 9.4, to change the current functions which output json to
output text, and to have a new json type which stores proper
JavaScript-Object-like structured data.

I would like to go a step further and have it automatically support
not only the json data types as data but all postgresql data types
by including type oid in the binary encoding, but this is probably not
something for json but rather for a new pgdoc data type in 9.5

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread David Johnston
Hannu Krosing wrote:
 On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler david@... wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmoncure@... wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  

The current json format is a minimally conforming (i.e., does not enforce
the "should not contain duplicates" suggestion) structured json validating
type that stores its input as-is once validated.  Its presence is probably
going to cause difficulties with the function API for reasons already
mentioned, but its place in the core type library is already firmly
established.  Andrew's API additions seem like good things to have for this
type.  I haven't seen any comments on this, but do these functions
facilitate creating json that can have duplicates and that maintains
order?  Even if we accept input to json with these limitations we are not
obligated to make our own json output minimally conforming - though we
should at least maintain such if it is already in place.


 So my suggestion is to upgrade existing json data type to
 text - or maybe json_text with format check - when upgrading
 to 9.4, to change current function which output json  to
 output text and have new json type which stores proper
 JavaScript Object - like structured data.

Technically a down-grade but anyway...

How does this work with a pg_dump/pg_restore upgrade?


If we want to have maximally conforming json type(s) we can still create
them.  I'd say we'd still want two versions, similar in a way to how we have
bytea and text even though any text can technically be stored like
bytea.  The constructor API for both would want to be identical, with the
only real difference being that text->json_source would be layout-preserving
(i.e., validation only) while text->json_binary would be a true parsing
conversion.  Likewise json_source->text would output the same input, while
json_binary->text would output the canonical form (pretty-printing and such
would need to be initiated via functions).

If things are going to be a little more complex anyway why not just go and
toss in the kitchen sink too?  This way we provide maximal flexibility. 
From a development perspective some features (indexes, equality, in-place
updates and related modification API) may only make sense on a subset of the
available types but trade-offs are a fact of life.

David J.








Re: [HACKERS] additional json functionality

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

 If there's agreement on taking these, I will prepare patches and submit
 them by the 15th.

 With JSON enhancement, my only concern is that there's work ongoing to
 integrate the v2 development version of hstore with json, providing
 typed hstore and an efficient binary storage format for json.

 It might be worth seeing how that work is going and what functionality
 needs to be added to it, rather than enhancing the existing json support
 that may soon change dramatically.

I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for most cases where you want to store nested data (as
opposed to (de-)serialize).  Andrew's patches just fill out a couple
of missing cases that are handled in the existing API.   Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_  prefix was abandoned for build_json_object
and build_json_array.

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated".  I think the caller should have no such
responsibility.  Keys should be able to be repeated.  Also, I'm not sure
how the {k,v,k,v,...} convention serialized into a string is very
useful in general practice.  I greatly prefer the aggregation and the
variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

merlin




Re: [HACKERS] additional json functionality

2013-11-13 Thread Andrew Dunstan


On 11/13/2013 09:45 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote:

On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to
integrate the v2 development version of hstore with json, providing
typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality
needs to be added to it, rather than enhancing the existing json support
that may soon change dramatically.


I'm going to be fairly upset if I'm told I have to wait for the new 
format work, and then I'm later told it's too late to bring this into 9.4.


I think these are really orthogonal issues. Adding a new serialization 
format (which I have been discussing with Oleg and Teodor, and which I 
hope to help in bringing to JSON) will make some things lots faster than 
they now are, and might make some things easier or possible where now 
they are hard or impossible, but it won't remove any functionality 
requirement.


In particular, json_build, which lets you build up arbitrarily complex 
and irregular json in a way that's just not possible without using a PL 
right now, is quite an important requirement. I've given talks about it 
and JSON users have been quite excited by the possibilities it opens up.


The patch for json_to_record is quite small (two functions), and it has 
the advantage that unlike the json_populate_record functions you don't 
need to have or create a named type to use it. I think that makes it 
worth having in itself.



I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for most cases where you want to store nested data (as
opposed to (de-)serialize).


I have no idea what this means.

The plan with the work that Oleg and Teodor are doing is to provide a 
set of common code that can be used by either a binary json 
representation (which will be able to be distinguished from a text 
representation, so there would be no pg_upgrade problems) or nested 
hstore. In effect, nested hstore and json would have pretty much 
identical capabilities, so using one ovber another should be largely a 
matter of preference than a forced choice. Frankly, I believe the 
audience for JSON is vastly larger, and I expect it to be the treeish 
data format of choice for almost all users.




Andrew's patches just fill out a couple
of missing cases that are handled in the existing API.   Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_  prefix was abandoned for build_json_object
and build_json_array.



I'm quite happy to change it.



Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format,



What? There is no new serialization format. This is a way to generate a
json object in the existing format from a one- or two-dimensional array
of text, cf. the existing function hstore(text[]) => hstore.



and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated".  I think the caller should have no such
responsibility.  Keys should be able to be repeated.



They can be repeated, as they can in the current json text format.
However, the function makes no attempt to deal with repeated keys. If a
key is repeated in the input it will be repeated in the output. In this
respect it differs from the hstore function.


Note too, that one effect of moving to a non-text representation of json
will be that duplicated keys will be resolved (last value will win). But
that's a much wider issue than this function.




Also, I'm not sure
how the {k,v,k,v,...} convention serialized into a string is very
useful in general practice.  I greatly prefer the aggregation and the
variadic methods in json_build.



The extension was built before json_build. But it met a requirement that 
existed at the time. It probably wouldn't be a tragedy to leave it out, 
but there is probably a place for it just as there is for the hstore 
function.




Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.


The third point seems to conflict with the first. I'd only consider that 
if we *do* add the one-array version of json_object.


cheers

andrew






Re: [HACKERS] additional json functionality

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan and...@dunslane.net wrote:
 I'm not so sure we should require hstore to do things like build
 arbitrary json objects even though I agree that hstore will probably
 displace json for most cases where you want to store nested data (as
 opposed to (de-)serialize).

 I have no idea what this means.

What I'm saying there is I agree: what's going with hstore is not
relevant here.  json features currently provide standalone
serialization and deserialization minus a couple of edge cases that
you are fixing up here.  Hstore will emerge as a json manipulation
engine.

Aside: I thought we blew it (as you know) by not unifying the hstore
and json APIs in the 9.2 cycle, and now with the emerging json stuff I
really think so... it's way too late to do anything about it now even if
there was consensus on that point.

 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format,

 What? there is no new serialization format. This is a way to generate a json
 object in the existing format from a one or two dimensional array of text.
 c.f. |existing function hstore(text[]) = hstore|

Right --  I thought it took text, not text[] -- withdrawn.  I consider
a variant taking (text[], text[]) to be generally more practical than
the one argument version (either 1d or 2d variant).  Dealing with 2d
arrays is a headache unless you really know what you're doing.

 and I don't agree with the
 statement "but it is the caller's responsibility to ensure that keys
 are not repeated".  I think the caller should have no such
 responsibility.  Keys should be able to be repeated.

 They can be repeated, as they can in the current json text format. However,
 the function makes no attempt to deal with repeated keys. If a key is
 repeated in the input it will be repeated in the output. In this respect it
 differs from the hstore function.

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

 Note too, that one effect of moving to a non-text representation of json
 will be that duplicated keys will be resolved (last value will win). But
 that's a much wider issue than this function.

Right, exactly.  And I think this a pretty serious problem with 'non
text json' unless there is a relatively robust and fast process to
recompose the json properly for serialization purposes (but that's
mostly off topic for your proposed patch).

 Also, I'm not sure
 how the {k,v,k,v,...} convention serialized into a string is very
 useful in general practice.  I greatly prefer the aggregation and the
 variadic methods in json_build.

 The extension was built before json_build. But it met a requirement that
 existed at the time. It probably wouldn't be a tragedy to leave it out, but
 there is probably a place for it just as there is for the hstore function.

 Putting it all together, I'd consider:
 *) dropping json_object (although maybe there is a case I'm not thinking
 about)
 *) changing json_build function names to get the json prefix
 *) adding a json object constructor that takes two parallel arrays as
 arguments.

 The third point seems to conflict with the first. I'd only consider that if
 we *do* add the one-array version of json_object.

ok, agreed.  So now I'm just saying to unify function names under the json
prefix and maybe add a (text[], text[]) variant for the object builder, or
maybe just drop json_object completely.

merlin




Re: [HACKERS] additional json functionality

2013-11-13 Thread Andrew Dunstan


On 11/13/2013 11:37 AM, Merlin Moncure wrote:


Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.


I think you're probably going to lose any argument that says we should 
necessarily preserve key order (and possibly key duplication) in 
objects. The standard doesn't support such a contention, either:


   An object is an unordered collection of zero or more name/value
   pairs

   ...

   The names within an object SHOULD be unique.


Forcing us to preserve order and key duplication would be a pretty 
effective barrier to any performance improvements.


cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 11/13/2013 11:37 AM, Merlin Moncure wrote:

 Yes.  and I think this is one of the major advantages of the json API
 vs hstore: you can serialize objects that hstore cannot -- at least
 not without extra scaffolding (at least, AIUI, I haven't fully
 grappled with the coming hstore stuff yet).  In other words, just
 because key order and cardinality is unimportant in an associative
 array, it does not in any way follow it is similarly unimportant for
 object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

 Forcing us to preserve order and key duplication would be a pretty effective
 barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

  An object structure is represented as a pair of curly brackets
   surrounding zero or more name/value pairs (or members).  A name is a
   string.  A single colon comes after each name, separating the name
   from the value.  A single comma separates a value from a following
   name.  The names within an object SHOULD be unique.

And SHOULD means
  3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there.  Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 2, "a": 1} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.
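[Editorial sketch, not part of the original mail: a short Python illustration of what a typical hash-backed parser does here. The stdlib `json` module stands in for "most stacks".]

```python
import json

# Two serializations that differ only in key order...
a = json.loads('{"a": 1, "b": 2}')
b = json.loads('{"b": 2, "a": 1}')
# ...become indistinguishable once parsed into a dict.
print(a == b)  # True

# Duplicate keys -- legal per the RFC's SHOULD -- are silently
# collapsed; the last value wins.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}
```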

This is simply not the case with many json consuming clients.  It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.

merlin




Re: [HACKERS] additional json functionality

2013-11-13 Thread Josh Berkus
On 11/13/2013 06:45 AM, Merlin Moncure wrote:
 I'm not so sure we should require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement "but it is the caller's responsibility to ensure that keys
 are not repeated".

This is pretty standard in the programming languages I know of which use
JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle).  I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Certainly the upcoming binary storage is going to require unique keys.
For that matter, both MongoDB and CouchDB store unique, unordered keys.
 And even supporting CRUD functions (i.e. "update this key") is going to
require uniqueness.

 Putting it all together, I'd consider:
 *) dropping json_object (although maybe there is a case I'm not
thinking about)
 *) changing json_build function names to get the json prefix
 *) adding a json object constructor that takes two parallel arrays as
 arguments.

I was with you until the third idea.  Huh?

The scripting languages I use (Perl, Python) have functions which
convert a list/array to a hash/dictionary.  In each case, the standard
input is a single list/array in the form [ k, v, k, v, k, v ].  Now,
while there are standard language functions which support munging two
parallel arrays into one hash (such as Python's zip()), these are less
frequently used.  Supporting the zip() option without supporting the [
k, v ] array option would be a bizarre and puzzling approach to most
programmers I know.  I can see three approaches which make sense:

1. we don't include json_object at all.
2. we include the existing json_object
3. we include json_object, plus a second json_object function which
takes two arrays
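[Editorial sketch, not part of the original mail: the two constructor shapes under discussion, rendered in Python with invented names; the SQL signatures being debated are analogous.]

```python
def obj_from_flat(flat):
    # The [k, v, k, v, ...] style: even slots are keys, odd slots values.
    return dict(zip(flat[0::2], flat[1::2]))

def obj_from_parallel(keys, values):
    # The zip() style: two parallel arrays of equal length.
    return dict(zip(keys, values))

print(obj_from_flat(["dept", "sales", "total", 42]))
print(obj_from_parallel(["dept", "total"], ["sales", 42]))
# Both print: {'dept': 'sales', 'total': 42}
```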

Keep in mind that all of Andrew's functions came out of real-life use
cases of writing applications which return JSON to the caller, so they
are based on real needs to fill holes in our JSON-building function library.

In the case of json_object, the need was to supply column labels
where, usually due to calculated columns, none exist in the input.  Take
the example where I want to return a bunch of aggregates from a table as
a series of json objects with user-friendly labels:

SELECT build_json_object( 'dept', department, 'total_costs', sum(costs),
'running_total', running_sum() )
FROM 

Where it becomes even more useful is when you want the json label to be
the result of a calculated expression:

SELECT build_json_object ( department, sum() )

Yes, you could do this with a two-array version as well; it's just not
more intuitive, and in cases where you have dozens of columns, puts you
in column-counting hell.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-13 Thread Andrew Dunstan


On 11/13/2013 04:58 PM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective
barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

   An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members).  A name is a
string.  A single colon comes after each name, separating the name
from the value.  A single comma separates a value from a following
name.  The names within an object SHOULD be unique.

And SHOULD means
   3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there.  Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 2, "a": 1} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.

This is simply not the case with many json consuming clients.  It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.




I understand the difference between should and must. But there is 
nothing that REQUIRES us to preserve key order or duplicate keys. If you 
really need textual preservation, you should probably store the data as 
text and convert it to json to do json-ish things to it. If not, we're 
going to face huge demands to implement another type which almost 
everyone but you will move to in rapid order because it performs so much 
better. The strong consensus I have seen in discussions at conferences 
and elsewhere is to go the way we're going, instead.


cheers

andrew





Re: [HACKERS] additional json functionality

2013-11-13 Thread Gavin Flower

On 14/11/13 11:33, Andrew Dunstan wrote:


On 11/13/2013 04:58 PM, Merlin Moncure wrote:
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net 
wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty
effective barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

   An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members).  A name is a
string.  A single colon comes after each name, separating the name
from the value.  A single comma separates a value from a following
name.  The names within an object SHOULD be unique.

And SHOULD means
   3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there. Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 2, "a": 1} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.

This is simply not the case with many json consuming clients. It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.




I understand the difference between should and must. But there is 
nothing that REQUIRES us to preserve key order or duplicate keys. If 
you really need textual preservation, you should probably store the 
data as text and convert it to json to do json-ish things to it. If 
not, we're going to face huge demands to implement another type which 
almost everyone but you will move to in rapid order because it 
performs so much better. The strong consensus I have seen in 
discussions at conferences and elsewhere is to go the way we're going, 
instead.


cheers

andrew




I can see that both points of view are valid, in different contexts.

Would it be possible to have a boolean, such as 'strict' - so that unique &
ordered was only imposed when strict was TRUE? Alternately, separate
functions to allow the same choice?



Cheers,
Gavin




Re: [HACKERS] additional json functionality

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote:
  I'm not so sure we should require hstore to do things like build
  Also, json_object is pretty weird to me, I'm not sure I see the
  advantage of a new serialization format, and I don't agree with the
  statement "but it is the caller's responsibility to ensure that keys
  are not repeated".

 This is pretty standard in the programming languages I know of which use
 JSON.

  I think the caller should have no such
  responsibility.  Keys should be able to be repeated.

 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.

Common practice?  The internet is littered with complaints about
documents being spontaneously re-ordered and or de-duplicated in
various stacks.  Other stacks provide mechanisms for explicit key
order handling (see here: http://docs.python.org/2/library/json.html).
  Why do you think they did that?
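[Editorial sketch, not part of the original mail: the stdlib mechanism Merlin points at. `object_pairs_hook` is the documented hook in Python's `json` module for seeing keys in document order.]

```python
import json

doc = '{"b": 1, "a": 2, "b": 3}'

# Default decoding builds a dict: the duplicate "b" is collapsed.
print(json.loads(doc))  # {'b': 3, 'a': 2}

# object_pairs_hook hands the decoder the raw (key, value) pairs in
# document order, so a caller can keep order -- and even duplicates.
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
print(pairs)  # [('b', 1), ('a', 2), ('b', 3)]
```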

I use pg/JSON all over the place.  In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument.  The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not.  I guess that's a done deal based on
'performance'.  I'm clearly not the only one to have complained about
this though.

merlin




Re: [HACKERS] additional json functionality

2013-11-13 Thread Mike Blackwell

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:


  Putting it all together, I'd consider:
  *) dropping json_object (although maybe there is a case I'm not
 thinking about)
  *) changing json_build function names to get the json prefix
  *) adding a json object constructor that takes two parallel arrays as
  arguments.

 I was with you until the third idea.  Huh?


​I actually had a use case for this today, though with hstore, importing a
fixed length record with​ something along the lines of:

hstore(
  ARRAY['field 1', 'field 2', 'field 3'],
  regexp_matches(fixed_field,'(.{4})(.{10})(.{5})')
)
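[Editorial sketch, not part of the original mail: a rough Python analogue of that fixed-width import, with field widths and sample data invented for illustration.]

```python
import re

fields = ["field 1", "field 2", "field 3"]
record = "ABCD0123456789WXYZQ"  # a 4 + 10 + 5 character fixed-width record

# Mirror regexp_matches(fixed_field, '(.{4})(.{10})(.{5})') and zip the
# captured groups against the field names, as hstore(keys, values) would.
groups = re.match(r"(.{4})(.{10})(.{5})", record).groups()
print(dict(zip(fields, groups)))
# {'field 1': 'ABCD', 'field 2': '0123456789', 'field 3': 'WXYZQ'}
```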

__
Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com




Re: [HACKERS] additional json functionality

2013-11-13 Thread Josh Berkus
Merlin,


 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.

It's not just a matter of performance.  It's the basic conflict of
JSON as document format vs. JSON as data storage.  For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

XML went through the same thing, which is part of how we got a bunch of
incompatible dialects of XML.

Now, your use case does show us that there's a case to be made for still
having text JSON even after we have binary JSON.  There's a strong
simplicity argument against that, though ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] additional json functionality

2013-11-13 Thread Hannu Krosing
On 11/14/2013 12:09 AM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote:
  I'm not so sure we should require hstore to do things like build
  Also, json_object is pretty weird to me, I'm not sure I see the
  advantage of a new serialization format, and I don't agree with the
  statement "but it is the caller's responsibility to ensure that keys
  are not repeated".
 This is pretty standard in the programming languages I know of which use
 JSON.

  I think the caller should have no such
  responsibility.  Keys should be able to be repeated.
 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.
 Common practice?  The internet is littered with complaints about
 documents being spontaneously re-ordered and or de-duplicated in
 various stacks.  Other stacks provide mechanisms for explicit key
 order handling (see here: http://docs.python.org/2/library/json.html).
   Why do you think they did that?

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  
I guess we should not replace the current JSON type with an hstore-based
one, but add something json-like based on nested hstore instead.

Maybe call it jsdoc or jdoc or jsobj or somesuch.

For some time I was also pretty perplexed by some PostgreSQL JSON
type discussions where JSON was not really being defined as
the type constructed from its string representation, and not even as
a string which results from serialising an existing javascript object,
but rather as source code, which can be parsed into a structured type.

So the PostgreSQL json type is *not* a structured type like hstore is, but
is really a string type with a few syntax checks.

Some of the json_* functions are then defined on top of this
json-source type, and treat this source as if it were an actual
structured type.

It is kind of like defining an int-notation type, which acts like an integer
when added to another integer, but is required to also keep its original
representation:

select '1+1'::int-notation + 2;
== 4
select '1+1'::int-notation
== 1+1
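[Editorial sketch, not part of the original mail: the int-notation analogy rendered as a toy Python class of our own devising, purely illustrative.]

```python
class IntNotation:
    """A toy 'int-notation' type: keeps the source text, acts like its value."""
    def __init__(self, src):
        self.src = src
        # Only '+' expressions are handled in this sketch.
        self.value = sum(int(t) for t in src.split("+"))

    def __add__(self, other):
        # Arithmetic uses the evaluated value...
        return self.value + other

    def __str__(self):
        # ...while display preserves the original source.
        return self.src

n = IntNotation("1+1")
print(n + 2)  # 4, like: select '1+1'::int-notation + 2
print(n)      # 1+1, the preserved source
```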

 I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
I am pretty sure we cannot move to an internal object representation and
preserve the current 'json source' behaviour.

this is why I recommend not replacing json, but rather adding another
built-in type for real structured data.

then you can keep using current json for the earlier-quoted uses of
processing instructions and do real data manipulation on jsdoc/jsobj type.

Also, most of the current json functions should be moved to work on
jsobj instead, with an explicit cast from json to jsobj.


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




