[HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-03-28 Thread Simon Riggs
On Sat, 2010-03-27 at 22:39, Greg Stark wrote:
 On Sat, Mar 27, 2010 at 7:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Sat, 2010-03-27 at 19:15, Greg Stark wrote:
   If we're pruning an index entry to a heap tuple that has been HOT
   pruned wouldn't the HOT pruning record have already conflicted with
   any queries that could see it?
 
  Quite probably, but a query that started after that record arrived might
  slip through. We have to treat each WAL record separately.
 
 Slip through? I'm not following you.

No, there is no possibility for it to slip through, you're right. (After
much thinking).

  Do you agree with the conjecture? That LP_DEAD items can be ignored
  because their xid would have been earlier than the latest LP_NORMAL
  tuple we find? (on any page).
 
  Or is a slightly less strong condition true: we can ignore LP_DEAD items
  on a page that is also referenced by an LP_NORMAL item.
 
 I don't like having dependencies on the precise logic in vacuum rather
 than only on the guarantees that vacuum provides. We want to improve
 the logic in vacuum and hot pruning to cover more cases and that will
 be harder if there's code elsewhere depending on its simple-minded xid
 = globalxmin test.

Agreed

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] join removal

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Mar 27, 2010 at 4:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not seeing how that would occur or would matter, but the worst case
 answer is to restart the scan of the SpecialJoinInfos from scratch any
 time you succeed in doing a join removal.

 Well, say you have something like

 SELECT 1 FROM A LEFT JOIN (B LEFT JOIN C ON Pbc) ON Pab

 I think that the SpecialJoinInfo structure for the join between B and
 C will match the criteria I articulated upthread, but the one for the
 join between A and {B C} will not.  If C had not been in the query
 from the beginning then we'd have had:

 SELECT 1 FROM A LEFT JOIN B ON Pab

 ...under which circumstances the SpecialJoinInfo would match the
 aforementioned criteria.

 I experimented with this and found that you're correct: the tests on the
 different SpecialJoinInfos do interact, which I hadn't believed
 initially.  The reason for this is that when we find out we can remove a
 particular rel, we have to remove the bits for it in other relations'
 attr_needed bitmaps.  In the above example, we first discover we can
 remove C.  Whatever B vars were used in Pbc will have an attr_needed
 set of {B,C}, and that C bit will prevent us from deciding that B can
 be removed when we are examining the upper SpecialJoinInfo (which will
 not consider C to be part of either min_lefthand or min_righthand).
 So we have to remove the C bits when we remove C.

 Attached is an extremely quick-and-dirty, inadequately commented draft
 patch that does it along the lines you are suggesting.  This was just to
 see if I could get it to work at all; it's not meant for application in
 anything like its current state.  However, I feel a very strong
 temptation to finish it up and apply it before we enter beta.  As you
 noted, this way is a lot cheaper than the original coding, whether one
 focuses on the cost of failing cases or the cost when the optimization
 is successful.  And if we hold it off till 9.1, then any bug fixes that
 have to be made in the area later will need to be made against two
 significantly different implementations, which will be a real PITA.

 Things that would need to be cleaned up:

 * I left join_is_removable where it was, mainly so that it was easy to
 compare how much it changed for this usage (not a lot).  I'm not sure
 that joinpath.c is an appropriate place for it anymore, though I can't
 see any obviously better place either.  Any thoughts on that?

I dislike the idea of leaving it in joinpath.c.  I don't even think it
properly belongs in the path subdirectory since it no longer has
anything to do with paths.  Also worth thinking about where we would
put the logic I pontificated about here:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

 * The removed relation has to be taken out of the set of baserels
 somehow, else for example the Assert in make_one_rel will fail.
 The current hack is to change its reloptkind to RELOPT_OTHER_MEMBER_REL,
 which I think is a bit unclean.  We could try deleting it from the
 simple_rel_array altogether, but I'm worried that that could result in
 dangling-pointer failures, since we're probably not going to go to the
 trouble of removing every single reference to the rel from the planner
 data structures.  A possible compromise is to invent another reloptkind
 value that is only used for dead relations.

+1 for dead relation type.

 * It would be good to not count the removed relation in
 root->total_table_pages.  If we made either of the changes suggested
 above then we could move the calculation of total_table_pages down to
 after remove_useless_joins and ignore the removed relation(s)
 appropriately.  Otherwise I'm tempted to just subtract off the relation
 size from total_table_pages on-the-fly when we remove it.

Sounds good.

 * I'm not sure yet about the adjustment of PlaceHolder bitmaps --- we
 might need to break fix_placeholder_eval_levels into two steps to get
 it right.

Not familiar enough with that code to comment.

 * Still need to reverse out the now-dead code from the original patch,
 in particular the NoOpPath support.

Yeah.

 Thoughts?

I'm alarmed by your follow-on statement that the current code can't
handle the two-levels of removable join case.  Seems like it ought to
form {B C} as a path over {B} and then {A B C} as a path over {A}.
Given that it doesn't, we already have a fairly serious bug, so we've
either got to put more work into the old implementation or switch to
this new one - and I think at this point you and I are both fairly
convinced that this is a better way going forward.
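The attr_needed interaction described upthread (a removed rel's bit lingering in other rels' attr_needed sets and blocking further removals) can be sketched with Python sets standing in for the planner's C Bitmapsets; all names here are illustrative, not the actual planner structures:

```python
# Illustrative sketch only: Python sets stand in for Bitmapsets, and
# these names do not match the real PostgreSQL planner code.
# For SELECT 1 FROM A LEFT JOIN (B LEFT JOIN C ON Pbc) ON Pab, the
# B vars used in Pbc start out needed by {B, C}; unless the C bit is
# cleared when C is removed, B never looks removable to the upper join.

attr_needed = {
    ("B", "b_col"): {"B", "C"},  # B var referenced by the B/C join clause
}

def remove_rel(rel, attr_needed):
    """Clear a removed rel's bit from every attr_needed set."""
    for needed in attr_needed.values():
        needed.discard(rel)

remove_rel("C", attr_needed)
print(attr_needed[("B", "b_col")])  # only {"B"} remains: B is removable too
```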

...Robert



Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm alarmed by your follow-on statement that the current code can't
 handle the two-levels of removable join case.  Seems like it ought to
 form {B C} as a path over {B} and then {A B C} as a path over {A}.

Actually I think it ought to form {A B} as a no-op join and then be able
to join {A B} to {C} as a no-op join.  It won't recognize joining A to
{B C} as a no-op because the RHS isn't a baserel.  But yeah, I was quite
surprised at the failure too.  We should take the time to understand why
it's failing before we go further.  I ran out of steam last night but
will have a look into that today.

regards, tom lane



Re: [HACKERS] join removal

2010-03-28 Thread Simon Riggs
On Sun, 2010-03-28 at 02:15 -0400, Tom Lane wrote:
 I wrote:
  [ crude patch ]
 
 Oh, btw, if you try to run the regression test additions in that patch
 against CVS HEAD, you'll find out that HEAD actually fails to optimize
 the two-levels-of-removable-joins case.  Seems like another reason to
 press ahead with making the change.

Yes, please.

Does the new patch find more than two levels of join removal?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] More idle thoughts

2010-03-28 Thread Simon Riggs
On Fri, 2010-03-26 at 18:59, Greg Stark wrote:

 The Linux kernel had a big push to reduce latency, and one of the
 tricks they did was they replaced the usual interrupt points with a
 call which noted how long it had been since the last interrupt point.
 It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
 conditionally having it call a function which calls gettimeofday and
 compares with the previous timestamp received at the last CFI().

Reducing latency sounds good, but what has CFI got to do with that?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Does the new patch find more than two levels of join removal?

Well, I'd assume if it can do two nested levels then it should work for
any number, but I plead guilty to not having actually tested that.

regards, tom lane



Re: [HACKERS] More idle thoughts

2010-03-28 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-03-26 at 18:59, Greg Stark wrote:
 It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
 conditionally having it call a function which calls gettimeofday and
 compares with the previous timestamp received at the last CFI().

 Reducing latency sounds good, but what has CFI got to do with that?

It took me about five minutes to figure out what Greg was on about too.
His point is that we need to locate code paths in which an extremely
long time can pass between successive CFI calls, because that means
the backend will fail to respond to SIGINT/SIGTERM for a long time.
Instrumenting CFI itself is a possible tool for that.
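The instrumentation idea can be sketched as a small C fragment; the function name, threshold, and reporting here are all hypothetical, since the real CHECK_FOR_INTERRUPTS() is a backend macro:

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical sketch of instrumenting the interrupt-check point to
 * report unusually long gaps between successive calls.  Names and the
 * 100 ms threshold are illustrative, not actual backend code. */

static struct timeval last_cfi;
static int have_last_cfi = 0;
static long worst_gap_us = 0;

static void
instrumented_cfi(const char *file, int line)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    if (have_last_cfi)
    {
        long gap_us = (now.tv_sec - last_cfi.tv_sec) * 1000000L
                    + (now.tv_usec - last_cfi.tv_usec);

        if (gap_us > worst_gap_us)
            worst_gap_us = gap_us;
        if (gap_us > 100000L)   /* flag gaps longer than 100 ms */
            fprintf(stderr, "long gap before CFI at %s:%d: %ld us\n",
                    file, line, gap_us);
    }
    last_cfi = now;
    have_last_cfi = 1;
    /* ...then perform the normal interrupt check... */
}
```

Code paths that trip the report are candidates for extra CHECK_FOR_INTERRUPTS() calls.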

regards, tom lane



Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm alarmed by your follow-on statement that the current code can't
 handle the two-levels of removable join case.  Seems like it ought to
 form {B C} as a path over {B} and then {A B C} as a path over {A}.

 Actually I think it ought to form {A B} as a no-op join and then be able
 to join {A B} to {C} as a no-op join.  It won't recognize joining A to
 {B C} as a no-op because the RHS isn't a baserel.  But yeah, I was quite
 surprised at the failure too.  We should take the time to understand why
 it's failing before we go further.

OK, I traced through it, and the reason HEAD fails on this example is
that it *doesn't* recognize {A B} as a feasible no-op join, for
precisely the reason that it sees some B vars marked as being needed for
the not-yet-done {B C} join.  So that path is blocked, and the other
possible path to the desired result is also blocked because it won't
consider {B C} as a valid RHS for a removable join.

I don't see any practical way to escape the false-attr_needed problem
given the current code structure.  We could maybe hack our way to a
solution by weakening the restriction against the RHS being a join,
eg by noting that the best path for the RHS is a no-op join and then
drilling down to the one baserel.  But it seems pretty ugly.

So I think the conclusion is clear: we should consign the current
join-removal code to the dustbin and pursue the preprocessing way
instead.  Will work on it today.

regards, tom lane



Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 * I left join_is_removable where it was, mainly so that it was easy to
 compare how much it changed for this usage (not a lot).  I'm not sure
 that joinpath.c is an appropriate place for it anymore, though I can't
 see any obviously better place either.  Any thoughts on that?

 I dislike the idea of leaving it in joinpath.c.  I don't even think it
 properly belongs in the path subdirectory since it no longer has
 anything to do with paths.  Also worth thinking about where we would
 put the logic I pontificated about here:
 http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

The only argument I can see for leaving it where it is is that it
depends on clause_sides_match_join, which we'd have to either duplicate
or global-ize in order to continue sharing that code.  However, since
join_is_removable now needs a slightly different API for that anyway
(cf changes in draft patch), it's probably better to not try to share it.
So let's put the join removal code somewhere else.  The reasonable
alternatives seem to be:

* in a new file in prep/.  Although this clearly has the flavor of
preprocessing, all the other work in prep/ is done before we get into
query_planner().  So this choice seems a bit dubious.

* directly in plan/planmain.c.  Has the advantage of being where the
caller is, so no globally visible function declaration needed.  No other
redeeming social value though.

* in plan/initsplan.c.  Somewhat reasonable, although that file is
rather large already.

* in a new file in plan/.  Not sure if it's worth this, though your
thought that we might add more logic later makes it more defensible.

Comments?

regards, tom lane



Re: [HACKERS] join removal

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 12:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 * I left join_is_removable where it was, mainly so that it was easy to
 compare how much it changed for this usage (not a lot).  I'm not sure
 that joinpath.c is an appropriate place for it anymore, though I can't
 see any obviously better place either.  Any thoughts on that?

 I dislike the idea of leaving it in joinpath.c.  I don't even think it
 properly belongs in the path subdirectory since it no longer has
 anything to do with paths.  Also worth thinking about where we would
 put the logic I pontificated about here:
 http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

 The only argument I can see for leaving it where it is is that it
 depends on clause_sides_match_join, which we'd have to either duplicate
 or global-ize in order to continue sharing that code.  However, since
 join_is_removable now needs a slightly different API for that anyway
 (cf changes in draft patch), it's probably better to not try to share it.
 So let's put the join removal code somewhere else.  The reasonable
 alternatives seem to be:

 * in a new file in prep/.  Although this clearly has the flavor of
 preprocessing, all the other work in prep/ is done before we get into
 query_planner().  So this choice seems a bit dubious.

 * directly in plan/planmain.c.  Has the advantage of being where the
 caller is, so no globally visible function declaration needed.  No other
 redeeming social value though.

 * in plan/initsplan.c.  Somewhat reasonable, although that file is
 rather large already.

 * in a new file in plan/.  Not sure if it's worth this, though your
 thought that we might add more logic later makes it more defensible.

I sort of like the last of these ideas though I'm at a loss for what
to call it.  Otherwise I kind of like planmain.c.

...Robert



Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * in a new file in plan/.  Not sure if it's worth this, though your
 thought that we might add more logic later makes it more defensible.

 I sort of like the last of these ideas though I'm at a loss for what
 to call it.  Otherwise I kind of like planmain.c.

joinremoval.c ?

regards, tom lane



Re: [HACKERS] join removal

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * in a new file in plan/.  Not sure if it's worth this, though your
 thought that we might add more logic later makes it more defensible.

 I sort of like the last of these ideas though I'm at a loss for what
 to call it.  Otherwise I kind of like planmain.c.

 joinremoval.c ?

Maybe, except as I mentioned in the email linked upthread, my plan for
implementing inner join removal would also include allowing join
reordering in cases where we currently don't.  So I don't want to
sandbox it too tightly as join removal, per se, though that's
certainly what we have on the table ATM.  It's more like advanced
open-heart join-tree surgery - like prepjointree, but much later in
the process.

...Robert



Re: [HACKERS] join removal

2010-03-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 joinremoval.c ?

 Maybe, except as I mentioned in the email linked upthread, my plan for
 implementing inner join removal would also include allowing join
 reordering in cases where we currently don't.  So I don't want to
 sandbox it too tightly as join removal, per se, though that's
 certainly what we have on the table ATM.  It's more like advanced
 open-heart join-tree surgery - like prepjointree, but much later in
 the process.

Hm.  At this point we're not really working with a join *tree* in any
case --- the data structure we're mostly concerned with is the list of
SpecialJoinInfo structs, and what we're trying to do is weaken the
constraints described by that list.  So I'd rather stay away from tree
terminology.

planjoins.c would fit with other names in the plan/ directory but it
seems like a misnomer because we're not really planning any joins
at this stage.

adjustjoins.c?  loosenjoins.c?  weakenjoins.c?

regards, tom lane



[HACKERS] Alpha release this week?

2010-03-28 Thread Josh Berkus
All,

We've got two locations and some individuals signed up for a test-fest
this weekend.  Would it be possible to do an alpha release this week?
It would really help to be testing later code than Alpha4.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] Proposal: Add JSON support

2010-03-28 Thread Joseph Adams
I introduced myself in the thread "Proposal: access control jails (and
introduction as aspiring GSoC student)", and we discussed jails and
session-local variables.  But, as Robert Haas suggested, implementing
variable support in the backend would probably be way too ambitious a
project for a newbie like me.  I decided instead to pursue the task of
adding JSON support to PostgreSQL, hence the new thread.

I plan to reference datatype-xml.html and functions-xml.html in some
design decisions, but there are some things that apply to XML that
don't apply to JSON and vice versa.  For instance, jsoncomment
wouldn't make sense because (standard) JSON doesn't have comments.
For access, we might have something like json_get('foo[1].bar') and
json_set('foo[1].bar', 'hello').  jsonforest and jsonagg would be
beautiful.  For mapping, jsonforest/jsonagg could be used to build a
JSON string from a result set (SELECT jsonagg(jsonforest(col1, col2,
...)) FROM tbl), but I'm not sure on the best way to go the other way
around (generate a result set from JSON).  CSS-style selectors would
be cool, but selecting is what SQL is all about, and I'm not sure
having a json_select(dom-element[key=value]) function is a good,
orthogonal approach.

I'm wondering whether the internal representation of JSON should be
plain JSON text, or some binary code that's easier to traverse and
whatnot.  For the sake of code size, just keeping it in text is
probably best.

Now my thoughts and opinions on the JSON parsing/unparsing itself:

It should be built-in, rather than relying on an external library
(like XML does).  Priorities of the JSON implementation, in descending
order, are:

* Small
* Correct
* Fast

Moreover, JSON operations shall not crash due to stack overflows.

I'm thinking Bison/Flex is overkill for parsing JSON (I haven't seen
any JSON implementations out there that use it anyway).  I would
probably end up writing the JSON parser/serializer manually.  It
should not take more than a week.

As far as character encodings, I'd rather keep that out of the JSON
parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
I'll just throw encode/decode/validate operations at it.

Thoughts?  Thanks.
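The jsonforest/jsonagg idea above is only a proposal, but its intended semantics can be modeled in a few lines of Python (these are hypothetical helpers, not an existing PostgreSQL API): jsonforest(col1, col2, ...) builds one JSON object per row, and jsonagg collects those objects into a JSON array.

```python
import json

# Hypothetical model of the proposed aggregates, not PostgreSQL code:
# SELECT jsonagg(jsonforest(id, name)) FROM tbl would map a whole
# result set to a single JSON array of row objects.

def jsonforest(colnames, row):
    """One row -> one JSON object keyed by column name."""
    return dict(zip(colnames, row))

def jsonagg(objects):
    """Aggregate row objects into a JSON array string."""
    return json.dumps(list(objects))

rows = [(1, "alice"), (2, "bob")]
result = jsonagg(jsonforest(("id", "name"), r) for r in rows)
print(result)  # [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
```

Going the other way (JSON back to a result set) is the part the proposal leaves open.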



Re: [HACKERS] join removal

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 4:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 joinremoval.c ?

 Maybe, except as I mentioned in the email linked upthread, my plan for
 implementing inner join removal would also include allowing join
 reordering in cases where we currently don't.  So I don't want to
 sandbox it too tightly as join removal, per se, though that's
 certainly what we have on the table ATM.  It's more like advanced
 open-heart join-tree surgery - like prepjointree, but much later in
 the process.

 Hm.  At this point we're not really working with a join *tree* in any
 case --- the data structure we're mostly concerned with is the list of
 SpecialJoinInfo structs, and what we're trying to do is weaken the
 constraints described by that list.  So I'd rather stay away from tree
 terminology.

 planjoins.c would fit with other names in the plan/ directory but it
 seems like a misnomer because we're not really planning any joins
 at this stage.

 adjustjoins.c?  loosenjoins.c?  weakenjoins.c?

How about analyzejoins.c?  Loosen and weaken don't seem like quite the
right idea; adjust is a little generic and perhaps overused, but not
bad.  If you don't like analyzejoins then go with adjustjoins.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I'm wondering whether the internal representation of JSON should be
 plain JSON text, or some binary code that's easier to traverse and
 whatnot.  For the sake of code size, just keeping it in text is
 probably best.

+1 for text.

 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

Why?  I'm not saying you aren't right, but you need to make an
argument rather than an assertion.  This is a community, so no one is
entitled to decide anything unilaterally, and people want to be
convinced - including me.

 As far as character encodings, I'd rather keep that out of the JSON
 parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
 I'll just throw encode/decode/validate operations at it.

I think you need to assume that the encoding will be the server
encoding, not UTF-8.  Although others on this list are better
qualified to speak to that than I am.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Robert Haas wrote:
 On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
  I'm wondering whether the internal representation of JSON should be
  plain JSON text, or some binary code that's easier to traverse and
  whatnot.  For the sake of code size, just keeping it in text is
  probably best.

 +1 for text.

Agreed.

  Now my thoughts and opinions on the JSON parsing/unparsing itself:

  It should be built-in, rather than relying on an external library
  (like XML does).

 Why?  I'm not saying you aren't right, but you need to make an
 argument rather than an assertion.  This is a community, so no one is
 entitled to decide anything unilaterally, and people want to be
 convinced - including me.

Yeah, why? We should not be in the business of reinventing the wheel
(and then maintaining the reinvented wheel), unless the code in question
is *really* small.

  As far as character encodings, I'd rather keep that out of the JSON
  parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
  I'll just throw encode/decode/validate operations at it.

 I think you need to assume that the encoding will be the server
 encoding, not UTF-8.  Although others on this list are better
 qualified to speak to that than I am.

The trouble is that JSON is defined to be specifically Unicode, and in
practice for us that means UTF8 on the server side.  It could get a bit
hairy, and it's definitely not something I think you can wave away with
a simple "I'll just throw some encoding/decoding function calls at it".

cheers

andrew



[HACKERS] five-key syscaches

2010-03-28 Thread Robert Haas
Per previous discussion, PFA a patch to change the maximum number of
keys for a syscache from 4 to 5.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01105.php

This is intended for application to 9.1, and is supporting
infrastructure for knngist.

...Robert


syscache5.patch
Description: Binary data



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 I think you need to assume that the encoding will be the server
 encoding, not UTF-8.  Although others on this list are better
 qualified to speak to that than I am.

 The trouble is that JSON is defined to be specifically Unicode, and in 
 practice for us that means UTF8 on the server side.  It could get a bit 
 hairy, and it's definitely not something I think you can wave away with
 a simple "I'll just throw some encoding/decoding function calls at it".

It's just text, no?  Are there any operations where this actually makes
a difference?

Like Robert, I'm *very* wary of trying to introduce any text storage
into the backend that is in an encoding different from server_encoding.
Even the best-case scenarios for that will involve multiple new places for
encoding conversion failures to happen.

regards, tom lane



Re: [HACKERS] Alpha release this week?

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 4:40 PM, Josh Berkus j...@agliodbs.com wrote:
 We've got two locations and some individuals signed up for a test-fest
 this weekend.  Would it be possible to do an alpha release this week?
 It would really help to be testing later code than Alpha4.

I'm willing to do the CVS bits, if that's helpful.  Or maybe Peter
wants to do it.  Anyway I have no problem with the idea.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  Robert Haas wrote:
   I think you need to assume that the encoding will be the server
   encoding, not UTF-8.  Although others on this list are better
   qualified to speak to that than I am.

  The trouble is that JSON is defined to be specifically Unicode, and in
  practice for us that means UTF8 on the server side.  It could get a bit
  hairy, and it's definitely not something I think you can wave away with
  a simple "I'll just throw some encoding/decoding function calls at it".

 It's just text, no?  Are there any operations where this actually makes
 a difference?

If we're going to provide operations on it that might involve some. I
don't know.

 Like Robert, I'm *very* wary of trying to introduce any text storage
 into the backend that is in an encoding different from server_encoding.
 Even the best-case scenarios for that will involve multiple new places for
 encoding conversion failures to happen.

I agree entirely. All I'm suggesting is that there could be many
wrinkles here.

Here's another thought. Given that JSON is actually specified to
"consist of a string of Unicode characters", what will we deliver to the
client where the client encoding is, say, Latin1? Will it actually be a
legal JSON byte stream?

cheers

andrew





Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to
 "consist of a string of Unicode characters", what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal 
 JSON byte stream?

No, it won't.  We will *not* be sending anything but latin1 in such a
situation, and I really couldn't care less what the JSON spec says about
it.  Delivering wrongly-encoded data to a client is a good recipe for
all sorts of problems, since the client-side code is very unlikely to be
expecting that.  A datatype doesn't get to make up its own mind whether
to obey those rules.  Likewise, data on input had better match
client_encoding, because it's otherwise going to fail the encoding
checks long before a json datatype could have any say in the matter.

While I've not read the spec, I wonder exactly what "consist of a string
of Unicode characters" should actually be taken to mean.  Perhaps it
only means that all the characters must be members of the Unicode set,
not that the string can never be represented in any other encoding.
There's more than one Unicode encoding anyway...

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist
 of a string of Unicode characters, what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal
 JSON byte stream?

 No, it won't.  We will *not* be sending anything but latin1 in such a
 situation, and I really couldn't care less what the JSON spec says about
 it.  Delivering wrongly-encoded data to a client is a good recipe for
 all sorts of problems, since the client-side code is very unlikely to be
 expecting that.  A datatype doesn't get to make up its own mind whether
 to obey those rules.  Likewise, data on input had better match
 client_encoding, because it's otherwise going to fail the encoding
 checks long before a json datatype could have any say in the matter.

 While I've not read the spec, I wonder exactly what "consist of a string
 of Unicode characters" should actually be taken to mean.  Perhaps it
 only means that all the characters must be members of the Unicode set,
 not that the string can never be represented in any other encoding.
 There's more than one Unicode encoding anyway...

See sections 2.5 and 3 of:

http://www.ietf.org/rfc/rfc4627.txt?number=4627

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist
 of a string of Unicode characters, what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal
 JSON byte stream?

 No, it won't.  We will *not* be sending anything but latin1 in such a
 situation, and I really couldn't care less what the JSON spec says about
 it.  Delivering wrongly-encoded data to a client is a good recipe for
 all sorts of problems, since the client-side code is very unlikely to be
 expecting that.  A datatype doesn't get to make up its own mind whether
 to obey those rules.  Likewise, data on input had better match
 client_encoding, because it's otherwise going to fail the encoding
 checks long before a json datatype could have any say in the matter.

 While I've not read the spec, I wonder exactly what "consist of a string
 of Unicode characters" should actually be taken to mean.  Perhaps it
 only means that all the characters must be members of the Unicode set,
 not that the string can never be represented in any other encoding.
 There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from
user data stored in tables that are not in a UTF-8 encoded cluster, to
convert to UTF-8, that's something else entirely.  If it /is/ easy and
safe, then it's just a matter of scanning for multi-byte sequences and
replacing those with their \u equivalents.  I have some simple and
fast code I could share, if it's needed, though I suspect it's not.
:)
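The scan-and-replace approach described above can be sketched roughly as
follows (Python used purely for illustration; the function name and shape
are hypothetical, not Mike's actual code -- the backend would do this in C):

```python
def escape_non_ascii(s: str) -> str:
    """Replace every non-ASCII character with its \\uXXXX escape,
    using a UTF-16 surrogate pair for characters outside the BMP,
    as RFC 4627 requires."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)          # BMP character: one escape
        else:
            cp -= 0x10000                       # astral character: surrogate pair
            out.append("\\u%04x\\u%04x"
                       % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return "".join(out)
```

Any conforming JSON parser will read the escapes back to the original
characters, so the transformation is lossless.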

UPDATE:  Thanks, Robert, for pointing to the RFC.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
 In practice, every parser/serializer I've used (including the one I
 helped write) allows (and, often, forces) any non-ASCII character to
 be encoded as \u followed by a string of four hex digits.

Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Robert Haas wrote:

On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
  

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.



Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.


  


We support exactly one unicode encoding on the server side: utf8.

And the maximum possible size of a validly encoded unicode char in utf8 
is 4 (and that's pretty rare, IIRC).


cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Andrew Dunstan wrote:



Robert Haas wrote:
On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com 
wrote:
 

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.



Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.


  


We support exactly one unicode encoding on the server side: utf8.

And the maximum possible size of a validly encoded unicode char in 
utf8 is 4 (and that's pretty rare, IIRC).





Sorry. Disregard this. I see what you mean.

Yeah, I think *requiring* non-ASCII characters to be escaped would be evil.

cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
 In practice, every parser/serializer I've used (including the one I
 helped write) allows (and, often, forces) any non-ASCII character to
 be encoded as \u followed by a string of four hex digits.

 Is it correct to say that the only feasible place where non-ASCII
 characters can be used is within string constants?

Yes.  That includes object property strings -- they are quoted string literals.

 If so, it might be
 reasonable to disallow characters with the high-bit set unless the
 server encoding is one of the flavors of Unicode of which the spec
 approves.  I'm tempted to think that when the server encoding is
 Unicode we really ought to allow Unicode characters natively, because
 turning a long string of two-byte wide chars into a long string of
 six-byte wide chars sounds pretty evil from a performance point of
 view.


+1

As an aside, \u-encoded (escaped) characters and native multi-byte
sequences (of any RFC-allowable Unicode encoding) are exactly
equivalent in JSON -- it's a storage and transmission format, and
doesn't prescribe the application-internal representation of the data.

If it's faster (which it almost certainly is) to not mangle the data
when it's all staying server side, that seems like a useful
optimization.  For output to the client, however, it would be useful
to provide a \u-escaping function, which (AIUI) should always be safe
regardless of client encoding.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



[HACKERS] GSoC Query

2010-03-28 Thread gaurav gupta
Sir/Ma'am,

I am an M.Tech student and want to participate in GSoC. I have a project
idea and want to discuss its feasibility, usability, and chance of selection
with you.

My idea is to add auto-tuning and auto-indexing/reindexing
functionality to the database system.

Though I am not yet working on this, I have some idea about the
implementation. The idea is that, based on the number of rows deleted from or
inserted into a table, we can make the system reindex the table
automatically, saving the user's time. Similarly, using the number of SELECT
hits on a table, we can check whether most of them filter on a non-indexed
field, and if so, index that field to make SELECTs faster.

I am looking forward to hearing from you.

--
Thanks & Regards,
Gaurav Kumar Gupta
+91-9032844745


Re: [HACKERS] GSoC Query

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 10:01 PM, gaurav gupta
gauravkumar.gu...@students.iiit.ac.in wrote:
 My idea is to add a functionality of Auto tuning and Auto Indexing/
 Reindexing in DB languages.

 Though I am not working on this I have some idea about implementation.
 Idea is that on the no. of rows deleted, Inserted in the table we can make
 our system capable to reindex the table that will save the time of user.

Reindexing is not routine maintenance for PostgreSQL, so this seems
fairly pointless.

 Similarly using the no. of select hits on a table we can check that if
 maximum no. of times it is on a non-index field we can index on that field
 to make select faster.

Well, a SELECT statement hits a whole row, not a single column; but
even if you could somehow figure out a way to tally up per-column
statistics (and it's certainly not obvious to me how to do such a
thing) it doesn't follow that a column which is frequently accessed is
a good candidate for indexing.

I don't think this is a good project for a first-time hacker, or
something that can realistically be completed in one summer.  It
sounds more like a PhD project to me.  I wrote to another student who
is considering submitting a GSOC proposal with some ideas I thought
might be suitable.  You might want to review that email:

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01034.php

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Joseph Adams
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

 Why?  I'm not saying you aren't right, but you need to make an
 argument rather than an assertion.  This is a community, so no one is
 entitled to decide anything unilaterally, and people want to be
 convinced - including me.

I apologize; I was just starting the conversation with some of my
ideas to receive feedback.  I didn't want people to have to wade
through too many "I think"s.  I'll be sure to use "opinion" tags in
the future :-)

My reasoning for "it should be built-in" is:
 * It would be nice to have a built-in serialization format that's
available by default.
 * It might be a little faster because it doesn't have to link to an
external library.
 * The code to interface between JSON logic and PostgreSQL will
probably be much larger than the actual JSON encoding/decoding itself.
 * The externally-maintained and packaged libjson implementations I
saw brought in lots of dependencies (e.g. glib).
 * Everyone else (e.g. PHP) uses a statically-linked JSON implementation.

Is the code in question *really* small?  Well, not really, but it's
not enormous either.  By the way, I found a bug in PHP's JSON_parser
(json_decode("true "); /* with a space */ returns null instead of
true).  I'll have to get around to reporting that.

Now, assuming JSON support is built-in to PostgreSQL and is enabled by
default, it is my opinion that encoding issues should not be dealt
with in the JSON code itself, but that the JSON code itself should
assume UTF-8.  I think conversions should be done to/from UTF-8 before
passing it through the JSON code because this would likely be the
smallest way to implement it (not necessarily the fastest, though).

Mike Rylander pointed out something wonderful, and that is that JSON
code can be stored in plain old ASCII using \u... .  If a target
encoding supports all of Unicode, the JSON serializer could be told
not to generate \u escapes.  Otherwise, the \u escapes would be
necessary.

Thus, here's an example of how (in my opinion) character sets and such
should be handled in the JSON code:

Suppose the client's encoding is UTF-16, and the server's encoding is
Latin-1.  When JSON is stored to the database:
 1. The client is responsible and sends a valid UTF-16 JSON string.
 2. PostgreSQL checks to make sure it is valid UTF-16, then converts
it to UTF-8.
 3. The JSON code parses it (to ensure it's valid).
 4. The JSON code unparses it (to get a representation without
needless whitespace).  It is given a flag indicating it should only
output ASCII text.
 5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
 1. ASCII is retrieved from the server
 2. If user needs to extract one or more fields, the JSON is parsed,
and the fields are extracted.
 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.
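Steps 3-4 of the store-side pipeline amount to validate-then-reserialize
with ASCII-only output. A minimal sketch, using Python's json module purely
as a stand-in for the proposed backend JSON code (the function name is made
up for illustration):

```python
import json

def normalize_for_storage(client_json: str) -> str:
    """Parse the (already encoding-converted) JSON text to validate it,
    then unparse it without needless whitespace, emitting only ASCII
    (\\uXXXX escapes) so the result is valid in any server encoding."""
    value = json.loads(client_json)          # step 3: reject invalid JSON
    # step 4: compact, ASCII-only re-serialization
    return json.dumps(value, ensure_ascii=True, separators=(",", ":"))
```

For example, `normalize_for_storage('{ "k" : "\u00e9" }')` yields
`{"k":"\u00e9"}` -- pure ASCII, safe to store under Latin-1.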

Note that I am being biased toward optimizing code size rather than speed.

Here's a question about semantics: should converting JSON to text
guarantee that Unicode will be \u escaped, or should it render actual
Unicode whenever possible (when the client uses a Unicode-complete
charset) ?

As for reinventing the wheel, I'm in the process of writing yet
another JSON implementation simply because I didn't find the other
ones I looked at palatable.  I am aiming for simple code, not fast
code.  I am using malloc for structures and realloc for strings/arrays
rather than resorting to clever buffering tricks.  Of course, I'll
switch it over to palloc/repalloc before migrating it to PostgreSQL.



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I apologize; I was just starting the conversation with some of my
 ideas to receive feedback.  I didn't want people to have to wade
 through too many "I think"s.  I'll be sure to use "opinion" tags in
 the future :-)

FWIW, I don't care at all whether you say I think or I know; the
point is that you have to provide backup for any position you choose
to take.

 My reasoning for "it should be built-in" is:
  * It would be nice to have a built-in serialization format that's
 available by default.
  * It might be a little faster because it doesn't have to link to an
 external library.

I don't think either of these reasons is valid.

  * The code to interface between JSON logic and PostgreSQL will
 probably be much larger than the actual JSON encoding/decoding itself.

If true, this is a good argument.

  * The externally-maintained and packaged libjson implementations I
 saw brought in lots of dependencies (e.g. glib).

As is this.

  * Everyone else (e.g. PHP) uses a statically-linked JSON implementation.

But this isn't.

 Is the code in question *really* small?  Well, not really, but it's
 not enormous either.  By the way, I found a bug in PHP's JSON_parser
 (json_decode("true "); /* with a space */ returns null instead of
 true).  I'll have to get around to reporting that.

 Now, assuming JSON support is built-in to PostgreSQL and is enabled by
 default, it is my opinion that encoding issues should not be dealt
 with in the JSON code itself, but that the JSON code itself should
 assume UTF-8.  I think conversions should be done to/from UTF-8 before
 passing it through the JSON code because this would likely be the
 smallest way to implement it (not necessarily the fastest, though).

 Mike Rylander pointed out something wonderful, and that is that JSON
 code can be stored in plain old ASCII using \u... .  If a target
 encoding supports all of Unicode, the JSON serializer could be told
 not to generate \u escapes.  Otherwise, the \u escapes would be
 necessary.

 Thus, here's an example of how (in my opinion) character sets and such
 should be handled in the JSON code:

 Suppose the client's encoding is UTF-16, and the server's encoding is
 Latin-1.  When JSON is stored to the database:
  1. The client is responsible and sends a valid UTF-16 JSON string.
  2. PostgreSQL checks to make sure it is valid UTF-16, then converts
 it to UTF-8.
  3. The JSON code parses it (to ensure it's valid).
  4. The JSON code unparses it (to get a representation without
 needless whitespace).  It is given a flag indicating it should only
 output ASCII text.
  5. The ASCII is stored in the server, since it is valid Latin-1.

 When JSON is retrieved from the database:
  1. ASCII is retrieved from the server
  2. If user needs to extract one or more fields, the JSON is parsed,
 and the fields are extracted.
  3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

 Note that I am being biased toward optimizing code size rather than speed.

Can you comment on my proposal elsewhere on this thread and compare
your proposal to mine?  In what ways are they different, and which is
better, and why?

 Here's a question about semantics: should converting JSON to text
 guarantee that Unicode will be \u escaped, or should it render actual
 Unicode whenever possible (when the client uses a Unicode-complete
 charset) ?

I feel pretty strongly that the data should be stored in the database
in the format in which it will be returned to the user - any
conversion which is necessary should happen on the way in.  I am not
100% sure to what extent we should attempt to canonicalize the input
and to what extend we should simply store it in whichever way the user
chooses to provide it.

 As for reinventing the wheel, I'm in the process of writing yet
 another JSON implementation simply because I didn't find the other
 ones I looked at palatable.  I am aiming for simple code, not fast
 code.  I am using malloc for structures and realloc for strings/arrays
 rather than resorting to clever buffering tricks.  Of course, I'll
 switch it over to palloc/repalloc before migrating it to PostgreSQL.

I'm not sure that optimizing for simplicity over speed is a good idea.
 I think we can reject implementations as unpalatable because they are
slow or feature-poor or have licensing issues or are not actively
maintained, but rejecting them because they use complex code in order
to be fast doesn't seem like the right trade-off to me.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Hitoshi Harada
2010/3/29 Andrew Dunstan and...@dunslane.net:
 Robert Haas wrote:
 On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I'm wondering whether the internal representation of JSON should be
 plain JSON text, or some binary code that's easier to traverse and
 whatnot.  For the sake of code size, just keeping it in text is
 probably best.

 +1 for text.

 Agreed.

There's another choice, called BSON.

http://www.mongodb.org/display/DOCS/BSON

I've not researched it deeply yet, but it seems reasonable for database
storage, as it was invented for MongoDB.

 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

 Why?  I'm not saying you aren't right, but you need to make an
 argument rather than an assertion.  This is a community, so no one is
 entitled to decide anything unilaterally, and people want to be
 convinced - including me.

 Yeah, why? We should not be in the business of reinventing the wheel (and
 then maintaining the reinvented wheel), unless the code in question is
 *really* small.

The many JSON implementations in many languages show that parsing JSON
is not so difficult to code, and that the needs vary. Hence, I wonder
if we could have our very own.

Don't take it wrongly: I don't disagree with the text format, nor with
using an external library.

Regards,

-- 
Hitoshi Harada



[HACKERS] Patch for 9.1: initdb -C option

2010-03-28 Thread David Christensen
Hackers,

Enclosed is a patch to add a -C option to initdb to allow you to easily append 
configuration directives to the generated postgresql.conf file for use in 
programmatic generation.  In my case, I'd been creating multiple db clusters 
with a script and would have specific overrides that I needed to make.   This 
patch fell out of the desire to make this a little cleaner.  Please review and 
comment.

From the commit message:

This is a simple mechanism to allow you to provide explicit overrides
to any GUC at initdb time.  As a basic example, consider the case
where you are programmatically generating multiple db clusters in
order to test various configurations:

  $ for cluster in 1 2 3 4 5 6;
     do initdb -D data$cluster -C "port = 1234$cluster" -C \
        'max_connections = 10' -C shared_buffers=1M;
    done

A possible future improvement would be to provide some basic
formatting corrections to allow specifications such as -C 'port
1234', -C port=1234, and -C 'port = 1234' to all be ultimately output
as 'port = 1234' in the final output.  This would be consistent with
postmaster's parsing.
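That normalization could be a small rewrite of each directive. A sketch of
the suggested future improvement (illustrative only, not the actual initdb
code, which would be C):

```python
import re

def normalize_guc(directive: str) -> str:
    """Rewrite 'port 1234', 'port=1234', or 'port = 1234' into the
    canonical 'port = 1234' form, matching postmaster's parsing."""
    # GUC name, an optional '=', then the value (whitespace-tolerant)
    m = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=?\s*(\S.*?)\s*$", directive)
    if m is None:
        raise ValueError("cannot parse GUC directive: %r" % directive)
    return "%s = %s" % (m.group(1), m.group(2))
```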

The -C flag was chosen to be a mnemonic for config.

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com





0001-Add-C-option-to-initdb-to-allow-invocation-time-GUC-.patch
Description: Binary data


initdb-dash-C.diff
Description: Binary data



Re: [HACKERS] Patch for 9.1: initdb -C option

2010-03-28 Thread Takahiro Itagaki

David Christensen da...@endpoint.com wrote:

 Enclosed is a patch to add a -C option to initdb to allow you to easily
 append configuration directives to the generated postgresql.conf file

Why don't you just use echo 'options' >> $PGDATA/postgresql.conf ?
Could you explain where the -C option is better than initdb + echo?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Patch for 9.1: initdb -C option

2010-03-28 Thread Greg Smith

David Christensen wrote:

Enclosed is a patch to add a -C option to initdb to allow you to easily append 
configuration directives to the generated postgresql.conf file for use in 
programmatic generation.


We had a patch not quite make it for 9.0 that switched over the 
postgresql.conf file to make it easy to scan a whole directory looking 
for configuration files:  
http://archives.postgresql.org/message-id/9837222c0910240641p7d75e2a4u2cfa6c1b5e603...@mail.gmail.com


The idea there was to eventually reduce the amount of postgresql.conf 
hacking that initdb and other tools have to do.  Your patch would add 
more code into a path that I'd like to see reduced significantly.


That implementation would make something easy enough for your use case,
too (the below is untested but shows the general idea):


$ for cluster in 1 2 3 4 5 6;
 do initdb -D data$cluster
 (
 cat <<EOF
port = 1234$cluster
max_connections = 10
shared_buffers = 1M
EOF
 ) > data$cluster/conf.d/99clustersetup
done

This would actually work just fine for what you're doing right now if 
you used >> data$cluster/postgresql.conf for that next-to-last line 
there.  There would be duplicates, which I'm guessing is what you wanted 
to avoid with this patch, but the later values set for the parameters 
added to the end would win and be the active ones.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

