Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Tom Lane wrote:
 I lack git-fu pretty completely, but I do have the CVS logs ;-).
 It looks like some of these commits that are being ascribed to the
 REL8_3_STABLE branch were actually only committed on HEAD.  For
 instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
 only in HEAD.  It was back-patched a few hours later (1 Mar 3:41),
 and that's also shown here, but the HEAD commit shouldn't be.
 
 I wonder whether the repository is completely OK and the problem
 is that this webpage isn't filtering the commits correctly.

Please don't panic :-)

The problem is that it is *impossible* to faithfully represent a CVS or
Subversion history with its ancestry information in a git repository (or
AFAIK any of the DVCS repositories).  The reason is that CVS
fundamentally records the history of single files, and each file can
have a branching history that is incompatible with those of other files.
 For example, in CVS, a file can be added to a branch after the branch
already exists, different files can be added to a branch from multiple
parent branches, and even more perverse things are allowed.  The CVS
history can record this mish-mash (albeit with much ambiguity).

Git, on the other hand, fundamentally only records a single history that
is considered to apply to the entire source tree.  If a commit is
created with more than one parent, git treats it as a merge and
implicitly assumes that all of the contents of all of the ancestor
commits of all of the parents have been merged into the new version of
the source tree.

See [1] for more discussion of the impedance mismatch between the
branching model of CVS/Subversion vs. that of the DVCSs.

So let's take the simplest example: a branch BRANCH1 is created from
trunk commit T1, then some time later another FILE1 from trunk commit T3
is added to BRANCH1 in commit B4.  How should this series of events be
represented in a git repository?

The inclusive possibility is to say that some content was merged from
trunk to BRANCH1, and therefore to treat B4 as a merge commit:

T0 -- T1 -- T2 -- T3 -- T4              TRUNK
       \              \
        B1 -- B2 -- B3 -- B4            BRANCH1

This is wrong because there might be other changes in T2 and T3 (besides
the addition of FILE1) that were *not* merged to BRANCH1.

The exclusive possibility is to ignore the fact that some of the
content of B4 came from trunk and to pretend that FILE1 just appeared
out of nowhere in commit B4 independent of the FILE1 in TRUNK:

T0 -- T1 -- T2 -- T3 -- T4              TRUNK
       \
        B1 -- B2 -- B3 -- B4            BRANCH1

This is also wrong, because it doesn't reflect the true lineage of FILE1.
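To make the difference concrete, here is a small Python sketch (mine, not cvs2git code) that computes the ancestor set of B4 under each representation, using the commit names from the diagrams above:

```python
# Model each history as a mapping from commit -> list of parent commits,
# then walk the graph to collect everything reachable, the way git's
# ancestry/reachability works.

def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

# Inclusive representation: B4 is a merge with parents B3 and T3.
inclusive = {
    "T1": ["T0"], "T2": ["T1"], "T3": ["T2"], "T4": ["T3"],
    "B1": ["T1"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3", "T3"],
}

# Exclusive representation: B4 has only B3 as parent; FILE1 "appears
# from nowhere".
exclusive = dict(inclusive, B4=["B3"])

print(sorted(ancestors("B4", inclusive)))
# ['B1', 'B2', 'B3', 'B4', 'T0', 'T1', 'T2', 'T3']
print(sorted(ancestors("B4", exclusive)))
# ['B1', 'B2', 'B3', 'B4', 'T0', 'T1']
```

In the inclusive version T2 and T3 become ancestors of B4 even though their changes were never merged to the branch; in the exclusive version FILE1's trunk lineage is lost.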

Given the choice between two wrong histories, cvs2git uses the
inclusive style.  The result is that the ancestors of B4 include not
only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
 The display in the website that was quoted [2] seems to mash all of the
ancestors together without showing the topology of the history, making
the result quite confusing.  The true history looks more like this:

$ git log --oneline --graph REL8_3_10 master
[...]
| * 2a91f07 tag 8.3.10
| * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
| * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
| * 1194fb9 Update time zone data files to tzdata release 201
| * fdfd1ec Return proper exit code (3) from psql when ON_ERR
| * 77524a1 Backport fix from HEAD that makes ecpglib give th
| * 55391af Add missing space in example.
| * 982aa23 Require hostname to be set when using GSSAPI auth
| * cb58615 Update time zone data files to tzdata release 201
| * ebe1e29 When reading pg_hba.conf and similar files, do no
| * 5a401e6 Fix a couple of places that would loop forever if
| * 5537492 Make contrib/xml2 use core xml.c's error handler,
| * c720f38 Export xml.c's libxml-error-handling support so t
| * 42ac390 Make iconv work like other optional libraries for
| * b03d523 pgindent run on xml.c in 8.3 branch, per request
| * 7efcdaa Add missing library and include dir for XSLT in M
| * 6ab1407 Do not run regression tests for contrib/xml2 on M
| * fff18e6 Backpatch MSVC build fix for XSLT
| * 7ae09ef Fix numericlocale psql option when used with a nu
| * de92a3d Fix contrib/xml2 so regression test still works w
| *   80f81c3 This commit was manufactured by cvs2svn to crea
| |\
| |/
|/|
* | a08b04f Fix contrib/xml2 so regression test still works w
* | 0d69e0f It's clearly now pointless to do backwards compat
* | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
* | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
* | 5b65b67 add EPERM to the list of return codes to expect f
| * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
| * 91b76a4 Back-patch today's memory management fixups in co
| * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
| *   043041e This commit was manufactured by cvs2svn to crea
| |\
| |/
|/|
* | 98cc16f Fix up memory management problems in contrib/xml2
* | 17e1420 Second try at fsyncing directories in CREATE DATA
* | a350f70 Assorted code cleanup for contrib/xml2.  No chang
* | 3524149 Update complex locale example in the documentatio
[...]

The left branch is master, the right branch is the one leading to
REL8_3_10.  You can see that there are multiple merges from master to
the branch, presumably when new files from trunk were ported to the
branch.  This is even easier to see using a graphical history browser
like gitk.

There are good arguments for both the inclusive and the exclusive
representation of history.  The ideal would require a lot more
intelligence and better heuristics (and slow down the conversion
dramatically).  But even the smartest conversion would still be wrong,
because git is simply incapable of representing an arbitrary CVS
history.  The main practical result of the impedance mismatch is that it
will be more difficult to merge between branches that originated in CVS
(but that is no surprise!)

Re: [HACKERS] git: uh-oh

2010-08-18 Thread Martijn van Oosterhout
On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?

snip

 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:
 
 T0 -- T1 -- T2 -- T3 -- T4              TRUNK
        \
         B1 -- B2 -- B3 -- B4            BRANCH1
 
 This is also wrong, because it doesn't reflect the true lineage of FILE1.

But the true lineage is not stored anywhere in CVS so I don't see why
you need to fabricate it for git. Sure, it would be really nice if you
could, but if you can't do it reliably, you may as well not do it at
all. What's the loss?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Magnus Hagander
On Wed, Aug 18, 2010 at 08:25, Michael Haggerty mhag...@alum.mit.edu wrote:
 Tom Lane wrote:
 I lack git-fu pretty completely, but I do have the CVS logs ;-).
 It looks like some of these commits that are being ascribed to the
 REL8_3_STABLE branch were actually only committed on HEAD.  For
 instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
 only in HEAD.  It was back-patched a few hours later (1 Mar 3:41),
 and that's also shown here, but the HEAD commit shouldn't be.

 I wonder whether the repository is completely OK and the problem
 is that this webpage isn't filtering the commits correctly.

 Please don't panic :-)

We're not panicking just yet :-)


 The problem is that it is *impossible* to faithfully represent a CVS or
 Subversion history with its ancestry information in a git repository (or
 AFAIK any of the DVCS repositories).  The reason is that CVS
 fundamentally records the history of single files, and each file can
 have a branching history that is incompatible with those of other files.
  For example, in CVS, a file can be added to a branch after the branch
 already exists, different files can be added to a branch from multiple
 parent branches, and even more perverse things are allowed.  The CVS
 history can record this mish-mash (albeit with much ambiguity).

It can. IIRC we have cleaned a couple of such things out.


snip some good descriptions of how git works

 Given the choice between two wrong histories, cvs2git uses the
 inclusive style.  The result is that the ancestors of B4 include not
 only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
  The display in the website that was quoted [2] seems to mash all of the
 ancestors together without showing the topology of the history, making
 the result quite confusing.  The true history looks more like this:

 $ git log --oneline --graph REL8_3_10 master
 [...]
 | * 2a91f07 tag 8.3.10
 | * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
 | * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
 | * 1194fb9 Update time zone data files to tzdata release 201
 | * fdfd1ec Return proper exit code (3) from psql when ON_ERR
 | * 77524a1 Backport fix from HEAD that makes ecpglib give th
 | * 55391af Add missing space in example.
 | * 982aa23 Require hostname to be set when using GSSAPI auth
 | * cb58615 Update time zone data files to tzdata release 201
 | * ebe1e29 When reading pg_hba.conf and similar files, do no
 | * 5a401e6 Fix a couple of places that would loop forever if
 | * 5537492 Make contrib/xml2 use core xml.c's error handler,
 | * c720f38 Export xml.c's libxml-error-handling support so t
 | * 42ac390 Make iconv work like other optional libraries for
 | * b03d523 pgindent run on xml.c in 8.3 branch, per request
 | * 7efcdaa Add missing library and include dir for XSLT in M
 | * 6ab1407 Do not run regression tests for contrib/xml2 on M
 | * fff18e6 Backpatch MSVC build fix for XSLT
 | * 7ae09ef Fix numericlocale psql option when used with a nu
 | * de92a3d Fix contrib/xml2 so regression test still works w
 | *   80f81c3 This commit was manufactured by cvs2svn to crea
 | |\
 | |/
 |/|
 * | a08b04f Fix contrib/xml2 so regression test still works w
 * | 0d69e0f It's clearly now pointless to do backwards compat
 * | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
 * | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
 * | 5b65b67 add EPERM to the list of return codes to expect f
 | * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
 | * 91b76a4 Back-patch today's memory management fixups in co
 | * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
 | *   043041e This commit was manufactured by cvs2svn to crea
 | |\
 | |/
 |/|
 * | 98cc16f Fix up memory management problems in contrib/xml2
 * | 17e1420 Second try at fsyncing directories in CREATE DATA
 * | a350f70 Assorted code cleanup for contrib/xml2.  No chang
 * | 3524149 Update complex locale example in the documentatio
 [...]

 The left branch is master, the right branch is the one leading to
 REL8_3_10.  You can see that there are multiple merges from master to
 the branch, presumably when new files from trunk were ported to the
 branch.  This is even easier to see using a graphical history browser
 like gitk.

Yeah, this is clearly the problem.


 There are good arguments for both the inclusive and the exclusive
 representation of history.  The ideal would require a lot more
 intelligence and better heuristics (and slow down the conversion
 dramatically).  But even the smartest conversion would still be wrong,
 because git is simply incapable of representing an arbitrary CVS
 history.  The main practical result of the impedance mismatch is that it
 will be more difficult to merge between branches that originated in CVS
 (but that is no surprise!)

Our requirements are simple: our cvs history is linear, the git
history should be linear. It is *not* the same commit that's on head
and the branch. They are two different commits, that happen to have
the same commit message and mostly the same content.

Bottom line is, we want zero merge commits in the git repository. We
may start using that sometime in the future (but for now, we've
decided we don't want that even in the future), but we most
*definitely* don't want it in the past. We don't care about
representing the proper heritage of FILE1 in git, because we never
did in cvs.

Is there some way to make cvs2git work this way, and just not bother
even trying to create merge commits, or is that fundamentally
impossible and we need to look at another tool?

Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Martijn van Oosterhout wrote:
 On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?
 
 snip
 
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:

 T0 -- T1 -- T2 -- T3 -- T4              TRUNK
        \
         B1 -- B2 -- B3 -- B4            BRANCH1

 This is also wrong, because it doesn't reflect the true lineage of FILE1.
 
 But the true lineage is not stored anywhere in CVS so I don't see why
 you need to fabricate it for git. Sure, it would be really nice if you
 could, but if you can't do it reliably, you may as well not do it at
 all. What's the loss?

CVS does record (albeit somewhat ambiguously) the branch from which a
new branch sprouted.  The history above might result from commands like

cvs update -A
cvs tag -b BRANCH1
hack hack                     cvs update -r BRANCH1
cvs commit -m T2              hack hack
touch FILE1                   cvs commit -m B1
cvs add FILE1                 hack hack
cvs commit -m T3              cvs commit -m B2
                              hack hack
                              cvs commit -m B3
cvs tag -b BRANCH1 FILE1

or the last step might have been an explicit merge into BRANCH1:

                              cvs update -j T1 -j T3
                              cvs commit -m B4

Either way, the CVS history relatively clearly indicates that content
was ported from TRUNK to BRANCH1.  There is no way to distinguish
whether it was a cherry-pick (not recordable in git's history) vs. a
full merge without more information or more intelligence.

Magnus Hagander wrote:
 Our requirements are simple: our cvs history is linear, the git
 history should be linear. It is *not* the same commit that's on head
 and the branch. They are two different commits, that happen to have
 the same commit message and mostly the same content.

I don't think this is at all an issue of cvs2svn merging commits that
happen to have the same commit message and/or commit time.  The merge
commits are all manufactured by cvs2svn to do two things:

1. Add content that needs to be on the branch, because a file was added
to the branch after the branch's creation.  This *needs* to be done to
ensure that the branch has the correct content.

2. Indicate the origin of the new branch content.  This goal is debatable.

 Bottom line is, we want zero merge commits in the git repository. We
 may start using that sometime in the future (but for now, we've
 decided we don't want that even in the future), but we most
 *definitely* don't want it in the past. We don't care about
 representing the proper heritage of FILE1 in git, because we never
 did in cvs.
 
 Is there some way to make cvs2git work this way, and just not bother
 even trying to create merge commits, or is that fundamentally
 impossible and we need to look at another tool?

A merge is just a special case of content being taken from one branch
and added to another.  Logically, the same thing happens when a branch
is created, and some of the same problems can occur in that situation.
A branch can be created using content from multiple source branches,
which cvs2git currently also represents as a merge.

Assuming that you don't want to discard all record of where a branch
sprouted from, it is therefore necessary to choose a single parent
branch for each branch creation.  To be sure, this choice can be
incorrect the same way as the merge commits discussed above are
incorrect.  But one reasonable mostly-exclusive approach would be to
choose the most likely parent as the source of the branch and ignore all
others.
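One way such a "most likely parent" heuristic could work (a sketch of the idea only; cvs2git does not implement this): tally, for each file present when the branch is created, which branch its sprout revision came from, and keep only the winner as the single git parent:

```python
# Hypothetical sketch: pick a single parent branch for a branch creation
# by majority vote over the per-file sprout sources recorded in CVS.
from collections import Counter

def choose_parent_branch(file_sources):
    """file_sources: filename -> branch the file was sprouted from."""
    counts = Counter(file_sources.values())
    branch, _votes = counts.most_common(1)[0]
    return branch

# Made-up example: most files sprout from TRUNK, one from elsewhere,
# so TRUNK is chosen and the other ancestry link is simply dropped.
sources = {"a.c": "TRUNK", "b.c": "TRUNK", "c.c": "TRUNK", "d.c": "OTHER"}
print(choose_parent_branch(sources))  # TRUNK
```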

cvs2git doesn't currently have this option.  I'm not sure how much work
it would be to implement; probably a few days'.  Alternatively, you
could write a tool that would rewrite the ancestry information in the
repository *after* the cvs2git conversion using .git/info/grafts (see
git-filter-branch(1)).  Such rewriting would have to occur before the
repository is published, because the rewriting will change the hashes of
most commits.
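For reference, a grafts file is just one line per rewritten commit; e.g. a manufactured merge could be given a single (branch-side) parent like this (the SHA-1s here are placeholders; a real grafts file needs full 40-character hashes):

```
# .git/info/grafts -- one line per commit whose parents are overridden:
# <commit-sha1> [<parent-sha1> ...]
<full-sha1-of-manufactured-merge> <full-sha1-of-branch-side-parent>
```

After running git filter-branch with the grafts in place, the override is baked into new commit objects, which is why it must happen before the repository is published.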

Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git: uh-oh

2010-08-18 Thread Magnus Hagander
On Wed, Aug 18, 2010 at 11:01, Michael Haggerty mhag...@alum.mit.edu wrote:
 Martijn van Oosterhout wrote:
 On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?

 snip

 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:

 T0 -- T1 -- T2 -- T3 -- T4              TRUNK
        \
         B1 -- B2 -- B3 -- B4            BRANCH1

 This is also wrong, because it doesn't reflect the true lineage of FILE1.

 But the true lineage is not stored anywhere in CVS so I don't see why
 you need to fabricate it for git. Sure, it would be really nice if you
 could, but if you can't do it reliably, you may as well not do it at
 all. What's the loss?

 CVS does record (albeit somewhat ambiguously) the branch from which a
 new branch sprouted.  The history above might result from commands like

 cvs update -A
 cvs tag -b BRANCH1
 hack hack                   cvs update -r BRANCH1
 cvs commit -m T2              hack hack
 touch FILE1                   cvs commit -m B1
 cvs add FILE1                 hack hack
 cvs commit -m T3              cvs commit -m B2
                              hack hack
                              cvs commit -m B3
 cvs tag -b BRANCH1 FILE1

 or the last step might have been an explicit merge into BRANCH1:

                              cvs update -j T1 -j T3
                              cvs commit -m B4

 Either way, the CVS history relatively clearly indicates that content
 was ported from TRUNK to BRANCH1.  There is no way to distinguish
 whether it was a cherry-pick (not recordable in git's history) vs. a
 full merge without more information or more intelligence.

Well, in *our* case we know that it was a cherry-pick, because we've
done no full merges ;) So if there's a way for us to short-circuit the
tool, that'd be great.


 Magnus Hagander wrote:
 Our requirements are simple: our cvs history is linear, the git
 history should be linear. It is *not* the same commit that's on head
 and the branch. They are two different commits, that happen to have
 the same commit message and mostly the same content.

 I don't think this is at all an issue of cvs2svn merging commits that
 happen to have the same commit message and/or commit time.  The merge
 commits are all manufactured by cvs2svn to do two things:

 1. Add content that needs to be on the branch, because a file was added
 to the branch after the branch's creation.  This *needs* to be done to
 ensure that the branch has the correct content.

Ok.


 2. Indicate the origin of the new branch content.  This goal is debatable.

I agree this is debatable. We've kind of debated it already (though
not in exactly this context) and decided we'd rather have it appear as
brand new content on this branch and not as a merge.


 Bottom line is, we want zero merge commits in the git repository. We
 may start using that sometime in the future (but for now, we've
 decided we don't want that even in the future), but we most
 *definitely* don't want it in the past. We don't care about
 representing the proper heritage of FILE1 in git, because we never
 did in cvs.

 Is there some way to make cvs2git work this way, and just not bother
 even trying to create merge commits, or is that fundamentally
 impossible and we need to look at another tool?

 A merge is just a special case of content being taken from one branch
 and added to another.  Logically, the same thing happens when a branch
 is created, and some of the same problems can occur in that situation.
 A branch can be created using content from multiple source branches,
 which cvs2git currently also represents as a merge.

Can be, yes. AFAIK, we don't ever do that (though I can't swear to
that, since there have been some funky things in our cvs repository
earlier)


 Assuming that you don't want to discard all record of where a branch
 sprouted from, it is therefore necessary to choose a single parent
 branch for each branch creation.  To be sure, this choice can be
 incorrect the same way as the merge commits discussed above are
 incorrect.  But one reasonable mostly-exclusive approach would be to
 choose the most likely parent as the source of the branch and ignore all
 others.

Yes, I believe that is what we'd prefer, as it's what most closely
matches how *we*'ve been using CVS.


 cvs2git doesn't currently have this option.  I'm not sure how much work
 it would be to implement; probably a few days'.  Alternatively, you

Would this be something you'd consider doing, since it might be of
interest to others? I'm sure if it's a few days' work for you, it'd be
weeks for one of us, given no knowledge of the 

Re: [HACKERS] GROUPING SETS revisited

2010-08-18 Thread Pavel Stehule
Hello

I have hit a roadblock in the GROUPING SETS implementation. I am now
playing with my own executor and planner nodes and I can't move
forward :(. This feature will probably need a significant update of
our agg implementation - probably a structure similar to CTE, though
it can be a little bit reduced, since there is a simple relation
between the source query and the result query - and I am not sure
whether this has to be implemented via subqueries. The second question
is the relatively big difference between GROUP BY behaviour and GROUP
BY GROUPING SETS behaviour. At the moment I don't see a way to join
GROUP BY and GROUPING SETS together.

Any ideas welcome

Regards

Pavel



Re: [HACKERS] security label support, part.2

2010-08-18 Thread Robert Haas
2010/8/18 KaiGai Kohei kai...@ak.jp.nec.com:
 It's also worth pointing out that the hook in ExecCheckRTPerms() does
 not presuppose label-based security.  It could be used to implement
 some other policy altogether, which only strengthens the argument that
 we can't know how the user of the hook wants to handle these cases.

 If rte->requiredPerms would not be cleared, the user of the hook will
 be able to check access rights on the child tables, as they like.
 How about an idea to add a new flag in RangeTblEntry which shows where
 the RangeTblEntry came from, instead of clearing requiredPerms?
 If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
 on the child tables.

Something along those lines might work, although I haven't yet
scrutinized the code well enough to have a real clear opinion on what
the best way of dealing with this is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



[HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
Hello

I am still thinking about median-type functions. My idea is to
introduce a new syntax for the stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

When stype is ARRAY OF, the final and transition functions can be PL
functions. When stype isn't scalar, sfunc can be left undefined (a
built-in function is used). Then we can implement an aggregate with
only a final function.

so median function can be defined:

CREATE FUNCTION num_median_final(internal) RETURNS numeric AS ...
CREATE AGGREGATE median(numeric) (stype = TUPLESORT OF numeric,
finalfunc = num_median_final);

This feature impacts primarily the agg executor, and can be relatively
simple - no planner changes (or no big ones), minimal parser changes.

The main reason for this feature is to give aggregates direct access
to tuplestore and tuplesort. I hope this can solve the problems with
computing a median and similar functions on very large datasets.
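As a sanity check of the idea, here is a trivial Python sketch (hypothetical, not the proposed C implementation) of the work a TUPLESORT-backed median final function would do: read the sorted values back and pick the middle one.

```python
# Hypothetical sketch of a TUPLESORT-based median(numeric): the executor
# accumulates values into a sort (the "stype = TUPLESORT OF numeric"
# part), and the final function reads them back in order and picks the
# middle element, or averages the two middle elements for an even count.

def median(values):
    s = sorted(values)          # what the TUPLESORT state would provide
    n = len(s)
    if n == 0:
        return None             # SQL aggregates return NULL on no input
    mid = n // 2
    if n % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([3, 1, 2]))        # 2
print(median([4, 1, 3, 2]))     # 2.5
```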

comments?

regards



Re: [HACKERS] Progress indication prototype

2010-08-18 Thread Greg Stark
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page dp...@pgadmin.org wrote:
 Which is ideal for monitoring your own connection - having the info in
 the pg_stat_activity is also valuable for monitoring and system
 administration. Both would be ideal :-)

Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...

But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries. This would improve
the DBAs view of what's going on in the system immensely. Currently
you have to grab the query and try to set up a similar environment for
it to run explain on it. If analyze has run since or if the tables
have grown or shrank or if the query was run with some constants as
parameters it can be awkward. If some of the tables in the query were
temporary tables it can be impossible. You can never really be sure
you're looking at precisely the same plan than the other user's
session is running.

But stuffing the whole json or xml explain plan into pg_stat_activity
seems like it doesn't really fit the same model that the existing
infrastructure is designed around. It could be quite large and if we
want to support progress feedback it could change quite frequently.

We do stuff the whole query there (up to a limited size) so maybe I'm
all wet and stuffing the explain plan in there would be fine?

-- 
greg



Re: [HACKERS] security label support, part.2

2010-08-18 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 If C1, C2, and C3 inherit from P, it's perfectly reasonable to grant
 permissions to X on C1 and C2, Y on C3, and Z on C1, C2, C3, and P.  I
 don't think we should disallow that.  Sure, it's possible to do things
 that are less sane, but if we put ourselves in the business of
 removing useful functionality because it might be misused, we'll put
 ourselves out of business.
 
 Having said that, I'm not sure that the same arguments really hold
 water in the world of label based security.  Suppose we have
 compartmentalized security: P is a table of threats, with C1
 containing data on nukes, C2 containing data on terrorists, and C3
 containing data on foreign militaries.  If we create a label for each
 of these threat types, we can apply that label to the corresponding
 table; but what label shall we assign P?  Logically, the label for P
 should be set up in such a fashion that the only people who can read P
 are those who can read C1, C2, and C3 anyway, but who is to say that
 such a label exists? Even if KaiGai's intended implementation of
 SE-PostgreSQL supports construction of such a label, who is to say
 that EVERY conceivable labeling system will also do so?

I don't see why using labels in the second case changes anything.
Consider roles.  If you only had a role that could see threats, a role
that could see nukes, and a role that could see terrorists, but no role
that could see all of them, it's the same problem.  Additionally, this
kind of problem *isn't* typically addressed with the semantics or the
structure of inheritance - it's done with row-level security and is
completely orthogonal to the inheritance issue.

Imagine a new table, C4, is added to P and the admin configures it such
that only the 'view_c4' role has access to that child table directly.
Now, Z can see what's in C4 through P, even though Z doesn't have access
to C4.  In the old system, if Z's query happened to hit C4, the whole
query would fail but at least Z wouldn't see any C4 data.  Other queries
on P done by Z would be fine, so long as they didn't hit C4.

 In fact, it
 seems to me that it might be far more reasonable, in a case like this,
 to ignore the *parent* label and look only at each *child* label,
 which to me is an argument that we should set this up so as to allow
 individual users of this hook to do as they like.

I think it'd be more reasonable to do this for inheritance in general,
but the problem is that people use it for partitioning, and there is a
claim out there that it's against what the SQL spec says.  The folks
using inheritance for partitioning would probably prefer not to have
to deal with setting up the permissions on the child tables.  I think
that's less of an issue now, but I didn't like the previous behavior
where certain queries would work and certain queries wouldn't work
against the parent table, either.

 It's also worth pointing out that the hook in ExecCheckRTPerms() does
 not presuppose label-based security.  It could be used to implement
 some other policy altogether, which only strengthens the argument that
 we can't know how the user of the hook wants to handle these cases.

This comes back around, in my view, to the distinction between really
using inheritance for inheritance, vs using it for partitioning.  If
it's used for partitioning (which certainly seems to be the vast
majority of the cases I've seen it used for) then I think it should
really be considered and viewed as a single object by the
authorization system.  I don't suppose we're going to get rid of
inheritance for inheritance any time soon though.

In the end, I'm thinking that if the external security module wants to
enforce a check against all the children of a parent, they could quite
possibly handle that already and do it in such a way that it won't break
depending on the specific query.  To wit, it could query the catalog to
determine if the current table is a parent of any children, and if so,
go check the labels/permissions/etc on those children.  I'd much rather
have something where the permissions check either succeeds or fails
against the parent, depending on the permissions of the parent and its
children, than on what the query is itself and what conditionals are
applied to it.

Thanks,

Stephen



Re: [HACKERS] security label support, part.2

2010-08-18 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 If rte->requiredPerms would not be cleared, the user of the hook will
 be able to check access rights on the child tables, as they like.

This would only be the case for those children which are being touched
in the current query, which would depend on what conditionals are
applied, what the current setting of check_constraints is, and possibly
other factors.  I do *not* like this approach.

 How about an idea to add a new flag in RangeTblEntry which shows where
 the RangeTblEntry came from, instead of clearing requiredPerms?
 If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
 on the child tables.

How about the external module just checks if the current object being
queried has parents, and if so, goes and checks the
labels/permissions/etc on those children?  That way the query either
always fails or never fails for a given caller, rather than sometimes
working and sometimes not depending on the query.

Thanks,

Stephen



Re: [HACKERS] security label support, part.2

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 8:49 AM, Stephen Frost sfr...@snowman.net wrote:
 In the end, I'm thinking that if the external security module wants to
 enforce a check against all the children of a parent, they could quite
 possibly handle that already and do it in such a way that it won't break
 depending on the specific query.  To wit, it could query the catalog to
 determine if the current table is a parent of any children, and if so,
 go check the labels/permissions/etc on those children.  I'd much rather
 have something where the permissions check either succeeds or fails
 against the parent, depending on the permissions of the parent and its
 children, than on what the query is itself and what conditionals are
 applied to it.

Interesting idea.  Again, I haven't read the code, but seems worth
further investigation, at least.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress indication prototype

2010-08-18 Thread Thom Brown
On 18 August 2010 13:45, Greg Stark st...@mit.edu wrote:
 On Tue, Aug 17, 2010 at 11:29 PM, Dave Page dp...@pgadmin.org wrote:
 Which is ideal for monitoring your own connection - having the info in
 the pg_stat_activity is also valuable for monitoring and system
 administration. Both would be ideal :-)

 Hm, I think I've come around to the idea that having the info in
 pg_stat_activity would be very nice. I can just picture sitting in
 pgadmin while a bunch of reports are running and seeing progress bars
 for all of them...

 But progress bars alone aren't really the big prize. I would really
 love to see the explain plans for running queries.

Do you mean just see the explain plan?  Or see what stage of the plan
the query has reached?  I think the latter would be awesome.  And if
it's broken down by step, wouldn't it be feasible to know how far
through a given step it has got, at least for some steps?  Obviously
for ones with a LIMIT applied it wouldn't know how far through it had
got, but for things like a sequential scan or sort it should be able
to indicate how far through it is.

-- 
Thom Brown
Registered Linux user: #516935



Re: [HACKERS] Progress indication prototype

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark st...@mit.edu wrote:
 On Tue, Aug 17, 2010 at 11:29 PM, Dave Page dp...@pgadmin.org wrote:
 Which is ideal for monitoring your own connection - having the info in
 the pg_stat_activity is also valuable for monitoring and system
 administration. Both would be ideal :-)

 Hm, I think I've come around to the idea that having the info in
 pg_stat_activity would be very nice. I can just picture sitting in
 pgadmin while a bunch of reports are running and seeing progress bars
 for all of them...

 But progress bars alone aren't really the big prize. I would really
 love to see the explain plans for running queries. This would improve
 the DBAs view of what's going on in the system immensely. Currently
 you have to grab the query and try to set up a similar environment for
 it to run explain on it. If analyze has run since, or if the tables
 have grown or shrunk, or if the query was run with some constants as
 parameters, it can be awkward. If some of the tables in the query were
 temporary tables it can be impossible. You can never really be sure
 you're looking at precisely the same plan as the other user's
 session is running.

 But stuffing the whole json or xml explain plan into pg_stat_activity
 seems like it doesn't really fit the same model that the existing
 infrastructure is designed around. It could be quite large and if we
 want to support progress feedback it could change quite frequently.

 We do stuff the whole query there (up to a limited size) so maybe I'm
 all wet and stuffing the explain plan in there would be fine?

It seems to me that progress reporting could add quite a bit of
overhead.  For example, in the whole-database vacuum case, the most
logical way to report progress would be to compute pages visited
divided by pages to be visited.  But the total number of pages to be
visited is something that doesn't need to be computed in advance
unless someone cares about progress.  I don't think we want to incur
that overhead in all cases just on the off chance someone might ask.
We need to think about ways to structure this so that it only costs
when someone's using it.
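[Editor's note: the cost concern above suggests computing the denominator lazily, only when a progress value is actually requested. A sketch of that structure follows; this is a hypothetical monitor, not PostgreSQL code.]

```python
class VacuumProgress:
    """Sketch of lazy progress accounting: the expensive total
    (pages to be visited) is computed only on the first request."""

    def __init__(self, relations):
        self.relations = relations   # hypothetical: relation -> page count
        self.pages_done = 0
        self._total = None           # not computed until someone asks

    def visit_pages(self, n):
        self.pages_done += n         # cheap bookkeeping on the hot path

    def fraction_done(self):
        if self._total is None:      # pay the counting cost only here
            self._total = sum(self.relations.values())
        return self.pages_done / self._total

p = VacuumProgress({"t1": 100, "t2": 300})
p.visit_pages(100)
print(p.fraction_done())   # 0.25
```

A backend that never receives a progress request never pays for the total; the hot path only increments a counter.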

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 I'm still thinking about median-type functions. My idea is to
 introduce a new syntax for the stype definition - like

 stype = type, or
 stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
 stype = TUPLESTORE OF type, or
 stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts.  It's also exposing some implementation details that I'd just as
soon not expose in SQL.  I'd rather just implement median as a
special-purpose aggregate.

regards, tom lane



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
2010/8/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I'm still thinking about median-type functions. My idea is to
 introduce a new syntax for the stype definition - like

 stype = type, or
 stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
 stype = TUPLESTORE OF type, or
 stype = TUPLESORT OF type [ DESC | ASC ]

 This seems like a fairly enormous amount of conceptual (and code)
 infrastructure just to make it possible to build median() out of spare
 parts.  It's also exposing some implementation details that I'd just as
 soon not expose in SQL.  I'd rather just implement median as a
 special-purpose aggregate.

Yes, it is a little bit strange - but when we talked about this topic
last time, I understood that you dislike any special solution for this
functionality, so I searched for a different, more general way. On the
other hand, I agree that a special-purpose aggregate (with a few
changes in nodeAgg) can be enough. The median (and related forms) is
really special, and there are not many widely used use cases.

Regards

Pavel







Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread David Fetter
On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:
 2010/8/18 Tom Lane t...@sss.pgh.pa.us:
  Pavel Stehule pavel.steh...@gmail.com writes:
  I'm still thinking about median-type functions. My idea is to
  introduce a new syntax for the stype definition - like
 
  stype = type, or
  stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
  stype = TUPLESTORE OF type, or
  stype = TUPLESORT OF type [ DESC | ASC ]
 
  This seems like a fairly enormous amount of conceptual (and code)
  infrastructure just to make it possible to build median() out of
  spare parts.  It's also exposing some implementation details that
  I'd just as soon not expose in SQL.  I'd rather just implement
  median as a special-purpose aggregate.
 
 Yes, it is a little bit strange - but when we talked about this topic
 last time, I understood that you dislike any special solution for
 this functionality, so I searched for a different, more general way.
 On the other hand, I agree that a special-purpose aggregate (with a
 few changes in nodeAgg) can be enough. The median (and related forms)
 is really special, and there are not many widely used use cases.

Which median do you plan to implement?  Or do you plan to implement
several different medians, each with distinguishing names?
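[Editor's note: the question matters because the common definitions disagree on inputs with an even number of values. Python's statistics module illustrates the three usual choices; the interpolated one roughly matches SQL-standard percentile_cont behavior, the discrete ones percentile_disc.]

```python
import statistics

data = [1, 2, 3, 10]                 # even number of values

# Interpolated median: mean of the two middle values.
print(statistics.median(data))       # 2.5

# Discrete medians, which always return an actual input value:
print(statistics.median_low(data))   # 2
print(statistics.median_high(data))  # 3
```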

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
2010/8/18 David Fetter da...@fetter.org:
 On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:
 2010/8/18 Tom Lane t...@sss.pgh.pa.us:
  Pavel Stehule pavel.steh...@gmail.com writes:
  I'm still thinking about median-type functions. My idea is to
  introduce a new syntax for the stype definition - like
 
  stype = type, or
  stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
  stype = TUPLESTORE OF type, or
  stype = TUPLESORT OF type [ DESC | ASC ]
 
  This seems like a fairly enormous amount of conceptual (and code)
  infrastructure just to make it possible to build median() out of
  spare parts.  It's also exposing some implementation details that
  I'd just as soon not expose in SQL.  I'd rather just implement
  median as a special-purpose aggregate.

 Yes, it is a little bit strange - but when we talked about this topic
 last time, I understood that you dislike any special solution for
 this functionality, so I searched for a different, more general way.
 On the other hand, I agree that a special-purpose aggregate (with a
 few changes in nodeAgg) can be enough. The median (and related forms)
 is really special, and there are not many widely used use cases.

 Which median do you plan to implement?  Or do you plan to implement
 several different medians, each with distinguishing names?

My proposal enabled the implementation of any median-like function.
But if we implement median as a special case of aggregate, then only
some basic median will be implemented.

Regards

Pavel






Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread David Fetter
On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:
 2010/8/18 David Fetter da...@fetter.org:
  Which median do you plan to implement?  Or do you plan to implement
  several different medians, each with distinguishing names?
 
 My proposal enabled the implementation of any median-like function.
 But if we implement median as a special case of aggregate, then only
 some basic median will be implemented.

Apart from the medians, which median-like aggregates do you have in
mind to start with?  If you can provide examples of median-like
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

Otherwise, it seems more reasonable to make the medians special-case
code.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Tom Lane
David Fetter da...@fetter.org writes:
 Apart from the medians, which median-like aggregates do you have in
 mind to start with?  If you can provide examples of median-like
 aggregates that people might need to implement as user-defined
 aggregates, or other places where people would use this machinery, it
 will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions.  All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.
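[Editor's note: the machinery described here, a transition function that accumulates values into a store plus a final function that consumes the whole store, can be sketched outside SQL. Below is a toy Python model of the CREATE AGGREGATE sfunc/finalfunc split (not the real C API), computing an interpolated median.]

```python
# Toy model of an aggregate: an accumulating transition function
# (sfunc) plus a final function (ffunc), as in CREATE AGGREGATE.
def sfunc(state, value):
    state.append(value)        # stand-in for appending to a tuplestore
    return state

def ffunc(state):
    state.sort()               # the final function sees the whole store
    n = len(state)
    mid = n // 2
    if n % 2:
        return state[mid]
    return (state[mid - 1] + state[mid]) / 2

state = []
for v in [5, 1, 9, 3]:         # values arriving row by row
    state = sfunc(state, v)
print(ffunc(state))            # 4.0
```

The point is that no new CREATE AGGREGATE syntax is needed for this shape: the transition function alone decides what kind of store the state is.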

regards, tom lane



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
2010/8/18 David Fetter da...@fetter.org:
 On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:
 2010/8/18 David Fetter da...@fetter.org:
  Which median do you plan to implement?  Or do you plan to implement
  several different medians, each with distinguishing names?

 My proposal enabled the implementation of any median-like function.
 But if we implement median as a special case of aggregate, then only
 some basic median will be implemented.

 Apart from the medians, which median-like aggregates do you have in
 mind to start with?  If you can provide examples of median-like
 aggregates that people might need to implement as user-defined
 aggregates, or other places where people would use this machinery, it
 will make your case stronger for this refactoring.


I didn't have any special median in mind - this proposal is just about
aggregates with large transient state, where access to a tuplestore
can be very useful.

 Otherwise, it seems more reasonable to make the medians special-case
 code.

Yes, at least for the moment.

Regards

Pavel






Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread David Fetter
On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  Apart from the medians, which median-like aggregates do you have in
  mind to start with?  If you can provide examples of median-like
  aggregates that people might need to implement as user-defined
  aggregates, or other places where people would use this machinery, it
  will make your case stronger for this refactoring.
 
 There would be plenty of scope to re-use the machinery without any
 SQL-level extensions.  All you need is a polymorphic aggregate
 transition function that maintains a tuplestore or whatever.
 I don't see that extra syntax in CREATE AGGREGATE is really buying
 much of anything.

Thanks for clarifying.  Might this help out with things like GROUPING
SETS or wCTEs?

Cheers,
David (a little slow today).
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
2010/8/18 Tom Lane t...@sss.pgh.pa.us:
 David Fetter da...@fetter.org writes:
 Apart from the medians, which median-like aggregates do you have in
 mind to start with?  If you can provide examples of median-like
 aggregates that people might need to implement as user-defined
 aggregates, or other places where people would use this machinery, it
 will make your case stronger for this refactoring.

 There would be plenty of scope to re-use the machinery without any
 SQL-level extensions.  All you need is a polymorphic aggregate
 transition function that maintains a tuplestore or whatever.
 I don't see that extra syntax in CREATE AGGREGATE is really buying
 much of anything.


Do we have to use a transition function?  If we implement median as a
special variant of aggregate - because we need to push a sort - then
we can skip the transition function and call the final function
directly.  This mechanism is used for aggregates with ORDER BY now,
so there can be a special path for a direct call of the final
function; it is useless to call a transition function there.

Regards

Pavel








Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:
 There would be plenty of scope to re-use the machinery without any
 SQL-level extensions.  All you need is a polymorphic aggregate
 transition function that maintains a tuplestore or whatever.
 I don't see that extra syntax in CREATE AGGREGATE is really buying
 much of anything.

 Thanks for clarifying.  Might this help out with things like GROUPING
 SETS or wCTEs?

Don't see how --- this is just about what you can do within an aggregate.

regards, tom lane



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Tom Lane
Michael Haggerty mhag...@alum.mit.edu writes:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?
 ...
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:

 T0 -- T1 -- T2 -- T3 -- T4        TRUNK
        \
         B1 -- B2 -- B3 -- B4      BRANCH1

 This is also wrong, because it doesn't reflect the true lineage of FILE1.

Maybe not, but that *is* how things appeared in the CVS history, and
we'd rather have a git history that looks like the CVS history than
one that claims that boatloads of utterly unrelated commits are part
of a branch's history.

The inclusive possibility might be tolerable if it restricted itself
to mentioning commits that actually touched FILE1 in between its
addition to TRUNK and its addition to BRANCH1.  So far as I can see,
though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
... not even between T3 and B4, but back to the branch point.  How can
you possibly justify that as either sane or useful?

regards, tom lane



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread David Fetter
On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote:
 2010/8/18 Tom Lane t...@sss.pgh.pa.us:
  David Fetter da...@fetter.org writes:
  Apart from the medians, which median-like aggregates do you
  have in mind to start with?  If you can provide examples of
  median-like aggregates that people might need to implement as
  user-defined aggregates, or other places where people would use
  this machinery, it will make your case stronger for this
  refactoring.
 
  There would be plenty of scope to re-use the machinery without any
  SQL-level extensions.  All you need is a polymorphic aggregate
  transition function that maintains a tuplestore or whatever.  I
  don't see that extra syntax in CREATE AGGREGATE is really buying
  much of anything.
 
 
 Do we have to use a transition function?  If we implement median as a
 special variant of aggregate - because we need to push a sort - then
 we can skip the transition function and call the final function
 directly.  This mechanism is used for aggregates with ORDER BY now,
 so there can be a special path for a direct call of the final
 function; it is useless to call a transition function there.

Just a wacky idea here.  Could we make a special state transition
function called IDENTITY or some such that would turn into a no-op?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Alvaro Herrera
Excerpts from Michael Haggerty's message of Wed Aug 18 05:01:29 -0400 2010:

 cvs2git doesn't currently have this option.  I'm not sure how much work
 it would be to implement; probably a few days'.  Alternatively, you
 could write a tool that would rewrite the ancestry information in the
 repository *after* the cvs2git conversion using .git/info/grafts (see
 git-filter-branch(1)).  Such rewriting would have to occur before the
 repository is published, because the rewriting will change the hashes of
 most commits.

AFAICT, graft points are not checked in[1], thus they don't propagate; are
you saying that we should run the migration, then manually inject the
graft points, then run some conversion tool that writes a different
repository with those graft points welded into the history?  This sounds
like it needs some manual work (namely, finding out the appropriate graft
points for each branch), which can be prepared beforehand.  Otherwise it
seems easier than reworking the cvs2git code for the mostly-exclusive
option.

I am sort of assuming that this conversion tool already exists, but
maybe this is not the case?

[1] 
http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Khee Chin
I previously proposed off-list an alternate solution to generate the git
repository which was turned down due to it not being able to handle
incremental updates. However, since we are now looking at a one-time
conversion, this method might come in handy.

---
Caveat: cvs2git apparently requires CVSROOT somewhere in the path for
it to work, so I symlinked the current directory ($PWD) as CVSROOT to
work around that quirk.

mkdir work
cd work
wget http://ftp.netbsd.se/pkgsrc/distfiles/cvsclone-0.00/cvsclone.l
flex cvsclone.l && gcc -Wall -O2 lex.yy.c -o cvsclone
cvsclone -d :pserver:anon...@anoncvs.postgresql.org:/projects/cvsroot pgsql
ln -s $PWD CVSROOT
cvs2git --blobfile=blobfile --dumpfile=dumpfile --username pgdude \
  --encoding=UTF8 --fallback-encoding=UTF8 CVSROOT/pgsql > cvs2git.log
mkdir git && cd git && git init .
cat ../blobfile ../dumpfile | git fast-import
git reset --hard
cd ..
---


Regards,
Khee Chin.


On Wed, Aug 18, 2010 at 11:14 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Michael Haggerty's message of Wed Aug 18 05:01:29 -0400 2010:

  cvs2git doesn't currently have this option.  I'm not sure how much work
  it would be to implement; probably a few days'.  Alternatively, you
  could write a tool that would rewrite the ancestry information in the
  repository *after* the cvs2git conversion using .git/info/grafts (see
  git-filter-branch(1)).  Such rewriting would have to occur before the
  repository is published, because the rewriting will change the hashes of
  most commits.

 AFAICT, graft points are not checked in[1], thus they don't propagate; are
 you saying that we should run the migration, then manually inject the
 graft points, then run some conversion tool that writes a different
 repository with those graft points welded into the history?  This sounds
 like it needs some manual work (namely, finding out the appropriate graft
 points for each branch), which can be prepared beforehand.  Otherwise it
 seems easier than reworking the cvs2git code for the mostly-exclusive
 option.

 I am sort of assuming that this conversion tool already exists, but
 maybe this is not the case?

 [1]
 http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support




Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 2010/8/18 Tom Lane t...@sss.pgh.pa.us:
 There would be plenty of scope to re-use the machinery without any
 SQL-level extensions.  All you need is a polymorphic aggregate
 transition function that maintains a tuplestore or whatever.

 Do we have to use a transition function?  If we implement median as a
 special variant of aggregate - because we need to push a sort - then
 we can skip the transition function and call the final function
 directly.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal.  But driving
it off the transtype declaration would be quite inappropriate anyway IMO.

regards, tom lane



Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions

2010-08-18 Thread Pavel Stehule
2010/8/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2010/8/18 Tom Lane t...@sss.pgh.pa.us:
 There would be plenty of scope to re-use the machinery without any
 SQL-level extensions.  All you need is a polymorphic aggregate
 transition function that maintains a tuplestore or whatever.

 Do we have to use a transition function?  If we implement median as a
 special variant of aggregate - because we need to push a sort - then
 we can skip the transition function and call the final function
 directly.

 Well, that would require a whole bunch of *other* mechanisms, which you
 weren't saying anything about in your original proposal.  But driving
 it off the transtype declaration would be quite inappropriate anyway IMO.


I'll test both variants first; maybe there is no significant
difference between them.  nodeAgg can already build and fill a
tuplesort, so I think it is natural to use it.  It needs only one
thing - skipping the call to the transition function and calling the
final function directly with the external tuplesort.  At a minimum,
you don't need the same code twice.

Regards

Pavel Stehule





Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Tom Lane wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?
 ...
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:
 
  T0 -- T1 -- T2 -- T3 -- T4        TRUNK
         \
          B1 -- B2 -- B3 -- B4      BRANCH1
 
 This is also wrong, because it doesn't reflect the true lineage of FILE1.
 
 Maybe not, but that *is* how things appeared in the CVS history, and
 we'd rather have a git history that looks like the CVS history than
 one that claims that boatloads of utterly unrelated commits are part
 of a branch's history.
 
 The inclusive possibility might be tolerable if it restricted itself
 to mentioning commits that actually touched FILE1 in between its
 addition to TRUNK and its addition to BRANCH1.  So far as I can see,
 though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
 ... not even between T3 and B4, but back to the branch point.  How can
 you possibly justify that as either sane or useful?

There is no way, in git, to claim that (say) T3 was incorporated into B4
but that T2 was not.  If T3 is listed as a parent of B4, then it is
implied that all ancestors of T3 are also incorporated into B4.  This is
a crucial simplification that helps DVCSs merge reliably.  So an
exclusive option is definitely the way to go for the PostgreSQL project.

[By the way, it *is* possible to list the commits that touched FILE1:

git log BRANCH1 -- FILE1

The user would first have to find out that FILE1 is the file that is the
subject of merge B4, which could be done using git diff B3..B4.  But I
am not arguing that this is the preferred solution, given your project's
practice to do cherry-picks and never full merges.]

Michael



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Magnus Hagander
On Wed, Aug 18, 2010 at 17:33, Khee Chin kheec...@gmail.com wrote:
 I previously proposed off-list an alternate solution to generate the git
 repository which was turned down due to it not being able to handle
 incremental updates. However, since we are now looking at a one-time
 conversion, this method might come in handy.

cvs2git *is* the tool we've been using now that it's a one-off
conversion. It's the one that's causing the current problems.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 11:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 So let's take the simplest example: a branch BRANCH1 is created from
 trunk commit T1, then some time later another FILE1 from trunk commit T3
 is added to BRANCH1 in commit B4.  How should this series of events be
 represented in a git repository?
 ...
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:

 T0 -- T1 -- T2 -- T3 -- T4        TRUNK
        \
         B1 -- B2 -- B3 -- B4      BRANCH1

 This is also wrong, because it doesn't reflect the true lineage of FILE1.

 Maybe not, but that *is* how things appeared in the CVS history, and
 we'd rather have a git history that looks like the CVS history than
 one that claims that boatloads of utterly unrelated commits are part
 of a branch's history.

Exactly.  IMHO, the way this should work is by starting at the
beginning of time and working forward.  At each step, we examine the
earliest revision of each file for which no git commit has yet been
written.  From among those, we select the one with the earliest
timestamp.  We then also select all other files whose most recent
unprocessed revision is nearly contemporaneous and shares the same
author and log message.  From the results, we generate a commit.  Then
we repeat.  When we arrive at a branch point, the branch gets
processed separately from the trunk.  If there is no trunk rev which
has every file at the rev where it starts on the branch, then we use
some sane algorithm to pick the best one (perhaps, the one that has
the right revs of the most files) and then insert a fixup commit on
the branch to remove the deltas and carry on as before.
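
[Editorial note: the forward sweep described above can be sketched in a
few lines of Python.  The Rev records and the 5-minute window here are
hypothetical illustrations; this is not cvs2git's (or any tool's) actual
code.]

```python
from dataclasses import dataclass

WINDOW = 300  # seconds: revisions this close together may share a commit


@dataclass
class Rev:
    file: str
    timestamp: int
    author: str
    log: str


def group_changesets(revs):
    """Greedy forward sweep: repeatedly take the earliest unprocessed
    revision, then fold in contemporaneous revisions of *other* files
    that share its author and log message."""
    pending = sorted(revs, key=lambda r: r.timestamp)
    commits = []
    while pending:
        seed = pending[0]
        commit, rest = [seed], []
        seen_files = {seed.file}
        for r in pending[1:]:
            if (r.file not in seen_files
                    and r.author == seed.author
                    and r.log == seed.log
                    and r.timestamp - seed.timestamp <= WINDOW):
                commit.append(r)
                seen_files.add(r.file)
            else:
                rest.append(r)
        commits.append(commit)
        pending = rest
    return commits
```

The grouping keeps at most one revision per file in each changeset,
which is what prevents two successive revisions of the same file from
collapsing into a single commit.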

 The inclusive possibility might be tolerable if it restricted itself
 to mentioning commits that actually touched FILE1 in between its
 addition to TRUNK and its addition to BRANCH1.  So far as I can see,
 though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
 ... not even between T3 and B4, but back to the branch point.  How can
 you possibly justify that as either sane or useful?

git can't do that.  It's finding those commits by following parent
pointers from the merge commits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Alvaro Herrera wrote:
 Excerpts from Michael Haggerty's message of Wed Aug 18 05:01:29 -0400 2010:
 
 [...]  Alternatively, you
 could write a tool that would rewrite the ancestry information in the
 repository *after* the cvs2git conversion using .git/info/grafts (see
 git-filter-branch(1)).  Such rewriting would have to occur before the
 repository is published, because the rewriting will change the hashes of
 most commits.
 
 AFAICT, graft points are not checked in[1], thus they don't propagate; are
 you saying that we should run the migration, then manually inject the
 graft points, then run some conversion tool that writes a different
 repository with those graft points welded into the history?  This sounds
 like it needs some manual work (namely, finding the appropriate graft
 points for each branch), which can be prepared beforehand.  Otherwise it
 seems easier than reworking the cvs2git code for the mostly-exclusive
 option.

It is true that grafts are not propagated, but they can be baked into a
repository (at the cost of rewriting the SHA1 hashes) using git
filter-branch.  The procedure would be as follows:

1. Convert using cvs2git

2. Create a file .git/info/grafts containing the changes that you want
to make to the project's ancestry.  The file has the format

commit parent0 parent1 ...

where each of the entries is a SHA1 hash from the existing repository.
Only commits whose parentage should be changed need to be mentioned.
This is the tricky step because it requires some logic to decide what
needs changing.  And it can only be done after the cvs2git conversion,
because it requires the SHA1s resulting from the conversion.
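
[Editorial note: for illustration, a single grafts line might look like
this; both SHA1s are placeholders, not hashes from any real conversion.]

```
3f786850e387550fdab836ed7e6dc881de23001b 89e6c98d92887913cadf06b2adb97f26cde4849b
```

This line rewrites the first commit to have exactly the one listed
parent, so any extra (merge) parent it had simply disappears from the
history.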

3. Run

git filter-branch

This rewrites the commits using any parentage changes from the grafts
file.  This changes most commits' SHA1 hashes.  After this you can
discard the .git/info/grafts file.  You would then want to remove the
original references, which were moved to refs/original.

4. Publish the repository.

As long as the repository is only published after the grafts have been
baked in, there is no reason that anybody else would need the grafts file.

Michael



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Robert Haas wrote:
 Exactly.  IMHO, the way this should work is by starting at the
 beginning of time and working forward.  [...]

What you are describing is more or less the algorithm that was used by
cvs2svn version 1.x.  It mostly works, but has nasty edge cases that are
impossible to fix.

cvs2svn version 2.x uses a better algorithm [1].  It can be changed to
add an exclusive mode, it's a simple matter of programming.  I will
try to find some time to work on it.

Michael

[1]
http://cvs2svn.tigris.org/source/browse/cvs2svn/trunk/doc/design-notes.txt?view=markup



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Tom Lane wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:
 
 T0 -- T1 -- T2  T3 -- T4        TRUNK
        \
         B1 -- B2 -- B3 -- B4            BRANCH1
 
 This is also wrong, because it doesn't reflect the true lineage of FILE1.
 
 Maybe not, but that *is* how things appeared in the CVS history, [...]

I forgot to point out that the CVS history looks nothing like this,
because the CVS history is only defined file by file.  So the CVS
history of FILE0 might look like this:

 1.0 - 1.1 -- 1.2 - 1.3 - 1.4        TRUNK
        \
         1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4            BRANCH1

whereas the history of FILE1 probably looks more like this:

                  1.1 - 1.2 - 1.3        TRUNK
                                         \
                                          1.2.2.1 -- 1.2.2.2 BRANCH1

(here I've tried to put corresponding commits in the same relative
location) and there might be a FILE2 that looks like this:

 1.0  1.1 --- 1.2        TRUNK
                  \
                   *no commit here*                         BRANCH1

Perhaps this makes it clearer why creating a single git history requires
some compromises.

Michael



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Alvaro Herrera
Excerpts from Magnus Hagander's message of Wed Aug 18 11:52:58 -0400 2010:
 On Wed, Aug 18, 2010 at 17:33, Khee Chin kheec...@gmail.com wrote:
  I previously proposed off-list an alternate solution to generate the git
  repository which was turned down due to it not being able to handle
  incremental updates. However, since we are now looking at a one-time
  conversion, this method might come in handy.
 
 cvs2git *is* the tool we've been using now that it's a one-off
 conversion. It's the one that's causing the current problems.

I think the point is to run the repo through cvsclone, which apparently
changes the repo in some (not documented) ways, removing corruption.
Not sure how this is an essential part of Khee Chin's proposal.

The cited URL is no longer valid, however.  The code can be found here:
http://samba.org/ftp/tridge/rtc/cvsclone.l

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Per-column collation, proof of concept

2010-08-18 Thread Peter Eisentraut
On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
  creating collations ...FATAL:  invalid byte sequence for encoding
  UTF8: 0xe56c09
  CONTEXT:  COPY tmp_pg_collation, line 86
  STATEMENT:  COPY tmp_pg_collation FROM
  E'/usr/local/pgsql/9.1/share/locales.txt';
  
 
  Hmm, what is in that file on that line?
 
 
 
 bokmål  ISO-8859-1

Hey, that borders on genius: Use a non-ASCII letter in the name of a
locale whose purpose it is to configure how non-ASCII letters are
interpreted. :-/

Interestingly, I don't see this on a Debian system.  Good thing to know
that this needs separate testing on different Linux variants.




Re: [HACKERS] git: uh-oh

2010-08-18 Thread Alvaro Herrera
Excerpts from Michael Haggerty's message of Wed Aug 18 12:00:44 -0400 2010:

 3. Run
 
 git filter-branch
 
 This rewrites the commits using any parentage changes from the grafts
 file.  This changes most commits' SHA1 hashes.  After this you can
 discard the .git/info/grafts file.  You would then want to remove the
 original references, which were moved to refs/original.

Hmm.  If I need to do two changes in the same branch, do I need to
mention the new SHA1 for the second one (after filter-branch changes its
SHA1), or the original one?  If the former, then this is going to be a
very painful process.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Joshua D. Drake
On Wed, 2010-08-18 at 12:26 -0400, Alvaro Herrera wrote:
 Excerpts from Magnus Hagander's message of Wed Aug 18 11:52:58 -0400 2010:
  On Wed, Aug 18, 2010 at 17:33, Khee Chin kheec...@gmail.com wrote:
   I previously proposed off-list an alternate solution to generate the git
   repository which was turned down due to it not being able to handle
   incremental updates. However, since we are now looking at a one-time
   conversion, this method might come in handy.
  
  cvs2git *is* the tool we've been using now that it's a one-off
  conversion. It's the one that's causing the current problems.

We had a lot of luck with cvs to svn conversion in the past. And
supposedly the git-svn stuff is top notch. It may be worth a shot.

JD

 -- 
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 





Re: [HACKERS] git: uh-oh

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 12:18 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 Tom Lane wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 The exclusive possibility is to ignore the fact that some of the
 content of B4 came from trunk and to pretend that FILE1 just appeared
 out of nowhere in commit B4 independent of the FILE1 in TRUNK:

 T0 -- T1 -- T2  T3 -- T4        TRUNK
        \
         B1 -- B2 -- B3 -- B4            BRANCH1

 This is also wrong, because it doesn't reflect the true lineage of FILE1.

 Maybe not, but that *is* how things appeared in the CVS history, [...]

 I forgot to point out that the CVS history looks nothing like this,
 because the CVS history is only defined file by file.  So the CVS
 history of FILE0 might look like this:

  1.0 - 1.1 -- 1.2 - 1.3 - 1.4        TRUNK
        \
         1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4            BRANCH1

 whereas the history of FILE1 probably looks more like this:

                  1.1 - 1.2 - 1.3        TRUNK
                                         \
                                          1.2.2.1 -- 1.2.2.2 BRANCH1

 (here I've tried to put corresponding commits in the same relative
 location) and there might be a FILE2 that looks like this:

  1.0  1.1 --- 1.2        TRUNK
                   \
                    *no commit here*                         BRANCH1

 Perhaps this makes it clearer why creating a single git history requires
 some compromises.

I think we all understand that the conversion process may create some
artifacts.  Also, since I think this has not yet been mentioned, I
really appreciate you being willing to jump into this discussion and
possibly try to write some code to help us get what we want.

I think what is frustrating is that we have a mental image of what the
history looks like in CVS based on what we actually do, and it doesn't
look anything like the history that cvs2git created.  You can do all
kinds of crazy things in CVS, like tag the whole tree and then move
the tags on half a dozen individual files forward or backward in time,
or delete the tags off them altogether.  But we believe (perhaps
naively) that we haven't done those things, so we're expecting to get
a simple linear history without merges, and definitely without commits
from one branch jumping into the midst of other branches.  What was
really alarming to me about what I found yesterday is that - even
after reading your explanation - I can't understand why it did that.
I think it's human nature to like it when good things happen to us and
to dislike it when bad things happen to us, but we tend to hate the
bad things a lot more when we feel like we didn't deserve it.  If
you're going 90 MPH and get a speeding ticket, you may be steamed, but
at some level you know you deserved it.  If you were going 50 MPH on a
road where the speed limit is 55 MPH and the cop tickets you for 60
MPH, even the most mild-mannered driver may feel an urge to say
something less polite than "thank you, officer".  Hence our
consternation.  Perhaps there is some way to tilt your head so that
these merge commits are the Right Thing To Do, but to me at least it
feels extremely weird and inexplicable.  If at some point, we had
taken the majority of the deltas between 9.0 and 8.3 and put them into
8.3 and the converter said "oh, that's a merge", well, we might want
an option to turn that behavior off, but at least it would be clear
why it happened.  But the merge commit that got fabricated here almost
by definition has to be ignoring the vast bulk of the activity on one
side, which just doesn't feel right.

To what degree does your proposed solution (an exclusive option)
resemble "don't ever create merge commits"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



[HACKERS] Per-tuple memory leak in 9.0

2010-08-18 Thread Dean Rasheed
While testing triggers, I came across the following memory leak.
Here's a simple test case:

CREATE TABLE foo(a int);

CREATE OR REPLACE FUNCTION trig_fn() RETURNS trigger AS
$$
BEGIN
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER ins_trig BEFORE INSERT ON foo
  FOR EACH ROW EXECUTE PROCEDURE trig_fn();

INSERT INTO foo SELECT g
  FROM generate_series(1, 500) AS g;

Memory usage goes up by around 100 bytes per row for the duration of the query.

The problem is that the trigger code assumes that anything it
allocates in the per-tuple memory context will be freed per-tuple
processed, which used to be the case because the loop in ExecutePlan()
calls ResetPerTupleExprContext() once each time round the loop, and
that used to correspond to once per tuple.

However, with the refactoring of that code out to nodeModifyTable.c,
this is no longer the case because the ModifyTable node processes all
the tuples from the subquery before returning, so I guess that the
loop in ExecModifyTable() needs to call ResetPerTupleExprContext()
each time round.

Regards,
Dean



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Aug 18 13:10:19 -0400 2010:

 I think what is frustrating is that we have a mental image of what the
 history looks like in CVS based on what we actually do, and it doesn't
 look anything like the history that cvs2git created.  You can do all
 kinds of crazy things in CVS, like tag the whole tree and then move
 the tags on half a dozen individual files forward or backward in time,
 or delete the tags off them altogether.  But we believe (perhaps
 naively) that we haven't done those things, so we're expecting to get
 a simple linear history without merges, and definitely without commits
 from one branch jumping into the midst of other branches.

In fact, we went to some lengths to remove some of the more problematic
artifacts in our original CVS repository, so that a Git conversion
wouldn't have a problem with them.  It's disappointing that it ends up
punting in this manner.

I do welcome the offer of Michael's development time to solve our
problems.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Per-column collation, proof of concept

2010-08-18 Thread Jaime Casanova
On Wed, Aug 18, 2010 at 11:29 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
  creating collations ...FATAL:  invalid byte sequence for encoding
  UTF8: 0xe56c09
  CONTEXT:  COPY tmp_pg_collation, line 86
  STATEMENT:  COPY tmp_pg_collation FROM
  E'/usr/local/pgsql/9.1/share/locales.txt';
  
 
  Hmm, what is in that file on that line?
 
 

 bokmål  ISO-8859-1

 Hey, that borders on genius: Use a non-ASCII letter in the name of a
 locale whose purpose it is to configure how non-ASCII letters are
 interpreted. :-/

 Interestingly, I don't see this on a Debian system.  Good thing to know
 that this needs separate testing on different Linux variants.



Yeah! And when installing CentOS 5 I don't have a chance to choose
which locales I want; it just installs all of them.

-- 
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 What I find interesting about that trace is the large proportion of
 writes.  That appears to me to indicate that it's *not* a matter of
 vacuum delays, or at least not just a matter of that.  The process seems
 to be getting involved in having to dump dirty buffers to disk.  Perhaps
 the background writer is malfunctioning?

You appear to be correct in that it's write-related.  Will be testing on
what specifically is producing it.

Note that this is one of two ostensibly duplicate servers, and the issue
has never appeared on the other server.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 What I find interesting about that trace is the large proportion of
 writes.  That appears to me to indicate that it's *not* a matter of
 vacuum delays, or at least not just a matter of that.  The process seems
 to be getting involved in having to dump dirty buffers to disk.  Perhaps
 the background writer is malfunctioning?

 You appear to be correct in that it's write-related.  Will be testing on
 what specificially is producing it.

 Note that this is one of two ostensibly duplicate servers, and the issue
 has never appeared on the other server.

On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM.  VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time.  So it's wrong to blame this on the bgwriter not
holding up its end.  Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.

regards, tom lane



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 On further reflection, though: since we put in the BufferAccessStrategy
 code, which was in 8.3, the background writer isn't *supposed* to be
 very much involved in writing pages that are dirtied by VACUUM.  VACUUM
 runs in a small ring of buffers and is supposed to have to clean its own
 dirt most of the time.  So it's wrong to blame this on the bgwriter not
 holding up its end.  Rather, what you need to be thinking about is how
 come vacuum seems to be making lots of pages dirty on only one of these
 machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits.  Maybe it's setting the freeze bit on every page, and
writing them one page at a time?  Even so, though, I still don't
understand the calls to pollsys.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Rather, what you need to be thinking about is how
 come vacuum seems to be making lots of pages dirty on only one of these
 machines.

 This is an anti-wraparound vacuum, so it could have something to do with
 the hint bits.  Maybe it's setting the freeze bit on every page, and
 writing them one page at a time?

That would explain all the writes, but it doesn't seem to explain why
your two servers aren't behaving similarly.

 Still don't understand the call to pollsys, even so, though.

Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay.  I'm still suspicious that the writes are eating
more cost_delay points than you think.
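
[Editorial note: the cost bookkeeping being described can be modeled
roughly as below.  This is a simplified sketch using the 8.x-era default
values (page_hit=1, page_miss=10, page_dirty=20, limit=200, delay=20ms),
not the actual server code.]

```python
import select

COST_PAGE_HIT = 1
COST_PAGE_MISS = 10
COST_PAGE_DIRTY = 20
COST_LIMIT = 200
COST_DELAY_SECONDS = 0.020


class VacuumCostModel:
    """Toy model of cost-based vacuum delay accounting."""

    def __init__(self):
        self.balance = 0
        self.naps = 0  # how many times we slept

    def charge(self, kind):
        self.balance += {"hit": COST_PAGE_HIT,
                         "miss": COST_PAGE_MISS,
                         "dirty": COST_PAGE_DIRTY}[kind]
        if self.balance >= COST_LIMIT:
            # The nap is a select()-based sleep, which is why a throttled
            # vacuum shows up as pollsys/select calls under truss or strace.
            select.select([], [], [], COST_DELAY_SECONDS)
            self.balance = 0
            self.naps += 1
```

At these defaults, a page that is both read from disk and dirtied costs
30 points, so a vacuum that dirties everything it touches naps roughly
every 7 pages; that is one way writes eat cost_delay points much faster
than reads alone would suggest.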

regards, tom lane



Re: [HACKERS] trace_recovery_messages

2010-08-18 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 The explanation of trace_recovery_messages in the document
 is inconsistent with the definition of it in guc.c.

Setting the default to WARNING is confusing and useless, because
there are no trace_recovery calls with that debug level.  IMO the
default setting should be LOG, which makes trace_recovery() a clear
no-op (rather than not clearly a no-op).  There is circumstantial
evidence in the code that this was the original intention:

int trace_recovery_messages = LOG;

The documentation of the parameter is about as clear as mud, too.
We need to explain what it does rather than just copy-and-paste
a lot of text from log_min_messages.

regards, tom lane



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 
 This is an anti-wraparound vacuum, so it could have something to
 do with the hint bits.  Maybe it's setting the freeze bit on
 every page, and writing them one page at a time?
 
 That would explain all the writes, but it doesn't seem to explain
 why your two servers aren't behaving similarly.
 
One was bulk-loaded from the other, or they were bulk-loaded at
different times?  Or one had some other activity that boosted the
xid count, possibly in another database?
 
-Kevin



Re: [HACKERS] patch: utf8_to_unicode (trivial)

2010-08-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Anyway, it's not really important enough to me to have a protracted
 argument about it.  Let's wait and see if anyone else has an opinion,
 and perhaps a consensus will emerge.

Well, nobody else seems to care, so I went ahead and committed the
shorter form of the patch, ie just rename and export the function.

regards, tom lane



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 That would explain all the writes, but it doesn't seem to explain why
 your two servers aren't behaving similarly.

Well, that's why I said ostensibly identical.  There may in fact be
differences, not just in the databases but in some OS libs as well.
These servers have been in production for quite a while, and the owner
has a messy deployment process.

 Most likely that's the libc implementation of the select()-based sleeps
 for vacuum_cost_delay.  I'm still suspicious that the writes are eating
 more cost_delay points than you think.

Tested that.  It does look like if I increase vacuum_cost_limit to 1
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
2-3 before each pollsys.  The math seems completely wrong on that,
though -- it should be 50 and 30 pages, or similar.  If I can, I'll test
a vacuum without cost_delay and make sure the pollsys() are connected to
the cost delay and not something else.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Robert Haas
Kevin didn't send out an official gavel-banging announcement of the
end of CommitFest 2009-07 (possibly because I neglected until today to
give him privileges to actually change it in the web application), but
I'd just like to take a minute to thank him publicly for his efforts.
We started this CommitFest with something like 60 patches, which is
definitely on the larger side for a CommitFest, and Kevin did a great
job staying on top of what was going on with all of them and, I felt,
really helped keep us on track.  At the same time, I felt he did this
with a very light touch that made the whole thing go very smoothly.
So -- thanks, Kevin!

I also appreciate the efforts of all those who reviewed.  Good reviews
are really critical to keep the burden from building up on committers,
and I appreciate the efforts of everyone who contributed, in many
cases probably on their own time.  I'm particularly grateful to the
people who were vigilant about spelling, grammar, coding style,
whitespace, and other nitpicky little issues that are not much fun,
but which at least for me are a major time sink if they're still
lingering when it comes time to do the actual commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Per-tuple memory leak in 9.0

2010-08-18 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 The problem is that the trigger code assumes that anything it
 allocates in the per-tuple memory context will be freed per-tuple
 processed, which used to be the case because the loop in ExecutePlan()
 calls ResetPerTupleExprContext() once each time round the loop, and
 that used to correspond to once per tuple.

 However, with the refactoring of that code out to nodeModifyTable.c,
 this is no longer the case because the ModifyTable node processes all
 the tuples from the subquery before returning, so I guess that the
 loop in ExecModifyTable() needs to call ResetPerTupleExprContext()
 each time round.

Hmmm ... it seems a bit unclean to be resetting the output-tuple
exprcontext at a level below the top of the plan.  I agree that that's
probably the sanest fix at the moment, but I fear we may need to revisit
this in connection with writable CTEs.  We might need a separate output
tuple context for each ModifyTable node, or something like that.

regards, tom lane



Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 Tested that.  It does look like if I increase vacuum_cost_limit to 1
 and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
 2-3 before each pollsys.  The math seems completely wrong on that,
 though -- it should be 50 and 30 pages, or similar.  If I can, I'll test
 a vacuum without cost_delay and make sure the pollsys() are connected to
 the cost delay and not something else.

Hmmm.  Looks like, at least in 8.3, running a manual vacuum on a table
doesn't prevent anti-wraparound vacuum from restarting.   So I can't do
any further testing until we can restart the server.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Progress indication prototype

2010-08-18 Thread Peter Eisentraut
On tis, 2010-08-17 at 13:52 -0400, Stephen Frost wrote:
 I don't like how the backend would have to send something NOTICE-like,
 I had originally been thinking "gee, it'd be nice if psql could query
 pg_stat while doing something else, but that's not really possible..."
 So, I guess NOTICE-like messages would work, if the backend could be
 taught to do it.

That should be doable; you'd just have to do some ereport(NOTICE)
variant inside pgstat_report_progress and have a switch to turn it on
and off, and have psql do something with it.  The latter is really the
interesting part; the former is relatively easy once the general
framework is in place.




Re: [HACKERS] Progress indication prototype

2010-08-18 Thread Peter Eisentraut
On ons, 2010-08-18 at 13:45 +0100, Greg Stark wrote:
 But progress bars alone aren't really the big prize. I would really
 love to see the explain plans for running queries.

The auto_explain module does that already.




Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Most likely that's the libc implementation of the select()-based sleeps
 for vacuum_cost_delay.  I'm still suspicious that the writes are eating
 more cost_delay points than you think.

 Tested that.  It does look like if I increase vacuum_cost_limit to 1
 and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
 2-3 before each pollsys.  The math seems completely wrong on that,
 though -- it should be 50 and 30 pages, or similar.

I think there could be a lot of cost_delay points getting expended
without any effects visible at the level of strace.  Maybe try fooling
with vacuum_cost_page_hit and vacuum_cost_page_miss, too.

regards, tom lane



Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I'd just like to take a minute to thank him publicly for his
 efforts.  We started this CommitFest with something like 60
 patches, which is definitely on the larger side for a CommitFest,
 and Kevin did a great job staying on top of what was going on with
 all of them and, I felt, really helped keep us on track.  At the
 same time, I felt he did this with a very light touch that made
 the whole thing go very smoothly.  So -- thanks, Kevin!
 
You're welcome.  It was educational for me.  I don't think I want to
try to handle two in a row, and I think your style is better suited
than mine to the final CF for a release, but I might be able to take
on the 2010-11 CF if people want that.
 
My hand was not always so light behind the scenes, though -- I sent
or received about 100 off-list emails to try to keep things moving. 
Hopefully nobody was too offended by my nagging.  :-)
 
Oh, and thanks for putting together the CF web application.  Without
that, I couldn't have done half as well as I did.
 
 I also appreciate the efforts of all those who reviewed.
 
Yes, I'll second that!  I've always been impressed with the
PostgreSQL community, and managing this CF gave me new insights and
appreciation for the intelligence, professionalism, and community
spirit of its members -- authors, reviewers, and committers.
 
-Kevin



Re: [HACKERS] Progress indication prototype

2010-08-18 Thread A.M.

On Aug 18, 2010, at 9:02 AM, Robert Haas wrote:

 On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark st...@mit.edu wrote:
 On Tue, Aug 17, 2010 at 11:29 PM, Dave Page dp...@pgadmin.org wrote:
 Which is ideal for monitoring your own connection - having the info in
 the pg_stat_activity is also valuable for monitoring and system
 administration. Both would be ideal :-)
 
 Hm, I think I've come around to the idea that having the info in
 pg_stat_activity would be very nice. I can just picture sitting in
 pgadmin while a bunch of reports are running and seeing progress bars
 for all of them...
 
 But progress bars alone aren't really the big prize. I would really
 love to see the explain plans for running queries. This would improve
 the DBAs view of what's going on in the system immensely. Currently
 you have to grab the query and try to set up a similar environment for
 it to run explain on it. If analyze has run since, or if the tables
 have grown or shrunk, or if the query was run with some constants as
 parameters it can be awkward. If some of the tables in the query were
 temporary tables it can be impossible. You can never really be sure
 you're looking at precisely the same plan that the other user's
 session is running.
 
 But stuffing the whole json or xml explain plan into pg_stat_activity
 seems like it doesn't really fit the same model that the existing
 infrastructure is designed around. It could be quite large and if we
 want to support progress feedback it could change quite frequently.
 
 We do stuff the whole query there (up to a limited size) so maybe I'm
 all wet and stuffing the explain plan in there would be fine?
 
 It seems to me that progress reporting could add quite a bit of
 overhead.  For example, in the whole-database vacuum case, the most
 logical way to report progress would be to compute pages visited
 divided by pages to be visited.  But the total number of pages to be
 visited is something that doesn't need to be computed in advance
 unless someone cares about progress.  I don't think we want to incur
 that overhead in all cases just on the off chance someone might ask.
 We need to think about ways to structure this so that it only costs
 when someone's using it.

I wish that I could get explain analyze output step-by-step while running a 
long query instead of seeing it jump out at the end of execution. Some queries 
never end and it would be nice to see which step is spinning (explain can be 
a red herring). To me the progress bar is nice, but I don't see how it would 
be reliable enough to draw any inferences (such as execution time). If I could 
get the explain analyze results *and* the actual query results, that would be a 
huge win, too.

Cheers,
M




[HACKERS] CommitFest 2010-07 final report

2010-08-18 Thread Kevin Grittner
At the close of the 2010-07 CommitFest, the numbers were:
 
72 patches were submitted
 3 patches were withdrawn (deleted) by their authors
14 patches were moved to CommitFest 2010-09
--
55 patches in CommitFest 2010-07
--
 3 committed to 9.0
--
52 patches for 9.1
--
 1 rejected
20 returned with feedback
31 committed for 9.1
 
When we hit the end of the allotted time, I moved the last two
patches to the next CF, for want of a better idea for disposition. 
One is Ready for Committer with an author who is a committer.  The
other is my WiP patch for serializable transactions -- there's a lot
to review and the reviewer had unexpected demands on his time during
the CF; he said he'll continue work on that outside the CF.
 
-Kevin
 
 
At the end of week four:
 
 72 patches were submitted
  3 patches were withdrawn (deleted) by their authors
 12 patches were moved to CommitFest 2010-09
 --
 57 patches in CommitFest 2010-07
 --
  3 committed to 9.0
 --
 54 patches for 9.1
 --
  1 rejected
 18 returned with feedback
 28 committed for 9.1
 --
 47 disposed
 --
  7 pending
  2 ready for committer
 --
  5 will still need reviewer attention
  1 waiting on author to respond to review
 --
  4 patches need review now and have a reviewer assigned




Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Greg Smith

Kevin Grittner wrote:

I don't think I want to try to handle two in a row, and I think your style
is better suited than mine to the final CF for a release, but I might be
able to take on the 2010-11 CF if people want that


Ha, you just put yourself right back on the hook with that comment, and 
Robert does seem like the right guy for CF-4 @ 2011-01.  Leaving the 
question of what's going to happen with CF-2 next month.


I think the crucial thing with the 2010-09 CF is that we have to get 
serious progress made sorting out all the sync rep ideas before/during 
that one.  The review Yeb did and subsequent discussion was really 
helpful, but the scope on that needs to actually get nailed down to 
*something* concrete if it's going to get built early enough in the 9.1 
release to be properly reviewed and tested for more than one round.  
Parts of the design and scope still feel like they're expanding to me, 
and I think having someone heavily involved in the next CF who is 
willing to push on nailing down that particular area is pretty 
important.  Will volunteer myself if I can stay on schedule to make it 
past the major time commitment sink I've had so far this year by then.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] security label support, part.2

2010-08-18 Thread KaiGai Kohei
(2010/08/18 21:52), Stephen Frost wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 If rte->requiredPerms were not cleared, the user of the hook would be
 able to check access rights on the child tables as they like.
 
 This would only be the case for those children which are being touched
 in the current query, which would depend on what conditionals are
 applied, what the current setting of check_constraints is, and possibly
 other factors.  I do *not* like this approach.
 
Indeed, the planner might omit scans on the children which are not obviously
referenced, but I'm not certain whether their RangeTblEntry would also be
removed from the PlannedStmt->rtable or not.

 How about an idea to add a new flag in RangeTblEntry which shows where
 the RangeTblEntry came from, instead of clearing requiredPerms?
 If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
 on the child tables.
 
 How about the external module just checks if the current object being
 queried has parents, and if so, goes and checks the
 labels/permissions/etc on those children?  That way the query either
 always fails or never fails for a given caller, rather than sometimes
 working and sometimes not depending on the query.
 
Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the parent
will give us a hint whether we also should check labels on its children.

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] security label support, part.2

2010-08-18 Thread KaiGai Kohei
 How about an idea to add a new flag in RangeTblEntry which shows where
 the RangeTblEntry came from, instead of clearing requiredPerms?
 If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
 on the child tables.

 How about the external module just checks if the current object being
 queried has parents, and if so, goes and checks the
 labels/permissions/etc on those children?  That way the query either
 always fails or never fails for a given caller, rather than sometimes
 working and sometimes not depending on the query.

 Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the parent
 will give us a hint whether we also should check labels on its children.
 

http://code.google.com/p/sepgsql/source/browse/trunk/sepgsql/relation.c#293

At least, it seems to me this logic works as expected.

  postgres=# CREATE TABLE tbl_p (a int, b text);
  CREATE TABLE
  postgres=# CREATE TABLE tbl_1 (check (a < 100)) inherits (tbl_p);
  CREATE TABLE
  postgres=# CREATE TABLE tbl_2 (check (a >= 100 and a < 200)) inherits (tbl_p);
  CREATE TABLE
  postgres=# CREATE TABLE tbl_3 (check (a >= 300)) inherits (tbl_p);
  CREATE TABLE
  postgres=# SECURITY LABEL on TABLE tbl_p IS 
'system_u:object_r:sepgsql_table_t:s0';
  SECURITY LABEL
  postgres=# SECURITY LABEL on COLUMN tbl_p.a IS 
'system_u:object_r:sepgsql_table_t:s0';
  SECURITY LABEL
  postgres=# SECURITY LABEL on COLUMN tbl_p.b IS 
'system_u:object_r:sepgsql_table_t:s0';
  SECURITY LABEL

  postgres=# set sepgsql_debug_audit = on;
  SET

  postgres=# SELECT a FROM ONLY tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
  STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
  STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
   a
  ---
  (0 rows)

-> ONLY tbl_p was not expanded

  postgres=# SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_1
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_1.a
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_2
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_2.a
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_3
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_3.a
  STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
   a
  ---
  (0 rows)

-> tbl_p was expanded to tbl_1, tbl_2 and tbl_3

  postgres=# set sepgsql_debug_audit = off;
  SET
  postgres=# EXPLAIN SELECT a FROM tbl_p WHERE a = 150;
                               QUERY PLAN
  ------------------------------------------------------------------------
   Result  (cost=0.00..50.75 rows=12 width=4)
     ->  Append  (cost=0.00..50.75 rows=12 width=4)
           ->  Seq Scan on tbl_p  (cost=0.00..25.38 rows=6 width=4)
                 Filter: (a = 150)
           ->  Seq Scan on tbl_2 tbl_p  (cost=0.00..25.38 rows=6 width=4)
                 Filter: (a = 150)
  (6 rows)

-> Actually, it does not scan tbl_1 and tbl_3, due to the a = 150 condition.

-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Alvaro Herrera wrote:
 Excerpts from Michael Haggerty's message of Wed Aug 18 12:00:44 -0400 2010:
 
 3. Run

 git filter-branch

 This rewrites the commits using any parentage changes from the grafts
 file.  This changes most commits' SHA1 hashes.  After this you can
 discard the .git/info/grafts file.  You would then want to remove the
 original references, which were moved to refs/original.
 
 Hmm.  If I need to do two changes in the same branch, do I need to
 mention the new SHA1 for the second one (after filter-branch changes its
 SHA1), or the original one?  If the former, then this is going to be a
 very painful process.

No, all SHA1s refer to the values for the *old* versions of the commits.
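The grafts-then-filter-branch procedure being discussed can be sketched end to
end on a throwaway repository. Everything below (repository name, branch names,
commit messages) is invented for illustration; this shows the general git
workflow, not the actual PostgreSQL conversion:

```shell
# Demo of the grafts + filter-branch workflow on a scratch repository.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git warns about filter-branch
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.email demo@example.org
git config user.name demo
br=$(git symbolic-ref --short HEAD)      # "master" or "main", depending on git

echo one > f; git add f; git commit -qm 'A: root commit'
A=$(git rev-parse HEAD)
git checkout -qb side
echo aside > g; git add g; git commit -qm 'C: commit on a side branch'
C=$(git rev-parse HEAD)
git checkout -q "$br"
echo two > f; git commit -qam 'B: content was actually merged from C'
B=$(git rev-parse HEAD)

# Declare in the grafts file that B has parents A *and* C.  As noted
# above, the SHA1s here are the OLD (pre-rewrite) values.
echo "$B $A $C" > .git/info/grafts

# Rewrite the history so the grafted parentage becomes permanent, then
# discard the grafts file and the refs/original backup refs.
git filter-branch -f -- --all >/dev/null 2>&1
rm -f .git/info/grafts
git update-ref -d "refs/original/refs/heads/$br" 2>/dev/null || true

# The rewritten tip now really is a two-parent (merge) commit:
git rev-list --parents -n 1 "$br"
```

Running `git log --graph` afterwards shows B as a merge of the trunk and side
lines. Since filter-branch changes the SHA1 of every rewritten commit, the
grafts file has to be written in terms of the old hashes before the rewrite,
which is exactly the point being made above.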

Michael



Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 7:46 PM, Greg Smith g...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 I don't think I want to try to handle two in a row, and I think your style
 is better suited
 than mine to the final CF for a release, but I might be able to take on
 the 2010-11 CF if people want that

 Ha, you just put yourself right back on the hook with that comment, and
 Robert does seem like the right guy for CF-4 @ 2011-01.  Leaving the
 question of what's going to happen with CF-2 next month.

My reputation precedes me, apparently.  Although I appreciate everyone
so far being willing to avoid mentioning exactly what that reputation
might be.  :-)

 I think the crucial thing with the 2010-09 CF is that we have to get serious
 progress made sorting out all the sync rep ideas before/during that one.
  The review Yeb did and subsequent discussion was really helpful, but the
 scope on that needs to actually get nailed down to *something* concrete if
 it's going to get built early enough in the 9.1 release to be properly
 reviewed and tested for more than one round.  Parts of the design and scope
 still feel like they're expanding to me, and I think having someone heavily
 involved in the next CF who is willing to push on nailing down that
 particular area is pretty important.  Will volunteer myself if I can stay on
 schedule to make it past the major time commitment sink I've had so far this
 year by then.

Sitting on Sync Rep is a job and a half by itself, without adding all
the other CF work on top of it.  Maybe we should try to find two
vi^Holunteers: a CommitFest Manager (CFM) and a Major Feature
Babysitter (MBS).  At any rate, we should definitely NOT wait another
month to start thinking about Sync Rep again.  I haven't actually
looked at any of the Sync Rep code AT ALL but IIRC Heikki expressed
the view that the biggest thing standing in the way of a halfway
decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce
up-to-100ms delays.  And so far we haven't seen a patch for that.
Somebody write one.  And then let's get it reviewed and committed RSN.
 It may seem like we're early in the release cycle yet, but for a
feature of this magnitude we are not.  We committed way too much big
stuff at the very end of the last release cycle; Hot Standby was still
being cleaned up in May after commit in November.  We'll be lucky to
commit sync rep that early.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] git: uh-oh

2010-08-18 Thread Michael Haggerty
Magnus Hagander wrote:
 Is there some way to make cvs2git work this way, and just not bother
 even trying to create merge commits, or is that fundamentally
 impossible and we need to look at another tool?

The good news: (I just reminded myself/realized that) Max Bowsher has
already implemented pretty much exactly what you want in the cvs2svn
trunk version, including noting in the commit messages any cherry-picks
that are not reflected in the repo ancestry.

The bad news: It is broken [1].  But I don't think it should be too much
work to fix it.

Michael

[1]
http://cvs2svn.tigris.org/ds/viewMessage.do?dsForumId=1670&dsMessageId=2624153
