Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior

2013-09-21 Thread Amit Kapila
On Fri, Sep 20, 2013 at 7:17 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 20, 2013 at 9:43 AM, Amit Kapila amit.kapil...@gmail.com wrote:
Handling for OID is not clear: shall we allow it in check constraints
 or not?
In my current patch, OID is not allowed in check constraints.

 I vote for allowing it.

I had tried to allow the OID case, but setting the OID during an UPDATE
operation is a bit tricky.
In ExecUpdate(), we need to set the OID in the new tuple by getting it
from the old tuple, but we only have the old tuple's tid in this
function. Using the old tid, I can fetch the tuple and extract its OID
to set in the new tuple, as is done in heap_update(), but that is a bit
more work, and the same has to be done in heap_update() anyway unless
we change the signature of heap_update().
For now, I have updated the patch for the points below:
a.  set tableoid in the COPY case
b.  set tableoid in the ALTER case
c.  update the docs on the CREATE TABLE page. Other places, such as the
ALTER TABLE and constraints pages, don't have any information about
what is not allowed in check constraints, so I left them as is.

I have not added new tests for ALTER/COPY, as they would cover the same
ground as the INSERT test case. However, if you feel we should add
tests for the ALTER/COPY/UPDATE operations, I will update the patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


sys_col_constr_v4.patch
Description: Binary data



Re: [HACKERS] pgbench progress report improvements

2013-09-21 Thread Fabien COELHO


Hello Noah,

Thanks a lot for all these comments!

I'm not planning to apply all of them directly, especially those that 
remove features I think are really desirable. Please find below a defense 
against some of your suggestions. I wish to wait for more feedback about 
these arguments before spending time on heavy changes.




Improve pgbench measurements & progress report


These changes are loosely coupled; please separate them into several patch
files:


I thought about this. I submitted a bunch of very small pgbench patches to 
the previous commitfest, and it resulted in a lot of duplicated tests by 
reviewers, which wasted their precious time.


ISTM that what you suggest will result in more reviewer time, which is the 
critical resource of the commitfest.


Moreover, splitting the patch as you suggest, especially into 7 patches, 
would result in conflicting or dependent patches, which creates yet more 
issues for the reviews. There is *some* coupling, because the reports are 
about the measurements performed, and I changed both to try to have 
something consistent in the end. I can separate the doc fix, but it is 
just one line...


Basically, I tried to optimize reviewer time with a combined patch focused 
on pgbench measurements and their reporting.


But this does not prevent discussing features separately!


 - Use progress option both under init & bench.

   Activate progress report by default, every 5 seconds.
   When initializing, --quiet reverts to the old every-100,000-insertions
   behavior...


 Patch (1): Change the default from --progress=0 to --progress=5

This has a disadvantage of causing two extra gettimeofday() calls per 
transaction.  That's not cheap on all platforms; a user comparing 
pgbench results across versions will need to make a point of disabling 
this again to make his results comparable. Given that threat and 
uncertainty about which default would be more popular, I think we should 
drop this part.


The gettimeofday call time is very small compared to the network and 
database (disk!) accesses involved in a pgbench run. On my small laptop, 
gettimeofday takes about 0.02 µs (20 ns) per call, which is typically 
under 1/1000 of the fastest local-socket read-only one-table cache-hit 
prepared trivial transaction.
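
For anyone who wants to check this figure on their own hardware, here is a
minimal standalone C sketch (not part of any patch) that estimates the
per-call cost of gettimeofday() by timing a tight loop with the function
itself:

    /* Rough micro-benchmark of gettimeofday() cost (POSIX only).
     * Results vary widely across platforms; see contrib/pg_test_timing
     * for PostgreSQL's own timing-overhead tool. */
    #include <stdio.h>
    #include <sys/time.h>

    int
    main(void)
    {
        struct timeval start, now;
        long        i, n = 10 * 1000 * 1000;

        gettimeofday(&start, NULL);
        for (i = 0; i < n; i++)
            gettimeofday(&now, NULL);   /* 'now' ends up holding the last call */

        double      elapsed = (now.tv_sec - start.tv_sec)
                            + (now.tv_usec - start.tv_usec) / 1e6;

        printf("%.4f us per call\n", elapsed / n * 1e6);
        return 0;
    }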


This is a little different when initializing, but I put a guard on 
gettimeofday in that case.


If this is a blocker, I can make the default 0, but I really do not think 
that is necessary, given how small the performance impact is.


About the default behavior: it should be whatever is most useful.

I found that running pgbench to get significant results requires long 
minutes or even hours because of warmup time. Running a command for 30 
minutes without any feedback is annoying. The second point is that the 
feedback helps the user notice that the figures evolve slowly but 
significantly, and realize that their too-short runs are not significant 
at all. So it seems to me that this should be the default behavior, and 
--progress=0 is always possible anyway for the very performance 
(over-)conscious tester.



 Patch (2): Make --initialize mode respect --progress.

The definition you've chosen for --quiet makes that option contrary to its own
name: message-per-100k-tuples is typically more voluminous than your proposed
new default of message-per-5s.  In fact, since --quiet currently switches from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.

During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:

http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-is...@sraoss.co.jp
http://www.postgresql.org/message-id/flat/1346472039.18010.10.ca...@vanquo.pezone.net

That gives me pause about working through yet another change; we keep burning
time on this minor issue.


I totally agree that this --quiet is not convincing!

My actual opinion is that quiet should just mean quiet :-), i.e. no 
progress report.


I tried to preserve the row-counting behavior because I thought that 
someone would object otherwise, but I would really be in favor of dropping 
the row-counting report altogether and keeping only the time-triggered 
report.



 - Measure transaction latency instead of computing it.


 Patch (3): Report the standard deviation of latency.

Seems potentially useful; see inline comments below.

In my limited testing, the interesting latency cases involved stray 
clusters of transactions showing 10x-100x mean latency. If that's 
typical, I doubt mean/stddev is a great analytical tool. But I have 
little reason to believe that my experience here was representative.


The stddev measurement is sensitive to the cache hit ratio, so it is a 
good indicator of whether the steady state has been reached.

Indeed, the mean is the most basic tool, and the stddev the second most 
basic... They are better than nothing, and the stddev does add value.
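
For reference, mean and standard deviation can be accumulated per progress
interval in constant memory. A minimal standalone C sketch using Welford's
online algorithm (shown for illustration; not the patch's actual code):

    #include <math.h>
    #include <stdio.h>

    typedef struct
    {
        long    n;
        double  mean;
        double  m2;     /* running sum of squared deviations from the mean */
    } LatencyStats;

    static void
    latency_add(LatencyStats *s, double x)
    {
        double delta = x - s->mean;

        s->n++;
        s->mean += delta / s->n;
        s->m2 += delta * (x - s->mean);
    }

    static double
    latency_stddev(const LatencyStats *s)
    {
        /* sample standard deviation, i.e. dividing by n-1 */
        return (s->n > 1) ? sqrt(s->m2 / (s->n - 1)) : 0.0;
    }

    int
    main(void)
    {
        LatencyStats s = {0, 0.0, 0.0};
        double  samples[] = {8.2, 9.1, 10.4, 8.7, 9.6};  /* made-up latencies, ms */
        int     i;

        for (i = 0; i < 5; i++)
            latency_add(&s, samples[i]);
        printf("avg %.3f ms stddev %.3f ms\n", s.mean, latency_stddev(&s));
        return 0;
    }

Welford's formulation avoids the cancellation problems of the naive
sum/sum-of-squares approach when latencies cluster tightly around a large
mean.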




Re: [HACKERS] pgbench progress report improvements

2013-09-21 Thread Rod Taylor
On Sat, Sep 21, 2013 at 4:55 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:



  I suggest getting the term stddev in there somehow, maybe like this:

  progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792


 My issue is trying to keep the line width under control, so as to avoid
 line breaks in the terminal. Under throttling, the time lag is also
 appended to the line.

 Moreover, using 'xxx=figure' breaks simple cut pipelining to extract the
 figures, so I would prefer to stick to spaces.

 Maybe:

   progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms

 but I liked my +- approach :-)


100 +- 3 implies a range of 97 to 103 and no values are outside of that
range.


Re: [HACKERS] pgbench progress report improvements

2013-09-21 Thread Fabien COELHO



Moreover, using 'xxx=figure' breaks simple cut pipelining to extract the
figures, so I would prefer to stick to spaces.

Maybe:

  progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms

but I liked my +- approach :-)


100 +- 3 implies a range of 97 to 103 and no values are outside of that
range.


This is one of the possible meanings, but ISTM that it is not the only 
one. Anyway, here we do not have a symmetric distribution of run times.


Well, it seems I'm alone in liking my +-, so I'll backtrack on that.
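
For what it's worth, the space-separated layout keeps the line trivially
machine-readable from C as well as from cut; a minimal sketch (field
positions assumed from the sample line above):

    #include <stdio.h>

    int
    main(void)
    {
        const char *line =
            "progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms";
        double      sec, tps, avg, sd, lag;

        /* parse the five figures out of one progress line */
        if (sscanf(line,
                   "progress: %lf s, %lf tps, lat avg %lf ms stddev %lf, lag %lf ms",
                   &sec, &tps, &avg, &sd, &lag) == 5)
            printf("t=%.1f tps=%.1f avg=%.3f sd=%.3f lag=%.3f\n",
                   sec, tps, avg, sd, lag);
        return 0;
    }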

--
Fabien.




[HACKERS] VMs for Reviewers Available

2013-09-21 Thread Josh Berkus
Folks,

In order to make sure that nobody is prevented from reviewing due to not
having a suitable development environment available, the PostgreSQL
community is offering free virtual machines for reviewing and testing
patches for this CommitFest.

If you want a VM for this purpose, please email me the following at any
point during the CF:

1. Your public key.

2. Which patch you intend to review/test.

3. Preference for CentOS or Ubuntu.

4. When you expect to do the review.

Since these are EC2 virtual servers, they will NOT be suitable for
performance testing.  However, if you want to review a replication
patch, we can make multiple VMs available.  All VMs will be terminated
at the end of the CF.

VMs will come equipped with all tools required to build PostgreSQL, plus
a checkout of the PostgreSQL code from the beginning of the CF.  You may
need to install additional requirements if you are reviewing extensions
or special features with dependencies.

Note that I may take a day to respond with your VM access.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pgbench progress report improvements

2013-09-21 Thread Noah Misch
On Sat, Sep 21, 2013 at 10:55:54AM +0200, Fabien COELHO wrote:
 Improve pgbench measurements & progress report

 These changes are loosely coupled; please separate them into several patch
 files:

 I thought about this. I submitted a bunch of very small pgbench patches 
 to the previous commitfest, and it resulted in a lot of duplicated tests 
 by reviewers, which wasted their precious time.

 ISTM that what you suggest will result in more reviewer time, which is 
 the critical resource of the commitfest.

 Moreover, splitting the patch as you suggest, especially into 7 patches, 
 would result in conflicting or dependent patches, which creates yet more 
 issues for the reviews. There is *some* coupling, because the reports are 
 about the measurements performed, and I changed both to try to have 
 something consistent in the end. I can separate the doc fix, but it is 
 just one line...

Post all the patch files to the same mailing list thread and have them
reviewed under a single CommitFest entry.  A reviewer or committer preferring
less granularity can squash patches together effortlessly.  The inverse task,
splitting a patch, is not so mechanical.  Consequently, err on the side of
more, smaller patches; it almost always saves reviewer and committer time.

 Basically, I tried to optimize reviewer time with a combined patch focused 
 on pgbench measurements and their reporting.

Relatedness of changes is a continuum, and committers sometimes disagree on
how far to subdivide before commit.  This patch is not a borderline case;
removing a bias from --rate throttle delays does not belong in the same commit
as enabling progress reports by default.

 But this does not prevent discussing features separately!

Actually, it does hamper debate.  With smaller patches, a person interested in
just one issue can read just that patch.  With a monolithic patch, one must
take in the entire thing to fully comprehend any single issue.

  Patch (1): Change the default from --progress=0 to --progress=5

 This has a disadvantage of causing two extra gettimeofday() calls per  
 transaction.  That's not cheap on all platforms; a user comparing  
 pgbench results across versions will need to make a point of disabling  
 this again to make his results comparable. Given that threat and  
 uncertainty about which default would be more popular, I think we 
 should drop this part.

 The gettimeofday call time is very small compared to the network and 
 database (disk!) accesses involved in a pgbench run. On my small laptop, 
 gettimeofday takes about 0.02 µs (20 ns) per call, which is typically 
 under 1/1000 of the fastest local-socket read-only one-table cache-hit 
 prepared trivial transaction.

Many systems do have cheap gettimeofday().  Many don't.  We recently added
contrib/pg_test_timing in response to the variability in this area.

 I found that running pgbench to get significant results requires long 
 minutes or even hours because of warmup time. Running a command for 30 
 minutes without any feedback is annoying. The second point is that the 
 feedback helps the user notice that the figures evolve slowly but 
 significantly, and realize that their too-short runs are not significant 
 at all. So it seems to me that this should be the default behavior, and 
 --progress=0 is always possible anyway for the very performance 
 (over-)conscious tester.

Those benefits aren't compelling enough to counterbalance the risk of
gettimeofday() overhead affecting results.  (Other opinions welcome.)  For a
tool like pgbench that requires considerable skill to use effectively,
changing a default only helps slightly.  It doesn't take much risk for
leaving the default unchanged to be the better choice.

  Patch (2): Make --initialize mode respect --progress.

 The definition you've chosen for --quiet makes that option contrary to
 its own name: message-per-100k-tuples is typically more voluminous than
 your proposed new default of message-per-5s.  In fact, since --quiet
 currently switches from message-per-100k-tuples to message-per-5s, your
 patch gives it the exact opposite of its present effect.

 During the 9.3 development cycle, we _twice_ made changes pertaining to
 --initialize message frequency:

 http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-is...@sraoss.co.jp
 http://www.postgresql.org/message-id/flat/1346472039.18010.10.ca...@vanquo.pezone.net

 That gives me pause about working through yet another change; we keep burning
 time on this minor issue.

 I totally agree that this --quiet is not convincing!

 My actual opinion is that quiet should just mean quiet :-), i.e. no 
 progress report.

 I tried to preserve the row-counting behavior because I thought that 
 someone would object otherwise, but I would really be in favor of 
 dropping the row-counting report altogether and keeping only the 
 time-triggered report.

I would be fine with 

Re: [HACKERS] SSI freezing bug

2013-09-21 Thread Heikki Linnakangas
Kevin Grittner kgri...@ymail.com wrote:
Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
 When a tuple is predicate-locked, the key of the lock is ctid+xmin.
 However, when a tuple is frozen, its xmin is changed to FrozenXid. That
 effectively invalidates any predicate lock on the tuple, as checking for
 a lock on the same tuple later won't find it, as the xmin is different.

 Attached is an isolationtester spec to demonstrate this.

 Do you have any idea how to fix that, besides keeping the xmin horizon
 below the lowest of the xids that are predicate-locked? That seems nasty
 to compute and is probably not trivial to fit into the procarray.c
 machinery.

 A better solution probably is to promote tuple-level locks, if they
 exist, to a relation-level one upon freezing, I guess?

That would work.  A couple of other ideas would be to use the oldest
serializable xmin, which is being calculated in predicate.c, to either
prevent freezing of newer transactions or to summarize predicate locks
(using the existing SLRU-based summarization functions) which use xmin
values for xids that are being frozen.  The transactions must already be
committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first place; 
wouldn't ctid alone suffice? If the tuple was visible to the reading 
transaction, it cannot be vacuumed away until the transaction commits. No 
other tuple can appear with the same ctid.


- Heikki




Re: [HACKERS] SSI freezing bug

2013-09-21 Thread Andres Freund


Heikki Linnakangas hlinnakan...@vmware.com wrote:
Kevin Grittner kgri...@ymail.com wrote:
Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
 When a tuple is predicate-locked, the key of the lock is ctid+xmin.
 However, when a tuple is frozen, its xmin is changed to FrozenXid. That
 effectively invalidates any predicate lock on the tuple, as checking for
 a lock on the same tuple later won't find it, as the xmin is different.

 Attached is an isolationtester spec to demonstrate this.

 Do you have any idea how to fix that, besides keeping the xmin horizon
 below the lowest of the xids that are predicate-locked? That seems nasty
 to compute and is probably not trivial to fit into the procarray.c
 machinery.

 A better solution probably is to promote tuple-level locks, if they
 exist, to a relation-level one upon freezing, I guess?

That would work.  A couple of other ideas would be to use the oldest
serializable xmin, which is being calculated in predicate.c, to either
prevent freezing of newer transactions or to summarize predicate locks
(using the existing SLRU-based summarization functions) which use xmin
values for xids that are being frozen.  The transactions must already be
committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first
place; wouldn't ctid alone suffice? If the tuple was visible to the
reading transaction, it cannot be vacuumed away until the transaction
commits. No other tuple can appear with the same ctid.

SSI locks can live longer than one TX/snapshot.

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] SSI freezing bug

2013-09-21 Thread Hannu Krosing
On 09/21/2013 10:46 PM, Andres Freund wrote:

 Heikki Linnakangas hlinnakan...@vmware.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:

 When a tuple is predicate-locked, the key of the lock is ctid+xmin.
 However, when a tuple is frozen, its xmin is changed to FrozenXid. That
 effectively invalidates any predicate lock on the tuple, as checking for
 a lock on the same tuple later won't find it, as the xmin is different.

 Attached is an isolationtester spec to demonstrate this.

 Do you have any idea how to fix that, besides keeping the xmin horizon
 below the lowest of the xids that are predicate-locked? That seems nasty
 to compute and is probably not trivial to fit into the procarray.c
 machinery.

 A better solution probably is to promote tuple-level locks, if they
 exist, to a relation-level one upon freezing, I guess?

 That would work.  A couple of other ideas would be to use the oldest
 serializable xmin, which is being calculated in predicate.c, to either
 prevent freezing of newer transactions or to summarize predicate locks
 (using the existing SLRU-based summarization functions) which use xmin
 values for xids that are being frozen.  The transactions must already be
 committed, and so are eligible for summarization.

 What's the point of using xmin as part of the lock key in the first
 place; wouldn't ctid alone suffice? If the tuple was visible to the
 reading transaction, it cannot be vacuumed away until the transaction
 commits. No other tuple can appear with the same ctid.

 SSI locks can live longer than one TX/snapshot.
But the only way that ctid can change without xmin changing is when
you update the tuple in the same TX that created it.

Could that be the case here, or can we safely exclude it?


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Assertions in PL/PgSQL

2013-09-21 Thread Jim Nasby

On 9/19/13 7:08 AM, Pavel Stehule wrote:

FWIW, we've written a framework (currently available in the EnovaTools 
project on pgFoundry) that allows for very, very fine-grained control over 
asserts.

- Every assert has a name (and an optional sub-name) as well as a level
- You can globally set the minimum level that will trigger an assert. This 
is useful for some debugging stuff; have an assert with a negative level, 
and normally it won't fire unless you set the minimum level below zero.
- You can disable an assert globally (across all backends)
- You can disable an assert only within your session

We should eventually allow for disabling an assert only for your 
transaction; we just haven't gotten around to it yet.

The reason for all this flexibility is the concept of "it should be very 
difficult, but not impossible, for the code to do X". We use it for 
sanity-checking things.


I think similar frameworks exist (we have some similar functionality in 
orafce too), and that is not a reason why we should not merge some of this 
functionality into core. I agree with Marko that a simple, user-friendly 
statement for assertions would be a very nice PL/pgSQL feature. I differ 
in opinion on how to implement it and on the syntax. I prefer an approach 
that could (even if not implemented now) be extended for similar tasks (as 
a built-in or external feature).

You and I probably share the opinion that a simple and very primitive 
assert alone is not enough:

I see the following as useful features for assertions:

a) the possibility to specify a message (a two-parameter assert)
b) the possibility to specify some threshold
c) the possibility to specify a level (exception, warning, notice); the 
default should be exception
d) the possibility to specify a handled/unhandled exception


I'm not opposed to the improvements you're proposing, but I do want to point 
out that none of them would allow us to use these asserts, because we 
definitely need the ability to enable and disable individual asserts.

(Understand that what we've developed is actually rather different from the C 
concept of asserts...)

I'm not saying that's necessarily bad, but there is an interesting point here: 
different environments might have radically different needs for dealing with 
asserts that fail.

What we *could* make use of would be asserts that are extremely fast when the 
assert passes but then allow us to do whatever we want when an assert fails 
(including possibly ignoring the fact that the assert failed).
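
To illustrate the shape being described, here is a minimal C sketch of an
assert that costs a single branch when it passes and dispatches to a
pluggable handler when it fails (names and API are hypothetical, not the
EnovaTools implementation):

    #include <stdio.h>

    typedef void (*assert_handler) (const char *name, const char *expr);

    static assert_handler on_assert_failure = NULL;

    static void
    log_and_continue(const char *name, const char *expr)
    {
        fprintf(stderr, "assert \"%s\" failed: %s\n", name, expr);
        /* a handler may ignore, log, or abort() as policy dictates */
    }

    #define NAMED_ASSERT(name, expr) \
        do { \
            if (!(expr) && on_assert_failure != NULL) \
                on_assert_failure(name, #expr); \
        } while (0)

    int
    main(void)
    {
        on_assert_failure = log_and_continue;
        NAMED_ASSERT("balance_nonneg", 1 + 1 == 3);  /* fails; execution continues */
        return 0;
    }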

Of course, if the community wanted to just accept the API and functionality 
we've developed I'd be fine with that too... ;P
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Assertions in PL/PgSQL

2013-09-21 Thread Jaime Casanova
On Fri, Sep 20, 2013 at 5:17 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 9/20/13 12:09 PM, Amit Khandekar wrote:

 On 16 September 2013 03:43, Marko Tiikkaja ma...@joh.to wrote:

 I think it would be extremely surprising if a command like that got
 optimized away based on a GUC, so I don't think that would be a good
 idea.



 In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
 ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
 is false. Isn't this possible?


 Of course it's possible.  But I, as a PostgreSQL user writing PL/PgSQL code,
 would be extremely surprised if this new cool option to RAISE didn't work
 for some reason.  If we use ASSERT the situation is different; most people
 will realize it's a new command and works differently from RAISE.



What about just adding a WHEN clause to the RAISE statement and using
the level machinery (client_min_messages) to make it appear or not?
Of course, this has the disadvantage that an EXCEPTION level will
always fire... or you can make it a new log level that means EXCEPTION
if asserts are enabled.

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157




Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-21 Thread Tatsuo Ishii
 I think the point here is that, at least as I understand it, encoding
 conversion and sanitization happens at a very early stage right now,
 when we first receive the input from the client. If the user sends a
 string of bytes as part of a query or bind placeholder that's not
 valid in the database encoding, it's going to error out before any
 type-specific code has an opportunity to get control.   Look at
 textin(), for example.  There's no encoding check there.  That means
 it's already been done at that point.  To make this work, someone's
 going to have to figure out what to do about *that*.  Until we have a
 sketch of what the design for that looks like, I don't see how we can
 credibly entertain more specific proposals.

I don't think the bind placeholder case is a problem. That is processed
by exec_bind_message() in postgres.c. It has enough info about the type
of the placeholder, and I think we can easily deal with NCHAR. The same
can be said for the COPY case.

The problem is an ordinary query (simple protocol Q message), as you
pointed out. Encoding conversion happens at a very early stage (note
that the fast-path case has the same issue). If a query message
contains, say, both SHIFT-JIS and EUC-JP, then we are in trouble,
because the encoding conversion routine (pg_client_to_server) assumes
that the message from the client contains only one encoding. However,
my question is: does that really happen? I doubt there is any text
editor that can create mixed SHIFT-JIS and EUC-JP text. So my guess is
that when a user wants to use NCHAR as SHIFT-JIS text, the rest of the
query consists of either SHIFT-JIS or plain ASCII. If so, all the user
needs to do is set the client encoding to SHIFT-JIS, and everything
should be fine.
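
For what it's worth, declaring the client encoding is a one-liner from
libpq; a minimal sketch (the connection string and query are placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Same effect as SET client_encoding TO 'SJIS'; returns 0 on success. */
        if (PQsetClientEncoding(conn, "SJIS") != 0)
            fprintf(stderr, "could not set client encoding\n");

        /* SHIFT-JIS literals in queries are now converted by the server. */
        PQclear(PQexec(conn, "SELECT 1"));
        PQfinish(conn);
        return 0;
    }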

Maumau, is my guess correct?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] VMs for Reviewers Available

2013-09-21 Thread Josh Berkus

 3. Preference for CentOS or Ubuntu.

Windows VMs are also available, but I don't have the ability to
preconfigure them with tools.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-21 Thread Peter Geoghegan
Hi Stephen,

On Fri, Sep 20, 2013 at 6:55 PM, Stephen Frost sfr...@snowman.net wrote:
 I'm not sure I follow this completely- you're saying that a definition
 of 'upsert' which includes having to lock rows which aren't in your
 current snapshot (for reasons stated) isn't a useful one.  Is the
 implication that a useful definition of 'upsert' is that it *doesn't*
 have to lock rows which aren't in your current snapshot, and if so, then
 what would the semantics of that upsert look like?

No, I'm suggesting that the useful semantics are that it does
potentially lock rows not yet visible to our snapshot that have
committed - the latest row version. I see no alternative (we can't
throw a serialization failure at read committed isolation level), and
Andres seemed to agree that this was the way forward. Robert described
problems he saw with this a few years ago [1]. It *is* a problem (we
need to think very carefully about it), but, as I've said, it is a
problem that anyone implementing this feature for a Snapshot
Isolation/MVCC database would have to deal with, and several have.

So, what the patch does right now is (if you squint) analogous to how
SELECT FOR UPDATE uses EvalPlanQual already. However, instead of
re-verifying a qual, we're re-verifying that the value locking has
identified the right tid (there will probably be a different one in
the subsequent iteration, or maybe we *can* go insert this time). We
need consensus across unique indexes to go ahead with insertion, but
once we know that we can't (and have a tid to lock), value locks can
go away - we'll know if anything has changed about the tid's logical
row that we need to care about when row locking. Besides, holding
value locks while row locking has deadlock hazards, and, because value
locks only stop insertions *finishing*, holding on to them is at best
pointless.

The tid we get from locking points to a would-be duplicate heap tuple
that has always committed - otherwise we'd never return from locking,
because locking blocks pending the outcome of the duplicate-inserting
xact (and only returns the tid when that xact commits). Even though this
tuple is known to be visible, it may be deleted in the interim before
row locking, in which case restarting from before value locking is
appropriate. It might also be updated, which would necessitate locking
a later row version in order to prevent race conditions. But it seems
dangerous, invasive, and maybe even generally impossible to try to
wait for the transaction that updated it to commit or abort so that we
can lock that later version the usual way (the usual EvalPlanQual
looping thing) - better to restart value locking.

The fundamental observation about value locking (at least for any
half-way reasonable implementation), that I'd like to emphasize, is
that short of a radical overhaul that would have many downsides, it
can only ever prevent insertion from *finishing*. The big picture of
my design is that it tries to quickly grab value locks, release them
and grab a row lock (or insert heap tuples, index tuples, and then
release value locks). If row locking fails, it waits for the
conflicter xact to finish, and then restarts before the value locking
of the current slot. If you think that's kind of questionable, maybe
you have a point, but consider:

1. How else are you going to handle it if row locking needs to handle
conflicts? You might say "I can re-verify that no unique index columns
were affected instead", and maybe you can, but what if that doesn't
help because they *were* changed? Besides, doesn't this break the
amcanunique contract? Surely judging what's really a duplicate is the
AM's job.

You're back to "I need to throw an error to get out of this, but I have
no good excuse to do so at read committed" -- you've lost the usual
duplicate-key-error excuse. I don't think you can expect holding the
value locks throughout row locking to help, because, as I've said,
that causes morally indefensible deadlocks, and besides, it doesn't
stop what row locking would consider to be a conflict, it just stops
insertion from *finishing*.

2. In the existing btree index insertion code, the order that retries
occur in the event of unique index tuple insertion finding an
unfinished conflicting xact *is* undefined. Yes, that's right - the
first waiter is not guaranteed to be the first to get a second chance.
It's not even particularly probable! See remarks from my last mail to
Robert for more information.

3. People with a real purist's view on the (re)ordering of value
locking must already think that EvalPlanQual() is completely ghetto
for very similar reasons, and as such should just go use a higher
isolation level. For the rest of us, what concurrency control anomaly
can allowing this cause over and above what's already possible there?
Are lock starvation risks actually appreciably raised at all?

What Andres and Robert seem to expect generally - that value locks
only be released when the locker has a definitive answer 

Re: [HACKERS] VMs for Reviewers Available

2013-09-21 Thread Peter Geoghegan
On Sat, Sep 21, 2013 at 3:35 PM, Josh Berkus j...@agliodbs.com wrote:
 Windows VMs are also available, but I don't have the ability to
 preconfigure them with tools.

Wasn't there an EC2 image doing the rounds that Magnus created, that
took care of all of that for you?

http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

I'm not sure how current it is - was this before or after we started
shipping our own WinFlex? Magnus?

-- 
Peter Geoghegan




Re: [HACKERS] VMs for Reviewers Available

2013-09-21 Thread Andrew Dunstan


On 09/21/2013 06:48 PM, Peter Geoghegan wrote:

On Sat, Sep 21, 2013 at 3:35 PM, Josh Berkus j...@agliodbs.com wrote:

Windows VMs are also available, but I don't have the ability to
preconfigure them with tools.

Wasn't there an EC2 image doing the rounds that Magnus created, that
took care of all of that for you?

http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

I'm not sure how current it is - was this before or after we started
shipping our own WinFlex? Magnus?




There is really no longer any need to use our WinFlex. Just use the flex 
that comes with MSysGit - it works perfectly well, and you only need 
flex if you're building from git anyway.


cheers

andrew





Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-21 Thread Peter Geoghegan
On Sun, Sep 15, 2013 at 8:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 I'll find it very difficult to accept any implementation that is going
 to bloat things even worse than our upsert looping example.

 How would any even halfway sensible example cause *more* bloat than the
 upsert looping thing?

I was away in Chicago over the week, and didn't get to answer this.
Sorry about that.

In the average/uncontended case, the subxact example bloats less than
all alternatives to my design proposed to date (including the unborn
heap tuple idea Robert mentioned in passing to me in person the other
day, which I think is somewhat similar to a suggestion of Heikki's
[1]). The average case is very important, because in general
contention usually doesn't happen. But you need to also appreciate
that because of the way row locking works and the guarantees value
locking makes, any ON DUPLICATE KEY LOCK FOR UPDATE implementation is
going to have to potentially restart in more places (as compared to
the doc's example), maybe including value locking of each unique index
and certainly including row locking. So the contended case might even
be worse as well.

On average, it is quite likely that either the UPDATE or INSERT will
succeed - there has to be some concurrent activity around the same
values for either to fail, and in general that's quite unlikely. If
the UPDATE doesn't succeed, it won't bloat, and it's then very likely
that the INSERT at the end of the loop will go ahead and succeed
without itself creating bloat.

Going forward with this discussion, I would like us all to take as
read that the buffer locking stuff is a prototype approach to value
locking, to be refined later (subject to my basic design being judged
fundamentally sound). I don't think anyone believes that it's
fundamentally incorrect in that it doesn't do something that it claims
to do (concerns are more around what it might do or prevent that it
shouldn't), and it can still drive discussion in a very useful
direction. So far criticism of this patch has almost entirely been on
aspects of buffer locking, but it would be much more useful for the
time being to simply assume that the buffer locks *are* interruptible.
It's probably okay with me to still be a bit suspicious of
deadlocking, though, because if we refine the buffer locking using a
more granular SLRU value locking approach, that doesn't necessarily
guarantee that it's impossible, even if it does (I guess) prevent
undesirable interactions with other buffer locking.

[1] http://www.postgresql.org/message-id/45e845c4.6030...@enterprisedb.com

-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6

2013-09-21 Thread Steve Singer

On 09/20/2013 06:33 AM, Andres Freund wrote:

Hi,




The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file
   removal? What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg xmin/WAL file removal if the standby is gone?
* How does the user interface look for removing a slot if a standby is gone?
* How do we decide/control which commands use a slot in which cases?


I think we are going to want to be flexible enough to support a couple of 
different use-cases:
* Some people will want to keep xmin pegged and prevent WAL removal so a 
standby with a slot can always catch up, and wi
* Most people will want to say: keep X megabytes of WAL (if needed by a 
lagging slot) and keep xmin pegged so that the WAL can be consumed by a 
logical plugin.


I can see us also implementing a restore_command that the walsender 
could use to get archived segments, but for logical replication xmin 
would still need to be low enough.


I don't think the current patch set is incompatible with us later 
implementing any of the above. I'd rather see us focus on getting the 
core functionality committed and worry about a good interface for 
managing slots later.



Greetings, Andres Freund 


Steve





Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-21 Thread Peter Geoghegan
On Fri, Sep 20, 2013 at 5:48 PM, Peter Geoghegan p...@heroku.com wrote:
 ProcLockWakeup() only wakes as many waiters from the head of the queue
 as can all be granted the lock without any conflicts.  So I don't
 think there is a race condition in that path.

 Right, but what about XactLockTableWait() itself? It only acquires a
 ShareLock on the xid of the got-there-first inserter that potentially
 hasn't yet committed/aborted. There will be no conflicts between
 multiple second-chance-seeking blockers trying to acquire this lock
 concurrently, and so in fact there is (what I guess you'd consider to
 be) a race condition in the current btree insertion code.

I should add: README.tuplock says the following:



  The protocol for waiting for a tuple-level lock is really

 LockTuple()
 XactLockTableWait()
 mark tuple as locked by me
 UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next
is provided by LockTuple().



So because this isn't a tuple-level lock - it's really a value-level
lock - LockTuple() is not called by the btree code at all, and so
arbitration of who gets the lock is, as I've said, essentially
undefined.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-21 Thread Peter Geoghegan
On Sat, Sep 21, 2013 at 7:22 PM, Peter Geoghegan p...@heroku.com wrote:
 So because this isn't a tuple-level lock - it's really a value-level
 lock - LockTuple() is not called by the btree code at all, and so
 arbitration of who gets the lock is, as I've said, essentially
 undefined.

Addendum: It isn't even a value-level lock, because the buffer locks
are of course released before the XactLockTableWait() call. It's a
simple attempt to acquire a shared lock on an xid.


-- 
Peter Geoghegan




Re: [HACKERS] VMs for Reviewers Available

2013-09-21 Thread Jaime Casanova
On 21/09/2013 18:09, Andrew Dunstan and...@dunslane.net wrote:


 On 09/21/2013 06:48 PM, Peter Geoghegan wrote:

 On Sat, Sep 21, 2013 at 3:35 PM, Josh Berkus j...@agliodbs.com wrote:

 Windows VMs are also available, but I don't have the ability to
 preconfigure them with tools.

 Wasn't there an EC2 image doing the rounds that Magnus created, that
 took care of all of that for you?


http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

 I'm not sure how current it is - was this before or after we started
 shipping our own WinFlex? Magnus?



 There is really no longer any need to use our WinFlex. Just use the flex
that comes with MSysGit - it works perfectly well, and you only need flex
if you're building from git anyway.


 If you're reviewing patches, you're probably compiling from git.

Do those Windows machines come with msys (and flex)? Our reviewers don't
always know how to install those tools.

--
Jaime Casanova
2ndQuadrant: Your PostgreSQL partner


Re: [HACKERS] Assertions in PL/PgSQL

2013-09-21 Thread Jaime Casanova
On 21/09/2013 17:16, Jaime Casanova ja...@2ndquadrant.com wrote:

 On Fri, Sep 20, 2013 at 5:17 AM, Marko Tiikkaja ma...@joh.to wrote:
  On 9/20/13 12:09 PM, Amit Khandekar wrote:
 
  On 16 September 2013 03:43, Marko Tiikkaja ma...@joh.to wrote:
 
  I think it would be extremely surprising if a command like that got
  optimized away based on a GUC, so I don't think that would be a good
  idea.
 
 
 
  In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
  ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
  is false. Isn't this possible?
 
 
  Of course it's possible.  But I, as a PostgreSQL user writing PL/PgSQL
  code, would be extremely surprised if this new cool option to RAISE
  didn't work for some reason.  If we use ASSERT the situation is
  different; most people will realize it's a new command and works
  differently from RAISE.
 
 

 What about just adding a WHEN clause to the RAISE statement and using
 the level machinery (client_min_messages) to make it appear or not?
 Of course, this has the disadvantage that an EXCEPTION level will
 always fire... or you can make it a new log level that means EXCEPTION
 if asserts are enabled.


meaning RAISE ASSERT, of course

--
Jaime Casanova
2ndQuadrant: Your PostgreSQL partner


Re: [HACKERS] VMs for Reviewers Available

2013-09-21 Thread Andrew Dunstan


On 09/21/2013 10:48 PM, Jaime Casanova wrote:



On 21/09/2013 18:09, Andrew Dunstan and...@dunslane.net wrote:



 On 09/21/2013 06:48 PM, Peter Geoghegan wrote:

 On Sat, Sep 21, 2013 at 3:35 PM, Josh Berkus j...@agliodbs.com wrote:


 Windows VMs are also available, but I don't have the ability to
 preconfigure them with tools.

 Wasn't there an EC2 image doing the rounds that Magnus created, that
 took care of all of that for you?

 
http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html


 I'm not sure how current it is - was this before or after we started
 shipping our own WinFlex? Magnus?



 There is really no longer any need to use our WinFlex. Just use the 
flex that comes with MSysGit - it works perfectly well, and you only 
need flex if you're building from git anyway.



If you're reviewing patches, you're probably compiling from git.

Do those Windows machines come with msys (and flex)? Our reviewers don't 
always know how to install those tools.

It's documented on the wiki - see 
http://wiki.postgresql.org/wiki/Building_With_MinGW


cheers

andrew




Re: [HACKERS] [PATCH] Add use of asprintf()

2013-09-21 Thread Peter Eisentraut
On Mon, 2013-09-16 at 17:31 -0300, Alvaro Herrera wrote:
 Looks good to me, except that pg_asprintf seems to be checking ret
 instead of rc.

Ah, good catch!

 Is there a reason for the API discrepancy of pg_asprintf vs. psprintf?
 I don't see that we use the integer return value anywhere.  Callers
 interested in the return value can use asprintf directly (and you have
 already inserted callers that do nonstandard things using direct
 asprintf).

I wanted to keep pg_asprintf the same as asprintf.  I think there is
some value in that, but it's not supremely important.





Re: [HACKERS] [PATCH] Add use of asprintf()

2013-09-21 Thread Peter Eisentraut
On Tue, 2013-09-17 at 15:13 +0500, Asif Naeem wrote:

 1. It seems that you have used strdup() in multiple places in the
 patch; e.g., in the below code snippet, is it going to lead to a crash
 if newp->ident is NULL because of strdup() failure?
 
 static EPlan *
 find_plan(char *ident, EPlan **eplan, int *nplans)
 {
 ...
  newp->ident = strdup(ident);
 ...
 }
 
The previous code used unchecked malloc, so this doesn't change
anything.  It's only example code anyway.  (The use of malloc instead of
palloc is dubious anyway.)

 
 2. Can we rely on the return value of asprintf() instead of recomputing
 the length with strlen(cmdbuf)?
 
   if (asprintf(&cmdbuf, "COMMENT ON LARGE OBJECT %u IS '", loid) < 0)
       return fail_lo_xact("\\lo_import", own_transaction);
   bufptr = cmdbuf + strlen(cmdbuf);
 
Yes, good point.
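
A minimal standalone sketch of what that change looks like (GNU/BSD
asprintf; the OID value and command text here are just placeholders):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        char       *cmdbuf;
        unsigned int loid = 16384;      /* placeholder large-object OID */
        int         len;

        /* asprintf returns the number of characters printed, or -1 */
        len = asprintf(&cmdbuf, "COMMENT ON LARGE OBJECT %u IS '", loid);
        if (len < 0)
            return 1;           /* allocation failed; cmdbuf is undefined */

        char       *bufptr = cmdbuf + len;      /* no strlen() needed */

        printf("%s (next write at offset %d)\n", cmdbuf, (int) (bufptr - cmdbuf));
        free(cmdbuf);
        return 0;
    }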

 3. There seems to be a naming convention for functions like pfree (which
 is similar to free()), pstrdup, etc., but psprintf seems different from
 sprintf. Can't we use pg_asprintf instead in the patch, since both
 (psprintf and pg_asprintf) seem to provide almost the same
 functionality?

pg_asprintf uses malloc, psprintf uses palloc, so they have to be
different functions.

The naming convention where

psomething = something with memory context management
pg_something = something with error checking

isn't very helpful, but it's what we have.  Better ideas welcome.

 
 5. It seems that in the previous implementation there was no error
 handling; is it appropriate here that, if there is an issue duplicating
 the environment, the application quits? I.e.:
 
 
 /*
  * Handy subroutine for setting an environment variable "var" to "val"
  */
 static void
 doputenv(const char *var, const char *val)
 {
  char *s;
  pg_asprintf(&s, "%s=%s", var, val);
  putenv(s);
 }
 
I think so, yes.







Re: [HACKERS] Extensions makefiles - coverage

2013-09-21 Thread Peter Eisentraut
On Thu, 2013-07-25 at 17:07 +0200, Ronan Dunklau wrote:
 I am using approximatively the layout that was proposed here:
 http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net
 It looks like everything is hard-coded to take the source and the
 gcda, gcno files in the base directory, but these files lay in a src
 directory with the proposed layout.

The PostgreSQL build system isn't going to work very well if you build
files outside of the current directory.  If you want to put your source
files into a src/ subdirectory, then your top-level makefile should do a
$(MAKE) -C src, and you need to have a second makefile in the src
directory.  If you do that, then the existing coverage targets will work
alright, I think.



