Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 4:38 PM, Noah Misch  wrote:
> No new ideas come to mind, here.

OK, I have a new idea.  :-)

1. Add a new flag to each procState called something like "timeToPayAttention".
2. Each call to SIGetDataEntries() iterates over all the ProcStates
whose index is < lastBackend and sets stateP->timeToPayAttention =
TRUE for each.
3. At the beginning of SIGetDataEntries(), we do an unlocked if
(!stateP->timeToPayAttention) return 0.
4. Immediately following that if statement and before acquiring any
locks, we set stateP->timeToPayAttention = FALSE.

The LWLockRelease() in SIGetDataEntries() will be sufficient to force
all of the stateP->timeToPayAttention writes out to main memory, and
the read side is OK because we either just took a lock (which acted as
a fence) or else there's a race anyway.  But unlike my previous
proposal, it doesn't involve *comparing* anything.  We needn't worry
about whether we could read two different values that are through
great misfortune out of sync because we're only reading one value.

If, by chance, the value is set to true just after we set it to false,
that won't mess us up either: we'll still read all the messages after
acquiring the lock.

Now, there's some downside to this approach - it involves doing O(N)
work for each invalidation message we receive.  But as long as it's
only O(N) stores and not O(N) lock acquisitions and releases, I think
that might be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 3:25 PM, Simon Riggs  wrote:
> On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas  wrote:
>> It wouldn't, although it might be bad in the case where there are lots
>> of temp tables being created and dropped.
>
> Do temp tables cause relcache invalidations?
>
> That seems like something we'd want to change in itself.

I agree.  Unfortunately, I think it's a non-trivial fix.

I've also been wondering if we could avoid taking an
AccessExclusiveLock on a newly created (temporary?) table.  It seems
like no one should be able to see it until commit, at which point we'd
be releasing the lock anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 3:34 PM, Simon Riggs  wrote:
> You might be right, but I think we have little knowledge of how some
> memory barrier code you haven't written yet effects performance on
> various architectures.
>
> A spinlock per backend would cache very nicely, now you mention it. So
> my money would be on the multiple copies.

Maybe so, but you can see from the numbers in my OP that the results
still leave something to be desired.

> It's not completely clear to me that updating N copies would be more
> expensive. Accessing N low contention copies rather than 1
> high-contention value might actually be a win.

Yeah, I haven't tested that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: bogus descriptions displayed by \d+

2011-07-26 Thread Josh Kupershmidt
On Tue, Jul 26, 2011 at 9:53 AM, Robert Haas  wrote:
> On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt  wrote:
> I think this is basically the right approach but I found what you did
> here a bit wordy, so I simplified it, committed it, and back-patched
> to 9.0 with suitable adjustment.  Hopefully I didn't simplify it into
> a form you don't like.

That looks fine. Minor grammar quibble about:

+  When commenting on a column,
+  relation_name must refer
+  to a table, view, composite types, or foreign table.

"types" should probably be the singular "type".

  * get rid of the bogus "Description" outputs for \d+ sequence_name
 and \d+ index_name

> Committed this part to head with minor tweaks.

Thanks for the commit.

  * The "Storage" column for \d+ sequence_name is correct, I suppose,
 but repetitive
>>>
>>> I'm OK with removing that.
>>
>> Hrm, would it be better to keep that  Storage bit around in some
>> non-repetitive form, maybe on its own line below the table output?
>
> Well, I don't really see that it has any value.  I'd probably just
> leave it the way it is, but if we're going to change something, I
> would favor removing it over relocating it.

I notice the "Storage" information is also repeated for multi-column
indexes. I don't mind leaving this wart as-is for now, since
single-column indexes are probably the norm, and we would presumably
want to fix both types in one go.

Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New partitioning WAS: Check constraints on partition parents only?

2011-07-26 Thread Josh Berkus
Jim,

> That's why I'd be opposed to any partitioning scheme that removed the ability 
> to have different fields in different children. We've found that ability to 
> be very useful. Likewise, I think we need to have intelligent plans involving 
> a parent table that's either completely empty or mostly empty.

Well, I don't think that anyone is proposing making constraint exclusion
go away.  However, we also need a new version of partitioning which
happens "below" the table level.  I don't agree that the new
partitioning needs -- at least at the start -- the level of flexibility
which CE gives the user.  In order to get simplicity, we have to
sacrifice flexibility.

In fact, I'd suggest extreme simplicity for the first version of this,
with just key partitioning.  That is:

CREATE TABLE  (
... cols ... )
PARTITION ON 
[ AUTOMATIC CREATE ];

... where  can be any immutable expression on one or
more columns of some_table.  This actually covers range partitioning as
well, provided that the ranges can be expressed as the results of an
expression (e.g. EXTRACT ('month' FROM date_processed ) ).

For the optional AUTOMATIC CREATE phrase, new values for key_expression
would result in the automatic creation of new partitions when they
appear (this has some potential deadlocking issues, so it's not ideal
for a lot of applications).  Otherwise, you'd create partitions manually:

CREATE PARTITION ON  KEY ;
DROP PARTITION ON  KEY ;

... where  is some valid value which could result from
.

Yes, this is a very narrow and simplistic partitioning spec.  However,
it would cover 80% of the use cases I see in the field or on IRC, while
being 80% simpler than CE.  And CE would still be there for those who
need it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Jim Nasby
On Jul 26, 2011, at 5:56 PM, Christopher Browne wrote:
>> I'm assuming that the issue here is that multiple backends could be 
>> connected to the same database, and we don't want all of them to try to 
>> actually do the updates, only the first one that discovers the change. If 
>> that's the problem you foresee then perhaps it's a non-issue... if each 
>> backend only updates things that have actually changed, and they do that 
>> with race-free 'merge' logic, then only the first backend to attempt the 
>> update would end up finding actual work to do.
>> 
>> Or are you seeing a problem I'm missing?
> 
> What if 4 backends concurrently are the "first ones" to try to
> simultaneously add "South America/Ruritania", and...
> 
> 1.  Connection #1 came in 'first', but rolls back its transaction.
> 2.  Connection #2 came in 'second', and also winds up rolling back its
> transaction because the connection fails due to a network problem.
> 3.  Connection #3 actually completes.  But doesn't commit until after #4.
> 4.  Connection #4 started last, but turns out to COMMIT first.

Ugh, I didn't realize that a reload would take effect in the middle of a 
transaction. That certainly kills what I proposed.

Though, now I'm wondering why this info would need to be in every database 
anyway... certainly this should be treated as global data, and if that's the 
case then only one process needs to update it. Though I'm not sure if it's 
possible for global data to be ACID.

Anyway, rather than continue this on-list, I'm going to get Alvaro to think 
about it in more detail and see what he comes up with.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Jim Nasby
On Jul 25, 2011, at 9:59 PM, Jerry Sievers wrote:
> That our version of partitioning can be overloaded like this though I
> think adds power.  A bit of which we lost adding the restrictgion.

That's why I'd be opposed to any partitioning scheme that removed the ability 
to have different fields in different children. We've found that ability to be 
very useful. Likewise, I think we need to have intelligent plans involving a 
parent table that's either completely empty or mostly empty.

As for dealing with inheritance and putting stuff on some children but not 
others, take a look at http://pgfoundry.org/projects/enova-tools/. There's a 
presentation there that discusses how we solved these issues and it includes 
the tools we created to do it. Note that we're close to releasing a cleaner 
version of that stuff, so if you decide to use it please ping me off-list if we 
haven't gotten the new stuff posted.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Christopher Browne
On Tue, Jul 26, 2011 at 6:45 PM, Jim Nasby  wrote:
> On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
>> On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby  wrote:
>>> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to 
>>> it dynamically all the time? Perhaps we can enforce that we'll only 
>>> recognize new TZ info as part of a config reload?
>>
>> Hmm.  That might work in theory, but I don't see any good way to
>> update every database's tz table on each reload.
>
> I'm assuming that the issue here is that multiple backends could be connected 
> to the same database, and we don't want all of them to try to actually do the 
> updates, only the first one that discovers the change. If that's the problem 
> you foresee then perhaps it's a non-issue... if each backend only updates 
> things that have actually changed, and they do that with race-free 'merge' 
> logic, then only the first backend to attempt the update would end up finding 
> actual work to do.
>
> Or are you seeing a problem I'm missing?

What if 4 backends concurrently are the "first ones" to try to
simultaneously add "South America/Ruritania", and...

1.  Connection #1 came in 'first', but rolls back its transaction.
2.  Connection #2 came in 'second', and also winds up rolling back its
transaction because the connection fails due to a network problem.
3.  Connection #3 actually completes.  But doesn't commit until after #4.
4.  Connection #4 started last, but turns out to COMMIT first.

The "merge" is a pretty bad one.  They all have to try to succeed, and
in some way that doesn't block things.

Perhaps the TZ values need to not be UNIQUE, but some process can come
in afterwards and rewrite to drop out non-unique values.  That's not
very nice either; that means you can't use a FK reference against the
TZ table.

Or you need to have something that comes in afterwards and repoints
tuples to the *real* TZ entry, which seems likely to be troublesome.

This just gets so ugly so fast; the attempt to save space by storing a
pointer to the TZ value is just filled with trouble (and potential
#fail).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Jim Nasby
On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
> On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby  wrote:
>> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to 
>> it dynamically all the time? Perhaps we can enforce that we'll only 
>> recognize new TZ info as part of a config reload?
> 
> Hmm.  That might work in theory, but I don't see any good way to
> update every database's tz table on each reload.

I'm assuming that the issue here is that multiple backends could be connected 
to the same database, and we don't want all of them to try to actually do the 
updates, only the first one that discovers the change. If that's the problem 
you foresee then perhaps it's a non-issue... if each backend only updates 
things that have actually changed, and they do that with race-free 'merge' 
logic, then only the first backend to attempt the update would end up finding 
actual work to do.

Or are you seeing a problem I'm missing?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing newlines at end of error messages

2011-07-26 Thread David Fetter
This seems like a mechanical check.  Should it be part of what gets
checked when people push?

Cheers,
David.
On Tue, Jul 26, 2011 at 08:30:16PM +, Peter Eisentraut wrote:
> Add missing newlines at end of error messages
> 
> Branch
> --
> master
> 
> Details
> ---
> http://git.postgresql.org/pg/commitdiff/e67efb01e886d69d40d1cd87fba4507e8bb1035e
> 
> Modified Files
> --
> src/bin/psql/command.c |2 +-
> src/bin/psql/common.c  |2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> -- 
> Sent via pgsql-committers mailing list (pgsql-committ...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-committers

-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch  writes:
> On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
>> Dirty cache line, maybe not, but what if the assembly code commands the
>> CPU to load those variables into CPU registers before doing the
>> comparison?  If they're loaded with maxMsgNum coming in last (or at
>> least after resetState), I think you can have the problem without any
>> assumptions about cache line behavior at all.  You just need the process
>> to lose the CPU at the right time.

> True.  If the compiler places the resetState load first, you could hit the
> anomaly by "merely" setting a breakpoint on the next instruction, waiting for
> exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend 
> continue.
> I think, though, we should either plug that _and_ the cache incoherency case 
> or
> worry about neither.

How do you figure that?  The poor-assembly-code-order risk is both a lot
easier to fix and a lot higher probability.  Admittedly, it's still way
way down there, but you only need a precisely-timed sleep, not a
precisely-timed sleep *and* a cache line that somehow remained stale.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > That's the theoretical risk I wished to illustrate.  Though this appears
> > possible on an abstract x86_64 system, I think it's unrealistic to suppose 
> > that
> > a dirty cache line could persist *throughout* the issuance of more than 10^9
> > invalidation messages on a concrete implementation.
> 
> Dirty cache line, maybe not, but what if the assembly code commands the
> CPU to load those variables into CPU registers before doing the
> comparison?  If they're loaded with maxMsgNum coming in last (or at
> least after resetState), I think you can have the problem without any
> assumptions about cache line behavior at all.  You just need the process
> to lose the CPU at the right time.

True.  If the compiler places the resetState load first, you could hit the
anomaly by "merely" setting a breakpoint on the next instruction, waiting for
exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
I think, though, we should either plug that _and_ the cache incoherency case or
worry about neither.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

2011-07-26 Thread Tom Lane
Yeb Havinga  writes:
> A few days ago I read Tomas Vondra's blog post about dss tpc-h queries 
> on PostgreSQL at 
> http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in 
> which he showed how to manually pull up a dss subquery to get a large 
> speed up. Initially I thought: cool, this is probably now handled by 
> Hitoshi's patch, but it turns out the subquery type in the dss query is 
> different.

Actually, I believe this example is the exact opposite of the
transformation Hitoshi proposes.  Tomas was manually replacing an
aggregated subquery by a reference to a grouped table, which can be
a win if the subquery would be executed enough times to amortize
calculation of the grouped table over all the groups (some of which
might never be demanded by the outer query).  Hitoshi was talking about
avoiding calculations of grouped-table elements that we don't need,
which would be a win in different cases.  Or at least that was the
thrust of his original proposal; I'm not sure where the patch went since
then.

This leads me to think that we need to represent both cases as the same
sort of query and make a cost-based decision as to which way to go.
Thinking of it as a pull-up or push-down transformation is the wrong
approach because those sorts of transformations are done too early to
be able to use cost comparisons.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch  writes:
> On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
>> After some further reflection I believe this patch actually is pretty
>> safe, although Noah's explanation of why seems a bit confused.

> Here's the way it can fail:

> 1. Backend enters SIGetDataEntries() with main memory bearing 
> stateP->resetState
> = false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505.  The CPU has those
> latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
> 2. Backend stalls for .  Meanwhile, other backends issue
> MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
> stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
> segP->maxMsgNum = 500.
> 3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
> 500 from main memory.  The patch's test finds no need to reset or process
> invalidation messages.

[ squint... ]  Hmm, you're right.  The case where this would break
things is if (some of) the five unprocessed messages relate to some
object we've just locked.  But the initial state you describe would be
valid right after obtaining such a lock.

> That's the theoretical risk I wished to illustrate.  Though this appears
> possible on an abstract x86_64 system, I think it's unrealistic to suppose 
> that
> a dirty cache line could persist *throughout* the issuance of more than 10^9
> invalidation messages on a concrete implementation.

Dirty cache line, maybe not, but what if the assembly code commands the
CPU to load those variables into CPU registers before doing the
comparison?  If they're loaded with maxMsgNum coming in last (or at
least after resetState), I think you can have the problem without any
assumptions about cache line behavior at all.  You just need the process
to lose the CPU at the right time.

If we marked the pointers volatile, we could probably ensure that the
assembly code tests resetState last, but that's not sufficient to avoid
the stale-cache-line risk.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] isolation test deadlocking on buildfarm member coypu

2011-07-26 Thread Alvaro Herrera
After committing Noah's patch to fix the isolation tests, there have
been two more failures in Rémi's machines pika and coypu:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-07-24%2006%3A45%3A45
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2011-07-23%2021%3A50%3A54

In both cases, the failure is identical in fk-deadlock2:

== pgsql.20950/src/test/isolation/regression.diffs 
===
*** 
/home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/expected/fk-deadlock2.out
 Sun Jul 24 08:46:44 2011
--- 
/home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/results/fk-deadlock2.out
  Sun Jul 24 15:11:42 2011
***
*** 22,29 
  step s2u1:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  step s1u2:  UPDATE B SET Col2 = 1 WHERE BID = 2;  
  step s2u2:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
- step s1u2: <... completed>
  ERROR:  deadlock detected
  step s1c:  COMMIT; 
  step s2c:  COMMIT; 
  
--- 22,29 
  step s2u1:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  step s1u2:  UPDATE B SET Col2 = 1 WHERE BID = 2;  
  step s2u2:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  ERROR:  deadlock detected
+ step s1u2: <... completed>
  step s1c:  COMMIT; 
  step s2c:  COMMIT; 



I think the only explanation necessary for this is that one process
reports its status before the other one.  I think it would be enough to
add another variant of the expected file to fix this problem, but I
don't quite want to do that because we already have three of them, and I
think we would need to add one per existing expected, so we'd end up
with 6 expected files which would be a pain to work with.

Not quite sure what to do with this.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-26 Thread Florian Pflug
Hi

While reviewing the (now applied) XPATH escaping patches, Radoslaw found one
case where the previous failure of XPATH to escape its return value was offset
by XMLATTRIBUTES insistence to escape all input values, even if they're
already of type XML.

To wit, if you do

  SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES('&'::XML AS "a"))

you get

 xmlelement 

 

which is somewhat surprising. Especially since

  SELECT XMLELEMENT(NAME "t", '&'::XML);

gives

  xmlelement  
--
 &

as one would except. Now, it seems rather unlikely that a real application
would actually contain a query similar to the former one - you usually don 't
store XML in the attributes of XML nodes. But since XPATH() returns XML, it's
not unlikely that an application would do

  SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...))

As it stands, on 9.2 the values returned by the XPath expression will be
escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1
the value will be escaped only once.

Since this seems to be the most likely situation in which XMLATTRIBUTES would
receive an input argument of type XML, and since it's not entirely logical for
XMLATTRIBUTES to unconditionally escapes values already of type XML, I propose
to change the behaviour of XMLATTRIBUTES as follows.

Values not of type XML are be treated as they always have been, i.e. all
special characters (<,>,&,",') are replaced by an entity reference (<,
>, &, ", ').

Values of type XML are assumed to be already escaped, so '&' isn't treated as
a special character to avoid double escaping. The other special characters
(<,>,",') are escaped as usual.

The safety of this relies on the fact that '&' may never appear in a
well-formed XML document, except as part of an entity reference. This seems to
be the case, except if CDATA sections are involved, which may contain plain
ampersands. So the actual escaping rule would have to be a bit more complex
than I made it sound above - we'd need to detect CDATA sections, and
re-enabled the escaping of '&' until the end of such a section.

To actually implement this, I'd remove the use of libxml from the
implementation of XMLELEMENT, and instead use our already existing
escape_xml() function, enhanced with the ability to handle the partial
escaping algorithm outlined above. We already only use libxml to escape
attribute values, so doing that isn't a radical departure from xml.c's ways.
As an added benefit, all the encoding-related problems of XMLELEMENT and
XMLATTRIBUTES would go away once libxml is removed from this code path. So
XPATH() would be the only remaining function which breaks in non-UTF-8
databases.

Comments? Thoughts? Suggestions?

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 01:36:26PM -0400, Robert Haas wrote:
> On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch  wrote:
> > On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote:
> >> On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch  wrote:
> >> > This is attractive, and I don't see any problems with it.  (In theory, 
> >> > you could
> >> > hit a case where the load of resetState gives an ancient "false" just as 
> >> > the
> >> > counters wrap to match.  Given that the wrap interval is 100x as 
> >> > long as the
> >> > reset interval, I'm not worried about problems on actual silicon.)
> >>
> >> It's actually 262,144 times as long - see MSGNUMWRAPAROUND.
> >
> > Ah, so it is.
> >
> >> It would be pretty easy to eliminate even the theoretical possibility
> >> of a race by getting rid of resetState altogether and using nextMsgNum
> >> = -1 to mean that.  Maybe I should go ahead and do that.
> >
> > Seems like a nice simplification.
> 
> On further reflection, I don't see that this helps: it just moves the
> problem around.

Alas, yes.

> With resetState as a separate variable, nextMsgNum is
> never changed by anyone other than the owner, so we can never have a
> stale load.

It's also changed when SICleanupQueue() decides to wrap everyone.  This could
probably be eliminated by using uint32 and letting overflow take care of wrap
implicitly, but ...

> But if we overload nextMsgNum to also indicate whether
> our state has been reset, then there's a race between when we load
> nextMsgNum and when we load maxMsgNum (instead of code I posted
> previously, which has a race between when we load resetState and when
> we load maxMsgNum).  Now, as you say, it seems really, really
> difficult to hit that in practice, but I don't see a way of getting
> rid of the theoretical possibility without either (1) a spinlock or
> (2) a fence.  (Of course, on x86, the fence could be optimized down to
> a compiler barrier.)

No new ideas come to mind, here.  We could migrate back toward your original
proposal of making the counter a non-wrapping uint64; Florian described some
nice techniques for doing atomic 64-bit reads on x86:

http://archives.postgresql.org/message-id/7c94c386-122f-4918-8624-a14352e8d...@phlo.org

I'm not personally acquainted with those approaches, but they sound promising.
Since the overlap between 32-bit installations and performance-sensitive
installations is all but gone, we could even just use a spinlock under 32-bit.

> I guess the question is "should we worry about that?".

I'd lean toward "no".  I share Tom's unease about writing off a race condition
as being too improbable, but this is quite exceptionally improbable.  On the
other hand, some of the fixes don't look too invasive.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
> >> I think I have a simpler idea, though:
> >> before acquiring any locks, just have SIGetDataEntries() do this:
> >> 
> >> +   if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
> >> +   return 0;
> >> 
> >> Patch (with comment explaining why I think this is OK) attached.  If
> >> the message numbers happen to be equal only because the counter has
> >> wrapped, then stateP->resetState will be true, so we'll still realize
> >> we need to do some work.
> 
> > This is attractive, and I don't see any problems with it.  (In theory, you 
> > could
> > hit a case where the load of resetState gives an ancient "false" just as the
> > counters wrap to match.  Given that the wrap interval is 100x as long 
> > as the
> > reset interval, I'm not worried about problems on actual silicon.)
> 
> > +1 for doing this and moving on.
> 
> After some further reflection I believe this patch actually is pretty
> safe, although Noah's explanation of why seems a bit confused.  The key
> points are that (1) each of the reads is atomic, and (2) we should not
> be able to see a value that is older than our last acquisition/release
> of a shared memory lock.  These being granted, we will make a decision
> that is correct, or at least was correct as of some time after that last
> lock action.  As stated in the patch comments, we are not required to
> promptly assimilate sinval actions on objects we don't hold any lock on,
> so this seems sufficient.  In essence, we are relying on an assumed
> prior visit to the lock manager to provide memory access synchronization
> for the sinval no-work-to-do test.
> 
> The case Noah speculates about above amounts to supposing that this
> reasoning fails to hold for the resetState value, and I don't see why
> that would be true.  Even if the above scenario did manage to happen,
> it would not mean that we missed the reset, just that we hadn't noticed
> it *yet*.  And by hypothesis, none of the as-yet-not-seen catalog
> updates are a problem for us.

Here's the way it can fail:

1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState
= false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505.  The CPU has those
latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
2. Backend stalls for .  Meanwhile, other backends issue
MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
segP->maxMsgNum = 500.
3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
500 from main memory.  The patch's test finds no need to reset or process
invalidation messages.

That's the theoretical risk I wished to illustrate.  Though this appears
possible on an abstract x86_64 system, I think it's unrealistic to suppose that
a dirty cache line could persist *throughout* the issuance of more than 10^9
invalidation messages on a concrete implementation.

A way to think about the problem is that our read of segP->maxMsgNum is too new.
If we had used a snapshot of values as of the most recent lock
acquisition/release, there would be no problem.  It's the mix of a new-enough
stateP with an all-too-new segP that yields the anomaly.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> Patch attached.

Will check and apply this.

> I've pondered whether to add a check to configure which verifies that
> the headers match the libxml version exactly at compile time. In the end,
> I didn't do that, for two reasons. First, there isn't anything wrong with
> using older headers together with a newer libxml, so long as both versions
> are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
> the only way to exercise libxml's promised ABI compatibility is to upgrade
> the libxml 2package after compiling postgres. That, however, is unlikely
> to happen except on production servers, and so by adding the check we'd
> increase the chance of ABI-compatibility problems remaining undetected
> during development and testing.

I concur with *not* doing that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch  writes:
> On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
>> I think I have a simpler idea, though:
>> before acquiring any locks, just have SIGetDataEntries() do this:
>> 
>> +   if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
>> +   return 0;
>> 
>> Patch (with comment explaining why I think this is OK) attached.  If
>> the message numbers happen to be equal only because the counter has
>> wrapped, then stateP->resetState will be true, so we'll still realize
>> we need to do some work.

> This is attractive, and I don't see any problems with it.  (In theory, you 
> could
> hit a case where the load of resetState gives an ancient "false" just as the
> counters wrap to match.  Given that the wrap interval is 100x as long as 
> the
> reset interval, I'm not worried about problems on actual silicon.)

> +1 for doing this and moving on.

After some further reflection I believe this patch actually is pretty
safe, although Noah's explanation of why seems a bit confused.  The key
points are that (1) each of the reads is atomic, and (2) we should not
be able to see a value that is older than our last acquisition/release
of a shared memory lock.  These being granted, we will make a decision
that is correct, or at least was correct as of some time after that last
lock action.  As stated in the patch comments, we are not required to
promptly assimilate sinval actions on objects we don't hold any lock on,
so this seems sufficient.  In essence, we are relying on an assumed
prior visit to the lock manager to provide memory access synchronization
for the sinval no-work-to-do test.

The case Noah speculates about above amounts to supposing that this
reasoning fails to hold for the resetState value, and I don't see why
that would be true.  Even if the above scenario did manage to happen,
it would not mean that we missed the reset, just that we hadn't noticed
it *yet*.  And by hypothesis, none of the as-yet-not-seen catalog
updates are a problem for us.

BTW, the patch really ought to add something to the comment around
line 100.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas  wrote:

 * Can we partition the sinval lock, so we have multiple copies? That
 increases the task for those who trigger an invalidation, but will
 relieve the pressure for most readers.
>>>
>>> Not sure there's a way to meaningfully partition the queue.  In any
>>> case, I think the problem being dealt with here is how to update the
>>> read heads of the queue, not its contents.
>>
>> I agree there's no meaningful way to partition the queue, but we can
>> store the information in more than one place to reduce the contention
>> of people requesting it.
>
> I thought about that.  Basically, that saves you a factor of N in
> contention on the read side (where N is the number of copies) and
> costs you a factor of N on the write side (you have to update N
> copies, taking a spinlock or lwlock for each).  In the limit, you
> could do one copy of the counter per backend.
>
> I think, though, that a lock-free implementation using memory barriers
> is going to end up being the only real solution.  We might possibly
> convince ourselves that we're OK with increasing the cost of
> SIInsertDataEntries(), but any solution that involves replication the
> data is still based on doing at least some locking.  And I am pretty
> well convinced that even one spinlock acquisition in
> SIInsertDataEntries() is too many.

You might be right, but I think we have little knowledge of how some
memory barrier code you haven't written yet effects performance on
various architectures.

A spinlock per backend would cache very nicely, now you mention it. So
my money would be on the multiple copies.

It's not completely clear to me that updating N copies would be more
expensive. Accessing N low contention copies rather than 1
high-contention value might actually be a win.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-07-26 Thread Alexander Korotkov
On Tue, Jul 26, 2011 at 9:24 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> That was a quite off-the-cuff remark, so I took the patch and culled out
> loaded-tree bookkeeping. A lot of other changes fell off from that, so it
> took me quite some time to get it working again, but here it is. This is a
> *lot* smaller patch, although that's partly explained by the fact that I
> left out some features: prefetching and the neighbor relocation code is
> gone.
>
> I'm pretty exhausted by this, so I just wanted to send this out without
> further analysis. Let me know if you have questions on the approach taken.
> I'm also not sure how this performs compared to your latest patch, I haven't
> done any performance testing. Feel free to use this as is, or as a source of
> inspiration :-).

I also was going to try to evade keeping loaded-tree hash. This might help
me a lot. Thanks.

--
With best regards,
Alexander Korotkov.


[HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

2011-07-26 Thread Yeb Havinga

On 2011-07-22 17:35, Hitoshi Harada wrote:

2011/7/23 Yeb Havinga:

Works like a charm :-). However, now there is always a copyObject of a
subquery even when the subquery is not safe for qual pushdown. The problem
with the previous issafe was that it was only assigned for
rel->baserestrictinfo != NIL. If it is assigned before the if statement, it
still works. See attached patch that avoids subquery copy for unsafe
subqueries, and also exits best_inner_subqueryscan before palloc of
differenttypes in case of unsafe queries.


Ah, yeah, right. Too quick fix bloated my brain :P Thanks for testing!
I'll check it more.


A few days ago I read Tomas Vondra's blog post about dss tpc-h queries 
on PostgreSQL at 
http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in 
which he showed how to manually pull up a dss subquery to get a large 
speed up. Initially I thought: cool, this is probably now handled by 
Hitoshi's patch, but it turns out the subquery type in the dss query is 
different.


The original and rewritten queries are below. The debug_print_plan 
output shows the subquery is called from a opexpr (< l_quantity, 
subquery output) and the sublink type is EXPR_SUBLINK. Looking at the 
source code; pull_up_sublink only considers ANY and EXISTS sublinks. I'm 
wondering if this could be expanded to deal with EXPR sublinks. Clearly 
in the example Tomas has given this can be done. I'm wondering if there 
are show stoppers that prevent this to be possible in the general case, 
but can't think of any, other than the case of a sublink returning NULL 
and the opexpr is part of a larger OR expression or IS NULL; in which 
case it should not be pulled op, or perhaps it could be pulled up as 
outer join.


Thoughts?

regards,
Yeb


The original query:

tpch=# explain select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#13'
and p_container = 'JUMBO PKG'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
)
LIMIT 1;
 QUERY PLAN

 Limit  (cost=183345309.79..183345309.81 rows=1 width=8)
   ->  Aggregate  (cost=183345309.79..183345309.81 rows=1 width=8)
 ->  Hash Join  (cost=2839.99..183345307.76 rows=815 width=8)
   Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
   Join Filter: (public.lineitem.l_quantity < (SubPlan 1))
   ->  Seq Scan on lineitem  (cost=0.00..68985.69 
rows=2399869 width=17)

   ->  Hash  (cost=2839.00..2839.00 rows=79 width=4)
 ->  Seq Scan on part  (cost=0.00..2839.00 rows=79 
width=4)
   Filter: ((p_brand = 'Brand#13'::bpchar) AND 
(p_container = 'JUMBO PKG'::bpchar))

   SubPlan 1
 ->  Aggregate  (cost=74985.44..74985.46 rows=1 width=5)
   ->  Seq Scan on lineitem  (cost=0.00..74985.36 
rows=31 width=5)

 Filter: (l_partkey = part.p_partkey)

manually rewritten variant:

tpch=# explain select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part,
(SELECT l_partkey AS agg_partkey,
0.2 * avg(l_quantity) AS avg_quantity
  FROM lineitem GROUP BY l_partkey) part_agg
where
p_partkey = l_partkey
and agg_partkey = l_partkey
and p_brand = 'Brand#13'
and p_container = 'JUMBO PKG'
and l_quantity < avg_quantity
LIMIT 1;
   QUERY PLAN

 Limit  (cost=179643.88..179643.89 rows=1 width=8)
   ->  Aggregate  (cost=179643.88..179643.89 rows=1 width=8)
 ->  Hash Join  (cost=161865.21..178853.91 rows=315985 width=8)
   Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
   Join Filter: (public.lineitem.l_quantity < ((0.2 * 
avg(public.lineitem.l_quantity
   ->  HashAggregate  (cost=80985.04..82148.65 rows=77574 
width=9)
 ->  Seq Scan on lineitem  (cost=0.00..68985.69 
rows=2399869 width=9)

   ->  Hash  (cost=80849.63..80849.63 rows=2444 width=21)
 ->  Hash Join  (cost=2839.99..80849.63 rows=2444 
width=21)
   Hash Cond: (public.lineitem.l_partkey = 
part.p_partkey)
   ->  Seq Scan on lineitem  
(cost=0.00..68985.69 rows=2399869 width=17)
   ->  Hash  (cost=2839.00..2839.00 rows=79 
width=4)
 ->  Seq Scan on part  
(cost=0.00..2839.00 rows=79 width=4)
   Filter: ((p_brand = 
'Brand#13'::bpchar) AND (p_container = 'JUMBO PKG'::bpchar))




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your su

Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas  wrote:

> It wouldn't, although it might be bad in the case where there are lots
> of temp tables being created and dropped.

Do temp tables cause relcache invalidations?

That seems like something we'd want to change in itself.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Simon Riggs  writes:
> On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
>  wrote:
>> Signals are already in use for special cases (queue is full), and I
>> think going through the kernel to achieve much more will lower
>> performance significantly.

> If there are no invalidations, there would be no signals. How would
> zero signals decrease performance?

But if there *is* an invalidation (which is not a negligible case),
it'd get significantly slower.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 2:56 PM, Simon Riggs  wrote:
> On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
>  wrote:
>> Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:
>>
>>> Let me ask a few questions to stimulate a different solution
>>>
>>> * Can we do this using an active technique (e.g. signals) rather than
>>> a passive one (reading a counter?)
>>
>> Signals are already in use for special cases (queue is full), and I
>> think going through the kernel to achieve much more will lower
>> performance significantly.
>
> If there are no invalidations, there would be no signals. How would
> zero signals decrease performance?

It wouldn't, although it might be bad in the case where there are lots
of temp tables being created and dropped.  I think the biggest problem
with signals is that they don't provide any meaningful synchronization
guarantees.  When you send somebody a signal, you don't really know
how long it's going to take for them to receive it.

>>> * Can we partition the sinval lock, so we have multiple copies? That
>>> increases the task for those who trigger an invalidation, but will
>>> relieve the pressure for most readers.
>>
>> Not sure there's a way to meaningfully partition the queue.  In any
>> case, I think the problem being dealt with here is how to update the
>> read heads of the queue, not its contents.
>
> I agree there's no meaningful way to partition the queue, but we can
> store the information in more than one place to reduce the contention
> of people requesting it.

I thought about that.  Basically, that saves you a factor of N in
contention on the read side (where N is the number of copies) and
costs you a factor of N on the write side (you have to update N
copies, taking a spinlock or lwlock for each).  In the limit, you
could do one copy of the counter per backend.

I think, though, that a lock-free implementation using memory barriers
is going to end up being the only real solution.  We might possibly
convince ourselves that we're OK with increasing the cost of
SIInsertDataEntries(), but any solution that involves replication the
data is still based on doing at least some locking.  And I am pretty
well convinced that even one spinlock acquisition in
SIInsertDataEntries() is too many.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
 wrote:
> Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:
>
>> Let me ask a few questions to stimulate a different solution
>>
>> * Can we do this using an active technique (e.g. signals) rather than
>> a passive one (reading a counter?)
>
> Signals are already in use for special cases (queue is full), and I
> think going through the kernel to achieve much more will lower
> performance significantly.

If there are no invalidations, there would be no signals. How would
zero signals decrease performance?


>> * Can we partition the sinval lock, so we have multiple copies? That
>> increases the task for those who trigger an invalidation, but will
>> relieve the pressure for most readers.
>
> Not sure there's a way to meaningfully partition the queue.  In any
> case, I think the problem being dealt with here is how to update the
> read heads of the queue, not its contents.

I agree there's no meaningful way to partition the queue, but we can
store the information in more than one place to reduce the contention
of people requesting it.

Both those ideas are still on the table.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Alvaro Herrera
Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:

> Let me ask a few questions to stimulate a different solution
> 
> * Can we do this using an active technique (e.g. signals) rather than
> a passive one (reading a counter?)

Signals are already in use for special cases (queue is full), and I
think going through the kernel to achieve much more will lower
performance significantly.

> * Can we partition the sinval lock, so we have multiple copies? That
> increases the task for those who trigger an invalidation, but will
> relieve the pressure for most readers.

Not sure there's a way to meaningfully partition the queue.  In any
case, I think the problem being dealt with here is how to update the
read heads of the queue, not its contents.

> * Can we put the sinval info in a different place? e.g. inside each
> lock partition.

I don't think you can relate sinval messages to particular locks, which
would make this infeasible.  I think this bit is directly related to the
point above.

> * Why do we have a different mechanism for cache invalidation
> internally (sinval) to the one we offer externally (LISTEN/NOTIFY)?
> Why don't we have just one?

Performance.  Not sure there are other reasons as well; but just this
one is significant enough.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 17:07 , Tom Lane wrote:
> Florian Pflug  writes:
>> What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
>> to elog(FATAL)? Shall I do that too, or leave it for now?
> 
> No objection here --- I had considered writing it that way myself.
> I refrained for fear of making a bad situation even worse, but if
> other people think FATAL would be the safer way, fine.

Patch attached.

pg_xml_init() now checks the error context after setting it, and
ereport(ERROR)s if the check fails. I've made it so that the errhint()
which suggest that compiling against libxml <= 2.7.3 but running
against > 2.7.3 might be the reason is only emitted if we actually
compiled against an older version. This is meant to avoid confusion
should there ever be another reason for that check to fail.

I've also changed the elog(ERROR) to elog(FATAL) in xml_errorHandler().

I've pondered whether to add a check to configure which verifies that
the headers match the libxml version exactly at compile time. In the end,
I didn't do that, for two reasons. First, there isn't anything wrong with
using older headers together with a newer libxml, so long as both versions
are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
the only way to exercise libxml's promised ABI compatibility is to upgrade
the libxml 2package after compiling postgres. That, however, is unlikely
to happen except on production servers, and so by adding the check we'd
increase the chance of ABI-compatibility problems remaining undetected
during development and testing.

best regards,
Florian Pflug


pg_xml_errctxcheck.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 6:36 PM, Robert Haas  wrote:

> Now, as you say, it seems really, really
> difficult to hit that in practice, but I don't see a way of getting
> rid of the theoretical possibility without either (1) a spinlock or
> (2) a fence.  (Of course, on x86, the fence could be optimized down to
> a compiler barrier.)  I guess the question is "should we worry about
> that?".

Perhaps the answer lies in a different direction altogether?

Let me ask a few questions to stimulate a different solution

* Can we do this using an active technique (e.g. signals) rather than
a passive one (reading a counter?)

* Can we partition the sinval lock, so we have multiple copies? That
increases the task for those who trigger an invalidation, but will
relieve the pressure for most readers.

* Can we put the sinval info in a different place? e.g. inside each
lock partition.

* Why do we have a different mechanism for cache invalidation
internally (sinval) to the one we offer externally (LISTEN/NOTIFY)?
Why don't we have just one?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Greg Smith

On 07/26/2011 12:33 PM, Pavan Deolasee wrote:

I think what I am suggesting is that the default pgbench test setup
would probably not give you a good scalability as number of clients
are increased and one reason could be the contention in the small
table. So it might be a good idea to get rid of that and see if we get
much better scalability and understand other bottlenecks.
   


You can easily see this form of contention pulling down results when the 
database itself is really small and the overall transaction rate is very 
high.  With Robert using a scale=100, no more than 80 clients, and 
transaction rates peaking at <10K TPS on a 24 core box, I wouldn't 
expect this form of contention to be a large issue.  It may be dropping 
results a few percent, but I'd be surprised if it was any more than that.


It's easy enough to use "-N" to skip the updates to the smaller tellers 
and branches table to pull that out of the way.  TPS will go up, because 
it's doing less per transaction.  That's not necessarily a better test 
case though, it's just a different one.  The regular case includes a lot 
of overwriting the same blocks in the hot branches and tellers tables.  
That effectively pushes a lot more I/O toward being likely to happen at 
checkpoint time.  Those tables rarely have any significant I/O outside 
of the checkpoint in the standard "TPC-B like" scenario, because their 
usage counts stay high most of the time.


Contention for small tables that are being heavily updated is still 
important to optimize too though.  Which type of test makes more sense 
depends on what aspect of performance you're trying to focus on.  I'll 
sometimes do a full pgbench-tools "sweep" (range of multiple scales and 
clients at each scale) in both regular write and "-N" write modes, just 
to collect the slightly different data each provides.  The form of 
checkpoint I/O spike you see at sync time is very different in the two 
cases, but both usage profiles are important to consider and optimize.  
That Robert has started with the regular case doesn't worry me too much 
now that I've seen the parameters he's using, he's not running it in a 
way where I'd expect branch/teller contention to be a major limiting 
factor on the results.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar jul 26 13:43:08 -0400 2011:

> Uh, yes.  I've lost count of the number of times I've seen people hit
> one-instruction-wide race condition windows, SIGSEGV crash based on
> accessing only one byte past the valid data structure, etc etc.

I think you need a better locking protocol on that counter of yours.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem with compiling beta3 on mingw32+WinXP

2011-07-26 Thread pasman pasmański
After reinstalling mingw is ok.


2011/7/25, pasman pasmański :
> After googling i found that mingw's gcc works with 64 bit integers.
> But printf is incompatible :( . Possible workaround: include
> inttypes.h , define macros and convert printf strings:
>
> printf("%" LL,(long long)100)
>
> 2011/7/25, pasman pasmański :
>> Hi.
>>
>> When i try to compile postgresql-beta3 on mingw32 ./configure pass ok,
>> but there is error when i do "make":
>>
>>
>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement
>>  -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv
>> -I../../src/port
>> -DFRONTEND -I../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
>> "-I../
>> ../src/include/port/win32"  -c -o crypt.o crypt.c
>> In file included from crypt.c:44:0:
>> ../../src/include/c.h:284:2: error: #error must have a working 64-bit
>> integer da
>> tatype
>> In file included from ../../src/include/c.h:851:0,
>>  from crypt.c:44:
>> ../../src/include/port.h:390:0: warning: "fseeko" redefined
>> ../../src/include/pg_config_os.h:228:0: note: this is the location of the
>> previo
>> us definition
>> ../../src/include/port.h:391:0: warning: "ftello" redefined
>> ../../src/include/pg_config_os.h:229:0: note: this is the location of the
>> previo
>> us definition
>> make[2]: *** [crypt.o] Error 1
>> make[2]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src/port'
>> make[1]: *** [all-port-recurse] Error 2
>> make[1]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src'
>> make: *** [all-src-recurse] Error 2
>>
>> Make version 3.81.
>> Status file included.
>> What is wrong ?
>>
>>
>> 
>> pasman
>>
>
>
> --
> 
> pasman
>


-- 

pasman

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Robert Haas  writes:
> On further reflection, I don't see that this helps: it just moves the
> problem around.  With resetState as a separate variable, nextMsgNum is
> never changed by anyone other than the owner, so we can never have a
> stale load.  But if we overload nextMsgNum to also indicate whether
> our state has been reset, then there's a race between when we load
> nextMsgNum and when we load maxMsgNum (instead of code I posted
> previously, which has a race between when we load resetState and when
> we load maxMsgNum).  Now, as you say, it seems really, really
> difficult to hit that in practice, but I don't see a way of getting
> rid of the theoretical possibility without either (1) a spinlock or
> (2) a fence.  (Of course, on x86, the fence could be optimized down to
> a compiler barrier.)  I guess the question is "should we worry about
> that?".

Uh, yes.  I've lost count of the number of times I've seen people hit
one-instruction-wide race condition windows, SIGSEGV crash based on
accessing only one byte past the valid data structure, etc etc.
The worst thing about this type of bug is that you can't reproduce the
failure when you try; doesn't mean your users won't see it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch  wrote:
> On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote:
>> On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch  wrote:
>> > This is attractive, and I don't see any problems with it.  (In theory, you 
>> > could
>> > hit a case where the load of resetState gives an ancient "false" just as 
>> > the
>> > counters wrap to match.  Given that the wrap interval is 100x as long 
>> > as the
>> > reset interval, I'm not worried about problems on actual silicon.)
>>
>> It's actually 262,144 times as long - see MSGNUMWRAPAROUND.
>
> Ah, so it is.
>
>> It would be pretty easy to eliminate even the theoretical possibility
>> of a race by getting rid of resetState altogether and using nextMsgNum
>> = -1 to mean that.  Maybe I should go ahead and do that.
>
> Seems like a nice simplification.

On further reflection, I don't see that this helps: it just moves the
problem around.  With resetState as a separate variable, nextMsgNum is
never changed by anyone other than the owner, so we can never have a
stale load.  But if we overload nextMsgNum to also indicate whether
our state has been reset, then there's a race between when we load
nextMsgNum and when we load maxMsgNum (instead of code I posted
previously, which has a race between when we load resetState and when
we load maxMsgNum).  Now, as you say, it seems really, really
difficult to hit that in practice, but I don't see a way of getting
rid of the theoretical possibility without either (1) a spinlock or
(2) a fence.  (Of course, on x86, the fence could be optimized down to
a compiler barrier.)  I guess the question is "should we worry about
that?".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 4:40M, Pavan Deolasee  wrote:
> On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas  wrote:
>> On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith  wrote:
>>> On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.
>>>
>>> Database scale?  One or multiple pgbench worker threads?  A reminder on the
>>> amount of RAM in the server would be helpful for interpreting the results
>>> too.
>>
>> Ah, sorry.  scale = 100, so small.  pgbench invocation is:
>>
>
> It might be worthwhile to test only with the accounts and history
> table and also increasing the number of statements in a transaction.
> Otherwise the tiny tables can quickly become a bottleneck.

+1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 12:29 PM, Pavan Deolasee
 wrote:

>
> So many transactions trying to update a small set of rows in a table.
> Is that what we really want to measure ? My thinking is that we might
> see different result if they are updating different parts of the table
> and the transaction start/stop overhead is spread across few
> statements.
>

I think what I am suggesting is that the default pgbench test setup
would probably not give you a good scalability as number of clients
are increased and one reason could be the contention in the small
table. So it might be a good idea to get rid of that and see if we get
much better scalability and understand other bottlenecks.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 12:24 PM, Robert Haas  wrote:
> On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee
>  wrote:
>> On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas  wrote:
>>> On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith  wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:
>
> I did 5-minute pgbench runs with unlogged tables and with permanent
> tables, restarting the database server and reinitializing the tables
> between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.
>>>
>>> Ah, sorry.  scale = 100, so small.  pgbench invocation is:
>>>
>>
>> It might be worthwhile to test only with the accounts and history
>> table and also increasing the number of statements in a transaction.
>> Otherwise the tiny tables can quickly become a bottleneck.
>
> What kind of bottleneck?
>

So many transactions trying to update a small set of rows in a table.
Is that what we really want to measure ? My thinking is that we might
see different result if they are updating different parts of the table
and the transaction start/stop overhead is spread across few
statements.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread David Fetter
On Tue, Jul 26, 2011 at 12:18:31PM -0400, Timothy D. F. Lewis wrote:
> On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas  wrote:
> 
> > On Mon, Jul 25, 2011 at 9:37 AM, Tim  wrote:
> > > Updated the patch to also apply when the no-action flag is enabled.
> >
> > You may want to read this:
> >
> > http://wiki.postgresql.org/wiki/Submitting_a_Patch
> >
> > And add your patch here:
> >
> > https://commitfest.postgresql.org/action/commitfest_view/open
> >
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
> 
> I'm not sure what David did for me so as per Roberts suggestion I have
> addedthis
> patch to the commit fest.
> I'm hoping I have not put this patch in more than one
> workflow
> .

I just put your name as you've asked me to phrase it into the "pending
patches" part of the PostgreSQL Weekly News.

Thanks for the contribution, and here's hoping your experience here
will be pleasant and productive :)

Cheers,
David.

p.s.  We don't top-post on the PostgreSQL mailing lists.  As with the
customary "reply-to-all," it's a convention we've decided works well
for us. :)
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee
 wrote:
> On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas  wrote:
>> On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith  wrote:
>>> On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.
>>>
>>> Database scale?  One or multiple pgbench worker threads?  A reminder on the
>>> amount of RAM in the server would be helpful for interpreting the results
>>> too.
>>
>> Ah, sorry.  scale = 100, so small.  pgbench invocation is:
>>
>
> It might be worthwhile to test only with the accounts and history
> table and also increasing the number of statements in a transaction.
> Otherwise the tiny tables can quickly become a bottleneck.

What kind of bottleneck?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 18:04 , Noah Misch wrote:
> On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
>> On Jul26, 2011, at 01:57 , Noah Misch wrote:
>>> We could rearrange things so the xml2-config -L
>>> flags (or lack thereof) take priority over a --with-libraries directory for
>>> the purpose of finding libxml2.
>> 
>> Hm, how would we do that just for the purpose of finding libxml? Do autotools
>> provide a way to specify per-library -L flags?
> 
> Autoconf (built-in macros, that is) and Automake do not get involved in that. 
>  I
> vaguely recall Libtool having support for it, but PostgreSQL does not use
> Automake or Libtool.  I also spoke too loosely: -L is never per-library, but 
> you
> can emulate that by completing the search externally and passing a full path 
> to
> the linker.

Yeah, that was what I though would be the only way too. I had the slight hope
that I had missed something, but unfortunately it seems I didn't :-(

> Honestly, the benefit is probably too light to justify going to this trouble.

Yep. It'd probably be weeks, if not months, until we made that work on all
supported platforms. And by the time it does, we'd probably have re-invented
a sizeable portion of libtool.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Timothy D. F. Lewis
I'm not sure what David did for me so as per Roberts suggestion I have
addedthis
patch to the commit fest.
I'm hoping I have not put this patch in more than one
workflow
.

On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas  wrote:

> On Mon, Jul 25, 2011 at 9:37 AM, Tim  wrote:
> > Updated the patch to also apply when the no-action flag is enabled.
>
> You may want to read this:
>
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
>
> And add your patch here:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
> On Jul26, 2011, at 01:57 , Noah Misch wrote:
> > We could rearrange things so the xml2-config -L
> > flags (or lack thereof) take priority over a --with-libraries directory for
> > the purpose of finding libxml2.
> 
> Hm, how would we do that just for the purpose of finding libxml? Do autotools
> provide a way to specify per-library -L flags?

Autoconf (built-in macros, that is) and Automake do not get involved in that.  I
vaguely recall Libtool having support for it, but PostgreSQL does not use
Automake or Libtool.  I also spoke too loosely: -L is never per-library, but you
can emulate that by completing the search externally and passing a full path to
the linker.

Honestly, the benefit is probably too light to justify going to this trouble.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Jerry Sievers
Andrew Dunstan  writes:

> On 07/25/2011 10:31 PM, Jerry Sievers wrote:
>> Hackers;
>>
>> I just noticed that somewhere between 8.2 and 8.4, an exception is
>> raised trying to alter table ONLY some_partition_parent ADD CHECK
>> (foo).
>>
>
>
> 8.4 had this change:
>
>*
>
>  Force child tables to inherit CHECK constraints from parents
>  (Alex Hunsaker, Nikhil Sontakke, Tom)
>
>  Formerly it was possible to drop such a constraint from a
>  child table, allowing rows that violate the constraint to be
>  visible when scanning the parent table. This was deemed
>  inconsistent, as well as contrary to SQL standard.
>
>
> You're not the only one who occasionally bangs his head against it.
>
> cheers
>
> andrew

Thanks Andrew!...  Yeah, I figured it was a documented change but too
lazy tonight to browse release notes :-)

The previous behavior was to me convenient, but I agree, probably lead
to some confusion too.

That our version of partitioning can be overloaded like this though I
think adds power.  A bit of which we lost adding the restrictgion.
>
>
>
>

-- 
Jerry Sievers
e: jerry.siev...@comcast.net
p: 305.321.1144

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread _石头
Hello Guys,


   Lately, I saw this syntax in ./src/backend/parser/Gray.y as following!
   
  CreateFunctionStmt:
CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
RETURNS func_return createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = $5;
n->returnType = $7;
n->options = $8;
n->withClause = $9;
$$ = (Node *)n;
}
| CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
  RETURNS TABLE '(' table_func_column_list ')' 
createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = 
mergeTableFuncParameters($5, $9);
n->returnType = TableFuncTypeName($9);
n->returnType->location = @7;
n->options = $11;
n->withClause = $12;
$$ = (Node *)n;
}
| CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
  createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = $5;
n->returnType = NULL;
n->options = $6;
n->withClause = $7;
$$ = (Node *)n;
}
;


  I do not know how to use the second syntax:RETURNS TABLE '(' 
table_func_column_list ')' createfunc_opt_list opt_definition.


  May someone help me to write a simple example of this syntax!  Thank 
you very much. Looking forward for your help!

Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Tim Lewis
Hi Aron,

Thanks for the input. The "small change" you suggest would change the
behavior of the patch which I would prefer not to do as I have reasons for
the previous behavior.
Because you gave no reasons and "stop after removing LIMIT LOs" was not
changed to "stop after attempting to remove LIMIT LOs" I suspect you were
just trying to keep the code clean.

The difference:
In your version of the patch vacuumlo will stop after N lo_unlink(OID)
attempts.
The previous behavior of the patch is that vacuumlo will stop after N
successful lo_unlink(OID)s.

If you have good reason for your behavior please add another flag so that it
is optional.
There should be a clear distinction between "counting vs not", and "aborting
vs continuing" when a lo_unlink(OID) is unsuccessful.


On Tue, Jul 26, 2011 at 4:18 AM, Aron  wrote:

> Here's another small change to the patch, it works fine for me and it quite
> saved my day.
>
> I try to submit the patch by email.
>
>
> diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
> index f6e2a28..1f88d72 100644
> --- a/contrib/vacuumlo/vacuumlo.c
> +++ b/contrib/vacuumlo/vacuumlo.c
> @@ -48,6 +48,7 @@ struct _param
>char   *pg_host;
>int verbose;
>int dry_run;
> +   int transaction_limit;
>  };
>
>  intvacuumlo(char *, struct _param *);
> @@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
> }
>else
>deleted++;
> +   if(param->transaction_limit!=0 &&
> deleted>=param->transaction_limit)
> +   break;
>}
>PQclear(res);
>
> @@ -313,6 +316,7 @@ usage(const char *progname)
> printf("  -h HOSTNAME  database server host or socket
> directory\n");
>printf("  -n   don't remove large objects, just show what
> would be
> done\n");
>printf("  -p PORT  database server port\n");
> +   printf("  -l LIMIT stop after removing LIMIT LOs\n");
>printf("  -U USERNAME  user name to connect as\n");
>printf("  -w   never prompt for password\n");
>printf("  -W   force password prompt\n");
> @@ -342,6 +346,7 @@ main(int argc, char **argv)
> param.pg_port = NULL;
>param.verbose = 0;
>param.dry_run = 0;
> +   param.transaction_limit = 0;
>
>if (argc > 1)
>{
> @@ -359,7 +364,7 @@ main(int argc, char **argv)
>
>while (1)
>{
> -   c = getopt(argc, argv, "h:U:p:vnwW");
> +   c = getopt(argc, argv, "h:U:p:l:vnwW");
>if (c == -1)
>break;
>
> @@ -395,6 +400,14 @@ main(int argc, char **argv)
> }
>param.pg_port = strdup(optarg);
>break;
> +   case 'l':
> +   param.transaction_limit = strtol(optarg,
> NULL, 10);
> +   if ((param.transaction_limit < 0) ||
> (param.transaction_limit >
> 2147483647))
> +   {
> +   fprintf(stderr, "%s: invalid
> transaction limit number: %s, valid
> range is form 0(disabled) to 2147483647.\n", progname, optarg);
> +   exit(1);
> +   }
> +   break;
>case 'h':
>param.pg_host = strdup(optarg);
>break;
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 

Noodle
Connecting People, Content & Capabilities within the Enterprise


Toll Free: 866-258-6951 x 701
tim.le...@vialect.com
http://www.vialect.com

Noodle is a product of Vialect Inc

Follow Noodle on Twitter
http://www.twitter.com/noodle_news


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread David Fetter
On Tue, Jul 26, 2011 at 10:51:58AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
> >  wrote:
> >> Hmmm, but then it does open up the possibility of naive users shooting
> >> themselves in the foot. It can be easy to conjure up a
> >> parent-only-constraint that does not gel too well with its children. And
> >> that's precisely why this feature was added in the first place..
> 
> > Yeah, but I think we need to take that chance.  At the very least, we
> > need to support the equivalent of a non-inherited CHECK (false) on
> > parent tables.
> 
> No, the right solution is to invent an actual concept of partitioned
> tables, not to keep adding ever-weirder frammishes to inheritance so
> that it can continue to provide an awkward, poorly-performing emulation
> of them.

Other SQL engines have partitions of types list, range and hash, and
some can sub-partition.  I'm thinking it might be easiest to do the
first before adding layers of partition structure, although we should
probably bear in mind that such layers will eventually exist.

Does the wiki on this need updating?

http://wiki.postgresql.org/wiki/Table_partitioning

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas  wrote:
> On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith  wrote:
>> On 07/25/2011 04:07 PM, Robert Haas wrote:
>>>
>>> I did 5-minute pgbench runs with unlogged tables and with permanent
>>> tables, restarting the database server and reinitializing the tables
>>> between each run.
>>
>> Database scale?  One or multiple pgbench worker threads?  A reminder on the
>> amount of RAM in the server would be helpful for interpreting the results
>> too.
>
> Ah, sorry.  scale = 100, so small.  pgbench invocation is:
>

It might be worthwhile to test only with the accounts and history
table and also increasing the number of statements in a transaction.
Otherwise the tiny tables can quickly become a bottleneck.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Tom Lane  writes:
> Robert Haas  writes:
>> On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>>> I don't think so.  It would just be another headache for packagers ---
>>> in Red Hat distros, at least, packages are explicitly forbidden from
>>> containing any rpath settings.

>> So what do they do about Perl and Python?

> Patch the source to not add rpath switches.

No, I take that back.  What the RH packages do is configure with
--disable-rpath, and so long as that applies to this too, I'd
have no objection.

What I was mis-remembering is that there's a patch that *puts back*
plperl's rpath for libperl, because for some reason that no one has ever
satisfactorily explained to me, perl has an exemption from the distro
policy that loadable libraries must be installed so that they're known
to ldconfig.  Which of course is exactly why rpath isn't (supposed to
be) needed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
> Excerpts from Aron's message of mar jul 26 04:18:55 -0400 2011:
>> Here's another small change to the patch, it works fine for me and it quite
>> saved my day.
>> 
>> I try to submit the patch by email.
> 
> There's a problem with this patch: long lines are being wrapped by
> your email client, which makes it impossible to apply.  I think it would
> be safer if you sent it as an attachment instead of pasting it in the
> body of the message.

I posted using nabble, edited the post and uploaded the file as an attachment.

But here you go again, this time as an attachment.



vacuumlo.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
Hi Tim,

I have to correct my previous answer, my change does not alter the behavior of 
your patch significantly.

> The difference:
> In your version of the patch vacuumlo will stop after N lo_unlink(OID) 
> attempts.
> The previous behavior of the patch is that vacuumlo will stop after N 
> successful lo_unlink(OID)s.
> 
> If you have good reason for your behavior please add another flag so that it 
> is optional.
> There should be a clear distinction between "counting vs not", and "aborting 
> vs continuing" when a lo_unlink(OID) is unsuccessful.



if (param->dry_run == 0)
{
if (lo_unlink(conn, lo) < 0)
{
fprintf(stderr, "\nFailed to remove lo %u: ", 
lo);
fprintf(stderr, "%s", PQerrorMessage(conn));
}
else
deleted++;
}
else
deleted++;
if(param->transaction_limit!=0 && 
deleted>=param->transaction_limit)
break;


The variable "deleted" is only incremented if a lo_unlink was successful, so my 
patch only introduces a negligible overhead but no actual change in behavior.

I'm very grateful for your patch and I think it should be accepted as soon as 
possible, one or two "if" does not matter to me.

Aron
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
Hi Tim,

you correctly assumed I was just trying to clean up the code. There was no 
reason to change the behavior, so please ignore my change to the patch.

Aron


On 26.07.2011, at 15:44, Tim Lewis wrote:

> Hi Aron,
> 
> Thanks for the input. The "small change" you suggest would change the 
> behavior of the patch which I would prefer not to do as I have reasons for 
> the previous behavior.
> Because you gave no reasons and "stop after removing LIMIT LOs" was not 
> changed to "stop after attempting to remove LIMIT LOs" I suspect you were 
> just trying to keep the code clean.
> 
> The difference:
> In your version of the patch vacuumlo will stop after N lo_unlink(OID) 
> attempts.
> The previous behavior of the patch is that vacuumlo will stop after N 
> successful lo_unlink(OID)s.
> 
> If you have good reason for your behavior please add another flag so that it 
> is optional.
> There should be a clear distinction between "counting vs not", and "aborting 
> vs continuing" when a lo_unlink(OID) is unsuccessful.
> 
> 
> On Tue, Jul 26, 2011 at 4:18 AM, Aron  wrote:
> Here's another small change to the patch, it works fine for me and it quite
> saved my day.
> 
> I try to submit the patch by email.
> 
> 
> diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
> index f6e2a28..1f88d72 100644
> --- a/contrib/vacuumlo/vacuumlo.c
> +++ b/contrib/vacuumlo/vacuumlo.c
> @@ -48,6 +48,7 @@ struct _param
>char   *pg_host;
>int verbose;
>int dry_run;
> +   int transaction_limit;
>  };
> 
>  intvacuumlo(char *, struct _param *);
> @@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
>}
>else
>deleted++;
> +   if(param->transaction_limit!=0 &&
> deleted>=param->transaction_limit)
> +   break;
>}
>PQclear(res);
> 
> @@ -313,6 +316,7 @@ usage(const char *progname)
>printf("  -h HOSTNAME  database server host or socket directory\n");
>printf("  -n   don't remove large objects, just show what 
> would be
> done\n");
>printf("  -p PORT  database server port\n");
> +   printf("  -l LIMIT stop after removing LIMIT LOs\n");
>printf("  -U USERNAME  user name to connect as\n");
>printf("  -w   never prompt for password\n");
>printf("  -W   force password prompt\n");
> @@ -342,6 +346,7 @@ main(int argc, char **argv)
>param.pg_port = NULL;
>param.verbose = 0;
>param.dry_run = 0;
> +   param.transaction_limit = 0;
> 
>if (argc > 1)
>{
> @@ -359,7 +364,7 @@ main(int argc, char **argv)
> 
>while (1)
>{
> -   c = getopt(argc, argv, "h:U:p:vnwW");
> +   c = getopt(argc, argv, "h:U:p:l:vnwW");
>if (c == -1)
>break;
> 
> @@ -395,6 +400,14 @@ main(int argc, char **argv)
>}
>param.pg_port = strdup(optarg);
>break;
> +   case 'l':
> +   param.transaction_limit = strtol(optarg, 
> NULL, 10);
> +   if ((param.transaction_limit < 0) || 
> (param.transaction_limit >
> 2147483647))
> +   {
> +   fprintf(stderr, "%s: invalid transaction 
> limit number: %s, valid
> range is form 0(disabled) to 2147483647.\n", progname, optarg);
> +   exit(1);
> +   }
> +   break;
>case 'h':
>param.pg_host = strdup(optarg);
>break;
> 
> 
> --
> View this message in context: 
> http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 
> 
> 
> -- 
> 
> Noodle
> Connecting People, Content & Capabilities within the Enterprise
> 
> 
> Toll Free: 866-258-6951 x 701
> tim.le...@vialect.com
> http://www.vialect.com
> 
> Noodle is a product of Vialect Inc
> 
> Follow Noodle on Twitter
> http://www.twitter.com/noodle_news
> 



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
> to elog(FATAL)? Shall I do that too, or leave it for now?

No objection here --- I had considered writing it that way myself.
I refrained for fear of making a bad situation even worse, but if
other people think FATAL would be the safer way, fine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>> I don't think so.  It would just be another headache for packagers ---
>> in Red Hat distros, at least, packages are explicitly forbidden from
>> containing any rpath settings.

> So what do they do about Perl and Python?

Patch the source to not add rpath switches.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
>  wrote:
>> Hmmm, but then it does open up the possibility of naive users shooting
>> themselves in the foot. It can be easy to conjure up a
>> parent-only-constraint that does not gel too well with its children. And
>> that's precisely why this feature was added in the first place..

> Yeah, but I think we need to take that chance.  At the very least, we
> need to support the equivalent of a non-inherited CHECK (false) on
> parent tables.

No, the right solution is to invent an actual concept of partitioned
tables, not to keep adding ever-weirder frammishes to inheritance so
that it can continue to provide an awkward, poorly-performing emulation
of them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 16:22 , Tom Lane wrote:
> Florian Pflug  writes:
>> On further reflection, instead of checking whether we can restore the error
>> handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
>> in pg_xml_done() to ereport(ERROR), and include a hint that explains the
>> probably cause.
> 
>> The upside being that we only fail when we actually need to restore the
>> error handler. Since there's one caller (parse_xml_decl) who calls
>> pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
>> isn't only academic. At least XML would output will continue to work
>> work after libxml is upgraded from < 2.7.4 to >= 2.7.4.
> 
> Good point.  But what about failing in pg_xml_init?  That is, after
> calling xmlSetStructuredErrorFunc, check that it set the variable we
> expected it to set.

Yeah, that's even better. Will do it that way.

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

> The purpose of the check in pg_xml_done is not to detect library issues,
> but to detect omitted save/restore pairs and similar coding mistakes.
> I don't want to upgrade it to an ERROR, and I don't want to confuse
> people by hinting that the problem is with libxml.

I can see the concern about possible confusion, and I agree that
pg_xml_init(), right after we set our error handler, is the most logical
place to test whether we can read it back.

I guess I don't really see the benefit of the check in pg_xml_done()
triggering a WARNING instead of an ERROR, but since we won't be hijacking
that check now anyway, I don't mind it being a WARNING either.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Alvaro Herrera
Excerpts from Aron's message of mar jul 26 04:18:55 -0400 2011:
> Here's another small change to the patch, it works fine for me and it quite
> saved my day.
> 
> I try to submit the patch by email.

There's a problem with this patch: long lines are being wrapped by
your email client, which makes it impossible to apply.  I think it would
be safer if you sent it as an attachment instead of pasting it in the
body of the message.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> On further reflection, instead of checking whether we can restore the error
> handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
> in pg_xml_done() to ereport(ERROR), and include a hint that explains the
> probably cause.

> The upside being that we only fail when we actually need to restore the
> error handler. Since there's one caller (parse_xml_decl) who calls
> pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
> isn't only academic. At least XML would output will continue to work
> work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

Good point.  But what about failing in pg_xml_init?  That is, after
calling xmlSetStructuredErrorFunc, check that it set the variable we
expected it to set.

The purpose of the check in pg_xml_done is not to detect library issues,
but to detect omitted save/restore pairs and similar coding mistakes.
I don't want to upgrade it to an ERROR, and I don't want to confuse
people by hinting that the problem is with libxml.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 15:52 , Tom Lane wrote:
> Florian Pflug  writes:
>> The only fix I can come up with is to explicitly test whether or not we're
>> able to restore the structured error context in pg_xml_init_library().
> 
> Yeah, I think you are right.
> 
>> The downside of this is that a libxml version upgrade might break postgres,
> 
> Such an upgrade would break Postgres anyway --- better we are able to
> detect and report it clearly than that we fail in bizarre ways.

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probably cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML would output will continue to work
work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

If nobody objects, I'll post a patch to that effect later today.

BTW, come to think of it, I believe the elog(ERROR) that you put into
xml_errorHandler should to be upgraded to elog(FATAL). Once we longjmp()
out of libxml, it doesn't seem safe to re-enter it, so making the backend
exit seems to be the only safe thing to do.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deferred partial/expression unique constraints

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 2:29 PM, Jeff Davis  wrote:
> On Fri, 2011-07-22 at 23:35 +0300, Peter Eisentraut wrote:
>> On ons, 2011-07-13 at 11:26 -0400, Tom Lane wrote:
>> > Our standard reason for not implementing UNIQUE constraints on
>> > expressions has been that then you would have a thing that claims to be
>> > a UNIQUE constraint but isn't representable in the information_schema
>> > views that are supposed to show UNIQUE constraints.  We avoid this
>> > objection in the current design by shoving all that functionality into
>> > EXCLUDE constraints, which are clearly outside the scope of the spec.
>>
>> I have never heard that reason before, and I think it's a pretty poor
>> one.  There are a lot of other things that are not representable in the
>> information schema.

+1.

> I think what Tom is saying is that the information_schema might appear
> inconsistent to someone following the spec.
>
> Can you give another example where we do something like that?

http://archives.postgresql.org/pgsql-bugs/2010-08/msg00374.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>>> As a side note, we don't add an rpath for libxml2 like we do for Perl and
>>> Python.
>
>> Sounds like we should change that.
>
> I don't think so.  It would just be another headache for packagers ---
> in Red Hat distros, at least, packages are explicitly forbidden from
> containing any rpath settings.

So what do they do about Perl and Python?

Not sure I understand the virtue of being inconsistent here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: bogus descriptions displayed by \d+

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt  wrote:
> That seems like a good way to document this; patch for master updated.
> I avoided mucking with the documentation for COMMENT ON RULE and
> COMMENT ON TRIGGER this time; they both say "table" when they really
> mean "table or view", but maybe trying to differentiate between
> "table", "table_or_view", and "relation" will make things overly
> complicated.

I think this is basically the right approach but I found what you did
here a bit wordy, so I simplified it, committed it, and back-patched
to 9.0 with suitable adjustment.  Hopefully I didn't simplify it into
a form you don't like.

>>> Also, a patch against master to:
>>>  * get rid of the bogus "Description" outputs for \d+ sequence_name
>>> and \d+ index_name
>>
>> This part looks OK, but instead of doing a negative test (not-index,
>> not-sequence) let's have it do a positive test, for the same types
>> comment.c allows.
>
> Right, fixed.

Committed this part to head with minor tweaks.

>>> And while I'm messing with this, some further nitpicks about psql not
>>> addressed by these patches:
>>>  * The "Storage" column for \d+ sequence_name is correct, I suppose,
>>> but repetitive
>>
>> I'm OK with removing that.
>
> Hrm, would it be better to keep that  Storage bit around in some
> non-repetitive form, maybe on its own line below the table output?

Well, I don't really see that it has any value.  I'd probably just
leave it the way it is, but if we're going to change something, I
would favor removing it over relocating it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> What's more, xml.c actually does attempt to protect against this situation
> by calling LIBXML_TEST_VERSION in pg_xml_init_library().

> But that check doesn't fire in our case, because it explicitly allows the
> actual library version to be newer than the header file version, as long
> as the major versions agree (the major version being "2" in this case).

> So in essence, libxml promises ABI backwards-compatibility, but breaks
> that promise when it comes to restoring the structured error context.

Fun.

> The only fix I can come up with is to explicitly test whether or not we're
> able to restore the structured error context in pg_xml_init_library().

Yeah, I think you are right.

> The downside of this is that a libxml version upgrade might break postgres,

Such an upgrade would break Postgres anyway --- better we are able to
detect and report it clearly than that we fail in bizarre ways.

>> As a side note, we don't add an rpath for libxml2 like we do for Perl and
>> Python.

> Sounds like we should change that.

I don't think so.  It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Andrew Dunstan



On 07/26/2011 09:08 AM, Robert Haas wrote:

On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
  wrote:

Yeah.  I think it's good that there's a barrier to blindly dropping a
constraint that may be important to have on children, but there should
be a way to override that.

Hmmm, but then it does open up the possibility of naive users shooting
themselves in the foot. It can be easy to conjure up a
parent-only-constraint that does not gel too well with its children. And
that's precisely why this feature was added in the first place..

Yeah, but I think we need to take that chance.  At the very least, we
need to support the equivalent of a non-inherited CHECK (false) on
parent tables.


Indeed. I usually enforce that with a trigger that raises an exception, 
but of course that doesn't help at all with constraint exclusion, and I 
saw a result just a few weeks ago (I forget the exact details) where it 
appeared that the plan chosen was significantly worse because the parent 
table wasn't excluded, so there's a  non-trivial downside from having 
this restriction.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] longstanding mingw warning

2011-07-26 Thread Andrew Dunstan


Why do we get this warning on Mingw?:

   x86_64-w64-mingw32-gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing 
-fwrapv -g -I../../../../src/include 
-I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include 
-I../pgsql/src/include/port/win32 -DEXEC_BACKEND  -I/c/prog/mingwdep/include 
"-I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include/port/win32" 
-DBUILDING_DLL  -c -o mingwcompat.o 
/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c
   
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c:60:1:
 warning: 'RegisterWaitForSingleObject' redeclared without dllimport attribute: 
previous dllimport ignored


Can we get rid of it?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
 wrote:
>> Yeah.  I think it's good that there's a barrier to blindly dropping a
>> constraint that may be important to have on children, but there should
>> be a way to override that.
>
> Hmmm, but then it does open up the possibility of naive users shooting
> themselves in the foot. It can be easy to conjure up a
> parent-only-constraint that does not gel too well with its children. And
> that's precisely why this feature was added in the first place..

Yeah, but I think we need to take that chance.  At the very least, we
need to support the equivalent of a non-inherited CHECK (false) on
parent tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write scalability

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith  wrote:
> On 07/25/2011 04:07 PM, Robert Haas wrote:
>>
>> I did 5-minute pgbench runs with unlogged tables and with permanent
>> tables, restarting the database server and reinitializing the tables
>> between each run.
>
> Database scale?  One or multiple pgbench worker threads?  A reminder on the
> amount of RAM in the server would be helpful for interpreting the results
> too.

Ah, sorry.  scale = 100, so small.  pgbench invocation is:

pgbench -T $time -c $clients -j $clients

128GB RAM.

> The other thing I'd recommend if you're running more write-heavy tests is to
> turn off autovacuum.  Whether or not it kicks in depends on the test
> duration and the TPS rate, which adds a source of variability better avoided
> here. It also means that faster tests end up getting penalized by having it
> run near their end, which makes them no longer look like fast results.

Well, I kind of don't want to do that, because surely you wouldn't run
that way in production.  Anyway, it seems clear that nearly all the
contention is in CLogControlLock, ProcArrayLock, and WALInsertLock,
and I don't think shutting off autovacuum is going to change that.
The point of my benchmarking here may be a bit different from the
benchmarking you normally do: I'm not trying to figure out whether
I've got the best hardware, or how good it is, so I don't really care
about squeezing out every last tps.  I just need number that can be
reproduced quickly with different patches to see whether a given code
change helps and by how much.  I wasn't sure that 5-minute pgbench
runs would be good enough for that, but so far the results have been
stable enough that I'm fairly confident I'm seeing real effects rather
than random noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 01:57 , Noah Misch wrote:
> On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:
>> On Jul25, 2011, at 20:37 , Bernd Helmle wrote:
>>> Ah, but i got now what's wrong here: configure is confusing both libxml2
>>> installations, and a quick look into config.log proves that: it uses the
>>> xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
>>> MacPorts, though i seem to recall to have changed this in the past).
> 
>> Hm, but I still think there's a bug lurking there. Using a different libxml2
>> version for the configure checks than for actual builds surely isn't good...
>> 
>> From looking at configure.in, it seems that we use xml2-config to figure out
>> the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
>> somehow end up not using these flags when we later test for
>> xmlStructuredErrorContext, but do use them during the actual build. Or maybe
>> the order of the -I and -L flags just ends up being different in the two 
>> cases.
> 
> I can reproduce similar behavior on GNU/Linux.  If my setup was sufficiently
> similar, Bernd's problematic build would have used this sequence of directives
> during both configuration and build:
> 
>  -I/usr/include/libxml2  -I/opt/local/include   -L/opt/local/lib
> 
> The directories passed using --with-includes and --with-libraries took
> priority over those from xml2-config.  Since libxml2 headers live in a
> `libxml2' subdirectory, --with-includes=/opt/local/include did not affect
> finding them.  --with-libraries=/opt/local/lib *did* affect finding the
> library binaries, though.  Therefore, he built entirely against /usr headers
> and /opt/local libraries.

Right. Thanks for detailed analysis, Noah!

What's more, xml.c actually does attempt to protect against this situation
by calling LIBXML_TEST_VERSION in pg_xml_init_library().

But that check doesn't fire in our case, because it explicitly allows the
actual library version to be newer than the header file version, as long
as the major versions agree (the major version being "2" in this case).

So in essence, libxml promises ABI backwards-compatibility, but breaks
that promise when it comes to restoring the structured error context.

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library(). I'm
envisioning we'd so something like

xmlStructuredErrorFunc *saved_errfunc = xmlStructuredError;
#if HAVE_XMLSTRUCTUREDERRORCONTEXT
void *saved_errctx = xmlStructuredErrorContext;
#else
void *saved_errctx = xmlGenericErrorContext;
#endif

xmlSetStructuredErrorFunc((void *) MAGIC, xml_errorHandler);

#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
if (xmlStructuredErrorContext != MAGIC)
#else 
if (xmlGenericErrorContext != MAGIC)
#endif
{
  ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("unable to restore the libxml error context"),
 errhint("Postgres was probably built against libxml < 2.7.4 but loaded 
a version >= 2.7.4 at runtime")))
}

xmlSetStructuredErrorFunc(saved_errctx, saved_errfunc);

The downside of this is that a libxml version upgrade might break postgres,
of course. But by the time postgres 9.1 is released, 2.7.4 will be nearly 3
years old, so I dunno how like it is that a distro would afterwards decide to
upgrade from a version earlier than that to a version later than that.

> We could rearrange things so the xml2-config -L
> flags (or lack thereof) take priority over a --with-libraries directory for
> the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

> As a side note, we don't add an rpath for libxml2 like we do for Perl and
> Python.  That doesn't matter on Darwin, but with GNU libc, it entails setting
> LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
> the library binary used at build time.

Sounds like we should change that.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 回复: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread _石头
thank you!
 
 
-- 原始邮件 --
发件人: "Heikki Linnakangas"; 
发送时间: 2011年7月26日(星期二) 下午3:58
收件人: "_??"; 
抄送: "pgsql-hackers"; 
主题: Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

 
On 26.07.2011 10:22, _?? wrote:
>I do not know how to use the second syntax:RETURNS TABLE '(' 
> table_func_column_list ')' createfunc_opt_list opt_definition.

This is hardly a question related to PostgreSQL development, 
pgsql-general mailing list would've been more appropriate.

This is documented at:
http://www.postgresql.org/docs/9.0/static/sql-createfunction.html

There's an example using that syntax on that page.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron
Here's another small change to the patch, it works fine for me and it quite
saved my day.

I try to submit the patch by email.


diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
index f6e2a28..1f88d72 100644
--- a/contrib/vacuumlo/vacuumlo.c
+++ b/contrib/vacuumlo/vacuumlo.c
@@ -48,6 +48,7 @@ struct _param
char   *pg_host;
int verbose;
int dry_run;
+   int transaction_limit;
 };
 
 intvacuumlo(char *, struct _param *);
@@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
}
else
deleted++;
+   if(param->transaction_limit!=0 &&
deleted>=param->transaction_limit)
+   break;
}
PQclear(res);
 
@@ -313,6 +316,7 @@ usage(const char *progname)
printf("  -h HOSTNAME  database server host or socket directory\n");
printf("  -n   don't remove large objects, just show what would 
be
done\n");
printf("  -p PORT  database server port\n");
+   printf("  -l LIMIT stop after removing LIMIT LOs\n");
printf("  -U USERNAME  user name to connect as\n");
printf("  -w   never prompt for password\n");
printf("  -W   force password prompt\n");
@@ -342,6 +346,7 @@ main(int argc, char **argv)
param.pg_port = NULL;
param.verbose = 0;
param.dry_run = 0;
+   param.transaction_limit = 0;
 
if (argc > 1)
{
@@ -359,7 +364,7 @@ main(int argc, char **argv)
 
while (1)
{
-   c = getopt(argc, argv, "h:U:p:vnwW");
+   c = getopt(argc, argv, "h:U:p:l:vnwW");
if (c == -1)
break;
 
@@ -395,6 +400,14 @@ main(int argc, char **argv)
}
param.pg_port = strdup(optarg);
break;
+   case 'l':
+   param.transaction_limit = strtol(optarg, NULL, 
10);
+   if ((param.transaction_limit < 0) || 
(param.transaction_limit >
2147483647))
+   {
+   fprintf(stderr, "%s: invalid transaction 
limit number: %s, valid
range is form 0(disabled) to 2147483647.\n", progname, optarg);
+   exit(1);
+   }
+   break;
case 'h':
param.pg_host = strdup(optarg);
break;


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Nikhil Sontakke
> > 8.4 had this change:
> >
> > *
> >   Force child tables to inherit CHECK constraints from parents
> >   (Alex Hunsaker, Nikhil Sontakke, Tom)
>
> > You're not the only one who occasionally bangs his head against it.
>
>
Sorry for the occasional head bumps :)


> Yeah.  I think it's good that there's a barrier to blindly dropping a
> constraint that may be important to have on children, but there should
> be a way to override that.
>
>
Hmmm, but then it does open up the possibility of naive users shooting
themselves in the foot. It can be easy to conjure up a
parent-only-constraint that does not gel too well with its children. And
that's precisely why this feature was added in the first place..

Regards,
Nikhils


Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread Heikki Linnakangas

On 26.07.2011 10:22, _ʯͷ wrote:

   I do not know how to use the second syntax:RETURNS TABLE '(' 
table_func_column_list ')' createfunc_opt_list opt_definition.


This is hardly a question related to PostgreSQL development, 
pgsql-general mailing list would've been more appropriate.


This is documented at:
http://www.postgresql.org/docs/9.0/static/sql-createfunction.html

There's an example using that syntax on that page.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread _石头
Hello Guys,


   Lately, I saw this syntax in ./src/backend/parser/Gray.y as following!
   
  CreateFunctionStmt:
CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
RETURNS func_return createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = $5;
n->returnType = $7;
n->options = $8;
n->withClause = $9;
$$ = (Node *)n;
}
| CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
  RETURNS TABLE '(' table_func_column_list ')' 
createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = 
mergeTableFuncParameters($5, $9);
n->returnType = TableFuncTypeName($9);
n->returnType->location = @7;
n->options = $11;
n->withClause = $12;
$$ = (Node *)n;
}
| CREATE opt_or_replace FUNCTION func_name 
func_args_with_defaults
  createfunc_opt_list opt_definition
{
CreateFunctionStmt *n = 
makeNode(CreateFunctionStmt);
n->replace = $2;
n->funcname = $4;
n->parameters = $5;
n->returnType = NULL;
n->options = $6;
n->withClause = $7;
$$ = (Node *)n;
}
;


  I do not know how to use the second syntax:RETURNS TABLE '(' 
table_func_column_list ')' createfunc_opt_list opt_definition.


  May someone help me to write a simple example of this syntax!  Thank 
you very much. Looking forward for your help!