Re: [HACKERS] Performance problem in PLPgSQL
On Wednesday, July 24, 2013 4:23 AM Tom Lane wrote: > I wrote: > > Marc Cousin writes: > >> The example below is of course extremely simplified, and obviously > not > >> what we are really doing in the database, but it exhibits the > slowdown > >> between 9.1.9 and 9.2.4. > > > Hm. Some experimentation shows that the plan cache is failing to > switch > > to a generic plan, but I'm not sure why the cast would have anything > to > > do with that ... > > Hah, I see why: > > (gdb) s > 1009if (plansource->generic_cost < avg_custom_cost * 1.1) > (gdb) p plansource->generic_cost > $18 = 0.012501 > (gdb) p avg_custom_cost > $19 = 0.01 > (gdb) p avg_custom_cost * 1.1 > $20 = 0.011001 > > That is, the estimated cost of the custom plan is just the evaluation > time for a simple constant, while the estimated cost of the generic > plan > includes a charge for evaluation of int4_numeric(). That makes the > latter more than ten percent more expensive, and since this logic isn't > considering anything else at all (particularly not the cost of > planning), it thinks that makes the custom plan worth picking. > > We've been around on this before, but not yet thought of a reasonable > way to estimate planning cost, let alone compare it to estimated > execution costs. Up to now I hadn't thought that was a particularly > urgent issue, but this example shows it's worth worrying about. > > One thing that was suggested in the previous discussion is to base the > planning cost estimate on the length of the rangetable. We could > do something trivial like add "10 * (length(plan->rangetable) + 1)" > in this comparison. > > Another thing that would fix this particular issue, though perhaps not > related ones, is to start charging something nonzero for ModifyTable > nodes, say along the lines of one seq_page_cost per inserted/modified > row. That would knock the total estimated cost for an INSERT up enough > that the ten percent threshold wouldn't be exceeded. Shouldn't it consider new value of boundparam to decide whether a new custom plan is needed, as that can be one of the main reason why it would need different plan. Current behavior is either it will choose generic plan or build a new custom plan with new parameters based on Choose_custom_plan(). Shouldn't the behavior of this be as below: a. choose generic plan b. choose one of existing custom plan c. create new custom plan The choice can be made based on the new value of bound parameter. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
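For illustration, a minimal sketch of the kind of statement that hits this (the table and statement are invented, not Marc's actual test case): an INSERT whose int4 parameter must be cast to numeric, so the generic plan carries an int4_numeric() evaluation charge that the constant-folded custom plans do not.

CREATE TABLE t (val numeric);

PREPARE ins(int) AS INSERT INTO t VALUES ($1);  -- $1 is coerced via int4_numeric()

-- After the first few executions the plan cache starts comparing
--   generic_cost < avg_custom_cost * 1.1
-- and, because of the cast charge on the generic side, it keeps choosing
-- (and re-planning) a custom plan on every subsequent EXECUTE.
EXECUTE ins(1);
EXECUTE ins(2);
EXECUTE ins(3);
EXECUTE ins(4);
EXECUTE ins(5);
EXECUTE ins(6);  -- in the problem scenario, still a custom plan

The PL/pgSQL case behaves the same way, since its SPI plans go through the same plan-cache logic.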
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 3:20 AM, Jeff Janes wrote: > On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? > > The heap structure used in external sorts is cache-unfriendly. The > bigger the heap used, the more this unfriendliness becomes apparent. > And the bigger maintenance_work_mem, the bigger the heap used. > > The bigger heap also means you have fewer "runs" to merge in the > external sort. However, as long as the number of runs still fits in > the same number of merge passes, this is generally not a meaningful > difference. Does fewer runs mean more time (in whichever phase of external sort)? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
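For anyone who wants to reproduce the effect being discussed, a sketch of the comparison (table, column and index names are invented; trace_sort is a developer option that needs a build with TRACE_SORT, which is defined by default):

SET trace_sort = on;

SET maintenance_work_mem = '16MB';
CREATE INDEX t_a_idx ON t (a);  -- external sort: many small runs, small heap
DROP INDEX t_a_idx;

SET maintenance_work_mem = '256MB';
CREATE INDEX t_a_idx ON t (a);  -- still external on a big table: fewer runs,
                                -- but a much larger, less cache-friendly heap

The trace_sort log lines show when the switch to external sort happens and how the runs are written and merged, which is where the timing difference described above becomes visible.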
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Tuesday, July 23, 2013 5:06 AM Josh Berkus wrote: > All, > > Christophe just discovered something with include files which is going > to cause issues with ALTER SYSTEM SET. > > So, take as a hypothetical that you use the default postgresql.conf > file, which sets shared_buffers = 32MB. > > Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = > '1GB', which updates config/postgresql.auto.conf. > > Then you restart PostgreSQL. > > Everything is hunky-dory, until a later occasion where you *reload* > postgresql. Then postgres startup hits the first "shared_buffers=32MB" > (in postgresql.conf), says "I can't change shared buffers on a reload", > and aborts before ever getting to postgresql.auto.conf. It doesn't abort after showing initial message "parameter %s cannot be changed without restart", rather it processes all remaining parameters. We can even test it by setting 1 postmaster and 1 sighup variable, after reload, even though it log message for postmaster variable, but it will set the sighup variable. So in this the real problem is that there is some message in log which can create confusion, otherwise the behavior will be exactly what user wants as per Postgres specs. To avoid the message, following can be done: 1. We need to make sure that in function ProcessConfigFile before calling set_config_option(), list of parameters is unique. a. it can be done while storing the elements in list as suggested by me in my yesterday's mail. http://www.postgresql.org/message-id/004c01ce8761$b8a4ab20$29ee0160$@kap...@huawei.com b. after parsing all the config files, make sure the list contain one entry for any variable (giving priority to elements that come later) 2. As this is just a confusing message issue, we can even update the docs to explain this behavior. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
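To make the scenario concrete, a sketch of the sequence being described (assuming the proposed ALTER SYSTEM SET syntax, which writes to config/postgresql.auto.conf):

-- postgresql.conf still says shared_buffers = 32MB
ALTER SYSTEM SET shared_buffers = '1GB';  -- recorded in postgresql.auto.conf

-- a restart picks up 1GB; a mere reload:
SELECT pg_reload_conf();
-- logs something like
--   LOG:  parameter "shared_buffers" cannot be changed without restart
-- for the postmaster-level entry, but keeps processing the remaining
-- parameters, so sighup-level settings later in the files still take effect.
-- Only the log message itself is potentially confusing.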
Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Noah Misch said: > Other aggregates based on this syntax might not desire such type unification. Then there would have to be some way to distinguish that. Maybe those could have -1 and the standard hypothetical set functions -2, with some flag in CREATE AGGREGATE to sort it out. > Having parse analysis do that distorts the character of an "any" argument. I > think the proper place for such processing is the first call to a transition > function. Except there isn't one. > But let's not make the > parser presume that an aggordnargs=-1 aggregate always wants its "any" > arguments handled in the manner of the standard hypothetical set functions. This has to happen in the parser because these are errors that should be caught before execution: rank(foo) within group (order by bar,baz) rank(integercol) within group (order by textcol) And collations have to be resolved (pairwise) before sorting can happen: rank(textval COLLATE "C") within group (order by foo) -- sorts in "C" rank(textval COLLATE "C") within group (order by bar COLLATE "en_US") -- error (basically, in rank(x) within group (order by y) where x and y are collatable, the collation rules apply exactly as though you were doing (x < y), with all the implicit vs. explicit stuff included) And this: rank(1.1) within group (order by intcol) should become rank(1.1) within group (order by intcol::numeric) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shorter iterations of join_info_list
[ sorry for slow response, this month has been mostly crazy ] Antonin Houska writes: > As far as I understand, deconstruct_recurse() ensures that > SpecialJoinInfo of a new join always gets added to higher position in > join_info_list than SJ infos of all joins located below the new join in > the tree. I wonder if we can rely on that fact sometimes. FWIW, I think of most of those planner lists as being unordered sets. Depending on a particular ordering definitely adds fragility; so I'd not want to introduce such a dependency without solid evidence of a substantial speed improvement. The cases you mention don't seem very likely to offer any noticeable gain at all --- at least, I don't recall seeing any of those functions show up high in profiles. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Suggestion for concurrent index creation using a single full scan operation
On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote: > I haven't given this a lot of thought, but it struck me that when > rebuilding tables (be it for a restore process, or some other operational > activity) - there is more often than not a need to build an index or two, > sometimes many indexes, against the same relation. > > It strikes me that in order to build just one index, we probably need to > perform a full table scan (in a lot of cases). If we are building > multiple indexes sequentially against that same table, then we're probably > performing multiple sequential scans in succession, once for each index. Check. > Could we architect a mechanism that allowed multiple index creation > statements to execute concurrently, with all of their inputs fed directly > from a single sequential scan against the full relation? > > From a language construct point of view, this may not be trivial to > implement for raw/interactive SQL - but possibly this is a candidate for > the custom format restore? As Greg Stark mentioned, pg_restore can already issue index build commands in parallel. Where applicable, that's probably superior to having one backend build multiple indexes during a single heap scan. Index builds are CPU-intensive, and the pg_restore approach takes advantage of additional CPU cores in addition to possibly saving I/O. However, the pg_restore method is not applicable if you want CREATE INDEX CONCURRENTLY, and it's not applicable for implicit index building such as happens for ALTER TABLE rewrites and for VACUUM FULL. Backend-managed concurrent index builds could shine there. > I presume this would substantially increase the memory overhead required to > build those indexes, though the performance gains may be advantageous. The multi-index-build should respect maintenance_work_mem overall. Avoiding cases where that makes concurrent builds slower than sequential builds is a key challenge for such a project: - If the index builds each fit in maintenance_work_mem when run sequentially and some spill to disk when run concurrently, expect concurrency to lose. - If the heap is small enough to stay in cache from one index build to the next, performing the builds concurrently is probably a wash or a loss. - Concurrency should help when a wide-row table large enough to exhaust OS cache has narrow indexes that all fit in maintenance_work_mem. I don't know whether concurrency would help for a huge-table scenario where the indexes do overspill maintenance_work_mem. You would have N indexes worth of external merge files competing for disk bandwidth; that could cancel out heap I/O savings. Overall, it's easy to end up with a loss. We could punt by having an index_build_concurrency GUC, much like pg_restore relies on the user to discover a good "-j" value. But if finding cases where concurrency helps is too hard, leaving the GUC at one would become the standard advice. > Apologies in advance if this is not the correct forum for suggestions.. It's the right forum. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 11:30 AM, Amit Langote wrote: > On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: >> If you are using trace_sort to report that, it reports the switch as >> happening as soon as it runs out of memory. >> >> At point, all we have been doing is reading tuples into memory. The >> time it takes to do that will depend on maintenance_work_mem, because >> that affects how many tuples fit in memory. But all the rest of the >> tuples need to be read sooner or later anyway, so pushing more of them >> to later doesn't improve things overall it just shifts timing around. >> >> After it reports the switch, it still needs to heapify the existing >> in-memory tuples before the tapesort proper can begin. This is where >> the true lost opportunities start to arise, as the large heap starts >> driving cache misses which would not happen at all under different >> settings. >> >> Once the existing tuples are heapified, it then continues to use the >> heap to pop tuples from it to write out to "tape", and to push newly >> read tuples onto it. This also suffers lost opportunities. >> >> Once all the tuples are written out and it starts merging, then the >> large maintenance_work_mem is no longer a penalty as the new heap is >> limited by the number of tapes, which is almost always much smaller. >> In fact this stage will actually be faster, but not by enough to make >> up for the earlier slow down. >> >> So it is not surprising that the time before the switch is reported is >> a small part of the overall time difference. >> > > So, is it the actual sorting (before merging) that suffers with larger > maintenance_work_mem? I am sorry but I can not grasp the complexity of > external sort code at this point, so all I can say is that during an > external sort a smaller value of maintenance_work_mem is beneficial > (based on my observations in tests). But how that follows from what is > going on in the implementation of external sort is still something I > am working on understanding. > Or does the increased create index time follow from something else altogether (not any part of external sort) may be still another question. Since we have to relate that to maintenance_work_mem, the first thing I could think of was to look at sorting part of it. -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: > On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: >> On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote >> wrote: >>> Hello, >>> >>> While understanding the effect of maintenance_work_mem on time taken >>> by CREATE INDEX, I observed that for the values of >>> maintenance_work_mem less than the value for which an internal sort is >>> performed, the time taken by CREATE INDEX increases as >>> maintenance_work_increases. >>> >>> My guess is that for all those values an external sort is chosen at >>> some point and larger the value of maintenance_work_mem, later the >>> switch to external sort would be made causing CREATE INDEX to take >>> longer. That is a smaller value of maintenance_work_mem would be >>> preferred for when external sort is performed anyway. Does that make >>> sense? >>> >> >> Upon further investigation, it is found that the delay to switch to >> external sort caused by a larger value of maintenance_work_mem is >> small compared to the total time of CREATE INDEX. > > If you are using trace_sort to report that, it reports the switch as > happening as soon as it runs out of memory. > > At point, all we have been doing is reading tuples into memory. The > time it takes to do that will depend on maintenance_work_mem, because > that affects how many tuples fit in memory. But all the rest of the > tuples need to be read sooner or later anyway, so pushing more of them > to later doesn't improve things overall it just shifts timing around. > > After it reports the switch, it still needs to heapify the existing > in-memory tuples before the tapesort proper can begin. This is where > the true lost opportunities start to arise, as the large heap starts > driving cache misses which would not happen at all under different > settings. > > Once the existing tuples are heapified, it then continues to use the > heap to pop tuples from it to write out to "tape", and to push newly > read tuples onto it. This also suffers lost opportunities. > > Once all the tuples are written out and it starts merging, then the > large maintenance_work_mem is no longer a penalty as the new heap is > limited by the number of tapes, which is almost always much smaller. > In fact this stage will actually be faster, but not by enough to make > up for the earlier slow down. > > So it is not surprising that the time before the switch is reported is > a small part of the overall time difference. > So, is it the actual sorting (before merging) that suffers with larger maintenance_work_mem? I am sorry but I can not grasp the complexity of external sort code at this point, so all I can say is that during an external sort a smaller value of maintenance_work_mem is beneficial (based on my observations in tests). But how that follows from what is going on in the implementation of external sort is still something I am working on understanding. -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
On Tue, Jul 23, 2013 at 01:21:52AM +, Andrew Gierth wrote: > For hypothetical set functions we add a special case, aggordnargs=-1, > for which both the aggregate and the finalfn must be defined as > (variadic "any") and parse analysis detects this case and unifies the > types of the normal args with those of the ORDER BY args. Other aggregates based on this syntax might not desire such type unification. Having parse analysis do that distorts the character of an "any" argument. I think the proper place for such processing is the first call to a transition function. The transition functions could certainly call a new API exposed under src/backend/parser to do the heavy lifting. But let's not make the parser presume that an aggordnargs=-1 aggregate always wants its "any" arguments handled in the manner of the standard hypothetical set functions. The rest of the plan looks good so far. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
On Mon, Jul 22, 2013 at 10:02:30PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: > >> Hmm ... good point. The other plan I'd been considering was to add > >> explicit tracking inside spi.c of all tuple tables created within the > >> current procedure, and then have AtEOSubXact_SPI flush any that were > >> created inside a failed subxact. > > > Is there reason to believe we wouldn't eventually find a half dozen other > > allocations calling for similar bespoke treatment? Does something make > > tuple > > tables special among memory allocations, or are they just the garden-variety > > allocation that happens to bother the test case at hand? > > It's hard to speculate about the memory management habits of third-party > SPI-using code. But in plpgsql, the convention is that random bits of > memory should be allocated in a short-term context separate from the SPI > procCxt --- typically, the estate->eval_econtext expression context, > which exec_stmt_block already takes care to clean up when catching an > exception. So the problem is that that doesn't work for tuple tables, > which have procCxt lifespan. The fact that they tend to be big (at > least 8K apiece) compounds the issue. Reasonable to treat them specially, per your plan, then. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Stephen Frost writes: > * Andrew Gierth (and...@tao11.riddles.org.uk) wrote: >> If you have legitimate technical concerns then let's hear them, now. > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. I took the time to read through this patch, finally. i think the $64 question is pretty much what Stephen says above: do we like tying this behavior to FunctionScan, as opposed to having it be some kind of expression node? You could certainly imagine a WithOrdinality expression node that takes in values of a set-returning expression, and returns them with an extra column tacked on. This would resolve the problem that was mentioned awhile back that the current approach can't support SRFs in targetlists. If it weren't that we've been speculating for years about deprecating SRFs-in-tlists once we had LATERAL, I would personally consider this patch DOA in this form. If we do think we'll probably deprecate that feature, then not extending WITH ORDINALITY to such cases is at least defensible. On the other hand, considering that we've yet to ship a release supporting LATERAL, it's probably premature to commit to such deprecation --- we don't really know whether people will find LATERAL to be a convenient and performant substitute. As far as performance goes, despite Stephen's gripe above, I think this way is likely better than any alternative. The reason is that once we've disassembled the function result tuple and tacked on the counter, we have a reasonable shot at things staying like that (in the form of a virtual TupleTableSlot). The next higher level of evaluation can probably use the column Datums as-is. A WithOrdinality expression node would have to disassemble the input tuple and then make a new tuple, which the next higher expression level would likely pull apart again :-(. Also, any other approach would lead to needing to store the ordinality values inside the FunctionScan's tuplestore, bloating that storage with rather-low-value data. The other big technical issue I see is representation of the rowtype of the result. If we did it with a WithOrdinality expression node, the result would always be of type RECORD, and we'd have to use blessed tuple descriptors to keep track of exactly which record type a particular instance emits. This is certainly do-able (see RowExpr for precedent). Attaching the functionality to FunctionScan reduces the need for that because the system mostly avoids trying to associate any type OID with the rowtype of a FROM item. Instead though, we've got a lot of ad-hoc code that deals with RangeTblEntry type information. A big part of the patch is dealing with extending that code, and frankly I've got about zero confidence that the patch has found everything that needs to be found in that line. A patch using an expression node would probably need to touch only a much more localized set of places to handle the type description issue. 
Anyway, on balance I'm satisfied with this overall approach, though it's not without room for debate. I am fairly dissatisfied with the patch at a stylistic level, though. It seems way too short on comments. People griped about the code in nodeFunctionscan in particular, but I think all the changes in ad-hoc type-management code elsewhere are even more deserving of comments, and they mostly didn't get that. Likewise, there's no documentation anywhere that I can see of the new interrelationships of struct fields, such as that if a RangeTblEntry has funcordinality set, then its various column-related fields such as eref->colnames include a trailing INT8 column for the ordinality. Also, maybe I'm misreading the patch (have I mentioned lately that large patches in -u format are practically illegible?), but it sure looks like it flat out removed several existing regression-test cases and a few existing comments as well. How is that considered acceptable? FWIW, I concur with the gripe I remember seeing upthread that the default name of the added column ought not be "?column?". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
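For readers following along, the behaviour the patch implements looks like this (column aliases are illustrative; the ordinality column is a bigint counter starting at 1):

SELECT t.elem, t.ord
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, ord);

--  elem | ord
-- ------+-----
--  a    |   1
--  b    |   2
--  c    |   3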
Re: [HACKERS] improve Chinese locale performance
On 07/23/2013 09:42 PM, Craig Ringer wrote:
> (Replying on phone, please forgive bad quoting)
> Isn't this pretty much what adopting ICU is supposed to give us?
> OS-independent collations?

Yes, we need OS-independent collations.

> I'd be interested in seeing the test data for this performance report,
> partly as I'd like to see how ICU collations would compare when ICU is
> crudely hacked into place for testing.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Stephen Frost said: > [stuff about foreign tables] I think the issue with foreign tables is probably moot because if you _did_ want to have some types of foreign tables with a fixed ordinality, you'd probably also want qual pushdown to work for it (i.e. so that WHERE rownum=123 doesn't have to filter all the rows), whereas with SRFs this doesn't really apply. For this to work, foreign tables with a fixed ordering would have to provide that in the FDW - which is in any case the only place that knows whether a fixed order would even make any sense. So I see no overlap here with the SRF ordinality case. As for VALUES, the spec regards those as constructing a table and therefore not having any inherent order - the user can add their own ordinal column if need be. Even if you did want to add WITH ORDINALITY for VALUES, though, it would actually make more sense to do it in the Values Scan node since that maintains its own ordinal position already. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tuesday, July 23, 2013, David Fetter wrote: > > Are you saying that there's stuff that if I don't put it in now will > impede our ability to add this to FTs later? > I'm saying that it'd be a completely different implementation and that this one would get in the way and essentially have to be ripped out. No one is saying that this patch wouldn't work for the specific use-case that it set out to meet, and maybe it's unfair for us to consider possible use-cases beyond the patch's goal and the spec requirement, but that, IMO, is also one of the things that makes PG great. MVCC isn't necessary and isn't required by spec either. Thanks, Stephen
[HACKERS] Failure to use generic plans (was: Re: Performance problem in PLPgSQL)
As failures to use a generic plan goes, that one's fairly tame. I've seen much worse. For example: PREPARE foo(integer[]) AS SELECT * FROM complexview WHERE id = ANY ($1); where the caller typically supplies 1-5 array elements (or any number less than 10, because generic parameter arrays are assumed to have 10 elements). This one can be a massive performance regression between 9.1 and 9.2; the first guy who mentioned this on IRC was getting a 40x slowdown (~20ms planning time vs. 0.5ms execution time). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
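A sketch of how that shows up in practice (complexview is the placeholder name from the report above; the timings are the ones quoted there): releases before 9.2 plan the prepared statement once and reuse that generic plan, so later calls pay only execution cost, whereas with the 9.2 logic the generic plan assumes ten elements for the unknown array and therefore always looks more than ten percent costlier than the short-array custom estimates.

PREPARE foo(integer[]) AS SELECT * FROM complexview WHERE id = ANY ($1);

-- under the 9.2 behaviour described above, each of these re-plans the whole
-- view, paying ~20ms of planning for ~0.5ms of execution:
EXECUTE foo(ARRAY[1, 2, 3]);
EXECUTE foo(ARRAY[4]);
EXECUTE foo(ARRAY[5, 6]);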
Re: [HACKERS] Performance problem in PLPgSQL
I wrote: > Marc Cousin writes: >> The example below is of course extremely simplified, and obviously not >> what we are really doing in the database, but it exhibits the slowdown >> between 9.1.9 and 9.2.4. > Hm. Some experimentation shows that the plan cache is failing to switch > to a generic plan, but I'm not sure why the cast would have anything to > do with that ... Hah, I see why: (gdb) s 1009if (plansource->generic_cost < avg_custom_cost * 1.1) (gdb) p plansource->generic_cost $18 = 0.012501 (gdb) p avg_custom_cost $19 = 0.01 (gdb) p avg_custom_cost * 1.1 $20 = 0.011001 That is, the estimated cost of the custom plan is just the evaluation time for a simple constant, while the estimated cost of the generic plan includes a charge for evaluation of int4_numeric(). That makes the latter more than ten percent more expensive, and since this logic isn't considering anything else at all (particularly not the cost of planning), it thinks that makes the custom plan worth picking. We've been around on this before, but not yet thought of a reasonable way to estimate planning cost, let alone compare it to estimated execution costs. Up to now I hadn't thought that was a particularly urgent issue, but this example shows it's worth worrying about. One thing that was suggested in the previous discussion is to base the planning cost estimate on the length of the rangetable. We could do something trivial like add "10 * (length(plan->rangetable) + 1)" in this comparison. Another thing that would fix this particular issue, though perhaps not related ones, is to start charging something nonzero for ModifyTable nodes, say along the lines of one seq_page_cost per inserted/modified row. That would knock the total estimated cost for an INSERT up enough that the ten percent threshold wouldn't be exceeded. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 07/23/2013 03:34 PM, Greg Smith wrote: > I happen to think the review structure is one of the most important > components to PostgreSQL release quality. It used to be a single level > review with just the committers, now it's a two level structure. The > reason the Postgres code is so good isn't that the submitted development > is inherently any better than other projects. There's plenty of bogus > material submitted here. It's the high standards for review and commit > that are the key filter. The importance of the process to the result > isn't weighed as heavily as I think it should be. I think we're in violent agreement here. Now, we just need to convince everyone else on this list. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 7/23/13 2:30 PM, Josh Berkus wrote: You know as well as me that, as consultants, we can get clients to pay for 10% extra time for review in the course of developing a feature Before this number gets quoted anywhere, I think it's on the low side. I've spent a good bit of time measuring how much time it takes to do a fair offsetting review--one where you put as much time in as it takes to review your submission--and I keep getting numbers more in the 20 to 25% range. The work involved to do a high quality review takes a while. I happen to think the review structure is one of the most important components to PostgreSQL release quality. It used to be a single level review with just the committers, now it's a two level structure. The reason the Postgres code is so good isn't that the submitted development is inherently any better than other projects. There's plenty of bogus material submitted here. It's the high standards for review and commit that are the key filter. The importance of the process to the result isn't weighed as heavily as I think it should be. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 06:09:20PM -0400, Stephen Frost wrote: > David > > On Tuesday, July 23, 2013, David Fetter wrote: > > > > There are a lot of ways foreign tables don't yet act like local > > ones. Much as I'm a booster for fixing that problem, I'm thinking > > improvements in this direction are for a separate patch. > > > > This isn't about making FDWs more like local tables- indeed, quite > the opposite. The question is if we should declare that WITH > ORDINALITY will only ever be for SRFs or if we should consider that > it might be useful with FDWs specifically because they are not > unordered sets as tables are. Claiming that question is unrelated > to the implementation of WITH ORDINALITY is rather... Bizarre. Are you saying that there's stuff that if I don't put it in now will impede our ability to add this to FTs later? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
David On Tuesday, July 23, 2013, David Fetter wrote: > > There are a lot of ways foreign tables don't yet act like local ones. > Much as I'm a booster for fixing that problem, I'm thinking > improvements in this direction are for a separate patch. > This isn't about making FDWs more like local tables- indeed, quite the opposite. The question is if we should declare that WITH ORDINALITY will only ever be for SRFs or if we should consider that it might be useful with FDWs specifically because they are not unordered sets as tables are. Claiming that question is unrelated to the implementation of WITH ORDINALITY is rather... Bizarre. Thanks, Stephen
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 05:23:17PM -0400, Stephen Frost wrote: > * Greg Stark (st...@mit.edu) wrote: > > So given that WITH ORDINALITY is really only useful for UNNEST and we > > can generalize it to all SRFs on the basis that Postgres SRFs do > > produce ordered sets not unordered relations it isn't crazy for the > > work to be in the Function node. > > I agree, it isn't *crazy*. :) > > > Now that I've written that though it occurs to me to wonder whether > > FDW tables might be usefully said to be ordered too though? > > My thought around this was a VALUES() construct, but FDWs are an > interesting case to consider also; with FDWs it's possible that > something is said in SQL/MED regarding this question- perhaps it would > make sense to look there? There are a lot of ways foreign tables don't yet act like local ones. Much as I'm a booster for fixing that problem, I'm thinking improvements in this direction are for a separate patch. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
> Everything was already *not* hunky-dory as soon as you did that, since > a SIGHUP would have had the same problem. > > I'd be inclined to think that ALTER SYSTEM SET should not be allowed to > modify any PGC_POSTMASTER parameters. That makes alter system set a bunch less useful, but might be insurmountable. Anyway, I also regard this as a problem we need to resolve now that we have the config directory if we expect people to build autotuning tools. Once someone is relying on an autotuning tool which drops a file in config/, it becomes a real problem that there are uncommented PGC_POSTMASTER options in the default postgresql.conf. I'm not sure what the solution to that problem should be, but I do know that we're going to hear it a lot from users as includes and the config directory get more common. Certainly any solution which says "first, manually edit postgresql.conf" makes the config directory into a kind of joke. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
* Greg Stark (st...@mit.edu) wrote: > So given that WITH ORDINALITY is really only useful for UNNEST and we > can generalize it to all SRFs on the basis that Postgres SRFs do > produce ordered sets not unordered relations it isn't crazy for the > work to be in the Function node. I agree, it isn't *crazy*. :) > Now that I've written that though it occurs to me to wonder whether > FDW tables might be usefully said to be ordered too though? My thought around this was a VALUES() construct, but FDWs are an interesting case to consider also; with FDWs it's possible that something is said in SQL/MED regarding this question- perhaps it would make sense to look there? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 9:27 PM, Stephen Frost wrote: > > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. Fwiw I've been mulling over the same questions and came to the conclusion that the existing approach makes the most sense. In an ideal world an extra executor node would be the prettiest, cleanest implementation. But the amount of extra code and busywork that would necessitate just isn't justified for the amount of work it would be doing. It might be different if WITH ORDINALITY made sense for any other types of target tables. But it really only makes sense for SRFs. The whole motivation for having them in the spec is that UNNEST is taking an ordered list and turning it into a relation which is unordered. You can't just do row_number() because there's nothing to make it ordered by. By the same token for any other data source you would just use row_number *precisely because* any other data source is unordered and you should have to specify an order in order to make row_number produce something meaningful. So given that WITH ORDINALITY is really only useful for UNNEST and we can generalize it to all SRFs on the basis that Postgres SRFs do produce ordered sets not unordered relations it isn't crazy for the work to be in the Function node. Now that I've written that though it occurs to me to wonder whether FDW tables might be usefully said to be ordered too though? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
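A small sketch of that point (array contents invented): the only workaround today is row_number(), and because a relation is unordered that is only well defined with an ORDER BY, for which the array elements themselves are the sole candidate.

-- the workaround: numbering is not guaranteed to follow the array order
SELECT x, row_number() OVER () AS ord
FROM unnest(ARRAY[30, 10, 20]) AS x;

-- with the patch: the ordinal reflects the order the SRF emitted its rows
SELECT x, ord
FROM unnest(ARRAY[30, 10, 20]) WITH ORDINALITY AS t(x, ord);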
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: > On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? >> > > Upon further investigation, it is found that the delay to switch to > external sort caused by a larger value of maintenance_work_mem is > small compared to the total time of CREATE INDEX. If you are using trace_sort to report that, it reports the switch as happening as soon as it runs out of memory. At point, all we have been doing is reading tuples into memory. The time it takes to do that will depend on maintenance_work_mem, because that affects how many tuples fit in memory. But all the rest of the tuples need to be read sooner or later anyway, so pushing more of them to later doesn't improve things overall it just shifts timing around. After it reports the switch, it still needs to heapify the existing in-memory tuples before the tapesort proper can begin. This is where the true lost opportunities start to arise, as the large heap starts driving cache misses which would not happen at all under different settings. Once the existing tuples are heapified, it then continues to use the heap to pop tuples from it to write out to "tape", and to push newly read tuples onto it. This also suffers lost opportunities. Once all the tuples are written out and it starts merging, then the large maintenance_work_mem is no longer a penalty as the new heap is limited by the number of tapes, which is almost always much smaller. In fact this stage will actually be faster, but not by enough to make up for the earlier slow down. So it is not surprising that the time before the switch is reported is a small part of the overall time difference. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Andrew, * Andrew Gierth (and...@tao11.riddles.org.uk) wrote: > Right, and we all know that all code is perfect when committed. sheesh. That clearly wasn't what was claimed. > (This is actually the first time in six months that I've had occasion > to look at that part of the code; that's how long it's been sitting in > the queue. While such issues are frustrating for all of us, huffing about it here isn't particularly useful. > And yes, it was one of my bits, not David's. Maybe I > should have left the bug in to see how long it took you to spot it?) That attitude is certainly discouraging. > What I'm very notably not seeing from you is any substantive feedback. > You've repeatedly described this patch as broken on the basis of > nothing more than garbled hearsay while loudly proclaiming not to have > actually looked at it; I find this both incomprehensible and insulting. As Greg is the one looking to possibly commit this, I certainly didn't consider his comments on the patch to be garbled hearsay. It would have been great if he had been more clear in his original comments, but I don't feel that you can fault any of us for reading his email and voicing what concerns we had from his review. While you might wish that we all read every patch submitted, none of us has time for that- simply keeping up with this mailing list requires a significant amount of time. > If you have legitimate technical concerns then let's hear them, now. Fine- I'd have been as happy leaving this be and letting Greg commit it, but if you'd really like to hear my concerns, I'd start with pointing out that it's pretty horrid to have to copy every record around like this and that the hack of CreateTupleDescCopyExtend is pretty terrible and likely to catch people by surprise. Having FunctionNext() operate very differently depending on WITH ORDINALITY is ugly and the cause of the bug that was found. All-in-all, I'm not convinced that this is really a good approach and feel it'd be better off implemented at a different level, eg a node type instead of a hack on top of the existing SRF code. Now, what would be great to see would be your response to Dean's comments and suggestions rather than berating someone who's barely written 5 sentences on this whole thread. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Tom Lane said: > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. Right, and we all know that all code is perfect when committed. sheesh. (This is actually the first time in six months that I've had occasion to look at that part of the code; that's how long it's been sitting in the queue. And yes, it was one of my bits, not David's. Maybe I should have left the bug in to see how long it took you to spot it?) What I'm very notably not seeing from you is any substantive feedback. You've repeatedly described this patch as broken on the basis of nothing more than garbled hearsay while loudly proclaiming not to have actually looked at it; I find this both incomprehensible and insulting. If you have legitimate technical concerns then let's hear them, now. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Adding Zigzag Merge Join to Index Nested Loops Join
Hi. I want to add a Zigzag Merge Join to the Index Nested Loops Join algorithm: http://www.cs.berkeley.edu/~fox/summaries/database/query_eval_5-8.html

Which files are responsible for the index nested loops join? (Not the simple nested loops join, which has the double foreach in nodeNestloop.c.) nodeIndexscan.c? nodeIndexonlyscan.c?

Best Regards
Tom
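For reference, the plan shape in question looks like this (table and index names are invented); the inner side is an ordinary Index Scan whose index condition is parameterized by the current outer row, rather than a separate "index nested loop" node:

EXPLAIN SELECT *
FROM a
JOIN b ON b.a_id = a.id
WHERE a.x = 42;

--  Nested Loop
--    ->  Index Scan using a_x_idx on a
--          Index Cond: (x = 42)
--    ->  Index Scan using b_a_id_idx on b
--          Index Cond: (a_id = a.id)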
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: > OK, so GetActiveSnapshot()? Works for me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Tue, Jul 23, 2013 at 2:24 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: >>> As far as get_actual_variable_range() is concerned, an MVCC snapshot >>> would probably be the thing to use anyway; > >> That surprises me, though. I really thought the point was to cost the >> index scan, and surely that will be slowed down even by entries we >> can't see. > > No, the usage (or the main usage anyway) is for selectivity estimation, > ie how many rows will the query fetch. OK, so GetActiveSnapshot()? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
Greg, > It's more than the available experienced reviewers are willing to chew > on fully as volunteers. The reward for spending review time is pretty > low right now. Short of paying for review time, I don't think we have another solution for getting the "big patches" reviewed, except to rely on the major contributors who are paid full-time to hack Postgres. You know as well as me that, as consultants, we can get clients to pay for 10% extra time for review in the course of developing a feature, but the kind of time which patches like Row Security, Changesets, or other "big patches" need nobody is going to pay for on a contract basis. And nobody who is doing this in their "spare time" has that kind of block. So I don't think there's any good solution for the "big patches". I do think our project could do much more to recruit reviewers for the small-medium patches, to take workload off the core contributors in general. Historically, however, this project (and the contributors on this list) has made material decisions not to encourage or recruit new people as reviewers, and has repeatedly stated that reviewers are not important. Until that changes, we are not going to get more reviewers (and I'm not going to have much sympathy for existing contributors who say they have no time for review). If we want more reviewers and people spending more time on review, then we need to give reviewers the *same* respect and the *same* rewards that feature contributors get. Not something else, the exact same. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: > On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: >> As far as get_actual_variable_range() is concerned, an MVCC snapshot >> would probably be the thing to use anyway; > That surprises me, though. I really thought the point was to cost the > index scan, and surely that will be slowed down even by entries we > can't see. No, the usage (or the main usage anyway) is for selectivity estimation, ie how many rows will the query fetch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? The heap structure used in external sorts is cache-unfriendly. The bigger the heap used, the more this unfriendliness becomes apparent. And the bigger maintenance_work_mem, the bigger the heap used. The bigger heap also means you have fewer "runs" to merge in the external sort. However, as long as the number of runs still fits in the same number of merge passes, this is generally not a meaningful difference. Ideally the planner (or something) would figure out how much memory would be needed to complete an external sort in just one external pass, and then lower the effective maintenance_work_mem to that amount. But that would be hard to do with complete precision, and the consequences of getting it wrong are asymmetric. (More thoroughly, it would figure out the number of passes needed for the given maintenance_work_mem, and then lower the effective maintenance_work_mem to the smallest value that gives the same number of passes. But for nearly all practical situations, I think the number of passes for an index build is going to be 0 or 1.) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
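Rough arithmetic for that idea, with invented numbers: replacement selection tends to produce initial runs of roughly twice the sort memory, and a single merge pass can combine far more runs than an index build usually produces, so the interesting threshold is normally just whether one merge pass suffices.

-- e.g. roughly 8GB of sort input with maintenance_work_mem = 256MB:
SELECT ceil(8192 / (2 * 256.0)) AS expected_initial_runs;  -- about 16

-- 16 runs merge comfortably in a single pass, so raising
-- maintenance_work_mem further saves no merge pass here; it only enlarges
-- the cache-unfriendly heap used while the runs are being written.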
Re: [HACKERS] new "row-level lock" error messages
Alvaro Herrera wrote: > Peter Eisentraut wrote: > > > I would suggest that these changes be undone, except that the old > > "SELECT FOR ..." be replaced by a dynamic string that reverse-parses the > > LockingClause to provide the actual clause that was used. > > Here's a patch for this. Pushed to 9.3 and master. Sample output: alvherre=# select * from foo, bar for update of foof for share of bar; ERROR: relation "foof" in FOR UPDATE clause not found in FROM clause alvherre=# select * from foo, bar for update of foo for share of barf; ERROR: relation "barf" in FOR SHARE clause not found in FROM clause Amusingly, the only test in which these error messages appeared, in contrib/file_fdw, was removed after the two commits that changed the wording. So there's not a single test which needed to be tweaked for this change. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: > Robert Haas writes: >>> 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow >>> to use SnapshotSelf instead. These include pgrowlocks(), >>> pgstat_heap(), and get_actual_variable_range(). > >> Tom proposed that we use SnapshotDirty for this case; let me just ask >> whether there are any security concerns around that. pgstattuple only >> displays aggregate information so I think that's OK, but I wonder if >> the value found in get_actual_variable_range() can leak out in EXPLAIN >> output or whatever. I can't particularly think of any reason why that >> would actually matter, but I've generally shied away from exposing >> data written by uncommitted transactions, and this would be a step in >> the other direction. Does this worry anyone else or am I being >> paranoid? > > As far as get_actual_variable_range() is concerned, an MVCC snapshot > would probably be the thing to use anyway; I see no need for the planner > to be using estimates that are "more up to date" than that. pgrowlocks > and pgstat_heap() might be in a different category. > >> But thinking about it a little more, I wonder why >> get_actual_variable_range() is using a snapshot at all. Presumably >> what we want there is to find the last index key, regardless of the >> visibility of the heap tuple to which it points. > > No, what we ideally want is to know the current variable range that > would be seen by the query being planned. Oh, really? Well, should we use GetActiveSnapshot() then? That surprises me, though. I really thought the point was to cost the index scan, and surely that will be slowed down even by entries we can't see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 7/23/13 12:10 PM, Josh Berkus wrote: Apparently it's a little much for experienced reviewers to chew on, too, since I've been trying to get someone to review it since the beginning of the Commitfest. It's more than the available experienced reviewers are willing to chew on fully as volunteers. The reward for spending review time is pretty low right now. While I understand the call for "resources", this is a bit unfair to KaiGai, who has put in his time reviewing other people's patches. If you read Dean Rasheed's comments, it's obvious he put a useful amount of work into his review suggestions. It is not the case here that KaiGai worked on a review and got nothing in return. Unfortunately that has happened to this particular patch before, but the community did a little better this time. The goal of the CF is usually to reach one of these outcomes for every submission: -The submission is ready for commit -The submission needs improvement in X Review here went far enough to identify an X to be improved. It was a big enough issue that a rewrite is needed, commit at this time isn't possible, and now KaiGai has something we hope is productive he can continue working on. That's all we can really promise here--that if we say something isn't ready for commit yet, that there's some feedback as to why. I would have preferred to see multiple X issues identified here, given that we know there has to be more than just the one in a submission of this size. The rough fairness promises of the CommitFest seem satisfied to me though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: >> 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow >> to use SnapshotSelf instead. These include pgrowlocks(), >> pgstat_heap(), and get_actual_variable_range(). > Tom proposed that we use SnapshotDirty for this case; let me just ask > whether there are any security concerns around that. pgstattuple only > displays aggregate information so I think that's OK, but I wonder if > the value found in get_actual_variable_range() can leak out in EXPLAIN > output or whatever. I can't particularly think of any reason why that > would actually matter, but I've generally shied away from exposing > data written by uncommitted transactions, and this would be a step in > the other direction. Does this worry anyone else or am I being > paranoid? As far as get_actual_variable_range() is concerned, an MVCC snapshot would probably be the thing to use anyway; I see no need for the planner to be using estimates that are "more up to date" than that. pgrowlocks and pgstat_heap() might be in a different category. > But thinking about it a little more, I wonder why > get_actual_variable_range() is using a snapshot at all. Presumably > what we want there is to find the last index key, regardless of the > visibility of the heap tuple to which it points. No, what we ideally want is to know the current variable range that would be seen by the query being planned. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Andres Freund wrote: > Hm. There seem to be more things that need some more improvement > from a quick look. > > First, I have my doubts of the general modus operandi of > CONCURRENTLY here. I think in many, many cases it would be faster > to simply swap in the newly built heap + indexes concurrently. > I think I have seen that mentioned in the review while mostly > skipping the discussion, but still. That would be easy enough as > of Robert's mvcc patch. Yeah, in many cases you would not want to choose to specify this technique. The point was to have a technique which could run without blocking while a heavy load of queries (including perhaps a very long-running query) was executing. If subsequent events provide an alternative way to do that, it's worth looking at it; but my hope was to get this out of the way to allow time to develop incremental maintenance of matviews for the next CF. If we spend too much time reworking this to micro-optimize it for new patches, incremental maintenance might not happen for 9.4. The bigger point is perhaps syntax. If MVCC catalogs are really at a point where we can substitute a new heap and new indexes for a matview without taking an AccessExclusiveLock, then we should just make the current code do that. I think there is still a place for a differential update for large matviews which only need small changes on a REFRESH, but perhaps the right term for that is not CONCURRENTLY. > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > be done a good bit earlier? We're executing queries before > that. This can't be in effect while we are creating temp tables. If we're going to support a differential REFRESH technique, it must be done more "surgically" than to wrap the whole REFRESH statement execution. > * The loop over indexes in refresh_by_match_merge should > index_open(ExclusiveLock) the indexes initially instead of > searching the syscache manually. They are opened inside the > loop in many cases anyway. There probably aren't any hazards > currently, but I am not even sure about that. The index_open() > in the loop at the very least processes the invalidation > messages other backends send... Fair enough. That seems like a simple change. > I'd even suggest using BuildIndexInfo() or such on the indexes, > then you could use ->ii_Expressions et al instead of peeking > into indkeys by hand. Given that the function is in a source file described as containing "code to create and destroy POSTGRES index relations" and the comments for that function say that it "stores the information about the index that's needed by FormIndexDatum, which is used for both index_build() and later insertion of individual index tuples," and we're not going to create or destroy an index or call FormIndexDatum or insert individual index tuples from this code, it would seem to be a significant expansion of the charter of that function. What benefits do you see to using that level? > * All not explicitly looked up operators should be qualified > using OPERATOR(). It's easy to define your own = operator for > tid and set the search_path to public,pg_catalog. Now, the > whole thing is restricted to table owners afaics, making this > decidedly less dicey, but still. But if anybody actually uses a > search_path like the above you can easily hurt them. > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > x.* operations. > * (y.*) = (x.*) also needs to do proper operator lookups.
I wasn't aware that people could override the equality operators for tid and RECORD, so that seemed like unnecessary overhead. I can pull the operators from the btree AM if people can do that. > * OpenMatViewIncrementalMaintenance() et al seem to be > underdocumented. If the adjacent comment for MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain about what OpenMatViewIncrementalMaintenance() and CloseMatViewIncrementalMaintenance() are for, I can add comments. At the time I wrote it, it seemed to me to be redundant to have such comments, and would cause people to waste more time looking at them than would be saved by anything they could add to what the function names and one-line function bodies conveyed; but if there was doubt in your mind about when they should be called, I'll add something. Would it help to move the static functions underneath the (documented) public function and its comment? > * why is it even allowed that matview_maintenance_depth is > 1? > Unless I miss something there doesn't seem to be support for a > recursive refresh whatever that would mean? I was trying to create the function in a way that would be useful for incremental maintenance, too. Those could conceivably recurse. Also, this way bugs in usage are more easily detected. > * INSERT and DELETE should also alias the table names, to make > sure there cannot be issues with the nested queries. I don't see the hazard -- could you elaborate?
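[Editor's note: to make the operator-qualification point concrete for readers following the thread, here is a minimal sketch of the kind of override Andres is describing. The schema, function, and operator names are purely illustrative and are not part of the patch.]

    -- Illustrative only: an owner-controlled "=" for tid that shadows the built-in one.
    CREATE SCHEMA shadow;
    CREATE FUNCTION shadow.tid_eq(tid, tid) RETURNS boolean
        LANGUAGE sql AS 'SELECT true';   -- deliberately wrong: everything compares "equal"
    CREATE OPERATOR shadow.= (LEFTARG = tid, RIGHTARG = tid, PROCEDURE = shadow.tid_eq);
    SET search_path = shadow, pg_catalog;
    -- Any generated SQL written as "x.ctid = y.ctid" rather than
    -- "x.ctid OPERATOR(pg_catalog.=) y.ctid" now resolves to shadow.= above.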
Re: [HACKERS] Comma Comma Comma 8601
On Jul 23, 2013, at 1:17 AM, Tom Lane wrote: > Does that create any ambiguities against formats we already support? > I'm worried about examples like this one: > > select 'monday, july 22, 22:30 2013'::timestamptz; > timestamptz > > 2013-07-22 22:30:00-04 > (1 row) > > Right now I'm pretty sure that the datetime parser treats comma as a > noise symbol. If that stops being true, we're likely to break some > applications out there (admittedly, possibly rather strange ones, > but still ...) I kind of suspect not, since this fails: david=# select '12:24:53 654'::time; ERROR: invalid input syntax for type time: "12:24:53 654" LINE 1: select '12:24:53 654'::time; ^ I would have guessed that the time parser gets to a state where it knows it is dealing with an ISO-8601-style time. But I have not looked at the code, of course. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
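[Editor's note: for anyone who wants to poke at the ambiguity question in a plain psql session, the two cases side by side. Whether the second statement is accepted depends entirely on the proposed comma handling, so treat it as a sketch rather than documented behavior.]

    -- comma as a noise symbol, which Tom notes already works today:
    SELECT 'monday, july 22, 22:30 2013'::timestamptz;

    -- ISO 8601 comma as the fractional-second separator, the format under discussion;
    -- accepted only if the proposal is adopted (assumption):
    SELECT '2013-07-22 12:24:53,654'::timestamptz;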
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
Greg, > As far as motivating new reviewers goes, let's talk about positive > feedback. Anything that complicates the release notes is a non-starter > because that resource is tightly controlled by a small number of people, > and it's trying to satisfy a lot of purposes. Greg, you're re-arguing something we obtained a consensus on a week ago. Please read the thread "Kudos for reviewers". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/23/13 10:56 AM, Robert Haas wrote: On Mon, Jul 22, 2013 at 11:48 PM, Greg Smith wrote: We know that a 1GB relation segment can take a really long time to write out. That could include up to 128 changed 8K pages, and we allow all of them to get dirty before any are forced to disk with fsync. By my count, it can include up to 131,072 changed 8K pages. Even better! I can pinpoint exactly what time last night I got tired enough to start making trivial mistakes. Everywhere I said 128 it's actually 131,072, which just changes the range of the GUC I proposed. Getting the number right really highlights just how bad the current situation is. Would you expect the database to dump up to 128K writes into a file and then have low latency when it's flushed to disk with fsync? Of course not. But that's the job the checkpointer process is trying to do right now. And it's doing it blind--it has no idea how many dirty pages might have accumulated before it started. I'm not exactly sure how best to use the information collected. fsync every N writes is one approach. Another is to use accumulated writes to predict how long fsync on that relation should take. Whenever I tried to spread fsync calls out before, the scale of the piled up writes from backends was the input I really wanted available. The segment write count gives an alternate way to sort the blocks too; you might start with the heaviest-hit ones. In all these cases, the fundamental issue I keep coming back to is wanting to cue off past write statistics. If you want to predict relative I/O delay times with any hope of accuracy, you have to start the checkpoint knowing something about the backend and background writer activity since the last one. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
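[Editor's note: for anyone double-checking the corrected figure, it is just the segment size divided by the block size; a one-line sanity check, assuming the default 8K block size and 1GB segment size.]

    -- 1GB relation segment / 8KB pages = 131,072 pages per segment
    SELECT (1024 * 1024 * 1024) / 8192 AS pages_per_segment;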
Re: [HACKERS] [v9.4] row level security
On 07/22/2013 01:27 PM, Greg Smith wrote: > > Anyone who would like to see RLS committed should consider how you can > get resources to support a skilled PostgreSQL reviewer spending time on > it. (This is a bit much to expect new reviewers to chew on usefully) > I've been working on that here, but I don't have anything I can publicly > commit to yet. Apparently it's a little much for experienced reviewers to chew on, too, since I've been trying to get someone to review it since the beginning of the Commitfest. While I understand the call for "resources", this is a bit unfair to KaiGai, who has put in his time reviewing other people's patches. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bloom Filter lookup for hash joins
On Tue, Jul 23, 2013 at 7:32 PM, Greg Stark wrote: > This exact idea was discussed a while back. I think it was even implemented. > > The problem Tom raised at the time is that the memory usage of the bloom > filter implies a smaller or less efficient hash table. It's difficult to > determine whether you're coming out ahead or behind. > > I think it should be possible to figure this out though. Bloom filters have > well understood math for the error rate given the size and number of hash > functions (and please read up on it and implement the optimal combination > for the target error rate, not just a wag) and it should be possible to > determine the resulting advantage. > > Measuring the cost of the memory usage is harder but some empirical results > should give some idea. I expect the result to be wildly uneven one way or > the other so hopefully it doesn't matter if it's not perfect. If it's close > then it's probably not worth doing anyway. > > I would suggest looking up the archives of the previous discussion. You might > find the patch still usable. IIRC there have been no major changes to the hash > join code. > Right, I will definitely have a look at the thread. Thanks for the info! Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Thu, Jul 18, 2013 at 8:46 AM, Robert Haas wrote: > There seems to be a consensus that we should try to get rid of > SnapshotNow entirely now that we have MVCC catalog scans, so I'm > attaching two patches that together come close to achieving that goal: > > 1. snapshot-error-v1.patch introduces a new special snapshot, called > SnapshotError. In the cases where we set SnapshotNow as a sort of > default snapshot, this patch changes the code to use SnapshotError > instead. This affects scan->xs_snapshot in genam.c and > estate->es_snapshot in execUtils.c. This passes make check-world, so > apparently there is no code in the core distribution that does this. > However, this is safer for third-party code, which will ERROR instead > of seg faulting. The alternative approach would be to use > InvalidSnapshot, which I think would be OK too if people dislike this > approach. It seems the consensus was mildly for InvalidSnapshot, so I did it that way. > 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow > to use SnapshotSelf instead. These include pgrowlocks(), > pgstat_heap(), and get_actual_variable_range(). In all of those > cases, only an approximately-correct answer is needed, so the change > should be fine. I'd also generally expect that it's very unlikely for > any of these things to get called in a context where the table being > scanned has been updated by the current transaction after the most > recent command-counter increment, so in practice the change in > semantics will probably not be noticeable at all. Tom proposed that we use SnapshotDirty for this case; let me just ask whether there are any security concerns around that. pgstattuple only displays aggregate information so I think that's OK, but I wonder if the value found in get_actual_variable_range() can leak out in EXPLAIN output or whatever. I can't particularly think of any reason why that would actually matter, but I've generally shied away from exposing data written by uncommitted transactions, and this would be a step in the other direction. Does this worry anyone else or am I being paranoid? But thinking about it a little more, I wonder why get_actual_variable_range() is using a snapshot at all. Presumably what we want there is to find the last index key, regardless of the visibility of the heap tuple to which it points. We don't really need to consult the heap at all, one would think; the value we need ought to be present in the index tuple. If we're going to use a snapshot for simplicity of coding, maybe the right thing is SnapshotAny. After all, even if the index tuples are all dead, we still have to scan them, so it's still relevant for costing purposes. Thoughts? > With that done, the only remaining uses of SnapshotNow in our code > base will be in currtid_byreloid() and currtid_byrelname(). So far no > one on this list has been able to understand clearly what the purpose > of those functions is, so I'm copying this email to pgsql-odbc in case > someone there can provide more insight. If I were a betting man, I'd > bet that they are used in contexts where the difference between > SnapshotNow and SnapshotSelf wouldn't matter there, either. For > example, if those functions are always invoked in a query that does > nothing but call those functions, the difference wouldn't be visible. 
> If we don't want to risk any change to the semantics, we can (1) grit > our teeth and keep SnapshotNow around or (2) use an instantaneous MVCC > snapshot there, and accept that people who have very large connection > counts and extremely heavy use of those functions may see a > performance regression. It seems like we're leaning toward a fresh MVCC snapshot for this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On Mon, Jul 22, 2013 at 11:48 PM, Greg Smith wrote: > Recently I've been dismissing a lot of suggested changes to checkpoint fsync > timing without suggesting an alternative. I have a simple one in mind that > captures the biggest problem I see: that the number of backend and > checkpoint writes to a file are not connected at all. > > We know that a 1GB relation segment can take a really long time to write > out. That could include up to 128 changed 8K pages, and we allow all of > them to get dirty before any are forced to disk with fsync. By my count, it can include up to 131,072 changed 8K pages. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On Tue, Jul 23, 2013 at 9:22 AM, Greg Smith wrote: > On 7/23/13 9:06 AM, Robert Haas wrote: >> >> But just BTW, you kind of messed up that CommitFest entry. The link you >> added as a new version of the >> patch was actually to a different patch from the same flock. > > Fixed now, since I find myself looking back through history on submissions > often enough to care about it being right. My eyes were starting to glaze > over by the end of staring at this "flock". Cool. I'm pretty sure that's the correct collective noun for patches. Right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On Tue, Jul 23, 2013 at 9:42 AM, Craig Ringer wrote: > (Replying on phone, please forgive bad quoting) > > Isn't this pretty much what adopting ICU is supposed to give us? > OS-independent collations? Yes. > I'd be interested in seeing the test data for this performance report, partly > as I'd like to see how ICU collations would compare when ICU is crudely > hacked into place for testing. I pretty much lost interest in ICU upon reading that they use UTF-16 as their internal format. http://userguide.icu-project.org/strings#TOC-Strings-in-ICU What that would mean for us is that instead of copying the input strings into a temporary buffer and passing the buffer to strcoll(), we'd need to convert them to ICU's representation (which means writing twice as many bytes as the length of the input string in cases where the input string is mostly single-byte characters) and then call ICU's strcoll() equivalent. I agree that it might be worth testing, but I can't work up much optimism. It seems to me that something that operates directly on the server encoding could run a whole lot faster. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 7:06 PM Andres Freund wrote: > On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: > > > * I'd be very surprised if this doesn't make WAL replay of update > heavy > > > workloads slower by at least a factor of 2. > > > > Yes, if you just consider the cost of replay, but it involves > other > > operations as well > > like for standby case transfer of WAL, Write of WAL, Read from > WAL and > > then apply. > > So among them most operations will benefit from reduced WAL > size, > > except apply where you need to decode. > > I still think it's rather unlikely that they offset those. I've seen > wal > replay be a major bottleneck more than once... > > > > * It makes data recovery from WAL *noticeably* harder since data > > > corruption now is carried forwards and you need the old data to > > > decode > > > new data > > > >This is one of the reasons why this optimization is done only when > the > > new row goes in the same page. > > That doesn't help all that much. It somewhat eases recovering data if > full_page_writes are on, but it's really hard to stitch together all > changes if the corruption occurred within a 1h long checkpoint... I think once a record is corrupted on a page, the user has to reconstruct that page; it might be difficult to just reconstruct a record, and this optimization will not carry any corruption forward from one page to another. > > > * It makes changeset extraction either more expensive or it would > have > > > to be disabled there. > > > I think, if there is any such implication, we can probably have > the > > option of disabling it > > That can just be done on wal_level = logical, that's not the > problem. It's certainly not without precedent that we have wal_level > dependent optimizations. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bloom Filter lookup for hash joins
This exact idea was discussed a while back. I think it was even implemented. The problem Tom raised at the time is that the memory usage of the bloom filter implies a smaller or less efficient hash table. It's difficult to determine whether you're coming out ahead or behind. I think it should be possible to figure this out though. Bloom filters have well understood math for the error rate given the size and number of hash functions (and please read up on it and implement the optimal combination for the target error rate, not just a wag) and it should be possible to determine the resulting advantage. Measuring the cost of the memory usage is harder but some empirical results should give some idea. I expect the result to be wildly uneven one way or the other so hopefully it doesn't matter if it's not perfect. If it's close then it's probably not worth doing anyway. I would suggest looking up the archives of the previous discussion. You might find the patch still usable. IIRC there have been no major changes to the hash join code. -- greg On Jun 26, 2013 7:31 PM, "Jeff Janes" wrote: > On Wed, Jun 26, 2013 at 5:01 AM, Atri Sharma wrote: > >> >> > The problem here is that if the hash table is in memory, doing a hash >> > table lookup directly is likely to be cheaper than a bloom filter >> > lookup, even if the bloom filter fits into the processor cache and the >> > hash table doesn't (10 last level cache hits is slower than one cache >> > miss). Bloom filter will help when it's not feasible to use an actual >> > hash table (lots of large keys), the real lookup is slow (e.g. an >> > index lookup), you are doing a lot of lookups to amortize the >> > construction cost and the condition is expected to be selective (most >> > lookups give a negative). There might be some data warehouse types of >> > queries where it would help, but it seems like an awfully narrow use >> > case with a potential for performance regressions when the planner has >> > a bad estimate. >> >> Ok, sounds good. Can't we use bloom filters for the case where the hash >> table doesn't fit in memory? Specifically, when reading from disk is >> inevitable for accessing the hash table, we can use bloom filters for >> deciding which keys to actually read from disk. > > > I don't think that sounds all that promising. When the hash table does > not fit in memory, it is either partitioned into multiple passes, each of > which do fit in memory, or it chooses a different plan altogether. Do we > know under what conditions a Bloom filter would be superior to those > options, and could we reliably detect those conditions? > > Cheers, > > Jeff >
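[Editor's note: the "well understood math" is easy to sanity-check directly in SQL; a sketch with purely illustrative numbers (one million keys, ten bits per key), not tied to any particular hash join implementation.]

    -- optimal hash-function count k = (m/n) * ln 2, and the resulting
    -- false-positive rate (1 - e^(-k*n/m))^k, for m bits and n keys
    WITH params AS (SELECT 1000000::float8 AS n, 10000000::float8 AS m)
    SELECT (m / n) * ln(2::float8)                         AS optimal_k,
           power(1 - exp(-((m / n) * ln(2::float8)) * n / m),
                 (m / n) * ln(2::float8))                  AS false_positive_rate
    FROM params;
    -- roughly k = 6.9 and a false-positive rate of about 0.8% with these numbers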
Re: [HACKERS] Performance problem in PLPgSQL
Marc Cousin writes: > The example below is of course extremely simplified, and obviously not > what we are really doing in the database, but it exhibits the slowdown > between 9.1.9 and 9.2.4. Hm. Some experimentation shows that the plan cache is failing to switch to a generic plan, but I'm not sure why the cast would have anything to do with that ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 09:27:18 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > >> In file included from gram.y:13612:0: > >> scan.c: In function ‘yy_try_NUL_trans’: > >> scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > >> PostgreSQL, contrib, and documentation successfully made. Ready to install. > > > FWIW, I've patched debian's flex just to get rid of this ;) > > I think it's fixed officially as of flex 2.5.35. Unfortunately (because debian has 2.5.35) it seems to be 2.5.36. The former was released 2008 :( It's 62617a3183abb2945beedc0b0ff106182af4f266 in flex's repository if somebody wants to submit it to debian. The repository is at git://git.code.sf.net/p/flex/flex - their website stated a different url the last time I looked. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
(Replying on phone, please forgive bad quoting) Isn't this pretty much what adopting ICU is supposed to give us? OS-independent collations? I'd be interested in seeing the test data for this performance report, partly as I'd like to see how ICU collations would compare when ICU is crudely hacked into place for testing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: > > * I'd be very surprised if this doesn't make WAL replay of update heavy > > workloads slower by at least a factor of 2. > > Yes, if you just consider the cost of replay, but it involves other > operations as well > like for standby case transfer of WAL, Write of WAL, Read from WAL and > then apply. > So among them most operations will benefit from reduced WAL size, > except apply where you need to decode. I still think it's rather unlikely that they offset those. I've seen wal replay be a major bottleneck more than once... > > * It makes data recovery from WAL *noticeably* harder since data > > corruption now is carried forwards and you need the old data to > > decode > > new data > >This is one of the reasons why this optimization is done only when the > new row goes in the same page. That doesn't help all that much. It somewhat eases recovering data if full_page_writes are on, but it's really hard to stitch together all changes if the corruption occurred within a 1h long checkpoint... > > * It makes changeset extraction either more expensive or it would have > > to be disabled there. > I think, if there is any such implication, we can probably have the > option of disabling it That can just be done on wal_level = logical, that's not the problem. It's certainly not without precedent that we have wal_level dependent optimizations. > > I think my primary issue is that philosophically/architecturally I am > > of > > the opinion that a wal record should make sense on its own without > > depending on heap data. And this patch loses that. > > Is the main worry about corruption getting propagated? Not really. It "feels" wrong to me architecturally. That's subjective, I know. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 09:21:54 -0400, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > > > Writing postgres.bki > > > Writing schemapg.h > > > Writing postgres.description > > > Writing postgres.shdescription > > > Writing fmgroids.h > > > Writing fmgrtab.c > > > > I personally don't feel the need for those to go away. They only appear > > when doing a clean rebuild and/or when changing something > > significant. Also it's just a couple of lines. > > For my 2c, I'd be for getting rid of the above. I agree it's not a huge > deal, but I'm also a big fan of "don't spam me when things go right". > Is there any particular reason we need to see them with every (clean) > build..? I don't have a big problem with getting rid of them either. I find them moderately reassuring to see those triggered if I change something relevant because I don't fully trust the dependency mechanism around them, but that's about it. > > > In file included from gram.y:13612:0: > > > scan.c: In function ‘yy_try_NUL_trans’: > > > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > > > PostgreSQL, contrib, and documentation successfully made. Ready to > > > install. > > > > FWIW, I've patched debian's flex just to get rid of this ;) > > Any idea if that might be accepted upstream or perhaps by Debian..? It > would surely be nice to make that error go away.. I think Tom ushered a patch upstream, debian just hasn't updated in a good while, even in unstable. I've looked at the packages bug page some time back and it didn't look very active so I didn't think further about proposing a debian-only backport of that patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion for concurrent index creation using a single full scan operation
On 2013-07-23 14:17:13 +0100, Greg Stark wrote: > We already do this in pg_restore by starting multiple worker processes. > Those processes should get the benefit of synchronised sequential scans. > > The way the API for indexes works, it wouldn't really be hard to start > multiple parallel index builds. I'm not sure how well the pg_restore thing > works and sometimes the goal isn't to maximise the speed so starting > multiple processes isn't always ideal. It might sometimes be interesting > to be able to do it explicitly in a single process. That's true for normal index modifications, but for ambuild (the initial index creation function) much less so. That's pretty much a black box from the outside. It's possible to change that, but it's certainly not trivial. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:27 AM Andres Freund wrote: > On 2013-07-19 10:40:01 +0530, Hari Babu wrote: > > > > On Friday, July 19, 2013 4:11 AM Greg Smith wrote: > > >On 7/9/13 12:09 AM, Amit Kapila wrote: > > >>I think the first thing to verify is whether the results posted > can be validated in some other environment setup by another person. > > >>The testcase used is posted at below link: > > >>http://www.postgresql.org/message- > id/51366323.8070...@vmware.com > > > > >That seems easy enough to do here, Heikki's test script is > excellent. > > >The latest patch Hari posted on July 2 has one hunk that doesn't > apply > > >anymore now. > > > > The Head code change from Heikki is correct. > > During the patch rebase to the latest PG LZ optimization code, the above > code change was missed. > > > > Apart from the above change, some more changes are done in the patch, > those are. > > FWIW I don't like this approach very much: > > * I'd be very surprised if this doesn't make WAL replay of update heavy > workloads slower by at least a factor of 2. Yes, if you just consider the cost of replay, but it involves other operations as well like for standby case transfer of WAL, Write of WAL, Read from WAL and then apply. So among them most operations will benefit from reduced WAL size, except apply where you need to decode. > * It makes data recovery from WAL *noticeably* harder since data > corruption now is carried forwards and you need the old data to > decode > new data This is one of the reasons why this optimization is done only when the new row goes in the same page. > * It makes changeset extraction either more expensive or it would have > to be disabled there. I think, if there is any such implication, we can probably have the option of disabling it > I think my primary issue is that philosophically/architecturally I am > of > the opinion that a wal record should make sense on its own without > depending on heap data. And this patch loses that. Is the main worry about corruption getting propagated? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
Andres Freund writes: > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: >> I have noticed that some people post examples using make --silent (-s). >> I found this actually kind of neat to use from time to time, because >> then you only see output if you have warnings or errors. But we get >> some extra output that doesn't quite fit: >> Writing postgres.bki >> Writing schemapg.h >> Writing postgres.description >> Writing postgres.shdescription >> Writing fmgroids.h >> Writing fmgrtab.c > I personally don't feel the need for those to go away. I agree --- these let you know that something happened that does not happen every build, so they're kind of useful. I concur that the noise from doc building is just noise, though. >> In file included from gram.y:13612:0: >> scan.c: In function ‘yy_try_NUL_trans’: >> scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] >> PostgreSQL, contrib, and documentation successfully made. Ready to install. > FWIW, I've patched debian's flex just to get rid of this ;) I think it's fixed officially as of flex 2.5.35. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On 7/23/13 9:06 AM, Robert Haas wrote: But just BTW, you kind of messed up that CommitFest entry. The link you added as a new version of the patch was actually to a different patch from the same flock. Fixed now, since I find myself looking back through history on submissions often enough to care about it being right. My eyes were starting to glaze over by the end of staring at this "flock". -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
* Andres Freund (and...@2ndquadrant.com) wrote: > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > > Writing postgres.bki > > Writing schemapg.h > > Writing postgres.description > > Writing postgres.shdescription > > Writing fmgroids.h > > Writing fmgrtab.c > > I personally don't feel the need for those to go away. They only appear > when doing a clean rebuild and/or when changing something > significant. Also it's just a couple of lines. For my 2c, I'd be for getting rid of the above. I agree it's not a huge deal, but I'm also a big fan of "don't spam me when things go right". Is there any particular reason we need to see them with every (clean) build..? > > Processing HTML.index... > > 2406 entries loaded... > > 0 entries ignored... > > Done. > > Note: Writing man3/SPI_connect.3 > > Note: Writing man3/SPI_finish.3 > > Note: Writing man3/SPI_push.3 > > ... > > Note: Writing man1/pg_test_timing.1 > > Note: Writing man1/pg_upgrade.1 > > Note: Writing man1/pg_xlogdump.1 > > Those should imo be silenced. It's quite a bit of output, and especially > during a parallel build they tend to hide more important output. Agreed. > > In file included from gram.y:13612:0: > > scan.c: In function ‘yy_try_NUL_trans’: > > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > > PostgreSQL, contrib, and documentation successfully made. Ready to install. > > FWIW, I've patched debian's flex just to get rid of this ;) Any idea if that might be accepted upstream or perhaps by Debian..? It would surely be nice to make that error go away.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Suggestion for concurrent index creation using a single full scan operation
We already do this in pg_restore by starting multiple worker processes. Those processes should get the benefit of synchronised sequential scans. The way the API for indexes works, it wouldn't really be hard to start multiple parallel index builds. I'm not sure how well the pg_restore thing works and sometimes the goal isn't to maximise the speed so starting multiple processes isn't always ideal. It might sometimes be interesting to be able to do it explicitly in a single process. -- greg On Jul 23, 2013 1:06 PM, "Tim Kane" wrote: > Hi all, > > I haven't given this a lot of thought, but it struck me that when > rebuilding tables (be it for a restore process, or some other operational > activity) - there is more often than not a need to build an index or two, > sometimes many indexes, against the same relation. > > It strikes me that in order to build just one index, we probably need to > perform a full table scan (in a lot of cases). If we are building > multiple indexes sequentially against that same table, then we're probably > performing multiple sequential scans in succession, once for each index. > > Could we architect a mechanism that allowed multiple index creation > statements to execute concurrently, with all of their inputs fed directly > from a single sequential scan against the full relation? > > From a language construct point of view, this may not be trivial to > implement for raw/interactive SQL - but possibly this is a candidate for > the custom format restore? > > I presume this would substantially increase the memory overhead required > to build those indexes, though the performance gains may be advantageous. > > Feel free to shoot holes through this :) > > Apologies in advance if this is not the correct forum for suggestions.. > >
Re: [HACKERS] [9.4 CF 1] And then there were 5
On Tue, Jul 23, 2013 at 12:04 AM, Greg Smith wrote: > A further look at recently ALTER OPERATOR FAMILY changes shows that Robert > has been working on those quite a bit. I'm putting him down as the > committer on this last one. I missed that one, now committed. But just BTW, you kind of messed up that CommitFest entry. The link you added as a new version of the patch was actually to a different patch from the same flock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade across more than two neighboring major releases
On Tue, Jul 23, 2013 at 09:48:16PM +0900, Michael Paquier wrote: > > > > On Tue, Jul 23, 2013 at 8:53 PM, Pavel Raiskup wrote: > > Is pg_upgrade supposed to work even across multiple major releases? (I > know > that the README.rpm-dist is mentioning just one major release step, but it > could be bound just to actual pg_upgrade which is packaged..) > > Yes, you should be able to do a one-step upgrade as long as you update from at > least an 8.3 server. There are some restrictions depending on the server > version > that is going to be upgraded though, you can have a look at the documentation > for more details here: > http://www.postgresql.org/docs/9.2/static/pgupgrade.html Yes, you can easily skip many major versions in one pg_upgrade run. Please show us the errors you see. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for ALTER OPERATOR FAMILY.. ADD / DROP
On Thu, May 23, 2013 at 6:52 PM, Robins wrote: > Please find attached a patch to take code-coverage of ALTER OPERATOR > FAMILY.. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%. > > Any and all feedback is welcome. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > I have noticed that some people post examples using make --silent (-s). > I found this actually kind of neat to use from time to time, because > then you only see output if you have warnings or errors. But we get > some extra output that doesn't quite fit: I am one of the people doing that regularly. I think it's fantastic because it's so much harder to miss errors that way... Also it shortens the buildtime measurably. > Writing postgres.bki > Writing schemapg.h > Writing postgres.description > Writing postgres.shdescription > Writing fmgroids.h > Writing fmgrtab.c I personally don't feel the need for those to go away. They only appear when doing a clean rebuild and/or when changing something significant. Also it's just a couple of lines. > Processing HTML.index... > 2406 entries loaded... > 0 entries ignored... > Done. > Note: Writing man3/SPI_connect.3 > Note: Writing man3/SPI_finish.3 > Note: Writing man3/SPI_push.3 > ... > Note: Writing man1/pg_test_timing.1 > Note: Writing man1/pg_upgrade.1 > Note: Writing man1/pg_xlogdump.1 Those should imo be silenced. It's quite a bit of output, and especially during a parallel build they tend to hide more important output. > In file included from gram.y:13612:0: > scan.c: In function ‘yy_try_NUL_trans’: > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > PostgreSQL, contrib, and documentation successfully made. Ready to install. FWIW, I've patched debian's flex just to get rid of this ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On 23 July 2013 06:10, Tom Lane wrote: > Andrew Gierth writes: >> I must admit to finding all of this criticism of unread code a bit >> bizarre. > > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. > I had another look at this --- the bug fix looks reasonable, and includes a sensible set of additional regression tests. This was not a bug that implies anything fundamentally wrong with the patch. Rather it was just a fairly trivial easy-to-overlook bug of omission --- I certainly overlooked it in my previous reviews (sorry) and at least 3 more experienced hackers than me overlooked it during detailed code inspection. I don't think that really reflects negatively on the quality of the patch, or the approach taken, which I still think is good. That's also backed up by the fact that Greg isn't able to find much he wants to change. I also don't see much that needs changing in the patch, except possibly in the area where Greg expressed concerns over the comments and code clarity. I had similar concerns during my initial review, so I tend to agree that perhaps it's worth adding a separate boolean has_ordinality flag to FunctionScanState just to improve code readability. FWIW, I would probably have extended FunctionScanState like so: /* * FunctionScanState information * * Function nodes are used to scan the results of a * function appearing in FROM (typically a function returning set). * * eflags node's capability flags * tupdesc node's expected return tuple description * tuplestorestate private state of tuplestore.c * funcexpr state for function expression being evaluated * has_ordinality function scan WITH ORDINALITY? * func_tupdesc for WITH ORDINALITY, the raw function tuple * description, without the added ordinality column * func_slot for WITH ORDINALITY, a slot for the raw function * return tuples * ordinal for WITH ORDINALITY, the ordinality of the return * tuple * */ typedef struct FunctionScanState { ScanState ss; /* its first field is NodeTag */ int eflags; TupleDesc tupdesc; Tuplestorestate *tuplestorestate; ExprState *funcexpr; bool has_ordinality; /* these fields are used for a function scan WITH ORDINALITY */ TupleDesc func_tupdesc; TupleTableSlot *func_slot; int64 ordinal; } FunctionScanState; Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade across more than two neighboring major releases
On Tue, Jul 23, 2013 at 8:53 PM, Pavel Raiskup wrote: > Is pg_upgrade supposed to work even across multiple major releases? (I > know > that the README.rpm-dist is mentioning just one major release step, but it > could be bound just to actual pg_upgrade which is packaged..) > Yes, you should be able to do a one-step upgrade as long as you update from at least an 8.3 server. There are some restrictions depending on the server version that is going to be upgraded though, you can have a look at the documentation for more details here: http://www.postgresql.org/docs/9.2/static/pgupgrade.html Thanks, -- Michael
Re: [HACKERS] proposal - psql - show longest tables
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Even if we thought the functionality was worth the trouble, which I > continue to doubt, this particular syntax proposal is a disaster. Agreed. While there might be things worthwhile to add to psql's backslash commands, this isn't one of those. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] make --silent
I have noticed that some people post examples using make --silent (-s). I found this actually kind of neat to use from time to time, because then you only see output if you have warnings or errors. But we get some extra output that doesn't quite fit: Writing postgres.bki Writing schemapg.h Writing postgres.description Writing postgres.shdescription Writing fmgroids.h Writing fmgrtab.c Processing HTML.index... 2406 entries loaded... 0 entries ignored... Done. Note: Writing man3/SPI_connect.3 Note: Writing man3/SPI_finish.3 Note: Writing man3/SPI_push.3 ... Note: Writing man1/pg_test_timing.1 Note: Writing man1/pg_upgrade.1 Note: Writing man1/pg_xlogdump.1 In file included from gram.y:13612:0: scan.c: In function ‘yy_try_NUL_trans’: scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] PostgreSQL, contrib, and documentation successfully made. Ready to install. Is there interest in silencing this extra output (not the compiler warning) when make -s is used? The documentation tools have options to do so, the rest are our own scripts. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
Greg, First, thanks for helping out with the CF. * Greg Smith (g...@2ndquadrant.com) wrote: > That leaves only 1 patch where I have no idea who will commit it: > > access to calls stack from GET DIAGNOSTICS statement I'll take a look at this. Thanks again, Stephen signature.asc Description: Digital signature
Re: [HACKERS] improve Chinese locale performance
On Mon, Jul 22, 2013 at 12:49 PM, Greg Stark wrote: > On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut wrote: >> I think part of the problem is that we call strcoll for each comparison, >> instead of doing strxfrm once for each datum and then just strcmp for >> each comparison. That is effectively equivalent to what the proposal >> implements. > > Fwiw I used to be a big proponent of using strxfrm. But upon further > analysis I realized it was a real difficult tradeoff. strxfrm saves > potentially a lot of cpu cost but at the expense of expanding the size > of the sort key. If the sort spills to disk or even if it's just > memory bandwidth limited it might actually be slower than doing the > additional cpu work of calling strcoll. > > It's hard to see how to decide in advance which way will be faster. I > suspect strxfrm is still the better bet, especially for complex large > character set based locales like Chinese. strcoll might still win by a > large margin on simple mostly-ascii character sets. The storage blow-up on systems I've tested is on the order of 10x. That's possibly fine if the data still fits in memory, but it pretty much sucks if it makes your sort spill to disk, which seems like a likely outcome in many cases. But I don't have much trouble believing the OP's contention that he's coded a locale-specific version that is faster than the version that ships with the OS. On glibc, for example, we copy the strings we want to compare, so that we can add a terminating zero byte. The first thing that glibc does is call strlen(). That's pretty horrible, and I'm not at all sure the horror ends there, either. It would be great to have support for user-defined collations in PostgreSQL. Let the user provide their own comparison function and whatever else is needed and use that instead of the OS-specific support. Aside from the performance advantages, one could even create collations that have the same names and orderings on all platforms we support. Our support team has gotten more than one inquiry of the form "what's the equivalent of Linux collation XYZ on Windows?" - and telling them that there is no exact equivalent is not the answer they want to hear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:02 AM Greg Smith wrote: > The v3 patch applies perfectly here now. Attached is a spreadsheet > with test results from two platforms, a Mac laptop and a Linux server. > I used systems with high disk speed because that seemed like a worst > case for this improvement. The actual improvement for shrinking WAL > should be even better on a system with slower disks. You are absolutely right. To mimic it on our system, we configured a RAMFS for the database, and it shows similar results. > There are enough problems with the "hundred tiny fields" results that I > think this is not quite ready for commit yet. More comments on that > below. > This seems close though, close enough that I am planning to follow up > to see if the slow disk results are better. Thanks for going the extra mile to try for slower disks. > Reviewing the wal-update-testsuite.sh test program, I think the only > case missing that would be useful to add is "ten tiny fields, one > changed". I think that one is interesting to highlight because it's > what benchmark programs like pgbench do very often. > The GUC added by the program looks like this: > > postgres=# show wal_update_compression_ratio ; > wal_update_compression_ratio > -- > 25 > > Is it possible to add a setting here that disables the feature altogether? Yes, it can be done in the following 2 ways: 1. Provide a new configuration parameter (wal_update_compression) to turn this feature on/off. 2. At the table level, the user can be given an option to configure it. > That always makes it easier to consider a commit, knowing people can > roll back the change if it makes performance worse. That would make > performance testing easier too. wal-update-testsuite.sh takes as long > as > 13 minutes, it's long enough that I'd like the easier to automate > comparison GUC disabling adds. If that's not practical to do given the > intrusiveness of the code, it's not really necessary. I haven't looked > at the change enough to be sure how hard this is. > > On the Mac, the only case that seems to have a slowdown now is "hundred > tiny fields, half nulled". It would be nice to understand just what is > going on with that one. I got some ugly results in "two short fields, > no change" too, along with a couple of other weird results, but I think > those were testing procedure issues that can be ignored. The pgbench > throttle work I did recently highlights that I can't really make a Mac > quiet/consistent for benchmarking very well. Note that I ran all of > the Mac tests with assertions on, to try and catch platform specific > bugs. > The Linux ones used the default build parameters. > > On Linux "hundred tiny fields, half nulled" was also by far the worst > performing one, with a >30% increase in duration despite the 14% drop > in WAL. Exactly what's going on there really needs to be investigated > before this seems safe to commit. All of the "hundred tiny fields" > cases seem pretty bad on Linux, with the rest of them running about an > 11% duration increase. The main benefit of this patch is to reduce WAL to improve I/O time, but I think for faster I/O systems, the calculation to reduce WAL has overhead. I will check how to optimize that calculation, but I think this feature should be considered with a configuration knob as it can improve many cases. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
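[Editor's note: to make the two options concrete, this is roughly what they might look like from SQL. Both knobs are hypothetical; neither the GUC name nor the storage parameter exists unless the patch adds them.]

    -- option 1: a global (or per-session) GUC, name illustrative
    SET wal_update_compression = off;

    -- option 2: a per-table storage parameter, name illustrative
    ALTER TABLE pgbench_accounts SET (wal_compress_update = off);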
[HACKERS] Suggestion for concurrent index creation using a single full scan operation
Hi all, I haven't given this a lot of thought, but it struck me that when rebuilding tables (be it for a restore process, or some other operational activity) - there is more often than not a need to build an index or two, sometimes many indexes, against the same relation. It strikes me that in order to build just one index, we probably need to perform a full table scan (in a lot of cases). If we are building multiple indexes sequentially against that same table, then we're probably performing multiple sequential scans in succession, once for each index. Could we architect a mechanism that allowed multiple index creation statements to execute concurrently, with all of their inputs fed directly from a single sequential scan against the full relation? From a language construct point of view, this may not be trivial to implement for raw/interactive SQL - but possibly this is a candidate for the custom format restore? I presume this would substantially increase the memory overhead required to build those indexes, though the performance gains may be advantageous. Feel free to shoot holes through this :) Apologies in advance if this is not the correct forum for suggestions..
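[Editor's note: as an illustration of the repeated work being described, each statement below typically performs its own full scan of the heap today. Table and column names are illustrative; synchronize_seqscans only helps scans that happen to be running at the same time.]

    CREATE INDEX ON measurements (device_id);
    CREATE INDEX ON measurements (recorded_at);
    CREATE INDEX ON measurements (reading);
    -- the idea in this mail is, roughly, to feed all three index builds
    -- from one pass over "measurements" instead of three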
[HACKERS] pg_upgrade across more than two neighboring major releases
Hello all, I don't know much about the pg_upgrade implementation, so I'd rather ask here before I start trying the process once again with debugging and deeper observation of the code. I tried to set up our postgresql RPM to package pg_upgrade in version 8.4.13 to automate the post-rpm-upgrade update of postgresql's data to 9.2.4. I was unable to reach a successful result (getting weird errors). Is pg_upgrade supposed to work even across multiple major releases? (I know that the README.rpm-dist is mentioning just one major release step, but it could be bound just to actual pg_upgrade which is packaged..) I can imagine that it may be difficult to handle everything even between two consecutive major releases - thus if it is possible, I would consider it heartwarming :). Thanks for your answer, Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 7/23/13 2:16 AM, Satoshi Nagayasu wrote: > I've been working on new pgstattuple function to allow > block sampling [1] in order to reduce block reads while > scanning a table. A PoC patch is attached. Take a look at all of the messages linked in https://commitfest.postgresql.org/action/patch_view?id=778 Jaime and I tried to do what you're working on then, including a random block sampling mechanism modeled on the stats_target mechanism. We didn't do that as part of pgstattuple though, which was a mistake. Noah created some test cases as part of his thorough review that were not computing the correct results. Getting the results correct for all of the various types of PostgreSQL tables and indexes ended up being much harder than the sampling part. See http://www.postgresql.org/message-id/20120222052747.ge8...@tornado.leadboat.com in particular for that. > This new function, pgstattuple2(), samples only 3,000 blocks > (which accounts 24MB) from the table randomly, and estimates > several parameters of the entire table. There should be an input parameter to the function for how much sampling to do, and if it's possible to make the scale for it to look like ANALYZE that's helpful too. I have a project for this summer that includes reviving this topic and making sure it works on some real-world systems. If you want to work on this too, I can easily combine that project into what you're doing. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
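Something along these lines is what the suggestion above amounts to (the exact signature of the posted PoC may differ, and the second, sampling-size argument is purely hypothetical):

SELECT * FROM pgstattuple2('pgbench_accounts');         -- PoC behaviour: sample a fixed 3,000 blocks
SELECT * FROM pgstattuple2('pgbench_accounts', 30000);  -- hypothetical knob: sample ~30,000 blocks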
Re: [HACKERS] Proposal: template-ify (binary) extensions
On 07/16/2013 09:14 PM, I wrote: > But okay, you're saying we *have* and *want* a guarantee that even a > superuser cannot execute arbitrary native code via libpq (at least in > default installs w/o extensions). I stand corrected and have to change my position, again. For the record: We do not have such a guarantee. Nor does it seem reasonable to want one. On a default install, it's well possible for the superuser to run arbitrary code via just libpq. There are various ways to do it, but the simplest one I was shown is: - upload a DSO from the client into a large object - SELECT lo_export() that LO to a file on the server - LOAD it There are a couple other options, so even if we let LOAD perform permission checks (as I proposed before in this thread), the superuser can still fiddle with function definitions. To the point that it doesn't seem reasonable to try to protect against that. Thus, the argument against the original proposal based on security grounds is moot. Put another way: There already are a couple of "backdoors" a superuser can use. By default. Or with plpgsql removed. Thanks to Dimitri and Andres for patiently explaining and providing examples. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
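For reference, a minimal sketch of that sequence as run from psql (the OID and file paths are illustrative only):

\lo_import 'evil.so'                       -- upload the DSO from the client into a large object
SELECT lo_export(16385, '/tmp/evil.so');   -- 16385: whatever OID the import reported; writes a server-side file
LOAD '/tmp/evil.so';                       -- load it into the backend, running its _PG_init()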
[HACKERS] Performance problem in PLPgSQL
Hi, I've been trying to diagnose a severe performance regression we've been having in one of our plpgsql procedures. The example below is of course extremely simplified, and obviously not what we are really doing in the database, but it exhibits the slowdown between 9.1.9 and 9.2.4. So here is the example:

create table table_test_int (col_01 int);
create table table_test_numeric (col_01 numeric);

CREATE OR REPLACE FUNCTION public.test_insert(nb integer)
 RETURNS text
 LANGUAGE plpgsql
AS $function$
DECLARE
  time_start timestamp;
  time_stop timestamp;
  tmp_numeric numeric;
BEGIN
  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_int(col_01) VALUES (i);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for int:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_numeric(col_01) VALUES (i);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_numeric(col_01) VALUES (i::numeric);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric, casted:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    tmp_numeric := cast(i as numeric);
    INSERT INTO table_test_numeric(col_01) VALUES (tmp_numeric);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric with tmp variable:%', time_stop - time_start;

  RETURN 1;
END;
$function$;

It just inserts nb records in a loop in 4 different ways:
- Directly into an int field
- Then into a numeric field (that's where we're having problems)
- Then into the same numeric field, but trying a cast (it doesn't change a thing)
- Then with an intermediary temp variable of numeric type (which solves the problem).

Here are the runtimes (tables were truncated beforehand):

9.1.9:
select test_insert(100);
NOTICE: time for int:00:00:09.526009
NOTICE: time for numeric:00:00:10.557126
NOTICE: time for numeric, casted:00:00:10.821369
NOTICE: time for numeric with tmp variable:00:00:10.850847

9.2.4:
select test_insert(100);
NOTICE: time for int:00:00:09.477044
NOTICE: time for numeric:00:00:24.757032 <
NOTICE: time for numeric, casted:00:00:24.791016 <
NOTICE: time for numeric with tmp variable:00:00:10.89332

I really don't know exactly where the problem comes from… but it's been hurting a function very badly (there are several of these static queries with type mismatches). And of course, the problem is not limited to numeric… text has the exact same problem. Regards, Marc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API
Magnus Hagander wrote: > In that case, doesn't this patch break Windows? We no longer do the > anonymous bind on Windows, since it's now in the #ifdef HAVE_LIBLDAP. > > Don't we need to keep the ldap_simple_bind() call in the Windows case, > or break it up so the call to ldap_sasl_bind_s() is moved outside the > #ifdef? At least I can't find anything in the docs that indicate that > ldap_connect() on Windows would actually call that for us - only the > other way around? This patch works for the Windows case, because ldap_connect performs an anonymous bind, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366171%28v=vs.85%29.aspx If the call to ldap_connect succeeds, the client is connected to the LDAP server as an anonymous user. The session handle should be freed with a call to ldap_unbind when it is no longer required. > I'm going to set this patch as returned with feedback for now, but > please feel free to comment on above and possibly resubmit if > necessary before the CF and I'll see if I can deal with it before the > next CF anyway, as it's a bug fix. The patch should still be good, but if we keep the deprecated OpenLDAP API, it might be more consistent to use ldap_simple_bind_s instead of ldap_sasl_bind_s. If you agree, I'll change that. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On 2013-07-22 13:50:08 -0400, Robert Haas wrote: > On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund wrote: > > The git tree is at: > > git://git.postgresql.org/git/users/andresfreund/postgres.git branch > > xlog-decoding-rebasing-cf4 > > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4 > > > > On 2013-06-15 00:48:17 +0200, Andres Freund wrote: > >> Overview of the attached patches: > >> 0001: indirect toast tuples; required but submitted independently > >> 0002: functions for testing; not required, > >> 0003: (tablespace, filenode) syscache; required > >> 0004: RelationMapFilenodeToOid: required, simple > >> 0005: pg_relation_by_filenode() function; not required but useful > >> 0006: Introduce InvalidCommandId: required, simple > >> 0007: Adjust Satisfies* interface: required, mechanical, > >> 0008: Allow walsender to attach to a database: required, needs review > >> 0009: New GetOldestXmin() parameter; required, pretty boring > >> 0010: Log xl_running_xact regularly in the bgwriter: required > >> 0011: make fsync_fname() public; required, needs to be in a different file > >> 0012: Relcache support for an Relation's primary key: required > >> 0013: Actual changeset extraction; required > >> 0014: Output plugin demo; not required (except for testing) but useful > >> 0015: Add pg_receivellog program: not required but useful > >> 0016: Add test_logical_decoding extension; not required, but contains > >> the tests for the feature. Uses 0014 > >> 0017: Snapshot building docs; not required > > I've now also committed patch #7 from this series. My earlier commit > fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago > I committed #1. Thanks! > I am not entirely convinced of the necessity or > desirability of patch #6, but as of now I haven't studied the issues > closely. Fair enough. It's certainly possible to work around not having it, but it seems cleaner to introduce the notion of an invalid CommandId like we have for transaction ids et al. Allowing 2^32-2 instead of 2^32-1 subtransactions doesn't seem like a problem to me ;) > Patch #2 does not seem useful in isolation; it adds new > regression-testing stuff but doesn't use it anywhere. Yes. I found it useful to test stuff around making replication synchronous or such, but while I think we should have a facility like it in core for both, logical and physical replication, I don't think this patch is ready for prime time due to it's busy looping. I've even marked it as such above ;) My first idea to properly implement that seems to be to reuse the syncrep infrastructure but that doesn't look trivial. > I doubt that any of the remaining patches (#8-#17) can be applied > separately without understanding the shape of the whole patch set, so > I think I, or someone else, will need to set aside more time for > detailed review before proceeding further with this patch set. I > suggest that we close out the CommitFest entry for this patch set one > way or another, as there is no way we're going to get the whole thing > done under the auspices of CF1. Generally agreed. The biggest chunk of the code is in #13 anyway... Some may be applyable independently: > 0010: Log xl_running_xact regularly in the bgwriter: required Should be useful independently since it can significantly speed up startup of physical replicas. 
On many systems checkpoint_timeout will be set to an hour, which can make the time until a standby gets consistent quite high, since that will be the first time it sees an xl_running_xacts record again. > 0011: make fsync_fname() public; required, needs to be in a different file It isn't in shape for that atm, but it could be applied as an independent infrastructure patch. And it should be easy enough to clean it up. > 0012: Relcache support for an Relation's primary key: required Might actually be a good idea independently as well. E.g. the materialized key patch could use the information that there's a candidate key around to avoid a good bit of useless work. > I'll try to find some more time to spend on this relatively soon, but > I think this is about as far as I can take this today. Was pretty helpful already, so ... ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back branches vs. gcc 4.8.0
On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote: > On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote: > > The reason I'm thinking it's a good idea is that it would expose any > > remaining places where we have nominally var-length arrays embedded in > > larger structs. Now that I've seen the failures with gcc 4.8.0, I'm > > quite worried that there might be some more declarations like that > > which we've not identified yet, but that by chance aren't causing > > obvious failures today. > > Here is a rough patch that replaces almost all occurrences of > something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left > and right (because of sizeof issues, probably), but at least so far the > compiler hasn't complained about any flexible-array members not at the > end of the struct, which is what it did last time. So the answer to > your concern so far is negative. I think this point in the cycle would be a good one to apply something like this. > Completing this patch will be quite a bit more debugging work. Some > kind of electric fence for palloc would be helpful. Noah's recently added valgrind mode should provide this. Do you have an updated version of this patch already? I'd be willing to make a pass over it to check whether I find any missed updates... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? > Upon further investigation, I found that the delay in switching to external sort caused by a larger value of maintenance_work_mem is small compared to the total CREATE INDEX time. So plotting it for a number of maintenance_work_mem values shows that its effect is negligible. Are there any other parts of the external sort logic that might make it slower with increasing values of maintenance_work_mem? It seems the merge order and the number of tapes are related to state->allowedMem. Does that mean external sort is affected by the value of maintenance_work_mem in a way roughly similar to the above? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
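For anyone who wants to reproduce the measurements, something along these lines shows both the timing and, via trace_sort (available when the server is built with TRACE_SORT, which is the default), the point where the sort goes external and how many merge passes occur (table and column names are placeholders):

SET maintenance_work_mem = '16MB';
SET trace_sort = on;            -- tuplesort progress is emitted at LOG level
SET client_min_messages = log;  -- also show those messages in the psql session
\timing on
CREATE INDEX test_idx ON test_table (col);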
Re: [HACKERS] anchovy failing on 9.1 and earlier since using gcc 4.8
On 2013-07-23 09:11:00 +0200, Andres Freund wrote: > Hi, > > Anchovy is failing on 9.1 and earlier for quite some time now: > http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=anchovy&br=REL9_1_STABLE > The commit at that date looks rather unconspicuous. Looking at the > 'config' section reveals the difference between the failing > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-31%2023%3A23%3A01 > and > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-30%2000%3A23%3A01 > > 2013-03-30: config: > > configure:3244: checking for C compiler version > configure:3252: gcc --version >&5 > gcc (GCC) 4.7.2 > > 2013-03-31: config: > > configure:3244: checking for C compiler version > configure:3252: gcc --version >&5 > gcc (GCC) 4.8.0 > > (note that it's using 4.8.1 these days) > > So, it started failing the day it switched to gcc 4.8. > > Given that anchovy builds 9.2 and master happily and given that it is > the only gcc-4.8 animal in the buildfarm it seems quite possible that > there is some misoptimization issue in 9.1 and earlier with 4.8. > > I seem to recall some discussions about whether to backpatch some > hardening against compiler optimizations, but I don't remember any > specifics. Ah, yes. Exactly this case was discussed a few weeks ago: http://archives.postgresql.org/message-id/14242.1365200084%40sss.pgh.pa.us I'd personally vote for doing both (1) and (2) mentioned in that email in the back branches. I don't think the loop optimizations disabled by -fno-aggressive-loop-optimizations are the only place where the compiler can actually use such knowledge. It very well might optimize entire code blocks away. And I don't think this change is the only one where the aggressive loop optimizations might play a role, especially in the older branches. I naturally only found the mailing list entry after debugging the issue myself... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using ini file to setup replication
On Mon, Jul 22, 2013 at 10:59 PM, Greg Stark wrote: > On Fri, Jul 19, 2013 at 2:20 PM, Samrat Revagade > wrote: > >> for example: >> if i want to configure 2 servers then it will add 6 lines,for 3 -9, for 4-12 >> setting's for particular server will be: >> >> considering the way of setting value to conf parameters : guc . value >> >> standby_name.'AAA' >> synchronous_transfer. commit >> wal_sender_timeout.60 > > > I have a feeling Samrat and Sawada-san have some good use cases where > this extra syntax could be a big step up in expressiveness. But I'm > having a hard time figuring out exactly what they have in mind. If > either of you could explain in more detail how the extra syntax would > apply to your use case and how it would let you do something that you > can't already do it might be helpful. > > I'm assuming the idea is something like having a single config file > which can work for the server regardless of whether it's acting as the > primary or standby and then be able to switch roles by switching a > single flag which would control which parts of the config file were > applied. But I'm having a hard time seeing how exactly that would > work. In this approach, where those GUC parameters are written in postgresql.conf, the user would have to write many extra lines in postgresql.conf per standby server, as Samrat suggested. That would increase the size of postgresql.conf, and I don't think it is good for all those parameters to live there. That is, I think those parameters should be written in a separate file; e.g., we could pull them in using "include" (or "include_if_exists") in postgresql.conf. If the included file doesn't exist, we would apply the default value to each WAL sender. That is, we give up on the ini file and instead provide a mechanism for configuring each WAL sender as an override. Of course, to support this approach we need the patch which Andres suggested, and the server should be able to handle tokens made of two or more parts separated by a dot. So we would like to help with this approach. Regards, Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
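A rough sketch of what that could look like, purely to illustrate the idea (the file name and the dotted per-standby parameters are hypothetical and depend on the parser changes mentioned above; include_if_exists itself already exists):

# in postgresql.conf:
include_if_exists 'standbys.conf'

# in standbys.conf (one group of dotted parameters per standby):
aaa.synchronous_transfer = commit
aaa.wal_sender_timeout = 60
bbb.synchronous_transfer = commit
bbb.wal_sender_timeout = 30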
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
On 23.07.2013 06:22, David Fetter wrote: > What problem or problems did you notice, and what did you change to > fix them? "UPDATE ... FROM" generated "ERROR: variable not found in subplan target lists". I've added a workaround in add_vars_to_targetlist: - if it is an "after" - simply change var->varno to the base RTE (it should always exist; the value is temporary and will later change to OUTER_VAR) - if it is a "before" - add the new var to the targetlist regardless of rel->attr_needed[attno]. Additionally I've changed build_joinrel_tlist to ignore any "BEFORE" RTEs. The regression tests are updated. Regards, Karol -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
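For context, the kind of statement this work targets looks roughly like the following (RETURNING BEFORE/AFTER is the syntax proposed in this GSoC project, not something in any released version; table and column names are invented):

UPDATE foo SET val = val + 1
  FROM bar
 WHERE foo.id = bar.foo_id
RETURNING BEFORE.val AS old_val, AFTER.val AS new_val;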
[HACKERS] anchovy failing on 9.1 and earlier since using gcc 4.8
Hi, Anchovy is failing on 9.1 and earlier for quite some time now: http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=anchovy&br=REL9_1_STABLE The commit at that date looks rather unconspicuous. Looking at the 'config' section reveals the difference between the failing http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-31%2023%3A23%3A01 and http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-30%2000%3A23%3A01 2013-03-30: config: configure:3244: checking for C compiler version configure:3252: gcc --version >&5 gcc (GCC) 4.7.2 2013-03-31: config: configure:3244: checking for C compiler version configure:3252: gcc --version >&5 gcc (GCC) 4.8.0 (note that it's using 4.8.1 these days) So, it started failing the day it switched to gcc 4.8. Given that anchovy builds 9.2 and master happily and given that it is the only gcc-4.8 animal in the buildfarm it seems quite possible that there is some misoptimization issue in 9.1 and earlier with 4.8. I seem to recall some discussions about whether to backpatch some hardening against compiler optimizations, but I don't remember any specifics. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On Mon, Jul 22, 2013 at 8:48 PM, Greg Smith wrote: > And I can't get too excited about making this as my volunteer effort when I > consider what the resulting credit will look like. Coding is by far the > smallest part of work like this, first behind coming up with the design in > the first place. And both of those are way, way behind how long review > benchmarking takes on something like this. The way credit is distributed > for this sort of feature puts coding first, design not credited at all, and > maybe you'll see some small review credit for benchmarks. That's completely > backwards from the actual work ratio. If all I'm getting out of something > is credit, I'd at least like it to be an appropriate amount of it. FWIW, I think that's a reasonable request. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers